When troubleshooting public folder replication, always keep in mind that the replication process is message-based.
If you run into problems with public folder replication, your troubleshooting should always start with these 2 awesome posts by Bill Long: http://msexchangeteam.com/archive/2006/01/17/417611.aspx and http://msexchangeteam.com/archive/2006/01/19/417737.aspx .
They are pretty comprehensive and cover almost all the troubleshooting techniques you are likely to apply when dealing with public folder replication issues.
I say "almost all" because I recently found another technique that proved to be pretty useful in the following scenario:
Let’s say you have a public folder with 2 replicas on 2 different servers. Server A has the up-to-date replica, but some old content will not replicate to the replica on server B (in my case, almost 50 old posts would not replicate).
New content replicates just fine between the servers. When you turn up diagnostic logging for Replication Incoming and Replication Outgoing on both servers and force replication, you notice that server B sends out a backfill request and server A answers with a backfill response containing the items that don’t replicate. That response never makes it back to server B, so those items are never processed and never replicated. Looking at the SMTP queues won’t help much, because they are clear on both servers.
It looked like the message carrying the items to be replicated from server A to server B had just vanished. In my particular case, the only lead came from the Message Tracking Center, which I used to track the mail flow between the 2 public folder databases. There I noticed that the backfill response containing the items to be replicated never left server A, because it had ended up in an “Advanced Queuing Failed To Deliver Message” error.
Now comes the tricky part: since looking at an SMTP trace didn’t really help isolate the problem, we decided to use a very interesting feature of public folder replication to find out what was wrong with this message: the Replication Message Size Limit value.
You should know that when replicating public folder data between servers, Exchange packs multiple content changes into one large replication message that is sent to the replication partner server. By default, changes are packed in chunks of 300 KB, each chunk forming 1 replication message that is ready to be transferred out (more info on this at http://technet.microsoft.com/en-us/library/aa997291(EXCHG.65).aspx ).
We decided to play on this feature by lowering the Replication Message Size Limit on the Public Folder store of server A to 1 KB, hoping that this would send out one content change (or public folder post) per replication message. The goal was to see whether we were dealing with a corrupt post that obstructed the replication process by rendering the whole replication message corrupt. This technique should also let us identify the corrupt item precisely, because in the end it would be the only one that still did not replicate.
So instead of packing multiple content changes (public folder posts) into one 300 KB replication message, we decided to force Exchange to replicate each content change (public folder post) one by one.
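To make the reasoning concrete, here is a minimal sketch of the idea in Python. It is a toy model only, not Exchange's actual packing logic: the names Post, pack and replicate, the 5 KB post size and the position of the corrupt post are all made up for illustration. It assumes changes are packed greedily up to the size limit, that an item larger than the limit still travels alone in its own message, and that a message containing one corrupt item is dropped as a whole.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    """Hypothetical stand-in for one content change (public folder post)."""
    name: str
    size_kb: int
    corrupt: bool = False


def pack(posts: List[Post], limit_kb: int) -> List[List[Post]]:
    """Greedily group posts into messages of at most limit_kb.

    A post larger than the limit still goes out, alone in its own message.
    """
    messages: List[List[Post]] = []
    current: List[Post] = []
    current_size = 0
    for post in posts:
        # Close the current message if this post would push it over the limit.
        if current and current_size + post.size_kb > limit_kb:
            messages.append(current)
            current, current_size = [], 0
        current.append(post)
        current_size += post.size_kb
    if current:
        messages.append(current)
    return messages


def replicate(posts: List[Post], limit_kb: int) -> List[str]:
    """Return the names of posts that arrive at the other server.

    Assumption of this toy model: one corrupt post makes the whole
    message fail, so every post packed with it is lost too.
    """
    delivered: List[str] = []
    for message in pack(posts, limit_kb):
        if any(p.corrupt for p in message):
            continue  # whole message dropped
        delivered.extend(p.name for p in message)
    return delivered


# 50 invented posts of 5 KB each; post-17 is the corrupt one.
posts = [Post(f"post-{i}", 5, corrupt=(i == 17)) for i in range(50)]
all_names = {p.name for p in posts}

stuck_default = all_names - set(replicate(posts, 300))  # default 300 KB limit
stuck_small = all_names - set(replicate(posts, 1))      # limit lowered to 1 KB

print(len(stuck_default), sorted(stuck_small))  # prints: 50 ['post-17']
```

With the 300 KB limit all 50 posts share one message and none of them gets through; with the 1 KB limit each post travels alone and only the corrupt one is left behind, which is exactly what makes it easy to spot.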
The result: all the posts replicated to server B except one particular item that still would not replicate. This was our corrupt item (aka the bad guy), which had prevented us from replicating another 50 or so valid posts only because the whole replication message was seen as corrupt by Exchange and dropped as a result.
So, in conclusion, remember: when you are troubleshooting public folder replication and nothing seems to help, you might also want to try this little trick. In the end it might save the day.