Checking on the public folder backfill process


This goes somewhat hand in hand with my previous entry where I talked a little about how the backfill process works. You’ve set up a new public store, or changed the replica list for a specific folder. In either case, there’s a dance the server participates in to get everything caught up.


The problem is there’s no UI to expose when this dance is complete. There’s a page in System Manager which purports to give you this information, but in fact it can’t give you an accurate picture. More on this shortly.


When you add a new public store to your topology, it discovers what data is missing locally. See my previous article for some details and steps to make this process go more smoothly. The backfill process isn’t done for a given folder until the new server has no more entries in its backfill array for that folder.


The what?


The backfill array is a list of ranges of change numbers known to be missing on this server. There’s an independent array for each replicated folder, including one for the hierarchy itself. As the system discovers missing data, entries are added to this array; as that data arrives, entries are removed. If the data doesn’t arrive in a timely manner, the system sends out email prodding other servers to send some along. What counts as a “timely manner” varies with conditions, but ranges from as little as 15 minutes to as long as 48 hours. Once the array for a folder is empty, the replication status for that folder is “caught up”, which does not mean everything’s completely in sync. Unfortunately, the backfill arrays aren’t exposed anywhere in the UI, so there’s no way (using ESM) to know when they’re empty.
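To make the mechanics concrete, here’s a minimal sketch of how a per-folder backfill array might behave: a list of inclusive ranges of missing change numbers that grows as gaps are discovered and shrinks as data arrives. The names and structure here are purely illustrative, not actual Exchange internals.

```python
# Illustrative model of a per-folder backfill array: a sorted list of
# inclusive (low, high) ranges of change numbers known to be missing.
# Names and structure are hypothetical, NOT actual Exchange internals.

class BackfillArray:
    def __init__(self):
        self.ranges = []  # sorted, non-overlapping (low, high) pairs

    def record_missing(self, low, high):
        """Discovery found change numbers low..high absent locally."""
        self.ranges.append((low, high))
        self.ranges.sort()

    def data_arrived(self, cn):
        """A replication message delivered change number cn; shrink or
        split whichever range covered it."""
        updated = []
        for lo, hi in self.ranges:
            if lo <= cn <= hi:
                if lo < cn:
                    updated.append((lo, cn - 1))
                if cn < hi:
                    updated.append((cn + 1, hi))
            else:
                updated.append((lo, hi))
        self.ranges = updated

    def caught_up(self):
        """'Caught up' simply means nothing is KNOWN to be missing."""
        return not self.ranges

arr = BackfillArray()
arr.record_missing(100, 105)
for cn in range(100, 106):
    arr.data_arrived(cn)
print(arr.caught_up())  # True once every missing change number arrives
```

Note the key property this models: “caught up” is a statement about the emptiness of this list, not about the folder being identical everywhere.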


WAIT! If the folder is “caught up” why isn’t it in sync?


Well, further changes may have been made on another replica of the folder, and those changes have yet to be broadcast. ESM shows you the replication status as perceived by the server you’re asking. It does not and cannot know whether even more changes from another server are en route or pending replication. It can know if there are local changes waiting to be sent out, but that’s about it. All ESM can reliably tell you is how the set of change numbers known to be present on this server compares to each other server’s last report of the change numbers it holds. Unfortunately, since servers don’t broadcast their current status information at any regular interval, the display in ESM can show stale, incorrect information.
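The limitation above can be sketched in a few lines: the comparison is only ever against each peer’s *last report*, so a peer that has produced new changes since its last broadcast still looks in sync. Everything here (function and server names, sets of integers standing in for change numbers) is a hypothetical illustration, not an Exchange API.

```python
# Hedged sketch of what a status display like ESM's can know: it can
# only diff its own change numbers against each peer's LAST report.
# All names are illustrative; change numbers are modeled as integers.

def apparently_missing(local_cns, last_reports):
    """last_reports maps server name -> the set of change numbers that
    server claimed to hold the last time it broadcast its status.
    Returns, per peer, changes we appear to lack. A peer that produced
    newer changes AFTER its last report will not show up here at all."""
    behind = {}
    for server, reported in last_reports.items():
        missing = reported - local_cns
        if missing:
            behind[server] = missing
    return behind

local = {1, 2, 3, 4}
reports = {"SRV-A": {1, 2, 3, 4}, "SRV-B": {1, 2, 3, 4, 5}}
print(apparently_missing(local, reports))  # {'SRV-B': {5}}
```

If SRV-B creates change 6 after sending its report, this comparison still reports only change 5, which is exactly why the ESM display can’t be authoritative.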


So what’s the bottom line? You need to turn up event logging for the public store and keep an eye on the flow of backfill messages. When they taper off, you’re probably pretty well caught up.
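One low-tech way to watch the flow “taper off” is to bucket the timestamps of backfill-related events by hour and eyeball the counts. The sketch below assumes you have already collected those timestamps somehow (e.g., from the Application event log after raising diagnostics logging on the public store); the collection step and all names here are assumptions, not a documented tool.

```python
# Hedged sketch of the "watch backfill traffic taper off" approach:
# bucket event timestamps by hour and inspect the counts. Gathering
# the timestamps from the event log is assumed, not shown.

from collections import Counter
from datetime import datetime

def backfill_traffic_by_hour(event_times):
    """event_times: datetimes of observed backfill request/response
    events. Returns {hour_start: count}, sorted by hour."""
    buckets = Counter(t.replace(minute=0, second=0, microsecond=0)
                      for t in event_times)
    return dict(sorted(buckets.items()))

# Hypothetical sample: traffic heavy at 9:00, trailing off afterward.
events = [datetime(2005, 3, 1, h, m) for h, m in
          [(9, 5), (9, 40), (10, 15), (12, 50)]]
for hour, count in backfill_traffic_by_hour(events).items():
    print(hour, count)
```

When the hourly counts drop to near zero and stay there, you’re probably, as the post says, pretty well caught up.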


Dave Whitney

Comments (3)
  1. Al says:

    Between this and the last document, I have to wonder why the PF backfill process hasn’t been improved. It’s nice to know we can quickly jump on the store after it’s built to dismount it, but why? Why shouldn’t we be able to disable the process, or specify that the store be built from a more controlled copy chosen during build? Heck, even something similar to IFM would be fine, but why in the world should the installer have to jump through fiery, glass-studded hoops to install a new PF server? Just seems silly is all, and having been bit by it on occasion, it is painful.

    The concept of having data displayed that "should" tell me when I’m in sync is misleading and needs to be revised. Any scoop on when this may be addressed (or if)?

  2. Dave Whitney says:

    Repairing the UI to properly indicate if you’re in sync or not is possible, but requires live network connectivity with all the other PF servers at the time of your query. The Replication Status page only displays what the queried server currently believes, based on recent replication traffic. It’s not possible for it to know "the truth" at the moment of query since that requires live communication with the other servers, which may not be possible (limited connectivity periods with remote offices, etc).

    We have considered changing setup to not create a public store by default, since a) it’s rarely desired and b) leads to the unfortunate backfill issues spelled out in my other blog. Unfortunately, there are technical requirements which require setup to create the store (specifically, there *must* be at least one pub per admin group). If there is already a pub in the AG, you can delete the new pub using ESM just after setup creates it. This will prevent the replication storm from happening. I have brought this up with the setup folks to see what we can do about setup always creating a pub.

    When creating a new public store, ESM asks you if you want to mount it right away. In this case, answer NO. Wait the prescribed amount of time spelled out in my other blog and then mount the new store.

    As for having it improved, well, it IS substantially improved. The new backfill picker in Exchange 2003 makes much better decisions on where to send the backfill request than even Exchange 2000 made. (Tempered, of course, by the availability of data used to make those choices.) The changes, in total, usually result in a new server being up and ready in hours instead of days, and substantially fewer cases of backfill requests going to the wrong end of the planet.

    There are a couple of things we could do to improve things even more, and I am giving it some thought.

  3. Anonymous says:

    Wanted to talk about a subject that is very often a source of questions, especially in our Support Services….

Comments are closed.