Recovering Public Folders After Accidental Deletion (Part 2: Public Folder Architecture)

The_Exchange_Team · Feb 08 2012

Introduction

In Part 1 of this series, I explained how to safely recover accidentally deleted public folders from backup. I briefly mentioned some important public folder concepts in that article, and in this second part I’m going to describe some of the inner workings of public folders themselves. Each organization maintains a list of all public folders in the environment, along with the servers that host replicas of each folder. This list is called the hierarchy, and it's common to all public folder stores in the environment. Each public folder store has a copy of the hierarchy, and uses it to provide referrals to end users for public folder replicas on other servers (among other things). Each public folder store also maintains a table, called the replication state table, which keeps track of the status of each folder. This table is a critical yet little-understood feature of public folders, and it has a huge impact on recovery.

Overview

As I said above, each public folder store maintains a replication state table, but unlike the hierarchy, it's unique to each store. A public folder store maintains information about the public folders for which it has a replica, not just for itself but for all servers with that replica. It does this so that it knows which other stores have more up-to-date public folder content, or which ones might have items required for backfill replication (catching up on old or missing items).

Imagine the following scenario: we have three servers, each hosting a public folder database – PFS1, PFS2, and PFS3. We have a folder – Folder1 – which is replicated to each database. If I could peer into the replication state table for PFS1, I would see an entry for Folder1, and that entry would contain information about Folder1's status not only for PFS1, but also for PFS2 and PFS3. What kind of information does this table actually contain? To answer that, we need to dig yet further into public folder structure, and talk about CNs.

Change Numbers

CNs – or, to give their full name, change numbers – are numbers assigned to each modification made to content in a public folder. Think of them as per-folder odometers – they increment each time a change is made to a folder, and only increase, never decrease. Each public folder database assigns CNs to the changes made on its own replica, and that information is transmitted to other replicas. These other replicas use this information to see if they've already received a particular change. For example, if I make a change to Folder1 on PFS1, that database might assign change number 211 to that modification. When the public folder database replicates that change to other databases, it records and transmits that change as FID1-123:PFS1:211. [Folder1 is represented within the public folder database, and by extension in the replication traffic, by a folder ID (FID). This becomes very important later.] PFS2 receives the replication message and checks to see if it has already received CN 211 from PFS1. If it hasn't, it applies the change and updates its own entry in the replication state table to reflect the fact that it has now received change 211 for Folder1 (FID1-123) from PFS1. If PFS3 later replicates the same change (FID1-123:PFS1:211) to PFS2, PFS2 will check its list, see that it has indeed already received that change, and discard that particular replication message.
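The duplicate check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the store is actually implemented: the ReplicaState class and its receive method are hypothetical names, and the real replication state table holds far more than a set of numbers.

```python
from collections import defaultdict


class ReplicaState:
    """Toy model (hypothetical) of one store's record of applied changes."""

    def __init__(self, server):
        self.server = server
        # (folder ID, originating server) -> change numbers already applied
        self.received = defaultdict(set)

    def receive(self, fid, origin, cn):
        """Apply a change unless it was already received; return True if applied."""
        key = (fid, origin)
        if cn in self.received[key]:
            return False  # duplicate: discard the replication message
        self.received[key].add(cn)
        return True


pfs2 = ReplicaState("PFS2")
first = pfs2.receive("FID1-123", "PFS1", 211)   # sent directly by PFS1
second = pfs2.receive("FID1-123", "PFS1", 211)  # same change relayed later by PFS3
print(first, second)
```

The first call applies the change and the second is discarded, because the key is the pair of folder ID and originating server rather than the server that happened to deliver the message.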

Here’s a sample hierarchy replication message. Notice the CN min, CN max, and FID entries in the description field.

Event Type: Information
Event Source: MSExchangeIS Public Store
Event Category: Replication Outgoing Messages
Event ID: 3018
Description:
An outgoing replication message was issued.
Type: 0x2
Message ID: <23599A0EB070AA92F03E31C546C9C8FFA4F7@contoso.com>
Database "PFDB"
CN min: 1-11D3, CN max: 1-11D4
RFIs: 1
1) FID: 1-38BF, PFID: 1-1, Offset: 28
IPM_SUBTREE\TestPF
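
If you want to pull the CN range and FIDs out of events like this one, a small parser along the following lines may help. It is a sketch that assumes the description text always looks like the sample above; the regular expressions, the parse_description function, and the guess that the values are hexadecimal are my own assumptions, not a documented format.

```python
import re

# Part of the description text from the sample event above.
SAMPLE = """\
Database "PFDB"
CN min: 1-11D3, CN max: 1-11D4
RFIs: 1
1) FID: 1-38BF, PFID: 1-1, Offset: 28
IPM_SUBTREE\\TestPF
"""

# Assumption: CNs and FIDs are written as two hexadecimal parts joined by "-".
HEX_ID = r"[0-9A-Fa-f]+-[0-9A-Fa-f]+"
CN_RE = re.compile(rf"CN min: (?P<cn_min>{HEX_ID}), CN max: (?P<cn_max>{HEX_ID})")
FID_RE = re.compile(
    rf"^\d+\) FID: (?P<fid>{HEX_ID}), PFID: (?P<pfid>{HEX_ID})", re.MULTILINE
)


def parse_description(text):
    """Extract the CN range and the listed folder IDs from an event description."""
    cn = CN_RE.search(text)
    if cn is None:
        raise ValueError("no CN min/max line found")
    folders = [m.groupdict() for m in FID_RE.finditer(text)]
    return {"cn_min": cn["cn_min"], "cn_max": cn["cn_max"], "folders": folders}


parsed = parse_description(SAMPLE)
print(parsed)
```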

At any given time, each public folder store knows exactly what content it has, and has a general idea of what content the other public folder stores have. This is an important point: public folder databases are aware of their surroundings. It's this awareness that has implications for recovery.

The Replication State Table

Here’s a quick visualization of how a public folder change is propagated from one server to another. This table simulates the replication state table which is internal to every server. There are four columns – the first represents the replication details (the CN sets), and the next three represent the same folder on each of the three servers. In essence, this table shows you what each server knows about the other servers’ knowledge of this particular folder. Please note that this is a simplified version of the replication state table – it’s actually quite a bit more complicated than this, but this is all the detail 99.99% of engineers will ever need.

In this example, Folder1 has been replicated to three systems – PFS1, PFS2, and PFS3 – and public folder replication is fully up-to-date. The servers know what they’ve sent to their replication partners, and what’s been replicated back to them. Since end users could conceivably make updates on any of the servers, they each have their own CN sets for the same folder.

Details From	Folder1 on PFS1	Folder1 on PFS2	Folder1 on PFS3
PFS1	Last sent CN PFS1:10	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS2	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	Last sent CN PFS2:20	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS3	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	Last sent CN PFS3:30

An end user connected to PFS1 makes a change, to which PFS1 assigns change number 11. The replication state table on PFS1 is updated to reflect this new CN.

Details From	Folder1 on PFS1	Folder1 on PFS2	Folder1 on PFS3
PFS1	Last sent CN PFS1:11	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS2	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	Last sent CN PFS2:20	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS3	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	Last sent CN PFS3:30

PFS1 packages this change (which we assume is the only one made to Folder1) and sends it to PFS2 and PFS3, which update their own replication state tables.

Details From	Folder1 on PFS1	Folder1 on PFS2	Folder1 on PFS3
PFS1	Last sent CN PFS1:11	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS2	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	Last sent CN PFS2:20	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS3	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	Last sent CN PFS3:30

Both PFS2 and PFS3 apply the changes, and since those two systems received the change from PFS1, they also update their “knowledge” of PFS1. Notice that PFS1 does not update its entries for PFS2 and PFS3 – while it has sent the content to them, it hasn’t received confirmation that they’ve applied that change. [Because public folder replication messages are delivered via Hub Transport, public folder stores don’t directly interact and so never assume that the updates were delivered and applied.]

Continuing with our example, an end user makes a change to Folder1 on PFS3:

Details From	Folder1 on PFS1	Folder1 on PFS2	Folder1 on PFS3
PFS1	Last sent CN PFS1:11	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS2	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	Last sent CN PFS2:20	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS3	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	Last sent CN PFS3:31

That change is now replicated to PFS1 and PFS2:

Details From	Folder1 on PFS1	Folder1 on PFS2	Folder1 on PFS3
PFS1	Last sent CN PFS1:11	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS2	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	Last sent CN PFS2:20	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS3	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-31	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-31	Last sent CN PFS3:31

Note that when PFS3 sent out its replication message, it included not only its own update, but also the fact that it had received update 11 from PFS1.

Again, while every server has the most up-to-date content for Folder1, they don’t necessarily know that every replica is up-to-date. [PFS1, for example, “thinks” that PFS2 is out of date, while PFS3 “thinks” that both PFS1 and PFS2 are out of date.] It’s important to note that this isn’t a problem – by only encapsulating status messages in outgoing replication, Exchange avoids saturating the network with constant messages from various servers confirming the receipt of recent replication messages.
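
The walk-through above can be replayed with a small simulation. This is a deliberately simplified sketch: the Store class, its method names, and the idea of tracking only the highest CN per originating server are my own assumptions for illustration, and the real table tracks full CN sets.

```python
SERVERS = ("PFS1", "PFS2", "PFS3")


class Store:
    """Toy public folder store (hypothetical) tracking one folder, FID1-123."""

    def __init__(self, name, cn_set):
        self.name = name
        # origin server -> highest CN from that origin applied here
        self.own = dict(cn_set)
        # partner -> the CN set that partner last reported about itself
        self.partner_view = {s: dict(cn_set) for s in SERVERS if s != name}

    def local_change(self, partners):
        """Assign the next CN and send it, with our status, to each partner."""
        self.own[self.name] += 1
        cn = self.own[self.name]
        status = dict(self.own)  # status rides along with the content
        for partner in partners:
            partner.receive(self.name, cn, status)

    def receive(self, sender, cn, status):
        """Apply the sender's change and remember what the sender reported."""
        self.own[sender] = max(self.own[sender], cn)
        self.partner_view[sender] = dict(status)


start = {"PFS1": 10, "PFS2": 20, "PFS3": 30}
pfs1, pfs2, pfs3 = (Store(name, start) for name in SERVERS)

pfs1.local_change([pfs2, pfs3])  # PFS1 assigns CN 11
pfs3.local_change([pfs1, pfs2])  # PFS3 assigns CN 31

for store in (pfs1, pfs2, pfs3):
    print(store.name, "holds", store.own, "believes", store.partner_view)
```

After both changes every store holds the same content, yet PFS1 still records PFS2 at PFS1:10, and PFS3 still records PFS1 at PFS3:30. A store only updates its view of a partner when that partner sends something.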

Backfill Replication

However, from time to time, a server loses its connection to its replication partners, either through network failure, service failure, or other causes. When it does, its replication state table no longer receives updates to the CNs held by its partners for their replicas. In other words, its replication state table is outdated. When that server reconnects with its partners, and receives a new message, it may find that the CN on that new message is much higher than what it expected. Using the previous example, imagine that PFS3 is isolated from PFS1 and PFS2 due to a server failure, and does not receive updates to Folder1 from the other servers for several hours. The resulting table might look like this:

Details From	Folder1 on PFS1	Folder1 on PFS2	Folder1 on PFS3 (OFFLINE)
PFS1	Last sent CN PFS1:16	FID1-123:PFS1:1-16 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS2	FID1-123:PFS1:1-16 FID1-123:PFS2:1-28 FID1-123:PFS3:1-30	Last sent CN PFS2:28	FID1-123:PFS1:1-10 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS3	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-31	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-31	Last sent CN PFS3:31

Notice that the most recent replication message from PFS2, for change number 28, also told PFS1 what PFS2 knows about PFS1 (namely, that PFS2 has received PFS1’s updates 12 through 16). PFS3 has not received any of these recent updates.

However, when PFS3 is brought back online and receives a new replication message, it suddenly learns of the missing messages. This triggers a backfill request: a request from PFS3 to the source server for the missing content.

Details From	Folder1 on PFS1	Folder1 on PFS2	Folder1 on PFS3
PFS1	Last sent CN PFS1:17	FID1-123:PFS1:1-17 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30	FID1-123:PFS1:1-11, 17 FID1-123:PFS2:1-20 FID1-123:PFS3:1-30
PFS2	FID1-123:PFS1:1-16 FID1-123:PFS2:1-28 FID1-123:PFS3:1-30	Last sent CN PFS2:28	FID1-123:PFS1:1-16 FID1-123:PFS2:1-28 FID1-123:PFS3:1-30
PFS3	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-31	FID1-123:PFS1:1-11 FID1-123:PFS2:1-20 FID1-123:PFS3:1-31	Last sent CN PFS3:31 Backfill Request PFS1:12-16 Backfill Request PFS2:21-28

Notice that PFS3 is missing updates 12 through 16 for PFS1, and 21 through 28 for PFS2. PFS3 will request the missing content from any server that it believes has that content, which in this case would mean either PFS1 or PFS2. How does PFS3 know that both servers have the content? Because the replication message from PFS1, which included change number 17, included the information about the CN sets for PFS1, PFS2, and PFS3.

Strictly speaking, Exchange doesn’t issue these backfill requests right away – it waits a few hours (six or more, depending on the situation) before sending them out, just in case one of its replication partners happens to send that missing content. If a specific update hasn’t been received after the backfill timeout is reached, Exchange then generates that backfill request and sends it to the replication partners. This process is detailed in the “Backfill Requests and Backfill Messages” section of the TechNet page on “Understanding Public Folder Replication” at http://technet.microsoft.com/en-us/library/bb629523.aspx#Backfill.
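
Working out which CNs to ask for is essentially gap detection. Here is a minimal sketch using the numbers from the table above. The missing_ranges function is a hypothetical helper, and it ignores the backfill timeout entirely.

```python
def missing_ranges(held, advertised_max):
    """Return inclusive (start, end) ranges of CNs in 1..advertised_max not in held."""
    ranges = []
    start = None
    for cn in range(1, advertised_max + 1):
        if cn not in held:
            if start is None:
                start = cn  # a gap opens here
        elif start is not None:
            ranges.append((start, cn - 1))  # the gap closed just before cn
            start = None
    if start is not None:
        ranges.append((start, advertised_max))  # gap runs to the end
    return ranges


# What PFS3 holds after receiving PFS1's change 17 (from the table above).
held_from_pfs1 = set(range(1, 12)) | {17}  # PFS1:1-11, 17
held_from_pfs2 = set(range(1, 21))         # PFS2:1-20

# PFS1's message advertised PFS1:1-17 and PFS2:1-28.
backfill = {
    "PFS1": missing_ranges(held_from_pfs1, 17),
    "PFS2": missing_ranges(held_from_pfs2, 28),
}
print(backfill)
```

The result matches the backfill requests in the PFS3 column: 12 through 16 for PFS1 and 21 through 28 for PFS2.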

Removing or Deleting Replicas

When you remove a public folder replica, the owning public folder database contacts all other databases to find out if they have all of the content that's contained within the replica that's about to be removed. It does so by sending out a status message that contains the CNs for its replica of the folder. For example, if I were to remove the replica of Folder1 from PFS3, it would send a message to PFS1 and PFS2 to confirm that, between the two of them, they have every update from PFS3 from 1 to 31. [This is an important point: the content doesn’t need to be on one server. As long as the content exists somewhere in the organization, the replica can be removed.] If PFS3 had any unique content that neither PFS1 nor PFS2 had, it would replicate those items to its replication partners. Once it has confirmed that it no longer has any unique content, the public folder store removes that replica.
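
The "between the two of them" check amounts to a set-coverage test. The sketch below is illustrative only: unique_content is a hypothetical helper, and the CN sets given to PFS1 and PFS2 are invented so that neither server alone holds everything.

```python
def unique_content(leaving, others):
    """Return {origin: CNs} held by the leaving replica but by no other replica."""
    unique = {}
    for origin, cns in leaving.items():
        covered = set()
        for other in others:
            covered |= other.get(origin, set())  # union across all partners
        missing = cns - covered
        if missing:
            unique[origin] = missing
    return unique


# CNs that originated on PFS3, as held by each replica (hypothetical values).
pfs3 = {"PFS3": set(range(1, 32))}   # 1-31
pfs1 = {"PFS3": set(range(1, 21))}   # 1-20
pfs2 = {"PFS3": set(range(15, 31))}  # 15-30

pending = unique_content(pfs3, [pfs1, pfs2])
print(pending)  # only change 31 exists nowhere else

pfs1["PFS3"].add(31)  # PFS3 replicates its unique change to a partner
safe_to_remove = not unique_content(pfs3, [pfs1, pfs2])
print(safe_to_remove)
```

Neither partner has the full range on its own, but their union covers 1 through 30, so only change 31 has to be replicated before the replica can go.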

However, when you delete a public folder outright (as in, remove all replicas), there's no need to preserve content, so it's deleted from every public folder store. This is why it’s vital that public folder administrators understand the difference between removing a replica (with Set-PublicFolder -Replicas) and deleting a public folder (with Remove-PublicFolder).

These changes to replica lists and outright deletions are transmitted just like any other public folder change – as hierarchy replication messages, complete with their own CNs. If I remove the replica of Folder1 from PFS1, that change will go to PFS2 and PFS3 so that they know that they no longer need to replicate new content for Folder1 to PFS1. Likewise, if I delete Folder1, it will be deleted from all of the databases and removed from the hierarchy as well. The replication state table keeps track of changes to the hierarchy too, and so knows which folders exist in the organization and which don't. It is this tracking mechanism that prevents us from simply restoring a public folder database and reintroducing the deleted folders into the environment.

Recovery of Deleted Public Folders

In Part 1 of this series, I outlined a process for safely and successfully restoring public folders which were accidentally deleted from the environment. Step six of the procedure reads, in part, “Copy each of the folders you wish to restore. [Although the new folders will have similar names to the originals, the underlying folder IDs (FIDs) are different.]” I’ve added italics to highlight the key point – when you copy (clone) public folders, you’re really creating new folders. They may bear the same name as the originals, but the folder IDs are different. So although my cloned copy of Folder1 may look like the original Folder1, and contain the same items as Folder1, none of the replication messages for the original Folder1 will apply to it, because it’ll have a completely different FID. This new folder is added to the hierarchy, and because end users see the name, not the FID, they’ll simply use it as they would the original folder.
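
To see why the new FID matters, here is a toy model of a hierarchy that remembers deletions. Everything in it is an assumption made for illustration: the Hierarchy class, the set of deleted FIDs, and the FID1-456 value are hypothetical, and Exchange's actual bookkeeping is more involved.

```python
class Hierarchy:
    """Toy hierarchy (hypothetical) that tracks folders and deletions by FID."""

    def __init__(self):
        self.folders = {}     # FID -> display name
        self.deleted = set()  # FIDs whose deletion has already been recorded

    def add(self, fid, name):
        """Add a folder; refuse a FID that is already known to be deleted."""
        if fid in self.deleted:
            return False
        self.folders[fid] = name
        return True

    def delete(self, fid):
        """Remove a folder and remember that its FID was deleted."""
        self.folders.pop(fid, None)
        self.deleted.add(fid)


hierarchy = Hierarchy()
hierarchy.add("FID1-123", "Folder1")
hierarchy.delete("FID1-123")  # the accidental deletion

restored = hierarchy.add("FID1-123", "Folder1")  # original FID from backup
cloned = hierarchy.add("FID1-456", "Folder1")    # copy with a new, made-up FID
print(restored, cloned)
```

The folder with the original FID is refused because its deletion is already on record, while the copy is accepted as a brand-new folder that happens to share the name.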

Troubleshooting Replication

If you’re looking for troubleshooting information, look no further than Bill Long’s excellent four-part blog series on public folders:

Public Folder Replication Troubleshooting – Part 1: Troubleshooting the Replication of New Changes (http://blogs.technet.com/b/exchange/archive/2006/01/17/417611.aspx)
Public Folder Replication Troubleshooting – Part 2: Troubleshooting the Replication of Existing Data (http://blogs.technet.com/b/exchange/archive/2006/01/19/417737.aspx)
Public Folder Replication Troubleshooting – Part 3: Troubleshooting Replica Deletion and Common Problems (http://blogs.technet.com/b/exchange/archive/2006/01/23/417974.aspx)
Public Folder Replication Troubleshooting - Part 4: Exchange Server 2007/2010 tips (http://blogs.technet.com/b/exchange/archive/2008/01/10/3404629.aspx)

Summary

Public folders use their own replication mechanism, where changes are tracked in an internal, non-editable table and communicated to replication partners alongside the actual content changes. The public folder hierarchy follows the same principles, and so changes made to the hierarchy are replicated to all public folder databases in the environment. Understanding the replication mechanism helps an administrator understand not only disaster recovery, but troubleshooting as well.

John Rodriguez
Principal Premier Field Engineer
Microsoft Premier Support
