Hi folks, Ned Pyle here again. Back at AskDS, I used to write frequently about DFSR behavior and troubleshooting. As DFS Replication matured and its documentation grew, those articles dwindled. Recently, though, one of the DFSR developers and I found something undocumented:
A DFSR server upgrade where, despite perfect preseeding, files were conflicting during initial sync.
Sound interesting? Love DFSR debug logs? Have insomnia? Read on!
It began with a customer who was replacing their existing Windows Server 2008 R2 servers with Windows Server 2012. They needed the new data deduplication feature to save disk space. These servers replicated files written in batches by an application, and because those files never shrank or were deleted, future disk space was at a premium.
The customer was following the DFSR replacement steps documented in this article. To their surprise, after they reinstalled the operating system (Part 5, “reinstall or upgrade”), the new servers wrote DFSR file conflict event 4412 for many of the files during initial sync.
This was theoretically impossible, because their special application:
- Only wrote to a single server, not all replication nodes
- Never modified or overwrote existing files
Since this was a new OS and the new dedup feature was in the mix, our initial concern was that scheduled dehydrations were somehow altering files that DFSR had not yet finished examining for initial replication. Perhaps the files appeared different between servers, and DFSR was forcing the existing files to lose conflicts. More interesting still, when we examined the files using DFSRDIAG FILEHASH, the file hashes were identical:
- File Path: E:\rf1\1B\2B\0D\somefile.ned
- Windows Server 2008 R2 file hash: 6691A27E-030CEFC2-5234258D-3D812539
- Windows Server 2012 file hash: 6691A27E-030CEFC2-5234258D-3D812539
- After dedup optimization file hash: 6691A27E-030CEFC2-5234258D-3D812539
- After the conflict file hash: 6691A27E-030CEFC2-5234258D-3D812539
As we expected, the only difference was the attribute added by the dedup reparse point. We also knew that Windows Server 2012 DFSR fully supports dedup and does not treat such files as different. The local conflicts were, in effect, cosmetic. They were pointless and slowed initial sync slightly, but at least no data was being lost.
So why on Earth were we seeing this behavior?
We enabled DFSR debug logging in its most verbose mode, had the customer perform a server replacement, and then waited for the first conflict. What follows is a log analysis (greatly modified for readability):
The sample file being downloaded is somefile.ned:
DFSR is replicating in a file with the exact same name and path as an existing file on the downstream DFSR server:
DFSR decides to download it using RDC cross-file similarity:
It found similar files because the similarity information from the earlier Windows Server 2008 R2 replication still existed on the volume, and DFSR was reusing it (more on this later):
DFSR decides that it’s going to use the file and checks to see if it is already staged (it’s not):
DFSR then stages the file and updates the hash and similarity information:
By doing this, DFSR also updates uidVisible, which indicates that the file can replicate out (i.e., it is visible to other replicas). This makes sense from DFSR's perspective. The file is in the similarity table, so it must have been staged before in order to replicate out.
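To make the staging step concrete, here is a toy Python model of it. It is not DFSR source code. Only the term uidVisible comes from the log. The `IdRecord` class, the `stage_file` function, and their fields are hypothetical names chosen to mirror it, and SHA-256 stands in for DFSR's own file hash and RDC similarity data.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class IdRecord:
    """Hypothetical stand-in for a DFSR ID record; not the real database schema."""
    uid: str
    file_hash: str = ""
    similarity: bytes = b""
    uid_visible: bool = False  # True once the file counts as replicable out


def stage_file(record: IdRecord, content: bytes) -> IdRecord:
    """Toy staging step: refresh hash and similarity data, then mark the UID visible.

    SHA-256 and the 8-byte similarity value are placeholders; real DFSR
    computes its own file hash and RDC similarity signatures.
    """
    digest = hashlib.sha256(content)
    record.file_hash = digest.hexdigest()
    record.similarity = digest.digest()[:8]
    # In the scenario above, staging a file found through the similarity
    # table leads DFSR to flag its UID as visible; this model sets the
    # flag unconditionally.
    record.uid_visible = True
    return record
```

Calling `stage_file(IdRecord(uid="local-uid"), data)` returns the record with `uid_visible` set, and that flag is the state that matters in the next step.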
Next, DFSR replicates in the “new” file we are interested in. It is the same file with the same name, but of course with a different UID, because a server performing initial sync creates local UIDs for all existing files. The local file's ID record has uidVisible set to 1, which causes UidInheritEnabled to return FALSE:
This means DFSR can't inherit the UID, and therefore cannot simply update the database and move on. From DFSR's perspective the file has “been replicated out” and must therefore be a unique file. It really hasn't been. DFSR just assumes so, because how else would the similarity table already know about it? During the download process, DFSR finds the same file with two different UIDs, and the local UID is already visible:
Because the UIDs differ and the local one is already visible, DFSR generates the conflict:
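The decision in the last few steps can be modeled in a few lines. This is a simplified Python sketch, not DFSR's implementation. Only UidInheritEnabled and uidVisible come from the log. `IdRecord`, `resolve_incoming`, and the return labels are hypothetical, and the model ignores the other factors DFSR weighs during initial sync.

```python
from dataclasses import dataclass


@dataclass
class IdRecord:
    """Hypothetical stand-in for a DFSR ID record; not the real database schema."""
    uid: str
    file_hash: str
    uid_visible: bool


def uid_inherit_enabled(local: IdRecord) -> bool:
    """Model of the check in the log: a UID that is already visible cannot be inherited."""
    return not local.uid_visible


def resolve_incoming(local: IdRecord, incoming: IdRecord) -> str:
    """Decide what this simplified model does when a file with the same path arrives.

    "no-op"    : both sides already share the UID.
    "inherit"  : local UID not visible and contents match, so adopt the
                 incoming UID and only update the database.
    "conflict" : anything else; note that matching hashes do not prevent it.
    """
    if local.uid == incoming.uid:
        return "no-op"
    if uid_inherit_enabled(local) and local.file_hash == incoming.file_hash:
        return "inherit"
    return "conflict"
```

With identical hashes, the only thing separating "inherit" from "conflict" in this model is the local `uid_visible` flag. That mirrors what the log showed for somefile.ned.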
But since the files are truly the same, the conflict doesn’t really matter. DFSR is just making a pointless conflict that writes an event, but which an end-user would never worry about because nothing is different in the winning file.
Why did we already have similarity?
This boils down to by-design DFSR behavior: if it finds any old similarity files, it uses them. Those special sparse files live under <volume>\System Volume Information\DFSR and are called:
The FileIdTable files work in conjunction with the SimilarityTable files. They contain the file information that matches the similarity table's signature data. That way, cross-file RDC can traverse the similarity table for matching signatures and then look up the matching file ID records.
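The two-table lookup can be sketched in Python under some loud assumptions. Both tables are modeled as plain dictionaries, and the names `find_similar_files`, `SimilarityTable`, and `FileIdTable` are hypothetical, not the on-disk format. Real RDC similarity compares a set of traits and ranks candidates. This sketch treats any single shared signature as a match, which is a simplification.

```python
from typing import Dict, Iterable, List

# Hypothetical in-memory stand-ins for the two on-disk tables:
#   similarity table: signature -> IDs of files that produced that signature
#   file ID table:    file ID   -> file info (just a path here)
SimilarityTable = Dict[bytes, List[int]]
FileIdTable = Dict[int, str]


def find_similar_files(
    signatures: Iterable[bytes],
    similarity_table: SimilarityTable,
    file_id_table: FileIdTable,
) -> List[str]:
    """Return paths of local files that share at least one signature with an incoming file.

    Step 1 matches signatures in the similarity table; step 2 resolves the
    matching file IDs through the file ID table. Nothing here checks how old
    an entry is, so stale rows from a previous installation match too.
    """
    matches: List[str] = []
    seen = set()
    for sig in signatures:
        for file_id in similarity_table.get(sig, []):
            if file_id in seen:
                continue  # already resolved via an earlier signature
            seen.add(file_id)
            path = file_id_table.get(file_id)
            if path is not None:  # skip IDs with no file info
                matches.append(path)
    return matches
```

The missing age check in the docstring is the point. A lookup like this has no way to tell leftover Windows Server 2008 R2 entries from fresh ones.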
This customer was doing the right thing and following our steps to remove the previous data, just as the blog posts state. However, these were hidden files and the root DFSR folder was not deleted, so they were skipped and the old similarity table stayed behind. It was a simple oversight. (I have since reviewed the DFSR hardware migration article and downloads to make sure this is 100% clear in the steps.)
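A quick read-only check before reinstalling can catch this. The Python sketch below lists anything still under <volume>\System Volume Information\DFSR. It does not filter on the hidden attribute, so hidden files show up too. The function name is hypothetical and the function deletes nothing. It assumes the account running it can read that folder, which Windows restricts by default. Use the documented replacement steps for the actual cleanup.

```python
from pathlib import Path
from typing import List


def find_leftover_dfsr_files(volume_root: str) -> List[str]:
    """List files remaining under <volume_root>/System Volume Information/DFSR.

    Read-only: it reports what is left and deletes nothing. An empty list
    means the folder is absent or contains no files.
    """
    dfsr_dir = Path(volume_root) / "System Volume Information" / "DFSR"
    if not dfsr_dir.is_dir():
        return []
    # rglob walks the whole tree; hidden status is not consulted.
    return sorted(
        str(p.relative_to(dfsr_dir)) for p in dfsr_dir.rglob("*") if p.is_file()
    )
```

For example, `find_leftover_dfsr_files("E:\\")` returning a non-empty list before Part 5 would mean old DFSR data, such as the similarity tables, is still on the volume.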
The Sum Up
As with many issues in complex distributed systems such as DFSR, the law of unintended consequences rules. When Windows Server 2003 R2 DFSR was first designed more than ten years ago, no one was thinking hard about DFSR preseeding or upgrading, of course.
Always make sure that you thoroughly delete previous DFSR configuration files when following the DFSR hardware and OS replacement steps, and everything will be swell.
Until next time,
– Ned Pyle