DFSR Cloning: Attribute Recompute Causes Redistribute, Requiring Troubleshoot

Hi folks, Ned here again. We created DFSR Cloning in Windows Server 2012 R2 to make initial synchronization faster. Today I’m talking about file attributes, how they sneak inefficiency into your cloning process, and what you can do about them.

Let’s get to it.

Attributes, DFSR, and Cloning

Attributes are simply metadata on files and folders that describe special states, like hidden or read-only. They usually don’t change the file’s contents in any meaningful way. DFSR handles attribute changes in a few different ways.

If you change these attributes, DFSR does replicate the attributes to the other server:

  • FILE_ATTRIBUTE_HIDDEN
  • FILE_ATTRIBUTE_READONLY
  • FILE_ATTRIBUTE_SYSTEM
  • FILE_ATTRIBUTE_NOT_CONTENT_INDEXED
  • FILE_ATTRIBUTE_OFFLINE
  • FILE_ATTRIBUTE_REPARSE_POINT (note: it’s not that simple – see this treatise on what actually does and doesn’t work. Files with the IO_REPARSE_TAG_DEDUP, IO_REPARSE_TAG_SIS, or IO_REPARSE_TAG_HSM reparse tags are replicated as normal files)
  • FILE_ATTRIBUTE_SPARSE_FILE
  • FILE_ATTRIBUTE_DIRECTORY
  • FILE_ATTRIBUTE_COMPRESSED

If you change these attributes, DFSR does not trigger replication of the attributes to the other server (but if the file is altered in another way that does trigger replication, these attributes come along for the ride):

  • FILE_ATTRIBUTE_ARCHIVE
  • FILE_ATTRIBUTE_NORMAL

Then there is the FILE_ATTRIBUTE_TEMPORARY attribute (good call, Dragos!), which causes DFSR to ignore the file entirely.
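Put another way, DFSR sorts attribute changes into three buckets. The Python sketch below expresses that sorting as bit arithmetic, as an illustration only: the bit values are the standard Windows FILE_ATTRIBUTE_* constants, but classify_attribute_change and its return strings are hypothetical names, and the sketch deliberately ignores the reparse-tag subtleties noted above for FILE_ATTRIBUTE_REPARSE_POINT.

```python
# Standard Windows FILE_ATTRIBUTE_* bit values.
FILE_ATTRIBUTE_READONLY = 0x1
FILE_ATTRIBUTE_HIDDEN = 0x2
FILE_ATTRIBUTE_SYSTEM = 0x4
FILE_ATTRIBUTE_DIRECTORY = 0x10
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_NORMAL = 0x80
FILE_ATTRIBUTE_TEMPORARY = 0x100
FILE_ATTRIBUTE_SPARSE_FILE = 0x200
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
FILE_ATTRIBUTE_COMPRESSED = 0x800
FILE_ATTRIBUTE_OFFLINE = 0x1000
FILE_ATTRIBUTE_NOT_CONTENT_INDEXED = 0x2000

# Bits from the first list: changing any of these triggers replication.
REPLICATED_BITS = (
    FILE_ATTRIBUTE_HIDDEN
    | FILE_ATTRIBUTE_READONLY
    | FILE_ATTRIBUTE_SYSTEM
    | FILE_ATTRIBUTE_NOT_CONTENT_INDEXED
    | FILE_ATTRIBUTE_OFFLINE
    | FILE_ATTRIBUTE_REPARSE_POINT
    | FILE_ATTRIBUTE_SPARSE_FILE
    | FILE_ATTRIBUTE_DIRECTORY
    | FILE_ATTRIBUTE_COMPRESSED
)


def classify_attribute_change(old: int, new: int) -> str:
    """Hypothetical helper: how a change from `old` to `new` attributes is treated."""
    if new & FILE_ATTRIBUTE_TEMPORARY:
        return "ignored"      # DFSR skips temporary files entirely
    changed = old ^ new       # XOR leaves only the bits that flipped
    if not changed:
        return "unchanged"
    if changed & REPLICATED_BITS:
        return "replicates"
    # Only bits outside the first list flipped (e.g. ARCHIVE or NORMAL): no
    # replication on its own, but the new values ride along next time the
    # file replicates for some other reason.
    return "no trigger"
```

With these rules, going from 32 (archive) to 34 (archive plus hidden) returns "replicates", while going from 32 to 128 (archive bit cleared) returns "no trigger".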

This is normal, day-to-day replication: Bobby Joe in Accounting sets the quarterly report to hidden and read-only, and DFSR updates all the other replicated copies of that file with these attributes. Things get more interesting with DFSR Cloning. Cloning skips the usual step of exchanging file information between servers during initial sync; instead, it simply gives the new server a copy of the old server’s database. This means that when you run Import-DfsrClone, we have to compare the local copy of the preseeded data against whatever is in the imported database. If the file dates, file sizes, or file ACLs differ, we know that someone messed with the file between the export and the import, at least on this destination server.

However, we also check the attributes: if they differ between the preseeded files and the database records, we consider the file mismatched. Any mismatched files replicate to the destination using the normal initial sync mechanism after cloning completes. And unlike a normal attribute change, this is a full file replication, not just metadata.
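The import-time check described above amounts to a field-by-field comparison. Here is a minimal Python sketch of that idea, assuming only the four fields this post mentions (ACL hash, last write time, size, attributes); FileState, mismatched_fields, and is_mismatch are hypothetical names, not DFSR internals, and the real import logic may look at more than this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """Hypothetical view of the fields compared during Import-DfsrClone."""
    acl_hash: str
    last_write_time: str
    size: int
    attributes: int


COMPARED_FIELDS = ("acl_hash", "last_write_time", "size", "attributes")


def mismatched_fields(local: FileState, clone: FileState) -> list[str]:
    """Names of the fields that differ between the preseeded file and the clone record."""
    return [f for f in COMPARED_FIELDS if getattr(local, f) != getattr(clone, f)]


def is_mismatch(local: FileState, clone: FileState) -> bool:
    # Any difference, even attributes alone, sends the file through normal
    # initial sync (a full file replication) once cloning completes.
    return bool(mismatched_fields(local, clone))


acl = "3D1F6474-928B8530-1E0A6559-5F02A2C7"
local = FileState(acl, "20150605 16:52:48.545", 402432, 128)
clone = FileState(acl, "20150605 16:52:48.545", 402432, 32)
print(mismatched_fields(local, clone))
```

Fed the values from the first debug log sample in this post, mismatched_fields returns ['attributes']: everything matches except the attribute bits, yet the file still counts as a mismatch.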

In other words:

  1. If someone changes attributes on the preseeded files on the destination after you cloned the database, those files replicate again. This is true even if the only change is to the Archive or Normal bit.
  2. If someone changes attributes on the source copy of the files, and those attributes are in the list above that triggers replication, those files queue into the source server’s backlog and replicate once you finish cloning. That happens a little later, and it is metadata-only replication, but you still pay a price.

#2 is out of your control and, frankly, who cares? You created a replication topology, so it actually replicating is nothing to worry about. #1 is avoidable. What could be changing these files? Here are some possible culprits:

  • Archive bit – Usually disappearing, leaving the file with the Normal bit. The Archive bit is an idiotic legacy flag that supposedly tells you a file has not been backed up. Windows Server stopped using it many years ago. If this attribute is changing, you are likely running third-party backup software on the destination server. Update your software or yell at (er, kindly ask) your vendor why they are still using this junk bit instead of the correct USN journal methodology.
  • Hidden, read-only, compressed, and/or system bit – This is all you, buddy! Some application, some script, some automation, or (hopefully not) some individual user is changing things on the destination before cloning completes. Get out ProcMon; I have no way to tell you who, when, or how.
  • Reparse point and sparse file bits – This is likely Windows Server Deduplication. Not that dedup is at fault; you or your colleagues did the dirty deed. When you run a dedup optimization job to dehydrate files, they have to be marked as pointing to the chunk store. That mark means setting the sparse file attribute and a reparse point, in this case with the IO_REPARSE_TAG_DEDUP tag. When you decided to turn on dedup and alter all the files in the middle of your cloning operation, you altered their attributes, and probably on a lot of files, since dedup is good at its job. This is one of those “Doctor, it hurts when I do this” scenarios. Don’t do that.
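If you’d rather triage mechanically, this Python sketch maps a local-versus-clone attribute difference onto the suspects above. It is a heuristic under stated assumptions: likely_culprits is a hypothetical helper, the hint strings simply restate the list above, and the dedup check only looks at the sparse and reparse point bits, so you would still need to confirm the reparse tag really is IO_REPARSE_TAG_DEDUP (for example with fsutil reparsepoint query).

```python
from __future__ import annotations

# Standard Windows FILE_ATTRIBUTE_* bit values used by the heuristics below.
FILE_ATTRIBUTE_READONLY = 0x1
FILE_ATTRIBUTE_HIDDEN = 0x2
FILE_ATTRIBUTE_SYSTEM = 0x4
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_NORMAL = 0x80
FILE_ATTRIBUTE_SPARSE_FILE = 0x200
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
FILE_ATTRIBUTE_COMPRESSED = 0x800

ARCHIVE_BITS = FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_NORMAL
MANUAL_BITS = (FILE_ATTRIBUTE_HIDDEN | FILE_ATTRIBUTE_READONLY
               | FILE_ATTRIBUTE_SYSTEM | FILE_ATTRIBUTE_COMPRESSED)
DEDUP_BITS = FILE_ATTRIBUTE_SPARSE_FILE | FILE_ATTRIBUTE_REPARSE_POINT


def likely_culprits(local: int, clone: int) -> list[str]:
    """Hypothetical triage: guess what changed a preseeded file's attributes."""
    changed = local ^ clone  # bits that differ between the two values
    hints = []
    if changed & ARCHIVE_BITS:
        hints.append("archive bit: third-party backup software on the destination?")
    if changed & MANUAL_BITS:
        hints.append("hidden/read-only/system/compressed: an app, script, or user; get out ProcMon")
    # Both bits changed AND both now set locally looks like a dedup optimization job.
    if (changed & DEDUP_BITS) == DEDUP_BITS and (local & DEDUP_BITS) == DEDUP_BITS:
        hints.append("sparse + reparse point: probably a Deduplication optimization job")
    return hints
```

An empty list just means none of these patterns matched; it doesn’t clear anyone.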

Detecting It

That’s all fine, but how do you tell if you are getting attribute-based cloning inefficiency? After all, when you look at your event logs after cloning, you only see a count of mismatches. Those could be anything:

Four? That’s not so bad.

To the debug logs!

First, look for lines containing “[WARN] DBClone::IDTableImportUpdate Mismatch record was found”. For instance, here is our file with that warning:

20150616 15:00:01.037 2980 DBCL  4054 [WARN] DBClone::IDTableImportUpdate Mismatch record was found. Local ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 LastWriteTime:20150605 16:52:48.545 FileSizeLow:402432 FileSizeHigh:0 Attributes:128 Clone ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 LastWriteTime:20150605 16:52:48.545 FileSizeLow:402432 FileSizeHigh:0 Attributes:32 idRec:
+      fid                             0x100000000040D
+      usn                             0x0
+      uidVisible                      0
+      filtered                        0
+      journalWrapped                  0
+      slowRecoverCheck                0
+      pendingTombstone                0
+      internalUpdate                  0
+      dirtyShutdownMismatch           0
+      meetInstallUpdate               0
+      meetReanimated                  0
+      recUpdateTime                   20150616 21:46:35.473 GMT
+      present                         1
+      nameConflict                    0
+      attributes                      0x20
+      ghostedHeader                   0
+      data                            0
+      gvsn                            {DD1D3870-9087-4C4E-AF39-5BC3E8130504}-v1011
+      uid                             {DD1D3870-9087-4C4E-AF39-5BC3E8130504}-v1011
+      parent                          {D9832800-4DF7-4EE3-AB59-4A0F2765FD6A}-v1
+      fence                           Initial Primary (2)
+      clockDecrementedInDirtyShutdown 0
+      clock                           20150616 21:44:36.437 GMT (0x1d0a87da3b085e3)
+      createTime                      20150616 00:55:28.705 GMT
+      csId                            {D9832800-4DF7-4EE3-AB59-4A0F2765FD6A}
+      hash                            3D1F6474-928B8530-1E0A6559-5F02A2C7
+      similarity                      49DEA00D-B09FD001-00240600-00000000
+      name                            _no_one_ever_says_italy.ned
 

Look closely. The ACL hashes match, the write times match, and the file sizes match. But the attributes?

20150616 15:00:01.037 2980 DBCL  4054 [WARN] DBClone::IDTableImportUpdate Mismatch record was found. Local ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 LastWriteTime:20150605 16:52:48.545 FileSizeLow:402432 FileSizeHigh:0 Attributes:128 Clone ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 LastWriteTime:20150605 16:52:48.545 FileSizeLow:402432 FileSizeHigh:0 Attributes:32

Aha! To the Internet!

128 is FILE_ATTRIBUTE_NORMAL, and 32 is FILE_ATTRIBUTE_ARCHIVE. Someone removed the archive bit. Ok, that’s not too bad. How about:

20150616 17:39:22.810  276 DBCL  4054 [WARN] DBClone::IDTableImportUpdate Mismatch record was found. Local ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 LastWriteTime:20150605 16:52:15.558 FileSizeLow:564224 FileSizeHigh:0 Attributes:34 Clone ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 LastWriteTime:20150605 16:52:15.558 FileSizeLow:564224 FileSizeHigh:0 Attributes:32 idRec:

To the calculator! We already know that 32 is the archive bit. 34-32=2, and a value of 2 is FILE_ATTRIBUTE_HIDDEN. Starting to make sense?

I wonder why it’s hidden?

+      name                            _project_arcturus.ned

Ok, one more:

 
20150616 17:39:24.857  276 DBCL  4054 [WARN] DBClone::IDTableImportUpdate Mismatch record was found. Local ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 LastWriteTime:20150605 16:55:32.951 FileSizeLow:8225792 FileSizeHigh:0 Attributes:1568 Clone ACL hash:3D1F6474-928B8530-1E0A6559-5F02A2C7 LastWriteTime:20150605 16:55:32.951 FileSizeLow:8225792 FileSizeHigh:0 Attributes:32 idRec:

1568 = 1024+512+32. That’s FILE_ATTRIBUTE_REPARSE_POINT, FILE_ATTRIBUTE_SPARSE_FILE, and the archive bit. This is a deduplicated file.
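Subtraction works for these examples, but only because the clone’s bits happen to be a subset of the local value’s. Testing each bit is more reliable. This Python sketch decodes the decimal Attributes values from the debug log and reports which flags were added or removed locally; decode, diff, and the ATTRIBUTE_NAMES table are illustrative names, and the table covers only the attributes mentioned in this post.

```python
from __future__ import annotations

# Standard Windows FILE_ATTRIBUTE_* values; only the ones discussed in this post.
ATTRIBUTE_NAMES = {
    0x1: "FILE_ATTRIBUTE_READONLY",
    0x2: "FILE_ATTRIBUTE_HIDDEN",
    0x4: "FILE_ATTRIBUTE_SYSTEM",
    0x10: "FILE_ATTRIBUTE_DIRECTORY",
    0x20: "FILE_ATTRIBUTE_ARCHIVE",
    0x80: "FILE_ATTRIBUTE_NORMAL",
    0x100: "FILE_ATTRIBUTE_TEMPORARY",
    0x200: "FILE_ATTRIBUTE_SPARSE_FILE",
    0x400: "FILE_ATTRIBUTE_REPARSE_POINT",
    0x800: "FILE_ATTRIBUTE_COMPRESSED",
    0x1000: "FILE_ATTRIBUTE_OFFLINE",
    0x2000: "FILE_ATTRIBUTE_NOT_CONTENT_INDEXED",
}


def decode(value: int) -> list[str]:
    """Split a decimal Attributes value into flag names, lowest bit first."""
    return [name for bit, name in sorted(ATTRIBUTE_NAMES.items()) if value & bit]


def diff(local: int, clone: int) -> dict[str, list[str]]:
    """Flags set locally but not in the clone record ("added"), and vice versa ("removed")."""
    return {"added": decode(local & ~clone), "removed": decode(clone & ~local)}


print(decode(1568))
print(diff(128, 32))
```

For the first example, diff(128, 32) reports FILE_ATTRIBUTE_NORMAL as added and FILE_ATTRIBUTE_ARCHIVE as removed, which is exactly the “someone removed the archive bit” reading.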

Got the hang of it? Good. What a strange set of file names…

Call me Hank
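Eyeballing long debug log lines gets tedious. The Python sketch below pulls the Local and Clone fields out of one mismatch warning and lists which fields differ. The regular expression is written against the sample lines shown in this post and is an assumption; other builds may format the line differently. parse_mismatch and differing_fields are hypothetical names.

```python
from __future__ import annotations

import re

MARKER = "DBClone::IDTableImportUpdate Mismatch record was found"

# One match per side; pattern based only on the sample lines in this post.
FIELDS = re.compile(
    r"(?P<side>Local|Clone) ACL hash:(?P<acl>\S+) "
    r"LastWriteTime:(?P<time>\d{8} [\d:.]+) "
    r"FileSizeLow:(?P<low>\d+) FileSizeHigh:(?P<high>\d+) "
    r"Attributes:(?P<attrs>\d+)"
)


def parse_mismatch(line: str) -> dict[str, dict]:
    """Hypothetical parser: {'Local': {...}, 'Clone': {...}} for one warning line."""
    if MARKER not in line:
        raise ValueError("not a DBClone mismatch line")
    parsed = {}
    for m in FIELDS.finditer(line):
        parsed[m["side"]] = {
            "acl_hash": m["acl"],
            "last_write_time": m["time"],
            # Combine the two 32-bit halves into one 64-bit file size.
            "size": (int(m["high"]) << 32) | int(m["low"]),
            "attributes": int(m["attrs"]),
        }
    return parsed


def differing_fields(parsed: dict[str, dict]) -> list[str]:
    """Names of the fields whose Local and Clone values differ."""
    local, clone = parsed["Local"], parsed["Clone"]
    return sorted(k for k in local if local[k] != clone[k])
```

Run against the _project_arcturus.ned line, differing_fields returns ['attributes'], with 34 locally and 32 in the clone record.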

To the wrap-up! At Microsoft these are often called “learnings”, which makes me want to hurtle across the conference table and shake the person until they admit that learnings isn’t a $%#^#^%& word.

Ahem.

The Lesson

Don’t monkey with file attributes on the destination server. For that matter, don’t change any files in any fashion on the destination while cloning; it adds inefficiency. Letting users party on the downstream file server isn’t a good idea either: you are about to enable replication, and DFSR will reconcile the differences. If users on the destination alter files first, their changes will go buh-bye.

What part didn’t you understand?

Not to mention the confusion if you start seeding data onto your destination while users are accessing it. “Hey Martha, what’s with this empty share? Oh, now it has some files. And some more. And some more. Ok, I’m going to lunch.”

You don’t have to do anything if you run into attribute changes causing less efficient replication – DFSR will fix everything up by performing initial sync on just those files. Seeing a couple of mismatches is no reason to get in a twist. If a sizable number mismatch, however, you need to evaluate what’s going on and decide if fixing the issue and re-importing is going to save you time in the end.
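To judge whether a sizable number of mismatches is worth fixing and re-importing, it helps to count them and see which attribute changes dominate. This Python sketch does that across one or more debug log files passed on the command line. summarize is a hypothetical helper, and the script assumes the logs are plain text with one mismatch warning per line, as in the samples above.

```python
import re
import sys
from collections import Counter

MARKER = "DBClone::IDTableImportUpdate Mismatch record was found"
ATTRS = re.compile(r"Attributes:(\d+)")


def summarize(lines):
    """Hypothetical helper: total mismatch lines, plus counts per (local, clone) attribute pair."""
    total = 0
    pairs = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        total += 1
        values = ATTRS.findall(line)  # expected order: [local, clone]
        if len(values) == 2 and values[0] != values[1]:
            pairs[(int(values[0]), int(values[1]))] += 1
    return total, pairs


if __name__ == "__main__":
    # Usage: python tally.py <debug log> [<debug log> ...]
    grand_total, all_pairs = 0, Counter()
    for path in sys.argv[1:]:
        # Assumes plain-text logs; undecodable bytes are replaced rather than fatal.
        with open(path, encoding="utf-8", errors="replace") as f:
            total, pairs = summarize(f)
        grand_total += total
        all_pairs.update(pairs)
    print(f"{grand_total} mismatch records")
    for (local, clone), n in all_pairs.most_common():
        print(f"{n:6}  local={local} clone={clone} changed=0x{local ^ clone:x}")
```

If most of the count turns out to be a single pattern, such as 1568 versus 32, you probably have one cause (dedup, in that case) worth fixing before deciding whether to re-import.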

I want to thank Jeroen de Bonte, Dutch Microsoft Support Engineer extraordinaire, for working with us on these behaviors and pointing out that we had no documentation on them. Good man.

Until next time,

  – Ned “Destitute and in Disrepute” Pyle