Recovering from Unsupported One-Way Replication in DFSR on Windows Server 2003 R2 and Windows Server 2008

Warren here again. This post outlines the proper steps to move from an unsupported one-way replication deployment of DFSR on Windows Server 2003 R2 and Windows Server 2008 to a supported two-way replication configuration.

Before we get started, here is some good news. Starting with Windows Server 2008 R2, DFSR will support read-only replicated folders. Details of this feature can be found here:

https://blogs.technet.com/filecab/archive/2009/01/21/read-only-replicated-folders-on-windows-server-2008-r2.aspx

Recovering from One Way Replication

First let’s define some terminology so the rest of this post will make more sense.

  • Upstream server – For this article “Upstream” server will refer to the DFSR server that is able to replicate changes to its partner.
  • Downstream server – For this article the “Downstream server” will refer to the DFSR server that is not allowed to replicate changes to its partner.

Note: In practice, any server in a multi-master replication implementation can be Upstream or Downstream. It all depends on which server has the changes (“Upstream”) and where the changes are being replicated to (“Downstream”).

  • RG – Replication Group - a set of servers, known as members, that participates in the replication of one or more replicated folders
  • RF – Replicated Folder - defines a set of files and folders to be kept in sync across multiple servers in a replication group

See these links for more information on RGs and RFs:

https://msdn.microsoft.com/en-us/library/bb540026(VS.85).aspx

https://technet.microsoft.com/en-us/library/cc759803.aspx

What Not To Do

  1. Do not simply enable replication from the Downstream server to the Upstream server. That is a BAD idea. The reasons are listed in the section below, “Why One-way Replication is not Recommended or Supported”. You may end up with very old data in your Replicated Folder and some upset users wondering why they are looking at last year’s report.
  2. Do not remove the Downstream server from the Replication Group and add it back. It is a common misconception that doing this will trigger DFSR to perform an Initial Sync and Initial Replication.

What To Do

To successfully recover from a one-way replication deployment, we must force the Downstream server to perform an Initial Synchronization and Initial Replication of the RF so that all queued changes on the Downstream server are discarded.

I will provide two methods for you to consider. Which one I would use depends on two key considerations:

  1. How divergent is the data?
  2. How slow are the links?

Method 1 will generate some file replication traffic, as the Downstream member is most likely out of date for at least some of its data. If the data is fairly consistent, I would use this method.

Method 2 will generate the least amount of file replication as you will be pre-staging a recent backup of the data taken from the Upstream server on the Downstream. I would use this method if the data is very different or the degree of divergence is unknown but suspected to be high. Also if the links are slow Method 2 is a good choice.

If you use Method 2, make sure you read Ned’s blog post on the proper method of pre-staging files. If you pre-stage the files incorrectly, you will generate more replication traffic than if you had just used Method 1.

https://blogs.technet.com/askds/archive/2008/02/12/get-out-and-push-getting-the-most-out-of-dfsr-pre-staging.aspx
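The choice between the two methods can be boiled down to a few lines. This is only a sketch of the reasoning above, not a tool: the function name and the "low" / "high" / "unknown" labels for divergence are made up for illustration, and you still have to judge for yourself what counts as a slow link.

```python
def choose_method(divergence: str, slow_links: bool) -> int:
    """Return 1 or 2, following the guidance in this post.

    divergence is a hypothetical label: "low", "high", or "unknown".
    """
    if divergence not in ("low", "high", "unknown"):
        raise ValueError(f"unexpected divergence label: {divergence!r}")
    # Slow links, or data that is very different (or suspected to be),
    # favor pre-staging a recent backup on the Downstream server.
    if slow_links or divergence in ("high", "unknown"):
        return 2
    # Fairly consistent data over reasonable links: disable and
    # re-enable the Downstream membership instead.
    return 1
```

For example, `choose_method("low", slow_links=False)` picks Method 1, while unknown divergence or a slow link pushes the answer to Method 2.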

Whatever you do, always back up the data on your Upstream and Downstream servers before making any changes to your one-way replication configuration. Backing up your data is a best practice, so you should already be doing this nightly. As with any replication setup, the work should be done off hours or on a weekend when possible to minimize user interruption.

Patching

Make sure your DFSR servers’ patch levels are up to date per the KB articles linked below before implementing any other changes. Do not skip this step.

2003 - https://support.microsoft.com/default.aspx?scid=kb;EN-US;958802

2008 - https://support.microsoft.com/default.aspx?scid=kb;EN-US;968429

Important Note: DFSR stores its configuration in AD. When changes are made to the DFSR configuration, the update takes place on the DC that the DFSR server is connected to. Those changes are then replicated to all DCs in the domain. DFSR will pick up the change on its own during its next poll of AD.

In the methods below, all changes to the DFSR configuration will be made on the Upstream server. We will then force AD replication and finally force a poll of AD on the DFSR servers. The steps to do this are listed below and are referenced in both Methods 1 and 2.

Forcing AD Synchronization

a. To find out what DC a DFSR server is connected to, use WMIC.

WMIC /namespace:\\root\microsoftdfs path DfsrReplicationGroupConfig get LastChangeSource

b. To force a synchronization of AD so the changes to DFSR are replicated to all DCs in the domain, run this command on the DC returned in step A (that is, <dc name> is the DC returned in step A).

repadmin /syncall /d /e /P <dc name> <Naming Context>

c. On the Upstream and Downstream servers, run this command: “DFSRDIAG PollAD”.

Tip: You can run the pollad command against a remote server by specifying the target server with the “/mem:” switch.
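If you end up repeating steps a through c many times, it can help to script them. The following is a minimal sketch, not a supported tool: it only assembles the three command lines exactly as they are written above, and the server names, DC name, and naming context in the usage section are hypothetical placeholders. The `run` helper is only meaningful on a Windows server that has WMIC, repadmin, and dfsrdiag available.

```python
import subprocess


def dc_lookup_cmd():
    # Step a: ask DFSR which DC its configuration was last read from.
    return ["wmic", r"/namespace:\\root\microsoftdfs", "path",
            "DfsrReplicationGroupConfig", "get", "LastChangeSource"]


def ad_sync_cmd(dc_name, naming_context):
    # Step b: same switches, in the same order, as the repadmin line above.
    return ["repadmin", "/syncall", "/d", "/e", "/P", dc_name, naming_context]


def poll_ad_cmd(member=None):
    # Step c: poll AD locally, or on a remote member via the /mem: switch.
    cmd = ["dfsrdiag", "pollad"]
    if member:
        cmd.append(f"/mem:{member}")
    return cmd


def run(cmd):
    # Runs the command and returns its output; raises if it exits non-zero.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Hypothetical names, for illustration only. This prints the commands
    # rather than running them so you can review them first.
    steps = [
        dc_lookup_cmd(),
        ad_sync_cmd("DC01.contoso.com", "DC=contoso,DC=com"),
        poll_ad_cmd("FS-UPSTREAM"),
        poll_ad_cmd("FS-DOWNSTREAM"),
    ]
    for step in steps:
        print(" ".join(step))
```

Keeping the builders separate from `run` means you can print and check each command before anything touches AD.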

Method 1 – Disabling the Downstream server’s membership in the RF to force Initial Sync and Initial Replication when the membership is re-enabled

1. Get a full backup of the Replicated Folder(s) from the Upstream server.

2. If there are changes you want to keep on the Downstream server, back them up now.

Note: Any files that are different or unique on the Downstream server will be moved to the ConflictAndDeleted or pre-existing directories. The data in ConflictAndDeleted will be permanently removed if the quota is reached. The default quota is 660 MB. Make sure you back up the Downstream server if there is any data there you want to keep. See https://technet.microsoft.com/en-us/library/cc782648.aspx

3. On the Upstream server, open the DFS Management snap-in and highlight the Replication Group that has the affected Replicated Folder. Make sure you are on the Memberships tab. Right-click the Downstream server and select Disable.

Figure 1 - Disabling the Downstream server’s membership in the Replication Group

You will be prompted to verify that you want to disable the membership of the server. The prompts differ depending on whether the RF is published in a DFS Namespace.

If your RF is not published, you will see only the prompt in Figure 2. Select Yes.

Figure 2 - Disable Membership prompt when the Replicated Folder is not published in a DFS Namespace

If your RF is published in a DFS Namespace, you will see the prompts in Figures 3 and 4. Click Yes and then OK.

Figure 3 - First popup when disabling membership on an RF that is published via a DFS Namespace

Figure 4 - Second popup when disabling membership on an RF that is published via a DFS Namespace

4. On the Upstream server use the DFS Management console to enable the connection from the Downstream server to the Upstream server. This is done on the Connections Tab. If there is no connection from Downstream to Upstream create it at this time.

Figure 5 - Enabling the connection from Downstream to Upstream

5. On the Upstream server, find out what DC it is connected to. – Step A in Forcing AD Synchronization.

6. Force a synchronization of AD so the changes to DFSR are replicated to all DCs in the domain. – Step B in Forcing AD Synchronization.

7. Force the Upstream and Downstream servers to poll AD. – Step C in Forcing AD Synchronization.

Once the Downstream server detects the change to its membership, it will log events 4114 and 4008. Once you confirm these events are logged, you can proceed to step 8. Event 4114 informs the admin that the data in the RF will be seen as pre-existing. The data will be treated like any other pre-staged data during Initial Sync and Initial Replication when the membership is re-enabled.

Figure 6 - Event ID 4008 logged

Figure 7 - Event ID 4114

8. Enable the membership of the Downstream server in the Replicated Folder using the DFS Management snap-in. This is located on the Memberships Tab.

Figure 8 - Enabling the membership of the Downstream server in the Replication Group.

Depending on whether your RF is published in a DFS Namespace, you will get a different set of prompts.

If the RF is not published in a DFS Namespace, you will be prompted with the dialog box in Figure 9. Verify the path and click OK.

Figure 9 - Enabling membership on a server where the RF is not published via DFS

If your RF is published in a DFS Namespace, you will get the popup in Figure 10. You will not be able to click OK until you set the share permissions and share name by clicking the Edit button. Clicking the Edit button brings up the popup in Figure 11. Set the permissions and share names as needed (the defaults are fine if suitable). The extra prompts are presented when the RF is published because the member is being added again as a target for the folder in the DFS Namespace.

Figure 10 - First popup presented when enabling membership on an RF that is published in DFS

Figure 11 - Second popup displayed when enabling membership on an RF that is published in DFS

9. Repeat steps 5-7. After those steps are done, the Downstream server will log Event 4102 when Initial Replication begins and Event 4104 when it is complete. You will also more than likely see 4412 events due to the servers having different versions of some files. If the data is very different on the servers, you will see a large number of these.

Figure 12 - Event ID 4102

Figure 13 - Event ID 4412 “Conflict Event”

Figure 14 - Event ID 4104

10. Once you see event ID 4104 on the Downstream server for the replicated folder(s), you are finished with setting up two-way replication.
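The note in step 2 about the ConflictAndDeleted quota deserves a quick sanity check before you disable the membership. The sketch below adds up the size of a folder tree and compares it with the 660 MB default mentioned above. It assumes the worst case, where everything under the path you give it ends up in ConflictAndDeleted. In practice DFSR decides what goes where, so treat the answer as a rough warning and not a prediction. The function names are mine, not part of any DFSR tooling.

```python
import os

DEFAULT_QUOTA_MB = 660  # default ConflictAndDeleted quota cited in step 2


def tree_size_mb(root):
    """Total size of all files under root, in megabytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Skip files that vanish or cannot be read during the walk.
                pass
    return total / (1024 * 1024)


def may_overflow_quota(root, quota_mb=DEFAULT_QUOTA_MB):
    """True if the tree is larger than the ConflictAndDeleted quota."""
    return tree_size_mb(root) > quota_mb
```

Point it at the Downstream copy of the RF, or at just the subfolders you know have diverged. If it returns True, a backup of the Downstream data is the only safe way to keep those files.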

Method 2 – Total Recreation of the RG using pre-seeded data on the Downstream server

1. On the Upstream server, get a full backup of the Replicated Folder.

2. Ship the Backup to the Downstream server’s site.

3. On the Upstream server delete the Replication Group.

4. On the Upstream server, find out what DC it is connected to. – Step A in Forcing AD Synchronization.

5. Force a synchronization of AD so the changes to DFSR are replicated to all DCs in the domain. – Step B in Forcing AD Synchronization.

6. On the Upstream and Downstream servers force them to PollAD. – Step C in Forcing AD Synchronization.

7. Both the Upstream and Downstream servers will log events 4010 and 3006, noting that the RF and RG have been removed from the configuration:

Figure 15 - Event ID 4010, logged when an RF is removed from an RG

Figure 16 - Event ID 3006, logged when an RG is removed

8. Pre-seed the backup on the Downstream server. See Ned’s blog post on how to do this correctly: https://blogs.technet.com/askds/archive/2008/02/12/get-out-and-push-getting-the-most-out-of-dfsr-pre-staging.aspx. In my experience, most pre-seeding attempts that fail are due to mismatched NTFS permissions. The entire directory tree must have matching ACLs, or the files will be seen as different and file replication will occur. See Ned’s post for options on how to get the ACLs to match. (I like icacls.exe, but there are more options.)

9. On the Upstream server, recreate the Replication Group and Replicated Folder(s), specifying that the Upstream server is the Primary server for the content. Set your staging areas as large as you can, up to the size of the RF.

10. Repeat Steps 4-6. The upstream server will log event 4112. The Downstream server will log 4102 when initial replication begins and 4104 when it is finished:

Figure 17 - Event ID 4112, logged on the Primary Member of a new RF

Figure 18 - Event ID 4102 on the Downstream server when the RF is initialized

Figure 19 - Event ID 4104 on the Downstream server when Initial Replication is finished

11. Once you see event ID 4104 on the Downstream server, Initial Replication is done and you are finished with your task.
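Both methods come down to watching the DFS Replication event log for a handful of event IDs. If you would rather not keep refreshing Event Viewer, the sketch below builds a wevtutil query for the IDs you care about. Some assumptions to be aware of: wevtutil ships with Windows Server 2008 but not with Windows Server 2003 R2, so this only helps on 2008 members. The log name "DFS Replication" is the name as it appears in Event Viewer. The dictionary keys are labels I made up to group the event IDs from this post.

```python
# Event IDs mentioned in this post, grouped under hypothetical labels.
EVENTS = {
    "method1_membership_disabled": [4114, 4008],
    "initial_replication": [4102, 4104],
    "conflicts": [4412],
    "method2_rg_deleted": [4010, 3006],
    "method2_primary_initialized": [4112],
}


def event_query_cmd(event_ids, count=10, log_name="DFS Replication"):
    """Build a wevtutil command that lists the newest matching events as text."""
    if not event_ids:
        raise ValueError("at least one event ID is required")
    id_filter = " or ".join(f"EventID={int(i)}" for i in event_ids)
    xpath = f"*[System[({id_filter})]]"
    # qe = query events, /c = max results, /rd:true = newest first.
    return ["wevtutil", "qe", log_name, f"/q:{xpath}",
            f"/c:{int(count)}", "/rd:true", "/f:text"]
```

For example, `event_query_cmd(EVENTS["initial_replication"])` gives you a command that shows the most recent 4102 and 4104 events. Pass the list to `subprocess.run` on the Downstream server to execute it.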

Why One-way Replication is not Recommended or Supported

Here is a snippet from the blog post by DFSR PM Mahesh Unnikrishnan covering the reasons why one-way replication is neither recommended nor supported in DFSR on Windows Server 2003 R2 and Windows Server 2008. The full post can be found here:

https://blogs.technet.com/filecab/archive/2007/08/16/using-one-way-connections-in-dfs-replication.aspx

“We recommend that customers avoid configuring such one way connections to the extent possible since:

a) The DFS Replication service’s conflict resolution algorithms are severely hampered if the outbound connection from a member server is deleted (or disabled). Therefore, scenarios where the DFS Replication service is unable to over-write undesired updates occurring on the ‘read-only’ member server with the authoritative contents of the hub/datacenter server may arise.

b) Accidental deletions on the ‘read-only’ server (in this case, site server ‘design.contoso.com’) could cause issues with the replication updates being trapped on that server. Further, as described above, updates from the authoritative server can potentially not be applied since the parent folder could have been deleted locally. Therefore, with time it is possible to see substantial divergence in the contents of the replicated folders across all replication member servers.

c) Problems with the deployment are difficult to detect without regular and meticulous monitoring. There might be a lot of false positives in the health report and system eventlogs owing to the fact that the replication topology is being set up to do something DFSR wasn’t designed to handle. Mining through these false positives and monitoring servers can be a challenge.

d) Administrators need to develop their own scripts to identify which files are backlogged on the ‘read-only’ member (in this case site server ‘design.contoso.com’) and replicate authoritative content back to that ‘read-only’ site server. This can be quite tricky to get right and might need a lot of very close monitoring (perhaps, at times on a per-file basis). Microsoft does not supply any tools for this purpose.

e) There is a risk of administrators inadvertently creating the missing connection and causing backlogs to flow to and corrupt the contents of an authoritative server. With these changes getting replicated out further from the authoritative server, the contents of the replicated folder could get out of sync and corrupt on all replication member servers very quickly.

“Please note that configuring one way connections is not a configuration supported by Microsoft Product Support Services.”

Reducing Unwanted Changes

To reduce unwanted changes at certain servers you have a few items in your arsenal. These work for UNC connections.

1. If it is not necessary to share the data on some servers, don’t share it. DFSR does not require that a replicated folder be shared.

2. Set Share permissions (not NTFS permissions) to read only.

3. If you never want anyone accessing the data through the DFS Namespace, disable the referral for that target or delete the server as a folder target.

Hopefully you will never find yourself in the situation where you need to use the information in this blog post.

- Warren “Don’t Call Me Warren” Williams