Replacing DFSR Member Hardware or OS (Part 3: N+1 Method)

09/08/2010

Hello readers, Ned here again. In the previous two blog posts I discussed planning for DFSR server replacements and how to ensure you are properly pre-seeding data. Now I will show how to replace servers in an existing Replication Group using the N+1 Method to minimize interruption.

Make sure you review the first two blog posts in this series, covering planning and pre-seeding, before you continue.

Background

As mentioned previously, the “N+1” method entails adding a new replacement server in a one-to-one partnership with the server being replaced. That new computer may be using local fixed storage (likely for a branch file server) or using SAN-attached storage (likely for a hub file server). Because replication is performed to the replacement server – preferably with pre-seeded data – the interruption to existing replication is minimal and there is no period where replication is fully halted. This reduces risk as there is no single point of failure for end users, and backups can continue unmolested in the hub site.

The main downside is cost and capacity. For each N+1 operation the new computer needs as much storage as the old one, at least until the migration is complete. You also need an extra server for each node being replaced (naturally, this is not an issue if you are doing a hardware refresh anyway).

Because a new server is being added for each old server in N+1, new hardware and a later OS can be deployed. No reinstallation or upgrades are necessary. The old server can be safely repurposed (or returned, if leased). DFSR supports renaming the new server to the old name; this may not be necessary if DFS Namespaces are being utilized.

Requirements

For each computer being replaced, you need the following:

A replacement server that will run simultaneously until the old server is decommissioned.
Enough storage for each replacement server to hold as much data as the old server.
If replacing a server with a cluster, two or more replacement servers will be required (this is typically only done on the hub servers).
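To sanity-check the storage requirement, you can total the replicated data on the old server and compare it with free space on the replacement volume. The Python below is a minimal sketch under a few assumptions: the replicated folder paths and the target volume are placeholders you supply, and the 10 percent headroom is a hypothetical planning margin, not a DFSR rule.

```python
import os
import shutil


def folder_size_bytes(path):
    """Sum the sizes of all files under one replicated folder root."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Users keep working during the check; files may vanish or be locked.
                continue
    return total


def has_enough_storage(old_data_bytes, new_free_bytes, headroom_pct=10):
    """True if the new volume can hold the old data plus a hypothetical safety margin.

    Integer math avoids floating-point rounding right at the boundary.
    """
    return new_free_bytes * 100 >= old_data_bytes * (100 + headroom_pct)


def check_replacement(old_rf_paths, new_volume, headroom_pct=10):
    """Compare the old server's replicated data with free space on the new volume.

    old_rf_paths and new_volume are placeholders you supply.
    """
    needed = sum(folder_size_bytes(p) for p in old_rf_paths)
    free = shutil.disk_usage(new_volume).free
    return has_enough_storage(needed, free, headroom_pct)
```

For example, run `check_replacement([r"\\HUB-DFSR\E$\Data"], "E:\\")` on the replacement server, where both paths are hypothetical.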

Repro Notes

In my sample below, I have the following configuration:

There is one Windows Server 2003 R2 SP2 hub (HUB-DFSR) using a dedicated data drive provided by a SAN through Fibre Channel.
There are two Windows Server 2003 R2 SP2 spokes (BRANCH-01 and BRANCH-02) that act as branch file servers.
Each spoke is in its own replication group with the hub (they are being used for data collection so that the user files can be backed up on the hub, and the hub is available if the branch file server goes offline for an extended period).
DFS Namespaces are generally being used to access data, but some staff connect to their local file servers by the real name through habit or lack of training.
The replacement computer is running Windows Server 2008 R2 with the latest DFSR hotfixes installed, including KB2285835.

I will replace the hub server with my new Windows Server 2008 R2 cluster and make it read-only to prevent accidental changes in the main office from ever overwriting the branch office’s originating data. Note that whenever I say “server” in the steps you can use a Windows Server 2008 R2 DFSR cluster.

Procedure

Phase 1 – Adding the new server

1. Inventory the file servers that are being replaced during the migration. Note down server names, IP addresses, shares, replicated folder paths, and the DFSR topology. You can use IPCONFIG.EXE, NET SHARE, and DFSRADMIN.EXE to gather this information from the command line. DFSMGMT.MSC can be used for all DFSR operations.
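The step 1 inventory can be captured to text files so you have something to compare against later. This is a minimal Python sketch, meant to run on each server being replaced. The `dfsradmin rg list` and `dfsradmin membership list /rgname:` forms are assumed to be the right subcommands; confirm them with `dfsradmin /?` on your build. Replication group names and the output folder are placeholders.

```python
import re
import subprocess
from pathlib import Path


def inventory_commands(rg_names):
    """Map output file name -> command line for the step 1 inventory."""
    cmds = {
        "ipconfig.txt": ["ipconfig", "/all"],  # IP addresses and DNS settings
        "shares.txt": ["net", "share"],  # share names and local paths
        "replication_groups.txt": ["dfsradmin", "rg", "list"],
    }
    for rg in rg_names:
        safe = re.sub(r"[^A-Za-z0-9_-]", "_", rg)  # keep file names legal
        # Assumed syntax: lists members, replicated folders, and paths for one RG.
        cmds[f"membership_{safe}.txt"] = ["dfsradmin", "membership", "list", f"/rgname:{rg}"]
    return cmds


def run_inventory(rg_names, out_dir):
    """Run each command on the current server and save its output for later reference."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, cmd in inventory_commands(rg_names).items():
        result = subprocess.run(cmd, capture_output=True, text=True)
        (out / filename).write_text(result.stdout + result.stderr)
```

For example, `run_inventory(["RG-BRANCH-01"], r"C:\migration\HUB-DFSR")`, where both arguments are hypothetical.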

2. Bring the new DFSR server online.

3. Optional but recommended: Pre-seed the new server with existing data from the hub.

Note: For pre-seeding techniques, see Replacing DFSR Member Hardware or OS (Part 2: Pre-seeding).

4. Add the new server as a new member of the first replication group.

Note: For steps on using DFSR clusters, reference:

Deploying DFS Replication on a Windows Failover Cluster – Part I

Deploying DFS Replication on a Windows Failover Cluster – Part II

Deploying DFS Replication on a Windows Failover Cluster – Part III

5. Select the server being replaced as the only replication partner with the new server. Do not select any other servers.

6. Create (or select, if pre-seeded) the new replicated folder path on the replacement server.

Note: Optionally, you can make this a Read-Only replicated folder if running Windows Server 2008 R2. Make sure you understand the RO requirements and limitations by reviewing: https://blogs.technet.com/b/askds/archive/2010/03/08/read-only-replication-in-r2.aspx

7. Complete the setup. Allow AD replication to converge (or force it with REPADMIN.EXE /SYNCALL). Allow DFSR polling to discover the new configuration (or force it with DFSRDIAG.EXE POLLAD).
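This step, and the later "force AD replication and DFSR polling" steps, can be scripted. Here is a minimal Python sketch that only builds and prints the command lines; `HUB-DFSR-NEW` is a hypothetical name for the replacement server, and you should check `dfsrdiag pollad /?` for the exact member-name format your build expects.

```python
def convergence_commands(dfsr_members):
    """Commands that shorten the waits for AD replication and DFSR polling.

    repadmin runs on a domain controller: /A = all partitions, /d = identify
    servers by distinguished name, /e = cross site boundaries, /P = push
    changes outward from this DC.
    dfsrdiag pollad asks each DFSR member to read its configuration from AD
    now instead of waiting for its next scheduled poll.
    """
    cmds = [["repadmin", "/syncall", "/AdeP"]]
    for member in dfsr_members:
        cmds.append(["dfsrdiag", "pollad", f"/member:{member}"])
    return cmds


for cmd in convergence_commands(["HUB-DFSR", "HUB-DFSR-NEW"]):
    print(" ".join(cmd))
```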

8. At this point, the new server is replicating only with the old server being replaced.

9. When initial replication completes, the new server logs a 4104 event. If pre-seeding was done correctly there will be very few 4412 conflict events; unless the environment is completely static, expect some, because users continue to edit data normally.
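On the Windows Server 2008 R2 replacement, the built-in `wevtutil` tool can pull these events from the DFS Replication log for a quick tally. The Python below is a minimal sketch: it builds the query, extracts event IDs from the text output (assuming `Event ID: NNNN` lines in `/f:text` format), and reports whether every replicated folder has logged its 4104. The replicated folder count is something you supply from your inventory.

```python
import re
from collections import Counter

DFSR_LOG = "DFS Replication"


def wevtutil_query(*event_ids, count=200):
    """wevtutil command returning the newest `count` matching events as text."""
    clause = " or ".join(f"EventID={i}" for i in event_ids)
    return ["wevtutil", "qe", DFSR_LOG, f"/q:*[System[({clause})]]",
            f"/c:{count}", "/rd:true", "/f:text"]


def event_ids_from_text(text):
    """Extract event IDs from wevtutil /f:text output (assumes 'Event ID: NNNN' lines)."""
    return [int(m) for m in re.findall(r"Event ID:\s*(\d+)", text)]


def summarize_initial_sync(event_ids, replicated_folders=1):
    """4104 = initial replication finished for a replicated folder; 4412 = conflict."""
    counts = Counter(event_ids)
    return {
        "initial_sync_done": counts[4104] >= replicated_folders,
        "conflicts_4412": counts[4412],
    }
```

On the new server, `subprocess.run(wevtutil_query(4104, 4412), capture_output=True, text=True).stdout` can be fed to `event_ids_from_text`. Some conflicts are expected while users are active; a large count may point at a pre-seeding problem.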

10. Repeat for any other replication groups or replicated folders configured on the old server, until the new server is configured identically and has all the data.

Phase 2 – Recreate the replication topology

1. Select the Replication Group and create a “New Topology”.

2. Select a hub and spoke topology.

Note: You can use a full mesh topology with customization if using a more complex environment.

3. Make the new replacement server the new hub. The old server will act as a "spoke" temporarily until it is decommissioned; this allows it to continue replicating any last-minute user changes.
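Before running the New Topology wizard, it can help to write down the connections you expect. DFSR connections are one-way (from a sending member to a receiving member), so each hub-spoke pair is two connections. This minimal Python sketch compares the Phase 1 layout with the Phase 2 hub-and-spoke layout, using the repro names plus a hypothetical `HUB-DFSR-NEW` for the replacement; it models intent only and does not talk to DFSR.

```python
def hub_and_spoke(hub, spokes):
    """Return the one-way (sender, receiver) connections for a hub-and-spoke topology."""
    connections = set()
    for spoke in spokes:
        connections.add((hub, spoke))
        connections.add((spoke, hub))
    return connections


# Phase 1: the old hub still serves the branch, and the replacement talks only to it.
phase1 = hub_and_spoke("HUB-DFSR", ["HUB-DFSR-NEW", "BRANCH-01"])
# Phase 2: the replacement is the hub; the old hub is a temporary spoke.
phase2 = hub_and_spoke("HUB-DFSR-NEW", ["HUB-DFSR", "BRANCH-01"])

print(sorted(phase2 - phase1))  # connections added by the new topology
print(sorted(phase1 - phase2))  # connections no longer present in the new topology
```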

4. Force AD replication and DFSR polling again. Verify that all three servers are replicating correctly by creating a propagation test file using DFSRDIAG.EXE PropagationTest or DFSMGMT.MSC’s propagation test.
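The propagation test can also be driven from the command line. Here is a minimal Python sketch that builds the `dfsrdiag propagationtest` and `dfsrdiag propagationreport` command lines; the RG name, RF name, and report path are hypothetical, and you should confirm the switches with `dfsrdiag propagationtest /?` on your build.

```python
import time


def propagation_commands(rg, rf, test_file, report_file):
    """dfsrdiag commands: create a test file, then report which members received it."""
    common = [f"/rgname:{rg}", f"/rfname:{rf}", f"/testfile:{test_file}"]
    return [
        ["dfsrdiag", "propagationtest", *common],
        ["dfsrdiag", "propagationreport", *common, f"/reportfile:{report_file}"],
    ]


# Hypothetical names: use your own RG/RF names from the Phase 1 inventory.
stamp = time.strftime("%Y%m%d-%H%M%S")
for cmd in propagation_commands("RG-BRANCH-01", "UserData", f"proptest-{stamp}", r"C:\temp\proptest.xml"):
    print(" ".join(cmd))
```

Give the test file time to replicate to all three members before running the report command.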

5. Create folder shares on the replacement server to match the old share names and data paths.
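If the Phase 1 inventory recorded share names and paths, the matching `net share` commands can be generated rather than typed. A minimal Python sketch; the share names and paths are placeholders, and administrative shares such as C$ and ADMIN$ are skipped because Windows creates those itself.

```python
def recreate_share_commands(old_shares):
    """old_shares: {share name: local path} as recorded from NET SHARE on the old server.

    Adjust the paths first if the replacement keeps its data on a different volume.
    """
    cmds = []
    for name, path in sorted(old_shares.items()):
        upper = name.upper()
        if upper in ("ADMIN$", "IPC$") or (len(upper) == 2 and upper.endswith("$")):
            continue  # administrative shares (ADMIN$, IPC$, C$, ...) are created by Windows
        cmds.append(["net", "share", f"{name}={path}"])
    return cmds
```

Share-level permissions belong to the share definition rather than the replicated files, so review them as well; on Windows Server 2008 R2, `net share` accepts `/GRANT:` for this.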

6. Repeat these steps for any other replication groups (RGs) and replicated folders (RFs) that are being replaced on these servers.

Phase 3 – Removing the old server

Note: this phase is the only one that potentially affects user file access. It should be done off hours in a change control window in order to minimize user disruption. In a reliably connected network environment with an administrator that is comfortable using REPADMIN and DFSRDIAG to speed up configuration convergence, the entire outage can usually be kept under 5 minutes.

1. Stop further user access to the old file server by removing the old shares.

Note: Stopping the Server service with command NET STOP LANMANSERVER will also temporarily prevent access to shares.
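Closing the old shares takes only a few commands. A minimal Python sketch; pass only the user-facing share names from your inventory (the names below are hypothetical). Stopping the Server service blocks all SMB share access on that server, not just the replicated data.

```python
def close_old_shares_commands(user_share_names, stop_server_service=False):
    """Commands for Phase 3 step 1, run on the old server."""
    cmds = [["net", "share", name, "/delete"] for name in user_share_names]
    if stop_server_service:
        # The alternative from the note: stops all SMB share access on this server.
        cmds.append(["net", "stop", "lanmanserver"])
    return cmds


for cmd in close_old_shares_commands(["Users", "Apps"]):
    print(" ".join(cmd))
```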

2. Remove the old server from DFSR replication by deleting the Member within all replication groups. This is done on the Membership tab by right-clicking the old server and selecting “Delete”.

3. Wait for the DFSR 4010 event(s) to appear for all previous RG memberships on that server before continuing.
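To know when it is safe to continue, compare the 4010 events you see against the inventory. Because 4010 describes a replicated folder being removed from configuration, this minimal Python sketch tracks replication group and replicated folder pairs. How you extract those fields from the old server's log is left as an assumption (Windows Server 2003 R2 does not include `wevtutil`), and the RG/RF names are hypothetical.

```python
def folders_still_configured(expected, events):
    """Return the (RG, RF) pairs that have not yet logged a 4010.

    expected: (replication group, replicated folder) pairs from the inventory.
    events: (event ID, replication group, replicated folder) tuples read from the
    old server's DFS Replication log; extracting these fields is assumed, not shown.
    """
    removed = {(rg, rf) for event_id, rg, rf in events if event_id == 4010}
    return sorted(set(expected) - removed)
```

An empty result means every former membership on the old server has logged its 4010.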

4. At this point the old server is no longer serving user data or replicating files. Rename the old server so that no further accidental access can occur. If it is a DFS Namespace link target, remove it from the namespace as well.

5. Rename the replacement server to the old server name. Change the IP address to match the old server.
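Steps 4 and 5 can be written down as an ordered plan. This minimal Python sketch assumes a standalone replacement server (a cluster's client access point is managed differently), that `netdom.exe` is available on each server, and that the old server has already released its IP address. All names, the interface name, and the addresses are placeholders; interface names containing spaces may need `name="Local Area Connection"` quoting when typed at a prompt.

```python
def rename_plan(old_name, retired_name, new_name, interface, ip, mask, gateway):
    """Ordered (server to run on, command) pairs for Phase 3 steps 4 and 5."""
    return [
        # Step 4, on the old server: give up the name.
        (old_name, ["netdom", "renamecomputer", old_name,
                    f"/newname:{retired_name}", "/reboot"]),
        # Step 5, on the replacement (from the console): take the address, then the name.
        (new_name, ["netsh", "interface", "ipv4", "set", "address", f"name={interface}",
                    "source=static", f"address={ip}", f"mask={mask}", f"gateway={gateway}"]),
        (new_name, ["netdom", "renamecomputer", new_name,
                    f"/newname:{old_name}", "/reboot"]),
    ]


for host, cmd in rename_plan("HUB-DFSR", "HUB-DFSR-OLD", "HUB-DFSR-NEW",
                             "Local Area Connection", "10.0.0.5", "255.255.255.0", "10.0.0.1"):
    print(host, ":", " ".join(cmd))
```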

Note: This step is not strictly necessary, but is provided as a best practice. Applications, scripts, users, or other computers may be referencing the old computer by name or IP address even if you use DFS Namespaces. If your IT policy forbids using server names and IP addresses instead of DFSN (and this is a recommended policy to have in place), then do not change the name and IP address; this will expose any incorrectly configured systems. Use of an IP address is especially discouraged because it means that Kerberos is not being used for security.

6. Force AD replication and DFSR polling. Validate that the servers correctly see the name change.

7. Add the new server as a DFSN link target if necessary or part of your design. Again, it is recommended that file servers be accessed by DFS namespaces rather than server names. This is true even if the file server is the only target of a link and users do not access the other hub servers replicating data.

8. Confirm that replication continues to work after the rename as well (for example, with another propagation test).

9. The process is complete.

Final Notes

As you can now see, the steps to perform an N+1 migration are straightforward whether you are replacing a hub, a branch, or all servers. Use of DFS Namespaces makes this more transparent to users. The actual outage time of N+1 is theoretically zero if you do not rename servers and you perform the operation off hours when users are not actively accessing data. Replication to the main office never stops, so centralized backups can continue during the migration process.

All of these factors make N+1 the recommended DFSR node replacement strategy.


- Ned “+1” Pyle