How to get the most from your FRSDiag…

Hello all, it’s Randy here again. The File Replication Service (FRS) is a technology used to synchronize data between several data shares on different computers, often in different sites throughout an organization. Any change made to the FRS data is replicated to all the partners that share this replication. This is also the technology that manages the contents of SYSVOL on all the domain controllers in the domain. There are a lot of moving parts to FRS, and you may need to troubleshoot why the contents are not consistent on all the SYSVOL servers.

FRSDiag is a .NET utility that gathers diagnostics for troubleshooting FRS. It can be run from any computer, with administrative privileges, against any computer running FRS. This utility can be downloaded here. The report gathers data on all replica sets in which the target server is involved. This includes custom replica sets in the DFS namespace as well as SYSVOL replication. The examples in this blog post reference SYSVOL replication, but the tips also pertain to replication in DFS. Below is a screenshot of the FRSDiag GUI.

image

Be sure to check out the Tools menu on the menu bar. It is a great way to do some simple troubleshooting tasks, like forcing FRS replication and querying for the OriginatorGUID of an FRS member server (OriginatorGUID is explained later).

IMPORTANT

In order for any of this discussion to make sense, you first need to know how everything happens. If you are not an expert in FRS, please review the article How FRS Works before proceeding.

SUMMARY

The output from FRSDiag can look cryptic, so to simplify things, I want to divide the data gathered into three areas. The first area is Topology: this information focuses on the connections between servers and the components that make replication work. The second is VersionVector: this information tells replication partners whether their data is current or what needs to be replicated with others. The third area is the Data being replicated. I will organize this blog post around these three areas and show some reports that give us this information. Lastly, I will not be discussing errors in the FRS event logs, as these are well documented. All the event logs are gathered in the FRSDiag report, which is a great way to troubleshoot an issue and find solutions by searching the Microsoft support website for the event codes.

TOPOLOGY

The topology of your FRS environment lays out a map of how the data propagates to the FRS servers. The components of the topology are the Replica Sets, Replica Members, and the Connection Objects. A Replica Set is the set of files and directories replicated under a specific folder. This can be the SYSVOL folder, or a DFS folder using FRS to copy the contents among multiple targets. The Replica Members are the FRS servers that participate in the Replica Set. The Connection Objects are the paths that data can travel from an Upstream Replica Member to a Downstream Replica Member. The upstream member is where the changes happen and the downstream member receives these changes. When a change is made on the upstream partner, it sends a change notification to its downstream partners. When the downstream partner initiates the replication, it replies to the upstream partner with a change acknowledgement and then pulls the changes.

The FRSDiag report is run on one server and the reports only detail the components relevant to that server. Therefore, you need to run the report on all the Replica Members in the Replica Set in order to get a full picture of the topology. You can select all the members of the replica set by choosing the Browse option when selecting a target server.

image

Some reports that show this information are connstat.txt, ntfrs_config.txt and ntfrs_sets.txt. If you are troubleshooting SYSVOL, you can also use the two repadmin reports. The repadmin reports pertain to Active Directory replication. Because SYSVOL uses the same Connection Objects as Active Directory replication, these reports can be helpful.

Connstat.txt, ntfrs_config.txt and ntfrs_sets.txt each contain some unique information, but they largely overlap. All three reports group information by the Replica Set of which the target server is a member (the target server being the server that FRSDiag was pointing to when the report was run). Ntfrs_sets includes information on each of the Connection Objects used by the target server for that Replica Set. Here is a portion of the SYSVOL Replica Set information from my ntfrs_sets report.

image

Grouped beneath this Replica Set information are three Connection Objects that the target server uses.

image

This represents the connection to the NTFS Journal. This connection object pulls change orders from the local file system and populates data that originated on this server.

image

This represents the upstream member ADAR2DC2 and the downstream member ADAR2DC1. We have two listed because we have two-way replication. They look the same, but you can tell which connection is which by looking in the ntfrs_ds.txt report. All of these components are objects in Active Directory, and ntfrs_ds.txt is the LDAP output of these objects. If you search this report for the Cxtion GUID: D9BDBC95-DCFC-43FA-8FAA-F6F60020669E, you will see that it is the Connection Object listed under “cn=d9bdbc95-dcfc-43fa-8faa-f6f60020669e, cn=ntds settings, cn=adar2dc1, cn=servers, cn=default-first-site-name, cn=sites,cn=configuration, dc=adatum, dc=com.” If this name sounds familiar, it’s because you see this object in the AD Sites and Services MMC, where the connection object is displayed as <automatically generated>.

image

Now we will look at Connstat.txt. This is the most informative of the reports and where I go first to get a good understanding of the situation. Even though we are discussing it in the context of topology, it also contains a wealth of information on how up to date the downstream members are, and it will make a good transition to our next topic, VersionVector information.

Let’s look at Connstat.txt from a sample report.

image

This displays all of your connections grouped by the replica set. In our case, our replica set is SYSVOL and we are reporting on domain controller DC04. We see some log data at the top: OutLogSeqNum: 5053 and OutlogCleanup: 5053. The OutLogSeqNum indicates the number of changes in the local FRS database. The OutlogCleanup indicates which change number has been acknowledged and pulled by all downstream partners. In this case the numbers are the same, so all downstream partners are in sync with this domain controller.

The next portion of data is registry information pertaining to the replica set. We see the root and staging paths; these are local paths on the file server.

The last portion is a spreadsheet of each of the replica partners defined by each connection object. Each line represents one connection object, so in this scenario we see two entries for each domain controller (because we have a connection replicating in each direction.)

PartnerName is the name of either the upstream or downstream partner. It is an upstream partner if the next column (I/O) indicates ‘In’, and a downstream partner if the I/O column indicates ‘Out’.

The Outbound connections are more interesting because they show how up to date the downstream partner is. When troubleshooting, you will find better information in the upstream partner’s Connstat.txt.

Here is some information on how to read Connstat.txt. This goes into more detail than I do about the values and their meanings, so it is worth a look.

So let’s take a look at one of these entries. We will look at LITWAREINC\DC01$ (I/O = Out.) This is the connection object for DC04 as the upstream and DC01 as the downstream.

Rev = 8 - This indicates the revision number of NTFRS being used. A revision of 8 indicates that we are running Windows Server 2003.

LastJoinTime = Fri Feb 1 - This value includes the date and time; I included only the date portion so the output would fit on one page. It indicates the last time the downstream partner connected, and Last VVJoin is the last time the downstream partner compared version vectors to do a full synchronization. See How FRS Works to compare a join to a vvjoin and when they are performed.

State = Joined and OLog State = OLP_ELIGIBLE indicate the current status of the connection. Do not be alarmed if it shows unjoined; this is normal. See Connstat remarks for an explanation of the state values.

To verify the connection state and check whether a partner is out of sync, look at the LeadX and TrailX values and compare them to the OutLogSeqNum of the upstream partner. In my environment, you see that all equal 5053. This indicates that we are up to date: the upstream partner’s Outbound Log Sequence Number is the same as the change the downstream partner has been notified of (the LeadX value) and the change it has acknowledged (the TrailX value).
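The comparison above is simple subtraction, and it can be written down as a small helper. This is only a minimal sketch: it assumes you have transcribed OutLogSeqNum, LeadX and TrailX from Connstat.txt by hand, and the class and function names are hypothetical, not part of FRSDiag.

```python
from dataclasses import dataclass


@dataclass
class OutboundConnection:
    """Values transcribed by hand from one 'Out' row of Connstat.txt."""
    partner: str
    lead_x: int   # last change the downstream partner has been notified of
    trail_x: int  # last change the downstream partner has acknowledged


def backlog(out_log_seq_num: int, conn: OutboundConnection) -> dict:
    """Return how far a downstream partner trails the upstream outbound log."""
    return {
        "partner": conn.partner,
        "not_yet_notified": out_log_seq_num - conn.lead_x,
        "not_yet_acknowledged": out_log_seq_num - conn.trail_x,
    }


# Numbers from the example above: OutLogSeqNum, LeadX and TrailX all equal 5053.
in_sync = backlog(5053, OutboundConnection("LITWAREINC\\DC01$", 5053, 5053))
print(in_sync)
```

A result of zero in both fields means the downstream partner is current; a growing "not_yet_acknowledged" value on one connection is the kind of signal worth investigating.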

In my case, I had a remote DC (Remote01) that appeared not to receive updates to SYSVOL. By using the information in the Connstat reports, I worked out the topology shown in the picture below, and I was also able to find that our real problem was DC04, not Remote01.

image

Our symptom was that Remote01 did not have the latest GPO changes. Looking at the Connstat of Remote01, we look synchronized with DC04 and we are waiting for DC04 to update two changes. Below is the output.

image

Our OutLogSeqNum is 8396 and the TrailX value for DC04 is 8394, indicating that 2 changes have not yet been acknowledged (8396 - 8394 = 2).

If we look at DC04, there appears to be no problem. This reinforces the lesson that the upstream partner holds the relevant information in its Connstat report.

image

If we look at a Connstat on DC05, we see that DC01 is behind by 10 changes (not alarming) but DC04 is behind by 127 changes. This led us to the discovery that our real problem was DC04.

image

In our case, we just created connection objects from DC05 to Remote01 instead of DC04 and it replicated immediately. This example shows the importance of knowing the complete replication topology. In addition to the topology information, we also looked at how up-to-date the servers are with their partners. This leads us to our next discussion: VersionVector objects.

VersionVector

One important component of synchronizing data is the ability for servers to distinguish what changes are needed and what changes have already been received. The VersionVector is a summary of how up-to-date this FRS member is with all the updates made on the other members of the replica set.

Consider this scenario. You are throwing a huge party and want as many people to come as possible. To do this, you send out invitations to a group of friends and tell them to invite whoever they want. You need each of your friends to keep a list of attendees that is up-to-date regardless of which friend made the invitation. You need a simple way to ensure that all lists are the same and to propagate the changes as quickly as possible. To do this, each entry on the list must include three things:

1. The person invited

2. Which friend made the invitation

3. An invitation number.

The invitation number is a simple count of how many invites a particular friend has made. So as I invite people I will count them 1,2,3,4,5 and so on. When I want to update my list with another friend, I can just tell them my latest count and they will know how many entries are required from me. Because there is a count associated with each friend, I can update someone's list with my invites, as well as invites from others. I will keep a table indicating how many invites I have received from others, similar to the one below:

Friend      Invites
Randy       15
Bob         18
Sean        14
Tim         21
Jonathan    16

I run into a friend and his table looks like this:

Friend      Invites
Randy       11
Bob         14
Sean        17
Tim         12
Jonathan    16

I can quickly see that I need to tell my friend about my invites 12-15, Bob’s invites 15-18, and Tim’s invites 13-21. I also see that he is aware of all invites on my list originating from Sean and Jonathan because his number is equal to or greater than mine.
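The comparison of the two tables can be sketched in a few lines of Python. This only illustrates the party analogy, not FRS code; the function name is made up for the example.

```python
def missing_invites(mine: dict[str, int], theirs: dict[str, int]) -> dict[str, range]:
    """For each friend, return the invitation numbers the other list lacks."""
    needed = {}
    for friend, my_count in mine.items():
        their_count = theirs.get(friend, 0)
        # Only send entries when my count is higher than theirs.
        if my_count > their_count:
            needed[friend] = range(their_count + 1, my_count + 1)
    return needed


my_table = {"Randy": 15, "Bob": 18, "Sean": 14, "Tim": 21, "Jonathan": 16}
friend_table = {"Randy": 11, "Bob": 14, "Sean": 17, "Tim": 12, "Jonathan": 16}

for friend, numbers in missing_invites(my_table, friend_table).items():
    print(f"{friend}: invites {numbers.start}-{numbers.stop - 1}")
# Randy: invites 12-15
# Bob: invites 15-18
# Tim: invites 13-21
```

Sean and Jonathan do not appear in the result because my friend's counts for them are equal to or greater than mine.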

This is how VersionVectors work. The VersionVector Table is similar to the table above. Each member of the replica set is listed as an OriginatorGuid, and each OriginatorGuid has an associated Version Sequence Number (VSN). Now when one partner updates another (during a Join), that partner can provide updates that originate from it as well as changes originating from other members in the replica set.

Two good reports for reviewing the VersionVector components are NTFRS_SETS.txt and NTFRS_OUTLOG.txt. The outlog is the outbound log in the NTFRS database and includes all the latest change orders that the server posts for downstream partners. Each entry can be a local change order that originated on the server itself, or one that came from its upstream partner and is trickling down to its downstream partners. Here is a sample entry in the OUTLOG.txt:

image

In this change order, we see a lot of GUIDs that are also present in the NTFRS_SETS.txt report referenced above. We can track the referenced Connection Object by comparing the GUIDs in the change order above. First we see that the change order above is for the Replica Set SYSVOL:

Table Type: Outbound Log Table for DOMAIN SYSTEM VOLUME (SYSVOL SHARE) (1)

We locate this Replica Set in the NTFRS_SETS.txt:

ACTIVE REPLICA SETS

Replica: DOMAIN SYSTEM VOLUME (SYSVOL SHARE) (fac09b1b-fac4-41f2-95d63550da9f09bb)

We then see in the OUTLOG change order:

CxtionGuid : 7aa4ee28-09d5-498c-8a68c1bbf7e3c416

We find this Connection Object in the NTFRS_SETS.txt under our Replica Set:

Cxtion: D9BDBC95-DCFC-43FA-8FAA-F6F60020669E (7aa4ee28-09d5-498c-8a68c1bbf7e3c416)
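The same GUID can appear in these reports with different capitalization and punctuation, so a plain text search may miss a match. As a rough sketch, a hypothetical helper (not part of FRSDiag) can reduce GUID strings to their hex digits before comparing them:

```python
import re


def normalize_guid(text: str) -> str:
    """Lowercase a GUID string and keep only its hexadecimal digits."""
    return re.sub(r"[^0-9a-f]", "", text.lower())


def same_guid(a: str, b: str) -> bool:
    """Compare two GUID strings while ignoring case, hyphens and spaces."""
    return normalize_guid(a) == normalize_guid(b)


# The Cxtion name from NTFRS_SETS.txt and the cn= value from ntfrs_ds.txt.
print(same_guid("D9BDBC95-DCFC-43FA-8FAA-F6F60020669E",
                "d9bdbc95-dcfc-43fa-8faa-f6f60020669e"))

# The CxtionGuid from the outlog entry and the same value in full hyphenation.
print(same_guid("7aa4ee28-09d5-498c-8a68c1bbf7e3c416",
                "7AA4EE28-09D5-498C-8A68-C1BBF7E3C416"))
```

Both comparisons print True. The fully hyphenated form in the second call is written out only for illustration.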

Let’s look at the properties of this connection object in NTFRS_SETS.txt:

image

Earlier we found in Active Directory that this connection object has ADAR2DC2 as the upstream partner and ADAR2DC1 as the downstream partner. We see attributes of the partner; in this case it is ADAR2DC2 because the report was run from ADAR2DC1. We also see that it is an inbound connection (meaning that we are the downstream server). We also see other valuable information such as the last Join Time and status.

We can now associate a change order in the NTFRS_OUTLOG.txt with the connection object through which this change was received. We can also look back in the change order and see the VersionVector associated with this change, listed as FrsVsn.

FrsVsn : 01c8a008 5103ee38

In order to find this information in NTFRS_SETS.txt, we need to look at the Replica Set information rather than the specific connection object. If you scroll up to the beginning of the Replica set information, you will see its summary prior to the listing of the associated connection objects. Below is a screenshot of the Replica Set information

image

The last portion of this output is the VersionVector information. The VersionVectorTable consists of VvEntries. These entries are the pairings of the OriginatorGuid and the Version Sequence Number described earlier. The OriginatorGuid is a random number assigned to each of the FRS member servers. This number changes on a member whenever a VVJoin is done or a member is marked as authoritative by setting the burflags. The Version Sequence Number (VSN) is a hexadecimal number that increments with each change originating on that member. An easy way to determine the OriginatorGuid of a member is to open the FRSDiag interface and select “Tools>Build GUID to Name for Target Server(s)” from the menu bar. You can search on these VvEntries in the Outlog to locate the last known change on a particular connection. The entries under Replica Version Vector indicate the latest change order originating on that member server, and Outlog Version Vector indicates the last entry purged from the Outbound Table. There is one more place that references the OriginatorGuid, and that is the FileIDTable. We will discuss this in our final topic, the data being replicated.

Replicated Data

The FileIDTable is a report that is not selected by default. It is a checkbox in the lower left corner of the tool named “ID Table Parser”. This report is a spreadsheet of every file and folder in the NTFRS database. You will get a warning message indicating that it could take an extremely long time to process. This warning is necessary because FRS may be replicating a DFS link containing hundreds of thousands of files. If you are running this utility against a domain controller that does not host a DFS link, then this is typically not an issue. Below is an example of this output. The columns have been shortened to fit within this page, and the GUIDs are much longer than those displayed:

image

Every file has a FileGuid that remains the same even if we rename the file. If we delete the file and create one with the same name, then it will have a new FileGuid. Each entry also contains a ParentGuid attribute, which tells us the folder where the object exists. With the combination of the FileGuid and ParentGuid, you can construct the entire file and folder hierarchy of the replica set. As you can see in the picture above, the first entry on the list has a Parent ID of all zeros and a file path of “.”; this is the replica root folder. For SYSVOL replication, the SYSVOL share is “.” and all other files stem from this point. If you are comparing FileID tables between different members, you will see that the FileGuid of the root folder is different for each member, but all other files and folders will share the same GUID across all members. The table also includes when and on which member each object was created. You can see that member server’s OriginatorID as the Originator for the Replica Root Folder, because this folder always originates on the local server.
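To illustrate how the FileGuid and ParentGuid pairs describe a hierarchy, here is a minimal sketch that rebuilds full paths from a few rows. The rows are invented, the GUIDs are shortened, and the dictionary keys (including "FileName") are assumptions for the example rather than the exact column names in the ID Table report.

```python
ROOT_PARENT = "00000000"  # shortened all-zero parent GUID marking the replica root


def build_paths(rows):
    """Map each FileGuid to its full path by walking ParentGuid links upward."""
    by_guid = {row["FileGuid"]: row for row in rows}
    paths = {}

    def resolve(guid):
        if guid in paths:
            return paths[guid]
        row = by_guid[guid]
        if row["ParentGuid"] == ROOT_PARENT:
            path = row["FileName"]  # the replica root, shown as "."
        else:
            path = resolve(row["ParentGuid"]) + "\\" + row["FileName"]
        paths[guid] = path
        return path

    for guid in by_guid:
        resolve(guid)
    return paths


# Invented sample rows; real GUIDs are full length.
rows = [
    {"FileGuid": "aaaa0001", "ParentGuid": "00000000", "FileName": "."},
    {"FileGuid": "aaaa0002", "ParentGuid": "aaaa0001", "FileName": "Policies"},
    {"FileGuid": "aaaa0003", "ParentGuid": "aaaa0002", "FileName": "gpt.ini"},
]

for guid, path in build_paths(rows).items():
    print(guid, path)
```

Because every entry other than the root shares its FileGuid across members, the same walk run against two members' tables should produce matching paths for matching GUIDs.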

You can see the activities of these objects, as well as all the activity on the FRS member, by looking at the NTFRS logs. There are numerous logs, named in the format NTFRS_00001, and the log with the largest number is the most recent. There is a lot of good information on reading these logs in the How FRS Works article. When reading these log files, keep that article open as a reference so you can follow along with what is happening. A good learning exercise is to create a test file and watch it originate on one partner and propagate to another. You can also see how often data is replicating by looking at the time stamps, or why the data is replicating by looking at the USNReason.

As you may already know, FRS has been superseded by the DFSR (Distributed File System Replication) service introduced in Windows Server 2003 R2 and updated in Windows Server 2008. So why should we pay attention to something that is being replaced? The answer is SYSVOL replication between your domain controllers. Most of you will adopt DFSR in your distributed file system environment because of its enhanced efficiency, durability and reporting. But FRS will still have its place amongst your domain controllers to replicate SYSVOL content until you start deploying Windows Server 2008 and move to DFSR for SYSVOL. You will be able to migrate SYSVOL replication to DFSR, but it will require your domain to be at the Windows Server 2008 functional level and all your domain controllers to be running Windows Server 2008. For some of you, this may take some time, so in the meantime, hopefully you will find this information helpful.

See you next time!

- Randy Turner