Hi, Ned again. Today I’d like to talk about troubleshooting DFS Replication (i.e. the DFSR service included with Windows Server 2003 R2, not to be confused with the File Replication Service). Specifically, I’ll cover the most common causes of slow replication and what you can do about them.
Update: Make sure you also read this much newer post to avoid common mistakes that can lead to instability or poor performance: http://blogs.technet.com/b/askds/archive/2010/11/01/common-dfsr-configuration-mistakes-and-oversights.aspx
Let’s start with ‘slow’. This loaded word is largely a matter of perception. Maybe DFSR was once much faster and you see it degrading over time? Has it always been too slow for your needs and now you’ve just gotten fed up? What will you consider acceptable performance so that you know when you’ve gotten it fixed? There are some methods that we can use to quantify what ‘slow’ really means:
· DFSMGMT.MSC Health Reports
We can use the DFSR Diagnostic Reports to see how big the backlog is between servers and if that indicates a slowdown problem:
The generated report will tell you sending and receiving backlogs in an easy to read HTML format.
· DFSRDIAG.EXE BACKLOG command
If you’re into the command line you can use the DFSRDIAG BACKLOG command (with options) to see how far behind servers are in replication and whether that indicates a slowdown. Dfsrdiag is installed when you install DFSR on the server. So for example:
dfsrdiag backlog /rgname:slowrepro /rfname:slowrf /sendingmember:2003srv13 /receivingmember:2003srv17
Member <2003srv17> Backlog File Count: 10
Backlog File Names (first 10 files)
1. File name: UPDINI.EXE
2. File name: win2000
3. File name: setupcl.exe
4. File name: sysprep.exe
5. File name: sysprep.inf.pro
6. File name: sysprep.inf.srv
7. File name: sysprep_pro.cmd
8. File name: sysprep_srv.cmd
9. File name: win2003
10. File name: setupcl.exe
This command shows up to the first 100 file names, and also gives an accurate snapshot count. Running it a few times over an hour will give you some basic trends. Note that hotfix 925377 resolves an error you may receive when continuously querying backlog, although you may want to consider installing the more current DFSR.EXE hotfix, which is 931685. Review the recommended hotfix list for more information.
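As a rough sketch of how you might trend this, the one-liner below samples the backlog every ten minutes for an hour using the same sample names from the example above (the output file name is just an example, and ping is standing in for a sleep command since Windows Server 2003 has no built-in one; in a batch file you would double the % signs):
for /l %i in (1,1,6) do (dfsrdiag backlog /rgname:slowrepro /rfname:slowrf /sendingmember:2003srv13 /receivingmember:2003srv17 >> c:\backlogtrend.txt & ping -n 601 127.0.0.1 > nul)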
· Performance Monitor with DFSR Counters enabled
DFSR updates the Perfmon counters on your R2 servers to include three new objects:
- DFS Replicated Folders
- DFS Replication Connections
- DFS Replication Service Volumes
Using these allows you to see historical and real-time statistics on your replication performance, including things like total files received, staging bytes cleaned up, and file installs retried – all useful in determining what true performance is as opposed to end user perception. Check out the Windows Server 2003 Technical Reference for plenty of detail on Perfmon and visit our sister AskPerf blog.
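If you would rather capture these counters to a file for later analysis than watch them live, the built-in TYPEPERF tool can log them to a CSV. A hedged sketch (the file names are just examples): the first command dumps every counter path for one of the DFSR objects into a list, and the second samples that list every 60 seconds for two hours.
typeperf -qx "DFS Replicated Folders" > c:\dfsr-counterlist.txt
typeperf -cf c:\dfsr-counterlist.txt -si 60 -sc 120 -f CSV -o c:\dfsr-perf.csv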
· DFSRDIAG.EXE PropagationTest and PropagationReport
By running DFSRDIAG.EXE you can create test files then measure their replication times in a very granular way. So for example, here I have three DFSR servers – 2003SRV13, 2003SRV16, and 2003SRV17. I can execute from a CMD line:
dfsrdiag propagationtest /rgname:slowrepro /rfname:slowrf /testfile:canarytest2
(wait a few minutes)
dfsrdiag propagationreport /rgname:slowrepro /rfname:slowrf /testfile:canarytest2
/reportfile:c:\proprep.xml
PROCESSING MEMBER 2003SRV17 [1 OUT OF 3]
PROCESSING MEMBER 2003SRV13 [2 OUT OF 3]
PROCESSING MEMBER 2003SRV16 [3 OUT OF 3]
Total number of members : 3
Number of disabled members : 0
Number of unsubscribed members : 0
Number of invalid AD member objects: 0
Test file access failures : 0
WMI access failures : 0
ID record search failures : 0
Test file mismatches : 0
Members with valid test file : 3
This generates an XML file with time stamps for when a file was created on 2003SRV13 and when it was replicated to the other two nodes.
The time stamp is in FILETIME format which we can convert with the W32tm tool included in Windows Server 2003.
<MemberName>2003srv17</MemberName>
<CreateTime>128357420888794190</CreateTime>
<UpdateTime>128357422068608450</UpdateTime>
w32tm /ntte 128357420888794190
148561 19:54:48.8794190 – 10/1/2007 3:54:48 PM (local time)
C:\>w32tm /ntte 128357422068608450
148561 19:56:46.8608450 – 10/1/2007 3:56:46 PM (local time)
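You can also get the elapsed time by subtracting the two FILETIME values directly, since they are expressed in 100-nanosecond intervals:
128357422068608450 - 128357420888794190 = 1,179,814,260 (100-nanosecond units)
1,179,814,260 / 10,000,000 ≈ 118 seconds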
So around two minutes later our file showed up. Incidentally, this is something you can do in the GUI on Windows Server 2008 and it even gives you the replication time in a format designed for human beings!
Based on the above steps, let’s say we’re seeing a significant backlog and slower than expected replication of files. Let’s break down the most common causes as seen by MS Support:
1. Missing Windows Server 2003 Network QFE Hotfixes or Service Pack 2
Over the course of its lifetime there have been a few hotfixes for Windows Server 2003 that resolved intermittent issues with network connectivity. Those issues generally affected RPC and caused DFSR (which relies heavily on RPC) to become a casualty. To close these gaps you can install KB938751 and KB922972 if you are on Service Pack 1 or 2. I highly recommend (in fact, I pretty much demand!) that you also install KB950224 to prevent a variety of DFSR issues – in fact, this hotfix should be on every Win2003 computer in your company.
2. Missing DFSR Service’s latest binary
The most recent version of DFSR.EXE always contains updates that not only fix bugs but also generally improve replication performance. We now have a KB article that we are keeping up to date with the latest files we recommend running for DFSR:
KB 958802 – List of currently available hotfixes for Distributed File System (DFS) technologies in Windows Server 2003 R2
KB 968429 – List of currently available hotfixes for Distributed File System (DFS) technologies in Windows Server 2008 and in Windows Server 2008 R2
3. Out-of-date Network Card and Storage drivers
You would never run Windows Server 2003 with no Service Packs and no security updates, right? So why run it without updated NIC and storage drivers? A large number of performance issues can be resolved by making sure that you keep your drivers current. Trust me when I say that vendors don’t release new binaries at heavy cost to themselves unless there’s a reason for them. Check your vendor web pages at least once a quarter and test test test.
Important note: If you are in the middle of an initial sync, you should not be rebooting your server! All of the above fixes will require reboots. Wait it out, or assume the risk that you may need to run through initial sync again.
4. DFSR Staging directory is too small for the amount of data being modified
DFSR lives and dies by its inbound/outbound Staging directory (stored under <your replicated folder>\dfsrprivate\staging in R2). By default, it has a 4GB elastic quota set that controls the size of files stored there for further replication. Why elastic? Because experience with FRS showed us having a hard-limit quota that prevented replication was A Bad Idea™.
Why is this quota so important? Because as long as staging usage is below the quota’s high watermark – 90% by default – the server replicates at its maximum rate of 9 files (5 outbound, 4 inbound) for the entire server. If the staging quota of a replicated folder is exceeded then, depending on the number of files currently being replicated for that replicated folder, DFSR may end up slowing replication for the entire server until the staging usage of the replicated folder drops below the low watermark, which is computed by multiplying the staging quota by the low watermark percentage (60% by default).
If the staging quota of a replicated folder is exceeded and the number of inbound files currently being replicated for that replicated folder exceeds 3 (15 in Win2008), then one task is consumed by staging cleanup and the three (15 in Win2008) remaining tasks wait for that cleanup to complete. Since there is a maximum of four (15 in Win2008) concurrent tasks, no further inbound replication can take place for the entire system.
If the staging quota of a replicated folder is exceeded and the number of outbound files currently being replicated for that replicated folder exceeds 5 (16 in Win2008), then the RPC server cannot serve any more RPC requests: the maximum number of RPC requests processed at the same time is five (16 in Win2008), and all five (16 in Win2008) are waiting for staging cleanup to complete.
You will see DFS Replication 4202, 4204, 4206 and 4208 events about this activity, and if it happens often (multiple times per day) your quota is too small. See the section Optimize the staging folder quota and replication throughput in the Designing Distributed File Systems guidelines for tuning this correctly. You can change the quota using the DFS Management console (dfsmgmt.msc): select Replication in the left pane, then the Memberships tab in the right pane, double-click a replicated folder, and select the Advanced tab to view or change the Quota (in megabytes) setting. Your event will look like this:
Event Type: Warning
Event Source: DFSR
Event Category: None
Event ID: 4202
Date: 10/1/2007
Time: 10:51:59 PM
User: N/A
Computer: 2003SRV17
Description:
The DFS Replication service has detected that the staging space in use for the
replicated folder at local path D:\Data\General is above the high watermark. The
service will attempt to delete the oldest staging files. Performance may be
affected.
Additional Information:
Staging Folder:
D:\Data\General\DfsrPrivate\Staging\ContentSet{9430D589-0BE2-400C-B39B-D0F2B6CC972E}-{A84AAD19-3BE2-4932-B438-D770B54B8216}
Configured Size: 4096 MB
Space in Use: 3691 MB
High Watermark: 90%
Low Watermark: 60%
Replicated Folder Name: general
Replicated Folder ID: 9430D589-0BE2-400C-B39B-D0F2B6CC972E
Replication Group Name: General
Replication Group ID: 0FC153F9-CC91-47D0-94AD-65AA0FB6AB3D
Member ID: A84AAD19-3BE2-4932-B438-D770B54B8216
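Plugging the numbers from this event into the watermark math above: cleanup kicks in once usage passes 4096 MB x 90% = 3686 MB (the event fired at 3691 MB in use), and it keeps deleting the oldest staging files until usage drops below 4096 MB x 60% = roughly 2458 MB.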
5. Bandwidth Throttling or Schedule windows are too aggressive
If your replication schedule on the Replication Group or the Connections is set to not replicate from 9-5, you can bet replication will appear slow! If you’ve artificially throttled the bandwidth to 16Kbps on a T3 line things will get pokey. You would be surprised at the number of cases we’ve gotten here where one administrator called about slow replication and it turned out that one of his colleagues had made this change and not told him. You can view and adjust these in DFSMGMT.MSC.
You can also use the Dfsradmin.exe tool to export the schedule to a text file from the command-line. Like Dfsrdiag.exe, Dfsradmin is installed when you install DFSR on a server.
Dfsradmin rg export sched /rgname:testrg /file:rgschedule.txt
You can also export the connection-specific schedules:
Dfsradmin conn export sched /rgname:testrg /sendmem:fabrikam\2003srv16 /recvmem:fabrikam\2003srv17
/file:connschedule.txt
The output is concise but can be unintuitive. Each row represents a day of the week, and each column represents an hour of the day. A hex value (0-F) represents the bandwidth setting for each 15-minute interval in the hour: F=Full, E=256M, D=128M, C=64M, B=32M, A=16M, 9=8M, 8=4M, 7=2M, 6=1M, 5=512K, 4=256K, 3=128K, 2=64K, 1=16K, 0=No replication. The values are in megabits per second (M) or kilobits per second (K).
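For example, a hypothetical day row that runs at full speed overnight but throttles replication to 256 Kbps during an 8:00–17:00 business day would look like this (32 quarter-hour slots of F, then 36 slots of 4, then 28 slots of F):
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF444444444444444444444444444444444444FFFFFFFFFFFFFFFFFFFFFFFFFFFF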
And a bit more about throttling – DFS Replication does not perform bandwidth sensing. You can configure DFS Replication to use a limited amount of bandwidth on a per-connection basis, and DFS Replication can saturate the link for short periods of time. Also, the bandwidth throttling is not perfectly accurate, though it may be “close enough.” This is because we are trying to throttle bandwidth by throttling our RPC calls. Since DFSR is as high as you can get in the network stack, we are at the mercy of various buffers in lower levels of the stack, including RPC. The net result is that if one analyzes the raw network traffic, it will tend to be extremely ‘bursty’.
6. Large amounts of sharing violations
Sharing violations are a fact of life in a distributed network – users open files and gain exclusive WRITE locks in order to modify their data. Periodically those changes are written within NTFS by the application and the USN change journal is updated. DFSR monitors that journal and will attempt to replicate the file, only to find that it cannot because the file is still open. This is a good thing – we wouldn’t want to replicate a file that’s still being modified, naturally.
With enough sharing violations, though, DFSR can start spending more time retrying locked files than it does replicating unlocked ones, to the detriment of performance. If you see a considerable number of DFS Replication event log entries for events 4302 and 4304 like the one below, you may want to start examining how files are being used.
Event ID: 4302 Source DFSR Type Warning
Description
The DFS Replication service has been repeatedly prevented from replicating a file due to consistent sharing violations encountered on the file. A local sharing violation occurs when the service fails to receive an updated file because the local file is currently in use.
Additional Information:
File Path: <drive letter path to folder\subfolder>
Replicated Folder Root: <drive letter path to folder>
File ID: {<guid>}-v<version>
Replicated Folder Name: <folder>
Replicated Folder ID: <guid2>
Replication Group Name: <dfs path to folder>
Replication Group ID: <guid3>
Member ID: <guid4>
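If you need to see who is holding a replicated file open over the network, one quick (and admittedly limited) check from the server console is the built-in NET FILE command, which lists files opened through shares – it will not show purely local locks, and the file name below is just an example:
net file | findstr /i budget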
Many applications can create a large number of spurious sharing violations because they create temporary files that shouldn’t be replicated. If they have a predictable extension, you can prevent DFSR from trying to replicate them by setting an exception in DFSMGMT.MSC. The default file filter excludes ~*, *.bak, and *.tmp, so for example the Microsoft Office temporary files (~*) are excluded by default.
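If you want to double-check what filters are actually in effect on a member, a hedged way to do it from the command line is to query the DfsrReplicatedFolderConfig WMI class (the same class the DFSR tools read); the output formatting is ugly but the values are authoritative:
wmic /namespace:\\root\microsoftdfs path DfsrReplicatedFolderConfig get ReplicatedFolderName,FileFilter,DirectoryFilter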
Some applications will allow you to specify an alternate location for temporary and working files, or will simply follow the working path as specified in their shortcuts. But sometimes, this type of behavior may be unavoidable, and you will be forced to live with it or stop storing that type of data in a DFSR-replicated location. This is why our recommendation is that DFSR be used to store primarily static data, and not highly dynamic files like Roaming Profiles, Redirected Folders, Home Directories, and the like. This also helps with conflict resolution scenarios where the same or multiple users update files on two servers in between replication, and one set of changes is lost.
7. RDC has been disabled over a WAN link.
Remote Differential Compression is DFSR’s coolest feature – instead of replicating an entire file like FRS did, it replicates only the changed portions. This means your 20MB spreadsheet that had one row modified might only replicate a few KB over the wire. If you disable RDC, though, changing any portion of a file’s data will cause the entire file to replicate, and if the connection is bandwidth-constrained this can lead to much slower performance. You can set this in DFSMGMT.MSC.
As a side note, in an extremely high bandwidth (Gigabit+) scenario where files are changed significantly, it may actually be faster to turn RDC off. Computing RDC signatures and staging that data is computationally expensive, and the CPU time needed to calculate everything may actually be slower than just moving the whole file in that scenario. You really need to test in your environment to see what works for you, using the PerfMon objects and counters included for DFSR.
8. Incompatible Anti-Virus software or other file system filter drivers
It’s a problem that goes back to FRS and Windows 2000 in 1999 – some anti-virus applications were simply not written with the concept of file replication in mind. If an AV product uses its own alternate data streams to store ‘this file is scanned and safe’ information, for example, it can cause that file to replicate out even though to an end-user it is completely unchanged. AV software may also quarantine or reanimate files so that older versions reappear and replicate out. Older open-file Backup solutions that don’t use VSS-compliant methods also have filter drivers that can cause this. When you have a few hundred thousand files doing this, replication can definitely slow down!
You can use Auditing to see if the originating change is coming from the SYSTEM account and not an end user. Be careful here – auditing can be expensive for performance. Also make sure that you are looking at the original change, not the downstream replication change result (which will always come from SYSTEM, since that’s the account running the DFSR service).
There are only a couple things you can do about this if you find that your AV/Backup software filter drivers are at fault:
- Don’t scan your Replicated Folders (not a recommended option except for troubleshooting your slow performance).
- Take a hard line with your vendor about getting this fixed for that particular version. They have often done so in the past, but issues can creep back in over time and in newer versions.
9. File Server Resource Manager (FSRM) configured with quotas/screens that block replication.
So insidious! FSRM is another component that shipped with R2; it can be used to block file types from being copied to a server, or to limit the quantity of files. It has no real tie-in to DFSR, though, so it’s possible to configure DFSR to replicate all files while FSRM prevents certain files from being written. Since DFSR keeps retrying, this can lead to backlogs and to situations where so much time is spent retrying files that can never move that the files which could move are slowed down as a consequence.
When this is happening, debug logs (%systemroot%\debug\dfsr*.*) will show entries like:
20070605 09:33:36.440 5456 MEET 1243 <Meet::Install> -> WAIT Error processing update. updateName:teenagersfrommars.mp3 uid:{3806F08C-5D57-41E9-85FF-99924DD0438F}-v333459
gvsn:{3806F08C-5D57-41E9-85FF-99924DD0438F}-v333459
connId:{6040D1AC-184D-49DF-8464-35F43218DB78} csName:Users
csId:{C86E5BCE-7EBF-4F89-8D1D-387EDAE33002} code:5 Error:
+ [Error:5(0x5) <Meet::InstallRename> meet.cpp:2244 5456 W66 Access is denied.]
Here we can see that teenagersfrommars.mp3 is supposed to be replicated in, but it failed with an Access Denied. If we run the following from CMD on that server:
filescrn.exe screen list
We see that…
File screens on machine 2003SRV17:
File Screen Path: C:\sharedrf
Source Template: Block Audio and Video Files (Matches template)
File Groups: Audio and Video Files (Block)
Notifications: E-mail, Event Log
… someone has configured FSRM using the default Audio/Video template (which blocks MP3 files), and it happens to be set against the C:\sharedrf folder we are replicating. To fix this we can do one or more of the following:
- Make the DFSR filters match the FSRM filters
- Delete any files that cannot be replicated due to the FSRM rules.
- Prevent FSRM from actually blocking by switching it from “Active Screening” to “Passive Screening” by using its snap-in. This will generate events and email warnings to the administrator, but not prevent the files from being moved in.
10. Un-staged or improperly pre-staged data leading to slow initial replication.
Wake up, this is the last one!
Sometimes replication is only slow in the initial sync phase. This can have a number of causes:
- Users are modifying files while initial replication is going on – ideally, you should set up your replication over a change control window like a weekend or overnight.
- You don’t have the latest DFSR.EXE from #2 above.
- You have not pre-staged data, or you’ve done it in a way that actually alters the files, forcing most of or even the entire file to replicate initially.
Here are the recommendations for pre-staging data that will give you the best bang for your buck, so that initial sync flies by and replication can start doing its real day-to-day job:
(Make sure you have latest DFSR.EXE installed on all nodes before starting!)
- ROBOCOPY.EXE – works fine as long as you follow the rules in this blog post.
- XCOPY.EXE – Xcopy with the /X switch will copy the ACL correctly and not modify the files in any way.
- Windows Backup (NTBACKUP) – The Windows Backup tool by default will restore the ACLs correctly (unless you uncheck the Advanced Restore Option for Restore security setting, which is checked by default) and not modify the files in any way. [Ned – if using NTBACKUP, please examine guidance here]
I prefer NTBACKUP because it also compresses the data and is less synchronous than XCOPY or ROBOCOPY [Ned – see above]. Some people ask ‘why should I pre-stage, shouldn’t DFSR just take care of all this for me?’. The answer is yes and no: DFSR can handle this, but when you add in all the overhead of effectively every file being ‘modified’ in the database (they are new files as far as DFSR is concerned), a huge volume of data may lead to slow initial replication times. If you take all the heavy lifting out and let DFSR just maintain, things may go far faster for you.
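As a hedged example of the XCOPY approach (the server and path names are made up – adjust them to your environment, and do this before enabling the membership so users can’t modify the files mid-copy):
xcopy "D:\Data\*" "\\branchsrv\D$\Data\" /E /H /K /X /Y
/E copies the full directory tree including empty folders, /H picks up hidden and system files, /K keeps attributes, /X carries over ownership, ACLs, and auditing information, and /Y suppresses overwrite prompts.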
As always, we welcome your comments and questions,
– Ned Pyle
Hi Ned!
When I started to read this article I was hoping to find a solution to our problem with disappearing shared Excel files on DFS shares.
Users are running Windows XP SP2 and Office 2003 SP3.
The server is Windows Server 2003 SP1 running DFSR.
Server A is in a datacenter and the files there are read-only. When working with the files, the users work on Server B at a local site.
(Server B)
Users are getting Excel files saved as a "random" extensionless hex-named file – e.g. 40120100 – and the original file name is lost.
The saved (lost) file can be found on Server B in ConflictAndDeleted.
(Server A)
On Server A, the Excel file still exists with the date and time from before the user saved it.
I have read about this problem on other forums but no one seems to have come up with a solution.
Please help us
Regards
Patrik Frisk
Hi Patrik,
That’s a bug that was fixed in DFSR.EXE about a year ago. If you are still seeing this issue with Service Pack 2 installed or with the latest DFSR.EXE (see http://support.microsoft.com/kb/931685), please let me know!
-Ned
Hi Ned,
Great information. I have downloaded the hotfixes you mentioned because I have the same problem as Patrik with Excel files.
I have two file servers each running windows 2003 standard R2 with service pack 2. They are replicated with one of them being the primary.
In your information you listed the \\servername\directory\DfsrPrivate\ConflictAndDeleted files. I have one per directory mount point, and the one in this directory is taking up 3.14 GB of disk space. The files go back to when we initially installed DFSR and continue to today’s date, so I don’t think it is going to automatically clean itself up. How do I clean these directories up so I can have my disk space back?
Thanks,
Bobbi
Bobbi and Patrik,
Reading Patrik’s description of the Excel file problem, I would first want to rule out that we are just dealing with file conflicts. If a file is updated on two servers before the file can get in sync again, DFSR handles that as a conflict, and the file that loses the conflict is moved to DfsrPrivate\ConflictAndDeleted in the root of the replicated folder on one of the servers, where it is renamed to filename-GUID-version.
You can test this with a command like:
echo foo > \\std1\d$\data\test.xls & echo foo > \\std2\d$\data\test.xls
In that command, std1 and std2 are DFSR members replicating the folder D:\Data. The command creates the files simultaneously on both servers, which results in a conflict that is logged as Event ID 4412 on one of the servers.
Event Type: Information
Event Source: DFSR
Event Category: None
Event ID: 4412
Date: 10/18/2007
Time: 10:40:25 AM
User: N/A
Computer: STD1
Description:
The DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder.
Additional Information:
Original File Path: D:\Data\test.xls
New Name in Conflict Folder: test.xls-{E3716117-034F-4998-A151-40DB382A4E4F}-v16188
Replicated Folder Root: D:\Data
File ID: {E3716117-034F-4998-A151-40DB382A4E4F}-v16188
Replicated Folder Name: Data
Replicated Folder ID: 6939148D-3D46-4EDF-93FB-525061A91F2F
Replication Group Name: TESTRG2
Replication Group ID: F42975DB-33C5-4BC3-86E6-CAC21EF374E5
So first try to determine if these are just conflicts, and if not, we’d like to hear a detailed description of how the problem is reproduced in your environment.
For Bobbi’s second question, there is a WMI method, CleanupConflictDirectory, that can be used to purge the ConflictAndDeleted directory.
First you want to determine the GUID of the replicated folder whose ConflictAndDeleted folder you want to purge. This can be done with WMIC or Dfsradmin, but Dfsradmin is simpler.
dfsradmin rf list /rgname:testrg /attr:rfname,rfguid
In that command "testrg" is the name of the replication group that contains the replicated folder you are looking for.
Then you use the rfguid in a WMIC command to call CleanupConflictDirectory:
wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo where "replicatedfolderguid='5B2BAE34-102B-4057-B8E5-EFE346D1FF19'" call cleanupconflictdirectory
In the DFSR debug log (%windir%\debug\dfsr####.log) that will look like this –
FrsContentSetInfo::ExecQuery Executing query:select * from DfsrReplicatedFolderInfo where replicatedfolderguid = "6939148d-3d46-4edf-93fb-525061a91f2f" client:craig
FrsContentSetInfo::Enum Enumerating content info objects. client:craig
FrsContentSetInfo::Get Getting content set info objects. client:craig
FrsContentSetInfo::ExecMethod Invoking cleanupconflictdirectory() method. client:clandis
FrsContentSetInfo::InvokeCleanupConflictDirectory Output Parameters: ReturnValue=0 (Success)
ConflictWorkerTask::CleanupManifest Cleanup conflict directory
ConflictWorkerTask::PostOp type:1 op:0 size:7
ConflictWorkerTask::PostOp type:7 op:0 size:0
ConflictWorkerTask::Step Conflict fileSize:0 fileCount:0
Also, regarding the ConflictAndDeleted folder, I was assuming you had tried this but I’ll mention it anyway. If you double-click the folder on the Memberships tab in dfsmgmt.msc and go to the Advanced tab, you can reduce the Conflict and Deleted quota to as low as 10 megabytes. So another way to purge is to set that to 10 and restart the service, and it will purge down to the low watermark of 60% of 10 MB.
So that is a GUI method, but it appeared as if a service restart was needed for it to take effect immediately, although I imagine that if I waited long enough the cleanup thread would run and take the new 10 MB quota into account.
But the CleanupConflictDirectory WMI method works instantly.
Excellent article! Answered a lot of questions I had on DFSR.
I had been pre-staging using Robocopy but thought I’d try Windows Backup instead after reading the blog.
However I now have a major problem with Event ID 1108.
I have eventually found an article on this but there are no solutions (apart from logging a support call with MS for £250):
http://www.microsoft.com/technet/support/ee/transform.aspx?ProdName=Windows%20Operating%20System&ProdVer=5.2.3790.1830&EvtID=1108&EvtSrc=DFSR&LCID=1033
Do you have any suggestions on what I can try, as nothing is replicating at all at the moment?
Thanks.
Hi Alasdair,
This issue is typically caused by an invalid registry value in the Restore subkey for the DFSR service. Look at:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFSR\Restore
There will be a subkey named for the year-date-time the restore was done, with two values. One of those values will be the network name that was used to perform the remote restore.
– Backup and delete the restore subkey
– Restart DFSR (if it won’t stop, restart machine).
– After the reg value is removed the service start and stop will be normal
More Information
================================
When a restore is done to a DFSR server, a registry subkey and a few values get added to the registry on the target system so that DFSR can process the restore. A good entry must use a local drive letter. It should look like this:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFSR\Restore\20070920-202505]
@="non-authoritative"
"<e:>"=""
However, when you do a remote restore over SMB (meaning you run NTBACKUP on ServerA and restore to ServerB), the entry will look like this:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFSR\Restore\20070920-202505]
@="non-authoritative"
"<\\dfstestfs04\e$>"=""
You will notice the difference between the drive letter in the local restore and the e$ share in the network restore. DFSR does not know how to process e$ and as a result cannot continue. It will sit there and wait for the registry key to be corrected.
All replication will stop until this key is deleted.
To prevent this from happening in the future, either perform the restore of the data on the target machine, or delete the registry value after performing the network restore and restart the service.
This has been fixed in the next OS release, Windows Server 2008.
Let me know if this doesn’t take care of it!
-Ned
Hi Ned, appreciate your prompt feedback.
I actually ended up phoning MS support and the chap there told me exactly the same. So having deleted that key and restarted the services, all is fine again!! 🙂 (That is a big load off my mind.)
What I like even more was when I asked about why it happened he agreed that it was a bug, called me back a few moments ago and told me I wasn’t going to be charged for my support call.
So that has made my day! However it might be nice if this "known" bug was documented somewhere to save others having the same headache… Of course now a search should bring them to this thread, so all’s well that ends well.
Thanks again.
Alasdair.
Hi Ned,
I keep getting this when I do a dfsrdiag backlog check:
[WARNING] Found 2 <DfsrReplicatedFolderConfig> objects with same ReplicationGroupGuid=1CF848D4-0F43-4334-A5F7-0EF85F0754F5 and ReplicatedFolderName=departments; using first object.
How do I get rid of the extra GUID?
Thanks,
Jason
Hi Jason,
It’s likely that the local XML cache for DFSR has some duplicate entries. Try this:
1) On the DFSR server that has the errors in the output, run DFSRDIAG POLLAD.
2) Stop the DFS Replication service
3) Go to the drive that holds the Replica_ files for the RG, such as F:\System Volume Information\DFSR\Config, and rename the Replica_*.xml files to Replica_*.old
4) Go to C:\System Volume Information\DFSR\Config and rename the Volume_*.xml files to Volume_*.old
5) Start the DFS Replication service
Check the Replica_ drive (i.e. F:\System Volume Information\DFSR\Config) and C:\System Volume Information\DFSR\Config for the new XML files, and check the registry at HKLM\System\CurrentControlSet\Services\DFSR\Access Checks\Replication Groups and at HKLM\System\CurrentControlSet\Services\DFSR\Parameters\Replication Groups for the values pertaining to the RG.
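Translated into commands, that sequence might look roughly like the sketch below (note that System Volume Information is normally ACLed to SYSTEM only, so you may have to grant yourself access to it first, and your Replica_ drive letter will differ):
dfsrdiag pollad
net stop dfsr
ren "F:\System Volume Information\DFSR\Config\Replica_*.xml" *.old
ren "C:\System Volume Information\DFSR\Config\Volume_*.xml" *.old
net start dfsr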
Re-run the DFSRDiag commands to verify the fix.
Let me know how this works out.
-Ned
Hi Ned,
Very good! Thank you much… I thought that was it, but I wasn’t sure it was safe to play with those files.
Sorry, one more thing… I ran into some replication issues when a drive failed. When I restarted the dfs service I had some weird algorithm issues. The files that were kept shouldn’t have been. The modified date was newer on the ones moved to "Conflict and Deleted." Anyway, I’ve run the latest dfsr.exe hotfix and decided to pre-stage to get everything back in order. Is there a way to clear the backlog so dfs starts fresh after a pre-stage?
Many thanks!!! This blog is terrific!
Jason
Hmmm – are you using Trend Micro Officescan 7.X? We’ve seen issues where older files would get reanimated with that application running.
If you want to start fresh and remove your backlog, you can remove the replica set, get your ‘master’ data onto one box, then use NTBACKUP to create a BKF of it, move or delete the ‘bad’ data off the other server(s), then copy the BKF out to them and restore the data to the correct spot. Then you create the replica and choose the ‘master’ server as primary – the data should all sync up, and since it’s identical there shouldn’t be a long period before initial replication is done and you’re back in business.
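If you go the NTBACKUP route, the backup half can be run from the command line; a hedged sketch (the job, path, and BKF names are made up, and the restore itself is done through the NTBACKUP GUI on each target server):
ntbackup backup "D:\Data" /j "DFSR pre-stage" /f "E:\prestage.bkf"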
If this all sounds nutso and dangerous, don’t hesitate to open a case with us here for backend support.
-Ned
Greetings Ned,
Great info!
I have a potentially stupid question.
Is there a way to disable RPC encryption for DFSR?
I use WAFS appliances and encrypted traffic is not optimized.
Thanks,
Joe Bedard
Hiya Joe,
That was an interesting question. After a bit of source code review I can say definitively that this is not possible – RPC encryption cannot be disabled.
– Ned
Hello Ned,
Thanks for this really useful and powerful article.
Sorry for my English, but I’m French 😉
Anyway, here is my question: I’m running Windows 2003 R2 SP2 with the latest patches. I decided to upgrade the NOD32 antivirus from version 2.7 to version 3, and I immediately started to get a couple of bad messages in the System event log (ID 14530) saying more or less: DFS could not access its private data from the Active Directory. Please manually check network connectivity, security access, and/or consistency of DFS information in the Active Directory. This error occurred on root Company.
After the reboot, this message disappears, but I noticed that the CPU went to 50% in use by the dfsr.exe process. After 3 or 4 hours, the server was not available and it was not possible to print, to access files, or even to connect to the server physically. A hard reboot was necessary.
After uninstalling the AV, it was still the same. Finally, I applied hotfix 931685 and it seems that the server is now accessible 24/7, but still with dfsr.exe occupying 50% of the CPU (on an HS21 blade).
After investigating, I noticed that dfsr00100.log, which seems to be the running log file, is full of strange messages like this:
[Error:183(0xb7) Staging::OpenForWrite staging.cpp:3370 936 W565 Cannot create a file when that file already exists.]
20071130 13:42:27.880 936 STAG 3508 [WARN] Staging::OpenForWrite (Ignore) Failed to create stage file for GVSN {2ED37126-12C7-4617-AE6B-34509F467FEB}-v20748:
I think that some cleanup in the DFSR DB should be done, but for the moment I haven’t found anything helpful.
Do you have any idea, and could you please give me any tips or a direction to search?
Thanks in advance.
Xav
Hi Xav,
No worries, your English is excellent and far better than my French!
It sounds like we’ve got something damaged in the staging directory that the service keeps trying to process. So let’s try this:
1. Stop the DFSR service.
2. Look closely at the log you mentioned above – are all the file GVSNs the same? In your case it was:
{2ED37126-12C7-4617-AE6B-34509F467FEB}-v20748
3. If those endlessly repeat the same entry, go to the staging directory. So for example:
C:\Replicated Folders\DFSR-Replicated-Folder\DfsrPrivate\Staging\ContentSet{AB3C38D4-64A0-43A0-96C8-1F5102004D6A}-{3D9DE7E2-5FD4-4404-A6ED-A85EAD22AA81}\1690653-{7CB6C56A-6307-42F7-B494-498DF8314789}-v806772-{8AE6FD76-BD8D-4D03-B522-FC91A58308C4}-v690653-Downloading.frx
4. Delete that file.
5. Start the DFSR service.
If there are a ton of different files listed in the debug log with that error (which I have not seen before – it has always been just one file), you will need to hunt them down as well.
Bonne chance!
-Ned
Hi Ned,
You know what? Thanks a million 😉
I followed your suggestion and now it’s perfect: the processor went back to 0–5% and dfsr.exe is running normally. In addition, the log files now contain "normal" data.
It was one file, the one you talked about. In fact, I had tried to delete it before, but without stopping the dfsr.exe service first; that’s why it didn’t work.
Now, I certainly have to reinstall the anti-virus, but I’m not so confident 😉
Thanks again for your help: it saved a lot of time and stress.
Xav
Hi,
I just found this link, and I wanted to ask something about DFSR if possible.
I’m replicating files between 2 sites. In one direction it goes just fine, but when I start replicating in the other direction, it starts replicating and then at some point gets completely stuck. The staging area is big enough. When I run BACKLOG on one of the replication groups, there is only 1 text file there, and it doesn’t go. We have enough bandwidth (4 Mbps) and it is almost empty. Servers on both sides are completely updated, even with the DFSR.EXE fix.
On one side is Windows 2003 R2 Enterprise 64-bit, and on the other side Windows 2003 R2 Standard 32-bit (this is the server that doesn’t replicate).
When I run a Diagnostics report, it says everything is OK, and there is not a single error in the DFS log.
Thank you
Agim
Hi,
If you look in the DFSR debug log (on the server where the file does not replicate out, after creating a new text file), do you see the file being written in a section called
UsnConsumer::CreateNewRecord LDB Inserting ID Record:
UsnConsumer::CreateNewRecord ID record created from USN_RECORD
?
After that, do you see any subsequent errors about this file?
It may save time for you to send me an email through the ‘EMAIL’ link at the top of the page and I can see your data.
-Ned
Ned
Is there a way to move the ConflictAndDeleted directory from its default location? I know we can move the staging directory, but I did not find any way to move the above directory to a different location.
Hi Tom,
I’m afraid it cannot be changed (even by hacking in ADSIEDIT – if you change it from the default it will simply be ignored and the path constructed from the root RF).
What is the impact on DFS-R or RDC if "SMB signing" is used?
Hi DanPan,
There’s no impact – DFSR uses RPC for all replication work, SMB is not used in any way (not even named pipes).
Ned
I have 10 replicated folders in one replication group & I would like to move the staging directories for all 10 to a central location, for example, E:\Staging. What’s the best way to do this & are there any issues that I should be aware of? Thanks
Hi Tom,
You should not share the same staging directory, if that’s what you’re meaning. So:
e:\staging <– bad
e:\staging\rf1 <– good
e:\staging\rf2 <– good
e:\staging\rf3 <– good
<etc>
Configuring the staging path to be the same for all replicated folders may lead to some problems during staging cleanup. We do not support this configuration even though it may seem to work. We’ve had some cases where this was done and there were bizarre parent-child relationship failures and blocked replication. Not fun to fix.
As far as changing it – you can just do it through DFSMGMT.MSC and it will all get created and used automagically. Once it has taken effect (after AD replication converges and DFSR polls), you can delete the old staging folders. Changing the staging path does not automatically move the contents to the new folder though, so you may see some slightly slower replication and reduced RDC efficiency for a while until staging starts getting filled again.
– Ned
Thanks a lot Ned!
So, I take it that the existing content of the staging directories does not get moved to the new staging locations.
Yessir, that is correct.
So, from a Best Practice Perspective, if you had to choose between keeping the staging directories in their default location or moving them to a new location (since each will need its own staging directory after all), which one would you recommend? Thanks
My recommendation would just be based on the environment – if you need more space, definitely move it to another drive. If not, don’t.
We always want you to allocate as much staging space is possible, so if that means having to move it – go for it.
I’m currently replacing branch office file servers and at the same time starting to use DFS-R for getting data back to a central site. Historically we’ve used Robocopy to move data from the old server to the new server (security and all) because of the /mir capability. That works nicely because you can re-sync prior to the swap out, very quickly. BTW, we’re going to Server 2008.
I came across this post that says that Robocopy has a bug that causes security not to be copied correctly. I’m going to open a ticket with MS, but thought I would post here with my 2 cents. You recommend using xcopy… Robocopy is now built in (finally) to the OS in Vista and 2008. I just typed xcopy /? at a cmd prompt and what appears: "NOTE: Xcopy is now deprecated, please use Robocopy."
Sounds like someone needs to fix the bug in Robocopy.
Hi shannontuten,
It’s not that robocopy completely fails to copy security; it’s that it sets the inheritance bit in such a way that the MD5 checksum of the file changes. So while you have security working fine, apps that compare checksums will think the files are different.
Feel free to press for the fix in Robocopy if you have a Premier contract though (do not bother if you are calling in a credit card case, those cannot be escalated to bugs). The more contracted customers that call in on this issue, the more likely we are to cross the bar for a fix. I have also restarted this discussion internally to see if we can get more traction against 2008 and Win7.
– Ned
We do indeed have a premier contract so I figure it is worth a quick low priority web ticket to let Microsoft know that it affects customers.
I tried using Robocopy and it works fine, the only bad thing is it spams the log with conflict file messages (for every file).
For migrating file servers, it’s hard to beat robocopy with a /mir command so that you sync the bulk of the data prior to a switch out and then run it one more time once you take access away. Xcopy just doesn’t fit the bill for that type of operation.
Thanks for the great article and response. DFS-R is a quite impressive technology.
Hi
1) Is it possible to view the files in the replication queue or currently being replicated? Any free tools on the market?
2) For a deleted folder in the DfsrPrivate\ConflictAndDeleted folder, is it possible to know who originally deleted it in the share?
3) Is there software or a built-in tool to see the history of use of a shared folder/file?
Ex:
User Action Path/File Time/date
user_x modified file_x @ time
user_z moved file_w @ time
user_j Deleted file_w @ time
Reead.
Hi,
Answering these in turn:
1) It is possible to see which files have just been replicated, but there’s no way to easily tell which files are in the middle of being replicated except by examining the DFSR debug logs.
To see files as they replicate:
1. Create the following registry *key* (not value):
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFSR\Parameters\EnableAudit
2. Enable Object Access Auditing for these servers (via local or domain-based group policy) for SUCCESS.
3. Refresh policy with GPUPDATE /FORCE (there should be no need to restart DFSR or
the servers)
4. Replicate a new file from upstream to downstream partner.
5. In Event Viewer | Security Events on the upstream partner, you will see:
———————————————-
Event Type: Success Audit
Event Source: DFSR
Event Category: (3)
Event ID: 7006
Date: 2/16/2006
Time: 10:33:50 AM
User: NT AUTHORITY\SYSTEM
Computer: M3
Description:
The DFS Replication service sent an update for the following file:
Additional Information:
Replicated Folder Root: C:\Sales
Replicated Folder Name: Sales
Replicated Folder ID: 3B38DDC2-FFBF-428C-9853-71D2D2D65351
File Name: test.txt
File ID: {B4738E50-CED1-4DA0-94CF-0E21345F98F6}-v2328331
File Parent ID: {3B38DDC2-FFBF-428C-9853-71D2D2D65351}-v1
Partner name: M1.contoso.com
———————————————-
Event Type: Success Audit
Event Source: DFSR
Event Category: (3)
Event ID: 7002
Date: 2/16/2006
Time: 10:33:50 AM
User: NT AUTHORITY\SYSTEM
Computer: M3
Description:
The DFS Replication service served the following file:
Additional Information:
Replicated Folder Root: C:\Sales
Replicated Folder Name: Sales
Replicated Folder ID: 3B38DDC2-FFBF-428C-9853-71D2D2D65351
File Name: test.txt
File ID: {B4738E50-CED1-4DA0-94CF-0E21345F98F6}-v2328331
File Parent ID: {3B38DDC2-FFBF-428C-9853-71D2D2D65351}-v1
Partner name: M1.contoso.com
———————————————-
6. In Event Viewer | Security Events on the downstream partner, you will see:
Event Type: Success Audit
Event Source: DFSR
Event Category: (3)
Event ID: 7004
Date: 2/16/2006
Time: 10:33:50 AM
User: NT AUTHORITY\SYSTEM
Computer: M1
Description:
The DFS Replication service received the following file:
Additional Information:
Replicated Folder Root: C:\Sales
Replicated Folder Name: Sales
Replicated Folder ID: 3B38DDC2-FFBF-428C-9853-71D2D2D65351
File Name: test.txt
File ID: {B4738E50-CED1-4DA0-94CF-0E21345F98F6}-v2328331
File Parent ID: {3B38DDC2-FFBF-428C-9853-71D2D2D65351}-v1
Partner name: M3.contoso.com
———————————————-
So by monitoring the security event log for 7002, 7004, 7006 events, you can get a
picture of what’s being replicated.
2) It is possible to know who did what with Object Access Auditing. This is covered (at the end) of http://blogs.technet.com/askds/archive/2007/09/04/where-s-my-file-root-cause-analysis-of-frs-and-dfsr-data-deletion.aspx.
3) See above.
Let me know if you have further questions on this,
Ned
If I set up an initial backup-type replication in which I select my branch office as authoritative and then run a health report, should I expect to see a huge amount of backlogged sending transactions from the backup server (not the branch server)? That scares me, because the data on the backup server is older, which is exactly why I want my branch server to be authoritative. These servers are both running 2008.
I’m gun shy here because we had some sort of event on the central backup server last week that seemed to cause a HUGE amount of sending transactions from our central server back to the branch servers. It seemed to affect some servers that were still in initial replication. It is almost as if they forgot that the branch server was authoritative. We have since verified that indeed some old files made their way back to the branch office servers. No one has access to the central backup server, no mass changes were made, no ACL changes, etc. The only thing on the box is the FCS agent and Veritas NetBackup.
Don’t know if it is related, but we can’t even stop the dfsr service without it timing out and terminating the process. This has the unfortunate side effect of causing a DB recheck that takes about an hour to run. I’ve double and triple checked limits and such and feel we are well below them. We are replicating 33 servers (each with an inbound and outbound connection, so 66 connections) to this one 64-bit Windows 2008 server. The branch servers are 32-bit. There are approximately 5.5 million files, with a very low change rate. The jet database is 2.1 GB, which from reading some posts on here doesn’t seem all that large.
Any insight would be appreciated. I’m starting to get nervous.
Please open a case with us in support, your issues will require much deeper analysis/data collection than this blog is capable of handling.
I’m starting one; I was just curious whether, on an initial replication, I should see backlogged transactions from the non-authoritative member.
I accidentally started spewing too much into the post, sorry.
No worries. I’d expect to see:
1. Backlogged *receiving* transactions
2. Backlogged sending transactions if there were preexisting files and they had been staged incorrectly or modified in some manner prior to initial sync.
Excellent, thank you. These were robocopied with a /copyall so yes they were modified. We learned our lesson with Robocopy a little too late for this migration project.
Getting all my facts together now to call support.
Thanks again.
I thought I would pass along something that occurred to me a little too late to help my situation very much.
Branch office to central server collection group. I robocopied the data with /copyall and thus inadvertently changed all the files. You can still use the files to stage, but it will spam your logs with conflict messages and fill up your DfsrPrivate with conflict files.
Instead of pointing your replication group to those files as pre-staged files, simply copy your data to the same volume but do not point to it in your replication group (assuming you have enough space). Doing it this way, DFSR will still use those files as seeds to populate the replication group (and thus still not copy all the data across) but will not spam your log or DfsrPrivate area.
I believe this approach assumes you have enterprise on one end or the other so you get that nice cross file whatchamacallit thing goin’ on.
The above listed the most common causes for replication problems. Regarding #6, I have a situation where one of the servers is no longer receiving updates and the debug log has a large number of the following entries:
20080730 11:20:58.135 520 MEET 4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v67 updateName:wmsfdwn4.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v67 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169616 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}
20080730 11:20:58.135 520 MEET 1190 Meet::Install Retries:53 updateName:wmsfdwn2.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v65 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169614 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS
20080730 11:20:58.135 520 MEET 4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v65 updateName:wmsfdwn2.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v65 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169614 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}
20080730 11:20:58.151 2024 MEET 1190 Meet::Install Retries:53 updateName:wmsf_obj.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v70 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169619 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS
20080730 11:20:58.151 2024 MEET 4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v70 updateName:wmsf_obj.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v70 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169619 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}
20080730 11:20:58.151 2024 MEET 1190 Meet::Install Retries:53 updateName:wmsf_dwn.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v69 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169618 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS
20080730 11:20:58.151 2024 MEET 4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v69 updateName:wmsf_dwn.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v69 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169618 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}
20080730 11:20:58.151 2024 MEET 1190 Meet::Install Retries:53 updateName:wms_main.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v74 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169623 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS
20080730 11:20:58.151 2024 MEET 4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v74 updateName:wms_main.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v74 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169623 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}
20080730 11:20:58.151 2024 MEET 1190 Meet::Install Retries:53 updateName:wmsrptap.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v72 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169621 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS
20080730 11:20:58.151 2024 MEET 4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v72 updateName:wmsrptap.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v72 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169621 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}
20080730 11:20:58.166 1228 MEET 1190 Meet::Install Retries:53 updateName:wmsdwsrv.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v63 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169612 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS
20080730 11:20:58.166 1228 MEET 4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v63 updateName:wmsdwsrv.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v63 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169612 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}
Does this mean there is a sharing violation that may be preventing replication? I checked the event log and I am only seeing one Event ID 4302. Thanks for your help!
Hi Mkielman,
Sort of. We’ve seen that issue with various anti-virus products running on servers that have DFSR. They were gaining handles/intercepting data, leading to this sort of behavior.
You can try:
1. Turning off the real-time scanning of your anti-virus software on that server temporarily to see if the problem stops.
2. If not, we recommend *temporarily* removing the anti-virus software, as some do not completely stop scanning and none ever dynamically unload their kernel-mode filter drivers.
3. If you’re still seeing the issue, ping me back here and we can noodle some more. It might require that you open a case in order to ship us more data.
– Ned
Ned
Are there any issues with using DFSR to replicate data from Windows 2003 R2 SP1 to Windows 2003 R2 SP2? Thanks
Hi Tom,
None intrinsic to the SP itself. But you should have the latest versions of DFSR.EXE and NTFS.SYS on both servers to avoid issues that were bugs in both versions.
KB944804 and KB948833
– Ned
Thank you for your help! It turns out that the sharing violations were causing replication to become backlogged. I excluded the directory that contained all the files that were constantly locked, and replication caught up shortly thereafter. One thing to note: these files were considered "locked" even though they weren’t open for writing – the application was locking them, which most likely prevented event 4302 from being logged.
Ned –
Is there a way to use WMI to obtain the current DFSR backlog of a system? I have found ‘getOutboundBacklogFilecount’ but it requires that I use the VectorName or something. I want this script to be automated and scalable, so it would be ideal if it could run on each individual system and output that system’s backlog, much like "DFSRDiag Backlog" but without the other information.
Is this possible?
Hi,
Really sorry for the delay, I was out all last week. We actually have a fully functional WSF sample script of this already that you could implement with next to no modifications:
http://msdn.microsoft.com/en-us/library/bb540040(VS.85).aspx
All you do is save as a WSF file, then run the script giving it the arguments it wants as:
cscript backlogtest.wsf /replicationgroupname:blahrg /replicatedfoldername:blahrf /sendingserver:blahsrv1 /receivingserver:blahsrv2
So this gives a good example of how it works. It also shows what we mean by passing in the VersionVector (as it automatically figures it out). No matter what, you are always going to have to figure out a few details about the servers and topology in question, so if you wanted this to be more automated you would need to modify the script to actually figure all that out (not trivial, but not super hard either).
Ping me back here if you have some more questions,
Ned
Meh – that URL got wonky. Just copy and paste the whole thing.
Hi Ned
I want to delegate the right to create namespaces & replication groups in Active Directory to a group of users. I want these users to be able to fully manage the namespaces & replication groups that they create but not the ones that other people have created. How can this delegation be done from within Active Directory system partition? I know how to delegate rights from DFS management console. Thanks
Ned –
I am trying to understand if compression is used during initial replication, but I am unclear if that is the case. I understand that RDC is used to only replicate the deltas but that doesn’t affect initial replication unless pre-seeding has been performed. So, my simple question is: Is compression involved with initial replication?
Thanks,
Megan
Hi Tom,
Are you having issues doing this in the DFSMGMT.MSC console, under the Delegation tab? If you create the RG/RF and then add the user/group that contains the specific person(s) who will manage that RG/RF – and don’t add the other users, and those users are not already domain admins – it should just work.
Or are you looking to somehow script this to do this outside DFSMGMT? That can be done with DFSRADMIN.EXE RG DELEGATE.
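For example – and this is a sketch from memory, so treat the sub-command and parameter names as assumptions and check DFSRADMIN RG DELEGATE /? on your server for the exact syntax – delegating an existing replication group to a hypothetical group would look something like:
dfsradmin rg delegate add /rgname:FinanceRG /trustee:CONTOSO\DfsrAdmins
FinanceRG and CONTOSO\DfsrAdmins are made-up names for illustration only.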
I suspect I have not answered your real question… 🙁
– Ned
Hi mkielman,
First, let me clear up ‘compression’ – there are two kinds here:
1. XPRESS compression – this compresses files over 64KB that are not excluded from compression by file type. It’s similar to zip, but faster, more linear, and not as efficient.
2. RDC ‘compression’ – I hate that we call this compression, as it’s not compressing files, it’s compressing time and bandwidth. :/ This is (as you point out) how we do block-level replication of ‘chunks’ of files.
If you pre-seed data, we *will* try to use RDC on those files. We will always use XPRESS when the rules above are met.
– Ned
Hi Ned –
Among my servers is one share with 937,003 Files, 105,646 Folders.
I had to abandon DFSR in previous version, due to limits published in "DFS Replication scalability guidelines" topic on the Microsoft Web site (http://go.microsoft.com/fwlink/?LinkId=75043).
Is any update to this available for Server 2008? Should I expect success in a production environment?
Thanks!
Alan
Why did you have to abandon? We state that 8 million files is fully supported – did it exceed 1TB of data?
Keep in mind that these are soft limits – it just means that was what was tested by Dev during the creation of DFSR. The DFSR dev team’s blog goes into detail on this:
http://blogs.technet.com/filecab/archive/2005/12/12/Understanding-DFS-Replication-_2200_limits_2200_.aspx
http://blogs.technet.com/filecab/archive/2006/02/09/more-on-dfs-replication-limits.aspx
As to your question, we do not yet have published supported limits for 2008; that scalability testing is still ongoing (as you can imagine, it is very time consuming to test replicating massive amounts of data to hundreds of servers). I can say that we have customer-verified field experience with up to 26TB being replicated in Win2008.
– Ned
Hi Ned
I would like to know more about cross-file RDC. Documentation states that cross-file RDC takes place when a file is on the source but not on the target & a similar file exists on the target. What does similar mean in this case? For instance, does it have to be Excel to Excel, Word to Word, etc. to be considered similar? Thanks
Hi,
DFSR doesn’t really understand file types. Cross-file RDC (sometimes called cross-file similarity) uses a special file located at:
<drive>:\System Volume Information\DFSR\SimilarityTable_n
This is a sparse file (read more on MSDN if you like – it will appear to be *huge* on 2003, but that is an Explorer quirk; its size on disk will actually be quite small, usually) which is used to store signature information for all the files that are in the replica set. By traversing this file with heuristics, DFSR can quickly find signatures that match blocks of data for RDC. By matching these signatures up to what the upstream server has sent, it can ‘recycle’ blocks of data from existing files that have matching data bits.
So for example: I create a Word doc. And this Word doc gets passed around for years, getting modified and monkeyed with and whatnot, to the point where various copies have a lot of similarity, but some individual differences. Cross-file RDC can use those parts that didn’t change to save some bandwidth when a later version is replicated. It doesn’t really know about Word, it just knows this file has some binary goo that is similar to some other files.
Hopefully this explained it well,
– Ned
Thanks Ned. So, will cross-file RDC ever be used in the initial DFS replication? Or does it come into the picture once replication is complete & the authoritative flag is removed from the source?
(Sorry for delay, I had to head out for a family emergency last week).
If the data was pre-seeded, it could be used in initial replication.
– Ned
Ned,
Our implementation includes managing NTFS permissions for all of our remote file servers via PowerShell scripts updating GPOs (File System). DFS-R was complete before this was implemented. Following a refresh of the GPO, the backlog increases to nearly all or all of the files on the remote server. Do I have any options?
Tom
So if I understand you – you reset all the security on all the replicated files and those files backlogged? And this happens via GPO, so the security is reset every 90-120 minutes?
I’d expect to see a huge backlog if that were the case. Even though the files themselves won’t be replicated, the security metadata will be, and that could take some time on a large number of files – metadata counts as ‘file replication’ in DFSR backlog terms (i.e. there is some difference between two servers that must be reconciled). My advice would be to… not do that. 🙂 Set security less frequently. Or set it once and don’t worry about it after that.
– Ned
Wait a minute. Are you suggesting the implementation of the GPO is changing the security metadata each time the GPO is applied to the server even if no actual changes have occurred ? Did Microsoft develop Group Policy and DFS-R each in a bubble ? How could it be that Group Policy provides a very nice ability in which to manage file system security and yet this same ability will cause DFS-R to thrash for days ? The reason we have gone to an automated approach for file system security is to bring control to this very difficult to control environment. For large orgs with lots of file servers, this is a very daunting task. Regardless of where NTFS permissions are initiated, this seems like it will always be a big deal for DFS-R.
If we applied the same change on both sides, would the results be different ?
Thanks,
Tom
I am suggesting no such thing. If you have configured GPO and powershell (you give no details here) to reapply security by re-writing the security arbitrarily (i.e. removing the security – that’s a change, setting security – that’s a change), then DFSR will react to whatever the USN journal tells it to.
I suggest you carefully reproduce your scenario in a test environment both with and without your powershell scripts or GPO, whatever those are. We don’t have 1000 customers a day calling us about this issue, so you are likely in a corner case because of how you are implementing things.
Hi Ned
I am having a problem replicating PST files. In a previous posting by Jill Zoeller, she mentioned the following:
[[[Outlook 2003. Contact Microsoft Product Support Services to obtain the post-SP1 hotfix package described in Knowledge Base article 839647, available on the Microsoft Web site (http://go.microsoft.com/fwlink/?LinkId=55324). After you install the fix, follow the same process as for Outlook 2000 except use the following registry key: HKEY_CURRENT_USER\Software\Microsoft\Office\11.0\Outlook\PST key ]]]
First, Is this still the case because I do not see a PST key in the registry? Second, Is the workaround applicable to users running Outlook 2003 SP3?
Thanks
Hi,
That post from Jill is regrettable – we do not support replicating PST files that are actively opened from network shares. Even though you can make the PST registry hacks like you mention above, you are in an unsupported position.
From the DFSR FAQ – http://technet.microsoft.com/en-us/library/cc773238.aspx :
Can DFS Replication replicate Outlook .PST files?
Although DFS Replication does not explicitly omit Outlook Personal Folders Files (.PST) from replication, .PST files that are accessed across a network can cause DFS Replication to become unstable or fail. DFS Replication can safely replicate .PST files only if they are stored for archival purposes and are not accessed across the network using a client such as Microsoft Outlook (copy the files to a local storage device before opening them).
For more information about why .PST files cannot be safely accessed from across a network, see Microsoft Knowledge Base article 297019 (http://support.microsoft.com/kb/297019)
– Ned
(ps: not sure about the registry entry you mentioned, I doubt it’s changed between versions if it’s supported though – you’d have to ask the Office folks)
Ned,
Why would you think PowerShell is in the equation? If I use GPMC and edit a GPO, a change is recorded. This change is then replicated to all DCs. The real question is how the change is applied to the destination server. Is it a rip-and-replace, or is it applying only the changes? By looking at the winlogon.log, we see all of the file system security entries from the GPO. My guess would then be that it is replacing security on all listed folders.
A quick Google shows at least one other customer who faced this same problem. He, however, was not using GPO. He was simply changing file permissions using the GUI. As for testing in the lab – really, there is no difference.
So my question remains. If I apply the same security settings to the DFS-R destination, will this reduce replication traffic ?
Thanks
Tom
Because you said: "Our implementation includes managing ntfs permissions for all of our remote file servers via powershell scripts updating GPOs (File System)". I didn’t know what you meant by that – could be startup scripts deployed by GPO that update security, for example. Remember that I only know what you tell me here, I don’t have any familiarity with your environment.
So I just attempted a repro of this – I created a GPO that ACL’ed my replicated folder. I made sure the GPO was set to ‘replace all existing permissions’ mode. I forced policy to apply – at this point the permissions were a match, and there was no additional replication. Then I manually changed permissions on the replicated folder and forced policy to apply – security was replicated from that server, as would be expected since it did not match between servers. Then I forced policy to apply again – no replication occurred because the security already matched. Does this match your repro steps? I only see replication when the security does not match, regardless of GPO, as one would expect. If the security matches, nothing happens.
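If you want to verify for yourself whether the ACLs really match between members before and after a policy refresh, one quick way (a sketch – the paths and output file names here are made up) is to dump them on each server and compare:
icacls D:\Data\ReplicatedFolder /t > c:\acl-serverA.txt
(run the same on the other member, copy its output file over, then)
fc c:\acl-serverA.txt c:\acl-serverB.txt
If FC reports no differences, a GPO refresh by itself should not generate any security metadata replication.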
– Ned
Hi,
Is there a way to delete unwanted files from the ‘Pre-Existing’ folder? I don’t seem to have (or be able to give the administrator) sufficient permission to delete them?
Thanks Huw
Hi Huw,
Yes, you would ordinarily just need to be a member of the Administrators group – by default it is ACL’ed with full control on that folder. If not, an administrator would need to give you rights to delete that folder. And if Administrators is not actually set for full control… well, someone has been changing things in there!
– ned
Thanks Ned. What are the default security settings for the SVI folder? On the 2008 servers I have here, SYSTEM has full control only, with no access for administrators – could this be something coming from a group policy?
Huw
By default, it’s SYSTEM only for the System Volume Information folder(s) on Win2008. That was done intentionally, as we saw a lot of customers accidentally deleting/damaging security in that folder in Win2003 R2. You will just need to make Administrators the owner of that folder, then add Administrators full control to its contents.
There is also a special folder/file protection for SVI in Win2008. So if you go through Explorer and try to delete files, they will… not delete. You will need to go through a CMD prompt.
Hi Ned
I would like to know Microsoft’s official position regarding storing DFS replicated data (via DFSR) on an MS cluster? Thanks
Officially 100% not supported. On 2003 R2 and Windows Server 2008 it will not work (in fact, on 2008, if you try to add the DFSR role and the server is a cluster, it will block you with an error).
More info:
http://technet.microsoft.com/en-us/library/cc773238.aspx#BKMK_061
"Is DFS Replication cluster aware?
No. DFS Replication is not supported as a Cluster service resource. Replicated folders are not supported on shared storage.
"
– Ned
Hi Ned,
I have questions about the reporting features.
I have a customer who would like to see some statistical information about the replication effectiveness, like:
– The files (in a specified directory) which were replicated,
– The original size of a file,
– The start time of the replication of a file
– The end time (eg. when the file arrived to dest server) of the replication of a file
– The size of the data sent over the wire in bytes
I am wondering if dfsrdiag is capable of creating such a report (or an XML file like you produced with the canary file, which I can interpret or XSLT later),
OR
I have to write a solution which processes the logs from different servers and creates the reports.
This is a quick question before I build a virtual environment (lots of time) for testing dfsrdiag. The first answer should be Yes or No.
If the answer is Yes, and dfsrdiag can do this, the second part of the question is this:
Could you specify what parameters I should look for, please?
If the answer is No, I have several logs from my customer, so I will analyze them further (I dug myself into the logfiles and wrote some regexes for the processing, but it is more complex work with multiple files).
One final bonus question: where is the info in the log files which shows the replicated file’s original size?
Sorry for being gassy,
Gyorgy
Ned,
Thanks for a very informative post!! I stumbled upon this, however, while looking for a solution for replication that never even gets started. I keep getting this error every few hours, on both replication partners:
Event Type: Error
Event Source: DFSR
Event Category: None
Event ID: 4004
Date: 10/16/2008
Time: 6:05:42 AM
User: N/A
Computer: PGDC
Description:
The DFS Replication service stopped replication on the replicated folder at local path R:\DFS\DFSTest\DFSTest2.
Additional Information:
Error: 87 (The parameter is incorrect.)
Additional context of the error: R:\DFS\DFSTest\DFSTest2
Replicated Folder Name: DFSTest2
Replicated Folder ID: B12B9B9A-8553-4ACC-94DB-388B033C37E7
Replication Group Name: unicobank.wan\dfstest\dfstest2
Replication Group ID: 5B02B26D-65E6-41F5-B5C0-25C7737E369C
Member ID: 4FA3CBDE-4EF2-4841-A8A6-F0BB6BE8EF7C
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
I couldn’t really find any information on this error, but I thought your 1. and 2. suggestions from this article might be a place to start. Do you agree? Or have you seen this particular error before?
Thanks!
–acorn
Hi guys. Sorry for the delay in response, I have been out of the office for a couple weeks.
@ Gyorgy –
We don’t have a perfect answer on this. All of that information is in the debug logs when they are set at level 5 verbosity, but parsing them will certainly require you to write some fairly complex string parsing code.
It is possible to see the statistics in a ‘meta’ fashion with the DFSR PerfMon counters, but they won’t be specific to a given file.
It is also possible to determine some of the data file-by-file by enabling auditing:
1. Create the following registry *key* (not value) – see the example command after these steps:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFSR\Parameters\Enable Audit
2. Enable Object Access Auditing for these servers (via local or domain-based group policy) for SUCCESS.
3. Refresh policy with GPUPDATE /FORCE (there should be no need to restart DFSR or the servers).
4. Replicate a new file from the upstream to the downstream partner.
5. In Event Viewer | Security Events on the upstream partner, you will see events 7006 and 7002. On the downstream partner you will see 7004.
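For step 1, assuming the key name above is correct on your build, you can create the key from a command prompt rather than in Regedit – note the quotes, since the key name as written contains a space:
reg add "HKLM\SYSTEM\CurrentControlSet\Services\DFSR\Parameters\Enable Audit"
gpupdate /force
The reg add command simply creates an empty key; per step 1, no values need to be set underneath it.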
The problem here is not one single system has all this data – the DFSR database has some data, the perfmon objects have some data, the debug logs have some data. That’s why there’s no real easy way to do this.
– Ned
@ Acorn –
This sounds very much like you have two drives in this computer with the same volume serial number. This could have happened by breaking a RAID1 mirror, using disk imaging software, etc. Do you *also* see a 6602 event when the DFSR service is restarted, stating ‘The DFS Replication service detected and discarded an inconsistent volume configuration’?
This can be fixed, but it’s a bit scary. I am including the steps, but if you are not 100% confident in following them, I highly recommend opening a support case with us to assist you.
You can change the volume serial number of the disk using a utility called dskprobe.exe (a Support Tools utility – see the note below). This should be done on the server with the DFSR 4004 and 6602 errors. Before doing this, ensure you have taken a backup of the data on that volume (drive).
This can be done as shown below:
1. Run dskprobe.exe.
2. On the menu click "Drives", select "Physical Drive", choose the physical disk, click Set Active, and click OK.
3. On the menu click "Drives", select "Logical Volume", choose the logical volume, click Set Active, and click OK. (The logical volume is the drive letter on which the staging folders are missing.)
4. On the menu click "View", select "NTFS BootSector". Notice the box "Serial Number (hex)" – this should be the second-to-last box in the leftmost column.
a. The last 8 digits of this number are the existing volume serial number.
b. Change any of the last 8 digits to make this volume serial number unique (i.e. not the same as the drive it was conflicting with).
5. On the menu click "Sectors", select "Write". You might be prompted to turn off "Read Only" mode – agree to it.
6. Close the window, close all open programs, and reboot.
7. After the reboot, check the volume serial number. It will have changed.
The staging folders should get created automatically and DFSR should resume replication on folders on all drives within a few minutes. The aforementioned events will also no longer appear.
– Ned
Forgot – dskprobe.exe is included with the 2003 Support Tools – download latest ones from microsoft.com
First of All thank you for the great info Above.
Can you possibly help me with the problems I am having below
I have a couple of questions regarding DFS-R
I have two 2008 Core 64-bit file servers using DFS-R.
How do I know if initial synchronization finished?
How do I know if bandwidth is the cause of slow replication? I have it set up to replicate continuously. I have about 10 replication groups; some of them are up to date all the time, others have around 150 backlogged transactions, but I can’t determine the reason for it. It takes a couple of hours to sync one single file (less than a meg).
My staging folder’s actual size is using 3 GB out of 4. Should I double it to 8 GB?
I read that VSS is supported with DFS-R. But is this only the case if VSS snapshots sit on the same drive you are replicating? How about if the snapshots are located on different storage? Can you have VSS running at both targets independently?
Hi Diego,
1. Initial sync is done when the downstream server gets a 4104 event in its event log. You can see this on 2008 Core by using the WEVTUTIL event viewer or by connecting remotely to the event log from a Vista/2008 Full machine.
2. If you run DFSRDIAG BACKLOG <options> once an hour for a couple of hours, are the same 100 files always listed? Are those same files showing constant sharing violation events? Do you anti-virus scan your replicated folders? Does HANDLE.EXE (microsoft.com/sysinternals) show some particular application holding those files open all the time? (See the sketch after this list.) The fact that you already suspect limited bandwidth is telling 🙂 – what are the connections like between servers? Very slow and thin?
3. Increase your staging if you are seeing 4202 DFSR events more than a few times a day. Doubling it is often a good start.
4. VSS snapshots are not replicated. You can definitely snapshot all servers.
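For item 2, a rough sketch of how you might collect that data – the RG/RF/server names and paths here are placeholders, so substitute your own:
for /l %i in (1,1,6) do @(dfsrdiag backlog /rgname:GroupA /rfname:FolderA /sendingmember:BRANCHSRV /receivingmember:HUBSRV >> c:\backlogtrend.txt & timeout /t 3600 >nul)
handle.exe SomeLockedFile.xls
The FOR loop simply appends a backlog snapshot to a text file once an hour for six hours so you can compare the lists; handle.exe (run against one of the suspect file names) shows which process is holding that file open.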
Thanks for the quick response.
1. Great, I verified all my 16 groups finished initial sync.
2. No antivirus is installed on these servers yet; planning to do it soon. Should I exclude the DfsrPrivate\Staging folders or any other folders once I do?
On Group A, I have about 3 files with sharing violations for around 6 days now, and about 170 files total (receiving/sending) that are backlogged. Could these 3 files be causing this large backlog and making a file take 1 1/2 hours to replicate? Also, the files with sharing violations are not the same files that are being backlogged.
Also, one of the servers is not being accessed by anybody right now, since I have not enabled the DFS link on it and I am waiting to fix this slow replication issue first; however, this server has a backlog of 145 files pending to be sent. Shouldn’t it be ZERO, since there are no changes to any files on this server? And if I run the backlog report on the other server as the sending server, where files are actually being changed, the backlog is actually lower – 31 files. Same goes for Group B: the sending backlog is 667 where data is not being changed, and where the data is being changed the backlog is 95. The backlogged files seem to be the same for the past hour; I’ll keep an eye on them for the next few hours.
On Group B, I have about 47 files with sharing violations, of which 7 have been there for one day, and about 760 files total (receiving/sending) that are backlogged. The backlogged files seem to be the same for the past hour; I’ll keep an eye on them for the next few hours.
The bandwidth between the locations is 6Mbps; I have the replication set to use 4Mbps in each of the 16 groups I have set up.
3. Event 4202 was constantly popping up during initial sync, which is expected, but after that finished it comes up once every 2 or 3 days in 2 of my groups which are taking a long time to sync. I don’t believe this is the cause of the slow sync, but I will increase it – it won’t hurt, right?
4. Thanks. So to confirm, I should be able to have VSS enabled for Drive 1 on Server A, have VSS snapshots located on Drive 2 on Server A, then replicate data from Drive 1 on Server A to Drive 3 on Server B, and have VSS enabled on Drive 3 on Server B, with VSS snapshots located on Drive 4 on Server B.
1. Cool.
2. Yes, we recommend you disable AV scanning of DfsrPrivate (and will have an official document on this releasing at some point). Let me know what you see on that backlog after a few hours; it sounds like it’s not to do with the handful of sharing violations.
3. Correct, it will not hurt.
4. Correct.
OK, the same files are on the backlog and they are increasing. Almost like DFSR is taking a long break on certain groups but not reporting any real errors. The only errors I see are the ones below; they show up twice a day, but they are followed almost immediately by the message right below, which says everything is back to normal.
This is only affecting a couple of groups, because I’ve verified other groups that have no backlogs whatsoever, and if I copy a couple of meg-sized files they get synced within seconds. And this is on the same server.
I am tempted to restart the service and see if replication starts back up and the number of backlogged files starts decreasing again.
Also, is my concern above valid, where I have a backlog of sending files on the server where nothing is being changed?
Again, I appreciate your help on this.
The DFS Replication service is stopping communication with partner XXXX for replication group XXXXXXXX due to an error. The service will retry the connection periodically.
Additional Information:
Error: 1726 (The remote procedure call failed.)
Connection ID: E0C605C7-622D-4889-8046-9B87EB52F157
Replication Group ID: 5F87902A-405D-47E5-BC3A-C0ADC76322AA
***************************************************
The DFS Replication service successfully established an inbound connection with partner
Hmmm… starting to wonder if your issue is related to the Scalable Networking Pack, based on your symptoms and errors above.
If you are running Windows Server 2003 SP2, I would *highly* recommend you install the following on all servers (and not just the DFSR servers):
950224 A Scalable Networking Pack (SNP) hotfix rollup package is available for Windows Server 2003
http://support.microsoft.com/default.aspx?scid=kb;EN-US;950224
—
As far as a backlog where nothing is being changed – that’s not possible. *Something* is changing files, even if it’s not an expected change. 🙂 You could use Process Monitor or Object Access Auditing to see who and what it is.
I am running Windows 2008 Core x64bit.
I’ll dig deeper and see why there is a sending backlog on the server where nothing is being changed that I know of.
I staged these files using Windows backup – could that be part of the issue, since I see you crossed that part out in your article?
At this point nothing seems to be replicating on some groups, or it’s taking an extremely long time to do so – almost as if there were limitations on the scheduling. It looks like it starts by taking a long time and then stops altogether. I will give it a couple more hours tonight, and if I see no change I’ll restart the service and see if that makes any difference.
It appears that all of the files that are backlogged on the sending member where nothing has changed are located in the DfsrPrivate\PreExisting folder, and not anywhere else. My understanding is that this data is not replicated and is put aside. I am confused :@
Dang, I forgot you said 2008. Do you use HP Gigabit network cards in these machines? If so, you will need to go into the properties of those NICs and turn off *HP’s* built-in scalable networking pieces, as they are on by default (and our SNP is off by default in 2008).
Ehhh… in the PreExisting folder – that’s bad. That folder is not replicated and DFSR will not replicate it. Are you sure that these are the files? The backlog report does not provide paths, so there may be multiple copies.
You can definitely always move the contents of the pre-existing folder out of the RF. Unless you want to actually restore those files…
It is actually a Dell 1855 Blade Server.
I am pretty sure those are the files, since I did a search on the file name on both servers and it only showed up in the PreExisting folder on the server with the sending backlog (which is where I staged my data and where nothing is being changed). My next step is to delete the contents of the PreExisting folder.
So, since there was still a backlog and it was growing this morning, I decided to restart the DFSR service on the sending server, where data is being changed, and guess what? The groups I was having problems with seem to be working now – the backlog is now decreasing.
However there is still a backlog on the server where nothing is being changed. It will probably go away once I delete the contents of the preexisting folder
So, it looks like we have 2 problems here
1. We have a server with pre-staged data trying to sync data in the PreExisting folder but failing
2. We have 16 groups on a server, most of them working but something happens all of a sudden and stops replication for certain groups, no errors reported, and replication picks up again once the service is restarted.
🙁
Hmmm… if you are still having problems after restarting the DFSR service, at this point we’ve probably reached the end of the effective troubleshooting we can do in a blog comment post. 🙂 I’d recommend you open a case with us at that point so we can do deeper data analysis.
Restarting the service fixed the issue, but I have a feeling it will come back. At least I have an understanding of what the problem is and how to fix it temporarily.
As far as the preexisting folder, after I deleted the contents and restarted the DFSR service, guess what ? no more back log. Another verification that there is something messed up with DFSR and it was trying to do something with those files.
If I decide to open a case with Microsoft, and find anything interesting, I’ll post it here.
Thanks again for your assistance.
Fantastic Blog…….
Help….
I am trying to run DFSRDIAG BACKLOG to view files not replicating. However, I keep receiving this error message…
[ERROR] Replicated folder <dfs_fbv> not found. Err: -2147217406 (0x80041002)
I have installed the latest hotfixes as requested by the Microsoft KB but still no joy…
Please can anyone help?????
🙁
Can you first use DFSRADMIN RF LIST <options> to dump the list of replicated folders for that RG and verify it exists, the spelling, etc?
thanks for the post….
I ran the command above and results are below.
D:\>dfsradmin rf list /RgName:DFS_FBV
RfName RfDfsPath
DFS-FBV
Command completed successfully.
On this domain we have 4 sites with 2 servers at each site. I do have the DFSRDIAG BACKLOG command running fine on one site and was trying to implement the same script on each of the others; however, I get the error message posted earlier. I am using the commands below, editing for the DFS on each site. Again, thanks for any ideas you come up with.
dfsrdiag backlog /ReceivingMember:SERVERNAME /SendingMember:SERVERNAME /rgname:DFS_FBV /RFName:"DFS_FBV" >> d:\DFSLOG\DFS%datefile%
dfsrdiag backlog /ReceivingMember:SERVERNAME /SendingMember:SERVERNAME /rgname:Users_FBV /RFName:"Users_FBV" >> d:\DFSLOG\User%datefile%
It looks like you are using an underscore for the RF name instead of a dash, which is what your true RF name is above (your RG is DFS_FBV, and your RF name is DFS-FBV).
So this:
dfsrdiag backlog /ReceivingMember:SERVERNAME /SendingMember:SERVERNAME /rgname:DFS_FBV /RFName:"DFS_FBV" >> d:\DFSLOG\DFS%datefile%
should be this:
dfsrdiag backlog /ReceivingMember:SERVERNAME /SendingMember:SERVERNAME /rgname:DFS_FBV /RFName:"DFS-FBV" >> d:\DFSLOG\DFS%datefile%
May be the same with the other one that’s not working.
Perfect!!!! Thanks for your help on this…
Thanks
Hi everybody
I have a problem with my replicated files
There are 2 servers and replicated folders.
Users are working in the shared folder on the server. The problem is this: I open an Excel file, make some changes, then save; after some time, when I open the file, the changes are lost.
Can someone help me please?
Sorry for my poor English.
Event Viewer reports:
Source: DFSR
Event ID: 4304
The DFS Replication service has been repeatedly prevented from replicating a file due to consistent sharing violations encountered on the file. The service failed to stage a file for replication due to a sharing violation.
Additional Information:
File Path: E:MDFFinanceFinanceArchiveMaryFinance-FO-08 4_MDF-SI-FIN-08 4_MDF-SI-LT-08MDF-SI-LT Active-082D4E8000
Replicated Folder Root: E:\MDF\Finance
File ID: {E96ADA00-D7F6-4355-A3C5-9C307DE27470}-v1429372
Replicated Folder Name: Finance
Replicated Folder ID: 1A3545A5-13E4-48B3-8B8A-BA340A3C3D8F
Replication Group Name: mdf.localmdffinance
Replication Group ID: 3B0CA0BF-C168-407C-9203-2FB0C2505420
Member ID: 8B18AC22-9813-41A7-9ACF-2BCBE88A72EB
For more information, see Help and Support Center at
Source DFSR
Event ID 4412
The DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder.
Additional Information:
Original File Path: E:MDFPublic_DataFinance StaffMaryMary reports for Lusi2008MA-10-08PF_MA_Report_W_2008_10_24.xls
New Name in Conflict Folder: PF_MA_Report_W_2008_-{E96ADA00-D7F6-4355-A3C5-9C307DE27470}-v1429044
Replicated Folder Root: E:\MDF\Public_Data
File ID: {EFD0CFCA-B308-4731-A6A2-EE15B106FE0A}-v850136
Replicated Folder Name: Public_Data
Replicated Folder ID: 4602B5CB-3637-4981-B5D5-97155721C2E0
Replication Group Name: mdf.localmdfpublic_data
Replication Group ID: CB1917F0-4551-4030-A639-3C23A2CB187F
Member ID: 8BDA8D1A-03A7-478C-982A-244AC952DF08
For more information, see Help and Support Center at
Source DFSR
Event ID 5002
The DFS Replication service encountered an error communicating with partner DBSERVER2 for replication group mdf.localmdfadmin_staff.
Partner DNS address: dbserver2.mdf.local
Optional data if available:
Partner WINS Address: dbserver2
Partner IP Address: 192.168.1.2
The service will retry the connection periodically.
Additional Information:
Error: 1753 (There are no more endpoints available from the endpoint mapper.)
Connection ID: B547A38A-7013-45D4-A029-5EFC974FDF1A
Replication Group ID: 26ECBDFF-EFFD-4CDF-B57C-26782E040E77
For more information, see Help and Support Center at
Hi Arm,
We actually had a bug on that years ago (fixed in kb917622). I would recommend that you install the latest DFSR service hotfix and verify that you still have the behavior – if you do, please reply back here.
Latest rollup hotfix for DFSR:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;948833
You can download from here, note the button in the upper left.
– Ned
Hi Ned,
Thanks for your help, but the problem is still present.
The 2 servers are 2003 R2 and all Microsoft updates are installed; I also installed WSUS 3.0.
Thank you once again.
Date: 03/11/2008
Time: 8:46:30 AM
Source: DFSR
Event ID: 4304
The DFS Replication service has been repeatedly prevented from replicating a file due to consistent sharing violations encountered on the file. The service failed to stage a file for replication due to a sharing violation.
Additional Information:
File Path: E:MDFFinanceFinanceArchiveMarineSargsyan9.MDF-YE-FIN-08MDF-YE-Bank-08MDF-YE-BankCashFlow-088DCC0000
Replicated Folder Root: E:\MDF\Finance
File ID: {EFD0CFCA-B308-4731-A6A2-EE15B106FE0A}-v853516
Replicated Folder Name: Finance
Replicated Folder ID: 1A3545A5-13E4-48B3-8B8A-BA340A3C3D8F
Replication Group Name: mdf.localmdffinance
Replication Group ID: 3B0CA0BF-C168-407C-9203-2FB0C2505420
Member ID: 04C7B53F-BFB9-44B7-AA40-D2EDA251C62B
For more information, see Help and Support Center at
Hi Ned,
Just curious as to why these hotfixes aren’t available via Windows Update, and are rather released as hotfixes?
Given the issues that they resolve with the core functionality of DFSR, I would have thought that they’d be marked as important updates?
We have been using DFSR for years, and have seen many of the issues that these hotfixes are meant to resolve, yet we always thought we were running DFSR at the latest patch levels.
As always, your articles are fantastic, so keep up the good work!
That’s a good question – there are definitely a lot of hoops to jump through for a QFE to make it into Windows Update. I’ll ask around and see if we have plans to do this in the future or not.
– Ned
I have configured DFSR between servers. Since there were too many files and problems with replication, I used robocopy to copy all the files to the other side. I got errors and warnings like "file has been changed on multiple servers", and I know that is normal because of the robocopy. However, another thing happened on the old server (the server that had the up-to-date files initially). After all was finished (since files were replicating both ways), the DfsrPrivate folder became very big. At first I didn’t worry about it, since I thought I had to wait for everything to finish, but now I have only 40GB left on that partition. After analyzing, I found out that the Staging folder is 40GB and the PreExisting folder is 250GB. I read somewhere that after using the updated dfsr.exe (which I have had installed since the time I started experiencing problems), some people experience this. The interesting fact is that those files do exist on the server (I haven’t checked them all).
What do I do with them – do I delete them, or what? I am sure that my organisation was not so productive as to create 250GB of data in a month. I suppose that they are all somehow a copy of something.
Another important fact is that they have a modified stamp of 10/11/2008 (which I guess corresponds to the time I worked on the replication a month ago). I would delete them, but I read this article:
http://www.eggheadcafe.com/software/aspnet/30863354/this-member-is-waiting-fo.aspx
As far as I know, users were not complaining about lost files or anything.
One last thing: the DFS root is on another server, and the initial replication was to get the files from the old server to the new one (the DFS root).
Hi,
The staging folder is as big as you configured the quota – so by default that’s 4GB per replicated folder.
The pre-existing folder only contains files that were not present on the upstream when you set up replication – since the data is not accessible to end users, I suggest you back it up, verify the backup is good, and delete the preexisting data.
Ned, you misunderstood me on the issue.
All the folders inside PreExisting (which started to become big after initial replication was finished) exist on both DFS servers. After initial replication was finished, I put the values back to default (4GB), hoping that with time it would clean up. And it did clean up on the NEW server (in all the folders inside DfsrPrivate), but not on the old server. Two folders just became bigger… actually, here is the funny story.
On the old server I checked the Staging and PreExisting folders a week ago. Staging was 80GB and PreExisting 190GB.
After a week (yesterday), Staging was down to 40GB and PreExisting had grown to 250GB.
From what I have read, the Staging and PreExisting folders shouldn’t become bigger after initial replication, especially since they already had the material.
I am sure that I can delete Staging. I also know that when something is in the PreExisting folder you can’t find it somewhere else, but that’s not my case – I see those folders and files outside of PreExisting, which means they are some sort of copy. I am sure that my company didn’t produce 250GB of Word, PDF, and Excel files in a month.
I am curious why it continues to happen.
I also have to confess that when I had some problems in the beginning, I stopped DFSR, tried NTFRS, and stopped that immediately. Then I used robocopy to copy all the content of the OLD SERVER to the NEW SERVER. Since hashing is a known issue, I got that message (file has been changed on multiple servers…). After initial replication I turned all the values back to default.
The old server is a 32-bit version and the new one is x64.
Thanks, Ned, for everything – your blog helped me in the first place; it’s very interesting, you don’t find much about DFSR anywhere on the net.
Ah, now I understand you better – sorry about that. Yes, that is very interesting. It’s possible that something went very wrong during initial sync (or initial sync never exited, but with something gone wrong causing a loop) and data was being continuously recreated in PreExisting. I can say that I have never seen this before!
I recommend that you open a support case with us and have some deeper investigation. This is going to require a lot of data analysis that won’t be easy to do through the blog.
Hi Ned,
Excellent post! I can see why it’s so popular. I followed all the steps above but things still get slow. The problem is I can’t really put my finger on the issue. Sure, there are a few locked files (about 10 a day). The confusing thing is that one of the namespaces is working fine – and the other one spontaneously decides to wait up to three hours before replicating a 1k file.
The only thing we can imagine is the number of folders in the namespace. When designing the thing, I read that you can use 5000 folders in a domain-hosted namespace. So I figured that number would be the number of folders I put into the namespace manually and let replicate. Since there are only about 10 of those, I wasn’t worried. Now that we have problems, I think I might have misunderstood that number… and that all subfolders in the folders’ targets are counted, too. In that case we’re in trouble: there’s a total of 15k subfolders in there… (don’t ask… some of our users seem to make a folder for each file…).
So: could the number of folders be the problem? They barely ever change, but the sheer volume…?
The 5000 folder limit is for actual DFS namespace folder targets, so that’s not really in play here. If this is Win2003 R2, I would also recommend installing the SNP hotfix, and verifying in your network drivers that the vendor has not turned on their own home-made SNP settings (Receive-side scaling, chimney offloading, etc). http://support.microsoft.com/kb/950224
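As a sketch of what checking the OS-level pieces looks like on Win2003 SP2 (the NIC vendor’s own offload settings still have to be checked in the driver properties, and as always, test before touching production servers), the SNP features are controlled by these TCP/IP registry values – setting them to 0 disables them:
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v EnableTCPChimney /t REG_DWORD /d 0 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v EnableRSS /t REG_DWORD /d 0 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v EnableTCPA /t REG_DWORD /d 0 /f
A reboot is the safest way to make sure the changes fully take effect.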
If you are still having issues after following the whole blog post and that extra piece above, you might want to get a support case with us, as we’ll need to see a lot of data to figure out the issue.
– Ned
Hi Ned,
Now THAT was a fast response. Absolutely fabulous.
So my initial assumption was the right one. Kind of good, kind of bad – that means I need to search further :-/
I already installed all the hotfixes mentioned above, and I also implemented the other post-SP2 hotfixes. I like the idea about the vendors’ home-baked SNP implementations; I’ll look into that.
As usual, Murphy’s Law holds true – right now there’s no backlog and things seem to be working fine – so I can’t see if things are OK… or if it’ll start acting up again next week (it has been like that for a while now, so I suspect the latter).
I’ll keep you posted
Thanks again
Sven
Hi AL,
I responded to you in an email. Thanks.
– Ned
Wow that was quick !!!
I will try my best to follow your suggestions Ned !
Hello Ned,
In 2003 R2 or 2008 is it possible to relocate the dfsrprivate folder(the whole folder along with the subfolders)?
If it is, can you direct me to articles on the steps and check points?
Thanks.
It’s not possible to change the whole base folder path, it always lives in the RF (on 2003) or in the System Volume Information folder (in 2008/2008R2, via a junction point).
It is possible to change the staging path though:
http://technet.microsoft.com/en-us/library/cc773238.aspx#BKMK_012. That’s the part people usually want to manipulate anyways since the other folders are very small, especially if the Conflict And Deleted quota is reduced in size.
Hey Ned, slightly off topic (but still DFSR related) do you know if KB961655 applies if you are deleting an entire replication group, and recreating it from scratch but using the same replicated folder name and path?
At the moment we have one replication group per replicated folder, and I’m planning on consolidating these replicated folders in to a single replication group.
No, if you are completely removing the RG’s and making one big RG from scratch, that would just work.
On a server that has a directory that primarily receives replicated files from a branch office, I find a folder under DfsrPrivate\Staging called ContentSet{71bf…etc..
Explorer claims that there are 100 folders containing about 6.85GB of data. DFSRDIAG BACKLOG says that everything is up-to-date in both directions. Is this space actually in use and what is it?
There is a similar set of folders on the other server. I thought that perhaps this is where RDC did its magic, but you refer to a similarity table located somewhere else.
Can this space be freed up?
Thanks!
The staging folder contains all of your staged files – so everything under ContentSet{GUIDGOO} is the actual files that are currently staged for replication. These hold the RDC signatures.
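If you want to see how the configured staging quota compares with what’s on disk, one way (a sketch – double-check the property names the WMIC query returns on your build) is to query the DFSR WMI configuration class locally:
wmic /namespace:\\root\microsoftdfs path DfsrReplicatedFolderConfig get ReplicatedFolderName,StagingSizeInMb
Staging cleans itself up automatically as it approaches that quota, so space used under the quota is working as designed rather than wasted.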
Hi Ned
I have 10 branch servers replicating to 1 HUB server. I plan to replace the HUB server with another server in a different location. The existing HUB server will be decommissioned. What’s the best way to point the branch servers to the new HUB server? Thanks
The best way is to add the new server, get it replicating and in sync, then change your replication topology to make it the hub, then remove the old server – all using DFSMGMT.MSC.
Hi Ned
I will need to rename one of the replicated folders in a replication group. Since DFSR does not natively detect a folder renaming & there is no way to point to the new name in DFS management console, what is the best way to go about doing this? Thanks
Hi Ned! I have set up a replication group with about 500GB of data and I’m getting the message that the initial replication completed; however, I have a backlog of 602 files that will just not move. Can you provide some insight as to why that may be? I’m running Server 2008. Thanks in advance!!
Are they temporary files? Named .TMP? .BAK? Etc? There are lots of reasons – the fact that initial sync finished means that either:
1. they did not exist when initial sync was being done.
2. They are not considered valid for replication.
Thanks for the response Ned. They are valid files… .doc and .xls files, and they existed prior to the initial sync. If I add something to one of the folders on one server, they replicate, but if I delete them on the other member, nothing happens. The only thing different about these files compared to other replicated files was the archive bit… does that have an effect?
Do you run Forefront anti-virus or any other AV software on this computer, and if you do, have you applied hotfix:
953325
which comes via KB:
956123
Also, when I say temporary files, I mean do they have the temporary file attribute set.
The archive bit doesn’t matter.
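For reference, the sort of PowerShell one-liner people use to find and clear the Temporary attribute looks roughly like this – a sketch, where D:\Data is a placeholder for your replicated folder and 0x100 is the Temporary attribute bit, so test it on a copy first:
Get-ChildItem D:\Data -Recurse | ForEach-Object { if (($_.Attributes -band 0x100) -eq 0x100) { $_.Attributes = ($_.Attributes -band 0xFEFF) } }
The IF check only touches files that actually have the bit set, and the mask clears the Temporary bit while leaving the other attributes alone.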
Thanks for your response Ned. I applied the hotfix to my servers, identified the files with the temporary file attribute, ran the PowerShell command to recursively repair them, and still no luck. I received the following errors with the DFSR health reports early on, but now I’m not getting them anymore. Any help would be GREATLY appreciated!
One or more replicated folders have content skipped by DFS Replication.
DFS Replication does not replicate certain files in the replicated folders listed above because they have temporary attribute set, or they are symbolic links . This problem is affecting at least 100 files in 1 replicated folders (up to 100 occurences per replicated folder are reported). Event ID: 11004
Just removing the temporary bit will not cause them to replicate – they need to be ‘touched’ in some meaningful way afterwards to trigger a USN update. A content modification, a security change, a rename, moved out and back in to the replicated folder, etc.
I tried to "touch" each file by changing permissions and the backlog count didnt lower. Any ideas? Thanks Ned!
You will need to examine the DFSR debug logs then. Make a change to a file, verify that it did not replicate, then open the %systemroot%\debug\dfsr*.log file on that server. Find the reference to that file, and see what details it is providing about why the file is not being replicated.
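A quick way to find those references without opening each log by hand – a sketch, with changedfile.doc standing in for whichever file you touched – is:
findstr /i /c:"changedfile.doc" %systemroot%\debug\dfsr*.log
FINDSTR prints every debug log line mentioning the file, prefixed with the log file it came from, which you can then read in context.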
If you’re not comfortable doing this, I’d advise opening a support case with us.
Hey Ned! I got it resolved and am fairly certain that it was the Temporary file attribute that was causing the backlogged files. I ended up just deleting the replication group and recreating it, and all is well. Now I have another question that I can’t find a definitive answer on. Can I rename a server that is a member of a DFSR replication group? If so, does it trigger any kind of rescan? Any supporting docs would be great if you have them. Thanks again Ned!!
This will break DFSR as a number of topology attributes are not updated by renaming the computer object itself. We are toying with the idea of updating KB316826 to show how to do this for 2008 DC’s running DFSR for SYSVOL, but Win7 work has us seriously tied up and this is not a common operation (in fact, you are the first person to ever ask me this in years of DFSR).
In the meantime, the safe and approved way is to gracefully remove the server from the replication group, rename it normally, then add it back in (making sure AD replication has converged between all three steps). This will cause replication to do an initial non-authoritative sync on this server, but since you are doing this off hours and very little is likely to have changed in this short time frame, it should be over very fast. Just like using pre-seeded data.
– Ned
Hi Ned,
Hope you can offer some advice.
I currently have Windows 2003 R2 DFS set up on several servers. There’s a mix of SP1 and SP2 servers.
DFS Replication is happily working right now, but I was wondering if you are able to advise on any DFS hotfixes/updates I should be applying to avoid potential problems in the future.
Secondly, how would I go about handling this situation?
A department would like to dump approx 50GB of data onto the DFS share. Is there any way I can pre-stage this 50GB of data onto the DFS servers and avoid having DFS replicate the full 50GB of data out to all DFS servers?
Or do I need to delete the existing replication group, copy the 50GB of data onto all of the DFS servers via an external USB hard disk, and then create a new replication group?
Thank you!
Chau.
Hi Hockeman,
Lookee lookee: http://technet.microsoft.com/en-us/library/cc794759.aspx
It turns out we do have steps. Neither I nor the developers I spoke to were aware of this doc, but one of our tech writers chimed in and that shook the cobwebs free. Even though these steps are for DC’s running DFSR for sysvol, the same steps would apply for custom (with different paths, naturally). So there you go.
Hi Chau,
1st question: http://support.microsoft.com/default.aspx/kb/958802
2nd question: Yes, using robocopy with very particular steps. This is documented in another blog post here under ‘pre-seeding’.
Hey Ned! Very cool about the rename – I’ll test it in my lab. Regarding the backlog issue I was having, I simply deleted the replication group and re-added it, and now it’s good… zero backlogged files. However, now I believe that I have screwed up my replication set by doing the "big no-no" of restarting the DFSR service because of WMI errors. I found this event message on one of the servers and wonder if you could provide some insight as to what may have happened, and what I could have done to prevent an entire rescan like the one happening for all my sets now.
Event ID 5014-
The DFS Replication service is stopping communication with partner SERVERNAME for replication group dannenbaum.localdfsrootdatamyreplicaset due to an error. The service will retry the connection periodically.
Additional Information:
Error: 9036 (Paused for backup or restore)
Connection ID: 4C4497AF-A035-4AA0-BB73-1C58DD479F35
Replication Group ID: A1D6E57C-EED1-4B4B-B5DD-53120BCC466A
Hi Ned,
How do i control client DFS referrals for clients with 2 DFS servers?
We have two offices at separate locations, with different IP subnets, i.e. 192.168.1.x and 192.168.2.x.
However in AD sites and services the subnets are under the same site.
Office 1 has a domain controller and is the name space server. Office 2 has no domain controller and is a name space server.
Clients at office 2 accessing the DFS share, e.g. \\domain.com.au\share, are going to the office 1 DFS server; I know this by checking the DFS tab when you right-click -> Properties on the DFS folder.
People at Office 1 are happily using the Office 1 DFS server.
I want to direct people at office 2 to use their local DFS server, not the one at office 1.
Thank you,
Chau.
Hi Hockeman – are you using BackupExec for your backup software?
Hi Chau – this is more of a DFS Namespace question than DFSR. Since the subnets are both defined in the same AD logical site, there is nothing you can do to control the DFS target priority for those branch users. Your IP subnetting needs to match your logical sites, as DFS doesn’t know anything about the physical network.
Yes. Backupexec 12.5 latest and greatest patches.
Can you check to see if the following has happened? BE sets this key that can get us into some trouble:
Look at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFSR\Restore.
There will be a sub key named year-date-time the restore was done with two values. One of those values will be the network name that was used to perform the remote restore.
Back up and delete the Restore subkey, then restart DFSR. The service will most likely hang on shutdown, but it will stop and restart. After the registry key is removed, the service start and stop will be normal.
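If you prefer to do that from a command prompt, a sketch (the export path is a placeholder, and the exact subkey name will be the year-date-time key mentioned above):
reg export HKLM\SYSTEM\CurrentControlSet\Services\DFSR\Restore c:\dfsr-restore-backup.reg
reg delete "HKLM\SYSTEM\CurrentControlSet\Services\DFSR\Restore\<year-date-time subkey>" /f
net stop dfsr
net start dfsr
The export gives you a backup you can re-import if needed before deleting the subkey and bouncing the service.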
The key was deprecated in 2008.
You say the key was deprecated in 2008. Does that mean that if we are running Server 2008 we are in the clear for this key? Also, we do not have the key on the DFSR server.
I would then say open up a support case with us for further troubleshooting. There are a lot of little one-offs that can cause that issue and it will require a bunch of data to see what’s going on.
Hi Ned
I have two servers both running Windows 2003 R2, the backup server has just been recently promoted to a DC with the primary one being a member server, and are located at the same site.
I am using DFS as a backup only. I currently have set up 3 replication groups, 2 of the groups are working fine and replication shows no errors apart from the usual sharing violations.
The 3rd had been working fine for a number of months until recently. The folder in question is quite large, with a stupid number of files and folders – over 7 million files. Please don’t ask what my users get up to!
For some reason certain folders directly under the main directory were not being replicated, no errors shown in the event logs. So I decided to delete the group and re-create but using smaller replicated groups. However I am now receiving the error below, and am unable to find any info on how to solve this problem. The other replication groups are working fine.
The DFS Replication service stopped replication on the replicated folder at local path F:\Company\AllIPReadOnly.
Additional Information:
Error: 9003 (The replication group is invalid)
Additional context of the error: F:\Company\AllIPReadOnly
Replicated Folder Name: AllIPReadOnly
Replicated Folder ID: 1778B859-E72F-486F-970C-F135CC06EB8C
Replication Group Name: Company1
Replication Group ID: ADD2A9F3-EA22-450D-8145-505CEA1A7E25
Member ID: 0B388181-DE95-4808-8E19-46B46E9BA2D7
For more information, see Help and Support Center at
If you shed some light on this it would be most appreciated.
Cheers
Greg
Unfortunately, this issue is pretty serious. Your database is damaged and has a stale reference to the replicated folder, and is now blocked. The only way to fix this is to rebuild the database itself.
I would recommend you open a support case with us to walk through this carefully in order to minimise your recovery time and not cause any data overwrite/loss issues. If this is not possible let me know and I’ll give you the steps offline through an email (just ping me through the email form on the top of the blog menu); I don’t like having these steps floating around as they tend to get used too much when there are other solutions, usually.
– Ned
Hi Ned,
First of all, I would like to thank you for all the information in this blog. It is very useful.
I have had a problem lately with one of the servers.
It is connected to the head office via ADSL; both servers run Win 2003 R2 SP2.
The reports show there is nothing in the queue – all clear – but for some reason the server in the remote office has lately started to consume 200Kbps on a permanent basis, even though there is nothing replicating (I think). When I stop the DFS Replication service, the bandwidth goes down to nearly 0.
The remote office is hardly in use. and not occupied every day..
thanks
Eldad
Take a look at the DFSR debug log in %systemroot%\debug\dfsr*.log. If files are replicating in/out you will see that happening.
There are some samples of what this will look like here, as well as some ways you can turn on auditing to see files if you’re not keen on the debug logs:
http://blogs.technet.com/askds/archive/2008/12/17/understanding-file-date-time-behavior-in-dfsr-replication-and-better-ways-of-knowing-that-a-file-has-replicated.aspx
– Ned
Ned,
On 6/16/08, you stated that we "should not share the same staging directory". I’m a bit confused because the "Additional information about DFS Replication staging folders" section of http://technet.microsoft.com/en-us/library/cc772778.aspx makes it sound as if this is recommended in some cases (see the 2nd bullet point).
What am I missing?
That technet doc is wrong and being changed if I have my way. 🙂
– Ned
Hi Ned, thank you for article!
Could you please advise what may be wrong – I have a folder that is replicated between 4 servers connected by WAN links. A few days ago I spotted that replication had become very slow. I checked the backlog file count on every server and found that one of them has a very large file queue – around 500k files. On the other servers the backlog file count is around 100-500 files. The replicated folder contains around 600k files in total – how can 500k be in the queue?
Thank you very much,
Ivan.
If it’s nothing on the above list of 10 items, you should open a support case for further troubleshooting. This will require significant time and data analysis, as well as collecting a few GB of data from you to analyze – basically, out of scope for this blog.
– Ned
Hi Ned,
We recently moved our 2003 R2 DFS to a native 2008 DFS. I was having no problems with the R2 DFS since upgrading to the newest .exe that you previously recommended. However, now it seems that if there are sharing violations on files for long enough, DFS moves/deletes the files to "Deleted and Conflicted" and the users have to call me to recover them. Any ideas?
Thanks much!!!
Jason
Are there any new dfsr.exes for 2008? Thanks. Jason
Yes.
967326 Data loss occurs after you use the Dfsrmig.exe tool to migrate the SYSVOL share from the FRS to the DFSR service in a Windows Server 2008-based domain
http://support.microsoft.com/default.aspx?scid=kb;EN-US;967326
968733 The SYSVOL share migration from FRS to DFSR fails on Windows Server 2008 R2 Beta-based servers if a disjoint namespace is configured
http://support.microsoft.com/default.aspx?scid=kb;EN-US;968733
962969 Error message when you run Dfsradmin.exe to set membership properties in Windows Server 2008: "The property MemberSubscriptionReadOnly cannot be used"
http://support.microsoft.com/default.aspx?scid=kb;EN-US;962969
We are working on getting the master KB article released for 2008 as well, but the publishing timeline is not under my control, unfortunately.
Thanks, Ned… I was hoping one of those would help me with my previous post. I did migrate sysvol to dfsr, but I’m experiencing a different kind of data loss. Jason
Should I call about my problem?
Yes.
Hey Ned… For some reason, when using the script that has been provided to check for backlog counts, I get duplicate listings when running the following command on my servers for one of my replication groups. When running the command for other groups, the output is as expected (two lines showing backlog counts for the two servers). See the example below:
Command line run on ServerA
cscript backlog.wsf /ReplicationGroupName:"domain.localdfsrootdataprojects" /SendingServer:servera /ReceivingServer:serverb /Twoway
Command output:
ServerA -> ServerB, Replicated Folder: Projects is backlogged by: 0 files
ServerA -> ServerB, Replicated Folder: Projects is backlogged by: 0 files
ServerA -> ServerB, Replicated Folder: Projects is backlogged by: 0 files
ServerA -> ServerB, Replicated Folder: Projects is backlogged by: 0 files
ServerB -> ServerA, Replicated Folder: Projects is backlogged by: 0 files
As you can see, there are 5 entries… 4 of which are duplicates. Any ideas?
I have no idea what that script is. :-/ Where did you get it?
Ha-ha!! Sorry Ned… I got it from here- http://msdn.microsoft.com/en-us/library/bb540040(VS.85).aspx
It’s quite the handy script… I just can’t quite figure out why I’m getting duplicates. It’s getting this information from DFSR via WMI, so it has to be duplicated somewhere in the system. I don’t think it’s a problem with the script because I can run it against any other replication group without a problem. It works PERFECTLY! I’m simply looking to understand where in the system I could look for these duplicate entries.
Do you have the same problem using the WMIC.EXE tool against those three DfsrReplicatedFolderConfig, DfsrReplicatedFolderInfo, and DfsrReplicationGroupConfig classes? Just to completely rule the script out once and for all, I mean.
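For reference, a rough sketch of querying those classes with WMIC from a CMD prompt on the affected member (the property names shown are just a commonly useful subset of what the classes expose):

wmic /namespace:\\root\microsoftdfs path DfsrReplicationGroupConfig get ReplicationGroupName,ReplicationGroupGuid
wmic /namespace:\\root\microsoftdfs path DfsrReplicatedFolderConfig get ReplicatedFolderName,ReplicationGroupGuid
wmic /namespace:\\root\microsoftdfs path DfsrReplicatedFolderInfo get ReplicatedFolderName,State

If a replicated folder or replication group shows up more than once in that raw output, the duplication is coming from the DFSR WMI provider rather than from the script.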
Hey Ned, I’m having a hard time reproducing this with the WMIC tool. I’ve sent an e-mail to the MSDN team that developed the script. In the meantime, can you tell me where I may look for something like this? As I said, it doesn’t behave like this for my other replication groups, so although I’m not sure yet, I’m pretty sure it’s not the script. Thanks!
Could be in AD (under the computer’s DFSR-LocalSettings objects or the System container’s DFSR-GlobalSettings objects), could be in the XML files cached locally on the server in the System Volume Information\DFSR\Config folders on each drive, could be the registry under the DFSR\Parameters key, or could be in the WMI repository itself (which is totally opaque – hence the need to use the WMIC.EXE tool to confirm).
OK, this explains the problem I’m having:
http://support.microsoft.com/?kbid=967357
Except it’s with Native 2008 DFS rather than 2003R2… Any solutions for 2008 out there?
Thanks,
Jason
Here’s the version of dfsrs.exe we’re using: 6.0.6001.18000. This problem is making me cry…
Thanks,
Jason
Came into work with this error on native 2008 DFS, using Legato Networker for backups:
The DFS Replication service encountered an error communicating with partner myserver for replication group campus.mydomain.edu\shares$\users.
Partner DNS address: myserver.campus.mydomain.edu
Optional data if available:
Partner WINS Address: myserver
Partner IP Address: myip
The service will retry the connection periodically.
Additional Information:
Error: 9036 (Paused for backup or restore)
Connection ID: 78C61C44-7B1D-4596-8939-94A108659FE0
Replication Group ID: F13E6B52-BBAB-43CC-922F-12FEB419A79A
Any help would be appreciated… Jason
Got this error on after DFSR started back up:
The DFS Replication service failed to recover from an internal database error on volume E:. Replication has been stopped for all replicated folders on this volume.
Additional Information:
Error: 9209 (The database resource was not found (-1601))
Volume: 47A2C3EC-2EAA-442D-8AFA-10EEE88AF9DE
Database: E:\System Volume Information\DFSR
However, it looks like an initial replication started on its own. Should I take any further steps?
Thanks,
Jason
Hi turg77,
For your first question: that is normal. Your backup software is stopping DFSR in order to run backups. If you don’t want that error you will need to speak to Legato about changing their software, or invest in a different backup product.
For your second issue, there was a database problem that DFSR fixed automatically. There is no reason to do anything further there.
Hi NedPyle, I would like to thank you for posting all this info. I currently manage 30 DFSR servers with about 3 TB of data, all across WAN links. I’ve applied those 2 patches you recommend on every single server. I must say the DFS replication is running much better between the servers now. This blog is my DFS Bible 😛
My question: I upgraded a RAID array of about 1TB of data by doing the following.
1. ROBOCOPY /COPYALL… to another location
2. Expand RAID array
3. ROBOCOPY /COPYALL… to original location
I wish I had found this blog before I did that. /COPYALL = evil!!
DFSR is now trying to replicate every single file (1.7 million files). I get the following error…
"The DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder."
Please could you tell us what exact parameters I should be using with ROBOCOPY to copy DFS data back and forth.
I was thinking of something like "robocopy <source> <dest> /E /SEC /W:3 /R:2 /DCOPY:T"
Your recommendation please! 🙂
COPYALL is ok as long as the root folder permissions don’t change, causing inherited permissions to change. If they do, that changes the hash on all the files, and you end up with this situation. It certainly is evil. :-/
Take a look at:
http://blogs.technet.com/askds/archive/2008/02/12/get-out-and-push-getting-the-most-out-of-dfsr-pre-staging.aspx
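As a rough illustration of the kind of pre-seeding copy discussed in that post (a sketch only: the server name and paths are placeholders, and the linked article remains the authoritative guidance):

robocopy "D:\Data" "\\remoteserver\D$\Data" /B /E /COPY:DATSO /R:2 /W:3 /XD DfsrPrivate /LOG:C:\preseed.log

Here /B uses backup-mode access, /COPY:DATSO carries data, attributes, timestamps, security and owner, and /XD DfsrPrivate keeps the DFSR private folder out of the copy. Spot-check permissions on a sample of files afterwards, before enabling replication.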
Hi Ned,
It certainly is good to have the ear of someone who knows what they are talking about!
I’m having a little problem with one of my DFS-R replicated folders. I have 3 servers, all Win2K3 R2 SP2, with 10 replicated folders. Two servers are in our main datacenter and the third is in a DR site. All but one of the folders replicates fine (except for some Excel files, but that’s a different subject). I pulled a DFS Health report this morning and saw that one of the folders has a large number of backlogged receiving transactions. I went and looked at the three servers and only one of them has anything in this particular folder, meaning that this folder has never started its initial replication.
Is there a way to force dfs-r to perform this initial replication on just this one folder?
Thanks,
ScottB
So is that one folder an actual replicated folder root, or just a subfolder inside some RF where the rest of the data in that RF works without issues?
I’m reposting a response because I’m not sure if my last one got sent…
The folder in question is a root folder. It is directly under \\xxxx\root.
It’s a very open-ended issue. I’d start by setting DFSR debug logging severity to 5 on all servers, then dropping a simple test file, named after each server, into their respective folders. Then examine the debug logs to see what happens with that canary file on each box: any errors, does it replicate, etc.
Lots more info on interpreting the logs here:
http://blogs.technet.com/askds/archive/2009/04/09/dfsr-debug-log-series-wrapup-and-downloadable-copies.aspx
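A minimal sketch of bumping the debug log verbosity via WMI before dropping the canary files (run on each member; 5 is the most verbose setting and the default is 4, so consider turning it back down afterwards):

wmic /namespace:\\root\microsoftdfs path DfsrMachineConfig set DebugLogSeverity=5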
Hey Ned, What is the latest version of dfsrs.exe that I should be running on my Server 2008 x86 boxes? Everything has been pretty solid for the past couple of months but I’m about to roll to production and want to be sure that we are all up to date before moving forward. The current version that we are running is 6.0.6001.18000. I’ve looked at the patches for DFSR in the previous article but all of them say "Not applicable" to Server 2008 x86. Thanks!
Whoops! We have a KB article for 2008 and 2008 R2 now that tracks those, but I completely failed to update the above #2 with its link. It’s there now, and here:
KB 968429 – List of currently available hotfixes for Distributed File System (DFS) technologies in Windows Server 2008 and in Windows Server 2008 R2
Hey Ned! Do you know of, or can you find out, if running the ‘DFSRDIAG Backlog etc. etc.’ too much can slow down replication? Specifically speaking about the initial replication? I just want to confirm that this tool doesn’t put anything on hold while checking backlog counts. Thank you!
It’s not free – there’s a certain amount of expense when you run that tool as it has to query the DFSR jet database for all outstanding backlogged records. It’s not particularly efficient in 2003 or 2008 (it is much more efficient in 2008 R2), hence why it is limited to showing only 100 file names.
Bottom line – don’t run it often if you care about performance, as you will definitely be slowing things down. How much, is hard to say – depends on too many factors.
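As a lighter-weight way to watch trends, you can sample the DFSR Perfmon counters mentioned at the top of this post from the command line instead, for example (a sketch; the object and counter names assume the US-English counter set):

typeperf "\DFS Replicated Folders(*)\Total Files Received" -si 60 -sc 10

That takes ten samples, one per minute, for every replicated folder instance, without querying the DFSR database the way DFSRDIAG BACKLOG does.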
Master Document for DFSR patches got released finally, didn’t it? Would you mind posting it here? Thanks Ned!
Hi Ned, I have created several replicated folders and they work great, except I have 3 folders that generate this error: "Pre-existing content is not replicated and is consuming disk space.".
I have attempted to delete this information on the target server, however when I try to access the DfsrPrivate directory in Windows Explorer I receive the message "Access is Denied". I can display the files in the Command Prompt, but can’t delete them on the target computer. I even attempted to "Take Ownership" of that directory and subdirectories as well as reset permissions, which appeared successful, but when I went back to delete it I still received "Access is Denied". How can I clean out the pre-existing files?
Thanks Lyle
What OS – 2003R2 or 2008? My steps will change, depending on the answer.
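A generic sketch of forcibly clearing a stuck PreExisting folder from an elevated CMD prompt (the paths are placeholders, icacls assumes 2003 SP2 or 2008, and only do this once you are certain you no longer need those pre-existing copies of the files):

takeown /f "D:\Data\DfsrPrivate\PreExisting" /r /d y
icacls "D:\Data\DfsrPrivate\PreExisting" /grant Administrators:F /t
rd /s /q "D:\Data\DfsrPrivate\PreExisting"

The first two commands take ownership and grant the Administrators group full control recursively; the last removes the folder tree.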
Well, I got another problem… One of our departments has noticed xls files being renamed to odd names without xls extensions, i.e. AE8A8100. Recently updated both 2008 DFS servers to SP2. If you can offer anything, I’d greatly appreciate it. Thanks, Jason (missing my 2003 R2 DFS)
That is how Excel 2003 and older works, as I recall – you will see that even without DFSR (run Process Monitor on your local computer and make some changes in an Excel doc locally). It does this sort of swappy rename behavior. I don’t have Excel 2003 running to confirm this, though. If it’s ending up with the wrong file we’d need to investigate the debug logs to see where things are going south.
Please open a support case for troubleshooting on that, it will be worth your time.
Wow! Just read this whole thread, and I’m feeling enlightened. Thanks for the great info.
I wanted to ask a little more about the Staging folder size configuration for larger files. I did read the perf guide on Technet, etc., but I’m still curious on a simple scenario. Here’s what I’m trying to tune:
Two Win2003 R2 SP2 servers using DFSR for only the purpose of backup data replication to a remote site for DR.
I have a single replication group with both servers included, and have configured the schedule, etc. It seems to work fine, but I have a lot of staging folder cleanup event entries, which raised my concern about the performance, etc.
On Friday evening, we add approx 200GB of backup images to one of the members (ServerA). This 200GB is spread across approx 14 files (Backup Exec Server Recovery (BESR) backup files). We then replicate all weekend to the remote site (ServerB). On Mon-Thurs, we have nightly approx 3GB of image files spread across approx 14 files per day. Each day, the files have a different name. We have the replication group schedule open FULL overnight to allow for replication during non-business hours. Data is always originated on ServerA (after backups are done) and replicated out to ServerB.
How do I best configure the size of the staging folder for ServerA and ServerB, since there are only two members and, each day, the files to replicate have different file names? I am confused about the 9 files at a time, down to 1 file at a time when staging is over 100%, etc. I initially thought to follow the info above about ensuring the staging size is greater than the largest files, but I was not sure if I should set a 200+GB staging value. I was concerned at how this might affect free space on the server volume where the replicated folders exist.
Dazed and confused on how to best proceed. Thank you for any advice you can offer.
Hi FLuhm,
For Win2003R2, we ordinarily recommend that your staging directory be set at least as large as your 9 largest files. This is because 2003R2 can replicate 4 files *in* and 5 files *out* concurrently. In 2008 (and soon to be R2), we say as large as your 32 largest files (as it will do 16 files in and 16 out concurrently).
For your case, where replication is quasi-one way – i.e. the DR site is never going to originate any changes – you would want:
1. Your ‘main server’ (where files originate) to have its staging be at least the size of your 5 largest files, in order to minimize staging cleanup.
2. Your ‘DR server’ (where files will be received) to have its staging be at least the size of your 4 largest files.
If you have the disk space and an ideal world, the ultimate in staging perfection would be to have the staging space be the same size as all your data. Disk space has gotten pretty cheap (I saw a 1TB drive at Best Buy last week for $120 – ridiculous!), so it may be worth adding more storage in order to ensure your DR site performance is optimal.
– Ned
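To see what the staging and Conflict and Deleted quotas are currently set to on a member, a quick read-only check along these lines can help (a sketch; the property names are taken from the DfsrReplicatedFolderConfig WMI class):

wmic /namespace:\\root\microsoftdfs path DfsrReplicatedFolderConfig get ReplicatedFolderName,StagingSizeInMb,ConflictSizeInMb

Compare the reported StagingSizeInMb against the sizes of your largest backup image files when applying the guidance above.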
We’re having an issue between two DFSR members across a WAN link, where beginning about the middle of the day the backlog in one direction (from the hub to the spoke) begins climbing. It only clears out after the workday has finished.
I think this may be related to cause #6 of this blog post, with sharing violations. We do receive a large amount of sharing violation warnings on both members, mostly from AutoCAD DWG files as our users work directly off the server.
The functional mode is Server 2000, so only 4 files can be transferred at a time, correct? If so, will DFSR keep retrying the same 4 locked files until it is able to pass them through, or will it move on to other backlogged files? If not, can this be tweaked at all; for example, how long to skip the locked files?
Hi jmiles,
Functional mode won’t matter, but being Win2003R2 versus Win2008 will matter quite a bit. Win2008 can replicate at least 4 times faster inbound (16 inbound files at once), and actually typically replicates around 10 times faster (asynchronous RPC improvements). Lots of sharing violations are part of the issue. Being Win2003R2 is the other part.
DFSR will move on to other files, but periodically retry the previous locked ones. If a lot of files are locked (hundreds, thousands) all the time, it could really bog down, as it will spend a great deal of time trying files that are not going to replicate until the user unlocks them. But no, it will never totally halt as long as there’s work to do and files that can be worked.
– Ned
Thanks for that info. We are moving to upgrade the namespaces to Win2008 (I had emailed you previously about that), so maybe this will accelerate it.
I’ll have to turn on the EnableAudit function of DFSR to be sure, but I don’t think too many files are getting through. Today the backlog has risen from 4 files at 9AM, to 600 files at 2PM, to 900 files now at 4:30PM.
Well, I guess we’ll have to break down and open a support ticket with Microsoft… However, for informational purposes, we’ve had nothing but problems since "upgrading" our Win2003R2 DFSR to Win2008 DFSR. Clean installs across the board. It seems multiple times daily we get calls about missing files which turn out to be in the ConflictAndDeleted folder. It seems the conflict resolution algorithm thinks nothing is a winning file… Ugh! Perhaps Win2008R2 will bring happiness, but SP2 didn’t.
Sorry to hear it. There were no changes in that C&D code between the OS’s, so based on past experience I would expect an external cause. Your case will tell.
Good luck,
ned
Oh, DFSR also likes to make whole directories disappear!
We’ve found at least two 3rd party applications that cause that – their odd create/rename/delete behavior makes DFSR delete folders incorrectly. All the more reason to open a case. To date, no DFSR ‘all by itself’ deletions of folders in 2003/2008 though.
– Ned
Ned,
Are you talking about something like an anti-virus app? Because we’re pretty much an XP/Vista and Office 2003/2007 shop. Most files that disappear are .doc or .xls, or folders with those types of documents in them.
Thanks,
Jason
One was document management software. The other was a specialty app that handled the proprietary files directly.
Hi Ned, Does DFSR use the Windows Change Journal? Thanks, Jason
It uses the NTFS USN Journal.
Thanks, Ned. Is there an article on the preferred method to back up a volume replicated by DFSR? My head is barely above water right now, so I haven’t been able to call support. However, I’m thinking our backup software is screwing with the journal, which is causing problems for DFSR. Feasible? We got this Event ID 2206, DFSR, this morning:
The DFS Replication service successfully recovered from an NTFS change journal wrap or loss on volume E:.
Additional Information:
Volume: 47A2C3EC-2EAA-442D-8AFA-10EEE88AF9DE
Ouch. There is only one method – using a VSS writer. If your backup software doesn’t use that, it’s unsupported.
Journal loss == bad bad bad. 99% of the time, that is due to failing hardware.
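If you want to check the USN journal's current state and size on that volume, a quick look from a CMD prompt can help (a sketch; E: is the volume from the event above):

fsutil usn queryjournal E:

A very small Maximum Size relative to the volume's change rate makes journal wraps more likely, so that output is worth comparing across your replicated volumes.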
Head in hands… I was thinking about changing all my kids’ names to Ned before that "good" news… I might be in the one percent area, though, because we also got the same error on a different hardware volume (same server) that has the sysvol_dfsr on it. And, the events were posted on both servers in the replication groups…
For the 1% – was the DFSR service off for several hours/days and a ton of files modified in the meantime?
It seems to have stopped replicating last night which it seems to do from time to time (backup software?). It usually comes back on line. But, it hasn’t yet this morning. I’ve uninstalled the backup software client.
I’ve been put in the call back queue for your team… Thanks.
Hi Ned: I’m guessing you review support cases, but just in case… We’ll know for sure over the next couple days, but after about eight hours on the phone it looks like the combination of the DFSR setting "Move deleted files to Conflict and Deleted folder" and some other settings (and perhaps Excel 2003?) were causing my problems. I’m hoping for the best… Thanks, Jason
Is the ConflictAndDeleted folder dynamic? While monitoring it on one of my servers in the DFSR group, I see files go in and out of it. Support wasn’t able to answer the question.
Absolutely. It has a quota just like staging.
I guess I should have clarified, my quota is set to 200GB (I increased it so I wouldn’t lose any files), and I’m only talking about four or five files in there at a time and then it’s empty.
If you already have a support case opened, please get it escalated. That is not normal behavior unless the files actually add up to 90% of 200GB.
NedPyle your expert help is much needed.
Across the country we are bringing our offices onto our DFS system in the datacenter. When I convert an office to DFS, this is the procedure I use.
Step 1. Copy data from the server to a USB drive (I made sure that the directory on the USB drive has the exact same permissions as the one on the server).
robocopy "D:Data" "E:office2" /COPYALL /E /ETA /LOG:c:robolog.txt /TEE /W:3 /R:2
Step 2. Fly 3 hours via plane to datacenter
Step 3. Copy the data from USB drive to the server in the datacenter (I made sure that the directory on the server drive has the exact same permissions as the one on the USB drive).
robocopy "E:Data" "D:Dataoffice2" /COPYALL /E /ETA /LOG:c:robolog.txt /TEE /W:3 /R:2
Step 4. I run through the DFS setup and tell it to replicate the remote office to the datacenter server.
The problem is that every time I do this the DFS service performs the following on every single file…
"The DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder"
Sometimes we have 350,000 files at an office and it takes DFS about 1 month to perform its "rehash" (for lack of a better word).
What am I doing wrong? Is there a better way to get the data into the datacenter?
Your help is MUCH appreciated!
Zuldan, I think this may be expected behavior. DFSR still has to go through and check each file. Right now I’m performing the same operation as you, only the two servers are in the same LAN, and we’re deploying a new server. I robocopied the data from existing master to the new server, and during initial replication, every file gets a conflict error.
However, the initial replication takes much less time than if we had not pre-seeded the data, and afterwards there are no files in the pre-existing folder, so I think it’s a normal occurrence.
I would imagine it’s taking so long for you because your remote server is still remote, so it has to check every file across the WAN. Over a high latency link, this will take a while.
What does one do when there are a few files stuck in the backlog, when you’re sure that they’re not currently open? We have 3 files in backlog for a replication group. The server pushing out the updates has been restarted multiple times since we’ve seen the backlog. These files won’t replicate to multiple partners, both in the LAN and WAN.
Hi Zuldan,
You should not be getting conflict events if the file hashes really do match. If you believe you have gotten security to match perfectly, it could be some other change to the file. This won’t be diagnosable through a few blog comments, please open a support case so we can examine the data more closely.
– Ned
Hi Ned! Thanks for all the valuable info.
Have you ever seen and resolved this error: "[ERROR] Failed to execute GetOutboundBacklogFileCount method. Err: -2147217406 (0x80041002)" when running dfsrdiag /backlog?
This is part of an automated script on about 15 servers with only one having the error. What little I was able to find on this error indicated a WMI problem, but when I use WMI Diag the server appears fine. Any suggestions?
Hi,
Yes, that is an indicator of WMI issues. Try fixing it with:
1. Logon as an administrator and open an (elevated, if 2008/R2) CMD prompt.
2. CD to %systemroot%\system32\wbem
3. Run:
MOFCOMP.EXE dfsrprov.mof
(and if it exists)
MOFCOMP.EXE dfsrprov.mfl
4. Restart the DFSR service (heck, restart the server if you can).
5. See if it works now.
Hi Ned,
It is W2K3R2 and tried what you said including a reboot but I had the same result.
[ERROR] Failed to execute GetOutboundBacklogFileCount method. Err: -2147217406 (0x80041002)
Operation Failed
OK, try the hammer and anvil approach:
In a CMD prompt:
cd /d %windir%\system32\wbem
for /f %s in ('dir /b *.dll') do regsvr32 /s %s
for /f %s in ('dir /b *.mof *.mfl') do mofcomp %s
If still not working, open a Support case with us for further analysis.
Hi Ned,
The issue was resolved. There were three servers having the problem, but each had a different resolution:
SERVER1
(the one I was testing on) I had a mistake in the script, which was the problem all along. This server had a slightly different configuration.
SERVER2
The DFS Replication service was terminated
SERVER3
The first suggestion resolved the problem
Cool. 🙂
> Important note: If you are in the middle of an initial sync, you should not be rebooting your server! All of the above fixes will require reboots. Wait it out, or assume the risk that you may need to run through initial sync again.
Hi Ned!
Does this mean that initial sync is a non-restartable process that can only be restarted completely from scratch every time, instead of being paused and later resumed from the same point?
Not precisely – back in 2003 R2 there were a number of issues that could cause initial sync to not restart at all and be left only partially completed. In 2008 these issues are removed. This blog post is extremely old…
Hi Ned,
I inherited a network that is using DFSR. The servers are all Windows 2003 R2 SP2. DFSR was working well as far as I know, but now it gives me errors in the logs.
Such as 1. Event ID:5014 with Error: 9033 (The request was cancelled by a shutdown)
2. Event ID:5014 with Error: 1726 (The remote procedure call failed.)
3. Events 5008, 5012, 6802.
I just installed KB950224-v3 and am hoping that will resolve my issues. 🙂
If you can offer any help or suggestions, it would be greatly appreciated.
Regards,
Tibido
Hello Ned,
How are you? Thanks for supporting people on DFS issues.
I am redesigning the DFS at a customer site; they have 3 DFS servers running Windows 2003 R2.
Server1 (hub)
Server2 (spoke)
Server3 (Spoke)
Now I have restructured the folders. It was like this before:
folder D:\root\Research – was configured in replication group Research
again subfolder D:\root\Research\tools – was configured in replication group called tools
Now, to remove this inconsistency, I have deleted the tools replication group, since the parent folder is already replicating the same data, that is d:\root\Research.
Now I discovered the d:\root\research folder doesn’t exist on the spoke servers (server2 and server3).
I have checked the event logs and DFSR logs, but I couldn’t find the reason for this.
As a workaround I found that if I move this folder to another location and move it back, then replication gets started and works fine. I did the same for a small folder in "d:\root\research\rest" and it worked fine; it exists on all spoke servers (server2 and server3).
Now I don’t want to use this workaround on the big 400 GB D:\root\Research\tools folder. Can you tell me how to troubleshoot this issue?
Many thanks in advance.
Regards,
Basheer.
Sorry, correction:
Hello Ned,
How are you? Thanks for supporting people on DFS issues.
I am redesigning the DFS at a customer site; they have 3 DFS servers running Windows 2003 R2.
Server1 (hub)
Server2 (spoke)
Server3 (Spoke)
Now I have restructured the replication folders. It was like this before:
folder D:\root\Research – was configured in replication group Research
again subfolder D:\root\Research\tools – was configured in a separate replication group called tools
Now, to remove this inconsistency, I have deleted the tools replication group, since the parent folder is already replicating the same data through a separate RG, "d:\root\Research".
Now I discovered that the d:\root\research\tools folder doesn’t exist on the spoke servers (server2 and server3).
I have checked the event logs and DFSR logs, but I couldn’t find the reason for this.
As a workaround I found that if I move this folder to another location and move it back, then replication gets started and works fine. I did the same for a small folder in "d:\root\research\rest" and it worked fine; it exists on all spoke servers (server2 and server3).
Now I don’t want to use this workaround on the big 400 GB D:\root\Research\tools folder. Can you tell me how to troubleshoot this issue?
Many thanks in advance.
Regards,
Basheer.
Hi Ned,
In this excellent article you reference KB968429 — List of currently available hotfixes for Distributed File System (DFS) technologies in Windows Server 2008 and in Windows Server 2008 R2. That is a really valuable article… Or it was, before it stopped being updated last spring. More and more DFS-R hotfixes come out these days and none of them gets referenced in KB968429.
So I decided to make my own list of post-SP DFS-related hotfixes, and I hope some of the folks who read your post find it useful. You can find my current list of DFS hotfixes at http://pronichkin.com/Lists/Posts/Post.aspx?ID=132
–Artem
@ Basheer:
It sounds like there is a database issue. If you don’t want to use that workaround, you will need to open a support case so that the environment can be examined in detail.
@ Artem:
Yep. Don’t worry, several updates to that KB are on the way. The whip got cracked on this a week ago, your timing is excellent. 🙂
Hi Ned,
Thanks for the information.
I have used the same workaround. Now this huge data set has to pass through the WAN link.
Further, I see that a hard disk quota is enabled and only 5 GB is free, so now I am changing the staging folder default path to speed up the replication and increasing the size of the staging folder.
Regards,
Basheer
Hi Ned,
That workaround of moving files did not work. Again, after some time it cleared all these folders from the two partners, server1 and server2. Now I am not sure how to proceed; can you please suggest how to proceed?
Please see the logs
20100217 17:00:47.662 440 MEET 3634 Meet::InstallTombstone -> DONE Install Tombstone complete updateName:S1489.blc uid:{067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3571314 gvsn:{067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3736252 connId:{5C5A3D52-82A6-418E-B687-4FAFACB27876} csName:Commercial csId:{3585D669-4F22-42AA-903F-195D9E481AC2}
20100217 17:00:47.662 3976 MEET 3634 Meet::InstallTombstone -> DONE Install Tombstone complete updateName:S1488.blc uid:{067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3571313 gvsn:{067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3736251 connId:{5C5A3D52-82A6-418E-B687-4FAFACB27876} csName:Commercial csId:{3585D669-4F22-42AA-903F-195D9E481AC2}
20100217 17:00:47.662 440 INCO 4378 InConnection::UpdateProcessed Received Update. updatesLeft:216 processed:23743 sessionId:3 open:1 updateType:0 processStatus:0 connId:{5C5A3D52-82A6-418E-B687-4FAFACB27876} csId:{3585D669-4F22-42AA-903F-195D9E481AC2} csName:Commercial update:
+ present 0
+ nameConflict 0
+ attributes 0x80
+ gvsn {067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3736252
+ uid {067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3571314
+ parent {067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3568542
+ fence 16010101 00:00:00.000
+ clock 20100217 11:42:09.875
+ createTime 20070926 11:27:49.500 GMT
+ csId {3585D669-4F22-42AA-903F-195D9E481AC2}
+ hash 00000000-00000000-00000000-00000000
+ similarity 00000000-00000000-00000000-00000000
+ name S1489.blc
+
20100217 17:00:47.662 440 MEET 1190 Meet::Install Retries:0 updateName:S149.blc uid:{067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3571315 gvsn:{067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3736253 connId:{5C5A3D52-82A6-418E-B687-4FAFACB27876} csName:Commercial
20100217 17:00:47.662 3976 INCO 4378 InConnection::UpdateProcessed Received Update. updatesLeft:215 processed:23744 sessionId:3 open:1 updateType:0 processStatus:0 connId:{5C5A3D52-82A6-418E-B687-4FAFACB27876} csId:{3585D669-4F22-42AA-903F-195D9E481AC2} csName:Commercial update:
+ present 0
+ nameConflict 0
+ attributes 0x80
+ gvsn {067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3736251
+ uid {067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3571313
+ parent {067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3568542
+ fence 16010101 00:00:00.000
+ clock 20100217 11:42:09.875
+ createTime 20070926 11:27:49.500 GMT
+ csId {3585D669-4F22-42AA-903F-195D9E481AC2}
+ hash 00000000-00000000-00000000-00000000
+ similarity 00000000-00000000-00000000-00000000
+ name S1488.blc
+
20100217 17:00:47.662 3976 MEET 1190 Meet::Install Retries:0 updateName:S1490.blc uid:{067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3571316 gvsn:{067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3736254 connId:{5C5A3D52-82A6-418E-B687-4FAFACB27876} csName:Commercial
20100217 17:00:47.662 440 MEET 3699 Meet::MoveOut Moving contents and children out of replica. newName:S149.blc-{067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3571315 updateName:S149.blc uid:{067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3571315 gvsn:{067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3736253 connId:{5C5A3D52-82A6-418E-B687-4FAFACB27876} csName:Commercialrecord:
+ fid 0x1100000002C79F
+ usn 0x2616edd30
+ uidVisible 1
+ filtered 0
+ journalWrapped 0
+ slowRecoverCheck 0
+ pendingTombstone 0
+ recUpdateTime 20100214 18:01:46.725 GMT
+ present 1
+ nameConflict 0
+ attributes 0x80
+ gvsn {067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3571315
+ uid {067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3571315
+ parent {067B5647-7D78-45D0-8A8C-579BF10F96BD}-v3568542
+ fence 16010101 00:00:00.000
+ clock 20100104 04:22:02.187
+ createTime 20070926 11:27:49.500 GMT
+ csId {3585D669-4F22-42AA-903F-195D9E481AC2}
+ hash 8B6CA9F0-E5AAF6A5-86D17334-BF7FBFD9
+ similarity 00000000-00000000-00000000-00000000
+ name S149.blc
+
Open a support case.
Hi Ned,
One of my 17 replication groups stopped replicating after one of the servers involved in replication was restarted a few times.
In DFS Replication – Health Report, I receive the msg below:
The DFS Replication service is restarting frequently.
Affected replicated folders: All replicated folders on this server.
Description: The DFS Replication service has restarted 5 times in the past 7 days. This problem can affect the replication of all replicated folders to and from this server. Event ID: 1004
Last occurred: Monday, February 22, 2010 at 07:40:12 (GMT-3:00)
Suggested action: If you restarted the service manually, you can safely ignore this message. For information about troubleshooting frequent service restart issues, see The Microsoft Web Site.
After several of these restarts on one of the servers, the files are not being replicated to the receiving member.
How can I solve this problem?
Regards,
Bruno.bbc
Hi Ned,
Do you think adding some supplementary hubs can improve data replication speed? I have one primary world server, 3 regional servers replicating from this primary, and about 40, 30 and 30 servers replicating from these 3 regional servers. What about adding one supplementary hub to each of the 3 regional servers? Would this help?
Thanks a lot!
What direction is the data primarily flowing – from the 100 spokes towards the 1 primary? The 3 regional hubs could be overloaded by 30+ spokes if the regional was inbound replicating. With 2003 R2 it could only handle 4 files at a time. With 2008/R2, 16 files by default, and the option to tune up more. If this was all 2008/R2, doubling the layer of regional servers could potentially double replication performance.
I will be creating a new DFSR tuning blog post in the next few weeks, BTW; it covers more about this.
Hi Ned,
Thanks for your feedback. So the flow is from the world primary towards the spokes (through the regionals). Files/folders are only updated on the world primary server and replicated to the spokes; the purpose is to speed up replication from this server to the others. All servers are 2003 R2. Well, correct me if I am wrong, but the approach of adding supplementary hubs is not a good one according to you? What would you recommend?
Thanks again!
Hi,
In my organization DFSR is configured with a root server and 68 replication partner servers.
DFS Replication is working fine, but in the DFSR health report I can see around 14 servers getting the error "Cannot connect to reporting DCOM server".
I have checked the permissions, the ports are open, and I reinstalled the DFS service, but it is still the same.
We need to remove these errors.
On the root server I am getting the following events for the servers having the reporting issue.
Event Type: Error
Event Source: DCOM
Event Category: None
Event ID: 10006
Date: 3.4.2010
Time: 17:24:59
User: N/A
Computer: Root server name
Description:
DCOM got error "General access denied error" from the computer "replicated partner server name" when attempting to activate the server:
{3B35075C-01ED-45BC-9999-DC2BBDEAC171}
For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.
On the replicated partner I am getting the following events.
Event Type: Error
Event Source: DFSR
Event Category: None
Event ID: 5002
Date: 3.4.2010
Time: 16:25:29
User: N/A
Computer: replicated partner
Description:
The DFS Replication service encountered an error communicating with partner "root server" for replication group IUBDATA.
Partner DNS address: root server
Optional data if available:
Partner WINS Address: root server
Partner IP Address: Ip address
The service will retry the connection periodically.
Additional Information:
Error: 1753 (There are no more endpoints available from the endpoint mapper.)
Connection ID: 80396848-A4A9-4C42-A446-BFD6C6E73F24
Replication Group ID: EC982F82-9B4A-4895-9275-4A13A12BC465
For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.
Event Type: Error
Event Source: DFSR
Event Category: None
Event ID: 6104
Date: 2.4.2010
Time: 16:13:40
User: N/A
Computer: replicated partner
Description:
The DFS Replication service failed to register the WMI providers. Replication is disabled until the problem is resolved.
Additional Information:
Error: 2147749902 (100e)
For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.
We are going to install the patches mentioned in the article below on all the servers:
http://support.microsoft.com/default.aspx?scid=kb%3bEN-US%3b958802
Could you please check and help us with some solution to resolve this issue?
Thanks in advance.
Regards.
nadarajg
Hey Ned,
We’re just getting into the business of using DFS in our environment. We’re all upgraded to the latest and greatest, 2008 domain and using 2008 R2 servers for our DFS hosts.
Here’s my question. We are hosting quite a bit of data (700+ GB, 6,000,000+ Files, in 3,000,000 Folders). Now I’ve looked at the documentation, which is telling me that 2008 DFS doesn’t have any limits and to only watch performance. All of this data is in 1 replication group. The reason for doing this is that we are using group policy redirection, which points to the sub folders within this location. So if we broke the namespace up into sub letter target folders instead of 1 massive target folder, we wouldn’t easily be able to continue to use these policies. And to let you know, both of these servers are in the same location and we aren’t ever going to use another across a WAN or in a remote location.
With all of that being said, obviously initial replication takes a while along with some massive data copies as we are mirroring our production environment to keep the data fresh. I was wondering if there is any way to make replication go any faster during these massive file copies? Especially because we aren’t concerned with bandwidth usage, we’d prefer that they go near 100% if they could.
And do you have any recommendations for our DFS design? The good part is when initial and mirrored replication is done however, it does replicate smaller changes very quickly.
With the huge amount in the replication group, I’ve also seen the servers take a long time to rebuild their databases if one becomes corrupt in some way. Do you have any recommendations besides keeping A/V away from them, to keep them safe?
Thanks!
Marcus
Hi Marcus –
Ned’s out watching the Cubs beat the Braves.
Can you clarify what you mean by "mirrored replication"? Do you mean you’re replicating the data with DFSR or using SAN replication?
The most effective thing that can be done to improve the initial replication times is pre-seeding the data on the downstream server. The second most effective thing is to tune DFSR if applicable. Both of these points are covered in good detail, with performance numbers, in the blog post http://blogs.technet.com/askds/archive/2010/03/31/tuning-replication-performance-in-dfsr-especially-on-win2008-r2.aspx.
To mitigate DB failure/recovery, in addition to following
http://support.microsoft.com/default.aspx?scid=kb;EN-US;822158 which you are already doing, you can:
1) Adjust the amount of time that DFSR has to commit the logs and close the DB on shutdown:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;977518.
2) Make sure that storage is functioning as expected and that all firmware and drivers are up-to-date.
Thanks for following our blog!
–Jonathan
Hi.
We’re trying to use the dfsrdiag backlog command to gather some trends about our replication topology, and find that the backlog filenames are not listed if we run the dfsrdiag backlog command under Windows 7 or 2008 R2. It works fine under straight 2008.
All we get is something along the lines of…
Member <server> Backlog File Count: 4
Backlog File Names (first 4 files)
but no filenames.
I know it’s only a trivial thing, but it is irritating having to RDP to a 2008 server to get the file list when we should be able to do this from a local client.
This works fine when I run it on 2008 R2 – please be more specific in your repro steps. Are you saying it only doesn’t work when the dfsrdiag backlog is run on a 2008 R2 server and is pointing rmem/smem to a 2008 NON-R2 server?
Hi Ned, thanks for the response.
It only seems to fail when I run dfsrdiag on a W7 or 2008 R2 machine. The rmem/smem are a combination of 2008, 2008R2 and 2003.
As a specific example.
smem 2008 (x64 non-R2)
rmem 2003 (x64 R2)
No file list is produced when running dfsrdiag on W7, or 2008R2, but running the same command on 2008 (non-R2) or 2003 is fine.
For info, dfsrdiag.exe is version 6.1.7600.16385.
Ah. I am able to reproduce this, when running the new DFSRDIAG against *non-2008 R2* servers. If the smem/rmem are 2008 R2, it works fine.
There were a bunch of changes in 2008 R2 to make the backlog command work faster/better, it looks like this new version of dfsrdiag is not fully backwards compatible. I’ll look into this a bit more to see what’s up.
Nice catch, thanks. 🙂
Hi Ned,
I am having some issues with my DFSR environment. We are running Windows Server 2003 R2 SP2 on all of the servers. We have NOT applied all of the relevant DFS hotfixes and patches per KB article 958802. There are about 11 patches on that list, of which I have only applied KB933061. We are running into a problem where several hundred files/folders are NOT replicating for some reason and they are being dumped into the Conflict and Deleted folder with Event ID 4302. Here is the information from the DFS report.
"Due to ongoing sharing violations, DFS Replication cannot replicate files in the replicated folders listed above. This problem is affecting 332 files in 1 replicated folders. Event ID: 4302
Last occurred: Thursday, April 22, 2010 at 2:54:08 PM (GMT-8:00)
Suggested action: Verify that the files you want to replicate are closed and have no open handles to them. For information about troubleshooting sharing violations, see The Microsoft Web Site"
We are planning on applying any of the relevant hotfixes tonight but I wanted to see if you have any thoughts about the issue. Any help would be very much appreciated. Thank you, Mike
scorchtoggs, I deleted your post. Don’t be alarmed, it’s only because you posted your case # in there. You should treat that like a social security number and never post it publicly – other people could use it to get support and you will get the bill.
Please continue working with your support folks. You can also ask them for escalation if you are not making progress.