Where’s my file? Root cause analysis of FRS and DFSR data deletion

Hi, Ned here. In the Directory Services support space here at Microsoft, we are often contacted by customers for disaster recovery scenarios. We’re also brought in for deeper forensic analysis of what lead to a problem. Today we’re going to talk about a situation that covers both:

  • A customer has seen some critical data go missing.
  • That data was replicated via the File Replication Service (FRS) or the Distributed File System Replication (DFSR) Service.
  • Before they restore the data with their backup copy, they want to have root cause on who deleted what and where it started. We can’t do this after restoring data because our whole audit trail will of course be destroyed within the respective JET databases.

FRS Deletion Forensics – The Where and When

You need to start by determining the name of some folder or file that has been deleted. It's important that this be exact as we will be using it to search. You will need the full original path since it is possible just the name could be duplicated throughout the content set.

  • For this example we have three servers called 2003SRV13, 2003SRV16, and 2003SRV17.
  • We have a folder called c:\frstestlink\importantfolder13 that has been deleted.
  • It contained a file called c:\frstestlink\importantfolder13\importantfile13.doc which was deleted (naturally). Our folder could contain thousands of files but we just need to know one. That’s easy, someone is screaming at you that it’s missing. :)

Install FRSDIAG on any server that participated in the FRS content set where data was deleted.

Open a CMD prompt and navigate to the FRSDIAG directory. This will default to:

c:\program files\windows resource kit\tools\frsdiag

You’ll see that we have a very useful utility called NTFRSUTL.EXE. Running it with /? will show you its options:

ntfrsutl [idtable| configtable | inlog | outlog] [computer] = enumerate the service's idtable/configtable/inlog/outlog
computer = talk to the NtFrs service on this machine.

ntfrsutl [memory|threads|stage] [computer]
= list the service's memory usage
computer = talk to the NtFrs service on this machine.

ntfrsutl ds [computer]
= list the service's view of the DS
computer = talk to the NtFrs service on this machine.

ntfrsutl sets [computer]
= list the active replica sets
computer = talk to the NtFrs service on this machine.

ntfrsutl version [computer]
= list the api and service versions
computer = talk to the NtFrs service on this machine.

ntfrsutl forcerepl [computer] /r SetName /p DnsName
= Force FRS to start a replication cycle ignoring the schedule.
= Specify the SetName and DnsName.
computer = talk to the NtFrs service on this machine.
SetName = Name of the replica set.
DnsName = DNS name of the inbound partner to force repl from.

ntfrsutl poll [/quickly[=[N]]] [/slowly[=[N]]] [/now] [computer]
= list the current polling intervals.
now = Poll now.
quickly = Poll quickly until stable configuration retrieved.
quickly= = Poll quickly every default minutes.
quickly=N = Poll quickly every N minutes.
slowly = Poll slowly until stable configuration retrieved.
slowly= = Poll slowly every default minutes.
slowly=N = Poll slowly every N minutes.
computer = talk to the NtFrs service on this machine.

Cool stuff – you can force replication, list out various FRS configuration info, or get performance stats. It’s great for scripting.

In order to figure what happened, we will dump out some tables from the FRS JET database by executing:

NTFRSUTL OUTLOG > outlog.txt
NTFRSUTL IDTABLE > idtable.txt

(Note: the output from NTFRSUTL IDTABLE is not the same as collecting IDTABLE information with FRSDIAG’s GUI console).

We then start FRSDIAG and click the 'Browse' button. Drop down the Replica Set and select the one that contained deleted data. Click 'Add All' to add the members. Then click ok.

Click 'Tools' then select 'Build GUID2Name for Target Server(s) '.

This will create us a text file that lists out GUID and its associated SERVER NAME, like so:

======================================================
Replica Set GUID : 6f83352f-f404-4eda-a714ae1691e3e9d8
Replica Set Name : FRSTEST|FRSTESTLINK
======================================================

GUID                                MEMBER NAME                            SERVER NAME
----                                -----------                            -----------
30409f5d-8493-41ad-a98ab03fc1b795e5 {6AEC89F5-24B5-4C1B-B15E-6EFE5A60B75C} 2003srv16 e8219dee-532a-4dff-83f09f036e331daa {6F617C11-2997-4134-952B-5B3572D4AF70} 2003srv17
e8feaedc-6bce-41f4-94c39cade3932da8 {88600E1E-AAF8-4C70-A1C3-A36EB471E2B3} 2003srv13

We open the IDTABLE.TXT file and search for our file we know was deleted:

Table Type: ID Table for FRSTEST|FRSTESTLINK (1)
FileGuid                     : 3647d318-502f-11dc-a1070003ff6813c5
FileID                       : 00130000 00003e45
ParentGuid                   : 2d7a8327-7308-4464-a63e367e39c27690 << Folder it was in
ParentFileID                 : 000d0000 00003e37
VersionNumber                : 00000001
EventTime                    : Wed Aug 22, 2007 11:57:26 << when deleted
OriginatorGuid               : 30409f5d-8493-41ad-a98ab03fc1b795e5 << where deleted

OriginatorVSN                : 01c7e4c3 ea6dd496
CurrentFileUsn               : 00000000 001be408
FileCreateTime               :
FileWriteTime                :
FileSize                     : 00000000 000000a0
FileObjID                    : 3647d318-502f-11dc-a1070003ff6813c5
FileName                     : importantfile13.doc << here's our file
FileIsDir                    : 00000000
FileAttributes               : 00000020 Flags [ARCHIVE ]
Flags                        : 00000001 Flags [DELETED ] << Proof of deletion
ReplEnabled                  : 00000001
TombStoneGC                  : Sun Oct 21, 2007 11:57:26
OutLogSeqNum                 : 00000000 00000000
Spare1Ull                    : 00000000 00000001
MD5CheckSum                  : MD5: a41eea20 979f04e9 dff7592a e8dc3e8b
RetryCount                   : 0
FirstTryTime                 :

We then look in the OUTLOG.TXT to confirm the folder matches up using ParentGuid above:

Table Type: Outbound Log Table for FRSTEST|FRSTESTLINK (1)
SequenceNumber               : 0000008d
Flags                        : 01000024 Flags [Content LclCo CmpresStage ]
IFlags                       : 00000001 Flags [IFlagVVRetireExec ]
State                        : 00000014  CO STATE:  IBCO_OUTBOUND_REQUEST
ContentCmd                   : 00002000 Flags [RenNew ]
Lcmd                         : 0000000f  D/F 1   NoCmd
FileAttributes               : 00000010 Flags [DIRECTORY ]
FileVersionNumber            : 00000001
PartnerAckSeqNumber          : 00000000
FileSize                     : 00000000 00000000
FileOffset                   : 00000000 00000000
FrsVsn                       : 01c7e44b df699803
FileUsn                      : 00000000 001bd690
JrnlUsn                      : 00000000 001bd690
JrnlFirstUsn                 : 00000000 001bd690
OriginalReplica              : 1  [???]
NewReplica                   : 1  [???]
ChangeOrderGuid              : cf3bc76b-b3e1-4e72-ae8962371bb48501
OriginatorGuid               : e8feaedc-6bce-41f4-94c39cade3932da8
FileGuid                     : 2d7a8327-7308-4464-a63e367e39c27690 << there’s our GUID
OldParentGuid                : 53605485-4dd4-4b9a-bc5a022760515559
NewParentGuid                : 53605485-4dd4-4b9a-bc5a022760515559
CxtionGuid                   : 92f5d906-cd45-4639-973021461e454c8b
Spare1Ull                    :
MD5CheckSum                  : MD5: b68d5ccf 21f8b5dd e7eb48f1 f45b01d9
RetryCount                   : 0
FirstTryTime                 : Wed Aug 22, 2007 11:55:21
EventTime                    : Wed Aug 22, 2007 11:55:18
FileNameLength               :       34
FileName                     : ImportantFolder13 << definitely our folder that was deleted
Cxtion Name                  : <Jrnl Cxtion> <- <Jrnl Cxtion>\<Jrnl Cxtion>
Cxtion State                 : Joined

Then we can verify back in the IDTABLE.TXT what the time and source were:

Table Type: ID Table for FRSTEST|FRSTESTLINK (1)
FileGuid                     : 2d7a8327-7308-4464-a63e367e39c27690
FileID                       : 000d0000 00003e37
ParentGuid                   : 53605485-4dd4-4b9a-bc5a022760515559
ParentFileID                 : 00030000 000039bc
VersionNumber                : 00000002
EventTime                    : Wed Aug 22, 2007 11:57:23 << there's the delete time
OriginatorGuid               : 30409f5d-8493-41ad-a98ab03fc1b795e5 << here's the source of the delete
OriginatorVSN                : 01c7e4c3 ea6dd495
CurrentFileUsn               : 00000000 001beba8
FileCreateTime               :
FileWriteTime                :
FileSize                     : 00000000 00000000
FileObjID                    : 2d7a8327-7308-4464-a63e367e39c27690
FileName                     : Dc9
FileIsDir                    : 00000001
FileAttributes               : 00000010 Flags [DIRECTORY ]
Flags                        : 00000001 Flags [DELETED ] << confirmed that it's been deleted
ReplEnabled                  : 00000001
TombStoneGC                  : Sun Oct 21, 2007 11:57:23
OutLogSeqNum                 : 00000000 00000000
Spare1Ull                    : 00000000 00000000
MD5CheckSum                  : MD5: b68d5ccf 21f8b5dd e7eb48f1 f45b01d9
RetryCount                   : 0
FirstTryTime                 :

===

Now we look back at the GUID2Name table we generated earlier, we can see that:

30409f5d-8493-41ad-a98ab03fc1b795e5 = 2003SRV16.

So we know that at Wed Aug 22, 2007 11:57:23 AM on server 2003SRV16, something or someone deleted all this data. Wasn’t that easy? :) If we had object access auditing enabled on that server at the time and the folder configured for auditing, we can even see who did it. More on this later…

DFSR Deletion Forensics – The Where and When

Since DFSR exposes nearly all its interfaces through WMI, we can use a powerful command-line utility called WMIC that can be used to return useful info from the databases. This way we don’t need to rely on add-on tools and debug logs and such. For my example below I am intentionally not using VBScript as I want everyone to understand exactly what it is we’re doing – but feel free to script it up, all the WMI classes are well documented on MSDN.

So here we go again:

  • We have our three servers 2003SRV13, 2003SRV16, and 2003SRV17.
  • All are in a Replication Group called ImportantData.
  • They have a Replicated Folder called… wait for it… ReplicatedFolder.
  • That folder contains various files and folders, including a folder called ImportantSubFolder. It contains some files, including one called critical.doc. Naturally, someone has deleted critical.doc… let’s figure out where and when.

First we open a CMD prompt as an admin and dump the Replicated Folder info like so:

C:\>wmic /namespace:\\root\microsoftdfs path DfsrReplicatedFolderInfo get ReplicatedFolderGuid,ReplicatedFolderName,ReplicationGroupName > rfinfo.txt

This returns the following into our rfinfo.txt file:

ReplicatedFolderGuid                  ReplicatedFolderName  ReplicationGroupName
8722EF11-6466-4472-888F-11B8A57B68A4  replicatedfolder      importantdata

Now we have enough info to confirm we're looking at the right data. Let's get the status of the deleted file by running this command and providing it the file name and the ReplicatedFolderGuid from above:

C:\>wmic /namespace:\\root\microsoftdfs path DfsrIdRecordInfo WHERE (filename= 'critical.doc' and replicatedfolderguid='8722EF11-6466-4472-888F-11B8A57B68A4' ) get filename,flags,updatetime,GVsn > file.txt

Our output file.txt contains:

FileName Flags GVsn UpdateTime

critical.doc 4 {BAA4E6D9-BF1A-4C83-ADF4-FDFD481AE2FC} -v113867 20070823233010.774625-000

We can see that it was deleted on Aug 23, 2007 at 23:30:10 (11:30 PM) GMT time. For us this means 7:30PM EDT.

The Flags value of 4 tells us it's been deleted; examine the chart below. An ordinary replicated file will have a Flags value of 5 (meaning 0x1 && 0x4 for Present and Replicated). If 4 the Present flag has been removed meaning the file is tombstoned, i.e. removed from the replica.

Value Meaning
PRESENT_FLAG0x1 The resource is not a tombstone; it is available on the computer.
NAME_CONFLICT_FLAG0x2 The tombstone was generated because of a name conflict. This flag is meaningful only for tombstones.
UID_VISIBLE_FLAG0x4 The ID record has already been sent out to other partners; therefore, other partners are aware of this resource.
JOURNAL_WRAP_FLAG0x10 The volume has had a journal wrap and the resource has not been checked to determine if there is any change by the journal wrap recovery process.
PENDING_TOMBSTONE_FLAG0x20 The ID record is in the process of being tombstoned (or deleted.)

The GVsn value is important in that the GUID inside those curly brackets will always contain the unique database GUID of the server where the file was last changed. Since deleting the file counts as a change, now we just need to figure out who owns that GUID. So let's use the DFSRDIAG command to find the culprit:

C:\>dfsrdiag guid2name /guid:BAA4E6D9-BF1A-4C83-ADF4-FDFD481AE2FC /rgname:importantdata

Which returns:

Object Type : DfsrVolumeInfo
Computer    : 2003SRV16.fabrikam.com  << The Server where the delete occurred
Volume Guid : 346CA491-54BA-11DB-91ED-806E6F6E6963
Volume Path : C:
Volume SN   : 1826913329
DB Guid     : BAA4E6D9-BF1A-4C83-ADF4-FDFD481AE2FC

Badda-bing! That BAA4E6D9-BF1A-4C83-ADF4-FDFD481AE2FC GUID matches our GVsn above. There's our guy. What is it with people deleting files and folders off this 2003SRV16 server? We should have a chat with their site admin...

The same exact steps will work for folders. Let's do it real fast this time and figure out where someone deleted the whole ‘importantsubfolder’:

C:\>wmic /namespace:\\root\microsoftdfs path DfsrReplicatedFolderInfo get ReplicatedFolderGuid,ReplicatedFolderName,ReplicationGroupName > rfinfo.txt

ReplicatedFolderGuid                  ReplicatedFolderName  ReplicationGroupName
8722EF11-6466-4472-888F-11B8A57B68A4  replicatedfolder      importantdata

C:\>wmic /namespace:\\root\microsoftdfs path DfsrIdRecordInfo WHERE (filename='importantsubfolder' and replicatedfolderguid= '8722EF11-6466-4472-888F-11B8A57B68A4' ) get filename,flags,updatetime,GVsn,clock > file.txt

FileName            Flags  GVsn                                        UpdateTime
importantsubfolder  4      {97DA0CC3-DBB4-437F-BB6F-BE8A970FE318}-v31  20070823235332.792930-000 

C:\>dfsrdiag guid2name /guid:97DA0CC3-DBB4-437F-BB6F-BE8A970FE318 /rgname:importantdata

   Object Type : DfsrVolumeInfo
Computer    : 2003SRV13.fabrikam.com
Volume Guid : 929F0871-54B9-11DB-B293-806E6F6E6963
Volume Path : C:
Volume SN   : 1826913329
DB Guid     : 97DA0CC3-DBB4-437F-BB6F-BE8A970FE318

Done! Looks a lot easier the second time around, doesn’t it?

Note that when a deletion is replicated in to a DFSR server, the file by default is moved to \DfsrPrivate\ConflictAndDeleted under the root of the replicated folder. If the delete was not replicated in, but instead was the result of a local deletion, the file is moved to the Windows Recycle Bin (unless you held down the SHIFT key while deleting, in which case the file is deleted for good). By default the quota on ConflictAndDeleted is 660 MB but that is configurable on the Advanced Tab in the replicated folder properties. In the same location un-checking the “Move deleted files to Conflict And Deleted folder” box will make it so deletions that are replicated in are actually deleted for good without being moved to ConflictAndDeleted.

The information about all the data residing in ConflictAndDeleted is contained in the \DfsrPrivate\ConflictAndDeletedManifest.xml file. When the quota is reached, files are purged from ConflictAndDeleted folder and the ConflictAndDeletedManifest.xml in the order that they were put there. This means you have a limited amount of time to catch a deletion and be able to restore it from ConflictAndDeleted.

There is a sample script for restoring data from ConflictAndDeleted. This is needed because the folder structure of deleted data is flattened and all data resides directory off the root of ConflictAndDeleted, and the filename is appended with the GVSN. The script reads the ConflictAndDeletedManifest.xml so it knows the original file names and folder structure. But you can also determine that using the DfsrConflictInfo WMI class. For example you can check for the presence of a file in a given server’s ConflictAndDeleted folder by running:

C:\>wmic /namespace:\\root\microsoftdfs /node:2003srv13 path DfsrConflictInfo where "filename like 'critical.doc% '" get * /format:textvaluelist

ConflictFileCount=1
ConflictPath=\\.\C:\replicatedfolder\importantsubfolder\critical.doc
ConflictSizeInBytes=881211
ConflictTime=20070823233010.000000-000
ConflictType=5
FileAttributes=32
FileName=critical.doc-{97DA0CC3-DBB4-437F-BB6F-BE8A970FE318}-v27
GVsn={97DA0CC3-DBB4-437F-BB6F-BE8A970FE318}-v27
MemberGuid=2C50672F-32A2-4D7D-AF44-88E1812F6E08
ReplicatedFolderGuid=8722EF11-6466-4472-888F-11B8A57B68A4
ReplicationGroupGuid=6143BD54-C9CC-42E1-A1FA-03BB34BF87F2
Uid={97DA0CC3-DBB4-437F-BB6F-BE8A970FE318}-v27

The trailing % wildcard is needed because the FileName property has the GVSN of the file appended to it. The ConflictPath property contains the original path and file name for the file before it was deleted.

Auditing – The Who

Now that we’ve covered when and where, the Windows Auditing subsystem can be used to tell you who via the Object Access Auditing setting. The important take away is that you need to have it in place and the audit descriptors set on your files and folders before you need them. Setting it up after the files have gone missing isn’t going to buy you anything. I can tell your eyes have started to glaze over on this post so I’m going to wrap it up here. J

To set up Object Access Auditing you can follow this checklist and set your critical replicated folders to audit EVERYONE for DELETE and process that with inheritance on down. We really don’t usually care about files being changed and certainly not added, but deletions drive end users nuts.

An important thing to understand about Auditing in Windows 2000 and 2003 is that it’s bound by some legacy limitations in the Event Log system (no longer true in Windows Server 2008 or Vista). Basically, you want to keep the total size of all your event logs at around 300MB total or they will become unstable. You’ll find that enabling Object Access Auditing is going to make your Security event logs wrap pretty often, so if it was only 256MB you wouldn’t have much time for forensics. You can get around this by using KB312571 to configure AutoBackupLogFiles and save off the logs when they wrap automagically. Then they can be backed up and deleted periodically with a scheduled task or whatever you like.

Auditing is not free – it costs CPU time, disk I/O, and can increase memory usage within LSASS. Please be sure to test this for your environment. Really.

For excellent tips and deep explanation of Windows Auditing, check out the Windows Security Logging and Other Esoterica MSDN blog by Eric Fitzgerald (who ran Auditing for many years as a Program Manager).

For everything Auditing, check out the Windows Server 2003 Auditing web portal

- Ned Pyle