How VSS Tracing Works!
Where to start? Well, this depends on your current level of knowledge of the topic. So let’s start with a basic understanding of what VSS is. If you are just starting to work with VSS, I suggest reading the articles below.
The key points to focus on in this article are:
- Volume Shadow Copy Service Components
- Copy-on-Write (Differential Copy)
- Hardware-based Providers
- Software-based Providers
- How Shadow Copies Are Created
Below are a couple of TechNet articles as a reference.
How Volume Shadow Copy Service Works
Volume Shadow Copy Service
I like to think about VSS as being a camera that takes pictures of your data. Like a picture, a VSS snapshot is a point-in-time view of a volume. The overall concept is rather simple. We hold writes to a disk so we can take a picture of the blocks on that disk. We then release the hold and are left with a view of the data at the time we took the picture. In the case of a client-accessible snapshot, we then use COW, or Copy-on-Write, to make a copy of any block before it changes. This allows you to build a view of your data from the time of the snapshot. Since we have a copy of the original contents of every changed block, we can use them to restore our view of the volume. Backup applications use this same type of process, only they use different types of snapshot depending on the type of backup being run.
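To make the copy-on-write idea concrete, here is a minimal Python sketch. It is a toy model for illustration only, not how VSS is implemented: the Volume class, its block list, and the diff_area dictionary are all hypothetical names.

```python
class Volume:
    """Toy volume: a list of blocks plus one copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.diff_area = None  # None means no snapshot exists yet

    def create_snapshot(self):
        # Taking the "picture" copies nothing; it only starts tracking changes.
        self.diff_area = {}

    def write(self, index, data):
        # Copy-on-write: save the original block the first time it changes.
        if self.diff_area is not None and index not in self.diff_area:
            self.diff_area[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        # Changed blocks come from the diff area, unchanged ones from the live volume.
        if self.diff_area is None:
            raise RuntimeError("no snapshot has been created")
        return self.diff_area.get(index, self.blocks[index])


volume = Volume(["A", "B", "C"])
volume.create_snapshot()
volume.write(1, "B2")
volume.write(1, "B3")  # second write: the original "B" is already preserved
snapshot_view = [volume.read_snapshot(i) for i in range(3)]
print(volume.blocks)   # live data
print(snapshot_view)   # point-in-time view
```

After the two writes the live volume holds "B3" in the second block, while the snapshot view still returns the original "B". Only the one changed block was ever copied, which is why this kind of snapshot is cheap to create.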
Now that you have a basic understanding of how VSS works we can look at what to do when you run into errors. We have several tools at our disposal. VSSTrace.exe, Logman.exe, and TraceLog.exe are all tools that we can leverage to capture traces. Tracing VSS will provide you with a log file that contains what function was called and the result. From this we can see where VSS is failing and the error being returned.
One of the more common ways to capture a VSS trace is with Vsstrace.exe. Using this tool and a few other tools from the VSS SDK, like Vssagent.exe, we can capture additional diagnostic data that helps provide a better understanding of the type of snapshot being run and details about the error we encountered. VSSAgent collects disk, volume, shadow copy storage, event log entries, and other critical information necessary to provide context to the trace you are capturing.
One of the most important things to know when reviewing a VSS trace is the context. Case in point: are you looking at the actions of a software provider or a hardware provider? Or maybe the trace is simply a command being run, in which case we would not expect to see the same function calls as we would with a complete snapshot. For example,
vssadmin list writers
Knowing the intention of the requestor at the time of the snapshot is important to understanding what to look for in the trace.
Tracing Options Explained
Let’s spend some time talking about the options available when capturing a VSS trace. For this I am going to focus on VSSTrace.exe, which lets us use the various tracing levels and options provided by the VSS service. For example:
vsstrace.exe -o trace.txt
This generates a trace file in plain text format. In contrast, this command:
vsstrace.exe -etl trace.etl
creates an ETL-formatted file, which you cannot read with a text editor.
So let’s say you were having an issue where you wanted to capture tracing only from actions performed by the writers. You could run the following command:
vsstrace.exe -f 0 +WRITER
In this command the “-f” would indicate the enabling of flags and the “0” would indicate no modules are going to be traced. Now that we have told tracing to not capture anything we have to add back the stuff we want. That is where the “+WRITER” comes into play. This enables tracing for just the writers.
There is a great chart on MSDN that provides you with all the various modules and flags that can be turned on (http://msdn.microsoft.com/en-us/library/windows/desktop/dd765233(v=vs.85).aspx). To find out more about Vsstrace.exe, please see:
Using VSS Diagnostics
Level is the next thing we should cover. There are 21 different log levels for VSS tracing. The most commonly used is 170. This is the default log level for Vsstrace.exe; however, this can be adjusted by appending to the command line we use to start the trace. Here is the command you would use with VSSTrace.exe to capture all events for only the VSS service components.
vsstrace.exe -f 0 +COORD -l 255
This chart has all the log levels listed for you. This can also be found in greater detail on the above mentioned links to MSDN.
Information included in trace output:
- Event Log activity
- Function enter and exit
- Function return values
- Function parameters (terse)
- Function parameters (verbose)
- Verbose information level 1
- Verbose information level 2
- Verbose information level 3
- Fast Code Level 1
- Fast Code Level 2
- Fast Code Level 3
Taking a Trace
Now it’s time for us to do some tracing. Earlier we talked about the options available to us; now let’s put them into practice. We will start with a basic trace that will capture all modules and write the trace data to a plain text file in C:\TEMP.
vsstrace.exe -f 0xffff -o C:\temp\trace.txt
Now say you need a trace of only the Windows built-in software provider. You would run the following command:
vsstrace.exe -f 0 +SWPRV -o C:\trace.txt
Here we see that the “-f 0” indicates no modules are to be traced, and then we add the “+SWPRV” to indicate that we will be tracing only the built-in software provider, to a text file located at “C:\trace.txt”. This kind of granularity is useful when trying to isolate a particular module.
If you need to capture a greater level of detail within the trace, you can add the “-l” option to set the level of logging to capture. By default we use 170 from the above chart. Let’s turn that up to the maximum setting of 255. The command would now look like this:
vsstrace.exe -l 255 -o C:\trace.txt
This setting will get you every possible message from all modules. That being said, keep in mind that this comes at the cost of space. The trace file will grow quickly, depending on the flags set. In contrast, by limiting logging to specific modules and trace levels, you can limit the overall size of lengthy captures.
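The command lines in this section all follow the same pattern, so they are easy to generate from a script. Here is a minimal Python sketch; build_vsstrace_args is a hypothetical helper name, and it only emits the switches that appear in this article (-f, +MODULE, -l, -o, -etl). It does not validate module names or levels.

```python
def build_vsstrace_args(modules=None, level=None, text_file=None, etl_file=None):
    """Build a vsstrace.exe argument list from the switches used in this article."""
    args = ["vsstrace.exe"]
    if modules:
        # "-f 0" turns every module off, then "+NAME" adds back the ones we want.
        args += ["-f", "0"]
        args += ["+" + name.upper() for name in modules]
    if level is not None:
        args += ["-l", str(level)]
    if text_file:
        args += ["-o", text_file]
    if etl_file:
        args += ["-etl", etl_file]
    return args


software_provider_only = build_vsstrace_args(modules=["swprv"], text_file=r"C:\trace.txt")
everything_verbose = build_vsstrace_args(level=255, text_file=r"C:\trace.txt")
print(" ".join(software_provider_only))
print(" ".join(everything_verbose))
```

On a machine where vsstrace.exe is on the PATH, the returned list could be handed to subprocess.run from an elevated prompt. Keeping the arguments as a list avoids quoting problems when the output path contains spaces.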
Reading the Trace
Now that we understand the options and tracing levels and have captured our trace, it’s time to figure out what’s wrong. The best way to do this is to use a text analysis tool like Notepad++ or some other tool with a filtering option. You are going to want to create filters on the following key words.
These key words should help point out where the trace is failing. Please remember that there are all kinds of patterns and failures that will be shown in these logs. Understanding them comes in time so don’t get discouraged if you cannot figure these out easily.
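If you prefer a script to an editor, the same filtering can be sketched in a few lines of Python. The search terms below are an assumption, picked from phrases that show up in the sample trace in this article; they are not an official list, and filter_trace is a hypothetical helper name.

```python
# Assumed search terms; adjust them to the failures you are chasing.
KEYWORDS = ("failed", "throwing hresult", "error")


def filter_trace(lines, keywords=KEYWORDS):
    """Return (position, text) pairs for lines that contain any keyword."""
    hits = []
    for position, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append((position, line.rstrip("\n")))
    return hits


# Two lines copied from the sample trace in this article.
sample = [
    r"12403 [0480230937,0x001e34:0x1864:0x3bc9015a] modules\writers\evtlogwriter.cxx(0460): "
    r"CEventLogWriter::BackupLogs: Throwing HRESULT code 0x8000ffff. Previous HRESULT code = 0x00000000",
    r"12404 [0480230937,0x001e34:0x2438:0x3bc9015a] coord\src\async.cxx(0509): "
    r"CVssAsync::QueryStatus: Returning *pHrResult: 0x00042309",
]
hits = filter_trace(sample)
for position, text in hits:
    print(position, text)
```

Only the first sample line matches, because it contains "Throwing HRESULT". To run this against a real capture, pass an open file object for the text trace instead of the sample list.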
Here is an example of a VSS Trace.
--- Trace Snip ---
12402 [0480230921,0x001e34:0x1864:0x3bc9015a] modules\writers\evtlogwriter.cxx(0460): CEventLogWriter::BackupLogs: Failed to backup event log Microsoft-Windows-EventCollector/Operational to C:\WINDOWS\Repair\Backup\ServiceState\EventLogs\Microsoft-Windows-EventCollector/Operational.evt. [0x0000007b]
12403 [0480230937,0x001e34:0x1864:0x3bc9015a] modules\writers\evtlogwriter.cxx(0460): CEventLogWriter::BackupLogs: Throwing HRESULT code 0x8000ffff. Previous HRESULT code = 0x00000000
12404 [0480230937,0x001e34:0x2438:0x3bc9015a] coord\src\async.cxx(0509): CVssAsync::QueryStatus: Returning *pHrResult: 0x00042309
12466 [0480231203,0x001e34:0x1864:0x36bde5db] modules\registry\registry.cxx(1277): CVssDiag::RecordGenericEvent: Event name: VSS_WS_FAILED_AT_PREPARE_SNAPSHOT (SetCurrentState)
In the example, you can see a failure to access a particular resource during the snapshot process. Let’s break this down. The first line of the example shows that VSS was running the CEventLogWriter::BackupLogs function and its result was:
Failed to backup event log Microsoft-Windows-EventCollector/Operational to C:\WINDOWS\Repair\Backup\ServiceState\EventLogs\Microsoft-Windows-EventCollector/Operational.evt. [0x0000007b]
Looking deeper at the results, we see that the failure code was 0x0000007b.
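Pulling the source file, function message, and trailing code out of a line like this can be sketched with a regular expression. This is a minimal Python sketch based only on the layout of the sample lines above; the field names are my own labels, and the bracketed IDs are kept as raw text because this article does not describe what each one means.

```python
import re

# One line copied from the trace snippet in this article.
TRACE_LINE = (
    r"12402 [0480230921,0x001e34:0x1864:0x3bc9015a] "
    r"modules\writers\evtlogwriter.cxx(0460): CEventLogWriter::BackupLogs: "
    r"Failed to backup event log Microsoft-Windows-EventCollector/Operational to "
    r"C:\WINDOWS\Repair\Backup\ServiceState\EventLogs"
    r"\Microsoft-Windows-EventCollector/Operational.evt. [0x0000007b]"
)

LINE_PATTERN = re.compile(r"^(\d+)\s+\[([^\]]+)\]\s+(.+?)\((\d+)\):\s+(.*)$")
TRAILING_CODE = re.compile(r"\[(0x[0-9a-fA-F]+)\]\s*$")


def parse_trace_line(line):
    """Split one text-trace line into labeled parts; return None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    sequence, context, source, source_line, message = match.groups()
    code = TRAILING_CODE.search(message)
    return {
        "sequence": int(sequence),
        "context": context,  # bracketed IDs, kept as raw text
        "source": source,
        "source_line": int(source_line),
        "message": message,
        "code": int(code.group(1), 16) if code else None,
    }


parsed = parse_trace_line(TRACE_LINE)
print(parsed["source"], parsed["source_line"], hex(parsed["code"]))
```

Lines that do not end in a bracketed hex value simply get a code of None, so the same function can be run over a whole trace.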
To see what this error code indicates, you can run the following at an administrative command prompt:
C:\> slui.exe 0x2a 0x0000007b
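This opens a dialog box that gives the description of the error. For 0x0000007b (decimal 123, ERROR_INVALID_NAME), the description is: The filename, directory name, or volume label syntax is incorrect.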
You can also use a tool such as Err.exe.
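For the handful of codes in the sample trace, a small lookup table is enough to keep your notes readable. The Python sketch below is not a replacement for Err.exe: the table covers only the three codes seen in this article, and the names describe and hresult_from_win32 are hypothetical helpers. The second one mirrors the standard HRESULT_FROM_WIN32 macro, which is handy because the same Win32 error often shows up elsewhere wrapped as an HRESULT.

```python
# Only the codes that appear in the sample trace; extend as needed.
KNOWN_CODES = {
    0x0000007B: "ERROR_INVALID_NAME: filename, directory name, or volume label syntax is incorrect",
    0x8000FFFF: "E_UNEXPECTED: unexpected failure",
    0x00042309: "VSS_S_ASYNC_PENDING: the asynchronous operation is still running",
}


def describe(code):
    """Format a code as hex and decimal, with a name if it is in the table."""
    name = KNOWN_CODES.get(code, "not in this small table")
    return "0x%08x (%d): %s" % (code, code, name)


def hresult_from_win32(code):
    """Mirror HRESULT_FROM_WIN32: wrap a Win32 error code in FACILITY_WIN32."""
    if code == 0 or code & 0x80000000:
        return code  # success, or already a failure HRESULT
    return 0x80070000 | (code & 0xFFFF)


print(describe(0x0000007B))
print(hex(hresult_from_win32(0x0000007B)))
```

The wrapped form of 0x0000007b is 0x8007007b, so searching a trace or an event log for either value can lead you to the same failure.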
Taking what we just learned about the error code, we can look at the path of the object that we were trying to access. Notice that the path is invalid because the file name contains a “/”. From here we can investigate what is wrong with that path.
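Spotting a bad character by eye works for one path, but a quick check can also be scripted. This minimal Python sketch splits a path on backslashes only, so a forward slash inside a name shows up as an illegal character. The helper name invalid_components is hypothetical, and the character set is the usual list of characters not allowed in Windows file names.

```python
INVALID_NAME_CHARS = set('<>:"/\\|?*')


def invalid_components(path):
    """Return (component, bad_characters) for each part of the path that has illegal characters."""
    problems = []
    for position, part in enumerate(path.split("\\")):
        if position == 0 and len(part) == 2 and part[1] == ":":
            continue  # a drive prefix such as "C:" is allowed to contain a colon
        bad = sorted(ch for ch in set(part) if ch in INVALID_NAME_CHARS)
        if bad:
            problems.append((part, bad))
    return problems


# Target path copied from the failing trace line.
target = (
    r"C:\WINDOWS\Repair\Backup\ServiceState\EventLogs"
    r"\Microsoft-Windows-EventCollector/Operational.evt"
)
problems = invalid_components(target)
print(problems)
```

For the path from the trace, the only component flagged is the file name built from the event log channel name, which is exactly where the “/” sits.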
Now that we have taken tracing from A-Z, you should be well armed to get out there and collect some traces. Remember it takes time to get the hang of what’s going on in a trace.
Good luck and successful backups!
Sr. Support Escalation Engineer