Frequently in support, we encounter several backup related calls for Exchange 2010 databases. A sample of common issues we hear from our customers are:
- “My backup software is not able to take a successful snapshot of the databases”
- “My backups have been failing for quite a while. I have several thousand log files consuming disk space and I will eventually run out of disk space”
- “My backup software indicates that the backup is successful but at the end of my backup, logs do not truncate”
- “The Exchange Writer /VSS writer is not in a stable state (state is listed as ‘Retryable‘, ’Waiting for completion‘ or ’Failed’)”
- “We suspect that the Volume Shadow Copy Service (VSS) is failing on the server and hence there are no successful backups”
It is critical to understand how backups and log truncation work in Exchange 2010. If you haven't already done so, check out our three-part blog series by Jesse Tedoff on backups and log truncation in Exchange 2010, Everything You Need to Know About Exchange Backups*.
When troubleshooting backups in Exchange 2010 we are interested in two writers – the Exchange Information Store Writer (utilized for active copy backups) and the Exchange Replica Writer (utilized for passive copy backups). The writers are responsible for providing the metadata information for databases to the VSS Requestor (aka the backup software). The VSS Provider is the component that creates and maintains shadow copies. At the end of successful backups, when the Volume Shadow Copy Service signals backup is complete, the writers initiate post-backup steps which include updating the database header and performing log truncation. (For more details, see Exchange VSS Writers on MSDN.)
As explained above, it is the responsibility of the VSS Requestor to get metadata information from Exchange writers and at the end of successful backup, VSS service signals backup complete to the Exchange writers so the writers can perform post-backup operations.
The purpose of this blog is to discuss the VSSTester script, its functionality and how it can help diagnose backup problems.
What does the script to?
The script has two major functions:
- Perform Diskshadow backup of a selected Exchange database so we can exercise the VSS framework in the system, so at the end of a successful snapshot, database header is updated and log files are truncated. We will discuss in detail what Diskshadow is and what it does.
- The second function of this script is to collect diagnostic data. For backup cases, there is a lot of data that needs to be collected. To get the diagnostic data you may have to manually go to different places in the Exchange server and turn on logging. If that is not done correctly, we will miss getting crucial logs during the time of the issue. The script makes the data collection process much easier.
- The current version of the script works only on Exchange 2010 servers.
- The script needs to be run on the Exchange server that is experiencing backup issues. If you are having issues with passive copy backups, please go to the appropriate node in the DAG and run the script. For example: You may have Database A having copies on Server1, Server2 and Server3. Server1 hosts the active copy of the database. If backups of the active copy have previously failed run the script on Server1. Otherwise run script on whichever of the remaining servers has failed previously when backing up the passive copy.
- Please ensure that you have enough space on the drive you save the configuration and output files. Exchange and VSS traces, Diagnostic logs can occupy up to several GBs of drive space depending on the time taken for taking backup. For example: Running the script in a lab environment consumed close to 25MB of drive space a minute.
- The script is unsigned. On the server where you run the script you will have to set the execution policy to allow unsigned PowerShell scripts. Please see this for how to do this.
The script can be run on any DAG configuration. You can use this to troubleshoot Mailbox and Public folder database backup issues. Databases and log files can be on regular drives or mount points. Mix and match of the two will also work!
Let us discuss in detail the two main functionalities of the script.
Diskshadow functionality and how the script uses it
What is Diskshadow and why do we utilize it in VSSTester script?
Diskshadow.exe is a command line tool built in to Windows Server 2008 operating system family as well as Windows Server 2012. Diskshadow is an in-box VSS requestor. It is utilized to test the functionality provided by the Volume Shadow Copy Service (VSS). For more details on Diskshadow please visit:
The best part about Diskshadow is that it includes a script mode for automating tasks. This feature of Diskshadow is utilized in the VSSTester. The shadow copy done by Diskshadow is a snapshot of the entire volume at a given point in time. This copy is read-only.
More details on how a shadow copy is created, please visit the following link: http://technet.microsoft.com/en-us/library/ee923636(v=ws.10).aspx
During the course of the blog post, I will be mentioning the term “Diskshadow backup”. It is very important to understand that the term “backup” is relative here. Diskshadow uses the VSS service and gets the appropriate writer to be utilized for the snapshot. The writer will provide the metadata information of database /log files to the Diskshadow. After which Diskshadow utilizes the VSS Provider to create a shadow copy.
After a successful shadow copy /snapshot of databases and log files, the VSS Provider signals an end-backup to Exchange writers. To Exchange this looks like a full backup has been performed on the database. The key to understand here is NO data is actually transferred to a device, tape etc. This is only a test! You will see events in the application logs that usually show up when you take a regular backup, but NO data is actually backed up. Diskshadow has simply run all the backup APIs through the backup process without transferring any data.
The VSS Provider will take a snapshot of all the databases and logs (if present) on the volume. We will be doing a mirrored snapshot of the entire volume at the point in time when Diskshadow was run. Anything that is on the volume will be part of the snapshot. During the Diskshadow backup, we will be utilizing either the Information store writer (for active copy backup) or the Replica Writer (Passive copy backup) to provide the metadata information for the database.
When you use the VSSTester script, it prompts you for a database to be selected to perform the Diskshadow backup. When we take a snapshot of the volume all other databases (if present on the same drive) will be part of the snapshot, but post-backup operations will happen only on the selected database. This is because we will be utilizing either the Information store Writer (Active Copy Backup) or the Replica Writer (Passive copy backup) that is associated with the selected database. DB headers get updated based on VSS Requestor interaction with the Exchange writer that was utilized, which in turn leads to log truncation. Hence the header of the selected database will be only updated and logs will be purged (only for that the selected database) without being backed up.
When would you be interested in utilizing this Diskshadow functionality of the script?
You would be interested to utilize this functionality in almost all scenarios that I discussed at the start of this blog post. In addition to those scenarios another one that is not related to backups sometimes arises:
- “I had an unexpected high transactional log growth issue in my exchange 2010 environment and now I am on the verge of losing all disk space in the logs directory. I do not have the time to perform a backup to truncate logs and my goal is to safely remove all the log files”
In the scenario mentioned above (and, by the way, if you have that problem, please go here), Exchange administrators would like to avoid causing a service outage by dismounting the database, removing log files and remounting the database. Another downside to manually removing the log files is breaking replication if the database has replicas across Database Availability Group members.
If you are willing to forgo a backup of the log files you can use the Diskshadow functionality of the script to trigger the backup APIs and tell Exchange to truncate the log files. The truncation commands will replicate to the other database copies and purge log files there as well. If successful, the net result is that the database will not go offline for lack of disk space on the log drive, but you will not have the security of retaining those log files for a future restore.
A sample run of the VSSTester script (with Diskshadow functionality)
Let me demonstrate the Diskshadow functionality of the script.
The Script can be downloaded from here.
The script initializes and gives us the following options.
We select the option 1 to test backup using the built-in Diskshadow function.
If the path does not exist, the script will create the folder for you.
We gather the server name and verify it is an Exchange 2010 server. The script will check for the VSS writer status on the local machine. If we detect, any of the writers are not in a “Stable” state, the script will exit. You will need to restart the service associated with the writer to get the writers to a stable state (The Replication service for the Replica Writer or the Information Store service for the Exchange Writer).
The script then gets a list of databases present on the local server and displays the database name, if database is mounted or not and what is the server that holds the active copy of the database. You will have to select the number of the database.
Note: If the user does not provide an input, the script will automatically select the last database in the list.
In my case, I selected database mdb5. The number to enter would be 8.
The next important check is ensuring that the database’s replicas (if present) are healthy. If we detect that one of the copies is not healthy, the script will exit mentioning that the database copies need to be in healthy status before running the script.
The script next detects the location of the database file and log files. We create the Diskshadow configuration file on the fly every time a database is selected. This configuration file is also saved to the location you had specified earlier (in the example screenshots of this blog c:\vsstesterlogs) to save the configuration and output files. In this case the log files are in a mount point and the database file is on a regular volume. The script will add the appropriate volumes to the disk shadow file.
The script will then prompt you to provide the drive letters to expose the snapshots. A common question that arises is, do I need to initialize the drive before I specify a drive letter? The answer is no!
You will be specifying a drive letter that is currently not in use, so Diskshadow will create a virtual drive and expose the snapshot. Remember, the virtual drive that exposes the shadow copy is a read-only volume. The shadow copy is a read only copy .If the database and logs are in the same mount point / drive only, one drive letter is required to expose the snapshot, otherwise you will need to provide two different drive letters. One for exposing database snapshot and another for log files.
When you select the option to perform the Diskshadow backup, the script will automatically collect Diagnostic logs, ExTRA traces and VSS traces. Also verbose logging is turned on for Diskshadow. Whatever activity the script does is also logged in to transcript log and saved in the output files directory (c:\vsstesterlogs in this example).
Note: If you are performing a passive copy backup, ExTRA tracing will also be turned on in the active node. At the end of the script, we turn off ExTRA tracing in the active node and it will be automatically moved to the passive node. The active node ETL will be placed in the logs folder you had specified at the start of the script. .
Now, the main Diskshadow function will execute.
In the screenshots below we have excluded all other writers on the system that are associated with all other databases on the node (that are mounted or be replicas) and we are ONLY utilizing the writer associated with the selected database. This node hosts the passive copy of the database MDB5. Hence, the writer utilized will be associated with the Replication service aka the Microsoft Exchange Replica Writer.
From the screen shot below, you can see that VSS Provider has taken a successful snapshot of the database and signaled end backup to the replica writer.
Now that we performed a successful snapshot of the database and log files, all the logging that was turned on will be turned off. The log files will be consolidated in the logs folder that you specified earlier at the start of the script. The script checks the VSS writer status after the backup is complete.
When the snapshot operation is complete, you will be prompted for an option to either remove the snapshot or leave the snapshots exposed in Windows Explorer.
I selected the option to remove the snapshot; hence we will be invoking Diskshadow again to delete the snapshot created earlier.
Let us discuss in detail exposing and removing snapshot functionality:
- Remove snapshots - The snapshots that were taken earlier (database or log files) will be exposed in the Windows explorer, if the snapshot operation was successful. In this script we expose the snapshots as a drive letter (that you had specified earlier). If you do not want to have a copy of the log files, you may chose this option and the snapshot will be deleted. All the logs that got purged after post-backup will be present in this read only volume and when this volume is removed they will be deleted forever.
- Expose Snapshots – You may choose to have the snapshots exposed. Later, if you want to delete the snapshot, please do the following
- Open Command prompt
- Type in Diskshadow
- Delete shadows exposed <volume>
Note: It is highly recommended to take a full backup of the database using your regular backup software after utilizing Diskshadow.
After this, the script collects the application and system logs. The script filters them to cover only the period you started the script to the present. The transcript log is also stopped. The logs will be saved as a text file and saved in the output folder you had specified earlier (c:\vsstesterlogs in this example).
The most reliable method to verify log truncation takes place is to get the log sequence before and after the backup. Hence, before running the script I ran eseutil/ml ENN (the log generation prefix associated with database).
Post-backup, when I ran the same command, and can see:
We can clearly see a difference in the start of the sequence, meaning log truncation has occurred for the database. One more verification that can be done is to check the database header. We can see that the database header got updated to the most recent time, where Diskshadow was run.
I ran the script; what have I accomplished?
If the script finished successfully:
- We were able to successfully test and exercise the underlying VSS framework in the server. Volume Shadow Copy service was able to successfully identify and utilize the Exchange writers in the box
- The Exchange writers are able to provide the metadata information to the VSS Requestor (Diskshadow)
- VSS Provider was able to successfully create a snapshot /shadow copy
- VSS successfully signaled the Exchange writers on backup complete
- The Exchange writers were able to perform the post snapshot operations which included log truncation.
Let us now look in to the other major functionality of the script.
Enable logging to troubleshoot backup issues
Use this if you do not want to test backup using Diskshadow and you just want to collect diagnostic logs for troubleshooting backup issues.
You may collect the diagnostic logs and have them handy before calling Microsoft Support saving a lot of time in the support incident because you can provide the files at the beginning of the case.
This time we will be selecting option 2 to enable logging.
Selecting this option does the majority of the things that the script did earlier, EXCEPT Diskshadow of course!
After checking the writer status, you can select the database to backup. We will be enabling all the logging like before (Diagnostic Logging, ExTRA, VSS tracing). Remember that, even though you would still be selecting one database - diagnostic logging, ExTRA tracing, VSS tracing are not database specific and are turned on at the server level. When you are utilizing the script to troubleshoot backup issues you can select any one database on the server and it will turn on appropriate logging on the server.
After the logging is turned on and traces enabled, you will see:
Now you will need to start your regular backup. After the backup completes/fails, you will need to come back to the PowerShell window where you are running the script and use the “ENTER” key to terminate the data collection. The script then disables diagnostic logging and tracing that was turned up earlier. If needed it will copy diagnostic logs from the active node for that database copy as well.
The script will again check for writer status after the backup then collect the application and system logs. It will stop the transcript log as well.
At this point, in order to troubleshoot the issue, you can open a case with Microsoft Support and upload the logs.
I hope this script helps you in better understanding the core concepts in Exchange 2010 backups, thus helping you troubleshoot backup issues! You can utilize Diskshadow to test Volume Shadow Copy Service and also check if the Exchange writers are performing as intended. If Diskshadow completes successfully without any error and you are still experiencing issues with backup software, you may need to contact the backup vendor to further troubleshoot the issue.
Your feedback and comments are most welcome.
Special thanks to Michael Barta for his contribution to the script, Theo Browning and Jesse Tedoff for reviewing the content.