Exchange and VSS -- My Exchange writer is in a failed retryable state…

In Exchange 2007 and Exchange 2010, many customers rely on VSS-based backups to retain and protect their Exchange data. By default, Exchange provides two VSS writers that share the same VSS writer ID but are loaded by two different services. The first is the Exchange Information Store VSS writer and the second is the Exchange Replication Service VSS writer. The Information Store writer allows for the backup of active (mounted) databases, and the Replication Service writer allows for the backup of passive databases (when a replicated database model is in use). You can see the writers by running the command VSSADMIN LIST WRITERS from a command prompt.

 

Here is sample output of VSSAdmin List Writers from a Windows 2008 R2 SP1 server with Exchange 2010 SP1. Note how both writers share the same writer ID within the VSS framework.

 

Writer name: 'Microsoft Exchange Replica Writer'
Writer Id: {76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}
Writer Instance Id: {17e8df11-a8a2-4ee3-a3fb-e552b7da2d83}
State: [1] Stable
Last error: No error

 

Writer name: 'Microsoft Exchange Writer'
Writer Id: {76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}
Writer Instance Id: {e0ad4b68-8938-4be5-9b88-4c74df2b2d65}
State: [1] Stable
Last error: No error
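
When many servers need to be checked, it can help to turn this output into structured data. The following is a minimal Python sketch that parses text in the format shown above. The function name parse_writers and the dictionary keys are illustrative choices, not part of any Microsoft tool, and the sketch assumes the English field labels that appear in the sample.

```python
import re
from typing import Dict, List

# Matches a state value such as "[8] Failed" -> code "8", text "Failed".
_STATE_RE = re.compile(r"\[(\d+)\]\s*(.*)")


def parse_writers(text: str) -> List[Dict[str, str]]:
    """Split `vssadmin list writers` output into one dict per writer."""
    writers: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # blank lines and banner text
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "writer name":
            # A new writer block starts here.
            current = {"name": value.strip("'")}
            writers.append(current)
        elif not current:
            continue  # header lines before the first writer
        elif key == "writer id":
            current["writer_id"] = value
        elif key == "writer instance id":
            current["instance_id"] = value
        elif key == "state":
            match = _STATE_RE.match(value)
            if match:
                current["state_code"] = match.group(1)
                current["state"] = match.group(2)
        elif key == "last error":
            current["last_error"] = value
    return writers


# The sample output from this article.
SAMPLE = """\
Writer name: 'Microsoft Exchange Replica Writer'
Writer Id: {76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}
Writer Instance Id: {17e8df11-a8a2-4ee3-a3fb-e552b7da2d83}
State: [1] Stable
Last error: No error

Writer name: 'Microsoft Exchange Writer'
Writer Id: {76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}
Writer Instance Id: {e0ad4b68-8938-4be5-9b88-4c74df2b2d65}
State: [1] Stable
Last error: No error
"""

for writer in parse_writers(SAMPLE):
    print(writer["name"], writer["state_code"], writer["last_error"])
```

Because both Exchange writers share one writer ID, the writer name or the instance ID is the field to use when telling them apart.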

In the course of protecting Exchange servers there may be conditions that cause a backup job to fail. When an Exchange backup job fails, the VSS framework aborts the backup and Exchange subsequently clears the backup-in-progress settings. When a failure is encountered, either a single Exchange writer or both Exchange writers may be left in a FAILED RETRYABLE state. We can use VSSAdmin List Writers again to query the writer status and see these results. Here is an example showing the Exchange Replication Service writer with a state of [8] Failed and a last error of Retryable error.

 

Writer name: 'Microsoft Exchange Replica Writer'
Writer Id: {76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}
Writer Instance Id: {17e8df11-a8a2-4ee3-a3fb-e552b7da2d83}
State: [8] Failed
Last error: Retryable error

 

Writer name: 'Microsoft Exchange Writer'
Writer Id: {76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}
Writer Instance Id: {e0ad4b68-8938-4be5-9b88-4c74df2b2d65}
State: [1] Stable
Last error: No error

 

The typical question that comes up at this point is: how do I actually deal with an Exchange writer that consistently disallows backups? The answer is to restart the service that the writer is associated with and/or fix whatever configuration issue is causing the failures. For example, given the above output I would restart the Exchange Replication Service in an attempt to return the writer to a Stable / No error state. (Had it been the Microsoft Exchange Writer, I would have restarted the Exchange Information Store Service.)
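
The writer-to-service mapping described above can be sketched in Python as follows. The short service names MSExchangeIS and MSExchangeRepl are the usual names for the Information Store and Replication services in Exchange 2007 and 2010, but treat them as an assumption and confirm them on your own server before acting. The helper name services_to_restart is hypothetical. The sketch only prints a suggested PowerShell command; it does not restart anything.

```python
# Writer display name -> short name of the Windows service that loads it.
# Assumption: standard Exchange 2007/2010 service names; verify locally.
WRITER_SERVICE = {
    "Microsoft Exchange Writer": "MSExchangeIS",
    "Microsoft Exchange Replica Writer": "MSExchangeRepl",
}


def services_to_restart(writers):
    """Return the services behind any Exchange writer reported as Failed.

    `writers` is an iterable of (writer_name, state_text) pairs, for
    example ("Microsoft Exchange Replica Writer", "Failed").
    """
    needed = set()
    for name, state_text in writers:
        if state_text.lower().startswith("failed") and name in WRITER_SERVICE:
            needed.add(WRITER_SERVICE[name])
    return sorted(needed)


# The failed example from this article: only the replica writer is Failed.
observed = [
    ("Microsoft Exchange Replica Writer", "Failed"),
    ("Microsoft Exchange Writer", "Stable"),
]

for service in services_to_restart(observed):
    # Printed as a suggestion only; run it yourself in a maintenance window.
    print(f"Restart-Service {service}")
```

Keep in mind that restarting the Information Store service dismounts the databases on that server, so this is not something to script blindly.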

The real question, though, is: do I need to deal with a writer that is in a failed state at all? Unfortunately, many administrators find themselves doing so because, in their experience, subsequent backup jobs fail while the writer is in a failed state. If you review these failures carefully, you will find that the backup jobs are not failing because of a VSS failure; they are failing because a writer was found in a failed state. From an Exchange / VSS perspective this is unexpected. After all, although the writer is failed, the error is RETRYABLE, essentially saying “hey…something failed but come on back and try me again…”

 

Let’s take a look at why this might be happening….

 

Within the VSS framework there are two states that we are interested in: the Session State and the Current State. When a VSS session is in progress and an administrator runs VSSAdmin List Writers, the state displayed is that of the session in progress. When the VSS snapshot creation has completed, that state becomes a session-specific state, and the status of the most recently completed session is copied to the current state. From this point on, VSSAdmin List Writers displays the state of the most recently completed session. This is an important distinction: THE STATE DISPLAYED AT THIS POINT REFLECTS THE STATUS OF THE LAST SESSION! The status of the last session does not imply anything about the success or failure of future sessions.

Now that we know where VSSAdmin List Writers gets its information, we'll take a look at how the backup process should progress. (I'm going to present a deliberately simplified timeline of an expected backup.)

The process starts with the VSS requester establishing a VSS session. 

 

After the session is established the VSS requester requests metadata from the VSS framework.

 

At this point the VSS requester and the VSS framework advance the snapshot process by determining components and preparing the snapshot set.

 

Once the components and snapshot set have been prepared, the VSS requester issues a PrepareForBackup. This in turn causes the VSS framework to prepare the components for backup.

 

After PrepareForBackup is called, the individual application-level writers are responsible for the current writer status. The VSS requester is now allowed to call GatherWriterStatus, and this call should return the current writer status. For example, the current writer status at this stage could be FREEZE / THAW / etc. This holds regardless of whether the previous status was FAILED or HEALTHY. This is the status that the VSS requester should use to make logic decisions at this point.

 

Once the snapshot is created, the contents can be transferred to the backup media. Once the transfer is complete, the VSS requester can inform the VSS framework that the backup has completed successfully, and the VSS session then ends.

 

In summary, if the VSS requester performs operations in the expected order, the writer status is queried only after the framework has received a PrepareForBackup event. This ensures the writer status reflects the CURRENT SESSION IN PROGRESS and not the SESSION STATE OF THE PREVIOUS BACKUP.
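
To make the ordering issue concrete, here is a toy Python simulation. It is not the VSS API: ToyWriter, the status strings, and both requester functions are hypothetical stand-ins that only model the behavior described in this post, namely that a writer reports the previous session's result until PrepareForBackup arrives.

```python
class ToyWriter:
    """Toy stand-in for a VSS writer; it does not call the real VSS API."""

    def __init__(self, last_session_result: str) -> None:
        # What an idle query (such as VSSAdmin List Writers) would display.
        self.status = last_session_result

    def prepare_for_backup(self) -> None:
        # Assumption taken from the text: once PrepareForBackup arrives, the
        # writer reports on the session in progress, whatever happened before.
        self.status = "CURRENT_SESSION_OK"

    def gather_writer_status(self) -> str:
        return self.status


def check_then_prepare(writer: ToyWriter) -> str:
    """Requester that inspects writer status BEFORE PrepareForBackup."""
    if writer.gather_writer_status().startswith("FAILED"):
        return "aborted: stale FAILED status from the previous session"
    writer.prepare_for_backup()
    return "snapshot attempted"


def prepare_then_check(writer: ToyWriter) -> str:
    """Requester that inspects writer status AFTER PrepareForBackup."""
    writer.prepare_for_backup()
    if writer.gather_writer_status().startswith("FAILED"):
        return "aborted: writer failed in this session"
    return "snapshot attempted"


# Same starting point for both: the last backup left a retryable failure.
print(check_then_prepare(ToyWriter("FAILED_RETRYABLE")))
print(prepare_then_check(ToyWriter("FAILED_RETRYABLE")))
```

The first requester gives up without ever starting a real attempt, which matches the symptom of repeated job failures until a service is restarted. The second proceeds, because it only evaluates status that belongs to its own session.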

 

The administrator can verify the functionality of the Exchange writer by using the VSHADOW or DISKSHADOW utilities. These utilities follow the workflow outlined above, which correctly handles a writer left in a failed retryable state. If either of these utilities is successful in creating the backup, and the writer in turn is returned to a healthy state, you might consider following up with the backup vendor to ensure VSS calls are being made appropriately. Microsoft can also assist you in verifying that the calls are made appropriately by helping with both Exchange and OS VSS tracing.
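
As one way to run such a test, the Python sketch below generates a small DiskShadow script file. The volume letters D: and E: are placeholders for wherever your databases and logs actually live, the function name build_diskshadow_script is hypothetical, and the shadow copy context is left at DiskShadow's default. The writer ID is the shared Exchange writer ID shown earlier in this post. Review the generated script before running it, since ending a backup session this way can be treated by Exchange as a completed backup.

```python
from typing import List

# Shared writer ID for both Exchange writers, as shown in the output above.
EXCHANGE_WRITER_ID = "{76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}"


def build_diskshadow_script(
    volumes: List[str], writer_id: str = EXCHANGE_WRITER_ID
) -> str:
    """Return the text of a DiskShadow script for a test snapshot."""
    lines = ["set verbose on", "begin backup"]
    # One "add volume" line per volume that holds Exchange data.
    lines += [f"add volume {volume}" for volume in volumes]
    # Require the Exchange writer to take part in the snapshot.
    lines.append(f"writer verify {writer_id}")
    lines += ["create", "end backup"]
    return "\n".join(lines) + "\n"


# Placeholder volumes; replace with your own database and log volumes.
print(build_diskshadow_script(["D:", "E:"]), end="")
```

Saving the output to a file and passing it to DiskShadow with the /s switch from an elevated command prompt should run the test. Running VSSAdmin List Writers afterwards shows whether the writer returned to a stable state.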