Backups fail due to consistency check failure…

Last week I had the opportunity to work with a customer who was experiencing issues backing up their Exchange 2010 databases.  The issue they experienced, though, is equally relevant to Exchange 2007 and Exchange 2003 installations that use VSS-based backups with consistency checking enabled.

 

After reviewing the logs it was apparent that the VSS process was functioning correctly.  All relevant events regarding the snapshot process were present.  In this case the backup job was configured to run a consistency check, and the corresponding consistency check events were noted.  Almost every backup job logged the following error:

 

Log Name: Application
Source: Storage Group Consistency Check
Event ID: 403
Task Category: Termination
Level: Error
Keywords: Classic
Description:
Instance: The physical consistency check successfully validated 0 out of xxxxxxxx pages of database 'DATABASE'. Because some database pages were either not validated or failed validation, the consistency check has been considered unsuccessful.

 

In general this event indicates that the consistency check encountered an error while scanning the pages of an Exchange database.  In most cases this means there is page-level corruption in the database, so the validation performed by the consistency check fails and the backup is terminated.  This is by design.

 

In theory, corruption of this type should not be present in the environment as configured.  The customer was using a Database Availability Group, which has built-in protections to self-heal databases from this type of corruption.  Replication was healthy and there was no indication that any page corrections had been performed.

 

If you look at the event in greater detail you will see that it provides the number of pages that were successfully scanned before the issue occurred.  When reviewing the application logs it was noted that, on the same database, the failure occurred after scanning a different number of pages each time.  For example, one failure occurred after scanning 28000 pages and another after 42456 pages.  Page-level corruption would normally be expected to stop the scan at the same page on every run, so the varying count pointed away from the database itself.
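
The comparison described above can be sketched in a few lines of Python.  This is only an illustration: it assumes the Event 403 descriptions have already been exported as text, the helper names are hypothetical, and the total page count in the samples is made up (only 28000 and 42456 come from the case above).

```python
import re

# Matches the page counts in the Event 403 description text quoted earlier.
PAGES_RE = re.compile(
    r"successfully validated (\d+) out of (\d+) pages of database '([^']+)'"
)

def validated_pages(description):
    """Return (validated, total, database) from a description, or None."""
    match = PAGES_RE.search(description)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)

def failure_points_vary(descriptions):
    """True if any database failed after differing validated page counts."""
    seen = {}
    for text in descriptions:
        parsed = validated_pages(text)
        if parsed is None:
            continue  # not a consistency check termination message
        validated, _total, database = parsed
        seen.setdefault(database, set()).add(validated)
    return any(len(counts) > 1 for counts in seen.values())

# Hypothetical sample descriptions; the total of 1000000 pages is invented.
samples = [
    "The physical consistency check successfully validated 28000 out of "
    "1000000 pages of database 'DATABASE'.",
    "The physical consistency check successfully validated 42456 out of "
    "1000000 pages of database 'DATABASE'.",
]

print(failure_points_vary(samples))
```

A result of True suggests the scan is being interrupted by something external rather than stopping at one fixed bad page.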

 

At this point when reviewing the system log the following error was noted:

 

Time: 1/9/2012 12:40:56 PM
ID: 36
Level: Error
Source: volsnap
Machine: server.company.com
Message: The shadow copies of volume F: were aborted because the shadow copy storage could not grow due to a user imposed limit.

 

This error implies that the allotted shadow copy storage space was exhausted, and could not grow, while differential changes were being stored during the lifetime of the snapshot.  The output of vssadmin list shadowstorage showed that the maximum shadow storage space assigned to the volume hosting the database was 321 megabytes.

 

vssadmin list shadowstorage

Shadow Copy Storage association
For volume: (F:)\\?\Volume{0ecc7a68-be78-4c40-baf6-4d0d3b0b6693}\
Shadow Copy Storage volume: (H:)\\?\Volume{ed074b1d-b500-465b-a720-d2f733f49761}\
Used Shadow Copy Storage space: 0 B (0%)
Allocated Shadow Copy Storage space: 0 B (0%)
Maximum Shadow Copy Storage space: 321 MB (0%)

 

This is an extremely small shadow copy storage space.  By default the allotted space is generally 10% of volume size.  To correct this issue we can use the vssadmin command to resize the shadow storage space.

 

vssadmin Resize ShadowStorage /For=F: /On=F: /maxsize=20%

Successfully resized the shadow copy storage association
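
To spot a similarly undersized limit on other volumes before a backup fails, the vssadmin list shadowstorage output shown earlier can be checked with a short script.  The following is a rough sketch, not a supported tool: the function names and the 1 GB threshold are assumptions chosen for illustration, and it only understands English output in the format pasted above (a limit reported as UNBOUNDED, for example, is simply skipped).

```python
import re

UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

FOR_RE = re.compile(r"For volume:\s*\((\w:)\)")
MAX_RE = re.compile(
    r"Maximum Shadow Copy Storage space:\s*([\d.]+)\s*([KMGT]?B)\b"
)

def parse_shadow_storage(output):
    """Return {drive letter: maximum shadow storage in bytes}."""
    limits = {}
    volume = None
    for line in output.splitlines():
        found = FOR_RE.search(line)
        if found:
            volume = found.group(1)
            continue
        found = MAX_RE.search(line)
        if found and volume is not None:
            size = float(found.group(1)) * UNITS[found.group(2)]
            limits[volume] = int(size)
            volume = None
    return limits

def undersized(limits, threshold_bytes):
    """Return the volumes whose maximum is below the given threshold."""
    return sorted(v for v, size in limits.items() if size < threshold_bytes)

# The output captured in this case, before the resize.
SAMPLE = r"""
Shadow Copy Storage association
For volume: (F:)\\?\Volume{0ecc7a68-be78-4c40-baf6-4d0d3b0b6693}\
Shadow Copy Storage volume: (H:)\\?\Volume{ed074b1d-b500-465b-a720-d2f733f49761}\
Used Shadow Copy Storage space: 0 B (0%)
Allocated Shadow Copy Storage space: 0 B (0%)
Maximum Shadow Copy Storage space: 321 MB (0%)
"""

limits = parse_shadow_storage(SAMPLE)
# The 1 GB threshold is an arbitrary assumption, not a recommended value.
print(undersized(limits, 1024**3))
```

What counts as too small depends on how much data changes on the volume while the backup is running, so the threshold should be set per environment.
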

In our case the inability to continue storing differential changes in the shadow storage space caused the shadow copy to be removed.  This in turn caused the consistency check to fail, resulting in a failure of the backup job.  Once the shadow copy storage was allocated an appropriate size, and differential changes could be stored for the entire duration of the backup operation, the backups completed successfully.