My Windows Home Server 2011 Backups Failed But I Fixed It (With Some Help)

I run Windows Home Server (WHS) 2011 and use it to do nightly backups of the family computers. A few months ago I bought a new HP laptop for my own use and joined it to WHS for nightly backups. At first the backups failed, until I did some research and found I was running into a known issue resolved by the update below:

2781272 - A hotfix is available to add backup support for UEFI-based computers to back up to servers that are running Windows Home Server 2011

After installing this update, backups of my new laptop worked fine for a few weeks. At some point, though, they started to fail and I wasn’t sure why. It took a while to figure out, and I thought I’d pass along the story in case anyone else runs into a similar issue, since my efforts to fix it were, in the truest geek fashion, “fun”. I mean “technically interesting” :)

A few caveats on what resolved this:

  • These steps worked for me and for this specific issue only
  • The tools used have the potential to really affect the ability of your computer to work as you want it to (i.e., boot to Windows), so if you’ve never used these tools, do your research or ask for help from someone who’s been down this path before
  • To be clear: just because I’m blogging this solution doesn’t mean I’m supporting it. Again, these steps worked for me for this issue only. I’m sharing them mostly in case anyone else sees similar behavior with their backup failures, since I couldn’t find documentation anywhere else telling me how to fix this

If you’ve used WHS before and are familiar with the Dashboard, you’ll know that backup failures are pretty easy to find. Bringing up the properties of the computer to see why the failure happened may or may not provide useful diagnostic information (at least in my experience with this issue). I’d already done some testing with the backups and found that failures only occurred if I backed up the C: drive; other drives I could back up without issue.

[Screenshots: one backup that worked, one that failed]

With that information in hand I started digging around some more to see if I could find out what was causing this failure. I logged onto the server itself and in the Event Log found this failure:

Log Name: WSSG
Source: Windows Server
Date: 4/27/2013 5:31:06 PM
Event ID: 269
Task Category: Backup
Level: Error
Keywords: Classic
User: N/A
Computer: WHS
Description:
The Windows Server Client Computer Backup Service received an abort process message from LAPTOP. Reason: 25.

Not very specific is it?

If you dig further and look on the affected client’s Windows Server Event Log you’ll get some additional information that again, isn’t very helpful (IMHO):

Log Name: WSSG
Source: Windows Server
Date: 4/27/2013 5:31:10 PM
Event ID: 514
Task Category: Backup
Level: Error
Keywords: Classic
User: N/A
Computer: laptop
Description:
Backup job XX on WHS did not succeed. Reason: EspCaptureFailed, System.String[]

I did some digging on the string “Reason: EspCaptureFailed, System.String[]” and only found this one link. If you look at that link and find anything that means something to you, you’re doing better than me.

At this point I started asking around to see if anyone internally could help me figure out where to go next. It took some time, but eventually I learned that the server backup logs are located in this folder: %programdata%\microsoft\windows server\logs. Looking in that folder turns up a file called “backup-MMDDYY.log”. Opening that file and searching on failure brought me to these lines of interest:

[04/27/2013 17:05:41 15ac] The Windows Server Client Computer Backup Service received an abort process message from laptop.
Reason: 25.
[04/27/2013 17:05:41 15ac] BackupProtocol: Got abort code laptop 1 25
[04/27/2013 17:05:41 15ac] DataFile: current file isD:\ServerFolders\Client Computer Backups\S-1-5-21-863003910-3293894345-3313517167-1041.Machine.configdat and version:: 10
[04/27/2013 17:05:41 15ac] BackupSetOperation: Backup Failed - 25
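If you’d rather not scroll through the whole log by hand, here is a minimal Python sketch of the same search. The function name is my own invention, and the pattern is based only on the sample lines above; I don’t know what the middle number in the abort line means, so the sketch ignores it. Other log versions may look different.

```python
import re

# Pattern assumed from the sample log lines above, not from any documentation.
ABORT_RE = re.compile(r"BackupProtocol: Got abort code (\S+) (\d+) (\d+)")


def find_abort_codes(log_text):
    """Return (client, abort_code) pairs for each abort line in the log text."""
    results = []
    for line in log_text.splitlines():
        match = ABORT_RE.search(line)
        if match:
            # The middle field is ignored because its meaning is unknown.
            client, _unknown, code = match.groups()
            results.append((client, int(code)))
    return results


sample = """[04/27/2013 17:05:41 15ac] BackupProtocol: Got abort code laptop 1 25
[04/27/2013 17:05:41 15ac] BackupSetOperation: Backup Failed - 25"""

print(find_abort_codes(sample))
```

To use it on a real log, read the file into a string first and pass that in.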

Reading through these should lead you to the conclusion that something on the client is causing the backup failure and WHS is off the hook as the suspect. On the client computer, in the same %ProgramData%\Microsoft\Windows Server\Logs\ path, you’ll find the client log (EspCapture.log) with the real issue at hand. Open that log file and search on “failed”. You’ll find that the backup is a robocopy script that logs the parameters used and the results. Scroll down from the timeframe of your failure and you’ll find the real reason why the backups are failing; here’s a snippet from my log:

     New Dir 3 Z:\EFI\HP\BIOS\New\
New File 4.0 m 01847.bin
2013/04/27 17:30:47 ERROR 1392 (0x00000570) Copying File Z:\EFI\HP\BIOS\New\01847.bin
The file or directory is corrupted and unreadable.

        New File 256 01847.s12
0%
100%
New File 256 01847.sig
2013/04/27 17:30:47 ERROR 1392 (0x00000570) Copying File Z:\EFI\HP\BIOS\New\01847.sig
The file or directory is corrupted and unreadable.
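For a long log, a short Python sketch like the one below can pull out just the error lines. The function name is hypothetical and the pattern is based only on the snippet above, so treat it as a starting point rather than a robocopy log parser.

```python
import re

# Pattern assumed from the two error lines in the snippet above.
ERROR_RE = re.compile(r"ERROR (\d+) \((0x[0-9A-Fa-f]+)\) Copying File (.+)$")


def find_copy_errors(log_text):
    """Return (decimal_code, hex_code_as_int, path) for each copy error."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_RE.search(line.strip())
        if match:
            code, hex_code, path = match.groups()
            errors.append((int(code), int(hex_code, 16), path))
    return errors


# Raw string so the backslashes in the Windows paths are kept literally.
sample = r"""New File 4.0 m 01847.bin
2013/04/27 17:30:47 ERROR 1392 (0x00000570) Copying File Z:\EFI\HP\BIOS\New\01847.bin
The file or directory is corrupted and unreadable.
2013/04/27 17:30:47 ERROR 1392 (0x00000570) Copying File Z:\EFI\HP\BIOS\New\01847.sig
The file or directory is corrupted and unreadable."""

for code, hex_value, path in find_copy_errors(sample):
    print(code, hex(hex_value), path)
```

The decimal and hex values on each line are the same number (0x570 is 1392), so either one works when searching for the error code.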

As you can see, there are corrupt files on the drive causing the backup failure. Effectively, WHS can’t back up corrupt files, which results in the failure at hand. So how do I fix it?

The first thing you’re probably thinking is: where is the “Z:” drive coming from? (Well, that’s what I was thinking.) The short answer is that the backup process is a robocopy script that requires a drive letter to work, and the drive letter is assigned on the fly when the backup process starts. If you look in Windows Explorer or Computer Management you won’t find a “Z:” drive. Another tool called Diskpart, though, lets us assign one ourselves so we can fix the corrupted files. For ease of use I ran all of the following commands from an Administrative Command Prompt (the commands I typed follow each prompt).

C:\Windows\System32>diskpart

Microsoft DiskPart version 6.2.9200

Copyright (C) 1999-2012 Microsoft Corporation.
On computer: LAPTOP

DISKPART> list disk

  Disk ###  Status         Size     Free     Dyn  Gpt
  --------  -------------  -------  -------  ---  ---
  Disk 0    Online          465 GB      0 B        *

DISKPART> list volume

  Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
  ----------  ---  -----------  -----  ----------  -------  ---------  --------
  Volume 0     E                       DVD-ROM         0 B  No Media
  Volume 1     C                NTFS   Partition    440 GB  Healthy    Boot
  Volume 2     D   RECOVERY     NTFS   Partition     24 GB  Healthy
  Volume 3         WINRE        NTFS   Partition    400 MB  Healthy    Hidden
  Volume 4                      FAT32  Partition    260 MB  Healthy    System

Notice that there still is no drive Z: to be found? Let’s fix that so we can run another utility to fix the corrupt files:

DISKPART> select volume 4

Volume 4 is the selected volume.

DISKPART> assign letter=z

DiskPart successfully assigned the drive letter or mount point.

DISKPART> list volume

  Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
  ----------  ---  -----------  -----  ----------  -------  ---------  --------
  Volume 0     E                       DVD-ROM         0 B  No Media
  Volume 1     C                NTFS   Partition    440 GB  Healthy    Boot
  Volume 2     D   RECOVERY     NTFS   Partition     24 GB  Healthy
  Volume 3         WINRE        NTFS   Partition    400 MB  Healthy    Hidden
* Volume 4     Z                FAT32  Partition    260 MB  Healthy    System
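Diskpart can also read its commands from a text file (diskpart /s followed by the file name), which avoids typing them at the prompt. Below is a minimal Python sketch that only builds the text of such a script; it doesn’t run anything. The helper name is hypothetical, and volume 4 is just what the listing showed on my laptop, so check your own “list volume” output before trusting any number.

```python
def build_diskpart_script(volume_number, letter, action="assign"):
    """Build diskpart script text that assigns or removes a drive letter.

    This only returns text; nothing is executed.
    """
    if action not in ("assign", "remove"):
        raise ValueError("action must be 'assign' or 'remove'")
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError("letter must be a single alphabetic character")
    lines = [
        f"select volume {volume_number}",
        f"{action} letter={letter.lower()}",
        "exit",
    ]
    return "\n".join(lines) + "\n"


# Volume 4 and letter Z come from my own listing; yours will likely differ.
print(build_diskpart_script(4, "Z"))
```

Passing action="remove" produces the matching script for taking the drive letter away again once you’re done.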

Now that we have a drive letter we can use chkdsk to fix errors on the volume. First though we need to exit out of diskpart:

DISKPART> exit

Leaving DiskPart...

C:\Windows\System32>z:

FULL DISCLOSURE: I forgot to take better notes of the actual error being reported by chkdsk /f on my drive so I’m pasting a screen shot instead that was taken by the person who helped me figure this out:

[Screenshot: chkdsk /f output reporting errors on the Z: drive]

If you go back up in this write-up, you’ll find those same file names are the ones that caused the backup to fail. You’ll have to trust me when I say the following step worked for me:

On the Z: drive I ran chkdsk /f, which fixed the errors. Be sure to follow the directions on the screen; failure to do so could render the machine unbootable! In my case I’d already run the HP-provided backup utility so I could reinstall everything if needed. If you have a similar option I suggest doing that first.

Here’s what my drive looks like running this command today:

Z:\>chkdsk /f
The type of the file system is FAT32.
Cannot lock current drive.

Chkdsk cannot run because the volume is in use by another
process. Chkdsk may run if this volume is dismounted first.
ALL OPENED HANDLES TO THIS VOLUME WOULD THEN BE INVALID.
Would you like to force a dismount on this volume? (Y/N) y
Volume dismounted. All opened handles to this volume are now invalid.
Volume Serial Number is 90A0-0AD4
Windows is verifying files and folders...
File and folder verification is complete.

Windows has scanned the file system and found no problems.
No further action is required.

  268,435,456 bytes total disk space.
110,592 bytes in 5 hidden files.
630,784 bytes in 154 folders.
101,720,064 bytes in 413 files.
165,969,920 bytes available on disk.

        4,096 bytes in each allocation unit.
65,536 total allocation units on disk.
40,520 allocation units available on disk.
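As a quick sanity check on that output, the byte totals follow directly from the allocation unit counts. This is only arithmetic on the numbers chkdsk printed above, sketched in Python:

```python
# Figures copied from the chkdsk output above.
bytes_per_unit = 4096
total_units = 65536
free_units = 40520

total_bytes = bytes_per_unit * total_units
free_bytes = bytes_per_unit * free_units
used_percent = 100 * (total_units - free_units) / total_units

print(total_bytes)
print(free_bytes)
print(round(used_percent, 1))
```

The first two values match the “total disk space” and “available on disk” lines that chkdsk reports.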

You’ll want to remove the Z: drive letter again, so below are the steps to do that. We had to leave diskpart to run chkdsk /f, so going back into diskpart is the first step:

Z:\>c:

C:\Windows\System32>diskpart

Microsoft DiskPart version 6.2.9200

Copyright (C) 1999-2012 Microsoft Corporation.
On computer: LAPTOP

DISKPART> list disk

  Disk ###  Status         Size     Free     Dyn  Gpt
  --------  -------------  -------  -------  ---  ---
  Disk 0    Online          465 GB      0 B        *

DISKPART> list volume

  Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
  ----------  ---  -----------  -----  ----------  -------  ---------  --------
  Volume 0     E                       DVD-ROM         0 B  No Media
  Volume 1     C                NTFS   Partition    440 GB  Healthy    Boot
  Volume 2     D   RECOVERY     NTFS   Partition     24 GB  Healthy
  Volume 3         WINRE        NTFS   Partition    400 MB  Healthy    Hidden
  Volume 4     Z                FAT32  Partition    260 MB  Healthy    System

DISKPART> select volume 4

Volume 4 is the selected volume.

DISKPART> remove letter=z

DiskPart successfully removed the drive letter or mount point.

DISKPART> list volume

  Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
  ----------  ---  -----------  -----  ----------  -------  ---------  --------
  Volume 0     E                       DVD-ROM         0 B  No Media
  Volume 1     C                NTFS   Partition    440 GB  Healthy    Boot
  Volume 2     D   RECOVERY     NTFS   Partition     24 GB  Healthy
  Volume 3         WINRE        NTFS   Partition    400 MB  Healthy    Hidden
* Volume 4                      FAT32  Partition    260 MB  Healthy    System

DISKPART> quit

To be completely honest, I was in mostly uncharted waters running these tools to fix my drive issues (Diskpart was entirely new to me). They simply weren’t tools I’d had reason to use before. However, as you can see, they did fix the file corruption that was the source of my failed backups.

I’ve already provided feedback to the WHS group about providing better error messages so users can resolve this issue if they run into it. To be fair to the WHS team, though, I suspect this issue is fairly rare: you can tell from the corrupted file names that they were part of a BIOS update, which gets stored in a hidden OEM partition on the drive. My computer still works, so obviously the BIOS update itself was applied successfully. Since I bought the computer with Windows preinstalled, the OEM partition hardly matters day to day; it exists only for restoring the computer. Since I’d done that once already while trying to fix this, I know that works. After going through these steps I’m now backing up my laptop to WHS, too.

Here are a few other references on chkdsk and diskpart you may be interested in:

I hope this was helpful to at least someone.

Thanks.