Avoid expansion if you’re snap-happy

Occasionally the purpose of snapshotting VMs in Hyper-V is misunderstood (and hence misused), and people who find that their initial guesstimate of the required disk space was insufficient try to extend the disk after the fact.

If there were an eleventh commandment, it should possibly be “Thou shalt not expand a VHD file if it hath snapshots”.

 

Scenario:

Virtual machine FOO has a 1GB dynamically expanding VHD file.

A snapshot is taken of FOO, then it is shut down.

The “Edit Disk” function in the Hyper-V Manager is used to expand the VHD file to 2GB, where the following warning is presented and dutifully ignored by the user:

[Screenshot: Edit Disk warning message]

A request is made to start FOO, which fails with the following error:

[Screenshot: VM fails to start after VHD expanded]

The following events are recorded under Applications and Services Logs > Microsoft > Hyper-V-Worker > Admin:

Event 12142: 'FOO': Failed to open virtual disk 'C:\Virtual Machines 2\Test Machines\FOO_1CE6964B-29B6-4D37-97AA-E847195217F3.avhd'. A problem was encountered opening a virtual disk in the chain of differencing disks, 'C:\Virtual Machines 2\Test Machines\FOO.vhd' (referenced by 'C:\Virtual Machines 2\Test Machines\FOO_1CE6964B-29B6-4D37-97AA-E847195217F3.avhd'): 'The size of the virtual hard disk is not valid.' (0xC03A0012). (Virtual machine ID CAC3EE73-20EC-43A5-BE9C-04177CAEFEB8)

Event 12140: 'FOO': Failed to open attachment 'C:\Virtual Machines 2\Test Machines\FOO_1CE6964B-29B6-4D37-97AA-E847195217F3.avhd'. Error: 'The chain of virtual hard disks is corrupted. There is a mismatch in the virtual sizes of the parent virtual hard disk and differencing disk.' (0xC03A0017). (Virtual machine ID CAC3EE73-20EC-43A5-BE9C-04177CAEFEB8)

Event 12010: 'FOO' Microsoft Emulated IDE Controller (Instance ID {83F8638B-8DCA-4152-9EDA-2CA8B33039B4}): Failed to Power on with Error 'The chain of virtual hard disks is corrupted. There is a mismatch in the virtual sizes of the parent virtual hard disk and differencing disk.' (0xC03A0017). (Virtual machine ID CAC3EE73-20EC-43A5-BE9C-04177CAEFEB8)

Event 12030: 'FOO' failed to start. (Virtual machine ID CAC3EE73-20EC-43A5-BE9C-04177CAEFEB8)

 

(It does not matter that the VM has no OS installed, or that one would not fit on such a small disk – this is for illustration only, and the contents of the virtual disks are irrelevant as far as Hyper-V is concerned.)

 

So what went wrong (other than the user disregarding the warning that something bad might happen)?

First, you might want to familiarise yourself with this TechNet article, which points to the document with the detailed specs for the VHD file format.

Citing relevant parts here to give some background:

The basic format of a dynamic hard disk is shown in the following table.

Dynamic Disk header fields

Copy of hard disk footer (512 bytes)

Dynamic Disk Header (1024 bytes)

BAT (Block Allocation table)

Data Block 1

Data Block 2

Data Block n

Hard Disk Footer (512 bytes)

 

Every time a data block is added, the hard disk footer must be moved to the end of the file. Because the hard disk footer is a crucial part of the hard disk image, the footer is mirrored as a header at the front of the file for purposes of redundancy.
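As a minimal sketch of the layout just described (my own illustration, not code from the spec or from any Microsoft tool), the following Python reads the mirrored copy at the start of a dynamic or differencing VHD and the footer at the end, then checks that they agree. The function names are hypothetical, and the "conectix" cookie check is based on the footer dumps shown later in this post.

```python
import os

FOOTER_SIZE = 512  # both the footer and its mirrored copy are 512 bytes


def read_footer_copies(path):
    """Return (copy at start of file, footer at end of file) as bytes."""
    with open(path, "rb") as f:
        head_copy = f.read(FOOTER_SIZE)
        f.seek(-FOOTER_SIZE, os.SEEK_END)
        tail_footer = f.read(FOOTER_SIZE)
    return head_copy, tail_footer


def footers_agree(path):
    """True if both copies are identical and start with the VHD cookie."""
    head_copy, tail_footer = read_footer_copies(path)
    return head_copy == tail_footer and head_copy[:8] == b"conectix"
```

This only applies to dynamic and differencing disks, which carry the mirrored copy; run it against a copy of the file rather than a disk that a VM has open.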

 

All hard disk images share a basic footer format. Each hard disk type extends this format according to its needs.

The format of the hard disk footer is listed in the following table.

Hard disk footer fields    Size (bytes)
Cookie                     8
Features                   4
File Format Version        4
Data Offset                8
Time Stamp                 4
Creator Application        4
Creator Version            4
Creator Host OS            4
Original Size              8
Current Size               8
Disk Geometry              4
Disk Type                  4
Checksum                   4
Unique Id                  16
Saved State                1
Reserved                   427

 

Original Size

This field stores the size of the hard disk in bytes, from the perspective of the virtual machine, at creation time. This field is for informational purposes.

Current Size

This field stores the current size of the hard disk, in bytes, from the perspective of the virtual machine.

This value is same as the original size when the hard disk is created. This value can change depending on whether the hard disk is expanded.

 

Now let’s take a look at the Hard Disk Footer of the VHD file before it was expanded (as the Reserved bytes are all zero I have truncated the view to the interesting bits):

63 6F 6E 65 63 74 69 78 00 00 00 02 00 01 00 00 00 00 00 00 00 00 02 00 13 21 5D 1A 77 69 6E 20 00 06 00 01 57 69 32 6B 00 00 00 00 40 00 00 00 00 00 00 00 40 00 00 00 08 20 10 3F 00 00 00 03 FF FF F1 2E 79 05 30 D3 82 5C 22 44 86 97 5B F9 70 B3 53 4C 00 00 00 00 00 00 00 00 00 00 00 00

The bytes to look at are the Original Size, Current Size and Disk Geometry fields (offsets 40, 48 and 56 in the footer, going by the field sizes above), as it is logical to assume these would be the bits changing when the disk is expanded.

0x0000000040000000 == 1GB; as we would expect, both sizes are set to this.

The disk geometry shows us the C/H/S values are 0x820 / 0x10 / 0x3F (convert these values to decimal and multiply by 512 bytes per sector and you get 1,073,479,680 bytes, which is just under 1GB because the geometry is rounded down to whole cylinders).
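To make the decoding above repeatable, here is a small Python sketch that pulls Original Size, Current Size and Disk Geometry out of the footer bytes quoted above. The offsets (40, 48 and 56) are simply the running totals of the field sizes in the footer table, the values are read as big-endian because that is how they appear in the dump, and the function name is my own.

```python
import struct

# First 64 bytes of the pre-expansion footer, copied from the dump above.
FOOTER = bytes.fromhex(
    "63 6F 6E 65 63 74 69 78 00 00 00 02 00 01 00 00 "
    "00 00 00 00 00 00 02 00 13 21 5D 1A 77 69 6E 20 "
    "00 06 00 01 57 69 32 6B 00 00 00 00 40 00 00 00 "
    "00 00 00 00 40 00 00 00 08 20 10 3F 00 00 00 03"
)


def decode_sizes(footer):
    """Return (original_size, current_size, (cylinders, heads, sectors))."""
    original_size, current_size = struct.unpack_from(">QQ", footer, 40)
    # Disk Geometry: 2 bytes of cylinders, 1 byte of heads, 1 byte of sectors
    cylinders, heads, sectors = struct.unpack_from(">HBB", footer, 56)
    return original_size, current_size, (cylinders, heads, sectors)


original_size, current_size, (c, h, s) = decode_sizes(FOOTER)
chs_bytes = c * h * s * 512  # capacity implied by the C/H/S geometry

print(hex(original_size), hex(current_size))  # 0x40000000 0x40000000
print((c, h, s), chs_bytes)                   # (2080, 16, 63) 1073479680
```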

Here is the Hard Disk Footer of the snapshot (or “differencing”) disk:

63 6F 6E 65 63 74 69 78 00 00 00 02 00 01 00 00 00 00 00 00 00 00 02 00 13 21 5D 9A 77 69 6E 20 00 06 00 01 57 69 32 6B 00 00 00 00 40 00 00 00 00 00 00 00 40 00 00 00 08 20 10 3F 00 00 00 04 FF FF EF F6 41 AE 88 9D 5D CA 6C 40 AE 2D 01 3C E0 5C F7 7D 00 00 00 00 00 00 00 00 00 00 00 00

No great surprises here – the footer is very similar, except that the Time Stamp is 128 seconds later, the Disk Type field is 4 (differencing) instead of 3 (dynamic), and the Unique Id is, well, unique – all of which mean that the Checksum is different too, unsurprisingly.

So what happens to the VHD’s Hard Disk Footer when we ignore the warning and expand it?

63 6F 6E 65 63 74 69 78 00 00 00 02 00 01 00 00 00 00 00 00 00 00 02 00 13 21 5D 1A 77 69 6E 20 00 06 00 01 57 69 32 6B 00 00 00 00 40 00 00 00 00 00 00 00 80 00 00 00 10 41 10 3F 00 00 00 03 FF FF F0 C5 79 05 30 D3 82 5C 22 44 86 97 5B F9 70 B3 53 4C 00 00 00 00 00 00 00 00 00 00 00 00

0x0000000080000000 == 2GB, which is the new value for Current Size.

The Disk Geometry has changed (the cylinder count is up to 0x1041), and the resulting Checksum has therefore been modified.

 

There is no reference to the differencing disk in the parent (why would there be? there can be many differencing disks all using the same base VHD), so it’s not surprising to find that the differencing disk was untouched and still believes its size to be 1GB.

So now we try to start FOO, which is pointing to the snapshot as its “Now”, and the chain of virtual disks is checked… and fails because the disks do not agree.
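The mismatch is easy to reproduce from the two footers quoted above. The sketch below is only an illustration of the comparison, assuming Current Size is the 8-byte big-endian field at offset 48; it is not how Hyper-V itself implements the chain check, and the function names are hypothetical.

```python
import struct

# First 64 bytes of the expanded base VHD footer (Current Size is now 2GB).
EXPANDED_PARENT = bytes.fromhex(
    "63 6F 6E 65 63 74 69 78 00 00 00 02 00 01 00 00 "
    "00 00 00 00 00 00 02 00 13 21 5D 1A 77 69 6E 20 "
    "00 06 00 01 57 69 32 6B 00 00 00 00 40 00 00 00 "
    "00 00 00 00 80 00 00 00 10 41 10 3F 00 00 00 03"
)

# First 64 bytes of the snapshot (differencing) disk footer, still 1GB.
CHILD = bytes.fromhex(
    "63 6F 6E 65 63 74 69 78 00 00 00 02 00 01 00 00 "
    "00 00 00 00 00 00 02 00 13 21 5D 9A 77 69 6E 20 "
    "00 06 00 01 57 69 32 6B 00 00 00 00 40 00 00 00 "
    "00 00 00 00 40 00 00 00 08 20 10 3F 00 00 00 04"
)


def current_size(footer):
    """Read the 8-byte big-endian Current Size field at offset 48."""
    return struct.unpack_from(">Q", footer, 48)[0]


def chain_sizes_match(parent_footer, child_footer):
    """True if parent and child report the same virtual disk size."""
    return current_size(parent_footer) == current_size(child_footer)


print(current_size(EXPANDED_PARENT), current_size(CHILD))  # 2147483648 1073741824
print(chain_sizes_match(EXPANDED_PARENT, CHILD))           # False
```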

 

The correct approach would be to merge the snapshots before expanding the disk, but how do we get out of this problem state?

If there’s a backup of the VHD before it was edited, this is the best time to put it back… what do you mean “the snapshot is our backup”?

NOTE: Snapshots are not, repeat NOT backups, and you are advised to avoid running production servers on snapshots for any length of time – just long enough to verify that any configuration change or patching you did has not had horrible side effects.

Basically we need to undo the changes made to the Hard Disk Footer (both of them, at the start and end of the file) – luckily there is “vhdtool” which has a “/repair” switch which can attempt this job for you:
https://code.msdn.microsoft.com/vhdtool

If you’ve mounted the VHD file in a running VM and changes were made to it, there is a possibility you will get data loss or corruption.

 

I ran the tool on the VHD and AVHD pair and here was the output:

C:\Virtual Machines 2\Test Machines>vhdtool /repair FOO.vhd FOO_1CE6964B-29B6-4D37-97AA-E847195217F3.avhd
Status: Resizing base VHD "FOO.vhd" to match the size indicated in child VHD "FOO_1CE6964B-29B6-4D37-97AA-E847195217F3.avhd".
Status: Attempting to open file "FOO.vhd"
Status: File opened, current size is 6144
Status: Attempting to open file "FOO_1CE6964B-29B6-4D37-97AA-E847195217F3.avhd"
Status: File opened, current size is 70144
Status: Opened "FOO.vhd" as base VHD file, type is dynamic-sized.
Status: Base VHD's identifier is "d3300579-5c82-4422-8697-5bf970b3534c"
Status: Opened "FOO_1CE6964B-29B6-4D37-97AA-E847195217F3.avhd" as child VHD file.
Status: Child VHD's parent identifier is "d3300579-5c82-4422-8697-5bf970b3534c"
Status: Resizing base VHD to match child size of 1073741824 bytes
Status: VHD footer generated
Status: VHD footer written to file.
Status: VHD footer written to file.
Status: Operation complete.
Status: Complete

The current sizes mentioned are the sizes of the (VHD and AVHD) files on disk in bytes, not information stored within the files. The important line is the one that reports it is resizing the base VHD file: this is the Hard Disk Footer surgery being performed on Current Size.

 

The broken VM now starts up – looking at the Hard Disk Footer for the base VHD file there are some slight differences to the original, however:

63 6F 6E 65 63 74 69 78 00 00 00 02 00 01 00 00 00 00 00 00 00 00 02 00 13 21 60 E2 68 61 63 6B 00 02 00 00 57 69 32 6B 00 00 00 00 40 00 00 00 00 00 00 00 40 00 00 00 08 20 10 3F 00 00 00 03 FF FF F0 3F 79 05 30 D3 82 5C 22 44 86 97 5B F9 70 B3 53 4C 00 00 00 00 00 00 00 00 00 00 00 00

 Time Stamp is different – given the action we are performing, not a surprise.

Creator Application is different – from 0x77696E20 to 0x6861636B… what gives?
Let’s have a look in the spec doc:

Creator Application

This field is used to document which application created the hard disk. The field is a left-justified text field. It uses a single-byte character set.

If the hard disk is created by Microsoft Virtual PC, "vpc " is written in this field. If the hard disk image is created by Microsoft Virtual Server, then "vs " is written in this field.

Other applications should use their own unique identifiers.

0x77696E20 == “win “
0x6861636B == “hack”
vhdtool is following the recommended practice :)
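If you want to check the conversion yourself, a couple of lines of Python will do it. This is just a convenience sketch; the function name is made up, and it assumes the field holds plain ASCII as in the two values above.

```python
def creator_application(value):
    """Decode the 4-byte Creator Application field (as an integer) to text."""
    return value.to_bytes(4, "big").decode("ascii")


print(repr(creator_application(0x77696E20)))  # 'win '
print(repr(creator_application(0x6861636B)))  # 'hack'
```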

Once more, Checksum is different, as it’s the one’s complement of the sum of all the other bytes in the data structure.

Creator Version has been changed from 0x00060001 (6.1) to 0x00020000 (2.0) – this is because the original creator was Windows Server 2008 R2 (NT 6.1) and the new creator is vhdtool v2.0.
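The Checksum can be verified by hand. The sketch below assumes the calculation is the one's complement of the sum of every footer byte except the four Checksum bytes at offset 64, and applies it to the repaired footer quoted above (the reserved bytes beyond the dump are zero, so they add nothing to the sum). The function name is my own.

```python
import struct

# Repaired base VHD footer as dumped above (trailing reserved bytes are zero).
REPAIRED = bytes.fromhex(
    "63 6F 6E 65 63 74 69 78 00 00 00 02 00 01 00 00 "
    "00 00 00 00 00 00 02 00 13 21 60 E2 68 61 63 6B "
    "00 02 00 00 57 69 32 6B 00 00 00 00 40 00 00 00 "
    "00 00 00 00 40 00 00 00 08 20 10 3F 00 00 00 03 "
    "FF FF F0 3F 79 05 30 D3 82 5C 22 44 86 97 5B F9 "
    "70 B3 53 4C 00 00 00 00 00 00 00 00 00 00 00 00"
)


def vhd_checksum(footer):
    """One's complement of the byte sum, skipping the Checksum field (64..67)."""
    total = sum(footer[:64]) + sum(footer[68:])
    return ~total & 0xFFFFFFFF


stored = struct.unpack_from(">I", REPAIRED, 64)[0]
print(hex(stored), hex(vhd_checksum(REPAIRED)))  # 0xfffff03f 0xfffff03f
```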

 

Why does vhdtool need the snapshot file if the changes are being made to the base VHD file?

vhdtool needs to know what to set as Current Size and Disk Geometry in the base VHD file, as this information was overwritten by the disk expansion.
If you manually trash the snapshots and point the VM configuration to the base VHD file (and the Hard Disk Footer’s Checksum field adds up okay) then you have effectively rolled back while keeping the expanded disk size – but all the data in the snapshots is obviously lost.
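To illustrate why the child is needed, here is a rough in-memory sketch of that kind of footer surgery: copy Current Size and Disk Geometry from the child footer into the base footer, then recompute the Checksum. This is my own illustration under the offsets used earlier, not vhdtool's actual implementation (vhdtool also rewrites Time Stamp, Creator Application and Creator Version, as seen above), and it deliberately does not touch any file. The function names are hypothetical.

```python
import struct

# Expanded base VHD footer and snapshot footer, as dumped earlier in the post.
EXPANDED_BASE = bytes.fromhex(
    "63 6F 6E 65 63 74 69 78 00 00 00 02 00 01 00 00 "
    "00 00 00 00 00 00 02 00 13 21 5D 1A 77 69 6E 20 "
    "00 06 00 01 57 69 32 6B 00 00 00 00 40 00 00 00 "
    "00 00 00 00 80 00 00 00 10 41 10 3F 00 00 00 03 "
    "FF FF F0 C5 79 05 30 D3 82 5C 22 44 86 97 5B F9 "
    "70 B3 53 4C 00 00 00 00 00 00 00 00 00 00 00 00"
)
CHILD = bytes.fromhex(
    "63 6F 6E 65 63 74 69 78 00 00 00 02 00 01 00 00 "
    "00 00 00 00 00 00 02 00 13 21 5D 9A 77 69 6E 20 "
    "00 06 00 01 57 69 32 6B 00 00 00 00 40 00 00 00 "
    "00 00 00 00 40 00 00 00 08 20 10 3F 00 00 00 04 "
    "FF FF EF F6 41 AE 88 9D 5D CA 6C 40 AE 2D 01 3C "
    "E0 5C F7 7D 00 00 00 00 00 00 00 00 00 00 00 00"
)


def vhd_checksum(footer):
    """One's complement of the byte sum, skipping the Checksum field (64..67)."""
    return ~(sum(footer[:64]) + sum(footer[68:])) & 0xFFFFFFFF


def restore_size_from_child(base_footer, child_footer):
    """Return a copy of base_footer with the child's size and geometry."""
    patched = bytearray(base_footer)
    # Current Size (8 bytes at offset 48) + Disk Geometry (4 bytes at offset 56)
    patched[48:60] = child_footer[48:60]
    struct.pack_into(">I", patched, 64, vhd_checksum(patched))
    return bytes(patched)


patched = restore_size_from_child(EXPANDED_BASE, CHILD)
print(struct.unpack_from(">Q", patched, 48)[0])       # 1073741824
print(hex(struct.unpack_from(">I", patched, 64)[0]))  # 0xfffff12e
```

Because only the size and geometry are changed here, the result is byte-for-byte the pre-expansion footer, checksum included. A real repair would also have to write the footer back to both places in the file, which is presumably why the tool reports "VHD footer written to file" twice.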

 

What happens if I have multiple snapshots of a VM? Do I need to do multiple edits?

No – because the virtual disk chain check fails, the VM cannot be started, nor can more snapshots be taken, after the disk expansion. Any existing snapshots should therefore all have the same Current Size value, which is the one we are restoring in the base VHD.