Why is recovery point consumption smaller than transferred data?

There are people who understand and monitor what’s happening in their environment. I’m sure you are one of them, and you may have noticed some strange-looking statistics: seen from 10,000 ft, on a per-job basis, not all data that gets transferred is also stored!? Are we losing information? No, of course not!

DPM is a CDP (continuous data protection), or ‘incremental forever’, solution, so when a job transfers 1 GB of data, that data consists of changes. DPM uses shadow copies of its storage pool, maintained with ‘copy_on_write’, to retain the changes between one recovery point and the next. These copies consume space on the recovery point volume, which is a dedicated VSS diff area volume. So far so good, but for those 1 GB of transferred data you may see only 700 MB or so consumed on the recovery point volume! This is a VSS ‘trick of the trade’ we get for free: before VOLSNAP.SYS executes the copy_on_write, it checks whether the new data actually differs from the old data, and if it does not, it skips the copy, reducing consumption of the recovery point volume. Practical cases have shown that the difference can amount to 30-50%.
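To make the arithmetic concrete, here is a toy model of a volume with one shadow copy and a copy-on-write diff area that skips the copy when a block is rewritten with identical data, as described above. It is a minimal sketch, not VOLSNAP’s implementation: the `CowVolume` class, the 4-byte block size and the "3 of 10 blocks unchanged" ratio are all hypothetical.

```python
BLOCK_SIZE = 4  # hypothetical, tiny blocks for illustration


class CowVolume:
    """Toy volume with one snapshot and a copy-on-write diff area (hypothetical model)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        # block index -> original contents preserved for the snapshot
        self.diff_area = {}

    def write(self, index, data, skip_identical=True):
        old = self.blocks[index]
        # Copy the original only once per block, and (optionally) only if the data really changes.
        if index not in self.diff_area and not (skip_identical and old == data):
            self.diff_area[index] = old
        self.blocks[index] = data


def simulate(skip_identical=True):
    original = [bytes([i]) * BLOCK_SIZE for i in range(10)]
    vol = CowVolume(original)
    # The agent ships 10 "changed" blocks; 3 of them were rewritten with identical data.
    incoming = {i: (original[i] if i < 3 else b"\xff" * BLOCK_SIZE) for i in range(10)}
    for i, data in incoming.items():
        vol.write(i, data, skip_identical)
    transferred = len(incoming) * BLOCK_SIZE
    consumed = len(vol.diff_area) * BLOCK_SIZE
    return transferred, consumed


print(simulate())                      # (40, 28): 30% less consumed than transferred
print(simulate(skip_identical=False))  # (40, 40): without the check, every block is copied
```

Only the blocks whose contents actually change end up in the diff area, which is why recovery point consumption can trail the transferred volume.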
Hey, wasn’t the agent supposed to transfer changes only? Yes, and it does, but the agent tracks changes at the block level and does not compare contents: rewriting a block with the same data is still a ‘change’. The same VOLSNAP trick could be exploited on the agent side, but it would require maintaining shadow copies between synchronizations, which is too resource-intensive in many circumstances.

The above is not to be confused with another efficiency in DPM 2007 and later: changing the same block once or ten times between two consecutive synchronizations makes no difference to the transferred data. A block is either marked as changed in the bitmap or not; how often it changed has no bearing on the change volume (for those who recall, this was indeed different with DPM 2006).
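Both agent-side behaviours above (no content comparison, and repeated writes counting once) follow from keeping a single dirty bit per block. The sketch below assumes a plain list-of-booleans bitmap; `ChangeTracker` and its methods are hypothetical names, not the DPM agent’s actual data structure.

```python
class ChangeTracker:
    """Toy block-level change tracker: one dirty bit per block (hypothetical model)."""

    def __init__(self, num_blocks):
        self.bitmap = [False] * num_blocks

    def on_write(self, index, data=None):
        # The written data is deliberately ignored: no comparison with the old contents,
        # so rewriting identical data still marks the block as changed.
        self.bitmap[index] = True

    def blocks_to_transfer(self):
        return [i for i, dirty in enumerate(self.bitmap) if dirty]

    def reset(self):
        # Called after a successful synchronization.
        self.bitmap = [False] * len(self.bitmap)


tracker = ChangeTracker(8)
for _ in range(10):
    tracker.on_write(2, b"new data")      # same block written 10 times: still one block
tracker.on_write(5, b"same as before")   # identical rewrite: still marked
print(tracker.blocks_to_transfer())       # [2, 5]
```

Setting a bit that is already set changes nothing, so the transfer size depends on how many distinct blocks changed, not on how often they changed.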
Also note that the above applies to delta synchronization by protection jobs, not to “Consistency Checks”. A consistency check is another form of synchronization that does not rely on ‘what is tracked’ (because that went south for some reason) but works out the actual differences using a two-stage checksum approach: a lightweight check to see whether anything could be different and, only if that finds evidence of a change, a heavyweight check to determine what the actual differences are.
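The two-stage idea can be sketched as a coarse per-chunk checksum pass followed by a per-block pass only inside chunks that disagree. This is an illustration under assumptions: the chunk and block sizes, the use of SHA-256 and the `consistency_check` function are hypothetical, and the post does not say which checksums or granularities DPM actually uses.

```python
import hashlib

BLOCK = 4         # hypothetical block size in bytes
CHUNK_BLOCKS = 4  # hypothetical number of blocks per coarse chunk


def digest(data):
    return hashlib.sha256(data).digest()


def split(data, size):
    return [data[i:i + size] for i in range(0, len(data), size)]


def consistency_check(source, replica):
    """Return indices of differing blocks using a lightweight pass, then a heavyweight pass."""
    # In practice each side would compute its own checksums and only digests would cross
    # the wire; both buffers are local here for brevity. Equal lengths are assumed.
    chunk_size = BLOCK * CHUNK_BLOCKS
    differing = []
    for c, (s, r) in enumerate(zip(split(source, chunk_size), split(replica, chunk_size))):
        # Stage 1 (lightweight): one checksum per chunk.
        if digest(s) == digest(r):
            continue
        # Stage 2 (heavyweight): per-block checksums, only inside suspect chunks.
        for b, (sb, rb) in enumerate(zip(split(s, BLOCK), split(r, BLOCK))):
            if digest(sb) != digest(rb):
                differing.append(c * CHUNK_BLOCKS + b)
    return differing


source = bytes(64)
replica = bytearray(source)
replica[21] = 1  # falls in block 5
replica[60] = 7  # falls in block 15
print(consistency_check(source, bytes(replica)))  # [5, 15]
```

Chunks whose coarse checksums match are skipped entirely, so the expensive fine-grained work is spent only where a difference is likely.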

Another trick affects recovery point usage when data is deleted. VSS must copy data away before it gets deleted so we can return to the point in time where it was still present, right? But it does not! Huh? Well, not right away. As long as a ‘freed’ block is not rewritten, there is no reason to copy it away, so it consumes no additional space: at the block level the data is still present on the volume, merely marked free in the file system. The copy_on_write is triggered only when freed blocks are about to be overwritten, delaying that action until it is really necessary.
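A small model of this deferred behaviour, assuming that deleting only updates the file system’s free-block bookkeeping and that copy-on-write fires on the first overwrite of a block. `DeferredCowVolume` is a hypothetical name, and the small metadata update a real delete causes is ignored.

```python
class DeferredCowVolume:
    """Toy volume where deletes are free and copy-on-write waits for an overwrite (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.free = set()     # blocks the file system considers free
        self.diff_area = {}   # block index -> original contents preserved for the snapshot

    def delete(self, index):
        # Deleting only marks the block free; the data stays where it is,
        # so the snapshot can still read it from the live volume.
        self.free.add(index)

    def write(self, index, data):
        # Copy-on-write happens only when a block is about to be overwritten.
        if index not in self.diff_area:
            self.diff_area[index] = self.blocks[index]
        self.blocks[index] = data
        self.free.discard(index)


vol = DeferredCowVolume([bytes([i]) * 4 for i in range(8)])
for i in range(4):
    vol.delete(i)
print(len(vol.diff_area))  # 0: four blocks deleted, nothing copied yet
vol.write(0, b"\xaa" * 4)
vol.write(1, b"\xaa" * 4)
print(len(vol.diff_area))  # 2: only the reused freed blocks were copied
```

Recovery point usage therefore grows when freed space is reused, not when the delete happens.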

So a momentary observation of recovery point used space may not reflect reality; it is better looked at as an average over time.

On a side note, since we are in comparing mode: the accumulated transfer of the synchronizations and the transfer of the subsequent ExpressFull do not match up either. Logs describe a transaction or command rather than holding the ‘change’ itself (except for new data, of course). Consider that “replace all of ‘this’ by ‘that’” is a command of only a few bytes but could change megabytes. The incremental synchronization needs to transfer only the ‘few dozen bytes’, whilst the ExpressFull must transfer the megabytes of changed database as well.
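A toy comparison of the two transfer sizes, assuming (as in the simplification above) that the log records the command rather than its effect. The row contents, row count and the `apply_log` helper are hypothetical; real database log records have their own format and overhead.

```python
def apply_log(rows, log):
    """Replay hypothetical 'replace all old by new' commands against the rows."""
    for old, new in log:
        rows = [r.replace(old, new) for r in rows]
    return rows


rows = ["this row mentions this value " * 4 for _ in range(1000)]
log = [("this", "that")]  # the whole "transaction": replace all 'this' by 'that'

# What an incremental sync ships: roughly the size of the command itself.
log_bytes = sum(len(old) + len(new) for old, new in log)

# What an ExpressFull ships: every piece of data the command actually changed.
after = apply_log(rows, log)
changed_bytes = sum(len(a) for r, a in zip(rows, after) if r != a)

print(log_bytes, changed_bytes)  # 8 116000
```

A handful of bytes in the log turns into over a hundred kilobytes of changed data, which is why summing the synchronizations does not predict the ExpressFull transfer.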

Stay tuned, there is more to come…