About DPM2010 scalability and new features

Why the fuss about 300 data sources and the ‘LDM limit’, and where do the new DPM2010 options “collocation” and “auto grow” come into play? More importantly, why should you care and what’s in it for you? Let’s start with that…

Understanding the background helps you design and configure DPM solutions in a maintainable fashion, rather than having to rework them from scratch somewhere down the road in production. I’m also convinced you would rather use 1 DPM server than 5 to protect, say, a thousand or so databases. To build a coherent picture we need to start at the base and correlate some relevant facts…

The magic ‘300’

This is a chosen limit that derives from the number of volumes that the Logical Disk Manager (LDM) allows to coexist on a Windows system. We’ll explore some details in a minute, but first note that DPM requires 2 volumes to protect a data source: 1 replica volume and 1 recovery point volume. This means that if we max out a supportable DPM server in this respect, there will be at least 600 volumes on the system.
LDM has a fixed-size data structure (the LDM database) with records (to define volumes) that occupy at least 1 ‘slot’ and sometimes 2. To cut a long story short, there are 2960 slots available; each new volume requires 3 or 4 slots, plus 1 more each time the volume is extended. Wait a minute: 300 data sources require 600 volumes, and at 3 slots each that consumes at least 1800 out of 2960. Extending all 600 volumes once takes another 600 slots, and a second round would need 600 more, which exceeds what is left. In other words, you cannot extend all replica and recovery point volumes twice on a maximum configuration. Not that this is likely to occur, but the chances increase with DPM2010, as we will see. Obviously, if there are fewer data sources, or fewer volumes need to be extended, you can extend more often. At some point we need ‘consolidation’ to reduce the consumption of slots. Okay, but how? Create a new volume large enough to hold all data of an extended volume, move the data and delete the old volume, which releases all of its extent slots. For every 3 extends no longer needed, a new volume can be created. Although we provide a migration script to move DPM data around, take it from me: this is something you would like to avoid. So, what have we learned so far?

  • You now understand where the maximum of ‘300’ data sources comes from
  • Use common sense and historical experience to predict growth for a data source
  • Don’t allocate too lean: include a growth projection, and avoid extending too often when you have many data sources
  • This is far less of a consideration with a few dozen data sources than with hundreds
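
The slot arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not something DPM or LDM exposes: the constants come from the figures in this post, while the function names and the simplification that every volume is extended the same number of times are assumptions made for illustration.

```python
# Figures taken from the text; everything else is an illustrative assumption.
LDM_SLOTS = 2960              # total slots in the LDM database
VOLUMES_PER_DATA_SOURCE = 2   # 1 replica volume + 1 recovery point volume


def slots_used(data_sources, slots_per_volume=3, extends_per_volume=0):
    """Slots consumed if every volume was extended the same number of times."""
    volumes = data_sources * VOLUMES_PER_DATA_SOURCE
    return volumes * (slots_per_volume + extends_per_volume)


def slots_free(data_sources, slots_per_volume=3, extends_per_volume=0):
    """Remaining slots; a negative result means the configuration does not fit."""
    return LDM_SLOTS - slots_used(data_sources, slots_per_volume, extends_per_volume)


# Maximum configuration (300 data sources) with 0, 1 and 2 extends per volume.
for extends in range(3):
    print(extends,
          slots_used(300, extends_per_volume=extends),
          slots_free(300, extends_per_volume=extends))
```

With 300 data sources at 3 slots per volume, this shows 1160 slots free before any extend, 560 after one round of extends, and a shortfall after the second round. Passing `slots_per_volume=4` models the less favourable case, where only 560 slots are free to begin with.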

No, you cannot shrink DPM replica volumes except by migrating volumes as mentioned earlier. You can, however, shrink recovery point volumes (as of ‘RC’ and later builds). There are multiple reasons that may call for DPM volume migration, which is why the MigrateDatasourceDataFromDPM script is provided with DPM. A good step-by-step usage sample can be found here: http://blogs.technet.com/askcore/archive/2009/06/22/how-to-use-the-migratedatasourcedatafromdpm-ps1-dpm-powershell-script-to-move-data.aspx

Sid Ramesh wrote a fantastic script that reports how many extents data sources use and which ones to migrate first.

DPM2010 new features and scalability

Clearly the above describes scalability limits and operational challenges that had to be addressed, and they have been!

“The auto grow and auto heal features”

Bluntly overestimating the disk space required to protect data is easy but expensive. It has probably been said too often, but I remind you anyway that disk storage cost is by far the largest part of a DPM investment. Even a decently sized allocation still cannot deal with unexpected, and often unintended, sudden growth that causes a backup to fail. To cope with this automatically, you can opt to use the auto grow and auto heal features, so that DPM extends the volume and repeats the failed backup without intervention. But remember this thing about 2960 slots?
Depending on the number of data sources you already protect and how often allocations need to be extended, over time you may unexpectedly run out of slots, and ‘McMurphy’ is very proficient at letting that happen at the worst possible moment. Check this post by me on how to determine what is still available if you plan a large-scale migration of volumes.
Don’t get overly concerned now: DPM2010 implements checks and warns you well ahead of time that you might be heading for this condition, and it won’t consume all slots, so you still have the ability to migrate volumes and consolidate extents. Furthermore, auto grow is fixed at 25% with a minimum of 10GB; see Prashant’s post here for more background on this.
Also, the Virtual Disk Service won’t let you extend a single volume more than 32 times. What I am saying is that these options are meant to deal automatically with a number of exceptions that even top-dog operational control cannot prevent, but they still need to be used with care and common sense.
What else have we learned?

  • Auto grow and auto heal are great features, but they are not meant to implement a thin provisioning strategy, not least because a volume can only have 32 extents
  • Consolidating extents or shrinking volumes requires migration using the standard DPM-provided script, except for recovery point volumes, which can be shrunk in DPM2010 RC and later versions
  • DPM2010 warns well ahead of time about relatively high LDM database consumption
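
To get a feel for what the auto grow rule means for a single volume, here is a minimal Python sketch. The 25%, 10GB and 32 figures come from this post; that the 25% is taken from the current volume size, and that each auto grow counts as one extend towards the limit of 32, are assumptions of this sketch, and the function names are made up for illustration.

```python
# Figures from the text; how DPM applies them internally is assumed here.
GROW_FRACTION = 0.25   # auto grow step: 25% (assumed: of the current size)
MIN_GROW_GB = 10       # but never less than 10GB
MAX_EXTENDS = 32       # limit on extends of a single volume


def auto_grow_step(size_gb):
    """Size of the next auto grow step in GB for a volume of size_gb."""
    return max(size_gb * GROW_FRACTION, MIN_GROW_GB)


def size_after_grows(initial_gb, grows):
    """Volume size in GB after a number of consecutive auto grow steps."""
    if grows > MAX_EXTENDS:
        raise ValueError("a single volume cannot be extended more than 32 times")
    size = initial_gb
    for _ in range(grows):
        size += auto_grow_step(size)
    return size


print(size_after_grows(20, 2))    # small volume: the 10GB minimum applies
print(size_after_grows(100, 1))   # larger volume: the 25% step applies
```

Under these assumptions a 20GB volume grows by 10GB per step, while a 100GB volume grows by 25GB. Each step also costs one LDM slot, which is why auto grow works for exceptions but not as a provisioning strategy.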

“The disk collocation feature”

DPM2010 supports protecting clients and SQL databases in numbers that would grossly exceed the maximum number of volumes, which would call for more DPM servers for no other reason. More data sources must therefore be protected using a single pair of replica and recovery point volumes, and that is exactly what the disk collocation feature does for SQL database, Hyper-V and client data protection. Defining a collocation strategy is a topic of its own, but it should be clear that protecting hundreds, up to say 1500-2000, small databases using only a few dozen volumes reduces the number of supportable DPM servers needed from 5 or 6 to just 1 with regard to this aspect. The ‘300’ still stands, but DPM2010 is far more efficient with co-locatable data source types. Another advantage is that we can be more relaxed in allocating disk space, because a volume serves many data sources, which levels out the variance.

  • Disk collocation strongly improves scalability when protecting many small data sources and enables better DPM server utilization
  • Disk collocation increases the chance that the accumulated data sources grow beyond anticipation, but the auto grow and auto heal features deal with that automatically
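
The effect of collocation on the server and volume count can be approximated as follows. This is a rough sketch only: the 300 limit and the 2 volumes per pair come from this post, while `databases_per_volume_pair` is a hypothetical planning parameter and not a DPM setting.

```python
import math

MAX_DATA_SOURCES_PER_SERVER = 300   # the '300' limit discussed in this post
VOLUMES_PER_PAIR = 2                # 1 replica + 1 recovery point volume


def servers_without_collocation(databases):
    """Servers needed when every database gets its own pair of volumes."""
    return math.ceil(databases / MAX_DATA_SOURCES_PER_SERVER)


def volumes_with_collocation(databases, databases_per_volume_pair):
    """Volumes needed when databases share a pair of volumes.

    databases_per_volume_pair is a hypothetical value chosen by the planner.
    """
    pairs = math.ceil(databases / databases_per_volume_pair)
    return pairs * VOLUMES_PER_PAIR


print(servers_without_collocation(1500))     # servers needed without collocation
print(volumes_with_collocation(1500, 50))    # volumes needed with 50 databases per pair
```

For 1500 small databases this gives 5 servers without collocation, against 60 volumes on one server when 50 databases share each pair of volumes.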

Help us now by testing!

We have listened and improved, but don’t take my word for it… see for yourself!

The listening continues, so that we can improve DPM even further for you!