What are the high availability options for ConfigMgr 2012?

If you have worked with SMS/ConfigMgr over the years, you understand that the product is continually evolving and adding more functionality. In many companies, ConfigMgr is, or is close to becoming, a core critical IT service. With that said, what are the high availability options for ConfigMgr 2012?

We can utilize a SQL cluster for the central administration site (the CAS) and for the primary site servers.  We cannot use a SQL cluster for the secondary site.

Configuration Manager 2012 also provides the ability to install multiple instances of several site system roles to increase availability, including the management point, distribution point, state migration point, the application catalog roles and the reporting services point. You can also use an NLB cluster for the software update point role. In ConfigMgr 2007 we provided the ability to use an NLB cluster for the management point role, but in ConfigMgr 2012 we replaced that with the ability to add multiple management point servers to the hierarchy.

But what about high availability for the CAS, primary or secondary site server roles themselves? There is no clustering or NLB support for those roles. If the CAS or a primary site server goes down, our recovery model is to use the backup/recovery process in ConfigMgr. If a secondary site server goes down, the only recovery model is to reinstall the site. You could possibly make the case for backing up the package files to avoid having to push them across the WAN again, but backing up the secondary site server itself (and the database) is unnecessary. Also, we do not support restoring any ConfigMgr components or servers using the snapshot feature that virtual server products provide. You might use it, and it might work in a recovery, but it is completely unsupported.
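To keep the per-site rules straight when writing a recovery plan, it can help to put them in a small lookup table. The sketch below is only an illustration of the rules described above. The names `SiteAvailability`, `SITE_OPTIONS` and `recovery_plan` are made up for this example and are not part of ConfigMgr or any of its tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteAvailability:
    """Availability facts for one site type (hypothetical helper type)."""
    sql_cluster_supported: bool      # can the site database live on a SQL cluster?
    site_server_clustering: bool     # clustering/NLB for the site server role itself
    recovery_model: str              # what to do when the site server goes down


# Values taken from the text above; the structure itself is an assumption.
SITE_OPTIONS = {
    "cas": SiteAvailability(True, False, "backup/recovery"),
    "primary": SiteAvailability(True, False, "backup/recovery"),
    "secondary": SiteAvailability(False, False, "reinstall"),
}


def recovery_plan(site_type: str) -> str:
    """Return the recovery model for a site type (case-insensitive lookup)."""
    return SITE_OPTIONS[site_type.lower()].recovery_model


if __name__ == "__main__":
    for name, opts in SITE_OPTIONS.items():
        print(f"{name}: SQL cluster={opts.sql_cluster_supported}, "
              f"recovery={opts.recovery_model}")
```

A table like this is easy to extend with your own notes (for example, where the package source files for a secondary site are backed up), but the values should always be checked against the supported configurations for your version.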

But what if I decide to have two primary site servers (and a CAS, since you would need it to 'bind' the two primaries together) and keep one primary (say its site code is PR2) available in case the other primary (PR1) goes down, or vice versa? In that scenario, your plan might be to temporarily assign the clients from PR1 to PR2, rebuild PR1, and then move the clients back from PR2 to PR1. Temporarily assigning clients to another primary site is possible, but may introduce issues. After the clients are assigned to the other site, they begin submitting data (inventory, compliance settings data, Endpoint-related data, software update compliance, etc.) to the newly assigned site. Once the original site is recovered and the clients are assigned back to it, the clients would exist and be viewable in both primary sites, and would remain there until that data was manually deleted or was removed automatically once it had aged out. This may introduce various issues around targeting and software delivery, and perhaps other unforeseen issues. The more effective recovery scenario would be to recover the original primary site and replicate information from the central administration site database (if a CAS has been implemented). Otherwise, restoring the site using the ConfigMgr backup/recovery process would be the recommended option.
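The duplicate-record problem is easier to see in a toy model. The sketch below is not ConfigMgr code and does not call any ConfigMgr API. It simply treats each primary site as a dictionary of client records, with a hypothetical client named PC001, and walks through the assign, recover and reassign sequence described above.

```python
class Site:
    """Toy stand-in for a primary site: stores the data each client submits."""

    def __init__(self, code):
        self.code = code
        self.records = {}  # client name -> list of submitted data items

    def receive(self, client, data):
        """Record a data submission from a client assigned to this site."""
        self.records.setdefault(client, []).append(data)


def sites_holding(client, sites):
    """Return the site codes that hold at least one record for the client."""
    return sorted(site.code for site in sites if client in site.records)


pr1, pr2 = Site("PR1"), Site("PR2")

# Normal operation: the client reports to its assigned site, PR1.
pr1.receive("PC001", "hardware inventory")

# PR1 goes down and the client is temporarily assigned to PR2.
pr2.receive("PC001", "hardware inventory")

# PR1 is recovered and the client is assigned back to it.
pr1.receive("PC001", "software update compliance")

# The client now has records in both primary sites.
duplicates = sites_holding("PC001", [pr1, pr2])
print(duplicates)
```

In this model nothing ever removes PC001 from PR2, which mirrors the point above: the record stays in the temporary site until someone deletes it or it ages out.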