Moving the ConfigMgr site database to an Always On Availability Group


Back in 2012 I really wanted to be able to have the ConfigMgr database in an Always On Availability Group (I’ll refer to this as an “Availability Group” or an “AG” from now on). So, a coworker and I made that happen – sure, it wasn’t supported and auto-failover didn’t work but nevertheless we had it working.

So now that it’s legit to have the database in an AG I’m very happy. Especially since we are moving our workloads into Azure and you can’t have an Active/Passive cluster (for high availability) in Azure. You can, however, have an Availability Group in Azure (which could also be used as a DR solution)! This post will explain how to efficiently move the CM DB to an Availability Group. This process will be the same whether the AG is in Azure or not. However, if you’ve got a small database this method could be overkill and perhaps not worth the effort (depends on how "small" it is), but if you’ve got a large database this should be quite helpful.

Prerequisites

Naturally, the SQL servers should be created and configured already, including the permissions. I highly recommend creating the AG ahead of time with a dummy database so you can ensure everything in the AG is working and setup correctly without having to do that troubleshooting when trying to move your production database. Thus, I’m going to assume that the AG has already been created. As for permissions, because the real work is performed by site recovery (a database move) check that the proper permissions are in place – just like you’d do for any recovery.

The last item before we get into the steps to move your CM database into an AG, is to make sure you have the server level settings (for SQL) properly configured for CM. Specifically, the server must allow CLRs and use a defined replication text size.* To do that run the following on each of the AG nodes.

USE [master];
GO
EXECUTE sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
GO
EXECUTE sp_configure 'clr enabled', 1;
RECONFIGURE WITH OVERRIDE;
GO
EXECUTE sp_configure 'max text repl size (B)', 2147483647;
RECONFIGURE WITH OVERRIDE;
GO

*If I get my way CM will also accept a value of “-1” for the “max text repl size (B)” since that’s much easier to tell people to use and is how to tell SQL to use the max size of each data type.

When I said we’d move the CM database into an AG “efficiently” what I meant was “with as little downtime as possible”. That means we’re going to do as much as possible before even starting our downtime – this could be days before but I don’t recommend waiting more than a couple days.

Pre-Downtime Activities

The first change we need to make on the CM database is to put it into the FULL recovery model. This is a requirement for being in an Availability Group. Something not for this blog post but worthy of mentioning is the need for SQL log backups for a database in the Full recovery model. If you’re going to use an Availability Group then you need to brush up on SQL backups (full and log backups specifically) and make sure to have a plan in place.

ALTER DATABASE [CM_xxx] SET RECOVERY FULL;

Because we’ve changed the recovery model we need to take a new backup (even if you had one right before updating to full recovery model. So, take a Full backup of the CM database at this point. If you have a job that does this you can kick this job off or run the backup manually (see sample code below). However, if you do have a backup job you’ll want to disable it. After we take this backup we don’t want any new full backups taken (at least until we’ve fully moved the CM database to the AG).

BACKUP DATABASE [CM_xxx] TO DISK = N'E:\**YOUR DESIRED BACKUP LOCATION**\CM_xxx_Full.bak'
WITH COMPRESSION, STATS = 1;

Now that we’ve got a new full backup we need to backup the log as well. Take a log backup of the CM database using the wizard, a job, or use something like this:

BACKUP LOG [CM_xxx] TO DISK = N'E:\**YOUR DESIRED BACKUP LOCATION**\CM_xxx_Log1.trn'
WITH COMPRESSION, STATS = 1;

In the next step you’ll be restoring the database using the backups which were just taken. The restore can be performed locally or from a network location. If you’re going to do it from local files, copy the backup files to the nodes. If you’re going to restore the database over the network (using a UNC path rather than a local path) you can skip this copy step.

Restore these backups to both of the AG nodes. In this step you have to restore the full backup first and then the log. And this is very important, you must use the NORECOVERY option! If you don’t use the NORECOVERY option in the restore then all of this is for naught. So make sure this is not forgotten. Trust me…I forgot one time and lost 6 hours of preparation.

RESTORE DATABASE [CM_xxx] FROM DISK = N'**THE LOCATION OF THE BACKUP FILES**\CM_xxx_Full.bak'
WITH NORECOVERY, STATS = 1;
RESTORE LOG [CM_xxx] FROM DISK = N'**THE LOCATION OF THE BACKUP FILES**\CM_xxx_Log1.trn'
WITH NORECOVERY, STATS = 1;

Now, and this is also very important, you must continue to take log backups until you’re ready to make the official move to the AG. So, if you’re going to wait a day or two it would be best to have a job scheduled to take a log backup every couple of hours unless you want to remember to do this yourself.
This is important to do for several reasons. The biggest reason being, if you don’t the log will continue to grow and perhaps fill up your disk, which means SQL stops working (for this DB at least). Another reason is because taking them more often will create smaller log backup files to copy and/or restore.

You will need to name each log backup something different than what was previously used. You’ll notice that the example log backup file has a “1” appended to the filename. That was intentional so that as you take additional backups you increment that number (or do something else to make the name unique).

These log backups will need to be restored to each of the AG nodes just like the previous backups were. So, you can either copy the backups locally or perform the restore over the network just as previously done (again, making sure to use the NORECOVERY option). The restores can either wait until right before taking the downtime for the move or can be done throughout the time so there is less to do when it comes time to take the downtime. Oh, and if it isn’t clear, the restores will need to be performed in ‘oldest to newest’ order.

Downtime Activities

If you’re familiar with SQL then you’re realizing that we’ve essentially just created our own log shipping routine for our CM database – from the production server to our new AG nodes. Now that we’ve got our ‘log shipping’ activities happening and have waited until our downtime window we’re ready to begin the real move. If you have a job running to perform the log backups, it’s time to disable that job. That’s because we want to make sure to control exactly when the last log backup is taken.
Turn off the SMS services and wait 10-15 minutes. We’ll let things stop processing and wind down before calling things ready.

Once you’re comfortable with where things are at, take one last log backup and restore it to both nodes (just like previously done). At this point both AG nodes will have the latest data from the CM database. Both nodes should show the database in the “RESTORING” state. If not…you’re not ready and should turn the services back on and start over…and don’t forget to use NORECOVERY in the restore statements next time :). You can check the status in Object Explorer or with the following query.

SELECT  name
       ,state_desc
  FROM sys.databases
 WHERE name LIKE N'CM[_]___';

If things look correct at this point, then you’re ready to make one of the nodes the primary server for the CM database. To do this, on one node AND ONE NODE ONLY, run the following:

RESTORE DATABASE [CM_xxx] WITH RECOVERY;

Now this database should be in the “ONLINE” state and you can join the database to the Availability Group which you’ve already created (with the dummy database). Use the “Add Database…” wizard under “AlwaysOn High Availability --> Availability Groups --> [Your AG Name] --> Availability Databases”. In the screenshot below showing the location of this wizard you can see the dummy database I used to create the AG is named “CM_AAG” (for Azure Availability Group). I’ll be adding the CM database “CM_EA1” (for future screenshot reference).

When going through the “Add Database” wizard it is very important to choose “Join only” on the “Select Initial Data Synchronization” screen (should be the third screen to show up). If this option isn’t chosen then all the work we’ve done so far to minimize downtime is useless since SQL will have to take a backup and restore it to the other node. Since we’ve already done this via our ‘log shipping’, SQL does not need to perform a new backup and restore (a time consuming activity for larger dbs).
Finish going through the wizard and at the end your “Results” screen should look something like this:

At this point the database is part of the Availability Group on the new servers, and almost ready for the site recovery activity (database move). We’re not completely ready to perform this action because CM requires some database settings to be set for CM to work – and these settings aren’t kept in the backup/restore actions.

To ensure the required settings are set on the database, connect to the primary node and run the following SQL statements. We need to update these database settings on each database in the Availability Group because these settings aren’t replicated by SQL. Therefore, you’ll need to failover to each secondary (using SQL Server Management Studio, NOT the Failover Cluster Manager!) and run the same statements while the node is the “Primary” node. After each database has had these statements run against it while it was the “Primary” in the AG you can failover to whatever your preferred node is.

USE [master];
GO
ALTER DATABASE [CM_xxx] SET TRUSTWORTHY ON;
ALTER DATABASE [CM_xxx] SET HONOR_BROKER_PRIORITY ON;
ALTER DATABASE [CM_xxx] SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;
GO

The last step is to perform the database move via a Configuration Manager Recovery. You’ll specify the listener name for the SQL instance. ConfigMgr will take care of the rest and since we’ve configured all the database and server settings the site recovery process will not try to do something it can’t do to a database in an Availability Group – namely, put it into single user mode to change the configurations.

There you have it, nice and easy right?

One thing to note about running CM in an Availability Group: at the time of writing this blog post the AG must be in the “manual failover” mode when performing a CM upgrade, however, before and after you can (and should) run the AG in “automatic failover” mode.

Checklists

Looking for an easy checklist for your own activities? Well, look no further!

Pre-Downtime Checklist

  • Run the following on each node that will be in the Availability Group
USE [master];
GO
EXECUTE sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
GO
EXECUTE sp_configure 'clr enabled', 1;
RECONFIGURE WITH OVERRIDE;
GO
EXECUTE sp_configure 'max text repl size (B)', 2147483647;
RECONFIGURE WITH OVERRIDE;
GO
  • Set permissions as in any site recovery process
  • Create the Availability Group using a “dummy” database
  • Optional: On the current CM DB server install SQL backup jobs - full and log backup job
  • Change the CM DB to FULL recovery model
  • Take a Full database backup (then disable this job for now)
  • Take a Log backup (and ensure the job is running every couple of hours)
  • Restore the first full and log backups to both of the nodes in the Availability Group (“WITH NORECOVERY”!)
  • Continue to backup/restore logs until ready for downtime

Downtime Checklist

  • Disable the log backup job
  • Turn off SMS services
  • Wait 10-15 minutes
  • Take one last log backup and restore this on both the nodes as previously done
  • Check that the CM database is in a status of "Recovering" on both nodes
  • On one node (and one node only!) run the following statement
RESTORE DATABASE [CM_xxx] WITH RECOVERY;
  • Add the database to the Availability Group via the Availability Group “Add Database” wizard (JOIN ONLY!)
  • Run the following script on the primary node
USE [master];
GO
ALTER DATABASE [CM_xxx] SET TRUSTWORTHY ON;
ALTER DATABASE [CM_xxx] SET HONOR_BROKER_PRIORITY ON;
ALTER DATABASE [CM_xxx] SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;
GO
  • Failover to the other node (via SSMS) and run the previous script again
  • Perform a DB Move via Configuration Manager Recovery
  • Make sure you have jobs or at least a plan for SQL backups including log backups!
Comments (4)

  1. SamGeorge says:

    Super nice one Ben, thanks for sharing.

  2. flyvinni says:

    This is awesome Ben…

  3. Derek Luk says:

    I have done a SCCM AG in the lab. However, SCCM support Auto Failover (sync) only. During the Site Maintenance to repoint the Site Server to AG. The setup process will auto failover to each node to perform configuration and fail back to the primary node. Also, you mention using Site Recovery instead of Site Maintenance. Could you please clarify? As I know, SCCM currently support default instance and default port only.

  4. Fantastic article, this is bookmarked and I’m prepping my lab to try this out now.

Skip to main content