SQL Management Pack for SCOM: Low Privilege configuration in Clusters

 

You know it’s going to be a bad day when you take a support case and the person on the other end of the line has not only worked for Microsoft, but done your job (and done it well by all accounts) at Microsoft longer than you have.  That was my joy last week when I took a case about the SQL Management Pack low privilege setup not working properly.   I knew I was going to be in for a long day, and the case did not disappoint.

The case in question was due to issues with SCOM being unable to discover SQL 2012 databases using the SQL low privilege setup on clustered servers.  SQL discovery was working just fine on non-clustered instances, but on clustered servers we were only seeing the database engine (and it was showing as not monitored) and not the databases, SQL jobs, etc.

Just a bit of context on low privilege environments.  SCOM typically discovers SQL via one of two methods.  First is that the default action account happens to have sys admin rights to the SQL Server.  The second is that the Local System Account has these same rights.  The problem with this is security.  In the case of the Local System Account, a compromise of the server can lead to a very quick compromise of the SQL installation housed on said server.  While the Default Action Account is, in that respect a bit safer, this account usually has quite a bit of rights in the environment as a whole, making it an account that an intruder would be interested in compromising, which grants the intruder rights to every single SQL installation and administrative rights and most every server.  Given the average cost of a cyber event (now north of $12 million) and the fact that the SQL databases are what usually house the data that intruders are most interested in, it makes sense to reduce the attack surface of the SQL environment in any way possible.  Enter SQL low privilege.  Instructions for this setup are included in the SQL Management Pack documentation, and it includes creating three domain user accounts and giving them the basic rights that they need to monitor the SQL environment without granting SCOM administrative rights to the SQL servers themselves.  Given the cost of intrusions, it’s a simple thing that should be done in any environment that houses SQL data that would be considered confidential.

Now back to the problem.  In our environment, we could see the instances for the clustered systems, and were discovering databases for non-clustered systems housed on the same Operating System.  Also, if we made these accounts local admins, everything would discovery perfectly.  Just to make sure no mistakes were made, we walked through the SQL documentation again.  That all checked out clean.  The management pack documentation does note that for clustered environments, some of the accounts need Local Launch and Local Activation permissions (set in dcomconfig.exe), but that wasn’t the problem here.  Once we dove into the SCOM logs, we saw this:

image

I didn’t include a snapshot of the error in there because it had client data in it. (note this is a 2012 discovery error, more on that in a bit, but we were essentially seeing similar issues in 2008).  A look at the SQL logs showed no errors.  The NTFS security log on the system was also not showing any sort of permission denied events.  We checked antivirus, and for a while we thought it was that, but after disabling AV in our test environment, still no discovery data. 

We eventually had to resort to ProcMon to determine the breakdown.  Note that this screenshot doesn’t list every key, there are more.  I’ll list them below, but this is what we saw:

image

This screenshot unfortunately doesn’t seem to render well here, but a closer look at those script.exe events showed that they were the processes identified in the above shots and were being executed by the accounts specified in the SQL low privilege setup.  Apparently for clusters, these accounts need a bit more rights.  I ran this by a few people that I know who have had to do this and it appears that often times, these accounts are simply granted local admin rights, which starts to defeat the purpose of doing a low privilege setup.  At the end of the day, these accounts were trying to write to the following registry keys and failing:

 

HKLM\Software\Microsoft\Microsoft Operations Manager\3.0\Modules\{GUID} (note the GUID is the management group ID)

HKLM\Software\Microsoft\WEBEM\CMOM

HKLM\System\CurrentControlSet\Services\WinSock2\Parameters

The first key was preventing discovery of the Databases.  The second two keys were preventing discovery of SQL Agent jobs on a SQL 2008 installation.  I don’t have a clustered lab environment to test this on, so I’m somewhat curious if others have run into this before.  I’ve submitted a request to get the SQL MP documentation updated with this data, but until that happens, I hope this post will be useful.

Technorati Tags: GetSQL2012DBEngineWinServState SCOM Low Privilege SQL Agent Jobs discovery