Health Service problem on Windows 2000 Agent

I recently ran into an interesting issue with a customer.  A Windows 2000 Agent (running OpsMgr SP1) was not able to process configuration due to problems creating/using the self-signed certificate that the Health Service uses.  This is not a Gateway or DMZ scenario; this is the certificate that all agents create and use.  At first, we were seeing the following errors in the OpsMgr Event Log:

 

Event ID: 1220
Description:
Received configuration cannot be processed. Management group "<MANAGEMENT_GROUP_NAME>". The error is Cannot find the certificate and private key for decryption.
(0x8009200B).

Event ID: 21021
Description:
No certificate could be loaded or created. This Health Service will not be able to communicate with other health services. Look for previous events in the event log for more detail.

 

After removing/reinstalling the agent, the Health Service would not start, and the following error was seen in the System Event Log:

 

Event ID: 7024
Description:
The OpsMgr Health Service service terminated with service-specific error 2148073494.

 

This error is 0x80090016 in hex (NTE_BAD_KEYSET), which maps to "Keyset does not exist".
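If you want to do the same lookup yourself, the decimal value in the 7024 event just needs converting to hex. Here is a minimal Python sketch of that conversion. The function names and the small lookup table are my own for illustration (the table only holds the two codes mentioned in this post), so treat it as a starting point rather than a complete reference:

```python
def to_hresult_hex(service_error):
    """Format a service-specific error as an 8-digit hex code.

    Masking with 0xFFFFFFFF also handles tools that report the
    same code as a negative signed 32-bit number.
    """
    return "0x{:08X}".format(service_error & 0xFFFFFFFF)


# Only the two codes seen in this post; not an exhaustive table.
KNOWN_CODES = {
    "0x80090016": "NTE_BAD_KEYSET: Keyset does not exist",
    "0x8009200B": "CRYPT_E_NO_KEY_PROPERTY: Cannot find the certificate "
                  "and private key for decryption",
}


def describe(service_error):
    """Return (hex code, description) for a service-specific error."""
    code = to_hresult_hex(service_error)
    return code, KNOWN_CODES.get(code, "unknown (look it up in winerror.h)")


if __name__ == "__main__":
    print(describe(2148073494))
```

Any code that is not in the table comes back as "unknown", in which case the hex value is what you would search for.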

 

This looks to me like the Health Service is having problems creating its self-signed certificate.  To investigate this:

 

Check to see if we have the certificate in the certificate store:

  1. Start – Run – MMC.exe
  2. File – Add/Remove Snap-in
  3. Add – Certificates – Add
  4. Computer Account – Next – Local Computer – Finish

Here’s what it looks like when the cert is there:

[Screenshot: the Health Service certificate shown in the Certificates snap-in]

 

If the certificate is there and we still think we’re having problems with it, there’s no harm in deleting it; it should be re-created when the Health Service starts.  In our case, since we had uninstalled the agent, the certificate had already been removed, and when we tried to start the Health Service, it failed to create a new one.  So, the next step is to verify that the Health Service is running under the context of the Local System account:

[Screenshot: the Health Service configured to log on as the Local System account]

 

If it is, then the next step is to verify that the SYSTEM account and the Administrators group have Full Control of the following directories:

 

%SystemDrive%\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\MachineKeys

%SystemDrive%\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\S-1-5-18

 

Also, verify that the Administrators group is the owner of these directories.  This is necessary for the Local System account to be able to create the certificate.
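If you have several agents to check, it can help to generate the exact commands instead of retyping the paths. The sketch below is only an illustration: the function names are mine, the default drive letter "C:" is an assumption (substitute your actual system drive), and it only builds the cacls command lines without running them. Note that cacls shows the permissions but not the owner, so ownership still has to be checked on the Security tab of the folder properties.

```python
import ntpath

# Relative location of the machine key folders, as listed above.
CRYPTO_RSA = (r"Documents and Settings\All Users\Application Data"
              r"\Microsoft\Crypto\RSA")


def key_directories(system_drive="C:"):
    """Return the two directories to check (drive letter is an assumption)."""
    base = ntpath.join(system_drive + "\\", CRYPTO_RSA)
    return [ntpath.join(base, "MachineKeys"),
            ntpath.join(base, "S-1-5-18")]


def cacls_commands(system_drive="C:"):
    """Build one quoted 'cacls' command per directory to display its ACL."""
    return ['cacls "{}"'.format(path) for path in key_directories(system_drive)]


if __name__ == "__main__":
    for command in cacls_commands():
        print(command)
```

Run the printed commands on the agent and look for Full Control (shown as F) next to the entries for SYSTEM and Administrators.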

 

So, everything above checked out fine in my customer’s environment.  While researching this, I came across another customer case where some other service was failing to create a certificate because a service named “Protected Storage Service” was not running.  I tested on a Windows Server 2003 Agent and could not reproduce the problem…we created the self-signed cert just fine without the Protected Storage service running.  Then, I remembered that my customer’s problem was on a Windows 2000 Agent, and the other customer case I was reading was quite old, so likely from Windows 2000.

Anyway, we checked the Protected Storage Service and it was disabled.  We enabled and started it, and the Health Service then started without error, created its certificate, and was talking to the Management Server in no time.

So, if you see any of the above errors on a Windows 2000 Agent, verify that the Protected Storage Service is enabled and started.
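One quick way to check this without opening the Services console is to run "net start" on the agent, which lists the services that are currently started, and look for Protected Storage in the output. Below is a small Python sketch of that check. The function name is hypothetical, and the sample text is a made-up illustration of the output format, not captured from a real machine; in practice you would save the real output (for example with "net start > started.txt") and pass its contents in.

```python
def is_service_started(net_start_output, display_name="Protected Storage"):
    """Return True if display_name appears as its own line in 'net start' output.

    The comparison ignores indentation and letter case, and requires the
    whole line to match so that similarly named services do not count.
    """
    wanted = display_name.strip().lower()
    return any(line.strip().lower() == wanted
               for line in net_start_output.splitlines())


# Illustrative sample only; real output lists many more services.
SAMPLE_OUTPUT = """These Windows 2000 services are started:

   Event Log
   Protected Storage
   Remote Procedure Call (RPC)

The command completed successfully.
"""

if __name__ == "__main__":
    if is_service_started(SAMPLE_OUTPUT):
        print("Protected Storage is started")
    else:
        print("Protected Storage is NOT started")
```

Keep in mind that "net start" only shows running services, so a missing entry does not tell you whether the service is stopped or disabled; the Services console shows the startup type.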