I spent the better part of a day and a half working with a client on a rather frustrating issue deploying the SCOM agent to Linux machines. I ended up working with a few people internally until we were all able to narrow down what it was (special thanks to Kris Bash, Steve Webber, and Ken Engelhardt on this one).
Our deployment was relatively straightforward. We could install the agent, but we were unable to sign the certificate. The error we got is not unusual for SCOM, but the common solutions for it did not apply:
“The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.”
Generally, this error is caused by a mismatch between the machine name the certificate was issued to and the name listed in DNS. It can also occur when there are multiple machines in the resource pool and their certificates haven’t been exchanged. Both causes are fairly well documented, but in our case the error came from something different.
There were other symptoms as well. Manually signing the certificate and managing it still failed.
After quite a bit of digging, it was apparent that the breakdown was with WSMan. You could see this in the WSMan operational log in the Event Viewer, which logged errors each time we attempted to deploy the agent through the SCOM console. Manually running the WSMan piece failed as well. For the record, this is the PowerShell syntax that SCOM is using to connect via WSMan.
|Test-WSMan -ComputerName <xxxxxx> -Authentication Basic -Credential (Get-Credential) -Port 1270 -UseSSL|
The error was exactly the same, citing a CRL lookup failure, and likewise an event showed up in the WSMan logs. WSMan, however, doesn’t appear to have a native way to skip the CA check (or at least not one that I could figure out). WinRM does appear to have one, and SCOM will also use WinRM to communicate. In the command below, the relevant parameters are -skipCACheck, -skipCNCheck and -skiprevocationcheck. Running without them generates the same CRL error. I didn’t need all of them, though. I simply used the -skiprevocationcheck parameter and everything worked.
|winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_Agent?__cimnamespace=root/scx -username:<UNIX/Linux user> -password:<UNIX/Linux password> -r:https://<UNIX/Linux system>:1270/wsman -auth:basic -skipCACheck -skipCNCheck -skiprevocationcheck -encoding:utf-8|
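To isolate the revocation check, it can help to run the same enumerate call twice, once without and once with -skiprevocationcheck. The following is only a sketch of how that might be scripted. The function name Get-ScxEnumerateArgs and the host and account values in the usage comments are hypothetical, and keeping -skipCACheck and -skipCNCheck on both runs is my own choice so that revocation is the only thing that changes.

```powershell
# Build the argument list for the winrm enumerate call shown above.
# Get-ScxEnumerateArgs is a hypothetical helper name, not part of SCOM or WinRM.
function Get-ScxEnumerateArgs {
    param(
        [Parameter(Mandatory = $true)][string]$ComputerName,
        [Parameter(Mandatory = $true)][string]$UserName,
        [Parameter(Mandatory = $true)][string]$Password,
        [switch]$SkipRevocationCheck
    )
    $argList = @(
        'enumerate',
        'http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_Agent?__cimnamespace=root/scx',
        "-username:$UserName",
        "-password:$Password",
        "-r:https://${ComputerName}:1270/wsman",
        '-auth:basic',
        '-skipCACheck',
        '-skipCNCheck',
        '-encoding:utf-8'
    )
    # Only the revocation switch differs between the two test runs.
    if ($SkipRevocationCheck) { $argList += '-skiprevocationcheck' }
    return $argList
}

# Usage on a management server (hypothetical host and account; uncomment to run):
# $plain = @(Get-ScxEnumerateArgs -ComputerName 'linux01.contoso.com' -UserName 'scomuser' -Password '<password>')
# & winrm @plain      # fails with the CRL error if revocation checking is the culprit
# $skip = @(Get-ScxEnumerateArgs -ComputerName 'linux01.contoso.com' -UserName 'scomuser' -Password '<password>' -SkipRevocationCheck)
# & winrm @skip       # should return the SCX_Agent instance in that case
```

If the first call fails and the second succeeds, the agent and its certificate are fine and something on the Windows side is forcing the CRL lookup. Note that the password is passed in clear text on the command line, just as in the original command.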
We were eventually able to trace this down to a third-party product called Axway, also known as Tumbleweed. This product is used in high-security environments and aids in CRL checking and certificate authentication. It has an option to bypass CRL checking for self-signed certificates, but this wasn’t working here. I suspect that this is because the SCOM/Unix cert, while technically self-signed, is really being issued by scxadmin (a tool on the Unix machine). The certificate’s issuer is listed as SCX-Certificate instead of the machine name of the Unix/Linux machine. As such, Tumbleweed was forcing a CRL check, and when it couldn’t look up the CRL, it would fail.
Uninstalling Tumbleweed fixed the problem. That may not be a solution for you, and that’s certainly understandable, but this was the cause. It may be possible to exclude a specific certificate from the check, but keep in mind that this would need to be done for each cert.
For the record, the way Tumbleweed works is that it replaces our cryptography DLL with its own. This information is stored in the registry, and you can do a quick check to see if that’s the case:
HKEY_LOCAL_MACHINE\Software\Microsoft\Cryptography\OID\EncodingType 1\CertDllVerifyRevocation\Default
Our DLL is cryptnet.dll. If a different DLL is listed there, another product has taken over revocation checking.
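The registry check can also be scripted. This is a minimal sketch, not an official diagnostic. It assumes the key is spelled EncodingType 1 (no space before "Type") and that the DLL list is stored in a value named Dll under it. The function name Get-NonDefaultRevocationDll is hypothetical.

```powershell
# Return any registered revocation DLLs other than the default named in this post.
# Get-NonDefaultRevocationDll is a hypothetical helper name.
function Get-NonDefaultRevocationDll {
    param(
        [string[]]$Dll,
        [string]$DefaultDll = 'cryptnet.dll'
    )
    # Compare on file name only, case-insensitively, in case a full path is stored.
    $Dll | Where-Object { $_ -and ((Split-Path -Path $_ -Leaf) -ine $DefaultDll) }
}

# Assumption: the DLL list lives in a value named 'Dll' under this key.
$key = 'HKLM:\SOFTWARE\Microsoft\Cryptography\OID\EncodingType 1\CertDllVerifyRevocation\Default'
if (Test-Path -Path $key) {
    $registered = @((Get-ItemProperty -Path $key).Dll)
    $suspect = @(Get-NonDefaultRevocationDll -Dll $registered)
    if ($suspect.Count -gt 0) {
        Write-Output "Non-default revocation provider(s): $($suspect -join ', ')"
    } else {
        Write-Output 'No revocation provider other than cryptnet.dll is registered.'
    }
}
```

Run it from an elevated PowerShell prompt on the management server. Anything reported as non-default is worth investigating before blaming the agent certificate.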
One of our escalation engineers noted that running the Tumbleweed uninstaller doesn’t always fully remove it, so it is worth rechecking that registry value afterwards.