I spent the better part of a day and a half working with a client on a rather frustrating issue deploying the SCOM agent to Linux machines. I ended up working with a few people internally until we were all able to narrow down what it was (special thanks to Kris Bash, Steve Webber, and Ken Engelhardt on this one).
Our deployment was relatively straightforward: we could install the agent, but we were unable to sign the certificate. The error we got is not unusual for SCOM, but the common solutions for it did not apply:
“The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.”
Generally, this error comes from a mismatch between the machine name the certificate was issued to and the name listed in DNS. It can also occur when there are multiple management servers in the resource pool and their certificates haven't been exchanged. Both causes are fairly well documented, but in our case the problem was something different.
There were other symptoms as well. Manually signing the certificate and managing it still failed.
After quite a bit of digging, it was apparent that the breakdown was with WSMan. You could see this in the WSMan operational log in Event Viewer, which recorded errors each time we attempted to deploy the agent through the SCOM console. Manually running the WSMan piece failed as well. For the record, this is the PowerShell syntax that SCOM uses to connect via WSMan:
|Test-WSMan -ComputerName <xxxxxx> -Authentication Basic -Credential (Get-Credential) -Port 1270 -UseSSL|
The error was exactly the same, citing a CRL lookup failure, and likewise an event showed up in the WSMan logs. WSMan, however, doesn't appear to have a native way to skip the CA check (or at least not one that I could figure out). WinRM does appear to have one, and SCOM also uses WinRM to communicate. The command below includes the three skip switches (-skipCACheck, -skipCNCheck, and -skiprevocationcheck); running it without them generates the same CRL error. I didn't need all of them. Adding only the -skiprevocationcheck parameter was enough for everything to work.
|winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_Agent?__cimnamespace=root/scx -username:<UNIX/Linux user> -password:<UNIX/Linux password> -r:https://<UNIX/Linux system>:1270/wsman -auth:basic -skipCACheck -skipCNCheck -skiprevocationcheck -encoding:utf-8|
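To review the operational log entries mentioned above without opening Event Viewer, something like the following may help. It is a rough sketch: it assumes the log in question is the standard WinRM channel (Microsoft-Windows-WinRM/Operational), and the 30-minute look-back window is arbitrary.

```powershell
# List recent critical/error/warning events from the WinRM operational log.
# Assumption: this is the channel the failed agent deployments were logged to.
Get-WinEvent -FilterHashtable @{
    LogName   = 'Microsoft-Windows-WinRM/Operational'
    Level     = 1, 2, 3                      # Critical, Error, Warning
    StartTime = (Get-Date).AddMinutes(-30)   # arbitrary look-back window
} -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, LevelDisplayName, Message |
    Format-List
```

Run it on the management server right after a failed deployment attempt so the relevant events fall inside the time window.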
We were eventually able to trace this down to a third-party product called Axway, also known as Tumbleweed. This product is used in high-security environments and aids in CRL checking and certificate authentication. It has an option to bypass CRL checking for self-signed certificates, but that wasn't working here. I suspect this is because the SCOM/Unix cert, while technically self-signed, is really issued by scxadmin (a tool on the Unix machine). The certificate's issuer is listed as SCX-Certificate instead of the machine name of the Unix/Linux machine. As a result, Tumbleweed was forcing a CRL check, and when it couldn't look up the CRL, the check failed.
Uninstalling Tumbleweed from the management server fixed the problem. That leaves us with a few possible solutions:
- Uninstall the product. Not the best choice.
- See if you can get the certificate excluded. This is ideal, though in my case no one seemed to know who was responsible for the product and could make that change.
- Temporary bypass.
Tumbleweed updates the following registry key with its own information:
HKEY_LOCAL_MACHINE\Software\Microsoft\Cryptography\OID\EncodingType 1\CertDllVerifyRevocation\Default
The default Windows DLL is cryptnet.dll.
You can remove their DLL from that key and replace it with the default, deploy the agent, and then put theirs back. Everything works.
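The temporary swap can be scripted so the original entry is always restored. The following is a minimal sketch, not a tested procedure. It assumes the provider list is stored in a value named Dll under that key, and the backup file location is hypothetical. Whether any service needs restarting for the change to take effect is not something I verified.

```powershell
# Sketch only: run elevated on the management server and try it outside production first.
# Assumption: the provider list is stored in a value named "Dll" under this key.
$key = 'HKLM:\SOFTWARE\Microsoft\Cryptography\OID\EncodingType 1\CertDllVerifyRevocation\DEFAULT'

# Capture the current data and its registry type so the restore is exact.
$regKey   = Get-Item -Path $key
$kind     = $regKey.GetValueKind('Dll')
$original = $regKey.GetValue('Dll')

# Keep a copy on disk in case the session is lost (backup path is hypothetical).
$original | Set-Content -Path (Join-Path $env:TEMP 'CertDllVerifyRevocation-Dll.bak')

try {
    # Point revocation checking at the Windows default provider.
    Set-ItemProperty -Path $key -Name Dll -Value @('cryptnet.dll') -Type $kind
    Read-Host 'Deploy the agent from the SCOM console, then press Enter to restore'
}
finally {
    # Put the third-party provider back, even if something above failed.
    Set-ItemProperty -Path $key -Name Dll -Value $original -Type $kind
}
```

The try/finally block is there so that the third-party entry goes back even if the deployment step is interrupted, which matters in an environment where that product is a security requirement.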