Today's post is about the much talked about feature on ConfigMgr 2012- Data Replication Service (DRS). The replication mechanism internally uses SQL Service broker and I am discussing the three common issues with SSB that can stop the replication.
Issue 1: In the SQL error log we see the following:
'Connection handshake failed. Error 15581 occurred while initializing the private key corresponding to the certificate. The SQL Server errorlog and the Windows event log may contain entries related to this error. State 88.'
In the sys.transmission_queue we see this in the transmission_status column:
Service Broker login attempt failed with error: 'Connection handshake failed. An error occurred while receiving data: '10054(An existing connection was forcibly closed by the remote host.)'
The history was the SQL server was installed with the system account and then later changed to a domain user account.
The problem with doing the above is that when Configuration Manager is installed it creates some internal certificates which are dependent on the master key. When the account being used to run the database server changes the new account is no longer able to 'unlock' the master key and consequently can not read the internal certificates which then cause communication between sites to fail.
In order to address this we dropped and regenerated the master key in the SQL database - this effort was hampered because the user account being used to run the SQL Service appeared not to have sufficient rights to generate a new master key. To combat this we temporarily put the user account into the local administrators group after which a new master key could be generated. We then regenerated the primary site server's certificate using the spCreateandBackupSQLCert stored procedure in the ConfigMgr database. Having successfully regenerated the SSB certificates we copied them to the CAS (parent) and Secondary (child) site and the SQL errorlogs no longer showed SQL Service Broker login failures.
Following this we left the sites to recover but overnight it appeared that some init messages from the Primary had become lost in transit - possibly cleared from the queue as a part of the spCreateandBackupSQLCert stored proc. We set the current rows with status < 3 in RCM_DRSInitilizationTracking to status = 7 to kick off the replication initialization process. This executed very quickly and replication appeared to be working as expected.
To resolve this issue, give full control permissions to the account that the SQL Service is running under to the ProgramData\Microsoft\Crypto\RSA\MachineKeys directory.
Service Broker login attempt failed with error: 'Connection handshake failed. The login 'ConfigMgrEndpointLoginCAS' does not have CONNECT permission on the endpoint. State 84.
In many cases the Endpoint itself would be missing we can create the same and give it the connect permissions:
SQL > Security > logins > checked ConfigMGRENDPOINTLOGINCAS > properites > Securables and there were no permissions for the Connect for the user, Gave Grant for Connect.
Service Broker login attempt failed with error: 'Connection handshake failed. The certificate used by the peer is invalid due to the following reason: Certificate not found. State 89.'. [CLIENT: 10.172.20.133]
This happens when the public key cert of the other SSB endpoint login for the ConfigMgr somehow goes missing on the other SSB endpoint login.
So suppose if I have the hierarchy is as follows-
When I run the SpDiagDRS on the PRI site, I see that it cannot find the cert for the SEC site.
Then I export the cert from SEC by using the query:
Then copy the SEC.CER file onto the primary site C:\SEC.cer.
After this we can connect to the Primary site DB and run the below query.
Hope it helps !
Support Escalation Engineer | ConfigMgr Microsoft