ConfigMgr 2012: DRS and SQL service broker certificate issues


Hi Folks,

Today's post is about the much talked about feature on ConfigMgr 2012- Data Replication Service (DRS). The replication mechanism internally uses SQL Service broker and I am discussing the three common issues with SSB that can stop the replication.

 

Issue 1: In the SQL error log we see the following:

'Connection handshake failed. Error 15581 occurred while initializing the private key corresponding to the certificate. The SQL Server errorlog and the Windows event log may contain entries related to this error. State 88.'

In the sys.transmission_queue we see this in the transmission_status column:

Service Broker login attempt failed with error: 'Connection handshake failed. An error occurred while receiving data: '10054(An existing connection was forcibly closed by the remote host.)'

Approach 1:

The history was the SQL server was installed with the system account and then later changed to a domain user account.

The problem with doing the above is that when Configuration Manager is installed it creates some internal certificates which are dependent on the master key. When the account being used to run the database server changes the new account is no longer able to 'unlock' the master key and consequently can not read the internal certificates which then cause communication between sites to fail.

In order to address this we dropped and regenerated the master key in the SQL database - this effort was hampered because the user account being used to run the SQL Service appeared not to have sufficient rights to generate a new master key. To combat this we temporarily put the user account into the local administrators group after which a new master key could be generated. We then regenerated the primary site server's certificate using the spCreateandBackupSQLCert stored procedure in the ConfigMgr database. Having successfully regenerated the SSB certificates we copied them to the CAS (parent) and Secondary (child) site and the SQL errorlogs no longer showed SQL Service Broker login failures.

Following this we left the sites to recover but overnight it appeared that some init messages from the Primary had become lost in transit - possibly cleared from the queue as a part of the spCreateandBackupSQLCert stored proc. We set the current rows with status < 3 in RCM_DRSInitilizationTracking to status = 7 to kick off the replication initialization process. This executed very quickly and replication appeared to be working as expected.

Approach 2:

To resolve this issue, give full control permissions to the account that the SQL Service is running under to the ProgramData\Microsoft\Crypto\RSA\MachineKeys directory.

 

Issue 2:

Service Broker login attempt failed with error: 'Connection handshake failed. The login 'ConfigMgrEndpointLoginCAS' does not have CONNECT permission on the endpoint. State 84.

In many cases the Endpoint itself would be missing we can create the same and give it the connect permissions:

CREATE ENDPOINT [ConfigMgrEndpoint] 
AUTHORIZATION 
[domain\account]
STATE=STARTED
AS TCP (LISTENER_PORT = 4022, LISTENER_IP = 
ALL)
FOR SERVICE_BROKER (MESSAGE_FORWARDING = ENABLED
, 
MESSAGE_FORWARD_SIZE = 5
, AUTHENTICATION = CERTIFICATE 
[ConfigMgrEndpointCert]
, ENCRYPTION = REQUIRED ALGORITHM AES)
GO

SQL > Security > logins > checked ConfigMGRENDPOINTLOGINCAS > properites > Securables and there were no permissions for the Connect for the user, Gave Grant for Connect.

 

Issue 3:

Service Broker login attempt failed with error: 'Connection handshake failed. The certificate used by the peer is invalid due to the following reason: Certificate not found. State 89.'.  [CLIENT: 10.172.20.133]

This happens when the public key cert of the other SSB endpoint login for the ConfigMgr somehow goes missing on the other SSB endpoint login.

So suppose if I have the hierarchy is as follows-

PRI

    |

  SEC

   When I run the SpDiagDRS on the PRI site, I see that it cannot find the cert for the SEC site.

Then I export the cert from SEC by using the query:

 

 
Use master
Backup Certificate ConfigMgrEndpointCert TO 
FILE='C:\SEC.CER'

 

 

 

Then copy the SEC.CER file onto the primary site C:\SEC.cer.

After this we can connect to the Primary site DB and run the below query.

 
Use CM_PRI
Exec dbo.spCreateSSBLogin @EndPointLogin='ConfigMgrEndpointLoginSEC', 
@DestSiteCode='SEC', @DestSiteCertFile='C:\SEC.cer', 
@EndpointName='ConfigMgrEndpoint'

 

Hope it helps !

Umair Khan

Support Escalation Engineer | ConfigMgr Microsoft

Comments (17)

  1. Express could cause limitation in many scenarios but You might have to engage the SQL team for further analysis.

  2. Anonymous says:

    hi… after
    ” Use CM_PRI

    Exec dbo.spCreateSSBLogin @EndPointLogin=’ConfigMgrEndpointLoginSEC’, @DestSiteCode=’SEC’, @DestSiteCertFile=’C:SEC.cer’, @EndpointName=’ConfigMgrEndpoint'”

    I have error in SQL
    ” The activated proc ‘[dbo].[spRCMActivation]’ running on queue ‘CM_RU2.dbo.ConfigMgrRCMQueue’ output the following: ‘ERROR 50000, Level 16, State 1, Procedure spRethrowError, Line 42, Message: ERROR 50000, Level 16, State 1, Procedure spRethrowError, Line 42, Message: ERROR 50000, Level 16, State 1, Procedure spGetSSBDialogHandle, Line 58, Message: Route is not defined for target site with service name ConfigMgrRCM_SiteTV2.’ “

  3. Anonymous says:

    very very helpful !!! 🙂

  4. Anonymous says:

    Great article it solved one of my issues. Have you seen issues with SQL Express with memory errors? I get this in the sql errorlog over and over again.

    spid367 Error: 701, Severity: 17, State: 65.
    spid367 There is insufficient system memory in resource pool ‘internal’ to run this query.
    spid293 Error: 18052, Severity: -1, State: 0. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.
    spid293 There is insufficient system memory in resource pool ‘internal’ to run this query.

  5. I see that you have used the SEC I the command and assume that it is the sitecode.
    From the error it seems that the route to the site is missing. You can view the same in the SQL management studio under your DB in the Service broker tab. Compare the number of routes from a working site and replicate/create the same route with the desired sitecode.

  6. turbomcp says:

    thanks a lot for sharing

  7. Shalabh says:

    Amazing work 🙂 Waiting for more DRS blogs from you 🙂

  8. Anonymous says:

    Hi Folks,

    Thanks for your feedback on the previous DRS unleashed post. As promised, this post would

  9. Nisar Shaikh says:

    Hi Umer,
    I am getting different error on my windows event log as follows. Kindly help me out how can i remove this error.
    Service Broker login attempt failed with error: ‘A previously existing connection with the same peer was detected during connection handshake. This connection lost the arbitration and it will be closed. All traffic will be redirected to the previously existing
    connection. This is an informational message only. No user action is required. State 80.’. [CLIENT: 10.61.208.37]

  10. Charu says:

    I have created a secondary site, State is showing ACTIVE, but Database Replication shown Link is being configured from last two days.
    I have found error: “Service Broker login attempt failed with error: ‘Connection handshake failed. The login ‘public’ does not have CONNECT permission on the endpoint. State 84.’. [CLIENT: 172.16.1.169]”.
    Don’t know why it is showing ‘public’ as login ID? Any Idea what needs to be done to solve this issue?

    1. That actually means the ConfigMGrEndPoint login which generally gets created with Public access is not currently linked to it. You can delete and recreate the login again and that should resolve the issue.

  11. Khi says:

    great blog. We are facing the issue since Upgrade to CB 1610 that everytime while Upgrading to latest Current Branch SSB Port gets reset by ConfigMgr. We use Custom Ports. After CAS is upgraded successfully PRI says in replictrl.log “SSB Port was not found for site DEC, so using 4022”. We have to delete the routes manually and set them to right port. After that PRI gets updated to Current Branch.

    Any clue why ConfigMgr cant find configured SSB Port?

    1. We generally store the value in the ‘SSBPort’ on site server Reg Key ‘HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SMS\SQL Server\Site System SQL Account’. Please verify if this is empty which is where it might then default.
      After that we store the value in the DB which can be queries below:

      select SD.SiteCode, RSCP.Name, RSCP.Value from RcmSqlControlProperty RSCP
      inner join RcmSqlControl RSC on RSCP.ControlID = RSC.ID
      inner join SC_SiteDefinition SD on RSC.SiteNumber = SD.SiteNumber
      where RSCP.Name = ‘SQL Server Service Broker Port’

  12. Rajat K says:

    Do we have a certificate for stand alone primary site? I am getting errors in Hman stating that the Service broker certificate is expired or missed. When i check in the database , i see certificate section stating NOT FOUND. But in registry i have the data regarding the certificate. Please share the location of the Service broker certificate.

    1. For a standalone primary with no secondaries this should not be a cause of worry. The certificate can be viewed from master.sys.certificates view.

  13. Rajat K says:

    Hierarchy : CAS and Primary.
    Upon running SPDIAGDRS ..SSB_cert, SSB_certname, SSB_login, SSB_user is missing on both the sides. Primary and on CAS too. Please suggest how shall i go ahead with this. Because in the given example one side was missing on one side and then we exported it from the other site. Whereas in my case its missing on both.

Skip to main content