OpsMgr 2007: Agents stuck in Pending Management with Event ID 21016

I was going through some of the calls we've been getting lately, and this issue seems to be quite common. The description and resolution are below, so if you run into it you can save yourself some time and a phone call.

==========

When you deploy a System Center Operations Manager 2007 agent using the Discovery Wizard, the installation completes successfully but the computer remains in the Pending Management view under "Type: Installation in Progress." When you right-click the computer in Pending Management, the only command available is Reject. The Approve and Install Agent commands are unavailable. If you reject the computer in Pending Management, it reappears in the same state the next time you start the OpsMgr Health Service on the agent.

Additionally, each time the OpsMgr Health Service starts on the agent, it logs an event in the Operations Manager log that is similar to the following:

Event Type:    Error
Event Source:    OpsMgr Connector
Event Category:    None
Event ID:    21016
Date:
Time: 
User:        N/A
Computer:    AGENT
Description:  OpsMgr was unable to set up a communications channel to OPSMGRMS.momv3.local and there are no failover hosts.  Communication will resume when OPSMGRMS.momv3.local is both available and allows communication from this computer.

In this scenario, if you install an agent manually, the manually installed agent logs the OpsMgr Connector 21016 event, but never appears in the Operations Console. This occurs even when you have enabled the option to review or automatically approve manually installed agents.

Cause:

The "Installation in Progress" pending management type indicates that the agent was installed but has never successfully connected to the management server. In this state, the Approve command is unavailable by design (because it's a push install), and the only options are to reject the agent or fix the communication problem. General connectivity problems can cause this; however, the most likely cause is a Kerberos error.

These symptoms can occur when the ServicePrincipalName (SPN) for the management server's HealthService is not registered or is not registered correctly (e.g. there's a duplicate SPN). In this scenario, the agent may log the following two events immediately prior to the OpsMgr Connector 21016 event:

Event Type:    Error
Event Source:    OpsMgr Connector
Event Category:    None
Event ID:    20057
Date:      
Time:     
User:        N/A
Computer:    AGENT
Description: Failed to initialize security context for target MSOMHSvc/OPSMGRMS.momv3.local The error returned is 0x80090303(The specified target is unknown or unreachable).  This error can apply to either the Kerberos or the SChannel package.

Event Type:    Error
Event Source:    OpsMgr Connector
Event Category:    None
Event ID:    21001
Date:       
Time:    
User:        N/A
Computer:    AGENT
Description: The OpsMgr Connector could not connect to MSOMHSvc/OPSMGRMS.momv3.local because mutual authentication failed.  Verify the SPN is properly registered on the server and that, if the server is in a separate domain, there is a full-trust relationship between the two domains.
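
If you are working from an exported list of event IDs, the pattern above can be expressed as a quick triage rule. This is only a sketch of the reasoning in this post: the function name and the returned labels are made up for illustration, and it assumes the event IDs are given oldest first.

```python
# Event IDs that, per the description above, may appear right before 21016
# when the management server's HealthService SPN is missing or wrong.
SPN_HINT_EVENTS = {20057, 21001}


def triage_connector_events(event_ids: list[int]) -> str:
    """Classify a chronological list of OpsMgr Connector event IDs.

    Returns one of three illustrative labels:
      "not-applicable" - no 21016 event in the list
      "spn"            - 20057 and 21001 were the two events just before 21016
      "connectivity"   - 21016 without that pair; check general connectivity
    """
    if 21016 not in event_ids:
        return "not-applicable"
    idx = event_ids.index(21016)
    # Only the two events immediately before the first 21016 are considered.
    preceding = set(event_ids[max(0, idx - 2):idx])
    if SPN_HINT_EVENTS <= preceding:
        return "spn"
    return "connectivity"
```

A result of "spn" is a hint to check SPN registration first, not proof; the Kerberos logging described next gives a more direct confirmation.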

If you enable Kerberos event logging on the agent by using the steps in KB 262177, the agent logs events similar to the following in the System log:

Event Type:    Error
Event Source:    Kerberos
Event Category:    None
Event ID:    3
Date:
Time:
User:        N/A
Computer:    AGENT
Description:
A Kerberos Error Message was received:
on logon session
Client Time:
Server Time: 
Error Code: 0x7  KDC_ERR_S_PRINCIPAL_UNKNOWN
Extended Error:
Client Realm:
Client Name:
Server Realm: MOMV3.LOCAL
Server Name: MSOMHSvc/opsmgrms.momv3.local
Target Name: MSOMHSvc/opsmgrms.momv3.local@MOMV3.LOCAL
Error Text:
File: 9
Line: ae0
Error Data is in record data.

A network trace shows the same 0x7  KDC_ERR_S_PRINCIPAL_UNKNOWN response received by the agent computer from the KDC.
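
If you have the Kerberos event description saved as text, a small script can pull out the fields that matter here. The following is a rough sketch, assuming the description is laid out like the sample above (one "Label: value" pair per line); the function names are hypothetical.

```python
import re


def parse_kerberos_event(description: str) -> dict[str, str]:
    """Extract the error code, error name, and target name from the text
    of a Kerberos event description formatted like the sample above."""
    fields: dict[str, str] = {}
    # Matches e.g. "Error Code: 0x7  KDC_ERR_S_PRINCIPAL_UNKNOWN" on one line.
    code = re.search(r"Error Code:[ \t]*(0x[0-9A-Fa-f]+)[ \t]+(\S+)", description)
    if code:
        fields["error_code"] = code.group(1)
        fields["error_name"] = code.group(2)
    # Matches e.g. "Target Name: MSOMHSvc/opsmgrms.momv3.local@MOMV3.LOCAL".
    target = re.search(r"Target Name:[ \t]*(\S+)", description)
    if target:
        fields["target_name"] = target.group(1)
    return fields


def is_unknown_service_principal(description: str) -> bool:
    """True when the event reports KDC_ERR_S_PRINCIPAL_UNKNOWN."""
    return parse_kerberos_event(description).get("error_name") == "KDC_ERR_S_PRINCIPAL_UNKNOWN"
```

When the target name starts with MSOMHSvc/ and the error name is KDC_ERR_S_PRINCIPAL_UNKNOWN, the KDC could not find an account holding that SPN, which matches the symptoms in this post.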

These symptoms indicate SPN registration issues for the management server's HealthService. To check SPN registration, use setspn.exe from the Windows Server 2003 Support Tools. A copy of this tool is also available for download at the following URL:
https://www.microsoft.com/downloads/details.aspx?FamilyID=5fd831fd-ab77-46a3-9cfe-ff01d29e5c46&DisplayLang=en

To list SPNs for a computer or user account, use the following syntax:

setspn -L computername
-or-
setspn -L username

If the management server's OpsMgr Health Service logon account is Local System, its HealthService SPNs should be registered with the computer account in AD. If the management server's OpsMgr Health Service logon account is a user (not common), its HealthService SPNs should be registered with the user account. Be aware that the OpsMgr Health Service logon account may be different from the management server's Action Account. To see the logon account, open the Services MMC snap-in, double-click OpsMgr Health Service and click the Log On tab.

The HealthService SPNs should be as follows:

MSOMHSvc/FQDN
MSOMHSvc/COMPUTERNAME
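
If you save the setspn -L output to a text file, the check for these two SPNs can be scripted. This is a minimal sketch, not an official tool: the function names are hypothetical, it assumes the listing has one SPN per line under a "Registered ServicePrincipalNames for ..." header, and it compares names case-insensitively.

```python
def expected_spns(fqdn: str) -> list[str]:
    """Return the two HealthService SPNs (short name and FQDN) for a server."""
    host = fqdn.split(".", 1)[0]
    return [f"MSOMHSvc/{host}", f"MSOMHSvc/{fqdn}"]


def parse_setspn_listing(text: str) -> list[str]:
    """Collect SPN entries from saved `setspn -L` output, skipping the header."""
    spns = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.lower().startswith("registered serviceprincipalnames"):
            continue
        if "/" in entry:  # SPNs have the form serviceclass/host
            spns.append(entry)
    return spns


def missing_spns(fqdn: str, listing: str) -> list[str]:
    """Return the expected HealthService SPNs that the listing does not contain."""
    registered = {spn.lower() for spn in parse_setspn_listing(listing)}
    return [spn for spn in expected_spns(fqdn) if spn.lower() not in registered]
```

An empty result from missing_spns means both SPNs are present on the account you listed; it does not tell you whether the same SPN is also registered on another account.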

Example setspn -L output
=======================================================================

Using Local System for the OpsMgr Health Service logon account:

Registered ServicePrincipalNames for CN=OPSMGRMS,CN=Computers,DC=momv3,DC=local:

    MSOMHSvc/OPSMGRMS
    MSOMHSvc/OPSMGRMS.momv3.local
    HOST/OPSMGRMS
    HOST/OPSMGRMS.momv3.local

Using a domain user as the OpsMgr Health Service logon account for three different management servers:

Registered ServicePrincipalNames for CN=hservice_acct,CN=Users,DC=momv3,DC=local:
    MSOMHSvc/OPSMGRMS1
    MSOMHSvc/OPSMGRMS1.momv3.local
    MSOMHSvc/OPSMGRMS2
    MSOMHSvc/OPSMGRMS2.momv3.local
    MSOMHSvc/OPSMGRMS3
    MSOMHSvc/OPSMGRMS3.momv3.local

=======================================================================
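
Since a duplicate SPN causes the same symptoms as a missing one, it can help to compare the listings from more than one account. Here is a small sketch, assuming you have already collected the SPNs per account (for example from several setspn -L runs). The function name and the account names in the usage note are illustrative only.

```python
from collections import defaultdict


def find_duplicate_spns(spns_by_account: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return SPNs (lower-cased) that are registered on more than one account,
    mapped to the sorted list of accounts that hold them."""
    owners: dict[str, set[str]] = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            # Compare case-insensitively so MSOMHSvc/OPSMGRMS and
            # MSOMHSvc/opsmgrms count as the same name.
            owners[spn.lower()].add(account)
    return {spn: sorted(accounts) for spn, accounts in owners.items() if len(accounts) > 1}
```

For instance, if both a computer account and hservice_acct list MSOMHSvc/OPSMGRMS, the function reports that SPN with both account names, and one of the two registrations would need to be removed.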

Be aware that the OpsMgr Health Service will attempt to register its SPN every time it starts. If automatic SPN registration is failing, it indicates that the service's logon account doesn't have permission in Active Directory to register its SPN. In that case, you can run setspn -A to add the SPNs manually. You must run this command using a Domain Admin account. For example, to register the HealthService SPNs for a management server named opsmgrms.momv3.local that uses Local System as its OpsMgr Health Service logon account, run the following commands:

setspn -A MSOMHSvc/OPSMGRMS opsmgrms
setspn -A MSOMHSvc/OPSMGRMS.momv3.local opsmgrms
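
If you have several management servers to fix, you can generate the command lines rather than typing them. This sketch only builds the strings in the same form as the commands above; it does not run setspn, and the function name is hypothetical.

```python
def setspn_add_commands(fqdn: str, account: str) -> list[str]:
    """Build the two `setspn -A` command lines for a management server.

    fqdn    - the management server's fully qualified name
    account - the account that should hold the SPNs: the computer name when
              the Health Service runs as Local System, otherwise the user
              account it logs on as
    """
    host = fqdn.split(".", 1)[0]
    return [
        f"setspn -A MSOMHSvc/{host} {account}",
        f"setspn -A MSOMHSvc/{fqdn} {account}",
    ]


if __name__ == "__main__":
    # Print the commands for the example server used in this post.
    for command in setspn_add_commands("OPSMGRMS.momv3.local", "opsmgrms"):
        print(command)
```

Review the printed commands before running them, and list the account with setspn -L afterward to confirm the result.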

Hope this helps!


Michael Sadoff | Support Escalation Engineer