Fixing troubled agents



Comments (33)

  1. Kevin Holman says:

    We really try hard to come up with ways to solve the problem without resorting to editing a SQL table directly… doing so is really unsupported and should only be done under the direct guidance (or should I say order) of PSS in a case with Microsoft. There are a few circumstances where that seems to be the only recourse… but we should exhaust all other options first.

  2. Anonymous says:

    Useful information Kevin,

    I'm also in a situation where 1-2 agents are not healthy. While checking, I found that the config.xml file is not updated, even though I cleared the cache and even allowed the system to recreate the Health Service State folder; that also failed to update the xml file. I've also noticed Temp folders are not getting created on these agents. I've reinstalled the agent as well. The agent goes into a greyed state after a while, even if I restart the service.

    In the event log I see a lot of events generated, such as: Rule/Monitor "Microsoft.SystemCenter.LearningModule.FailedInitialization.Alert" cannot be initialized and will not be loaded, and many more similar to this.

    Any help is appreciated.

  3. Sameer Dave says:

    That's a very good article, Kevin.

    I have seen one more problem where agents are hung in one of the SQL tables, especially during new installations.

    I have seen that once you delete that information from the tables, then you could install the agent again fine.

    Thanks for the great article once again

  4. TechJet 2010 says:

    Thanks Kevin, very good blog. Can you provide any additional advice or reasons why an agent's health turns grey? We get this a lot. The agents are multi-homed; could this be having an impact?

  5. DJ says:

    This blog is very useful. As to other potential issues with grey agents, check out this KB: support.microsoft.com/…/2288515.

  6. Muhammad Saad says:

    Simply log on to the DC and run the following commands:

    1. hslockdown /L

    You will see that NT AUTHORITY\SYSTEM is in the denied state.

    Then run this command to move it to the allowed state:

    hslockdown /A "NT AUTHORITY\SYSTEM"

    Cheers

    Saad

  7. Coolz203 says:

    Hi Kevin,

    Great article. I have used this advice a few times to help with agent issues. However, I have come across an issue I am having a hard time with. I have an agent deployed, and the agent is showing healthy in the Agent State view. This particular agent is on a Windows 2008 R2 server. For some reason the discovery of this Windows 2008 server is not working. I have other Windows 2008 servers that are working fine. The agent knows enough that it is on a Windows server, but all of the OS-specific monitors are not active. The logs show nothing. I am at a loss here. I have cleared the cache and repaired the agent. Any help is appreciated. Thanks.

  8. jayson says:

    Kevin,

    My problem lies on the Root Management Server. Absolutely everything is running with no issues, but for some reason the server is greyed out. I can restart the service and it is okay for a few minutes, then it goes right back into the greyed out status… Operationally everything functions correctly, but it just never looks good to see the RMS greyed out… Any ideas?

  9. shahar says:

    After upgrading System Center Essentials 2007 with the latest OS Management Pack, the agent on the owner node of the Hyper-V cluster became grayed out.

    If I move the cluster to another host server, that server becomes grayed out and the previous one (which was the current host server before) becomes healthy again.

  10. Hemant says:

    How can I flush the Health Service State and cache on multiple machines at a time? Is any command-line utility available?

  11. zahurulislam says:

    It will also apply/reapply any agent-related hotfixes in the management server’s Program Files\System Center Operations Manager 2007\AgentManagement directories.

  13. charlie says:

    "The problem might be that they are not showing up in the console at all!"… Any suggestions for diagnosing this problem? This particular agent also has no "OperationsManager". A few of the other logs are there, but many that I typically see in a client are missing. This was a manual installation of the ccm client.

  14. charlis says:

    I have this problem and I tried all the steps that I know, but the problem still exists. Can anyone help me? I am using SCOM 2012:
    "The System Center Management Health Service 1EC09CB7-1B1E-EAC9-D15A-D2C927046DE2 running on host xxx-xx-xxx.Root.net and serving management group with id {0407FB6F-896A-7389-EA01-D60C72ABBD5A} is not healthy. Some system rules failed to load."

  15. Dominique says:

    Hello Kevin,

    Excellent article for the machine, but do you have something similar for virtual machines working through collectors and not reporting?

    Thanks,
    Dom

  16. khalid khan says:

    Helpful tips.
    I have an issue. I have SCOM 2012 SP1. When I check the Windows Server computer group, it shows me Windows 7 computers as well.
    When I create a new group for servers, it also shows Windows 7 computers mixed in with the Windows servers.

  17. Nirmal says:

    Hi Kevin,

    You have been an inspiration since I started working on SCOM. Your blog has helped me a lot. Thanks a ton!

    I have been encountering an issue in my environment. We have a domain, say ‘A’, in which we have our MS, and we have a gateway server in another domain, ‘B’, which has a trust relationship with domain ‘A’. We have an agent in domain ‘C’, which has a two-way trust with domain ‘B’.

    I tried to install the agent on the server in domain ‘C’ and make it communicate with the gateway in domain ‘B’.
    The agent is not communicating with the gateway; we see event 20002 on the gateway and event 20070 on the agent machine.
    It cannot be authenticated and is getting rejected.

    Could you please help me on this issue ?

    1. Kevin Holman says:

      Just because you have a trust between B and C does not mean Kerberos is supported. If the agent in C is rejected by the GW in B, it most likely means you need to use certificates between C and B.

  18. Dual-homed agents: we have a multi-tiered environment where monitored servers from the Test and QC tiers report up to the primary Prod mgmt server, just for ease of mgmt. I would like to set their secondary/failover to be the primary mgmt server for their home tier:
    Example: BTST01 Primary = SCOMP01, Secondary/Failover would be SCOMT01
    Question: would I need the BizTalk (BT) action acct for the Prod AND Test tiers to be in the local administrators group on the BTST01 server for this to work correctly?
    I have searched for suitable answers but seem to come up short on this particular topic/issue.
    Thanks in advance for your time and assistance!
    T.S.

    1. Kevin Holman says:

      You should send me an email. This is WAY too complex and I barely understand what you are asking. kevin dot holman at microsoft dot com

      1. Morning, understood, thank you for taking the time to respond! email forthcoming.
        T.S.

  19. Dinesh Tashildar says:

    Here is another agent issue scenario.
    All servers, including the management server, are part of one common AD domain called AD.XXX.com.

    When I ran discovery on a server to install the agent, the management server discovered it as .abc.com and the agent installed successfully. But on the same server, the DNS suffix is registered as .xyz.com.

    After the successful agent installation, the agent never connects to the management server. It throws the following error in the event viewer, and the management server always shows it in the “Not Monitored” state.

    The OpsMgr Connector connected to mgmtserver.abc.com, but the connection was closed immediately after authentication occurred. The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration. Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.

  20. Brian Wright says:

    I have a very stubborn agent that is stuck and can no longer be reinstalled. I’ve uninstalled it and cleaned the registry and file system, but if I try to manually install (2016 UR4 MG) it never shows up in pending management (I’ve set my GMS Settings to review), and when I try to push, the discovery fails with “there were no computers discovered…”. I’ve deleted the agent, run Remove-SCOMDisabledClassInstance, and done state and object cleanup in SQL, but when I use the search in the console I still get all the objects that are associated with this agent. Get-SCOMPendingManagement yields no results, and I can’t reinstall this agent into my SCOM 2016 environment. Please help.

  21. Abhijeet Gore says:

    Hi Kevin Sir

  22. Abhijeet Gore says:

    Hello Sir

  23. Sreejeet says:

    Hi Kevin,
    We are having an issue with one of the SCOM monitored servers: sometimes the SCOM Agent goes into the Not Monitored state, and on other days the Windows Operating System state goes into Not Monitored while the SCOM Agent remains healthy. We tried flushing the Health Service State and repairing the SCOM Agent; at that point the Windows OS state and Agent State come back to healthy, but after a couple of days the issue recurs. Could you please advise on this issue?

  24. Jaggu1684 says:

    Hi Kevin,

    I have a couple of SCOM agents on Windows Server 2008 R2 Datacenter SP1 that can’t talk to the management server. Both the management server and the agents are in the same domain, and there are no firewalls in between. I only see 21023 events in the agents’ OpsMgr event log, as below:

    OpsMgr has no configuration for management group ‘XXXXX’ and is requesting new configuration from the Configuration Service.

    I have troubleshot the issue in every way I can think of but couldn’t figure out the root cause. Any suggestions?

    1. Kevin Holman says:

      I’d look in SQL in the AgentPendingActions table – and see if they are present there as “ghosts”.
      https://blogs.technet.microsoft.com/kevinholman/2008/09/29/agent-pending-actions-can-get-out-of-synch-between-the-console-and-the-database/

      Do these show up in “Agent Managed”? Have you tried deleting them, and letting them come back in?

      1. Jaggu1684 says:

        Actually the agents are auto approved in our environment. They don’t go to Pending management. Yes, they are showing up in Agent managed as Not Monitored. I tried deleting them in Agent managed and they are coming back in.

        1. Kevin Holman says:

          That means they are working – but cannot get config. This usually means something is broken with their parent path (GW and/or MS) where the parents need their cache flushed, OR – you are having problems generating/calculating new config (snapshot/delta failing) or you have management servers where the config service isn’t running.

          Lastly – you can see this happen where there is something orphaned in the SCOM database breaking config in general, or just config for that specific agent.

          1. Jaggu1684 says:

            I have verified and removed a couple of orphaned entries for other agents in the OpsMgr DB. But the problematic agents still don’t turn green in the Agent Managed view.

  25. Anonymous says:
    (The content was deleted per user request)