An in-depth look at grooming in System Center Service Manager: Part 2

    • ~ Scott Walker | Senior Support Escalation Engineer

      In the first installment in this series, I discussed a few items related to grooming. In this post I'm going to focus on the InternalJobHistory table, which keeps track of which grooming jobs ran, when they ran, and whether they completed successfully.

      Note: Part 1 can be found here. Part 3 is here.

      Throughout this post I will be using information from other posts, and in the last installment I’ll include a references section to give credit to the excellent blog writers who created content that I am referencing in this series. Now, back to Internal Job History.

      To get started, let's query the InternalJobHistory table in Service Manager to see which jobs did not complete successfully over the last 15 days:

      SELECT * FROM dbo.InternalJobHistory
      WHERE TimeStarted > GETDATE() - 15 AND StatusCode <> 1
      ORDER BY TimeStarted DESC

      Ideally, this query returns no results. If it does return rows, check whether each job that didn't complete successfully eventually did complete. Grooming jobs do fail from time to time, but if the same job completes on a later run, you generally don't need to be concerned about it.
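
      As a rough sketch of that check, the query below lists only the failed runs that have no later successful run of the same job. It assumes the table has a Command column that identifies the job (the "command" referred to later in this post) and that the column can be compared with =. Adjust the names if your schema differs.

      ```sql
      -- Failed runs in the last 15 days with no later successful run of the same command.
      -- Assumption: Command identifies the job; StatusCode = 1 means success.
      SELECT f.Command, f.TimeStarted, f.StatusCode
      FROM dbo.InternalJobHistory AS f
      WHERE f.TimeStarted > DATEADD(DAY, -15, GETDATE())
        AND f.StatusCode <> 1
        AND NOT EXISTS (
              SELECT 1
              FROM dbo.InternalJobHistory AS s
              WHERE s.Command = f.Command
                AND s.StatusCode = 1
                AND s.TimeStarted > f.TimeStarted
            )
      ORDER BY f.TimeStarted DESC;
      ```

      Any row returned here is a failure that has not yet been followed by a success, which makes it a candidate for closer investigation.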

      Personally, I like to take out the StatusCode qualifier and save the results to a tab-delimited file so I can slice and dice them in Excel. This lets us filter by each command (job) and see whether any haven't completed at all, or at least haven't completed in recent days. Here's the query and what the results look like after filtering them in Excel:

      SELECT * FROM dbo.InternalJobHistory
      WHERE TimeStarted > GETDATE() - 15
      ORDER BY TimeStarted DESC

      [image: query results filtered in Excel to show failed commands]

      All of these have a StatusCode of 0, meaning they didn't complete successfully. A 0 typically means the job timed out. To see how often this is happening, change your Excel filter to show just one of the failed commands. This snippet shows that the job I selected failed at first but has since been completing successfully (a status code of 1):

      [image: Excel results for a single command that failed and later completed with a status code of 1]

      Remember, the ones you need to be concerned with are the ones that don't ever complete, like this one where the status code is 0:

      [image: Excel results for a command that never completes, with a status code of 0]
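
      If you'd rather stay in SQL than filter in Excel, a per-command summary along these lines gives a similar view. This is only a sketch: it assumes the Command column identifies the job and treats any StatusCode other than 1 as a failure.

      ```sql
      -- Per-command summary for the last 15 days.
      -- Assumption: Command identifies the job; StatusCode = 1 means success.
      SELECT Command,
             COUNT(*) AS TotalRuns,
             SUM(CASE WHEN StatusCode = 1 THEN 1 ELSE 0 END) AS SuccessfulRuns,
             SUM(CASE WHEN StatusCode <> 1 THEN 1 ELSE 0 END) AS FailedRuns,
             -- NULL when the command never succeeded in the window
             MAX(CASE WHEN StatusCode = 1 THEN TimeStarted END) AS LastSuccessfulStart
      FROM dbo.InternalJobHistory
      WHERE TimeStarted > DATEADD(DAY, -15, GETDATE())
      GROUP BY Command
      ORDER BY SuccessfulRuns ASC, FailedRuns DESC;
      ```

      Commands with SuccessfulRuns of 0 and a NULL LastSuccessfulStart sort to the top. Those are the ones that never completed in the window.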

      Your Service Manager/Operations Manager Event logs are also a good place to look so you can correlate the results.  The following events provide confirmation of grooming failures:

      EventID: 10870
      Severity: Error
      Description:
      SqlJob execution failed.
      Mom failed executing SQL Job <Jobname>

      EventID: 10871
      Severity: Error
      Description: 
      Grooming of disabled subscription workflow watermarks failed.

      EventID: 10880
      Severity: Error
      Description:
      Service Manager CMDB Grooming failed.
      GroomingType: <Grooming Type>
      The following error was encountered: <Error>

      Once you know you have jobs that are consistently failing with a StatusCode of 0 and never completing, it's time to ask some questions:

      1. What has changed in the Service Manager environment recently?

      One change I see a lot in support that causes grooming to fall behind is a change in connectors, namely deleting a connector that was responsible for pulling in a lot of data, or deleting a large number of connectors at one time.

      2. Is there a performance issue on the SQL Server hosting the Service Manager Database?

      This may require the assistance of your DB team to investigate further.

      3. How much data is being groomed by the failing jobs?  

      This is a tough one to analyze, but you need to know the answer before you proceed with any manual grooming. If the dataset to be groomed is very large, you may need to groom it incrementally so that the built-in job doesn't run for hours, or even days, which can negatively affect performance in SQL Server and Service Manager.

      In part 3 I'll show you how to answer this last question for some of the most common grooming jobs.

    Scott Walker | Senior Support Escalation Engineer | Microsoft GBS Management and Security Division

