Changes in the TimerJobHistory table maintenance introduced in November 2017 CU for SharePoint 2013

[The blog post below is based on the information collected by Derek Smay and Alexander Peter, two Microsoft Premier Field Engineers located in the US]

Overview

One of the changes in the November 2017 CU for SharePoint 2013 affects the maintenance of the [dbo].[TimerJobHistory] table in the SharePoint configuration database, which can experience significant growth if it is not trimmed on a regular schedule.

Historically SharePoint 2013 preserved the timerjob history for all timerjobs which ran in the last 7 days. Starting with November 2017 CU for SharePoint 2013 (build 15.0.4981.1002), the maintenance of the [dbo].[TimerJobHistory] table has been updated in a way that it will never keep more than 10,000 job records.

To maintain the [dbo].[TimerJobHistory] table SharePoint only relied on the Delete Job History timer job, which is responsible to purge records older than 7 days. The default schedule is to run once a day but it is possible to change the schedule by updating the frequency in the Central Administration Site. The Delete Job History timer job, invokes the [dbo].[proc_DeleteOldTimerJobHistory] stored procedure which does the actual work.  If this timer job is not enabled or does not complete successfully, the [dbo].[TimerJobHistory] table can grow to hundreds of millions of records and hundreds of Gigabytes, which can impact performance and availability.

With November 2017 CU, an additional stored procedure, [dbo].[proc_DeleteOldTimerJobHistoryByRows], was added, which limits the number of rows in the [dbo].[TimerJobHistory] table table to 10,000 records. Be aware that both of these stored procedures actually maintain three different tables: [dbo].[TimerJobHistory], [dbo].[TimerScheduledJobs] and [dbo].[TimerRunningJobs]. Of these three, the largest table is [dbo].[TimerJobHistory].

The new stored procedure is not called from any specific timerjob – it is executed as the final step of each single timer job running on your system. That means the maintenance of the [dbo].[TimerJobHistory] does no longer rely the successful execution and on proper schedule of a single timerjob.

This is the logic built into the execution of SharePoint timer jobs after November 2017 CU:

  1. Execute the logic of the timer job including the execution of one or more stored procs to complete intended task of the timer job
  2. Execute the [dbo].[proc_AddTimerJobHistory] proc to document the execution of the completed timer job to the [dbo].[TimerJobHistory] table
  3. Execute the [dbo].[proc_DeleteOldTimerJobHistory] proc to limit the number of records in [dbo].[TimerJobHistory] table to a maximum of 10,000 by removing the oldest records.

This new enhancement eliminates the possibility of the SharePoint configuration database becoming bloated.  By imposing a ceiling on the [dbo].[TimerJobHistory] record count, a number of benefits are recognized such as disk space consumption for the database and its backups, I/O activity, backup impact on network throughput as well as maintenance activities (i.e. CHECKDB, Index, etc).

After running November 2017 CU you should never have to worry about [dbo].[TimerJobHistory] to exceed 10,000 rows. And indeed, this will clearly keep this table in check BUT be careful if there was a problem with the maintenance of your [dbo].[TimerJobHistory] table before installing of November 2017 (or later) CU:

Potential Problem

After installing November 2017 CU each and every timerjob will be responsible to maintain the size of the [dbo].[TimerJobHistory] table. In case that the [dbo].[TimerJobHistory] had several million rows due to the fact that the Delete Job History timer job was disabled or did not work correctly the new maintenance stored procedure will take a very, very long time to complete. As several SharePoint timerjobs will start and end each second that will cause this stored procedure to run in parallel on your SQL server in multiple different SQL sessions.

In this case you can see your SQL server to run on high CPU on each of the available cores till the [dbo].[TimerJobHistory] has been trimmed to the new maximum size.

To prevent this from happening you should ensure that the the Delete Job History job has been running successfully. If you have a SharePoint Config database that is large in size, check the row count and size of [dbo].[TimerJobHistory] table.

SharePoint 2016?

The code change introduced in November 2017 CU for SharePoint 2013 is not included in SharePoint 2016 and is currently not planned to be ported to SharePoint 2016. The reason is that the maintenance of the [dbo].[TimerJobHistory] table in SharePoint 2016 is done differently than in SharePoint 2013. E.g. the Delete Job History job does bulk deletes of 1000 records rows rather than on row after the other and it only keeps records for 3 days compared to 7 days in SharePoint 2013.

But be aware that the proper maintenance of the the [dbo].[TimerJobHistory] table in SharePoint 2016 requires that the Delete Job History job runs successful on a regular basis.

11 Comments


  1. Hooray! Another thing should also be addressed is the number of rows in table TimerScheduledJobs and TimerRunningJobs, in some extreme cases they can be huge too.
    For instance, in one SP2013 farm we have 3000+ site collections and RecordPoint creates 3 timer jobs for each site collection, two of these timer jobs are scheduled to run every minute. It has a huge impact on the farm performance and also cause large SQL transaction logs piling up.

    Reply

  2. Hi Stefan,
    I have deploy CU February 2018, which includes the TimerJobHistory table maintenance changes and I have noticed the 10000 record limit doesn’t keep much in terms of timeframe (just a few hours worth of jobs in my case as we have 5 web application running in the farm). If timer jobs fail, it becomes difficult to troubleshoot. Any recommendations for flagging issues with jobs, other than monitoring ULS?
    Thanks in advance,
    Paul

    Reply

      1. OK! Thanks for the heads-up!
        Pau;

        Reply

      2. Hi Stefan
        We’re currently facing a huge issue on our productive environment (12 SP servers, 30’000 users) after patching to CU March 18 yesterday evening. PSConfig ran on 4 of 12 servers yet and took hours each time. The configDB timerjobhistory table still contains more than 70mio items, decreasing verly slowely. SQL server is under heavy load and we fear that the system goes down completely.
        Is there any “quick fix” avilable on this issue? Can we temporarely stop timer services or other jobs on the sharepoint servers to overcome the issue?
        Advice is much apprechiated.
        Best regards
        Andy

        Reply

  3. We run into an issue with the TimerJobHistory table with the following notification.

    Could not allocate space for object ‘dbo.TimerJobHistory’.’IX_TimerJobHistory_Id’ in database ‘SP_Config’ because the ‘PRIMARY’ filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

    We then realized that the job (once a day) fails with a timeout after 5 minutes. We confirmed that the job really runs but the amount of records in the table was enorm (about 1 milliard and a size of 300 GB only this table!). Every time the job can delete about 1,5 million records and then times out.

    We configured now the job to run every 15′ in order to reduce the amount of logs. We believe that only the query takes long enough and then there is no time to execute the deletion. Can it be? The question here is, if we are producing more logs daily than the delete job can handle, can we reduce the amount of logs being written? Definately we can run the job more than once per day but I am not quite sure if that is normal.

    Lastly when we have the table at the end of day cleaned, can we shrink the Config-DB? I am really not fan at all of touching this db at all.

    Any info, comments or help is appreciated!

    Reply

    1. Hi Johny,
      if you need assistance to shrink the table you need to open a support case.
      Support can provide details how to shrink this quickly but this information needs to be tracked internally to ensure that the product group is informed.
      Thanks,
      Stefan

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.