Exchange 2010: Log Truncation and Checkpoint At Log Creation in a Database Availability Group

In previous versions of Exchange, when a backup was completed, almost all log files prior to the current log file were truncated from the system.  Administrators monitoring the directory would originally see many logs, and post backup note that only a few logs remained.  In Exchange 2010 Service Pack 1 and later administrators note that multiple log files remain on the disk post backup or the appearance that no log files have truncated at all.  In many cases this leads to a belief that logs are actually not truncating successfully or that there is an issue with backups.

 

Why do we see logs remaining on disk for longer?  Exchange 2010 SP1 and newer introduces a change in the behavior of log truncation.  The changes were taken to ensure that replicated copies of databases within a database availability group always had the appropriate log files on the source server to complete an incremental resynchronization. 

 

The change to log truncation is the tracking of Checkpoint At Log Creation.  Remember that in a database availability group we can expect the checkpoint to be approximately 100 logs (or slightly more) off the current log file – this is known as checkpoint depth.  As Exchange creates new log files we stamp into the header of the new log files what log file the checkpoint was pointing at when the current log was created.  For example, let us say that log file 0xA679 (42617) was just created as the current ENN.log.  We can expect that the checkpoint at log creation value stamped within the header of this log file would be approximately 0xA16 (42517).  You can see the checkpoint at log creation value by using eseutil /ml <logfilename> to dump the header of a log file.

 

[PS] P:\DAG\DAG-DB0\DAG-DB0-Logs>eseutil /ml .\E020000A67E.log

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.02
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode...

      Base name: E02
Log file: .\E020000A67E.log
lGeneration: 42622 (0xA67E)
Checkpoint: (0xA679,8,0)
creation time: 03/11/2012 06:00:48
prev gen time: 03/11/2012 04:01:17
Format LGVersion: (7.3704.16.2)
Engine LGVersion: (7.3704.16.2)
Signature: Create time:05/02/2010 18:04:08 Rand:399094376 Computer:
Env SystemPath: d:\DAG\DAG-DB0\DAG-DB0-Logs\
Env LogFilePath: d:\DAG\DAG-DB0\DAG-DB0-Logs\
Env Log Sec size: 512 (matches)
Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
( off, 1027, 51350, 16384, 51350, 2048, 2048, 29487)
Using Reserved Log File: false
Circular Logging Flag (current file): off
Circular Logging Flag (past files): off
Checkpoint at log creation time: (0xA679,8,0)
1 d:\DAG\DAG-DB0\DAG-DB0-Database\DAG-DB0.edb
dbtime: 18078306 (0-18078306)
objidLast: 2957
Signature: Create time:05/02/2010 18:04:08 Rand:399127765 Computer:
MaxDbSize: 0 pages
Last Attach: (0xA348,9,86)
Last Consistent: (0xA346,9,B5)

      Last Lgpos: (0xa67e,252,0)

Number of database page references: 770

Integrity check passed for log file: .\E020000A67E.log

Operation completed successfully in 0.265 seconds.

 

In the previous example the checkpoint at log creation is 0xA679.

 

Within a DAG all servers that contain a replicated copy of a database report the maximum log file that is eligible for truncation.  These values are reported to the active node which subsequently calculates the maximum log file for truncation.  In simplest terms the following process occurs:

 

  • Passive copy on Node-2 reports OK to truncate log 0xA679 (42617).
  • Passive copy on Node-3 reports OK to truncate log file 0xA678 (42616)
  • Passive copy on Node-4 reports ok to truncate log 0xA679 (42617).
  • The active node determines that the best log file eligible for truncation based on the passive copies is 0xA678 (42616).  [This is essentially the minimum of all reported OK logs to truncate.]
  • The active node then looks at the checkpoint at log creation of 0xA678 (42616) and determines that value is 0xA614 (42516).  In this example that would be 100 logs off the best log reported for truncation of the passive copies.
  • The active node sets the truncation point to be log 0xA614 (42516).
  • Therefore after a successful backup logs prior to 0x614 (42516). would truncate.

 

This essentially means that 100 additional logs that would have previously truncated prior to this change do not truncate.

 

Taking into account checkpoint at log creation administrators can better understand how log files are truncated and why log files remain on disk after a backup that might have in prior versions been truncated.

 

============================

Update 5/16/2012

Corrected hex conversions in example.

============================