Log file shipping in SCR "deep dive"

 

OK, so let's get started with the gears and cogs of how it works.

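With Exchange 2007 we implemented a new feature called log shipping, which is performed by the Microsoft Exchange Replication service. The service is a managed code application that runs in the Microsoft.Exchange.Cluster.ReplayService.exe process, and it uses a pull model: closed log files are copied from the source by the replication service, rather than being pushed out by the source.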

On the source server, once SCR is enabled, we create a hidden, read-only share on the log file folder. The share name is the storage group's objectGUID followed by a $ (the trailing $ hides the share).
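As a small illustration of that naming rule, the Python sketch below builds the share name and the UNC path that points at it. The function names are hypothetical, and it assumes the GUID is written in the usual hyphenated 8-4-4-4-12 form without braces; the exact text form (and casing) Exchange uses for the share may differ.

```python
import uuid


def scr_share_name(storage_group_guid: str) -> str:
    """Return the hidden share name: the storage group's objectGUID plus '$'.

    Assumption (hypothetical format): the GUID is rendered in hyphenated
    8-4-4-4-12 form without braces. uuid.UUID validates the input, strips
    any braces, and lowercases it.
    """
    normalized = str(uuid.UUID(storage_group_guid))
    return normalized + "$"  # the trailing '$' hides the share from browse lists


def scr_share_unc(source_server_fqdn: str, storage_group_guid: str) -> str:
    """Return the UNC path to the hidden log share on the source server."""
    return "\\\\" + source_server_fqdn + "\\" + scr_share_name(storage_group_guid)
```

For example, scr_share_name("{6F9619FF-8B86-D011-B42D-00C04FC964FF}") returns "6f9619ff-8b86-d011-b42d-00c04fc964ff$".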

In Active Directory, the replication service uses the msExchStandbyCopyMachines attribute to determine whether the storage group is enabled for replication. If it is, the attribute also holds the fully qualified domain name (FQDN) of each target machine, along with its ReplayLagTime and TruncationLagTime settings.

At this point we create the target folders and paths for log file placement and database placement (this is done auto-magically). Note that these paths are exactly the same as the paths on the source and are not configurable. So if you use E:\SG1Logs as your log path on the source server, the replication service must be able to create the same E:\SG1Logs on the target server, and the same holds true for the database path.
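Because the paths are mirrored rather than mapped, a quick pre-check is whether the target even has the drives the source paths use. Here is a minimal Python sketch of that check; the helper name and the drive inventory passed in are hypothetical, and it only compares drive letters (it does not check free space, permissions, or existing folders).

```python
from pathlib import PureWindowsPath


def missing_target_drives(source_paths, target_drives) -> list:
    """Return drive letters used by the source paths that the target lacks.

    SCR reuses the source paths verbatim on the target, so every drive in
    source_paths must also exist on the target. target_drives is a
    hypothetical inventory such as ["C:", "E:"]; collecting it is out of scope.
    """
    needed = {PureWindowsPath(p).drive.upper() for p in source_paths}
    available = {d.upper() for d in target_drives}
    return sorted(needed - available)
```

For example, with a source log path of E:\SG1Logs and a database on F:, a target that only has C: and E: would be reported as missing F:.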

Now to the actual log shipping. Once a log file is closed, the Replication service copies it from the source storage group to the target (copy) storage group and uses an ESE API to inspect and replay it. When a log file has been successfully copied into the inspector directory, the log inspector object associated with the replica instance verifies the log file header, checksum, and signature.

If everything checks out, the log file is then moved to the target log directory and set for replay into the passive copy of the database (remember, in an SCR environment we do not replay immediately).
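The inspection step can be modeled roughly as follows. This is a simplified Python sketch, not the actual ESE logic: the header check is reduced to comparing the generation number, CRC-32 from zlib stands in for ESE's own checksum, and the class and function names are hypothetical. The event ID 2013 and the InspectionFailed destination come from the directory descriptions later in this post.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple

INSPECTION_FAILED_EVENT_ID = 2013  # application event logged when inspection fails


@dataclass
class ShippedLog:
    generation: int       # generation number read from the log header
    signature: str        # log signature read from the header
    payload: bytes        # log contents
    stored_checksum: int  # checksum recorded with the log


def inspect_log(log: ShippedLog, expected_generation: int,
                expected_signature: str) -> Tuple[str, Optional[int]]:
    """Return (destination, event_id) for a log in the inspector directory.

    Simplified model (assumption): header check = generation matches,
    checksum = CRC-32 stand-in, signature = matches the storage group's.
    """
    header_ok = log.generation == expected_generation
    checksum_ok = zlib.crc32(log.payload) == log.stored_checksum
    signature_ok = log.signature == expected_signature
    if header_ok and checksum_ok and signature_ok:
        # Moved and set for replay; actual replay waits for the lag to elapse.
        return "target log directory", None
    return "IgnoredLogs\\InspectionFailed", INSPECTION_FAILED_EVENT_ID
```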

*Note* If a log file cannot be copied from the source, the target must be reseeded.
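One way to picture when that happens: if the source has already moved past the next generation the target needs, but that generation is no longer available on the source, the gap can never be filled by copying. The Python sketch below models only that case (it ignores network failures and other copy errors), and the function name is hypothetical.

```python
def needs_reseed(last_copied_generation: int, source_generations) -> bool:
    """Simplified model: True when the next needed log can never be copied.

    That is the case when the source has newer generations available but
    the next generation the target needs is no longer on the source.
    """
    available = set(source_generations)
    next_needed = last_copied_generation + 1
    if not available or max(available) < next_needed:
        return False  # nothing new on the source yet; just wait
    return next_needed not in available
```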

Replay happens only after both the built-in lag of 50 log files and the ReplayLagTime (24 hours by default) have elapsed.
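The two conditions combine with a logical AND, which this Python sketch makes explicit. It assumes the 50-log lag means at least 50 newer generations must have been copied, and that ReplayLagTime is measured from when the log was copied; neither detail is stated above, and the names are hypothetical.

```python
from datetime import datetime, timedelta

BUILT_IN_LAG_LOGS = 50                    # fixed SCR lag, in log generations
DEFAULT_REPLAY_LAG = timedelta(hours=24)  # default ReplayLagTime


def can_replay(generation: int, newest_copied_generation: int,
               copied_at: datetime, now: datetime,
               replay_lag: timedelta = DEFAULT_REPLAY_LAG) -> bool:
    """Both conditions must hold before a copied log is replayed.

    Assumptions: "50-log lag" = at least 50 newer generations copied, and
    ReplayLagTime is counted from the time the log was copied.
    """
    enough_newer_logs = newest_copied_generation - generation >= BUILT_IN_LAG_LOGS
    lag_time_elapsed = now - copied_at >= replay_lag
    return enough_newer_logs and lag_time_elapsed
```

Note that setting ReplayLagTime to zero still leaves the 50-log condition in force in this model.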

There are two other directories that you may see in your storage group folders, "IgnoredLogs" and "CatalogData-<guid>-<guid>":

1. IgnoredLogs Directory - used to keep log files that cannot be set for replay.

This might occur for any of the following reasons:

The header information did not pass verification, the file is out of date (too old), the checksum did not pass, and so on.

2. CatalogData-<guid>-<guid> - (Has nothing to do with log shipping.) The Microsoft Exchange Search Indexer service (MSExchangeSearch) allows users to perform full-text searches of documents and attachments in messages. Search indexes are not stored in Exchange databases; instead, the search index data for a specific mailbox database is stored in a directory in the same location as the database files. The directory name follows the convention CatalogData-<guid>-<guid>, where the first <guid> is the GUID of the database and the second <guid> is the instance GUID, which is used in clustered scenarios to distinguish between the nodes.
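The naming convention is easy to build and to take apart again. In this Python sketch (function names hypothetical), the split is done by position because the GUIDs themselves contain hyphens; it assumes both GUIDs appear in the 36-character hyphenated form.

```python
import uuid

PREFIX = "CatalogData-"


def catalog_dir_name(database_guid: str, instance_guid: str) -> str:
    """Build CatalogData-<database GUID>-<instance GUID>."""
    return "{}{}-{}".format(PREFIX, uuid.UUID(database_guid), uuid.UUID(instance_guid))


def parse_catalog_dir_name(name: str):
    """Split a catalog directory name into (database GUID, instance GUID).

    Assumption: both GUIDs use the 36-character hyphenated form, so we split
    by position instead of on '-'. The prefix match ignores case.
    """
    if not name.lower().startswith(PREFIX.lower()):
        raise ValueError("not a CatalogData directory: " + name)
    rest = name[len(PREFIX):]
    if len(rest) != 73 or rest[36] != "-":
        raise ValueError("unexpected GUID layout: " + name)
    return uuid.UUID(rest[:36]), uuid.UUID(rest[37:])
```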

The IgnoredLogs directory can also contain two subdirectories: "InspectionFailed", for log files that failed inspection, and "E00OutofDate":

InspectionFailed - This directory holds all log files that failed inspection. Any time a log file fails inspection, we log event 2013 in the Application event log, remove the log file from the inspector directory, and place it in the InspectionFailed subdirectory of IgnoredLogs.

E00OutofDate - If an old E00.log file exists on the target copy at the time of activation, it is moved here. You will see this E00.log file on the target if that server ever ran as a source server previously.
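Putting the IgnoredLogs layout together, this Python sketch maps a reason to the folder a log file ends up in. The enum and function names are hypothetical, and treating every other unreplayable log as landing in IgnoredLogs itself is our reading of the description, not a documented rule.

```python
from enum import Enum, auto
from pathlib import PureWindowsPath


class IgnoreReason(Enum):
    INSPECTION_FAILED = auto()        # header/checksum/signature check failed
    STALE_E00_AT_ACTIVATION = auto()  # old E00.log found when the copy is activated
    NOT_REPLAYABLE = auto()           # any other unreplayable log, e.g. too old


def ignored_log_destination(log_dir: str, reason: IgnoreReason) -> PureWindowsPath:
    """Map a reason to the IgnoredLogs folder or subfolder it belongs in."""
    ignored = PureWindowsPath(log_dir) / "IgnoredLogs"
    if reason is IgnoreReason.INSPECTION_FAILED:
        return ignored / "InspectionFailed"
    if reason is IgnoreReason.STALE_E00_AT_ACTIVATION:
        return ignored / "E00OutofDate"
    return ignored  # assumption: other unreplayable logs sit in IgnoredLogs itself
```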

 

The complete path from creation to set for replay is as follows:

Source Database -> Store -> Source Log Directory -> Replication Service -> Inspector Directory -> Replication Service -> Target Log Directory -> Replication Service -> Set For Replay in Target Database 

 

How do you know the status of your target copy, and where you stand with copy and replay queue length?

There are a couple of ways to view this information:

1. In the Exchange Management Shell, run Get-StorageGroupCopyStatus -Identity <server name>\<storage group name> -StandbyMachine <target server name>

or

2. Starting with Exchange Server 2007 SP1, the Microsoft Exchange Replication service creates an instance of the counters in the following table for each storage group copy. This lets you monitor the health and performance of each storage group independently, chiefly by watching the ReplayQueueLength and CopyQueueLength counters under the MSExchange Replication performance object.

Counter name Counter description

Copy Queue Exceeds Mount Threshold (CCR only)

Indicates if the copy queue length is greater than the threshold specified by the auto database mount dial. In a CCR environment, the value for this counter will be 1 if the auto database mount dial threshold is exceeded. The value will always be 0 in an LCR environment.

CopyGenerationNumber

Indicates the generation sequence number of the last log file that has been copied.

CopyNotificationGenerationNumber

Indicates the generation sequence number of the last log file known to the Microsoft Exchange Replication service.

CopyQueueLength

Indicates the number of log files waiting to be copied and inspected.

Failed

With a value of 1, indicates that continuous replication is in a Failed state for the selected instance (storage group). A value of 0 indicates that continuous replication is not in a Failed state.

Initializing

With a value of 1, indicates that continuous replication is in an Initializing state for the selected instance (storage group). This state indicates that the storage group copy is performing initial startup checks or that the Microsoft Exchange Replication service is performing an incremental reseed. A value of 0 indicates that continuous replication is not in an Initializing state.

InspectorGenerationNumber

Indicates the generation sequence number of the last log file that was inspected.

ReplayBatchSize

Indicates the number of log files that have been replayed together.

ReplayGenerationNumber

Indicates the generation sequence number of the last log file that was replayed successfully.

ReplayGenerationsComplete

Indicates the number of log files replayed in the current batch.

ReplayGenerationsPerMinute

Indicates the rate of replay (in log generations per minute) for the current batch.

ReplayGenerationsRemaining

Indicates the number of log generations remaining to be replayed in the current batch.

ReplayNotificationGenerationNumber

Indicates the generation sequence number of the last log file known to the Microsoft Exchange Replication service.

ReplayQueueLength

Indicates the number of log files waiting to be replayed.

Suspended

With a value of 1, indicates that continuous replication activity is suspended. Suspended means that log files are not being copied or replayed into the passive copy.

TruncatedGenerationNumber

Indicates the generation sequence number of the last log file truncated by the Microsoft Exchange Replication service.
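The counter definitions suggest a simple relationship between the generation numbers and the two queue lengths. The Python sketch below derives approximate queue lengths from a counter snapshot and turns the Failed, Suspended, and Initializing flags into a one-word status. How you read the counters (Performance Monitor, a script) is out of scope; the class and function names and the copy queue threshold of 10 are hypothetical, and the derived values only approximate ReplayQueueLength and CopyQueueLength rather than reproduce how the service computes them.

```python
from dataclasses import dataclass


@dataclass
class CopyCounters:
    """Snapshot of MSExchange Replication counters for one storage group copy."""
    copy_notification_generation: int  # CopyNotificationGenerationNumber
    copy_generation: int               # CopyGenerationNumber
    inspector_generation: int          # InspectorGenerationNumber
    replay_generation: int             # ReplayGenerationNumber
    failed: int = 0                    # Failed
    suspended: int = 0                 # Suspended
    initializing: int = 0              # Initializing


def approx_copy_queue(c: CopyCounters) -> int:
    """Logs known to the service but not yet copied (approximation)."""
    return max(0, c.copy_notification_generation - c.copy_generation)


def approx_replay_queue(c: CopyCounters) -> int:
    """Logs inspected but not yet replayed (approximation)."""
    return max(0, c.inspector_generation - c.replay_generation)


def summarize(c: CopyCounters, max_copy_queue: int = 10) -> str:
    """Return a one-word status; max_copy_queue is a hypothetical threshold."""
    if c.failed:
        return "Failed"
    if c.suspended:
        return "Suspended"
    if c.initializing:
        return "Initializing"
    if approx_copy_queue(c) > max_copy_queue:
        return "CopyQueueHigh"
    # The replay queue is not checked: SCR's built-in lag keeps it large
    # even on a healthy copy.
    return "Healthy"
```

The sketch deliberately does not alert on the replay queue: with the built-in 50-log lag plus ReplayLagTime, a healthy SCR target typically carries a replay queue of at least 50.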

 

See Monitoring Continuous Replication, the source of the blurb and table above, for further details.

Hopefully, after reading this, you now have a more in-depth understanding of log shipping!

Other references:

Planning for Standby Continuous Replication

 

Managing Standby Continuous Replication