TMG Large Logging Queue: No More SQL Lockdowns?

What you say!?

The new logging system in TMG 2010 is seriously cool, and it’s designed to cope with extended instances of SQL Server going away. “Extended” here means multiple hours, and with enough disk space it could be multiple days.

Short Version

There’s a good detailed description of it here, which I’ll try to crystallize:

The most likely reason for the Firewall Service to stop, and for TMG 2010 to enter Lockdown mode, is a lack of disk space on the TMG box.

Yep. If the LLQ files can’t be created, that’s when the Firewall Service will be stopped. If SQL goes away for a few days… well, that’s okay. We got that covered*.

Long But-Not-That-Long Version

But why does that help avoid lockdown mode? More crystallization required:

The kernel-mode FWENG component is now responsible for logging. When everything’s going swimmingly, it logs to a buffer in memory.

The logs in this memory buffer are the logs that might be lost if the server experiences a hard crash – like a blue screen – and that’s why the registry key names governing it are appropriately pessimistic (both in HKLM\System\CurrentControlSet\Services):

(1) Fweng\Parameters\LogQueueMaxLossCount
(2) Fweng\Parameters\LogQueueMaxLossTimeInSeconds

To get the FWENG buffers out to SQL Server, or MSDE/SQL Express, or text files (more on that later), the Microsoft Forefront TMG Control Service (the service name is still IsaCtrl, by the way) executes a Log Formatter.

The Log Formatter is responsible for taking the raw log information from memory or disk (we’ll get to that), transforming it into the desired output format, and storing it wherever you’ve specified. (The Firewall Service used to do that.)

If FWENG starts exceeding its thresholds, meaning that the FWENG memory buffers have either filled up (1) or remained in memory for Long Enough (2) without being pulled and stored by IsaCtrl, FWENG will start pushing its Log Queue buffers to disk instead, in .LLQ files.
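To make the two thresholds concrete, here’s a minimal Python sketch of the decision as I’ve described it. This is not how FWENG is implemented (it’s a kernel-mode component, after all). The class, the function and the numbers are all made up for illustration, and I’m assuming (1) behaves like a record count and (2) like an age limit in seconds, which is what the registry value names suggest.

```python
from dataclasses import dataclass


@dataclass
class LogQueueThresholds:
    # Hypothetical stand-ins for the two registry values; the real
    # defaults and exact semantics are not covered here.
    max_loss_count: int        # cf. LogQueueMaxLossCount
    max_loss_seconds: float    # cf. LogQueueMaxLossTimeInSeconds


def should_spill_to_disk(buffered_records: int,
                         oldest_record_age_seconds: float,
                         thresholds: LogQueueThresholds) -> bool:
    """Return True when either threshold has been reached."""
    too_many = buffered_records >= thresholds.max_loss_count
    too_old = oldest_record_age_seconds >= thresholds.max_loss_seconds
    return too_many or too_old


# Illustrative numbers only.
limits = LogQueueThresholds(max_loss_count=1000, max_loss_seconds=30.0)
print(should_spill_to_disk(10, 1.0, limits))    # False: small and fresh
print(should_spill_to_disk(10, 45.0, limits))   # True: records waited too long
```

Either condition on its own is enough to tip the queue over to disk, which is why a quiet box with a dead SQL connection still ends up writing LLQ files.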

When the Control service restores its connection to SQL or MSDE, and operations start succeeding again, the logs will be formatted and stored, and the LLQ files deleted as they’re dealt with.
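The catch-up behaviour can be sketched in the same spirit. Again, this is an illustration and not the Control service’s actual code: drain_queue and the store callback are names I’ve invented, I’m using ConnectionError as a stand-in for “SQL is still away”, and I’m assuming the queue files sort into the right order by name.

```python
from pathlib import Path
from typing import Callable


def drain_queue(queue_dir: str, store: Callable[[bytes], None]) -> int:
    """Store each queued file in order, deleting it once stored.

    Stops at the first failure so the remaining files stay on disk
    for the next attempt. Returns the number of files processed.
    """
    processed = 0
    for path in sorted(Path(queue_dir).glob("*.llq")):
        try:
            store(path.read_bytes())   # format-and-store step, simplified
        except ConnectionError:
            break                      # destination still down; try later
        path.unlink()                  # only delete after a successful store
        processed += 1
    return processed
```

The important bit is the ordering: a file is deleted only after its contents have been stored, so a connection that drops halfway through costs you nothing but time.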

How cool is that? SQL logging here we come!

You mentioned Text Files, but they seem kinda robust?

Well, the log formatter is responsible for transforming log queue records to text format as well, so a text formatting failure would have a similar effect to someone unplugging SQL.

But the reasons I could dream up for that failing are pretty much the same reasons text logging would fail today anyway: the disk is full, or very heavily fragmented, or perhaps broken, or antivirus hasn’t been properly excluded from the path and/or process.

* – pending traffic and disk space. If you can generate 100GB of logs in an afternoon, you’re unlikely to last days before SQL comes back, aren’t ya? Unless you’ve got a billion gig to play with.
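
If you want to put a rough number on that footnote, the arithmetic is just free space divided by logging rate. A quick Python sketch, with the caveat that the figures are invented, and that I’m assuming the on-disk queue grows at about the same rate as your logs, which may not hold in practice:

```python
def hours_until_full(free_gb: float, log_rate_gb_per_hour: float) -> float:
    """Rough estimate of how long the disk lasts while logs queue up."""
    if log_rate_gb_per_hour <= 0:
        raise ValueError("log rate must be positive")
    return free_gb / log_rate_gb_per_hour


# 100 GB in a four-hour afternoon is 25 GB/hour; with 500 GB free:
print(hours_until_full(free_gb=500, log_rate_gb_per_hour=100 / 4))  # 20.0
```

So at that rate, 500 GB buys you less than a day.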