How To Equip Your Windows Server Environment With A Blackbox Flight Recorder


Summary: Holger Hatzfeld, a Microsoft Senior Support Escalation Engineer, provides a PowerShell script that deploys customizable performance logs to any or all of your domain-joined servers, effectively turning them into flight recorders for server performance. Powerful stuff.


Hello, my name is Holger Hatzfeld and I am a Microsoft Senior Support Escalation Engineer (SEE) in the Platforms Support group at Microsoft Germany. This post covers the steps needed to deploy an Always-On-Performance-Capture in your environment, which gives you a basic set of performance data to guide further troubleshooting. The idea is to use PowerShell to query your AD for servers, use Logman to create a performance counter log with a detailed set of counters on every server, and add an entry to each server's registry so that the counter log is restarted automatically.

Note that Microsoft has used this approach to deploy performance logs to more than 1,000 systems within minutes during critical situations, to make sure the right data was in place to catch any issues. But let's take this one step at a time and start by identifying every server in your domain.

The usual disclaimer: please note that the following script(s) are provided as-is and therefore not officially supported by Microsoft.

STEP 1:  Find Your Servers

Use PowerShell to query AD for servers. The following part of the script (the entire script is at the end of the post for your convenience) queries your domain for every computer account whose OperatingSystem attribute contains "server".

$cred = Get-Credential contoso\MyUser # Domain user account that also has local admin rights; in this example Contoso\MyUser

# Run the query on a DC with Invoke-Command and store the result locally in $servers
# Enter the name of your DC; in this example contoso-dc1
$servers = Invoke-Command -ComputerName contoso-dc1 -Credential $cred -ScriptBlock {
    Import-Module ActiveDirectory
    Get-ADComputer -LDAPFilter "(&(objectcategory=computer)(OperatingSystem=*server*))"
}
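Get-ADComputer returns the Enabled property by default, so disabled computer accounts that still carry a server OperatingSystem value also end up in $servers, and Logman would likely waste time trying to reach them. A minimal sketch of an optional client-side filter, assuming you want to skip disabled accounts; the function name Select-BlackBoxServerName is hypothetical and not part of the original script:

```powershell
# Hypothetical helper (not part of the original script): keep only enabled
# computer accounts and return their names, sorted and without duplicates.
function Select-BlackBoxServerName {
    param(
        # Objects as returned by Get-ADComputer (uses the Name and Enabled properties)
        [object[]] $Computer
    )
    $Computer |
        Where-Object { $_.Enabled -and $_.Name } |
        ForEach-Object { $_.Name } |
        Sort-Object -Unique
}

# Usage, assuming $servers was filled by the query above:
# $serverNames = Select-BlackBoxServerName -Computer $servers
```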

The next step is to create the performance log on every server we identified. For this we use Logman, which is part of the Windows operating system.

STEP 2: Create Performance Logs on Your Servers

Run Logman to create a performance counter log and then start it. Here we run Logman against every server returned by the AD query and generate a counter log from a list of counters saved in a text file named counters.txt. After that, we start the counter log on each server.

$Servers | ForEach-Object {

    # Write the server name to the console
    Write-Host "Create Blackbox-Counter on : " $_.name

    # Use Logman to create the counter log on the server: circular binary log (bincirc) limited to 250 MB, sample interval 15 seconds. This also works on Windows Server 2003.
    # PowerShell does not expand %computername%; it is passed to Logman as-is.
    Logman create counter BlackBox -s $_.name -v mmddhhmm -cf counters.txt -si 00:15 -f bincirc -o "c:\Perflogs\Blackbox_%computername%" -max 250

    # Use Logman to start the counter log on the server. This also works on Windows Server 2003.
    Logman start BlackBox -s $_.name
}
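The loop above hard-codes the Logman arguments. If you want to review or vary them (for example a different interval or maximum size for a group of servers) before touching hundreds of machines, one option is to build them as an array and splat it to logman.exe. A minimal sketch; Get-BlackBoxCreateArgument and its parameter names are hypothetical, while the Logman switches are exactly the ones used in the post:

```powershell
# Hypothetical helper: build the "logman create counter" argument list from
# Step 2 so it can be inspected or logged before it is sent to the servers.
function Get-BlackBoxCreateArgument {
    param(
        [Parameter(Mandatory)] [string] $ComputerName,
        [string] $Name           = 'BlackBox',
        [string] $CounterFile    = 'counters.txt',
        [string] $SampleInterval = '00:15',   # mm:ss, i.e. 15 seconds
        [int]    $MaxSizeMB      = 250
    )
    @(
        'create', 'counter', $Name,
        '-s', $ComputerName,
        '-v', 'mmddhhmm',
        '-cf', $CounterFile,
        '-si', $SampleInterval,
        '-f', 'bincirc',
        # PowerShell does not expand %computername%; logman receives it as-is
        '-o', 'c:\Perflogs\Blackbox_%computername%',
        '-max', $MaxSizeMB
    )
}

# Usage on Windows (not executed here), assuming $servers from Step 1:
# foreach ($server in $servers) {
#     $logmanArgs = Get-BlackBoxCreateArgument -ComputerName $server.Name
#     & logman.exe @logmanArgs
#     if ($LASTEXITCODE -ne 0) { Write-Warning "logman create failed on $($server.Name)" }
# }
```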

Here’s the content of the counters.txt file:

\Cache\Dirty Pages
\Cache\Lazy Write Flushes/sec
\LogicalDisk(*)\% Free Space
\LogicalDisk(*)\% Idle Time
\LogicalDisk(*)\Avg. Disk Bytes/Read
\LogicalDisk(*)\Avg. Disk Bytes/Write
\LogicalDisk(*)\Avg. Disk Queue Length
\LogicalDisk(*)\Avg. Disk sec/Read
\LogicalDisk(*)\Avg. Disk sec/Write
\LogicalDisk(*)\Current Disk Queue Length
\LogicalDisk(*)\Disk Bytes/sec
\LogicalDisk(*)\Disk Reads/sec
\LogicalDisk(*)\Disk Transfers/sec
\LogicalDisk(*)\Disk Writes/sec
\LogicalDisk(*)\Free Megabytes
\Memory\% Committed Bytes In Use
\Memory\Available MBytes
\Memory\Cache Bytes
\Memory\Commit Limit
\Memory\Committed Bytes
\Memory\Free & Zero Page List Bytes
\Memory\Free System Page Table Entries
\Memory\Long-Term Average Standby Cache Lifetime (s)
\Memory\Pages Input/sec
\Memory\Pages Output/sec
\Memory\Pages/sec
\Memory\Pool Nonpaged Bytes
\Memory\Pool Paged Bytes
\Memory\System Cache Resident Bytes
\Memory\Transition Pages RePurposed/sec
\Network Inspection System\Average inspection latency (sec/bytes)
\Network Interface(*)\Bytes Received/sec
\Network Interface(*)\Bytes Sent/sec
\Network Interface(*)\Bytes Total/sec
\Network Interface(*)\Current Bandwidth
\Network Interface(*)\Output Queue Length
\Network Interface(*)\Packets Outbound Errors
\Network Interface(*)\Packets Received/sec
\Network Interface(*)\Packets Sent/sec
\Network Interface(*)\Packets/sec
\Paging File(*)\% Usage
\PhysicalDisk(*)\Avg. Disk Queue Length
\PhysicalDisk(*)\Avg. Disk sec/Read
\PhysicalDisk(*)\Avg. Disk sec/Write
\PhysicalDisk(*)\Current Disk Queue Length
\PhysicalDisk(*)\Disk Bytes/sec
\PhysicalDisk(*)\Disk Reads/sec
\PhysicalDisk(*)\Disk Writes/sec
\Process(*)\% Privileged Time
\Process(*)\% Processor Time
\Process(*)\Handle Count
\Process(*)\ID Process
\Process(*)\IO Data Operations/sec
\Process(*)\IO Other Operations/sec
\Process(*)\IO Read Operations/sec
\Process(*)\IO Write Operations/sec
\Process(*)\Private Bytes
\Process(*)\Thread Count
\Process(*)\Virtual Bytes
\Process(*)\Working Set
\Processor Information(*)\% DPC Time
\Processor Information(*)\% Interrupt Time
\Processor Information(*)\% of Maximum Frequency
\Processor Information(*)\% Privileged Time
\Processor Information(*)\% Processor Time
\Processor Information(*)\% User Time
\Processor Information(*)\DPC Rate
\Processor Information(*)\Parking Status
\Processor(*)\% DPC Time
\Processor(*)\% Interrupt Time
\Processor(*)\% Privileged Time
\Processor(*)\% Processor Time
\Processor(*)\% User Time
\Processor(*)\DPC Rate
\Server\Pool Nonpaged Failures
\Server\Pool Paged Failures
\Server\Work Item Shortages
\Server\Server Sessions
\Server\Logon/sec
\Objects\Processes
\System\Context Switches/sec
\System\Processor Queue Length
\System\System Calls/sec
\TCPv4\Connection Failures

This set of counters has been reviewed and defined by several product groups within Microsoft. The reasoning behind it is simple: if Windows, as the operating system, has a performance problem, every application running on it is affected as well, and we treat AD, Exchange, SharePoint, SQL Server, Hyper-V and so on as applications.
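Logman reads the counter file one counter path per line, so a typo in a path (for example a missing leading backslash) is easy to miss when the same file goes to many servers. A minimal syntax check, assuming the path shapes used in the list above; the function name Test-BlackBoxCounterFile is hypothetical. It does not check whether a counter set actually exists on a given server; Get-Counter -ListSet can list the sets available on a machine.

```powershell
# Hypothetical pre-flight check (not part of the original script): return every
# non-blank line that does not look like \Object\Counter or \Object(instance)\Counter.
function Test-BlackBoxCounterFile {
    param([string[]] $Line)
    # \Object, optional (instance), \Counter; the counter name may contain parentheses
    $pattern = '^\\[^\\()]+(\([^)]*\))?\\[^\\]+$'
    foreach ($entry in $Line) {
        $trimmed = $entry.Trim()
        if ($trimmed -eq '') { continue }                 # ignore blank lines
        if ($trimmed -notmatch $pattern) { $trimmed }     # emit suspicious lines
    }
}

# Usage: no output means every line is syntactically plausible.
# Test-BlackBoxCounterFile -Line (Get-Content .\counters.txt)
```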

After this step, every server in your environment has a performance counter log that takes a sample every 15 seconds and never grows beyond 250 MB. But if a server reboots, for whatever reason, we want this log to start again automatically. That leads to the next step:
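How much history does 250 MB hold? At one sample every 15 seconds there are 86,400 / 15 = 5,760 samples per day; how many of them fit into the circular file depends on the size of each sample, which grows with the number of instances (processes, disks, network adapters) on the server. A back-of-the-envelope sketch; BytesPerSample is an assumption you would have to measure on your own servers, and treating 1 MB as 1,048,576 bytes and ignoring file overhead are assumptions too:

```powershell
# Rough estimate of the history kept by a circular (bincirc) counter log.
# Assumptions: BytesPerSample is measured by you, 1 MB = 1,048,576 bytes,
# and file header overhead is ignored.
function Get-BlackBoxRetention {
    param(
        [int]  $MaxSizeMB       = 250,   # logman -max
        [int]  $IntervalSeconds = 15,    # logman -si 00:15
        [Parameter(Mandatory)]
        [long] $BytesPerSample
    )
    $samplesKept = [long][math]::Floor(($MaxSizeMB * 1MB) / $BytesPerSample)
    [pscustomobject]@{
        SamplesPerDay = [int](86400 / $IntervalSeconds)
        SamplesKept   = $samplesKept
        History       = New-TimeSpan -Seconds ($samplesKept * $IntervalSeconds)
    }
}

# Example with an assumed 64 KB per sample: 4000 samples, 16 h 40 min of history.
# Get-BlackBoxRetention -BytesPerSample 64KB
```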

STEP 3: Start Logging Automatically

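Create a registry value under the HKLM Run key. The following part of the PowerShell script adds a value named Logman under the Run key of the local machine (HKLM) on every server; its command, logman start blackbox, restarts the counter log. Keep in mind that Windows runs the entries under this key when a user logs on, not when the system starts, so after a reboot the log is restarted only once someone logs on to the server.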

$Servers | ForEach-Object {

    # Write the server name to the console
    Write-Host "Create Registry key to start Blackbox-Counter on : " $_.name

    # Create the value under the HKLM Run key on the remote server using Invoke-Command
    # -Force overwrites the value if it already exists, so the script can be run again
    Invoke-Command -ComputerName $_.name -Credential $cred -ScriptBlock {
        New-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Run -Name Logman -Value "logman start blackbox" -Force
    }
}
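If you need the counter log to start at boot even when nobody logs on to the server, one alternative to the Run key, not the author's method, is a scheduled task with an ONSTART trigger that runs as SYSTEM. A minimal sketch that only builds the schtasks.exe arguments so they can be checked before use; the function name Get-BlackBoxStartupTaskArgument and the task name BlackBoxStart are hypothetical:

```powershell
# Hypothetical alternative to the Run key: a scheduled task that runs
# "logman start BlackBox" at system startup under the SYSTEM account.
function Get-BlackBoxStartupTaskArgument {
    param(
        [string] $ComputerName = $env:COMPUTERNAME,
        [string] $TaskName     = 'BlackBoxStart',
        [string] $LogName      = 'BlackBox'
    )
    @(
        '/Create',
        '/S',  $ComputerName,             # target server
        '/TN', $TaskName,                 # task name
        '/TR', "logman start $LogName",   # command the task runs
        '/SC', 'ONSTART',                 # trigger: system startup
        '/RU', 'SYSTEM',                  # runs without an interactive logon
        '/F'                              # overwrite an existing task with this name
    )
}

# Usage on Windows (not executed here), assuming $servers from Step 1:
# foreach ($server in $servers) {
#     $taskArgs = Get-BlackBoxStartupTaskArgument -ComputerName $server.Name
#     & schtasks.exe @taskArgs
# }
```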

The Complete Script for Your Reference

Here’s the whole script in one place for your reference.  I'd recommend using the built-in PowerShell ISE tool to view/modify the script.

# Script to roll out a performance counter log as a BlackBox flight recorder on every server in your domain
# Written by Holger Hatzfeld, Microsoft. Please note that this script is provided as-is and is therefore not supported by Microsoft.
# Requirements: firewall configured to allow Performance Logs & Alerts and WinRM, execution of PowerShell scripts allowed, and a text file named counters.txt in the folder where you run this script.
# Logman connects to the servers with the account that runs this script, so that account also needs local admin rights on every server.

# We build the list of servers from the "OperatingSystem" attribute of the AD computer accounts
$cred = Get-Credential contoso\MyUser # Domain user account that also has local admin rights; in this example Contoso\MyUser

# Run the query on a DC with Invoke-Command and store the result locally in $servers
# Enter the name of your DC; in this example contoso-dc1
$servers = Invoke-Command -ComputerName contoso-dc1 -Credential $cred -ScriptBlock {
    Import-Module ActiveDirectory
    Get-ADComputer -LDAPFilter "(&(objectcategory=computer)(OperatingSystem=*server*))"
}

# Generate BlackBox on each server in my domain and create Registry-Key to start performance log after reboot automatically

$Servers | ForEach-Object {

    # Write the server name to the console
    Write-Host "Create Blackbox-Counter on : " $_.name

    # Use Logman to create the counter log on every server: circular binary log limited to 250 MB, sample interval 15 seconds. This also works on Windows Server 2003.
    Logman create counter BlackBox -s $_.name -v mmddhhmm -cf counters.txt -si 00:15 -f bincirc -o "c:\Perflogs\Blackbox_%computername%" -max 250

    # Use Logman to start the counter log on every server. This also works on Windows Server 2003.
    Logman start BlackBox -s $_.name
}

$Servers | ForEach-Object {

    # Write the server name to the console
    Write-Host "Create Registry key to start Blackbox-Counter on : " $_.name

    # Create the value under the HKLM Run key on the remote server using Invoke-Command (-Force overwrites an existing value)
    Invoke-Command -ComputerName $_.name -Credential $cred -ScriptBlock {
        New-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Run -Name Logman -Value "logman start blackbox" -Force
    }
}
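When the script runs against hundreds of servers, a few will usually fail (firewall rules, WinRM, permissions), and the console output is easy to lose. A minimal sketch, not part of the original script, that collects Logman's exit code per server and lists the servers that need a second look; it assumes logman query returns a non-zero exit code when the query fails, and Get-BlackBoxFailure is a hypothetical name:

```powershell
# Hypothetical helper: given a hashtable of server name -> logman exit code,
# return the servers with a non-zero exit code, sorted by name.
function Get-BlackBoxFailure {
    param([hashtable] $ExitCode)
    $ExitCode.GetEnumerator() |
        Where-Object { $_.Value -ne 0 } |
        ForEach-Object { $_.Key } |
        Sort-Object
}

# Usage on Windows (not executed here), assuming $servers from the script above:
# $results = @{}
# foreach ($server in $servers) {
#     logman query BlackBox -s $server.Name | Out-Null
#     $results[$server.Name] = $LASTEXITCODE
# }
# Get-BlackBoxFailure -ExitCode $results
```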

If you follow the guidelines outlined above, you should have basic data that helps you identify many performance-related issues in your environment. If you have any questions, please feel free to leave a comment.


Written by Holger Hatzfeld; Posted by Frank Battiston, MSPFE Editor