Great tool for Windows 2003: Server Performance Advisor (SPA)

First off, you can download SPA 2.0 here.  In this post I'm going to explain how to use SPA quickly, and then what type of data it returns.

What is SPA?

So what is SPA and how can you use it?  Well the official overview is:

Microsoft® Windows Server™ 2003 Performance Advisor is the latest version of Server Performance Advisor, which is a simple but robust tool that helps you diagnose the root causes of performance problems in a Microsoft Windows Server 2003 deployment. Server Performance Advisor collects performance data and generates comprehensive diagnostic reports that give you the data to easily analyze problems and develop corrective actions.
Microsoft® Windows Server™ 2003 Performance Advisor provides several specialized reports, including a System Overview (focusing on CPU usage, Memory usage, busy files, busy TCP clients, top CPU consumers) and reports for server roles such as Active Directory, Internet Information System (IIS), DNS, Terminal Services, SQL, print spooler, and others.

Really, I think of it as Network Monitor and Performance Monitor wrapped into one package, so that you can correlate which clients might be causing load on your system.

Some nifty things about SPA:

1) It's XML-based, so the reports that are collected get organized "automagically" by date and server, and you can drill down on a particular server.   You could have a thousand reports on your reporting server and it's still quite easy to navigate via IE to the server and date that you are looking for.

2) You can set up SPA on your servers in "Data" mode and then set up a member server as a SPA "reporting" server.  Then you can schedule your servers to collect at a certain time and send that data to the reporting server.  You can also have SPA (with version 2.0) take the data from those servers and put it in a SQL database for trending purposes.  This is what we do internally: we set up the jobs to run at 10 and 2 to get peak utilization trending on our domain controllers.  There is a chm file with SPA that has more details on this.

3) Doesn't require a reboot to install.

4) Was deemed so awesome it is built right into Vista and Windows Server 2008 (Data Collector Sets)

 

I'm not going to delve into the trending and reporting server side of SPA, as that would require a lot more typing, but as I said, if you install SPA you can read the chm about scheduling tasks and trending.  I just wanted to point it out because some people might not have a monitoring solution that can do rudimentary trending, and this could be a free one.

 

The install

Double-click the MSI and leave the defaults.

How and when to use

We're going to focus on how to use SPA to troubleshoot, so let's look at an example of that.  SPA is useful for narrowing down resource issues on a system with regard to processor, memory, network, and disk.

Last week we had a WINS server that was throwing database errors, and so our team was engaged.  I installed SPA using the steps above.  I could then have used the GUI to launch SPA and start a collection (default 300 seconds), but this is the faster way (the way I use).

1) Navigate to the SPA directory.  If you installed on an x64 system it will be under "Program Files (x86)"; otherwise it is just "Program Files\Server Performance Advisor".

2) Since I wanted just a system overview report, I ran: spacmd start "system overview"

a) At this point the collection starts and you should see some processes labeled plahost running in Task Manager.  You can let this run for 300 seconds, but in my case I just needed a quick 30-second snapshot since the repro was constantly happening.

b) If you installed this on a domain controller, you could do spacmd start "active directory" or spacmd start *, which would start all the templates you have installed.

3) Now stop the collection: spacmd stop "system overview"

a) At this point, as long as you left the defaults during install, you should see a new folder under c:\perflogs with the server name and a few files underneath that.

C:\PerfLogs\Data\System Overview\Current\BRAD-SERVER_200706211545>dir
Volume in drive C is C_Drive
Volume Serial Number is 70C4-9FFD

 Directory of C:\PerfLogs\Data\System Overview\Current\BRAD-SERVER_200706211545

06/21/2007 03:49 PM <DIR> .
06/21/2007 03:49 PM <DIR> ..
06/21/2007 03:49 PM 1,673 global_reg.xml //Some registry settings are checked by SPA; they're saved here
06/21/2007 03:49 PM 1,441,792 system_kernel.etl //A trace file that SPA analyzes during the capture.
06/21/2007 03:49 PM 1,638,400 system_perf.blg //Perfmon binary log file that SPA analyzes from the capture.
3 File(s) 3,081,865 bytes
2 Dir(s) 960,020,480 bytes free

4) Now we need to compile the data we captured into a report: spacmd compile "system overview"

a) Once this is complete, you should see the report in the reports directory.  If you are using the GUI, the report will show up under reports, under System Overview.

C:\PerfLogs\report\System Overview\Current\BRAD-SERVER_200706211545>dir
Volume in drive C is C_Drive
Volume Serial Number is 70C4-9FFD

 Directory of C:\PerfLogs\report\System Overview\Current\BRAD-SERVER_200706211545

06/22/2007 09:35 AM <DIR> .
06/22/2007 09:35 AM <DIR> ..
06/22/2007 09:24 AM 1,721 global_reg.xml
06/22/2007 09:35 AM 2,365 obelisk.ip
06/22/2007 09:35 AM 608,594 report.xml //Double click this one.
06/22/2007 09:34 AM 62,417 report.xsl
06/22/2007 09:35 AM 656 summary.xml
06/22/2007 09:24 AM 6,881,280 system_kernel.etl
06/22/2007 09:24 AM 6,094,848 system_perf.blg
7 File(s) 13,651,881 bytes
2 Dir(s) 963,108,864 bytes free
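
The start, stop, and compile steps above can also be scripted if you want the same quick snapshot on more than one box. Here is a rough sketch in Python; it is not something that ships with SPA, it only wraps the three spacmd commands shown above. The install path on drive C, the helper names spacmd_args and snapshot, and the idea that spacmd returns a non-zero exit code on failure are all assumptions for illustration.

```python
import subprocess
import time
from pathlib import PureWindowsPath

# Assumption: SPA was installed with defaults on a 32-bit system, on drive C.
# On x64 the post says it lands under "Program Files (x86)" instead.
SPA_DIR = PureWindowsPath(r"C:\Program Files\Server Performance Advisor")

ACTIONS = ("start", "stop", "compile")


def spacmd_args(action, template, spa_dir=SPA_DIR):
    """Build the argument list for: spacmd <action> "<template>"."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # Passing a list lets subprocess handle the quoting of the template name.
    return [str(spa_dir / "spacmd"), action, template]


def snapshot(template="system overview", seconds=30):
    """Start a collection, wait, stop it, then compile the report."""
    subprocess.run(spacmd_args("start", template), check=True)
    time.sleep(seconds)  # 30 seconds was enough for a constant repro
    subprocess.run(spacmd_args("stop", template), check=True)
    # Compiling is CPU intensive, so think twice on a server that is already busy.
    subprocess.run(spacmd_args("compile", template), check=True)
```

Calling snapshot() with no arguments mirrors the 30-second "system overview" capture from this post; snapshot("active directory", 300) would be the same idea on a domain controller with the default collection time.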

Analyzing the report

So now that we have the report, we can open it up and start looking at it: just double-click report.xml and IE should open.  You'll want to allow scripts and ActiveX so that you can adjust the data in the XML doc, as it is dynamic.  For example, if you look at the second JPG below, on the top right it says "3 of 15".  If you wanted to see all 15, you could just click the 3 and type in 15, and the report would change.
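
Once reports start to pile up, the folder names alone are enough to find the one you want to open. Here is a minimal Python sketch that splits a name like BRAD-SERVER_200706211545 into server and capture time, and picks the newest folder for a given server. It assumes the suffix is a YYYYMMDDHHMM timestamp; that is an inference from the listings above (the name lines up with the 06/21/2007 file dates), not documented behavior. parse_capture_folder and newest_capture are hypothetical helper names.

```python
from datetime import datetime
from typing import NamedTuple


class CaptureFolder(NamedTuple):
    server: str
    started: datetime


def parse_capture_folder(name):
    """Split a folder name such as BRAD-SERVER_200706211545.

    Assumption: everything after the last underscore is YYYYMMDDHHMM.
    """
    server, _, stamp = name.rpartition("_")
    if not server:
        raise ValueError(f"no underscore-separated timestamp in {name!r}")
    return CaptureFolder(server, datetime.strptime(stamp, "%Y%m%d%H%M"))


def newest_capture(names, server):
    """Return the most recent folder name for one server, or None."""
    matches = [
        n for n in names
        if parse_capture_folder(n).server.lower() == server.lower()
    ]
    return max(matches, key=lambda n: parse_capture_folder(n).started, default=None)
```

Feeding newest_capture the folder names under the report directory, plus a server name, gives you the folder whose report.xml you most likely want.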

 

The first part of the report is a summary, with links to other sections pertaining to CPU, Network, Disk, and Memory.  Below that are any performance advisories that SPA flagged for you, and then how each of the components was doing.  In the first JPG below, there is a little help icon on the right; if you click the icon it will open a chm file with further steps you can take to narrow down the issue.

I can't go through each area of concern but you get the idea.  As I was going through the network section I noticed this:

This seemed odd, so I filtered the network monitor capture that I took during the same time period for vm-lab-machine.  It came back with a ton of registrations and releases for the 1F record for that server, like so:

13861 5.703125 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Registration Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13863 5.703125 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Release Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13864 5.703125 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Release Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13865 5.703125 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Registration Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13866 5.703125 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Registration Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13867 5.703125 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Release Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13868 5.703125 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Release Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13869 5.703125 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Registration Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13870 5.703125 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Registration Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13871 5.718750 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Release Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13872 5.718750 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Release Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13873 5.718750 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Registration Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13874 5.718750 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Registration Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13875 5.718750 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Release Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13876 5.718750 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Release Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13877 5.718750 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Registration Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13878 5.718750 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Registration Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13879 5.718750 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Release Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13880 5.718750 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Release Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13881 5.718750 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Registration Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13882 5.718750 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Registration Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13883 5.718750 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Release Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13884 5.718750 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Release Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13885 5.718750 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Registration Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13886 5.718750 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Registration Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13887 5.718750 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Release Request for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
13888 5.718750 BRAD-SERVER VM-LAB-MACHINE NbtNs NbtNs: Release Response, Success for VM-LAB-MACHINE <0x1F> NetDDE Service, xxx-xx-xxxx-xx
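
If you would rather count the churn than eyeball it, a few lines of Python will do. This is only a sketch: it assumes the capture was saved as text in the same whitespace-separated layout as the frames above (frame number, time offset, source, destination, protocol, description). The FRAME pattern and the tally_requests name are made up for this example.

```python
import re
from collections import Counter

# Matches lines shaped like the frames above, e.g.
# 13863 5.703125 VM-LAB-MACHINE BRAD-SERVER NbtNs NbtNs: Release Request for VM-LAB-MACHINE <0x1F> ...
FRAME = re.compile(
    r"^\d+\s+[\d.]+\s+(?P<src>\S+)\s+(?P<dst>\S+)\s+NbtNs\s+NbtNs:\s+"
    r"(?P<op>Registration|Release) (?P<kind>Request|Response)\b.*?"
    r"for (?P<name>\S+) <0x(?P<suffix>[0-9A-Fa-f]{2})>"
)


def tally_requests(lines):
    """Count NbtNs request frames per (client, operation, name suffix)."""
    counts = Counter()
    for line in lines:
        m = FRAME.match(line.strip())
        # Only requests are counted; the responses just mirror them.
        if m and m["kind"] == "Request":
            counts[(m["src"], m["op"], m["suffix"].upper())] += 1
    return counts
```

Running tally_requests over the saved capture and printing .most_common() should put a client that registers and releases the same name over and over right at the top.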

I then popped the query "1F Wins Server" into live.com and the first hit was the issue.

SPA roles:

There is more than just the "system overview" template; there are templates for AD, print servers, terminal servers, etc.  Each one of these templates focuses on that role and collects different counters depending on the role.  For example, on a DC, SPA will capture the DS perfmon counters, analyze the output from those counters, and flag the issues it finds for follow-up.

Conclusion:

Using SPA I was able to easily find the network client causing the issue on our WINS server and then correlate that with the network capture.  This is only one example of where SPA has really helped me narrow down an issue.  One caveat: SPA is CPU intensive when it compiles the report, so if the system is already pegged at 100% it's best to compile the report off the system in question.

If you run into any issues with SPA (only supported on Win2k3), send me an e-mail or drop a comment and I'll try to help you out.

 
