What do Microsoft Support Engineers look for in the results of the MPS Reports diagnostic tool?
The Microsoft Product Support Report (aka MPS Reports) is arguably the diagnostic gathering tool most used by Support Engineers to diagnose and correct server and environmental technical issues. The MPS Reports are meant to be used only by Microsoft personnel (according to the End User Licensing Agreement), but I can discuss with you what we look for in them. The tool runs as a series of batch file commands in a command window. Most support teams have their own version of the tool that can be downloaded here. The Microsoft Knowledge Base Article that discusses it is found here. This discussion will be limited to the Exchange-specific version. One important thing to note is that the tool makes no changes to the server other than the addition of the diagnostic and related files themselves. Ideally, it should be run under an account with Exchange Full Admin and Domain Admin privileges to gather the maximum amount of data.
The first thing I look at is the Application and System Event logs (Servername_Application.evt and Servername_System.evt respectively). I use an internal log parser and filtering tool, but there are public tools available that overcome the limitations of the one included in the operating system. A parser tool allows sorting, excluding, and searching of events in ways that are not easy with the native Event Viewer. The event logs are often the first indicators of the nature of a technical problem, and they will often tell the story of what occurred during a particular failure. It's advantageous to view them both and compare timelines. For example, a failure of Exchange to read from or write to a storage device, as recorded in the application event log, will often correlate with a report in the system log of a disk channel hardware problem. Text versions of the application and system event logs are also included (Servername_Application.txt and Servername_System.txt respectively), so in the event of mismatched DLL files between the source and the viewing computers, I can use them to find the actual error by searching on a known part of the event using Microsoft Excel or a text editor.
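The text search described above can also be scripted. The following is a minimal Python sketch, not part of MPS Reports: the function names are my own, and it assumes nothing about the layout of the exported text files beyond one searchable line at a time. The file encoding is also an unknown, so the file reader replaces undecodable characters instead of failing.

```python
from typing import Iterable, List, Tuple


def find_event_lines(lines: Iterable[str], fragment: str) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing the fragment.

    The match is case-insensitive and line numbers start at 1.
    """
    needle = fragment.lower()
    return [
        (number, line.rstrip("\r\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


def search_log_file(path: str, fragment: str) -> List[Tuple[int, str]]:
    """Search one exported text log on disk.

    errors="replace" keeps the scan going if the file's encoding is not
    what Python expects (the real encoding is an assumption here).
    """
    with open(path, "r", errors="replace") as handle:
        return find_event_lines(handle, fragment)


if __name__ == "__main__":
    # Made-up placeholder lines, not real event log output.
    sample = ["first line", "a known part of the EVENT text", "last line"]
    for number, text in find_event_lines(sample, "event text"):
        print(number, text)
```

Running the same fragment search against both Servername_Application.txt and Servername_System.txt with search_log_file gives two lists of hits whose timestamps can then be compared by eye.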
Included with the MPS Report is an XML output file for the Exchange Best Practices Analyzer Tool (Servername_EXCH_exBPA.XML). This tool is invaluable in detecting and offering remedies for many Exchange technical issues. The stand-alone version can be found on http://www.microsoft.com/exchange/analyzers. The version included in the MPS Report is slightly behind the latest, so I make sure I download the current one from the ExBPA site above. While there, also check out the Exchange Server Performance Troubleshooting Analyzer Tool and the Exchange Server Disaster Recovery Analyzer Tool. All of the analyzer tools are meant to be run by you, the administrator, for the purpose of diagnosing and providing recommendations so that you may not have to incur a costly and time-consuming call to Microsoft Support at all. The Exchange Best Practices Analyzer file will not be included if the MPS Report was run on a server that does not have the .Net Framework 1.1 or later installed. It's installed by default on Windows 2003 servers, but not on Windows 2000 Servers. The latest version of .Net Framework can be downloaded via Windows Update.
Exchange depends heavily upon Active Directory and consequently upon the health of the DNS (Domain Name System) servers. If technical issues are present in DNS or the Active Directory, Exchange will often exhibit failures as well. (It's an inside joke that Exchange is the best diagnostic tool.) There are a couple of tools included that will help diagnose those failures. The first is Netdiag. The output of this tool is included in the MPS Report and is named Netdiag.log. This is the same version of Netdiag included with the Windows 200x Support Tools. If Netdiag is run outside of the MPS Report, it produces a netdiag.log in the same directory from which it's run. With the netdiag.log file open, I search for the word "fail". Some of the failures are not consequential, such as IPX, Kerberos (due to a bug in Netdiag) and Retries (although Retries can sometimes be helpful in diagnosing that there is a network problem). Red flags go up if there are failures in the domain controller location, LDAP or DNS sections of the test. I tell my customers that it's critical to repair any DNS or Active Directory problems before further diagnosing Exchange, as often the remediation of those problems will alleviate the Exchange ones. DCDiag is another sometimes useful tool that generates the Servername_dcdiag file, but only if the Exchange server itself is a domain controller. In most environments, installing Exchange on a Domain Controller is not recommended. See Michael's Blog for more information. The output is included in the MPS Report cab file. It too is the same version included in the Windows 200x Support Tools. If run from the Support Tools, it doesn't create a log file by default, so its output must be redirected to a text file. When used in the MPS Reports, however, the text file is generated automatically.
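The "search for fail, then set aside the usual suspects" habit described above can be sketched in a few lines of Python. This is an illustration only: the function and constant names are hypothetical, and it assumes the test name appears on the same line as the word "fail", which may not hold for every section of a real netdiag.log.

```python
from typing import Iterable, List, Tuple

# Keywords the article treats as usually inconsequential when they fail.
USUALLY_IGNORABLE = ("ipx", "kerberos", "retries")


def triage_fail_lines(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split lines containing "fail" into (review, usually_ignorable)."""
    review: List[str] = []
    ignorable: List[str] = []
    for raw in lines:
        line = raw.strip()
        lowered = line.lower()
        if "fail" not in lowered:
            continue  # only lines mentioning a failure are of interest
        if any(word in lowered for word in USUALLY_IGNORABLE):
            ignorable.append(line)
        else:
            review.append(line)
    return review, ignorable


if __name__ == "__main__":
    # Assumes netdiag.log is in the current directory.
    with open("netdiag.log", "r", errors="replace") as handle:
        review, ignorable = triage_fail_lines(handle)
    print("Review first:")
    for line in review:
        print("  " + line)
    print("Usually inconsequential:")
    for line in ignorable:
        print("  " + line)
```

The second list is kept and not discarded, because a Retries failure can still hint at a network problem.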
Permissions problems often occur in Exchange-related technical issues. For diagnosing those, I use the included Exchdump file (along with its back-end XML file). Upon opening the ExchDumpxxxxx.HTM file in Internet Explorer, I right-click the warning bar above the main window and select Allow Blocked Content so the XML file can be accessed. The XML file supplies additional configuration and permissions data when the highlighted portions of the HTML file are clicked to expand them.
The exact configuration and permissions of Exchange-related Active Directory objects can be determined with this tool. There is a section called Objects Flagged for Further Investigation. It is common to find some NTDS connectors in the Lost and Found Config Container, for example. They can usually be ignored. I'll want to investigate the other objects in this section, as they may be duplicates or the results of Active Directory collisions. In some cases deleting objects in the Lost and Found Config Container can alleviate the Exchange-related technical issue.
Here is an example of the "Objects flagged for further investigation" section of the report:
Objects flagged for further investigation
-Objects under msExchConfigurationContainer with ACL inheritance disabled:
No objects found
-Explicit Deny (Everyone group) exists on object msExchOrganizationContainer:
CN=Orgname (LDAP://CN=Orgname,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=ad,DC=domain,DC=com)
cn : "Orgname"
legacyExchangeDN : "/o=ABC"
whenChanged : Wednesday, 04/21/2006 22:14:40 (GMT)
->Click for more details...
->Click for Permissions on object...
-The LostAndFoundConfig container contains the following objects:
CN=NTDS Settings (LDAP://CN=NTDS Settings,CN=LostAndFoundConfig,CN=Configuration,DC=ad,DC=domain,DC=com)
Child of CN=LostAndFoundConfig
cn : "NTDS Settings"
->Click for more details...
The Lost and Found Config Container can be viewed with ADSIEdit (also from the Windows 200x Support Tools) in the path:
It is not recommended to disable permissions inheritance on Exchange objects unless there is a compelling reason. Disabling inheritance should be done with caution as it can cause problems with the Exchange environment. If I find inheritance disabled on an object, I can then check that object with ADSIEdit to confirm that inherited permissions are disabled (Properties\Security\Advanced).
Two other useful files are the Windows Info (Servername_._Summary.txt) and Exchange Info (Servername_Exch_Info.txt) files. From the Windows info file I can get the Windows build number, the drive location of the operating system and other useful data. The Exchange info file gives me a listing of all the Exchange databases, transaction log locations and other Exchange related drives and paths, as well as the build numbers for Exchange. If the server is a server cluster, information on each Exchange cluster resource will also be included.
The cab file includes the MSINFO file (Servername_MSINFO32.NFO). It's the same output generated if Start > Run > MSINFO32 (Windows 200x version) or WINMSD (NT version) is typed. When I double-click on it, I can get complete hardware, software and application data. The places I usually look at are the Services list and the Loaded Modules list. Between these two listings, I can find detailed information on what's running on the server, whether its service is started, and the manufacturer of the loaded module. If I see something suspicious, I can follow up with a search on http://support.microsoft.com, or a search on the Internet. A text version is also included (Servername_winmsd.txt). I can also get more information on processes and device drivers from the process and driver listing files (Servername_Process.CSV/TXT and Servername_DRIVERS.CSV/TXT). Some performance information for running processes is included in the Servername_PSTAT.TXT file.
Every installation of Exchange or Exchange System Manager creates a file on the root of the c: drive called the Exchange Server Setup Progress.log (called Servername_Exchange Server Setup Progress.log by MPS Reports). This file gives a history of installations and updates on the server. There are typically a lot of errors listed in it that are inconsequential, but sometimes I can tell if something went wrong during an install or update that would affect the current operation of my customer's Exchange server. For instance, sometimes a real-time anti-virus file scanner prevents some DLLs in an update from loading. I should be able to catch that here. I tell my customers to shut off anti-virus scanning, if possible, while installing updates and patches. If the Exchange server is part of a server cluster, updates and patches should be loaded only on the passive nodes. See Microsoft Knowledge Base Article KB328839 for more information.
Dr. Watson is the module that generates a log file whenever an application faults. Its log file is included here. Once it's opened in Notepad, I press Ctrl+End to move the cursor to the end of the document and then search up for "exception occurred" (without the quotes) to find the last application exception that occurred. For example, in the Dr. Watson log from my desktop, the last exception was incurred by Network Monitor:
Application exception occurred:
App: C:\WINDOWS\System32\NetmonFull\netmon.exe (pid=5980)
When: 5/27/2006 @ 16:48:09.923
Exception number: c0000005 (access violation)
Outside of MPS Reports, one can locate the Dr. Watson files by running "drwtsn32" from the Run menu.
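The "jump to the end and search up" step can be sketched in Python as well. This is a rough illustration that assumes the log looks like the excerpt above: a marker line containing "Application exception occurred", followed by "key: value" lines. The function name is my own, and a real log may carry more fields or a different layout.

```python
from typing import Dict, Iterable, Optional

MARKER = "application exception occurred"


def last_exception(lines: Iterable[str]) -> Optional[Dict[str, str]]:
    """Return the key/value fields that follow the last marker line."""
    all_lines = [line.strip() for line in lines]

    # Walk backwards to find the most recent marker, like searching up
    # from the end of the file in Notepad.
    start = None
    for index in range(len(all_lines) - 1, -1, -1):
        if MARKER in all_lines[index].lower():
            start = index
            break
    if start is None:
        return None

    fields: Dict[str, str] = {}
    for line in all_lines[start + 1:]:
        # Split on the first colon only, so "C:\..." paths stay intact.
        key, separator, value = line.partition(":")
        if not separator:
            break  # stop at the first line that is not "key: value"
        fields[key.strip()] = value.strip()
    return fields
```

Given the four lines shown above, the function returns a dictionary with the keys "App", "When" and "Exception number". It returns None when the log contains no exception at all.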
Servername_BOOT_INI.TXT lists the contents of the boot.ini file. Here it can be confirmed if the /3GB and/or /USERVA=xxxx switches are in place on that file. If they're not, there would have been a warning in both the event log and the Exchange Best Practices Analyzer reports. See Microsoft Knowledge Base Article KB810371 for more information.
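Checking the switches by eye is quick, but across many servers a small script helps. The Python sketch below is an assumption-laden illustration: it assumes Servername_BOOT_INI.TXT holds the boot.ini contents unchanged, with boot entries listed under an [operating systems] section, and the function name is hypothetical.

```python
import re
from typing import Dict, Iterable, List

USERVA_PATTERN = re.compile(r"/USERVA=(\d+)", re.IGNORECASE)


def check_boot_switches(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Report /3GB and /USERVA for each entry under [operating systems]."""
    report: List[Dict[str, object]] = []
    in_os_section = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # Section headers switch parsing on or off.
            in_os_section = line.lower() == "[operating systems]"
            continue
        if not in_os_section or "=" not in line:
            continue
        # Switches are whitespace-separated; compare case-insensitively.
        tokens = line.upper().split()
        userva = USERVA_PATTERN.search(line)
        report.append({
            "entry": line,
            "has_3gb": "/3GB" in tokens,
            "userva": int(userva.group(1)) if userva else None,
        })
    return report


if __name__ == "__main__":
    # "Servername" is the placeholder prefix used throughout this post.
    with open("Servername_BOOT_INI.TXT", "r", errors="replace") as handle:
        for item in check_boot_switches(handle):
            print(item["has_3gb"], item["userva"], item["entry"])
```

Each boot entry gets its own row, since a server can list more than one and only the default entry matters for the running system.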
Also included and often useful are the cluster log from the node (Servername_CLUSTER.LOG), the cluster registry hive (Servername_CLUSTER_REGISTRY.HIV), the Exchange registry hive (Servername_EXCH_REG.TXT), DSaccess information (Servername_EXCH_dsaccess.TXT), a complete list of installed hotfixes (Servername_HOTFIX.TXT), SMTP bindings (Servername_EXCH_smtpreg.TXT), application setup logs created by the Microsoft Installer (Servername_SETUPACT.LOG, Servername_SETUPAPI.LOG and Servername_SETUPERR.LOG), network information files (Servername_NETINFO.TXT and Servername_MISC.TXT), the Metabase (IIS Configuration database) output to a text file (Servername_Metabase.TXT and Servername_Metabase.xml), IIS information files and registry hives (Servername_IISREG.TXT) and .Net Framework information (Servername_.NETFramework.TXT/CSV).
More files are included (too many to list), but this blog post was meant to touch upon the most often used ones. When working with a customer, I like to be sure to let them know why I'm asking them to perform a particular action. Hopefully this blog post will help toward that goal.