Process Tracking is a way of associating trace data with a process. This is a very powerful correlation that helps you understand and isolate issues. For Network Monitor, we had to capture this information by polling the system. Moving forward, we want to rely on the OS for this information rather than trying to glue the data on top. Process Tracking with Message Analyzer works differently than it did with Network Monitor, so let’s discuss the differences and why they exist.
The Old Way
Network Monitor collected process information by polling the system and associating the traffic with processes via their TCP/UDP connections. This approach has various issues; most importantly, as we move tracing mechanisms inbox, polling does not work long term. It also puts an unnecessary load on the system when the OS can supply the information directly.
The New/Old Way
Surprise, but ETL actually contains process information already; it’s been there for a while. For each event, there is an associated process ID. Additionally, there are kernel system trace events that map process names to process IDs. Using these two pieces of information, we can build a lookup table and expose the result as a new property called ProcessName. The trick is that you need to capture the process-mapping information from the Windows Kernel Trace events.
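Conceptually, the lookup works something like the following sketch. The event and field names here are illustrative only; they are not the actual ETW schemas or Message Analyzer's implementation:

```python
# Minimal sketch: build a PID -> process name table from kernel trace
# events, then expose ProcessName as a derived property on each message.
# Event/field names are hypothetical stand-ins for the real ETW records.

def build_pid_table(kernel_events):
    """Map process IDs to image names using kernel process events."""
    table = {}
    for ev in kernel_events:
        if ev["event"] in ("Process/Start", "Process/DCStart"):
            table[ev["pid"]] = ev["image_name"]
    return table

def annotate(messages, pid_table):
    """Attach a ProcessName to each message via its logged process ID."""
    for msg in messages:
        # Unknown PIDs get None, which would land in the <Others> bucket.
        msg["ProcessName"] = pid_table.get(msg["pid"])
    return messages

kernel = [
    {"event": "Process/DCStart", "pid": 4212, "image_name": "iexplore.exe"},
    {"event": "Process/Start", "pid": 980, "image_name": "svchost.exe"},
]
msgs = annotate([{"pid": 4212}, {"pid": 7777}], build_pid_table(kernel))
```

The key point is that the per-event process ID and the PID-to-name mapping come from two different sources, which is why the kernel events must be present in the capture for ProcessName to resolve.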
Using the Process Tracking feature
If you already have data with the required information, you can add ProcessName as an Analysis Grid viewer column, or group by it in the Analysis Grid. In the Field Chooser Tool Window, look under Global Properties and select the Add as Column or Add as Grouping right-click context menu command, respectively.
You can also bring up the Grouping viewer to obtain a look and feel that is similar to the Network Monitor conversation tree, minus the application icons. Launch the Grouping viewer from the New Viewer drop-down list on the global Message Analyzer toolbar, and then choose Process Name and Conversations in the Layout drop-down list on the Grouping viewer toolbar. Using the new Grouping expansion and contraction controls, we can collapse the top-level groups and get a nice overview, as shown in the following illustration:
Note that the <Others> category in the illustration represents all messages that do not have a process ID, and the row items with a process ID number in parentheses represent process IDs that have no associated ProcessName. Additionally, we are already working on improvements for the next version that should make messages in the <Others> category rarer.
Grouping Viewer Multiple Selection
An advantage of the Grouping viewer over the Network Monitor conversation tree is that you can multi-select items. There are cases where the trace data from non-NDIS components contains partial information. For instance, you can collect a trace using the InternetClient scenario by running the following command line:
NetSh trace start scenario=InternetClient Capture=yes
The result will contain network traffic in addition to provider data from TCP and AFD. These additional pieces of data sometimes contain only the Source Address or Source Port and leave the Destination unspecified. In the Grouping viewer, this splits the data into two categories, as shown in the example below for 192.168.1.14:43685.
So rather than settle for the partial story, you can select multiple nodes in the Grouping viewer and factor in more data. While not all of the trace information may be relevant, some of it likely is, because it shares the same TCP client port, 43685.
Below we select TCP:43685 – HTTP(80) and TCP:43685 – unspecified in the Grouping viewer. The result is that new AFD and TCPIP messages show up, provided that the Grouping viewer Filtering Mode is active (click the funnel-shaped icon on the Grouping viewer toolbar).
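The split described above is easy to model: when the grouping key includes a destination field that is sometimes unpopulated, the same conversation lands in two nodes. The following Python sketch uses made-up field names to illustrate the effect, and why multi-selecting both nodes restores the full picture:

```python
from collections import defaultdict

# Illustrative model of the Grouping viewer split (field names are
# hypothetical, not Message Analyzer's actual message schema).

def group_key(msg):
    # A missing destination becomes its own "unspecified" bucket.
    dst = msg.get("dst") or "unspecified"
    return (msg["src"], dst)

def group(messages):
    groups = defaultdict(list)
    for m in messages:
        groups[group_key(m)].append(m)
    return groups

msgs = [
    {"src": "192.168.1.14:43685", "dst": "10.0.0.5:80", "layer": "NDIS"},
    {"src": "192.168.1.14:43685", "dst": None, "layer": "AFD"},
    {"src": "192.168.1.14:43685", "dst": None, "layer": "TCPIP"},
]
g = group(msgs)
# Same client port, two nodes; selecting both merges the related data.
merged = (g[("192.168.1.14:43685", "10.0.0.5:80")]
          + g[("192.168.1.14:43685", "unspecified")])
```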
Using NetSh to Collect Data with Process Information
When you use NetSh, the process information is collected automatically. For instance, the following NetSh command captures traffic with process information on your local network interfaces. The results are essentially the same as running the Message Analyzer Local Network Interfaces scenario, or clicking Start Local Trace on the Message Analyzer Start Page, except that the process information is also collected:
NetSh trace start capture=yes
Other NetSh scenarios include even more information, and you can also customize your own. For instance, the InternetClient scenario adds the TCP and AFD layers, which can help you when resets occur:
NetSh trace start scenario=InternetClient capture=yes
Using Logman to Collect Data with Process Information
Currently, Message Analyzer can’t capture ProcessName data directly from its UI. It turns out the Windows Kernel Trace provider, which supplies the process names, isn’t enabled like other providers. We expect to capture this data natively in the future, but the current implementation is the first step.
You can enable the collection separately and then combine the data later by using a Data Retrieval Session (select Files in the New Session dialog). Before starting your trace with Message Analyzer, run Logman manually with the first command line that follows. After stopping the trace in Message Analyzer, use the second command to stop the Kernel Logger trace.
logman start "NT Kernel Logger" -p "Windows Kernel Trace" "(process,thread,net,file)" -o kernel.etl -ets -ct perf -bs 1024 -nb 20 20
logman stop "NT Kernel Logger" -ets
Once the Logman trace is running, select Start Local Trace from the Message Analyzer Start Page (remember to run Message Analyzer as Administrator first).
After saving the Message Analyzer trace and closing it, you will have two sets of data to combine. Click New Session on the Start Page to display the New Session dialog, and then click Files to start Data Retrieval Session configuration, from where you can Add Files, such as the *.etl and *.matp files that contain the trace data you previously collected, as shown in the following illustration:
How Relevant is the Process Name?
Maybe this sounds like a funny question to ask at this point. However, the process information that ETW provides has some caveats that make it less than perfect. As with the Network Monitor process capturing feature, there are some important things you need to know.
When an ETW message is logged, it takes the process information from the current running process and stores it in the ETW message ProcessId field. This value can be found in the ETW layer of any message that is captured by using the methods described above. In Message Analyzer, you will see a ProcessId field and value in the Details Tool Window for selected messages, as shown below:
However, for incoming network traffic at the NDIS layer, ETW doesn’t have time to locate the correct process. Instead, it uses the ID of the process that happens to be executing on the CPU when the incoming packet arrives. This has implications for the relevance of the ProcessId and ProcessName.
How to interpret the Process Name for the NDIS ETW Provider
To boil down the implications, we can make some general statements about ProcessName and ProcessId when diagnosing issues.
- Outgoing traffic is accurate – The ProcessName is accurate when a process generates network traffic to make a connection. Even though responses come in the other direction, for most traffic they are paired with the request as operations, which allows you to properly associate the traffic with the process. However, TCP 3-way handshake traffic and ACKs with no payload can be disassociated from the traffic that carries the right ProcessName, and operations don’t exist for every protocol yet, so there are exceptions.
- Incoming traffic is not accurate – The ProcessName may be inaccurate when a server receives traffic. The ETW system doesn’t have time to track down the right process, so it uses the thread-local data for the process name and ID. As a result, the busiest application often appears to own a disproportionate share of the messages.
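The request/response pairing described above can be sketched in a few lines. This is an illustrative Python model of the idea, not Message Analyzer's actual operation matching, and all field names are hypothetical:

```python
# Sketch: pair a response with its outgoing request (an "operation") so
# the response inherits the request's reliable ProcessName, overriding
# the unreliable attribution stamped on incoming NDIS traffic.

def pair_operations(messages):
    pending = {}  # (client endpoint, server endpoint) -> outgoing request
    for m in messages:
        if m["direction"] == "outgoing":
            pending[(m["src"], m["dst"])] = m
        else:
            # Responses travel the reverse direction, so swap endpoints.
            req = pending.get((m["dst"], m["src"]))
            if req is not None:
                m["ProcessName"] = req["ProcessName"]
    return messages

trace = [
    {"direction": "outgoing", "src": "10.0.0.2:43685",
     "dst": "10.0.0.5:80", "ProcessName": "iexplore.exe"},
    {"direction": "incoming", "src": "10.0.0.5:80",
     "dst": "10.0.0.2:43685", "ProcessName": "busyapp.exe"},
]
trace = pair_operations(trace)
```

The incoming message arrived stamped with whatever process was on the CPU ("busyapp.exe" in this sample), but the pairing re-attributes it to the requester. This also shows why bare handshake segments and payload-free ACKs, which don't pair into an operation, can keep the wrong name.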
ProcessName for WFP ETW Provider
We include a Windows Filtering Platform (WFP) provider, which is used in some of our built-in trace scenarios to capture Unencrypted IPSEC, Tunnel, and Loopback traffic. The limitations concerning incoming and outgoing traffic are much the same. But moving forward, this is one place where we may be able to improve the WFP driver tracing.
King of Correlations
ProcessName is a key correlating factor when you are trying to nail down a difficult problem. And while this solution doesn’t cover every scenario, it adds a new tool that is extensible, so we can continue to improve it over time. Rather than hold back and try to perfect the correlation, we decided to deliver it now because it is already useful in many situations.
To learn more about some of the concepts discussed in this article, see the following topics in the Message Analyzer Operating Guide: