Relating and Correlating Data Sources


We often talk about correlation with Message Analyzer. Relating might be a simpler word for the process of taking two different logs or traces and making sense of them together. Let’s discuss how Message Analyzer helps you correlate different data sources.

Correlate What?

Messages often expose key elements in the data that allow you to relate them. The uber correlation element is time, which is often the most relevant. By loading two data sources into Message Analyzer, you can relate an error in one log with data in another by simply relating the timestamps. For example, you could create a Data Retrieval Session to load messages from two trace files by using File->New Session->Files, which displays the dialog below. You can click the Add Files button to display the Open dialog and locate the target traces that contain the data you want to load into Message Analyzer:

clip_image002

However, if you already opened a trace and started a session, you can click the Edit Session button from the global Message Analyzer toolbar to reconfigure the session with additional files that contain other target data.

clip_image004

Note that by default, the existing session configuration will open in Restricted Edit mode, which means you can only add files while in this mode, as indicated by the Information bar below:

clip_image006

After you add one or more new files to the list and Apply the session changes, the data will be parsed and added to the existing message set. However, if the data in the added trace files relates to the original data, it might be necessary to reparse the original file data as well. For instance, if you add a chained capture or a file with process information, you’ll need to use the Full Edit mode, which results in a full reparse of the original trace data, along with some impact on performance.
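The idea of relating two sources purely by timestamp can be sketched outside of Message Analyzer. The following is a minimal illustration, not how Message Analyzer itself works: the message tuples, sample values, and the `merge_by_time` helper are all hypothetical, and each source is assumed to be already sorted by time.

```python
import heapq
from datetime import datetime

# Hypothetical sample data: (timestamp, source, text) tuples, each list already
# sorted by time, standing in for messages parsed from two different files.
network_trace = [
    (datetime(2015, 1, 1, 12, 0, 0, 100000), "trace", "TCP SYN to 10.0.0.2:445"),
    (datetime(2015, 1, 1, 12, 0, 2, 500000), "trace", "TCP RST from 10.0.0.2:445"),
]
app_log = [
    (datetime(2015, 1, 1, 12, 0, 1), "log", "Opening share"),
    (datetime(2015, 1, 1, 12, 0, 2, 600000), "log", "ERROR: connection reset"),
]


def merge_by_time(*sources):
    """Interleave already-sorted message lists into one chronological list."""
    return list(heapq.merge(*sources, key=lambda message: message[0]))


merged = merge_by_time(network_trace, app_log)
for timestamp, source, text in merged:
    print(timestamp.isoformat(), source, text)
```

In the merged list, the error from the log sits right after the TCP reset from the trace, which is the kind of relationship that lining up timestamps is meant to surface.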

Adjusting for Time Differences

After you’ve loaded all your data, you might notice that it is not chronologically aligned. Some log files do not carry time zone information, so you might need to supply it to Message Analyzer manually in order to line them up with traces that do, for example, files in the .etl, .pcap, or .cap format.

You can accomplish this from the Shift Time dialog that displays when you click the Shift Time button on the global Message Analyzer toolbar.

clip_image008

By using the Shift Time dialog, you can change the timestamps of messages in selected traces (data sources) by choosing presets that make adjustments for different time zones. You can also adjust message timestamps by fractions of a second, for instance, when you need to more accurately align the data in remote log files, as shown in the dialog below:

clip_image010
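Conceptually, a time shift just adds a fixed offset to every timestamp from one data source. The sketch below illustrates that arithmetic only; the `shift_time` helper, the sample log entry, and the offsets (a source that logged in UTC-8 with a clock running 0.25 seconds behind) are assumptions made up for the example.

```python
from datetime import datetime, timedelta


def shift_time(messages, hours=0, seconds=0.0):
    """Return copies of (timestamp, text) messages moved by the given offset."""
    offset = timedelta(hours=hours, seconds=seconds)
    return [(timestamp + offset, text) for timestamp, text in messages]


# Hypothetical text log written in local time (UTC-8) by a machine whose
# clock was a quarter of a second behind the capture machine.
log_messages = [(datetime(2015, 1, 1, 4, 0, 2, 350000), "ERROR: connection reset")]

# Whole hours handle the time zone; the fractional seconds handle clock skew.
aligned = shift_time(log_messages, hours=8, seconds=0.25)
print(aligned[0][0].isoformat())
```

Only the selected source is shifted, so the other sources keep their original timestamps and the two sets can then be compared on a common timeline.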

Other Correlations

Time is the simple and obvious correlation, but there are many others. The next obvious one is TCP and IP conversations. By associating IP address pairs you can see what two machines are talking about. This lens into the process provides a high-level view of what one machine might be doing with another. The next level into the process from IP is the TCP conversation, which organizes each conversation into TCP Port pairs. For example, common ports such as HTTP:80 communicate with client-specific ports that are unique. Often, a single web page can trigger multiple TCP conversations to many different sites, but it’s hard to comprehend them separately. Correlating this data is the method to understand the madness.
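To make the idea of IP and TCP conversations concrete, here is a small sketch that buckets packets by address pair and then by endpoint pair. The packet tuples and the `pair_key` helper are hypothetical; the one design point worth noting is that the key is built from a sorted pair, so traffic in both directions lands in the same conversation.

```python
from collections import defaultdict


def pair_key(a, b):
    """Build a direction-independent key so A->B and B->A share a conversation."""
    return tuple(sorted((a, b)))


# Hypothetical packets: (source IP, source port, destination IP, destination port).
packets = [
    ("10.0.0.1", 50001, "10.0.0.2", 80),
    ("10.0.0.2", 80, "10.0.0.1", 50001),   # reply on the same TCP conversation
    ("10.0.0.1", 50002, "10.0.0.2", 80),   # second connection to the same server
    ("10.0.0.1", 50003, "10.0.0.3", 443),
]

by_ip_pair = defaultdict(list)
by_tcp_conversation = defaultdict(list)
for src_ip, src_port, dst_ip, dst_port in packets:
    # High-level view: which two machines are talking.
    by_ip_pair[pair_key(src_ip, dst_ip)].append((src_port, dst_port))
    # Next level down: which (address, port) endpoints are talking.
    endpoints = pair_key((src_ip, src_port), (dst_ip, dst_port))
    by_tcp_conversation[endpoints].append((src_ip, dst_ip))

print(len(by_ip_pair), "IP conversations,", len(by_tcp_conversation), "TCP conversations")
```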

It turns out that multiplexing conversations over a protocol occurs in other places in the stack. For instance, SMB can be correlated in at least the following three ways:

  • The file name that is referenced when you open a remote file, as exposed by a property called FileName.
  • The user who initiated the SMB session, which is indicated by a UID.
  • The TreeID, which represents a connection to a Share, for example, \\Server\Tree1.

By relating messages that have the same correlation key, you can bring together different pieces of data from multiple sources to facilitate better analysis.

For the most part, correlation occurs on fields and properties. These entities relate to a range of values, but can have different names depending on the trace source. For example, a network trace might reference the SMB Tree ID as a TreeID field. However, a text log could use a different name such as TID. Message Analyzer exposes tools and defaults that enable you to resolve and exploit these differences, which can help you to quickly correlate data from multiple data sources.

Let’s dig in and explore some of the built-in defaults that you can use as correlation techniques and how you can extend these defaults to fit virtually any correlation that you require.

Grouping

If you don’t know about it already, shame on you. :-) Not only does Message Analyzer allow you to group and compare data within the Analysis Grid viewer, you can also use a separate Grouping viewer, which entirely reorganizes the view of your session data based on the grouping of correlation factors.

Again, a basic correlation for network traffic is by IP addresses and TCP ports. If you use the Message Stack and the new Details view, you can expose options for grouping. By selecting an IP message from the Message Stack view and setting the Properties mode in the Details tool window, you can locate a property called Network which represents the IP address source/destination pair as a string value, as shown in the figure below:

clip_image012

This property is exposed for every IPv4 message. In fact, it’s populated for IPv6 and Ethernet as well. By right-clicking and grouping on this value, you can collate your network traffic into groups in the Analysis Grid viewer so you can understand each one separately, instead of viewing them as a mass of unrelated messages.

clip_image013

As you can see in the following figure, not only are IPv4 addresses grouped, but also IPv6 and Ethernet addresses. This is because the Network property can be exposed by any protocol.

clip_image014

Message Analyzer can also group on multiple levels, so next you can explore TCP by using the same steps as above. When you do that, you will find a Transport property which represents the Source/Destination Port pair. Again, by configuring a Grouping using either the right-click method or the (clip_image015) button in the Details tool window, you organize traffic by both the Network address pairs and the TCP Source/Destination Port pairs in nested fashion. By looking at the blue number in parentheses, as shown in the figure that follows, you can tell how many children each conversation has. The pointer below shows an Address pair with a “4” designator, which means there are 4 Transport conversations for this particular pair. In this case, it could potentially mean more interesting traffic than if it had only one child Transport pair.

clip_image016
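The nested grouping and its child counts can be approximated with a short recursive group-by. This is only a sketch of the concept: the `group_nested` helper is hypothetical, and the string formats used for the Network and Transport values are assumptions for illustration, not the exact strings Message Analyzer produces.

```python
def group_nested(messages, properties):
    """Group message dicts by each property in turn; leaves are message lists."""
    if not properties:
        return list(messages)
    first, rest = properties[0], properties[1:]
    buckets = {}
    for message in messages:
        buckets.setdefault(message.get(first), []).append(message)
    return {value: group_nested(bucket, rest) for value, bucket in buckets.items()}


# Hypothetical messages carrying Network and Transport values as strings.
messages = [
    {"Network": "10.0.0.1 - 10.0.0.2", "Transport": "50001 - 80"},
    {"Network": "10.0.0.1 - 10.0.0.2", "Transport": "50002 - 80"},
    {"Network": "10.0.0.1 - 10.0.0.2", "Transport": "50002 - 80"},
    {"Network": "10.0.0.1 - 10.0.0.3", "Transport": "50003 - 443"},
]

tree = group_nested(messages, ["Network", "Transport"])
for network, transports in tree.items():
    # The number in parentheses is the count of child Transport groups.
    print(f"{network} ({len(transports)})")
    for transport, members in transports.items():
        print(f"    {transport} ({len(members)})")
```

Because the property list is just a parameter, the same helper would nest on any other pair of properties, which mirrors how grouping levels can be stacked.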

In fact, this particular type of grouping is so useful that it is provided in several built-in view Layouts for the Analysis Grid and Grouping view, with some variations. For instance, the TCP with Network Grouping layout shows the Network and Transport grouping configuration, with a major difference being that additional TCP-related columns are included to assist you with network-level diagnostics.

clip_image018

Grouping Viewer

I’ve already discussed the Grouping View in some detail in the Grouping Viewer blog. It provides this same kind of data organization, but in this case, it is separate from the Analysis Grid. By having the ability to multi-select messages in the Grouping viewer tree, you can drive message selection in the Analysis Grid viewer and see loosely correlated data together. For instance, some providers, such as AFD, might provide only a Destination IP address when logging a diagnostic. However, by selecting both the specific source/destination port TCP traffic and the AFD high-level data, you can understand if any of the AFD messages and the network addresses are related.

Unions

As mentioned earlier, sometimes the data that you import from different files can have fields that are named similarly, but not identically. So grouping on a ProcessID field isn’t going to help if some of your trace data or log files use PID as the field name instead. While it’s possible to provide a richer, programmable correlation for any type of field by using OPN, Message Analyzer contains a built-in mechanism to create a Union for simple cases where the names just don’t line up.

To create a union, click the New Union button on the global Message Analyzer toolbar. This action displays the Edit Union dialog, which enables you to provide a Union name and to select fields that are identical in meaning, but with disparate names. In the example below, I’ve opened a built-in Union from the Unions drop-down list, which associates (correlates) SMB1’s TID field with SMB2’s TreeID field and Samba SysLog’s TID field.

When you add the SMBTID field as a column, grouping, or even a filter, you automatically reference the same value from any of these multiple SMB data sources. Note that any new Union that you create displays in the Details tool window in Properties mode, under the Global category. You can also explore Unions in the Field Chooser under a top-level node called Unions.

clip_image019
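The effect of a Union can be pictured as a lookup that tries several field names and returns whichever one a message actually has. The sketch below is a rough analogy only: the `UNIONS` table, the `union_value` helper, and the sample messages are hypothetical and do not reflect how Message Analyzer stores or evaluates Unions.

```python
# Hypothetical union definition: one union name mapped to the differently
# named source fields that carry the same value.
UNIONS = {"SMBTID": ("TID", "TreeID")}


def union_value(message, union_name, unions=UNIONS):
    """Return the first member field present in the message, else None."""
    for field in unions[union_name]:
        if field in message:
            return message[field]
    return None


# Hypothetical messages from three sources that name the tree ID differently.
messages = [
    {"source": "SMB1 trace", "TID": 5},
    {"source": "SMB2 trace", "TreeID": 5},
    {"source": "Samba syslog", "TID": 7},
    {"source": "other", "ProcessID": 42},  # no tree ID, so it is left out
]

by_tree = {}
for message in messages:
    key = union_value(message, "SMBTID")
    if key is not None:
        by_tree.setdefault(key, []).append(message["source"])

print(by_tree)
```

Grouping on the single union name brings the SMB1 and SMB2 messages for the same tree together, even though neither source uses the other’s field name.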

Meaning from Chaos

I frequently hear stories about how users have multiple views of their data in different windows. They manually relate the data in a process which must require great concentration and determination. With Message Analyzer, our goal is to make that process less tedious and more productive.

More Information

To learn more about some of the features and concepts described in this article, see the following topics in the Message Analyzer Operating Guide:

Configuring a Data Retrieval Session

Editing Existing Sessions

Setting Time Shifts

Applying and Managing Analysis Grid View Layouts

Message Details Tool Window

Using the Analysis Grid Group Feature

Grouping Viewer

Configuring and Managing Unions

