Trace data can be vast: 4 GB ETL files aren’t rare, and network traffic varies endlessly. Let’s look at some ways to address memory issues when working with large traces.
Memory Garbage Collection Options
Garbage collection is a memory management technique that automatically reclaims memory the program is no longer using. By default, Message Analyzer uses the Server Garbage Collection mode, which enables higher throughput and performance but consumes more memory when memory is available. The theory is that you can wait until the last moment and collect garbage only when it becomes critical. During our testing, we found that setting gcServer enabled="true" was optimal, but we intend to keep validating this through telemetry.
You can monitor memory usage on the Memory page of our new Options dialog (Tools->Options->Memory), where the values update continuously. This can help you understand what is happening with regard to memory usage.
To change the garbage collection mode, edit the file "C:\Program Files\Microsoft Message Analyzer Internal\MessageAnalyzer.exe.config" and set the enabled attribute of the gcServer element to true or false, as appropriate. The .NET runtime reads this file at startup, so the change takes effect the next time you start Message Analyzer.
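If you would rather script this change, for example to apply it on several machines, the following Python sketch rewrites only the value of the enabled attribute and leaves the rest of the file (comments, namespaces, line endings) untouched. It assumes the file already contains a gcServer element with an enabled attribute, as described above; set_gc_server and update_config_file are hypothetical helpers, not part of Message Analyzer.

```python
import re
import shutil

# Path taken from the article; adjust it if your installation differs.
CONFIG_PATH = r"C:\Program Files\Microsoft Message Analyzer Internal\MessageAnalyzer.exe.config"

# Matches <gcServer ... enabled="true|false"> with either quote style.
# Assumption: the element and attribute already exist in the file.
_GC_SERVER = re.compile(
    r"(<gcServer\b[^>]*?\benabled\s*=\s*)([\"'])(?:true|false)\2",
    re.IGNORECASE,
)


def set_gc_server(config_text: str, enabled: bool) -> str:
    """Return config_text with the gcServer 'enabled' value replaced (hypothetical helper)."""
    value = "true" if enabled else "false"
    new_text, count = _GC_SERVER.subn(
        lambda m: m.group(1) + m.group(2) + value + m.group(2), config_text
    )
    if count == 0:
        raise ValueError('no <gcServer enabled="..."> element found')
    return new_text


def update_config_file(path: str, enabled: bool) -> None:
    """Back up the file, then rewrite it in place (hypothetical helper)."""
    shutil.copy2(path, path + ".bak")  # keep the original next to the edited file
    # utf-8-sig strips a BOM if present; newline="" preserves CRLF line endings.
    with open(path, encoding="utf-8-sig", newline="") as f:
        text = f.read()
    updated = set_gc_server(text, enabled)  # raises before anything is written
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(updated)
```

For example, update_config_file(CONFIG_PATH, enabled=False) switches to workstation garbage collection. Files under C:\Program Files normally require administrator rights to modify, so run it from an elevated prompt. Because this is a text substitution rather than a full XML parse, it would also rewrite a commented-out gcServer element if one were present.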
Edit Session: Load What You Need
With static traces, you have other options for limiting what Message Analyzer reads into memory: you can constrain the loaded data with a Time Filter, a Session Filter, or a Parsing Level. A Time Filter works best if you already have a time range in mind; for example, you might know the time frame in which an issue you want to isolate occurred. Otherwise, you can retrieve an arbitrary fraction of your data by configuring a time window in which to view results. In either case, you create the time window by adjusting the Time Slider controls in the Session configuration dialog, which should improve performance and reduce memory consumption when the data loads.
Note that you can also edit a session while it is still loading by clicking the Edit Session button on the global Message Analyzer toolbar. If the trace file you are opening contains a high volume of data and you want to maximize performance, there’s no reason to wait for loading to complete; you can reduce its memory footprint right away with any of the methods described in this article.
You can also start a Data Retrieval Session from New Session->Files and configure the new session before any data loads. In either case, a similar dialog opens, as shown in the figure that follows. Because the figure shows an existing session being edited, you would need to click the Full Edit button in the Edit Session dialog to add a Time Filter, since the session has already been created. When you commit your changes by clicking Apply, the session starts over and reloads if, in Full Edit mode, you changed anything beyond the input file configuration.
Any of the following configurations that you add to a new or existing Data Retrieval Session can reduce Message Analyzer’s memory usage:
- Time Range – the simplest approach is to set a time window for your data. Select the Use Start Filter and Use End Filter check boxes to enable the time window slider controls and the text boxes that hold the start and end time values. The sliders are usable only when the start and end times are known; with some types of data, Message Analyzer may be unable to detect the start time, or both the start and end times. For many file types this filtering is efficient, because Message Analyzer can read a message’s timestamp without parsing the message. This matters because parsing is where you pay the highest price in memory and CPU usage.
- Filter – you can specify a Session Filter to reduce the amount of data that remains loaded; the memory used by filtered-out messages is released. Loading might still take just as long, but afterward more memory is available to you.
- Parsing Level – a Parsing Level limits how far up the stack Message Analyzer parses and changes which protocol parsers it loads, based on your input data. If you select the High Performance Capture without Parsing option in the Parsing Level drop-down list, as shown in the figure that follows, no parsing occurs at all; even network messages appear as raw ETL. The Network Analysis Parsing Level parses up to TCP/UDP, which avoids the cost of maintaining some of the state information required to reassemble TCP segments. Keep in mind that the Parsing Level drop-down list does not appear in the Edit Session dialog until you click Full Edit in the Restricted Edit information bar.
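The cost difference between a time window and a Session Filter can be illustrated with a small conceptual Python sketch. It is not Message Analyzer’s implementation: the RawRecord type, the load_window function, and the assumption that a record’s timestamp is readable without parsing its payload are all hypothetical, chosen to mirror the behavior the list describes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Iterator, Optional, TypeVar

Msg = TypeVar("Msg")


@dataclass
class RawRecord:
    """Hypothetical unparsed record; its timestamp is readable without parsing."""
    timestamp: datetime
    payload: bytes


def load_window(
    records: Iterable[RawRecord],
    start: Optional[datetime],
    end: Optional[datetime],
    parse: Callable[[bytes], Msg],
    session_filter: Optional[Callable[[Msg], bool]] = None,
) -> Iterator[Msg]:
    """Yield parsed messages inside [start, end] that pass session_filter."""
    for rec in records:
        # Time window: a cheap timestamp comparison performed before parsing.
        if start is not None and rec.timestamp < start:
            continue
        if end is not None and rec.timestamp > end:
            continue
        # Parsing is the expensive step, so it only runs for in-window records.
        msg = parse(rec.payload)
        # Session filter: evaluated on the parsed message, so it cannot avoid the
        # parsing cost, but rejected messages are not retained.
        if session_filter is None or session_filter(msg):
            yield msg
```

Narrowing start and end reduces both the number of parse calls and the number of messages kept, whereas a session_filter on its own reduces only what is kept, which matches the trade-off between the Time Range and Filter options.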
Where is My Memory Going?
You might ask, “Why does it take so much memory?” Message Analyzer doesn’t simply parse the data; it also analyzes it, which includes validating and reassembling network data and collapsing messages into operations. To do that, we rely on our extensive parser set, which models protocols such as TCP. To support this modeling and to optimize for speed, Message Analyzer requests more memory from the underlying .NET Framework, which, for its own optimization purposes, in turn requests more memory from the operating system, in some cases very aggressively. As a result, some of your memory is simply unavailable because the runtime has not yet released it back to the system.
The number of messages processed also contributes to overall memory usage. There is a fixed cost from the parsers already loaded, and new parsers are loaded on demand as new protocols are detected. Parsers also keep state, but for the most important parsers that state does not grow without bound. However, some caches grow linearly with the number of detected messages, so as the message count increases, the overhead can become significant. We will try to improve this; in the meantime, focusing your analysis on a specific time window can help alleviate the problem.
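The paragraph above can be summarized in a simplified model. This is an illustration derived from that description, not a published formula: M_parsers stands for the fixed cost of the parsers loaded so far, M_state for parser state (bounded for the most important parsers), N for the number of messages, c for an assumed per-message cache overhead (no actual value is given here), and f for the fraction of messages that fall inside a chosen time window.

```latex
\begin{aligned}
M(N)  &\approx M_{\text{parsers}} + M_{\text{state}} + c\,N \\
M(fN) &\approx M_{\text{parsers}} + M_{\text{state}} + c\,f\,N, \qquad 0 < f \le 1
\end{aligned}
```

Under this model, a time window covering 10% of the messages (f = 0.1) shrinks the linear cache term tenfold, while the parser and state terms stay roughly the same, which is why narrowing the window helps most on traces with very many messages.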
That’s not to say there aren’t places where we can do better. We are continually tuning our memory usage and hope to make the base memory footprint even smaller.
Memory is King
Let’s face it: the more memory you have, the faster things go. Maybe 4 gigs is enough, or perhaps 24 gigs is your sweet spot; it really depends on the load. The thinking has always been that memory is cheap, but the scale of the data everyone works with is trending upward just as fast, and in practice you can’t add 24 gigs on a whim. Hopefully, the options presented here will help you cope better when loading big data.