Do you want to troubleshoot performance issues in your environment more easily? Have you wished you could collect near-real-time performance data from all your servers in one place?
We heard this ask from many of you, and we are proud to announce the availability of Near-Real Time (NRT) Performance Data Collection in OMS. With this new feature, you can collect any performance counter at a sampling interval as small as 10 seconds and visualize any of the metrics to help you troubleshoot performance issues more easily.
Enabling the performance data
To enable NRT performance data collection, go to the Settings page and click the Logs tab. On this page, you can search for any performance counter that you want to collect. There is also a list of suggested counters that you can add right away. In this example, I will add the suggested performance counters plus one other that I’m interested in.
Once you click the Add button, here is the list that you would see. All sample intervals default to 10 seconds, but you can choose a different interval if you want.
In order to add more counters, you can use the search bar to enter the exact counter name. OMS will give you autosuggestions for some of the common counters. For any custom counter, you need to specify the exact name. You can use any of the following formats:
All instances of a counter:
Object name\Counter name
Object name(*)\Counter name
A specific instance of a counter:
Object name(Instance name)\Counter name
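To make the three formats concrete, here is a small Python sketch that splits a counter path into its object, instance, and counter parts. It is only an illustration and not part of OMS: the names CounterPath, parse_counter_path, and collects_all_instances are hypothetical, and the simple pattern assumes that instance names do not themselves contain parentheses.

```python
import re
from typing import NamedTuple, Optional


class CounterPath(NamedTuple):
    """Hypothetical container for the parts of a counter path."""
    object_name: str
    instance: Optional[str]  # None when no "(...)" part was given
    counter_name: str


# Object name, optional "(instance)", a backslash, then the counter name.
# Assumption: instance names contain no parentheses of their own.
_PATH_RE = re.compile(r"^(?P<obj>[^\\(]+)(?:\((?P<inst>[^)]*)\))?\\(?P<ctr>.+)$")


def parse_counter_path(path: str) -> CounterPath:
    """Split 'Object(Instance)\\Counter' or 'Object\\Counter' into parts."""
    match = _PATH_RE.match(path.strip())
    if match is None:
        raise ValueError(f"not a valid counter path: {path!r}")
    return CounterPath(match["obj"].strip(), match["inst"], match["ctr"].strip())


def collects_all_instances(path: CounterPath) -> bool:
    """True for the 'Object\\Counter' and 'Object(*)\\Counter' formats."""
    return path.instance is None or path.instance == "*"
```

For example, parse_counter_path(r"LogicalDisk(*)\Free Megabytes") yields the object LogicalDisk, the instance *, and the counter Free Megabytes, and collects_all_instances reports True for it.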
In this example, I’m also interested in adding the Free Megabytes counter.
I select the counter and then click +. I also want to collect this counter every 30 seconds, so I set its sample interval to 30. Finally, I click the Save button to save all the changes.
When you configure this for the first time, it takes about an hour for the data to become accessible through Search. Once the initial configuration is done, the data is sent at the configured sampling intervals.
Using Search to visualize the data
To view this data, I go to search, and type the following query:
Type = Perf
This returns all the performance counters that are being collected. The search results offer two views. The first one (Logs) returns the 30-minute aggregate values for all the performance counters, and the second one (Metrics) visualizes the raw results for each of the returned counters.
Currently, the raw data is stored for 14 days. The 30-minute aggregates are stored according to your OMS data plan.
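As a rough mental model of how raw samples relate to the 30-minute aggregates, the Python sketch below groups (timestamp, value) samples into 30-minute windows and computes an average and a maximum for each window. This is an assumption about the general idea, not a description of how OMS computes or aligns its aggregates. The function names and the SampleCount field are hypothetical; Average and Max mirror the field names used in the aggregate queries in this post.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def window_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 30-minute window."""
    return ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)


def aggregate_30_min(samples):
    """Group (timestamp, value) samples into 30-minute windows.

    Returns a dict mapping each window start to its Average, Max, and
    SampleCount (SampleCount is an illustrative field, not an OMS one).
    """
    windows = defaultdict(list)
    for ts, value in samples:
        windows[window_start(ts)].append(value)
    return {
        start: {"Average": mean(vals), "Max": max(vals), "SampleCount": len(vals)}
        for start, vals in sorted(windows.items())
    }
```

At a 10-second sample interval, one counter instance would contribute up to 180 raw samples to each 30-minute window.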
In this example, I want to look at the counters for one of my servers that has been running slowly lately. Here is what my query looks like:
Clicking the + button on the right side of a chart expands the view to give you more detail on that metric. In this example, I’m looking at the Disk Reads/sec counter for the last 7 days in more detail by clicking the expand button.
Another common query is to look at a given counter across all your computers. The following query returns the “Current Disk Queue Length” counter for the past 2 hours across all my servers.
Type=Perf CounterName="Current Disk Queue Length"
I can now look at all instances of this counter on one of my computers. I also expand all the graphs to see them in more detail and compare their values.
Type=Perf CounterName="Current Disk Queue Length" Computer="MyComputerName"
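If you generate these searches from a script, a tiny helper can keep the quoting consistent. The sketch below is purely illustrative: perf_query is a hypothetical function that only builds the query text in the same Field="value" form as the two queries above. It does not call any OMS API.

```python
def perf_query(**filters: str) -> str:
    """Build a 'Type=Perf ...' search string from Field="value" filters."""
    parts = ["Type=Perf"]
    for field, value in filters.items():
        if '"' in value:
            # Keep the sketch simple: no escaping rules are assumed here.
            raise ValueError(f"value for {field} must not contain double quotes")
        parts.append(f'{field}="{value}"')
    return " ".join(parts)


# One counter across all computers, then narrowed to a single computer.
all_servers = perf_query(CounterName="Current Disk Queue Length")
one_server = perf_query(CounterName="Current Disk Queue Length",
                        Computer="MyComputerName")
```

Note that the values use plain straight quotes. Curly quotes pasted from a document are different characters, so they may not be read as string delimiters.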
Viewing the Live data
You can also hover over the chart to see the value of any sample. If you make your time range small enough (6 hours or less), the charts will show you live data. In this example, I’m looking at the last hour of data for the LogicalDisk(C:)\Disk Writes/sec counter from one of my computers. As you can see, the light blue color shows the live data coming in for the counter.
With the 30-minute aggregate data, you can run different aggregate queries. Here are a few examples:
Average CPU Utilization across all computers
Type:Perf (ObjectName=Processor) CounterName="% Processor Time" InstanceName=_Total | measure Avg(Average) as AVGCPU by Computer
Maximum CPU Utilization across all computers
Type:Perf (CounterName="% Processor Time") | measure max(Max) by Computer
Average Current Disk Queue length across all the instances of a given computer
Type:Perf ObjectName=LogicalDisk CounterName="Current Disk Queue Length" Computer="MyComputerName" | measure Avg(Average) by InstanceName
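Conceptually, a "measure Avg(Average) by InstanceName" query groups the matching records by one field and averages another field within each group. The Python sketch below shows that idea on a few made-up records. It illustrates the grouping only and is not how OMS runs the query; measure_avg and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def measure_avg(records, value_field, by):
    """Average `value_field` within each group of records sharing `by`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[by]].append(record[value_field])
    return {key: mean(values) for key, values in groups.items()}


# Made-up 30-minute aggregate records for one computer.
records = [
    {"InstanceName": "C:", "Average": 2.0},
    {"InstanceName": "C:", "Average": 4.0},
    {"InstanceName": "D:", "Average": 1.0},
]
per_instance = measure_avg(records, value_field="Average", by="InstanceName")
```

Swapping the by field for Computer gives the per-computer form used in the CPU queries above.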
95th percentile of Disk Transfers/sec across all computers (the value that Disk Transfers/sec stays at or below 95% of the time on each computer)
Type:Perf CounterName:"Disk Transfers/sec" | measure percentile95(Average) by Computer
*Note that you can do any percentile calculation by replacing the "95" in the above query with any number from 1 to 99 (e.g., percentile1, percentile50, percentile99).
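To illustrate what a percentile means here, the following Python sketch uses the simple nearest-rank definition: sort the values and pick the one below which roughly p percent of them fall. This is an assumption chosen for clarity. The post does not say which percentile method OMS uses, so results from the service may differ slightly. The function name percentile is hypothetical.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p from 1 to 99."""
    if not 1 <= p <= 99:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # 1-based rank = ceil(p * n / 100), computed with integer arithmetic.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Made-up queue-length averages for one disk instance.
queue_lengths = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
p70 = percentile(queue_lengths, 70)
```

With these ten made-up values, p70 is 7: seven of the ten samples are at or below it.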
70th percentile of Current Disk Queue Length across all instances of a given computer (the value that Current Disk Queue Length stays at or below 70% of the time on each instance)
Type:Perf ObjectName=LogicalDisk CounterName="Current Disk Queue Length" Computer="MyComputerName" | measure Percentile70(Average) by InstanceName
If you would like to start using this feature but don’t know how much data the performance counters will use, this example may help.
For a particular computer, a given counter instance (e.g., Processor(_Total)\% Processor Time) with a 10-second sample interval sends about 1 MB per day (~1 MB/day/counter instance). Multiply this number by the number of computers you have to get the estimated usage per counter instance across all your computers.
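The estimate above can be turned into a quick back-of-the-envelope calculation. The Python sketch below starts from the ~1 MB/day figure for a 10-second interval. Scaling that figure for longer intervals is my own assumption (that volume is roughly proportional to the number of samples sent), so treat the results as ballpark numbers only. The function and constant names are hypothetical.

```python
MB_PER_DAY_AT_10_SECONDS = 1.0  # approximate figure quoted in this post


def estimate_mb_per_day(computers, counter_instances, interval_seconds=10):
    """Rough daily data volume in MB for NRT performance collection.

    Assumption: volume scales linearly with the number of samples, so a
    30-second interval sends about one third of the 10-second volume.
    """
    if interval_seconds < 10:
        raise ValueError("the smallest supported interval is 10 seconds")
    scale = 10 / interval_seconds
    return computers * counter_instances * MB_PER_DAY_AT_10_SECONDS * scale


# 50 computers, each sending 8 counter instances every 10 seconds.
daily_mb = estimate_mb_per_day(computers=50, counter_instances=8)
```

In this example daily_mb comes out to 400.0, or roughly 400 MB per day. Under the linear-scaling assumption, the same setup at a 30-second interval would be about a third of that.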
We hope you enjoy this new feature in OMS. Please post any feedback or questions on UserVoice.