View your Agent Health in OMS


Summary: Learn how to view and monitor the health of your Windows and Linux agents in OMS.

Hi everyone, Nini here, and today I’m happy to announce that you are now able to view and monitor the health of your Windows and Linux agents in OMS. With agent heartbeat events, you can gain insights into the performance, availability and geo-location of your agents. If you have SCOM agents or direct agents connected to OMS, we will automatically start to generate and collect these agent heartbeat events. Best of all, they do not count towards your data volume.

First, go into Log Search and input Type=Heartbeat:

Screenshot of results when you type “Type=Heartbeat” in Log Search.

Now that you can see a list of all computers and servers with Windows and Linux agents installed, you can view the following fields:

Table of fields for agents.

Visualize Heartbeat Events

Using the View Designer, you can create an Agent Health solution to visualize these results. You can dive into the key counters that you care about, such as the distribution of your agents and agent heartbeats over time.

Be sure to enable the View Designer feature in “Preview Features” below, before following the next steps.

Screenshot of the Preview Features tab where you enable the View Designer feature.

Next, open the View Designer, and populate each blade by using the following queries. In this example, I’m populating my Overview tile to see the total number of agents in my environment and the Agent Count over Time blade to view a trend of my number of agents over time.

Agent Health solution tile.

For the Overview tile, I use these queries to get the total number of agents, and a count of unresponsive agents in the last 24 hours:

  • Type=Heartbeat | distinct Computer
  • Type=Heartbeat | measure max(TimeGenerated) as LastCall by Computer | where LastCall < NOW-24HOURS
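
To make the logic of these two queries concrete, here is a minimal Python sketch that mimics them on a handful of in-memory records. It is not how OMS evaluates the search: the record layout (a list of dictionaries), the sample computer names and timestamps, and the helper names `distinct_computers` and `unresponsive_agents` are all assumptions made for illustration. Only the field names `Computer` and `TimeGenerated` come from the queries above.

```python
from datetime import datetime, timedelta

# Hypothetical heartbeat records. Only the field names come from the
# queries above; the computers and timestamps are made up.
heartbeats = [
    {"Computer": "web01", "TimeGenerated": datetime(2016, 8, 1, 12, 0)},
    {"Computer": "web01", "TimeGenerated": datetime(2016, 8, 2, 11, 0)},
    {"Computer": "db01", "TimeGenerated": datetime(2016, 7, 31, 9, 0)},
    {"Computer": "lin01", "TimeGenerated": datetime(2016, 8, 2, 11, 30)},
]


def distinct_computers(records):
    """Rough equivalent of `| distinct Computer`: one entry per computer."""
    return sorted({r["Computer"] for r in records})


def unresponsive_agents(records, now, window=timedelta(hours=24)):
    """Rough equivalent of
    `| measure max(TimeGenerated) as LastCall by Computer
     | where LastCall < NOW-24HOURS`.
    """
    last_call = {}
    for r in records:
        name = r["Computer"]
        # Keep only the most recent heartbeat per computer (the max).
        if name not in last_call or r["TimeGenerated"] > last_call[name]:
            last_call[name] = r["TimeGenerated"]
    cutoff = now - window
    # A computer is unresponsive if its latest heartbeat is older than the cutoff.
    return {name: ts for name, ts in last_call.items() if ts < cutoff}


now = datetime(2016, 8, 2, 12, 0)  # stand-in for NOW
total_agents = len(distinct_computers(heartbeats))
stale = unresponsive_agents(heartbeats, now)
print(total_agents, sorted(stale))
```

In this sketch, passing a shorter `window` (for example `timedelta(minutes=15)`) plays the role of changing `NOW-24HOURS` in the query.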

In the Agent Count over Time blade, I use this query to get a trend of my number of agents over time:

Type=Heartbeat | measure countdistinct(Computer) by OSType
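
As a rough illustration of what `measure countdistinct(Computer) by OSType` computes, the following Python sketch groups sample records by `OSType` and counts the unique computers in each group. The records, the `OSType` values shown, and the helper name `count_distinct_by_os` are assumptions for illustration, not output from OMS.

```python
from collections import defaultdict

# Hypothetical heartbeat records. The field names come from the query
# above; the computers and OSType values are made up.
heartbeats = [
    {"Computer": "web01", "OSType": "Windows"},
    {"Computer": "web01", "OSType": "Windows"},
    {"Computer": "web02", "OSType": "Windows"},
    {"Computer": "lin01", "OSType": "Linux"},
]


def count_distinct_by_os(records):
    """Count unique computers per OSType, ignoring repeated heartbeats."""
    seen = defaultdict(set)
    for r in records:
        seen[r["OSType"]].add(r["Computer"])
    return {os_type: len(names) for os_type, names in seen.items()}


print(count_distinct_by_os(heartbeats))
```

Note that web01 sends two heartbeats but is counted once, which is the difference between a distinct count and a plain count of heartbeat events.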

I use this query to get a list of all my computers in the “All Agents” list right below the trend:

Type=Heartbeat | distinct Computer

Agent Health Overview.

For the other blades in my Agent Health solution, I use the following queries:

Queries for other blades in my Agent Health solution.

Screenshots of the completed solution:

Agent health overview for the completed solution.

Agent health overview for the completed solution.

And voila! You can now get a quick sense of your agent health and setup at a glance from this dashboard.

Create Alerts to Track Agent Heartbeats

In addition to visualizing heartbeat events, you can also create alert rules so that you get notified when an agent has not sent a heartbeat for 24 hours or longer. To do this, create a log search and then select the Alert button to create and configure the alert.

Using this log search, Type=Heartbeat | measure max(TimeGenerated) as LastCall by Computer | where LastCall < NOW-24HOURS, here’s what the alert rule should look like:

Add alert rule options.

Click Save, and your alert should start running immediately. To read more about how to set up alerts in Log Analytics, see Alerts in Log Analytics.

What's next?

We are currently working on an Agent Health OMS solution so that you can have a more in-depth and insightful visualization in OMS.

That’s all I have for you today. If you have any feedback or questions, please comment below.

Priscilla Nini Ikhena
Microsoft Operations Management Team

Comments (11)

  1. Timothy says:

    Why is so much development effort going into OMS, which nobody uses, and almost none going into scom?

    1. Jasenko says:

      Hi Timothy,

      I don’t agree with your comment. We need wide OMS functionality, and the OMS vs. SCOM comparison should stop.
      If no effort goes into further developing OMS, no one will ever use it and it won’t have a chance.
      In certain scenarios OMS is a better option than SCOM.
      It all depends on the situation (business).
      Don’t get me wrong, I’ll take SCOM any time, any place over competitors, but when it comes to cloud monitoring (SaaS to IaaS monitoring), OMS should be one of the basic tools to use.
      I agree that OMS misses certain basic tooling which is a commodity in SCOM. Once again, the two should not be directly compared.
      Did you look at the SCOM 2016 roadmap? It looks pretty decent, and I believe that the community had enough input (as it should).

  2. M.mathew says:

    I am seeing incorrect values show up for the “ComputerIP” field. Any thoughts?

  3. Maibok says:

    What do you mean when you say “IsGatewayInstalled: true if a Gateway is installed”? Are you talking about the SCOM Gateway or the OMS Gateway?

    I have installed an OMS Gateway (OMS Agent + OMS Gateway) and this property is marked as “false”. Is this behaviour normal?

    Apparently everything is working well. One direct agent is reporting to the OMS Gateway and the OMS Gateway is reporting to the OMS Portal. Moreover, the “Count of Gateways Installed” view in the dashboard is 0.

    1. Sambit Mishra says:

      The IsGatewayInstalled value indicates whether the OMS Gateway is installed on that particular machine. Are you seeing this value as “false” for heartbeats from that machine?

  4. jens says:

    Hi, thank you for the really helpful article, but unfortunately I cannot confirm it: the alert does not work. I shut down one of my Azure VMs and it is still listed with current data in the TimeGenerated column when I use: Type=Heartbeat Computer=”ComputerName”. I don’t know what LastCall is, but to my mind this is the wrong parameter to search for, because the call runs against the VM every few minutes even though it’s turned off. Is there something like LastAnswer or anything else? For testing, I adapted your query to Type=Heartbeat | measure max(TimeGenerated) as LastCall by Computer | where LastCall < NOW-1MINUTES and never get an alert for the turned-off machine. (The machine is successfully connected to OMS via MMA.) Any hints? Thank you

    1. jens says:

      I know LastCall = measure max(TimeGenerated).

  5. Rahul Shinde says:

    Is there any way of getting a complete, detailed agent health report for a month, showing which resource was down and when, along with a line or bar graph?

  6. Chris Clark says:

    Hello Priscilla,

    Thank you for adding agent health to OMS; this will help identify computers that are unresponsive. However, I did have a question regarding generating alerts. If a device becomes unresponsive, an alert should trigger specifying that particular device. The current monitor will generate an alert for all devices if only a single computer becomes unresponsive. Is this the designed behavior? If so, then an alert will need to be created per device, unless I’m missing something.

    Kind Regards

  7. Mario says:

    Why is the default 24 hours? Wouldn’t you want to know your server went down in like 5-15 minutes? Also, how do you change the default dashboard from 24HOURS to something reasonable?

  8. Once a system loses its heartbeat, how do you restore heartbeat collection? The system never went down or disconnected, but OMS identified it as if it had. When you check the Overview tile, every day there are fewer and fewer agents accounted for and running (first day 88, 2nd day 87, 3rd day 84, 4th day 80 in the morning and 79 in the evening). Is this normal behavior for OMS? All servers are up and running, providing services to the clients.
