Which servers are DOWN in my company, and which just have a heartbeat failure, RIGHT NOW?

In OpsMgr 2007, when an agent experiences a heartbeat failure, several things happen. Diagnostics, and possibly recoveries, are run. Alerts, and possibly notifications, go out.

But what happens if my Operations team misses one of these alerts? What can I do to "spot check" agents with issues?

Well, any time an agent has a heartbeat failure, OpsMgr grays out the icon showing the agent's last known state in each state view.

However, you CAN create a State view that turns Red or Yellow just like any other state view. Simply create a new State View and scope the class to Health Service Watcher (Agent).

I called mine Heartbeat State View:

[Screenshot]

This view shows when any of the agent Health Service Watcher monitors is unhealthy. In my case, OWA and EXCH1 have issues: OWA is DOWN, while the EXCH1 agent's Health Service is stopped.

[Screenshot]

However, here is the issue. This view shows a problem when ANY monitor rolls up an unhealthy state, and that includes both heartbeat failures AND Computer Not Reachable (the server's IP stack is down):

[Screenshot]

What if I want a State View that ONLY shows computers that are DOWN, meaning not heartbeating AND not responding to any PING? Most customers consider this their "most critical situation". I haven't found an easy way to do that, so I wrote a report that handles it. The report queries the OpsDB for the state of the "Computer Not Reachable" monitor and displays only the servers where that monitor is Critical. It is based on the following query:

SELECT bme.DisplayName,
       s.LastModified AS LastModifiedUTC,
       DATEADD(hh, -5, s.LastModified) AS [LastModifiedCDT (GMT-5)]  -- UTC shifted to US Central Daylight Time; change -5 for your time zone
FROM State AS s
INNER JOIN BaseManagedEntity AS bme
    ON s.BaseManagedEntityId = bme.BaseManagedEntityId
WHERE s.MonitorId IN (SELECT MonitorId FROM Monitor
                      WHERE MonitorName = 'Microsoft.SystemCenter.HealthService.ComputerDown')
  AND s.HealthState = 3   -- 3 = Critical (red)
  AND bme.IsDeleted = 0   -- skip entities that have been deleted
ORDER BY s.LastModified DESC
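To see why this query separates "DOWN" servers from agents with only a heartbeat failure, here is a small sketch that runs the same filter logic against an in-memory SQLite database. This is an illustration, not the real OpsDB. The three tables are simplified, hypothetical stand-ins for the OperationsManager tables, with only the columns the query touches. The heartbeat monitor name `Example.HeartbeatFailure` and the sample rows are also made up. SQLite's `datetime(..., '-5 hours')` stands in for `DATEADD(hh, -5, ...)`.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the OpsDB tables used by the report query.
# The real OperationsManager tables have many more columns and real datetime types.
SCHEMA = """
CREATE TABLE Monitor (MonitorId INTEGER PRIMARY KEY, MonitorName TEXT);
CREATE TABLE BaseManagedEntity (BaseManagedEntityId INTEGER PRIMARY KEY,
                                DisplayName TEXT, IsDeleted INTEGER);
CREATE TABLE State (BaseManagedEntityId INTEGER, MonitorId INTEGER,
                    HealthState INTEGER, LastModified TEXT);
"""

# Same join, filters, and ordering as the report query, in SQLite syntax.
DOWN_QUERY = """
SELECT bme.DisplayName,
       s.LastModified AS LastModifiedUTC,
       datetime(s.LastModified, '-5 hours') AS LastModifiedLocal
FROM State AS s
JOIN BaseManagedEntity AS bme
  ON s.BaseManagedEntityId = bme.BaseManagedEntityId
WHERE s.MonitorId IN (SELECT MonitorId FROM Monitor
                      WHERE MonitorName = 'Microsoft.SystemCenter.HealthService.ComputerDown')
  AND s.HealthState = 3   -- Critical only
  AND bme.IsDeleted = 0   -- ignore deleted entities
ORDER BY s.LastModified DESC
"""


def servers_down(conn):
    """Return (DisplayName, LastModifiedUTC, LastModifiedLocal) rows for DOWN servers."""
    return conn.execute(DOWN_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Monitor VALUES (?, ?)", [
    (1, "Microsoft.SystemCenter.HealthService.ComputerDown"),
    (2, "Example.HeartbeatFailure"),  # hypothetical heartbeat monitor name
])
conn.executemany("INSERT INTO BaseManagedEntity VALUES (?, ?, ?)", [
    (10, "OWA", 0), (11, "EXCH1", 0), (12, "OLDSRV", 1),
])
conn.executemany("INSERT INTO State VALUES (?, ?, ?, ?)", [
    (10, 1, 3, "2008-06-30 14:00:00"),  # OWA: not reachable, so it is DOWN
    (10, 2, 3, "2008-06-30 13:58:00"),  # OWA: heartbeat failed too
    (11, 1, 1, "2008-06-30 09:00:00"),  # EXCH1: still answers ping
    (11, 2, 3, "2008-06-30 13:50:00"),  # EXCH1: heartbeat failure only
    (12, 1, 3, "2008-06-01 08:00:00"),  # deleted entity, filtered out
])

for row in servers_down(conn):
    print(row)
```

In this model only OWA comes back. EXCH1 has a heartbeat failure but still answers ping, so its Computer Not Reachable state is healthy and the report skips it. That is the same distinction the post draws between the two servers.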

You can import this report if you have created a data source as shown in my previous post: 

http://blogs.technet.com/kevinholman/archive/2008/06/27/creating-a-new-data-source-for-reporting-against-the-operational-database.aspx

Import this report into your custom folder and run it. If you like the output, you can schedule the report so that you receive it first thing every day:

[Screenshot]

***** Update 6-30-08: I removed a section of the original query relating to maintenance mode. We found that if a down server had never been in maintenance mode, it would not show up in the report. The query and the report download have been updated to address this.

The report is attached below:

Servers_Down_Report.rdl

Comments

  1. Anonymous says:

    SQL 2008 R2 is not supported at this time for upgrading existing installations.

  2. Anonymous says:

    Hi Kevin

    The trouble is that by creating the monitor you mention, you are actually duplicating work that OpsMgr is already doing. It sort of highlights the lack of logic in some functionality.

    To me, it makes no sense that I have to run a ping script as a monitor when OpsMgr has a much more powerful solution: the agent heartbeat, with an associated ping of servers on which the heartbeat has been missed. I just need to get that information into the console, and the fact that OpsMgr can't do that is something of a design flaw.

    As I mentioned on the newsgroups, I don’t think the report is feasible for near real time info in a large environment.

    Cheers

    Graham

  3. Anonymous says:

    You are correct – It looks like in this RDL file I named my data source "Ops" instead of "OpsDB".

    Simply open the RDL file, edit the data source name, and import it. Or go to your imported report, edit it, and change the data source to your live data source that points to the OpsDB.

  4. Anonymous says:

    Here is a unique way to use web page views in the OpsMgr console. You can create a web page view in the

  5. Anonymous says:

    YES!  I do.  🙂

    In R2 – this is super easy – because we can subscribe to alerts rule by rule – monitor by monitor.

    In SP1 – it is doable – just a bit more difficult.  Please see my how to post at:

    http://blogs.technet.com/kevinholman/archive/2008/10/12/creating-granular-alert-notifications-rule-by-rule-monitor-by-monitor.aspx

  6. Anonymous says:

    Nice report. Works well, and I learnt a thing or two about SRS along the way. I did have to refer to Marnix’s blog about importing RDL files, but then it all came together.

    Thx Kevin

    John Bradshaw

    http://thoughtsonopsmgr.blogspot.com/2009/12/how-to-upload-rdl-file-for-sql-server.html

  7. Anonymous says:

    Hi,

    I have a different problem on the same topic. If a server goes down, I do not receive any alerts. When I open Health Explorer with the above settings, I see only white bullets under Availability, except for Local Health Service Availability. Computer Not Reachable, … are disabled in their sealed MP. What is wrong in our configuration, and what do I have to change to get an alert when a server goes down?

    Thanks

    Hendrik

  8. Anonymous says:

    Hi,

    I do not understand how to IMPORT the report into the new Custom report folder.

    I notice that the reports on my reporting server are *.rpdl, but this attachment is *.rdl.

    How do I get this report into the new folder?

    Thx,

    John Bradshaw

  9. Anonymous says:

    Ahhh... I didn’t read that properly before I posted! I meant that the fact that agent health state couldn’t be incorporated into the computer state view is something of a flaw. I realise there are the agent health state views, as per my posting in the newsgroup 😉

  10. Anonymous says:

    One note to add: in OpsMgr you will get a distinct alert whenever an agent does not respond to ping, in addition to the heartbeat failure alert. What we don't have is a state view JUST for computers that are down…

    You could easily write a custom monitor that runs a ping script, build your own state view for it in the console, and not need this report. The benefit of the report is that you can schedule it and deliver it via email or SharePoint.

  11. Anonymous says:

    Upgrade from SQL 2008 to SQL 2008 R2 is unsupported as of this writing.  

    It is on the roadmap, and will come with the release of the next CU. There is a separate post on my blog with the details.

  12. Anonymous says:

    Crud, is this still the case?  That you can't upgrade SQL 2008 to SQL 2008 R2 on a SCOM box?

    I'm getting the same error, failing on "No Custom Security Extensions" and "No Custom Authentication Extensions" during the "Upgrade Rules"  part of the upgrade.

    So what's the solution, to uninstall everything, reinstall SQL R2 & Reinstall/Reconfigure SCOM?  Or is an upgrade path on the horizon?

  13. StuartR says:

    With MOM 2005, we can accomplish this quite easily using the following approach:

    A SERVER DOWN alert can be generated in response to an internally generated ping failure event created by a MOM Agent ping script, which is part of a MOM Agent connectivity rule.

    This rule monitors for the internal failure event and will generate a "Service Unavailable" alert indicating that the Agent Computer is most likely down (or has lost network connectivity).

    Note that 12 ping attempts over a 90 second period along with an additional ping after 15 minutes must all have failed before this alert is generated.

  14. REN4 says:

    I tried to use this RDL file by following the steps described on this site. When I run the report, I get this error: "An error has occurred during report processing.

    Cannot create a connection to data source ‘ops’.

    For more information about this error navigate to the report server on the local server machine, or enable remote errors ".

    Please advise.

    Ren

  15. mccreerJ says:

    Do you know of any way to set up subscriptions for only "ping failed" notifications? Right now, every time a server fails a heartbeat and cannot be pinged, we receive two text messages: one for the heartbeat failure and one for the ping failure.

  16. reto says:

    Nice Blog, thank you Kevin

    I wanted to import your report (Servers_Down_Report.rdl) but that was written with Report Server 3.

    Now I'm updating SQL 2008 SP1 to R2 but the setup failed with two errors:

    – RS_NoCustomAuthExtensions (The Report Server has some custom authentication extensions configured)

    – RS_NoCustomSecurityExtensions (The Report Server has some custom security extensions configured)

    Can you please help?

    Thanks,

    Reto
