Monitoring – a Key Activity to a Trustworthy Infrastructure?

As you might have read, I recently blogged about my infrastructure and the future of a platform towards a better management of compliance. I wrote about

The Time Sync post in particular was more about a technical challenge than a high-level view, but I think it was interesting nevertheless. Let me continue down that road and share some of the experiences I had when I started to deploy additional technology in my home network.

I will definitely touch on themes like NAP and IPsec later, but first I had to get the basics right. For me, one of those basics was getting monitoring sorted out.

I look at monitoring from two sides: internal and external. I wanted external monitoring because my ISP sometimes had a problem that I did not notice: all of my infrastructure was up and running, yet nothing was reachable from the outside.

Let’s start with internal monitoring first:

I knew that I needed System Center Operations Manager 2007 R2 in place in order to get Stirling running later, so I recently upgraded to SCOM R2.

Most of you probably know the drill: you run the prerequisite checker and fix all your prerequisites until the installation starts. You deploy the agents and import the management packs you think you want to have. If you need to do this yourself, start small so that you really understand the different alerts you will get, fix the issues (I had quite a few that I had not found before), and then tailor the alerts to your needs.

That’s standard procedure, but let me add a few things I did on top of it:

  • I have a NAS in my network. So far, my “monitoring” of this NAS works like this: I gave the NAS the mail address of the operator (me, but on a different mail address) and rely on it to tell me if it feels “sick”. However, I would like to have it integrated into my SCOM environment. As my NAS is capable of sending syslog messages (as are quite a few devices), I decided to go down that path. Once you know what your device is actually flagging, it is pretty straightforward. There is a KB article describing this: “How to collect and monitor UNIX Syslogs in System Center Operations Manager 2007 or in System Center Essentials 2007”.
  • When it comes to Unix/Linux integration, however, I would go down a different path, as the monitoring of these operating systems is now natively integrated into SCOM.
  • Another problem is how to monitor network devices like print servers. Again, this is pretty easy to do if you use SNMP (and please do not use the default community strings). Basically, you go through the wizard to add a device, give it the IP range and the SNMP community string, and you are set.
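To get a feeling for “what your device is actually flagging”, it helps to look at the raw syslog lines first. Here is a minimal Python sketch, not the SCOM rule itself: every syslog message starts with a `<PRI>` value, which is facility × 8 + severity. The function names, the sample message and the “warning” threshold are my own assumptions for illustration, not something the NAS or SCOM prescribes.

```python
import re

# Severity names in the order defined by the syslog protocol (0 = most severe).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# A syslog line starts with "<PRI>", where PRI = facility * 8 + severity.
_PRI = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_syslog(line):
    """Split a raw syslog line into (facility, severity name, remaining text)."""
    match = _PRI.match(line)
    if match is None:
        raise ValueError("no <PRI> header found")
    facility, severity = divmod(int(match.group(1)), 8)
    if facility > 23:
        raise ValueError("facility out of range")
    return facility, SEVERITIES[severity], match.group(2)


def needs_attention(line, threshold="warning"):
    """Return True if the message is at least as severe as the threshold.

    The threshold is an assumption; pick whatever your device really uses.
    """
    _, severity, _ = parse_syslog(line)
    return SEVERITIES.index(severity) <= SEVERITIES.index(threshold)


if __name__ == "__main__":
    # Hypothetical message; a real NAS will use its own wording.
    sample = "<11>nas raid: volume degraded"
    print(parse_syslog(sample), needs_attention(sample))
```

Once you know which facility and severity your device uses for the messages you care about, you can build the matching rule in SCOM as described in the KB article.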

So, as you can see, internal monitoring is pretty straightforward.

The same is actually true with external monitoring.

I found an interesting service called mon.itor.us. The basic service is free, and it has saved my life more than once. Say I am outside my network (which, as I am the only network admin, happens pretty often) and my Internet connection fails. There is no way for SCOM to get hold of me, as it cannot send mail anymore (and I do not pay for an SMS service).

With mon.itor.us you can define URLs or IPs and ports to be monitored. So, I decided to have four services monitored from the outside:

  • My web server (standard HTTP)
  • My mail server (standard SMTP)
  • Terminal Services on two of my key internal servers, to see whether they are still alive
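What such an outside check does is, at its simplest, a TCP connection attempt against a host and a port. Below is a rough Python sketch of that idea. It is not how mon.itor.us works internally (I do not know that), the host names are placeholders, and I am assuming the default ports: 80 for HTTP, 25 for SMTP and 3389 for Terminal Services.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Placeholder host names; replace them with your own public names or IPs.
CHECKS = [
    ("www.example.com", 80),     # web server (HTTP)
    ("mail.example.com", 25),    # mail server (SMTP)
    ("ts1.example.com", 3389),   # Terminal Services, key server 1
    ("ts2.example.com", 3389),   # Terminal Services, key server 2
]


def run_checks(checks, probe=port_is_open):
    """Probe every (host, port) pair and return a dict of 'host:port' -> bool."""
    return {f"{host}:{port}": probe(host, port) for host, port in checks}


if __name__ == "__main__":
    for target, ok in run_checks(CHECKS).items():
        print(target, "up" if ok else "DOWN")
```

Keep in mind that this only makes sense when it runs from outside your own network; run from the inside, it will happily report everything as up while your ISP is down.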

To monitor your services (again, from the outside), you get different options. You can use a Windows Vista Sidebar Gadget, which works on Windows 7 as well:

[Screenshot: mon.itor.us sidebar gadget]

I use this pretty often, as I can easily see if one of the servers goes red. Sometimes, however, there is a problem with the mon.itor.us system itself and just one of the three locations gets a timeout. If you log on to the website, you get a dashboard with different gadgets like the one below:

[Screenshot: mon.itor.us dashboard gadget]

Here you see that the US location had a delay (sometimes even a timeout), whereas the other two locations show good performance, so there is no reason for me to act.
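The rule of thumb I apply when reading the dashboard can be written down in a few lines of Python. This is only my own reading of the results, not a feature of mon.itor.us; the location names other than the US and the threshold of two failing locations are assumptions.

```python
def should_act(results, min_failures=2):
    """Decide whether a failure looks real or like a glitch at a single location.

    results maps a monitoring location to True (check succeeded) or False
    (timeout or error). With three locations and min_failures=2, one lone
    timeout is ignored.
    """
    failures = [location for location, ok in results.items() if not ok]
    return len(failures) >= min_failures


if __name__ == "__main__":
    # Only the US location timed out: most likely not my problem.
    print(should_act({"US": False, "location_b": True, "location_c": True}))
    # All three locations fail: time to look at the firewall or call the ISP.
    print(should_act({"US": False, "location_b": False, "location_c": False}))
```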

And finally, there is an RSS feed for messages:

[Screenshot: mon.itor.us RSS feed]

I linked it into my homepage in IE (my.live.com), so I always see when I have a problem. Here, I had rebooted my firewall.

This is really cool stuff!

Roger