Written by Kip Ng, Principal Premier Field Engineer.
Most companies today have some form of monitoring in place for the different components of their IT systems. There are various monitoring solutions out there, some more sophisticated than others, and in general they offer different levels of monitoring. I personally categorize most monitoring needs into the following five areas:
1. System Health Monitoring
2. Alerts
3. QoS (Quality of Service) Data (Performance)
4. Diagnostics
5. Configuration Monitoring
Of course, I should note that being a Microsoft Premier Field Engineer (PFE), I look at monitoring generally from a Microsoft product perspective, and I hope the list above is quite self-explanatory.
Most monitoring solutions focus on System Health Monitoring, Alerts and QoS (Quality of Service) Data (Performance), which are the basic forms of monitoring and give the operations team a better understanding of the general health and availability of the environment. The Diagnostics category is a more advanced, proactive monitoring pillar, and it is extremely useful in narrowing down specific issues. It usually requires some form of customization and is developed on an ongoing basis.
What I want to talk about today isn’t one of these first four categories. Rather, it is the one that I don’t see many companies implement: Configuration Monitoring.
Configuration monitoring remains one of the most neglected monitoring needs for most companies, yet, in my opinion, it is probably one of the most essential. Configuration monitoring ensures that all the servers of the same function have similar (or identical) configurations, and it is very effective in helping find unapproved changes to an environment. Examples include group membership monitoring, server computer account monitoring and SQL database security monitoring.
So why is Configuration Monitoring so essential?
1. Problems generally get introduced as a result of changes to an environment.
Ask any member of the Operations team, and most will tell you that most problems come from changes to existing systems, be it the addition of some software, the removal of some hardware, the introduction of new storage, a change in service packs or the application of hotfixes. In short, if you don’t touch the system and no changes are introduced to it, you generally don’t see problems.
So, how can you be sure that the configuration hasn’t changed? How can you be sure that no unapproved changes have been made to the environment? Well, you can’t unless you have a way to monitor configuration changes. Of course, the risk can be somewhat mitigated by a very strict change management process.
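To make the idea of monitoring configuration changes a little more concrete, here is a minimal Python sketch that compares a current snapshot of a server’s settings against an approved baseline. The setting names, the CONTOSO accounts and the `diff_snapshots` helper are all hypothetical; how you actually collect the snapshot depends on your environment and tooling.

```python
def diff_snapshots(baseline, current):
    """Return settings added, removed, or changed since the approved baseline."""
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical approved baseline and a later snapshot of the same server.
baseline = {
    "local_admins": ["CONTOSO\\ExchangeAdmins"],
    "tracing_enabled": False,
    "nic_speed_duplex": "auto",
}
current = {
    "local_admins": ["CONTOSO\\ExchangeAdmins", "CONTOSO\\jdoe"],
    "tracing_enabled": True,
    "nic_speed_duplex": "auto",
}

drift = diff_snapshots(baseline, current)

# Report every setting whose value no longer matches the baseline.
for name, (old, new) in sorted(drift["changed"].items()):
    print(f"{name}: {old!r} -> {new!r}")
```

Anything that shows up in the report and has no matching change request is a candidate for an unapproved change.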
2. Minimizing issues associated with inconsistent configurations
How often do we check that all the servers of the same function are configured similarly, correctly and according to the recommended practices of the associated vendor(s)? It is easy when the systems are brand new. But as days go by and people troubleshoot issues, for example by enabling tracing or by changing registry keys, XML configuration files or NIC settings, you will very quickly realize that it is extremely difficult to keep track.
Another common scenario I come across in many companies is that systems of similar function are at different patch levels. This is where we need a way to constantly monitor the configurations to ensure that they stay consistent.
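A consistency check of this kind can be sketched in a few lines of Python. The example below assumes you already have one dictionary of settings per server; the server names, setting names and patch labels are made up for illustration. For each setting it takes the most common value across the servers as the expected one and reports the servers that differ.

```python
from collections import Counter


def find_outliers(servers):
    """For each setting, report servers whose value differs from the most common one."""
    outliers = {}
    settings = sorted({name for config in servers.values() for name in config})
    for setting in settings:
        # A server that lacks the setting entirely is recorded as None.
        values = {host: config.get(setting) for host, config in servers.items()}
        # On a tie, Counter keeps the value it encountered first.
        majority_value, _ = Counter(values.values()).most_common(1)[0]
        odd = {host: v for host, v in values.items() if v != majority_value}
        if odd:
            outliers[setting] = {"expected": majority_value, "differs": odd}
    return outliers


# Hypothetical snapshots from three servers of the same function.
servers = {
    "MBX01": {"patch_level": "SP1 RU5", "tracing_enabled": "false"},
    "MBX02": {"patch_level": "SP1 RU5", "tracing_enabled": "false"},
    "MBX03": {"patch_level": "SP1 RU3", "tracing_enabled": "true"},
}

report = find_outliers(servers)
for setting, info in report.items():
    print(setting, "expected", info["expected"], "but found", info["differs"])
```

Using the majority value as the reference is only one possible choice. Where an approved baseline exists, comparing against that baseline is usually the safer option, because the majority can be wrong too.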
I once worked with a customer as a Dedicated PFE, assisting them with their messaging system (Microsoft Exchange Server in this case). I can tell you that almost 70% of the incidents raised were a result of changes made to the system or inconsistent configuration settings on the systems.
3. Keeping on top of vendor-recommended updates and changes
Configuration Monitoring should not be limited to just monitoring changes. It should also include the ability to check whether the configurations in place match vendor recommendations. Those who have been in the IT world long enough know that recommendations do change from time to time. Some recommendations change because components are rewritten for better performance or better scalability, and some change for other reasons, of course.
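One way to keep up with changing recommendations is to store them as data rather than bury them in scripts, so that updating a recommendation means editing one entry. The following Python sketch illustrates the idea. The settings and recommended values are hypothetical placeholders, not actual vendor guidance.

```python
import operator

# Hypothetical recommendations: (setting, comparison, recommended value).
RECOMMENDATIONS = [
    ("page_file_mb", operator.ge, 8192),
    ("tracing_enabled", operator.eq, False),
    ("power_plan", operator.eq, "High performance"),
]


def check_recommendations(config, recommendations):
    """Return (setting, actual, recommended) for every setting that fails its rule."""
    findings = []
    for setting, compare, recommended in recommendations:
        actual = config.get(setting)
        # A missing setting counts as a finding; it is never passed to compare().
        if actual is None or not compare(actual, recommended):
            findings.append((setting, actual, recommended))
    return findings


# Hypothetical snapshot of one server; power_plan was not collected.
config = {"page_file_mb": 4096, "tracing_enabled": False}

findings = check_recommendations(config, RECOMMENDATIONS)
for setting, actual, recommended in findings:
    print(f"{setting}: found {actual!r}, recommended {recommended!r}")
```

When a vendor revises its guidance, only the `RECOMMENDATIONS` list needs to change, and the next run shows which servers no longer comply.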
The larger the system, the more sophisticated the solutions and the more servers in the environment, the greater the need for some form of capable configuration monitoring. If you haven’t thought about it, perhaps it’s time to look into a solution for your environment.