Hey all, I wanted to share with you some details regarding the new Active Directory-Domain Services Management Pack for System Center Operations Manager that we will be releasing in conjunction with Windows Server 2016 (WS2016). But first, who am I? My name is Eric Hunter, and I have worked with the Active Directory (AD) product team for over 8 years now. For a large part of that time I was responsible for the operational support of a live production domain that we, the Active Directory team here at Microsoft, use to validate the next version of AD through real production usage. We validate it by promoting domain controllers with beta builds of the next version of Windows Server and having our employees and services use them daily. The one difference between our domain and yours is that we are constantly rebuilding domain controllers; other than that, we are very similar. It is a multi-domain forest with thousands of users and computers, and mission-critical systems rely on the domain operating with high service quality.
Our approach for this new MP was to use it for our daily operations and to continually iterate on its functionality over time. The previous design philosophy had been a waterfall model, where we would try to add features and fix bugs in a small window of time to ship with a version of Windows Server. We found that this was not conducive to creating an MP that we really wanted to use and that would work for us, let alone for you, our customers. In the past we had relied on in-house monitoring tools to fill the monitoring gaps left by the MP, thinking that our environment was different from what a customer would have. What we found, though, was that what we wanted in the MP, our customers wanted as well. So we decided to reduce our reliance on the in-house tools and shift to relying solely on SCOM in our environment. When we found a gap in the monitoring, we filled it. If the monitors were too noisy, we changed them. We have been working this way for a few years now and, slowly but surely, have fixed many of the issues that were inherent in the old MP. We keep updating it based on real-world data so that it eventually becomes the tool we rely on for domain health.
One of the biggest changes we made was to clean up the MP and remove anything that wasn't working the way we thought it should. We cleaned up all rules that did not auto-resolve, removed the OOMADs dependency from all scripts, removed the reliance on down-level discovery MPs, and cleaned up some legacy monitors and OS-specific libraries. We were removing so many of the legacy monitors, alerts, and libraries that we had to start over with a new library, which required a whole new MP. Hence the new MP that we are releasing under a shiny new name: the Active Directory-Domain Services MP.
Another large change is how we now monitor replication. In the past, replication was monitored by injecting a change into AD and tracking how long it took to replicate that change. The old way also required you to opt in to the monitor, configuring it for individual DCs or for all of them at once. The old method generated a lot of extra replication traffic and created objects in AD that could confuse admins. We didn't like this method, and neither did many customers, so we set out to find a more passive way to monitor replication. Now we use the built-in tools in AD to track replication health. We added a replication queue monitor that alerts you when a DC starts lagging, and a replication health check that makes sure your DC has replicated within expected thresholds. Replication environments vary widely and no one size fits all, so some adjustment may be necessary. As with almost all of our new monitors, both the replication queue and replication health check monitors are configurable: you can set the thresholds at which they alert you so that they fit your environment.
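To make the idea of configurable, passive replication thresholds concrete, here is a minimal sketch of how a queue-length check and a "time since last successful replication" check could map to a health state. This is not the MP's implementation: the class, function names, and default numbers below are hypothetical, and the real monitors' override names and defaults may differ. It only shows why raising the thresholds for, say, a branch site with a slower replication schedule turns a warning into a healthy state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Health(Enum):
    HEALTHY = "Healthy"
    WARNING = "Warning"
    CRITICAL = "Critical"


@dataclass
class ReplicationThresholds:
    # Hypothetical defaults for illustration only; not the MP's real values.
    queue_warning: int = 50
    queue_critical: int = 200
    max_since_last_success: timedelta = timedelta(hours=1)


def queue_health(queue_length: int, t: ReplicationThresholds) -> Health:
    """Queue-style check: flag a DC whose pending replication work is piling up."""
    if queue_length >= t.queue_critical:
        return Health.CRITICAL
    if queue_length >= t.queue_warning:
        return Health.WARNING
    return Health.HEALTHY


def last_success_health(last_success: datetime, now: datetime,
                        t: ReplicationThresholds) -> Health:
    """Health-check style: has the DC replicated within the expected window?"""
    if now - last_success > t.max_since_last_success:
        return Health.CRITICAL
    return Health.HEALTHY


if __name__ == "__main__":
    default = ReplicationThresholds()
    # Hypothetical override for a site that replicates on a slower schedule.
    branch = ReplicationThresholds(queue_warning=500, queue_critical=2000,
                                   max_since_last_success=timedelta(hours=4))
    now = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(queue_health(120, default))  # Health.WARNING
    print(queue_health(120, branch))   # Health.HEALTHY
    print(last_success_health(now - timedelta(hours=2), now, default))  # Health.CRITICAL
    print(last_success_health(now - timedelta(hours=2), now, branch))   # Health.HEALTHY
```

The same observed data produces different states depending only on the thresholds, which is why tuning them to your own replication topology matters more than any single default.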
Lastly, we changed our philosophy on how we monitor AD, moving from an event-driven model to synthetic monitors. In our experience, the event-based model is both noisy and not very helpful. When an event rule fires, is there really a problem? Is it still a problem a few hours later? What's the impact? Therefore we decided to move exclusively to synthetic health monitors. We put a number of changes into the MP we now call the Domain Member Perspective MP (previously named Client Perspective). This MP verifies, in a generic way, whether it can contact the domain, so you know that clients are able to connect to it. It also tests the bind performance of your PDC and verifies that Group Policy is being applied without errors on your member systems. But the biggest monitor we added to the Domain Member Perspective MP is the Domain Controller Health Monitor. This monitor has become my favorite for alerting me to an issue on a DC. We have spent a lot of time verifying and updating its logic so that it does not throw false alerts, but does know when a DC is not responding as it should and whether it is in a state that could impact users. It is our go-to monitor in our environment, and hopefully it will help you as well.
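The difference between an event rule and a synthetic monitor can be illustrated with a small sketch: a synthetic probe actively performs an operation (here, a stand-in for a bind), times it, and computes a state from that run alone, so the state clears by itself once the operation succeeds again. This is a hypothetical illustration, not the Domain Member Perspective MP's code: `run_synthetic_probe`, `make_flaky_bind`, and the millisecond thresholds are assumptions, and `fake_bind` simulates a bind rather than contacting a real PDC.

```python
import time
from typing import Callable


def run_synthetic_probe(probe: Callable[[], None], warn_ms: float, fail_ms: float) -> str:
    """Run one synthetic check; the state depends only on this run."""
    start = time.perf_counter()
    try:
        probe()
    except Exception:
        # The operation itself failed (e.g. the bind could not be completed).
        return "Critical"
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms >= fail_ms:
        return "Critical"
    if elapsed_ms >= warn_ms:
        return "Warning"  # succeeded, but slower than expected
    return "Healthy"


def make_flaky_bind(fail_first: int) -> Callable[[], None]:
    """Hypothetical stand-in for a PDC bind that fails for its first N calls."""
    calls = {"n": 0}

    def fake_bind() -> None:
        calls["n"] += 1
        if calls["n"] <= fail_first:
            raise ConnectionError("simulated bind failure")

    return fake_bind


if __name__ == "__main__":
    bind = make_flaky_bind(fail_first=1)
    # Three consecutive probe intervals: the problem clears without manual action.
    states = [run_synthetic_probe(bind, warn_ms=500, fail_ms=5000) for _ in range(3)]
    print(states)  # ['Critical', 'Healthy', 'Healthy']
```

Because each interval re-tests the actual operation, the monitor answers "is it still a problem?" directly, which an event that fired hours ago cannot do.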
You can find the new ADDS 2016 MP on the Download Center (DLC) page for Windows Server 2016 Technical Preview Management Packs.
I hope you enjoy the changes we made to the ADDS MP, and we would love to have your feedback as we continue to iterate on it and make it better for you and for us. We would especially like to hear how you chose to change the defaults in the new monitors. It would really help us to know what the best default values are, and whether there are other monitors we can add or ways to adjust the MP to be more efficient out of the box. Please share your feedback about the MP on the SCOM Product Feedback forum. If someone else has already suggested your feedback, please upvote it.
Eric Hunter | Software Engineer | Microsoft