There is a new Base OS MP version 6.0.6972.0 available here: http://www.microsoft.com/en-us/download/details.aspx?id=9296
Be very careful updating to this new version – there are multiple changes and potential issues you should plan for and test with, that might impact your existing environments. I will discuss them below.
I previously wrote about the last MP update HERE and HERE. Then I wrote about some issues in the MPs with Logical Disk monitoring HERE. Additionally, there were some problems with the network monitoring utilization scripts HERE. All of these items have been addressed (somewhat) in this latest MP update.
First – let's cover the list of updates from the guide:
Changes in This Update
• Updated the Cluster shared volume disk monitors so that alert severity corresponds to the monitor state.
• Fixed an issue where the performance by utilization report would fail to deploy with the message “too many arguments specified”.
• Updated the knowledge for the available MB monitor to refer to the Available MB counter.
• Added discovery and monitoring of clustered disks for Windows Server 2008 and above clusters.
• Added views for clustered disks.
• Aligned disk monitoring so that all disks (Logical Disks, Cluster Shared Volumes, Clustered disks) now have the same basic set of monitors.
• There are now separate monitors that measure available MB and %Free disk space for any disk (Logical Disk, Cluster Shared Volume, or Clustered disk).
Note: These monitors are disabled by default for Logical Disks, so you will need to enable them if you want to use them in place of the default Logical Disk monitor for free space.
• Updated display names for all disks to be consistent, regardless of the disk type.
• The monitors generate alerts when they are in an error state. A warning state does not create an alert.
• The monitors have a roll-up monitor that also reflects disk state. This monitor does not alert by default. If you want to alert on both warning and error states, you can have the unit monitors alert on warning state and the roll-up monitor alert on error state.
• Fixed an issue where network adapter monitoring caused high CPU utilization on servers with multiple NICs.
• Updated the Total CPU Utilization Percentage monitor to run every 5 minutes and alert if it is three consecutive samples above the threshold.
• Updated the properties of the Operating System instances so that the path includes the server name it applies to so that this name will show up in alerts.
• Disabled the network bandwidth utilization monitors for Windows Server 2003.
• Updated the Cluster Shared Volume monitoring scripts so they do not log informational events.
• Quorum disks are now discovered by default.
• Mount point discovery is now disabled by default.
Notes: This version of the Management Pack consolidates disk monitoring for all types of disks as mentioned above. However, for Logical Disks, the previous Logical Disk Free Space monitor, which uses a combination of Available MB and %Free space, is still enabled. If you prefer to use the new monitors (Disk Free Space (MB) Low and Disk Free Space (%) Low), you must disable the Logical Disk Free Space monitor before enabling the new monitors.
The default thresholds for the Available MB monitor are not changed: the warning threshold (which will not alert) is 500MB and the error threshold (which will alert) is 300MB. This will cause alerts to be generated for small disk volumes. Before enabling the new monitors, it is recommended to create a group of these small disks (using the disk size properties as criteria for the group) and override the threshold for available MB.
Ok, sounds good. But what does all that mean to me?
I will summarize the fundamental changes below:
1. Disk discovery and monitoring has changed. We now will UNDISCOVER any “Logical Disks” that are hosted by a Windows Server 2008 R2 cluster, and REDISCOVER those as a new entity, of the “Cluster Disk” class. This discovery only pertains to Windows Server 2008 R2 and later. It does not affect Server 2008 and older clusters.
There are now THREE types of disks we will discover and monitor:
- Logical Disks
- Cluster Disks
- Cluster Shared Volumes
Logical Disks are disks that are not hosted by a cluster. They include disks with a drive letter and disks without a drive letter (which are discovered as mount points).
Cluster Disks include any disk that is hosted by a Microsoft Cluster as a shared resource, but is not a Cluster Shared Volume.
Cluster Shared Volumes are a specific type of cluster disk, leveraged by Hyper-V clusters for placement of virtual machines.
For most customers, the impact is this: if you have placed any instance or group specific overrides on your cluster disks, these will no longer apply, as these disks are going to be re-discovered as a new entity of a new class, “Cluster Disk”. This new class will have entirely different monitoring targeting it, described below.
However, this is a GOOD thing! In the past, if you had a disk that was part of a cluster, it was undiscovered and rediscovered on each NODE when a failover occurred. If you did overrides for the disk while it was on one node, your changes would no longer apply when it failed over to another node, because it was literally discovered as a different disk (a different BaseManagedEntity)! This is now resolved – the disk will retain the same BaseManagedEntityId (its unique GUID under the covers in SCOM) as it moves from node to node. It is also now “hosted” by the cluster, and not by the Operating System class.
I put together a state dashboard that demonstrates these different disk types:
There are also distinct views for these that ship inside the management pack:
Another point to make here is that the Mount Point discovery, which has been enabled in all previous Base OS MPs, is now DISABLED. This means you will no longer discover mount points by default. You can enable this via override if you want mount point discovery, or selectively enable it only for specific servers that you know host a mount point that you wish to monitor.
Our mount point discovery is a bit misleading. It does not discover only mount points. It discovers ANY disk that does not have a drive letter assigned. For instance, you may have noticed on your Server 2008 R2 machines that you discovered a 100MB logical disk.
These 100MB disks are the System Reserved partition, which holds the boot loader and is used by BitLocker. Once you upgrade to the new MP version, new mounted disks (non-clustered disks with no drive letter) will no longer be discovered, as this discovery is disabled by default. This will NOT remove the previously discovered disks, however. Neither will running Remove-DisabledMonitoringObject.
The reason that Remove-DisabledMonitoringObject does NOT remove these discovered disks is that it will only remove objects if there is an explicit *override* for a discovery, disabling it. If we change the default configuration of a discovery to disabled, the cmdlet has no impact. So if you want to remove these from your management group, you simply need to add an explicit override disabling the mount point discovery, and THEN run the cmdlet. Keep in mind – doing this will undiscover ALL your mounted disks, possibly including real mount points if you have those. As there is ZERO value in discovering and monitoring these 100MB disks, I’d recommend disabling the mounted disk discovery with an explicit override, then creating instance specific or group specific overrides to re-enable it for your servers that DO host a mounted disk.
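The cleanup rule described above is easy to get wrong, so here is a minimal Python sketch of it. This is a toy model only, not SCOM code: the `Discovery` class and both function names are made up for illustration, and the only behavior it encodes is what this post states (cleanup acts only when an explicit override disables the discovery).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Discovery:
    """Hypothetical stand-in for a discovery and its (optional) override."""
    name: str
    enabled_by_default: bool
    # None means no explicit override exists; True/False is an explicit override.
    override_enabled: Optional[bool] = None


def is_effectively_enabled(d: Discovery) -> bool:
    """An explicit override wins; otherwise the MP default applies."""
    if d.override_enabled is None:
        return d.enabled_by_default
    return d.override_enabled


def cleanup_removes_instances(d: Discovery) -> bool:
    """Cleanup only acts when an explicit override disables the discovery.

    A discovery that is merely disabled by default is left alone.
    """
    return d.override_enabled is False


mount_points = Discovery("Mount Point Discovery", enabled_by_default=False)
print(is_effectively_enabled(mount_points))     # disabled by the new default
print(cleanup_removes_instances(mount_points))  # but cleanup will not touch it

mount_points.override_enabled = False           # add the explicit override
print(cleanup_removes_instances(mount_points))  # now cleanup applies
```

The point of the sketch is that "disabled" and "disabled by an explicit override" are two different conditions, and only the second one makes the cmdlet remove anything.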
2. Logical Disk free space monitoring, along with Cluster Disk and Cluster Shared Volume monitoring, has changed. Here are the details:
The default configuration of the “Logical Disk Free Space” monitor is largely UNCHANGED from MP version 6.0.6958.0, which I wrote about HERE. This was done to create the lowest possible impact on you, the admin, who is using this monitor, likely already has many overrides in place, and has integrated this alert into a ticketing system. There were many complaints that this monitor (once it was modified to allow for consecutive samples) no longer generated alerts that contained the free space and MB free values in the alert description. This is still the case in this version, because the monitor was not modified.
This monitor will also generate alerts for warning state AND critical state, which is NOT a good thing. When a single monitor generates alerts on both warning and critical state, a *new* alert is *not* generated when the monitor changes from warning to critical. We simply modify the existing alert from warning to critical (if it exists in an open state). This modification will NOT trigger a notification subscription again, nor will it route the alert to a connector subscription set with a filter for “critical” severity alerts, because the alert has already been inspected and watermarked. For this reason I never recommend using three state monitors and alerting on both a warning and a critical state.
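To make the warning-to-critical problem concrete, here is a small Python simulation. It is a toy model with made-up class names, not SCOM internals. It assumes only what is described above: a state change updates the open alert in place, and notifications fire only when a new alert is created.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alert:
    severity: str


@dataclass
class ThreeStateMonitor:
    """Toy monitor that alerts on both warning and critical state."""
    open_alert: Optional[Alert] = None
    # Severity of each alert at the moment it was created (what a
    # notification or connector subscription would have evaluated).
    notifications: List[str] = field(default_factory=list)

    def set_state(self, state: str) -> None:
        if state == "healthy":
            self.open_alert = None  # alert is closed when health returns
            return
        if self.open_alert is None:
            # No open alert yet: create one, which is when subscriptions fire.
            self.open_alert = Alert(severity=state)
            self.notifications.append(state)
        else:
            # Alert already open: severity is changed in place, nothing fires.
            self.open_alert.severity = state


m = ThreeStateMonitor()
m.set_state("warning")
m.set_state("critical")
print(m.open_alert.severity)  # the alert now shows critical
print(m.notifications)        # but only the original warning was ever sent
```

In this model a subscription filtered on critical severity never sees the alert, because the only alert creation event happened at warning severity.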
However, another complaint we often got was that customers didn’t understand how this monitor worked, in that we inspect BOTH % free threshold AND MB free threshold, and BOTH conditions need to be met before we will change the state of the monitor and generate an alert. This is a very good design, because it helps cut out the majority of noise and remains flexible for disks of different sizes. That said, many customers would say “I just want a simple monitor to alert on % free ONLY, or MB free ONLY…” which was easier for them to understand. Therefore, we have added THREE new monitors for disk space monitoring of logical disks.
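The "both conditions must be met" logic of the existing monitor can be sketched in a few lines of Python. The threshold values below are placeholders picked for the example, not the management pack defaults, and the strict less-than comparison is an assumption.

```python
def legacy_free_space_state(pct_free: float, mb_free: float,
                            warn_pct: float, warn_mb: float,
                            crit_pct: float, crit_mb: float) -> str:
    """State changes only when BOTH the % free AND the MB free threshold are breached."""
    if pct_free < crit_pct and mb_free < crit_mb:
        return "critical"
    if pct_free < warn_pct and mb_free < warn_mb:
        return "warning"
    return "healthy"


# Illustrative thresholds only (hypothetical values).
limits = dict(warn_pct=10, warn_mb=2000, crit_pct=5, crit_mb=1000)

# Large disk: 4% free, but that is still 50,000 MB, so no state change.
print(legacy_free_space_state(4, 50000, **limits))

# Small disk: 4% free and only 800 MB left, so both conditions are met.
print(legacy_free_space_state(4, 800, **limits))
```

The first call shows why this design cuts noise: a very large disk can sit at a low percentage free and still have plenty of room in absolute terms.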
These new monitors are disabled by default, to allow customers to choose if they want to implement them. What we have done is create two new Unit monitors, one for % free and one for MB free, and place both of them under an aggregate rollup monitor.
If enabled, the customer can pick if they want only %, or only MB free, or both, via overrides. These new Unit monitors also provide a richer alert description as seen below:
The disk F: on computer computer1.domain.com is running out of disk space. The value that exceeded the threshold is 28 free Mbytes.
The disk F: on computer computer1.domain.com is running out of disk space. The value that exceeded the threshold is 4% free space.
If the customer DOES want alerts on warning state for these monitors, they can enable this, and also enable alerting on the Aggregate rollup monitor above to issue critical alerts only. This way, you can have unique alerting for a warning state, but if any monitor is critical, we can roll up health and generate a NEW alert for critical state, which can be used to send a notification or send to a ticketing system.
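Here is a rough Python sketch of that warning-plus-rollup arrangement. It assumes a worst-of rollup, unit monitors overridden to alert on warning state, and the aggregate monitor overridden to alert on critical state. The function names and state strings are made up for illustration.

```python
from typing import List, Tuple

SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}


def rollup(states: List[str]) -> str:
    """Worst-of rollup: the aggregate takes the worst child state."""
    return max(states, key=SEVERITY.__getitem__)


def alerts_to_raise(pct_state: str, mb_state: str) -> List[Tuple[str, str]]:
    """Return (source, severity) pairs for the alerts this setup would open."""
    alerts = []
    # Unit monitors: alert on warning state (assumed override).
    for source, state in (("% free", pct_state), ("MB free", mb_state)):
        if state == "warning":
            alerts.append((source, "warning"))
    # Aggregate monitor: a separate, NEW alert when the rollup is critical.
    if rollup([pct_state, mb_state]) == "critical":
        alerts.append(("rollup", "critical"))
    return alerts


print(alerts_to_raise("warning", "healthy"))
print(alerts_to_raise("warning", "critical"))
```

Because the critical alert comes from a different monitor, it is created fresh rather than being an in-place update of the warning alert, so notification and connector subscriptions get a chance to evaluate it.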
As you can see, a lot of thought went into this new design, trying to make the new format fit as many customer requested scenarios as possible. You essentially have three options now:
- Continue to use the existing Logical Disk Free space monitor that is provided and enabled in the management pack.
- Enable and start using the newly designed Logical Disk free space monitors, based on your specific requirements.
- Use my addendum MP which uses a single free space monitor that is similar to the old Base OS management packs, described and available HERE.
For Cluster Disks, and Cluster Shared Volume disks – both of those are using the new format for free disk space monitoring:
Based on this, I’d recommend considering and testing a move of your logical disk free space monitoring over to the new style as well, to have a consistent experience. I welcome your feedback on this point.
***Note – if you enable the new Logical Disk free space monitors, the MB Free monitor will go into a critical state for any Logical disk that is under 2GB (non-system) or 500MB (system). This means if you have any tiny disks, such as the 100MB BitLocker disks, this monitor will alert on all of those disks, potentially creating a large number of alerts. I’d recommend undiscovering those 100MB disks (see #1 above) or creating a dynamic group of disks in your override MP, based on “size is less than a specific numerical size”, and using this group to disable free space monitoring.
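The dynamic group criteria boils down to a simple size filter. Here is a Python sketch of the idea. The `Disk` record, its field names, and the 1024MB cutoff are all hypothetical, and a real group would be defined in your override MP rather than in code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Disk:
    computer: str
    name: str
    size_mb: int


def small_disk_group(disks: List[Disk], max_size_mb: int) -> List[Disk]:
    """Members are disks whose size is below the chosen cutoff."""
    return [d for d in disks if d.size_mb < max_size_mb]


inventory = [
    Disk("computer1.domain.com", "C:", 81920),
    Disk("computer1.domain.com", "System Reserved", 100),
    Disk("computer2.domain.com", "F:", 512000),
]

# Cutoff is an example value; pick one that fits your environment.
for disk in small_disk_group(inventory, max_size_mb=1024):
    print(disk.computer, disk.name, disk.size_mb)
```

Any disk that lands in this group would be a candidate for an override that disables (or loosens) the MB free monitor.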
3. The previous “Cluster Shared Volume” MP, which was “Microsoft.Windows.Server.ClusterSharedVolumeMonitoring.mp”, has a new displayname of “Windows Server Cluster Disks Monitoring”. The new classes for Cluster disks mentioned above are included in this MP, so if you didn’t import it previously because you weren't using Hyper-V Cluster Shared Volumes, you need this MP now to discover and monitor clustered disks.
4. We have disabled the Network Utilization scripts by default on Server 2003, and fixed them for Server 2008 to make them consume fewer resources. I wrote about this previously HERE. This now should be addressed, so if you previously disabled these, but want that counter for alerting or perf collection, you can consider enabling it. It should REMAIN disabled for Windows 2003, as there is an issue with Netman.dll which causes services to crash.
5. The “Total CPU Utilization Percentage” monitor was changed. In previous management packs, it would inspect the value every 2 minutes, and if the AVERAGE of 5 samples for “CPU Queue length” AND “% Processor Time” were over their default thresholds, we would generate an alert. Now, we inspect the value every 5 minutes, and if the AVERAGE of 3 samples for both counters is over the thresholds, then an alert is generated. I am told this change was made on customer request. I have to assume it was to spread the evaluation over a longer time span, but I am not really sure. Seems fairly insignificant.
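A quick Python sketch shows what the change means in practice. The class name and the threshold values (95 and 15) are illustrative placeholders, not taken from the MP, and the averaging logic is my reading of the behavior described above.

```python
from collections import deque
from statistics import mean


def window_minutes(interval_minutes: int, samples: int) -> int:
    """Time span covered by the samples that feed one evaluation."""
    return interval_minutes * samples


class CpuMonitor:
    """Toy model: unhealthy when the averages of BOTH counters exceed thresholds."""

    def __init__(self, samples: int, cpu_threshold: float, queue_threshold: float):
        self.cpu = deque(maxlen=samples)     # last N "% Processor Time" values
        self.queue = deque(maxlen=samples)   # last N "CPU Queue length" values
        self.cpu_threshold = cpu_threshold
        self.queue_threshold = queue_threshold

    def add_sample(self, cpu_pct: float, queue_len: float) -> bool:
        self.cpu.append(cpu_pct)
        self.queue.append(queue_len)
        if len(self.cpu) < self.cpu.maxlen:
            return False  # not enough samples collected yet
        return (mean(self.cpu) > self.cpu_threshold
                and mean(self.queue) > self.queue_threshold)


print(window_minutes(2, 5))   # old: 5 samples, 2 minutes apart
print(window_minutes(5, 3))   # new: 3 samples, 5 minutes apart

monitor = CpuMonitor(samples=3, cpu_threshold=95, queue_threshold=15)
for cpu, queue in [(99, 20), (99, 20), (99, 20)]:
    unhealthy = monitor.add_sample(cpu, queue)
print(unhealthy)
```

So the evaluation window grows from 10 minutes to 15 minutes, while the number of script executions per hour drops.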
Known Issues/Things to remember:
1. Which MPs to import: This MP update contains the following files:
Don’t import management packs that you don’t need or use.
Don’t import the BPA management pack if you don’t want to see alerts for this new feature.
Don’t import the Microsoft.Windows.Server.Reports.mp if your back-end SQL is still running SQL 2005. This MP is supported on SQL 2008 and newer only. It will cause your reporting to break if you import this MP and your management group leverages SQL 2005 on the back-end.
DO import the Microsoft.Windows.Server.ClusterSharedVolumeMonitoring.mp because this contains the discovery and monitoring for Cluster Disks, not just Cluster Shared Volumes. If you don’t import this, your monitoring of clustered disks will disappear.
2. The knowledge for the Total CPU Utilization Percentage is incorrect – the monitor was updated to a default value of 3 samples but the knowledge still reflects 5 samples.
3. There are no free space perf collection rules for “Cluster Disks”. We have multiple performance collection rules for Logical Disks, and for Cluster Shared Volumes, however there are none for the new Cluster Disks class. If you want performance reports on free space, disk latency, idle time, etc., you will need to create these.
4. Perf collection and disk monitoring for cluster disks and CSVs only works when the resource group hosting the disks is on the same node that is hosting the cluster name (quorum) resource. If the disk’s resource group is running on a different node than the cluster name itself, perf collection and monitoring will cease.