SCOM 2007 SP1 MP: update of 26 October 2009

At the end of last month the core management pack, the Operations Manager 2007 SP1 Management Pack, also received an update; after two updates it is now at version 6.0.6709.0. This update tries to keep pace with the changes and fixes in SCOM 2007 R2 and to remove similar problems and inconsistencies from SCOM 2007 SP1 as well. To everyone running SCOM SP1 who does not want to, or cannot, move to R2 in the near term: I RECOMMEND studying the benefits and considering a move to the current version of the MP.

The accompanying document gives the following overview of changes:

  • Updated the layout and default filters and sort order for a number of views.
  • Fixed an issue that was previously preventing all rules related to agentless exception monitoring from generating alerts.
  • Added display names, descriptions, and product knowledge where missing.
  • Added the rule “Collects Opsmgr SDK Service\Client Connections” to collect the number of connected clients for a given management group. This data is shown in the view “Console and SDK Connection Count” under the folder “Operations Manager\Management Server Performance”.
  • Updated a number of monitors and rules to ensure that data is reported to the correct management group for multihomed agents.
  • The following rules and monitors are now disabled by default as they are generally not actionable:
    • A GroupPopulator module unloaded due to an unrecoverable error
    • Health Service Cannot Find Management Group
    • Data Validity Check
    • Root Connector Data Validity Check
  • Added event collection rule for events 5400, 5401, 5402, 5404, 5405, and 5500.
  • Updated the alert suppression criteria for the rule “Alert on Dropped MultiInstance Performance Module” in order to significantly reduce the alert volumes generated by this rule and make it easier to identify the root cause.
  • The implementation that triggers the “Restart Health Service” recovery was changed to be driven by monitors as opposed to rules, to address a number of shortcomings in the previous design.
  • Changed the default severity and priority of alerts raised by the “SDK Spn Registration” rule from “warning” to “critical” and updated the knowledge for the rule significantly.
  • Fixed an issue in which the “RunAs Authorization Check” alert could be incorrectly auto-resolved.
  • Added the “Communication Certificate Expiration Check” monitor to monitor certificate expiration for untrusted domain endpoints (agents, gateways, servers) and alert before the certificate expires.
  • Added event details to the “Secure Storage Configuration Check” monitor alert.
  • Changed the time-out value of the “Log Distributed Workflow Test Event” to 300 seconds.
  • Fixed an issue with the “Management Configuration Service - Windows Service State” monitor so that it will properly generate alerts for the state of the “OpsMgr Config Service” on a clustered root management server.
  • Fixed the “Operational Database Space Free (%)” monitor to compute free space based on maximum data file size, rather than maximum data and log file sizes combined.
  • Updated the workflows that drive the state of the “Computer Not Reachable” monitor to handle the condition when the computer’s name does not resolve.
  • Added additional criteria to the rule “WMI Raw Performance Counter Module Execution Failure” to account for some event IDs that were not being detected.
  • Removed criteria from the rule “Performance Data Source Module could not find a performance counter” to avoid generating alerts on warning events.
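
The fix to the “Operational Database Space Free (%)” monitor in the list above is easiest to see with numbers. The changelog does not give the monitor's actual formula, so the following is only a minimal sketch: the function name, the sample sizes, and the assumption that the "used" figure covers data files only are all hypothetical. It shows how folding the log file's maximum size into the denominator can overstate free space.

```python
def percent_free(used_mb: float, max_size_mb: float) -> float:
    """Return free space as a percentage of the given maximum size."""
    if max_size_mb <= 0:
        raise ValueError("max_size_mb must be positive")
    return 100.0 * (max_size_mb - used_mb) / max_size_mb


# Hypothetical sizes, not taken from any real management group.
data_used_mb = 9000.0   # space used inside the data file
data_max_mb = 10000.0   # maximum size of the data file
log_max_mb = 5000.0     # maximum size of the log file

# Assumed old behaviour: data and log maximums combined.
combined = percent_free(data_used_mb, data_max_mb + log_max_mb)

# Behaviour described in the changelog: data file maximum only.
data_only = percent_free(data_used_mb, data_max_mb)

print(f"combined basis:  {combined:.1f}% free")
print(f"data-only basis: {data_only:.1f}% free")
```

With these sample figures the combined basis reports far more free space than the data file really has, which is the kind of missed alert the fix is meant to prevent.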

 

The important changes are in agent health monitoring and the optional automatic remediation of problems, along with monitoring of problems with running workflows.

Local and Remote Monitoring of an Agent’s Health

  • Operations Manager agents monitor themselves for events and performance indicators that signal an issue with the agent’s health.
  • Management servers also maintain an external perspective of an agent’s health via the Health Service Watcher.
  • The ‘Agent Health State’ view provides a side-by-side dashboard of both perspectives on the agent.

Optional, Automatic Agent Remediation Capabilities

  • If the Health Service Watcher determines that an agent is unhealthy, a series of diagnostics and recoveries can be enabled to further diagnose the problem and even take actions to attempt to fix the problem (e.g. ping the server to see if it is completely offline, start a stopped agent, trigger a reinstall, etc.). Refer to the management pack guide for more details.
  • Agents are monitoring their own process to ensure that memory utilization is not sustained at unacceptable levels. If this condition is detected then the agent will automatically restart itself to force the freeing up of memory.
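
The phrase "sustained at unacceptable levels" in the last point implies that a single spike should not trigger a restart. How the management pack actually decides this is not stated here; the following is a minimal sketch of one common approach, requiring several consecutive samples above a limit. The class name, the 300 MB limit, and the three-sample window are hypothetical values chosen for illustration.

```python
from collections import deque


class SustainedThreshold:
    """Report a breach only after N consecutive samples exceed the limit."""

    def __init__(self, limit_mb: float, consecutive: int) -> None:
        if consecutive < 1:
            raise ValueError("consecutive must be at least 1")
        self.limit_mb = limit_mb
        # Keeps only the most recent `consecutive` over/under flags.
        self.window = deque(maxlen=consecutive)

    def observe(self, sample_mb: float) -> bool:
        """Record one sample; return True if the breach is sustained."""
        self.window.append(sample_mb > self.limit_mb)
        return len(self.window) == self.window.maxlen and all(self.window)


# Hypothetical memory readings in MB; one dip resets the run.
detector = SustainedThreshold(limit_mb=300, consecutive=3)
samples = [350, 360, 100, 310, 320, 330]
results = [detector.observe(s) for s in samples]
print(results)
```

In this sketch only the final sample completes a run of three readings above the limit, so only then would a restart be considered.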

Detection of Problems and Misconfigurations with Run As Accounts and Profiles

  • Checks are run on a regular basis to detect if any of the management group’s “Windows” type Run As Accounts have credentials which are about to expire. Alerts will be raised, and where possible this will be done in advance of the credentials expiring to avoid outages.
  • Alerts will be raised if any errors are encountered during the distribution of Run As Accounts.
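
The idea of alerting before credentials expire can be sketched as a simple date comparison. This is not the management pack's implementation: the function, the account names, and the 14-day warning window are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def expiring_accounts(accounts: dict, now: datetime, warn_days: int = 14) -> list:
    """Return names of accounts that expire within warn_days of now.

    Accounts that have already expired are included as well.
    """
    horizon = now + timedelta(days=warn_days)
    return sorted(name for name, expires in accounts.items() if expires <= horizon)


# Hypothetical Run As accounts and expiry dates.
now = datetime(2009, 10, 26)
accounts = {
    "sql-monitoring": datetime(2009, 11, 2),     # inside the warning window
    "ad-monitoring": datetime(2010, 3, 1),       # far in the future
    "exchange-monitoring": datetime(2009, 10, 20),  # already expired
}

to_alert = expiring_accounts(accounts, now)
print(to_alert)
```

Widening `warn_days` gives administrators more lead time at the cost of earlier, longer-lived alerts.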

Monitoring of problems with Running Workflows in Management Packs

  • Numerous rules are provided to detect if workflows within management packs are failing. Examples of workflows include discoveries, rules, monitors, etc. Failures can range from bad configurations on the workflows themselves to script failures and permissions problems.
