Operations Management Suite Onboarding Troubleshooting Steps

[NOTE –  Operational Insights is now a part of Operations Management Suite. Learn more at microsoft.com/OMS ]

This article collects procedures and known troubleshooting hints for onboarding in either Operations Manager attach mode (SCOM) or Direct Agent (DA) mode. Some sections apply to both; others apply to only one type of reporting infrastructure.

If none of these steps work for you:

  • Customers with Premier support can log support cases via Premier
  • Customers with Azure support agreements can log support cases in the Azure portal
  • Send us an email at scdata@microsoft.com and we will be more than happy to help get your issue resolved.
  • Feedback forum for ideas and bugs https://aka.ms/opinsightsfeedback
  • For general discussion and questions and answers (not ideas and bug reports), use the MSDN Forum
  • How do I do XYZ? Try our documentation
  • Follow us on Twitter @msopsmgmt and feel free to engage, but be aware that many questions can’t be answered in 140 characters!

The article contains the following sections:

  • SCOM REGISTRATION – ERROR 2200 – describes an error you might encounter when registering a SCOM management group before a workspace exists or before the required management packs are imported
  • SCOM REGISTRATION – ERROR 3000 – describes an error you might encounter when registering a SCOM management group
  • PROXY – REGISTRATION / CONFIGURATION STEPS – describes how you need to configure proxy servers (if you have them) to allow traffic to OpInsights
  • VERIFYING IF THINGS ARE WORKING POST REGISTRATION – troubleshooting steps for both SCOM and directly-connected agents: how to check whether data is flowing, which common errors to look for, and how to resolve them
  • OTHER KNOWN ISSUES AND WORKAROUNDS (SCOM) – other miscellaneous issues related to OpInsights onboarding from SCOM
  • IIS LOG COLLECTION – important things to know about collecting IIS Logs with OpInsights
  • SQL and AD ASSESSMENT  – important things to know about SQL and AD Assessment Intelligence Packs
  • MALWARE ASSESSMENT – important things to know about Malware Assessment Intelligence Packs
  • DIRECT AGENT SPECIFIC INFORMATION – important things to know about directly connected agents
  • WINDOWS AZURE DIAGNOSTICS INFORMATION – important things to know about collecting logs from Azure storage accounts

SCOM REGISTRATION – ERROR 2200
You might run into “Error 2200: Unable to register to the Advisor Service. Please contact the system administrator” while trying to connect your OpsMgr 2012 management group to OMS.

 

There are two reasons why you may run into this:

1. The OMS workspace has not been created prior to trying to onboard via SCOM. If that is the case, please go to microsoft.com/OMS and create a workspace first, and then try onboarding with the same account.

2. After installation, or after installing the latest required update rollups, the necessary management packs have not been imported into your SCOM management group. In the SCOM console, navigate to the Administration view and choose to import management packs. Navigate to %SystemDrive%\Program Files\System Center 2012 SP1\Operations Manager\Server\Management Packs for Update Rollups and import the MPs in this folder. Once complete, restart the console and try onboarding again.

SCOM REGISTRATION – ERROR 3000
We have had a few customers run into the “Error 3000: Unable to register to the Advisor Service” while trying to connect their OpsMgr 2012 Management group to OMS. 

Error 3000: Unable to register to Advisor Service.

There are two reasons why you may run into this:

  1. The server clock is out of sync with the current time by more than 5 minutes. You can resolve this by correcting the clock on your server: open a command prompt as an Administrator, type w32tm /tz to check the time zone, and w32tm /resync to synchronize.
    Note that, even if your clock reports that it is synchronized (for example with your company’s time server), it might still be out of sync with our machines in Azure. Since the time window is only 5 minutes, this is often the cause. Verify that you are synchronizing with a reliable time server ON THE INTERNET. You can further troubleshoot this type of issue by enabling VERbose tracing on the Management Server/Console machine; see http://support.microsoft.com/kb/942864 to learn about OpsMgr tracing. In a nutshell, you need to run:
    StartTracing.cmd VER
    – reproduce the issue –
    StopTracing.cmd
    FormatTracing.cmd
    In the formatted trace files you should find an exception saying the token was rejected because it was not yet valid or expired, or similar phrasing.
  2. Your internal proxy servers or firewalls are blocking communication to the Advisor service endpoints. We provide detailed instructions for this second case later in this article. Read on.
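
The five-minute window described for Error 3000 can also be checked offline. The following is a minimal Python sketch and is not part of the OpsMgr tooling: the function names are hypothetical, and it assumes you obtain a reference time yourself, for example by copying the Date header from an HTTPS response of a server on the internet.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Tolerance quoted in this article for Error 3000.
MAX_SKEW = timedelta(minutes=5)


def reference_time_from_http_date(date_header):
    """Parse an HTTP Date header value into a timezone-aware datetime."""
    return parsedate_to_datetime(date_header)


def skew_exceeds_tolerance(local_time, reference_time, tolerance=MAX_SKEW):
    """Return True when the two clocks differ by more than the tolerance."""
    return abs(local_time - reference_time) > tolerance


# Hypothetical header value, as if copied from a server response.
reference = reference_time_from_http_date("Tue, 06 Jan 2015 10:00:00 GMT")
local = datetime.now(timezone.utc)
print("Clock skew too large:", skew_exceeds_tolerance(local, reference))
```

Comparing against an external reference rather than the company time server matters here, because the article notes that a clock can be "synchronized" internally and still be outside the window.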

IF YOU HAVE A PROXY – REGISTRATION / CONFIGURATION STEPS

Depending on your proxy configuration, you might not be able to register at all, or – even when you do manage to register – some communication from SCOM to the service will later fail and scenarios might not light up in the portal. We describe the type of communications and endpoints you need to allow your management servers, console and direct agents to talk to in order for OpInsights to work for you.

Step 1: Request exception for the service endpoints

The following domains and URLs need to be accessible through the firewall/proxy for the management server to access the Azure Operational Insights Web Services

Management Server

URL                                          Ports
service.systemcenteradvisor.com              Port 443
scadvisor.accesscontrol.windows.net          Port 443
scadvisorservice.accesscontrol.windows.net   Port 443
*.blob.core.windows.net/*                    Port 443
data.systemcenteradvisor.com                 Port 443
*.ods.opinsights.azure.com                   Port 443
*.systemcenteradvisor.com                    Port 443
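
When reviewing proxy or firewall logs, it can help to check whether a denied destination is one of the management server endpoints listed above. The sketch below is one possible way to do that in Python; the function names and the sample host names are hypothetical, and the wildcard matching is only an approximation of how your proxy evaluates its own rules.

```python
from fnmatch import fnmatchcase

# Host patterns copied from the management server list in this article.
# The "/*" path suffix on the blob storage entry is dropped because only
# host names are compared here.
MANAGEMENT_SERVER_HOSTS = [
    "service.systemcenteradvisor.com",
    "scadvisor.accesscontrol.windows.net",
    "scadvisorservice.accesscontrol.windows.net",
    "*.blob.core.windows.net",
    "data.systemcenteradvisor.com",
    "*.ods.opinsights.azure.com",
    "*.systemcenteradvisor.com",
]


def is_covered(hostname, patterns=MANAGEMENT_SERVER_HOSTS):
    """Return True if the host name matches one of the listed patterns."""
    host = hostname.strip().lower()
    return any(fnmatchcase(host, pattern) for pattern in patterns)


def uncovered_hosts(hostnames, patterns=MANAGEMENT_SERVER_HOSTS):
    """Return the host names that match none of the listed patterns."""
    return [h for h in hostnames if not is_covered(h, patterns)]


# Hypothetical destinations, as if taken from a proxy log.
seen_in_proxy_log = ["contoso.ods.opinsights.azure.com", "example.org"]
print(uncovered_hosts(seen_in_proxy_log))
```

Hosts that the check reports as covered but that the proxy still blocks are candidates for the exception request in Step 1.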

Large Volume scenarios / intelligence packs and OpsMgr agents

Note that with some upcoming intelligence packs (e.g. ‘Security and Audit’), given the large volume of data sent for those scenarios (Windows Security Logs), the agents will report data directly to the cloud (that is, without queuing through the management server), even if they report to OpsMgr and receive their configuration from the OpsMgr Management Group. The destination needed for this communication is the following:

URL                          Ports
*.ods.opinsights.azure.com   Port 443

Note that the proxy setting specified in Step 2 below will be automatically propagated to OpsMgr agents.

Operations Manager Console

The following domains and URLs need to be accessible through the firewall to view the Advisor Web portal and OpsMgr Console (to perform ‘registration’ to Azure Operational Insights).

Resource                    Ports
*.systemcenteradvisor.com   Ports 80 and 443
*.live.com                  Ports 80 and 443
*.microsoft.com             Ports 80 and 443
*.microsoftonline.com       Ports 80 and 443
login.windows.net           Ports 80 and 443

Also ensure the Internet Explorer proxy is set correctly on the computer you are trying to log in with. A particularly valuable test is to try to connect to an SSL-enabled website, e.g. https://www.bing.com/ – if the HTTPS connection doesn’t work from a browser, it probably also won’t work in the Operations Manager Console or in the server modules that talk to the web services in the cloud.

 

Directly-connected Agents

Direct Agent does not use your credentials to connect to the workspace: you have to enter the workspace ID and key. Those are used for registration; after the agent is registered, a certificate is used. Direct Agent only needs to connect to the following destinations:

URL                          Ports
*.blob.core.windows.net/*    Port 443
*.oms.opinsights.azure.com   Port 443
*.ods.opinsights.azure.com   Port 443

Once you have completed registering your OpsMgr Environment to the Advisor Service you need to follow Steps 2, 3 and 5 to allow your Management servers to send data to the Advisor Web Service (step 4 is only required if you have an old patch level… but you are running the latest update rollup, right?).

Step 2: Configure the proxy server in the OpsMgr Console

  • Open the OpsMgr Console
  • Go to the “Administration” view
  • Select “Advisor Connection” under the “System Center Advisor” Node
  • Click “Configure Proxy Server”
  • Check the checkbox to use a proxy server to access the Advisor Web Service
  • Specify the proxy address in the http://proxyserver:port format
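
A mistyped proxy address is easy to miss in the dialog. As a rough illustration, the Python sketch below checks that a string follows the http://proxyserver:port format named in Step 2. The function name is hypothetical, and treating only the http scheme with an explicit port as valid is an assumption based on the format shown here, not a documented rule of the console.

```python
from urllib.parse import urlsplit


def is_valid_proxy_address(address):
    """Return True if the address looks like http://proxyserver:port."""
    try:
        parts = urlsplit(address.strip())
        # Accessing .port raises ValueError for a non-numeric or
        # out-of-range port, so it is read inside the try block.
        port = parts.port
    except ValueError:
        return False
    return parts.scheme == "http" and bool(parts.hostname) and port is not None


# Hypothetical values, for illustration only.
for candidate in ["http://proxyserver:8080", "proxyserver:8080", "http://proxyserver"]:
    print(candidate, is_valid_proxy_address(candidate))
```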

  


Step 3: Specify credentials for OpsMgr if the Proxy Server requires Authentication

If your proxy server requires authentication, you can specify credentials in the form of an OpsMgr RunAs account and associate it with the ‘System Center Advisor Run As Profile Proxy’.

  • In the OpsMgr Console, go to the “Administration” view
  • Select “Profiles” under the “RunAs Configuration” Node
  • Double click and open “System Center Advisor Run As Profile Proxy”
  • Click ‘Add’ to add a ‘RunAs Account’. You can either create one or use an existing account. This account needs to have sufficient permissions to pass through the proxy
  • Set the Account to be targeted at the ‘Operations Manager Management Servers’ Group
  • Complete the wizard and save the changes

  

Note: not all code paths currently support authentication. It is still possible that you will need to set some of those exclusions mentioned in Step 1 to allow anonymous traffic to some of those destinations. We will keep this document up-to-date as this requirement evolves.


Step 4: Configure the proxy server on each UNPatched OpsMgr Management Server for WinHTTP

NOTE: this step is NO LONGER required IF you UPDATED your Management Servers to Update Rollup 3 for System Center 2012 R2, or Update Rollup 7 for System Center 2012 SP1 (or newer ones). In fact, we recommend you don’t do this step and just upgrade to the latest Rollup if you can!

  • Open Command Prompt as an Administrator on the Management Server
  • Type netsh winhttp set proxy myproxy:80
  • Restart the ‘System Center Management’ Service (HealthService)
  • Repeat this step on each of the management servers in your management group

Step 5: Configure the proxy server on each OpsMgr Management Server for Managed code

There is another setting in Operations Manager, which is intended for general error reporting, but we have noticed that – when set – due to the same modules being used in multiple workflows, this proxy setting also ends up affecting Advisor connector’s functionality.
The recommendation is therefore to also set it (to the same proxy you set in the other places) for each and every MS if you use a proxy.

  • In the OpsMgr Console, go to the “Administration” view
  • Select “Device Management” and then the “Management Servers” Node
  • Right-click and choose “Properties” for each MS (one at a time) and set the proxy in the “Proxy Settings” tab.

Proxy settings per MS

If none of the above steps resolve your issue please let us know and we will help you!

VERIFYING IF THINGS ARE WORKING POST COMPLETING THE CONFIGURATION WIZARD

Procedure 1: Validate if the right Management Packs get downloaded to your OpsMgr Environment

Note: Depending on which Intelligence Packs you have enabled from the OpInsights Portal, you will see more or fewer of these MPs. Search for the keyword ‘Advisor’ or ‘Intelligence’ in their names.

Advisor Management Packs in SCOM

You can additionally check for these MPs using OpsMgr PowerShell by typing one of these commands (the second shows the same results in a grid view):

Get-SCOMManagementPack | where {$_.DisplayName -match 'Advisor'} | select Name,DisplayName,Version,KeyToken

Get-SCOMManagementPack | where {$_.DisplayName -match 'Advisor'} | select Name,DisplayName,Version,KeyToken | Out-GridView

Note: if you are troubleshooting the Capacity Intelligence Pack, check HOW MANY management packs with a name containing ‘capacity’ you have. Two management packs with the same display name (but different internal IDs) come in the same MP bundle; if one of the two does not get imported (often due to a missing VMM dependency), the other MP does not get imported either, and the operation does not retry.

You should see the following three MPs related to ‘capacity’

  • Microsoft System Center Advisor Capacity Intelligence Pack
  • Microsoft System Center Advisor Capacity Intelligence Pack
  • Microsoft System Center Advisor Capacity Storage Data

If you see only one or two of them but not all three, remove the ones you see and wait 5 to 10 minutes for OpsMgr to download and import them again. Check the event logs for errors during this period.
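
Because two of the capacity packs share a display name, it is easy to miscount them by eye. The following Python sketch counts them from a list of display names; it assumes you have exported the DisplayName values yourself (for example from the PowerShell output), and the function names are hypothetical.

```python
# Number of 'capacity' management packs this article says you should see.
EXPECTED_CAPACITY_PACKS = 3


def capacity_packs(display_names):
    """Return the display names containing 'capacity', duplicates included."""
    return [name for name in display_names if "capacity" in name.lower()]


def capacity_import_complete(display_names):
    """Return True when all expected capacity packs are present."""
    return len(capacity_packs(display_names)) == EXPECTED_CAPACITY_PACKS


# Display names as listed in this article; the duplicate entry is intentional.
exported = [
    "Microsoft System Center Advisor Capacity Intelligence Pack",
    "Microsoft System Center Advisor Capacity Intelligence Pack",
    "Microsoft System Center Advisor Capacity Storage Data",
]
print(capacity_import_complete(exported))
```

The list is deliberately not de-duplicated, since the two identically named packs are distinct management packs and both need to be present.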

 

Procedure 2: Validate if the right Intelligence Packs get downloaded to your Direct Agent

In Direct Agent you should see the Intelligence Packs collection policy being cached under C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Management Packs

Intelligence Packs on Direct Agent

 

Procedure 3: Validate if data is being sent up to the Advisor service (or at least attempted)

  • Open ‘Performance Monitor’
  • Select ‘Health Service Management Groups’
  • Add all the counters that start with ‘HTTP’
  • If things are configured right you should see activity for these counters, as events and other data items (based on the intelligence packs onboarded in the portal, and the configured log collection policy) are uploaded. Those counters don’t necessarily have to be continuously ‘busy’ – if you see little to no activity it might be that you are not onboarded on many intelligence packs or have a very lightweight collection policy.

 

Procedure 4: Check for Errors on the Management Server or Direct Agent Event Logs

As a final step, if all of the above fails, check for errors in Event Viewer –> Application and Services –> Operations Manager event log, filtering by Event Sources: Advisor, Health Service Modules, HealthService and Service Connector (this last one applies to Direct Agent only). You can copy these events and post them in the ‘Feedback’ forum so that we in the product team can help you further. Most of these events can also be found on Direct Agent, and the troubleshooting steps are similar. The only part that really differs between SCOM and Direct Agent is the registration process:

  • in SCOM you have a nice wizard with browser integration that lets you pick your workspace as a user/admin then SCOM takes care of exchanging certificates and uses those for MP download and data transfer/upload to OpInsights
  • in Direct Agent you just copy/paste the workspace id and key and those are used to authenticate / prove that it’s really you registering those agents and you own that workspace, and then certs are exchanged under the hood by the service similarly to SCOM and used the same way

Hence, many of these events apply to both types of reporting infrastructure.

Open Event Viewer –> ‘Application and Services’ –> ‘Operations Manager’ and filter by Event Sources: Advisor, Health Service Modules, HealthService and Service Connector (this last one applies to Direct Agent only).

  

The following are a few of the ‘bad’ events you might see when things aren’t working:

EventID: 2138
Source: Health Service Modules
Meaning: Proxy requires authentication
Resolution: Follow step 3 and/or step 1 above

EventID: 2137
Source: Health Service Modules
Meaning: Cannot read the authentication certificate
Resolution: Re-running the Advisor registration wizard will fix certificates/runas accounts

EventID: 2132
Source: Health Service Modules
Meaning: Not Authorized
Resolution: Could be an issue with the certificate and/or registration to the service; try re-running the Advisor registration wizard, which will fix certificates and runas accounts. Additionally, verify the proxy has been set to allow exclusions as in step 1 above, and/or verify authentication as in step 3 (and that the user indeed has access through the proxy)

EventID: 2129
Source: Health Service Modules
Meaning: Failed connection / Failed SSL negotiation
Resolution: There could be some strange TCP settings on this server. Check this other blog post from the community for such a case http://jacobbenson.com/?p=511

EventID: 2127
Source: Health Service Modules
Meaning: Failure sending data received error code
Resolution: If it only happens once in a while, this could be just a glitch. Keep an eye on it to understand how often it happens. If it happens very often (every 10 minutes or so throughout the day), then it is an issue – check your network configuration and the proxy settings described above, and re-run the registration wizard. But if it only happens sporadically (i.e. a couple of times per day) then everything should be fine, as data will be queued and retransmitted.
Some of the HTTP error codes have special meanings, for example:
– The FIRST time that an MMA direct agent or management server tries to send data to our service, it will get a 500 error with an inner 404 error code – 404 means not found; this indicates that the storage area we’ll use for this new workspace of yours isn’t quite ready yet – it is being provisioned. On the next retry, however, it will be ready and the flow will start working (under normal conditions).
– A 403 might indicate a permission/credential issue, and so forth. There is more information on the 403 below in the Direct Agent specific section of this post.

EventID: 2128
Source: Health Service Modules
Meaning: DNS name resolution failed
Resolution: Your server can’t resolve the internet address it is supposed to send data to. This might be caused by DNS resolver settings on your machine, incorrect proxy settings, or a (temporary) issue with DNS at your provider. Like the previous event, depending on whether it happens constantly or ‘once in a while’ it could be an issue – or not.

EventID: 2130
Source: Health Service Modules
Meaning: Time out
Resolution: Like the previous event, depending on whether it happens constantly or ‘once in a while’ it could be an issue – or not.

EventID: 4511
Source: HealthService
Meaning: Cannot load module “System.PublishDataToEndPoint” – file not found
Resolution: Initialization of a module of type “System.PublishDataToEndPoint” (CLSID “{D407D659-65E4-4476-BF40-924E56841465}”) failed with error code The system cannot find the file specified.
This error indicates you have old DLLs on your machine that don’t contain the required modules. The fix is to update your Management Servers to the latest Update Rollup.

EventID: 4502
Source: HealthService
Meaning: Module crashed
Resolution: If you see this for workflows with names such as CollectInstanceSpace or CollectTypeSpace, it might mean the server is having issues sending some data. Depending on how often it happens – constantly or ‘once in a while’ – it could be an issue or not. If it happens more than every hour it is definitely an issue. If this operation only fails once or twice per day, it will be fine and able to recover. Depending on how the module actually fails (the description will have more details) this could be an on-premises issue – i.e. collecting to the DB – or an issue sending to the cloud. Verify your network and proxy settings, and in the worst case try restarting the HealthService.

EventID: 4501
Source: HealthService
Meaning: Module “System.PublishDataToEndPoint” crashed
Resolution: A module of type “System.PublishDataToEndPoint” reported an error 87L which was running as part of rule “Microsoft.SystemCenter.CollectAlertChangeDataToCloud” running for instance “Operations Manager Management Group” with id:”{6B1D1BE8-EBB4-B425-08DC-2385C5930B04}” in management group “SCOMTEST”.
You should NOT see this with this exact workflow, module and error anymore; it used to be a bug *now fixed* tracked here http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6714689-alert-management-intelligence-pack-not-sending-ale

EventID: 4002
Source: Service Connector
Meaning: The service returned HTTP status code 403 in response to a query.  Please check with the service administrator for the health of the service. The query will be retried later.
Resolution: You can get a 403 during the agent’s initial registration phase; you’ll see a URL like
https://<YourWorkspaceID>.oms.opinsights.azure.com/AgentService.svc/AgentTopologyRequest
Error code 403 means ‘forbidden’ – this is typically a wrongly-copied WorkspaceId or key, or the clock is not synced (just like for ‘error 3000’ in SCOM at the beginning of this article) – see more here
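
Several of the events above (2127, 2128, 2130 and 4502) are only a problem when they recur often. The Python sketch below shows one way to turn that guidance into a rough triage over exported event timestamps. The thresholds are an interpretation of the wording in this article ("once or twice per day" versus "more than every hour"), and the function name and labels are hypothetical.

```python
from datetime import datetime, timedelta

# Events this article describes as harmless when they occur only sporadically.
FREQUENCY_DEPENDENT_EVENTS = {2127, 2128, 2130, 4502}

# Assumed thresholds, derived from the article's wording.
SPORADIC_MAX_PER_DAY = 2   # "once or twice per day"
HOURLY_PER_DAY = 24        # "more than every hour"


def triage(event_id, event_times, now):
    """Classify how worrying an event is, based on the last 24 hours."""
    window_start = now - timedelta(hours=24)
    recent = [t for t in event_times if window_start <= t <= now]
    if not recent:
        return "none"
    if event_id not in FREQUENCY_DEPENDENT_EVENTS:
        # Other events in the list point at configuration problems,
        # so a single occurrence is worth looking into.
        return "investigate"
    if len(recent) <= SPORADIC_MAX_PER_DAY:
        return "sporadic"
    if len(recent) > HOURLY_PER_DAY:
        return "persistent"
    return "watch"


# Hypothetical timestamps, as if exported from the Operations Manager log.
now = datetime(2015, 1, 6, 12, 0)
every_ten_minutes = [now - timedelta(minutes=10 * i) for i in range(144)]
print(triage(2127, every_ten_minutes, now))
```

The "watch" label covers the range the article leaves open, where the event is more frequent than a glitch but not yet hourly.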

Procedure 5: Look for your agents to send their data and have it indexed in the Portal

Check in the OpInsights Portal to see if your machines are reporting.  From the Overview page navigate to the large blue SETTINGS tile – it will be either the first or last tile depending on your configuration state.  In SETTINGS click the CONNECTED SOURCES tab.  Each column on this page represents a different data source type attached to OI (Servers attached directly, SCOM management groups and Azure storage accounts). Clicking the blue “X servers/mgmt groups/storage accounts connected” will bring you to search with more detail.  On this page you’ll also see a list of individual management groups connected – clicking one of these management groups will also bring you to search and show you a list of the servers connected to this management group.

NOTE: If a data source is listed as reporting on this page, it does not necessarily mean we have collected any data from the source.  In this case it’s possible that drilling into search from this page will show inconsistent results (i.e. you’ll see a data source listed in CONNECTED SOURCES, but it won’t be in search).  Once data collection has started (either from an IP or from log collection), the results in search will be consistent.

In addition to the above, the Advisor engineering team is committed to resolving all your onboarding issues so please contact us if you run into any issues. We are here to help.

OTHER KNOWN ISSUES AND WORKAROUNDS (SCOM)

‘Search’ button in the ‘Add a Computer/Group’ dialogue is missing

We have had a couple of customers report that the Search button in the Computer Search dialog is invisible. We are investigating why this happens. A temporary workaround is to click in the ‘Filter by (optional)’ edit box and press TAB to get to the invisible Search button, and then activate it with <Spacebar> or <Enter>.

 

IIS LOG COLLECTION

There is another post here with specific information on how to best configure IIS logging for use with OpInsights and some other known issues http://blogs.technet.com/b/momteam/archive/2014/09/19/iis-log-format-requirements-in-system-center-advisor.aspx

Some of the information there also applies to Direct Agent, but it was mostly written for SCOM. There is more information about IIS with Direct Agent further below in this post.

 

SQL AND AD ASSESSMENT

SQL and AD Assessment require .NET 4 to run on each agent to be assessed. Analysis runs on the SQL Server machines and on the Domain Controllers (for AD). SQL Assessment supports the Standard, Developer and Enterprise editions of all currently supported versions of SQL Server.

 

MALWARE ASSESSMENT

Windows 7 and Windows Server 2008 R2 have the issues described/tracked here http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6519211-windows-server-2008-r2-sp1-servers-are-shown-as-n

See what Anti-Malware products are enabled by following this thread http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6519202-support-other-antivirus-products-in-malware-assess

 

DIRECT AGENT SPECIFIC INFORMATION

MOST of the errors in the table above in ‘Procedure 4’ about ‘Management Servers’ also apply to Direct Agent. In Direct Agent, each agent is responsible for talking to OpInsights on its own, while in Operations Manager it is the Management Server that sends data on behalf of the agents reporting to it, acting as a gateway.

On Direct Agent the most common issue we have seen so far is Error code 403, which means ‘forbidden’ – this is typically a wrongly-copied workspaceId or key – see more here.

 

Other things that we are currently tracking for Direct Agent:

Capacity Management Intelligence Pack does NOT work with Direct Agent; it works only with Operations Manager. In fact, it even needs Operations Manager to be integrated with Virtual Machine Manager. We are tracking ideas to generalize it, starting here http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6662146-open-up-the-capacity-management-pack-for-other-sys

Alert Management Intelligence Pack does NOT work with Direct Agent; it depends on and requires Operations Manager, whose alerts it synchronizes to the cloud.

Malware Assessment works, other than for the same symptom noted above for 2008R2/Win7.

Update Assessment, Change Tracking and the Log Management Intelligence Packs for collecting Windows Events and IIS Logs already work for both SCOM and Direct Agent.

 

If you need documentation on how to install the agent (including in a scripted/unattended way), check the documentation here https://azure.microsoft.com/en-us/documentation/articles/operational-insights-direct-agent/ . Direct Agent also supports passing through a proxy: there is a PowerShell script in the official documentation above that you can use to configure which proxy and credentials to use on the agent (it’s an application-specific setting; no process other than MMA’s needs to know how to reach the internet).

If your VM is in Azure, you can one-click install/enable the agent from the Azure portal http://azure.microsoft.com/en-us/updates/easily-enable-operational-insights-for-azure-virtual-machines/

We currently only have a 64-bit version of the agent – 32-bit support is tracked here http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6744349-support-for-windows-2003-and-2008-servers-32-bit

 

WINDOWS AZURE DIAGNOSTICS INFORMATION

Log management through Azure Portal integration also allows you to ingest Windows events from Windows Azure Diagnostics (WAD) storage. This works for Cloud Services roles and IaaS VMs configured to write to WAD.

Collecting IIS Logs from WAD works for Cloud Services and for IaaS VMs, but not currently for Azure Web Sites – this is tracked here http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6519351-collect-iis-logs-from-windows-azure-diagnostics-st

Check out and vote on the other ideas about what to collect in this category on our forum http://feedback.azure.com/forums/267889-azure-operational-insights/category/88086-log-management-and-log-collection-policy

Here is a good paper on how to configure your Azure roles and VMs to write to Windows Azure Diagnostics storage in the first place http://download.microsoft.com/download/B/6/C/B6C0A98B-D34A-417C-826E-3EA28CDFC9DD/AzureSecurityandAuditLogManagement_11132014.pdf

 


Satya, Daniele and other folks on the OpInsights team maintain and update this post regularly with new information and learning; check it regularly!