Monitoring UNIX/Linux with OpsMgr 2016


 


 

Microsoft first built Unix and Linux monitoring directly into OpsMgr with OpsMgr 2007 R2, which shipped in 2009.  OpsMgr 2012 brought some significant updates to it.  Primarily these updates are around:

  • Highly available Monitoring via Resource Pools
  • Sudo elevation support, so a low-privilege account can be granted elevation rights for specific workflows
  • SSH key authentication
  • New wizards for discovery, agent upgrade, and agent uninstallation
  • Additional PowerShell cmdlets
  • Performance and scalability improvements
  • New monitoring templates for common monitoring tasks

Now – with SCOM 2016 – we have added:

  • Support for additional releases of operating systems:  (Link)
  • Increased scalability (2x) with asynchronous monitoring workflows
  • Easier agent deployment using existing RunAs account credentials
  • New Management Packs and Providers for LAMP stack
  • New UNIX/Linux Script templates to ease authoring  (Link)
  • Discovery filters for file systems  (Link)

 

I am going to do a step by step guide for getting this deployed with SCOM 2016.  As always – a big thanks to Tim Helton of Microsoft for assisting me with all things Unix and Linux.

 

 

High Level Overview:

 

  • Import Management Packs
  • Create a resource pool for monitoring Unix/Linux servers
  • Configure the Xplat certificates (export/import) for each management server in the pool.
  • Create and Configure Run As accounts for Unix/Linux.
  • Discover and deploy the agents

 

Import Management Packs:

 

The core Unix/Linux libraries are already imported when you install OpsMgr 2016, but not the detailed MP’s for each OS version.  These are on the installation media, in the \ManagementPacks directory.  Import the specific ones for the Unix or Linux Operating systems that you plan to monitor.

Additionally, there is a download location for Unix/Linux MP’s which have been *UPDATED*, however, the updated MP’s do not contain all Unix/Linux packs, so you should always START by importing the relevant management packs from the SCOM 2016 Media.

image

 

Here is an example of the MP’s I will import, which is all the important core libraries, and includes Red Hat, SUSE, and Universal Linux (CentOS, Debian, Oracle, Ubuntu)

 

image

 

Once these above are imported – THEN we can update to the most current ones available for those MP’s that have updates:

The *LATEST* version of these MP’s (and the ones you should be using) are located for download at:

https://www.microsoft.com/en-us/download/details.aspx?id=29696

Download those, and then import any relevant updated libraries.  The following screenshot shows version 7.6.1072.0 which was from the SCOM 2016 UR2 timeframe.

image

 

***NOTE: You will need to restart the Microsoft Monitoring Agent service on all Management Servers that will monitor Linux systems, after importing these management packs, before continuing.  This restart is required to allow each MS to deploy the agent files locally.

 

 

Create a resource pool for monitoring Unix/Linux servers

 

The FIRST step is to create a Unix/Linux Monitoring Resource pool.  In larger environments, this pool will be associated with management servers that are dedicated to monitoring Unix/Linux systems.  In smaller environments, it may include existing management servers that also manage Windows agents or Gateways.  Regardless, it is a best practice to create a new resource pool for this purpose, because it eases administration and makes it simpler to scale out in the future.

Under Administration, find Resource Pools in the console:

 

image

 

OpsMgr ships 3 resource pools by default:

 

image

 

Let’s create a new one by selecting “Create Resource Pool” from the task pane on the right, and call it “UNIX/Linux Monitoring Resource Pool”

 

image

 

Click Add and then click Search to display all management servers.  Select the Management servers that you want to perform Unix and Linux Monitoring.  If you only have 1 MS, this will be easy.  For high availability – you need at least two management servers in the pool.

Add your management servers and create the pool.  In the actions pane – select “View Resource Pool Members” to verify membership.

 

image

 

 

Configure the Xplat certificates (export/import) for each management server in the pool

 

Operations Manager uses certificates to authenticate access to the computers it is managing. When the Discovery Wizard deploys an agent, it retrieves the certificate from the agent, signs the certificate, deploys the certificate back to the agent, and then restarts the agent.

To configure for high availability, each management server in the resource pool must have all the root certificates that are used to sign the certificates that are deployed to the agents on the UNIX and Linux computers. Otherwise, if a management server becomes unavailable, the other management servers would not be able to trust the certificates that were signed by the server that failed.

We provide a tool to handle the certificates, named scxcertconfig.exe.  Essentially, you must log on to EACH management server that will be part of a Unix/Linux monitoring resource pool, and export its SCX (cross-platform) certificate to a file share.  Then, on each server, import the certificates of all the other servers, so that every management server in the pool trusts the others.

If you only have a SINGLE management server, or a single management server in your pool, you can skip this step, then perform it later if you ever add Management Servers to the Unix/Linux Monitoring resource pool.

In this example – I have two management servers in my Unix/Linux resource pool, MS1 and MS2.  Open a command prompt on each MS, and export the cert:

On MS1:

C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -export \\servername\sharename\MS1.cer

On MS2:

C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -export \\servername\sharename\MS2.cer

Once all certs are exported, you must IMPORT the other management server’s certificate:

On MS1:

C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -import \\servername\sharename\MS2.cer

On MS2:

C:\Program Files\Microsoft System Center 2016\Operations Manager\Server>scxcertconfig.exe -import \\servername\sharename\MS1.cer

If you fail to perform the above steps – you will get errors when running the Linux agent deployment wizard later.
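With more than two management servers, the number of imports grows quickly, since every server must import every other server's certificate. Here is a minimal sketch in Python that only generates the command lines for a pool. The install path and the -export / -import switches are the ones shown above; the server names and the share path are placeholders, and the function name is my own invention, not part of any SCOM tooling.

```python
from pathlib import PureWindowsPath

# Install path as shown in the command prompts above.
SERVER_DIR = PureWindowsPath(
    r"C:\Program Files\Microsoft System Center 2016\Operations Manager\Server"
)
TOOL_PATH = SERVER_DIR / "scxcertconfig.exe"
TOOL = f'"{TOOL_PATH}"'  # quoted because the path contains spaces


def cert_exchange_plan(servers, share):
    """Build the export and import command lines for a resource pool.

    Returns two lists of (server, command) tuples. Run every export
    first, then every import, each on the server it is paired with.
    """
    exports = [(s, f"{TOOL} -export {share}\\{s}.cer") for s in servers]
    imports = [
        (s, f"{TOOL} -import {share}\\{other}.cer")
        for s in servers
        for other in servers
        if other != s
    ]
    return exports, imports


if __name__ == "__main__":
    # MS1, MS2 and the share are the placeholder names used in this article.
    exports, imports = cert_exchange_plan(["MS1", "MS2"], r"\\servername\sharename")
    for server, command in exports + imports:
        print(f"On {server}: {command}")
```

For a pool of N servers this produces N exports and N x (N - 1) imports, which is a handy sanity check that nothing was skipped.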

 

 

Create and Configure Run As accounts for Unix/Linux

 

Next up we need to create our run-as accounts for Linux monitoring.   This is documented here:  (Link) 

We need to select “UNIX/Linux Accounts” under administration, then “Create Run As Account” from the task pane.  This kicks off a special wizard for creating these accounts.

 

image

 

Let’s create the Monitoring account first.  Give the monitoring account a display name, and click Next.

 

image

 

On the next screen, type in the credentials that you want to use for monitoring the UNIX/Linux system(s).  These accounts must exist on each UNIX/Linux system and have the required permissions granted:

 

image

 

On the above screen – you have two choices.  You can use a privileged account for handling monitoring, or you can use an account that is not privileged, but elevated via sudo.    I will configure this with the most typical customer scenario – which is to leverage sudo elevation which is specifically granted in the sudoers file.  (more on that later)

 

On the next screen, always choose “more secure” and click “Create”:

image

 

 

Now – since we chose More Secure – we must choose the distribution of the Run As account.  Find your “UNIX/Linux Monitoring Account” under the UNIX/Linux Accounts screen, and open the properties.  On the Distribution Security screen, click Add, then select "Search by resource pool name” and click search.  Find your Unix/Linux monitoring resource pool, highlight it, and click Add, then OK.  This will distribute this account credential to all Management servers in our pool:

 

image

 

Next up – we will create the Agent Maintenance Account.

This account is used over SSH for everything that deals with the agent itself on the UNIX/Linux system: deploying, installing, uninstalling, upgrading, and signing certificates.

 

image

 

image

 

Give the account a name:

 

image

 

From here you can choose to use a SSH key, or a username and password credential only.  You also can choose to leverage a privileged account, or a regular account that uses sudo.  I will be choosing the most typical – which is an account that will leverage sudo:

 

image

 

Next – depending on your OS and elevation standards – choose to use SUDO or SU:

 

image

 

On the next screen, always choose “more secure” and click “Create”:

image

 

Now – since we chose More Secure – we must choose the distribution of the Run As account.  Find your “UNIX/Linux Agent Maintenance Account” under the UNIX/Linux Accounts screen, and open the properties.  On the Distribution Security screen, click Add, then select "Search by resource pool name” and click search.  Find your Unix/Linux monitoring resource pool, highlight it, and click Add, then OK.  This will distribute this account credential to all Management servers in our pool:

 

image

 

 

Next up – we must configure the Run As profiles. 

There are three profiles for Unix/Linux accounts:

image

 

The agent maintenance account is strictly for agent updates, uninstalls, anything that requires SSH.  This will always be associated with a privileged (or sudo elevated) account that has access via SSH, and was created using the Run As account wizard above.

The other two Profiles are used for Monitoring workflows.  These are:

Unix/Linux Privileged account

Unix/Linux Action Account

The Privileged Account Profile will always be associated with a Run As account like we created above, that is either privileged, or an unprivileged account that has been configured with elevation via sudo.  This is what any workflows that typically require elevated rights will execute as.

The Action account is what all your basic monitoring workflows will run as.  This will generally be associated with a Run As account, like we created above, but one that uses a non-privileged user account on the Linux systems, and won’t request sudo elevation.

***A note on sudo elevated accounts:

  • sudo elevation must be passwordless.
  • requiretty must be disabled for the user.
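Both requirements can be spot-checked against a copy of the sudoers file. The following is a rough Python sketch, not a real sudoers parser: it assumes one rule per line, a "Defaults:<user> !requiretty" line per account, and no line continuations or user aliases. The function name is hypothetical.

```python
import re


def check_sudo_prereqs(sudoers_text, user):
    """Return a list of problems found for `user` in a sudoers text.

    Only checks the two requirements for sudo elevated accounts:
      * a "Defaults:<user> !requiretty" line exists
      * every rule line for <user> carries the NOPASSWD: tag
    """
    problems = []
    lines = [ln.strip() for ln in sudoers_text.splitlines()]
    # Drop blank lines and comments.
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    tty_rule = re.compile(rf"^Defaults:{re.escape(user)}\s+!requiretty\b")
    if not any(tty_rule.match(ln) for ln in lines):
        problems.append(f"requiretty is not disabled for {user}")

    # A rule line looks like: <user> <hosts>=(<runas>) [tags] <command>
    rule = re.compile(rf"^{re.escape(user)}\s+\S+\s*=")
    for ln in lines:
        if rule.match(ln) and "NOPASSWD:" not in ln:
            problems.append(f"rule prompts for a password: {ln}")
    return problems


if __name__ == "__main__":
    sample = (
        "Defaults:scxmon !requiretty\n"
        "scxmon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p\n"
    )
    print(check_sudo_prereqs(sample, "scxmon") or "no problems found")
```

An empty list only means these two checks passed; it does not prove that the individual commands granted are the right ones.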

 

For my example, I am keeping it very simple.  I created two Run As accounts, one for monitoring and one for agent maintenance.  I will associate these Run As accounts with the appropriate Run As profiles.

 

I will start with the Unix/Linux Action Account profile.  Right click it – choose properties, and on the Run As Accounts screen, click Add, then select our “UNIX/Linux Monitoring Account”.  Leave the default of “All Targeted Objects” and click OK, then save.

Repeat this same process for the Unix/Linux Privileged Account profile, and associate it with your “UNIX/Linux Monitoring Account”.

Repeat this same process for the Unix/Linux Agent Maintenance Account profile, but use the “Unix/Linux Agent Maintenance Account”.
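To keep track of what should now be in place, it can help to write the end state down as data. The sketch below is purely illustrative Python: it does not talk to SCOM at all, and the class and function names are my own. It models the three profiles, the accounts associated with them, and the pool each account is distributed to, and reports anything missing.

```python
from dataclasses import dataclass, field

# Pool name as created earlier in this walkthrough.
POOL = "UNIX/Linux Monitoring Resource Pool"


@dataclass
class RunAsAccount:
    name: str
    distributed_to: set = field(default_factory=set)  # resource pool names


def find_gaps(profiles, pool=POOL):
    """Report profiles with no account, or accounts not distributed to the pool.

    `profiles` maps a profile name to a RunAsAccount, or to None when
    nothing has been associated yet.
    """
    gaps = []
    for profile, account in profiles.items():
        if account is None:
            gaps.append(f"{profile}: no Run As account associated")
        elif pool not in account.distributed_to:
            gaps.append(
                f"{profile}: account '{account.name}' is not distributed to '{pool}'"
            )
    return gaps


if __name__ == "__main__":
    monitoring = RunAsAccount("UNIX/Linux Monitoring Account", {POOL})
    maintenance = RunAsAccount("UNIX/Linux Agent Maintenance Account", {POOL})
    profiles = {
        "Unix/Linux Action Account": monitoring,
        "Unix/Linux Privileged Account": monitoring,
        "Unix/Linux Agent Maintenance Account": maintenance,
    }
    print(find_gaps(profiles) or "all three profiles are associated and distributed")
```

A missing association or a missing distribution are the two gaps this models, and both leave the affected workflows without a usable credential.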

 

 

Discover and deploy the agents

Run the discovery wizard.

image

Click “Add”:

image

 

Here you will type in the FQDN of the Linux/Unix agent and its SSH port, and then choose All Computers in the discovery type.  (There is another option for discovery type, for when you manually install the Unix/Linux agent, which is really just a simple provider, and then use a signed certificate to authenticate.)

Check the box next to “Use Run As Credentials”.  This will leverage our existing Agent Maintenance account for the discovery and deployment. 

 

image
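Before clicking through the rest of the wizard, a quick reachability test from the management server can save a failed discovery. This is a small Python sketch under a few assumptions: SSH listens on the default port 22, the agent listens on port 1270 (the port that appears in the certificate error messages in the troubleshooting section), and the function names are mine. Before the agent is installed, only the SSH port is expected to answer.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def preflight(fqdn, ssh_port=22, agent_port=1270):
    """Check the two ports the discovery and monitoring workflows rely on."""
    return {
        "ssh": port_open(fqdn, ssh_port),
        "agent": port_open(fqdn, agent_port),
    }


if __name__ == "__main__":
    # Replace with the FQDN you plan to type into the discovery wizard.
    print(preflight("linux01.example.com"))
```

Run it with the exact FQDN you intend to use in the wizard, since a name resolution failure shows up here as a closed port.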

 

Click “Save”.  On the next screen – select a resource pool.  We will choose the resource pool that we already created.

 

image

 

Click Discover, and the results will be displayed:

image

 

Check the box next to your discovered system – and click “Manage” to deploy the agent.

 

image

 

DOH!

 

There are many reasons this could fail.  The most common is rights on the UNIX/Linux systems you are trying to manage.  In this case, I didn’t configure SUDO on the Linux box.  Let’s discuss that now.

I need to modify the /etc/sudoers file on each UNIX/Linux server, to grant the granular permissions.

NOTE:  The sudoers configuration has changed from SCOM 2012 R2 to SCOM 2016.  This is because we no longer install each package directly (such as .rpm packages).  Now, each agent is included in a .sh file that has logic to determine which packages are applicable, and install only those.  Because of this – even if you configured sudoers for SCOM 2012 R2 and previous support, you will need to make some modifications. 

Here is a sample sudoers file for all operating systems, in SCOM 2016:

#-----------------------------------------------------------------------------------
#Example user configuration for Operations Manager 2016 agent v1.1
#Example assumes users named: scxmaint & scxmon
#Replace usernames & corresponding /tmp/scx-<username> specification for your environment
##General requirements
#These are any accounts you are using that use SUDO elevation including the Agent Maintenance account and or the monitoring account
Defaults:scxmaint !requiretty
Defaults:scxmon !requiretty
##Agent maintenance
#Agent maintenance for LINUX
#Certificate signing
scxmaint ALL=(root) NOPASSWD: /bin/sh -c cp /tmp/scx-scxmaint/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-scxmaint; /opt/microsoft/scx/bin/tools/scxadmin -restart
scxmaint ALL=(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem
#Agent maintenance for UNIX
#Certificate signing
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c cp /tmp/scx-scxmaint/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-scxmaint; /opt/microsoft/scx/bin/tools/scxadmin -restart
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem
##Install or upgrade
#AIX
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].aix.[[\:digit\:]].ppc.sh --install ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].aix.[[\:digit\:]].ppc.sh --upgrade --force ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#HPUX
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].hpux.11iv3.ia64.sh --install ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].hpux.11iv3.ia64.sh --upgrade --force ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#RHEL
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].rhel.[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].rhel.[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#RHEL 7.1 PPC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].rhel.[[\:digit\:]].ppc.sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].rhel.[[\:digit\:]].ppc.sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#SLES
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].sles.1[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].sles.1[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#SOLARIS 10
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.10.sparc.sh --install *
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.10.sparc.sh --upgrade --force *
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.10.x86.sh --install *
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.10.x86.sh --upgrade --force *
#SOLARIS 11
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].x86.sh --install ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].x86.sh --upgrade --force ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].sparc.sh --install ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].solaris.1[[\:digit\:]].sparc.sh --upgrade --force ; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
#UNIVERSAL LINUX
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
##Uninstall
#Uninstall for LINUX
scxmaint ALL=(root) NOPASSWD: /bin/sh -c /opt/microsoft/scx/bin/uninstall
#Uninstall for UNIX
scxmaint ALL=(root) NOPASSWD: /usr/bin/sh -c /opt/microsoft/scx/bin/uninstall
##Log file monitoring
scxmon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p
###Examples
#Custom shell command monitoring example – replace <shell command> with the correct command string
# scxmon ALL=(root) NOPASSWD: /bin/bash -c <shell command>
#Daemon diagnostic and restart recovery tasks example (using cron)
#scxmon ALL=(root) NOPASSWD: /bin/sh -c ps -ef | grep cron | grep -v grep
#scxmon ALL=(root) NOPASSWD: /usr/sbin/cron &
#End user configuration for Operations Manager agent
#-----------------------------------------------------------------------------------

Since the above file contains ALL OS’s and examples, I am going to trim it down to just what I need for this Ubuntu Linux system:

 

#-----------------------------------------------------------------------------------
#Ubuntu Linux configuration for Operations Manager 2016 agent
##General requirements
Defaults:scxmaint !requiretty
Defaults:scxmon !requiretty
##Agent maintenance
#Certificate signing
scxmaint ALL=(root) NOPASSWD: /bin/sh -c cp /tmp/scx-scxmaint/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-scxmaint; /opt/microsoft/scx/bin/tools/scxadmin -restart
scxmaint ALL=(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem
##Install or upgrade
#UNIVERSAL LINUX
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
scxmaint ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scxmaint/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scxmaint; exit $EC
##Uninstall
scxmaint ALL=(root) NOPASSWD: /bin/sh -c /opt/microsoft/scx/bin/uninstall
##Log file monitoring
scxmon ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p
#-----------------------------------------------------------------------------------
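If your account names differ from scxmaint and scxmon, both the user name and the matching /tmp/scx-<username> paths have to change together, which is easy to get wrong by hand. Here is a hedged Python sketch that renders the trimmed Universal Linux rules for any pair of account names. The rule text is copied from the configuration above; the template variable, the function name, and the account-name pattern are my own assumptions.

```python
import re

# Trimmed Universal Linux rules from this article, with the two user
# names turned into {maint} and {mon} placeholders.
SUDOERS_TEMPLATE = r"""#Universal Linux configuration for Operations Manager 2016 agent
##General requirements
Defaults:{maint} !requiretty
Defaults:{mon} !requiretty
##Agent maintenance
#Certificate signing
{maint} ALL=(root) NOPASSWD: /bin/sh -c cp /tmp/scx-{maint}/scx.pem /etc/opt/microsoft/scx/ssl/scx.pem; rm -rf /tmp/scx-{maint}; /opt/microsoft/scx/bin/tools/scxadmin -restart
{maint} ALL=(root) NOPASSWD: /bin/sh -c cat /etc/opt/microsoft/scx/ssl/scx.pem
##Install or upgrade
{maint} ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-{maint}/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-{maint}; exit $EC
{maint} ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-{maint}/scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-{maint}; exit $EC
##Uninstall
{maint} ALL=(root) NOPASSWD: /bin/sh -c /opt/microsoft/scx/bin/uninstall
##Log file monitoring
{mon} ALL=(root) NOPASSWD: /opt/microsoft/scx/bin/scxlogfilereader -p
"""

# Conservative pattern for a POSIX-style account name (an assumption,
# adjust it if your naming standard differs).
_ACCOUNT_NAME = re.compile(r"[a-z_][a-z0-9_-]*")


def render_sudoers(maint_user, mon_user):
    """Return the sudoers fragment for the given maintenance and monitoring users."""
    for name in (maint_user, mon_user):
        if not _ACCOUNT_NAME.fullmatch(name):
            raise ValueError(f"unexpected account name: {name!r}")
    return SUDOERS_TEMPLATE.format(maint=maint_user, mon=mon_user)


if __name__ == "__main__":
    print(render_sudoers("scxmaint", "scxmon"))
```

Whatever produces the fragment, it is worth checking the result with visudo -c -f <file> on the Linux side before relying on it.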

 

I will edit my sudoers file and insert this configuration.  You can use vi, visudo, or my personal favorite since I am a Windows guy: download and install WinSCP, which gives you a GUI editor for the files and helps anytime you need to transfer files between Windows and UNIX/Linux using SSH.  Generally we want to place this configuration in the appropriate section of the sudoers file, not at the end.  There are items at the end of the file that need to stay there.  I put this right after the existing “Defaults” section in the existing sudoers configuration, and save it.
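The install and upgrade rules only apply when the installer file name matches the wildcard in the rule, so an agent version outside the 1.5 to 1.9 range will still be refused. The sudoers manual describes wildcard matching in terms of glob(3) and fnmatch(3); the Python sketch below approximates that for just the subset of syntax used in these rules (bracket ranges plus the escaped [[\:digit\:]] and [[\:alpha\:]] classes). It is my own approximation, not sudo's matcher, and the file names in the usage section are illustrative.

```python
import re


def sudoers_glob_to_regex(pattern):
    """Translate the wildcard subset used in this article into a regex."""
    text = pattern.replace(r"\:", ":")  # ':' is written escaped in these rules
    out = []
    i = 0
    while i < len(text):
        if text.startswith("[[:digit:]]", i):
            out.append("[0-9]")
            i += len("[[:digit:]]")
        elif text.startswith("[[:alpha:]]", i):
            out.append("[A-Za-z]")
            i += len("[[:alpha:]]")
        elif text[i] == "[":
            end = text.index("]", i)  # simple ranges such as [5-9]
            out.append(text[i:end + 1])
            i = end + 1
        elif text[i] == "*":
            out.append(".*")
            i += 1
        elif text[i] == "?":
            out.append(".")
            i += 1
        else:
            out.append(re.escape(text[i]))
            i += 1
    return re.compile("".join(out))


def installer_allowed(pattern, file_name):
    """Return True if file_name is fully matched by the sudoers wildcard."""
    return sudoers_glob_to_regex(pattern).fullmatch(file_name) is not None


UNIVERSAL = r"scx-1.[5-9].[0-9]-[0-9][0-9][0-9].universal[[\:alpha\:]].[[\:digit\:]].x[6-8][4-6].sh"

if __name__ == "__main__":
    # Illustrative file names, not taken from a real download.
    for name in ("scx-1.6.2-338.universalr.1.x64.sh", "scx-1.4.1-292.universalr.1.x64.sh"):
        print(name, installer_allowed(UNIVERSAL, name))
```

If the file name sudo reports in /var/log/secure does not match, the rule needs widening rather than the account needing more rights.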

Now – back in SCOM – I retry the deployment of the agent:

image

 

image

 

 

This will take some time to complete, as the agent is checked for the correct FQDN and certificate, the management servers are inspected to ensure they all have trusted SCX certificates (that we exported/imported above) and the connection is made over SSH, the package is copied down, installed, and the final certificate signing occurs.  If all of these checks pass, we get a success!

There are several things that can fail at this point.  See the troubleshooting section at the end of this article.

 

 

Monitoring Linux servers:

 

Assuming we got all the way to this point with a successful discovery and agent installation, we need to verify that monitoring is working.  After an agent is deployed, the Run As accounts will start being used to run discoveries, and start monitoring.  Once enough time has passed for these, check in the Administration pane, under Unix/Linux Computers, and verify that the systems are not listed as “Unknown” but discovered as a specific version of the OS:

Here it is immediately, before the discoveries complete:

 

image

 

Here is what we expect after a few minutes:

 

image

 

 

Next, go to the Monitoring pane and select the “Unix/Linux Computers” view at the top.  Verify that your systems are present and that there is a green healthy check mark next to them:

 

image

 

Next – expand the Unix/Linux Computers folder in the left tree (near the bottom) and make sure we have discovered the individual objects, like Linux Server State, Logical Disk State, and Network Adapter state:

image

 

Run Health explorer on one of the discovered Linux Server State objects.  Remove the filter at the top to see all the monitors for the system:

 

image

 

Close health explorer. 

Select the Operating System Performance view.   Review the performance counters we collect out of the box for each monitored OS.

image

 

Out of the box – we discover and apply a default monitoring template to the following objects:

  • Operating System
  • Logical disk
  • Network Adapters

Optionally, you can enable discoveries for:

  • Individual Logical Processors
  • Physical Disks

I don’t recommend enabling additional discoveries unless you are sure that your monitoring requirements cannot be met without discovering these additional objects, as they will reduce the scalability of your environment.

Out of the box – for an OS like RedHat Enterprise Linux 5 – here is a list of the monitors in place, and the object they target:

image

There are also 50 or more rules enabled out of the box.  46 are performance collection rules for reporting, and 4 rules are event based, dealing with security.  Two are informational letting you know whenever a direct login is made using root credentials via SSH, and when su elevation occurs by a user session.  The other two deal with failed attempts for SSH or SU.

To get more out of your monitoring – you might have other services, processes, or log files that you need to monitor.  For that, we provide Authoring Templates with wizards to help you add additional monitoring, in the Authoring pane of the console under Management Pack templates:

 

image

image

image

 

In the reporting pane – we also offer a large number of reports you can leverage, or you can always create your own using our generic report templates, or custom ones designed in Visual Studio for SQL reporting services.

image

 

 

As you can see, it is a fairly well rounded solution to include Unix and Linux monitoring into a single pane of glass for your other systems, from the Hardware, to the Operating System, to the network layer, to the applications.

Partners and 3rd party vendors also supply additional management packs which extend our Unix and Linux monitoring, to discover and provide detailed monitoring on non-Microsoft applications that run on these Unix and Linux systems.

 

 

Troubleshooting:

The majority of troubleshooting comes in the form of failed discovery/agent deployments.

Microsoft has written a wiki on this topic, which covers the majority of these, and how to resolve:

http://social.technet.microsoft.com/wiki/contents/articles/4966.aspx

  • For instance, if the DNS name that you provided does not match the DNS hostname on the Linux server, or does not match its SSL certificate, or if you failed to export/import the SCX certificates for multiple management servers in the pool, you might see:

image

Agent verification failed. Error detail: The server certificate on the destination computer (rh5501.opsmgr.net:1270) has the following errors:
The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.

The SSL certificate is signed by an unknown certificate authority.
It is possible that:
1. The destination certificate is signed by another certificate authority not trusted by the management server.
2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection. The FQDN used for the connection is: rh5501.opsmgr.net.
3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.

The server certificate on the destination computer (rh5501.opsmgr.net:1270) has the following errors:
The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.
The SSL certificate is signed by an unknown certificate authority.
It is possible that:
1. The destination certificate is signed by another certificate authority not trusted by the management server.
2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection. The FQDN used for the connection is: rh5501.opsmgr.net.
3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.

The solution to these common issues is covered in the Wiki with links to the product documentation.
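For the name-mismatch cause (item 2 in the error text), a tiny comparison helps rule it in or out. The Python sketch below assumes you have already read the common name from the agent certificate yourself, for example with openssl x509 -noout -subject -in /etc/opt/microsoft/scx/ssl/scx.pem on the Linux side. It compares names case-insensitively, as DNS names normally are; whether SCOM applies exactly the same rules is an assumption on my part, and the function names are hypothetical.

```python
def normalize_host(name):
    """Lower-case a DNS name and drop surrounding spaces and any trailing dot."""
    return name.strip().rstrip(".").lower()


def compare_names(discovery_fqdn, certificate_cn):
    """Classify how the name used for discovery relates to the certificate CN."""
    wanted = normalize_host(discovery_fqdn)
    found = normalize_host(certificate_cn)
    if wanted == found:
        return "match"
    # A certificate generated with only the short host name is a common case.
    if "." not in found and wanted.split(".", 1)[0] == found:
        return "certificate uses the short host name"
    return "mismatch"


if __name__ == "__main__":
    # Host name taken from the sample error message above.
    print(compare_names("rh5501.opsmgr.net", "rh5501"))
```

A result other than "match" points at DNS or the agent certificate rather than at the resource pool certificates.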

  • Perhaps – you failed to properly configure your Run As accounts and profiles.  You might see the following show as “Unknown” under administration:

image

Or you might see alerts in the console:

Alert:  UNIX/Linux Run As profile association error event detected

The account for the UNIX/Linux Action Run As profile associated with the workflow "Microsoft.Unix.AgentVersion.Discovery", running for instance "rh5501.opsmgr.net" with ID {9ADCED3D-B44B-3A82-769D-B0653BFE54F9} is not defined. The workflow has been unloaded. Please associate an account with the profile.

This condition may have occurred because no UNIX/Linux Accounts have been configured for the Run As profile. The UNIX/Linux Run As profile used by this workflow must be configured to associate a Run As account with the target.

Either you failed to configure the Run As accounts, or failed to distribute them, or you chose a low-privilege account that is not properly configured for sudo on the Linux system.  Go back and double-check your work there.

If you want to check if the agent was deployed to a RedHat system, you can provide the following command in a shell session:

image

 

More good troubleshooting links and useful info:

Enable logging:  https://technet.microsoft.com/en-us/library/ee344801.aspx

https://blogs.msdn.microsoft.com/scxplat/2010/02/05/cant-get-your-linux-computer-discovered-check-your-network-configuration/

http://www.bictt.com/blogs/bictt.php/2010/02/22/scom-discovery-wizard-error-while-deploying-redhat-agent

https://technet.microsoft.com/en-us/system-center-docs/om/manage/install-agent-and-certificate-on-unix-and-linux-computers-using-the-command-line


Comments (10)

  1. ronald van den berg says:

    Hi Kevin, i have some additions to this article.

    First, the new option 'Use Run-As Credentials' sounds fantastic, but it isn't. It only works if you associate the unix/linux profile in the way you describe it. If you have more than 1 set of unix/linux accounts you cannot associate all accounts to all targeted objects, just 1. So in my case i associate the accounts to groups, but of course the server is not yet member of the group when there is no agent on the server. And using the same accounts on all domains is not allowed for us.

    Second the commands for installing, upgrading and de-installing are changed compared to older versions.
    If you try to un-install an agent that has not been upgraded to 1.6 (Or higher?) it tries to run a non existing uninstall command.
    So you first need to upgrade the agent before you can un-install it.

    The upgrade, installation and un-install requires new commands to be added to sudoers, check your /var/log/secure for that. Maybe it's not changed for all distributions, but i tested with centos and that requires these commands to be added since it's no longer using rpm in the command lines:

    scomuser ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scomuser/scx-[0-9].[0-9].[0-9]-[0-9][0-9][0-9].universalr.[0-9].x[0-9][0-9].sh --upgrade --force; EC=$?; cd /tmp; rm -rf /tmp/scx-scomuser; exit $EC
    scomuser ALL=(root) NOPASSWD: /bin/sh -c sh /tmp/scx-scomuser/scx-[0-9].[0-9].[0-9]-[0-9][0-9][0-9].universalr.1.x[0-9][0-9].sh --install; EC=$?; cd /tmp; rm -rf /tmp/scx-scomuser; exit $EC
    scomuser ALL=(root) NOPASSWD: /bin/sh -c /opt/microsoft/scx/bin/uninstall

    Gr,
    Ronald

  2. Rich Manship says:

    I've searched high and low to find anyone suffering the same issue as me. I've followed you guide to the letter but the 3 Linux agents I've installed (1 CentOS, 2 Ubuntu) only report in on the Heartbeat and other basic checks that don't really seem to be info taken from the monitored server itself. I haven't found an answer to this or even anyone suffering the same issue. Any help or places to look would be incredible. Any suggestions?

    1. Kevin Holman says:

      Have you imported the correct MP's for these OS?

  3. Scott Banyas says:

    Is there a distinguishing alert, similar to Windows, for Linux/Unix for ICMP (Failed to Connect to Computer) after a Heartbeat Failure?

    It is one thing for the agent to have issues, but a whole bigger issue if the computer is not reachable via ping.

    1. Kevin Holman says:

      No, not built in. Microsoft considers HB failures the same as "server down" even though we know that's not realistic. Many customers institute a ping solution. The problem is that many environments today block ICMP, so this isn't always a reliable method.

  4. Eric says:

    dont believe this post is being monitored. I would LOVE to hear of ANYONE who is successfully monitoring Linux systems with SCOM 2016... in particular on SLES 11. Operations Manager is great. SCOM NOT SO!

    1. Kevin Holman says:

      I have customers doing this. Whats the problem?

  5. Javier says:

    Hi Kevin, hope you can help me. Im installing SCOM 2016 agent on SUSE 12 Enterprise, i run scx-1.6.2-338.sles.12.x64.sh with putty and all is ok, but when i go to see the scx-host-[hostname].pem (/etc/opt/microsoft/scx/ssl/) isn´t created, i only can see scx.pem. How can i generate scx-host-[hostname].pem to sign the certificate in the MS?

    really thanks

  6. rob1974 says:

    As there's not much documentation about multihomed unix servers and i dont really have the time for writing proper blog. if you are upgrading your SCOM 2012 environment to SCOM 2016 side by side, this will work fine as well.

    Just discover the unix/Linux servers already in the old environment. Make sure you have imported the xplat certs (in the blog above Configure the Xplat certificates) from the 2012 resource pool to all servers in de 2016 resource pool.
    The agent doesn't get upgraded in the proces, so this actually works without changing your sudoers.
    However, you do want to upgrade to the latest version, so you need to change sudoers to push the upgrade (also described in kevin's post).

  7. Hi Kevin,

    I am trying to accomlish an automated solution for this. Do you have any inputs? We want to roll out the software using chef (manually install seen from scom) where we will have to find a solution for copying and signing agent certificates before we have to run the discovery wizard, which is a nightmare.

    Any thoughts on this?

    Martin
