SCOM Gray Agents in VMM - Report & Email

A customer had many SCOM agents on VMs hosted in VMM, and VMs often hung after patching: VMM reported them as "Running", but they were not actually responsive. This script helped identify these gray agents. The customer also had VMs hosted in Azure, which could be identified by naming convention.
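The Azure split depends only on the computer name, so it can be tried in isolation. The sketch below assumes the `AZURE*` display-name prefix that the script uses; substitute your own convention. The names are hypothetical sample data, not real SCOM output:

```powershell
# Hypothetical gray agent display names; the real script gets these from SCOM.
$grayAgentNames = @(
    'AZUREWEB01.contoso.net'
    'VM25.contoso.net'
    'azuresql02.contoso.net'
    'VM71.contoso.net'
)

# -like is case-insensitive by default, so 'azuresql02...' also matches 'AZURE*'.
# The 'Split' mode of .Where() returns the matches first, then the non-matches.
$azureNames, $otherNames = $grayAgentNames.Where({ $_ -like 'AZURE*' }, 'Split')

"Azure: $($azureNames -join ', ')"
"Not Azure: $($otherNames -join ', ')"
```

If your Azure VMs do not share a prefix, replace the `-like` test with whatever convention identifies them.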

It relies on the VMM and SCOM (OperationsManager) PowerShell modules. It uses WinRM (Invoke-Command) to check whether VMs that VMM reports as "Running" actually respond. It has been tested repeatedly in a large SCOM environment (4000+ agents) and typically runs in a few minutes or less.
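The responsiveness check follows a simple pattern. Start one job per VM, wait for a bounded time, and treat any job that has not completed as hung. The sketch below is self-contained and uses local `Start-Job` jobs as hypothetical stand-ins for `Invoke-Command -ComputerName <vm> -ScriptBlock { $true } -AsJob`, which needs WinRM access to real VMs. The VM names and delays are made up:

```powershell
# Hypothetical VMs: name -> seconds before the stand-in job "answers".
# A delay longer than the timeout simulates a VM that is Running but hung.
$simulatedVms = @{
    'VM25.contoso.net' = 0
    'VM35.contoso.net' = 0
    'VM99.contoso.net' = 120
}
$timeoutSeconds = 30

# Start one job per VM before waiting, so the checks run in parallel.
$jobs = foreach ($name in $simulatedVms.Keys) {
    Start-Job -Name $name -ArgumentList $simulatedVms[$name] -ScriptBlock {
        param($delaySeconds)
        Start-Sleep -Seconds $delaySeconds
        $true
    }
}

# Wait once for the whole batch; jobs that are not Completed afterwards count as hung.
$null = Wait-Job -Job $jobs -Timeout $timeoutSeconds

$respondingVms = @($jobs | Where-Object { $_.State -eq 'Completed' } | ForEach-Object Name)
$hungVms       = @($jobs | Where-Object { $_.State -ne 'Completed' } | ForEach-Object Name)

# Stop the stand-in "hung" job and remove only the jobs created here.
$jobs | Remove-Job -Force

"Responding: $($respondingVms -join ', ')"
"Hung:       $($hungVms -join ', ')"
```

The script adds batching on top of this pattern. It starts at most 30 jobs at a time, waits up to 120 seconds per batch, and stops starting new batches after 15 minutes, so a large inventory does not open every remote session at once.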

Example output:

Gray Agent Report Info

Script Start Time : 01/01/2018 06:00:00

User Account Used To Produce Report : contoso\scom-adon

 

Gray Agents

Total Agents in SCOM: 4359

Total Gray Agent count: 11

 

9 gray agents in VMM.

- 2 are VMM hosts

- 0 are hung

- 6 are Running

- 1 are in a state other than Running (e.g. Saved, Paused, etc.)

0 gray agents in Azure.

2 gray agents are not in Azure, not in VMM

 

Name              Comment
----              -------
VM00.contoso.net  VMM Host
VM02.contoso.net  VMM Host
VM25.contoso.net  Server is running
VM71.contoso.net  Unknown (not Azure or VMM)
VM35.contoso.net  Server is running
VM31.contoso.net  Server is running
VM24.contoso.net  Server is running
VM20.contoso.net  Server is running
VM01.contoso.net  Unknown (not Azure or VMM)
VM04.contoso.net  Saved
VM38.contoso.net  Server is running

 

Total Script Run Time: 0 hrs 1 min 30 sec
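The summary counts and the Name/Comment table are two views of the same data, so the table can serve as a consistency check. The sketch below re-derives the breakdown by grouping the example rows above on Comment. Only the data comes from the report; the variable names are illustrative:

```powershell
# The Name/Comment rows from the example report above.
$rows = @'
Name,Comment
VM00.contoso.net,VMM Host
VM02.contoso.net,VMM Host
VM25.contoso.net,Server is running
VM71.contoso.net,Unknown (not Azure or VMM)
VM35.contoso.net,Server is running
VM31.contoso.net,Server is running
VM24.contoso.net,Server is running
VM20.contoso.net,Server is running
VM01.contoso.net,Unknown (not Azure or VMM)
VM04.contoso.net,Saved
VM38.contoso.net,Server is running
'@ | ConvertFrom-Csv

# Count rows per Comment value.
$countsByComment = @{}
$rows | Group-Object Comment | ForEach-Object { $countsByComment[$_.Name] = $_.Count }

# Every row that is neither Azure nor Unknown came from VMM.
$notVmm = @('Azure VM', 'Unknown (not Azure or VMM)')
$vmmCount = @($rows | Where-Object { $notVmm -notcontains $_.Comment }).Count

$countsByComment
"Gray agents in VMM: $vmmCount of $($rows.Count)"
```

The example rows give 6 "Server is running", 2 "VMM Host", 2 "Unknown (not Azure or VMM)" and 1 "Saved". This matches the summary: 2 + 0 + 6 + 1 = 9 gray agents in VMM, and 9 + 0 + 2 = 11 gray agents in total.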

 

 

# Gets all gray agents and finds which are hosted in Azure (IaaS VMs), in VMM, or elsewhere (unknown).

# Writes each gray agent's Name and Comment (Azure VM, VMM Host, Server is running, Server is hung, etc.)

# to GrayAgentResults<timestamp>.csv, writes a breakdown of the counts to GrayAgentReport<timestamp>.txt, and emails an HTML report.

#

# Laura.Park@microsoft.com

# August 2017





### Prep

$starttime = get-date



#load VMM module (replace the path with the location of your VMM PowerShell module)

Import-Module C:\temp\VMM_Module\bin\psModules\virtualmachinemanager

Get-SCVMMServer -ComputerName vmm1.contoso.com -SetAsDefault



#load SCOM module

import-module operationsmanager

$connect = New-SCOMManagementGroupConnection -ComputerName managementserver1.contoso.com

####



# variables

$CSVFilePath = "C:\temp\GrayAgentResults$(get-date -f yyyy-MM-dd_HH-mm).csv"  ### Replace with desired file path

$TXTFilePath = "C:\temp\GrayAgentReport$(get-date -f yyyy-MM-dd_HH-mm).txt"   ### Replace with desired file path

$tempTest = 'C:\temp\winrmjob.log'; #log for WinRM tests



# Email variables

$EmailServer =  "smtp.contoso.com"

$EmailTo = "MONEngineers@contoso.com, Admins@contoso.com"

$EmailFrom = "SCOM-HealthCheckReport@contoso.com"

#

$whoami = whoami #account running this script



$colMonitoringObjects = @()



$WindowsAgentClass = Get-SCOMClass -Name "Microsoft.SystemCenter.Agent"

$UXClass = Get-SCOMClass -Name "Microsoft.Unix.Computer"



$WindowsAgents = Get-SCOMMonitoringObject -Class $WindowsAgentClass

$UnixAgents = Get-SCOMMonitoringObject -Class $UXClass

$AgentlessMonitored = (Get-SCOMAgentlessManagedComputer).Computer



$colMonitoringObjects += $WindowsAgents

$colMonitoringObjects += $UnixAgents

$colMonitoringObjects += $AgentlessMonitored

#

$colVMMmanagedcomputers = @(Get-SCVMMManagedComputer) # VMM hosts (@() keeps this an array even if only one object is returned)

$colVMMmanagedcomputers += Get-SCVirtualMachine    # VMM vms



# all Gray Agents in SCOM

$colGrayAgents = $colMonitoringObjects | where {$_.IsAvailable -eq $false} # Mon Obj type



$colVMMGrayAgents = @()       # VMM obj type

$colVMMGrayAgentsMonObj = @() # Mon Obj type

$colAzureGrayAgents = @()     # Mon Obj type

$colUnknownGrayAgents = @()   # DisplayName



$colVMMRunningTEMP = @() # before we know whether a VM is really Running or Hung

$colVMMRunning = @()     # VMM obj type

$colVMMHung = @()        # VMM obj type

$colVMMOtherStatus = @() # VMM obj type

$colVMMHosts = @()       # VMM obj type



# count variables for final Report

$countVMMGrayAgents = 0



# possible gray state causes

$countAzureGrayAgents = 0 # Azure VM

$countUnknownGrayAgents = 0 # Unknown (not in Azure, not in VMM)

$countVMMHung = 0 # hung

$countVMMRunning = 0 # running

$countVMMOtherStatus = 0 # VM is in a state other than "Running" or [empty]

$countVMMHosts = 0 # VMM host  (empty status)

#

# Find source (VMM, Azure, neither/unknown) of gray agents

foreach ($agent in $colGrayAgents)

{

    # Separate Azure-hosted gray agents based on naming convention (if applicable)

    if ($agent.DisplayName -like "AZURE*")

    {

        Write-debug "Skipping AzureVM : $($agent.DisplayName)"

        $countAzureGrayAgents++

        $colAzureGrayAgents += $agent

       

    } else { # VMM-hosted gray agents    

        foreach ($vm in $colVMMmanagedcomputers)

        {

            # Assumes VMM names match SCOM display names (e.g. both are FQDNs)

            if ($vm.Name -eq $agent.DisplayName)

            {

                $colVMMGrayAgents += $vm

                $colVMMGrayAgentsMonObj += $agent 

                $countVMMGrayAgents++



                Write-verbose "Found #$countVMMGrayAgents gray agent in VMM : $($vm.name) :: VMM Status: $($vm.status)"



                if ($vm.status -eq "Running")

                {

                    $colVMMRunningTEMP += $vm # "TEMP" because need to check with WinRM if VM is actually Running or Hung

               

                } elseif ($vm.status -eq $null)

                {

                    $colVMMHosts += $vm

                    $countVMMHosts = $colVMMHosts.count

                } else

                {

                    $colVMMOtherStatus += $vm

                    $countVMMOtherStatus = $colVMMOtherStatus.count

                }  

                break #inner foreach loop to move onto searching source of next gray $agent

            }

        }

        # If no VMM object matched, $agent is in neither Azure nor VMM.

        # Those agents are collected in one step after this loop (more efficient than adding them here).

    }

}



# Gray agents that are not hosted in Azure or VMM

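$knownGrayNames = @($colVMMGrayAgentsMonObj + $colAzureGrayAgents | ForEach-Object DisplayName)

$colUnknownGrayAgents = @($colGrayAgents | Where-Object { $knownGrayNames -notcontains $_.DisplayName } | ForEach-Object DisplayName)

#display names only; this also works when no gray agent matched VMM or Azure (Compare-Object can fail on an empty collection in Windows PowerShell)

$countUnknownGrayAgents = $colUnknownGrayAgents.count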



# How long has the script taken up to this point?

$CheckTime1=Get-Date

$Check1RunTime=$CheckTime1-$StartTime



write-verbose "Completed finding source (VMM, Azure, unknown) of all gray agents"

write-verbose " - Checkpoint 1 Runtime: $($Check1RunTime.hours) hrs $($Check1RunTime.minutes) min $($Check1RunTime.seconds) seconds"



#### Write Partial Results to .csv File

# - up to this point we have all Gray Agents in various $col* variables

# - still need to check the "Running" VMM VMs

# - write the current results up to this point to file



# write the Azure gray agent VMs to Results.csv

$temp1 = @()

$colAzureGrayAgents | ForEach-Object {

    $row1 = "" | Select "Name","Comment"

    $row1."Name" = $_.DisplayName

    $row1."Comment" = "Azure VM"

    $temp1 += $row1

}

$temp1 | Export-Csv $CSVFilePath -NoTypeInformation



# write the Unknown VMs (not Azure or VMM) to Results.csv

$temp2 = @()

$colUnknownGrayAgents | ForEach-Object {

    $row2 = "" | Select "Name","Comment"

    $row2."Name" = $_ # $colUnknownGrayAgents already holds display names

    $row2."Comment" = "Unknown (not Azure or VMM)"

    $temp2 += $row2

}

$temp2 | Export-Csv $CSVFilePath -Append -NoTypeInformation



# write the VMM Hosts to Results.csv

$temp3 = @()

$colVMMHosts | ForEach-Object {

    $row3 = "" | Select "Name","Comment"

    $row3."Name" = $_.name # VMM obj type

    $row3."Comment" = "VMM Host"

    $temp3 += $row3

}

$temp3 | Export-Csv $CSVFilePath -Append -NoTypeInformation



# write the VMM other status ("Paused", "Saved", "PowerOff", etc.) to Results.csv

$temp4 = @()

$colVMMOtherStatus | ForEach-Object {

    $row4 = "" | Select "Name","Comment"

    $row4."Name" = $_.name # VMM obj type

    $row4."Comment" = $_.status

    $temp4 += $row4

}

$temp4 | Export-Csv $CSVFilePath -Append -NoTypeInformation

#### End of writing partial results



# How long has the script taken up to this point?

$CheckTime2=Get-Date

$Check2RunTime=$CheckTime2-$StartTime



write-verbose "Completed writing partial results to file. Beginning VMM running/hung state checks. This may take a while."

write-verbose " - Checkpoint 2 Runtime: $($Check2RunTime.hours) hrs $($Check2RunTime.minutes) min $($Check2RunTime.seconds) seconds"



####

# WinRM check on "Running" VMM VMs

$originalErrActionPref = $ErrorActionPreference

$ErrorActionPreference = "SilentlyContinue"



$checkcount = 0



#region WinRM Test - Remote Jobs (Batch/Parallel invocation) -- (thank you Arlu!)

try {

    $remoteJobQueue = [System.Collections.Queue]::Synchronized((New-Object System.Collections.Queue));

    $colVMMRunningTEMP | % {$remoteJobQueue.Enqueue($_);}

    $maxAllowed = 30; #number of jobs allowed per batch

    $maxTime = 15; #max time allowed for the whole job process (minutes); VMs still queued after this are not tested

    $allJobs = @();

    $jobArray = @();

    $timeOut = [datetime]::Now.AddMinutes($maxTime);

    $batchNumber = 1;

    while ($remoteJobQueue.Count -gt 0 -and ([datetime]::Now -lt $timeOut)) {

        Write-Verbose "Starting batch $batchNumber. $($remoteJobQueue.Count) remaining in queue." -Verbose;

        $jobArray = @();

        while ($jobArray.Count -lt $maxAllowed) {

            if ($remoteJobQueue.Count -eq 0) {

                Write-Verbose "Queue is empty" -Verbose;

                break;

            }

            $remoteVm = $remoteJobQueue.Dequeue();

            Write-Verbose "Starting test for $($remoteVm.Name). $($remoteJobQueue.Count) remaining in the queue." -Verbose;

            $jobArray += Invoke-Command -ComputerName $remoteVm.Name -ScriptBlock { return $true } -AsJob;

            Write-Verbose "Remote job count: $($jobArray.Count)" -Verbose;

        }

        Write-Verbose "Waiting for batch $batchNumber to complete" -Verbose;

        $null = Wait-Job -Job $jobArray -Timeout 120 -ErrorAction SilentlyContinue;

        Write-Verbose "Batch $batchNumber finished or reached the 120-second wait limit." -Verbose;

        $allJobs += $jobArray;   

        $batchNumber++;

    }



    #Get result from all remote jobs.

    foreach ($job in $allJobs) {

        if ($job.state -eq 'Completed') {

            $colVMMRunning += $colVMMRunningTEMP | ? { $_.Name -eq $job.Location };

            Write-Verbose "WinRM connection successful - $($job.Location) server is running";

        }

        else {       

            $colVMMHung += $colVMMRunningTEMP | ? { $_.Name -eq $job.Location };

            Write-Verbose "WinRM connection failed - $($job.Location) server is hung";

        }

    }

    $allJobs | Remove-Job -Force -ErrorAction SilentlyContinue; # remove only the jobs this script created

}

catch {   

    $errorMsg = "Error encountered during WinRM test process: $($_.Exception.Message)";

    "{0}: {1}" -f [datetime]::Now.ToString('yyyy-MM-dd HH:mm:ss'), $errorMsg | Out-File $tempTest -Append

    throw $errorMsg;

}

#endregion



$countVMMRunning = $colVMMRunning.count

$countVMMHung = $colVMMHung.count



$ErrorActionPreference = $originalErrActionPref

### End of WinRM check



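$checkcount = $allJobs.Count # number of VMs actually tested before the time limit

Write-verbose "Completed $checkcount of $($colVMMRunningTEMP.count) VMM running/hung checks"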

$CheckTime3=Get-Date

$Check3RunTime=$CheckTime3-$StartTime

write-verbose " - Checkpoint 3 Runtime: $($Check3RunTime.hours) hrs $($Check3RunTime.minutes) min $($Check3RunTime.seconds) seconds"



### Write final results to .csv file

$temp5 = @()

$colVMMRunning | ForEach-Object {

    $row5 = "" | Select "Name","Comment"

    $row5."Name" = $_.name # VMM obj type

    $row5."Comment" = "Server is running"

    $temp5 += $row5

}

$temp5 | Export-Csv $CSVFilePath -Append -NoTypeInformation



$temp6 = @()

$colVMMHung | ForEach-Object {

    $row6 = "" | Select "Name","Comment"

    $row6."Name" = $_.name # VMM obj type

    $row6."Comment" = "Server is hung"

    $temp6 += $row6

}

$temp6 | Export-Csv $CSVFilePath -Append -NoTypeInformation

### End of writing results to .csv file



### Write .txt Report with final counts

$txt = "Total Agents in SCOM:  $($colMonitoringObjects.count - $AgentlessMonitored.count)"

$txt += "`nTotal Gray Agent count*:  $($colGrayAgents.count)"

$txt += "`n(*may include gray Agentless objects)`n"

$txt += "`nBreakdown of Gray Agents:"



if ($countVMMGrayAgents -ge $countAzureGrayAgents){

    # most gray agents are in VMM

    $txt += "`n`t$countVMMGrayAgents gray agents in VMM. Further breakdown:"

    $txt += "`n`t`t- $countVMMHosts are VMM hosts"

    $txt += "`n`t`t- $countVMMHung are hung"

    $txt += "`n`t`t- $countVMMRunning are Running"

    $txt += "`n`t`t- $countVMMOtherStatus are in a state other than Running (e.g. Saved, Paused, etc.)"

    $txt += "`n`n`t$countAzureGrayAgents gray agents in Azure."

} else { # most gray agents are in Azure

    $txt += "`n`t$countAzureGrayAgents gray agents in Azure."

    $txt += "`n`n`t$countVMMGrayAgents gray agents in VMM. Further breakdown:"

    $txt += "`n`t`t- $countVMMHosts are VMM hosts"

    $txt += "`n`t`t- $countVMMHung are hung"

    $txt += "`n`t`t- $countVMMRunning are Running"

    $txt += "`n`t`t- $countVMMOtherStatus are in a state other than Running (e.g. Saved, Paused, etc.)"

}



$txt += "`n`n`t$countUnknownGrayAgents gray agents are not in Azure, not in VMM"

$txt += "`n`nEnd of report"

$txt | out-file $TXTFilePath

### End of writing .txt Report



### Interactive output



# see matched VMM vms name and status in VMM

write-verbose "`nGray agents in VMM and their Status:`n"

$colVMMGrayAgents | select name, status | Format-Table



write-verbose "`nGray agents in Azure:`n"

$colAzureGrayAgents | select displayname | Format-Table



# including Agent + Agentless

write-verbose "`nFound $($colMonitoringObjects.count) agents/agentless in SCOM ( $($WindowsAgents.count) Windows, $($UnixAgents.count) Unix, $($AgentlessMonitored.count) Agentless)"



# not including Agentless

write-verbose "Found $($colMonitoringObjects.count - $AgentlessMonitored.count) agents in SCOM ( $($WindowsAgents.count) Windows, $($UnixAgents.count) Unix, not including Agentless)"



Write-verbose "Found $($colGrayAgents.count) gray agents in SCOM"

#Write-verbose "Found $($colVMMmanagedcomputers.count) VMs in VMM"

Write-verbose "Found $countVMMGrayAgents gray agents in VMM"

Write-verbose "Found $countAzureGrayAgents gray agents in Azure"

# count of agents not in VMM or Azure

$countUnknownGrayAgents = $colGrayAgents.count - ($countVMMGrayAgents + $countAzureGrayAgents)

Write-verbose "Found $countUnknownGrayAgents gray agents not in VMM or Azure (unknown)"



###

# How long has the script taken up to this point?

$CheckTime5=Get-Date

$Check5RunTime=$CheckTime5-$StartTime



write-verbose "Completed writing results to file. Beginning email report."

write-verbose " - Checkpoint 5 Runtime: $($Check5RunTime.hours) hrs $($Check5RunTime.minutes) min $($Check5RunTime.seconds) seconds"



# Create header for HTML Report

$Head = "<style>"

$Head +="BODY{background-color:#CCCCCC;font-family:Verdana,sans-serif; font-size: small;}"

$Head +="TABLE{border-width: 1px;border-style: solid;border-color: black;border-collapse: collapse; width: 98%;}"

$Head +="TH{border-width: 1px;padding: 0px;border-style: solid;border-color: black;background-color:#293956;color:white;padding: 5px; font-weight: bold;text-align:left;}"

$Head +="TD{border-width: 1px;padding: 0px;border-style: solid;border-color: black;background-color:#F0F0F0; padding: 2px;}"

$Head +="</style>"



#beginning of HTML content:



#Section to insert the Script Generic info (= rather than += so a rerun in the same session starts fresh)

$ReportOutput = "<h2>Gray Agent Report Info</h2>"

$ReportOutput += "<p>Script Start Time      :  $StartTime</p>"

$ReportOutput += "<p>User Account Used To Produce Report      :  $WhoAmI</p>"



$ReportOutput += "<br>"

$ReportOutput += "<h2>Gray Agents</h2>"



$ReportOutput += "<p>Total Agents in SCOM:  $($colMonitoringObjects.count - $AgentlessMonitored.count)</p>"

$ReportOutput += "<p>Total Gray Agent count:  $($colGrayAgents.count)</p>"

$ReportOutput += "<br>"

$ReportOutput += "<p>$countVMMGrayAgents gray agents in VMM.</p>"

$ReportOutput += "<p>-  $countVMMHosts are VMM hosts</p>"

$ReportOutput += "<p>-  $countVMMHung are hung</p>"

$ReportOutput += "<p>- $countVMMRunning are Running</p>"

$ReportOutput += "<p>- $countVMMOtherStatus are in a state other than Running (e.g. Saved, Paused, etc.)</p>"

$ReportOutput += "<p>$countAzureGrayAgents gray agents in Azure.</p>"

$ReportOutput += "<p>$countUnknownGrayAgents gray agents are not in Azure, not in VMM</p>"

$ReportOutput += "<br>"





# Gray Agent Table

$ReportOutput += ($temp1 + $temp2 + $temp3 + $temp4 + $temp5 + $temp6) | Sort-Object Name  | Select Name, Comment| ConvertTo-HTML -fragment



# How long did this script take to run?

$EndTime=Get-Date

$TotalRunTime=$EndTime-$StartTime



# Close the Body of the Report (ConvertTo-Html adds the closing body tag itself)

$ReportOutput += "<br>"

$ReportOutput += "<p>Total Script Run Time: $($TotalRunTime.hours) hrs $($TotalRunTime.minutes) min $($TotalRunTime.seconds) sec</p>"



# Save the Final Report to a File (HTML)

#ConvertTo-HTML -head $Head -body "$ReportOutput" | Out-File $ReportPath



# Send Final Report by email

Write-verbose "Emailing Report"

# ConvertTo-Html returns an array of lines; Out-String joins them into one HTML string

$Body = ConvertTo-Html -Head $Head -Body "$ReportOutput" | Out-String

# $EmailTo is a comma-separated list; Send-MailMessage expects one address per array element

$Recipients = $EmailTo -split '\s*,\s*'

Send-MailMessage -To $Recipients -From $EmailFrom -Subject "SCOM Gray Agent Report" -Body $Body -BodyAsHtml -SmtpServer $EmailServer -Encoding ([System.Text.Encoding]::UTF8)



# How long has the script taken up to this point?

$CheckTime6=Get-Date

$Check6RunTime=$CheckTime6-$StartTime



write-verbose "Script complete."

write-verbose " - Total Runtime: $($Check6RunTime.hours) hrs $($Check6RunTime.minutes) min $($Check6RunTime.seconds) seconds"

# end of script
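The source lookup in the script compares every gray agent with every VMM object, so its cost grows with (gray agents × VMM objects). A hashtable keyed on name needs only one lookup per gray agent. The sketch below shows that alternative. It is a swapped-in technique, not the original script's method. It uses hypothetical `[pscustomobject]` stand-ins for the VMM objects (Name, Status) and for the gray SCOM monitoring objects (DisplayName):

```powershell
# Hypothetical stand-ins for Get-SCVMMManagedComputer / Get-SCVirtualMachine output.
# Hosts have no Status, which is how the script tells them apart from VMs.
$vmmObjects = @(
    [pscustomobject]@{ Name = 'VM00.contoso.net'; Status = $null }
    [pscustomobject]@{ Name = 'VM25.contoso.net'; Status = 'Running' }
    [pscustomobject]@{ Name = 'VM04.contoso.net'; Status = 'Saved' }
)
# Hypothetical stand-ins for gray SCOM monitoring objects.
$grayAgents = @(
    [pscustomobject]@{ DisplayName = 'VM00.contoso.net' }
    [pscustomobject]@{ DisplayName = 'vm25.contoso.net' }
    [pscustomobject]@{ DisplayName = 'VM04.contoso.net' }
    [pscustomobject]@{ DisplayName = 'VM71.contoso.net' }
)

# Build the index once. PowerShell hashtables are case-insensitive, like -eq,
# and keeping the first object per name mirrors the script's 'break'.
$vmmByName = @{}
foreach ($vm in $vmmObjects) {
    if (-not $vmmByName.ContainsKey($vm.Name)) { $vmmByName[$vm.Name] = $vm }
}

$vmmHosts = @(); $runningTemp = @(); $otherStatus = @(); $unknown = @()
foreach ($agent in $grayAgents) {
    $vm = $vmmByName[$agent.DisplayName]
    if ($null -eq $vm)                { $unknown     += $agent.DisplayName }
    elseif ($null -eq $vm.Status)     { $vmmHosts    += $vm.Name }
    elseif ($vm.Status -eq 'Running') { $runningTemp += $vm.Name }   # still needs the WinRM check
    else                              { $otherStatus += $vm.Name }
}

"Hosts: $vmmHosts | Running (unverified): $runningTemp | Other: $otherStatus | Unknown: $unknown"
```

This version also classifies unknown agents inside the same loop, so the separate comparison step after the loop is no longer needed.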