Broken DSC Reporting, RequestEntityTooLarge and some LCM internals

Abtract

After migrating a complex DSC configuration using partial configurations to composite resources, DSC reporting using SQL Server as described in Using SQL Server 2016 for a DSC Pull Server did not work any longer or was no longer reliable. This article is about troubleshooting this and learning something about Event ID 4260 and how the LCM sends data to the reporting database.

The change that leads to the issue

The configuration for a node in my case is quite complex and consists of at least 200 configuration items, and the number is growing. To make the management of the configuration easier and being able to manage dependency, I needed to split the configuration into sections. Recently I have learned that the way to split large DSC configurations by means of composite resources. Unfortunately, everything was build on partial configurations.

The conversion was actually much easier than expected. Everything was working afterwards but one important part was missing: Reporting. Somehow the data send from the LCM to the pull server is bigger when using composite resources instead of partial configurations, even if the configuration that the client creates contains the same configuration items.

 

The effect on the database

After not seeing any new data in the report, first I have taken a look at the database.

When the LCM starts applying a configuration (Start-DscConfiguration), it writes an entry of type 'Initial' into the reporting database. This record has a StartTime but not yet a valid EndTime. Also, StatusData only contains some basic information about the node.

Note: The OperationType 'Initial' indicates that the node applies a configuration the first time. All subsequent applications are logged with the OperationType 'Consistency'. The OperationType 'LocalConfigurationManager' logs the configuration of the LCM.

 

[code lang="PowerShell"]
Invoke-Sqlcmd -Database DSC -Query 'SELECT TOP 1 * FROM StatusReport'

 JobId                : d7352efe-3ce2-11e7-836e-0017fa00000a
Id                   : 77baf262-3ca1-11e7-8365-0017fa00000a
OperationType        : Initial
RefreshMode          :
Status               :
LCMVersion           : 2.0
ReportFormatVersion  : 2.0
ConfigurationVersion :
NodeName             : XDC11
IPAddress            : 192.168.88.129;127.0.0.1;fe80::99e2:71d5:6f53:7fe0%5;::2000:0:0:0;::1;::2000:0:0:0
StartTime            : 5/20/2017 12:31:03 AM
EndTime              : 12/30/1899 12:00:00 AM
Errors               : []
StatusData           : ["{\"IPV4Addresses\":[\"192.168.88.129\",\"127.0.0.1\"],\"CurrentChecksum\":\"0926E211C6D8617558
                       55CED942CB734CC94BF620EC6E4F8D1B9889F5B57ED6C7\",\"MACAddresses\":[\"00-17-FA-00-00-0A\",\"00-00
                       -00-00-00-00-00-E0\"],\"Type\":\"Initial\",\"HostName\":\"XDC11\",\"Locale\":\"en-US\",\"StartDa
                       te\":\"2017-05-20T09:31:03.9100000+11:00\",\"JobID\":\"{D7352EFE-3CE2-11E7-836E-0017FA00000A}\",
                       \"LCMVersion\":\"2.0\",\"IPV6Addresses\":[\"fe80::99e2:71d5:6f53:7fe0%5\",\"::2000:0:0:0\",\"::1
                       \",\"::2000:0:0:0\"]}"]
RebootRequested      : False
AdditionalData       : [{"Key":"OSVersion","Value":"{\"VersionString\":\"Microsoft Windows NT 10.0.14393.0\",\"ServiceP
                       ack\":\"\",\"Platform\":\"Win32NT\"}"},{"Key":"PSVersion","Value":"{\"CLRVersion\":\"4.0.30319.4
                       2000\",\"PSVersion\":\"5.1.14393.0\",\"BuildVersion\":\"10.0.14393.0\"}"}]

 

After the LCM has finished the job, it sends an update statement with the same JobId to the pull server. The pull server then forwards this update query to the database server. In this update statement the LCM provides all the missing details like Status, StatusData and the EndTime.

In my case, this did not happen and the database was filling up with records of OperationType 'Initial' or 'Consistency' that did not have an EndTime. Additionally, all my nice report queries and PowerBI did not work anymore.

sqlstatusreportonlyinitial

 

One takeaway is that if something does not work with reporting, first take look at the DSC nodes operational log. I found an error with the ID 4260 there that was telling something about "RequestEntityTooLarge".

eventviewer4260

 

Of course, you can also use PowerShell to find these event log entries using Get-WinEvent.

[code lang="PowerShell"]
Get-WinEvent -FilterHashTable @{ LogName = 'Microsoft-Windows-Dsc/Operational'; ID = 4260 } | Select-Object -First 1 | Format-List

 TimeCreated  : 5/21/2017 6:32:10 AM
ProviderName : Microsoft-Windows-DSC
Id           : 4260
Message      : Job {03B1E172-3D93-11E7-836F-0017FA00000A} :
               Http Client 77BAF262-3CA1-11E7-8365-0017FA00000A failed for WebReportManager for configuration The
               attempt to send status report to the server https://xpull1:8080/PSDSCPullServer.svc/Nodes(AgentId='77BAF
               262-3CA1-11E7-8365-0017FA00000A')/SendReport returned unexpected response code RequestEntityTooLarge..

 

The problem are some limitations on the pull server's web configuration. After some research and testing I was confused by contradictory statements and resolutions. After trying without success for quite some time, I finally got it working by adding this into the web.config in 'C:\inetpub\PSDSCPullServer'.

[code lang="xml"]
<bindings>
<webHttpBinding>
<binding maxBufferPoolSize="2147483647" maxReceivedMessageSize="2147483647" maxBufferSize="2147483647" transferMode="Streamed">
</binding>
</webHttpBinding>
</bindings>

 

Unfortunately the xWebConfigKeyValue does not yet support doing these settings when setting up the DSC pull server using DSC. As I do not want to make this a manual step that I can forget, I have added these lines into the pull server's setup routine.

[code lang="PowerShell"]
$path = 'C:\inetpub\PSDSCPullServer\web.config'

$doc = [[xml]](Get-Content -Path $path)
$serviceModelNode = ($doc | Select-Xml -XPath '//system.serviceModel').Node

$bindingsNode = ($doc | Select-Xml -XPath '//system.serviceModel/bindings').Node
if (-not $bindingsNode)
{
$bindingsNode = $doc.CreateElement('bindings')
[void]$serviceModelNode.AppendChild($bindingsNode)
}

$webHttpBindingNode = ($doc | Select-Xml -XPath '//system.serviceModel/bindings/webHttpBinding').Node
if (-not $webHttpBindingNode)
{
$webHttpBindingNode = $doc.CreateElement('webHttpBinding')
[void]$bindingsNode.AppendChild($webHttpBindingNode)
}

$bindingNode = ($doc | Select-Xml -XPath '//system.serviceModel/bindings/webHttpBinding/binding').Node
if (-not $bindingNode)
{
$bindingNode = $doc.CreateElement('binding')
[void]$webHttpBindingNode.AppendChild($bindingNode)
}

$bindingNode.SetAttribute('maxBufferPoolSize', [int]::MaxValue)
$bindingNode.SetAttribute('maxReceivedMessageSize', [int]::MaxValue)
$bindingNode.SetAttribute('maxBufferSize', [int]::MaxValue)
$bindingNode.SetAttribute('transferMode', 'Streamed')

$doc.Save($path)

 

After this change, it requires restarting the pull server's app pool. The easiest way is calling iisreset.exe.

Reporting works again

After having altered the web.config and restarted the app pool, the event ID 4260 on the DSC nodes got replaced by 4266, which looks much better.

eventviewer4266

 

Also the records in the database are being updated again, as you can see by the valid EndTime.

Note: What is not visible in the screenshots is the column StatusData which stores the most interesting data.

sqlstatusreportcomplete

 

Some Richer Reporting

If you create the database with the script references in SQL Server 2016 for a DSC Pull Server you have a view that parses the Json data and returns also some valuable information like Duration (how long has it taken applying the configuration) and how many resources are or are not in desired state. The non-compliant resources are also names in the column ResourceIdsNotInDesiredState.

sqlfilledrawstatusdata

I hope this helps. Have fun with PowerShell and DSC.