ClientKeyData table gets corrupted on central site server

 

Here’s an issue I’ve now seen at two different customers. Both are running a 3-tier hierarchy with more than 30,000 clients, and both upgraded from SMS 2003 about half a year ago. The problem they’re experiencing is:

More than 20% of the clients do not report hardware inventory properly, and a lot of DDRs seem to get lost. Matching errors show up on the central site in the logs of the data loader (the hardware inventory processor), the software inventory processor, and statesys (the state message processor), for example:

 

PROBLEM:

sinvproc.log:
The data file "D:\ConfigMgr\inboxes\auth\sinv.box\l5s8o51h.SID" that was submitted by the client whose SMS unique ID is "GUID:34DA8F05-787B-4D3B-8539-AD4322A29E92", was rejected because the file was signed but the authentication key did not match the recorded key for this client.

dataldr.log:
Manager\inboxes\auth\dataldr.box\Process\X3vxse509.MIF" that was submitted by the client whose SMS unique ID is "GUID:1ACBD35E-8426-4CA3-AC84-D1212ACAD3CC", was rejected because the file was signed but the authentication key did not match the recorded key for this client.
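To get a feel for how many clients are affected, the rejection messages can be pulled out of the logs with a small script. The following is a minimal Python sketch, assuming the message wording is exactly as quoted above; the function name `rejected_guids` is made up for illustration and is not part of any ConfigMgr tooling.

```python
import re

# Matches the rejection message quoted above and captures the client GUID.
# Assumption: the wording in your logs is identical to the excerpts shown.
REJECTED = re.compile(
    r'SMS unique ID is "(GUID:[0-9A-Fa-f-]{36})".*?'
    r"authentication key did not match"
)


def rejected_guids(log_lines):
    """Return the distinct rejected client GUIDs in first-seen order."""
    seen = []
    for line in log_lines:
        match = REJECTED.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Feed it the lines of sinvproc.log or dataldr.log (for example via `open(path, errors="replace")`) and it returns one entry per affected client, which can then be used in the queries of the steps further down.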

It seems like the KeyData, which is stored for every client, leads to the issue. The KeyData is the client’s authentication key, which is used to verify incoming data within inboxes\auth. If the KeyData on the central site somehow gets corrupted, the data uploaded by the clients is rejected and you experience backlogs in the folders:

inboxes\auth\dataldr.box\BADMIFS
inboxes\auth\sinv.box\bad
inboxes\auth\statesys.box\corrupt
inboxes\auth\ddm.box\BAD_DDRS
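A quick way to watch those backlogs is to count the files in the four folders. This is a rough Python sketch, not an official tool: `backlog_counts` is a hypothetical helper, and the inboxes root you pass in (for example D:\ConfigMgr\inboxes, as in the log excerpt above) depends on your installation.

```python
from pathlib import Path

# The backlog folders listed above, as path parts relative to the inboxes directory.
BACKLOG_FOLDERS = [
    ("auth", "dataldr.box", "BADMIFS"),
    ("auth", "sinv.box", "bad"),
    ("auth", "statesys.box", "corrupt"),
    ("auth", "ddm.box", "BAD_DDRS"),
]


def backlog_counts(inboxes_root):
    """Return {folder: number of files}, counting files in subfolders too."""
    root = Path(inboxes_root)
    counts = {}
    for parts in BACKLOG_FOLDERS:
        folder = root.joinpath(*parts)
        name = "\\".join(parts)
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            counts[name] = 0  # folder missing: report an empty backlog
    return counts
```

Running it before and after applying one of the workarounds below gives a simple indication of whether the rejected files have stopped piling up.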

 

To confirm the cause, follow these steps:

Step 1: Get the SMS GUID of the suspicious client:

select Name0, smsid0 from system_data where Name0='ClientName'

 

Step 2: Get the KeyData field on both the central site and the child primary site:

select SMSID, KeyData from ClientKeyData where SMSID='GUID:2A3D67C4-0F80-42B9-8659-1F4DE540B91B'

(insert the GUID obtained in Step 1)

 

Step 3: Compare the KeyData fields from Step 2. For an affected client, the value on the central site differs from the value on the child primary site.
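Comparing two long KeyData values by eye is error-prone, so here is a small Python sketch for the comparison. It assumes you copied both values out of the query results as hex text (with or without a leading 0x); the function names are hypothetical.

```python
def normalize_key(hex_text):
    """Strip whitespace and an optional 0x prefix, and uppercase the hex digits."""
    text = "".join(hex_text.split())
    if text[:2].lower() == "0x":
        text = text[2:]
    return text.upper()


def first_difference(central_key, child_key):
    """Return the hex-digit offset of the first difference, or None if equal."""
    a = normalize_key(central_key)
    b = normalize_key(child_key)
    if a == b:
        return None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No differing digit in the common part: one value is a prefix of the other.
    return min(len(a), len(b))
```

A result of None means both sites hold the same key for that client; any number means the central site's copy has diverged and the client is a candidate for one of the workarounds.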

 

RESOLUTION:

Currently I’m aware of two workarounds:

  1. Delete the problem client’s entire row from ClientKeyData on the central site server. It will be re-populated within 1-2 minutes.
  2. Select the KeyData field from the child primary site server and insert it into the matching field on the central site server. (Be careful: update commands against tables on the central site server can break a lot of things and are not supported.)

 

ROOT CAUSE:

Well, good question, next question. It looks like it happens when clients get re-staged and DDM on the central site fails to update the ClientKeyData table via spUpdateClientRegistration.