SMSExec fails after SP1 Upgrade with error 00000080

I'd just finished upgrading my Config Mgr lab to SP1 and found that, once I'd restarted the machine, SMS_EXECUTIVE failed about 30 seconds after starting. There wasn't much to tell me what was happening, apart from the usual "SMSexec has crashed, do you want to send an error report" prompt.

Putting on my troubleshooting hat, I had a look at the CrashDumps directory under the default logs directory. This is where a dump of various logs and bits of Config Mgr state gets dropped when a component fails. The interesting log in this case was crash.log, which is a summary of the failure and the thread states of each of the components. I had an error which looked like:

EXCEPTION INFORMATION

Time = 06/05/2008 15:21:29.358
Service name = SMS_EXECUTIVE
Thread name = SMS_HIERARCHY_MANAGER
Executable = D:\Program Files\Microsoft Configuration Manager\bin\i386\smsexec.exe
Process ID = 3548 (0xddc)
Thread ID = 4256 (0x10a0)
Instruction address = 78141e3a
Exception = c0000005 (EXCEPTION_ACCESS_VIOLATION)
Description = "The thread tried to read from the virtual address 00000080 for which it does not have the appropriate access."
Raised inside CService mutex = No
CService mutex description = ""
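
The exception block above is a run of "key = value" lines, so the fields can be pulled out with a short script. Here's a minimal Python sketch, assuming the layout shown above holds for other crash.log files too. The function names are made up for illustration and aren't part of Config Mgr.

```python
import re


def parse_exception_info(text):
    """Collect the 'key = value' lines of an EXCEPTION INFORMATION block into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip blank lines and headings that have no ' = '
            fields[key.strip()] = value.strip().strip('"')
    return fields


def faulting_address(description):
    """Pull the hex address out of the Description text, or return None."""
    match = re.search(r"virtual address ([0-9A-Fa-f]{8,16})", description)
    return match.group(1) if match else None


# A few lines copied from the block above.
sample = """\
Service name = SMS_EXECUTIVE
Thread name = SMS_HIERARCHY_MANAGER
Thread ID = 4256 (0x10a0)
Exception = c0000005 (EXCEPTION_ACCESS_VIOLATION)
Description = "The thread tried to read from the virtual address 00000080 for which it does not have the appropriate access."
"""

info = parse_exception_info(sample)
print(info["Thread name"], faulting_address(info["Description"]))
```

The thread name and thread ID are the two values worth carrying forward, since they tell you which component log to open and which thread to look for in it.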

The key message in this case is the description, which points to a virtual memory address (00000080). To track it down, I searched crash.log for that address and found this:

STACK TRACE FOR SMS_HIERARCHY_MANAGER THREAD 4256 (0x10a0) AT 06/05/2008 15:21:29.358

EAX: 00000080  CS: 001b  EIP: 00000000  EFLAGS: 00010202
EBX: 2ec8a0fe  SS: 0023  ESP: 013df35c
ECX: 7ffffffe  DS: 0023  EBP: ffffffff
EDX: 00000073  ES: 0023
ESI: 00000000  FS: 003b
EDI: 00000080  GS: 0000

The stack trace belongs to the SMS_HIERARCHY_MANAGER thread, which indicated that the issue was happening in Hierarchy Manager. When I opened hman.log, the last lines were:

Update the Sites table: Site=XXX Parent=~  $$<SMS_HIERARCHY_MANAGER><Thu Jun 05 15:21:29.137 2008 New Zealand Standard Time><thread=4256 (0x10A0)>
Nothing has changed for the boundary  $$<SMS_HIERARCHY_MANAGER><Thu Jun 05 15:21:29.268 2008 New Zealand Standard Time><thread=4256 (0x10A0)>
 No profile in DB, will try to create first version of AMT profile  $$<SMS_HIERARCHY_MANAGER><Thu Jun 05 15:21:29.298 2008 New Zealand Standard Time><thread=4256 (0x10A0)>
Trying Update Amt profile, new version will be created  $$<SMS_HIERARCHY_MANAGER><Thu Jun 05 15:21:29.338 2008 New Zealand Standard Time><thread=4256 (0x10A0)>
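
Each of those lines follows the same pattern: the message text, then $$ and three angle-bracketed fields (component, timestamp, thread). If you want only the entries written by the crashing thread, something like this Python sketch could do it. The regular expression is inferred from the lines shown here, not from any documented format, and the function names are made up for illustration.

```python
import re

# Pattern inferred from the hman.log lines above (an assumption, not a spec).
LOG_LINE = re.compile(
    r"^(?P<message>.*?)\s*\$\$"
    r"<(?P<component>[^>]*)>"
    r"<(?P<timestamp>[^>]*)>"
    r"<thread=(?P<thread>\d+) \(0x[0-9A-Fa-f]+\)>\s*$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it doesn't match."""
    match = LOG_LINE.match(line)
    if not match:
        return None
    entry = match.groupdict()
    entry["message"] = entry["message"].strip()
    entry["thread"] = int(entry["thread"])
    return entry


def entries_for_thread(lines, thread_id):
    """Return the parsed entries written by the given thread, in file order."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["thread"] == thread_id]


# Three lines copied from the excerpt above.
lines = [
    "Nothing has changed for the boundary  $$<SMS_HIERARCHY_MANAGER><Thu Jun 05 15:21:29.268 2008 New Zealand Standard Time><thread=4256 (0x10A0)>",
    " No profile in DB, will try to create first version of AMT profile  $$<SMS_HIERARCHY_MANAGER><Thu Jun 05 15:21:29.298 2008 New Zealand Standard Time><thread=4256 (0x10A0)>",
    "Trying Update Amt profile, new version will be created  $$<SMS_HIERARCHY_MANAGER><Thu Jun 05 15:21:29.338 2008 New Zealand Standard Time><thread=4256 (0x10A0)>",
]

tail = entries_for_thread(lines, 4256)
for entry in tail[-2:]:
    print(entry["timestamp"], entry["message"])
```

The thread number (4256) is the one from crash.log, so the last entry this returns is the last thing Hierarchy Manager logged before the crash.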

The log ending there seemed a little strange, as there should have been more information after that. It didn't indicate why Hierarchy Manager had failed, just that it had suddenly stopped. On a hunch I went looking in the hman.box directory (<Config Mgr install path>\Inboxes\hman.box) and found a bunch of CT2 files in there. In a steady state you wouldn't expect to see any files there; they should get processed and moved on. I decided the best thing to do was to move the files out (keeping them in case they were important) and try restarting the SMS_EXECUTIVE service. As soon as I did this, all sorts of activity started to take place. All the components that should have been reinstalled as part of the SP1 upgrade were reinstalled, and the system eventually started functioning as normal.
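
For anyone who'd rather script the "move the files out but keep them" step, here's a rough Python sketch. It's only an illustration: park_inbox_files is a made-up name, the *.CT2 pattern is an assumption based on what I found in my inbox, and the demo runs against a temporary directory rather than a real hman.box. Try it on a lab server first.

```python
import shutil
import tempfile
from pathlib import Path


def park_inbox_files(inbox, holding, pattern="*.CT2"):
    """Move matching files out of an inbox into a holding folder, keeping them."""
    inbox, holding = Path(inbox), Path(holding)
    holding.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(inbox.glob(pattern)):
        if path.is_file():
            target = holding / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved


# Demo against a throwaway directory standing in for hman.box.
with tempfile.TemporaryDirectory() as root:
    inbox = Path(root) / "hman.box"
    inbox.mkdir()
    for name in ("a.CT2", "b.CT2"):
        (inbox / name).write_text("placeholder")

    moved = park_inbox_files(inbox, Path(root) / "hman.box.parked")
    moved_names = [p.name for p in moved]
    remaining = list(inbox.iterdir())

print(moved_names, remaining)
```

On a real site server you'd pass the hman.box path under your own install directory as the first argument, and somewhere outside the Inboxes tree as the second, then restart SMS_EXECUTIVE as described above.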

Lessons learned:

  • All the information you need to troubleshoot is there in the logs
  • It's not necessarily obvious which log you need to look at
  • Follow your hunches