Another Who Done It

Hi my name is Bob Golding, I am an EE in GES. I want to share an interesting problem I recently worked on.  The initial symptom was the system bugchecked with a Stop 0xA which means there was an invalid memory reference.  The cause of the crash was a driver making I/O requests while Asynchronous Procedure Calls (APCs) were disabled.  The bugcheck caused by an invalid memory reference was the result of the problem and not the cause.

 

An APC is queued to a thread during I/O completion. This is to guarantee the last phase of the I/O completion occurs in the same context as the process that issued the request.

 

The stack of the trap is presented below.  The call stack shows that APCs are being enabled allowing queued APCs to run.

 

Child-SP          RetAddr           Call Site

fffff880`07bf3598 fffff800`030b85a9 nt!KeBugCheckEx

fffff880`07bf35a0 fffff800`030b7220 nt!KiBugCheckDispatch+0x69

fffff880`07bf36e0 fffff800`030d8b56 nt!KiPageFault+0x260

fffff880`07bf3870 fffff800`030959ff nt!IopCompleteRequest+0xc73

fffff880`07bf3940 fffff800`0306c0d9 nt!KiDeliverApc+0x1d7

fffff880`07bf39c0 fffff800`033f8a1a nt!KiCheckForKernelApcDelivery+0x25

fffff880`07bf39f0 fffff800`033cce2f nt!MiMapViewOfSection+0x2bafa

fffff880`07bf3ae0 fffff800`030b8293 nt!NtMapViewOfSection+0x2be

fffff880`07bf3bb0 00000000`772df93a nt!KiSystemServiceCopyEnd+0x13

00000000`0015dea8 00000000`00000000 0x772df93a

 

The reason the trap occurred is because when issuing requests to lower drivers it is common practice in drivers to implement code similar to:

 

KEVENT event;

 

status = IoCallDriver( DeviceObject, irp );

 

//

//  Wait for the event to be signaled if STATUS_PENDING is returned.

//

if (status == STATUS_PENDING) {

   (VOID)KeWaitForSingleObject( &event, // event is a local which is declared on the stack

                                Executive,

                                KernelMode,

                                FALSE,

                                NULL );

}

 

As you can see in the above code, if the return from IoCallDriver does not return pending the code continues and exits. Part of the last phase of I/O processing that takes place in the APC is signaling the event. If the call to IoCallDriver returns success, because the event is on the stack it is critical that the APC execute immediately before the stack unwinds. Since APCs where disabled, the execution of the APC was delayed and during this time the event became invalid. The APCs were delayed because the memory manager was in a critical area and APCs could not run.

 

I needed to determine which driver did this so I enabled IRP logging in Driver Verifier to trace I/O requests.  With this enabled the next dump should contain a transaction log that will help identify what driver is performing I/O while APCs are disabled.  The command line to enable this is:

verifier /flags 0x410 /all

 

The new dump with verifier enabled also crashed after delivering an APC to the thread and completing the IRP.  From the debug output below I can find the IRPs that were issued and the thread that issued them, this is what I need to look for them in the log.

 

1: kd> !thread

THREAD fffffa80064c9b50 Cid 0200.0204  Teb: 000007fffffde000 Win32Thread: 0000000000000000 RUNNING on processor 1

IRP List:

    fffff9800a33ec60: (0006,03a0) Flags: 40060070  Mdl: 00000000

    fffff9800a250c60: (0006,03a0) Flags: 40060070  Mdl: 00000000

    fffff9800a3f4ee0: (0006,0118) Flags: 40060070  Mdl: 00000000

Not impersonating

DeviceMap                 fffff8a000007890

Owning Process            fffffa80064bbb30       Image:         csrss.exe

Attached Process          N/A            Image:         N/A

Wait Start TickCount      1656           Ticks: 0

Context Switch Count      25             IdealProcessor: 0

UserTime                  00:00:00.000

KernelTime                00:00:00.000

Win32 Start Address 0x000000004a061540

Stack Init fffff88003b21c70 Current fffff88003b20890

Base fffff88003b22000 Limit fffff88003b1c000 Call 0

Priority 14 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

Child-SP          RetAddr           Call Site

fffff880`03b21428 fffff800`0307a54c nt!KeBugCheckEx

fffff880`03b21430 fffff800`030d02ee nt!MmAccessFault+0xffffffff`fff9c15c

fffff880`03b21590 fffff800`030c8db9 nt!KiPageFault+0x16e

fffff880`03b21728 fffff800`030e6ab3 nt!memcpy+0x229

fffff880`03b21730 fffff800`030c4bd7 nt!IopCompleteRequest+0x5a3

fffff880`03b21800 fffff800`0307ba85 nt!KiDeliverApc+0x1c7

fffff880`03b21880 fffff800`0331d96a nt!KiCheckForKernelApcDelivery+0x25

fffff880`03b218b0 fffff800`033e742e nt!MiMapViewOfSection+0xffffffff`fff36baa

fffff880`03b219a0 fffff800`030d1453 nt!NtMapViewOfSection+0x2bd

fffff880`03b21a70 00000000`7761159a nt!KiSystemServiceCopyEnd+0x13

00000000`0025f078 00000000`00000000 0x7761159a

 

The command “!verifier 100” will dump the transaction log.  Below is the relevant portion of the log containing the IRPs for our thread.

 

IRP fffff9800a3f4ee0, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -4, SpecialApcDisable = -1

fffff80003573a68 nt!IovAllocateIrp+0x28

fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

fffff880010f8bcc fltmgr!FltGetVolumeGuidName+0x18c

fffff88004e4fbe1 baddriver+0x12be1

fffff88004e73523 baddriver +0x36523

fffff88004e7300c baddriver +0x3600c

fffff88004e72cce baddriver +0x35cce

fffff88004e5f715 baddriver +0x22715

fffff88004e4c6c7 baddriver +0xf6c7

fffff88004e48342 baddriver +0xb342

fffff88004e5e44e baddriver +0x2144e

fffff88004e5e638 baddriver +0x21638

 

IRP fffff9800a250c60, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -5, SpecialApcDisable = -1

fffff80003573a68 nt!IovAllocateIrp+0x28

fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

fffff8800101eec7 mountmgr!MountMgrSendDeviceControl+0x73

fffff88001010a6b mountmgr!QueryDeviceInformation+0x207

fffff8800101986b mountmgr!QueryPointsFromMemory+0x57

fffff88001019f86 mountmgr!MountMgrQueryPoints+0x36a

fffff8800101ea71 mountmgr!MountMgrDeviceControl+0xe9

fffff80003574c16 nt!IovCallDriver+0x566

fffff880010f8bec fltmgr!FltGetVolumeGuidName+0x1ac

fffff88004e4fbe1 baddriver +0x12be1

fffff88004e73523 baddriver +0x36523

fffff88004e7300c baddriver +0x3600c

 

IRP fffff9800a33ec60, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -5, SpecialApcDisable = -1

fffff80003573a68 nt!IovAllocateIrp+0x28

fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

fffff8800101eec7 mountmgr!MountMgrSendDeviceControl+0x73

fffff88001010afd mountmgr!QueryDeviceInformation+0x299

fffff8800101986b mountmgr!QueryPointsFromMemory+0x57

fffff88001019f86 mountmgr!MountMgrQueryPoints+0x36a

fffff8800101ea71 mountmgr!MountMgrDeviceControl+0xe9

fffff80003574c16 nt!IovCallDriver+0x566

fffff880010f8bec fltmgr!FltGetVolumeGuidName+0x1ac

fffff88004e4fbe1 baddriver +0x12be1

fffff88004e73523 baddriver +0x36523

fffff88004e7300c baddriver +0x3600c

 

From the IRP log in verifier I can see that baddriver.sys is calling FltGetVolumeGuidName while APCs are disabled. Further investigation found that baddriver.sys had registered a function for image load notification, and the memory manager has APCs disabled when it calls the image notification routine. The image notification routine in baddriver.sys called FltGetVolumeGuidName which issued the I/O.  From the log output I see KernelApcDisable and SpecialApcDisable, the issue is SpecialApcDisable being –1.  The I/O completion APCs are considered special APCs, so kernel APC disable would not affect them.

 

The solution was for the driver to check for APCs disabled before issuing the FltGetVolumeGuidName and not make this call if APCs are disabled.