Archive for the ‘x86’ Category

DPCs execute on their own call stack (x86 Edition)

Thursday, April 29th, 2010

Deferred Procedure Calls (DPCs) are callbacks that run in an arbitrary thread context at IRQL DISPATCH_LEVEL. There is a DPC queue per processor, and queuing a DPC performs two steps:

1) Inserts the DPC onto the DPC queue of the current processor.

2) Requests a DISPATCH_LEVEL software interrupt on the current processor.

Note that there are exceptions to both of those, though I’m not interested in talking about them at this moment.
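To make the two steps concrete, here is a minimal sketch in Python of a per-processor DPC queue. Everything in it (the `Processor` class, `queue_dpc`, `lower_irql`, `retire_dpc_list`) is a hypothetical model invented for illustration, not a kernel API, and it ignores the exceptions mentioned above. It also assumes plain first-in, first-out ordering and folds in the delivery rule described in the next paragraph.

```python
from collections import deque

DISPATCH_LEVEL = 2  # numeric IRQL of DISPATCH_LEVEL on x86


class Processor:
    """Toy stand-in for one CPU: a DPC queue plus a pending-interrupt flag."""

    def __init__(self, number):
        self.number = number
        self.dpc_queue = deque()                 # one DPC queue per processor
        self.dispatch_interrupt_pending = False

    def queue_dpc(self, callback, context=None):
        # Step 1: insert the DPC onto this processor's DPC queue.
        self.dpc_queue.append((callback, context))
        # Step 2: request a DISPATCH_LEVEL software interrupt.
        self.dispatch_interrupt_pending = True

    def lower_irql(self, new_irql):
        # The pending interrupt is delivered only when dropping below DISPATCH_LEVEL.
        if new_irql < DISPATCH_LEVEL and self.dispatch_interrupt_pending:
            self.dispatch_interrupt_pending = False
            self.retire_dpc_list()

    def retire_dpc_list(self):
        # Drain the queue, calling each callback in the order it was queued.
        while self.dpc_queue:
            callback, context = self.dpc_queue.popleft()
            callback(context)


def demo():
    """Queue two DPCs, then lower the IRQL and report what ran and when."""
    ran = []
    cpu = Processor(0)
    cpu.queue_dpc(ran.append, "first")
    cpu.queue_dpc(ran.append, "second")
    cpu.lower_irql(DISPATCH_LEVEL)   # still at DISPATCH_LEVEL: nothing runs
    before = list(ran)
    cpu.lower_irql(0)                # below DISPATCH_LEVEL: the queue is drained
    return before, ran
```

Calling `demo()` shows that nothing runs while the processor stays at DISPATCH_LEVEL and that both callbacks run once the IRQL drops below it.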

When the operating system is about to return to an IRQL < DISPATCH_LEVEL, the DISPATCH_LEVEL software interrupt is delivered to the processor. On XP, the ISR for this interrupt is hal!HalpDispatchInterrupt, which does some interrupt management work and calls nt!KiDispatchInterrupt. You can get a feel for how this works by setting a breakpoint on KiDispatchInterrupt and checking out a few call stacks, which should look like the following:

kidispatch

While KiDispatchInterrupt serves a few different purposes, for our discussion all we care about is the beginning of the function shown here:

kidispatch_asm

Note the call near the end of the listing to nt!KiRetireDpcList. This is the function that will sit in a loop dequeuing DPCs from the current processor’s DPC queue and calling the callbacks. There’s some interesting code leading up to that call though, so let’s go line by line and figure out exactly what this code is doing.

nt!KiDispatchInterrupt:
mov     ebx,dword ptr fs:[1Ch]

This line is moving the contents of offset 0x1c in the FS segment into EBX. In kernel mode, the base of the FS segment is the base address of what is called the Processor Control Region (PCR) for the current processor:

fs_pcr

Thus, this code is grabbing whatever field is at offset 0x1c from the base of the PCR structure. Luckily we have the type information for the PCR, which is nt!_KPCR, so we can easily see what is at that offset in the structure:

pcr_1c

That is the SelfPcr field, which is just the flat address of the PCR (in this case that would be 0xffdff000). Let’s move on to the next fragment:

nt!KiDispatchInterrupt+0x7:
lea  eax,[ebx+980h]
cli
cmp  eax,dword ptr [eax]
je   nt!KiDispatchInterrupt+0x2f (805459df)

Here, we add 0x980 to the base address of the PCR and store the result in EAX. We then disable interrupts on the current processor and check to see if the value stored at that address matches the address itself.

The CMP instruction will do a logical subtract of the two values and set the Z-Flag to one if the result is zero, which would mean that the two values are the same. The JE instruction will, “Jump if the Z-Flag Equals one”, so if the contents of the pointer match the address of the pointer then this code will jump over the block of code that calls KiRetireDpcList.

If you’ve never looked at much assembly that might seem a bit weird, so let’s see what’s at offset 0x980 from the PCR and see if we can figure out what this code is doing.

If you go to a full listing of the PCR structure, you’ll notice that the last offset given is 0x120 and that is the PrcbData field:

pcr_prcb

Thus, in order to figure out what’s at offset 0x980 from the base of the PCR we’ll need to go to offset 0x860 into the PRCB. We’ll find this by doing a dt nt!_kprcb and scanning the output:

prcb_queue

Aha! That field is labeled as the DpcListHead (a.k.a. the DPC queue) and the type is a LIST_ENTRY, which is the standard type for a doubly linked list in the kernel.
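The offset translation used here is just a subtraction, and it can be sketched in a few lines of Python. The helper name `prcb_offset` is made up for this example; the 0x120 constant is the PrcbData offset quoted above and applies to the XP x86 build being examined, so treat it as an assumption on any other build.

```python
PCR_PRCBDATA_OFFSET = 0x120   # offset of PrcbData inside nt!_KPCR (from the listing above)


def prcb_offset(pcr_offset):
    """Translate an offset from the PCR base into an offset within the PRCB."""
    if pcr_offset < PCR_PRCBDATA_OFFSET:
        raise ValueError("offset lies before PrcbData, so it is a KPCR field")
    return pcr_offset - PCR_PRCBDATA_OFFSET


print(hex(prcb_offset(0x980)))   # prints 0x860, the offset of DpcListHead
```

The same helper works for any other `[ebx+NNNh]` reference in the disassembly, as long as EBX still holds the PCR base.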

LIST_ENTRY structures have two fields, a Flink field that points to the next entry and a Blink field that points to the previous entry. When a list is empty, the Flink field points back to the address of the head of the list. So our previous check above is testing the value of the Flink field against the address of the list head, in other words it is checking to see if the list is empty. If it is, the code avoids draining the DPC queue (which makes sense).
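If the self-pointing list head is unfamiliar, the following sketch models it in Python. It is not the kernel's implementation: `ListEntry`, `is_list_empty`, `insert_tail`, and `remove_head` are illustrative names loosely patterned on the kernel's list macros, and object identity stands in for address comparison.

```python
class ListEntry:
    """Minimal model of LIST_ENTRY: flink is the next entry, blink the previous."""

    def __init__(self):
        # An initialized, empty list head points at itself in both directions.
        self.flink = self
        self.blink = self


def is_list_empty(head):
    # Same test as `cmp eax,dword ptr [eax]`: compare the head's address
    # with the contents of its first field (Flink).
    return head.flink is head


def insert_tail(head, entry):
    # Link `entry` between the current last entry and the head.
    last = head.blink
    entry.flink = head
    entry.blink = last
    last.flink = entry
    head.blink = entry


def remove_head(head):
    # Unlink and return the first entry; the caller must check for empty first.
    entry = head.flink
    head.flink = entry.flink
    entry.flink.blink = head
    return entry
```

A freshly created `ListEntry()` reports empty, stops reporting empty after one `insert_tail`, and reports empty again once `remove_head` has taken that entry back off.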

If the list is not empty, then the code sets up to call KiRetireDpcList:

nt!KiDispatchInterrupt+0x12:
push    ebp
push    dword ptr [ebx]
mov     dword ptr [ebx],0FFFFFFFFh
mov     edx,esp
mov     esp,dword ptr [ebx+988h]
push    edx
mov     ebp,eax
call    nt!KiRetireDpcList (80545e0e)

I’m going to save the first three instructions for another time if I ever get to talk about Structured Exception Handling (SEH). Right now it’s sufficient to say that the code there prevents kernel mode exceptions from being raised to user mode exception handlers.

The next two instructions are interesting though:

mov     edx,esp
mov     esp,dword ptr [ebx+988h]

Note that the code saves the current stack pointer and then overwrites ESP with a different pointer value from the PCR. We saw previously that the last offset in the PCR is 0x120, which is the beginning of the PRCB. So, whatever value is at offset 0x868 from the PRCB is what we put into the stack pointer register. If you scroll up to the previous graphic, you’ll see that field labeled as DpcStack:

   +0x868 DpcStack         : Ptr32 Void

Thus, each processor has its own DPC stack that is used when DPCs are executed. Shortly this is going to lead to an unexpected problem that this post will hopefully help you solve.

Lastly, the old stack pointer is pushed onto the new stack and finally the call to KiRetireDpcList occurs. When it completes, the old stack is restored and all is right in the world.
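The swap is easier to picture with a toy model. The sketch below replays the stack-related instructions from the fragment above against a dictionary that stands in for memory. `ToyCpu` and `switch_to_dpc_stack` are hypothetical names, the `mov dword ptr [ebx],0FFFFFFFFh` and `mov ebp,eax` instructions are left out because they do not touch the stack, and any addresses you feed it are made up.

```python
class ToyCpu:
    """Just enough of an x86 to model push and a stack switch (simplified)."""

    def __init__(self, esp):
        self.esp = esp
        self.memory = {}              # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                 # the stack grows toward lower addresses
        self.memory[self.esp] = value & 0xFFFFFFFF


def switch_to_dpc_stack(cpu, ebp, exception_list, dpc_stack, return_address):
    """Replay the pushes around the stack switch, then the call itself."""
    cpu.push(ebp)                     # push ebp
    cpu.push(exception_list)          # push dword ptr [ebx]
    old_esp = cpu.esp                 # mov edx,esp
    cpu.esp = dpc_stack               # mov esp,dword ptr [ebx+988h]
    cpu.push(old_esp)                 # push edx
    cpu.push(return_address)          # call pushes the return address
    return old_esp
```

After the function returns, `cpu.memory[cpu.esp]` holds the return address and `cpu.memory[cpu.esp + 4]` holds the old stack pointer, so the saved ESP is the slot directly above the return address on the DPC stack.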

However, there’s an interesting issue that can arise in your crash analysis. What if the system crashes inside a DPC? Due to the stack swap that occurs in KiDispatchInterrupt just before the call to KiRetireDpcList, you’ll get this when you try to dump the call stack:

stackswapped

In other words, you’ll get a listing for the DPC stack and you won’t necessarily be able to see the actual kernel stack of the current thread. While in 99% of the cases the DPC stack will be the only stack that you care about, there’s that 1% where knowing the current thread stack will provide the insight necessary to solve the crash (in almost 10 years I’ve seen two). Luckily, it’s going to be relatively straightforward to get the stack back. Even more luckily, it’s mostly formulaic so even if you’re not sure why you can get it back you’ll still be able to :)

The first thing you need is the old stack pointer, which sits on the DPC stack immediately above the return address for the call to nt!KiRetireDpcList:

oldesp

Then we’re going to dump this out with the dps command and find the return address into hal!HalpDispatchInterrupt that nt!KiDispatchInterrupt will return to. We’ll also want the first thing on the stack after the return address:

prevebp_halp

In my case, I have 0xf715da0c and hal!HalpDispatchInterrupt+0xbb. Now all that’s left is to feed those two values into the special k syntax that allows you to specify your own EBP, ESP, and EIP overrides:

origstack

Note that there’s a cheater shortcut: I could have just done k = f715da00 f715da00 @eip in this case and gotten a slightly busted but still legible stack. The technique above gives a more attractive and correct stack in the end.
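For anyone who wants to sanity check the addresses, the layout of the thread's stack at the saved stack pointer can be worked out from the two pushes in the KiDispatchInterrupt fragment shown earlier. The sketch below is only that arithmetic: `thread_stack_layout` is a hypothetical helper, and it assumes that KiDispatchInterrupt pushed nothing else before the swap and that 0xf715da00 (the value used in the shortcut) is the saved stack pointer.

```python
POINTER_SIZE = 4  # bytes per stack slot on 32-bit x86


def thread_stack_layout(old_esp):
    """Name the slots found at the saved stack pointer on the thread's stack."""
    return {
        "saved_exception_list": old_esp,                  # push dword ptr [ebx]
        "saved_ebp": old_esp + POINTER_SIZE,              # push ebp
        "return_to_hal": old_esp + 2 * POINTER_SIZE,      # pushed by the caller's call
        "slot_after_return": old_esp + 3 * POINTER_SIZE,
    }


layout = thread_stack_layout(0xF715DA00)
for name, address in layout.items():
    print(f"{name:22} {address:08x}")
```

Under those assumptions the slot after the return address works out to f715da0c, which appears to line up with the 0xf715da0c value quoted above.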

Possibly we can cover why this command works in the future, but for now hopefully that’s enough of a guide for you to go experiment yourselves. Don’t forget that you can always play with this on a live system where you can verify your results by simply stepping out of nt!KiRetireDpcList.

Random Other Points

1) The DISPATCH_LEVEL software interrupt isn’t always requested, so the DPC queue isn’t always drained when returning to an IRQL < DISPATCH_LEVEL.

2) The Idle thread also checks the DPC queue and, if it isn’t empty, drains the queue by dequeuing entries and calling the callbacks. In this case, the DPCs execute on the Idle thread’s stack.

3) It is possible to target a DPC to a processor other than the current processor.

Calling conventions are really just conventions…

Saturday, May 9th, 2009

In doing some work to update IrpTracker for Vista, I noticed something pretty strange.

A new API call was added in the IoCallDriver path. As of Vista, IoCallDriver makes a call to IopPoHandleIrp for any power IRPs that it receives. This API takes care of the work that used to be handled by PoCallDriver/PoStartNextPowerIrp (which is goodness, it was pretty annoying to have to special case the power IRP path). What’s so strange about this, you ask? Well, what’s strange is the calling convention used by IopPoHandleIrp and its helper routines.

Calling conventions on the x86 are pretty well known. In most cases parameters are passed on the stack, with the exception of fastcall where the first two parameters are passed in ECX and EDX (respectively) and the remaining parameters passed on the stack. C++ member functions are another special case (thiscall), where the “this” pointer is passed in ECX and the remaining parameters are passed on the stack. While if you’re debugging hand written assembly you’re on your own, in general the compiler sticks to these conventions and you can count on them when debugging. You can imagine my surprise then when analyzing IopPoHandleIrp:
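As a quick reference for the stack-based and fastcall cases, here is a small Python sketch of where arguments end up. `place_arguments` is a made-up helper, and it assumes every argument fits in a 32-bit register, which is a simplification of what a real compiler does.

```python
def place_arguments(convention, args):
    """Return (registers, stack) showing where each argument is passed.

    `stack` lists the stack arguments in the order they are pushed.
    """
    registers = {}
    remaining = list(args)
    if convention == "fastcall":
        # The first two parameters travel in ECX and EDX, respectively.
        for reg in ("ecx", "edx"):
            if remaining:
                registers[reg] = remaining.pop(0)
    elif convention not in ("cdecl", "stdcall"):
        raise ValueError(f"unknown convention: {convention}")
    # Everything else goes on the stack, pushed last argument first.
    stack = list(reversed(remaining))
    return registers, stack
```

For example, `place_arguments("fastcall", ["a", "b", "c"])` puts `a` in ECX and `b` in EDX and pushes only `c`, while the stdcall version pushes all three. Note that ESI and EAX never appear as parameter registers in either case.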

0:000> uf IopPoHandleIrp
ntkrnlmp!IopPoHandleIrp:
0042b546 mov     edi,edi ; This module was linked with /hotpatch
0042b548 push    ebp
0042b549 mov     ebp,esp ; standard EBP frame
0042b54b push    ecx
0042b54c push    ecx
0042b54d lea     eax,[ebp-4]
0042b550 push    eax
0042b551 mov     eax,esi ; Whoa! This API uses ESI without ever having loaded it!
0042b553 call    ntkrnlmp!PoHandleIrp (0050eea8)

Pretty weird! No documented x86 calling convention loads ESI with a valid value for the called function, so it is strange to see a subroutine assume that the contents of ESI are valid within its frame. Clearly the compiler decided that ESI was the best way to pass a parameter to this routine, and that makes things difficult for anyone that might want to call this API (it’s going to require hand written assembly to load ESI before making the call).

Digging further here, more results are in PoHandleIrp:

0:000> uf PoHandleIrp
ntkrnlmp!PoHandleIrp:
0050eea8 mov     edi,edi ; /hotpatch
0050eeaa push    ebp
0050eeab mov     ebp,esp ; EBP frame
0050eead sub     esp,18h ; Make room for local variables
0050eeb0 push    ebx
0050eeb1 push    esi
0050eeb2 push    edi
0050eeb3 mov     esi,eax ; Save EAX! Why save it? Must contain something important...
0050eeb5 call    dword ptr [ntkrnlmp!_imp__KeGetCurrentIrql (004011d0)]
0050eebb cmp     al,2 ; EAX is overwritten with the result of the above API call
0050eebd jbe     ntkrnlmp!PoHandleIrp+0x19 (0050eec1)
ntkrnlmp!PoHandleIrp+0x17:
0050eebf int     2Ch ; Assertion failure
ntkrnlmp!PoHandleIrp+0x19:
0050eec1 test    esi,esi ; ESI came from EAX, so EAX assumed to be set up by the caller!
0050eec3 jne     ntkrnlmp!PoHandleIrp+0x1f (0050eec7)

We again have an API that conforms to no known calling convention as it assumes that EAX contains some valid value set up by the caller.
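The manual check done on both routines (is a register read before the function ever writes it?) can be roughly automated. The Python sketch below is a deliberately crude heuristic, not a real data-flow analysis: `live_in_registers` and its helper are hypothetical, it expects lines holding only a mnemonic and operands (addresses and comments stripped), it walks the listing top to bottom ignoring branches, it treats every push as a register save or stack reservation, and it assumes a call clobbers EAX, ECX, and EDX.

```python
import re

# Map sub-registers onto the full 32-bit register they alias.
ALIASES = {"al": "eax", "ah": "eax", "ax": "eax", "bl": "ebx", "bh": "ebx", "bx": "ebx",
           "cl": "ecx", "ch": "ecx", "cx": "ecx", "dl": "edx", "dh": "edx", "dx": "edx",
           "si": "esi", "di": "edi"}
GENERAL = {"eax", "ebx", "ecx", "edx", "esi", "edi"}   # esp/ebp deliberately ignored
CALL_CLOBBERED = {"eax", "ecx", "edx"}                  # assumed volatile across a call
READ_ONLY = {"cmp", "test"}                             # read both operands, write none
OVERWRITES = {"mov", "lea"}                             # destination written, not read


def registers_in(operand):
    """Return the general-purpose registers named anywhere in an operand."""
    words = re.findall(r"\b[a-z]+\b", operand.lower())
    return {ALIASES.get(word, word) for word in words} & GENERAL


def live_in_registers(listing):
    """Heuristically list registers that are read before the function writes them."""
    written, suspicious = set(), []
    for line in listing:
        mnemonic, _, rest = line.strip().partition(" ")
        operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
        if mnemonic == "call":
            written |= CALL_CLOBBERED
            continue
        if mnemonic == "pop":
            written |= registers_in(rest)
            continue
        if not operands or mnemonic in ("push", "int") or mnemonic.startswith("j"):
            continue        # pushes are treated as saves, jumps are not followed
        if mnemonic == "mov" and len(operands) == 2 and operands[0] == operands[1]:
            continue        # `mov edi,edi` is the hotpatch no-op
        dest, sources = operands[0], operands[1:]
        reads = set().union(*[registers_in(src) for src in sources])
        if "[" in dest or mnemonic not in OVERWRITES:
            reads |= registers_in(dest)   # address registers, or read-modify-write
        for reg in sorted(reads - written):
            if reg not in suspicious:
                suspicious.append(reg)
        if "[" not in dest and mnemonic not in READ_ONLY:
            written |= registers_in(dest)
    return suspicious
```

Fed the instructions from the two listings above, this flags ESI for IopPoHandleIrp and EAX for PoHandleIrp. Expect false positives and misses on anything more complicated, so use it as a pointer for where to look rather than as an answer.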

The lesson here? Calling conventions on the x86 are just that, conventions. Keep your head about you when performing an analysis of a function and tracing the arguments!