Archive for the ‘Windows internals’ Category

We’re Hiring!

Friday, February 8th, 2013

For those who don’t know, I’m a Consulting Associate at OSR Open Systems Resources, Inc. We specialize in all things kernel mode on Windows, from file systems to device drivers to general Windows internals knowledge. If you come to work for us, you’ll get to work on all kinds of interesting projects in different ways (e.g. development, design, review). We also train students all over the world, which has its own fun and challenges.

I’m about to pass my 11 year anniversary here, so I think it’s a pretty great place to work and maybe you will too! Check out our job posting here:

http://www.osr.com/careers.html

And feel free to contact me if you have any questions.

Scanning for Control Areas (!ca and !memusage)

Tuesday, January 3rd, 2012

Control areas are an important data structure in Windows in that they track the state of memory mapped files. From this state you can find out things like the name of the file that is memory mapped, the location of the data in memory, etc.

Control areas are created not only for files that are memory mapped for data and executable access, but even for files being accessed via cached I/O (internally the Cache Manager memory maps these files), “special” files such as the MFT (the file systems effectively perform cached I/O on these files), and memory mapped sections that are backed by the paging file. As you can imagine, these data structures can be useful when analyzing a memory dump for lots of different reasons. So, how do you go about finding control areas to play with?

Turns out, there are two different methods provided by WinDBG for quickly scanning a target for control areas. Note that control areas can also be found indirectly through file objects, but that’s a separate technique that we’re not covering here. 

The first is the !ca command, which is normally used to display a control area that you already have the address of:

0: kd> !ca fffffa8003c2c770
ControlArea  @ fffffa8003c2c770
  Segment      fffff8a00f39d190  Flink      fffffa8003c2c3f8
  Section Ref                 0  Pfn Ref                   1
  User Ref                    0  WaitForDel                0
  File Object  fffffa800593dc40  ModWriteCount             0
  WritableRefs                0
  Flags (4080) File Accessed     
File: \Program Files\Debugging Tools for Windows (x64)\windbg.exe
Segment @ fffff8a00f39d190
  ControlArea     fffffa8003c2c770  ExtendInfo    0000000000000000
  Total Ptes                    9d
  Segment Size               9cd10  Committed                    0
  Flags (c0000) ProtectionMask
Subsection 1 @ fffffa8003c2c7f0
  ControlArea  fffffa8003c2c770  Starting Sector        0
  Base Pte     fffff8a00a3e2820  Ptes In Subsect       9d
  Flags                d100000d  Sector Offset        d10
  Accessed
  Flink        fffffa8003c2c4c0  Blink   fffffa8006a1cae0

For information about how to use the above output to retrieve the contents of the file, refer to my previous article found here.

However, !ca also has another interesting usage. If instead of a valid control area address you specify 0, you’ll be greeted with a list of all the control areas the command can find. It doesn’t do this via any sort of elegant method; instead, it searches the various executive pools for allocations with the appropriate pool tags:

0: kd> !ca 0
Scanning large pool allocation table for Tag: MmCa (fffffa80072df000 : fffffa80075df000)
fffffa8005115bd0 0       File: \$Directory
fffffa80057a7450 1       Pagefile-backed section
...
Searching NonPaged pool (fffffa8003901000 : ffffffe000000000) for Tag: MmCa
fffffa80039cbd40 0       File: \WinDDK\7600.16385.1\inc\api\SpecStrings_strict.h
fffffa800396c250 0       File: \Windows\Fonts\couri.ttf
fffffa8007738640 0       File: \Windows\SysWOW64\davhlpr.dll
...
Scanning large pool allocation table for Tag: MmCi (fffffa80072df000 : fffffa80075df000)
fffffa80051596c0 0       Image: \Windows\SysWOW64\WMASF.DLL
fffffa8003e51260 0       Image: \Windows\SysWOW64\thumbcache.dll
fffffa8003e575f0 0       Image: \Windows\SysWOW64\netshell.dll
fffffa80061f5b50 0       Image: \Windows\System32\samlib.dll
fffffa80067125d0 0       Image: \Windows\System32\atiu9p64.dll
fffffa8006e34610 0       Image: \Windows\System32\WUDFx.dll
fffffa80056d1df0 0       Image: \Windows\System32\iertutil.dll
fffffa80056f49c0 0       Image: \Windows\System32\urlmon.dll
...
Searching NonPaged pool (fffffa8003901000 : ffffffe000000000) for Tag: MmCi
fffffa8003aace30 0       Image: \Windows\System32\dmocx.dll
fffffa8003ae5750 0       Image: \WinDDK\7600.16385.1\bin\x86\rc.exe
fffffa8006dff160 0       Image: \Windows\System32\mapi32.dll
fffffa8006dff460 0       Image: \Windows\SysWOW64\mfplat.dll
fffffa8006e169a0 0       Image: \Windows\SysWOW64\d3d8thk.dll
fffffa8006e2b5a0 0       Image: \Windows\SysWOW64\atiumdva.dll
...

From here, you can start running !ca on whatever control areas you might find interesting (the control area address is the address in the first column of the output). Note that files may in fact have two control areas, one for data access as well as one for image access.
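When the scan produces a long listing, it can help to post-process it outside the debugger. The following is a minimal Python sketch, assuming the output has been saved as text and that every entry line has the three-column shape shown above. The function names are hypothetical, and the second column is kept as an opaque string because its meaning is not covered here. It also picks out files that have both a data and an image control area:

```python
import re
from collections import defaultdict

# Matches entry lines such as:
#   fffffa8005115bd0 0       File: \$Directory
#   fffffa80057a7450 1       Pagefile-backed section
# Banner lines ("Scanning ...", "Searching ...") and "..." do not match.
LINE_RE = re.compile(r"^([0-9a-fA-F]{8,16})\s+(\S+)\s+(.*)$")


def parse_ca_scan(text):
    """Return a list of (address, kind, name) tuples from saved '!ca 0' output."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        address = int(match.group(1), 16)
        description = match.group(3)
        kind, sep, name = description.partition(": ")
        if not sep:
            # e.g. "Pagefile-backed section" has no file name
            kind, name = description, ""
        entries.append((address, kind, name))
    return entries


def files_with_both_mappings(entries):
    """Lower-cased names that appear with both a 'File' and an 'Image' entry."""
    kinds = defaultdict(set)
    for _address, kind, name in entries:
        if name:
            kinds[name.lower()].add(kind)
    return sorted(n for n, k in kinds.items() if {"File", "Image"} <= k)
```

The addresses returned by parse_ca_scan can then be fed back to !ca one at a time.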

The other way of finding control areas is by using the !memusage command (this is, in fact, the method documented in the Windows Internals book). This method is different in that instead of searching for pool tags, the !memusage command actually scans the PFN database and uses the data from the entries to find control areas. This is a much more complex procedure that we’ll have to save for another time :)

 

Interpreting !pool results part II: Large Pool oddities

Monday, February 14th, 2011

In my previous post Interpreting !pool results, I talked about the !pool command and walked through the output. However, I purposely left out a couple of details when it comes to non-standard pool allocations, such as allocations from special pool and large pool allocations. This led to a question from a reader asking for an explanation of the strange output they were seeing when running !pool on a particular allocation:

Note how there appear to be two entries for a single allocation, one showing this as a freed allocation and the other saying that it is a valid, “large page allocation” of 0x2bc0 bytes. What’s up with that?

The answer is that large pool allocations are treated differently from other types of allocations. An allocation is considered to be a large allocation if it cannot fit within a page of memory (as defined by the architecture, i.e. 4K on the x86/x64) minus the overhead of the pool header required on the block. So, for example, an allocation of PAGE_SIZE bytes would be considered a large page allocation and thus tracked as such.
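The rule in the paragraph above can be written down as a one-line check. This is only a sketch of the rule as stated here, not the allocator's actual code: the header sizes (8 bytes on x86, 16 bytes on x64) are assumptions, and the real cutoff in a given Windows release may differ slightly.

```python
PAGE_SIZE = 0x1000  # 4K pages on x86 and x64

# Assumed sizeof(POOL_HEADER) per architecture.
POOL_HEADER_SIZE = {"x86": 8, "x64": 16}


def is_large_allocation(requested_bytes, arch="x64"):
    """True if the request cannot share a single page with its pool header."""
    return requested_bytes > PAGE_SIZE - POOL_HEADER_SIZE[arch]
```

For example, is_large_allocation(PAGE_SIZE) and is_large_allocation(0x2bc0) are both true, while a 0x150 byte request is not.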

A large allocation is unique in that it does not contain a POOL_HEADER structure. Instead, the caller is returned a page aligned virtual address and the information that would normally live in the pool header is tracked in a global table in the system. When a page aligned address is freed back to the allocator, the large pool table is consulted to determine if the allocation is indeed a large page allocation so that the memory can be freed properly (NOTE: on legacy systems such as XP, this was managed a bit differently and the global table was only maintained when pool tracking was enabled via GFlags).
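As a rough illustration of that free path, here is a toy Python model. The class and method names are made up, the table is a plain dictionary rather than the real (undocumented) structure, and the addresses in the usage note are arbitrary.

```python
PAGE_SIZE = 0x1000


class ToyPool:
    """Toy model: large allocations live in a side table, not behind a header."""

    def __init__(self):
        self.big_page_table = {}  # page aligned address -> (tag, size)
        self.freed = []           # (address, kind) pairs, kept for illustration

    def track_large(self, address, tag, size):
        if address % PAGE_SIZE != 0:
            raise ValueError("large allocations are page aligned")
        self.big_page_table[address] = (tag, size)

    def free(self, address):
        # Only a page aligned address can be a large allocation, so that is
        # the case in which the table is consulted.
        if address % PAGE_SIZE == 0 and address in self.big_page_table:
            del self.big_page_table[address]
            self.freed.append((address, "large"))
        else:
            # Otherwise a POOL_HEADER is expected immediately before the block.
            self.freed.append((address, "small"))
```

Tracking a block at 0xfffffa8007000000 and then freeing it takes the "large" path, while freeing 0xfffffa8007001010 takes the "small" one.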

Now we can begin to unravel why the !pool output above is confused. Remember from the previous post that when supplied with an address, !pool will round the address down to PAGE_SIZE and interpret the result as a POOL_HEADER structure. However, in this case the address supplied is the base of a large pool allocation, thus it does not start with a POOL_HEADER structure. So what we have is the first bytes of this driver’s pool allocation being interpreted as a POOL_HEADER:

This is effectively garbage and thus the walk to the next entry fails. However, before reporting an error !pool decides to consult the large pool allocation table nt!PoolBigPageTable to determine if this is a valid large pool allocation (the structure of this table is not documented, however if you’re interested enough to care it’s easy enough to figure out :) ). In this case an entry is found, thus we get a successful hit and the output corrects itself.

I consider this to be a bug in the implementation. It would be better to look at the large pool allocation table first and then fall back to interpreting the address as a pool header if no entry is found. While this would be more overhead in the common case, it would make the less common case a little less confusing. As such, I have reported it as a bug and hopefully we’ll get a fix in the future.

Finding registered Configuration Manager (i.e. registry) callbacks

Monday, November 29th, 2010

In response to my previous article on finding image and process load notification callbacks, I have put together this article on finding registered Configuration Manager (i.e. registry) callbacks.

Way back in the day when the Sysinternals folks wanted to write a utility to monitor registry activity, there was only one way to do it: hook the System Service Descriptor Table. While this is great for a developer support utility, SSDT hooks have all kinds of security and stability issues that are outside the scope of this post. In addition, there is no clear layering model to hooks. What happens if another product gets installed on the machine that also wants to monitor registry activity?

In order to address the need in the community for monitoring registry activity, Microsoft added an architected solution to Windows XP in the form of Configuration Manager callbacks. These callbacks can be registered with CmRegisterCallback on legacy platforms and CmRegisterCallbackEx on Vista and later.

In the interest of finding out who is registering these callbacks, I have created cmcallbacks.wbs that can be executed with the following command:

$$><c:\dumps\cmcallbacks.wbs

If everything goes right, you should see the following output:

1: kd> $$>a<c:\dumps\cmcallbacks.wbs
************************************************
* This command brought to you by Analyze-v.com *
************************************************
**********************************
* Printing registry callbacks... *
**********************************
MpFilter!MpRegCallback:
fffff880`011d1ea8 48895c2408      mov     qword ptr [rsp+8],rbx
--------------------------------------------

Here is the script in its entirety, enjoy! It can also be downloaded here.

$$ cmcallbacks.wbs
$$
$$ Version 1
$$
$$
$$ Scott Noone - analyze-v.com
$$ snoone@analyze-v.com
$$
$$ This script walks the list of registered Configuration Manager (registry)
$$ callbacks
$$
$$ Currently tested platforms:
$$
$$  Win7 x64
$$
$$  WinXP x86
$$
.echo ************************************************
.echo * This command brought to you by Analyze-v.com *
.echo ************************************************
.echo
$$ Start off by creating some aliases that will make this script more readable
$$ Globals
aS BuildNumber    "low(dwo(nt!NtBuildNumber))";
aS CmCallBackCount "dwo(nt!CmpCallBackCount)";
$$ Pre-Vista uses an array of callbacks
aS CmCallBackBase  "nt!CmpCallBackVector";
$$ Vista and later has a list of callbacks
aS CmCallBackListHead "@@c++((nt!_LIST_ENTRY *)@@masm(nt!CallbackListHead))"
$$ Can't seem to find a type for the entries in the list, so offsets it is
aS CallBackOffsetx86 0x1C
aS CallBackOffsetx64 0x28
$$ variables
aS listEntry "@$t0"
aS i "@$t1";
aS cmEntry "@$t2";
aS functionPtr "@$t3";
$$ The .block is necessary to make sure that all of the above aliases are evaluated
.block
{
    .echo **********************************;
    .echo * Printing registry callbacks... *;
    .echo **********************************;
    .echo
    .if (${BuildNumber} <= 0n3790)
    {
        $$ Pre-Vista callbacks. These work like the Ps callbacks (see pscallbacks.wbs,
        $$ also available on analyze-v.com)
        $$ Walk the CM notification routines
        r ${cmEntry} = ${CmCallBackBase};
        .for (r ${i} = 0; ${i} < ${CmCallBackCount}; r ${i} = ${i} + 1)
        {
            $$ This points to a function, though the bottom bits are control info.
            $$ So, mask those off
            r ${functionPtr} = (poi(${cmEntry}) & -8);
            $$ Unassemble the first instruction
            .if (@$ptrsize == 4)
            {
                $$ 32bit systems seem to have more control info ahead of the function
                $$ pointer, so skip that.
                u poi(${functionPtr} + 4) l1;
            }
            .else
            {
                u poi(${functionPtr}) l1;
            }
            $$ Walk to the next entry in the array
            r ${cmEntry} = ${cmEntry} + @$ptrsize;
            .echo --------------------------------------------;
        }
    }
    .else
    {
        $$ Vista and later callbacks
        $$ Get the Flink value of the head. Note that r? is required
        $$ to use the C++ syntax in the r statement. Also, the type is
        $$ assigned to the resulting register
        r? ${listEntry} = @@c++(${CmCallBackListHead}->Flink)

        $$ And loop while we have entries...

        .while (${listEntry} != ${CmCallBackListHead})
        {
            $$ Dump the callback
            .if (@$ptrsize == 4)
            {
                u poi(${listEntry} + ${CallBackOffsetx86}) l1
            }
            .else
            {
                u poi(${listEntry} + ${CallBackOffsetx64}) l1
            }
            $$ Move on to the next entry. Our r? command makes sure
            $$ that ${listEntry} is typed appropriately
            r? ${listEntry} = @@c++(${listEntry}->Flink);
            .echo --------------------------------------------;

        }
    }
}
$$ Clean up our aliases
ad ${/v:listEntry};
ad ${/v:BuildNumber};
ad ${/v:CmCallBackCount};
ad ${/v:CmCallBackListHead};
ad ${/v:CallBackOffsetx86};
ad ${/v:CallBackOffsetx64};
ad ${/v:CmCallBackBase};
ad ${/v:i};
ad ${/v:cmEntry};
ad ${/v:functionPtr}; 
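
For readers less used to the debugger's expression syntax, the two pieces of arithmetic the script relies on can be restated in a few lines of Python. This is only a restatement of what the script does: the function names are hypothetical and the offsets are the same empirically found values as in the aliases above.

```python
# Offsets of the callback function pointer within a list entry, keyed by
# pointer size in bytes (the script's CallBackOffsetx86 / CallBackOffsetx64).
CALLBACK_OFFSET = {4: 0x1C, 8: 0x28}


def strip_control_bits(entry_value):
    """Clear the low three bits, as 'poi(entry) & -8' does in the script."""
    return entry_value & -8


def callback_pointer_address(list_entry, pointer_size):
    """Address holding the callback pointer for a Vista-and-later list entry."""
    return list_entry + CALLBACK_OFFSET[pointer_size]
```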

Interpreting !pool results

Wednesday, November 10th, 2010

Yes, I’m still here! Been quite busy lately and all of the other usual excuses that people give for not updating their blogs…

!pool is an incredibly handy WinDBG command. It takes a virtual address as a parameter and will attempt to determine if that address represents an executive pool allocation. If it is a pool allocation, !pool will identify the base address of the allocation that this address is a part of, the size of the allocation, the state of the allocation (valid or free), and the four character tag used to allocate the memory. Pretty nifty, but how exactly does this command work and what are we looking at in the resulting output?

First off, I’ll start by saying that, in the general case, small allocations in the pools are maintained in PAGE_SIZE units. This means that on the x86 and x64, the pools are maintained in chunks of 4K (note that I’m leaving pesky cases like large allocations and special pool for another time). Each allocation in the page is preceded by a POOL_HEADER structure, which tracks interesting info about the allocation in the page:

The interesting fields here are:

PreviousSize

On first reading you might be tempted to think of this as the previous size of the allocation, but in reality what it tracks is the size of the preceding entry in the pool page plus the size of the pool header structure. As we’ll see in a moment, the O/S and the !pool command will use this information to perform validation on the pool page. Note that this field is only 8 bits long, so it’s not large enough to actually describe any reasonable allocation. Thus, there is a multiplier applied to this field in order to get the actual size of the allocation.

BlockSize

This is the size of the allocation described by the pool header, including the size of the pool header. Note again that this is only eight bits and thus requires a multiplier.

PoolType

The pool from which this allocation came, for example the non-paged pool. This field can also indicate that the pool region represents freed memory by having a type of zero.

PoolTag

The four character tag used when allocating the buffer.
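
To make the fields concrete, here is a small Python sketch that decodes them from raw bytes. It assumes the x64 layout in which the first 32-bit value packs PreviousSize, PoolIndex, BlockSize and PoolType as four 8-bit fields (in that order) and the tag follows as the second 32-bit value. The 0x10 multiplier is the x64 one, and the function name is hypothetical.

```python
import struct

GRANULARITY = 0x10          # assumed x64 size multiplier
PROTECTED_BIT = 0x80000000  # high bit of the tag value


def decode_pool_header(raw):
    """Decode the first 8 bytes of an (assumed) x64 POOL_HEADER."""
    bits, tag = struct.unpack_from("<II", raw)
    previous_size = bits & 0xFF
    block_size = (bits >> 16) & 0xFF
    pool_type = (bits >> 24) & 0xFF
    # Drop the protected bit before turning the tag into text.
    tag_text = struct.pack("<I", tag & ~PROTECTED_BIT).decode("ascii", "replace")
    return {
        "previous_size": previous_size * GRANULARITY,
        "block_size": block_size * GRANULARITY,
        "pool_type": pool_type,
        "free": pool_type == 0,
        "tag": tag_text,
        "protected": bool(tag & PROTECTED_BIT),
    }
```

A header with a BlockSize field of 0x15 therefore decodes to a block size of 0x150 bytes.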

With that information in hand, we can look at the output of a !pool command and see what exactly is going on:

NOTE: The Protected value in the output simply means that the allocation was made with the PROTECTED_POOL bit set in the pool tag (this is the high bit in the tag value). See here for more information about protected pool.

In the output above, we have asked the !pool command to find the allocation containing fffffa80`070c63e0. Hopefully with the explanation above, the output here should make a bit more sense. Note the first line of the output:

If you compare the address of the first pool entry listed in the output to the address supplied as the parameter, you’ll see that it is the specified address rounded down to PAGE_SIZE. !pool does this because small pool allocations are maintained in PAGE_SIZE chunks, so if the command wants to find out if the supplied address is a valid pool allocation it just needs to round down to PAGE_SIZE and then start trying to walk the pool page. The values supplied in the above output for size, previous size, and tag are then just values retrieved from the pool header that begins the pool page:

In the output above, we see that the allocation size of the preceding entry in the page is zero, which makes sense seeing as how this is the first allocation in the page. The size of the allocation is 0x15, which is represented in the !pool output as being 0x150, so we have to multiply that size by 0x10 in order to get the actual size of the allocation. The pool type of two tells us that this is non-paged pool and the pool tag is ‘eliF’ with the high bit set, resulting in the File (Protected) output in the !pool result.

Based on that information, we can now find the next entry in the pool page by simply adding the size of the allocation to the address of the header and casting that as a pool header:

  

Note how the previous size field in this header matches the size of the previous entry. That is a simple consistency check performed throughout the entire page of pool, which allows !pool to determine whether this is actually a valid block of pool. If those values didn’t match up, !pool would suspect that this was an invalid or otherwise corrupted pool page and return an error. The O/S also uses this consistency check to determine if someone has corrupted the pool page when allocating or freeing memory. Inconsistencies in the page lead to the memory manager crashing the system with a pool bugcheck such as BAD_POOL_HEADER or BAD_POOL_CALLER.

From here, all that is left is for !pool to find the pool header whose range (the address of the header plus the length of the allocation in the header) contains the specified address. !pool indicates that it has found the address by putting an asterisk next to the address of the pool header of the allocation. !pool then continues to walk the remainder of the pool page to make sure that the entire page is consistent and has not been corrupted.
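
The whole walk can be modeled in a few lines of Python. This is a sketch of the procedure as described in this post and not the actual !pool source: it assumes the x64 header layout (8-bit PreviousSize and BlockSize fields in the first 32-bit value, tag in the second), a 0x10 multiplier, and a hypothetical function name.

```python
import struct

PAGE_SIZE = 0x1000
GRANULARITY = 0x10  # assumed x64 size multiplier


def walk_pool_page(page_bytes, target):
    """Walk the small-pool page containing 'target'.

    Returns (header_address, block_size, raw_tag) for the block that
    contains 'target'. Raises ValueError if a PreviousSize field does not
    match the size of the preceding block (the consistency check).
    """
    page_base = target & ~(PAGE_SIZE - 1)  # round down to PAGE_SIZE
    offset = 0
    last_size = 0
    found = None
    while offset < PAGE_SIZE:
        bits, tag = struct.unpack_from("<II", page_bytes, offset)
        previous_size = (bits & 0xFF) * GRANULARITY
        block_size = ((bits >> 16) & 0xFF) * GRANULARITY
        if previous_size != last_size or block_size == 0:
            raise ValueError("pool page is corrupt at offset %#x" % offset)
        if offset + block_size > PAGE_SIZE:
            raise ValueError("block at offset %#x runs off the page" % offset)
        header = page_base + offset
        if header <= target < header + block_size:
            found = (header, block_size, struct.pack("<I", tag))
        # Keep walking so the whole page is validated, as !pool does.
        last_size = block_size
        offset += block_size
    return found
```

Note that the walk does not stop once the target is found, so corruption later in the page is still reported.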

Object Tracking and WinDBG

Wednesday, August 25th, 2010

The Object Manager (Ob) in Windows provides an excellent feature called object tracking, which causes the Ob to maintain a list of every active object in the system. When activated, it allows you to find every driver object, event object, file object, mutex object, etc. at any point in time via the !object command. While the overhead of this is likely to be unacceptable for everyday use, in certain debugging situations it can be immensely helpful. For example, I recently debugged an issue where autochk would not run when our file system filter was installed on the system. I suspected that a rogue file object was preventing NTFS from dismounting, so I turned on object tracking in order to quickly find the file object causing the problem (turned out to be multiple file objects).

Unfortunately things have changed in Windows 7 and the debugging tools haven’t caught up so this no longer works, but I’ll provide a solution to that once I get there…

Enabling Object Tracking

Object tracking is enabled via the FLG_MAINTAIN_OBJECT_TYPELIST GFlags option. You can enable this via the GFlags utility on the target machine, but I prefer to do it via the debugger on a per-boot basis so that I don’t have to remember to shut it off:

1: kd> !gflag + otl
Current NtGlobalFlag contents: 0x00004000
    otl - Maintain a list of objects for each type

Important note: This must be done very early in the boot process before the Ob initializes. I recommend setting an initial break in the debugger by using the WinDBG command CTRL+ALT+K and using the !gflag command at the initial break.

Once you’ve enabled the flag, just hit Go and proceed to run whatever tests or do whatever you like. Once you’re ready to start inspecting objects, the path you take will differ between Vista (and earlier) and Windows 7.

Dumping Objects Prior To Win7

Prior to Win7, life is fairly straightforward as the !object command supports walking the object list. The syntax for the command is:

!object 0 Name

Where Name is documented to be:

Name
If the first argument is zero, the second argument is interpreted as the name of a class of system objects for which to display all instances.

So, for example:

!object 0 File
!object 0 Event
!object 0 Semaphore
!object 0 Device
!object 0 Driver

Any of these will dump out all of the objects of that particular type and you can then pick through and do whatever it is you do with that information.

Dumping Objects on Win7

Now for the fun part. If you attempt to run any of the above !object commands on a Win7 target, you’ll get the following error:

1: kd> !object 0 File
Scanning 723 objects of type 'File'
WARNING: Object header 83d8bb48 flag (42) does not have
OB_FLAG_CREATOR_INFO (4) set

The problem is that starting with Win7, that flag no longer exists. Instead, the object header tracks whether or not this feature is enabled via another field in the header. So, unfortunately, the Ob changed but the !object command wasn’t updated to reflect the changes.

We can get this back though from a gratuitously complicated debugger command that walks the list starting at an entry. Finding the starting entry could be simplified, but I’ll make you find it manually because that’s how I did it when I wrote the script and I don’t want to make it too easy on you :)

Also, I’ll apologize in advance for the script being on a single line and thus guaranteeing that it will require some sort of WinDBG Rosetta Stone in order to decipher (again, because that’s how I did it when I wrote it…Job security!).

First, you’ll need to dump the global type variable for the type of objects you want to see. Examples of these are IoFileObjectType, ExEventObjectType, IoDriverObjectType, etc. (if you’re having trouble finding the name of the one you want just let me know). I’ll pick the file object type:

1: kd> x nt!iofileobjecttype
82775a54 nt!IoFileObjectType = 0x83d656e0
1: kd> dt nt!_object_type 0x83d656e0
   +0x000 TypeList         : _LIST_ENTRY [ 0x83d8bb38 - 0x846022d0 ]
   +0x008 Name             : _UNICODE_STRING "File"
   +0x010 DefaultObject    : 0x0000005c Void
   +0x014 Index            : 0x1c ''
   +0x018 TotalNumberOfObjects : 0x2d3
   +0x01c TotalNumberOfHandles : 0xaa
   +0x020 HighWaterNumberOfObjects : 0x691
   +0x024 HighWaterNumberOfHandles : 0xb6
   +0x028 TypeInfo         : _OBJECT_TYPE_INITIALIZER
   +0x078 TypeLock         : _EX_PUSH_LOCK
   +0x07c Key              : 0x656c6946
   +0x080 CallbackList     : _LIST_ENTRY [ 0x83d65760 - 0x83d65760 ] 

Note the TypeList field. That’s the list of currently valid objects for that type in the system in the form of OBJECT_HEADER_CREATOR_INFO structures, which currently exist directly before the object header. So, TypeList entry address + sizeof(OBJECT_HEADER_CREATOR_INFO) + FIELD_OFFSET(OBJECT_HEADER, Body) is where the actual object address is. I’ll put that all together into the following command (I’m going to break it up C style with “\” characters so you can see it, please remove before actually using and make into a single line):

r @$t0 = @@(sizeof(nt!_object_header_creator_info) \
+ #FIELD_OFFSET(nt!_object_header, Body)); \
!list "-x \".block {as /x Res @$extret+@$t0} ;\
.block{.echo ${Res}; !object ${Res}} ; \
ad /q Res\" 0x83d8bb38" 

Note that the address used at the end of the command is the first address shown for the TypeList field in the dt output above. Also remember that it’s written to occupy a single line, so it needs to get pasted into the KD prompt with no newlines and those backslashes removed. Running the command should provide you with roughly the same output as the old !object command on previous O/S releases.
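
The address arithmetic the command performs for each list entry can be checked by hand. The Python sketch below assumes the 32-bit Windows 7 values of 0x10 for sizeof(nt!_OBJECT_HEADER_CREATOR_INFO) and 0x18 for the offset of Body in nt!_OBJECT_HEADER. On a real target you should let the debugger compute them, as the command above does.

```python
# Assumed 32-bit Windows 7 values for the two quantities stored in @$t0.
CREATOR_INFO_SIZE = 0x10  # sizeof(nt!_OBJECT_HEADER_CREATOR_INFO)
BODY_OFFSET = 0x18        # FIELD_OFFSET(nt!_OBJECT_HEADER, Body)


def object_from_type_list_entry(entry):
    """Return (object_header, object_body) for a TypeList entry address."""
    header = entry + CREATOR_INFO_SIZE
    return header, header + BODY_OFFSET
```

For the first entry in the example, 0x83d8bb38, this gives an object header at 0x83d8bb48, which is the same header address that appears in the !object warning shown earlier.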

If you’d like to clean up the script or, even better, turn it into a debugger extension that takes a name like !object, please let me know or send along the results. Right now it’s on my ever increasing prioritized list of things to do and I’d like to take it off :)

Checked out The NT Insider digital edition yet?

Wednesday, August 18th, 2010

We’ve finally gone digital with The NT Insider! You can grab the PDF here and read about all sorts of interesting topics (writing file system filter drivers, debugger extensions, and virtual storage miniports, to name a few).

DPCs execute on their own call stack (x86 Edition)

Thursday, April 29th, 2010

Deferred Procedure Calls (DPCs) are callbacks made in an arbitrary thread context at IRQL DISPATCH_LEVEL. There is a DPC queue per processor, and queueing a DPC performs two steps:

1) Inserts the DPC onto the DPC queue of the current processor.

2) Requests a DISPATCH_LEVEL software interrupt on the current processor.

Note that there are exceptions to both of those, though I’m not interested in talking about them at this moment.

When the operating system is about to return to an IRQL < DISPATCH_LEVEL, the DISPATCH_LEVEL software interrupt is delivered to the processor. On XP, the ISR for this interrupt is hal!HalpDispatchInterrupt, which does some interrupt management work and calls nt!KiDispatchInterrupt. You can get a feel for how this works by setting a breakpoint on KiDispatchInterrupt and checking out a few call stacks, which should look like the following:

[Image: kidispatch]

While KiDispatchInterrupt serves a few different purposes, for our discussion all we care about is the beginning of the function shown here:

[Image: kidispatch_asm]

Note the call near the end of the listing to nt!KiRetireDpcList. This is the function that will sit in a loop dequeuing DPCs from the current processor’s DPC queue and calling the callbacks. There’s some interesting code leading up to that call though, so let’s go line by line and figure out exactly what this code is doing.

nt!KiDispatchInterrupt:
mov     ebx,dword ptr fs:[1Ch]

This line is moving the contents of offset 0x1c in the FS segment into EBX. In kernel mode, the base of the FS segment is the base address of what is called the PCR for the current processor:

[Image: fs_pcr]

Thus, this code is grabbing whatever field is at offset 0x1c from the base of the PCR structure. Luckily we have the type information for the PCR, which is nt!_KPCR so we can easily see what is at that offset in the structure:

[Image: pcr_1c]

That is the SelfPcr field, which is just the flat address of the PCR (in this case that would be 0xffdff000). Let’s move on to the next fragment:

nt!KiDispatchInterrupt+0x7:
lea  eax,[ebx+980h]
cli
cmp  eax,dword ptr [eax]
je   nt!KiDispatchInterrupt+0x2f (805459df)

Here, we add 0x980 to the base address of the PCR and store the result in EAX. We then disable interrupts on the current processor and check to see if the contents of the pointer match the pointer address.

The CMP instruction will do a logical subtract of the two values and set the Z-Flag to one if the result is zero, which would mean that the two values are the same. The JE instruction will, “Jump if the Z-Flag Equals one”, so if the contents of the pointer match the address of the pointer then this code will jump over the code segment that calls KiRetireDpcList.

If you’ve never looked at much assembly that might seem a bit weird, so let’s see what’s at offset 0x980 from the PCR and see if we can figure out what this code is doing.

If you go to a full listing of the PCR structure, you’ll notice that the last offset given is 0x120 and that is the PrcbData field:

[Image: pcr_prcb]

Thus, in order to figure out what’s at offset 0x980 from the base of the PCR we’ll need to go to offset 0x860 into the PRCB. We’ll find this by doing a dt nt!_kprcb and scanning the output:

[Image: prcb_queue]

Aha! That field is labeled as the DpcListHead (a.k.a. the DPC queue) and the type is a LIST_ENTRY, which is the standard type for a doubly linked list in the kernel.

LIST_ENTRY structures have two fields, a Flink field that points to the next entry and a Blink field that points to the previous entry. When a list is empty, the Flink field points back to the address of the head of the list. So our previous check above is testing the value of the Flink field against the address of the list head, in other words it is checking to see if the list is empty. If it is, the code avoids draining the DPC queue (which makes sense).
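
The empty-list convention is easy to model. In the toy Python sketch below, object identity stands in for an address, so "head.flink is head" plays the role of comparing the head's address with the Flink value stored there. The class and function names are illustrative only.

```python
class ListEntry:
    """Toy stand-in for LIST_ENTRY; object identity plays the role of an address."""

    def __init__(self):
        self.flink = self  # an empty list head points back at itself
        self.blink = self


def is_list_empty(head):
    # Same test as 'cmp eax,dword ptr [eax]': compare the head's address
    # with the Flink value stored at that address.
    return head.flink is head


def insert_tail(head, entry):
    """Link 'entry' in as the last element of the list rooted at 'head'."""
    entry.flink = head
    entry.blink = head.blink
    head.blink.flink = entry
    head.blink = entry
```

A freshly created head is empty; after a single insert_tail its Flink points at the new entry, so the check fails and the queue would be drained.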

If the list is not empty, then the code sets up to call KiRetireDpcList:

nt!KiDispatchInterrupt+0x12:
push    ebp
push    dword ptr [ebx]
mov     dword ptr [ebx],0FFFFFFFFh
mov     edx,esp
mov     esp,dword ptr [ebx+988h]
push    edx
mov     ebp,eax
call    nt!KiRetireDpcList (80545e0e)

I’m going to save the first three instructions for another time if I ever get to talk about Structured Exception Handling (SEH). Right now it’s sufficient to say that the code there prevents kernel mode exceptions from being raised to user mode exception handlers.

The next two instructions are interesting though:

mov     edx,esp
mov     esp,dword ptr [ebx+988h]

Note that the code saves the current stack pointer and then overwrites ESP with a different pointer value from the PCR. We saw previously that the last offset in the PCR is 0x120, which is the beginning of the PRCB. So, whatever value is at offset 0x868 from the PRCB is what we put into the stack pointer register. If you scroll up to the previous graphic, you’ll see that field labeled as DpcStack:

   +0x868 DpcStack         : Ptr32 Void

Thus, each processor has its own DPC stack that is used when DPCs are executed. Shortly this is going to lead to an unexpected problem that this post will hopefully help you solve.
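
The offset translation used twice above (once for the DPC queue and once for the DPC stack) is plain subtraction. A tiny Python helper, with a hypothetical name, makes it explicit. The 0x120 value is the PrcbData offset from the XP x86 listing discussed here and will differ on other builds.

```python
PRCB_OFFSET_IN_PCR = 0x120  # offset of PrcbData within nt!_KPCR on this build


def pcr_to_prcb_offset(pcr_offset):
    """Translate an offset from the PCR base into an offset within the PRCB."""
    if pcr_offset < PRCB_OFFSET_IN_PCR:
        raise ValueError("offset lies before PrcbData")
    return pcr_offset - PRCB_OFFSET_IN_PCR
```

Here 0x980 maps to 0x860 (DpcListHead) and 0x988 maps to 0x868 (DpcStack).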

Lastly, the old stack pointer is pushed onto the stack and finally the call to KiRetireDpcList occurs. When it completes, the old stack is restored and all is right in the World.
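
To see where the saved stack pointer ends up, here is a toy Python model of the switch. Memory is a dictionary of address to 32-bit value, the function names are invented, and the stack addresses in the usage note are arbitrary illustrative numbers. It is a model of the four instructions above and nothing more.

```python
def switch_to_dpc_stack(memory, thread_esp, dpc_stack_top, return_address):
    """Model 'mov edx,esp / mov esp,[DpcStack] / push edx / call'.

    Returns the value of ESP on entry to the called function.
    """
    edx = thread_esp                # mov edx,esp
    esp = dpc_stack_top             # mov esp,dword ptr [ebx+988h]
    esp -= 4                        # push edx
    memory[esp] = edx
    esp -= 4                        # call pushes the return address
    memory[esp] = return_address
    return esp


def recover_thread_esp(memory, dpc_stack_top):
    """The saved thread ESP is the first value pushed on the DPC stack."""
    return memory[dpc_stack_top - 4]
```

With a thread stack pointer of 0xf715da00 and a hypothetical DPC stack top of 0xf7a00000, the saved value sits at 0xf79ffffc, directly above the return address into nt!KiDispatchInterrupt.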

However, there’s an interesting issue that can arise in your crash analysis. What if the system crashes inside a DPC? Due to the stack swap that occurs just before the call to KiRetireDpcList, you’ll get this when you try to dump the call stack:

[Image: stackswapped]

In other words, you’ll get a listing for the DPC stack and you won’t necessarily be able to see the actual kernel stack of the current thread. While in 99% of the cases the DPC stack will be the only stack that you care about, there’s that 1% where knowing the current thread stack will provide the insight necessary to solve the crash (in almost 10 years I’ve seen two). Luckily, it’s going to be relatively straightforward to get the stack back. Even more luckily, it’s mostly formulaic so even if you’re not sure why you can get it back you’ll still be able to :)

First thing you need is the old stack pointer, which is the first thing on the stack before the return address in the call to nt!KiRetireDpcList:

[Image: oldesp]

Then we’re going to dump this out with the dps command and find the return address in hal!HalpDispatchInterrupt that nt!KiDispatchInterrupt will return to. We’ll also want the first thing on the stack after the return address:

[Image: prevebp_halp]

In my case, I have 0xf715da0c and hal!HalpDispatchInterrupt+0xbb. Now all that’s left is to feed those two values into the special k syntax that allows you to specify your own EBP, ESP, and EIP overrides:

origstack

Note that there’s a cheater shortcut: I could have just done k = f715da00 f715da00 @eip in this case and gotten a slightly busted but still legible stack. The technique above gives a more attractive and correct stack in the end.

Possibly we can cover why this command works in the future, but for now hopefully that’s enough of a guide for you to go experiment yourselves. Don’t forget that you can always play with this on a live system where you can verify your results by simply stepping out of nt!KiRetireDpcList.

Random Other Points

1) The DISPATCH_LEVEL software interrupt isn’t always requested, so the DPC queue isn’t always drained when returning to an IRQL < DISPATCH_LEVEL.

2) The Idle thread also checks the DPC queue and, if it isn’t empty, drains the queue by dequeueing entries and calling the callbacks. In this case, the DPCs execute on the Idle thread’s stack.

3) It is possible to target a DPC to a processor other than the current processor.

Undocumented !verifier flags value (!verifier 0x200)

Wednesday, April 14th, 2010

Starting with Windows Vista, Driver Verifier has been updated to include circular trace buffers for interesting events. My favorite up until this point has been the pool allocate and free log, which records the call stack, calling thread, and address of pool allocations and frees. If the system then crashes due to a double free or access to a freed pool block, the debugger’s !verifier 0x80 command can be used to dump the alloc/free log. Even better, the command takes an optional address value that will show only the allocations and frees of the pool block containing that address.

You can see the results in this example from the WinDBG docs:

0: kd> !verifier 80 a2b1cf20
Parsing 00004000 array entries, searching for address a2b1cf20.
=======================================
Pool block a2b1ce98, Size 00000168, Thread a2b1ce98
808f1be6 ndis!ndisFreeToNPagedPool+0x39
808f11c1 ndis!ndisPplFree+0x47
808f100f ndis!NdisFreeNetBufferList+0x3b
8088db41 NETIO!NetioFreeNetBufferAndNetBufferList+0xe
8c588d68 tcpip!UdpEndSendMessages+0xdf
8c588cb5 tcpip!UdpSendMessagesDatagramsComplete+0x22
8088d622 NETIO!NetioDereferenceNetBufferListChain+0xcf
8c5954ea tcpip!FlSendNetBufferListChainComplete+0x1c
809b2370 ndis!ndisMSendCompleteNetBufferListsInternal+0x67
808f1781 ndis!NdisFSendNetBufferListsComplete+0x1a
8c04c68e pacer!PcFilterSendNetBufferListsComplete+0xb2
809b230c ndis!NdisMSendNetBufferListsComplete+0x70
8ac4a8ba test1!HandleCompletedTxPacket+0xea
=======================================
Pool block a2b1ce98, Size 00000164, Thread a2b1ce98
822af87f nt!VerifierExAllocatePoolWithTagPriority+0x5d
808f1c88 ndis!ndisAllocateFromNPagedPool+0x1d
808f11f3 ndis!ndisPplAllocate+0x60
808f1257 ndis!NdisAllocateNetBufferList+0x26
80890933 NETIO!NetioAllocateAndReferenceNetBufferListNetBufferMdlAndData+0x14
8c5889c2 tcpip!UdpSendMessages+0x503
8c05c565 afd!AfdTLSendMessages+0x27
8c07a087 afd!AfdTLFastDgramSend+0x7d
8c079f82 afd!AfdFastDatagramSend+0x5ae
8c06f3ea afd!AfdFastIoDeviceControl+0x3c1
8217474f nt!IopXxxControlFile+0x268
821797a1 nt!NtDeviceIoControlFile+0x2a
8204d16a nt!KiFastCallEntry+0x127

In the output, the most recent event is at the top. Thus, here you can see that the buffer was allocated with ndisAllocateFromNPagedPool and freed with ndisFreeToNPagedPool.
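To make the idea of a circular alloc/free log concrete, here is a minimal Python model of one. It is only a sketch of the concept: the class and field names (PoolEvent, PoolTraceLog, events_for) are made up for illustration and say nothing about how Driver Verifier actually lays out its log in memory.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PoolEvent:
    kind: str      # "alloc" or "free"
    block: int     # start address of the pool block
    size: int      # size of the pool block in bytes
    thread: int    # thread that made the call
    stack: tuple   # captured call stack, innermost frame first


class PoolTraceLog:
    """Hypothetical fixed-size log: the oldest entries are overwritten."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full,
        # which is the behavior of a circular trace buffer.
        self._events = deque(maxlen=capacity)

    def record(self, event):
        self._events.append(event)

    def events_for(self, address):
        """Return events whose block contains address, newest first."""
        return [
            e for e in reversed(self._events)
            if e.block <= address < e.block + e.size
        ]


log = PoolTraceLog(capacity=2)
log.record(PoolEvent("alloc", 0x1000, 0x10, 1, ()))
log.record(PoolEvent("alloc", 0xA2B1CE98, 0x164, 2,
                     ("ndis!ndisAllocateFromNPagedPool",)))
log.record(PoolEvent("free", 0xA2B1CE98, 0x168, 2,
                     ("ndis!ndisFreeToNPagedPool",)))

for event in log.events_for(0xA2B1CF20):
    print(event.kind, hex(event.block), event.stack[0])
```

Because the buffer is circular, the very first allocation in this example has already been overwritten by the time the log is searched. The same thing happens on a real system: if the events you care about are old enough, they may no longer be in the log.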

In addition to the pool allocation log, !verifier 0x100 shows the IRP log, which logs all IoCallDriver, IoCompleteRequest, and IoCancelIrp calls.

Based on the docs you’d think that’s all there is, but there’s an undocumented log that can be accessed with !verifier 0x200 and that is the critical region log.

This is not to be confused with the user mode concept of critical sections. In a driver, one can call KeEnterCriticalRegion and KeLeaveCriticalRegion in order to disable and re-enable APC delivery. Without getting too much into why a driver needs to disable APC delivery, what’s important to note is that every call to KeEnterCriticalRegion must be matched with a call to KeLeaveCriticalRegion. If a driver gets this wrong, then the system will crash with an APC_INDEX_MISMATCH bugcheck when it notices that the enter/leave count is off.

The way this works is that entering a critical region decrements a field of the KTHREAD structure and exiting a critical region increments the field of the structure. At various points in the O/S, the field of the KTHREAD is checked to make sure that it is zero. If it isn’t, then the system crashes with the previously mentioned APC_INDEX_MISMATCH bugcheck code. One such place that this is checked is in the system service dispatcher before returning back to the caller, which is why you’ll see these bugchecks come from KiSystemServiceExit.

What makes these crashes particularly difficult to track down is that the crash is a secondary failure: by the time the system notices that the count field is incorrect, the code that caused the bad state is gone. Enter the critical region log, which will trace every call to KeEnterCriticalRegion and KeLeaveCriticalRegion for the verified drivers. Now, when the system crashes you can just type !verifier 0x200 in the debugger and find the mismatched call.

Note that this only works with Driver Verifier enabled, which is just another reason to make sure that you’re always testing with Verifier!
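The counting scheme and the value of the log can be illustrated with a small Python model. Everything here is hypothetical and for illustration only: the apc_disable field name, the mydrv!... caller names, and the log format are assumptions, not the real KTHREAD layout or the real !verifier output.

```python
class Thread:
    def __init__(self, name):
        self.name = name
        self.apc_disable = 0   # hypothetical stand-in for the KTHREAD field


class CriticalRegionLog:
    """Hypothetical trace of every enter/leave call, oldest first."""

    def __init__(self):
        self.entries = []

    def enter(self, thread, caller):
        thread.apc_disable -= 1            # entering decrements the field
        self.entries.append(("enter", thread.name, caller))

    def leave(self, thread, caller):
        thread.apc_disable += 1            # leaving increments the field
        self.entries.append(("leave", thread.name, caller))


def system_service_exit(thread):
    # The check happens long after the buggy code ran, which is why the
    # crash by itself does not tell you who left the count unbalanced.
    if thread.apc_disable != 0:
        raise RuntimeError("APC_INDEX_MISMATCH")


def unmatched_enters(log, thread_name):
    """Walk the log and return callers whose enter was never balanced."""
    pending = []
    for kind, name, caller in log.entries:
        if name != thread_name:
            continue
        if kind == "enter":
            pending.append(caller)
        elif pending:
            pending.pop()
    return pending


thread = Thread("T1")
log = CriticalRegionLog()
log.enter(thread, "mydrv!Read")
log.leave(thread, "mydrv!Read")
log.enter(thread, "mydrv!Ioctl")    # bug: no matching leave

try:
    system_service_exit(thread)
except RuntimeError as err:
    print(err, "caused by", unmatched_enters(log, "T1"))
```

The exception is raised in system_service_exit, not in the routine that forgot to leave the critical region. Only the trace lets you walk back to the unbalanced caller, which is the role the critical region log plays in a real crash.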

Why does my target machine sometimes become unresponsive after I set a breakpoint?

Wednesday, February 10th, 2010

Ever had this happen to you? You set a seemingly normal breakpoint in your target machine from WinDBG and resume the system:

bpset

You then go back over to your target system and it is completely unresponsive. The mouse won’t move, keyboard doesn’t work, etc. So, you go back over to your debugger system, break into the target, clear the breakpoint, and all is fine again. Disassembling the routine just to make sure all is well doesn’t show anything of interest that might explain this:

coderesident

So what’s up with that?

The problem is actually a variation of the issue that was described by Bryce Jonasson and Jen-Leng Chiu of the Debugging Tools for Windows team in a recent issue of The NT Insider, available here. The issue is that, even though the u command indicates otherwise, the offending code in question is actually paged out at the moment. Not entirely paged out to disk mind you, but currently in transition. We can see this by examining the state of the PTE with the !pte command:

notvalidintrans1
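For readers who want to see what “not valid, but in transition” means at the bit level, here is a rough Python sketch that classifies a 32-bit PTE value. The bit positions (Valid in bit 0, Prototype in bit 10, Transition in bit 11, page frame number in the upper 20 bits) are my assumption based on the commonly described non-PAE x86 layout, and the sample values are invented. Always trust the !pte output on your actual target over this sketch.

```python
# Assumed non-PAE x86 PTE bit positions (see the caveat above).
VALID = 1 << 0
PROTOTYPE = 1 << 10
TRANSITION = 1 << 11
PFN_SHIFT = 12


def classify_pte(pte):
    """Return a rough description of the state encoded in a PTE value."""
    if pte & VALID:
        return "valid"
    if pte & PROTOTYPE:
        return "prototype"
    if pte & TRANSITION:
        # Not valid as far as the hardware is concerned, but the page is
        # still in physical memory and the PTE still names its frame.
        return "transition"
    return "paged out or demand zero"


def page_frame_number(pte):
    """Extract the page frame number from a valid or transition PTE."""
    return pte >> PFN_SHIFT


for sample in (0x12345863, 0x12345880, 0x00000000):
    print(hex(sample), classify_pte(sample))
```

The transition case is the interesting one here: the page frame number is still present in the PTE, so a debugger can locate the physical page even though an access to the virtual address would fault.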

If the page is not currently valid, then why are we seeing valid data contents when we unassemble the virtual address? The reason is that, by default, the debugger will automatically decode and display the contents of pages that are in transition. They are still in memory and the PTE contains the information necessary to find the actual memory, so why not show the data? This behavior can be changed with the nodecodeptes parameter to the .cache command:

autodecodeoff

If we now try to disassemble the address we’ll see that the virtual address is in fact currently invalid:

nolongermapped

We can restore the default behavior by specifying the decodeptes option:

decodeenabled

But why does the system go into a tailspin when we set a breakpoint on this address? The reason is that the debugger cannot set a breakpoint on an invalid PTE. Thus, setting a breakpoint on this address sets what is called an “owed” breakpoint. When we resume the target system, the target system will try like heck to set a breakpoint on this address every chance it gets, which might result in the system not doing much but trying to set this breakpoint.

The best workaround for this issue is to set a hardware access breakpoint on the address instead of a software breakpoint. This will utilize the support of the processor to break when someone finally executes this address instead of trying to set the breakpoint by replacing a byte of memory with an int 3 instruction: 

boabp

An alternative would be to use the .pagein command to force the memory to be paged in on the target, though paging in kernel memory requires support from the operating system that is only in Windows Vista and later. There is also the .allow_bp_ba_convert command, which enables an option that would have automatically converted our bp breakpoint into a ba breakpoint in this situation.