Archive for the ‘Driver development’ Category

We’re Hiring!

Friday, February 8th, 2013

For those who don’t know, I’m a Consulting Associate at OSR Open Systems Resources, Inc. We specialize in all things kernel mode on Windows, from file systems to device drivers to general Windows internals knowledge. If you come to work for us, you’ll get to work on all kinds of interesting projects in different ways (e.g. development, design, and review). We also train students all over the world, which has its own fun and challenges.

I’m about to pass my 11 year anniversary here, so I think it’s a pretty great place to work and maybe you will too! Check out our job posting here:

http://www.osr.com/careers.html

And feel free to contact me if you have any questions.

Why is lmv sometimes more verbose?

Saturday, January 28th, 2012

While working in WinDBG, lm is the standard command for viewing the loaded module list of the target. Amongst other things, this command takes a v flag to increase the verbosity as well as an m flag to match a particular module. Thus, if you want to see verbose information for a module named foo you can execute the following command:

lmv mfoo

That’s all fine and good, but you might notice that some modules provide more information than others. For example, check out the detailed information provided for the NTFS module:

1: kd> lmv mntfs
start             end                 module name
fffff880`01250000 fffff880`013f3000   Ntfs       (deferred)
    Image path: \SystemRoot\System32\Drivers\Ntfs.sys
    Image name: Ntfs.sys
    Timestamp:        Mon Jul 13 19:20:47 2009 (4A5BC14F)
    CheckSum:         00195F88
    ImageSize:        001A3000
    File version:     6.1.7600.16385
    Product version:  6.1.7600.16385
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        3.7 Driver
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® Windows® Operating System
    InternalName:     ntfs.sys
    OriginalFilename: ntfs.sys
    ProductVersion:   6.1.7600.16385
    FileVersion:      6.1.7600.16385 (win7_rtm.090713-1255)
    FileDescription:  NT File System Driver
    LegalCopyright:   © Microsoft Corporation. All rights reserved.

Versus the limited information we get for the FltMgr module:

1: kd> lmv mfltmgr
start             end                 module name
fffff880`010fa000 fffff880`01146000   fltmgr     (deferred)
    Image path: \SystemRoot\system32\drivers\fltmgr.sys
    Image name: fltmgr.sys
    Timestamp:        Mon Jul 13 19:19:59 2009 (4A5BC11F)
    CheckSum:         00056413
    ImageSize:        0004C000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

What’s up with that?

The answer is the availability of resource information. The PE file format allows for a .rsrc section, which contains developer provided information about the binary (company, version, description, etc.). The information placed here comes from a .rc file supplied when the image is compiled and is usually accessed via the Details tab of the file properties:

If an image contains a resident and valid .rsrc section, lmv will provide the details from it as part of its output. If we dump the PE header for the NTFS module, we can see that it does indeed have a .rsrc section:

1: kd> !dh ntfs
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
    8664 machine (X64)
       8 number of sections
4A5BC14F time date stamp Mon Jul 13 19:20:47 2009
...
SECTION HEADER #7
   .rsrc name
   1CFF0 virtual size
  185000 virtual address
   1D000 size of raw data
  176600 file pointer to raw data
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
40000040 flags
         Initialized Data
         (no align specified)
         Read Only

Which explains why we saw all of that detailed info for NTFS. But why didn’t we see it for FltMgr? In fact, if we check out the file properties for FltMgr.sys we see that it does indeed have a valid resource section:

But lmv was unable to parse it for some reason. Let’s check out the PE header to find out why:

1: kd> !dh fltmgr
File Type: DLL
FILE HEADER VALUES
    8664 machine (X64)
       B number of sections
4A5BC11F time date stamp Mon Jul 13 19:19:59 2009
...
SECTION HEADER #A
   .rsrc name
    1E50 virtual size
   49000 virtual address
    2000 size of raw data
   42E00 file pointer to raw data
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
42000040 flags
         Initialized Data
         Discardable
         (no align specified)
         Read Only

Aha! The .rsrc section is marked as discardable, meaning that the image loader is free to remove this section from memory once the module is loaded. Thus, while this module has a valid .rsrc section on disk, it is not guaranteed to have one in memory.

Discardable is the default used by the compiler/linker for resource sections because the information is typically only used in file property dialogs, where it can easily be retrieved from the image on disk. However, certain Microsoft supplied modules (such as NTFS) go out of their way to mark their resource sections as non-discardable to make it possible to query version information from the debugger at the cost of some extra RAM.
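
The difference between the two !dh listings comes down to a single bit in the section flags. Here is a minimal C sketch of decoding that bit. It assumes the standard PE section characteristic values (redefined locally so the snippet builds anywhere), and the helper name is made up for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

/* Section characteristic bits, using the standard PE values. */
#define IMAGE_SCN_CNT_INITIALIZED_DATA 0x00000040u
#define IMAGE_SCN_MEM_DISCARDABLE      0x02000000u
#define IMAGE_SCN_MEM_READ             0x40000000u

/* Hypothetical helper: true if the loader may drop the section from memory. */
static bool section_is_discardable(uint32_t characteristics)
{
    return (characteristics & IMAGE_SCN_MEM_DISCARDABLE) != 0;
}
```

Fed the flags from the listings above, section_is_discardable(0x40000040) is false for Ntfs while section_is_discardable(0x42000040) is true for fltmgr.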

Finding image load and process notification callbacks

Sunday, November 14th, 2010

For a while now, Windows has supported image load and process notification callbacks. A driver establishes these callbacks by calling PsSetLoadImageNotifyRoutine or PsSetCreateProcessNotifyRoutine{Ex}, respectively.

Given this, do you wonder what drivers are running on your system that are monitoring your image and process loads? Up until now there has been no easy way to view the current list of registered drivers. I was finally curious enough on my own system to toss together a WinDBG script that will walk both lists. It’s long, though I’ve tried to comment it enough that anyone can follow it. To run it, just copy the contents of the script to a file pscallbacks.wbs and execute the following command:

$$>a<c:\dumps\pscallbacks.wbs

The command also takes an optional parameter to show just the image load or process load callbacks:

$$>a<c:\dumps\pscallbacks.wbs i

$$>a<c:\dumps\pscallbacks.wbs p

If everything goes right, you should see the following output:

0: kd> $$>a<c:\dumps\pscallbacks.wbs
************************************************
* This command brought to you by Analyze-v.com *
************************************************
************************************
* Printing image load callbacks... *
************************************
nt!EtwpTraceLoadImage:
fffff800`02de4ecc 48895c2420      mov     qword ptr [rsp+20h],rbx
--------------------------------------------
MpFilter!MpLoadImageNotifyRoutine:
fffff880`02a78220 4c8bdc          mov     r11,rsp
--------------------------------------------
**********************************************
* Printing process notification callbacks... *
**********************************************
nt!ViCreateProcessCallback:
fffff800`02a90688 48895c2408      mov     qword ptr [rsp+8],rbx
--------------------------------------------
ksecdd!KsecCreateProcessNotifyRoutine:
fffff880`01211330 4883ec28        sub     rsp,28h
--------------------------------------------
cng!CngCreateProcessNotifyRoutine:
fffff880`01158910 4883ec38        sub     rsp,38h
--------------------------------------------
tcpip+0x4e020:
fffff880`0164f020 4883ec28        sub     rsp,28h
--------------------------------------------
CI!I_PEProcessNotify:
fffff880`00c9db94 4584c0          test    r8b,r8b
--------------------------------------------
MpFilter!MpCreateProcessNotifyRoutineEx:
fffff880`02a77cd8 488bc4          mov     rax,rsp
--------------------------------------------
vmci!VMCIEvent_Unsubscribe+0x21ac:
fffff880`036b12dc 4584c0          test    r8b,r8b
--------------------------------------------
peauth+0x18d2c:
fffff880`03c18d2c 48895c2408      mov     qword ptr [rsp+8],rbx
--------------------------------------------

Here is the script in its entirety, enjoy!

Update: Just posted a new version that should work across all Windows platforms, so no special support needed to execute on XP.

Update: You can download a text file version of the script here, much easier than trying to fix the formatting yourself.

$$ pscallbacks.wbs
$$
$$ Version 2
$$
$$  - Updated script to work unchanged across all platforms
$$
$$ Scott Noone - analyze-v.com
$$ snoone@analyze-v.com
$$
$$ This script walks the list of registered image load and process load notification
$$ routines.
$$
$$ Currently tested platforms:
$$
$$  Win7 x64
$$
$$  WinXP 32bit
$$
$$
.echo ************************************************
.echo * This command brought to you by Analyze-v.com *
.echo ************************************************
.echo
$$ Start off by creating some aliases that will make this script more readable
$$ Globals
aS BuildNumber    "low(dwo(nt!NtBuildNumber))";
aS ImageLoadCount "dwo(nt!PspLoadImageNotifyRoutineCount)";
aS ImageLoadBase  "nt!PspLoadImageNotifyRoutine";
$$
$$ Include the Ex callbacks where available
$$
.block
{
    .if (${BuildNumber} <= 0n3790)
    {
        aS ProcessNotifyCount "dwo(nt!PspCreateProcessNotifyRoutineCount)";
    }
    .else
    {
        aS ProcessNotifyCount "dwo(nt!PspCreateProcessNotifyRoutineCount) + dwo(nt!PspCreateProcessNotifyRoutineExCount)";
    }
}
aS ProcessNotifyBase  "nt!PspCreateProcessNotifyRoutine";
$$ Our options
aS DISPLAY_IMAGE_NOTIFY "1";
aS DISPLAY_PROCESS_NOTIFY "2";
aS options "@$t0";
$$ variables
aS i "@$t1";
aS imageEntry "@$t2";
aS processEntry "@$t3";
aS functionPtr "@$t4";
$$ The .block is necessary to make sure that all of the above aliases are evaluated
.block
{
    r ${options} = 0;
    $$ Check the parameters. Valid values are:
    $$
    $$ i - Image load callbacks only
    $$ p - Process notification callbacks only
    $$ * - All callbacks
    $$
    .if (${/d:$arg1} == 1)
    {
        .if ('${$arg1}' == 'i')
        {
            r ${options} = ${DISPLAY_IMAGE_NOTIFY};
        }
        .elsif ('${$arg1}' == 'p')
        {
            r ${options} = ${DISPLAY_PROCESS_NOTIFY};
        }
        .elsif ('${$arg1}' == '*')
        {
            r ${options} = ${DISPLAY_IMAGE_NOTIFY} | ${DISPLAY_PROCESS_NOTIFY};
        }
        .else
        {
            .echo Error! Valid parameters are i,p, and *.
        }
    }
    .else
    {
        $$ Default to showing everything.
        r ${options} = ${DISPLAY_IMAGE_NOTIFY} | ${DISPLAY_PROCESS_NOTIFY};
    }
    .if ((${options} & ${DISPLAY_IMAGE_NOTIFY}) != 0)
    {
        .echo ************************************;
        .echo * Printing image load callbacks... *;
        .echo ************************************;
        $$ Walk the image load notify routines
        r ${imageEntry} = ${ImageLoadBase};
        .for (r ${i} = 0; ${i} < ${ImageLoadCount}; r ${i} = ${i} + 1)
        {
            $$ This points to a function, though the bottom bits are control info.
            $$ So, mask those off
            r ${functionPtr} = (poi(${imageEntry}) & -8);
            $$ Unassemble the first instruction
            .if (@$ptrsize == 4)
            {
                $$ 32bit systems seem to have more control info ahead of the function
                $$ pointer, so skip that.
                u poi(${functionPtr} + 4) l1;
            }
            .else
            {
                u poi(${functionPtr}) l1;
            }
            $$ Walk to the next entry in the array
            r ${imageEntry} = ${imageEntry} + @$ptrsize;
            .echo --------------------------------------------;
        }
    }
    .if ((${options} & ${DISPLAY_PROCESS_NOTIFY}) != 0)
    {
        .echo **********************************************;
        .echo * Printing process notification callbacks... *;
        .echo **********************************************;
        $$ Walk the process notification routines
        r ${processEntry} = ${ProcessNotifyBase};
        .for (r ${i} = 0; ${i} < ${ProcessNotifyCount}; r ${i} = ${i} + 1)
        {
            $$ This points to a function, though the bottom bits are control info.
            $$ So, mask those off
            r ${functionPtr} = (poi(${processEntry}) & -8);
            $$ Unassemble the first instruction
            .if (@$ptrsize == 4)
            {
                $$ 32bit systems seem to have more control info ahead of the function
                $$ pointer, so skip that.
                u poi(${functionPtr} + 4) l1;
            }
            .else
            {
                u poi(${functionPtr}) l1;
            }
            $$ Walk to the next entry in the array
            r ${processEntry} = ${processEntry} + @$ptrsize;
            .echo --------------------------------------------;
        }
    }
}
$$ Clean up our aliases
ad ${/v:DISPLAY_IMAGE_NOTIFY};
ad ${/v:DISPLAY_PROCESS_NOTIFY};
ad ${/v:options};
ad ${/v:i};
ad ${/v:imageEntry};
ad ${/v:processEntry};
ad ${/v:functionPtr};
ad ${/v:BuildNumber};
ad ${/v:ImageLoadCount};
ad ${/v:ImageLoadBase};
ad ${/v:ProcessNotifyCount};
ad ${/v:ProcessNotifyBase};
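
For readers who find the debugger script syntax hard to follow, the core of each loop is roughly the following. This is only a user-mode C sketch of the same walk over a plain array: the function and parameter names are invented, and masking the low three bits simply mirrors the "& -8" used in the script.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk `count` slots of a callback array, strip the low control bits
 * from each slot the way the script does with "& -8", and store the
 * resulting block addresses in `out`. Returns the number of slots read. */
static size_t strip_callback_slots(const uintptr_t *slots, size_t count,
                                   uintptr_t *out)
{
    size_t i;
    for (i = 0; i < count; i++) {
        out[i] = slots[i] & ~(uintptr_t)7;  /* same effect as "& -8" */
    }
    return i;
}
```

The script then reads the function pointer out of each block (at offset 0 on 64-bit, offset 4 on 32-bit) and unassembles a single instruction at that address.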

Interpreting !pool results

Wednesday, November 10th, 2010

Yes, I’m still here! Been quite busy lately and all of the other usual excuses that people give for not updating their blogs…

!pool is an incredibly handy WinDBG command. It takes a virtual address as a parameter and will attempt to determine if that address represents an executive pool allocation. If it is a pool allocation, !pool will identify the base address of the allocation that this address is a part of, the size of the allocation, the state of the allocation (valid or free), and the four character tag used to allocate the memory. Pretty nifty, but how exactly does this command work and what are we looking at in the resulting output?

First off, I’ll start by saying that, in the general case, small allocations in the pools are maintained in PAGE_SIZE units. This means that on the x86 and x64, the pools are maintained in chunks of 4K (note that I’m leaving pesky cases like large allocations and special pool for another time). Each allocation in the page is preceded by a POOL_HEADER structure, which tracks interesting info about the allocation in the page:

The interesting fields here are:

PreviousSize

On first reading you might be tempted to think of this as the previous size of the allocation, but in reality what it tracks is the size of the preceding entry in the pool page plus the size of the pool header structure. As we’ll see in a moment, the O/S and the !pool command will use this information to perform validation on the pool page. Note that this field is only 8 bits long, so it’s not large enough to actually describe any reasonable allocation. Thus, there is a multiplier applied to this field in order to get the actual size of the allocation.

BlockSize

This is the size of the allocation described by the pool header, including the size of the pool header. Note again that this is only eight bits and thus requires a multiplier.

PoolType

The pool that this allocation came from, for example the non-paged pool. This field can also indicate that the pool region represents freed memory by having a type of zero.

PoolTag

The four character tag used when allocating the buffer.

With that information in hand, we can look at the output of a !pool command and see exactly what is going on:

NOTE: The Protected value in the output simply means that the allocation was made with the PROTECTED_POOL bit set in the pool tag (this is the high bit in the tag value). See here for more information about protected pool.

In the output above, we have asked the !pool command to find the allocation containing fffffa80`070c63e0. Hopefully with the explanation above, the output here should make a bit more sense. Note the first line of the output:

If you compare the address of the first pool entry listed in the output to the address supplied as the parameter, you’ll see that it is the specified address rounded down to PAGE_SIZE. !pool does this because small pool allocations are maintained in PAGE_SIZE chunks, so if the command wants to find out if the supplied address is a valid pool allocation it just needs to round down to PAGE_SIZE and then start trying to walk the pool page. The values supplied in the above output for size, previous size, and tag are then just values retrieved from the pool header that begins the pool page:

In the output above, we see that the allocation size of the preceding entry in the page is zero, which makes sense seeing as how this is the first allocation in the page. The size of the allocation is 0x15, which is represented in the !pool output as being 0x150, so we have to multiply that size by 0x10 in order to get the actual size of the allocation. The pool type of two tells us that this is non-paged pool and the pool tag is ‘eliF’ with the high bit set, resulting in the File (Protected) output in the !pool result.
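
The arithmetic in that paragraph can be written out in a few lines of C. This is a sketch only: the 0x10 multiplier is the one observed in this example, the protected bit is the high bit of the tag as noted earlier, and the function names are made up for illustration.

```c
#include <stdint.h>

#define POOL_UNIT_BYTES    0x10u        /* multiplier seen in this example */
#define PROTECTED_POOL_BIT 0x80000000u  /* high bit of the tag value */

/* Convert a BlockSize or PreviousSize field value into bytes. */
static uint32_t pool_units_to_bytes(uint32_t units)
{
    return units * POOL_UNIT_BYTES;
}

/* Return nonzero if the tag has the protected bit set. */
static int pool_tag_is_protected(uint32_t tag)
{
    return (tag & PROTECTED_POOL_BIT) != 0;
}

/* Write the four tag characters, in memory order, into out[5]. */
static void pool_tag_to_string(uint32_t tag, char out[5])
{
    int i;
    tag &= ~PROTECTED_POOL_BIT;          /* drop the protected bit first */
    for (i = 0; i < 4; i++) {
        out[i] = (char)((tag >> (8 * i)) & 0xFF);
    }
    out[4] = '\0';
}
```

With these helpers, pool_units_to_bytes(0x15) gives 0x150, and a tag value of 0xE56C6946 (‘eliF’ with the high bit set) decodes to the string File with the protected flag reported.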

Based on that information, we can now find the next entry in the pool page by simply adding the size of the allocation to the address of the header and casting that as a pool header:

  

Note how the previous size field in this header matches the size of the previous entry. That is a simple consistency check performed throughout the entire page of pool, which allows !pool to determine if this is actually a valid block of pool. If those values didn’t match up, !pool would suspect that this was an invalid or otherwise corrupted pool page and return an error. The O/S also uses this consistency check to determine if someone has corrupted the pool page when allocating or freeing memory. Inconsistencies in the page lead to the memory manager crashing the system with a bugcheck such as BAD_POOL_HEADER or BAD_POOL_CALLER.

From here, all that is left is for !pool to find the entry that contains the specified address, that is, the entry for which the address falls between the address of the pool header and that address plus the length of the allocation recorded in the header. !pool indicates that it has found the entry by putting an asterisk next to the address of its pool header. !pool then continues to walk the remainder of the pool page to make sure that the entire page is consistent and has not been corrupted.
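
Putting the last few paragraphs together, the walk over a pool page can be approximated as below. This is a simplified user-mode sketch, not the debugger extension’s actual code. It assumes a made-up layout in which byte 0 of each entry holds the previous size and byte 1 holds the block size (both in 0x10 byte units), and the function name is invented for illustration.

```c
#include <stdint.h>

#define PAGE_BYTES 0x1000u
#define UNIT_BYTES 0x10u   /* size multiplier used in this example */

/* Walk a simulated pool page and look for the entry containing `offset`
 * (a byte offset from the start of the page). Returns the byte offset of
 * that entry's header, or -1 if the page fails the consistency check. */
static long find_pool_block(const uint8_t *page, uint32_t offset)
{
    uint32_t cur = 0;
    uint8_t expected_prev = 0;   /* the first entry has no predecessor */
    long found = -1;

    while (cur < PAGE_BYTES) {
        uint8_t previous_size = page[cur];
        uint8_t block_size = page[cur + 1];
        uint32_t bytes = (uint32_t)block_size * UNIT_BYTES;

        if (block_size == 0 || previous_size != expected_prev ||
            cur + bytes > PAGE_BYTES) {
            return -1;           /* inconsistent: treat the page as corrupt */
        }
        if (offset >= cur && offset < cur + bytes) {
            found = (long)cur;   /* the entry that would get the asterisk */
        }
        expected_prev = block_size;
        cur += bytes;            /* keep walking to validate the whole page */
    }
    return found;
}
```

Note that the loop keeps going after it finds the matching entry, so a mismatch anywhere later in the page still causes the whole lookup to fail.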

Checked out The NT Insider digital edition yet?

Wednesday, August 18th, 2010

We’ve finally gone digital with The NT Insider! You can grab the PDF here and read about all sorts of interesting topics (writing file system filter drivers, debugger extensions, and virtual storage miniports, to name a few).

Undocumented !verifier flags value (!verifier 0x200)

Wednesday, April 14th, 2010

Starting with Windows Vista, Driver Verifier has been updated to include circular trace buffers for interesting events. My favorite up until this point has been the pool allocate and free log, which records the call stack, calling thread, and address of pool allocations and frees. If the system then crashes due to a double free or access to a freed pool block, the debugger’s !verifier 0x80 command can be used to dump the alloc/free log. Even better, the command takes an optional address value that will show only the allocations and frees of the pool block containing that address.

You can see the results in this example from the WinDBG docs:

0: kd> !verifier 80 a2b1cf20
Parsing 00004000 array entries, searching for address a2b1cf20.
=======================================
Pool block a2b1ce98, Size 00000168, Thread a2b1ce98
808f1be6 ndis!ndisFreeToNPagedPool+0x39
808f11c1 ndis!ndisPplFree+0x47
808f100f ndis!NdisFreeNetBufferList+0x3b
8088db41 NETIO!NetioFreeNetBufferAndNetBufferList+0xe
8c588d68 tcpip!UdpEndSendMessages+0xdf
8c588cb5 tcpip!UdpSendMessagesDatagramsComplete+0x22
8088d622 NETIO!NetioDereferenceNetBufferListChain+0xcf
8c5954ea tcpip!FlSendNetBufferListChainComplete+0x1c
809b2370 ndis!ndisMSendCompleteNetBufferListsInternal+0x67
808f1781 ndis!NdisFSendNetBufferListsComplete+0x1a
8c04c68e pacer!PcFilterSendNetBufferListsComplete+0xb2
809b230c ndis!NdisMSendNetBufferListsComplete+0x70
8ac4a8ba test1!HandleCompletedTxPacket+0xea
=======================================
Pool block a2b1ce98, Size 00000164, Thread a2b1ce98
822af87f nt!VerifierExAllocatePoolWithTagPriority+0x5d
808f1c88 ndis!ndisAllocateFromNPagedPool+0x1d
808f11f3 ndis!ndisPplAllocate+0x60
808f1257 ndis!NdisAllocateNetBufferList+0x26
80890933 NETIO!NetioAllocateAndReferenceNetBufferListNetBufferMdlAndData+0x14
8c5889c2 tcpip!UdpSendMessages+0x503
8c05c565 afd!AfdTLSendMessages+0x27
8c07a087 afd!AfdTLFastDgramSend+0x7d
8c079f82 afd!AfdFastDatagramSend+0x5ae
8c06f3ea afd!AfdFastIoDeviceControl+0x3c1
8217474f nt!IopXxxControlFile+0x268
821797a1 nt!NtDeviceIoControlFile+0x2a
8204d16a nt!KiFastCallEntry+0x127

In the output, the most recent event is at the top. Thus, here you can see that the buffer was allocated with ndisAllocateFromNPagedPool and later freed with ndisFreeToNPagedPool.
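
To make the idea of a circular trace buffer concrete, here is a toy version in C. It is not how Driver Verifier implements its log; the entry layout, the capacity, and the function names are all assumptions chosen for illustration. It records alloc/free events and, like the address filter described above, returns the events for the block containing a given address with the most recent first.

```c
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 8u   /* tiny on purpose; a real log holds far more */

enum log_event { LOG_ALLOC, LOG_FREE };

struct log_entry {
    enum log_event event;
    uintptr_t block;   /* base address of the pool block */
    size_t size;       /* size of the block in bytes */
};

struct trace_log {
    struct log_entry entries[LOG_CAPACITY];
    size_t next;       /* total number of events ever recorded */
};

/* Record an event, overwriting the oldest entry once the buffer is full. */
static void log_record(struct trace_log *tlog, enum log_event event,
                       uintptr_t block, size_t size)
{
    struct log_entry *e = &tlog->entries[tlog->next % LOG_CAPACITY];
    e->event = event;
    e->block = block;
    e->size = size;
    tlog->next++;
}

/* Copy events whose block contains `addr` into `out`, newest first.
 * Returns the number of events copied (at most `max`). */
static size_t log_query(const struct trace_log *tlog, uintptr_t addr,
                        struct log_entry *out, size_t max)
{
    size_t live = tlog->next < LOG_CAPACITY ? tlog->next : LOG_CAPACITY;
    size_t n = 0, i;

    for (i = 0; i < live && n < max; i++) {
        const struct log_entry *e =
            &tlog->entries[(tlog->next - 1 - i) % LOG_CAPACITY];
        if (addr >= e->block && addr < e->block + e->size) {
            out[n++] = *e;
        }
    }
    return n;
}
```

Because the buffer wraps, older events are silently lost once it fills, which is the trade-off that keeps the logging cost bounded.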

In addition to the pool allocation log, !verifier 0x100 shows the IRP log, which logs all IoCallDriver, IoCompleteRequest, and IoCancelIrp calls.

Based on the docs you’d think that’s all there is, but there’s an undocumented log that can be accessed with !verifier 0x200 and that is the critical region log.

This is not to be confused with the user mode concept of critical sections. In a driver, one can call KeEnterCriticalRegion and KeLeaveCriticalRegion in order to disable and re-enable APC delivery. Without getting too much into why a driver needs to disable APC delivery, what’s important to note is that every call to KeEnterCriticalRegion must be matched with a call to KeLeaveCriticalRegion. If a driver gets this wrong, then the system will crash with an APC_INDEX_MISMATCH bugcheck when it notices that the enter/exit count is off.

The way this works is that entering a critical region decrements a field of the KTHREAD structure and exiting a critical region increments the field of the structure. At various points in the O/S, the field of the KTHREAD is checked to make sure that it is zero. If it isn’t, then the system crashes with the previously mentioned APC_INDEX_MISMATCH bugcheck code. One such place that this is checked is in the system service dispatcher before returning back to the caller, which is why you’ll see these bugchecks come from KiSystemServiceExit.
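
The bookkeeping described above is small enough to model in a few lines of C. This is a user-mode model of the idea only; the structure, field, and function names are invented and do not match the real KTHREAD layout.

```c
#include <stdbool.h>

/* Stand-in for the per-thread state; the field name is hypothetical. */
struct sim_thread {
    int apc_disable_count;
};

/* Models entering a critical region: the count is decremented. */
static void sim_enter_critical_region(struct sim_thread *t)
{
    t->apc_disable_count--;
}

/* Models leaving a critical region: the count is incremented. */
static void sim_leave_critical_region(struct sim_thread *t)
{
    t->apc_disable_count++;
}

/* Models the check made on the way out of a system service: anything
 * other than zero is where the real system would bugcheck. */
static bool sim_exit_check_passes(const struct sim_thread *t)
{
    return t->apc_disable_count == 0;
}
```

A thread that enters twice and leaves once ends up with a count of -1, so the exit check fails long after the unbalanced call was made.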

What makes these crashes particularly difficult to track down is that the crash is a secondary failure: by the time the system notices that the count field is incorrect, the code that caused the bad state is long gone. Enter the critical region log, which traces every call to KeEnterCriticalRegion and KeLeaveCriticalRegion made by the verified drivers. Now, when the system crashes you can just type !verifier 0x200 in the debugger and find the mismatched call.

Note that this only works with Driver Verifier enabled, just another reason to make sure that you’re always testing with Verifier!

Great description of IRQL by Jake Oshins

Thursday, February 4th, 2010

Doron Holan’s blog has a guest post by Jake Oshins on IRQL that provides a nice summary of the concept:

http://blogs.msdn.com/doronh/archive/2010/02/02/what-is-irql.aspx

For those who aren’t aware, Jake has done lots of development work on the HAL and ACPI (amongst other things) so he’s the one that you want to talk to when it comes to core Windows concepts such as IRQL, interrupt handling, power management, etc.

The FullImageName parameter to an image load callback may be NULL

Tuesday, November 24th, 2009

I had an interesting crash on a system running Process Monitor from Sysinternals that implicated the Process Monitor driver in a NULL pointer dereference:

[Image: procmoncrash]

Here we see what appears to be an image load callback (as indicated by the call to PsCallImageNotifyRoutines on the stack) that is dereferencing a NULL pointer value contained in EDI.

Zooming into the assembly of the routine in question, we can see that the bad pointer value of EDI is retrieved a couple of lines up from EBP+8. Since we also can see that this is a standard EBP frame in the prolog, we can assume at this point that the NULL pointer deref occurred due to a failure to check the first parameter for NULL:

[Image: procmonasm]

At this point I went back to the PsSetLoadImageNotifyRoutine documentation to check the prototype of the callback function, hoping to find out what the first parameter of this routine was and whether it was documented as possibly being NULL. According to the documentation, the following is the prototype of the function:

VOID
(*PLOAD_IMAGE_NOTIFY_ROUTINE) (
    IN PUNICODE_STRING  FullImageName,
    IN HANDLE  ProcessId, // where image is mapped
    IN PIMAGE_INFO  ImageInfo
    );


Note that the first parameter is indicated to be a pointer to a UNICODE_STRING structure. This fits with the assembly lines listed above:

mov edi, dword ptr [ebp+8]

xor eax, eax

cmp word ptr [edi], ax

That assembly listing matches the following C code, because the first field of a UNICODE_STRING structure is a USHORT Length value:

if (FullImageName->Length == 0)

Thus, this routine assumes that if there is no image name available for the module being loaded, the first parameter to this routine will be a pointer to an empty UNICODE_STRING. Clearly in my case the first parameter passed in was NULL, thus I had to dig further back to determine if I was dealing with a corruption or if Process Monitor’s image load callback had a genuine bug.

Moving up a couple of frames, I located the code that gathers the name for the image being loaded and calls the dispatcher routine. Below is the highlighted assembly path that is of interest to us:

[Image: callerasm]

The highlighted sections correspond to the following four steps that ultimately lead to the crash:

1) Set the value of EBX to be zero, or NULL

2) Call an internal memory manager routine, MmGetFileNameForSection. While I’m not familiar with this routine, I suspect that it tries to get the file name of a section :) Upon return, compare the return value of the routine (EAX) with EBX, which we know to be zero.

3) Based on the compare, we will “jump if less than zero.” This assembly pattern basically checks a value to determine if it is less than zero and, if it is, jumps to a new location. Failure codes in the kernel are negative values, thus this is a fairly common pattern that corresponds to the following C statement:

status = SomeFunction();

if (!NT_SUCCESS(status)) {

}

Thus, if MmGetFileNameForSection returns a negative value (i.e. a failure code) we will jump.

4) The final highlighted section is the failure path that we take if MmGetFileNameForSection returns a failure code. In that case, a push of EBX is performed. This means that we will pass NULL as the first parameter to PsCallImageNotifyRoutines and not a pointer to an empty UNICODE_STRING structure.
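
The “jump if less than zero” test in step 3 works because NTSTATUS is a signed 32-bit value and failure codes have the high bit set. Here is a small stand-alone C illustration, using local stand-ins for the WDK definitions (the SIM_ names are invented so the snippet builds outside the WDK):

```c
#include <stdint.h>

typedef int32_t SIM_NTSTATUS;   /* NTSTATUS is a signed 32-bit value */

#define SIM_STATUS_SUCCESS      ((SIM_NTSTATUS)0x00000000)
#define SIM_STATUS_UNSUCCESSFUL ((SIM_NTSTATUS)0xC0000001u)

/* Same test that NT_SUCCESS performs: success means "not negative". */
#define SIM_NT_SUCCESS(status) (((SIM_NTSTATUS)(status)) >= 0)
```

Viewed as a signed value, 0xC0000001 is negative, so a single signed compare against zero is enough to separate the failure path from the success path.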

So, after all of that digging, the result is that this is a legitimate bug in the Process Monitor image load callback. The system that this crash occurred on had several other third party drivers present including two security products, thus I suspect that someone failed the attempt to get the image name for some reason or another. This triggered an untested path in the Process Monitor driver and led to the crash.

Note: If you try to follow the above assembly on your own system you might find different routines in play here as this code was rewritten for Vista and later. However, the bug is still real as the first parameter to the image load callback may be NULL on those platforms as well.
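
A defensive version of the check is straightforward. The sketch below uses a local stand-in for UNICODE_STRING so that it can be compiled in user mode; the type and function names are hypothetical, and a real callback would apply the same test to its FullImageName parameter.

```c
#include <stddef.h>
#include <stdbool.h>

/* Local stand-in with the same fields as UNICODE_STRING. */
struct sim_unicode_string {
    unsigned short Length;          /* length in bytes */
    unsigned short MaximumLength;
    const unsigned short *Buffer;
};

/* True only when a usable name was supplied. The pointer is checked
 * before Length is touched, which is the step the crashing code skipped. */
static bool image_name_available(const struct sim_unicode_string *name)
{
    return name != NULL && name->Buffer != NULL && name->Length != 0;
}
```

Treating NULL the same way as an empty string keeps the rest of the callback on a single, already tested path.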

Beware using user mode handles in a driver

Tuesday, November 17th, 2009

Driver Verifier has been updated in Win7 and several new checks have been added. One of the more interesting ones detects a driver using a user mode handle with kernel mode access. So, for example, take a handle from a user mode application and call ObReferenceObjectByHandle specifying KernelMode as the access mode. Prior to Windows 7 this would bypass any access checking and provide the underlying object pointer. On Windows 7 it will do the same without Verifier enabled, but with Verifier enabled you’ll receive a DRIVER_VERIFIER_DETECTED_VIOLATION (0xC4)/0xF6 bugcheck.

The reasons behind this check are, of course, security and reliability. The problem with user handles is that the user mode application has access to them as well as the driver. So, for example, the application could close the handle, or it could open and close other objects so that the handle value ends up referring to a different object, all while the driver is working with the handle.

Avoiding this crash in a driver is quite simple:

1) Always specifying OBJ_KERNEL_HANDLE in your OBJECT_ATTRIBUTES structures when creating a new handle. An alternative to this is to make sure that you always open your files in the context of the System process.

2) If you’re working with a user mode handle, always specify UserMode as the access mode parameter to any API that requires one.

One final thing that I’ll mention is that kernel handles have an interesting characteristic on Windows 2000 and Windows XP: IRP_MJ_CREATE and IRP_MJ_CLEANUP operations for kernel handles are sent in the context of the System process. Thus, if you specify OBJ_KERNEL_HANDLE and call ZwCreateFile, the I/O manager will call KeStackAttachProcess and force a switch into the System process before calling the target driver with an IRP_MJ_CREATE. Once the target driver completes the IRP, the I/O manager will switch back into the original process context. Later when you ZwClose the handle, the I/O manager will again switch back into the System process before sending the IRP_MJ_CLEANUP to the target driver. Any other operation against the handle arrives in the context of the requesting process, which can lead to some strange behavior if the target driver pays attention to such things.

!pool broken for Special Pool allocations

Wednesday, October 7th, 2009

Driver Verifier has a Special Pool option, which causes your pool allocations to come out of special pool and get all sorts of added checking. This includes things such as guard pages at the end of your allocations to catch buffer overruns, checks against accessing buffers after you free them, etc. Unfortunately the !pool command in WinDBG appears to be broken when given a special pool address on Windows XP. I haven’t had the chance to investigate further on newer O/S platforms, so it’s possible that there are also some issues there.

The extension command appears to have two issues:

1) The size shown as the allocation size is really the allocation size minus 8. Take for example a verified driver that does the following:

	a = ExAllocatePoolWithTag(NonPagedPool, 4, 'xxxx');
	b = ExAllocatePoolWithTag(NonPagedPool, 8, 'xxxx');
	c = ExAllocatePoolWithTag(NonPagedPool, 16, 'xxxx');

A !pool on a, b, and c shows the following (respectively):

1: kd> !pool 0x82a5aff8
Pool page 82a5aff8 region is Special pool
*82a5b000 size: fffffffc non-paged special pool, Tag is xxxx
	Owning component : Unknown (update pooltag.txt)
1: kd> !pool 0x824eeff8
Pool page 824eeff8 region is Special pool
*824ef000 size:    0 non-paged special pool, Tag is xxxx
	Owning component : Unknown (update pooltag.txt)
1: kd> !pool 0x82940ff0
Pool page 82940ff0 region is Special pool
*82940ff8 size:    8 non-paged special pool, Tag is xxxx
	Owning component : Unknown (update pooltag.txt)

2) The pool header addresses are incorrect. Take a in the example above:

1: kd> !pool 0x82a5aff8
Pool page 82a5aff8 region is Special pool
*82a5b000

That address given as the pool header is actually the guard page and not the header:

1: kd> dt nt!_pool_header 82a5b000
	+0x000 PreviousSize     : ??
	+0x000 PoolIndex        : ??
	+0x002 BlockSize        : ??
	+0x002 PoolType         : ??
	+0x000 Ulong1           : ??
	+0x004 ProcessBilled    : ????
	+0x004 PoolTag          : ??
	+0x004 AllocatorBackTraceIndex : ??
	+0x006 PoolTagHash      : ??
	Memory read error 82a5b006

The pool header in this case is at the address rounded down to the page size, not up:

1: kd> dt nt!_pool_header 0x82a5a000
	+0x000 PreviousSize     : 0y000000100 (0x4)
	+0x000 PoolIndex        : 0y0100000 (0x20)
	+0x002 BlockSize        : 0y001010001 (0x51)
	+0x002 PoolType         : 0y0000000 (0)
	+0x000 Ulong1           : 0x514004
	+0x004 ProcessBilled    : 0x78787878 _EPROCESS
	+0x004 PoolTag          : 0x78787878
	+0x004 AllocatorBackTraceIndex : 0x7878
	+0x006 PoolTagHash      : 0x7878

As you can see, the tag is the correct “xxxx” that we specified in the allocation:

1: kd> .formats 0x78787878
	Evaluate expression:
	Hex:     78787878
	Decimal: 2021161080
	Octal:   17036074170
	Binary:  01111000 01111000 01111000 01111000
	Chars:   xxxx
	Time:    Tue Jan 17 20:38:00 2034
	Float:   low 2.01583e+034 high 0
	Double:  9.98586e-315
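
Both observations are easy to check with a little arithmetic. The C sketch below models the symptoms rather than the extension’s actual code, and the function names are made up: the first helper reproduces the sizes that were displayed for the 4, 8, and 16 byte allocations, and the second computes where the pool header was actually found.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 0x1000u

/* Issue 1: the displayed size behaves like the requested size minus 8,
 * wrapping around as an unsigned 32-bit value. */
static uint32_t displayed_special_pool_size(uint32_t requested)
{
    return requested - 8u;
}

/* Issue 2: the pool header sits at the start of the page that contains
 * the allocation, so round the address down to the page size. */
static uint32_t page_round_down(uint32_t addr)
{
    return addr & ~(PAGE_SIZE_BYTES - 1u);
}
```

For the three allocations above this yields 0xfffffffc, 0, and 8, and page_round_down(0x82a5aff8) gives 0x82a5a000, the address where the valid header was dumped.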