Archive for the ‘Driver development’ Category

Checked out The NT Insider digital edition yet?

Wednesday, August 18th, 2010

We’ve finally gone digital with The NT Insider! You can grab the PDF here and read about all sorts of interesting topics (writing file system filter drivers, debugger extensions, and virtual storage miniports, to name a few).

Undocumented !verifier flags value (!verifier 0x200)

Wednesday, April 14th, 2010

Starting with Windows Vista, Driver Verifier has been updated to include circular trace buffers for interesting events. My favorite up until this point has been the pool allocate and free log, which records the call stack, calling thread, and address of pool allocations and frees. If the system then crashes due to a double free or access to a freed pool block, the debugger’s !verifier 0x80 command can be used to dump the alloc/free log. Even better, the command takes an optional address value that will show only the allocations and frees of the pool block containing that address.

You can see the results in this example from the WinDBG docs:

0: kd> !verifier 80 a2b1cf20
Parsing 00004000 array entries, searching for address a2b1cf20.
=======================================
Pool block a2b1ce98, Size 00000168, Thread a2b1ce98
808f1be6 ndis!ndisFreeToNPagedPool+0x39
808f11c1 ndis!ndisPplFree+0x47
808f100f ndis!NdisFreeNetBufferList+0x3b
8088db41 NETIO!NetioFreeNetBufferAndNetBufferList+0xe
8c588d68 tcpip!UdpEndSendMessages+0xdf
8c588cb5 tcpip!UdpSendMessagesDatagramsComplete+0x22
8088d622 NETIO!NetioDereferenceNetBufferListChain+0xcf
8c5954ea tcpip!FlSendNetBufferListChainComplete+0x1c
809b2370 ndis!ndisMSendCompleteNetBufferListsInternal+0x67
808f1781 ndis!NdisFSendNetBufferListsComplete+0x1a
8c04c68e pacer!PcFilterSendNetBufferListsComplete+0xb2
809b230c ndis!NdisMSendNetBufferListsComplete+0x70
8ac4a8ba test1!HandleCompletedTxPacket+0xea
=======================================
Pool block a2b1ce98, Size 00000164, Thread a2b1ce98
822af87f nt!VerifierExAllocatePoolWithTagPriority+0x5d
808f1c88 ndis!ndisAllocateFromNPagedPool+0x1d
808f11f3 ndis!ndisPplAllocate+0x60
808f1257 ndis!NdisAllocateNetBufferList+0x26
80890933 NETIO!NetioAllocateAndReferenceNetBufferListNetBufferMdlAndData+0x14
8c5889c2 tcpip!UdpSendMessages+0x503
8c05c565 afd!AfdTLSendMessages+0x27
8c07a087 afd!AfdTLFastDgramSend+0x7d
8c079f82 afd!AfdFastDatagramSend+0x5ae
8c06f3ea afd!AfdFastIoDeviceControl+0x3c1
8217474f nt!IopXxxControlFile+0x268
821797a1 nt!NtDeviceIoControlFile+0x2a
8204d16a nt!KiFastCallEntry+0x127

In the output, the most recent event is at the top. Thus, here you can see that the buffer was allocated with ndisAllocateFromNPagedPool and freed with ndisFreeToNPagedPool.
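To make the idea of a circular trace buffer concrete, here is a minimal user-mode sketch of one. It is a model only: the names (trace_log, log_record, log_find) and the tiny buffer size are hypothetical and say nothing about how Driver Verifier actually lays out its log. It shows the two properties described above: old entries are overwritten once the buffer wraps, and a lookup by address returns matching events newest first.

```c
#include <stddef.h>
#include <stdint.h>

#define LOG_ENTRIES 4u   /* tiny on purpose; must be a power of two */

typedef enum { EVENT_ALLOC, EVENT_FREE } event_type;

typedef struct {
    event_type type;
    uintptr_t  block;   /* start address of the pool block */
    size_t     size;    /* size of the pool block in bytes */
} log_entry;

typedef struct {
    log_entry entries[LOG_ENTRIES];
    uint32_t  next;     /* total number of events ever recorded */
} trace_log;

/* Record an event, overwriting the oldest entry once the log is full. */
static void log_record(trace_log *log, event_type type,
                       uintptr_t block, size_t size)
{
    log_entry *e = &log->entries[log->next & (LOG_ENTRIES - 1)];
    e->type  = type;
    e->block = block;
    e->size  = size;
    log->next++;
}

/* Copy the events whose block contains addr into out, newest first.
   Returns the number of entries written to out. */
static size_t log_find(const trace_log *log, uintptr_t addr,
                       log_entry *out, size_t max_out)
{
    uint32_t stored = log->next < LOG_ENTRIES ? log->next : LOG_ENTRIES;
    size_t found = 0;

    for (uint32_t i = 0; i < stored && found < max_out; i++) {
        const log_entry *e =
            &log->entries[(log->next - 1 - i) & (LOG_ENTRIES - 1)];
        if (addr >= e->block && addr < e->block + e->size)
            out[found++] = *e;
    }
    return found;
}
```

Recording an allocation of a block and then a free of the same block, and searching for any address inside it, returns the free first and the allocation second, which is the same ordering as the debugger output.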

In addition to the pool allocation log, !verifier 0x100 shows the IRP log, which logs all IoCallDriver, IoCompleteRequest, and IoCancelIrp calls.

Based on the docs you’d think that’s all there is, but there’s an undocumented log that can be accessed with !verifier 0x200 and that is the critical region log.

This is not to be confused with the user mode concept of critical sections. In a driver, one can call KeEnterCriticalRegion and KeLeaveCriticalRegion in order to disable and re-enable APC delivery. Without getting too much into why a driver needs to disable APC delivery, what’s important to note is that every call to KeEnterCriticalRegion must be matched with a call to KeLeaveCriticalRegion. If a driver gets this wrong, then the system will crash with an APC_INDEX_MISMATCH bugcheck when it notices that the enter/leave count is off.

The way this works is that entering a critical region decrements a field of the KTHREAD structure and exiting a critical region increments the field of the structure. At various points in the O/S, the field of the KTHREAD is checked to make sure that it is zero. If it isn’t, then the system crashes with the previously mentioned APC_INDEX_MISMATCH bugcheck code. One such place that this is checked is in the system service dispatcher before returning back to the caller, which is why you’ll see these bugchecks come from KiSystemServiceExit.
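The counting scheme is easy to model in user mode. The sketch below is not the real KTHREAD layout or the real kernel routines; model_thread, the model_* functions, and the buggy dispatch routine are all hypothetical stand-ins. It only illustrates why a missed leave call goes unnoticed until a later check finds the count non-zero.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-thread state; not the real KTHREAD. */
typedef struct {
    int apc_disable_count;   /* 0 means APC delivery is enabled */
} model_thread;

/* Entering decrements the count, leaving increments it. */
static void model_enter_critical_region(model_thread *t) { t->apc_disable_count--; }
static void model_leave_critical_region(model_thread *t) { t->apc_disable_count++; }

/* Models the check made at points such as system service exit. */
static bool model_count_is_balanced(const model_thread *t)
{
    return t->apc_disable_count == 0;
}

/* A buggy "driver" routine: the early-return path forgets to leave. */
static int model_buggy_dispatch(model_thread *t, bool fail_early)
{
    model_enter_critical_region(t);
    if (fail_early)
        return -1;                      /* BUG: missing leave call */
    model_leave_critical_region(t);
    return 0;
}
```

After the failing path runs, the routine has long since returned, yet the count is stuck at -1. Whoever performs the balance check next is the one that "crashes", which is exactly why the bugcheck points away from the code at fault.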

What makes these crashes particularly difficult to track down is that the crash is a secondary failure: by the time the system notices that the count field is incorrect, the code that caused the bad state is long gone. Enter the critical region log, which traces every call to KeEnterCriticalRegion and KeLeaveCriticalRegion made by the verified drivers. Now, when the system crashes you can just type !verifier 0x200 in the debugger and find the mismatched call.

Note that this only works with Driver Verifier enabled, which is just another reason to make sure that you’re always testing with Verifier!

Great description of IRQL by Jake Oshins

Thursday, February 4th, 2010

Doron Holan’s blog has a guest post by Jake Oshins on IRQL that provides a nice summary of the concept:

http://blogs.msdn.com/doronh/archive/2010/02/02/what-is-irql.aspx

For those who aren’t aware, Jake has done lots of development work on the HAL and ACPI (amongst other things) so he’s the one that you want to talk to when it comes to core Windows concepts such as IRQL, interrupt handling, power management, etc.

The FullImageName parameter to an image load callback may be NULL

Tuesday, November 24th, 2009

I had an interesting crash on a system running Process Monitor from Sysinternals that implicated the Process Monitor driver in a NULL pointer dereference:

procmoncrash

Here we see what appears to be an image load callback (as indicated by the call to PsCallImageNotifyRoutines on the stack) that is dereferencing a NULL pointer value contained in EDI.

Zooming into the assembly of the routine in question, we can see that the bad pointer value of EDI is retrieved a couple of lines up from EBP+8. Since we also can see that this is a standard EBP frame in the prolog, we can assume at this point that the NULL pointer deref occurred due to a failure to check the first parameter for NULL:

procmonasm

At this point I went back to the PsSetLoadImageNotifyRoutine documentation to check the prototype of the callback function, hoping to find out what the first parameter of this routine was and whether it was documented as possibly being NULL. According to the documentation, the following is the prototype of the function:

VOID
(*PLOAD_IMAGE_NOTIFY_ROUTINE) (
IN PUNICODE_STRING  FullImageName,
IN HANDLE  ProcessId, // where image is mapped
IN PIMAGE_INFO  ImageInfo
);


Note that the first parameter is indicated to be a pointer to a UNICODE_STRING structure. This fits with the assembly lines listed above:

mov edi, dword ptr [ebp+8]

xor eax, eax

cmp word ptr [edi], ax

That assembly listing matches the following C code, because the first field of a UNICODE_STRING structure is a USHORT Length value:

if (FullImageName->Length == 0)

Thus, this routine assumes that if there is no image name available for the module being loaded, the first parameter to this routine will be a pointer to an empty UNICODE_STRING. Clearly in my case the first parameter passed in was NULL, thus I had to dig further back to determine if I was dealing with a corruption or if Process Monitor’s image load callback had a genuine bug.

Moving up a couple of frames, I located the code that gathers the name for the image being loaded and calls the dispatcher routine. Below is the highlighted assembly path that is of interest to us:

callerasm

The highlighted sections correspond to the following four steps that ultimately lead to the crash:

1) Set the value of EBX to be zero, or NULL

2) Call an internal memory manager routine, MmGetFileNameForSection. While I’m not familiar with this routine, I suspect that it tries to get the file name of a section :) Upon return, compare the return value of the routine (EAX) with EBX, which we know to be zero.

3) Based on the compare, we will “jump if less than zero.” This assembly pattern basically checks a value to determine if it is less than zero and, if it is, jumps to a new location. Failure codes in the kernel are negative values, thus this is a fairly common pattern that corresponds to the following C statement:

status = SomeFunction();

if (!NT_SUCCESS(status)) {

}

Thus, if MmGetFileNameForSection returns a negative value (i.e. a failure code) we will jump.

4) The final highlighted section is the failure path that we take if MmGetFileNameForSection returns a failure code. In that case, a push of EBX is performed. This means that we will pass NULL as the first parameter to PsCallImageNotifyRoutines and not a pointer to an empty UNICODE_STRING structure.
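The "jump if less than zero" pattern in step 3 falls directly out of how NT_SUCCESS is defined. The sketch below restates the macro in user mode so that it compiles on its own. The typedef and the two status values mirror the WDK definitions, while takes_failure_jump is a hypothetical helper that spells out the signed comparison the compiler emits.

```c
#include <stdint.h>

typedef int32_t NTSTATUS;   /* NTSTATUS is a signed 32-bit value */

/* Same shape as the WDK macro: any non-negative status is a success. */
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)

#define STATUS_SUCCESS      ((NTSTATUS)0x00000000)
#define STATUS_UNSUCCESSFUL ((NTSTATUS)0xC0000001)

/* Hypothetical helper: the signed "less than zero" test that the
   compare-and-jump sequence in the disassembly implements. */
static int takes_failure_jump(NTSTATUS status)
{
    return status < 0;
}
```

Error statuses have the high bit set, so they are negative when viewed as a signed value, and !NT_SUCCESS(status) and "jump if less than zero" are the same test.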

So, after all of that digging, the result is that this is a legitimate bug in the Process Monitor image load callback. The system that this crash occurred on had several other third party drivers present including two security products, thus I suspect that someone failed the attempt to get the image name for some reason or another. This triggered an untested path in the Process Monitor driver and led to the crash.

Note: If you try to follow the above assembly on your own system you might find different routines in play here as this code was rewritten for Vista and later. However, the bug is still real as the first parameter to the image load callback may be NULL on those platforms as well.
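For anyone writing their own image load callback, the defensive check looks something like the sketch below. It is written as a user-mode model so it can be compiled anywhere: MODEL_UNICODE_STRING mirrors the documented UNICODE_STRING field order, and model_have_image_name is a hypothetical helper, not anything from Process Monitor or the WDK.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-mode stand-in mirroring the documented UNICODE_STRING layout. */
typedef struct {
    unsigned short  Length;          /* in bytes, not characters */
    unsigned short  MaximumLength;   /* in bytes */
    unsigned short *Buffer;          /* 16-bit characters, may be NULL */
} MODEL_UNICODE_STRING;

/* Returns true only when a usable image name was supplied. */
static bool model_have_image_name(const MODEL_UNICODE_STRING *FullImageName)
{
    if (FullImageName == NULL)       /* the check the crashing code lacked */
        return false;

    return FullImageName->Length != 0 && FullImageName->Buffer != NULL;
}
```

The point is simply that the NULL test has to come before the Length test. Checking only for an empty string handles the case the original code anticipated, but not the one the failure path actually produces.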

Beware using user mode handles in a driver

Tuesday, November 17th, 2009

Driver Verifier has been updated in Win7 and several new checks have been added. One of the more interesting checks is the check for accessing user mode handles for kernel mode access. So, for example, take a handle from a user mode application and call ObReferenceObjectByHandle specifying KernelMode as the access mode. Prior to Windows 7 this would bypass any access checking and provide the underlying object pointer. On Windows 7 it will do the same without Verifier enabled, but with Verifier enabled you’ll receive a DRIVER_VERIFIER_DETECTED_VIOLATION (0xC4)/0xF6 bugcheck.

The reasons behind this check are, of course, security and reliability. The problem with user handles is that the user mode application has access to them as well as the driver. So, for example, the application could close the handle while the driver is working with it, or it could close the handle and open something else so that the same handle value now maps to a different object.

Fixing this crash in a driver is quite simple:

1) Always specifying OBJ_KERNEL_HANDLE in your OBJECT_ATTRIBUTES structures when creating a new handle. An alternative to this is to make sure that you always open your files in the context of the System process.

2) If you’re working with a user mode handle, always specify UserMode as the access mode parameter to any API that requires one.

One final thing that I’ll mention is that kernel handles have an interesting characteristic on Windows 2000 and Windows XP: IRP_MJ_CREATE and IRP_MJ_CLEANUP operations for kernel handles are sent in the context of the System process. Thus, if you specify OBJ_KERNEL_HANDLE and call ZwCreateFile, the I/O manager will call KeStackAttachProcess and force a switch into the System process before calling the target driver with an IRP_MJ_CREATE. Once the target driver completes the IRP, the I/O manager will switch back into the original process context. Later when you ZwClose the handle, the I/O manager will again switch back into the System process before sending the IRP_MJ_CLEANUP to the target driver. Any other operation against the handle arrives in the context of the requesting process, which can lead to some strange behavior if the target driver pays attention to such things.

!pool broken for Special Pool allocations

Wednesday, October 7th, 2009

Driver Verifier has a Special Pool option, which causes your pool allocations to come out of special pool and get all sorts of added checking. This includes things such as guard pages at the end of your allocations to catch buffer overruns, checks against accessing buffers after you free them, etc. Unfortunately the !pool command in WinDBG appears to be broken when given a special pool address on Windows XP. I haven’t had the chance to investigate further on newer O/S platforms, so it’s possible that there are also some issues there.

The extension command appears to have two issues:

1) The size shown as the allocation size is really the allocation size minus 8. Take for example a verified driver that does the following:

	a = ExAllocatePoolWithTag(NonPagedPool, 4, 'xxxx');
	b = ExAllocatePoolWithTag(NonPagedPool, 8, 'xxxx');
	c = ExAllocatePoolWithTag(NonPagedPool, 16, 'xxxx');

A !pool on a, b, and c shows the following (respectively):

1: kd> !pool 0x82a5aff8
Pool page 82a5aff8 region is Special pool
*82a5b000 size: fffffffc non-paged special pool, Tag is xxxx
	Owning component : Unknown (update pooltag.txt)
1: kd> !pool 0x824eeff8
Pool page 824eeff8 region is Special pool
*824ef000 size:    0 non-paged special pool, Tag is xxxx
	Owning component : Unknown (update pooltag.txt)
1: kd> !pool 0x82940ff0
Pool page 82940ff0 region is Special pool
*82940ff8 size:    8 non-paged special pool, Tag is xxxx
	Owning component : Unknown (update pooltag.txt)

2) The pool header addresses are incorrect. Take a in the example above:

1: kd> !pool 0x82a5aff8
Pool page 82a5aff8 region is Special pool
*82a5b000

That address given as the pool header is actually the guard page and not the header:

1: kd> dt nt!_pool_header 82a5b000
	+0x000 PreviousSize     : ??
	+0x000 PoolIndex        : ??
	+0x002 BlockSize        : ??
	+0x002 PoolType         : ??
	+0x000 Ulong1           : ??
	+0x004 ProcessBilled    : ????
	+0x004 PoolTag          : ??
	+0x004 AllocatorBackTraceIndex : ??
	+0x006 PoolTagHash      : ??
	Memory read error 82a5b006

The pool header in this case is at the address rounded down to the page boundary, not rounded up:

1: kd> dt nt!_pool_header 0x82a5a000
	+0x000 PreviousSize     : 0y000000100 (0x4)
	+0x000 PoolIndex        : 0y0100000 (0x20)
	+0x002 BlockSize        : 0y001010001 (0x51)
	+0x002 PoolType         : 0y0000000 (0)
	+0x000 Ulong1           : 0x514004
	+0x004 ProcessBilled    : 0x78787878 _EPROCESS
	+0x004 PoolTag          : 0x78787878
	+0x004 AllocatorBackTraceIndex : 0x7878
	+0x006 PoolTagHash      : 0x7878

As you can see, the tag is the correct “xxxx” that we specified in the allocation:

1: kd> .formats 0x78787878
	Evaluate expression:
	Hex:     78787878
	Decimal: 2021161080
	Octal:   17036074170
	Binary:  01111000 01111000 01111000 01111000
	Chars:   xxxx
	Time:    Tue Jan 17 20:38:00 2034
	Float:   low 2.01583e+034 high 0
	Double:  9.98586e-315
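The numbers in the output above can be reproduced with a few lines of arithmetic. This is a sketch of the observed behavior on this XP system only, not of how !pool is implemented. The function names are hypothetical and the 4K page size is the x86 value.

```c
#include <stdint.h>

#define PAGE_SIZE_X86 0x1000u

/* The size !pool printed above: the requested size minus 8,
   computed as an unsigned 32-bit value (so 4 wraps to fffffffc). */
static uint32_t reported_size(uint32_t requested)
{
    return requested - 8u;
}

/* Where the pool header was actually found in the example:
   the start of the page that contains the allocation. */
static uint32_t header_address(uint32_t allocation)
{
    return allocation & ~(PAGE_SIZE_X86 - 1u);
}

/* Decode a pool tag value into its four characters, lowest byte first. */
static void tag_to_chars(uint32_t tag, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((tag >> (8 * i)) & 0xffu);
    out[4] = '\0';
}
```

Feeding in the request sizes 4, 8, and 16 gives fffffffc, 0, and 8, matching the three !pool outputs, and rounding 0x82a5aff8 down to its page gives 0x82a5a000, where the dt command found a valid header.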

Shutting off OACR

Wednesday, August 12th, 2009

Finally had a chance to install the RTM Win7 WDK and start dealing with Office Automated Code Review (OACR). I’ve had to deal with OACR before and never was a fan, but now that it’s an integrated part of my day to day life I find it more distracting than helpful. It’s not Prefast (sorry, Prefast) that I don’t like, it’s being forced to deal with a constantly changing icon in my tray and the barrage of pop up balloons (complete with sound notifications!):

oacrpop

My immediate question is, of course, how do I make this go away? There are in fact a couple of reasonable options that you can use to get rid of it either permanently or temporarily.

If you want to remove it permanently, the first option is to follow the instructions described in this MSDN article. This will involve either editing your WDK build environment shortcuts or changing an oacr.ini file.

Alternatively, there is a way to disable OACR from the command line within a build environment. The first step is to turn OACR off for your builds, which is done using the particularly cryptic command oacr set off. This will cause OACR to be bypassed for subsequent builds, allowing you to avoid being pestered for anything you build in this window:

oacroff

From here, if you also want to kill the tray icon you can execute oacr stop and OACR should go away completely. Note that the oacr set off command is important: if you only do oacr stop, OACR will come right back the next time you invoke build.

Finding a paged out buffer when you need one

Sunday, July 26th, 2009

One day I wanted to demonstrate the !pte command on a page that had been paged out to a paging file. The platform in question was an XP SP3 dump and I was slightly on the spot, so I had to quickly think of a way to find a page that was actually paged out.

I immediately thought to crawl through the paged pool range in the kernel virtual address space, running !pte on each page. I could do this with something like .for or simply manually walk every 4K until I hit a paged out page.

Because I was feeling lucky (and didn’t want to look up the .for syntax), I decided to try manually walking but quickly found that tedious and I didn’t get the results that I was looking for. Everything I hit was paged in, as indicated by the valid bit being set:

allvalid

So, I needed something else that might provide a quicker solution. Then I thought about paged code sections in a driver image. By default, all driver code (and data) is non-pageable. However, a driver writer is able to use a pragma directive to put code into the PAGE section of its image.

Code in the PAGE section of a driver is subject to being paged out. In addition, the memory manager pages this code to a paging file instead of retrieving it again from the image file (like in most user mode application scenarios). You can see an example of the pragma usage in the FASTFAT source from the WDK:

fatpragma

Being reminded of this, I decided to grab a Microsoft supplied driver and check for a PAGE section in the PE header, since MS supplied drivers often utilize this to decrease their memory footprint in the system. For no particular reason, I went with fltmgr.sys, which is the binary for Filter Manager. This did indeed reveal a code section named PAGE in the image:

pagesect

This gave me a new, smaller space to search for a page paged out to a paging file. So, I gave it a shot and lucked out immediately:

pagedout
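What !pte is telling you can be boiled down to a handful of bit tests. The sketch below assumes the x86 software PTE layout described in Windows Internals, where bit 0 is the valid bit and bits 10 and 11 are the prototype and transition bits of an invalid PTE. The enum and function names are hypothetical, and the code deliberately does not try to tell a pagefile PTE from a demand-zero one, since that needs the page file offset field whose position differs between PAE and non-PAE kernels.

```c
#include <stdint.h>

typedef enum {
    PTE_VALID,                    /* page is resident, hardware uses the PTE */
    PTE_PROTOTYPE,                /* points at a prototype PTE */
    PTE_TRANSITION,               /* still in memory, on a standby/modified list */
    PTE_PAGEFILE_OR_DEMAND_ZERO   /* in a paging file, or never touched */
} pte_kind;

/* Assumed bit positions (x86, from Windows Internals). */
#define PTE_VALID_BIT      (1ull << 0)
#define PTE_PROTOTYPE_BIT  (1ull << 10)
#define PTE_TRANSITION_BIT (1ull << 11)

/* Classify a raw PTE value in the order the memory manager interprets it. */
static pte_kind classify_pte(uint64_t pte)
{
    if (pte & PTE_VALID_BIT)
        return PTE_VALID;
    if (pte & PTE_PROTOTYPE_BIT)
        return PTE_PROTOTYPE;
    if (pte & PTE_TRANSITION_BIT)
        return PTE_TRANSITION;
    return PTE_PAGEFILE_OR_DEMAND_ZERO;
}
```

Everything I hit while walking paged pool by hand fell into the first bucket. The page I was hunting for is one that lands in the last bucket with a non-zero page file offset.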

PDB Paths and the PE Debug Directory

Sunday, July 26th, 2009

You might have noticed that WinDBG sometimes “knows” where to find the PDB for your driver even if you have never set your Symbol Search Path to point to your object directory. Spooky, and it certainly begs the question of how WinDBG knows where to look for your PDB.

The answer is in the debug directory of your driver’s PE image. The debug directory can contain, well, debug information, or can point off to a file containing debug information. This is how WinDBG can so easily find your PDB: by default, the fully qualified path (FQP) to the PDB is found in the debug directory. If you want to see the path, you can do so with the !dh command, which will parse the PE header of the image and show you the interpreted results. Just execute !dh modulename in the debugger and scroll down to the Debug Directories section:

dbgdir

You can even leverage this to find information about drivers that you don’t own. I just checked the loaded module list on my current machine and had a driver loaded that I didn’t recognize: NETw4x32.sys. Doing a !dh on that module tells me that the FQP is:

c:\sandbox\94922\win_driver\miniport\ndis5x\objfre_wxp_x86\i386\NETw4x32.pdb

So, it’s the free build of an NDIS 5.x miniport driver, which isn’t bad for information gleaned from two seconds of effort.
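If you want to pull the path out yourself, the CodeView record that the debug directory entry points to is simple to parse for modern PDBs: a 4-byte "RSDS" signature, a 16-byte GUID, a 4-byte age, and then the NUL-terminated PDB path. The sketch below works on a record that has already been located and copied into memory. Finding it through the PE headers is left out, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 4-byte "RSDS" signature + 16-byte GUID + 4-byte age */
#define CV_RSDS_HEADER_SIZE 24u

/* Returns a pointer to the NUL-terminated PDB path inside the record,
   or NULL if the buffer is not a well-formed RSDS record. */
static const char *rsds_pdb_path(const uint8_t *record, size_t size)
{
    if (record == NULL || size <= CV_RSDS_HEADER_SIZE)
        return NULL;
    if (memcmp(record, "RSDS", 4) != 0)
        return NULL;

    const char *path = (const char *)record + CV_RSDS_HEADER_SIZE;

    /* Refuse a path that is not terminated inside the record. */
    if (memchr(path, '\0', size - CV_RSDS_HEADER_SIZE) == NULL)
        return NULL;
    return path;
}

/* Returns just the file name portion of a PDB path. */
static const char *pdb_file_name(const char *path)
{
    const char *slash = strrchr(path, '\\');
    return slash ? slash + 1 : path;
}
```

The second helper shows what survives once the directory portion is gone: only the bare PDB file name, with nothing about the build machine or source tree.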

The problem with those FQPs is that they can contain information that you don’t want to share with the world. Build machine names, developer names, etc. may show up in the path and you might not want people outside your company to have access to that info. Luckily, there is a way to strip the path information out of your image files and that is exactly what Microsoft does with their binaries. For example, take the kernel on my system using the info from !dh nt:

dbgnt

The secret to making this work is the binplace tool, which exists mostly to move binary outputs of the build process to various locations. However, it has the added features of generating public PDB files as well as stripping the path information from your released image files.

Getting binplace integrated to your build is another post for another day, but details can be found on MSDN if you’re interested: http://msdn.microsoft.com/en-us/library/ms791453.aspx

Owning Process vs Attached Process

Thursday, July 16th, 2009

A change was made to Windows around the Server 2003 timeframe that can make for some confusing information in the !thread output. Specifically, I’m referring to the Owning Process and Attached Process fields:

ownatt

The above output is from an XP machine and indicates that no information is available for the process that currently owns this thread, but the thread is attached to the System process. Let’s compare that with a similar thread from a Win7 machine:

win7ownatt

On the Win7 machine, the information is switched! So, this indicates to us that the owning process is the System process and the thread is not attached to any process. What’s the deal? Are these threads running under different circumstances or is the debugger showing us bogus information?

Well, what happened is that changes have been made to the KTHREAD and WinDBG was modified to reflect the new changes. This ends up making the output a little misleading on older platforms.

Prior to Server 2003, the kernel only tracked the attached process of the thread. Threads start out attached to the process that they were created for, but the O/S and drivers are able to temporarily attach a thread to another process with KeStackAttachProcess. While attached to another process, a thread runs under the context of that process and is able to reference that process’s handles, virtual address space, etc. Because the KTHREAD didn’t have a field that pointed directly to the process that the thread was originally created for, there was only one process to show in the !thread output. This is reflected in the example output shown in the !thread documentation, which was clearly captured from an older debugger version that made no mention of the Attached Process:

2kownt

On Server 2003 and later, the KTHREAD keeps track of the original process that the thread was created for. This is what !thread is trying to leverage, showing both the Owning Process (process that the thread was originally created for) and Attached Process (the process under which the thread is currently running).

If you run !thread on a Server 2003 or later target, the Owning Process will show the process that the thread was created for and Attached Process will be NULL if the thread isn’t currently running under a different process context.

However, if you run !thread on XP, the extension will get an error when trying to get the owning process field from the KTHREAD. The way that !thread reports this to you is by showing <Unknown> in the output. The extension will then check the attached process info and get a valid result (since that’s all XP tracks). The output will then show the Attached Process information as being valid.

It’s also interesting to note that, as the name implies, KeStackAttachProcess calls can be nested. Unfortunately, no version of Windows has !thread output that tries to traverse the “stack” of processes that the thread is currently attached to. They all just show the current one.
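As a way of keeping the two fields straight, here is a small user-mode model of the Server 2003 and later behavior. It is purely illustrative. The real kernel saves the previous attach state in a structure supplied by the caller of KeStackAttachProcess rather than in an array on the thread, and every name below is hypothetical.

```c
#include <stddef.h>

#define MAX_ATTACH_DEPTH 8

typedef struct { const char *name; } model_process;

typedef struct {
    const model_process *owning;                      /* process the thread was created for */
    const model_process *attached[MAX_ATTACH_DEPTH];  /* nested attaches, oldest first */
    size_t               depth;                       /* number of active attaches */
} model_thread_ctx;

/* Push a new attach; returns 0 on success, -1 if the model's stack is full. */
static int model_attach(model_thread_ctx *t, const model_process *p)
{
    if (t->depth == MAX_ATTACH_DEPTH)
        return -1;
    t->attached[t->depth++] = p;
    return 0;
}

/* Pop the most recent attach; does nothing if the thread is not attached. */
static void model_detach(model_thread_ctx *t)
{
    if (t->depth > 0)
        t->depth--;
}

/* What the Attached Process field corresponds to in this model:
   the innermost attach, or NULL when the thread is not attached. */
static const model_process *model_attached_process(const model_thread_ctx *t)
{
    return t->depth ? t->attached[t->depth - 1] : NULL;
}
```

In this model the owning process never changes, the attached process is NULL until the first attach, and after two nested attaches only the innermost one is visible, which is all that !thread reports.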