Archive for November, 2009

The FullImageName parameter to an image load callback may be NULL

Tuesday, November 24th, 2009

I had an interesting crash on a system running Process Monitor from Sysinternals that implicated the Process Monitor driver in a NULL pointer dereference:

[image: procmoncrash]

Here we see what appears to be an image load callback (as indicated by the call to PsCallImageNotifyRoutines on the stack) that is dereferencing a NULL pointer value contained in EDI.

Zooming into the assembly of the routine in question, we can see that the bad pointer value in EDI is loaded a couple of lines up from EBP+8. Since the prolog sets up a standard EBP frame, EBP+8 holds the first parameter passed on the stack, so we can assume at this point that the NULL pointer dereference occurred due to a failure to check the first parameter for NULL:

[image: procmonasm]

At this point I went back to the PsSetLoadImageNotifyRoutine documentation to check the prototype of the callback function, hoping to find out what the first parameter of this routine was and whether it was documented as possibly being NULL. According to the documentation, the following is the prototype of the function:

VOID
(*PLOAD_IMAGE_NOTIFY_ROUTINE) (
IN PUNICODE_STRING  FullImageName,
IN HANDLE  ProcessId, // where image is mapped
IN PIMAGE_INFO  ImageInfo
);


Note that the first parameter is indicated to be a pointer to a UNICODE_STRING structure. This fits with the assembly lines listed above:

mov edi, dword ptr [ebp+8]

xor eax, eax

cmp word ptr [edi], ax

That assembly listing matches the following C code, because the first field of a UNICODE_STRING structure is a USHORT Length value:

if (FullImageName->Length == 0)

Thus, this routine assumes that if there is no image name available for the module being loaded, the first parameter to this routine will be a pointer to an empty UNICODE_STRING. Clearly in my case the first parameter passed in was NULL, thus I had to dig further back to determine if I was dealing with a corruption or if Process Monitor’s image load callback had a genuine bug.

Moving up a couple of frames, I located the code that gathers the name for the image being loaded and calls the dispatcher routine. Below is the highlighted assembly path that is of interest to us:

[image: callerasm]

The highlighted sections correspond to the following four steps that ultimately lead to the crash:

1) Set the value of EBX to be zero, or NULL

2) Call an internal memory manager routine, MmGetFileNameForSection. While I’m not familiar with this routine, I suspect that it tries to get the file name of a section :) Upon return, compare the return value of the routine (EAX) with EBX, which we know to be zero.

3) Based on the compare, we will “jump if less than zero.” This assembly pattern basically checks a value to determine if it is less than zero and, if it is, jumps to a new location. Failure codes in the kernel are negative values, thus this is a fairly common pattern that corresponds to the following C statement:

status = SomeFunction();

if (!NT_SUCCESS(status)) {
    /* failure path: status is negative */
}

Thus, if MmGetFileNameForSection returns a negative value (i.e. a failure code) we will jump.

4) The final highlighted section is the failure path that we take if MmGetFileNameForSection returns a failure code. In that case, a push of EBX is performed. This means that we will pass NULL as the first parameter to PsCallImageNotifyRoutines and not a pointer to an empty UNICODE_STRING structure.
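The four steps above can be sketched in plain C. This is a user mode model of the control flow read from the disassembly, not the kernel's actual source: the names ending in _SKETCH are stand-ins for the WDK types so that the fragment compiles anywhere, and name_argument_for_callback is a hypothetical helper invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the WDK definitions (assumptions, so this builds in user mode). */
typedef int32_t NTSTATUS_SKETCH;
#define NT_SUCCESS_SKETCH(s) (((NTSTATUS_SKETCH)(s)) >= 0)  /* failure codes are negative */

typedef struct {
    uint16_t  Length;         /* first field, hence "cmp word ptr [edi], ax" */
    uint16_t  MaximumLength;
    uint16_t *Buffer;
} UNICODE_STRING_SKETCH;

/* Hypothetical helper: returns the pointer that would be pushed as the
   first argument of the notify call for a given name lookup result. */
const UNICODE_STRING_SKETCH *
name_argument_for_callback(NTSTATUS_SKETCH lookup_status,
                           const UNICODE_STRING_SKETCH *looked_up_name)
{
    const UNICODE_STRING_SKETCH *name = NULL;   /* step 1: EBX = 0 */

    if (!NT_SUCCESS_SKETCH(lookup_status)) {    /* steps 2 and 3: compare, jump if less */
        return name;                            /* step 4: NULL is passed on */
    }

    name = looked_up_name;                      /* success path: a real name is passed */
    return name;
}
```

The point of the model is that the failure path hands the callback a NULL pointer rather than a pointer to an empty string.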

So, after all of that digging, the result is that this is a legitimate bug in the Process Monitor image load callback. The system that this crash occurred on had several other third party drivers present, including two security products, thus I suspect that one of them caused the attempt to get the image name to fail for some reason or another. This triggered an untested path in the Process Monitor driver and led to the crash.

Note: If you try to follow the above assembly on your own system you might find different routines in play here as this code was rewritten for Vista and later. However, the bug is still real as the first parameter to the image load callback may be NULL on those platforms as well.
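For anyone writing their own image load callback, a defensive version of the name check might look like the following. It is only a sketch: UNICODE_STRING_SKETCH is a user mode stand-in for the real UNICODE_STRING, and image_name_is_missing is a hypothetical helper name, not something taken from the Process Monitor driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the WDK UNICODE_STRING (assumption, so this builds in user mode). */
typedef struct {
    uint16_t  Length;
    uint16_t  MaximumLength;
    uint16_t *Buffer;
} UNICODE_STRING_SKETCH;

/* Returns 1 when the callback has no usable image name, 0 otherwise.
   The NULL test comes first so that Length is never read through a NULL pointer. */
int image_name_is_missing(const UNICODE_STRING_SKETCH *FullImageName)
{
    return FullImageName == NULL ||
           FullImageName->Length == 0 ||
           FullImageName->Buffer == NULL;
}
```

A callback would call this helper before touching the name and fall back to whatever "unknown image" handling it already has.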

Getting rid of “Page <> not present in the dump file.” errors

Thursday, November 19th, 2009

If you’ve ever run a command that does a full memory scan against a kernel summary dump you’ve no doubt had to deal with the inevitable flood of this warning message. For example, here I try to run !irpfind against a kernel summary dump:

[image: notpresent]

The warning message output is so annoying that you spend more time wading through the lines with the warning than you do looking at useful data.

Luckily, there’s a solution: the .ignore_missing_pages command. Specify a parameter of 1 and this message will never bother you again (where “never” is either the end of the world or the next time you open WinDBG):

.ignore_missing_pages 1

If you want the message back for any twisted reason then you just need to specify a parameter of 0 to get the default behavior back:

.ignore_missing_pages 0

Process directory table base doesn’t match CR3

Wednesday, November 18th, 2009

You might occasionally have seen this error when opening a crash dump file:

WARNING: Process directory table base <address> doesn't match CR3 <address>

What does it mean and why does it happen?

The answer to what it means lies in virtual memory. The page directory table is the term used for the base of the virtual memory tables that describe the address space of a process. When you dereference any virtual address on a processor, the processor retrieves the base of the virtual memory tables from the CR3 register on the processor. From there, the processor can walk the tables and translate the virtual address to the underlying physical address and retrieve the requested memory.
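To make the table walk a little more concrete, here is a sketch of how a virtual address is split up before the walk. It assumes the classic 32-bit x86 layout without PAE and with 4 KB pages; other paging modes split the address differently. VA_PARTS_SKETCH and split_virtual_address are names made up for this example.

```c
#include <stdint.h>

/* Assumption: 32-bit x86 paging, no PAE, 4 KB pages. */
typedef struct {
    uint32_t directory_index;  /* bits 31..22: entry in the page directory that CR3 points at */
    uint32_t table_index;      /* bits 21..12: entry in the page table found there */
    uint32_t byte_offset;      /* bits 11..0 : offset inside the 4 KB page */
} VA_PARTS_SKETCH;

/* Splits a virtual address into the three pieces used by the walk. */
VA_PARTS_SKETCH split_virtual_address(uint32_t va)
{
    VA_PARTS_SKETCH parts;
    parts.directory_index = va >> 22;
    parts.table_index     = (va >> 12) & 0x3FFu;
    parts.byte_offset     = va & 0xFFFu;
    return parts;
}
```

For instance, 0x80123456 splits into directory index 0x200, table index 0x123 and byte offset 0x456. Change CR3 and the same three numbers lead to a different physical page.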

In Windows, each process has its own address space where the lower half of the virtual address space is the application code and data and the upper half of the address space is the operating system code and data, the drivers, the executive pools (sounds posh), etc. In order to switch between process address spaces, all that Windows has to do is switch the CR3 register from the base of process A’s tables to the base of process B’s tables. From there on the processor will decode all virtual addresses relative to the new process’ tables.

Thus, Windows keeps track of the base of each process’ tables inside the per-process data structure. This can be seen as the DirBase in the !process output:

[image: dirbase]

Note that the address here is the physical address of the page directory table base. This is the exact value that will be put into the CR3 register on the processor when using this process’ address space.

Now that we have that out of the way, we can get back to the error message:

WARNING: Process directory table base <address> doesn't match CR3 <address>

What happens here is that the debugger looks at the current thread on the processor and then retrieves the containing process. From there it retrieves the DirBase value and compares it against the CR3 register. If the two don’t match, then you receive this warning when opening the dump. This can indicate that the dump file is corrupted somehow, but generally is a result of one of two things:

1) A crash occurring inside the process switching code. Switching the current thread and loading the CR3 register are not a single atomic operation, thus there are transient states where the current thread references a different process than the CR3 register. Sounds scary, but the O/S is OK during this state since the O/S address space is the same across all processes.

2) A crash dump generated with one of the “live dump” utilities, such as Windd (http://pangowings.msuiche.net/) or LiveKD (http://technet.microsoft.com/en-us/sysinternals/bb897415.aspx). These tools provide an “in flight” snapshot of the system and thus can collect inconsistent information.
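The transient state from the first case can be modelled in a few lines. This is a toy model under loud assumptions: the structure, the function names and the order of the two steps are all invented for illustration, and the plain equality test is only a guess at the comparison the debugger performs.

```c
#include <stdint.h>

/* Toy model of the two pieces of state the debugger compares. */
typedef struct {
    uint32_t current_process_dirbase;  /* DirBase of the process owning the current thread */
    uint32_t cr3;                      /* value loaded in the CR3 register */
} CPU_STATE_SKETCH;

/* The comparison behind the warning: 1 when the two values agree. */
int dirbase_matches_cr3(const CPU_STATE_SKETCH *cpu)
{
    return cpu->current_process_dirbase == cpu->cr3;
}

/* First half of a hypothetical switch: the current thread now belongs to the new process. */
void switch_current_thread(CPU_STATE_SKETCH *cpu, uint32_t new_dirbase)
{
    cpu->current_process_dirbase = new_dirbase;
}

/* Second half: CR3 is reloaded with the new process' table base. */
void switch_address_space(CPU_STATE_SKETCH *cpu, uint32_t new_dirbase)
{
    cpu->cr3 = new_dirbase;
}
```

A dump taken between the two calls sees a mismatch, while a dump taken before or after sees consistent values.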

This warning is important to note if you’re trying to look at per-process state, as you may be looking at process memory using the wrong set of page tables. You’ll need to make sure that you manually switch into whatever process you want to look at (for example with the .process command), even if it’s the current process at the time of the dump.

Beware using user mode handles in a driver

Tuesday, November 17th, 2009

Driver Verifier has been updated in Windows 7 and several new checks have been added. One of the more interesting ones catches a driver that references a user mode handle with kernel mode access. For example, take a handle from a user mode application and call ObReferenceObjectByHandle specifying KernelMode as the access mode. Prior to Windows 7 this would bypass any access checking and provide the underlying object pointer. On Windows 7 it will do the same without Verifier enabled, but with Verifier enabled you’ll receive a DRIVER_VERIFIER_DETECTED_VIOLATION (0xC4) bugcheck with a first parameter of 0xF6.

The reasons behind this check are, of course, security and reliability. The problem with user handles is that the user mode application has access to them as well as the driver. For example, the application could close the handle while the driver is working with it, or close it and open something else so that the same handle value now refers to a different object.

Avoiding this bugcheck in a driver is quite simple:

1) Always specifying OBJ_KERNEL_HANDLE in your OBJECT_ATTRIBUTES structures when creating a new handle. An alternative to this is to make sure that you always open your files in the context of the System process.

2) If you’re working with a user mode handle, always specify UserMode as the access mode parameter to any API that requires one.
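The two rules above boil down to a pair of small decisions, sketched here in user mode C. The flag values match the WDK headers, but everything carrying a _SKETCH or Sketch suffix is a stand-in redefined for this example, and the two helper functions are hypothetical.

```c
#include <stdint.h>

/* Same values as OBJ_CASE_INSENSITIVE and OBJ_KERNEL_HANDLE in the WDK headers. */
#define OBJ_CASE_INSENSITIVE_SKETCH 0x00000040u
#define OBJ_KERNEL_HANDLE_SKETCH    0x00000200u

/* Stand-in for the access mode values (KernelMode is 0, UserMode is 1). */
typedef enum { KernelModeSketch = 0, UserModeSketch = 1 } MODE_SKETCH;

/* Hypothetical tag describing where a handle came from. */
typedef enum { HANDLE_FROM_APPLICATION, HANDLE_CREATED_BY_DRIVER } HANDLE_ORIGIN_SKETCH;

/* Rule 1: handles the driver creates for itself always carry OBJ_KERNEL_HANDLE. */
uint32_t attributes_for_driver_handle(uint32_t other_flags)
{
    return other_flags | OBJ_KERNEL_HANDLE_SKETCH;
}

/* Rule 2: a handle that came from an application is referenced as UserMode. */
MODE_SKETCH access_mode_for_handle(HANDLE_ORIGIN_SKETCH origin)
{
    return (origin == HANDLE_FROM_APPLICATION) ? UserModeSketch : KernelModeSketch;
}
```

In a real driver the first value would feed InitializeObjectAttributes and the second would be the AccessMode argument to ObReferenceObjectByHandle.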

One final thing that I’ll mention is that kernel handles have an interesting characteristic on Windows 2000 and Windows XP: IRP_MJ_CREATE and IRP_MJ_CLEANUP operations for kernel handles are sent in the context of the System process. Thus, if you specify OBJ_KERNEL_HANDLE and call ZwCreateFile, the I/O manager will call KeStackAttachProcess and force a switch into the System process before calling the target driver with an IRP_MJ_CREATE. Once the target driver completes the IRP, the I/O manager will switch back into the original process context. Later when you ZwClose the handle, the I/O manager will again switch back into the System process before sending the IRP_MJ_CLEANUP to the target driver. Any other operation against the handle arrives in the context of the requesting process, which can lead to some strange behavior if the target driver pays attention to such things.