New WinDBG is finally here

February 27th, 2010

After almost a year without updates, the latest version of WinDBG (6.12.2.633) is available and ready for consumption.

Unfortunately, in order to get it you’re going to need to sit through a 600MB download since WinDBG is now only distributed along with the Windows Driver Kit. There are also rumblings that it is available with the SDK, though I have not confirmed that yet.

Direct link to WDK download

Update

The WinDBG installer is available from the SDK as well as the WDK; however, the SDK ships with an older version and not the latest. So, as of now the WDK is going to be the only way to get the latest WinDBG.

x64 Crash Dump Analysis: Every bit counts

February 14th, 2010

Happy Valentine’s Day! And what’s more romantic than a post about analyzing an x64 crash dump? If you haven’t picked up a card already, feel free to print this out and hand it to your significant other.

Way back in December, we started looking at the fundamentals of x64 crash analysis so that we could work up to analyzing an actual x64 crash. If you haven’t already, I suggest that you read them in order since we’ll put all of those posts in practice here:

x64 Trap Frames

x64 Calling Convention

x64 Stack Frame layout

Reconstructing parameters from x64 crash dumps

With that out of the way, we can start our analysis the way we always do with any crash by running !analyze -v:

pfnpa

The bugcheck code in this case is PAGE_FAULT_IN_NONPAGED_AREA (0x50). In order to solve this crash we should probably cover what exactly this crash code is indicating.

The kernel virtual address space in Windows contains lots of different memory regions that serve different purposes. Regardless of the purpose of the region, one characteristic that all kernel memory has is whether or not the memory is pageable. When a memory region is pageable, the Memory Manager (Mm) is free to take the contents of the physical page of memory, write it out to disk, and then invalidate the virtual address. The next time someone tries to read the contents of that address, a page fault occurs and the contents are brought back into memory from disk. Once the memory is again resident, the Mm fixes the virtual address pointer and resumes the thread. When a memory region is non-pageable, it means that the Mm promises to never page out the memory and invalidate the virtual address in this manner.

Having non-pageable memory is important in Windows because it is the only kind of memory that you are allowed to access at IRQL DISPATCH_LEVEL or above (see my previous post here for more on IRQL). The reason for this is that you are not allowed to perform any wait operations at IRQL DISPATCH_LEVEL or above and by using pageable memory you’re implicitly stating that you can wait if the memory that you’re trying to access is not resident in memory.
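The IRQL rule above can be written down as a one-line check. This is only an illustrative sketch: the function name is made up for this example, while the three IRQL values are the standard Windows definitions.

```python
# Standard Windows IRQL values for the levels discussed above.
PASSIVE_LEVEL = 0
APC_LEVEL = 1
DISPATCH_LEVEL = 2


def may_touch_pageable_memory(irql: int) -> bool:
    """Hypothetical helper: pageable memory can fault, and resolving the
    fault requires a wait, which is only legal below DISPATCH_LEVEL."""
    return irql < DISPATCH_LEVEL


if __name__ == "__main__":
    for name, level in (("PASSIVE", PASSIVE_LEVEL),
                        ("APC", APC_LEVEL),
                        ("DISPATCH", DISPATCH_LEVEL)):
        print(name, may_touch_pageable_memory(level))
```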

With that out of the way, we can understand what the particular bugcheck code means. These non-pageable regions are only guaranteed to be valid if you have a valid outstanding resource allocation from the Mm. Take, for example, non-paged pool, which is the kernel equivalent to the user mode heap with the exception that memory allocated from this pool is guaranteed to never be paged out to disk. However, that does not mean that every address within the non-paged pool area is valid at all times. The Mm may delay programming a particular virtual address in this region until it is about to return a pool allocation to a particular caller, or mark the virtual address as invalid when someone frees a valid pool allocation. If someone tries to access one of these invalid addresses, a page fault will occur and the Mm will inspect the invalid virtual address to decide what needs to be done with it. If this virtual address corresponds to a region that is guaranteed to not page fault when valid, the Mm calls KeBugCheck with a bugcheck code of PAGE_FAULT_IN_NONPAGED_AREA. If you think about it, this is the only reasonable thing that the Mm can do since there is no solution to this state (you can argue that there are things that could be done, but that’s in the realm of fault tolerant systems and not relevant to the discussion).
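As a rough mental model of the decision described above, here is a toy sketch. It is not how the Mm is implemented; the Region enum, the BugCheck exception, and the function name are all invented for illustration. Only the bugcheck code 0x50 comes from the dump.

```python
from enum import Enum

PAGE_FAULT_IN_NONPAGED_AREA = 0x50  # bugcheck code from the !analyze output


class Region(Enum):
    """Hypothetical classification of the faulting kernel address."""
    PAGED = "paged"
    NONPAGED = "nonpaged"


class BugCheck(Exception):
    """Stand-in for the system halting with a bugcheck code."""

    def __init__(self, code: int):
        super().__init__(hex(code))
        self.code = code


def handle_kernel_page_fault(region: Region) -> str:
    # A fault in a pageable region is routine: bring the page back in.
    if region is Region.PAGED:
        return "page in and resume thread"
    # A fault in a region that must never fault has no recovery path.
    raise BugCheck(PAGE_FAULT_IN_NONPAGED_AREA)
```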

We can now break down the text associated with the bugcheck code and understand a bit more of what it means:

Invalid system memory was referenced.  This cannot be protected by try-except, it must be protected by a Probe.  Typically the address is just plain bad or it is pointing at freed memory.

Invalid system memory was referenced.

This bugcheck is always the result of dereferencing a bad kernel virtual address.

This cannot be protected by try-except, it must be protected by a Probe.

If you dereference a bad user virtual address, a structured exception is raised that your driver can catch in a structured exception handling (SEH) block. When an invalid kernel address is accessed, there is no structured exception raised and the system simply bugchecks. The comment here about being protected by a probe is misleading. There is no way to validate a kernel address other than dereferencing it and hoping for the best. The idea is that kernel callers are trusted, thus if a kernel component hands you a kernel virtual address you must assume that it is valid. What the comment here is referring to is that you should not be touching kernel virtual addresses that originated from user mode. You can avoid this situation by calling ProbeForRead or ProbeForWrite on any address handed to you from user mode, which will raise an exception if the address is a kernel virtual address. This is only useful if you’re performing METHOD_NEITHER I/O and is not relevant to our conversation.

Typically the address is just plain bad or it is pointing at freed memory.

This is the short version of what we’ve been talking about up until now. If you have page faulted on an address in the non-paged area it means that you have dereferenced something that is not a valid memory allocation. Generally this means that it’s a garbage value (e.g. uninitialized pointer), freed memory, or some kind of corruption (e.g. a pointer value from a corrupted data structure).

Now we should have a much better idea as to what we’re looking at and the kind of bug that we’re looking for. According to the !analyze output, the invalid address that caused the wreck was 0xffffba80`07122a88 and we were attempting to write to the address (parameter 2 from the bugcheck information). Let’s look at the trap frame output and attempt to identify the invalid address in the faulting instruction:

traprtl

If you haven’t read my previous post about x64 trap frames, the output above will likely be confusing. There are two pointer dereferences in the above output, the instruction pointer RIP and RBX. Neither one of these values is 0xffffba80`07122a88 and, in fact, this looks to be a NULL pointer dereference since RBX is zero. However, as we know, the trap frames on the x64 do not contain non-volatile register state and RBX is a non-volatile register. So, in order to get the value of RBX at the time of the crash back we’ll need to scroll back through the assembly and find a volatile register that either shadows RBX in this frame or that we can use to derive RBX.
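To keep track of which register values in a trap frame display can be taken at face value, a small lookup like the following can help. The register sets follow the Microsoft x64 calling convention for general-purpose registers; the function name and the decision to treat RIP and RSP as always reliable are assumptions of this sketch.

```python
# General-purpose registers under the Microsoft x64 calling convention.
VOLATILE = {"rax", "rcx", "rdx", "r8", "r9", "r10", "r11"}
NONVOLATILE = {"rbx", "rbp", "rdi", "rsi", "r12", "r13", "r14", "r15"}
# RIP and RSP describe where the fault happened, so they are treated
# separately here (an assumption of this sketch).
FAULT_LOCATION = {"rip", "rsp"}


def reliable_in_trap_frame(register: str) -> bool:
    """Hypothetical helper: True if the value shown after .trap can be
    trusted without further digging through the disassembly."""
    reg = register.lower()
    if reg in VOLATILE or reg in FAULT_LOCATION:
        return True
    if reg in NONVOLATILE:
        return False
    raise ValueError(f"unknown register: {register}")
```

In the dump above, `reliable_in_trap_frame("rbx")` is False, which is why the zero shown for RBX should not be read as a NULL pointer dereference.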

The first step to getting this information will be to execute the .trap command and get ourselves into the correct trap frame. We don’t need to find the trap frame address ourselves since it was already present in the !analyze output:

usetrap

Now that our registers are back, we can go back through the disassembly and figure out where RBX came from prior to the dump. I generally do this by bringing up the disassembly window (Alt+7) since I find that a bit more convenient in this situation than trying to use the keyboard shortcuts to navigate the assembly. Bringing up the window and scrolling back a bit shows RBX coming from RDX a few instructions earlier:

rdxrbx

If we go back and view the RDX register value contents, we’ll see that they match the pointer value from the first parameter to the bugcheck:

rdxbugcheck

We could have intuited the value of RBX based on the bugcheck information and our knowledge of x64 trap frames, though I like to take this extra step to make sure that I understand these things and also get my bearings with the dump. Also, in this case I know that the RBX value came from RDX, which is likely the second parameter to the function. Thus, if I can find a function prototype I can know what the type of the structure should be. This all provides me with greater context for the dump and a greater chance that I’ll have some success in analyzing it.

Let’s review what we have up to this point:

1) Someone has tried to write to an invalid system address

2) The address was passed as the second parameter to this routine

The next logical step that we need to explore is, “what kind of address is this supposed to be and why is this address invalid?”

For the first part of this question, we can typically use the !address extension and figure out what region this address lies in. Unfortunately, at the time of this writing that command does not work on Windows 7 and determining this without that extension is beyond the scope of this article. There is one thing we can quickly check though and that is whether or not this address lies in one of the various executive pools, which we can do with the !pool command:

notpool

Based on that output, it’s likely that this is not a pool allocation. The fact that it states that it is corrupt or free pool doesn’t really mean much, it should really say, “I have no idea what this is, but it doesn’t look like pool to me.” While that doesn’t provide us much positive information, we at least know that it’s not likely a bad pool address due to being used after it was freed. This at least removes a class of bugs and narrows our search a bit.

Since it’s not pool and !address doesn’t work, the best we can do now is inspect the PTE contents and see if there’s anything interesting there:

reallybadptr

According to the !pte output, this address is not only bad it’s really bad. At no level of the page translation process does this address have valid information. To me, this screams that this address is the result of an uninitialized pointer reference or a data corruption. Since the crash occurs in a Microsoft supplied component, the fact that this crash would come from an uninitialized pointer is very unlikely and so my sights are set firmly on some type of corruption. But what kind?
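For readers who want to see what the levels of the page translation process are, here is a sketch that splits a 64-bit virtual address into the four table indices and the page offset. It assumes standard x64 4-level paging with 4KB pages (9 bits per level, 12 bits of offset); the key names mirror the PXE/PPE/PDE/PTE labels that !pte prints, and the function name is made up.

```python
def split_virtual_address(va: int) -> dict:
    """Split an x64 virtual address into its paging indices,
    assuming 4-level paging with 4KB pages."""
    return {
        "pxe_index": (va >> 39) & 0x1FF,  # top-level table (PML4)
        "ppe_index": (va >> 30) & 0x1FF,  # page directory pointer table
        "pde_index": (va >> 21) & 0x1FF,  # page directory
        "pte_index": (va >> 12) & 0x1FF,  # page table
        "page_offset": va & 0xFFF,        # byte offset within the page
    }


if __name__ == "__main__":
    # The faulting address from the bugcheck parameters.
    for name, value in split_virtual_address(0xFFFFBA8007122A88).items():
        print(f"{name}: {value:#x}")
```

Each index selects one entry at its level, so an address that is invalid at every level, as in this dump, fails at the very first lookup.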

When I first analyzed this dump, I stayed at this point for almost 24 hours (luckily not straight!). I just couldn’t see what the corruption was or where it came from. I spent the time to go through every other thread in the system and even searched the kernel address space for other references to this address hoping for some light to shine on a clue. Since this was a Filter Manager structure that was corrupt, I also checked all of the current Filter Manager mini-filters with the !fltkd.filters command and was bummed to find only in box Microsoft supplied filters running.

At this point I took the advice that I give to all students: I walked away from the dump. Anyone who claims that crash dump analysis isn’t difficult is either lying or doesn’t get presented with many challenging dumps. Sometimes walking away from the dump gives you the fresh perspective and eyes that you need to spot something minuscule, such as a missing bit.

Before calling it quits on the dump, I gave it one last look and noticed something curious about the faulting pointer value when compared to the other values in the trap frame: only the high four hex digits at the top of the address were 0xf, not the high five hex digits like the other registers.

fournotfive

I felt like Archimedes in the bathtub, though I expressed my excitement in a slightly less dramatic fashion (and without all of the nudity). What if this was a single bit flip error? It’s possible that due to some sort of cosmic error there was a single bit in this address that should have come back as 1 but instead read as 0. So, I flipped the bit and found what was an entirely plausible pool address that indicated it was a valid Filter Manager allocation:

flipthebit
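The bit flip hypothesis is easy to check mechanically. The sketch below tries every single-bit flip of the faulting address and keeps only the candidates whose top five hex digits are all 0xf, which is the pattern observed in the other registers. The function names are invented for this example and the "five leading f digits" filter is just a heuristic taken from that observation.

```python
def single_bit_flips(value: int, width: int = 64):
    """Yield (bit_number, flipped_value) for every bit position."""
    for bit in range(width):
        yield bit, value ^ (1 << bit)


def plausible_flips(value: int, leading_f_digits: int = 5):
    """Keep flips whose top hex digits are all 0xf (heuristic)."""
    shift = 64 - 4 * leading_f_digits
    wanted = (1 << (4 * leading_f_digits)) - 1
    return [(bit, cand) for bit, cand in single_bit_flips(value)
            if (cand >> shift) == wanted]


if __name__ == "__main__":
    bad_pointer = 0xFFFFBA8007122A88  # bugcheck parameter 1
    for bit, candidate in plausible_flips(bad_pointer):
        print(f"bit {bit}: {candidate:#018x}")
```

For this address exactly one candidate survives: flipping bit 46 turns the 0xb digit (binary 1011) into 0xf (binary 1111).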

This gave me a plausible explanation for the dump: hardware failure. I wanted to collect at least one other piece of information to support this, so I decided to check to see if there were any physical pages marked as bad in this system. While this wouldn’t provide any type of solution, it would at least be another indicator that this machine was having hardware related issues. Thus, I inspected the state of the Page Frame Database in this machine using the !memusage command and did indeed find ~7MB of bad pages:

badpages

 

 

How to install xperf

February 13th, 2010

I’m finally living the full Windows 7 lifestyle, with everything from our home computers to my workstation running some flavor of Windows 7. Since I now don’t have any legacy platforms to deal with, I’m buying in to using all of the various features and utilities that are available on Vista and later (I skipped Vista entirely and have been clinging to XP).

One of the things that I’ve been most interested in getting a chance to play with is xperf, which is a performance measurement utility that leverages the Event Tracing for Windows (ETW) instrumentation that has been added to Windows over the years. I figured that with all of these Win7 systems around it would be worthwhile to have xperf installed on every machine in case there is some mysterious slowdown that needed investigation. While I thought this would be a straightforward download and install, it turns out that one needs to take a fairly circuitous route to get the necessary data capture and viewing tools installed.

The first thing you’ll need to do is install the Windows SDK, available here. Yes, the SDK…For some reason, the installation package of the tool is provided only in the SDK’s \Bin directory. Thus, in order to install the tool you’ll need to install the SDK so that you can get access to the MSI package that will actually install xperf.

While this is a pain, I suspect that you can get away with a fairly minimal SDK install (though I haven’t bothered to try that yet). Also, once you have the SDK installed in one location you’ll have the MSI you need and can just carry that around with you.

Once installed, you’ll need to navigate to the Bin directory and actually find the installation package. This was not intuitive to me, which is why I figured I’d bother writing this post. The files that you’re interested in are the \Bin\wpt_arch.msi files, where arch is whatever version of Windows you’re currently running: x86, x64, or IA64. Just double click on the file that is appropriate for your architecture and you’re on your way to drilling down to whatever performance issue you’re having.
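If you want to script the lookup, the naming pattern above is simple enough to encode. This is a sketch only: the function name is hypothetical and the SDK root you pass in depends on where you installed the SDK.

```python
from pathlib import PureWindowsPath

# Architectures named in the post.
SUPPORTED_ARCHES = ("x86", "x64", "IA64")


def wpt_installer_path(sdk_root: str, arch: str) -> PureWindowsPath:
    """Build the path to the xperf installer under the SDK's Bin directory.
    sdk_root is whatever directory the SDK was installed to."""
    if arch not in SUPPORTED_ARCHES:
        raise ValueError(f"unsupported architecture: {arch}")
    return PureWindowsPath(sdk_root) / "Bin" / f"wpt_{arch}.msi"
```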

For a good intro to xperf, check out the following Ntdebugging Blog post.

Why does my target machine sometimes become unresponsive after I set a breakpoint?

February 10th, 2010

Ever had this happen to you? You set a seemingly normal breakpoint in your target machine from WinDBG and resume the system:

bpset

You then go back over to your target system and it is completely unresponsive. The mouse won’t move, keyboard doesn’t work, etc. So, you go back over to your debugger system, break into the target, clear the breakpoint, and all is fine again. Disassembling the routine just to make sure all is well doesn’t show anything of interest that might explain this:

coderesident

So what’s up with that?

The problem is actually a variation of the issue that was described by Bryce Jonasson and Jen-Leng Chiu of the Debugging Tools for Windows team in a recent issue of The NT Insider, available here. The issue is that, even though the u command indicates otherwise, the offending code in question is actually paged out at the moment. Not entirely paged out to disk mind you, but currently in transition. We can see this by examining the state of the PTE with the !pte command:

notvalidintrans1

If the page is not currently valid, then why are we seeing valid data contents when we unassemble the virtual address? The reason is that, by default, the debugger will automatically decode and display the contents of pages that are in transition. They are still in memory and the PTE contains the information necessary to find the actual memory, so why not show the data? This behavior can be changed with the nodecodeptes parameter to the .cache command:
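To make the valid versus in-transition distinction concrete, here is a small classifier. It assumes the commonly described x64 Windows PTE layout (bit 0 = valid, bit 10 = prototype, bit 11 = transition); treat those bit positions as an assumption of this sketch, and the function names as hypothetical.

```python
# Assumed bit positions in an x64 Windows PTE (see the note above).
VALID_BIT = 1 << 0
PROTOTYPE_BIT = 1 << 10
TRANSITION_BIT = 1 << 11


def classify_pte(pte: int) -> str:
    if pte & VALID_BIT:
        return "valid"
    if pte == 0:
        return "no mapping"
    if pte & PROTOTYPE_BIT:
        return "prototype"
    if pte & TRANSITION_BIT:
        return "transition"
    return "other invalid"


def debugger_shows_contents(pte: int, decode_ptes: bool = True) -> bool:
    """Models the default behavior described above: transition pages are
    still displayed unless PTE decoding is switched off."""
    state = classify_pte(pte)
    return state == "valid" or (decode_ptes and state == "transition")
```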

autodecodeoff

If we now try to disassemble the address we’ll see that the virtual address is in fact currently invalid:

nolongermapped

We can restore the default behavior by specifying the decodeptes option:

decodeenabled

But why does the system go into a tailspin when we set a breakpoint on this address? The reason is that the debugger cannot set a breakpoint on an invalid PTE. Thus, setting a breakpoint on this address sets what is called an “owed” breakpoint. When we resume the target system, the target system will try like heck to set a breakpoint on this address every chance it gets, which might result in the system not doing much but trying to set this breakpoint.

The best workaround for this issue is to set a hardware access breakpoint on the address instead of a software breakpoint. This will utilize the support of the processor to break when someone finally executes this address instead of trying to set the breakpoint by replacing a byte of memory with an int 3 instruction: 

boabp
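The reason a software breakpoint needs the page to be resident is that it physically patches the code. The toy model below shows the mechanism on a plain byte buffer: save the original byte, write the one-byte int 3 opcode (0xCC), and restore it on removal. The class and method names are invented for illustration, and this is not how the debugger itself is implemented.

```python
INT3 = 0xCC  # one-byte x86/x64 breakpoint instruction


class SoftwareBreakpoints:
    """Toy model of software breakpoints over a writable code buffer."""

    def __init__(self, code: bytearray):
        self.code = code
        self.saved = {}  # offset -> original byte

    def set(self, offset: int) -> None:
        if offset not in self.saved:
            self.saved[offset] = self.code[offset]
            self.code[offset] = INT3  # requires the memory to be present

    def clear(self, offset: int) -> None:
        self.code[offset] = self.saved.pop(offset)


if __name__ == "__main__":
    code = bytearray(b"\x48\x89\x5c\x24\x08")  # arbitrary sample bytes
    bps = SoftwareBreakpoints(code)
    bps.set(0)
    print(code.hex())
    bps.clear(0)
    print(code.hex())
```

A hardware breakpoint needs no such write, since the processor's debug registers watch the address instead.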

An alternative would be to use the .pagein command to force the memory to be paged in on the target, though paging in kernel memory requires support from the operating system that is only in Windows Vista and later. There is also the .allow_bp_ba_convert command, which enables an option that would have automatically converted our bp breakpoint into a ba breakpoint in this situation.

Sometimes session context is important too

February 6th, 2010

I go on and on about thread and process context in this blog and in my courses, but every once in a rare while session context becomes an important topic.

Historically we’ve thought of sessions as being a Terminal Services only concept, where each user logged on to the Terminal Server is provided their own session. This allows the user to have their own desktop, mapped drive letters, processes, etc. Windows XP brought the idea of sessions to the desktop with the introduction of Fast User Switching, which provides each logged on user their own session. Windows Vista took advantage of the built in support of sessions to isolate background services from interactive user tasks. This so-called “Session Zero Isolation” means that every Vista and later Windows installation is always running at least two sessions, session 0 containing critical Windows processes and services and session 1 providing the default user desktop.

In order to maintain global state for each session, Windows carves out a portion of the kernel virtual address space and stores any per-session global data there. In order to scale appropriately across hundreds or even thousands of sessions, Windows cannot simply use an array of global session data with an entry for each session. Instead, Windows will use a single portion of the address space and simply map it differently based on the session that the current process runs in.

Why this is interesting is that we typically believe that all kernel virtual addresses are the same across all processes. However, we may find that different kernel virtual addresses are mapped differently depending on which session our process context maps to, just like user space virtual addresses. An example of this was recently discovered on NTDEV and had to do with the tsddd.dll module. The OP couldn’t seem to figure out why he couldn’t read the image from either the debugger or his kernel driver. As it turns out, this image is only mapped in session zero on the target platform, which was Windows 7 (haven’t bothered to look anywhere else personally).

Let’s check it out with LiveKD. First, we can try to dump out the beginning of the image:

tsddds1

Nada. We can use the !pte command to determine what exactly is wrong with this address and we’ll see that there is simply no mapping for this address (as evidenced by the contents of the PTE being zero):

emptypte

Let’s use the !session command to look into which session we are in and what sessions are available:

session

According to this, we’re running in session one and there is another valid session, session zero. This matches what we understand about Vista and later where there are always at least two sessions running on the system. Let’s try switching to the other session and seeing if we can dump out the module header:

tsddds0

Jackpot! Thus, what we can determine from this is that this kernel module is currently only mapped into session zero on this machine.
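The experiment above boils down to one virtual address resolving differently depending on the current session. Here is a toy model of that idea. Everything in it is made up for illustration (the class, the address, the contents) except for the fact that a PE image begins with the bytes "MZ".

```python
class SessionSpace:
    """Toy model: one virtual address range, mapped per session."""

    def __init__(self):
        self._mappings = {}  # session id -> {virtual address: contents}

    def map(self, session_id: int, va: int, contents: bytes) -> None:
        self._mappings.setdefault(session_id, {})[va] = contents

    def read(self, session_id: int, va: int):
        """Return the contents, or None if this session has no mapping."""
        return self._mappings.get(session_id, {}).get(va)


if __name__ == "__main__":
    IMAGE_BASE = 0xFFFFF96000100000  # hypothetical session-space address
    space = SessionSpace()
    space.map(0, IMAGE_BASE, b"MZ")  # image mapped only in session 0

    print(space.read(1, IMAGE_BASE))  # no mapping in session 1
    print(space.read(0, IMAGE_BASE))  # image header visible in session 0
```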

Including user mode state in !process and !thread output

February 4th, 2010

I’ve talked about WinDBG and process context before and thought I’d share another tip on the subject that comes in handy to me day to day.

Both !process and !thread accept a flags value that will cause them to switch to the appropriate process context before displaying stack traces. This is great because it allows you to see the entire call stack all the way back into user mode when working with a live debug target or a complete memory dump.

The flags value in question is 0x10, and can be used in combination with any other flags values that you might use. The WinDBG documentation for !process describes the bit as follows:

Bit 4 (0x10)
(Windows XP and later) Sets the process context equal to the specified process for the duration of this command. This results in a more accurate display of thread stacks. Because this flag is equivalent to using .process /p /r for the specified process, any existing user-mode module list will be discarded. If Process is zero, the debugger displays all processes, and the process context is changed for each one. If you are only displaying a single process and its user-mode state has already been refreshed (for example, with .process /p /r), it is not necessary to use this flag. This flag is only effective when used with Bit 0 (0x1).

We can see the effect of the bit by running an experiment with LiveKD. Let’s check out the CMD process without the 0x10 flags value supplied:

 no0x10

 

Notice again how the stack walks off the end of the cliff when it hits user mode. Let’s try this again with 0x1f as the flags instead of 0xf:

 with0x10 
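The two flag values used in this experiment differ only in bit 4. A quick sketch of the arithmetic follows; the constant and function names are mine, not the debugger's.

```python
SWITCH_CONTEXT = 0x10  # bit 4, per the quoted documentation
BIT_0 = 0x1            # bit 0, which bit 4 depends on


def add_context_switch(flags: int) -> int:
    """Return the flags with bit 4 set, leaving all other bits alone."""
    return flags | SWITCH_CONTEXT


def context_switch_effective(flags: int) -> bool:
    """Bit 4 is documented as only effective together with bit 0."""
    return bool(flags & SWITCH_CONTEXT) and bool(flags & BIT_0)


if __name__ == "__main__":
    print(hex(add_context_switch(0xF)))
```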

I’ll leave running a similar experiment with !thread as an exercise for the reader.

Great description of IRQL by Jake Oshins

February 4th, 2010

Doron Holan’s blog has a guest post by Jake Oshins on IRQL that provides a nice summary of the concept:

http://blogs.msdn.com/doronh/archive/2010/02/02/what-is-irql.aspx

For those who aren’t aware, Jake has done lots of development work on the HAL and ACPI (amongst other things) so he’s the one that you want to talk to when it comes to core Windows concepts such as IRQL, interrupt handling, power management, etc.

A new appreciation for learning how to use WinDBG

February 1st, 2010

I’ve been working on something lately that requires me to debug a Cygwin built application. After spending the last 8 or so years using WinDBG as a debugger, I’ve taken for granted just how “obvious” the commands to do various things are. After struggling to even figure out how to get debug information compiled into the binary (which came after figuring out that a separate file with debug information wasn’t created for all builds!), I struggled to perform common tasks such as single stepping and displaying local variables. I often take for granted that the way to do these things in WinDBG is obvious since I’ve long forgotten what it was like to not know every command.

Definitely a good learning experience, sometimes it’s important to step back and remember what it was like to be a noob.

Reconstructing parameters from x64 crash dumps

January 14th, 2010

We’ve been working on a series of x64 related crashes so that we can work up to analyzing a crash dump from a real world system. In this post I’m going to demonstrate the techniques you can use to find the actual parameters passed to a function when working with an x64 crash. As mentioned previously, this can be tricky due to the x64 compiler’s use of volatile registers for parameter passing. So we’re going to have to rely on the parameters being stored into other registers or on the execution stack.

Let’s use the dump that we’re eventually going to analyze and get the parameters to the faulting subroutine back. Here’s a shot of the faulting instruction in the faulting module:

faultinginst

The faulting instruction is the second arrow and the first arrow shows the bad pointer value being loaded from RDX into RBX. The analysis will later show that this bugcheck is a result of a bad pointer dereference via the RBX register. As we know from our discussion of calling conventions, RDX is the second parameter to this function, thus presumably this function was passed a bad pointer value as the second parameter.

Googling the function name gives us the function prototype for RtlDeleteNoSplay:

VOID
  RtlDeleteNoSplay(
    IN PRTL_SPLAY_LINKS  Links,
    IN OUT PRTL_SPLAY_LINKS *Root
    ); 
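Applying the x64 calling convention to this prototype is mechanical: the first four integer or pointer parameters go in RCX, RDX, R8, and R9, in order, and anything beyond that is passed on the stack. A sketch, with a made-up function name:

```python
# First four integer/pointer parameters under the Microsoft x64 convention.
PARAM_REGISTERS = ("rcx", "rdx", "r8", "r9")


def assign_parameter_locations(param_names):
    """Map each parameter name to the register that carries it on entry,
    or to 'stack' for the fifth parameter onward."""
    return {
        name: PARAM_REGISTERS[i] if i < len(PARAM_REGISTERS) else "stack"
        for i, name in enumerate(param_names)
    }


if __name__ == "__main__":
    print(assign_parameter_locations(["Links", "Root"]))
```

For RtlDeleteNoSplay this puts Links in RCX and Root in RDX.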

What I would like to do now is reconstruct the two parameters to the function so that I can perform further analysis. Because this can range from a simple task to something quite annoying, I’m going to demonstrate reconstructing the first parameter to this routine using three different levels of difficulty. This is always going to of course be very situation specific, but hopefully the three techniques shown here will come in handy.

EASY

In this routine I get lucky in finding the first parameter, as RCX is loaded into the R11 register before the crash:

 rcxr11

Since R11 is a volatile register, I can trust the value of R11 in the trap frame that I have so I can inspect the first parameter by looking in R11:

r11param

But, what if RCX hadn’t been stored into R11 before the crash? Then we’d have to dig a little deeper and try to find the parameter in a previous frame. I like to do this as a practice exercise when I can: I already have the result that I know I have to get to, so I can easily validate my result from going the hard way.

MEDIUM

So, how do I find the first parameter to this function without using R11? Well, I know that the previous function in the stack must have loaded a value into RCX to set up for this frame. So, I’ll start by walking up the stack to the previous function, which in this case is TreeUnlinkNoBalance:

treeunlink1

You’ll note here that RCX is stored in RBX before calling into RtlDeleteNoSplay. Now I know that upon entry to RtlDeleteNoSplay RCX was equal to RBX, so that gives me another register I can track inside RtlDeleteNoSplay in the hopes of reconstructing the first parameter. So, let’s look at RtlDeleteNoSplay again:

rtldeleteprolog

Notice the first instruction of RtlDeleteNoSplay, it’s a push of RBX onto the stack! That means that the value of the first parameter to this routine was preserved on the execution stack. The RSP value in the trap frame will be that of the stack pointer at the time of the crash, thus in order to get the value on the stack back I will need to unwind any stack manipulations done after the save of the value on the stack. In this case, I simply have to add 0x20 to the stack pointer to undo the subtraction of 0x20 from RSP after the push:

r11match
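The unwinding arithmetic can be replayed on a toy stack. The sketch below models only the two prolog operations described above (a push of RBX followed by a 0x20 subtraction from RSP); the starting RSP and the parameter value are made-up numbers, and the class is purely illustrative.

```python
class ToyStack:
    """Minimal model of a downward-growing x64 stack of 8-byte slots."""

    def __init__(self, rsp: int):
        self.rsp = rsp
        self.memory = {}  # address -> 64-bit value

    def push(self, value: int) -> None:
        self.rsp -= 8
        self.memory[self.rsp] = value

    def sub_rsp(self, amount: int) -> None:
        self.rsp -= amount

    def read_qword(self, address: int) -> int:
        return self.memory[address]


FIRST_PARAM = 0xFFFFFA8001234560  # hypothetical value held in RBX

stack = ToyStack(rsp=0x1000)       # hypothetical RSP at function entry
stack.push(FIRST_PARAM)            # push rbx
stack.sub_rsp(0x20)                # sub rsp, 20h

# At the crash, RSP is 0x20 below the saved RBX, so add it back.
recovered = stack.read_qword(stack.rsp + 0x20)
```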

HARD

But what if we had to go even further up in the stack to find a parameter? Would we be able to get a parameter to a function back? Let’s give it a shot!

For this one I’m going to use a different routine with different parameters, since it’s further up in the stack and that will complicate our analysis.

Earlier in the stack PurgeStreamNameCache calls DeleteNameCacheNodes:

deletenamecachecall

Note that the routine sets up RCX by loading it with the value of RSI. Thus, if I want the first parameter back when I’m inside DeleteNameCacheNodes I can look for either RCX or RSI being loaded into another register or, even better, saved on the stack:

rsihomed

Immediately at the beginning of the function we see DeleteNameCacheNodes storing RSI into the R9 home space on the stack. This is great news for us, as it means that we can get this parameter back by plucking it off of the stack location. But, how can we figure out what stack pointer value to use?

This is the key trick to reconstructing parameters on the x64. It’s quite simple actually, all we need to do is execute the k command and get the Child-SP value for the frame that we’re interested in:

childsp

That is the value the stack pointer will be upon return from the current call that subroutine is making. That’s a bit confusing, but what I mean is that in this case we see DeleteNameCacheNodes calling TreeUnlinkMulti. The Child-SP indicates what RSP will be when that call returns. This makes it very easy to find anything on the stack that we want in DeleteNameCacheNodes since all we need to do is start unwinding the stack changes that were made in the prolog until we hit the value that we want:

offsets

(Note: Picture above fixed on 1/15 based on comments from Anonymous. See comments section of the post.)

Now that we have the correct values, we can just dump out that memory location with the dq command:

paramback
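The address calculation used here generalizes. On entry to a function, RSP points at the return address and the four home slots sit directly above it (RCX at +0x08, RDX at +0x10, R8 at +0x18, R9 at +0x20). Child-SP is RSP after the prolog has finished, so adding back the bytes the prolog consumed gives the entry RSP. In this sketch the function name and the sample numbers are hypothetical, and the prolog size must be read from the disassembly.

```python
# Home slot offsets from RSP at function entry ([RSP] = return address).
HOME_SLOT_OFFSETS = {"rcx": 0x08, "rdx": 0x10, "r8": 0x18, "r9": 0x20}


def home_slot_address(child_sp: int, prolog_stack_bytes: int,
                      register: str) -> int:
    """Compute where a register's home slot lives for a given frame.

    child_sp           -- Child-SP value from the k output for the frame
    prolog_stack_bytes -- total bytes the prolog took from the stack
                          (8 per push plus any 'sub rsp, N')
    register           -- which home slot: rcx, rdx, r8 or r9
    """
    entry_rsp = child_sp + prolog_stack_bytes
    return entry_rsp + HOME_SLOT_OFFSETS[register.lower()]


if __name__ == "__main__":
    # Hypothetical frame: Child-SP of 0x1000 and a 0x48-byte prolog.
    print(hex(home_slot_address(0x1000, 0x48, "r9")))
```

The resulting address is what you would hand to dq.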

As a final tip, in this case I could have quickly found this value by leveraging the fact that the Args to Child in the kv output shows the four home space values from the stack. Since RSI was put in the R9 home space (RSP+0x20), I could have just done a kv and pulled the fourth value from the resulting output in the DeleteNameCacheNodes entry.

 kveasy

Happy New Year!

January 1st, 2010

Happy 2010 everyone! Hope it’s a healthy and prosperous year for us all.