Sometimes session context is important too

February 6th, 2010

I go on and on about thread and process context in this blog and in my courses, but every once in a rare while session context becomes an important topic.

Historically we’ve thought of sessions as being a Terminal Services-only concept, where each user logged on to the Terminal Server is provided their own session. This allows the user to have their own desktop, mapped drive letters, processes, etc. Windows XP brought the idea of sessions to the desktop with the introduction of Fast User Switching, which provides each logged-on user their own session. Windows Vista took advantage of the built-in support for sessions to isolate background services from interactive user tasks. This so-called “Session Zero Isolation” means that every Vista and later Windows installation is always running at least two sessions: session 0, containing critical Windows processes and services, and session 1, providing the default user desktop.

In order to maintain global state for each session, Windows carves out a portion of the kernel virtual address space and stores any session-global data there. In order to scale appropriately across hundreds or even thousands of sessions, Windows cannot simply use an array of global session data with an entry for each session. Instead, Windows uses a single portion of the address space and simply maps it differently based on the session that the current process runs in.
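As a rough mental model only (this is not how the memory manager is actually implemented), the idea can be sketched in a few lines of Python: every session has its own backing for the same kernel virtual address, and a lookup goes through whichever session the current process belongs to. The address, class names, and process names here are all made up for illustration.

```python
# Toy model of session space: one virtual address, mapped per session.
# All names and the address below are hypothetical, for illustration only.

SESSION_VA = 0xFFFFF96000000000  # made-up address inside the session range


class Session:
    def __init__(self, session_id):
        self.session_id = session_id
        self.mappings = {}  # virtual address -> contents, private to this session


class Process:
    def __init__(self, name, session):
        self.name = name
        self.session = session

    def read_kernel_va(self, va):
        # The same VA resolves through the session of the current process.
        # None stands in for "no valid mapping" (an empty PTE).
        return self.session.mappings.get(va)


session0 = Session(0)
session1 = Session(1)
session0.mappings[SESSION_VA] = b"MZ"  # something mapped only in session 0

service = Process("service.exe", session0)
shell = Process("cmd.exe", session1)

print(service.read_kernel_va(SESSION_VA))
print(shell.read_kernel_va(SESSION_VA))
```

The point of the sketch is that neither process is "wrong" about the address. They are simply looking at it through different sessions.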

This is interesting because we typically believe that all kernel virtual addresses are the same across all processes. However, we may find that some kernel virtual addresses are mapped differently depending on which session our process context belongs to, just like user space virtual addresses. An example of this was recently discovered on NTDEV and had to do with the tsddd.dll module. The OP couldn’t seem to figure out why he couldn’t read the image from either the debugger or his kernel driver. As it turns out, this image is only mapped in session zero on the target platform, which was Windows 7 (haven’t bothered to look anywhere else personally).

Let’s check it out with LiveKD. First, we can try to dump out the beginning of the image:

tsddds1

Nada. We can use the !pte command to determine what exactly is wrong with this address and we’ll see that there is simply no mapping for this address (as evidenced by the contents of the PTE being zero):

emptypte

Let’s use the !session command to look into which session we are in and what sessions are available:

session

According to this, we’re running in session one and there is another valid session, session zero. This matches what we understand about Vista and later where there are always at least two sessions running on the system. Let’s try switching to the other session and seeing if we can dump out the module header:

tsddds0

Jackpot! Thus, what we can determine from this is that this kernel module is currently only mapped into session zero on this machine.

Including user mode state in !process and !thread output

February 4th, 2010

I’ve talked about WinDBG and process context before and thought I’d share another tip on the subject that comes in handy to me day to day.

Both !process and !thread accept a flags value that will cause them to switch to the appropriate process context before displaying stack traces. This is great because it allows you to see the entire call stack all the way back into user mode when working with a live debug target or a complete memory dump.

The flags value in question is 0x10, and can be used in combination with any other flags values that you might use. The WinDBG documentation for !process describes the bit as follows:

Bit 4 (0x10)
(Windows XP and later) Sets the process context equal to the specified process for the duration of this command. This results in a more accurate display of thread stacks. Because this flag is equivalent to using .process /p /r for the specified process, any existing user-mode module list will be discarded. If Process is zero, the debugger displays all processes, and the process context is changed for each one. If you are only displaying a single process and its user-mode state has already been refreshed (for example, with .process /p /r), it is not necessary to use this flag. This flag is only effective when used with Bit 0 (0x1).

We can see the effect of the bit by running an experiment with LiveKD. Let’s check out the CMD process without the 0x10 flags value supplied:

 no0x10

 

Notice again how the stack walks off the end of the cliff when it hits user mode. Let’s try this again with 0x1f as the flags instead of 0xf:

 with0x10 
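If the relationship between the two flags values isn’t obvious, here is a small Python sketch that just does the bit arithmetic. It only names bit 0 and bit 4, since those are the two bits the quoted documentation talks about; the helper name is mine, not anything from the debugger.

```python
# Decode the !process flags values used above. Only bit 0 and bit 4 are
# named because those are the two bits the quoted documentation describes.

BIT0 = 0x01  # the documentation says bit 4 is only effective with this bit
BIT4 = 0x10  # set the process context for the duration of the command


def sets_process_context(flags):
    """True if flags asks for the context switch and also includes bit 0."""
    return bool(flags & BIT4) and bool(flags & BIT0)


for flags in (0xF, 0x1F):
    print(f"{flags:#04x} = {flags:#07b}  context switch: {sets_process_context(flags)}")
```

So 0x1f is simply 0xf with the 0x10 bit ORed in, which is why it can be combined with whatever other flags you normally use.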

I’ll leave running a similar experiment with !thread as an exercise for the reader.

Great description of IRQL by Jake Oshins

February 4th, 2010

Doron Holan’s blog has a guest post by Jake Oshins on IRQL that provides a nice summary of the concept:

http://blogs.msdn.com/doronh/archive/2010/02/02/what-is-irql.aspx

For those who aren’t aware, Jake has done lots of development work on the HAL and ACPI (amongst other things) so he’s the one that you want to talk to when it comes to core Windows concepts such as IRQL, interrupt handling, power management, etc.

A new appreciation for learning how to use WinDBG

February 1st, 2010

I’ve been working on something lately that requires me to debug a Cygwin built application. After spending the last 8 or so years using WinDBG as a debugger, I’ve taken for granted just how “obvious” the commands to do various things are. After struggling to even figure out how to get debug information compiled into the binary (which came after figuring out that a separate file with debug information wasn’t created for all builds!), I struggled to perform common tasks such as single stepping and displaying local variables. I often take for granted that the way to do these things in WinDBG is obvious since I’ve long forgotten what it was like to not know every command.

Definitely a good learning experience. Sometimes it’s important to step back and remember what it was like to be a noob.

Reconstructing parameters from x64 crash dumps

January 14th, 2010

We’ve been working on a series of x64 related crashes so that we can work up to analyzing a crash dump from a real world system. In this post I’m going to demonstrate the techniques you can use to find the actual parameters passed to a function when working with an x64 crash. As mentioned previously, this can be tricky due to the x64 compiler’s use of volatile registers for parameter passing. So we’re going to have to rely on the parameters being stored into other registers or on the execution stack.

Let’s use the dump that we’re eventually going to analyze and get the parameters to the faulting subroutine back. Here’s a shot of the faulting instruction in the faulting module:

faultinginst

The faulting instruction is the second arrow, and the first arrow shows the bad pointer value being loaded from RDX into RBX. The analysis will later show that this bugcheck is a result of a bad pointer dereference via the RBX register. As we know from our discussion of calling conventions, RDX holds the second parameter to this function, so presumably this function was passed a bad pointer value as the second parameter.

Googling the function name gives us the function prototype for RtlDeleteNoSplay:

VOID
  RtlDeleteNoSplay(
    IN PRTL_SPLAY_LINKS  Links,
    IN OUT PRTL_SPLAY_LINKS *Root
    ); 

What I would like to do now is reconstruct the two parameters to the function so that I can perform further analysis. Because this can range from a simple task to something quite annoying, I’m going to demonstrate reconstructing the first parameter to this routine using three different levels of difficulty. This is of course always going to be very situation specific, but hopefully the three techniques shown here will come in handy.

EASY

In this routine I get lucky in finding the first parameter, as RCX is loaded into the R11 register before the crash:

 rcxr11

Since R11 is a volatile register, I can trust the value of R11 in the trap frame that I have so I can inspect the first parameter by looking in R11:

r11param

But what if RCX hadn’t been stored into R11 before the crash? Then we’d have to dig a little deeper and try to find the parameter in a previous frame. I like to do this as a practice exercise when I can. Since I already know the result that I have to get to, I can easily validate what I find by going the hard way.

MEDIUM

So, how do I find the first parameter to this function without using R11? Well, I know that the previous function in the stack must have loaded a value into RCX to set up for this frame. So, I’ll start by walking up the stack to the previous function, which in this case is TreeUnlinkNoBalance:

treeunlink1

You’ll note here that RCX is stored in RBX before calling into RtlDeleteNoSplay. Now I know that upon entry to RtlDeleteNoSplay RCX was equal to RBX, so that gives me another register I can track inside RtlDeleteNoSplay in the hopes of reconstructing the first parameter. So, let’s look at RtlDeleteNoSplay again:

rtldeleteprolog

Notice the first instruction of RtlDeleteNoSplay: it’s a push of RBX onto the stack! That means that the value of the first parameter to this routine was preserved on the execution stack. The RSP value in the trap frame will be that of the stack pointer at the time of the crash, so in order to get the value on the stack back I will need to unwind any stack manipulations done after the value was saved. In this case, I simply have to add 0x20 to the stack pointer to undo the subtraction of 0x20 from RSP after the push:

r11match
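For anyone who wants to convince themselves of the arithmetic, here is a tiny Python simulation of the two prolog instructions discussed above (the push of RBX followed by the subtraction of 0x20 from RSP). The starting RSP and the RBX value are made up; only the offsets come from the prolog described in this post.

```python
# Simulate "push rbx; sub rsp, 0x20" and then recover the pushed value
# from the crash-time RSP. Addresses and values are made up.

memory = {}                   # address -> 64-bit value
entry_rsp = 0xFFFFF88000001000  # hypothetical RSP on entry to the routine
rbx = 0xFFFFFA8001234560      # hypothetical first parameter (copy of RCX)

rsp = entry_rsp

# push rbx: RSP moves down 8 and the value is stored at the new RSP
rsp -= 8
memory[rsp] = rbx

# sub rsp, 0x20
rsp -= 0x20

# The crash happens somewhere after this point; the trap frame records RSP.
crash_rsp = rsp

# Undo the "sub rsp, 0x20" to land on the slot written by the push.
recovered = memory[crash_rsp + 0x20]
print(hex(recovered))
```

The only thing that matters is that every stack adjustment made after the save gets undone. A different prolog means different numbers, but the same idea applies.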

HARD

But what if we had to go even further up in the stack to find a parameter? Would we be able to get a parameter to a function back? Let’s give it a shot!

For this one I’m going to use a different routine with different parameters, since it’s further up in the stack and that will complicate our analysis.

Earlier in the stack PurgeStreamNameCache calls DeleteNameCacheNodes:

deletenamecachecall

Note that the routine sets up RCX by loading it with the value of RSI. Thus, if I want the first parameter back when I’m inside DeleteNameCacheNodes I can look for either RCX or RSI being loaded into another register or, even better, saved on the stack:

rsihomed

Immediately at the beginning of the function we see DeleteNameCacheNodes storing RSI into the R9 home space on the stack. This is great news for us, as it means that we can get this parameter back by plucking it off of the stack location. But, how can we figure out what stack pointer value to use?

This is the key trick to reconstructing parameters on the x64. It’s actually quite simple: all we need to do is execute the k command and get the Child-SP value for the frame that we’re interested in:

childsp

Child-SP is the value that the stack pointer will have upon return from the call that the subroutine is currently making. That’s a bit confusing, but what I mean is that in this case we see DeleteNameCacheNodes calling TreeUnlinkMulti. The Child-SP indicates what RSP will be when that call returns. This makes it very easy to find anything on the stack that we want in DeleteNameCacheNodes, since all we need to do is start unwinding the stack changes that were made in the prolog until we hit the value that we want:

offsets

(Note: Picture above fixed on 1/15 based on comments from Anonymous. See comments section of the post.)

Now that we have the correct values, we can just dump out that memory location with the dq command:

paramback
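The same calculation can be written down as a short Python sketch. It assumes the prolog only moves RSP with pushes and a single sub (no _alloca or similar later on), and every number in the example call is hypothetical; the real push count and sub size have to be read from the disassembly of the function in question.

```python
# Recover a home-space slot from the Child-SP shown by the k command.
# The prolog shape and every number below are hypothetical; read the real
# ones from the disassembly of the function you are looking at.

HOME_OFFSETS = {"RCX": 0x08, "RDX": 0x10, "R8": 0x18, "R9": 0x20}


def entry_rsp(child_sp, push_count, sub_bytes):
    """Undo the prolog: each push moved RSP down 8, the sub moved it down sub_bytes."""
    return child_sp + sub_bytes + 8 * push_count


def home_slot_address(child_sp, push_count, sub_bytes, register):
    """Address of the home slot named after `register`, relative to entry RSP."""
    return entry_rsp(child_sp, push_count, sub_bytes) + HOME_OFFSETS[register]


# Hypothetical frame: Child-SP from k, a prolog with two pushes and "sub rsp, 0x30".
child_sp = 0xFFFFF88002F1A500
address = home_slot_address(child_sp, push_count=2, sub_bytes=0x30, register="R9")
print(f"dq {address:#x} L1")
```

The printed line is just the dq command you would then run against the dump to read the saved value.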

As a final tip, in this case I could have quickly found this value by leveraging the fact that the Args to Child in the kv output shows the four home space values from the stack. Since RSI was put in the R9 home space (RSP+0x20), I could have just done a kv and pulled the fourth value from the resulting output in the DeleteNameCacheNodes entry.

 kveasy

Happy New Year!

January 1st, 2010

Happy 2010 everyone! Hope it’s a healthy and prosperous year for us all.

Associate WinDBG With .DMP Files

December 27th, 2009

Just recently finished upgrading my laptop to Win7 and remembered a quick WinDBG tip.

If you want to associate your .dmp files with WinDBG, execute the following WinDBG command line:

windbg.exe -IA

x64 Stack Frame layout

December 20th, 2009

It’s half-time in the New England Patriots game, so now’s as good a time as ever to talk about the x64 compiler’s usage of the execution stack.

The x64 compiler on Windows performs all stack manipulation during the function prolog. This includes saving non-volatile registers on the stack, making room for local variables, and even making room for parameters passed to the functions that it calls (remember that the first four parameters are passed in registers with any remaining parameters passed on the stack).

This leads to another interesting feature of the x64 compiler’s use of the stack. All subroutines are guaranteed to have a “spill” or “home” space allocated for them on the stack by the caller. This space is large enough to house the first four parameters passed to the routine via registers: RCX, RDX, R8, and R9. In addition, this space comes before the space where the fifth parameter can be found.

A picture probably clears this up. Upon entry to a subroutine, RSP points to a location that contains the return address of the routine. RSP+8 points to the RCX home space, RSP+0x10 is the RDX home space, etc:

x64-frame1
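If you prefer numbers to pictures, the layout on entry can also be written as a couple of small Python helpers. The function names are mine; the offsets follow from the description above (return address at RSP, then four 8-byte home slots, then the fifth parameter).

```python
# Offsets from RSP on entry to a subroutine (before its prolog runs).
# [RSP] holds the return address; the four home slots follow it.

SLOT_SIZE = 8
HOME_REGISTERS = ("RCX", "RDX", "R8", "R9")


def home_offset(register):
    """Offset from entry RSP of the home slot for a parameter register."""
    return SLOT_SIZE * (1 + HOME_REGISTERS.index(register))


def stack_parameter_offset(index):
    """Offset from entry RSP of stack-passed parameter `index` (5 or higher)."""
    if index < 5:
        raise ValueError("parameters 1-4 are passed in registers")
    return SLOT_SIZE * index


for reg in HOME_REGISTERS:
    print(f"{reg} home: RSP+{home_offset(reg):#x}")
print(f"parameter 5: RSP+{stack_parameter_offset(5):#x}")
```

Keep in mind that these offsets are only valid at the very first instruction of the routine. Once the prolog starts pushing registers and adjusting RSP, the same slots are further away from the current stack pointer.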

It is important to realize however that these locations on the stack are not reserved to just hold RCX, RDX, R8, and R9. In some cases, the prolog of the function may “home” all of the parameter registers into these locations. In other cases, the prolog may “home” a subset. In other cases, the prolog may store completely unrelated register values here. Or the compiler may choose to ignore this space and not store anything here. It is effectively just a scratch space for the subroutine to use in whatever way it sees fit.

We’re going to extensively utilize the scratch space in any x64 crash that we analyze since it’s a great place to find parameters passed to functions and non-volatile registers that aren’t present in trap frames. Next up we’ll see just how we can use the space to easily reconstruct the parameters passed to a function.

(Just finished typing in time for kick off!)

Update

Anonymous makes a really interesting point in the comments:

It’s important to get this right – even though stack manipulations are performed during prolog, RSP itself _can_ change during execution of function body. For example, due to “_alloca” call (other cases, anybody?).

This isn’t a common programming practice in kernel mode, so I failed to mention that. If anyone else knows of any cases please let us know!

x64 Calling Convention

December 19th, 2009

We’re working up to analyzing an interesting crash by learning more about working with the x64…

In order to work with x64 dumps we’re going to need to understand the calling convention used. That is going to allow us to do things such as identify the parameters passed to a particular function.

The basic rule here is that the first four parameters to a function are passed in registers, with the remaining parameters to the routine passed on the stack. The typical registers used here are:

Parameter 1 - RCX

Parameter 2 - RDX

Parameter 3 - R8

Parameter 4 - R9

(NOTE: There are special rules when it comes to things such as floating point operations. Full details for those scenarios can be found here.)
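As a quick way to keep the rule straight, here is the same table as a small Python lookup. It only covers the plain integer/pointer case described above and ignores the special cases mentioned in the note; the function name is just for illustration.

```python
# Where does integer/pointer parameter N live at the moment of the call?
# Floating point and other special cases are deliberately ignored here.

PARAM_REGISTERS = ("RCX", "RDX", "R8", "R9")


def parameter_location(index):
    """Return the register for parameters 1-4, or "stack" for the rest.

    index is 1-based, matching the table above.
    """
    if index < 1:
        raise ValueError("parameter index is 1-based")
    if index <= len(PARAM_REGISTERS):
        return PARAM_REGISTERS[index - 1]
    return "stack"


for n in range(1, 7):
    print(f"Parameter {n} - {parameter_location(n)}")
```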

For example, if a routine begins by accessing the contents of the RDX register, we know that this routine is accessing the second parameter. We can also infer from this that the caller must have loaded RDX with a meaningful value in a previous frame:

rdx

There is a major issue with this convention, however: all four of these registers are treated as volatile by the compiler, meaning that their contents do not need to be saved across subroutine calls. Thus it is entirely possible, and in fact quite likely, that the compiler will overwrite the contents of these registers with unrelated values over the course of the subroutine. This makes reconstructing parameter values quite difficult on the x64.

The next x64 post will talk about the unique way that the x64 compiler utilizes the execution stack, which will then lead to a more detailed discussion on how we can utilize the stack to get parameter information back when we need it.

WinDBG Command History extension

December 17th, 2009

Just posted a nifty new WinDBG extension on OSR Online. It provides quick access to all of the WinDBG commands that you’ve used in your current debug session:

http://www.osronline.com/article.cfm?article=547

It’s something that I’ve wanted for a while, so I finally decided to just shut up and code it!