I was in Barcelona a couple of weeks ago speaking at Microsoft’s TechEd/ITForum conference, where I delivered several sessions (two, Advanced Malware Cleaning and Windows Vista Kernel Changes earned the top #1 and #2 rated breakout sessions for the week - you can see an interview of me at the conference here). The conference was a huge success and Windows Vista, which I had taken on the road for the first time, performed great. However, as I was running through some demos before one of my sessions, I noticed that the file open dialog, which is common to all Windows applications, would often take between 5 and 15 seconds to appear.
I didn’t have time to investigate before my talk, so the delays caused me consternation when they showed up during my Windows Vista Kernel Changes session immediately afterward. The behavior felt uncannily like the one I wrote up a few blog posts ago in The Case of the Process Startup Delays. In that case, Windows Defender’s Remote Procedure Call (RPC) communications during process startup tried to contact a domain controller, which resulted in hangs when the system was disconnected from its domain. I mumbled excuses on behalf of Windows Vista and tried to distract the audience by explaining the subsequent demonstrations.
It wasn’t until the plane ride home that I got a chance to look into it. I followed steps similar to the ones I had when I explored the Windows Defender hangs. I launched Notepad from within Debugging Tools for Windows’ Windbg tool, typed Ctrl+O to open the File Open dialog, and when I got the hang broke in and looked at the stack of Notepad’s main thread:
If you haven’t seen a stack before, it’s a history from most recent to least of nested functions called by a thread. You read it from bottom to top, so the stack shows that Notepad had loaded Browseui.Dll and called its CAddressBand::SetNavigationState function. That function called CBreadcrumbBar::SetNavigationState, which called CBreadcrumbBar::SetIDList, and so on.
A look at the function names on the stack immediately told me what was happening: when you access the Open dialog the first time within an application it navigates to your documents folder. On Windows Vista my folder is C:\Users\Markruss\Documents, but the shell wants to make the path in the dialog’s new “bread crumb” bar pretty by displaying it as “Mark Russinovich\Documents”, and so it calls GetUserNameEx to lookup my account’s display named as it’s stored in my User object in Active Directory. I confirmed my theory by verifying that the first parameter SHGetUserDisplayName passes to GetUserNameEx, which is interpreted as the EXTENDED_NAME_FORMAT enumeration, is 3: NameDisplay.
I set a breakpoint on the call’s return and hit it after the delay completed. GetUserNameEx returned the ERROR_NO_SUCH_DOMAIN error code, and stepping through SHGetUserDisplayName revealed that it falls back to calling GetUserName. Instead of looking up the user’s display name, that function just obtains the Security Identifier (SID) of the user from the process token (the kernel data structure that defined the owner of a process) and calls LookupAccountName to translate the SID to its account name, which in my case is simply “markruss”. Thus, the dialog that appeared looked like this:
As opposed to this, which is what I saw when I got back to the office and connected to the corporate network:
I had solved the case, but was curious to know where exactly the delay was taking place and so continued by researching what was happening on the other end of the Secure32!CallSPM call that’s on top of the stack listing. I knew that the Local Security Authority (LSASS) process is responsible for authentication, including interactions with domain controllers and account name translations, so I attached Windbg to the Lsass.exe process (make sure that you detach the debugger from LSASS before exiting with the “qd” command, otherwise LSASS will terminate and the system will begin a 30-second shutdown). I figured that Secur32.Dll acts like both a client and server and confirmed that it was loaded into LSASS, but I needed to determined the server-side function that corresponds to Secur32!SecpGetUserName. I did so by brute force: I dumped the functions implemented by Secur32.Dll and looked for ones with “name” in them:
I set breakpoints on several of them and when I reproduced the delay I hit the one on SecpGetUserName and stepped through it to eventually get to this stack:
The DsGetDcName function is documented as returning the name of a domain controller in the specified domain. SecpTranslateName obviously need to find a domain controller to which to send the account display name query. I traced further, and discovered that LSASS caches the result of the lookup for 45 seconds, which explained why I didn’t see the delay if I ran a different application and accessed the File Open dialog immediately after getting a delay. Then I hit a temporary dead-end when Netapi32!DsrGetDcNameEx2 executed a RPC request.
Again, figuring that Netapi32 acts like a client and a server, I dumped its symbols and set breakpoints on functions containing “dc”. I let LSASS continue executing and to my surprise hit the exact same function, Netapi32!DsrGetDcNameEx2. I traced into the call deeper and deeper until the thread finally called into the kernel (Ntdll!KiFastSystemCallRet):
I was close to the end of my investigation. The last question I had was what device driver was Netlogon calling to send a browser datagram? I answered this by looking at the first parameter it passed to NlBrowserDeviceIoControl, which I guessed was a handle to a file object. Then I opened Windbg in Local Kernel Debugging mode (note that on Windows Vista you have to boot in debugging mode to do this), which lets you look at live kernel data structures, and dumped the handle’s information. That showed me the device object that was opened, which told me that the driver is Bowser.sys, the “NT Lan Manager Datagram Receiver Driver”:
I thought my investigation was complete, but when I later tried to reproduce the delays I failed. I retraced my footsteps and found that LsapGetUserNameForLogonSession caches the display name for 30 minutes. Further, an account’s display name is cached with cached credentials so you won’t experience the delays for the first 30 minutes after logging in or disconnecting from the corporate network. I confirmed that by waiting 30 minutes and reproducing the hangs.
My investigation had come to a close. I had determined that Windows Vista’s File Open dialog tries to look up a user’s display name for the “bread crumb” bar when showing the documents folder and in the process tries to locate a domain controller by sending a Lan Manager datagram via the Bowser.sys device driver. I also knew that there’s no workaround for the delayed dialogs and that anyone that has a domain joined system that’s not connected to their domain will experience the same delays - at least until Windows Vista Service Pack 1.