The Case of the Veeerrry Slow Logons


This case is my favorite kind of case, one where I use my own tools to solve a problem affecting me personally.  The problem at the root of it is also one you might run into, especially if you travel, and demonstrates the use of some Process Monitor features that many people aren’t aware of, making it an ideal troubleshooting example to document and share.

The story unfolds the week before last when I made a trip to Orlando to speak at Microsoft’s TechEd North America conference. While I was there I began to experience five minute black-screen delays when I logged on to my laptop’s Windows 7 installation:

image

I’d typically chalk up an isolated delay like this to networking issues, common at conferences and with hotel WiFi, but I hit the issue consistently switching between the laptop’s Windows 8 installation, where I was doing testing and presentations, and the Windows 7 installation, where I have my development tools. Being locked out of your computer for that long is annoying to say the least.

The first time I ran into the black screen I forcibly rebooted the system after a couple of minutes because I thought it had hung, but when the delay happened a second time I was forced to wait it out and face the disappointing reality that my system was sick. When I logged off and back on again without a reboot in between, though, I didn’t hit the delay. It only occurred when logging on after a reboot, which I was doing as I switched between Windows 7 and Windows 8. What made the situation especially frustrating was that whenever I rebooted I was always in a hurry to get ready for my next presentation, so had to suffer with the inconvenience for several days before I finally had the opportunity to investigate.

Once I had a few spare moments, I launched Sysinternals Autoruns, an advanced auto-start management utility, to disable any auto-starting images that were located on network shares. I knew from previous executions of Autoruns on the laptop that Microsoft IT configures several scheduled tasks to execute batch files that reside on corporate network shares, so suspected that timeouts trying to launch them were to blame:

image

I logged off and logged back on with fingers crossed, but the delay was still there. Next, I tried logging into a local account to see if this was a machine-wide problem or one affecting just my profile. No delay. That was a positive sign since it meant that whatever the issue was, it would probably be relatively easy to fix once identified.

My goal now was to determine what was holding up the switch to the desktop. I had to somehow get visibility into what was going on during a logon immediately following a boot. The way that immediately jumped to mind as the easiest was to use Sysinternals Process Monitor to capture a trace of the boot process. Process Monitor, a tool that monitors system-wide file system, registry, process, DLL and network operations, has the ability to capture activity from very early in the boot, stopping its capture only when the system shuts down or you run the Process Monitor user interface. I selected the boot logging entry from the Options and opened the boot logging dialog:

SNAGHTML274998d

The dialog lets you direct Process Monitor to collect profiling events while it’s monitoring the boot, which are periodic samples of thread stacks. I enabled one-second profiling, hoping that even if I didn’t spot operations that explained the delay, that I could get a clue from the stacks of the threads that were active just before or during the delay.

After I rebooted, I logged on, waited for five minutes looking at a black screen, then finally got to my desktop, where I ran Process Monitor again and saved the boot log. Instead of scanning the several million events that had been captured, which would have been like looking for a needle in a haystack, I used this Process Monitor filter to look for operations that took more than one second, and hence might have caused the slow down:

SNAGHTML27bf247

Unfortunately, the filter cleared the display, dashing my hopes for quickly finding a clue.

Wondering if perhaps the sequence of processes starting during the logon might reveal something, I opened the Process Tree dialog from the Tools menu. The dialog shows the parent-child relationships of all the processes active during a capture, which in the case of a boot trace means all the processes that executed during the boot and logon process. Focusing my attention on Winlogon.exe, the interactive logon manager, I noticed that a process named Atbroker.exe launched around the time I entered my credentials, and then Userinit.exe executed at the time my desktop finally appeared:

image

The key to the solving the mystery lay in the long pause in between. I knew that Logonui.exe simply displays the logon user interface and that Atbroker.exe is just a helper for transitioning from the logon user interface to a user session, which ruled them out, at least initially. The black screen disappeared when Userinit.exe had started, so Userinit’s parent process, Winlogon.exe, was the remaining suspect. I set a filter to include just events from Winlogon.exe and added the Relative Time column to easily see when events occurred relative to the start of the boot. When I looked at the resulting items I could easily see the delay was actually about six minutes, but there was no activity in that time period to point me at a cause:

image 

Profiling events are excluded by default, so I clicked on the profile event filter button in the toolbar to include them, hoping that they might offer some insight:

image

In order to minimize log file sizes, Process Monitor’s profiling only captures a thread’s stack if the thread has executed since the last time it was sampled. I therefore was expecting to have to look at the thread profile events at the start of the event, but my eye was drawn to a pattern of the same four threads sampled every second throughout the entire black-screen period:

image

I was fairly certain that whatever thread was holding things up had executed some function at the beginning of the interval and was dormant throughout, so was skeptical that any of these active threads were related to the issue, but it was worth spending a few seconds to look at them. I opened the event properties dialog for one of the samples by double-clicking on it and switched to its Stack page, on the off chance that the names of the functions on the stack had an answer.

When I first run Process Monitor on a system I configure it to pull symbols for Windows images from the Microsoft public symbol server using the Debugging Tools for Windows debug engine DLL, so I can see descriptive function names in the stack frames of Windows executables, rather than just file offsets:

SNAGHTML3eeaff2

The first thread’s stack identified the thread as a core Winlogon “state machine” thread waiting for some unknown notification, yielding no clues:

image

The next thread’s stack was just as unenlightening, showing the thread to be a generic worker thread:

image

The stack of the third thread was much more interesting. It was many frames deep, including calls into functions of the Multiple UNC Provider (MUP) and Distributed File System Client (DFSC) drivers, both related to accessing file servers:

image

I scrolled down to see the frames higher on the stack and the name of one of the functions, WLGeneric_ActivationAndNotifyStartShell_Execute, pretty much confirmed the thread to be the one responsible for the problem, since it implied that it was supposed to start the desktop shell:

image

The next frame’s function, WNetRestoreAllConnectionsW, combined with the deeper calls into file server functions, led me to conclude that Winlogon was trying to restore file server drive letter mappings before proceeding to launch my shell and give me access to the desktop. I quickly opened Explorer, recalling that I had two drives mapped to network shares hosted on computers inside the Microsoft network, one to my development system and another to the internal Sysinternals share where I publish pre-release versions of the tools. While at the conference I was not on the intranet, so Winlogon was unable to reconnect them during the logon and was eventually – after many minutes – giving up:

image

Confident I’d solved the mystery, I right-clicked on each share and disconnected it. I rebooted the laptop to verify my fix (workaround to be precise), and to my immense satisfaction, the logon proceeded to the desktop within a few seconds. The case was closed! As for why the delays were unusually long, I haven’t had the time – or  need – to investigate further. The point of this story isn’t to highlight this particular issue, but illustrate the use of the Sysinternals tools and troubleshooting techniques to solve problems.

TechEd Europe, which took place in Amsterdam last week, gave me another chance to reprise the talks I’d given at TechEd US. I delivered the same Case of the Unexplained troubleshooting session I had at TechEd US, but this time I had the pleasure of sharing this very fresh and personal case. You can watch it and my other TechEd sessions either by going to my webcasts page, which lists all of my major sessions posted online, or follow these links directly:

Windows Azure Virtual Machines and Virtual Networks
Windows Azure Internals
Malware Hunting with the Sysinternals Tools
Case of the Unexplained 2012

And you can see all of both event’s sessions online at their webcast sites:

TechEd North America 2012 On-Demand Recordings
TechEd Europe 2012 On-Demand Recordings

I hope you enjoyed this case!


Comments (20)

  1. Anonymous says:

    Why are there never any "<Prev>" or "<Next>" links at the bottom of your blogs for multiple pages of comments? (Clicking a single, very small number isn't always easy for those of us with less-than-perfect hand-eye coordination.) Thx!

  2. Voffka_UA says:

    hmm… this is known problems with the network shares … But in my case it was resolved just practically :). And this is the old problem …

  3. Nice post – it's always nice reading along someone unraveling a mystery.  In your screenshot with the big red "SIX MINUTES" in it, as soon as I saw the "\redmondipc$" activity, I was smiling.  I wouldn't have gotten to the point of looking at thread stacks, but I understand that it's a great opportunity to showcase more of the tool.

    But I get the feeling there's more to this mystery. Isn't Windows capable of loading your shell and attempting to reconnect your network drives at the same time?  When I log in to a PC with unavailable network drives, the GUI loads, and then a few seconds later I get a little tooltip balloon in my tray saying something like "could not reconnect all network drives."

    Is your \redmondfilesSYSINTERNALS share using WebDAV instead of the usual SMB?  I wonder if that might have something to do with the massive timeout…

  4. Anonymous says:

    Thanks for posting this!  I've barely just begun to use SysInternals and assumed there was some boot logging function available, so this is exactly what I needed!

  5. I'm not sure why the delays were so abnormally long and haven't had the time – or need – to investigate further. The point is that I solved my immediate problem using Sysinternals tools and troubleshooting techniques.

  6. Tom Sweet says:

    Isn't there a role for Network Location Awareness (NLA) here?

    I could see Winlogon: 1) identifying the network the system is on, 2) comparing the current network to the network that the drives were mapped from, 3) make the connection if equal, 4) dynamically bring up a tunnel to the shares' home network, or leave in a disconnected state until accessed then dynamically bring up a tunnel.

    Suffering a six minute delay when instrumentation is available to avoid the problem seems awkward at best.

    I was wondering about restoring the drive maps asynchronously and letting the shell come up.  There might be profile, folder redirection, and startup program issues with that.   Still those errors could happen for a variety of reasons including being simply disconnected from a net and the shell has to handle that gracefully.

  7. Anonymous says:

    Hi Mark,

    For my users with laptops, I have the following GPO option explicitly set to Disabled:

    Computer/Administrative Templates/System/Logon/Always wait for the network at computer startup and logon

    That setting has kept my users from having similar issues.

  8. techvet2 says:

    This is the exact same issue I see when logging on to my work account on my work laptop at the kitchen table while disconnected from work.  It hangs for about five minutes and then finally lets me in.  I have taken note of the suggestion by "eclayp" but this will probably not be implemented because there are probably very few users in my company who use the laptop in that fashion.

  9. Anonymous says:

    I ran into a similar problem at my enterprise the other day – we have approx. 500 users that travel a lot and usually boot to desktop and then connect to the corporate network via a vpn client. A user began to experience long delays after entering his credentials. The logon process would hang at the Welcome screen with the blue swirl for approx. 2-3 minutes so I turned to Process Monitor and captured a trace of the slow logon operation via remote desktop. At the first glance the net result was pretty much the same: mpnotify.exe launched around the time user entered his credentials and userinit.exe launched two and a half minutes later, so I isolated winlogon.exe, enabled profiling events and turned to stacks (as any good Mark Russinovich presentation will tell you, you need stacks).

    As mentioned here, call stack included calls into functions of the MUP and DFSC drivers.  The Process Monitor trace also revealed that the winlogon process was querying registry keys related to the mapped network drives (HKUS-1-5-21-…Network*) and closed said keys two and a half minutes later. Disconnecting network shares was not an option, so I examined the trace right before the big gap hoping to find any network related clues. The first hit was HKLMSystemCurrentControlSetControlNetworkProviderRestoreTimeout – a setting which allows to overwrite the default time-out for a network provider. The next hit, a few operations earlier, was an attempt by winlogon to read HKLMSystemCurrentControlSetControlNetworkProviderRestoreConnection. By creating the registry value and setting it to 0 I could trick Windows to restore network connections only if user tried to access the respective network drive rather than trying to reconnect at logon. I then went ahead and captured another Process Monitor trace which served as confirmation – logon times improved dramatically.

  10. "Robert Silver" has a point. I assume a large number of users everyday log into laptops with disconnected network shares. So, I assume by this time, this problem should have been very well known. Are we missing something here?

  11. Anonymous says:

    Thanks Mark,

    slow logins seem to be a "never-ending-story" So this articel is very helpful to get under the hood…

  12. Anonymous says:

    This is the opposite problem that I have.  My system takes a long time to *shutdown*.  I've tried using Process Monitor to discover the problem…but no luck.  Do you want to make an example of my system Mark?

    Peace,

    Randy

  13. JS2010 says:

    @WndSks, yes in fact Windows 2008R2 has verbose status on.

    I thought the solution would be to disable active setup and shorten the $mft.

  14. skSdnW says:

    Does the verbosestatus policy (KB325376) from the 2000 era still work?

  15. Anonymous says:

    "Trying to drive his car home like many times before a driver suddenly noticed a dramatic loss of speed and steerability along with his car making strange noises. After first trying to ignore the problem for a while he finally lost his patience and decided to track down the problem. He got himself one of those high-tech car diagnostics units, plugged it in and took some test drives to get measurements. Reading through zillions of diagnostics messages, one cought his attraction which stated that there might be an unusually high difference in wheel speed between the front wheels. After much guessing and stabbing in the dark finally a message from the anti-lock-brake system cought his attrraction. Actually there wasn't anything wrong with the brake system, but the message brought to his mind the fact that his car had one at all, and for some reason, mostly intuition, he decided that he could have a look at its mechanical parts. Inspecting the front wheels he found that one had fallen off. He attached his reserve wheel and drove off, problem fixed. Beeing very delighted, he told the world about the usefulness of car diagnostics computers, and that everyone should get himself one and carry it all day."

    Still unresolved remains the question why the wheel had fallen off…

  16. @Randy: If you aren't getting anywhere with ProcMon, get a Windows Performance Toolkit (WPT) xbootmgr trace. WPT is in the Win7+.NET 4.0 SDK.

    xbootmgr -shutdown

    Look at the phase graphs.

  17. I’m using xbootmgr for this. The generic graph shows en entry "RestoringNetworkconnections" with a start and stop time:

    public.sn2.livefilestore.com/…/Networkconnections.PNG

    Here you can see that it takes 23s.

  18. Robert Silver [MCM] says:

    It's all very good and the case is interesting. However, aren't there many millions of people with disconnected network drives struggling to logon? Is it that difficult to say "if id doesn't connect in 2 secs, forget it".

  19. Sol Birn says:

    Just had this exact issue on a client's laptop (where the VPN doesn't kick in until after login). Not fixed yet, so I suppose the official Microsoft answer is to use DirectAccess.

  20. Aakash Shah says:

    +1 to disabling "Always wait for the network at startup" for laptop users – this has significantly helped our laptop users boot faster and be more productive.  While this may require a second reboot for some new/changed Group Policies to take effect, the net result is still better for the end user.

    Also if you can, I would consider not using a Network drive at all and using UNC paths with Network Locations.  And in Win7 onwards, Network Locations appear in the Computer window along with mapped drives, unlike XP where the Network Locations were in a separate My Network Places window.

    Another thing that may have affected Mark was that his paths were not using FQDN paths.  Without using FQDN paths, the computer would have attempted to do a search for the server using the available DNS suffixes, which may have added to the delay.  I would encourage using FQDN paths to help avoid delays associated with this.

Skip to main content