Troubleshooting the intermittent slow logon or slow startup

Update: See also the following articles for up-to-date information on how to fix slow logon issues in Windows systems:

http://social.technet.microsoft.com/wiki/contents/articles/10130.root-causes-for-slow-boots-and-logons.aspx
http://social.technet.microsoft.com/wiki/contents/articles/10128.tools-for-troubleshooting-slow-boots-and-slow-logons.aspx
http://social.technet.microsoft.com/wiki/contents/articles/10123.troubleshooting-slow-operating-system-boot-times-and-slow-user-logons.aspx

Sometimes the following issue turns up as a support case with Microsoft Support:

Every now and then, we have a slow logon to several of our workstations.

We can't see a pattern in which users or computers are involved and we can't reproduce the issue consistently.

Most of the time everything works fine and the users log on without problems.

When the issue *does* occur, the machine typically hangs for a long time during the ‘Applying Computer Group Policy’, ‘Applying User Group Policy’ or ‘Running Startup Scripts’ stages.

This is usually a difficult and time-consuming problem to troubleshoot. The bigger your network and the farther away from the end-user you are, the longer the time you're likely to spend on the issue.

The first challenge is to filter out the relevant from the irrelevant, i.e. who is actually having a problem and who just feels that the computer should generally be taking less time to log them on. Additionally, identifying whether you're looking at a Slow Logon or a Slow Startup is crucial as it determines the possible causes.

The big problem is that to the end-user this seems like the same thing since the symptoms are the same.

Windows XP and Windows Vista have a feature called Fast Logon Optimization, which means that the user is allowed to enter their credentials before the machine itself is fully ready to service logons. If you log on immediately and you don't yet have an IP address, for example, you are logged on using Cached Credentials and then authenticated transparently later on. Computer Group Policy is then applied later in the background.
This is an excellent feature for the end-user, as it allows them to log on without waiting for the hardware to be ready. It does complicate any troubleshooting scenario, however, because you can't trust that the machine will apply Group Policy in the same order every time.

Add to this the fact that GPO aspects like security policy and registry policies aren't applied again for the next 13-18 hours unless a change has been made to them. As a result, the first reboot of a machine is guaranteed to behave differently from the second and subsequent reboots within that timeframe.
This leads to the classic scenario where slow logons are only reported on Monday mornings (assuming you've already disabled TCP Chimney and RSS and increased the MaxUserPort value from the default of 5000 on the DCs).

Finally, there may also be a difference in behaviour depending on whether you do a full shutdown or a standard reboot (due to network equipment or hardware initialization).

Imagine the game of Whack-a-Mole for a picture of how troubleshooting a scenario like this can feel...

To troubleshoot this you basically need to simplify the scenario as much as possible. The goal is to get consistent behaviour, where the slow startup or logon occurs every single time, so you can focus your troubleshooting efforts.

Step 1 Create a control group of workstations (and servers if applicable) that you confirm have had the problem.

Step 2 Configure the machines in the control group to have the following settings:

    • In the Computer Configuration section of Group Policy (through the Local policy or a domain GPO):
      • Enable System\Verbose vs normal status messages
      • Enable System\Logon\Always wait for the network at computer startup and logon (Turns off Fast Logon Optimization)
      • Enable System\Group Policy\Registry policy processing and System\Group Policy\Security policy processing and tick both checkboxes
    • Configure a Logon or Logoff Script to run gpupdate /force (this ensures that Group Policy is fully applied at the next reboot)
    • Enable Userenv debug logging on all machines as per KB221833
      (On Vista/W2k8 set DWORD GpSvcDebugLevel under HKLM\Software\Microsoft\Windows NT\CurrentVersion\Diagnostics to 10002 Hex)
    • Enable Netlogon debug logging on the client (and DCs if possible) by running nltest /dbflag:0x2080ffff
    • Enable Winlogon debug logging as per KB245422
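
To roll the Vista/W2k8 debug setting out to a whole control group, it can help to generate a .reg file once rather than editing each registry by hand. The following is a minimal sketch in Python that only builds the file text. The key, value name and data are the ones given in the list above; the helper name build_reg_file is made up for this illustration, and you should verify the import on one test machine before using it more widely.

```python
def build_reg_file(key_path, value_name, dword_value):
    """Return the text of a .reg file that sets a single DWORD value."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("DWORD values must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        # .reg files store DWORD data as eight hex digits
        f'"{value_name}"=dword:{dword_value:08x}',
        "",
    ]
    return "\r\n".join(lines)


# Key and value taken from the step above (Vista/W2k8 Group Policy service logging).
GPSVC_KEY = (
    r"HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT"
    r"\CurrentVersion\Diagnostics"
)

reg_text = build_reg_file(GPSVC_KEY, "GpSvcDebugLevel", 0x10002)
print(reg_text)
```

The printed text can be saved to a file and imported on each machine in the control group. Remember to remove the value again once troubleshooting is done.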

Hopefully, this produces a machine where the problem reproduces every time and where we can compare the output from several different logs to get a more complete picture of what's going on. Keep in mind the external factors that may affect this as well, such as whether the machine is rebooted or fully shut down between tests, or the connectivity to the switch that the machine is connected to.
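
When comparing the debug logs, the quickest win is usually to look for long pauses between two consecutive entries. Below is a rough Python sketch of that idea. It assumes each log line contains a timestamp of the form HH:MM:SS:mmm; check this against your own logs and adjust the pattern if they differ. The sample lines are invented for illustration and are not real log output.

```python
import re
from datetime import timedelta

# Assumption: every relevant line carries a HH:MM:SS:mmm timestamp.
TIMESTAMP = re.compile(r"\b(\d{2}):(\d{2}):(\d{2}):(\d{3})\b")


def parse_time(line):
    """Return the line's time of day as a timedelta, or None if absent."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return timedelta(hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=millis)


def find_gaps(lines, threshold_seconds=10.0):
    """Return (gap_seconds, line_before, line_after), longest gap first.

    Lines without a timestamp are skipped. Logs that run past midnight
    are not handled in this sketch.
    """
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        current = parse_time(line)
        if current is None:
            continue
        if prev_time is not None:
            delta = (current - prev_time).total_seconds()
            if delta >= threshold_seconds:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = current, line
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)


# Invented sample lines, for illustration only.
sample = [
    "USERENV(2a4.2a8) 08:01:10:250 step A started",
    "USERENV(2a4.2a8) 08:01:10:900 step A finished",
    "USERENV(2a4.2a8) 08:03:15:900 step B started",
    "USERENV(2a4.2a8) 08:03:16:100 step B finished",
]

for seconds, before, after in find_gaps(sample):
    print(f"{seconds:.1f}s pause between:\n  {before}\n  {after}")
```

The entries on either side of the longest pause are a good starting point for the Procmon logs and network traces.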

Once you have a reproducible scenario, you can follow up with things like:

    • Procmon logs to spot unusual activity
    • Network traces to see what's happening on the wire
    • Memory dumps of the machine to analyze with Windbg
    • Checking hardware (typically the switches, routers or NICs involved)

Typically, you'll discover one of the following as the causes for the slow startup: 

  • an application that is waiting either for another application to finish or for network connectivity
  • startup or logon scripts that are either taking a long time to finish or are timing out
  • applications that are colliding with parts of the OS or other applications, causing a timeout that slows logon or startup
  • excessive re-ACL'ing of registry keys or folders
  • incorrectly configured or faulty network equipment (see the Cisco PortFast link below)

To spice things up, you may also be dealing with two or more issues simultaneously. This complicates things even more: if the case is being treated as one general 'Slow Startup' issue, resolving one cause will appear not to have resolved anything. Since you can't know this before you start troubleshooting, assume from the start that you may be dealing with two or more issues.

You should therefore treat each machine separately and be careful not to make assumptions about one based on data from another.

Finally, note that the logon is done when you see the desktop. Any delay after that is usually caused by applications that are starting post-logon (the Autoruns tool will give you more details on this).
This needs to be confirmed whenever you get reports about slow logons: the more programs you load at logon, the longer it will take.

If you find that your Userenv logs are being overwritten before you can get to the affected machine, there's a little trick that can be used to prevent this.

Beware! Doing this will cause the Userenv.log file to grow without limits until it eventually fills the hard drive and crashes the system, so it should be used sparingly and only for specific troubleshooting scenarios and reset as soon as troubleshooting is done.

To do this, simply set the Userenv.bak file that is created under %systemroot%\Debug\Usermode to Read-only. When the size of userenv.log grows beyond 200k, the system first tries to delete the userenv.bak file before renaming the original file. When this fails (because of the Read-only attribute), the original file is not renamed and everything continues to be logged to it.
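
The rotation behaviour described above can be pictured with a small in-memory model. The Python sketch below is a toy simulation only, not the actual Windows implementation: the class and attribute names are invented, no real files are touched, and the only details taken from the text are the 200k threshold and the delete-then-rename order.

```python
MAX_LOG_BYTES = 200 * 1024  # the 200k threshold mentioned above


class DebugFolder:
    """Toy model of the folder holding userenv.log and userenv.bak."""

    def __init__(self, bak_read_only=False):
        self.log_size = 0
        self.bak_size = 0  # pretend a .bak from an earlier rotation exists
        self.bak_read_only = bak_read_only

    def write(self, num_bytes):
        """Append to the log, attempting a rotation first if it is too big."""
        if self.log_size > MAX_LOG_BYTES:
            self._rotate()
        self.log_size += num_bytes

    def _rotate(self):
        # Step 1: delete the old .bak. A Read-only .bak makes this fail,
        # so the rename never happens and the log keeps growing.
        if self.bak_read_only:
            return
        # Step 2: the current log becomes the new .bak and a fresh log starts.
        self.bak_size = self.log_size
        self.log_size = 0


normal = DebugFolder(bak_read_only=False)
pinned = DebugFolder(bak_read_only=True)
for _ in range(1000):
    normal.write(1024)
    pinned.write(1024)

print("normal log size:", normal.log_size)
print("pinned log size:", pinned.log_size)
```

In the normal case the log stays close to the threshold, while the pinned log holds everything ever written. That is exactly why the Read-only attribute should be cleared again as soon as you have collected the data.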

Incidentally, the Windows Firewall uses the connection-specific DNS Suffix when determining whether to use the Standard or Domain firewall profile. For example: your DHCP server issues the DNS name myLAN.domain.com to LAN-connected clients and myWLAN.domain.com to your WLAN clients, but your AD name is domain.com. In this case, one of your connections may get firewalled by the Standard Profile, which can lead to strange problems and timeouts.
A quick way to resolve this (if the DHCP admins can't be convinced to change the issued DNS domain names) is to implement the GPO settings from KB294785 and set the Connection-Specific DNS Suffix to the same value as your AD DNS name (in this case domain.com).
Note that this will override the actual settings, but the GUI will still show the settings from the DHCP server.
Warning: Make sure you do not accidentally change the Primary Domain Suffix with this, only the Connection-Specific DNS Suffix.
Another potential source of delays is personal firewall software on the client that blocks ports used by day-to-day Active Directory operations.
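
The DNS suffix comparison described above can be modelled in a few lines of Python. This is a simplified sketch of the behaviour as described in this post, not of the firewall's actual detection logic (which may involve further checks). The function name firewall_profile is invented for illustration, and the names come from the example above.

```python
def _normalize(dns_name):
    """Lower-case a DNS name and drop surrounding spaces and a trailing dot."""
    return dns_name.strip().rstrip(".").lower()


def firewall_profile(connection_suffix, ad_dns_name):
    """Simplified model: the Domain profile applies only when the
    connection-specific DNS suffix equals the AD DNS name."""
    if _normalize(connection_suffix) == _normalize(ad_dns_name):
        return "Domain"
    return "Standard"


AD_DNS_NAME = "domain.com"

# Suffixes as issued by DHCP in the example above.
print(firewall_profile("myLAN.domain.com", AD_DNS_NAME))
print(firewall_profile("myWLAN.domain.com", AD_DNS_NAME))

# Suffix after overriding it through the GPO setting.
print(firewall_profile("domain.com", AD_DNS_NAME))
```

Under this model both DHCP-issued suffixes land in the Standard profile, and only the overridden suffix selects the Domain profile.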

Additional material:

Massaging the XP registry for logon performance

http://blogs.technet.com/b/instan/archive/2011/08/29/massaging-the-xp-registry-for-logon-performance.aspx

 

Fixing Group Policy problems by using log files
http://technet2.microsoft.com/windowsserver/en/library/0907105e-7856-4c93-b97f-a9a306623af51033.mspx?mfr=true

Group Policy processing and precedence
http://technet2.microsoft.com/windowsserver/en/library/274e614e-f515-4b80-b794-fe09b5c21bad1033.mspx?mfr=true

Troubleshooting Group Policy application problems
http://support.microsoft.com/kb/250842

Network Monitor 3.1
http://www.microsoft.com/downloads/details.aspx?FamilyID=18b1d59d-f4d8-4213-8d17-2f6dde7d7aac&displaylang=en

New group policies for DNS in Windows Server 2003
http://support.microsoft.com/kb/294785

Using PortFast and Other Commands to Fix Workstation Startup Connectivity Delays (Netlogon 5719)
http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a00800b1500.shtml

 

How to perform advanced clean-boot troubleshooting in Windows XP

http://support.microsoft.com/default.aspx?scid=kb;EN-US;316434