The 5 million microsecond question

Recently I was helping a costumer analyzing their Windows image and the OS performance, in particular the boot times.

And for this customer 1st logon was a big priority, since it was taking around +200 seconds. At least they wanted to understand what was going on, even that we would or would not be able to optimize it.

Here’s a bit of info before we start:

  • Corei5 + 8GB RAM machine
  • Win7 + SP1 + latest windows updates installed
  • They were not using Roaming Profiles
  • They were using Folder Redirection, but to a local partition, instead of going to a remote server
  • Some GPOs/GPPs, but nothing too much complex like WMI/Item Level Targeting
  • The +200 seconds could be reproduced either by Wi-Fi or LAN

Called in the “big guns”, aka Windows Performance Toolkit (WPT), version 5, and took a boot trace for a 1st logon. Now, there’s already a lot of info about WPT, xperf, wpa and xbootmgr around the web and therefore no need to explain it here.

I usually like to start with a simpler trace just to make sure that the machine will not be affected for the amount of info being generated, like an “xbootmgr -trace boot -traceflags DiagEasy”. But in this case the computer had nice specs and it would easily be able to cope with more flags+stackwalk.

So, I basically issued the following command:

“xbootmgr -trace boot –traceflags base+latency+dispatcher -stackwalk profile+cswitch+processcreate+threadcreate+readythread”

For stackwalk, don’t forget to disable paging executive (https://support.microsoft.com/kb/184419).

Rebooted, logged in with a never logged on before user, and waited :)

The result was this:

Around 209 seconds to arrive at the desktop. The purple bar accounts for all the activity after the desktop is ready, which in the case was a bit extended, but nevertheless this was not the main issue.

The issue, which is quite easy to spot, was the Winlogon phase, where we have Services, CAD (CTRL+ALT+DEL), GPO, etc. With WPA (Windows Performance Analyser), there’s a specific graphic for the Windows Logon along with his sub phases.

Let’s take a peek:

 

Hum… it seems that the wait was because of GPClient. Ok, there are several ways to troubleshoot group policies, but with an xbootmgr trace, by default, we have group policy events logged.

Going to Generic Events, table mode, filtering for Windows Logon and Group Policy providers, and moving the column time to the left side (grouping by time that is), we arrived to this view:

 

Now let’s hunt our waits! Or by other words, make use of your mouse scroll and search for big intervals of time.

The first one to stand out is this:

 

This is what happens when our fingers don’t collaborate and we mistype our password :D

Moving on, we’ve stumbled into a 37 sec gap, just after Folder Redirection started.

 

Filtering by the start and stop event for Folder Redirection, it looks like that we spent 113 seconds processing that specific CSE (Client Side Extension).

 

Let’s do a short cut.

I can tell you that there was no resource contention (CPU, Disk, Memory), or network issues. We also did some network packet traces with netsh and reviewed them with Netmon, but nothing was wrong.

Since we activated stackwalk on this trace, I decided to start analyzing the process and thread responsible for the Folder Redirection CSE. 

After going to “CPU Usage (Precise)”, basically CPU usage based on context switch, and arranging a little bit the columns and walking the stack, here’s the result.

At this image it’s possible to see the start of GPO processing, moving on to Folder Redirection CSE (fdeploy.dll) and then calling in “SHELL32.DLL”, the shell, which will be responsible to create the new directories.

 

It seems that we have 11 times * 5 second wait, making a total of 55 second doing nothing… since the readying process was “Idle (0)”.

We have 11 waits, which by coincidence are exactly the same number of folders that are being redirected. So, I would say that just before performing the redirection of each folder, there’s a 5 second wait… But for what?!?

 

Took another trace, again reproducing a first logon, and added new flags in order to obtain more details, and therefore a possible insight.

“xbootmgr -trace boot –traceflags base+latency+dispatcher+registry+file_io+file_io_init -stackwalk profile+cswitch+processcreate+threadcreate+readythread” 

Re-did all the same steps has before, and used one of the eleven 5 second interval. In this new trace, the responsible process was “svchost.exe (924)”, along with thread 1896.

Adding to the same analysis view file+registry information and making use of WPA synchronized selection (basically highlights events in different graphics/tables when you select a particular interval/event in time), gave me this vision:

´

 

During the creation of the new folder, where the redirection will be done to, there’s some registry keys that have “STATUS_OBJECT_NAME_NOT_FOUND”.

Might be worth it to search in there :)

A little bit of search magic (searching by “redirection” for example) returned a registry key “FolderRedirectionWait”.

We have a winner!!! Now it was just a question of digging on the Web to understand why this key was called and doesn’t exist by default.

From https://social.technet.microsoft.com/Forums/pt-BR/w7itproperf/thread/dc3b77da-7fe4-4b09-bf95-aabcce4b7000:

“The 5 seconds delay is due to the NTFS quota.

If the server has per-directory auto quota enabled, it might take them a while to apply to the folder we just created. So, if we are query quota too early, we might get the wrong number. It is very hard from client side to know the auto quota applied on the server. So let's just wait a little bit here.”

There we have it! The delay is by design in order to check for NTFS quotas on the server side. Only when the option “Move contents…” in the Folder Redirection policy is enabled.

In this specific scenario, the redirection was being done to the same disk but a different partition, with no NTFS quotas. Which means that we can skip this 5 second interval by creating the registry key and set the value to 0 (millisecond).

 Here's the final result after adding the key and redoing the 1st logon: