Hello, and welcome to our second post in the Windows 7 launch series. This post is going to be a long one, so buckle in. We’re going to start with an overview of the Fault Tolerant Heap, which is a new feature in Windows 7 and Windows Server 2008 R2, and then go over some Memory Management pieces. If you’re not familiar with the general concepts of heap, you may want to start by reading our previous posts – Heap, Part One and Heap, Part Two. Heap is a term used to describe several types of memory structures that are used to store information. For instance, every process has what is called a ‘process heap’, which exists as long as the process lives. A process can also have what is called a ‘private heap’, which is for use only by the process that creates it. A DLL can also create a heap, and does so within the memory space of the process that owns the DLL.
None of this probably seems interesting to you unless you are an application developer, but what is important is what happens if a heap becomes damaged. We on the Performance Team lovingly call this ‘Heap Corruption’, and yes, the word ‘lovingly’ is used sarcastically in this case. Heap corruption occurs when a region of heap is overwritten by bad data. Heap, like all memory structures, is broken down into what are called pages. If more data is written into a page than will fit, it ends up spilling over into the next page. The problem with this is that the act of writing into the next page is not fatal; when it happens, no one is the wiser. However, when that next page is later accessed, the code using it will encounter bad data and most likely crash. If you follow the logic of this, you will realize that the actual culprit that wrote the bad data is probably long gone by the time the crash occurs, and all we see if we debug it is the victim. Basically, if you are debugging an application crash and you see RtlAllocateHeap or RtlFreeHeap at the top of the faulting stack, you are probably a victim of heap corruption. Here is what a heap corrupted stack may look like in the debugger:
00000000`01abdfd8 00000000`77ef1ce6 ntdll!ExpInterlockedPopEntrySListFault+0x0
00000000`01abdfe0 00000000`77ef3cc7 ntdll!RtlAllocateHeap+0x278
00000000`01abe230 000007ff`7fd5fea0 rpcrt4!DCE_BINDING::DCE_BINDING+0x14f
00000000`01abe290 000007ff`7fd61b82 rpcrt4!RpcStringBindingComposeW+0xb0
00000000`01abe310 000007ff`7d4d81bf winsta!RpcWinStationBindSecure+0x4f
00000000`01abe3a0 000007ff`7d4d671a winsta!WinStationOpenServerW+0x75
00000000`01abe420 000007ff`5ee3bfef tscfgwmi!CWin32_TerminalService::ExecQuery+0xdf
00000000`01abe790 000007ff`6de41687 framedyn!Provider::ExecuteQuery+0x77
00000000`01abe7c0 000007ff`6de47813 framedyn!CWbemProviderGlue::ExecQueryAsync+0x2c3
00000000`01abedb0 000007ff`7fd69a75 rpcrt4!Invoke+0x65
00000000`01abee20 000007ff`7fe96cc9 rpcrt4!NdrStubCall2+0x54d
00000000`01abf3e0 000007ff`7fe961b6 rpcrt4!CStdStubBuffer_Invoke+0xb1
00000000`01abf420 000007ff`57369edb ole32!SyncStubInvoke+0x62
00000000`01abf4c0 000007ff`57369e27 ole32!StubInvoke+0x142
00000000`01abf580 000007ff`5718e41b ole32!CCtxComChnl::ContextInvoke+0x21e
00000000`01abf770 000007ff`5718cdcb ole32!STAInvoke+0x97
00000000`01abf7e0 000007ff`573692da ole32!AppInvoke+0x144
00000000`01abf870 000007ff`57369a55 ole32!ComInvokeWithLockAndIPID+0x5a9
00000000`01abf9d0 000007ff`57369373 ole32!ComInvoke+0x127
00000000`01abfa40 000007ff`5718d1f3 ole32!ThreadDispatch+0x2b
00000000`01abfa70 000007ff`5718d19a ole32!ThreadWndProc+0x13a
00000000`01abfb20 00000000`77c43abc user32!UserCallWinProcCheckWow+0x1f9
00000000`01abfbf0 00000000`77c43f5c user32!DispatchMessageWorker+0x3af
00000000`01abfc60 00000001`00011f31 wmiprvse!WmiThread<unsigned long>::ThreadWait+0x141
00000000`01abfef0 00000001`00013539 wmiprvse!WmiThread<unsigned long>::ThreadDispatch+0x519
00000000`01abff50 00000001`0001379d wmiprvse!WmiThread<unsigned long>::ThreadProc+0x2d
00000000`01abff80 00000000`77d6b71a kernel32!BaseThreadStart+0x3a
Unfortunately, this is a fairly common scenario that we see. Our previous posts on heap issues contain information on how we debug these, and more information is available in Microsoft KB Article 286470.
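The spill-over behavior described above can be sketched in a few lines. The snippet below is purely illustrative – it models two adjacent heap blocks as a byte array (a real heap also interleaves allocator metadata between blocks, which is usually what an overrun destroys first) – but it shows why the write itself never faults and why the eventual victim looks nothing like the culprit:

```python
# A simulated "heap": two adjacent 8-byte blocks living side by side.
# (Entirely illustrative -- a real heap also interleaves allocator
# metadata between blocks.)
PAGE = 8
heap = bytearray(2 * PAGE)
heap[PAGE:PAGE + 6] = b"intact"     # block B, owned by an unrelated component

# Bug: 12 bytes written into the 8-byte block A. The write itself does
# not fault -- the extra 4 bytes silently land in block B.
payload = b"A" * 12
heap[0:len(payload)] = payload

# Much later, the owner of block B reads its data and finds garbage;
# the code that did the damage is long gone from every stack by then.
print(bytes(heap[PAGE:PAGE + 6]))   # b'AAAAct'
```

The owner of block B is the one that crashes, even though it did nothing wrong – which is exactly why the faulting stack above shows only RtlAllocateHeap tripping over someone else’s mess.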
OK, so let’s look at how Windows 7 and Windows Server 2008 R2 mitigate heap corruption issues – Fault Tolerant Heap (FTH). The main goals of FTH are:
- Mitigate heap misuse patterns found to be the most frequent by applying shims
- Dynamically determine when to apply mitigations
- Monitor the effectiveness of attempted mitigations and disable them if they are not working
- Provide support for multiple mitigation methods:
  - From Microsoft through Watson
  - Using manual scenarios such as via the Application Compatibility Toolkit
- Autonomously return diagnostic data about heap corruption error patterns to Microsoft, and ultimately to Independent Software Vendors.
So basically, Fault Tolerant Heap (FTH) watches for applications that crash, and then tries to determine if the crash is due to heap corruption. If the conclusion is that it is, then FTH tracks the application to see if the frequency of the crash warrants a shim, or applies a shim on the next run, depending on its configuration and whether the internet is accessible. An administrator can also apply a shim manually using the Application Compatibility Toolkit. The FTH shim is designed to mitigate the most common causes of heap corruption, such as small buffer overruns and double frees. It also tracks subsequent behavior of the shimmed application to determine the degree to which the shim was successful. If it is deemed not successful, the shim is removed to minimize interference with normal application functionality.
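To give a feel for the kinds of mitigations involved, here is a hypothetical sketch (this is not the actual FTH shim code, and the names are invented for illustration) of the two mitigations mentioned above: padding each allocation so a small overrun lands in slack space, and tracking live blocks so a double free is swallowed rather than allowed to corrupt allocator bookkeeping:

```python
# Hypothetical sketch of two FTH-style mitigations -- NOT the real shim.
PAD = 32                                 # slack space absorbs small overruns

class ShimHeap:
    def __init__(self):
        self._live = {}                  # handle -> backing block
        self._next = 0

    def alloc(self, size):
        handle = self._next
        self._next += 1
        self._live[handle] = bytearray(size + PAD)   # zeroed + padded
        return handle

    def write(self, handle, data, offset=0):
        # A small overrun past the requested size lands in the padding
        # instead of a neighboring allocation's memory.
        self._live[handle][offset:offset + len(data)] = data

    def free(self, handle):
        if handle not in self._live:     # unknown or already freed:
            return False                 # likely a double free -- ignore it
        del self._live[handle]
        return True
```

With this model, a block requested as 8 bytes quietly absorbs an 11-byte write, and freeing it twice is harmless – which is the general shape of what a mitigation shim buys a misbehaving application.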
Full FTH functionality is only supported on client SKUs. This means that it does not monitor and shim applications running on server SKUs. However, you can manually apply the shim to an application on a server using the Application Compatibility Toolkit. FTH also only applies to interactive programs. Since services are no longer allowed to interact with the desktop starting with Windows Vista and Windows Server 2008, they will typically not be eligible for automatic FTH monitoring. Again however, you can manually shim a service using the Application Compatibility Toolkit.
FTH runs as part of what is called the Diagnostic Policy Service, which runs within a SVCHOST.EXE process running under the Local Service account. Because of this, the Local Service account requires full Read access to the path of the application in question, or else it may track the application but never be able to apply the shim. The user’s desktop for instance is not fully readable by the Local Service account, so an application being run directly from the desktop will not be shimmed.
FTH registry values are stored in the following key: HKEY_LOCAL_MACHINE\Software\Microsoft\FTH. There are a number of values under this key, but the main ones to watch are:
- CrashVelocity – The number of crashes that must occur within the window defined by CrashWindowInMinutes before FTH shims the application. The default is 3.
- CrashWindowInMinutes – The timeframe in minutes within which CrashVelocity must be met in order for the application to be shimmed. The default is 60. Together, these defaults mean that an app must crash 3 times within 60 minutes in order to be shimmed. Both of these values can of course be modified if needed.
- Enabled – Whether FTH is enabled or not. A value of 0 means FTH is disabled, and a value of 1 means FTH is enabled.
- ExclusionList – A list of processes that are excluded from FTH tracking and shimming. There are several Windows processes in this list by default.
- MaximumTrackedApplications – The number of processes that will be tracked concurrently by FTH. The default is 128.
- MaximumTrackedProcesses – The maximum number of instances of a tracked process that FTH will monitor concurrently. The default is 4.
- CheckPointPeriod – The time in minutes between clean-up cycles. The clean-up cycle is when FTH periodically clears its list of tracked applications. This allows for the fact that applications may be fixed or upgraded and no longer require a shim, or that the application simply has not been used recently. The default is 10,080 (7 days).
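Putting the defaults above together, an export of the key (for example via regedit or reg export) would look roughly like the following. This is a reconstruction from the defaults listed above, not a capture from a live machine, and the exact set of values can vary by build (ExclusionList, for instance, is a multi-string value and is omitted here):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\FTH]
"Enabled"=dword:00000001
"CrashVelocity"=dword:00000003
"CrashWindowInMinutes"=dword:0000003c
"MaximumTrackedApplications"=dword:00000080
"MaximumTrackedProcesses"=dword:00000004
"CheckPointPeriod"=dword:00002760
```

(The DWORD values are hexadecimal, so 0x3c is 60, 0x80 is 128, and 0x2760 is 10,080.)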
There is also a State key under the FTH key. This key stores information on applications that have been shimmed. So for instance, if you open this key on a fresh machine, it should have nothing under it other than the typical Default – Value Not Set. Once an application crashes due to heap corruption 3 times within 60 minutes (with the default settings), it will be added to this key in the format <Appname> = <binary blob>. You can’t read what is actually in the binary value, but it includes various information such as the process-specific versions of the values listed above. From a user standpoint, the key is mainly useful as a quick way to see which processes, if any, have been caught crashing in what appears to be heap corrupting behavior. Overall, FTH should assist in automatically addressing many common application crashes without any sort of intervention by the user. Now, let’s turn our attention to some new memory management pieces within Windows 7 and Windows Server 2008 R2, beginning with Working Set Trimming …
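The CrashVelocity / CrashWindowInMinutes decision boils down to a sliding-window check. The function below is a hypothetical sketch of that logic (the name and signature are invented for illustration; timestamps are crash times in minutes, in order):

```python
# Hypothetical sketch of the CrashVelocity / CrashWindowInMinutes
# decision: shim once `velocity` heap-corruption crashes fall inside
# any span of `window_minutes`. Not the actual FTH implementation.
def should_shim(crash_minutes, velocity=3, window_minutes=60):
    for i in range(velocity - 1, len(crash_minutes)):
        # Compare the i-th crash with the one `velocity - 1` crashes
        # earlier; if they fit in one window, the threshold is met.
        if crash_minutes[i] - crash_minutes[i - velocity + 1] <= window_minutes:
            return True
    return False
```

So with the defaults, crashes at minutes 0, 10, and 50 would trigger a shim, while crashes spread over several hours would not.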
In previous versions of Windows, especially 64-bit versions of Windows Server 2003, the size of the working set of system cache could potentially grow to consume all, or nearly all, of RAM. In Windows Server 2008 R2 and Windows 7, significant changes were introduced to the management of working sets to address that situation. The nature of the changes is as follows:
- The number of levels that can describe the age of working set pages increased from 4 to 8. This allows for richer aging information and more diverse trimming policies.
- The distribution of aged pages is better-tracked to enable better trimming decisions.
- The growth rate of individual working sets is tracked and rapidly growing working sets are monitored closely so that optimal trimming can be accomplished in a timely fashion.
- Excess is typically trimmed from large working sets rather than from very small ones, reducing the inequitable effect that low memory situations had on smaller processes (which would then ripple into downstream performance degradation for larger processes as the smaller ones end up causing many hard faults).
- The system cache has been separated into three distinct working sets to prevent the individual expansion of one from causing the trimming of others:
  - System Cache
  - Paged Pool
  - Driver Images
With respect to Contiguous Memory Allocations, new multi-megabyte tracking structures allow the memory manager to skip already-allocated ranges in large page chunks, yielding up to a 512x performance increase on some workloads. For example, Hyper-V allocations by VID.sys are now more than 30 times faster. In addition, pervasive top-down optimal scanning with a sliding window has contributed greatly to increased performance. Specifically, it dramatically improves the performance of Hyper-V creation of guest VMs and enterprise applications like SQL that allocate large amounts of memory.
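The large-chunk skipping can be pictured as scanning an allocation bitmap one machine word at a time. The sketch below is a simplified stand-in (it is not VID.sys’s or the memory manager’s actual data structure) showing why a fully allocated 64-page run costs one compare instead of 64 per-page checks:

```python
# Toy version of the skip-scan described above: examine the page
# allocation bitmap one 64-page word at a time, skipping fully
# allocated runs with a single compare. Illustrative only.
FULL = (1 << 64) - 1

def first_free_page(bitmap_words):
    for w, word in enumerate(bitmap_words):
        if word == FULL:                 # 64 allocated pages: skip in one step
            continue
        for bit in range(64):            # only dig into a partially free word
            if not (word >> bit) & 1:
                return w * 64 + bit
    return -1                            # no free page found
```

On a mostly full bitmap, almost every word is skipped with one compare, which is the source of the large speedups for allocation-heavy workloads like VM creation.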
The effectiveness of ASLR has been enhanced to include 64 possible load addresses for 32-bit drivers and 256 for 64-bit drivers, up from 16 for each. In addition, large session drivers such as Win32k.sys are also now relocated. Extra effort is also made to relocate user space images even when system virtual address space is tight by using the user address space of the system process.
Finally, let’s look at some Translation Look-Aside Buffer (TLB) and cache flush improvements. The TLB is how a processor caches virtual-to-physical translations to provide performance gains. The operating system is required to flush the corresponding entries whenever it changes a virtual-to-physical mapping. Windows 7 and Windows Server 2008 R2 take advantage of newer CPU designs that do not require TLB invalidation for permission promotion, eliminating the need for TLB flushing for many common operations such as dirty bit faults. Also added is automatic tracking of I/O space mappings, making the system robust against conflicting attribute specification by automatically guaranteeing that incorrect mapping requests are transparently converted to correct ones. This improves performance by eliminating unneeded and costly flushing of the entire cache.
And with that – we’ve reached the end of our second day! Tomorrow, Jim Martin will be back with a look at Core Parking / Intelligent Timer Tick and Timer Coalescing. Enjoy the rest of your Friday!
– Tim Newton, with special contributions by Jim Martin