MEMORY MANAGEMENT – WORKING SET TRIMMING
Description: Working Set Trimming refers to a condition wherein all Working Sets are flushed to the disk. A Working Set is the amount of physical memory used by a process. Working Set Trimming is primarily seen on SQL Server systems when large amounts of physical memory are allocated to the SQL Server processes and a new request for resource allocations is made that exceeds the amount of free available resources. In order to satisfy the new allocation request the Working Set manager flushes all the Working Sets for all the processes to disk (not just the SQL ones). Only the bare minimum needed to run these processes remains allocated.
Scoping the Issue: Although we normally see Working Set Trimming Issues on SQL Servers (SQL Server 2000 and later), we have also seen the issue surface on other application servers that make use of large databases – for example, Microsoft Exchange 2007 servers and Microsoft Data Protection Manager (DPM) Servers. Let’s take a look at each of these scenarios in turn:
Microsoft SQL Servers: Based on the version of SQL Server in use, and the configuration, there are a combination of symptoms that you may experience.
- SQL Server Standard Edition – From a systems perspective, you may notice sluggish performance on the server, both at the console and when accessing the server remotely. In addition, you may not be able to connect to the system via a Remote Desktop Connection. Over time the performance will improve as the running processes reacquire the resources that they were using previously. On systems running SQL Server 2005 SP2 or later, you may also notice a number of error messages that include this description: “A significant part of sql server process memory has been paged out” in the Application event log (Event ID 17890)
- SQL Server Enterprise Edition –SQL Server Enterprise Edition includes additional functionality that allows you to lock the buffer pool in memory. This functionality is called “Lock Pages in Memory”. If you have this feature configured and your system encounters a condition that requires the Working Sets to be flushed, you may see a number of Event ID 333 in the System log. The description for this Event ID includes the text, “An I/O operation initiated by the Registry failed unrecoverably.” If you are running SQL Server 2005 SP2 or later and you have this feature configured, you will not see the error message referencing SQL Server Process Memory being paged. Again, as the running processes reacquire the resources that they were using previously, system performance will improve.
Microsoft Exchange Server 2007: In the event that Exchange 2007 servers experience a Working Set flush, common symptoms include the following:
- Event ID 333
- Slow response at the console or from a remote connection
- Unable to connect via RDP
- New mail connections to the server time out – existing connections continue to function normally
- Remote Management is not possible – in essence new RPC connections are unavailable
Microsoft Data Protection Manager (DPM): DPM servers experience the same symptoms as described above for Microsoft SQL Servers. DPM servers will also experience failures when trying to take snapshots.
In each of the application server scenarios outlined above, the server will eventually recover as the running processes reacquires resources it was previously using. However, this may take anywhere from minutes to hours, depending on the amount of physical RAM and the processing taking place on the system.
Data Gathering: In all instances, collecting either MPS Reports with the General, Internet and Networking, Business Networks and Server Components diagnostics, or a Performance-oriented MSDT manifest must be done. Additional data required may include the following:
- Performance Monitor logs that include the timeframe when the Working Set Trimming occurred. Ideally, the capture interval should not exceed 10 seconds. You can create the log parameters manually, or by using the Performance Monitor Wizard. Required counters include:
- All Memory Counters / All Instances
- All Process Counters / All Instances
- All Disk Counters / All Instances
- Pool Monitor (PoolMon) logs that include the timeframe when the Working Set Trimming occurred. Ideally, the capture interval should not exceed 10 seconds.
- It may also be necessary to capture ETW tracing log information. See Microsoft KB Article 938486 for information on configuring your system to capture this information
- If you have a SQL server, capture the configuration information concerning SQL Memory Allocation
Troubleshooting / Resolution: After you have gathered this data, review the following:
- MPS Reports
- Outdated drivers & firmware – in particular for the NIC and Disk / Storage subsystems
- Event ID’s look for the Event ID’s listed above and also any 2019’s or 2020’s
- Performance Monitor Logs
- Look for evidence of pronounced Working Set flush
- Also look for evidence of a particular process’ Working Set growing at the time of the flush as this could indicate why the trim occurred (for example – backup job or large file copy etc)
- PoolMon logs
- Look for evidence of pool leaks
- Particular tags to watch for are MMcm and CM35
- Microsoft KB Article 918483
- Microsoft KB Article 938486
- Ask the Performance Team: Lock Pages in Memory … do you really need it?
- Ask the Performance Team: SQL and the Working Set
- Ask the Performance Team: Troubleshooting Event ID 333 Errors
- CSS SQL Server Engineers: Do I have to assign the Lock Pages in Memory privilege for Local System?