PRF: Memory Leak / Resource Depletion





Description:  A memory leak occurs when a process keeps consuming a memory resource and its usage never levels off. Examples of memory resources include virtual memory, pool memory, handles, and heap.


 


Scoping the Issue:  Memory leaks may occur in any environment and for any number of reasons. The best way to determine what is happening on the system is to gather data while the system is in the problem state.


 


Data Gathering:  In all instances, collect either MPS Reports with the General, Internet and Networking, Business Networks and Server Components diagnostics, or a Performance-oriented MSDT manifest.  Additional data that may be required includes the following:



  • Performance Monitor (Perfmon) logs that include the timeframe when the memory leak occurred.  The time it takes the server to go from a normal state to a memory leak state determines the Perfmon capture interval; use the table below to set it.  You can create the log parameters manually or by using the Performance Monitor Wizard.  Required counters include:


    • Cache / All Counters / All Instances

    • Memory / All Counters / All Instances

    • Process / All Counters / All Instances

    • Processor / All Counters / All Instances

    • Physical Disk / All Counters / All Instances















    If the average time to issue is:    The capture interval should be:
    Weekly                              14 minutes
    Daily                               120 seconds
    Hourly                              5 seconds



     

  • Pool Monitor (PoolMon) logs that include the timeframe when the memory leak is occurring.  As with Perfmon, the PoolMon capture interval is set based on the frequency of the symptoms.  The table below provides guidelines for setting the interval.  We strongly recommend capturing Perfmon and PoolMon data at the same time so the events can be correlated.

















    If the average time to issue is:    The capture interval should be:
    Weekly                              1 hour
    Daily                               15 minutes
    Hourly                              60 seconds


  • It may also be necessary to capture a complete memory dump of the server while it is in the problem state.  In most cases we will capture what is known as a Ctrl-ScrollLock memory dump. However, if your system has a “Lights Out Management” system (such as iLO), you will most likely want to capture what is known as an NMI dump. In either case, ensure that you have a pagefile on the root drive that is equal to the amount of RAM on the system plus about 100 MB.

  • If your system has a very large amount of RAM or limited disk space on the root drive, you may need to use MAXMEM to constrain the memory to a more reasonable size for dumping the system (e.g. constrain a 16 GB system to 4 GB). If this is not possible, a Kernel Only dump may suffice, depending on the circumstances.
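
The pagefile sizing rule above is simple arithmetic, and the choice between a complete dump, a MAXMEM-constrained dump, and a Kernel Only dump can be sketched as a small decision function. This is only an illustrative sketch: the function names and the 4 GB MAXMEM default are hypothetical (the default is taken from the 16 GB to 4 GB example above), and it checks only whether the pagefile fits in the free space on the root drive, ignoring the additional space the dump file itself will need.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def required_pagefile_bytes(ram_bytes, margin_bytes=100 * MB):
    """Pagefile size for a complete dump: RAM plus about 100 MB."""
    return ram_bytes + margin_bytes


def dump_plan(ram_bytes, free_disk_bytes, maxmem_bytes=4 * GB):
    """Suggest a dump strategy (hypothetical helper, not a Windows API).

    Assumption: only the pagefile is compared against free space on the
    root drive; space for the dump file itself is not accounted for.
    """
    if required_pagefile_bytes(ram_bytes) <= free_disk_bytes:
        return "complete"
    if required_pagefile_bytes(maxmem_bytes) <= free_disk_bytes:
        # Constrain RAM with MAXMEM first, then take a complete dump.
        return "complete-with-maxmem"
    return "kernel-only"


if __name__ == "__main__":
    # A 16 GB server with 8 GB free on the root drive.
    print(dump_plan(16 * GB, 8 * GB))
```

For a 16 GB server, a complete dump needs a pagefile of 16 GB plus 100 MB; if the root drive cannot hold that, the sketch falls back to the MAXMEM option and then to a Kernel Only dump.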


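
The two capture-interval tables in the Data Gathering list can be encoded as lookups, and the Perfmon one can be turned into a command line for creating the counter log. The sketch below only builds the argument list; it does not run anything. The helper names, the collector name, and the output path are hypothetical, and the `logman create counter` options used here (`-c`, `-si`, `-o`) should be confirmed with `logman create counter /?` on the target system.

```python
# Capture intervals in seconds, keyed by how often the issue occurs.
PERFMON_INTERVAL_S = {"weekly": 14 * 60, "daily": 120, "hourly": 5}
POOLMON_INTERVAL_S = {"weekly": 60 * 60, "daily": 15 * 60, "hourly": 60}

# All counters and all instances for the required objects.
COUNTERS = [
    r"\Cache\*",
    r"\Memory\*",
    r"\Process(*)\*",
    r"\Processor(*)\*",
    r"\PhysicalDisk(*)\*",
]


def hms(seconds):
    """Format a number of seconds as hh:mm:ss."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def logman_create_args(name, frequency, output_path):
    """Build (but do not run) a logman command for a counter log.

    Assumption: -c takes the counter paths, -si the sample interval,
    and -o the output path; verify against the local logman help.
    """
    interval = PERFMON_INTERVAL_S[frequency]
    return ["logman", "create", "counter", name,
            "-c", *COUNTERS, "-si", hms(interval), "-o", output_path]


if __name__ == "__main__":
    # Hypothetical collector name and output path.
    print(" ".join(logman_create_args("MemLeak", "daily", r"C:\PerfLogs\MemLeak")))
    print("PoolMon snapshot every", POOLMON_INTERVAL_S["daily"], "seconds")
```

Keeping both tables in one place makes it easier to start the Perfmon and PoolMon captures with matching settings for the same symptom frequency.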


      Troubleshooting / Resolution: After you have gathered this data, review the following:



      • MPS Reports


        • Outdated drivers and firmware, in particular for the NIC and disk / storage subsystems, as well as antivirus software

        • Event IDs: look for the Event IDs listed above, and also for any Event ID 2019 or 2020 entries.  These events indicate nonpaged pool (2019) or paged pool (2020) depletion.

      • Performance Monitor Logs


        • Look for evidence of any process whose memory usage trends upward. This may appear as either an upward-slanting line or a stair-stepping increase over time.

        • If there is evidence of a leaking process, test removing or disabling the product to see if the issue goes away. If so, contact the product vendor for a resolution.

      • PoolMon logs


        • Look for a trending increase in paged pool or non-paged pool memory, which may indicate a leak.

        • If there is evidence of a leaking pool tag, research what it correlates to. If possible, test removing or disabling the product to see if the issue goes away. If so, contact the product vendor for a resolution.

      • Complete Memory Dump


        • Analysis of the memory dump requires some knowledge of debugging.  The dump file should be provided to Microsoft support for analysis.
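
The "upward trend" checks for Perfmon and PoolMon logs can be approximated numerically by fitting a straight line to each series and flagging the ones that keep growing. The sketch below is one possible approach, not a prescribed method: it assumes the logs have already been parsed into (elapsed seconds, bytes) samples per process or per pool tag, and the function names, sample data, and growth threshold are all hypothetical.

```python
def slope_per_hour(samples):
    """Least-squares slope of (elapsed_seconds, bytes) samples, in bytes per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    return cov / var_t * 3600


def leak_suspects(series, min_growth_per_hour):
    """Names whose fitted growth exceeds the threshold, fastest-growing first.

    `series` maps a process name or pool tag to its samples.
    """
    slopes = {name: slope_per_hour(s) for name, s in series.items()}
    suspects = [name for name, s in slopes.items() if s > min_growth_per_hour]
    return sorted(suspects, key=lambda name: slopes[name], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: one steadily growing process, one flat process.
    series = {
        "app.exe": [(0, 100_000_000), (3600, 150_000_000), (7200, 200_000_000)],
        "idle.exe": [(0, 50_000_000), (3600, 50_000_000), (7200, 50_000_000)],
    }
    print(leak_suspects(series, min_growth_per_hour=1_000_000))
```

A fitted slope catches both the upward-slanting and the stair-stepping patterns, since each produces sustained growth over the capture window. A short capture can still flag normal warm-up growth, so any suspect should be confirmed against the raw logs before a product is removed or disabled.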

      Additional Resources: