An Overview of Troubleshooting Memory Issues

We’ve talked quite a bit in the past about various memory management concepts – the use of the /3GB switch, Understanding Pool Resources, the x86 Virtual Address Space and so on.  The most common manifestation of memory issues is a memory shortage – essentially a situation with insufficient resources. 

There are two types of memory pressure to consider.  The first is a system with too little physical RAM installed.  When a system has too little RAM, the Virtual Memory Manager has an increased workload as it attempts to keep the most recently accessed virtual memory pages of processes in physical RAM so that they can be quickly retrieved.  The performance of both applications and the overall system may suffer as a result of the increased paging to disk.  Although the excessive paging is really a secondary symptom, it is the easiest symptom to detect.  The second type of memory issue occurs when a process exhausts the available virtual memory.  This is most commonly referred to as a memory leak.  Most memory leaks are fairly easy to detect and are usually caused by software code defects.  However, normal system workload can also cause memory depletion: there is no real memory leak, but overall virtual memory usage continues to grow until the system experiences a shortage.

One of the problems with configuring and troubleshooting memory on a system is that memory is not used in quite the same fashion as the other hardware on a system.  Program instructions and data must occupy physical memory in order to execute.  Often, they continue to occupy physical memory locations long after they were last actively addressed.  Idle code and data are removed from RAM only when new requests for physical memory cannot be satisfied from existing free RAM.  Also, because virtual address space is mapped to physical memory on demand, RAM tends to appear fully utilized all the time.

So now that we understand some of the mechanisms and how issues may manifest, how do we go about troubleshooting?  The first thing to understand is whether the problem is due to insufficient physical memory that results in excessive paging.  However, remember that excessive paging may occur even if there is plenty of available memory – for example if an application is leaking memory.  As we’ve mentioned before, it is important to have a baseline of your system’s performance to compare to any new performance data that you gather.  Below are some of the key performance counters – the descriptions are pulled directly from the counter descriptions in Performance Monitor on Windows Vista:

Counter: Memory \ % Committed Bytes In Use
Description: % Committed Bytes In Use is the ratio of Memory \ Committed Bytes to the Memory \ Commit Limit. Committed memory is the physical memory in use for which space has been reserved in the paging file should it need to be written to disk. The commit limit is determined by the size of the paging file. If the paging file is enlarged, the commit limit increases, and the ratio is reduced. This counter displays the current percentage value only; it is not an average.
Values: If this value is consistently over 80% then your page file may be too small (refer back to our post "What is the Page File for anyway?").

Counter: Memory \ Available Bytes
Description: Available Bytes is the amount of physical memory, in bytes, immediately available for allocation to a process or for system use. It is equal to the sum of memory assigned to the standby (cached), free and zero page lists.
Values: If this value falls below 5% of installed RAM on a consistent basis, then you should investigate.  If the value drops below 1% of installed RAM on a consistent basis, there is a definite problem!

Counter: Memory \ Committed Bytes
Description: Committed Bytes is the amount of committed virtual memory, in bytes. Committed memory is the physical memory which has space reserved on the disk paging file(s). There can be one or more paging files on each physical drive. This counter displays the last observed value only; it is not an average.
Values: Keep an eye on the trend of this value – if the value is constantly increasing without leveling off, you should investigate.

Counter: Memory \ Pages/sec
Description: Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays.  It is the sum of Memory \ Pages Input/sec and Memory \ Pages Output/sec.  It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory \ Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and non-cached mapped memory files.
Values: This will depend on the speed of the disk on which the page file is stored.  If there are consistently more than 40 per second on a slower disk or 300 per second on fast disks you should investigate.

Counter: Memory \ Pages Input/sec
Description: Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare the value of Memory \ Pages Input/sec to the value of Memory \ Page Reads/sec to determine the average number of pages read into memory during each read operation.
Values: This will vary – based on the disk hardware and overall system performance.  On a slow disk, if this value is consistently over 20 you might have an issue.  A faster disk can handle more.

Counter: Memory \ Pool Nonpaged Bytes
Description: Pool Nonpaged Bytes is the size, in bytes, of the nonpaged pool, an area of system memory (physical memory used by the operating system) for objects that cannot be written to disk, but must remain in physical memory as long as they are allocated.  Memory \ Pool Nonpaged Bytes is calculated differently than Process \ Pool Nonpaged Bytes, so it might not equal Process \ Pool Nonpaged Bytes \ _Total.  This counter displays the last observed value only; it is not an average.
Values: If Nonpaged pool is running at greater than 80%, on a consistent basis, you may be headed for a Nonpaged Pool Depletion issue (Event ID 2019).

Counter: Memory \ Pool Paged Bytes
Description: Pool Paged Bytes is the size, in bytes, of the paged pool, an area of system memory (physical memory used by the operating system) for objects that can be written to disk when they are not being used.  Memory \ Pool Paged Bytes is calculated differently than Process \ Pool Paged Bytes, so it might not equal Process \ Pool Paged Bytes \ _Total. This counter displays the last observed value only; it is not an average.
Values: Paged Pool is a larger resource than Nonpaged pool – however, if this value is consistently greater than 70% of the maximum configured pool size, you may be at risk of a Paged Pool depletion (Event ID 2020).  See our post "Understanding Pool Resources" for more details, including how to find out what your current maximum paged and nonpaged pool sizes are.

Counter: Process (_Total) \ Private Bytes
Description: Private Bytes is the current size, in bytes, of memory that this process has allocated that cannot be shared with other processes.
Values: Similar to the Committed Bytes counter for memory, keep an eye on the trending of this value.  A consistently increasing value may be indicative of a memory leak.

Counter: LogicalDisk (pagefile drive) \ % Idle Time
Description: % Idle Time reports the percentage of time during the sample interval that the disk was idle.
Values: If the drive(s) hosting the page file are idle less than 50% of the time, you may have an issue with high disk I/O.

Counter: LogicalDisk (pagefile drive) \ Split IO/Sec
Description: Split IO/Sec reports the rate at which I/Os to the disk were split into multiple I/Os. A split I/O may result from requesting data of a size that is too large to fit into a single I/O or that the disk is fragmented.
Values: Issues relating to Split I/O depend on the disk drive type and configuration.
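
To make the guideline values above a little more concrete, here is a minimal Python sketch that checks one set of averaged counter values against them.  The class name, field names and message text are all hypothetical, and the sketch assumes you have already exported the counter values from a Performance Monitor log by some other means.  The pool counters are left out because their limits depend on the maximum pool sizes configured on your system.  The thresholds are rules of thumb, so treat the output as a prompt to investigate and compare against your baseline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemorySample:
    """Averaged counter values for one logging interval (hypothetical names)."""
    installed_ram_bytes: float
    available_bytes: float          # Memory \ Available Bytes
    pct_committed_in_use: float     # Memory \ % Committed Bytes In Use
    pages_per_sec: float            # Memory \ Pages/sec
    pages_input_per_sec: float      # Memory \ Pages Input/sec
    page_reads_per_sec: float       # Memory \ Page Reads/sec
    pagefile_disk_idle_pct: float   # LogicalDisk \ % Idle Time
    fast_disk: bool = False         # assumption: you classify the disk yourself


def check_sample(s: MemorySample) -> List[str]:
    """Return a list of findings based on the guideline values in this post."""
    findings = []

    avail_pct = 100.0 * s.available_bytes / s.installed_ram_bytes
    if avail_pct < 1.0:
        findings.append("Available Bytes below 1% of RAM: definite problem")
    elif avail_pct < 5.0:
        findings.append("Available Bytes below 5% of RAM: investigate")

    if s.pct_committed_in_use > 80.0:
        findings.append("Committed Bytes In Use over 80%: page file may be too small")

    # 40 pages/sec on a slower disk, 300 pages/sec on a fast disk.
    pages_limit = 300.0 if s.fast_disk else 40.0
    if s.pages_per_sec > pages_limit:
        findings.append("Pages/sec above guideline for this disk: investigate")

    # The post only gives a Pages Input/sec figure for slow disks.
    if not s.fast_disk and s.pages_input_per_sec > 20.0:
        findings.append("Pages Input/sec over 20 on a slow disk: possible issue")

    if s.pagefile_disk_idle_pct < 50.0:
        findings.append("Page file disk idle less than 50%: high disk I/O")

    return findings


def pages_per_read(s: MemorySample) -> float:
    """Average pages read per read operation (Pages Input/sec / Page Reads/sec)."""
    if s.page_reads_per_sec == 0:
        return 0.0
    return s.pages_input_per_sec / s.page_reads_per_sec
```

Feeding the function values averaged over a meaningful interval, rather than a single instantaneous reading, fits the "on a consistent basis" wording of the guidelines.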

So, if the page file shows a high degree of usage, the paging file may be sized too small.  Similarly, if the disk hosting the page file is excessively busy, overall system performance may be impacted.  Memory leaks caused by applications manifest in different ways – you may get an error message indicating that the system is low on virtual memory.  A memory leak in an application will show up as a gradual increase in the value of the Private Bytes counter listed above.  A memory leak in one process may cause excessive paging as it squeezes other process working sets out of RAM.

Now that you’ve gathered your data and determined what the issue is, how do you address it?  In the case of a server that is short on physical memory, you can either add more RAM (if the system hardware and the OS version that you are running support additional RAM) or offload some of the workload to another system.  However, if your data does not indicate an issue with the amount of physical RAM, then it’s time to look at other components that may be experiencing issues based on the data you have collected.

If memory does seem to be the issue, then we have to determine the specific cause.  In Terminal Services environments in particular, as more and more applications are added to the Terminal Server, the amount of resources required by the various processes increases.  In some cases, you may be able to narrow down the cause to a specific process.  In those situations, one of two things is true – you need to add more RAM to the machine, or there is a problem with the process itself that needs attention.  The key here is to understand exactly what the process is doing.  If the virtual memory of the process climbs steadily and then levels off, then increasing the size of the page file may be all that is needed to offer relief.  If there is still insufficient RAM, however, you will notice increased paging, which may result in system performance issues.  The important point is that the process does not consume virtual memory indefinitely – it levels off at a certain point.

On the other hand, if the application consumes memory and never levels off, then the application is probably leaking memory.  In a memory leak scenario, the application reserves and consumes memory resources, but never releases them when they are no longer required.  Depending on how severe the leak is, the virtual memory could be depleted within weeks, days, hours – or even minutes (in the most extreme circumstances)!  It is more normal to see a slow leak, which can be identified using Performance Monitor logging.
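
The distinction between a process that levels off and one that keeps growing can be sketched in a few lines of Python.  This is only a rough illustration: the function name, the 25% tail window and the 1% tolerance are assumptions chosen for the example, and the input is assumed to be evenly spaced Private Bytes samples taken from a Performance Monitor log.  Real logs are noisier, so a longer log and a human eye on the graph are still the better judge.

```python
from typing import Sequence


def classify_growth(samples: Sequence[float],
                    tail_fraction: float = 0.25,
                    tolerance: float = 0.01) -> str:
    """Classify a series of Private Bytes samples.

    Returns "stable", "levels off" or "possible leak".
    tail_fraction and tolerance are illustrative defaults, not official values.
    """
    if len(samples) < 8:
        raise ValueError("need at least 8 samples to judge a trend")

    first = samples[0]
    # Growth smaller than this is treated as noise.
    noise = tolerance * max(abs(first), 1.0)

    # No meaningful growth over the whole log.
    if samples[-1] - first <= noise:
        return "stable"

    # Look only at the most recent part of the log.
    tail_len = max(2, int(len(samples) * tail_fraction))
    tail = samples[-tail_len:]

    # The process grew earlier but has since stopped growing.
    if tail[-1] - tail[0] <= noise:
        return "levels off"

    # Still growing at the end of the log.
    return "possible leak"
```

A result of "possible leak" only says that the value was still climbing when the log ended, so the next step would be to extend the logging period before pointing the finger at the application.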

Another scenario that can cause performance degradation occurs when the page file itself is heavily fragmented.  This typically happens when the page file is located on a disk that is heavily used by other applications.  In these cases, the disk itself is probably fragmented as well.  Defragmenting the disk should alleviate some of the issues.  However, since the built-in disk defragmenter does not defragment paging files, you may want to consider moving the page file to another drive temporarily and setting the page file on the fragmented disk to 0 MB.  Reboot the system so that the other page file is used, then defragment the original drive.  Once the defragmentation is complete, you can reset the page file on the original drive to the necessary values, zero out the page file size on the temporary drive and reboot the system again.

That brings us to the end of this post.  In our next post, we’ll continue our look at memory troubleshooting.  Until next time …

- CC Hameed
