In the last session, back in July of last year, we saw an issue that at first glance looked like an ISA Server problem but ended up being caused by another component. The scenario that I’m going to describe now could also easily be interpreted as an ISA Server issue; only after analysis was it possible to prove that ISA Server was actually the victim.
2. Scenario – Server Gradually Becomes Less Responsive Until It Stops Accepting Incoming Connections
That was the description the customer gave us. He added that if he stopped the Microsoft Firewall service, the server started accepting new connections again. While the issue was occurring, the following event was also being logged in the Application Log:
As the event says, this could potentially be an issue with ISA Server, but usually this event is caused by something outside of ISA. We needed to narrow it down to understand why we were receiving this error.
2.1. Collecting Data
Since we were dealing with a general degradation of the server’s performance, we were not yet sure that ISA Server was the real root cause of the problem. We had to collect data from the operating system to analyze overall performance. Here is the action plan we used:
Preparing the Server
1) Configure the system to generate a kernel memory dump when the server becomes unresponsive. To do this, configure the server according to KB244139.
2) Configure the system to capture PoolMon data. Use KB177415.
3) Configure PerfMon. Use the objects suggested by the article “Monitoring and Troubleshooting Performance” on Microsoft TechNet.
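For step 1, the registry changes can be kept in a .reg file so that they are applied the same way on every server. The sketch below only builds the text of such a file. It assumes a PS/2 keyboard (the i8042prt driver) and the commonly documented CrashDumpEnabled and CrashOnCtrlScroll values; the function name build_dump_reg is made up for illustration. KB244139 remains the reference for your Windows version, and a restart is needed before the settings take effect.

```python
# Sketch only: builds .reg file text for the keyboard-initiated dump settings.
# Assumption: PS/2 keyboard (i8042prt). Verify every value against KB244139.

CRASH_CONTROL = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"
PS2_KEYBOARD = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters"


def build_dump_reg(kernel_dump=True):
    """Return the text of a .reg file (illustrative helper, not a Microsoft tool)."""
    # CrashDumpEnabled: 1 = complete memory dump, 2 = kernel memory dump.
    dump_type = 2 if kernel_dump else 1
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{CRASH_CONTROL}]",
        f'"CrashDumpEnabled"=dword:{dump_type:08x}',
        "",
        f"[{PS2_KEYBOARD}]",
        # Enables the right CTRL + SCROLL LOCK (pressed twice) crash sequence.
        '"CrashOnCtrlScroll"=dword:00000001',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_dump_reg())
```

Review the generated text before importing it, and make sure the page file on the system drive is large enough to hold a kernel dump.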
When the issue happens
1) Use the keyboard combination to generate the dump as described in KB244139. This will cause a blue screen (which is expected).
2) Right after the server comes back up, run the Setup/Perf MPSReports.
3) Get the PoolMon result.
4) Get the Perfmon result.
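If you save PoolMon output at two different points in time, comparing the two snapshots shows which tags keep growing. The sketch below is one way to do that comparison. It assumes each data row is whitespace-separated as Tag, Type, Allocs, Frees, Diff, Bytes, which may not match your PoolMon version exactly; the tags "Xmpl" and "Demo" and all the numbers are made up for the example.

```python
import re

# Assumed row layout: Tag, Type (Nonp or Paged), Allocs, Frees, Diff, Bytes.
ROW = re.compile(r"^\s*(\S{1,4})\s+(Nonp|Paged)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_snapshot(text):
    """Map (tag, pool type) to the bytes currently allocated under that tag."""
    usage = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and summary lines do not match and are skipped
            tag, pool, _allocs, _frees, _diff, nbytes = match.groups()
            usage[(tag, pool)] = int(nbytes)
    return usage


def top_growers(before, after, pool="Nonp", limit=5):
    """Return (tag, byte growth) pairs for one pool type, largest growth first."""
    old, new = parse_snapshot(before), parse_snapshot(after)
    growth = [
        (tag, nbytes - old.get((tag, kind), 0))
        for (tag, kind), nbytes in new.items()
        if kind == pool
    ]
    growth = [item for item in growth if item[1] > 0]
    return sorted(growth, key=lambda item: item[1], reverse=True)[:limit]


# Made-up snapshots; "Xmpl" and "Demo" are placeholder tags, not real drivers.
BEFORE = """\
 Tag  Type     Allocs      Frees       Diff      Bytes
 Xmpl Nonp       1000        900        100     409600
 Demo Nonp        500        500          0          0
 Demo Paged       200        100        100      51200
"""
AFTER = """\
 Tag  Type     Allocs      Frees       Diff      Bytes
 Xmpl Nonp       5000       1000       4000   16384000
 Demo Nonp        600        590         10      10240
 Demo Paged       300        100        200     102400
"""

if __name__ == "__main__":
    for tag, grown in top_growers(BEFORE, AFTER):
        print(f"{tag}: +{grown} bytes of nonpaged pool")
```

A tag that grows between every pair of snapshots and never shrinks is a good candidate for a leak; a single large jump on its own is not conclusive.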
Even without the Microsoft internal symbols, you can still review some key data using the public symbols. Download the Debugging Tools for Windows and configure the symbols according to the Debugging Tools and Symbols: Getting Started page on Windows Hardware Developer Central.
While reviewing the dump, one of the main points that came up was the excessive use of Non Paged Pool. Here is the result:
In addition to checking the pool usage in the kernel dump (using !poolused), you can also use the PoolMon tool to see which drivers might be leaking. In this particular case the tags were:
Those tags do not belong to Microsoft products, which you can verify by following the article “How to find pool tags that are used by third-party drivers” at Microsoft Help and Support.
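The approach in that article comes down to searching the driver binaries for the four-character tag string. The sketch below does the same kind of search; the function name drivers_containing_tag is made up for illustration, and the directory you pass in (normally the Drivers folder under System32) is up to you. The article itself remains the reference for the exact procedure.

```python
from pathlib import Path


def drivers_containing_tag(driver_dir, tag):
    """Return the names of .sys files whose bytes contain the ASCII pool tag."""
    needle = tag.encode("ascii")
    hits = []
    for path in sorted(Path(driver_dir).glob("*.sys")):
        try:
            if needle in path.read_bytes():
                hits.append(path.name)
        except OSError:
            # Skip files that cannot be read (locked or access denied).
            continue
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_tag.py <driver directory> <tag>
    if len(sys.argv) == 3:
        for name in drivers_containing_tag(sys.argv[1], sys.argv[2]):
            print(name)
```

A match is only a hint, since the same four bytes can appear in a file by coincidence, so confirm the owner of the driver (for example from the file’s version information) before drawing conclusions.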
To analyze the Performance Monitor data you can follow the recommendations of the article “Analyzing performance data” at Microsoft TechNet and review the Operating System side. In this particular case, the ISA Server process (wspsrv.exe) was not showing any kind of leaking behavior or excessive use of memory or CPU.
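One simple way to make this comparison concrete is to export the Performance Monitor log to CSV and look at the trend of each counter. The sketch below fits a straight line to a counter’s samples and reports the slope. It assumes the usual CSV layout with a timestamp in the first column and one column per counter path; the server name, the sample values, and the helper names are made up for the example.

```python
import csv
import io


def counter_series(csv_text, counter_suffix):
    """Return the numeric samples of the column whose header ends with the suffix."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    col = next(i for i, name in enumerate(header) if name.endswith(counter_suffix))
    values = []
    for row in data:
        cell = row[col].strip() if col < len(row) else ""
        if cell:  # blank cells (missed samples) are skipped
            values.append(float(cell))
    return values


def slope_per_sample(values):
    """Least-squares slope of the samples against their index (units per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Made-up export: server name and values are placeholders, not customer data.
SAMPLE = r'''"(PDH-CSV 4.0) (Pacific Standard Time)(480)","\\SERVER\Memory\Pool Nonpaged Bytes","\\SERVER\Process(wspsrv)\Private Bytes"
"01/01/2009 10:00:00.000","100","50"
"01/01/2009 10:05:00.000","120","50"
"01/01/2009 10:10:00.000","140","50"
"01/01/2009 10:15:00.000","160","50"
'''

if __name__ == "__main__":
    for suffix in (r"\Memory\Pool Nonpaged Bytes", r"\Process(wspsrv)\Private Bytes"):
        print(suffix, slope_per_sample(counter_series(SAMPLE, suffix)))
```

A steadily positive slope for Pool Nonpaged Bytes combined with a flat line for the wspsrv process is the pattern described above: the system is leaking, but not from within the ISA Server process.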
4. The Culprit
With those pieces of the puzzle in hand, we could match this behavior to the following two KB articles:
- 947475 When TrendMicro OfficeScan is installed on a Windows Server 2003-based computer, event ID 2020 occurs, and the computer may stop responding, or the computer may respond slowly at Microsoft Help and Support.
- 923125 The computer may stop responding after you install Trend Micro OfficeScan 8 on a computer that is running Microsoft Windows 2000 Server or Windows Server 2003 at Microsoft Help and Support.
This particular product was installed on the server and the problem was starting to make sense at this point.
It also makes sense that the server started working normally after the customer stopped the Microsoft Firewall service. If the operating system itself runs out of Non Paged Pool resources, it triggers the LowNonPagedPoolCondition, which causes ISA Server to stop accepting new connections. This does not affect only ISA Server; the same thing happens with other products, such as IIS (per KB933844).
ISA Server was just a victim in this situation. The workaround, which gave immediate relief in this customer’s environment, was to uninstall the antivirus product (as per KB923125). Trend Micro has released a hotfix that addresses this problem.
Security Support Engineer – ISA Server Team – Microsoft Texas