This is another one of those cases where ISA Server Service mysterious crashes once a day, at the same time and nothing changed in the environment. This just make me really fell that the lack of communication between the teams that deals with technology is getting far beyond of what should exactly be. Many companies are investing money in putting Security in place by adding layers and layers of technology but they are still missing two important elements: process awareness and change control procedures. The absence of those elements can directly impact availability of the environment. Why availability? Well, I will tell you later when I finish this post.
2. Analyzing the Data
In this case ISA Server Service was crashing with the following errors:
The event 14057 is clear about one thing: this was an access violation exception (C0000005) on the filter module W3Filter.dll. Too broad, can be many things including issues with the filter itself, so we need to get a crash dump of this guy to better understand what is going on. Following the approach of one of my posts we can use DebugDiag to attach to wspsrv.exe and get the dump. After getting the dump you can use this other post as an example of how to analyze it. Unfortunately this is one of the cases where the public symbols don’t help that much as you can see below:
The !analyze result showed above will make you under the impression that W3Filter.dll is the culprit and it is exactly the opposite, this guy is only a victim.
After deeply analyze the dump using the private symbols we got to a conclusion that someone was locking the cache file when the Web Filtering was trying to write to it. Guess who was locking it? Once upon a time there was a system administrator that was following a plan that he received from his management to install backup software in all Windows Servers, so he installed this backup software on ISA and configured a Job to run every night…
The backup software was backing up the whole server (all hard drivers) including the driver where the ISA Cache was located. For this reason customer was saying that the issue just happened when the ISA Server Cache was enabled, if they disabled the cache the issue didn’t happen. Well make sense and the recommendation to exclude cache from backup as not new, as a matter of fact the article that recommends this is out there since October 2004, which is the following one:
Event ID 5, event ID 14079, and event ID 14176 are logged in the Application log on your Internet Security and Acceleration Server computer
Now the answer for: Why Availability? Because the ISA Server service in this case was crashing due and addition of a new product in the ISA Box without testing it in a lab environment (where the change control procedure is?). The Windows OS maintenance was responsibility of the System Administrator that with all the good intentions configured the Backup Software to back it up the whole hard drive. However the Firewall Admin wasn’t aware of this addition since it was out of the scope of his duty (where the process awareness is?) and he swear since the begging that nothing change in the environment and ISA was crashing from nothing L. But, this story had a happy end at least, so let’s finish this post with a smiling face J.