I often work with customers who report that a single CAS server in a group of load-balanced CAS servers seems to be having performance issues. On closer inspection, we find that load balancing across those CAS servers is uneven, and the only problem is that this particular CAS is receiving much more traffic than it should. A very large percentage of CAS traffic consists of web-based requests (EWS, ActiveSync, Outlook Anywhere, Autodiscover, OWA, etc.), and IIS logs every one of them. That makes it easy to measure load-balancing efficiency by comparing the amount of IIS logging each server generates.
For example, if you have 5 CAS servers in a CAS array and, under production load, CAS 1 writes 1 GB of IIS logs per day, you can reasonably expect CAS servers 2-5 to log a similar number of requests. Because every request is logged, a CAS that receives more than its fair share of requests will naturally have logs with more rows. Because of affinity, persistence, and changing loads, we can’t expect these numbers to be identical. As a rule of thumb, if the total rows logged per server are within 10-15% of each other, the NLB is probably doing its job; discrepancies much larger than that tend to point to balancing issues. With LPS we can run this check remotely, and even as an automated task if we wish. Here’s how:
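The 10-15% rule of thumb can be expressed as a small check. The sketch below is a hypothetical Python helper, not part of LPS: it assumes you already have a row count per server (for example, copied from the query results), uses 15% as the default tolerance, and measures each server against the median so that a single overloaded server doesn’t pull the baseline up with it. The server names and counts are illustrative only.

```python
from statistics import median


def spread(row_counts: dict[str, int]) -> float:
    """Gap between the busiest and quietest server, relative to the quietest."""
    lo, hi = min(row_counts.values()), max(row_counts.values())
    if lo == 0:
        raise ValueError("a server logged no rows; check its log selection")
    return (hi - lo) / lo


def outliers(row_counts: dict[str, int], tolerance: float = 0.15) -> list[str]:
    """Servers whose row count differs from the median by more than `tolerance`.

    Using the median as the baseline is an assumption of this sketch: unlike
    the mean, it is not inflated by one heavily loaded server.
    """
    baseline = median(row_counts.values())
    return sorted(
        server
        for server, count in row_counts.items()
        if abs(count - baseline) / baseline > tolerance
    )


# Illustrative counts only: four servers, one receiving roughly ten times the traffic.
counts = {"CAS1": 1_000_000, "CAS2": 1_050_000, "CAS3": 980_000, "CAS4": 10_000_000}
print(f"spread: {spread(counts):.0%}")
print("outside tolerance:", outliers(counts))
```

Comparing against the median rather than against every other server keeps the check linear in the number of servers and names the odd one out directly.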
Note: The account you are logged on with, on the remote machine you are running LPS from, needs administrator access to each CAS server so that you can read the logs across the network through the admin share.
1. The first order of business is to point LPS to each CAS server’s IIS logs. Click the log button to open the Log File Manager, click the “Add Files” button, and browse to the location of the IIS logs for the Default Web Site on the first CAS. For example, the network path to these logs on an Exchange 2010 CAS server named E2K10CAS1 running on Windows Server 2008 might be similar to the following:
In LPS, browse to the path and select the log file or files written during the time frame you want to verify. Typically the last day or so is enough; just make sure you select the same number of files, covering the same time frame, for each server. Rinse and repeat until there are equal entries for each CAS.
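If you gather the file lists yourself rather than picking them by hand, it helps to select the same window for every server programmatically. The Python sketch below is a hypothetical helper that assumes daily log rollover and the IIS W3C naming convention exYYMMDD.log (u_exYYMMDD.log when UTF-8 logging is enabled); hourly rollover or custom naming would need a different pattern. It works on file names only, so you can feed it each server’s folder listing however you obtain it.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Assumed naming: daily rollover, exYYMMDD.log or u_exYYMMDD.log.
LOG_NAME = re.compile(r"^(?:u_)?ex(\d{2})(\d{2})(\d{2})\.log$", re.IGNORECASE)


def log_date(filename: str) -> Optional[date]:
    """Date encoded in an IIS log file name, or None if the name doesn't match."""
    match = LOG_NAME.match(filename)
    if match is None:
        return None
    yy, mm, dd = (int(part) for part in match.groups())
    return date(2000 + yy, mm, dd)


def select_window(filenames: list[str], end: date, days: int = 1) -> list[str]:
    """Log files covering the `days` days ending on `end` (inclusive)."""
    start = end - timedelta(days=days - 1)
    selected = []
    for name in filenames:
        day = log_date(name)
        if day is not None and start <= day <= end:
            selected.append(name)
    return sorted(selected)


def same_window(selection: dict[str, list[str]]) -> bool:
    """True when every server contributed the same number of files."""
    return len({len(files) for files in selection.values()}) <= 1
```

Run `select_window` once per server with the same `end` and `days`, then confirm with `same_window` that no server is missing a day before comparing counts.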
2. Next we’ll write our own query that returns the load distribution per CAS. This is very simple: we just count the number of requests in each file. Choose File > New Query to open a new blank query, and replace the existing text with the following:
SELECT LogFileName AS CAS,
Count(*) AS Weight
FROM '[LOGFILEPATH]'
GROUP BY LogFileName
ORDER BY LogFileName
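For readers without LPS at hand, the Python sketch below approximates what the query computes. It is not how LPS or Log Parser work internally: it assumes every non-empty line that does not start with # is one logged request (W3C extended logs use # lines for directives such as #Fields), and counts those lines per file, much like COUNT(*) grouped by LogFileName.

```python
from typing import Iterable


def count_records(lines: Iterable[str]) -> int:
    """Count W3C log records, skipping blank lines and '#' directive lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def weights_by_file(paths: Iterable[str]) -> dict[str, int]:
    """Map each log file path to its request count, ordered by path."""
    weights = {}
    for path in paths:
        # errors="replace" keeps a stray undecodable byte from aborting the count.
        with open(path, encoding="utf-8", errors="replace") as handle:
            weights[str(path)] = count_records(handle)
    return dict(sorted(weights.items()))


# A tiny illustrative log: two directive lines, two request lines.
sample = [
    "#Software: Microsoft Internet Information Services",
    "#Fields: date time cs-method",
    "2024-01-01 00:00:01 GET",
    "2024-01-01 00:00:02 POST",
]
print(count_records(sample))
```

Pass `weights_by_file` the same list of files you added in the Log File Manager to get one weight per file, as in the query results.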
3. The last thing to do is set the log type of our new query to IISW3C. Click the Log Type drop-down and choose IISW3CLOG.
4. Run the query and the results will show the general distribution between CAS servers similar to the following:
Notice that E2K10CAS5 is receiving about 10 times as much traffic as each of the other four. This indicates a possible imbalance, and further investigation into the NLB setup is in order. You may want to save the query to the library for future use by pressing CTRL+S.
We could get fancier with this method by tweaking the query further, saving the folder list and query, and then running them as a scheduled task that writes the output to a CSV file, giving, for example, a daily report on load-balancing integrity over the last 24 hours. With LPS the sky is the limit, and this method is not limited to Exchange Server. Almost any load-balanced scenario in which each server writes its own log files can benefit from it.
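As an illustration of the daily-report idea, here is a hypothetical Python sketch that turns per-server request counts into CSV rows showing each server’s share of the total. It is not an LPS feature: it assumes the counts have already been gathered (by the query or otherwise), and you would schedule the script yourself, for instance with Windows Task Scheduler. The file name in the usage note is a placeholder.

```python
import csv
import io
import os


def render_report(row_counts: dict[str, int], report_date: str, header: bool = True) -> str:
    """Return CSV text with one row per server: date, server, weight, share of total."""
    total = sum(row_counts.values())
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if header:
        writer.writerow(["date", "server", "weight", "share"])
    for server, weight in sorted(row_counts.items()):
        share = weight / total if total else 0.0
        writer.writerow([report_date, server, weight, f"{share:.2%}"])
    return buffer.getvalue()


def append_report(path: str, row_counts: dict[str, int], report_date: str) -> None:
    """Append one day's rows, writing the header only when the file is new or empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as out:
        out.write(render_report(row_counts, report_date, header=new_file))
```

A scheduled run might call `append_report("cas_balance.csv", counts, "2024-01-01")` once per day, so the file accumulates one block of rows per day that can be opened in a spreadsheet.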
Note: When you copy and paste the query, the quotes around '[LOGFILEPATH]' may be converted to typographic (curly) quotes such as ‘ and ’, which can cause a ‘null parameter’ error in LPS. If that happens, replace them in LPS with standard apostrophes.