Analyzing HTTP Traffic On Your IIS 7.0 Cluster

If you have ever run an enterprise web site you have probably received a panicked call from a customer or from your Tier 1 monitoring team that your site was responding slowly or throwing 500 errors and that you needed to resolve this issue ASAP. Maybe you are rolling out a new application or software across your site and want to perform side by side analysis of servers to ensure their health.

Over the past year we have rolled out multiple pre-release versions of Windows Server 2008 / IIS 7.0 onto the www.microsoft.com site. During this process we have had to monitor the overall health of the site. Which included being able to quickly identify content/application that were not behaving well on a server, cluster of servers or across the entire site. When running an enterprise of IIS servers it is invaluable to understand the traffic patterns of your site and to have the tools to help identify problem areas. There are many tools in the wild to monitor the health of your enterprise, but I will show you how to use Logparser to harvest your IIS logs and pinpoint a site problem. The goal of this blog is to help you identify issues or anomalous traffic patterns on your IIS web site.

For the purposes of this blog I will use an eight server cluster for our analysis. Sometimes tracking down issues on your site can be like trying to find a needle in a haystack. I like to start by looking at the overall health of the site or for the purposes of my examples, a cluster of servers. Next, I like to drill down into more detail on the server that looks unhealthy. In my examples I will drill down by URI and followed next by the associated window of time that the server/URI was unhealthy.

Here is where the fun begins! The first step is to perform a query by HTTP status codes for the cluster. Using Logparser I created a file called statuscodes_by_servers_in_cluster.sql and fed in the input IIS log files from all the IIS servers in the cluster which are called u_ex07111409.log. This result in an easy to read table showing you the summarized results by HTTP status code for each of the servers in your cluster.

 

Summarized HTTP status codes by server for a complete a complete cluster

logparser -rtp:-1 file:statuscodes_by_servers_in_cluster.sql?logfilename=u_ex07111409.log

sc-status Srv1 Srv2 Srv3 Srv4 Srv5 Srv6 Srv7 Srv8

--------- ------ ------ ------ ------ ------ ------ ------ ------

200 576313 585078 573956 574218 566388 599807 572982 589815

206 242 238 247 262 236 216 192 246

301 6148 6388 6378 6154 6140 6600 6293 6398

302 166817 168700 166471 165909 163386 172658 166764 170363

304 125493 126640 125908 126204 124139 130669 125855 129026

400 4 3 3 2 2 7 7 3

403 130 106 146 170 148 154 148 95

404 9023 9149 9835 8833 8811 9489 9250 9369

405 0 1 0 0 0 0 0 0

406 2 10 5 11 4 7 8 4

500 1361 342 363 337 332 323 340 351

501 18 22 21 25 31 27 34 30

 

I created the following Logparser file (statuscodes_by_servers_in_cluster.sql) to create the site by side server table of HTTP status code values.

 

statuscodes_by_servers_in_cluster.sql

SELECT

sc-status,

       SUM(_Srv1) AS Srv1,

       SUM(_Srv2) AS Srv2,

       SUM(_Srv3) AS Srv3,

       SUM(_Srv4) AS Srv4,

       SUM(_Srv5) AS Srv5,

       SUM(_Srv6) AS Srv6,

       SUM(_Srv7) AS Srv7,

       SUM(_Srv8) AS Srv8

      

USING

       CASE s-computername WHEN 'ServerName1' THEN 1 ELSE 0 END AS _Srv1,

       CASE s-computername WHEN 'ServerName2' THEN 1 ELSE 0 END AS _Srv2,

       CASE s-computername WHEN 'ServerName3' THEN 1 ELSE 0 END AS _Srv3,

       CASE s-computername WHEN 'ServerName4' THEN 1 ELSE 0 END AS _Srv4,

       CASE s-computername WHEN 'ServerName5' THEN 1 ELSE 0 END AS _Srv5,

       CASE s-computername WHEN 'ServerName6' THEN 1 ELSE 0 END AS _Srv6,

       CASE s-computername WHEN 'ServerName7' THEN 1 ELSE 0 END AS _Srv7,

       CASE s-computername WHEN 'ServerName8' THEN 1 ELSE 0 END AS _Srv8

FROM

       \\ServerName1\logshare$\%logfilename%,

       \\ServerName2\logshare$\%logfilename%,

       \\ServerName3\logshare$\%logfilename%,

       \\ServerName4\logshare$\%logfilename%,

       \\ServerName5\logshare$\%logfilename%,

       \\ServerName6\logshare$\%logfilename%,

       \\ServerName7\logshare$\%logfilename%,

       \\ServerName8\logshare$\%logfilename%

GROUP BY

        sc-status

ORDER BY

        sc-status

 

You will notice that the HTTP 500 status codes are rather high on one of our servers (SRV1). Now that we have identified which server looks unhealthy we can drill down and identify the problem application or URI which is causing the high level of 500’s server errors.

The next Logparser query I like to use is a spill by URI for the specific HTTP 500 status code on the cluster. Using Logparser I created a file called cs-uri-stem_by_servers_in_cluster.sql and fed in the same input IIS log files from all the IIS servers in the cluster called u_ex07111409.log.

URI spill for ‘500’ HTTP status code for a complete cluster:

logparser -rtp:-1 file:cs-uri-stem_by_servers_in_cluster.sql?logfilename=u_ex07111409.log

sc-status Srv1 Srv2 Srv3 Srv4 Srv5 Srv6 Srv7 Srv8 cs-uri-stem

--------- ---- ---- ---- ---- ---- ---- ---- ---- ----------------------------------------------

500 0 0 0 0 0 0 0 1 /brasil/pr/2002/ms_edu_minas.stm

500 0 0 0 1 0 0 0 0 /brasil/pr/2002/ms_games_br.stm

500 0 0 0 0 0 1 0 0 /brasil/technet/topicos/codered.stm

500 1 0 0 0 0 0 0 0 /brasil/windows2000/requisitos.stm

500 0 0 0 0 2 0 0 0 /egypt/

500 0 0 0 0 1 0 0 1 /esp

500 0 0 0 0 0 0 0 35 /fwquery/

500 28 9 10 35 21 20 23 31 /gbadapp/errorpages/error.aspx

500 267 160 184 260 270 270 274 275 /ibadapp/errorpages/error.aspx

500 1 0 0 0 0 0 0 0 /middleeast/press/2001/may/gold_cert.stm

500 0 0 0 1 0 0 0 0 /middleeast/press/2001/may/officexpa.stm

500 1060 7 17 38 35 32 38 0 /obadapp/errorpages/error.aspx

500 0 0 0 0 0 0 1 0 /projectserver

500 4 2 1 2 1 0 4 5 /shared/1/navigation.asmx/DisplayDlNavHtml

500 0 0 0 0 2 0 0 1 /shared/2/navigation.asmx/DisplayDlNavHtml

500 0 0 0 0 0 0 0 2 /xml/overview.asp

 

I created the following Logparser file (cs-uri-stem_by_servers_in_cluster.sql) to create the URI spill by server for the HTTP 500 status code.

 

cs-uri-stem_by_servers_in_cluster.sql

SELECT

sc-status,

       SUM(_Srv1) AS Srv1,

       SUM(_Srv2) AS Srv2,

       SUM(_Srv3) AS Srv3,

       SUM(_Srv4) AS Srv4,

       SUM(_Srv5) AS Srv5,

       SUM(_Srv6) AS Srv6,

       SUM(_Srv7) AS Srv7,

       SUM(_Srv8) AS Srv8,

       cs-uri-stem

      

USING

       CASE s-computername WHEN 'ServerName1' THEN 1 ELSE 0 END AS _Srv1,

       CASE s-computername WHEN 'ServerName2' THEN 1 ELSE 0 END AS _Srv2,

       CASE s-computername WHEN 'ServerName3' THEN 1 ELSE 0 END AS _Srv3,

       CASE s-computername WHEN 'ServerName4' THEN 1 ELSE 0 END AS _Srv4,

       CASE s-computername WHEN 'ServerName5' THEN 1 ELSE 0 END AS _Srv5,

       CASE s-computername WHEN 'ServerName6' THEN 1 ELSE 0 END AS _Srv6,

       CASE s-computername WHEN 'ServerName7' THEN 1 ELSE 0 END AS _Srv7,

       CASE s-computername WHEN 'ServerName8' THEN 1 ELSE 0 END AS _Srv8

FROM

       \\ServerName1\logshare$\%logfilename%,

       \\ServerName2\logshare$\%logfilename%,

       \\ServerName3\logshare$\%logfilename%,

       \\ServerName4\logshare$\%logfilename%,

       \\ServerName5\logshare$\%logfilename%,

       \\ServerName6\logshare$\%logfilename%,

       \\ServerName7\logshare$\%logfilename%,

       \\ServerName8\logshare$\%logfilename%

WHERE sc-status = 500

GROUP BY

       cs-uri-stem,

       sc-status

ORDER BY

        cs-uri-stem

 

 

There are a couple URIs that are throwing the 500 errors on SRV1. The majority of the errors coming from the URI “/obadapp/errorpages/error.aspx”. The next step is to determine if there was a window where this URI/application was performing poorly.

The next Logparser query I like to use is to spill by HTTP status code and by time (in minutes) for the problem server in question. This will determine if the URI / application on ServerName1 (SRV1) was throwing the 500 errors consistently throughout the hour or during a specific window of time. Using Logparser I created a file called httpstatuspivot.sql and fed in the same input IIS log file from ServerName1 called u_ex07111409.log.

Spill of HTTP status codes by time

logparser -rtp:-1 file:httpstatuspivot.sql?logfilename=\\ServerName1\logshare$\u_ex07111409.log

Minute 200s 206s 301s 302s 304s 403s 404s 405s 406s 500s 501s

------------------- ----- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----

2007-11-14 09:00:00 11143 3 128 3302 2477 6 158 6 0 3 0

2007-11-14 09:01:00 10659 4 103 3158 2399 2 142 2 0 3 0

2007-11-14 09:02:00 11139 6 129 3215 2410 1 154 1 0 4 0

2007-11-14 09:03:00 10749 1 116 3141 2192 3 161 3 0 4 0

2007-11-14 09:04:00 10814 2 132 3130 2179 2 180 2 0 3 0

2007-11-14 09:05:00 11139 3 134 3240 2527 5 172 5 0 6 0

2007-11-14 09:06:00 11005 8 136 3285 2384 2 189 2 0 8 0

2007-11-14 09:07:00 10913 6 116 3125 2409 6 165 6 0 7 0

2007-11-14 09:08:00 11291 4 136 3267 2315 2 204 2 1 5 0

2007-11-14 09:09:00 10854 5 140 3239 2358 2 162 2 0 5 1

2007-11-14 09:10:00 10219 4 107 2975 2215 1 166 1 1 8 0

2007-11-14 09:11:00 10006 6 124 2854 2050 1 180 1 0 6 0

2007-11-14 09:12:00 10555 5 112 2895 2163 1 160 1 0 7 0

2007-11-14 09:13:00 10404 2 90 3007 2228 0 190 0 0 5 0

2007-11-14 09:14:00 9246 9 103 2839 1852 0 143 0 0 3 0

2007-11-14 09:15:00 9751 4 114 2964 2180 3 135 3 0 4 0

2007-11-14 09:16:00 11884 4 137 3411 2645 2 169 2 0 7 0

2007-11-14 09:17:00 10314 5 94 2838 2203 2 146 2 0 3 0

2007-11-14 09:18:00 9844 2 93 2911 2123 3 132 3 0 6 2

2007-11-14 09:19:00 10080 2 115 2883 2158 1 132 1 0 6 0

2007-11-14 09:20:00 10410 3 114 2993 2309 3 155 3 0 4 0

2007-11-14 09:21:00 10817 3 117 3173 2385 0 188 0 0 4 0

2007-11-14 09:22:00 10313 4 117 3016 2324 3 185 3 0 9 1

2007-11-14 09:23:00 10210 2 107 2792 2058 2 143 2 0 5 0

2007-11-14 09:24:00 10122 6 85 2785 2204 2 156 2 0 6 0

2007-11-14 09:25:00 10852 4 100 3095 2332 3 161 3 0 7 0

2007-11-14 09:26:00 9857 6 100 2911 2100 3 135 3 0 6 0

2007-11-14 09:27:00 9707 4 103 2820 2081 3 164 3 0 8 0

2007-11-14 09:28:00 9858 7 90 2775 2142 6 139 6 0 4 0

2007-11-14 09:29:00 9508 3 77 2767 2064 3 137 3 0 2 0

2007-11-14 09:30:00 8810 3 90 2593 2094 1 110 1 0 11 3

2007-11-14 09:31:00 8913 4 96 2522 1955 1 153 1 0 6 1

2007-11-14 09:32:00 9214 6 112 2789 2176 3 150 3 0 9 0

2007-11-14 09:33:00 9930 3 106 2871 2118 5 140 5 0 4 3

2007-11-14 09:34:00 9136 6 95 2548 1794 3 123 3 0 5 1

2007-11-14 09:35:00 9431 3 106 2734 2136 1 173 1 0 3 0

2007-11-14 09:36:00 9174 13 99 2667 2034 0 116 0 0 5 0

2007-11-14 09:37:00 8292 4 91 2507 1807 1 115 1 0 7 0

2007-11-14 09:38:00 9275 2 82 2496 1911 3 139 3 0 5 0

2007-11-14 09:39:00 8290 3 84 2396 1741 2 116 2 0 108 0

2007-11-14 09:40:00 8802 7 117 2556 2034 5 151 5 0 617 0

2007-11-14 09:41:00 8549 6 90 2332 2002 1 140 1 0 311 1

2007-11-14 09:42:00 8857 5 146 2521 1901 0 167 0 0 1 2

2007-11-14 09:43:00 8200 3 91 2315 1819 1 136 1 0 4 0

2007-11-14 09:44:00 8397 5 92 2388 1953 3 128 3 0 6 0

2007-11-14 09:45:00 8204 1 79 2453 1911 1 124 1 0 4 0

2007-11-14 09:46:00 8979 3 93 2657 2035 1 142 1 0 6 0

2007-11-14 09:47:00 9369 5 79 2757 2066 2 166 2 0 8 0

2007-11-14 09:48:00 8086 2 80 2375 1751 0 148 0 0 5 0

2007-11-14 09:49:00 8119 2 76 2193 1597 2 96 2 0 10 0

2007-11-14 09:50:00 8642 4 104 2494 2066 2 147 2 0 10 0

2007-11-14 09:51:00 8032 4 83 2292 1889 2 147 2 0 7 0

2007-11-14 09:52:00 8349 3 88 2585 1956 1 159 1 0 10 1

2007-11-14 09:53:00 8507 4 88 2589 1792 1 157 1 0 4 0

2007-11-14 09:54:00 9180 2 84 2538 1918 3 154 3 0 3 2

2007-11-14 09:55:00 8475 1 91 2523 1930 3 134 3 0 8 0

2007-11-14 09:56:00 8357 1 83 2428 1855 1 143 1 0 8 0

2007-11-14 09:57:00 8144 5 80 2354 1785 2 130 2 0 7 0

2007-11-14 09:58:00 8930 1 76 2567 1890 2 147 2 0 5 0

2007-11-14 09:59:00 10007 4 98 2971 2111 4 169 4 0 6 0

 

I created the following Logparser file (httpstatuspivot.sql) to create the spill by time (in minutes) for the server which was throwing the 500 errors. Also, an excellent TechNet article entitled “Inside Microsoft.com - Analyze Web Stats with Log Parser” also covers pivoting output of your Logparser query.

httpstatuspivot.sql

SELECT quantize(to_localtime(to_timestamp(date,time)),60) AS Minute,

    SUM([_200]) AS [200s],

    SUM([_206]) AS [206s],

    SUM([_301]) AS [301s],

    SUM([_302]) AS [302s],

    SUM([_304]) AS [304s],

    SUM([_403]) AS [403s],

    SUM([_404]) AS [404s],

    SUM([_403]) AS [405s],

    SUM([_406]) AS [406s],

    SUM([_500]) AS [500s],

    SUM([_501]) AS [501s]

USING

    CASE sc-status WHEN 200 THEN 1 ELSE 0 END AS [_200],

    CASE sc-status WHEN 206 THEN 1 ELSE 0 END AS [_206],

    CASE sc-status WHEN 301 THEN 1 ELSE 0 END AS [_301],

    CASE sc-status WHEN 302 THEN 1 ELSE 0 END AS [_302],

    CASE sc-status WHEN 304 THEN 1 ELSE 0 END AS [_304],

    CASE sc-status WHEN 403 THEN 1 ELSE 0 END AS [_403],

    CASE sc-status WHEN 404 THEN 1 ELSE 0 END AS [_404],

    CASE sc-status WHEN 401 THEN 1 ELSE 0 END AS [_405],

    CASE sc-status WHEN 406 THEN 1 ELSE 0 END AS [_406],

    CASE sc-status WHEN 500 THEN 1 ELSE 0 END AS [_500],

    CASE sc-status WHEN 501 THEN 1 ELSE 0 END AS [_501]

FROM %logfilename%

GROUP BY Minute

ORDER BY Minute

You will notice that during 9:39 through 9:41 this server was not performing well. From our earlier Logparser query the problem URI / application to focus on is “/obadapp/errorpages/error.aspx”.

 

Now that we have identified the problem URI / application you can engage the team that created the content / application and debug this issue further.

 

Using Logparser and the samples files above you can analyze the performance of your site as you roll out new content, applications or a new Operation System and ensure that you quickly identify issues during the rollout process.