I recently wrote about gathering user profile data for Exchange Server 2003 and 2007 by using the Exchange Server Profile Analyzer (EPA) tool. As a recap, the EPA tool uses WebDAV to interrogate the mailboxes and generates user profile data, including:
- messages sent per mailbox per day
- messages received per mailbox per day
- average message size
Note: This information is vital for performing good-quality Exchange Server scaling.
The problem, of course, is that Exchange Server 2010 does not include WebDAV, so the EPA tool will not work against it. I am happy to report, however, that we have a solution…
One of the nice things about Exchange 2007 and Exchange 2010 is that we can interrogate the message tracking logs via PowerShell, which gives us a convenient way to query what the Exchange Server is doing. Usefully, the message tracking logs include sufficient information for us to approximate our user profile data without needing the EPA.
Gathering the Data
After asking around internally within Microsoft about how to gather EPA data for Exchange Server 2010, it became apparent that PowerShell would be the best way to interrogate the message tracking logs. I mentioned to a few people that I was going to write something up over the next few weeks. Before I had a chance to put any significant thought into the task, however, someone sent me a copy of the following script, which I have uploaded here.
Now, I must confess that despite my best efforts I have been unable to track down the original author! As such it is provided here without credit (if you wrote this, then please get in touch!).
The script works by parsing the message tracking logs of your Exchange Servers and tabulating the information into a CSV file for analysis in Excel. To provide some data to parse, I configured a loadgen test against 10 mailboxes with a heavy profile, which should approximate to around 80 messages received and 20 sent per user.
The MessageStats script has a single command line parameter, which controls how many days back it will look in the tracking logs. For my lab test I only wanted a single day's worth, so I just tagged “1” on the end of the PS1 script as its argument.
Analysing the Data File
So, now we have a CSV file that we can open in Microsoft Excel; however, the data requires some work before we can get our EPA values. The following screenshot shows the raw data open in Excel.
The best way to process the data is to convert it into a table:
- Highlight cell A1
- Press CTRL+SHIFT+END
- Click on the INSERT Menu
- Click on the TABLE button
- Click on OK
- Open the DESIGN Menu
- Check the “Total Row” checkbox
- Hide columns C,D,E,H,I,J,K,L,M,N,O,R,S,T,U
You should now have a table with the following columns…
- Received Total
- Received MB Total
- Sent Unique Total
- Sent Unique MB Total
Note: Due to my test lab being very small I have added a filter to remove any non-loadgen accounts from the data analysis.
In the Total row at the bottom of your table add “AVERAGE” subtotals for “Received Total” and “Sent Unique Total”.
In the “Received MB Total” column total cell, add an “AVERAGE” subtotal, then edit the formula in the cell to divide that value by the Total Row average for “Received Total” and multiply the result by 1024. This reports the average received message size in KB.
In the “Sent Unique MB Total” column total cell, add an “AVERAGE” subtotal, then edit the formula in the cell to divide that value by the Total Row average for “Sent Unique Total” and multiply the result by 1024. This reports the average sent message size in KB.
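For anyone who would rather script the Total Row than build it in Excel, the same arithmetic can be sketched in Python. This is only an illustration, not part of the MessageStats script: it assumes the CSV is comma-delimited and that its headers match the four column names listed above exactly, and the two sample rows are invented purely to make the example run.

```python
import csv
import io
from statistics import mean

# Hypothetical sample rows; real data would come from the CSV the script writes.
# The header names are assumed to match the Excel column names exactly.
SAMPLE_CSV = """Received Total,Received MB Total,Sent Unique Total,Sent Unique MB Total
60,1.5,15,0.5
80,2.5,25,0.5
"""


def total_row(csv_text):
    """Reproduce the Excel Total Row: per-mailbox averages and message sizes in KB."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    def col_avg(name):
        # Equivalent of the AVERAGE subtotal for one column.
        return mean(float(row[name]) for row in rows)

    avg_received = col_avg("Received Total")
    avg_sent = col_avg("Sent Unique Total")
    return {
        "avg_received": avg_received,
        "avg_sent": avg_sent,
        # Average MB per mailbox / average messages per mailbox * 1024 = KB per message.
        "avg_received_kb": col_avg("Received MB Total") / avg_received * 1024,
        "avg_sent_kb": col_avg("Sent Unique MB Total") / avg_sent * 1024,
    }


totals = total_row(SAMPLE_CSV)
print(totals)
```

To run it against a real export, the text of the CSV file would be passed in place of SAMPLE_CSV, after removing any accounts you want to exclude.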
We now have all of the information that we require…
- Messages Received per Mailbox Per Day = average of Received Total / Days to Scan (68/1 = 68)
- Messages Sent per Mailbox Per Day = average of Sent Unique Total / Days to Scan (17/1 = 17)
- Average Message Size = mean of the received and sent average message sizes in KB, taken from the Total Row cells for Received MB Total and Sent Unique MB Total: (27.37+28.5)/2 = 27.94KB
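The three calculations above can be written as a small Python helper to make the arithmetic explicit. This is a sketch using the figures from my lab run; the function and parameter names are my own and do not come from the script.

```python
def user_profile(avg_received, avg_sent, avg_received_kb, avg_sent_kb, days_scanned):
    """Turn the Total Row values into per-mailbox, per-day profile figures."""
    return {
        "received_per_day": avg_received / days_scanned,
        "sent_per_day": avg_sent / days_scanned,
        # Simple (unweighted) mean of the two average sizes, as in the list above.
        "avg_message_kb": (avg_received_kb + avg_sent_kb) / 2,
    }


# Lab values: 68 received, 17 sent, 27.37 KB and 28.5 KB, one day scanned.
profile = user_profile(68, 17, 27.37, 28.5, days_scanned=1)
print(profile)
```

If the script is run over several days, days_scanned should be the same number that was passed to the script, so that the totals are scaled back to a single day.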
So, using this technique we have managed to approximate our user profile to a fair degree of accuracy without needing to log on to any mailboxes! I suspect that this method is accurate to around +/- 10%, which is totally acceptable in this context.
Obviously there is a caveat here: I have only performed some rudimentary testing in a fairly small lab environment. If you do run this in production and find that it generates weird results, or that it validates your already proven EPA data, then feel free to drop me a note to let me know.