Included in this month’s Exchange Tools Web release (direct link) is a new analysis tool called the Microsoft Exchange Server Profile Analyzer Tool (or EPA). We created this tool to help server administrators understand their user profile as an aid to server sizing and capacity planning. A user profile describes how many actions an average user performs in an average day. When we refer to actions, we are talking about things that a user would do within an e-mail client application, such as sending mail, deleting mail, or browsing folders. The EPA tool uses whatever information is available in user mailboxes to generate an estimated profile, and depending on the characteristics of your users, the estimated profile may not be very accurate (for example, if messages are frequently deleted from the Sent Items folder, EPA will not generate accurate sent message statistics). For those of you who have worked with older versions of Exchange and Outlook, the EPA tool builds a profile in a similar way to the storstat.exe tool.
This package includes three separate tools:
- EPA (available as EPAWin.exe and EPACmd.exe) is designed to generate a user profile based on the contents of user mailboxes.
- EPAOWA is designed to generate a user profile by analyzing OWA log files.
- EPASummarizer is an assistant tool for EPA that can combine output files from EPA to summarize statistics across multiple data collections.
This article will focus specifically on the EPA tool. We will explain how it gathers data, what data it gathers, and specific requirements for running the tool successfully.
The major steps that EPA takes to collect data are:
Step 1: Load topology from Active Directory. EPA reads Exchange configuration data such as organization, server, storage group, private mailbox store, and HTTP virtual server from AD. This means that the account that runs EPA must be able to read Exchange configuration data from AD. In the GUI version of EPA (EPAWin.exe), you will see a tree view of the topology once this step is complete.
Step 2: Scan HTTP virtual server and virtual directory configuration for each server. For each HTTP virtual server found on the server, EPA reads two attributes from the IIS metabase: running state and SSL accessibility. For each virtual server, EPA maintains a list of mailbox virtual directory access links. The format of the link looks like this: http(s)://servername(:portnumber)/virtualdirectoryname/.
For a particular virtual server:
- If EPA detects its state as not running, EPA will ignore it.
- If EPA fails to read the running state, EPA assumes the virtual server is running.
- If EPA reads that SSL is not enabled, for each mailbox virtual directory found on the virtual server, EPA adds a corresponding “http://” link to the list.
- If EPA reads that SSL is enabled, for each mailbox virtual directory found on the virtual server, EPA adds a corresponding “https://” link to the list.
- If EPA fails to read the SSL attribute, EPA adds two entries for this virtual server to the list. One assumes SSL is enabled (https://), and the other assumes it is not (http://).
At the end, EPA sorts the list to put “https://” links on the top so that we will always attempt to use encrypted connections if they are available. When EPA moves on to Step 4 to collect data from each user mailbox, it will attempt to use each link in the list until it finds one that works. EPA will attempt to match the configured SMTP domain of a virtual server with each user’s SMTP domains to facilitate hosted Exchange scenarios.
You may notice that this step requires the account that runs EPA to have access to the IIS metabase. If you don’t have access to the IIS metabase, EPA will make some assumptions about your configuration that should allow it to run under most circumstances. However, you may see some error messages reported in the log if metabase access is not available.
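The Step 2 rules can be sketched in a few lines of Python. This is only an illustration of the logic described above, not EPA's actual code: the VirtualServer record, its field names, and the build_links function are hypothetical, and None is used here to stand for an attribute that could not be read from the metabase.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VirtualServer:
    """Hypothetical record of what was read from the IIS metabase."""
    host: str
    running: Optional[bool]      # None: the running state could not be read
    ssl_enabled: Optional[bool]  # None: the SSL attribute could not be read
    mailbox_vdirs: List[str]
    port: Optional[int] = None


def build_links(server: VirtualServer) -> List[str]:
    """Return candidate mailbox links for one virtual server, https first."""
    if server.running is False:
        return []  # a virtual server that is not running is ignored
    # An unreadable running state (None) is treated as running.
    if server.ssl_enabled is None:
        schemes = ["https", "http"]  # unknown: add one entry for each case
    elif server.ssl_enabled:
        schemes = ["https"]
    else:
        schemes = ["http"]
    authority = server.host if server.port is None else f"{server.host}:{server.port}"
    links = [
        f"{scheme}://{authority}/{vdir}/"
        for scheme in schemes
        for vdir in server.mailbox_vdirs
    ]
    # Stable sort: https links move to the top, other ordering is preserved.
    links.sort(key=lambda link: not link.startswith("https://"))
    return links
```

For a virtual server whose state and SSL attribute both failed to read, this sketch yields the https link followed by the http link, which matches the "try encrypted first" ordering described above.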
Step 3: For each server and mailbox store found in the topology, EPA determines whether or not it will be included in the data collection stage, based on the configuration input.
Step 4: Data collection. Based on the ServerThread and MailboxThreadPerServer attributes in the configuration file, EPA creates one or more worker threads for data collection. If ServerThread is set to 1 (the default), EPA collects data from servers sequentially. Otherwise, it creates multiple server threads and collects data from several servers at the same time. The same logic applies to MailboxThreadPerServer. The mailbox thread is the thread that actually collects data from a single mailbox (see the section below for the details of this function). After each mailbox is processed, the statistics generated from the collected data are summarized to the parent mailbox store. The collection process for a mailbox store is done when all the mailboxes in the store have been processed. The server-level statistics are calculated when collection is done for all the mailbox stores on the server. Similarly, the organization-level statistics are calculated when collection has finished on all the servers.
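As a rough illustration of Step 4, the sketch below uses two levels of thread pools and rolls results up from mailbox to store to server to organization. It is not EPA's implementation: the topology shape, the function names, and the use of mailbox size as the only statistic are assumptions made for the example. The server_threads and mailbox_threads parameters play the role of ServerThread and MailboxThreadPerServer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical topology: server -> mailbox store -> mailbox sizes in bytes.
Topology = Dict[str, Dict[str, List[int]]]


def collect_mailbox(size_in_bytes: int) -> int:
    """Stand-in for the per-mailbox collection; it just returns the size."""
    return size_in_bytes


def collect_server(stores: Dict[str, List[int]], mailbox_threads: int) -> Dict[str, int]:
    """Process each store's mailboxes with a pool, then summarize per store."""
    totals: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=mailbox_threads) as pool:
        for store, mailboxes in stores.items():
            # A store is done once all of its mailboxes have been processed.
            totals[store] = sum(pool.map(collect_mailbox, mailboxes))
    return totals


def collect_organization(topology: Topology, server_threads: int = 1,
                         mailbox_threads: int = 1) -> Dict[str, object]:
    """Collect per-store, per-server, and organization totals."""
    # With server_threads=1 the servers are processed one after another.
    with ThreadPoolExecutor(max_workers=server_threads) as pool:
        futures = {
            server: pool.submit(collect_server, stores, mailbox_threads)
            for server, stores in topology.items()
        }
        per_store = {server: future.result() for server, future in futures.items()}
    # Server totals are computed once all of that server's stores are done.
    per_server = {server: sum(stores.values()) for server, stores in per_store.items()}
    return {
        "stores": per_store,
        "servers": per_server,
        "organization": sum(per_server.values()),
    }
```

Raising either thread count shortens the collection run at the cost of more concurrent load on the servers being scanned.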
How EPA collects data from a single mailbox
EPA accesses items in a user’s mailbox by using HTTP/Web Distributed Authoring and Versioning (WebDAV). This requires the account that runs EPA to have Full Mailbox Access. Note that EPA only supports Integrated Windows Authentication (NTLM) at this time. If your server does not have Integrated Windows Authentication enabled for HTTP virtual servers, EPA will fail to collect data. EPA will also fail if HTTP virtual servers on back-end mailbox servers have OWA FBA (Forms-Based Authentication) enabled.
The primary WebDAV methods that EPA uses are SEARCH, PROPFIND, BPROPFIND and X-MS-ENUMATTS. The SEARCH method is used to get the hierarchy table and content table from a folder. To get the actual properties of a folder or of a message item, EPA uses the PROPFIND and BPROPFIND methods. X-MS-ENUMATTS is used to get the attachment table from a message item. Table 1 shows the sets of properties that EPA reads for different types of store objects such as folder, message, attachment and appointment using WebDAV. Table 2 shows a list of collectors EPA calculates based on the properties listed in Table 1.
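To make the PROPFIND step more concrete, here is a minimal sketch that builds a standard WebDAV PROPFIND request body and headers with the Python standard library. The helper names are hypothetical, and the two properties used in the usage note (displayname and getcontentlength in the DAV: namespace) are generic WebDAV properties chosen for illustration, not a statement of what EPA requests. Sending the request with NTLM authentication is outside the scope of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

DAV_NS = "DAV:"


def build_propfind(properties: List[Tuple[str, str]]) -> bytes:
    """Build a PROPFIND body asking for (namespace, local name) pairs."""
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for namespace, name in properties:
        # Each requested property is an empty element inside <prop>.
        ET.SubElement(prop, f"{{{namespace}}}{name}")
    return ET.tostring(root, encoding="utf-8")


def propfind_headers(body: bytes, depth: str = "0") -> Dict[str, str]:
    """Headers for the request; Depth 0 targets only the addressed item."""
    return {
        "Depth": depth,
        "Content-Type": "text/xml",
        "Content-Length": str(len(body)),
    }
```

A call such as build_propfind([("DAV:", "displayname"), ("DAV:", "getcontentlength")]) produces a body that could be sent with the PROPFIND method to one of the mailbox links built in Step 2.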
Table 1: Properties EPA reads by using WebDAV
Root Mailbox Folder
Note: Please refer to http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wss/wss/_webdav_x-ms-enumatts.asp for a complete list of supported WebDAV properties and detailed descriptions of the data contained within the properties listed above.
Table 2: Exchange Server Profile Analyzer Data Collectors
The total size, in bytes, of the mailbox
Total number of rules defined in the mailbox
Total number of visible folders in the mailbox
Maximum number of messages in any one folder
Number of visible folders in the mailbox
Number of user created folders that are direct children of the root of the mailbox
Number of search folders in the current mailbox
The height of the folder tree
Average number of children each child of the folder tree root has.
Average height of each of the root folder’s children’s subtrees.
Size in bytes of all the messages in a folder (takes list of folders to measure, e.g. “Inbox, Deleted Items, Sent Items”)
Folder size statistics (across all folders in mailbox) – provides Avg, Min, Max.
Number of messages in the mailbox
Number of messages in a folder (takes a list of folders to measure, e.g. “Inbox, Deleted Items, Sent Items”)
Number of messages that are unread
Number of Deferred Action Messages in the mailbox
Number of messages where the subject prefix is “RE:” or equivalent subject prefix for the given culture
Number of messages where the subject prefix is “FW:” or equivalent subject prefix for the given culture
Number of messages containing at least 1 distribution list in the recipients table
Number of messages containing at least 1 attachment
Counts messages in a size range (takes a list of ranges, e.g. “2,10,100,1024” would provide counts of messages from 0-2,2-10,10-100,100-1024,1024-beyond)
Message size statistics across all messages (provides Avg,Min,Max)
Average number of messages received per day (provides Avg,Min,Max, and can restrict to the last N days)
Average number of rows in each sub table when the sent items folder is categorized by date (provides Avg,Min,Max and can restrict to last N days)
Average number of messages prefixed with “RE:” sent per day (provides Avg,Min,Max and can restrict to last N days)
Average number of messages prefixed with “FW:” sent per day (provides Avg,Min,Max and can restrict to last N days)
Number of messages in each body type requested (takes list of body types as input, e.g. “RTF,HTML,Other”)
Average number of recipients of each message in the Sent Items folder
Average number of distribution list recipients of each message in the Sent Items folder
Attachment size statistics across all attachments (provides Avg,Min,Max)
Counts attachments in specified size ranges (takes a range list of inputs, e.g. “2,10,100,1024” provides counts for ranges 0-2,2-10,10-100,100-1024,1024-up)
Statistics on number of attachments per message (provides Avg,Min,Max)
Number of contacts in the mailbox
Number of contacts created per day (provides Avg,Min,Max, can be restricted to last N days)
Number of appointments in the calendar
Number of appointments created per day (provides Avg,Min,Max, can be restricted to last N days)
Number of meeting requests in the calendar.
Statistics on number of meeting requests received per day (provides Avg,Min,Max, can be restricted to last N days)
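Two kinds of collectors recur in Table 2: counts by size range (driven by a list such as “2,10,100,1024”) and per-day statistics that report Avg, Min, and Max and can be restricted to the last N days. The sketch below shows one way such calculations could work. The function names are hypothetical, and two details are assumptions rather than documented EPA behavior: a size equal to a boundary falls into the higher range, and only days with at least one item contribute to the per-day statistics. The unit of the range values is whatever unit the sizes are supplied in.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional


def count_by_range(sizes: Iterable[float], spec: str) -> Dict[str, int]:
    """Count sizes per range; "2,10" gives 0-2, 2-10, 10-beyond."""
    bounds: List[int] = sorted(int(part) for part in spec.split(","))
    labels: List[str] = []
    lower = 0
    for upper in bounds:
        labels.append(f"{lower}-{upper}")
        lower = upper
    labels.append(f"{lower}-beyond")
    counts = {label: 0 for label in labels}
    for size in sizes:
        # Assumption: a size equal to a boundary goes to the higher range.
        counts[labels[bisect_right(bounds, size)]] += 1
    return counts


def per_day_stats(dates: Iterable[date], today: date,
                  last_n_days: Optional[int] = None) -> Optional[Dict[str, float]]:
    """Avg, min, and max items per day, optionally for the last N days."""
    if last_n_days is not None:
        cutoff = today - timedelta(days=last_n_days)
        dates = [d for d in dates if d > cutoff]
    per_day = Counter(dates)  # items grouped by calendar day
    if not per_day:
        return None  # nothing to report for this window
    values = list(per_day.values())
    return {"avg": sum(values) / len(values), "min": min(values), "max": max(values)}
```

Because empty days are skipped in this sketch, the average describes a typical active day; dividing by the full length of the window instead would give a lower figure.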
We want to hear from you! Feel free to send your feedback on this tool directly to epafb AT microsoft DOT com and we will use your input to help make future versions of this tool even better.
Frequently Asked Questions
Question: EPA reports CompletedWithException. Where can I find what exceptions occurred during the data collection process?
Answer: EPA records exception information in a log file. By default, the log file is located in the user’s application data path. For example, C:\Documents and Settings\<username>\Application
Question: Error: Unable to connect to Active Directory. Please make sure that the user account has enough permission.
Error: Unable to check the state of HTTP virtual server 1 on ServerName.
Error: Unknown error (0x80005000)
Answer: These errors are reported because the account that runs EPA does not have permission to read the IIS metabase. The error code may vary. EPA will continue to attempt to collect data from user mailboxes by making some assumptions about the configuration of your Exchange topology. See “Step 2” described above for details.
Question: Error: Unable to find an available URI for user ServerName\MDBName\Mailbox x.
Answer: This error is normally reported for one of two reasons.
1. No HTTP virtual server is running.
2. For every HTTP virtual server on the Exchange server, the SMTP domain of the virtual server does not match any of the user’s SMTP domains.
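The two conditions above can be pictured as a filter over the candidate links. The sketch below is a guess at the shape of that check, not EPA's code: the Candidate record, its fields, the links_for_user function, and the case-insensitive domain comparison are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple, Set


class Candidate(NamedTuple):
    """Hypothetical candidate link with its virtual server's attributes."""
    link: str
    running: bool
    smtp_domain: str


def links_for_user(candidates: Iterable[Candidate], user_domains: Set[str]) -> List[str]:
    """Keep links whose virtual server runs and matches a user SMTP domain."""
    domains = {domain.lower() for domain in user_domains}
    usable = [
        c.link
        for c in candidates
        if c.running and c.smtp_domain.lower() in domains
    ]
    if not usable:
        # Corresponds to the "Unable to find an available URI" error.
        raise LookupError("Unable to find an available URI")
    return usable
```

If the list comes back non-empty, each link would then be tried in order until one works, as described in Step 2.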
Question: Error: User ServerName\MDBName\Mailbox x cannot access any of the following links:http://ServerName/Exchange/.
Answer: The error will come up in the following five cases.
1. The account that runs EPA does not have full mailbox access on the user’s mailbox. This can be verified through Exchange System Manager. Please see the EPA documentation for details on how to configure and verify permissions. If the account does have full mailbox access rights, then try using Internet Explorer to access the user’s mailbox with OWA. If you are not able to access the mailbox with OWA, EPA will not be able to access it either.
2. The authentication methods configured on the virtual directory do not include Integrated Windows Authentication which is the only method EPA supports.
3. There are restrictions set up on the Exchange Server that restrict TCP/IP traffic in some way so that HTTP or HTTPS traffic from the machine you are running EPA on is blocked. You can work around this by running EPA on a machine that has the ability to connect with the back-end Exchange Server (such as an Exchange front-end server).
4. When SSL is required on the server and the name on the SSL certificate does not match the name that EPA is using to access the server, EPA will fail due to SSL certificate validation errors. We will have a fix for this in the next Web release.
5. When OWA Forms-Based Authentication is enabled on the HTTP virtual server, EPA will fail because the DAV requests submitted by EPA receive a “440 Login Timeout” error. This is similar to http://support.microsoft.com/default.aspx?scid=kb;en-us;817379 . We plan to provide better feedback for this case in the next Web release.
If these troubleshooting suggestions don’t solve your EPA problems, please contact epafb AT microsoft DOT com for further investigation.