It's been a while since I've updated this blog as I've been busy rolling out content internally in Microsoft to help our customers connect to our cloud services. I do have some new tips around checking routing and ISP performance that I'll write up soon and publish.
However, my colleagues have run into an issue with multiple customers that causes mailbox migrations to Office 365 to be very slow or to fail completely. It's very difficult to diagnose, so I thought I'd share it in the hope it helps some of you.
We've had numerous cases of mailbox migrations running slower than they should. Network checks show good bandwidth, low latency, TCP Window Scaling enabled and so on, but still no joy. Errors such as "Transient error CommunicationErrorTransientException has occurred. The system will retry" or other timeout errors may also occur.
My colleagues Andres Canello and Roshan Padmanabhan have bottomed out this issue, which is caused by a Denial of Service protection feature called HTTP Flood Mitigation, found on many web egress devices such as proxies and firewalls.
This security feature is designed to limit multiple HTTP requests from the same source IP. During a mailbox migration, O365 sends many HTTP connection requests to the MRS endpoint as it goes about its business. Customers normally publish their MRS endpoint using a reverse proxy (for example TMG Server), and HTTP flood mitigation is enabled by default on TMG. This throttles the number of connections that can pass through it per minute, ultimately slowing the migrations or causing them to fail.
For TMG you can disable this feature or, ideally, add IP exceptions for the O365 IP ranges to ensure this restriction is not applied to the mailbox migration traffic. However, given the sheer number of IP ranges we have for Office 365, it may be easier just to disable it for the duration of the migrations.
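To illustrate what an IP exception list does conceptually, here's a minimal Python sketch. It is not how TMG implements it, just the general idea: a source address that falls inside an excepted range skips the flood limit. The ranges below are placeholder documentation ranges, not real Office 365 ranges, and the function name `is_exempt` is made up for this example.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks.
# These are NOT real Office 365 ranges; use the published list instead.
EXCEPTION_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_exempt(source_ip, ranges=EXCEPTION_RANGES):
    """Return True if source_ip falls inside any excepted range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ranges)


if __name__ == "__main__":
    for ip in ("192.0.2.10", "198.51.100.200", "203.0.113.5"):
        print(ip, "exempt" if is_exempt(ip) else "subject to flood limits")
```

The catch, as mentioned, is that the real list of ranges is long and changes over time, so keeping an exception list current is the hard part.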
There are numerous settings which can affect this, but the main one to be concerned with is HTTP requests per minute per IP address.
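To see why a per-IP, per-minute request limit hurts migrations, here's a rough Python sketch of a simple fixed-window limiter. This is an assumption about how such a feature might behave in general, not TMG's actual algorithm, and the class name, function name and limit value are purely illustrative. Check the real value configured on your own device.

```python
from collections import defaultdict


class PerIpRequestLimiter:
    """Fixed-window limiter: at most `limit` requests per source IP per minute."""

    def __init__(self, limit):
        self.limit = limit
        # Maps (source_ip, minute_index) to the number of requests allowed so far.
        self.counts = defaultdict(int)

    def allow(self, source_ip, now_seconds):
        window = int(now_seconds // 60)
        key = (source_ip, window)
        if self.counts[key] >= self.limit:
            return False  # over the limit for this minute: request is dropped
        self.counts[key] += 1
        return True


def simulate(limit, requests_sent, source_ip="192.0.2.10"):
    """Send requests_sent requests from one IP within a few seconds; count those allowed."""
    limiter = PerIpRequestLimiter(limit)
    return sum(limiter.allow(source_ip, i * 0.01) for i in range(requests_sent))


if __name__ == "__main__":
    allowed = simulate(limit=600, requests_sent=1000)
    print(f"{allowed} of 1000 requests passed; the rest were dropped")
```

From the device's point of view, a busy migration looks like a single source IP hammering the endpoint, so everything above the limit is dropped and the migration service has to retry, which is where the transient errors and slow moves come from.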
Refer to this link for more information on how to do this in TMG: http://community.office365.com/en-us/w/exchange/office-365-move-mailbox-fails-with-transient-exception.aspx
You can read more about Flood Mitigation here: http://technet.microsoft.com/en-us/library/cc995196.aspx
It's also possible that you have similar capabilities on any other proxy or firewall device that's being used to publish your MRS endpoint. So even if you are not using TMG, it's well worth checking whether you have any DoS-type capability on your proxy or firewall that restricts multiple HTTP requests from the same source IP.
This may not be called HTTP Flood Mitigation; it may instead be referred to as HTTP or TCP throttling, or DoS protection. QoS devices could also do something similar to limit the amount of bandwidth a single TCP session can use.
Hope this helps, and kudos to Andres & Roshan for their work bottoming this one out.