High Availability for the Exchange Service

It has been a while since my last posting. We have been pretty busy dotting the i's on the new Office 365 service, and deploying the beta (which has gone very smoothly).

The conversation this time is centered on the framework we have used to tackle the challenge of delivering a highly reliable service. In this video, I cover some of the areas the Exchange team has focused on as our responsibilities have evolved from selling on-premises software to also managing a service.

The most interesting aspect is the extra focus applied to making sure that the isolation models in the service are well thought through. On-premises scenarios provide isolation for free, since each company's deployment is very heavily isolated from every other one. Cloud services, in pursuing efficiencies of scale, create the risk of tightly coupling the whole deployment so that one mistake or design flaw can cause failure for millions of customers.

In thinking through isolation, we look at the important usage scenarios or resources within the service and try to identify a coherent set of nested isolation levels. To pick a case at random, say storage, there are a number of levels that each provide meaningful isolation. Specifically, the Exchange service relies on several layers of strictly nested isolation, for example: message, mailbox, database, server, database availability group, and forest. In addition, there are datacenter and regional isolation layers that help but aren't in a strict hierarchy with the core layers.
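
To make the idea of strictly nested levels a little more concrete, here is a minimal Python sketch. It is only an illustration, not how the service is implemented: the identifiers (msg-1, db-1, and so on), the sample placements, and the helper names make_placement and blast_radius are all hypothetical, and it assumes each unit sits inside exactly one unit at the next level up. The datacenter and regional layers are left out because they are not part of the strict hierarchy.

```python
# Core isolation levels named above, innermost to outermost.
# "dag" is shorthand for database availability group.
LEVELS = ("message", "mailbox", "database", "server", "dag", "forest")


def make_placement(message, mailbox, database, server, dag, forest):
    """Record which unit contains a message at every isolation level."""
    return dict(zip(LEVELS, (message, mailbox, database, server, dag, forest)))


def blast_radius(placements, level, failed_id):
    """Return the messages whose enclosing unit at `level` is `failed_id`."""
    if level not in LEVELS:
        raise ValueError(f"unknown isolation level: {level!r}")
    return [p["message"] for p in placements if p[level] == failed_id]


# Hypothetical sample data, invented purely for illustration.
PLACEMENTS = [
    make_placement("msg-1", "mbx-a", "db-1", "srv-1", "dag-1", "forest-1"),
    make_placement("msg-2", "mbx-a", "db-1", "srv-1", "dag-1", "forest-1"),
    make_placement("msg-3", "mbx-b", "db-2", "srv-1", "dag-1", "forest-1"),
    make_placement("msg-4", "mbx-c", "db-3", "srv-2", "dag-1", "forest-1"),
    make_placement("msg-5", "mbx-d", "db-4", "srv-3", "dag-2", "forest-1"),
]

if __name__ == "__main__":
    # A failure further out in the hierarchy touches more messages.
    for level, unit in [("database", "db-1"), ("server", "srv-1"), ("dag", "dag-1")]:
        print(level, unit, blast_radius(PLACEMENTS, level, unit))
```

The point of the sketch is simply that, with strict nesting, a failure contained at an inner level (one database) affects a subset of what a failure at an outer level (its server, or its database availability group) would affect.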

This talk is centered on the framework or philosophy by which we think through our designs for reliability and availability. In future blog postings I think it would be interesting to work through some examples or case studies of these scenarios. Please let me know if there are any particular areas that would be of interest.

- Perry