Building a disaster recovery environment for SharePoint in Windows Azure: a few things we learned

My team is wrapping up a proof of concept (POC) build of an intriguing solution: disaster recovery of an on-premises SharePoint farm to Windows Azure IaaS. We learned a lot about the Azure infrastructure, including how to place a SharePoint environment across the Azure IaaS architecture and how to manage the environment day to day.

The solution requires a site-to-site VPN connection to Windows Azure. SQL Server log shipping and Distributed File System Replication (DFSR) are used to move SharePoint databases from the on-premises environment to Azure. Our Azure environment looks similar to this reference architecture.

The file share server does double duty: it runs DFSR and also acts as the third node of a Node Majority quorum for SQL Server AlwaysOn.
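To see why a third voting node matters, here is a minimal sketch of the majority arithmetic. It assumes each node has exactly one vote. The function names are made up for illustration and are not part of any Windows clustering API.

```python
def votes_needed(total_voters: int) -> int:
    """Smallest number of votes that is a strict majority of the voters."""
    return total_voters // 2 + 1


def failures_tolerated(total_voters: int) -> int:
    """How many voters can be lost while a majority still remains."""
    return total_voters - votes_needed(total_voters)


# Compare a two-node cluster with a three-node cluster.
for voters in (2, 3):
    print(voters, votes_needed(voters), failures_tolerated(voters))
```

With two voters, losing either one costs the cluster its majority. With three, one voter can go offline and the other two still agree, which is what the file share server buys us.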

One lesson we learned is to design the entire architecture BEFORE creating VMs in Azure. Hopefully we have done most of the ‘learn by doing’ for you in this category. For us, changing the configuration of availability sets and cloud services involved throwing away one set of VMs and starting over. There’s no need for you to experience this pain. In partnership with the Azure Center of Excellence and a few of our best SharePoint field experts, we settled on some best practices for architecture that are described in Windows Azure Architectures for SharePoint 2013. These best practices can be applied to any SharePoint solution in Azure.

As we wrap up this project, we are capturing some of the operational issues and best practices that we learned. One of them is to monitor and manage time sync issues between the on-premises and Azure environments. We ran into at least one issue that was solved by adhering to best practices in this category. We hope to publish more operational guidance soon.
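As a rough illustration of the kind of check this implies, the sketch below compares two UTC timestamps and flags skew beyond a tolerance. The 5-minute value is the default clock tolerance for Kerberos authentication in Active Directory; whether that is the right threshold for your farm is an assumption, and the timestamps and function names here are invented for the example. How you collect the clock readings from each environment is left out.

```python
from datetime import datetime, timedelta, timezone

# Assumption: use the Kerberos default tolerance as the alert threshold.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(on_prem_utc: datetime, azure_utc: datetime) -> timedelta:
    """Absolute difference between two clock readings."""
    return abs(on_prem_utc - azure_utc)


def skew_ok(on_prem_utc: datetime, azure_utc: datetime,
            max_skew: timedelta = MAX_SKEW) -> bool:
    """True when the two clocks are within the allowed tolerance."""
    return clock_skew(on_prem_utc, azure_utc) <= max_skew


# Made-up sample readings: the Azure clock is 7 minutes ahead.
on_prem = datetime(2014, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
azure = on_prem + timedelta(minutes=7)
print(skew_ok(on_prem, azure))
```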

Finally, I’ll reiterate a best practice I learned from Mark Russinovich, Technical Fellow in the Cloud and Enterprise Division at Microsoft: of all the VMs you plan to use in your Windows Azure environment, create the largest one first. This ensures your solution lands on a “stamp” that allows the largest size you need. This size constraint may go away at some point in the future of Azure. However, my team confirmed that this was necessary for our environment.
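If you script your deployment, one simple way to follow this advice is to sort the planned VMs by size before creating them. The sketch below is only an illustration: the VM names and core counts are hypothetical, and core count is used as a stand-in for "size". The actual creation call is omitted.

```python
# Hypothetical deployment plan; names and core counts are invented.
planned_vms = [
    {"name": "web-1", "cores": 4},
    {"name": "sql-1", "cores": 8},
    {"name": "fileshare-1", "cores": 2},
]


def creation_order(vms):
    """Return the VMs sorted so the largest (most cores) comes first."""
    return sorted(vms, key=lambda vm: vm["cores"], reverse=True)


order = [vm["name"] for vm in creation_order(planned_vms)]
print(order)
```

In this made-up plan, the SQL Server VM would be created first because it is the largest.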

For more information see: SharePoint Disaster Recovery in Windows Azure.

To learn more about setting up the site-to-site VPN connection, see Deploy Office 365 Directory Synchronization in Windows Azure.

For more information on DR planning, see Brian Lewis' blog series: Disaster Recovery Planning for IT Pros.