Whitepaper - Designing Your Network Infrastructure For Disaster Recovery

Over the last 12 months of customer validation, we have worked with many enterprise organizations and service providers to deliver successful Azure Site Recovery (ASR) deployments.
The journey has been great and interesting.
In a world where everyone expects 24/7 connectivity, it is more important than ever to keep your infrastructure and applications up and running. The purpose of Business Continuity and Disaster Recovery (BCDR) is to restore failed components so the organization can quickly resume normal operations. Developing disaster recovery strategies to deal with unlikely, devastating events is very challenging. This is due to the inherent difficulty of predicting the future, particularly as it relates to improbable events, and the high cost to provide adequate measures of protection against far-reaching catastrophes.

Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are crucial to BCDR planning and must be defined as part of any disaster recovery plan. When a disaster strikes the customer's data center, ASR lets customers quickly bring online (low RTO) their replicated virtual machines, located in either the secondary data center or Microsoft Azure, with minimal data loss (low RPO). Failover is made possible by ASR, which initially copies designated virtual machines from the primary data center to the secondary data center or to Azure (depending on the scenario), and then periodically refreshes the replicas. During infrastructure planning, network design should be treated as a potential bottleneck that can prevent you from meeting your company's RTO and RPO targets.
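
To make the bottleneck point concrete, here is a minimal back-of-the-envelope sketch in Python. It is not part of ASR. The function names, the idea of a fixed refresh interval, and the sample numbers are all assumptions chosen for illustration. It only checks whether the data changed during one refresh interval could be sent over the replication link before the next refresh is due.

```python
# Illustrative sketch only: these helpers are hypothetical and are not an ASR API.

def transfer_seconds(changed_megabytes, link_megabits_per_second):
    """Time to send one batch of changed data over the replication link."""
    if link_megabits_per_second <= 0:
        raise ValueError("link speed must be positive")
    # 1 megabyte = 8 megabits; protocol overhead and compression are ignored.
    return changed_megabytes * 8 / link_megabits_per_second


def replication_keeps_up(changed_megabytes, interval_seconds, link_megabits_per_second):
    """True if each batch of changes can be sent before the next one is due."""
    return transfer_seconds(changed_megabytes, link_megabits_per_second) <= interval_seconds


if __name__ == "__main__":
    # Assumed workload: 750 MB of changes per refresh, over a 100 Mbit/s link.
    print(transfer_seconds(750, 100))
    print(replication_keeps_up(750, 300, 100))
    print(replication_keeps_up(750, 30, 100))
```

If the transfer time regularly exceeds the refresh interval, the replicas fall further behind the primary and the achievable RPO grows, regardless of how the rest of the recovery plan is designed.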

When administrators plan a disaster recovery deployment, one of their key questions is how the virtual machines will be reachable once the failover completes.

While designing the network for the recovery site, the administrator has two choices:

  • Use a different IP address range for the network at the recovery site. In this scenario, the virtual machine gets a new IP address after failover, and the administrator has to update DNS.
  • Use the same IP address range for the network at the recovery site. In certain scenarios, administrators prefer to retain the IP addresses used on the primary site even after failover. Normally, the administrator would have to update the routes to indicate the new location of the IP addresses. But where a stretched VLAN is deployed between the primary and recovery sites, retaining the virtual machines' IP addresses becomes an attractive option. Keeping the same IP addresses simplifies the recovery process by removing any network-related post-failover steps.
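
The difference between the two choices can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how ASR assigns addresses. The function name `failover_ip`, the sample subnets, and the rule of keeping the same host offset in the new subnet are all hypothetical. It uses only the standard `ipaddress` module.

```python
import ipaddress

# Illustrative sketch only: failover_ip is a hypothetical helper, not an ASR API.

def failover_ip(primary_ip, primary_subnet, recovery_subnet):
    """Return the address a VM would have on the recovery network.

    If both sites use the same range, the address is retained. Otherwise the
    VM is placed at the same host offset in the recovery subnet (an assumed
    convention chosen for this example).
    """
    ip = ipaddress.ip_address(primary_ip)
    src = ipaddress.ip_network(primary_subnet)
    dst = ipaddress.ip_network(recovery_subnet)
    if ip not in src:
        raise ValueError(f"{ip} is not in primary subnet {src}")
    if src == dst:
        return str(ip)  # same range: the address is retained
    offset = int(ip) - int(src.network_address)
    if offset >= dst.num_addresses:
        raise ValueError(f"{dst} is too small to hold host offset {offset}")
    return str(dst.network_address + offset)


if __name__ == "__main__":
    old = "10.1.2.15"
    for recovery in ("172.16.8.0/24", "10.1.2.0/24"):
        new = failover_ip(old, "10.1.2.0/24", recovery)
        # A changed address means DNS records must be updated after failover.
        print(recovery, new, "DNS update needed" if new != old else "no DNS update")
```

With a different range, every protected virtual machine ends up with a changed address and therefore a DNS record to update. With the same range, the mapping is the identity and the remaining work moves to routing or a stretched VLAN.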

In our conversations with customers, we listened closely to what they felt was missing. One piece of feedback we heard consistently was the need for prescriptive guidance on how to design the network infrastructure for disaster recovery. Such guidance helps ensure the best possible RTO by bringing replicated virtual machines online as quickly as possible, whether they are located in the secondary data center or in Microsoft Azure.

This whitepaper is directed to IT professionals who are responsible for architecting, implementing, and supporting business continuity and disaster recovery (BCDR) infrastructure, and who want to leverage Microsoft Azure Site Recovery (ASR) to support and enhance their BCDR services. This paper discusses practical considerations for System Center Virtual Machine Manager server deployment, the pros and cons of stretched subnets vs. subnet failover, and how to structure disaster recovery to virtual sites in Microsoft Azure.

We have posted this whitepaper in the TechNet Gallery here:

Till next time, Happy "Networking"!

Prateek Sharma
Senior Program Manager – Azure Site Recovery Team

Nader Benmessaoud
Program Manager – ECG CAT Team