Taking a closer look at Microsoft Azure Maintenance to avoid downtime

We receive many support cases in which customers want to know why their VM running in Azure was restarted, why they were not notified beforehand, and whether we can notify them in advance of any future maintenance.

This blog post is an attempt to answer these questions and shed more light on Microsoft Azure maintenance.

First, let’s understand the difference between planned and unplanned maintenance:

Planned maintenance events are periodic updates made by Microsoft to the underlying Azure platform to improve the overall reliability, performance, and security of the platform infrastructure that your virtual machines run on. Most of these updates are performed without any impact on your virtual machines or cloud services. However, some of these updates require a reboot of your virtual machine to apply the required changes to the platform infrastructure.

Unplanned maintenance events occur when the hardware or physical infrastructure underlying your virtual machine has faulted in some way. This may include local network failures, local disk failures, or other rack-level failures. When such a failure is detected, the Azure platform automatically migrates your virtual machine from the unhealthy physical machine hosting it to a healthy physical machine. Such events are rare, but they may also cause your virtual machine to reboot.

Now that we understand the difference between the two, let’s address the questions below.

Why did my VM running in Microsoft Azure reboot?

Your VM could have rebooted because of planned or unplanned maintenance events.

Why didn’t we receive any notification for this maintenance?

We may provide pre-maintenance notification to customers whose VMs are not running in an availability set. Customers who place their virtual machines in an availability set are expected to have designed their system to absorb the workload while maintenance is performed. We do not guarantee that all maintenance will be announced this way, because emergency system maintenance may be necessary. Notifications are also not provided for service healing, migration away from faulty hardware, or movement for load balancing. We plan to provide this type of information directly to customers wherever possible; however, this is a future feature.

What is an availability set, and how does it prevent downtime?

The recommended way to prevent downtime on these occasions is to use an availability set. You should combine availability sets with load-balanced endpoints to help ensure that your application is always available and running efficiently. The information and link below explain this in more detail.

IaaS Virtual Machines are essentially single-instance roles with no scale-out capability. An important goal of the IaaS feature release was to enable Virtual Machines to also achieve high availability in the face of host updates and hardware failures, and the Availability Sets feature does just that.

Availability Sets have five Update Domains (UDs) by default and support up to twenty. The Fabric Controller (Microsoft Azure Kernel) spreads instances assigned to an Availability Set across UDs. 

This allows customers to deploy Virtual Machines designed for high availability (for example, two Virtual Machines configured for SQL Server mirroring) to an Availability Set, which ensures that a host update reboots only one half of the mirror at a time, as described here: https://blogs.technet.com/b/markrussinovich/archive/2012/08/22/3515679.aspx
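To make the update-domain idea concrete, here is a minimal sketch that spreads VMs across update domains and then walks a host update through one UD at a time. The round-robin placement, the one-UD-at-a-time walk, and the names assign_update_domains and simulate_host_update are illustrative assumptions, not the Fabric Controller's actual algorithm; only the default of five UDs and the limit of twenty come from this post.

```python
from collections import defaultdict

DEFAULT_UPDATE_DOMAINS = 5  # default UD count stated in this post
MAX_UPDATE_DOMAINS = 20     # maximum UD count stated in this post


def assign_update_domains(vm_names, update_domain_count=DEFAULT_UPDATE_DOMAINS):
    """Hypothetical round-robin placement: VM number i goes to UD i mod count."""
    if not 1 <= update_domain_count <= MAX_UPDATE_DOMAINS:
        raise ValueError("update_domain_count must be between 1 and 20")
    return {vm: i % update_domain_count for i, vm in enumerate(vm_names)}


def simulate_host_update(assignment):
    """Reboot one UD at a time (assumed) and report which VMs keep running.

    Returns a list of (ud, rebooted, still_running) tuples, one for each UD
    that holds at least one VM.
    """
    by_domain = defaultdict(list)
    for vm, ud in assignment.items():
        by_domain[ud].append(vm)
    steps = []
    for ud in sorted(by_domain):
        rebooted = sorted(by_domain[ud])
        still_running = sorted(vm for vm in assignment if vm not in rebooted)
        steps.append((ud, rebooted, still_running))
    return steps


if __name__ == "__main__":
    # Two VMs configured for SQL Server mirroring, as in the example above.
    placement = assign_update_domains(["sql-principal", "sql-mirror"])
    for ud, rebooted, running in simulate_host_update(placement):
        print(f"UD {ud}: rebooting {rebooted}, still running {running}")
```

With two mirrored VMs, each lands in a different UD, so at every step of the simulated update one half of the mirror is still running.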

Also note that a single VM instance does not qualify for our Service Level Agreement, which requires two or more virtual machines running in the same Availability Set.

Will I receive pre-notifications for a single-instance VM deployed into an availability set?

No. Because your virtual machine is running in an Availability Set, you will not receive email notifications of upcoming planned maintenance events, which are intended to help you reduce the impact on your service. We send notification of planned maintenance only for virtual machines that are not inside an Availability Set.

If you require this virtual machine to run as a single instance, we recommend that you remove it from the Availability Set so that you can receive notification of planned maintenance. Note that moving a virtual machine into or out of an Availability Set reboots the machine, because we migrate it to a new physical machine.

Will I receive pre-notifications for VMs deployed into an availability set?

We send notification of planned maintenance only for virtual machines that are not inside an Availability Set. 
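The two rules discussed in this post (who may receive planned-maintenance emails, and who qualifies for the SLA) can be condensed into a small decision sketch. Deployment, may_receive_planned_maintenance_email, and qualifies_for_sla are hypothetical names for illustration, not part of any Azure SDK, and the code encodes only the statements made in this post, not a definitive statement of Azure policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical description of a group of VMs (not an Azure SDK type)."""
    vm_count: int                    # number of VMs in the group
    availability_set: Optional[str]  # set name, or None if not in a set


def may_receive_planned_maintenance_email(d: Deployment) -> bool:
    # Per this post: notifications are sent only for VMs outside an
    # Availability Set, and even then they are not guaranteed (emergency
    # maintenance, service healing, hardware moves, load balancing).
    return d.availability_set is None


def qualifies_for_sla(d: Deployment) -> bool:
    # Per this post: the SLA requires two or more VMs in the same
    # Availability Set; a single VM instance does not qualify.
    return d.availability_set is not None and d.vm_count >= 2


if __name__ == "__main__":
    for d in (Deployment(1, None), Deployment(1, "web-set"), Deployment(2, "sql-set")):
        print(d, may_receive_planned_maintenance_email(d), qualifies_for_sla(d))
```

The three sample deployments show the trade-off described above: a VM outside an Availability Set may get an email but no SLA, a single VM inside a set gets neither, and two or more VMs in the same set qualify for the SLA but are expected to absorb maintenance without notification.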

We hope this information helps you avoid downtime during Microsoft Azure maintenance.