Right time, right place; breaking down the migration barriers

06/30/2014

The following post is from Ian Masters of Vision Solutions.

I work for one of Microsoft's go-to-market partners for migrating customers to Hyper-V and Azure. I want to tell the story of how that came about, what problem we solved and how it all worked.

Let's start at the beginning. I was fortunate enough to be invited to speak at one of the System Center 2012 launch events. The particular one I attended was in Helsinki and I will always be grateful to Riku Reimaa for giving me the opportunity to be a guest speaker. Riku kindly gave me 2 slides and 10 minutes to present what we could do (anyone who knows me will know that stopping me talking after 10 minutes is quite an achievement!). So I got up, did my bit and sat back down. Keep in mind that the rest of the day was a blank to me as it was all in Finnish, unsurprisingly! However, during the next break I was accosted by a senior technical specialist from Microsoft who said, "you can help me!"…

Microsoft was heavily engaged with a Cloud OS Launch Partner, a Service Provider who had made a serious commitment to move all of their physical and VMware workloads to Hyper-V. Included in the deal was a significant amount of consultancy from Microsoft Consulting Services, but they had hit a roadblock and were struggling to find a way to move forward. During the planning and assessment phase the Service Provider had identified a significant number of workloads that could not be taken offline in order to image them. When asked why, the answer was simple: they owned the infrastructure, servers and storage, plus the hypervisor and automation/orchestration layer, but they didn't own the workloads. The servers, the majority of which were VMware virtual machines, belonged to their clients. The Service Provider had strict SLAs in place to ensure the lights were kept on and would incur penalties if they took the workloads offline for too long. This left them with two options: do nothing and stay as they were, or try to negotiate with their clients for additional downtime. Neither was appealing. So Microsoft was looking for a solution that could migrate those workloads without any significant downtime, which is not an easy challenge to overcome.

At this point it's worth considering the tools that Microsoft had in hand and why they were unable to overcome the challenge. I won't go into detail on how the Microsoft solutions work, but suffice it to say that most of the solutions on the market, free or otherwise, approach the problem in the same way. The general approach is to take a snapshot of the production server and copy it to the new machine; then, when you're ready to migrate, you have to take everything offline and perform the final sync. This, plus the manual labour required and the potential risks posed by the fact that you cannot test cutovers, meant that Microsoft Consulting Services needed an alternative.

This is where we came in. We are experts in High Availability and Disaster Recovery; in other words, we keep the lights on. Think of a simple HA solution: a pair of servers "clustered" such that when the production server is unavailable it fails over to the secondary server, and when you have fixed the issue you fail back to where you started. We approached this migration in a similar way, except that once we moved the workloads to a second system they stayed there. We achieved this with our Double-Take Move (DTM) product. Let me explain how that worked. Once installed on the production server, be it physical or virtual, DTM makes an initial block-level mirror of the entire server, and at the same time our byte-level replication starts capturing any new changes being made. This is achieved through our mini system filter driver which is installed on the server… One consideration here is that the initial mirror is going to add some load to the server, so if it is already maxing out its resources you may run into some issues.

The great thing about DTM is that it will never bring the server down; to avoid that, it may start queuing data or simply stop replicating. Good assessment and planning can avoid any issues and ensure success. DTM replicated the data to the new Hyper-V host and created mount points, which were then used to create the virtual machines on Hyper-V. This meant that there was no need to do any more than set up the host; DTM auto-provisioned the virtual machines. So to this point in the process there was no need to take the users, applications or servers offline; the entire process was achieved with the lights on. DTM then kept everything in sync until a convenient time to migrate.

The cutover could have been instigated with the click of a mouse or scheduled in advance; in this case it was a scheduled process that took place between 12am and 1am. The reason was that this Service Provider had very tight SLAs to meet for their clients, and the final migration requires a single reboot of the new Hyper-V virtual machine, which is the point at which users are taken offline. Typically this takes well under 15 minutes per server, and then users are back online and working again. In this case there was a significant amount of automation built into the process through our System Center integration. The orchestration and automation was such that user acceptance testing was completed automatically.

What DTM also allowed us to do was test cutovers: we could bring the new virtual machine online but not connected to the network, then create a private network to do user acceptance testing. Once they were happy, they could reconnect the machines and sync only the latest changes. Note that during the cutover process there are a myriad of options around topics like addressing and resources, and we could make changes as required as part of the final cutover. Basically anything you can script in PowerShell can be triggered to happen either pre-cutover or post-cutover. The entire migration is a consistent, repeatable process with almost no downtime and no risk.
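For readers who like to see the idea in code, here is a toy Python model of the flow described above: an initial mirror runs while production writes keep arriving, those writes are queued and replayed on the target in order, and the cutover is wrapped in pre and post hooks. This is only a sketch to illustrate the concept, not how Double-Take Move is implemented. The `ToyMigration` class, its method names and the `demo` function are all hypothetical, and the hooks are plain Python callables standing in for the PowerShell scripts mentioned above.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Write = Tuple[int, bytes]  # (byte offset, data written at that offset)
Hook = Callable[[], None]  # stand-in for a pre- or post-cutover script


class ToyMigration:
    """Hypothetical toy model: mirror a volume while queuing live writes."""

    def __init__(self, source: bytearray, block_size: int = 4) -> None:
        self.source = source
        self.target = bytearray(len(source))
        self.block_size = block_size
        self.queue: Deque[Write] = deque()
        self.log: List[str] = []

    def write(self, offset: int, data: bytes) -> None:
        """A production write: applied to the source and captured for replay."""
        self.source[offset:offset + len(data)] = data
        self.queue.append((offset, data))

    def mirror(self, writes_during_mirror: List[Write]) -> None:
        """Copy the source block by block while production writes keep landing."""
        pending = deque(writes_during_mirror)
        for start in range(0, len(self.source), self.block_size):
            end = start + self.block_size
            self.target[start:end] = self.source[start:end]
            if pending:  # production stays online during the mirror
                self.write(*pending.popleft())
        while pending:  # any writes left over after the last block
            self.write(*pending.popleft())

    def drain(self) -> None:
        """Replay captured writes on the target in their original order."""
        while self.queue:
            offset, data = self.queue.popleft()
            self.target[offset:offset + len(data)] = data

    def cutover(self, pre: List[Hook], post: List[Hook]) -> None:
        """Run pre hooks, apply the last queued changes, then run post hooks."""
        for hook in pre:
            hook()
        self.drain()
        self.log.append("cutover")
        for hook in post:
            hook()


def demo() -> ToyMigration:
    m = ToyMigration(bytearray(b"AAAABBBBCCCC"))
    # Two production writes arrive while the initial mirror is still running.
    m.mirror([(0, b"xx"), (8, b"yy")])
    m.cutover(
        pre=[lambda: m.log.append("pre-cutover hook")],
        post=[lambda: m.log.append("post-cutover hook")],
    )
    return m
```

Replaying the queued writes in their original order is what lets the target catch up without the source ever going offline: a write that lands on a block already copied is fixed up by the replay, and a write that lands on a block not yet copied is simply applied twice with the same result.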

The customer I'm referring to in this article is Telecomputing, and the project has been written up as a Microsoft Case Study. I'll call out the two key points highlighted by the customer. They believe we reduced the engineering effort by up to two thirds and saved them up to 20,000 hours of downtime on the first 1,000 workloads alone. Think of this in monetary terms and that's a huge cost saving. The approach also works just as effectively if you are looking to migrate to Azure. There are some slight differences, mainly that we migrate to a pre-provisioned virtual machine, just a base template of the same OS, as we don't have hypervisor access to do the auto-provisioning. When the new system is rebooted upon cutover we apply the system state of your original production machine over the top of the template; this will include all the service packs, hot fixes, security patches etc. You still have the option to change addressing and resources. We've also successfully helped K2 to migrate directly from Amazon AWS to Azure, and this again has been written up as a case study. If you want to find out more about what we offer for migrating workloads to Hyper-V or Azure, visit our website.

So, as you can see, I happened to be in the right place at the right time. We now have many successful migrations completed and continue to work closely with Microsoft to overcome the migration challenges and drive adoption of Hyper-V and Azure.

Have you, like Ian, ever been in the right place at the right time? If so, let us know via @TechNetUK.
