How we (Microsoft) are using Azure for internal apps

One of things I love about working for Microsoft is that we use our own stuff, we really trust, we really deploy it, we really use it.  Not all our competitors do, you can tell because they don’t talk about it, they don’t run their own massive data centres for example.  We do and it gives us experience.  MSIT – our IT department, yes we do have an IT department too, has built and deployed SXP or Social eXperience Platform) on the Windows Azure platform – and more stuff is going that way too.

image

So what is SXP and what makes it special?  Well SXP runs on this site our video showcase and it is essentially a platform that allows us to manage and understand the social aspects of our content.  That content can be web pages, videos (as in the video show case site), blog posts, new stories, press releases…anything.  Essentially you could say it adds social context to anything and allows us to understand that context.  It’s a back end tool, it’s not doing the content hosting.

The platform is built on Azure (one web server role running on 3 medium instances) and storage is taken care of by SQL Azure with each subside of Microsoft.com having it’s own database allowing customisation and isolation of problems, should there be one.  The user interfaces are delivered with Silverlight.

There are some cool management things too, SCOM integration being handled by some custom code right now but the RC of the Azure Management pack is being run in parallel and that’s going to be something every IT Pro who’s managing Azure will love.  There’s also an interesting tool called “Keynote” running that checks that the web service is available from different points all over the globe and the tool the user facing tool for managing the workflow has been created in Silverlight and uses AD FS (Active Directory Federation Services) for authentication – meaning that once you’re signed into your desktop you’re signed into the app.

This is obviously not new functionality to us, commenting and rating of videos has been with us for some time but the 3rd party solutions we had in place don’t seem to have been the most manageable.  On that point we get quite a lot of comment spam that has to be filtered away.  The service has been live for about 120 days now and MSIT tell us that they’ve saved about $14k PER MONTH! in management costs, upped availability by 8 fold and made provisioning a staggering 240 times faster!  That’s Azure for you.

The team learnt some excellent lessons, which they’ve published here along with more detail on the above, but the lessons are really important and I want to call them out:

    • Pick the right application. The primary key to success in developing for Windows Azure is to pick the right application. Windows Azure is great for building a Web application or a compute-intensive application, but it is not yet a general-purpose application development platform. The SXP team started with a lower-risk customer-facing project to validate that everything worked as planned.
    • Prepare for the impact on operations. Operations is the biggest change when developing for Windows Azure. It is critical to understand the operations impact, get the operations team on board early, and design for operations. Because the SXP team knew that it faced an unknown operations environment, the team heavily instrumented its code so that it would know exactly how many transactions succeeded or failed. The team took advantage of the out-of-the-box Windows Azure Diagnostics to do the bulk of the work but also wrote some small custom tools.
    • Prepare early for security and integration. Security and integration are very important considerations. AD FS is a great security solution for authentication. Integration is challenging if there is data inside and outside the firewall. For projects that require integration, it is important to make sure that those problems can be solved before development starts.
    • Build in SQL Azure retries. SQL Azure moves databases to balance load. When a database is moved, the connection pool becomes invalid. If the service does not retry the SQL connection, the connect request fails, causing errors. So it is critical to build in SQL Azure retries. For examples of code that retries connections, see the blog entries at https://bit.ly/98Makv.
    • Conduct performance testing. Running performance and stress tests with Visual Studio and Windows Azure can be challenging. The problem is the wide area network (WAN) link between the client and the server. The results can become skewed, and the network can quickly become a bottleneck. The potential to accumulate bandwidth costs also exists. The SXP team solved this problem by creating a simple application that it deployed in Windows Azure in the same data centre, in order to test heavily on the same network. This testing provided feedback on real performance data.

I have a bet with myself about what the first comment will be on this post…