AzureML: A short introduction

Richard Conway, Director and Head of Cloud Services at Elastacloud Limited, Co-Founder at UK Windows Azure Users Group.

Over at Elastacloud we’ve been using Big Data and machine learning frameworks on Azure for years, and at a community level we’ve been teaching free courses and bootcamps on HDInsight and its associated machine learning framework, Apache Mahout. As such I’ve been waiting with bated breath for Microsoft to release their own machine learning offering. Let’s deal with some basics …

What is machine learning?  Machine learning provides computers with the ability to learn without being explicitly programmed. It focusses on the development of software that can teach itself to grow and change when exposed to new data.

In my short online briefing I covered the idea of training data. To understand the relationship between data points, we apply one of several types of algorithm to that training data to create a model. The model can then be applied to new data.

After we select our dataset we train our model so that we can then test it against further data. We can try the model with new data and/or a held-back portion of our original dataset and evaluate the results. The feedback cycle can continue ad infinitum so that we create the best model available to us.
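Outside of AzureML, the idea of holding back a portion of the original dataset for testing can be sketched in a few lines of Python. This is only an illustration: the function name train_test_split, the 25% test fraction and the placeholder rows are assumptions made for this example and are not part of AzureML.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists.

    Hypothetical helper for illustration; not an AzureML API.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows are held back for testing.
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows standing in for a real dataset.
rows = [(i, 2 * i) for i in range(20)]
train, test = train_test_split(rows)
print(len(train), "training rows,", len(test), "test rows")
```

The model is trained only on the training rows and evaluated on the test rows, so the evaluation reflects data the model has not already seen.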

AzureML allows this process to occur very simply. If you look at the dataset I’ve chosen below, you can see that I’m testing the idea that cricket chirps get louder as temperature increases. With AzureML I haven’t needed to write any code in this instance, as I already have data for cricket chirps in decibels and temperature in centigrade.

We feed the data to the algorithm, in this case Linear Regression (remember it from secondary school statistics class!), which seeks to find a linear relationship between the two aforementioned variables. We then train this model, and we can score and evaluate it thereafter. This allows us to feed in additional cricket chirp data so that we can determine whether we can accurately predict the temperature given the chirps of crickets. Evaluating our model is easy: if we have the actual temperatures, we can test whether our predicted data and actual data match. Determining whether our model is effective is then a simple matter of averaging the errors between the two sets of data points!
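For readers who want to see the train, score and evaluate steps spelled out, here is a minimal sketch in plain Python. It is not what AzureML runs internally: it fits an ordinary least-squares line and averages the absolute errors (signed errors would cancel each other out), which is one reasonable reading of "averaging the errors". The chirp and temperature numbers are made up purely for illustration and are not the dataset used in this post, and the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Train: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, xs):
    """Score: apply the fitted line to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def mean_absolute_error(actual, predicted):
    """Evaluate: average size of the gap between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up illustrative values: chirp loudness (dB) and temperature (C).
train_chirps = [50, 52, 54, 56, 58, 60]
train_temps = [15, 17, 18, 21, 22, 24]
test_chirps = [53, 57]
test_temps = [18, 21]

model = fit_line(train_chirps, train_temps)
predicted_temps = predict(model, test_chirps)
error = mean_absolute_error(test_temps, predicted_temps)
print("slope=%.3f intercept=%.3f mean abs error=%.3f" % (model + (error,)))
```

A small average error on data the model was not trained on suggests the linear relationship holds; a large one suggests trying different data or a different algorithm.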

What can machine learning be used for? AzureML can be applied to any type of problem for which data is available. Google are using machine learning to make self-driving cars, and Microsoft to make the Kinect and Xbox recommendation tools. Netflix is the poster child of recommender systems and has successfully used machine learning to predict what its users want to watch next. Not an easy feat.

AzureML is a fully featured machine learning host which will allow you to upload or process datasets in Azure, clean that data up and then make predictions using that data. Currently it offers a drag-and-drop HTML5 web interface called MLStudio, which supports a high level of configuration over the tasks you can enable. It also supports the R programming language, which is the most popular language used by statisticians and data scientists, as well as .NET, exposing some of the features of Infer.NET, a powerful library produced by Microsoft Research.

Any model can also be exposed as a web service, which is a powerful enabler for the many data scientists who don’t have the software skills to take their models into production.