A Data Scientist’s Perspective on Azure Machine Learning

Posted by Lixun Zhang, Data Scientist at Microsoft.

Before using Azure ML, I had worked with statistical models and ML techniques in R, Python, and SAS. When I was first exposed to Azure ML, I had three burning questions. After much exploration, I found my answers and have summarized them in this post.

My three questions were:

  • What is Azure ML?

  • Why should a data scientist with experience in R, Python, SAS, etc. use Azure ML?

  • How can I use Azure ML?

What is Azure ML?

In a nutshell, Azure ML allows you to develop models in the cloud. Models are built as experiments, which you create by dragging and dropping modules. The wide range of modeling options includes, but is not limited to, linear regression, logistic regression, support vector machines, and boosted decision trees. Since it’s cloud-based, anyone with internet access can develop models from a web browser. Azure ML also lets you deploy a model as a web service in minutes, so that others can use it. Another thing I liked about Azure ML is that it is integrated with R and Python environments, which makes it possible for data scientists to write and run R and Python programs in the cloud as well.

Why Should a Data Scientist Use It?

If you’ve been using other tools – R, Python, SAS, etc. – for most of your career, you might wonder why you would use Azure ML. Several things make Azure ML worthwhile to learn and use. First, because Azure ML runs in the cloud, you won’t need a powerful on-premises computer to address the challenges caused by big data. Second, the deployment feature helps reduce the time from model development to model implementation: since you can use Azure ML to set up a web service, the amount of IT effort spent on implementation is significantly reduced. Third, you can still develop R and Python programs. Fourth, if you have an Azure storage account, you can easily access the data in that account from Azure ML. Finally, if you are using Power BI, the predictions from Azure ML can feed directly into it. Based on these points, I concluded that it was worth my while to learn Azure ML.

How Can I Use Azure ML?

Once I understood the advantages of Azure ML, I wanted to learn how to use it so I could make the most of its benefits. True, there are plenty of documents and videos out there about Azure ML. But what I ideally wanted was something that would teach me the essentials of Azure ML within a few hours. I spent a couple of weeks reading various documents and trying out experiments. I then summarized the key lessons I learned and developed a tutorial for data scientists who are new to Azure ML – this tutorial, the Data Scientists’ Guide, can be found in the Cortana Analytics Gallery. By spending 3-4 hours on it, you’ll get a good overall understanding of the tool.

In the Guide, I also checked the results from Azure ML’s algorithms against the results from R. Using linear regression as an example, I compared the parameters and model performance metrics from Azure ML’s algorithm with those from R. It turned out that they returned the same values, which reassured me about my decision to use Azure ML. The details can be found in the Data Scientists’ Guide.
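To give a flavor of this kind of cross-check, here is a minimal sketch in plain Python. It is not the code from the Guide and does not call Azure ML or R. It fits a one-variable linear regression by ordinary least squares and compares the resulting parameters against values you would copy from another tool's output. The function names `fit_ols` and `results_match`, the toy data, and the reference values are all assumptions made for illustration.

```python
import math


def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return {
        "intercept": intercept,
        "slope": slope,
        # RMSE here divides by n; other tools may divide by n - 2 instead.
        "rmse": math.sqrt(sse / n),
        "r_squared": 1 - sse / sst,
    }


def results_match(reference, candidate, rel_tol=1e-6):
    """Return True if every value in `reference` agrees with `candidate`."""
    return all(
        key in candidate
        and math.isclose(reference[key], candidate[key],
                         rel_tol=rel_tol, abs_tol=1e-9)
        for key in reference
    )


# Toy data, invented for this sketch.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
local = fit_ols(x, y)

# Hypothetical values, standing in for numbers copied from another tool.
reference = {"intercept": 0.05, "slope": 1.99}

print(local)
print("parameters agree:", results_match(reference, local))
```

When comparing metrics as well as parameters, it is worth confirming that both tools define the metric the same way (for example, which denominator the error estimate uses) before concluding that they disagree.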

Conclusion

In summary, Azure ML is a modern cloud platform that lets data scientists develop and deploy models without the need for powerful on-premises computers. It can help individual data scientists and organizations reduce the time and resources needed for model implementation.

Lixun Zhang
Follow me on Twitter