From Data to Operationalized ML in 60 Minutes!

This blog post was co-authored by Debi Mishra, Jacob Spoelstra and Dmitry Pechyony of the Information Management & Machine Learning team at Microsoft.

Microsoft has a strong track record of crafting tools, such as our Office apps and Visual Studio, that millions of users find relatively easy to use. These apps have set the industry standard for individual and team productivity: how quickly a new user can learn the tool, use it to accomplish their tasks, and automate tedious or mundane activities so they can better focus on their job.

Great tools spark creativity and do not get in the way of the user. They make seemingly difficult things easy to accomplish. Great tools often end up creating an entirely new breed of empowered users. Take, for instance, how Microsoft Visual Basic in the 1990s expanded the base of software programmers by millions worldwide, bringing in users who might otherwise not have taken up programming.

When we created Azure Machine Learning Studio, our target audience included all data scientists, from aspiring students and hobbyists to enthusiasts and seasoned experts. Our primary vision for the tool is the title of this blog post. We believe the ease and power of a tool such as Azure ML Studio can be measured by how long it takes a typical data scientist, even someone relatively new to the field, to go all the way from raw data to a fully operationalized web service powered by the intelligence harnessed from that data.

In this post, we talk about how Azure ML Studio is helping drive greater productivity and ease of use for data scientists. In particular, we focus on how the tool lets users stand up intelligent web services powered by predictive analytics in an hour or less.

The ease of use starts with Azure ML being cloud hosted. There is no software to install, no hardware to manage, no dependency on IT, and practically no constraints on disk space or CPU cycles. With our free option, which no longer requires an Azure subscription or credit card, you can start developing ML models in minutes, from anywhere, on any device, using nothing but a web browser. You can start work on an ML experiment at your workplace, pick up where you left off during your commute and, later the same evening, continue running your experiment from your tablet at home.

Model Authoring Experience

Azure ML Studio lets you set up experiments as simple visual data flow graphs, using an easy drag, drop and connect paradigm. The tool also makes many common data science tasks easy and intuitive. For instance, you can:

  • Bootstrap from a set of pre-authored templates of fully working experiments, representing common data science patterns.

  • Compose an experiment workflow using “modules” as algorithmic building blocks. All our modules are plug-and-play, with strong “typing” and reasonable default settings pre-selected, so dropping in a module without any customization is a reasonable starting point.

  • Bring in data from multiple sources, including SQL, Hadoop, OData, and Azure Storage.

  • Use our powerful built-in suite of world-class ML algorithms. All our learners are used in the same way and can be swapped for one another as needed, so trying a new learner takes little effort.

  • Automate feature selection and parameter tuning with the feature selection and parameter sweep modules.

  • Compare the performance of several algorithms and choose the one that works best for your problem. Because our data flow graphs support multiple parallel paths, side-by-side comparisons are straightforward.

  • Use our built-in support for R. Over 400 of the most popular CRAN packages come preinstalled, so you can bring your existing R skills and scripts directly into Azure ML (see an earlier post on this topic). Our team is working to add Python support soon.

  • Revisit prior runs of an experiment using our lineage tracking capability, so you get a complete view of your past experimentation.

  • Avoid programming for a large number of common tasks, which lets you focus on experiment design and iteration.

  • Collaborate with others worldwide on your project. Azure ML Studio lets teammates virtually look over each other’s shoulders, share data and intermediate results, and pick up work where others left off.
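The interchangeable-learner, parameter-sweep and side-by-side comparison bullets above describe a general pattern that can be sketched in a few lines of plain Python. This is not Azure ML code: the `Learner` interface, the two toy learners, `sweep_threshold`, and the tiny dataset are all hypothetical, chosen only to show why a shared fit/predict contract makes swapping and comparing algorithms cheap.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Sequence, Tuple

Row = Tuple[float, ...]


class Learner(Protocol):
    """Hypothetical shared interface: any learner with fit/predict can replace another."""

    def fit(self, X: Sequence[Row], y: Sequence[int]) -> "Learner": ...

    def predict(self, X: Sequence[Row]) -> List[int]: ...


@dataclass
class ThresholdLearner:
    """Predicts class 1 when one feature exceeds a threshold."""

    feature: int = 0
    threshold: float = 0.5  # usable default, so it works before any tuning

    def fit(self, X: Sequence[Row], y: Sequence[int]) -> "ThresholdLearner":
        return self  # nothing to estimate; the threshold is chosen by a sweep

    def predict(self, X: Sequence[Row]) -> List[int]:
        return [1 if row[self.feature] > self.threshold else 0 for row in X]


@dataclass
class NearestCentroidLearner:
    """Predicts the class whose mean feature vector is closest (squared distance)."""

    centroids: Dict[int, List[float]] = field(default_factory=dict)

    def fit(self, X: Sequence[Row], y: Sequence[int]) -> "NearestCentroidLearner":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, X: Sequence[Row]) -> List[int]:
        def dist2(a: Sequence[float], b: Sequence[float]) -> float:
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids, key=lambda c: dist2(row, self.centroids[c])) for row in X]


def accuracy(learner: Learner, X_train, y_train, X_eval, y_eval) -> float:
    """Train on one split and score on another; works for any Learner."""
    preds = learner.fit(X_train, y_train).predict(X_eval)
    return sum(p == t for p, t in zip(preds, y_eval)) / len(y_eval)


def sweep_threshold(candidates: Sequence[float], X, y) -> ThresholdLearner:
    """Grid sweep: keep the threshold with the best accuracy on (X, y).

    Scored on the training split here for brevity; a real sweep would
    score candidates on a separate validation split.
    """
    best = max(candidates, key=lambda t: accuracy(ThresholdLearner(threshold=t), X, y, X, y))
    return ThresholdLearner(threshold=best)


# Tiny made-up dataset with a single feature.
X_train = [(0.1,), (0.3,), (0.6,), (0.9,)]
y_train = [0, 0, 1, 1]
X_test = [(0.2,), (0.48,), (0.55,), (0.8,)]
y_test = [0, 0, 1, 1]

tuned = sweep_threshold([0.2, 0.5, 0.7], X_train, y_train)
# Side-by-side comparison: the loop body is identical for every learner.
for name, learner in [("threshold", tuned), ("centroid", NearestCentroidLearner())]:
    print(name, accuracy(learner, X_train, y_train, X_test, y_test))
# prints: threshold 1.0, then centroid 0.75
```

Because both learners satisfy the same interface, the comparison loop does not change when one is swapped for another. In this toy data the test point at 0.48 is what separates them: the tuned threshold classifies it correctly, while the nearest-centroid learner does not.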

This screenshot of a typical Azure ML Studio experiment showcases many of these points:

We are gratified to receive many positive comments from our customers about our ease of use. Here is one, from Yogesh Dandawate of Icertis Applied Cloud: “The standout benefit for us was to be able to quickly build and test predictive models and verify their results. There is no cognitive overhead to learn a new scripting or coding language.”

Models in Production

Data scientists want to see their models deployed and functional in the real world. A common frustration is how hard it is to put models into production; indeed, a large percentage of models never see real-world usage. Azure ML Studio makes it simple to deploy a model into production with a single click. The operationalized workflow, containing the data transformations and the model, is deployed as a web service supported by the fully managed, secure, reliable, and elastic Azure cloud infrastructure, which provides worldwide access.

The model you build can be called from any modern programming language used by the engineering team that consumes it. When you publish the model, Azure ML Studio provides sample code in C#, R or Python for consuming the published web service right away, from within an app or from a productivity tool such as Excel.

Azure ML also provides an operationalization layer for R code: you can turn your existing R code into a cloud-based model exposed through REST APIs. This is a critically important feature, given how large the R developer community is and that it has historically lacked such an easy way to operationalize R code. The blog post Running R in the Azure ML cloud on R-bloggers discusses how Azure ML enables easy deployment of R models.
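As a rough illustration of calling a published service over REST from Python, here is a minimal sketch using only the standard library. It assumes the JSON shape used by the generated sample code for classic request-response services (an "Inputs" object keyed by input name, each with "ColumnNames" and "Values", plus "GlobalParameters") and a Bearer API key header. The URL, API key, the input name "input1", and the column names are hypothetical placeholders; take the real values from your own service's API help page.

```python
import json
import urllib.request
from typing import Any, Dict, Sequence

# Hypothetical placeholders: copy the real values from your service's API help page.
SERVICE_URL = (
    "https://<region>.services.azureml.net/workspaces/<workspace-id>"
    "/services/<service-id>/execute?api-version=2.0&details=true"
)
API_KEY = "<your-api-key>"


def build_scoring_request(url: str, api_key: str, columns: Sequence[str],
                          rows: Sequence[Sequence[Any]]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for a request-response scoring call."""
    payload: Dict[str, Any] = {
        "Inputs": {
            # "input1" is assumed to be the name of the service's web input.
            "input1": {"ColumnNames": list(columns), "Values": [list(r) for r in rows]}
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
        method="POST",
    )


def score(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_scoring_request(SERVICE_URL, API_KEY, ["age", "income"], [[42, 55000]])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # result = score(req)  # uncomment once SERVICE_URL and API_KEY are real
```

Building the request separately from sending it keeps the payload easy to inspect and test offline; only `score` touches the network.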

Our team has much work ahead as we aim to make our tool even more widely accessible and productive for our users. For instance: 

  • We have heard from you that full REPL capability inside the Studio is desirable.

  • We are adding support to allow the dataset schema information to “flow” down the workflow, so that the column selector works even before you have run the experiment.

  • We are working on “composite modules” that will enable users to save common workflows as pre-fabricated compositions which they can reuse across many experiments.

  • Our design team is conducting user studies to create a continuous feedback loop, and we are combining those inputs with our service analytics so that our team knows exactly where the tool could do even better.

“The ease of implementation makes machine learning accessible to a larger number of investigators with various backgrounds, even non-data scientists,” says Bertrand Lasternas of Carnegie Mellon University. Hans Kristiansen of Capgemini agrees: “Azure ML offers a data science experience that is directly accessible to business analysts and domain experts, reducing complexity and broadening participation through better tooling.”

If you have not done so already, please go to www.azure.com/ml and start using Azure ML Studio for free. Be sure to check out our samples, create a new experiment and stand up your own ML web service, not in weeks or months as it used to take, but in an hour or less! Send us your feedback and thoughts.

We believe that, with the right future investments, Azure ML can truly help attract many more practitioners to the data science community, just as Visual Basic did for an earlier generation of software developers.

Debi, Jacob and Dmitry
Follow Debi on Twitter