This blog post is authored by Debi Mishra, Partner Engineering Manager in the Information Management and Machine Learning team at Microsoft.
The open source community practicing machine learning (ML) has grown significantly over the last several years with R and Python tools and packages especially gaining adoption among ML practitioners. Many powerful ML libraries have been developed in these languages resulting in a virtuous cycle with even more adopting these languages as a result. The popularity of R has a lot to do with CRAN while Python adoption has been significantly aided by the SciPy stack. In general though, these languages and associated tools and packages are a bit like islands – there is generally not much interoperability across them. The interoperability challenge is not just at language or script level. There are specialized objects for “dataset”, specialized interpretation of “columnar schema” and other key data science constructs in these environments. To truly enable the notion of “ambient intelligence in the cloud”, ML platforms need to allow developers and data scientists to mix and match languages and frameworks used to compose their solutions. Data science solutions frequently involve many stages of computation and data flow including data ingestion, transformation, and optimization and ML algorithms. Different languages, tools and packages may be optimal for different steps as they may fit the need of that particular stage better.
The Azure ML service is an extensible, cloud-based, multi-tenant service for authoring and executing data science workflows and putting such workflows into production. A unique capability of the Azure ML Studio toolset is the ability to perform functional composition and execute arbitrary workflows with data and compute. Such workflows can be operationalized as REST end-points on Azure. This enables a developer or data scientist to author their data and compute workflows using a simple “drag, drop and connect” paradigm, test these workflows and then stand them up as production web services with a single click.
A key part of the vision for the Azure ML service is the emphasis on the extensibility of the ML platform and its support for open source software such as R, Python and other similar environments. This way, the skills as well as the code and scripts that exist among current ML practitioners can be directly brought into and operationalized within the context of Azure ML in a friction-free manner. We built the foundations of the Azure ML platform with this tenet in mind.
R is the first such environment that we support, specifically in the following manner:
Data scientists can bring their existing assets in R and integrate them seamlessly into their Azure ML workflows.
Using Azure ML Studio, R scripts can be operationalized as scalable, low latency web services on Azure in a matter of minutes!
Data scientists have access to over 400 of the most popular CRAN packages, pre-installed. Additionally, they have access to optimized linear algebra kernels that are part of the Intel Math Kernel Library.
Data scientists can visualize their data using R plotting libraries such as ggplot2.
The platform and runtime environment automatically recognize and provide extensibility via high fidelity bi-directional dataframe and schema bridges, for interoperability.
Developers can access common ML algorithms from R and compose them with other algorithms provided by the Azure ML platform.
The pictures below show how one would use the “Execute R Module” in Azure ML to visualize a sample dataset, namely “Breast cancer data”.
It is gratifying to see how popular R has been with our first wave of users. Interestingly, the most common errors our users see happen to be syntax errors that they discover in their R scripts! Usage data shows R being used in about one quarter of all Azure ML modelling experiments. The R forecasting package is being used by some of Microsoft’s key customers as well as some of our own teams, internally.
You too can get started with R on Azure ML today. Meanwhile, our engineering teams are working hard to extend Azure ML with similar support for Python – for more information on that, just stay tuned to this blog.