Jupyter Notebooks with R in Azure ML Studio

This post is authored by Andrie de Vries, Senior Program Manager at Microsoft.

Introduction

Azure ML Studio is a powerful canvas for composing machine learning experiments and then operationalizing and consuming them. In addition to experiments, Azure ML Studio also hosts Jupyter notebooks, but until now the notebook kernels were restricted to Python 2 and Python 3. I am delighted to announce that an R kernel now joins the existing Python 2 and Python 3 kernels, so you can create Jupyter notebooks that run R.

Jupyter Notebooks provide a delightful interface for quickly running code, visualizing data, exploring insights, and trying out ideas.

Jupyter Notebooks run on any OS with a modern browser. At a high level, a notebook consists of two main types of "cells": markdown cells for documentation and executable code cells. After editing a cell, press Shift+Enter to run it.

This only scratches the surface of what you can do with Notebooks. For tutorials, both short and long, see the links at the bottom of this post.

A Standalone Data Playground

Simply click +New, get a blank notebook, enter some R code, and compute away.
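As a rough idea of what "compute away" might look like in a blank notebook, here is a minimal sketch that uses only base R and the built-in mtcars dataset. The choice of dataset and model is purely illustrative and not taken from the original screenshots.

```r
# Built-in dataset, so nothing needs to be uploaded or downloaded.
data(mtcars)

# Fit a simple linear model: fuel efficiency as a function of weight.
fit <- lm(mpg ~ wt, data = mtcars)

# Inspect the fitted coefficients and the share of variance explained.
print(coef(fit))
print(summary(fit)$r.squared)

# Plot the data with the fitted line; a notebook renders the plot inline.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
```

Each statement could also live in its own cell, so that you can rerun one step with Shift+Enter without repeating the others.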

Explore Your Azure ML Studio Datasets

Want to know more about a dataset? Simply select it, then choose to open it in a Notebook and explore away. Your dataset is automatically available as a data frame.
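Once the dataset is a data frame, the usual base R tools apply. The sketch below builds a small stand-in data frame so that it can run anywhere; in a real notebook the generated code supplies the data frame for you. The name `dat` matches the example at the end of this post, while the columns other than `Rating` and all of the values are made up for illustration.

```r
# Stand-in for the data frame that the generated notebook code provides.
# Column names (apart from Rating) and values are illustrative assumptions.
dat <- data.frame(
  UserId  = c(1, 1, 2, 3, 3),
  MovieId = c(10, 20, 10, 30, 20),
  Rating  = c(8, 6, 9, 7, 10)
)

# Structure: column names, types, and the first few values.
str(dat)

# Per-column summary statistics.
print(summary(dat))

# First rows of the data.
print(head(dat, 3))

# Frequency of each rating and the number of missing values per column.
rating_counts <- table(dat$Rating)
missing_per_column <- colSums(is.na(dat))
print(rating_counts)
print(missing_per_column)
```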

Explore Your Azure ML Studio Intermediate Datasets

Sometimes you need to inspect a dataset between steps of an experiment. There is now an easy way to do this. First, add a Convert to CSV node. Then right-click it and open it in a Notebook. Your data will be available as a data frame, as in the case above.

Author Code Snippets for "Execute R Script" Modules in Experiments

Currently you can add R and Python code modules to your experiments by editing them directly in the embedded editor. While the embedded editor is convenient for short snippets, it does not provide an execution environment. Instead, you can use Notebooks to author and debug your code and then paste it back into the experiment nodes. In the future we'll provide a way to insert the code directly into the script node in an experiment.
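One way to organize this workflow is to write the logic as a plain function, debug it in the notebook against a small test data frame, and only then paste it into the module. The sketch below is an assumption about how you might structure it; the function name `add_rating_band`, the `Rating` column, and the threshold are hypothetical. The `maml.mapInputPort()` and `maml.mapOutputPort()` calls are shown as comments because they are only available inside the Execute R Script module, not in the notebook.

```r
# Transformation intended for reuse inside an "Execute R Script" module.
# The column name `Rating` and the threshold are illustrative assumptions.
add_rating_band <- function(df, threshold = 7) {
  df$Band <- ifelse(df$Rating >= threshold, "high", "low")
  df
}

# Debug interactively in the notebook against a small test frame.
test_input <- data.frame(Rating = c(3, 7, 9))
result <- add_rating_band(test_input)
print(result)

# Inside the experiment, the same function would be wrapped like this:
# dataset1 <- maml.mapInputPort(1)        # read the module's first input port
# data.set <- add_rating_band(dataset1)   # apply the debugged function
# maml.mapOutputPort("data.set")          # send the result to the output port
```

Keeping the logic in a function means the code you tested in the notebook and the code you paste into the module are the same apart from the port wiring.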

Your notebooks are persisted in your workspace and can be used in subsequent sessions. You can see a list of your notebooks by clicking on the Notebook tab. Notebooks can be renamed, deleted, copied, etc. from either the Studio or from Jupyter directly and both environments will sync up.

Using the AzureML R Package

The notebook environment has several packages pre-installed. The list of installed packages closely mirrors the packages already available in Azure ML Studio.

In addition, the notebook environment has the AzureML R package loaded. This package makes it easy to:

  • Query your workspace and list available datasets, experiments and web services.
  • Download datasets and intermediate datasets from Azure ML Studio to your R environment.
  • Publish and consume web services.

Exploring your datasets and experiments from within the notebook (or any IDE for that matter) is easy.

You can also slice and dice a dataset and store the modified version back into Azure ML Studio. This and similar functionality is available via the AzureML R package.

Remember that the AzureML R package is pre-installed for you on the Jupyter Notebook service in Azure ML Studio.
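To give a flavor of the workspace operations described in this section, here is a hedged sketch built on the AzureML package functions `workspace()`, `datasets()`, `experiments()`, `download.datasets()` and `upload.dataset()`. The wrapper function `explore_workspace` and its argument names are hypothetical, and all credentials and dataset names are placeholders. Nothing contacts Azure until you call the function with real values.

```r
# Sketch only: needs the AzureML package, network access to your workspace,
# and real credentials. Defining the function does not contact Azure.
explore_workspace <- function(workspace_id, auth_token, dataset_name) {
  # Connect to the workspace.
  ws <- AzureML::workspace(id = workspace_id, auth = auth_token)

  # List what the workspace contains.
  available_datasets <- AzureML::datasets(ws)
  available_experiments <- AzureML::experiments(ws)

  # Download one dataset by name.
  dat <- AzureML::download.datasets(ws, name = dataset_name)

  list(
    workspace   = ws,
    datasets    = available_datasets,
    experiments = available_experiments,
    data        = dat
  )
}

# Hypothetical usage; replace every placeholder with your own values:
# res <- explore_workspace("<workspace id>", "<authorization token>",
#                          "<dataset name>")
# head(res$datasets)
# modified <- head(res$data, 100)
# AzureML::upload.dataset(modified, res$workspace, name = "<new dataset name>")
```

Inside an Azure ML Studio notebook the package is already loaded, so the `AzureML::` prefixes are optional there; they are included so that the sketch reads the same in any R session.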

Execution Environment

The Notebook environment currently supports R 3.1.1, matching the version of R running in Azure ML Studio.

Access to external internet sites is restricted. However, we have whitelisted a number of important URLs:

  • All CRAN mirrors are on the whitelist, so you should be able to install packages using your favorite CRAN mirror.
  • GitHub is also whitelisted, meaning you can use devtools::install_github() to install packages that are not on CRAN, or to get the development version of a package.
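The two installation routes above might look like the following sketch. The helper `ensure_package` is a hypothetical convenience function, the mirror URL is just one example of a CRAN mirror, and the GitHub repository is a placeholder rather than a recommendation.

```r
# Install a CRAN package only if it is not already available.
# The mirror URL is one example; substitute your preferred CRAN mirror.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # Report whether the package can now be loaded.
  requireNamespace(pkg, quietly = TRUE)
}

# `stats` ships with R, so this call does not trigger a download.
stats_ready <- ensure_package("stats")
print(stats_ready)

# For packages that are not on CRAN (requires devtools; the repository
# below is a placeholder):
# devtools::install_github("<user>/<repository>")
```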

If you are inactive for more than one hour, your Notebook Server will be reclaimed. Notebooks are checkpointed regularly and the latest saved version will appear in your Studio workspace. You can also click Save on the menu bar to save manually, or download the Notebook to your local machine.

Preview Limitations

The following limitations currently exist and will likely change in the future:

  • Network access is limited to Azure (with the exception of the whitelisted sites mentioned above). You can place your data in various stores in Azure and access them in Python (Azure SDK) or Azure ML Studio.
  • You cannot upload text files or create folders.

Roadmap

While there is a lot of functionality already available in this preview, we consider this a baby step. We have a lot of exciting plans for Notebook scenarios in the coming year. We have a close working relationship with the Jupyter team and will work with them to incorporate and roll out updated versions as soon as they're stable. Some of the ideas we're exploring include:

  • Deeper Azure ML Studio integration.
  • Better code completion (IntelliSense).
  • Integrated debugging.
  • Dashboarding.
  • Authoring experiments and publishing entirely from within Notebooks.
  • PowerShell integration.
  • Improved Notebook sharing support.
  • Git integration.
  • Publishing your own Notebooks in the Azure Gallery.

Help Us Improve Jupyter Notebooks on Azure

Want to make sure your idea is on the roadmap? Want to help us prioritize features? Please check out this one-minute survey and let us know what you think:
https://www.surveymonkey.com/s/JupyterOnAzureML

Conclusion

Jupyter is one of the most important innovations in the data science and technical computing space in recent years. You now have full access to its power from any OS and any modern browser, directly from inside Azure ML Studio. You can choose whichever canvas, experiments or notebooks, makes the most sense at a given moment. The two work hand in hand to ensure a productive and delightful experience for you.

Try it Out!

Notebooks are easy and fun to use – give them a try right now:

  • Go to https://studio.azureml.net and select "Get started".
  • Select Guest or, better yet, create an account and log in so your Notebooks persist.
  • Click the Datasets tab on the left, then "Samples" at the top, then "Movie Ratings".
  • At the bottom of the page, select "Open in Notebook", then R.
  • Note that the code for authentication and conversion to a data frame is set up for you.
  • Paste this code into a new cell:

options(repr.plot.width=6, repr.plot.height=3)

hist(dat$Rating, main = "Movie Ratings")

  • From the top menu, select "Cell / Run All".

You should see a summary of your data, plus a histogram of the ratings.

Credits

We'd like to thank the awesome Jupyter team, especially Fernando, Brian, Min and Kyle for their work on Jupyter and their continuous support.

Resources

The following links provide further information on Jupyter and Azure ML:

Using R in Azure ML:

Jupyter:

Azure Machine Learning Studio:

R Tools for Visual Studio:

Andrie
@RevoAndrie