Introducing Jupyter Notebooks in Azure ML Studio

Posted by Shahrokh Mortazavi, Partner Director of Program Management at Microsoft.

Azure ML Studio is a powerful canvas for the composition of machine learning experiments and their subsequent operationalization and consumption. Although the Studio provides an easy-to-use yet powerful drag-and-drop style of creating experiments, you sometimes need a good old “REPL” to have a tight loop where you enter some script code and get a response. I am delighted to announce that we’ve now integrated this functionality into ML Studio through Jupyter Notebooks:

Jupyter Notebooks provide a delightful interface for quickly running code, visualizing data, exploring insights, and trying out ideas:

Jupyter Notebooks run on any OS and modern browser. Notebooks, at a high level, consist of two main types of “cells” – markdown cells for documentation and executable code cells. After editing a cell, press Shift+Enter to run it:

Jupyter Notebooks also provide special commands (“magics”) that act as macros:

And also an escape character (“!”) to access the shell:
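Magics and the “!” escape are IPython syntax, so they only work inside a notebook cell. As a rough sketch of what two common ones do, the plain-Python snippet below times a statement with the standard `timeit` module (roughly what `%timeit` does) and runs a child process with `subprocess` (roughly what `!command` does). The statement being timed and the command being run are arbitrary examples, not something the Notebook service prescribes.

```python
import subprocess
import sys
import timeit

# In a notebook cell you would write:  %timeit sum(range(1000))
# Plain-Python approximation: run the statement 1000 times and report the mean.
runs = 1000
total_seconds = timeit.timeit("sum(range(1000))", number=runs)
print("mean time per run: %.2e s" % (total_seconds / runs))

# In a notebook cell you would write:  !python -c "print('hello from the shell')"
# Plain-Python approximation: start the child process and capture its output.
output = subprocess.check_output(
    [sys.executable, "-c", "print('hello from the shell')"]
)
shell_text = output.decode("utf-8").strip()
print(shell_text)
```

Inside a notebook the magic and escape forms are shorter and usually preferable; the plain-Python forms are handy when the same code later has to run outside Jupyter.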

This only scratches the surface of what you can do with Notebooks – for a few short and longer tutorials, see the links at the bottom of this post.

Integration with Studio

In this preview, the Notebook service supports several core scenarios:

A Standalone Data Playground

Simply click +New, get a blank notebook, enter some Python, perhaps import some data from Azure blob storage and compute away: 

Explore Your Azure ML Datasets

Want to know more about a dataset? Simply select it, then choose to open it in a Notebook and explore away. Your dataset is automatically available as a Pandas dataframe:
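Once the dataframe is in hand, ordinary pandas calls are enough for a first look. The sketch below uses a small stand-in dataframe so that it runs anywhere; the column names and values are made up for illustration, and in a real notebook `frame` would come from the code that Studio generates for you.

```python
import pandas as pd

# Stand-in for the dataframe that the generated notebook code provides.
# Column names and values here are hypothetical.
frame = pd.DataFrame({
    "UserId": [1, 1, 2, 3, 3],
    "MovieId": [10, 20, 10, 30, 20],
    "Rating": [4, 5, 3, 2, 5],
})

print(frame.shape)    # (rows, columns)
print(frame.dtypes)   # column types
print(frame.head(3))  # first three rows

# Mean rating per movie, highest first.
mean_by_movie = (
    frame.groupby("MovieId")["Rating"].mean().sort_values(ascending=False)
)
print(mean_by_movie)
```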

Inspect Intermediate Data in an Experiment

Sometimes you need to check out a dataset in between phases. There is now an easy way to do this: first, add a Convert to CSV node. Then right-click it and open it in a Notebook. Your data will be available as a Pandas dataframe, as in the case above:

Author Code Snippets for Python Modules in Experiments

Currently you can add R and Python code modules in your experiments by editing them directly in the embedded editor. While convenient for short snippets, it does not provide an execution environment. You can use Notebooks to author and debug your modules and then paste them back into the experiment nodes instead. In the future we’ll provide a way to insert the code directly into the script node in an experiment:
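One way to debug a module in a notebook before pasting it back is sketched below. It assumes the `azureml_main` entry-point convention of the Python script module (a function that receives up to two dataframes and returns a sequence of dataframes); that convention is not spelled out in this post, so check it against the module's own template. The filtering logic and column names are hypothetical.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    """Keep only rows with a rating of 4 or higher (illustrative logic)."""
    filtered = dataframe1[dataframe1["Rating"] >= 4].reset_index(drop=True)
    # The module is assumed to expect a sequence of dataframes as its result.
    return filtered,


# Exercise the function in the notebook with a small stand-in dataframe.
sample = pd.DataFrame({"MovieId": [10, 20, 30], "Rating": [3, 5, 4]})
(result,) = azureml_main(sample)
print(result)
```

Once the function behaves as expected on sample data, only the function definition needs to be pasted into the experiment node; the test lines at the bottom stay in the notebook.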

Your notebooks are persisted in your workspace and can be used in subsequent sessions. You can see a list of your notebooks by clicking on the Notebook tab. Notebooks can be renamed, deleted, copied, etc. from either the Studio or from Jupyter directly and both environments will sync up.

Azure ML Client SDK

Enumerating and exploring your datasets and experiments from within the notebook (or any IDE for that matter) is pretty easy: 

You can actually slice, dice, and store the modified dataset back into Azure ML. This and similar functionality is available via the recently enhanced Azure ML Client SDK. 
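A minimal sketch of that round trip follows. It assumes the `azureml` client package exposes `Workspace`, `datasets`, `to_dataframe()`, `add_from_dataframe()` and `DataTypeIds` as its README describes; verify the names against the version installed in your notebook. The function name, its parameters and the dataset names are hypothetical, and nothing contacts Azure ML until you call the function with real credentials.

```python
def save_cleaned_copy(workspace_id, authorization_token,
                      source_name, target_name):
    """Load a dataset, drop rows with missing values, and save the result."""
    # Imported here so that defining the function does not require the SDK.
    from azureml import DataTypeIds, Workspace

    ws = Workspace(workspace_id=workspace_id,
                   authorization_token=authorization_token)

    # Enumerate the datasets in the workspace.
    for ds in ws.datasets:
        print(ds.name)

    # Slice and dice with pandas.
    frame = ws.datasets[source_name].to_dataframe()
    modified = frame.dropna()

    # Store the modified data back into the workspace as a CSV dataset.
    return ws.datasets.add_from_dataframe(
        dataframe=modified,
        data_type_id=DataTypeIds.GenericCSV,
        name=target_name,
        description="Rows of %s with missing values dropped" % source_name,
    )
```

When you open a dataset in a Notebook from Studio, the workspace id and authorization token are already filled in by the generated code, so you would normally reuse that workspace object rather than typing credentials yourself.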

Additionally, you can use the Python Azure SDK to access a wide variety of services in Azure. These include operations such as Storage, Service Management, etc.:

Note: Both the Azure SDK and the Azure ML Client SDK are preinstalled for you.

Execution Environment

The Notebook environment currently supports Python 2 and Python 3. We will be adding full R support in the near future. When you start up a Notebook, you have the full Anaconda 64-bit distribution available to you. The full list of packages can be found here. The most relevant ones are: numpy/scipy, pandas, matplotlib, scikit-learn. For the curious, the Notebook service runs on Ubuntu 14.04.02 under Docker. Shell commands are available via the “!” escape character.
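To see which versions of those packages a given notebook session actually has, a short check like the following can help. It is only a sketch that uses the standard `importlib` module; the list of import names is taken from the packages mentioned above (scikit-learn is imported as `sklearn`), and the output depends on your environment.

```python
import importlib

# Import names for the packages listed above (scikit-learn imports as "sklearn").
package_names = ["numpy", "scipy", "pandas", "matplotlib", "sklearn"]

versions = {}
for name in package_names:
    try:
        module = importlib.import_module(name)
        versions[name] = getattr(module, "__version__", "unknown")
    except ImportError:
        versions[name] = None  # not installed in this environment

for name in package_names:
    print("%-12s %s" % (name, versions[name] or "not installed"))
```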

If you are inactive for more than one hour, your Notebook Server will be reclaimed. Notebooks are check-pointed regularly and the latest saved version will appear in your Studio workspace. You can also manually click Save on the menu bar as well as download the Notebook to your local machine. 

Preview Limitations

The following limitations currently exist and will likely change in the future:

  • Network access is limited to Azure. You can place your data in various stores in Azure and access them in Python (Azure SDK) or Azure ML Studio.

  • While the notebooks support Python 2 and Python 3, operationalization (web service) only supports Python 2.

  • Some of the Azure ML algorithms are not yet available from within Notebooks (use scikit-learn, pybrain, statsmodels, etc.).

  • You can’t upload text files or create folders or terminals.

Roadmap

While there is a lot of functionality already available in this preview, we consider this a baby step. We have a lot of exciting plans for Notebook scenarios in the coming year. We have a close working relationship with the Jupyter team and will work with them to incorporate and roll out updated versions as soon as they’re stable. Some of the ideas we’re exploring include:

  • Full R support (RRE, RRO)

  • Deeper Azure ML Studio integration

  • Deeper intellisense

  • Integrated debugging

  • Dashboarding

  • Authoring experiments and publishing entirely from within Notebooks

  • PowerShell integration

  • Improved Notebook sharing support

  • Git integration

  • Publishing your sample Notebooks in the Marketplace

Help Us Improve Jupyter Notebooks on Azure

Want to make sure your idea is on the roadmap? Want to help us prioritize features? Please check out this one-minute survey and let us know what you think:

https://www.surveymonkey.com/s/JupyterOnAzureML

Conclusion

Jupyter is one of the most important innovations in the data science and technical computing space in recent years. You now have full access to its power from any OS and any modern browser, directly from inside Azure ML Studio. You can choose whichever canvas makes the most sense at that particular moment. The two work hand in hand to ensure a productive and delightful experience for you.

Try it Out!

Notebooks are easy and fun to use – give them a try right now:

  • Go to http://studio.azureml.net and select “Get started”

  • Select Guest or, better yet, create an account and log in so your Notebooks persist

  • Click the Datasets tab on the left, then “Samples” on top, then “Movie Ratings”

  • At the bottom of the page, select “Open in Notebook”, Python 2

  • Note that your auth and conversion to dataframe code is set up for you

  • Paste this code into a new cell:

frame.describe()

frame['Rating'].head(200).plot(figsize=(12,4))

  • From the top Menu select “Run All”

You should see a summary of your data, plus a plot of the first 200 ratings:

Credits

We’d like to thank the awesome Jupyter team, especially Fernando, Brian, Min and Kyle for their work on Jupyter and their continuous support. 

Resources

The following links provide further information on Jupyter and Azure ML:

Jupyter

Azure ML

Python Tools for Visual Studio

Anaconda and key packages

 

Shahrokh