Azure ML: Now With Even More Python!

This post is authored by Shahrokh Mortazavi, Partner Director of Program Management, Microsoft Azure Machine Learning

Hello again, Python enthusiasts! In a previous post I discussed how Python Tools for Visual Studio (PTVS) can be used as a powerful data science workbench. I'm very excited to talk about two important new Azure ML features for Python users:

Azure ML Studio Now Supports Python

As you know, the Studio already supported running R scripts. You now have the same capability with Python, backed by its rich ecosystem of libraries. Simply type or paste in your Python script and it will run under CPython 2.7 (64-bit) with access to the Anaconda distribution.
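For reference, the Execute Python Script module calls a function named `azureml_main`, passing its input ports in as pandas DataFrames and expecting a sequence of DataFrames back. The sketch below follows that convention; the derived `petal_area` column and the `petal_length`/`petal_width` column names are hypothetical, chosen only to illustrate a simple transformation on Iris-like data.

```python
# Entry point for the Azure ML Studio "Execute Python Script" module.
# Studio passes up to two pandas DataFrames (one per input port) and
# expects a sequence of DataFrames in return. Written to run under both
# Python 2.7 and Python 3.
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Work on a copy so the input frame is left untouched.
    frame = dataframe1.copy()
    # Hypothetical derived feature; the column names are assumptions
    # about how the Iris data is laid out.
    frame["petal_area"] = frame["petal_length"] * frame["petal_width"]
    # The trailing comma makes this a one-element tuple, i.e. a sequence
    # holding a single output DataFrame.
    return frame,


if __name__ == "__main__":
    # Local smoke test with a tiny hand-made frame.
    sample = pd.DataFrame({"petal_length": [1.4, 4.7], "petal_width": [0.2, 1.4]})
    (result,) = azureml_main(sample)
    print(result)
```

Because the function only receives and returns DataFrames, the same code can be exercised locally (as in the `__main__` block) before it is pasted into the Studio.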

Azure ML Python SDK

This SDK provides programmatic access to your experiments and datasets in Azure ML. Until now these were available only through the Studio; with the SDK you can access, manipulate and upload them from your own code.
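A minimal sketch of what SDK access might look like, assuming the classic `azureml` client library and the calls we recall from its documentation (`Workspace(workspace_id=..., authorization_token=...)`, `ws.datasets[name].to_dataframe()`, `ws.datasets.add_from_dataframe(...)` and `DataTypeIds.GenericCSV`); please verify them against the official client library docs. The helper names (`connect`, `list_dataset_names`, `download_dataframe`, `upload_dataframe`) are hypothetical.

```python
# Thin helpers around the classic Azure ML Python client library
# (package name `azureml`). The workspace object is passed in, so the
# helpers work with anything exposing the same `datasets` interface.


def connect(workspace_id, authorization_token):
    """Open a workspace; requires the `azureml` package and valid credentials."""
    from azureml import Workspace  # imported lazily on purpose
    return Workspace(workspace_id=workspace_id,
                     authorization_token=authorization_token)


def list_dataset_names(workspace):
    """Enumerate the names of the datasets visible in the workspace."""
    return [dataset.name for dataset in workspace.datasets]


def download_dataframe(workspace, dataset_name):
    """Fetch a dataset by name and convert it to a pandas DataFrame."""
    return workspace.datasets[dataset_name].to_dataframe()


def upload_dataframe(workspace, frame, name, description):
    """Upload a pandas DataFrame as a new CSV dataset (assumed API)."""
    from azureml import DataTypeIds
    return workspace.datasets.add_from_dataframe(
        dataframe=frame,
        data_type_id=DataTypeIds.GenericCSV,
        name=name,
        description=description,
    )
```

Passing the workspace in, rather than creating it inside each helper, keeps the credentials in one place (the snippet Studio generates for you) and lets the helpers be tried against a stub object without a live workspace.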

Let's look at a simple scenario where you author some Python code, debug it, use it as a script in the Studio, and use IPython to visualize some intermediate data.

We'll be using the Iris dataset that's already available in Azure ML.

Next, I click “Generate Data Access Code” to get a Python snippet that gives secure access to my experiments and data.

Here I have PTVS open with the access code pasted in and the debugger stopped at a breakpoint, so I can inspect the data. Note that you can use any Python IDE or environment of your choice, including IPython.

Here you can add whatever scikit-learn data processing or modeling code you need. With the code verified, let's Alt-Tab to the Studio and run it on Azure ML.
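As an illustration of the kind of scikit-learn code you might debug at this point, here is a small sketch that fits a logistic regression to a DataFrame. The function name `train_classifier` and the choice of `LogisticRegression` are our own assumptions, not what the original screenshots showed.

```python
# A minimal scikit-learn sketch to debug locally before pasting into Studio.
# Assumes the frame holds numeric feature columns plus one label column.
from sklearn.linear_model import LogisticRegression


def train_classifier(frame, label_column):
    """Fit a logistic regression and return (model, training accuracy)."""
    features = frame.drop(label_column, axis=1)
    labels = frame[label_column]
    model = LogisticRegression(max_iter=1000)
    model.fit(features, labels)
    # Training accuracy only: a quick sanity check while debugging,
    # not an estimate of how well the model generalizes.
    return model, model.score(features, labels)
```

Setting a breakpoint inside `train_classifier` in PTVS lets you inspect `features` and `labels` before the fit, which is exactly the kind of check that is awkward to do once the script runs inside the Studio.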

I've created a simple experiment that pulls in the Iris data for use with my debugged Python script.

The “Execute Python Script” node is where I’ve added my Python code (just as you’ve done with R before). I’ve also added a “Convert to CSV” node so its output can be read by the Python SDK and converted into a pandas DataFrame.

Now I'd like to look at my data while it's in flight between the DAG nodes, to do some data debugging. I'll right-click to get the data access code again. For this exercise, I'll quickly fire up IPython and use Bokeh to visualize the data. Note that your IPython instance could be anywhere: local, in the cloud, console or notebook.
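A sketch of the in-flight inspection step, assuming the classic `azureml` client library exposes `ws.experiments[experiment_id].get_intermediate_dataset(node_id=..., port_name=..., data_type_id=...)` as we recall from its documentation. The port name `"Results dataset"` is an assumption (check the output port label in the Studio), and the helper names are hypothetical. In place of the Bokeh plot, this sketch uses a plain pandas per-class summary as the "data debugging" step.

```python
# Pull the output of one experiment node and summarize it before plotting.
# The download step requires the classic `azureml` package; the summary
# works on any pandas DataFrame.


def fetch_intermediate_frame(workspace, experiment_id, node_id):
    """Download a node's CSV output (e.g. from "Convert to CSV") as a DataFrame."""
    from azureml import DataTypeIds  # imported lazily on purpose
    experiment = workspace.experiments[experiment_id]
    dataset = experiment.get_intermediate_dataset(
        node_id=node_id,
        port_name="Results dataset",  # assumed port label
        data_type_id=DataTypeIds.GenericCSV,
    )
    return dataset.to_dataframe()


def summarize_by_class(frame, label_column):
    """Per-class means of the numeric columns plus a row count per class."""
    numeric = frame.drop(label_column, axis=1).select_dtypes(include=["number"])
    summary = numeric.groupby(frame[label_column]).mean()
    summary["count"] = frame.groupby(label_column).size()
    return summary
```

A quick table like this makes it easy to spot an empty class or a column that came through as text before handing the frame to Bokeh for plotting.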

Conclusion

Azure ML now does Python! This is a major step forward: the two main languages used in data science, R and Python, are now fully supported. You can use scikit-learn, PyBrain, statsmodels, pandas, Bokeh and more for a variety of data science tasks in an easy-to-use language. Additionally, you can use the SDK to download, upload or enumerate your datasets and experiments, which lets you manipulate and visualize them programmatically or interactively from PTVS, IPython, or any other environment.

For further documentation on using Python in Azure ML, please see the resources below:

Official docs for the Azure ML Python Client Library

Official docs for the Execute Python Script module

Shahrokh