Ten Things You Can Do on the Microsoft Data Science Virtual Machine

Posted by Gopi Kumar, Senior Program Manager in the Microsoft Data Group.

In November last year, we announced the availability of the Microsoft Data Science Virtual Machine (DSVM), an operating system image we published in the Azure Marketplace with a host of popular data science tools pre-installed and pre-configured. In January this year, we updated the image to include the Microsoft R Server, an enterprise-class analytics platform based on the R language, and added support for Jupyter notebooks for browser-based data exploration in both R and Python.

The Microsoft DSVM offers a powerful development environment for all your data analytics and modeling tasks. The DSVM makes it easy to get started quickly with your data science projects for cloud, on-premises, or hybrid deployments. The DSVM can read data from, and write data to, various Azure data and analytics technologies like Azure SQL Data Warehouse, Azure Data Lake, HDInsight, Blob Storage, DocumentDB and Azure Machine Learning.

The DSVM will save you time that would otherwise be spent discovering, installing, configuring and maintaining the right software for your data science tasks. It uses the power of Azure to dynamically scale your environment on demand, and you only pay for what you use. You have full administrative control over the VM and can extend it with other tools to suit your needs.

Since its launch, a number of users both outside and within Microsoft have had an opportunity to use the DSVM. We’ve heard lots of good ideas from people who have used the virtual machine for their data science projects and to run labs for their data analytics training classes. We have also heard feedback that it wasn’t always clear which tools were already on the VM, or how other services on Azure could be accessed from the DSVM. Based on this, we put together an article titled “Ten things you can do on the Data Science Virtual Machine” describing how to use the DSVM to perform typical tasks you may encounter as part of your analytics projects. Here’s a recap of the ten things you can do with the DSVM, in no particular order:

  • Explore data and develop models locally on the DSVM using Microsoft R Server or Python.
  • Use a Jupyter notebook to experiment with your data in a browser using Python 2, Python 3 or R.
  • Operationalize models built using R and Python on Azure ML, so client applications can access your models using a simple web services interface.
  • Administer your Azure resources using Azure Portal or PowerShell.
  • Extend your storage space and share large-scale datasets and code across your whole team by creating an Azure File Storage share and mounting it as a drive on your DSVM.
  • Share code with your team using GitHub and access your repository using the pre-installed Git clients – Git Bash, Git GUI or Visual Studio Community Edition.
  • Access various Azure data and analytics services like Azure Blob Storage, Azure Data Lake, Azure HDInsight (Hadoop), Azure DocumentDB, Azure SQL Data Warehouse and databases.
  • Build reports and dashboards using the Power BI Desktop pre-installed on the DSVM and deploy them on the cloud.
  • Dynamically scale your DSVM to meet your project needs.
  • Install additional tools on your virtual machine.
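
As a rough illustration of the first item in the list above (exploring data and developing a model locally in Python), here is a minimal sketch. It is not taken from the article: it uses only the Python standard library, and the dataset, the column names hours and score, and the helper functions load_columns and fit_line are all hypothetical. On the DSVM you would more likely load your own data and use the data science tools already installed on the image.

```python
import csv
import io
import statistics

# Hypothetical sample data, inlined so the sketch is self-contained.
RAW_CSV = """hours,score
1,52
2,55
3,61
4,64
5,70
"""


def load_columns(text):
    """Parse CSV text into two lists of floats: hours and scores."""
    reader = csv.DictReader(io.StringIO(text))
    hours, scores = [], []
    for row in reader:
        hours.append(float(row["hours"]))
        scores.append(float(row["score"]))
    return hours, scores


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Explore: basic summary statistics for the target column.
hours, scores = load_columns(RAW_CSV)
print("mean score:", statistics.mean(scores))
print("score std dev:", round(statistics.stdev(scores), 3))

# Model: fit a straight line and predict for a new input.
slope, intercept = fit_line(hours, scores)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("predicted score for 6 hours:", round(slope * 6 + intercept, 3))
```

The same steps (load, summarize, fit, predict) can be run cell by cell in a Jupyter notebook on the VM.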

We encourage you to check out the article for more detailed information on each of these tasks. These are just a few things you can do on the Microsoft Data Science Virtual Machine. We would love to know how you are using the DSVM and whether it is helping you become more productive with your data science projects. We would also welcome your suggestions for how we can improve your experience – you can comment either on this post or on our article.

Gopi