New Additions to the Data Science Virtual Machine – Test Drive, Community Forums, Deep Learning

This post is authored by Paul Shealy, Senior Software Engineer, and Barnam Bora, Program Manager, at Microsoft.

The Data Science Virtual Machine (DSVM) is a custom virtual machine image from Microsoft that comes pre-installed with popular data science tools for modeling and development. The DSVM is offered in both Windows and Linux editions. The data analytics community worldwide has responded enthusiastically to this offering, and we continue to iterate on and improve the experience. This post describes a few key updates to the DSVM. These features make it easier to try the DSVM for free before adopting it, to get community-based support from users within and outside Microsoft, and to run deep learning tools on Azure GPUs.

Free Test Drive

You can now try the Linux Data Science Virtual Machine for free on the Azure Test Drive. Test Drive is a frictionless way to try the VM before deciding to adopt it for your data science workloads. You can launch a VM instance with just a few clicks and explore it fully – no credit cards or Azure subscriptions needed. A test drive lasts eight hours, enough time for you to try several sample solutions or analyze your own dataset.

Once your trial starts, here are a few things you can do:

  • Use JupyterHub to view the sample Jupyter notebooks in /dsvm/Notebooks.
  • Run the CNTK (now renamed the Microsoft Cognitive Toolkit) examples in /dsvm/tools/cntk/Examples.
  • Try the “winningest” ML tool in data science competitions, XGBoost, in /dsvm/tools/xgboost.
  • Bring your own data and analyze it with R, Python, Rattle, CNTK, Vowpal Wabbit, XGBoost, or one of many other tools.
  • Access other Azure resources using the AzureML package for R, the azureml library for Python, the Azure CLI, or Azure Storage Explorer.

When your Test Drive starts, you will receive instructions for connecting to your DSVM, along with more ideas for things to try.

After your Test Drive is over, you can easily deploy a paid Linux DSVM from the Azure Marketplace.


If you copied any data to your Test Drive VM, be sure to copy it off before your trial ends.

Community Forum

The DSVM team is always looking for better ways to communicate with customers, help with issues, and listen to your feedback. That's why we recently launched a DSVM community forum as part of the Microsoft forums. It gives users a place to ask questions, get assistance, and provide feedback, and it covers both the Windows and Linux DSVMs. You can search previous questions, vote for items that affect you, and submit requests for new features.

A growing number of organizations around the world, and people with diverse skill sets and requirements, have started using the DSVM. The range of problems people are solving with this toolset is expanding quickly. We intend to use this forum to grow the community, promote healthy conversations, and drive community-based problem solving and knowledge sharing.

We encourage you to visit the forum and get your question answered today.

Deep Learning Toolkit on the DSVM

Deep learning is behind many recent breakthroughs in machine learning applications. One example is language translation in Skype, which Popular Science recently named one of the seven greatest software innovations of the year. Another is speech recognition, where Microsoft recently reached human parity in recognizing conversational speech.


Azure, Microsoft's hyperscale cloud computing platform, recently announced the availability of virtual machines with GPUs. These VMs combine powerful hardware (NVIDIA Tesla K80 or M60 GPUs) with discrete device assignment, opening up new options for training deep neural networks.

The deep learning toolkit for the DSVM is a solution for the Windows DSVM. It installs GPU-accelerated deep learning tools along with CUDA, cuDNN, the GPU driver, and several samples. By following the same steps you would use to create any new VM, you can have a DSVM ready for deep learning on Azure GPUs.

The GPU-accelerated deep learning tools available with the VM include MXNet and CNTK, with more on the way.

A few samples are also installed in the C:\dsvm\deep-learning folder, including:

  • Character recognition on the MNIST dataset.
  • Image classification on the CIFAR-10 dataset.
  • Neural artistic style, a way to extract the style of an image and apply it to a new image.

These samples illustrate the elements involved in building a deep learning solution and show what the GPUs can do. Use them to explore what's possible or as a starting point for your own projects.

Summary

The Data Science Virtual Machine gives you a comprehensive set of tools for a wide range of data science activities in both Linux and Windows environments, using multiple languages. These activities include data movement, storage, exploration and visualization, modeling with ML and AI algorithms, and operationalization. Now you can also test drive the VM, engage with other users on the forum, and explore deep learning. A comprehensive list of the tools on the DSVM is available in the DSVM documentation.

Much more information is available in the resources listed below. Go ahead and try the DSVM for your next data science project or training session. We'd love to hear your feedback on the DSVM community forum as we continue to improve your experience.

Paul & Barnam


Resources – Windows Edition

Resources – Linux Edition

Webinar