Announcing the Availability of the Linux Data Science Virtual Machine

This post is by Gopi Kumar, Senior Program Manager in the Microsoft Data Group.

As a follow-up to the launch of the Windows Data Science Virtual Machine, we’re thrilled to announce the immediate availability of the Linux Data Science Virtual Machine on the Azure Marketplace.

This is a custom VM image built on OpenLogic’s CentOS-based Linux version 7.2. It contains several popular tools used by data scientists, developers, educators and researchers. Common uses for the Data Science Virtual Machine include performing advanced analytics to gain insights from data, building intelligent applications, running data science education and training classes, and running hackathons and competitions.

Linux DSVM

Thanks to Azure’s worldwide cloud infrastructure, customers now have on-demand access to a Linux environment to perform a wide range of data science tasks. The VM saves customers the time and effort of having to discover, install, configure and manage these tools individually. Hosting the data science VM on Azure ensures high availability, elastic capacity and a consistent set of tools to foster collaboration across your team.

In addition to the standard Linux utilities and the shell, some of the salient tools that are pre-installed and pre-configured on the Linux Data Science Virtual Machine include:

  • Microsoft R Open (with Intel Math Kernel Library).
  • Anaconda Python Distribution with Python 2.7 and 3.5.
  • Jupyter Notebooks with Python and R kernels for browser-based data exploration and development.
  • Azure tools: Azure Command Line Interface for managing Azure resources, Azure Storage Explorer for working with Azure Blobs.
  • A local Postgres database instance.
  • Machine Learning Tools:
    • Azure ML: Deploy R and Python models built locally on the VM to our cloud-based Azure ML service through pre-installed libraries.
    • Computational Network Toolkit (CNTK): A deep learning toolkit from Microsoft Research.
    • Vowpal Wabbit: An ML system supporting techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
    • XGBoost: A fast and accurate boosted tree implementation.
    • Rattle (the R Analytical Tool To Learn Easily): A GUI tool that makes it very easy to get started with data analytics in R, with graphical data exploration, ML models and R code generation.
  • Development Tools: Azure SDK in Java, Python, Node.js, Ruby, PHP; Eclipse IDE with Azure Toolkit plugin; code editors like vim, gedit and Emacs (with ESS, auctex add-ons); SQL Server drivers and command line tools like bcp (Bulk Copy) and sqlcmd (a text-based SQL Server query utility); SQuirreL SQL graphical client to access various databases.
  • Remote access through a text interface using an SSH client (like PuTTY or the ssh command), or through a graphical desktop (needs a separate one-time install of X2Go on your client machine).

In about 15 minutes you can stand up your own data science VM within your subscription and be ready to jump right into data exploration and modeling. You have full administrative access to the VM and can install additional software as needed. There’s no separate fee to use the VM image. You pay only for the hardware compute usage of the virtual machine, which depends on the size of the VM you provision. You can turn off the VM from the Azure portal when it’s not in use to avoid being billed for usage. When you restart the VM, you can continue your work with all data and files intact. You can further augment your analytics on the data science virtual machine by leveraging services in Microsoft Azure and Cortana Intelligence Suite.
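To make the pay-for-usage point concrete, here is a minimal back-of-the-envelope sketch in Python. The hourly rate below is a made-up placeholder and not an actual Azure price, and the function name is hypothetical. The sketch models compute charges only and simply compares a VM left running around the clock with one that is turned off outside working hours.

```python
def estimate_compute_cost(hourly_rate, hours_per_day, days):
    """Estimate the compute charge for a VM that runs only part of each day."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    return round(hourly_rate * hours_per_day * days, 2)


# Hypothetical rate for illustration only; look up the real price for your VM size.
HYPOTHETICAL_RATE = 0.50  # currency units per hour of compute

always_on = estimate_compute_cost(HYPOTHETICAL_RATE, hours_per_day=24, days=30)
work_hours = estimate_compute_cost(HYPOTHETICAL_RATE, hours_per_day=8, days=30)

print("Running 24 hours a day:", always_on)
print("Running 8 hours a day: ", work_hours)
```

Under these assumed numbers, turning the VM off outside an eight-hour working day cuts the compute charge to a third. Substitute the actual rate for the VM size you provision to get a realistic estimate.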

The Linux Data Science Virtual Machine provides you with a very productive Linux analytics environment where you can rapidly build advanced analytics solutions for deployment to the cloud, on-premises, or in a hybrid environment.

You can find the Linux Data Science Virtual Machine and Azure hardware compute pricing here. More information about this VM can be found by visiting this link. If you’re new to Azure, you can try the VM for free via a 30-day Azure free trial.

So go ahead and use the Linux Data Science Virtual Machine for your next analytics project, or in your next data science training session. We would appreciate your feedback on the VM so we can continue to serve your needs even better.
