Announcing Microsoft Machine Learning Library for Apache Spark

This post is authored by Roope Astala, Senior Program Manager, and Sudarshan Raghunathan, Principal Software Engineering Manager, at Microsoft. We’re excited to announce the Microsoft Machine Learning library for Apache Spark – a library designed to make data scientists more productive on Spark, increase the rate of experimentation, and leverage cutting-edge machine learning techniques –… Read more

End-to-End Scenarios Enabled by the Data Science Virtual Machine: Webinar Video

This post is authored by Barnam Bora, Program Manager in the Algorithms & Data Science team at Microsoft. Microsoft’s Data Science Virtual Machine (DSVM) is a family of popular VM images in Windows Server & Linux flavors that are published on the Microsoft Azure Marketplace. They have a curated but broad set of pre-configured machine… Read more

Using Microsoft’s Deep Learning Toolkit with Spark on Azure HDInsight Clusters

This post is authored by Miruna Oprescu, Software Engineer, and Mary Wahl, Data Scientist, at Microsoft. Have you ever wondered what it would be like to combine the power of deep learning with the scalability of distributed computing? Say no more! We present a solution that uses leading-edge technologies to score images using a pre-trained… Read more

Embarrassingly Parallel Image Classification, Using Cognitive Toolkit and TensorFlow on Azure HDInsight Spark

This post is by Mary Wahl, Data Scientist, T.J. Hazen, Principal Data Scientist Manager, Miruna Oprescu, Software Engineer, and Sudarshan Raghunathan, Principal Software Engineering Manager, at Microsoft. Summary: Deep neural networks (DNNs) are extraordinarily versatile and increasingly popular machine learning models that require significantly more time and computational resources for execution than traditional approaches. By… Read more

End-to-End Data Science Walkthrough with Spark 2.0 on Azure HDInsight Hadoop Clusters

This post is authored by Debraj GuhaThakurta, Senior Data Scientist, and Brad Severtson, Senior Content Developer, at Microsoft. The data scientists among you will have seen how Spark 2.0, which was released in July 2016, offered several enhancements over Spark 1.6. These enhancements included: Easier ANSI SQL and more streamlined APIs. Improvements in the speeds of… Read more

Build & Deploy Machine Learning Apps on Big Data Platforms with Microsoft Linux Data Science Virtual Machine

This post is authored by Gopi Kumar, Principal Program Manager in the Data Group at Microsoft. This post covers our latest additions to the Microsoft Linux Data Science Virtual Machine (DSVM), a custom VM image on Azure, purpose-built for data science, deep learning and analytics. Offered in both Microsoft Windows and Linux editions, DSVM includes… Read more

Moving eBird to the Azure Cloud

Re-posted from the Azure Data Lake & HDInsight blog. Hosted by the Cornell Lab of Ornithology, eBird is a citizen science project that allows birders to submit observations to a central database. Birders seek to identify and record the birds that they discover, and can also report how much effort it took to find those… Read more

Introducing Microsoft R Server 9.0

This post is authored by Nagesh Pabbisetty, Partner Director of Program Management at Microsoft. To thrive in today’s data-driven world, businesses increasingly need more powerful analytics solutions to predict customer behavior and discover new opportunities. However, existing solutions often fail to deliver enough insights, fast enough. At Microsoft, we continue to invest deeply in advanced… Read more

Free Online Workshop on Cortana Intelligence Suite: Register Now!

Get Live, Step-by-Step Guidance from Microsoft Experts. This post is authored by Matthew Calder, Senior Content Developer at Microsoft. Join us on Microsoft Virtual Academy on Tuesday, December 6th, 2016, from 9AM – 4PM Pacific, for an exciting look at the Cortana Intelligence Suite (CIS), and end your day with a fully working intelligent web… Read more

Data Manipulation at Scale with Microsoft R Server & Spark on Azure HDInsight

Re-posted from the Revolutions blog. Dealing with distributed data and having to program concurrent systems is not always the easiest of tasks, and data scientists familiar with R are unlikely to have extensive experience with such systems. In such scenarios, Spark offers a very popular, intuitive distributed data processing platform, with R and Python APIs… Read more