Celebrating with Hadoop and Red Hat communities

This post was authored by Joseph Sirosh, Corporate Vice President of the Data Group at Microsoft.

This week I had the opportunity to represent Microsoft in keynotes at both the Hadoop Summit in San Jose and the Red Hat Summit in San Francisco, and to speak with many customers and members of the open source and big data communities. The energy in these communities is incredible, and I’m proud to see Microsoft engage as an active participant and contributor. Below is a quick summary of recent news related to both events and communities.

Hadoop Summit

This year we celebrated 10 years of Hadoop, and along with the rest of the community we’re working hard to make big data easier for customers, no matter the type of data, what they need to do with it, or what platform they’re on. At Hadoop Summit, Hortonworks also announced Microsoft Azure HDInsight as its Premier Connected Data Platforms solution for delivering Apache Hadoop in a cloud environment. This announcement is the culmination of a long-standing partnership with Hortonworks that started in 2011, when Hortonworks was three months old. It’s been rewarding to see how far Hadoop has come; it’s now deployed in thousands of organizations. As an example, Jet.com is using Hadoop with HDInsight to help redefine the e-commerce category by providing consumers with completely transparent pricing that changes dynamically based on the actual costs of the transaction: warehouse location, payment method, and number of items shipped.

Looking back at our journey with Hadoop, it’s also been gratifying to see our contributions accelerate its adoption. Members of Microsoft have been contributing to the development of Apache YARN since its inception. We’ve also been leading or contributing to projects such as bringing Hadoop onto Azure and Windows, speeding up query processing in Hive, making cloud-based stores accessible via WebHDFS, and making Spark execution available through a REST endpoint. Recently, we also announced our commitment to Apache Spark at Spark Summit 2016, including:

  • Spark for Azure HDInsight, now generally available: a fully managed Spark service from Hortonworks that is enterprise-ready and easy to use.
  • R Server for HDInsight in the cloud, powered by Spark: in preview today and generally available later this summer, it makes Spark integration easy whether you are working on-premises or in the cloud.
  • R Server for Hadoop on-premises, now powered by Spark: the leading solution in the world for running R at scale, R Server for Hadoop now supports both the Microsoft R and native Spark execution frameworks, made available this week. Combining R Server with Spark lets you run R functions over thousands of Spark nodes and train your models on data 1000x larger and 100x faster than was possible with open source R, and nearly 2x faster than Spark’s own MLlib.
  • Free R Client for data scientists: a new free tool for building high-performance analytics using R.
  • Power BI support for Spark Streaming, now generally available: Spark support in Power BI now allows you to publish real-time events from Spark Streaming.

With our investments in R combined with Spark and Hadoop, statisticians and data scientists can rapidly train a variety of predictive models on large-scale data, limited only by the size of their Spark clusters. With Spark, R Server’s compiled-code algorithms and transparent parallelization of regression, clustering, decision trees and other statistical algorithms make analysis of terabytes of data 100x faster.


Red Hat Summit

At Microsoft, we’re serious about building an intelligent cloud through a comprehensive approach that includes the open source ecosystem. Today, our cloud offerings range from support for Linux in Azure Virtual Machines (nearly 1 in 3 VMs run Linux today) to a Hadoop solution in HDInsight and deep integration of Docker Swarm and Apache Mesos in Azure Container Service. These offerings represent our commitment to the ecosystem and highlight the value of our partnerships. In November, Microsoft and Red Hat announced a partnership to add value to open source investments in the enterprise. At Red Hat Summit, we announced a number of important partnership milestones, including:

  • The general availability of .NET Core 1.0 and ASP.NET Core 1.0, a platform for creating modern applications for Windows, Linux and Mac OS X
  • The extension of Red Hat Enterprise Linux support to Azure China operated by 21Vianet this week, in partnership with Red Hat and 21Vianet
  • The general availability of Red Hat CloudForms 4.1, with deep support for Azure including state analysis, metrics, chargeback and retirement, making Azure the best supported cloud in CloudForms
  • The availability of a new OpenShift solution template on GitHub that makes it simple to deploy OpenShift in Azure

In March, we announced our plans to bring SQL Server to Linux, starting with a private preview. In the research note “Microsoft Diversifies With Linux Support for SQL Server,” Gartner wrote “SQL Server on Linux represents a bold statement that the company understands there is more to the overall IT world than just Windows and this flexibility is necessary to compete in the DBMS market.” Today, at the Red Hat Summit, I will show SQL Server running on Red Hat Enterprise Linux. Our goal is to make SQL Server the platform of choice for any data and any application, on-premises or in the cloud, while giving you a choice of platform. Bringing SQL Server to Red Hat Enterprise Linux will provide enterprise Linux customers with SQL Server’s mission-critical performance, industry-leading TCO, the least vulnerable database,[1] and hybrid cloud innovations like Stretch Database to access data on-premises or in the cloud. We’ll first release the core relational database capabilities on Linux, targeting mid-calendar year 2017.

MongoDB on Microsoft Azure

Additionally, this week, MongoDB announced MongoDB Atlas, a new elastic on-demand cloud service that will provide comprehensive infrastructure and management for its popular database. MongoDB Atlas will become available for Azure customers via a strategic partnership between the two companies. This partnership with MongoDB further reinforces Microsoft’s commitment to providing customers with open source solutions and the most comprehensive cloud platform on the market.

– Joseph


[1] National Institute of Standards and Technology, National Vulnerability Database statistics as of 2/1/2016.