Microsoft announces major commitment to Apache Spark

This post was authored by Tiffany Wissner, Senior Director of Data Platform Marketing.

This week in San Francisco, thousands of people are at Spark Summit to explore and understand how they can use Apache Spark to get the most out of big data. Building on our previous investments, today we are announcing an extensive commitment to Spark to power Microsoft’s big data and analytics offerings, including Cortana Intelligence Suite, Power BI, and Microsoft R Server:

  • Spark for Azure HDInsight General Availability: Previously announced as a public preview, Spark for Azure HDInsight is generally available today, introducing a fully managed Spark service from Hortonworks that has been hardened for the enterprise and made simpler for you to use. You can also rely on the industry’s highest availability service level agreement for Spark, at 99.9%. You can get value out of Spark immediately with out-of-the-box integration with Jupyter, the most popular open source notebook for data scientists.
  • R Server for HDInsight in the cloud powered by Spark: Previously announced as a public preview, R Server for HDInsight will be generally available in the summer, making the Spark integration available both on-premises and in the cloud. This makes it easy to move code and projects to the cloud with a few clicks and within a few minutes, without buying the hardware or hiring the specialized operations teams typically associated with big data infrastructure.
  • R Server for Hadoop on-premises now powered by Spark: As the leading solution in the world to run R at scale, R Server for Hadoop will support both the Microsoft R and native Spark execution frameworks, available in June. Combining R Server with Spark lets you run R functions over thousands of Spark nodes, so you can train your models on data 1000x larger and 100x faster than was possible with open source R, and nearly 2x faster than Spark’s own MLlib.
  • Free R Client for Data Scientists: Today we are announcing Microsoft R Client, a new freely available tool for data scientists to build high performance analytics using R. R Client not only allows you to use any of the open source R functions to analyze the data on your local workstation, it also enables you to analyze remote big data and scale out the analytics by pushing the computation to a production instance of Microsoft R Server, such as SQL Server R Services, R Server for Hadoop, and HDInsight with Spark. You can download Microsoft R Client today at http://aka.ms/rclient.
  • Power BI support for Spark Streaming: Previously announced with Power BI General Availability, Spark support in Power BI is now expanded with new support for Spark Streaming scenarios. This allows you to publish real-time events from Spark Streaming directly into one of the fastest growing visualization tools in the market today.

It’s an exciting time for Spark users and R users alike. We have much more to tell you, and we look forward to seeing you this week at Spark Summit in San Francisco. Come by our booth or join Joseph Sirosh, Corporate Vice President at Microsoft, for his keynote on Wednesday, June 8 at 9:20 AM PT.

It promises to be an exciting week! If you’re unable to join us in person, you can get much more information on today’s announcement on our apache-spark and r-server sites. Also, come back and visit this blog, where we’ll be sharing additional insights on our announcements.