Announcing Microsoft R on Apache Spark and R Client at the Hadoop Summit

This post was authored by Nagesh Pabbisetty, Partner Director of PM, Microsoft R.

This week Microsoft will be joining thousands of people attending Hadoop Summit in San Jose to explore the technology and business of big data and data science. As part of our participation in the conference, I’m happy to announce today that Microsoft has integrated support for Apache Spark into Microsoft R Server for Hadoop, bringing Spark’s speed advantages within the reach of R users with on-premises installations.

This comes in addition to the previously announced R Server for HDInsight, which allows users to do predictive modeling and machine learning on a managed Spark environment in Azure. With this milestone, we continue to deliver on our promise of bringing advanced analytics to where the data is.

  • Power of Microsoft R on Apache Spark: Combining R Server with Spark gives users the ability to run R functions over thousands of Spark nodes, letting them train models on data 1000 times larger. Furthermore, when comparing R Server on a five-node Spark cluster to open source R with CRAN algorithms, which can only run on a single server, R Server ran GLM 125 times faster on five times the hardware, showing the combined speed of R Server’s parallelized algorithms and Spark’s in-memory architecture.
  • Free R Client for data scientists: To further empower data scientists, we also recently announced Microsoft R Client, a new freely available tool for building high-performance analytics using R. R Client not only allows you to use any of the open source R functions to analyze the data present on your local workstation, it also enables you to analyze remote big data and scale out the analytics by pushing the computation to a production instance of Microsoft R Server, such as SQL Server R Services, R Server for Hadoop, and HDInsight with Spark.

If you’re interested in learning more, watch this short video on Microsoft’s Channel 9 describing our Spark support in Microsoft R Server version 8.0.5. If you’d like to test drive R Server on Spark and Hadoop in the Azure cloud, you can start for free at azure.microsoft.com. With your Azure account, you can spin up a cluster with R Server and Spark on Azure HDInsight. Download Microsoft R Client today!

We are pleased to announce that DeployR, a component of Microsoft R Server, has undergone major architecture improvements. It is now easier to use, supports more choices of repository databases, and is more secure than ever, with improved Web security features for better protection against malicious attacks, improved installation security, and improved Security Policy Management.

By providing analytics as web services, DeployR solves key integration and operationalization problems faced by those adopting R-based analytics alongside existing IT infrastructure. These services make it easy for application developers to collaborate with data scientists to integrate R analytics into their applications without any R programming knowledge.

It’s an exciting time for R users. We have much more to tell you and look forward to seeing you at the Hadoop Summit in San Jose. Stop by our booth or join Joseph Sirosh, corporate vice president, Microsoft, for his keynote on June 28.