Microsoft, Hadoop and Big Data

In a couple of weeks it will be my one year anniversary here at Microsoft and I couldn’t wish for a better anniversary gift: now that Microsoft has laid out its roadmap for Big Data, I’m really excited about the role that Apache HadoopTM plays in this.

In case you missed it, Microsoft Corporate Vice President Ted Kummert earlier today announced that we are adopting Hadoop by announcing plans to deliver enterprise class Apache Hadoop based distributions on both Windows Server and Windows Azure.

This news is loaded with goodies for the big data community, broadening the accessibility and usage of Hadoop-based technologies among developers and IT professionals, by making it available on Windows Server and Windows Azure.  

But there is more. Microsoft will be working with the community to offer contributions for inclusion into the Apache Hadoop project and its ecosystem of tools and technologies.

I believe that all of this will really benefit not only the broader Open Source community by enabling them to take their existing skill sets and assets use them on Windows Azure and Windows Server, but also developers, our customers and partners. It is also another example of our ongoing commitment to providing Interoperability, compatibility and flexibility.

As a proud member of the Apache Software Foundation, I personally could not be happier to see how Microsoft is willing to engage in such an important Open Source project and community.

Technical Considerations

On the more technical front, we have been working on a simplified download, installation and configuration experience of several Hadoop related technologies, including HDFS, Hive, and Pig, which will help broaden the adoption of Hadoop in the enterprise.

The Hadoop based service for Windows Azure will allow any developer or user to submit and run standard Hadoop jobs directly on the Azure cloud with a simple user experience.

Let me stress this once again: it doesn’t matter what platform you are developing your Hadoop jobs on -you will always be able to take a standard Hadoop job and deploy it on our platform, as we strive towards full interoperability with the official Apache Hadoop distribution.

This is great news as it lowers the barrier for building Hadoop based applications while encouraging rapid prototyping scenarios in the Windows Azure cloud for Big Data.

To facilitate all of this, we have also entered into a strategic partnership with Hortonworks that enables us to gain unique experience and expertise to help accelerate the delivery of Microsoft’s Hadoop based distributions on both Windows Server and Windows Azure.

For developers, we will enable integration with Microsoft developer tools as well as invest in making Javascript a first class language for Big Data. We will do this by making it possible to write high performance Map/Reduce jobs using Javascript. Yes, Javascript Map/Reduce, you read it right.

For end users, the Hadoop-based applications targeting the Windows Server and Windows Azure platforms will easily work with Microsoft’s existing BI tools like PowerPivot and recently announced Power View, enabling self-service analysis on business information that was not previously accessible. To enable this we will be delivering an ODBC Driver and an Add-in for Excel, each of which will interoperate with Apache Hive. 

Finally, in line with our commitment to Interoperability and to facilitate the high performance bi-directional movement of enterprise data between Apache Hadoop and Microsoft SQL Server, we have released two Hadoop-based connectors for SQL Server to manufacturing.

The SQL Server connector for Apache Hadoop lets customers move large volumes of data between Hadoop and SQL Server 2008 R2, while the SQL Server PDW connector for Apache Hadoop moves data between Hadoop and SQL Server Parallel Data Warehouse (PDW). These new connectors will enable customers to work effectively with both structured and unstructured data.

I really look forward to sharing updates on all this as we move forward. For now, check out www.microsoft.com/bigdata and check back on the DPI blog tomorrow.

Gianugo