Hadoop on Linux on Azure (1)


In this blog series, we set up a Hadoop cluster on Azure using virtual machines running Linux. More specifically, we use HDP 2.1 for Linux, the distribution by Hortonworks, which also provides HDP distributions for the Windows platform. We install Hadoop with Ambari, an Apache project that provides an intuitive UI for provisioning, managing and monitoring a Hadoop cluster.

Contents

1 Introduction
2 Step-by-Step: Build the Infrastructure
3 Step-by-Step: Install a Hadoop Distribution

Introduction

While HDInsight is the Platform as a Service (PaaS) option for building and running a Hadoop cluster in Microsoft Azure, this article describes its Infrastructure as a Service (IaaS) counterpart. The IaaS option gives you more flexibility in the choice of Hadoop distribution, Hadoop components and platform (e.g. Linux), among other things.

This blog series walks through the installation of Hortonworks’ Hadoop distribution for Linux, HDP 2.1. Alternative commercial Hadoop distributions on Linux include Cloudera (CDH) and MapR. We use CentOS as the Linux platform. In the end, we will have a four-node Hadoop cluster: one master node (hosting the NameNode) and three worker nodes (each hosting a DataNode):

[Figure: cluster architecture]
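As a rough sketch of the target layout described above, the four nodes and their roles can be written down in a few lines of Python. The host names (`hdp-master`, `hdp-worker1`, ...) and the helper `plan_cluster` are made up for illustration and are not taken from the series; only the one-master, three-worker split comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    hostname: str  # hypothetical name, chosen only for this sketch
    role: str      # "NameNode" (master) or "DataNode" (worker)


def plan_cluster(worker_count: int = 3) -> list[Node]:
    """Return one master node followed by the requested number of workers."""
    master = Node("hdp-master", "NameNode")
    workers = [
        Node(f"hdp-worker{i}", "DataNode")
        for i in range(1, worker_count + 1)
    ]
    return [master, *workers]


# The cluster planned in this series: one master and three workers.
cluster = plan_cluster()
for node in cluster:
    print(f"{node.hostname}: {node.role}")
```

Writing the layout down like this is only a planning aid; the actual assignment of Hadoop components to hosts happens later in Ambari.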

We base our step-by-step guide heavily on Benjamin’s great article “How to install Hadoop on Windows Azure Linux virtual machines” and on Hortonworks’ documentation “Hortonworks Data Platform – Automated Install with Ambari”.

Before installing a Hadoop distribution, though, the required environment needs to be prepared. The next article therefore walks through the infrastructure setup for such a cluster on Microsoft Azure.

Comments (5)

  5. Viraf Karai (Seattle WA) says:

    One of the nicer blogs that I have read and followed closely. I had to do exactly what the author did – install HDP on a cluster of Linux machines (VMs) on Azure. Thanks to Olivia, it turned out to be reasonable. The number of steps to get the cluster up and running is more than I expected, so writing automated scripts will be useful.