How to install Apache Hadoop 2.6.0 in Ubuntu (Multi node/Cluster setup)
Since you have reached this blog post on setting up a multi-node Hadoop cluster, I assume you have already read and experimented with my previous blog post, HOW TO INSTALL APACHE HADOOP 2.6.0 IN UBUNTU (SINGLE NODE SETUP) (http://pingax.com/install-hadoop2-6-0-on-ubuntu/). If not, I recommend reading it first before proceeding here. Since we are setting up a multi-node Hadoop cluster, we need multiple machines to fit into a master-slave architecture.
Let’s get started with setting up a fresh multi-node Hadoop (2.6.0) cluster. Follow the steps below.
Prerequisites
1. Installation and Configuration of Single-node Hadoop:
Install and configure single-node Hadoop, which will be our master node. For instructions on how to set up a single-node Hadoop installation, see the previous blog post: http://pingax.com/install-hadoop2-6-0-on-ubuntu/.
Step 3A : We will name the master node HadoopMaster and the two slave nodes HadoopSlave1 and HadoopSlave2, respectively, in the /etc/hosts file. After deciding on a hostname for each node, assign the names by updating the hostnames (you can skip this step if you do not want to set up names). Add all host names to the /etc/hosts file on all machines (master and slave nodes), for example as shown below.
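The /etc/hosts entries might look like the following; the IP addresses here are placeholders, so substitute the actual addresses of your machines:

192.168.1.10    HadoopMaster
192.168.1.11    HadoopSlave1
192.168.1.12    HadoopSlave2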
Step 3B : Create hadoop as a group and hduser as a user on all machines (if not already created).
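On Ubuntu, a minimal sketch using the standard user-management commands:

# Create the hadoop group and add hduser as a member (run on every machine)
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser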
Step 3C : Install rsync for sharing the Hadoop source with the rest of the machines.
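On Ubuntu, rsync is available from the default repositories:

# Run on every machine
sudo apt-get install rsync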
Step 3D : To make the above changes take effect, we need to reboot all of the machines.
sudo reboot
2. Applying common Hadoop configuration over all nodes:
Update the following configuration files in $HADOOP_HOME/etc/hadoop. Make these changes on HadoopMaster first; we will distribute the configured source to the slave nodes afterwards with rsync. (A sketch of each change follows the list.)
1. Update core-site.xml
2. Update hdfs-site.xml
3. Update yarn-site.xml
Update this file by changing the hostname from localhost to HadoopMaster in the following three properties,
4. Update mapred-site.xml
5. Update masters
6. Update slaves
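A minimal sketch of these edits follows. The port numbers (9000 for HDFS; 8025, 8030 and 8050 for YARN) are assumptions based on common Hadoop 2.6.0 setups, so match them to whatever your single-node configuration already uses.

In core-site.xml, point the default filesystem at the master:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://HadoopMaster:9000</value>
</property>

In hdfs-site.xml, a replication factor of 2 suits our two slave nodes:

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

In yarn-site.xml, the three hostname properties mentioned above:

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>HadoopMaster:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>HadoopMaster:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>HadoopMaster:8050</value>
</property>

In mapred-site.xml, make sure MapReduce runs on YARN:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

The masters file lists the host that runs the secondary NameNode, and the slaves file lists one DataNode/NodeManager host per line:

masters:
HadoopMaster

slaves:
HadoopSlave1
HadoopSlave2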
Use rsync to distribute the configured Hadoop source among the rest of the nodes over the network.
# Copy to the HadoopSlave1 machine (run from HadoopMaster)
sudo rsync -avxP /usr/local/hadoop/ hduser@HadoopSlave1:/usr/local/hadoop/
# Copy to the HadoopSlave2 machine (run from HadoopMaster)
sudo rsync -avxP /usr/local/hadoop/ hduser@HadoopSlave2:/usr/local/hadoop/
The above commands copy the files stored in the hadoop folder to the slave nodes at /usr/local/hadoop, so you do not need to download and configure Hadoop again on the rest of the nodes; you only need Java and rsync installed on all nodes. The JAVA_HOME path also needs to match the one set in the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file of your Hadoop distribution, which we already configured during the single-node Hadoop setup.
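For example, assuming OpenJDK 7 at Ubuntu's usual location (an assumption; point this at whichever JDK you actually installed):

# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, identical on every node
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64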
3. Applying Master node specific Hadoop configuration: (Only for master nodes)
These configurations are to be applied on Hadoop master nodes (since we have only one master node, they will be applied to just that node).
Step 6A : Remove the existing hadoop_data folder (which was created during the single-node Hadoop setup).
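A sketch of this step, assuming the data directory from the single-node setup lives at /usr/local/hadoop_tmp (an assumed path; use whatever your dfs.namenode.name.dir and dfs.datanode.data.dir properties actually point to). On the master, a fresh NameNode directory is then recreated and handed back to hduser:

# Run on HadoopMaster
sudo rm -rf /usr/local/hadoop_tmp/
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
sudo chown -R hduser:hadoop /usr/local/hadoop_tmp/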
4. Applying Slave node specific Hadoop configuration: (Only for slave nodes)
Since we have two slave nodes, we will be applying the following changes on the HadoopSlave1 and HadoopSlave2 nodes.
Step 7A : Remove the existing hadoop_data folder (which was created during the single-node Hadoop setup).
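The slave-side counterpart, under the same assumed /usr/local/hadoop_tmp path; slaves only need a DataNode directory:

# Run on HadoopSlave1 and HadoopSlave2
sudo rm -rf /usr/local/hadoop_tmp/
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
sudo chown -R hduser:hadoop /usr/local/hadoop_tmp/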
5. Copying SSH keys for setting up passwordless SSH access from Master to Slave nodes:
To manage (start/stop) all nodes of the master-slave architecture, hduser (the Hadoop user of the master node) needs to be able to log in to all slave nodes as well as the master node itself, which is made possible by setting up passwordless SSH login. (If you do not set this up, you will need to provide a password every time you start and stop daemons on the slave nodes from the master node.)
Fire the following commands for sharing the public SSH key – the $HOME/.ssh/id_rsa.pub file (of the HadoopMaster node) – to the authorized_keys files of hduser@HadoopSlave1 and hduser@HadoopSlave2 (in $HOME/.ssh/authorized_keys).
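A sketch using ssh-copy-id, which appends the key to the remote authorized_keys file for you (run on HadoopMaster as hduser; you will be asked for each slave's password once):

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@HadoopSlave1
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@HadoopSlave2

Also, because the data folders were wiped in steps 6A and 7A, the NameNode has to be formatted once on the master before the first start (note that this erases any existing HDFS metadata):

hduser@HadoopMaster:/usr/local/hadoop$ hdfs namenode -format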
6. Starting up the Hadoop cluster:
Fire the following commands on HadoopMaster to start the HDFS and YARN daemons,
hduser@HadoopMaster:/usr/local/hadoop$ start-dfs.sh
hduser@HadoopMaster:/usr/local/hadoop$ start-yarn.sh
Instead of both of the above commands you can also use start-all.sh, but it is now deprecated, so it is not recommended for Hadoop operations.
Verify the Hadoop daemons on the master node:
hduser@HadoopMaster: jps
(Screenshot of the master node's jps output: http://pingax.com/wp-content/uploads/2015/04/master.png)
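The list on the master should typically include the following daemons (PIDs omitted; this is the usual layout for this setup, not a captured output):

NameNode
SecondaryNameNode
ResourceManager
Jps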
Verify the Hadoop daemons on all slave nodes:
hduser@HadoopSlave1: jps
hduser@HadoopSlave2: jps
(Screenshot of HadoopSlave1's jps output: http://pingax.com/wp-content/uploads/2015/04/jpsSLave11.png)
(As shown in the above snapshot, the running services of HadoopSlave1 will be the same on all slave nodes configured in the Hadoop cluster.)
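On each slave, the corresponding list typically shows:

DataNode
NodeManager
Jps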
If you wish to track Hadoop MapReduce as well as HDFS, you can also explore the Hadoop web UIs of the ResourceManager and the NameNode, which are commonly used by Hadoop administrators. Open your default browser and visit the following links from any of the nodes.
For ResourceManager – http://HadoopMaster:8088
(Screenshot: http://pingax.com/wp-content/uploads/2015/04/master80881.png)
For NameNode – http://HadoopMaster:50070
(Screenshot: http://pingax.com/wp-content/uploads/2015/04/master500701.png)
If you are getting output similar to that shown in the above snapshots for the master and slave nodes, then congratulations! You have successfully installed Apache Hadoop on your cluster; if not, post your error messages in the comments and we will be happy to help you. Happy Hadooping! You can also request a blog title from me ([email protected]) if you want me to write about it.