How to install Apache Hadoop 2.6.0 in Ubuntu (Multi node/Cluster setup)
Since you have reached this blog post on setting up a multi-node Hadoop cluster, I assume you have already read and experimented with my previous blog post, HOW TO INSTALL APACHE HADOOP 2.6.0 IN UBUNTU (SINGLE NODE SETUP) (http://pingax.com/install-hadoop2-6-0-on-ubuntu/). If not, I recommend reading it first before proceeding here. Since we are setting up a multi-node Hadoop cluster, we need multiple machines to fit into a master-slave architecture.
Let’s get started with setting up a fresh multi-node Hadoop (2.6.0) cluster. Follow the steps below.
Prerequisites
1. Installation and Configuration of Single-node Hadoop:
Install and configure single-node Hadoop, which will be our master node. For instructions on how to set up a single-node Hadoop installation, see the previous blog post: http://pingax.com/install-hadoop2-6-0-on-ubuntu/.
Step 3A : We will name the master node HadoopMaster and the two slave nodes HadoopSlave1 and HadoopSlave2, respectively, in the /etc/hosts file. After deciding on a hostname for each node, assign the names by updating the hostnames (you can skip this step if you do not want to set up names). Add all host names to the /etc/hosts file on all machines (master and slave nodes), for example as shown below.
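The /etc/hosts entries might look like the following; the IP addresses here are placeholders, so substitute the actual addresses of your machines:

192.168.1.10    HadoopMaster
192.168.1.11    HadoopSlave1
192.168.1.12    HadoopSlave2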
Step 3B : Create hadoop as a group and hduser as a user on all machines (if not already created).
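On Ubuntu, a minimal sketch using the standard user-management commands:

# Create the hadoop group and add hduser as a member (run on every machine)
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser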
Step 3C : Install rsync for sharing the Hadoop source with the rest of the machines.
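On Ubuntu, rsync is available from the default repositories:

# Run on every machine
sudo apt-get install rsync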
Step 3D : To make the above changes take effect, we need to reboot all of the machines.
sudo reboot
2. Applying common Hadoop configuration over all nodes:
Update the following configuration files in $HADOOP_HOME/etc/hadoop. Make these changes on HadoopMaster first; we will distribute the configured source to the slave nodes afterwards with rsync. (A sketch of each change follows the list.)
1. Update core-site.xml
2. Update hdfs-site.xml
3. Update yarn-site.xml
Update this file by changing the hostname from localhost to HadoopMaster in the following three properties,
4. Update mapred-site.xml
5. Update masters
6. Update slaves
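A minimal sketch of these edits follows. The port numbers (9000 for HDFS; 8025, 8030 and 8050 for YARN) are assumptions based on common Hadoop 2.6.0 setups, so match them to whatever your single-node configuration already uses.

In core-site.xml, point the default filesystem at the master:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://HadoopMaster:9000</value>
</property>

In hdfs-site.xml, a replication factor of 2 suits our two slave nodes:

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

In yarn-site.xml, the three hostname properties mentioned above:

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>HadoopMaster:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>HadoopMaster:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>HadoopMaster:8050</value>
</property>

In mapred-site.xml, make sure MapReduce runs on YARN:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

The masters file lists the host that runs the secondary NameNode, and the slaves file lists one DataNode/NodeManager host per line:

masters:
HadoopMaster

slaves:
HadoopSlave1
HadoopSlave2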
Use rsync to distribute the configured Hadoop source among the rest of the nodes over the network.
# Copy to the HadoopSlave1 machine (run from HadoopMaster)
sudo rsync -avxP /usr/local/hadoop/ hduser@HadoopSlave1:/usr/local/hadoop/
# Copy to the HadoopSlave2 machine (run from HadoopMaster)
sudo rsync -avxP /usr/local/hadoop/ hduser@HadoopSlave2:/usr/local/hadoop/
The above commands copy the files stored in the hadoop folder to the slave nodes at /usr/local/hadoop, so you do not need to download and configure Hadoop again on the rest of the nodes; you only need Java and rsync installed on all nodes. The JAVA_HOME path also needs to match the one set in the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file of your Hadoop distribution, which we already configured during the single-node Hadoop setup.
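For example, assuming OpenJDK 7 at Ubuntu's usual location (an assumption; point this at whichever JDK you actually installed):

# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, identical on every node
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64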
3. Applying Master node specific Hadoop configuration: (Only for master nodes)
These configurations are to be applied on Hadoop master nodes (since we have only one master node, they will be applied to just that node).
Step 6A : Remove the existing hadoop_data folder (which was created during the single-node Hadoop setup).
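A sketch of this step, assuming the data directory from the single-node setup lives at /usr/local/hadoop_tmp (an assumed path; use whatever your dfs.namenode.name.dir and dfs.datanode.data.dir properties actually point to). On the master, a fresh NameNode directory is then recreated and handed back to hduser:

# Run on HadoopMaster
sudo rm -rf /usr/local/hadoop_tmp/
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
sudo chown -R hduser:hadoop /usr/local/hadoop_tmp/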
4. Applying Slave node specific Hadoop configuration: (Only for slave nodes)
Since we have two slave nodes, we will be applying the following changes on the HadoopSlave1 and HadoopSlave2 nodes.
Step 7A : Remove the existing hadoop_data folder (which was created during the single-node Hadoop setup).
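The slave-side counterpart, under the same assumed /usr/local/hadoop_tmp path; slaves only need a DataNode directory:

# Run on HadoopSlave1 and HadoopSlave2
sudo rm -rf /usr/local/hadoop_tmp/
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
sudo chown -R hduser:hadoop /usr/local/hadoop_tmp/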
5. Copying SSH keys for setting up passwordless SSH access from Master to Slave nodes:
To manage (start/stop) all nodes of the master-slave architecture, hduser (the Hadoop user of the master node) needs to be able to log in to all slave nodes as well as the master node itself, which is made possible by setting up passwordless SSH login. (If you do not set this up, you will need to provide a password every time you start and stop daemons on the slave nodes from the master node.)
Fire the following commands for sharing the public SSH key – the $HOME/.ssh/id_rsa.pub file (of the HadoopMaster node) – to the authorized_keys files of hduser@HadoopSlave1 and hduser@HadoopSlave2 (in $HOME/.ssh/authorized_keys).
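A sketch using ssh-copy-id, which appends the key to the remote authorized_keys file for you (run on HadoopMaster as hduser; you will be asked for each slave's password once):

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@HadoopSlave1
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@HadoopSlave2

Also, because the data folders were wiped in steps 6A and 7A, the NameNode has to be formatted once on the master before the first start (note that this erases any existing HDFS metadata):

hduser@HadoopMaster:/usr/local/hadoop$ hdfs namenode -format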
6. Starting up the Hadoop cluster:
Fire the following commands on HadoopMaster to start the HDFS and YARN daemons,
hduser@HadoopMaster:/usr/local/hadoop$ start-dfs.sh
hduser@HadoopMaster:/usr/local/hadoop$ start-yarn.sh
Instead of both of the above commands you can also use start-all.sh, but it is now deprecated, so it is not recommended for Hadoop operations.
Verify the Hadoop daemons on the master node:
hduser@HadoopMaster: jps
(Screenshot of the master node's jps output: http://pingax.com/wp-content/uploads/2015/04/master.png)
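The list on the master should typically include the following daemons (PIDs omitted; this is the usual layout for this setup, not a captured output):

NameNode
SecondaryNameNode
ResourceManager
Jps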
Verify the Hadoop daemons on all slave nodes:
hduser@HadoopSlave1: jps
hduser@HadoopSlave2: jps
(Screenshot of HadoopSlave1's jps output: http://pingax.com/wp-content/uploads/2015/04/jpsSLave11.png)
(As shown in the above snapshot, the running services of HadoopSlave1 will be the same on all slave nodes configured in the Hadoop cluster.)
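On each slave, the corresponding list typically shows:

DataNode
NodeManager
Jps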
If you wish to track Hadoop MapReduce as well as HDFS, you can also explore the Hadoop web UIs of the ResourceManager and the NameNode, which are commonly used by Hadoop administrators. Open your default browser and visit the following links from any of the nodes.
For ResourceManager – http://HadoopMaster:8088
(Screenshot: http://pingax.com/wp-content/uploads/2015/04/master80881.png)
For NameNode – http://HadoopMaster:50070
(Screenshot: http://pingax.com/wp-content/uploads/2015/04/master500701.png)
If you are getting output similar to that shown in the above snapshots for the master and slave nodes, then congratulations! You have successfully installed Apache Hadoop on your cluster; if not, post your error messages in the comments and we will be happy to help you. Happy Hadooping! You can also request a blog title from me ([email protected]) if you want me to write about it.