Hadoop Cluster Setup

This document provides instructions for configuring a Hadoop cluster across multiple machines, including modifying hosts files, setting up passwordless SSH, editing configuration files such as core-site.xml and hdfs-site.xml, formatting the NameNode, and starting and stopping the Hadoop processes. Key steps include modifying the hosts file on all machines, setting up SSH between the master and slaves, editing the masters and slaves files on all machines, formatting the NameNode on the master, and starting/stopping processes by running scripts on the master only.


Configuring a Hadoop Cluster on Multiple Machines

Agenda
- Modify your hosts file
- SSH from master to all slaves
- SSH from all slaves to master
- Edit masters file
- Edit slaves file
- Modify hadoop-env.sh file
- Modify core-site.xml file
- Modify hdfs-site.xml file
- Modify mapred-site.xml file
- Format the NameNode
- Start Hadoop cluster
- Stop Hadoop cluster

Modify your hosts file


The hosts file contains the mapping of IP addresses to hostnames. Edit your hosts file by typing the command below in your terminal:

sudo vi /etc/hosts

Add entries for master & slaves
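
For example, a hypothetical hosts file for one master and two slaves (the IP addresses and hostnames here are placeholders; adjust them to your own network) might contain:

192.168.1.100   master
192.168.1.101   slave1
192.168.1.102   slave2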

Repeat the same step on all master and slave machines.

The master needs to communicate with each slave machine


There should be passwordless SSH from the master machine to each slave machine. Run the following three commands to set up passwordless SSH from master to slave:

username@master:~> ssh-keygen -t rsa
username@master:~> ssh username@slave1 mkdir -p .ssh
username@master:~> cat .ssh/id_rsa.pub | ssh username@slave1 'cat >> .ssh/authorized_keys'

Repeat the same steps for each slave machine.
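
As a quick sanity check (assuming the username and the slave1 hostname used in the commands above), the following should print the slave's hostname without prompting for a password:

username@master:~> ssh username@slave1 hostname
slave1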

Each slave needs to communicate with the master machine


There should be passwordless SSH from each slave machine to the master machine. Run the following three commands to set up passwordless SSH from slave to master:

username@slave1:~> ssh-keygen -t rsa
username@slave1:~> ssh username@master mkdir -p .ssh
username@slave1:~> cat .ssh/id_rsa.pub | ssh username@master 'cat >> .ssh/authorized_keys'

Repeat the same steps on each slave machine.
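
Likewise, from each slave, the following should log in to the master without a password prompt:

username@slave1:~> ssh username@master hostname
master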

Edit masters file


- Open the masters file (HADOOP_HOME/conf/masters)
- Add the master machine entry to the file
- Save the masters file
- Make these changes on each machine in the cluster (master and slaves)

An example file is shown below.
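
A minimal example, assuming the hostname master from the hosts file above (in Hadoop 1.x, this file determines the host on which the SecondaryNameNode is started):

master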

Edit slaves file


- Open the slaves file (HADOOP_HOME/conf/slaves)
- Add an entry for every slave machine, one slave per line
- Save the slaves file
- Make these changes on each machine in the cluster (master and slaves)

An example file is shown below.
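
A minimal example for a two-slave cluster, using the hostnames from the hosts file above:

slave1
slave2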

Modify hadoop-env.sh file


The hadoop-env.sh file contains system-level variables. Make the following entries in HADOOP_HOME/conf/hadoop-env.sh:

export JAVA_HOME=/usr
export HADOOP_HOME=/home/neeraj/local_cluster_home/hadoop-1.0.3

Make these changes on each machine in the cluster (master and slaves).
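
As a quick check that the environment is picked up (a sketch, assuming the HADOOP_HOME path above), running hadoop version should print the Hadoop version without errors:

username@master:~> $HADOOP_HOME/bin/hadoop version
Hadoop 1.0.3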

Modify core-site.xml file


We need to make the following entries in core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/neeraj/local_cluster_home/hadoop-1.0.3/hdfs_temp</value>
  </property>
</configuration>

Make these changes on each machine in the cluster (master and slaves).
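Here fs.default.name tells every Hadoop client and daemon that the NameNode listens at master:9000, and hadoop.tmp.dir sets the local base directory Hadoop uses for temporary storage; make sure that directory exists and is writable by the Hadoop user on every machine.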

Modify hdfs-site.xml file


We need to make the following entries in hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The number of times each block of a file is replicated across the cluster. Default is 3.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/neeraj/local_cluster_home/hadoop-1.0.3/hdfs_data</value>
  </property>
</configuration>

Make these changes on each machine in the cluster (master and slaves).
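Note that a dfs.replication of 1 keeps only a single copy of each block, so there is no redundancy if a DataNode fails; on a cluster with three or more DataNodes, the default of 3 is the usual choice.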

Modify mapred-site.xml file


We need to make the following entry in mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
    <description>The host and port that the MapReduce JobTracker runs at.</description>
  </property>
</configuration>

Make these changes on each machine in the cluster (master and slaves).
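The mapred.job.tracker value is the address the TaskTrackers on the slaves use to reach the JobTracker, so the hostname master must resolve on every machine (which the hosts file entries above take care of).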

Format your Namenode


Run the following command on your master machine (from the HADOOP_HOME/bin directory):

./hadoop namenode -format
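
Formatting initializes a fresh HDFS filesystem and erases any existing NameNode metadata, so run it only once, when the cluster is first set up. For example (assuming the HADOOP_HOME path used above):

username@master:~> cd /home/neeraj/local_cluster_home/hadoop-1.0.3/bin
username@master:~> ./hadoop namenode -format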

Start your Hadoop cluster

Run the following command on the master machine (from HADOOP_HOME/bin):

./start-all.sh

There is no need to start anything on the slave machines.
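
Behind the scenes, start-all.sh starts the NameNode and JobTracker locally on the master and then uses the slaves file to SSH into each slave and start its DataNode and TaskTracker; this is why the passwordless SSH setup above is required.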

Check Hadoop daemons

Run the jps command on the master machine.
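
On the master you should see something like the following (the process IDs are illustrative, and DataNode/TaskTracker also appear here if the master is listed in the slaves file):

username@master:~> jps
4723 NameNode
4891 SecondaryNameNode
5012 JobTracker
5230 Jps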

Run the jps command on the slave machines.
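
On each slave the output should look roughly like this (process IDs are illustrative):

username@slave1:~> jps
3301 DataNode
3412 TaskTracker
3587 Jps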

Stop your Hadoop cluster

Run the following command on the master machine (from HADOOP_HOME/bin):

./stop-all.sh

There is no need to stop anything on the slave machines.

Thanks

Contact Point: www.bispsolutions.com
