Online:: Setting Up The Environment
In this tutorial you will see, step by step, how to set up a Hadoop single-node cluster so that you can play
around with the framework and learn more about it.
This tutorial uses the following software versions; you can download them by clicking the hyperlinks:
Ubuntu Linux 12.04.3 LTS (the steps are the same for any version)
Hadoop 2.6.4 (the steps are the same for any version)
Prerequisites:
1. Installing Java v1.7 or a later version.
2. Adding a dedicated Hadoop system user (see the sketch after this list).
3. Configuring SSH access.
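The body of this guide does not show the commands for prerequisite 2, so here is a minimal sketch; the group name hadoop and the user name hduser are assumptions, so substitute whatever names you prefer and use that user consistently wherever username appears below:
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo ## optional: let the new user run sudo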
Before installing any applications or software, please make sure your list of packages from all repositories
and PPAs is up to date; if it is not, update it by using this command:
sudo apt-get update
Hadoop requires Java v1.6 or later, but use the latest version available.
Step 1) sudo apt-get install openjdk-8-jdk (it will ask for a password; enter your login or root password)
Step 2) Execute update-java-alternatives -l ## lists all installed Java versions, if there is more than one,
along with the installation path of each, for example:
/usr/lib/jvm/java-1.8.0-openjdk-amd64
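If more than one version shows up, you can switch the system default interactively; this is a standard Ubuntu command rather than part of the original guide:
sudo update-alternatives --config java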
Step 3) Open the .bashrc file as described in the earlier section on updating the .bashrc file.
If it does not open, execute export DISPLAY=:0.0 and then try gedit .bashrc again; it will open .bashrc in
another window (a text editor).
Step 4) Go to the end of the file and set the Java home path:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
Step 5) Save and close the .bashrc file, then run the source .bashrc command to update the classpath.
$ source .bashrc
$ java -version (it will display the version of Java you have installed)
Note: wherever you find username in this document, replace it with your Ubuntu
machine username.
3. Configuring SSH access:
SSH key-based authentication is required so that the master node can log in to the slave nodes (and the
secondary node) to start and stop them, and also to the local machine if you want to run Hadoop on it. For our
single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the username user we
created in the previous section.
Before this step, make sure that SSH is up and running on your machine and is configured to allow SSH public
key authentication.
a. Generate an SSH key for the username user (the user we have used in this setup; it may be different on your machine):
ssh-keygen -t rsa -P ""
b. It will ask for the file name in which to save the key; just press Enter so that the key is generated under
/home/username/.ssh/
c. Enable SSH access to your local machine with this newly created key:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ exit
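If ssh still prompts for a password after this, restrictive permission checks in OpenSSH are the usual cause; the two commands below are an addition to the guide, but standard OpenSSH practice, and set the permissions the server expects:
chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys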
d. The final step is to test the SSH setup by connecting to your local machine as the username user:
ssh username@localhost or ssh localhost
This will add localhost permanently to the list of known hosts.
Add the following lines to the end of the /etc/sysctl.conf file and reboot the machine so the configuration is applied correctly.
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
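After the reboot you can confirm that IPv6 is really off; this check is not part of the original guide, but the kernel exposes the setting directly and the command prints 1 once IPv6 is disabled:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6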
Hadoop Installation:
Open a terminal and run the following steps.
i. Download Hadoop 2.6.4 from https://archive.apache.org/dist/hadoop/core/hadoop-2.6.4/ (prefer a stable
version) and copy hadoop-2.6.4.tar.gz to /usr/local/hadoop-env.
ii. Go to /usr/local/hadoop-env and unpack the compressed Hadoop file with this command:
tar -xzvf hadoop-2.6.4.tar.gz
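If you would rather do the whole download and unpack from the terminal, here is a minimal sketch; the full file URL is assumed to be the directory linked above plus the tarball name, and sudo plus the final chown are assumptions because /usr/local is normally root-owned:
sudo mkdir -p /usr/local/hadoop-env
cd /usr/local/hadoop-env
sudo wget https://archive.apache.org/dist/hadoop/core/hadoop-2.6.4/hadoop-2.6.4.tar.gz
sudo tar -xzvf hadoop-2.6.4.tar.gz
sudo chown -R username:username /usr/local/hadoop-env ## so later config edits do not need sudo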
Configuring Hadoop:
The following files need to be edited for a correct single-node Hadoop cluster configuration.
a. yarn-site.xml:
b. core-site.xml
c. mapred-site.xml
d. hdfs-site.xml
e. hadoop-env.sh
f. Update $HOME/.bashrc
These files are located in the Hadoop configuration directory:
cd /usr/local/hadoop-env/hadoop-2.6.4/etc/hadoop
Note: Select each of the files listed above, right-click, open it as a text file, and modify it with the respective
configuration.
a. yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
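These two properties register the shuffle auxiliary service inside every NodeManager; MapReduce jobs need it to move map output to the reducers, so without it YARN jobs would fail at the shuffle stage.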
b. core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
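A note on the property name: fs.default.name is the older spelling and is deprecated in Hadoop 2.x in favour of fs.defaultFS; both are still accepted, so the entry above works, but fs.defaultFS is the current name if you prefer it.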
c. mapred-site.xml:
If this file does not exist, copy mapred-site.xml.template in the same location and name the copy mapred-site.xml:
cp mapred-site.xml.template mapred-site.xml
i. Edit the mapred-site.xml file.
ii. Add the following entry to the file, then save and quit.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
d. hdfs-site.xml:
i. Edit the hdfs-site.xml file.
ii. Create two directories to be used by the namenode and datanode (if you skip this, Hadoop creates them itself
when you format the namenode and start the daemons):
mkdir -p $HADOOP_HOME/yarn_data/hdfs/namenode
mkdir -p $HADOOP_HOME/yarn_data/hdfs/datanode
iii. Add the following entry to the file, then save and quit:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop-env/hadoop-2.6.4/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop-env/hadoop-2.6.4/yarn_data/hdfs/datanode</value>
</property>
</configuration>
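dfs.replication is set to 1 because a single-node cluster has only one DataNode to hold each block; the Hadoop default of 3 only makes sense once more DataNodes join the cluster.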
e. hadoop-env.sh:
Update JAVA_HOME in this file:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
f. Update $HOME/.bashrc
i. Go back to the home directory with the cd command and edit the .bashrc file.
vi .bashrc
ii. Add the following entries at the end of the file; HADOOP_HOME must point at the directory you unpacked above:
export HADOOP_HOME=/usr/local/hadoop-env/hadoop-2.6.4
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
# Native path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
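After saving .bashrc, reload it and, on the very first run only, format the NameNode; the format command is the one this guide referenced in the hdfs-site.xml step, and rerunning it later would wipe HDFS metadata, so do it only on first setup:
source .bashrc
hdfs namenode -format
Then start the HDFS and YARN daemons: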
start-dfs.sh
start-yarn.sh
or
You can start all the services by using start-all.sh
Note: Run the following command to check whether the services are running:
$ jps
It will display the following services:
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
With this we are done setting up a single-node Hadoop cluster v2.6.4; hope this step-by-step guide helps you set up
the same environment at your end.
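As a further check, the web UIs should be reachable in a browser; the ports below are the stock Hadoop 2.x defaults rather than something this guide configures explicitly: the NameNode UI at http://localhost:50070 and the ResourceManager UI at http://localhost:8088.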
Stop Hadoop by running the following commands:
stop-dfs.sh
stop-yarn.sh
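Or, mirroring start-all.sh above, a single stop-all.sh stops all the services at once; both scripts are marked deprecated in Hadoop 2.x but still work.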
=============Hadoop setup completed======================
Node Manager: