Department of Computer Science & Engineering
AIM:
Install Hadoop single node cluster and run applications like word count.
OBJECTIVE:
To install and test a Hadoop single node cluster.
PROCEDURE:
1) Install the JDK on your Ubuntu VM using:
Command: sudo apt install openjdk-11-jdk
4) Add the Hadoop and Java paths to the bash configuration file (.bashrc). Open the file with "nano .bashrc" and add the Hadoop and Java paths as shown below.
Fig 1
Then, save the config file and close it. To apply these changes in the current terminal, run the source command on .bashrc.
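As a sketch, the .bashrc additions would look like the lines below. The JDK path and Hadoop directory are assumptions (OpenJDK 11 in the default Ubuntu location, Hadoop extracted into the home directory) and must match your own installation.

```shell
# Assumed locations - adjust to where the JDK and Hadoop actually live
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop-3.4.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

After saving, "Command: source ~/.bashrc" applies the changes to the current terminal.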
Fig 2
Fig 3
7) Open core-site.xml and edit the property mentioned below inside the configuration tag:
core-site.xml tells the Hadoop daemons where the NameNode runs in the cluster. It contains configuration settings for the Hadoop core, such as I/O settings common to HDFS and MapReduce.
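For reference, a typical single-node core-site.xml configuration block looks like the following; the NameNode address hdfs://localhost:9000 is the conventional single-node value, an assumption rather than a requirement:

```xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
```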
8) Edit hdfs-site.xml and edit the property mentioned below inside the configuration tag:
hdfs-site.xml contains the configuration settings of the HDFS daemons (NameNode, DataNode, Secondary NameNode). It also sets the replication factor and block size of HDFS.
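A minimal hdfs-site.xml for a single node could set the replication factor to 1, since there is only one DataNode; this value is an assumption appropriate only for the single-node case:

```xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
```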
9) Edit the mapred-site.xml file and edit the property mentioned below inside the configuration tag:
mapred-site.xml contains configuration settings for MapReduce applications, such as the number of JVMs that can run in parallel, the memory size of the mapper and reducer processes, and the CPU cores available to a process.
In some releases the mapred-site.xml file is not present; in that case, create it from the mapred-site.xml.template file.
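A minimal mapred-site.xml that tells MapReduce jobs to run on YARN looks like:

```xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
```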
10) Edit yarn-site.xml and edit the property mentioned below inside the configuration tag:
yarn-site.xml contains the configuration settings of the ResourceManager and NodeManager, such as application memory management sizes and the auxiliary services (for example, the MapReduce shuffle) that each NodeManager runs.
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
11) Edit hadoop-env.sh and add the Java path as mentioned below:
hadoop-env.sh contains the environment variables used by the Hadoop scripts, such as the Java home path.
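Assuming the same OpenJDK 11 location used in .bashrc (an assumption; match your installed JDK), the line added to hadoop-env.sh would be:

```shell
# Assumed JDK location - must match the JAVA_HOME used in .bashrc
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
```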
12) Return to the home directory, move into the Hadoop directory, and format the NameNode.
Command: cd
Command: cd hadoop-3.4.0
Command: bin/hdfs namenode -format
This formats HDFS via the NameNode. The command is executed only once, before the cluster is started for the first time. Formatting the filesystem means initializing the directory specified by the dfs.namenode.name.dir property.
Never format an up-and-running Hadoop filesystem; you will lose all the data stored in HDFS.
13) Once the NameNode is formatted, go to the hadoop-3.4.0/sbin directory and start all the daemons.
Command: cd hadoop-3.4.0/sbin
You can either start all daemons with a single command or start them individually (./start-dfs.sh followed by ./start-yarn.sh).
Command: ./start-all.sh
14) To check that all the Hadoop services are up and running, run the below command.
Command: jps
Fig 4
15) If everything is done as mentioned above, open a web browser and go to http://localhost:8088/ to check the ResourceManager UI.
To check the NameNode UI, go to http://localhost:9870/ .
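16) To run the word count application named in the aim, the MapReduce examples jar bundled with Hadoop can be used. The commands below are a sketch to be run from the hadoop-3.4.0 directory; the HDFS paths and the input file name are illustrative assumptions.

```shell
# Create an HDFS input directory and copy a local text file into it
bin/hdfs dfs -mkdir -p /wordcount/input
bin/hdfs dfs -put ~/input.txt /wordcount/input
# Run the bundled word count example on the input directory
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar wordcount /wordcount/input /wordcount/output
# Print the word counts produced by the reducer
bin/hdfs dfs -cat /wordcount/output/part-r-00000
```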
Fig 5
Thus the Hadoop single node cluster was installed and a simple application was executed successfully. The word count application processes the input data and generates output; analysing that output reveals word frequencies, identifies common words, and gives insight into the dataset.