
DEPARTMENT OF

COMPUTER SCIENCE & ENGINEERING


Experiment – 3.1

Student Name: Anurag Kumar UID: 21BCS1040


Branch: BE-CSE Section/Group: 605-A
Semester: 6th Date of Performance: 04/04/2024
Subject Name: Cloud Computing Lab Subject Code: 21CSP-378

AIM:
Install Hadoop single node cluster and run applications like word count.

OBJECTIVE:
To install and test Hadoop single node cluster.

PROCEDURE:
1) Install the JDK on your Ubuntu VM:

Command: sudo apt install openjdk-11-jdk

2) Download the Hadoop archive from the Apache website:


Command: wget https://archive.apache.org/dist/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz

3) Extract the tar file using:


Command: tar -xvf hadoop-3.4.0.tar.gz

4) Add the Hadoop and Java paths to the shell startup file (.bashrc). Open the file with "nano .bashrc" and add the Hadoop and Java paths as shown below.

Fig 1
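For reference, the exports in Fig 1 typically look like the following (a sketch assuming Hadoop 3.4.0 was extracted into the home directory in step 3 and OpenJDK 11 was installed via apt; adjust the paths to match your system):

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop-3.4.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin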
Then save the file and close it. To apply the changes to the current terminal session, execute the source command.

Command: source .bashrc

5) Check that Java and Hadoop were installed successfully:


Command: java -version
Command: hadoop version

Fig 2

6) Edit the Hadoop Configuration files


We will be editing several of Hadoop's configuration files. Change the directory to hadoop-3.4.0/etc/hadoop and list its contents:
• Command: cd hadoop-3.4.0/etc/hadoop
• Command: ls

Fig 3

7) Open core-site.xml and edit the property shown below inside the configuration tag:

core-site.xml tells the Hadoop daemons where the NameNode runs in the cluster. It contains core Hadoop configuration settings, such as I/O settings common to HDFS and MapReduce. (The fs.defaultFS property below is the current name for what older guides call fs.default.name.)

Command: nano core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

8) Edit hdfs-site.xml and add the properties shown below inside the configuration tag:

hdfs-site.xml contains configuration settings for the HDFS daemons (i.e., NameNode, DataNode, and Secondary NameNode), including the replication factor and block size of HDFS. (dfs.permissions.enabled is the current name of the permission-checking switch.)

Command: nano hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>

9) Edit the mapred-site.xml file and add the property shown below inside the configuration tag:

mapred-site.xml contains configuration settings for MapReduce applications, such as the number of JVMs that can run in parallel, the memory available to the mapper and reducer processes, the CPU cores available to a process, etc.
In some cases the mapred-site.xml file is not present; if so, create it from the mapred-site.xml template:

Command: cp mapred-site.xml.template mapred-site.xml

Command: nano mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

10) Edit yarn-site.xml and add the properties shown below inside the configuration tag:

yarn-site.xml contains configuration settings for the ResourceManager and NodeManager, such as application memory limits and the auxiliary services required by MapReduce.

Command: nano yarn-site.xml

<?xml version="1.0">
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</ name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value> </property>

11) Edit hadoop-env.sh and add the Java Path as mentioned below:
hadoop-env.sh contains the environment variables used by the Hadoop scripts, such as the Java home path.

Command: nano hadoop-env.sh

Command: export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
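If you are unsure of the Java installation path, it can usually be discovered with the following (assuming OpenJDK was installed via apt):

Command: readlink -f /usr/bin/java

This prints the full path to the java binary; JAVA_HOME is that path without the trailing /bin/java.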

12) Go to the Hadoop home directory and format the NameNode.

Command: cd
Command: cd hadoop-3.4.0
Command: bin/hadoop namenode -format

This formats HDFS via the NameNode. Execute this command only the first time you set up the cluster: formatting the file system initializes the directory specified by the dfs.namenode.name.dir property. Never format an up-and-running Hadoop file system, or you will lose all data stored in HDFS.

13) Once the NameNode is formatted, go to the hadoop-3.4.0/sbin directory and start all the daemons.

Command: cd hadoop-3.4.0/sbin

You can start all the daemons with a single command, or start HDFS and YARN individually; both alternatives are shown below.

Command: ./start-all.sh
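Alternatively, to start HDFS and YARN individually:

Command: ./start-dfs.sh
Command: ./start-yarn.sh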

14) To check that all the Hadoop services are up and running, run the command below.

Command: jps
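If all services are up, the jps output should list roughly the following daemons alongside Jps itself (process IDs will vary):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager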

Fig 4

15) If everything above was done correctly, open a browser and go to http://localhost:8088/ to check the ResourceManager UI, and http://localhost:9870/ to check the NameNode UI.

Fig 5
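16) To run the word count application mentioned in the AIM, a typical sequence from the hadoop-3.4.0 directory is as follows (a sketch assuming a sample text file named input.txt in the home directory; the examples jar ships with the Hadoop distribution):

Command: bin/hdfs dfs -mkdir -p /wordcount/input
Command: bin/hdfs dfs -put ~/input.txt /wordcount/input
Command: bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar wordcount /wordcount/input /wordcount/output
Command: bin/hdfs dfs -cat /wordcount/output/part-r-00000

The final command prints each distinct word in the input alongside its number of occurrences.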

Result & Analysis:

Thus the Hadoop single node cluster was installed and simple applications were executed successfully. The word count application processes the input data and generates output; analyzing this output reveals word frequencies, identifies common words, and gives insight into the dataset.
