CMPN451 Big Data Analytics Lab 4 - MapReduce with Hadoop
Steps to install Hadoop:
1. Make sure java is installed.
java -version
If Java is not installed, then type in the following commands:
sudo apt-get update
sudo apt-get install default-jdk
Verify that Java is now installed.
java -version
2. Install the SSH server.
sudo apt-get install openssh-server
Generate a public/private RSA key pair.
ssh-keygen -t rsa -P ""
When prompted for the file name to save the key, press Enter (leave it blank).
Type the following commands:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost
exit
3. Install Hadoop by navigating to the following link and downloading the hadoop-3.3.0.tar.gz
binary package for Hadoop version 3.3.0 (or a later version if you wish). (478 MB)
[Link]
4. Once downloaded, open the terminal and cd to the directory where it is
downloaded (assume the Desktop, for example) and extract it as follows:
cd Desktop
sudo tar -xvzf hadoop-3.3.0.tar.gz
You can now check that there is an extracted directory named hadoop-3.3.0 by typing
the command "ls" or by visually inspecting the files.
5. Now, we move the extracted directory to the location /usr/local/hadoop
sudo mv hadoop-3.3.0 /usr/local/hadoop
6. Let’s configure the Hadoop environment variables.
Type the following command:
sudo gedit ~/.bashrc
At the end of the file, add the following lines: (Note: Replace the java version with the version
number you already have. You can check the exact directory name java-xx-openjdk-amd64 by
listing the directory /usr/lib/jvm, for example with the command ls /usr/lib/jvm.)
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
7. Save the file and close it.
8. Now from the terminal, type the following command:
source ~/.bashrc
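To confirm that the new variables are active, you can optionally check them:
echo $HADOOP_HOME
hadoop version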
9. We start configuring Hadoop by opening hadoop-env.sh as follows:
sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Search for the line starting with export JAVA_HOME= and replace it with the
following line.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
Save the file by clicking on “Save” or (Ctrl+S)
10. Open core-site.xml as follows:
sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S).
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
11. Open hdfs-site.xml as follows:
sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S).
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_space/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_space/hdfs/datanode</value>
</property>
12. Open yarn-site.xml as follows:
sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S)
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
13. Open mapred-site.xml as follows:
sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S)
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
14. Now, run the following commands on the terminal to create directories for the Hadoop
space, the namenode, and the datanode.
sudo mkdir -p /usr/local/hadoop_space
sudo mkdir -p /usr/local/hadoop_space/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_space/hdfs/datanode
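Since these directories were created with sudo, they are owned by root. If the later HDFS
commands fail with permission errors, you can take ownership of them (assuming you run Hadoop
as your current user), for example:
sudo chown -R $USER /usr/local/hadoop /usr/local/hadoop_space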
Now we have successfully installed Hadoop.
15. Format the namenode as follows:
hdfs namenode -format
This step should end with a log message indicating that the NameNode is shutting down (a SHUTDOWN_MSG line).
16. Before starting the Hadoop Distributed File System (HDFS), we need to
make sure that the rcmd type reported is "ssh", not "rsh", when we type the following
command:
pdsh -q -w localhost
17. If the reported rcmd type is "rsh", type the following
commands:
export PDSH_RCMD_TYPE=ssh
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Run the command from Step 16 again to check that the rcmd type is now ssh.
(If it was already ssh, you can skip this step.)
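Note that the export above only applies to the current terminal session. If you want the
setting to persist, you can append it to ~/.bashrc, for example:
echo "export PDSH_RCMD_TYPE=ssh" >> ~/.bashrc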
18. Start the HDFS daemons using the command:
start-dfs.sh
19. Start YARN using the command:
start-yarn.sh
20. Type the following command to list the running Java processes.
jps
Make sure these processes are listed: ResourceManager, NameNode,
NodeManager, SecondaryNameNode, Jps, and DataNode.
21. Go to localhost:9870 from the browser. You should see the NameNode web interface (the Overview page of the HDFS web UI).
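When you are done working with the cluster, you can stop the daemons again with the
corresponding stop scripts:
stop-yarn.sh
stop-dfs.sh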
Steps to run the WordCount program on Hadoop:
1. Make sure Hadoop and Java are installed properly
hadoop version
javac -version
2. Create a directory on the Desktop named Lab, and inside it create two folders:
one called "Input" and the other called "tutorial_classes".
[You can do this step normally using the GUI or through terminal commands.]
cd Desktop
mkdir Lab
mkdir Lab/Input
mkdir Lab/tutorial_classes
3. Add the file attached with this document, "WordCount.java", to the directory Lab.
4. Add the file attached with this document, "[Link]", to the directory Lab/Input.
5. Type the following command to export the hadoop classpath into bash.
export HADOOP_CLASSPATH=$(hadoop classpath)
Make sure it is now exported.
echo $HADOOP_CLASSPATH
6. It is time to create these directories on HDFS rather than locally. Type the
following commands.
hadoop fs -mkdir /WordCountTutorial
hadoop fs -mkdir /WordCountTutorial/Input
hadoop fs -put Lab/Input/[Link] /WordCountTutorial/Input
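You can also verify the upload from the terminal, for example:
hadoop fs -ls /WordCountTutorial/Input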
7. Go to localhost:9870 from the browser, Open “Utilities → Browse File System”
and you should see the directories and files we placed in the file system.
8. Then, go back to the local machine, where we will compile the WordCount.java file.
Assuming we are currently in the Desktop directory:
cd Lab
javac -classpath $HADOOP_CLASSPATH -d tutorial_classes WordCount.java
Put the output files in one jar file (there is a dot at the end of the command).
jar -cvf [Link] -C tutorial_classes .
9. Now, we run the jar file on Hadoop.
hadoop jar [Link] WordCount /WordCountTutorial/Input /WordCountTutorial/Output
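Note that the job creates the /WordCountTutorial/Output directory itself and will fail if it
already exists. If you need to rerun the job, delete the old output first, for example:
hadoop fs -rm -r /WordCountTutorial/Output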
10. Output the result:
hadoop fs -cat /WordCountTutorial/Output/*
Requirement:
Vodafone Egypt is launching a marketing campaign in Ramadan to promote its sales and
increase its profit from selling prepaid recharge cards. These cards
are worth 5, 10, 15, 50, and 100 EGP.
The data science team at Vodafone is analyzing the customers' data, which includes the
customers' personal information, the prepaid cards they purchased, and the timestamps at
which they registered the prepaid amounts on their Vodafone accounts, among other
information.
The details of the customers are omitted, and you are only provided with a file "[Link]"
which includes two columns:
1. Customer ID. (Each ID maps to a certain customer, whose data is hidden for
confidentiality).
2. Prepaid Card Amount.
Your task is to generate a report using MapReduce (similar to the WordCount
program) showing, for each customer, the total amount of the prepaid cards they have
purchased. For example, if the customer with ID 300 purchased 5 cards with amounts 10, 15,
15, 10, and 100, then the report should state that customer ID 300 bought cards with a
total amount of 150.
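The following is a minimal starting-point sketch for this task, not the required solution
itself. It assumes the input file contains one purchase per line as two comma-separated
columns (customer ID, prepaid card amount); the class and file names (CardTotal,
CardTotal.java) are placeholders you can rename, and the compile, package, and run steps
mirror the WordCount steps above.

// CardTotal.java -- sketch: sums prepaid card amounts per customer ID.
// Assumes input lines of the form "customerID,amount".
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CardTotal {

    // Mapper: emits (customer ID, card amount) for each input line.
    public static class PurchaseMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed format: two comma-separated columns (ID, amount).
            String[] fields = value.toString().trim().split(",");
            if (fields.length >= 2) {
                context.write(new Text(fields[0].trim()),
                              new IntWritable(Integer.parseInt(fields[1].trim())));
            }
        }
    }

    // Reducer: sums all amounts seen for the same customer ID.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : values) {
                total += v.get();
            }
            context.write(key, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "card total");
        job.setJarByClass(CardTotal.class);
        job.setMapperClass(PurchaseMapper.class);
        job.setCombinerClass(SumReducer.class);   // summing is associative, so the reducer can double as a combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));     // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // HDFS output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would compile this with javac against $HADOOP_CLASSPATH, package it into a jar, and run
it with hadoop jar, exactly as in the WordCount steps, passing the HDFS input and output
directories as the two arguments.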
Disclaimer: Thanks to the Vodafone DS team, who provided us with this real customer data.