Hadoop Installation Step by Step

The document outlines the 4 main steps to install Hadoop on a single node Linux cluster: 1) Install Java, which is required to run Hadoop 2) Download and extract the Hadoop binary files 3) Configure Hadoop's Java home directory in the configuration files 4) Run Hadoop and an example MapReduce job to test the installation

Uploaded by

Umesh Nagar


HADOOP INSTALLATION STEP BY STEP PROCESS ON LINUX (SINGLE NODE CLUSTER)

Step 1 — Installing Java

First, we need to update the package list:

 sudo apt-get update

Next, install the default Java Development Kit (JDK) for your Ubuntu version:

 sudo apt-get install default-jdk

Once the installation is complete, let's check the version.

 java -version

Output

java version "1.8.0_151"

Java(TM) SE Runtime Environment (build 1.8.0_151-b12)

Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

This output verifies that Java has been successfully installed. The exact vendor and version strings will depend on the package installed on your system.

Step 2 — Installing Hadoop

With Java in place, we'll visit the Apache Hadoop Releases page to find the most recent
stable release. Follow the link to the binary for the current release.

Design and modified by Mr. Amitava Choudhury and Ms. Ambika Agarwal, Assistant Professor, UPES.
Ref. www.digitalocean.com

Alternatively, fetch the archive directly from the terminal. On the server, we'll use
wget to download it:

 wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz

Note: The Apache website will direct you to the best mirror dynamically, so your URL may
not match the URL above.
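
Before extracting, it is worth checking that the download was not corrupted. The following is a minimal sketch of a checksum comparison, not the exact procedure from the original guide. It assumes a SHA-256 digest is published alongside the release and that sha256sum is available; the file used here is a temporary stand-in so the sketch runs anywhere.

```shell
# Stand-in file so the sketch runs anywhere; in practice this would be hadoop-2.9.0.tar.gz
archive=$(mktemp)
printf 'example archive contents\n' > "$archive"

# In practice, copy the expected digest from the checksum file published with the release.
# Here it is computed from the stand-in file purely for illustration (an assumption).
expected=$(sha256sum "$archive" | awk '{print $1}')

# Recompute the digest of the downloaded file and compare the two values
actual=$(sha256sum "$archive" | awk '{print $1}')
if [ "$actual" = "$expected" ]; then
    result="checksum OK"
else
    result="checksum MISMATCH"
fi
echo "$result"
```

If the two digests differ, download the file again, preferably from a different mirror.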

Once the download is complete and you are satisfied that the file wasn't corrupted or
changed, we'll use the tar command with the -x flag to extract, -z to uncompress, -v for
verbose output, and -f to specify that we're extracting from a file. Use tab-completion or
substitute the correct version number in the command below:

tar -xzvf hadoop-2.9.0.tar.gz
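
To see what each tar flag does without touching the real download, the sketch below builds a tiny stand-in archive in a temporary directory, lists it, and extracts it. The names demo-1.0 and tool are hypothetical placeholders; -t (list) and -C (choose a directory) are standard tar options that the command above does not use.

```shell
# Build a small stand-in archive (names are illustrative, not part of Hadoop)
work=$(mktemp -d)
mkdir -p "$work/demo-1.0/bin"
echo "placeholder" > "$work/demo-1.0/bin/tool"
tar -czf "$work/demo-1.0.tar.gz" -C "$work" demo-1.0

# -t lists the archive's contents without extracting anything
listing=$(tar -tzf "$work/demo-1.0.tar.gz")
echo "$listing"

# -x extracts, -z uncompresses, -v prints each name, -f names the archive file;
# -C selects the directory to extract into
mkdir "$work/out"
tar -xzvf "$work/demo-1.0.tar.gz" -C "$work/out"
```

Listing first is a cheap way to confirm the top-level directory name before it lands in your working directory.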

Finally, we'll move the extracted files into /usr/local, the appropriate place for locally
installed software. Change the version number, if needed, to match the version you
downloaded.

 sudo mv hadoop-2.9.0 /usr/local/hadoop

With the software in place, we're ready to configure its environment.

Step 3 — Configuring Hadoop's Java Home

Hadoop requires that you set the path to Java, either as an environment variable or in
the Hadoop configuration file.
The path to Java, /usr/bin/java, is a symlink to /etc/alternatives/java, which is
in turn a symlink to the default Java binary. We will use readlink with the -f flag to follow
every symlink in every part of the path, recursively. Then, we'll use sed to
trim bin/java from the output to give us the correct value for JAVA_HOME.

To find the default Java path:

 readlink -f /usr/bin/java | sed "s:bin/java::"

Output
/usr/lib/jvm/java-8-openjdk-amd64/jre/
You can copy this output to set Hadoop's Java home to this specific version, which
ensures that if the default Java changes, this value will not. Alternatively, you can use
the readlink command dynamically in the file so that Hadoop will automatically use
whatever Java version is set as the system default.
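
The readlink and sed pipeline can be tried safely on a stand-in layout. The sketch below creates a fake JDK tree and a two-level symlink chain in a temporary directory. All paths in it are hypothetical and only mimic the /usr/bin/java to /etc/alternatives/java chain described above.

```shell
# Stand-in layout: a fake JDK tree and a two-level symlink chain (illustrative paths)
demo=$(mktemp -d)
mkdir -p "$demo/jvm/java-8-demo/jre/bin"
touch "$demo/jvm/java-8-demo/jre/bin/java"
ln -s "$demo/jvm/java-8-demo/jre/bin/java" "$demo/alternatives-java"
ln -s "$demo/alternatives-java" "$demo/java"

# readlink -f follows every link until it reaches the real file
resolved=$(readlink -f "$demo/java")
echo "$resolved"

# sed deletes the first "bin/java" on the line, leaving the directory to use as JAVA_HOME
java_home=$(echo "$resolved" | sed "s:bin/java::")
echo "$java_home"
```

The colons in the sed expression are simply delimiters. Using them instead of slashes avoids having to escape the slash inside bin/java.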

To begin, open hadoop-env.sh:

 sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Then, choose one of the following options:

Option 1: Set a Static Value

/usr/local/hadoop/etc/hadoop/hadoop-env.sh

. . .
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
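
The dynamic alternative mentioned earlier is not spelled out in this handout. A minimal sketch of it, assuming the same readlink and sed pipeline is placed directly in hadoop-env.sh in place of the static line, might look like this:

```shell
# /usr/local/hadoop/etc/hadoop/hadoop-env.sh (sketch of the dynamic alternative)
# Resolve the system default Java each time Hadoop starts,
# instead of pinning JAVA_HOME to one specific version
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
```

Use either the static or the dynamic line, not both. With the dynamic form, a change to the system default Java is picked up automatically.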

Step 4 — Running Hadoop

Now we should be able to run Hadoop:

 /usr/local/hadoop/bin/hadoop

Output

Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
  where COMMAND is one of:
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

The help output means we've successfully configured Hadoop to run in stand-alone mode.

To confirm that Hadoop installed successfully, check its version:

/usr/local/hadoop/bin/hadoop version

If everything is all right, it will print output like the following:

Hadoop 2.9.0

Subversion https://[email protected]/repos/asf/hadoop.git
-r 18065c2b6806ed4aa6a3187d77cbe21bb3dba075

Compiled by kshvachk on 2017-12-16T01:06Z

Compiled with protoc 2.9.0

From source with checksum 9f118f95f47043332d51891e37f736e9

We'll ensure that it is functioning properly by running the example MapReduce program
it ships with. To do so, create a directory called input in our home directory:

 mkdir ~/input

Create a text file (e.g. text.txt) inside the input folder. Here our file is named text.txt
and it contains some sample text: "Hi, his name is Himadri, his favorite place is Dehradun
which is located at Uttarakhand. This information will help you to search hi in all word
containing hi".

Now copy Hadoop's configuration files into input to use those files as our data.

 cp /usr/local/hadoop/etc/hadoop/*.xml ~/input

Next, we can use the following command to run the MapReduce hadoop-mapreduce-examples
program, a Java archive with several options. We'll invoke its grep program,
one of many examples included in hadoop-mapreduce-examples, followed by the input
file, ~/input/text.txt, and the output directory, ~/output. The MapReduce grep program
will count the matches of a literal word or regular expression. Finally, we'll supply a regular
expression to find occurrences of hi, optionally followed by one or more periods. The
expression is case-sensitive, so it will not match Hi where it is capitalized at the
beginning of a sentence:

 /usr/local/hadoop/bin/hadoop jar \
   /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar \
   grep ~/input/text.txt ~/output 'hi[.]*'
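
To get a feel for what the regular expression matches before running the job, the counting can be roughly imitated with ordinary shell tools. This is only an analogy, not how Hadoop implements its grep example, and the sample sentence is a made-up stand-in rather than the contents of text.txt.

```shell
# Stand-in sample text (illustrative, not the tutorial's text.txt)
sample='hi there. this is hi. Hi again'

# grep -o prints each match on its own line; sort | uniq -c counts identical matches;
# sort -rn puts the most frequent match first
counts=$(printf '%s\n' "$sample" | grep -o 'hi[.]*' | sort | uniq -c | sort -rn)
echo "$counts"
```

Here `[.]*` means zero or more literal periods, so the pattern matches hi anywhere, including inside a longer word such as this. For the sample sentence the result is a count of 2 for hi and 1 for hi. (with the trailing period), while the capitalized Hi is not counted.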

When the task completes, it provides a summary of what has been processed and
any errors it has encountered, but this doesn't contain the actual results.

Output
. . .
File System Counters
FILE: Number of bytes read=1184518
FILE: Number of bytes written=2348362
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=1

Map output bytes=11
Map output materialized bytes=19
Input split bytes=114
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=19
Reduce input records=1
Reduce output records=1
Spilled Records=2
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=62
Total committed heap usage (bytes)=336732160

Note: If the output directory already exists, the program will fail, and rather than seeing the
summary, the output will look something like:
Output
. . .
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
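
One way to avoid this failure is to remove (or rename) a leftover output directory before running the job again. The sketch below uses a temporary stand-in directory, so it does not touch any real results. In the tutorial's setup the directory would be ~/output.

```shell
# Hypothetical stand-in for ~/output so the sketch does not delete real results
out_dir="$(mktemp -d)/output"
mkdir -p "$out_dir"

# Remove a leftover output directory before re-running the job
if [ -d "$out_dir" ]; then
    rm -r "$out_dir"
fi
```

If the earlier results matter, moving the directory aside with mv is a safer choice than deleting it.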
Results are stored in the output directory and can be checked by running cat on the
files in that directory:

 cat ~/output/*

Output
7 hi
The output lists each distinct match together with its count. Here the MapReduce task
found 7 occurrences of hi in the text file. Running the example program has verified that
our stand-alone installation is working properly and that non-privileged users on the
system can run Hadoop for exploration or debugging.
