Install Hadoop
Before starting, you can check whether Java and Hadoop are already installed on the system:
java -version
hadoop version
Running sudo apt update and sudo apt upgrade -y ensures that all system packages are up
to date. This step prevents dependency issues while installing new software like Java and
Hadoop. Updating the package list ensures we get the latest versions, and upgrading
applies security patches and software improvements.
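For reference, the update commands described above (assuming a Debian/Ubuntu system with apt):
sudo apt update
sudo apt upgrade -y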
Hadoop requires Java to execute its processes. OpenJDK 11 is a stable, widely used version
that works well with Hadoop 3.x. By installing it with sudo apt install openjdk-11-jdk -y, we
ensure that Hadoop has the necessary Java runtime environment. This step is crucial
because, without Java, Hadoop will not function.
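The installation command mentioned above:
sudo apt install openjdk-11-jdk -y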
java -version
Step 4: Download and Extract Hadoop
Hadoop is downloaded from Apache’s official website using wget. The command fetches
the Hadoop package (hadoop-3.3.6.tar.gz), which is then extracted using tar -xvzf. This
unpacks Hadoop into a directory. Finally, the extracted folder is moved to
/usr/local/hadoop, a common location for system-wide software installations. This makes
Hadoop easily accessible to all users on the system.
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
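The extraction and move steps described above, as a sketch; the target path /usr/local/hadoop follows the paragraph, while the use of sudo and the extracted folder name hadoop-3.3.6 are assumptions based on the archive name:
tar -xvzf hadoop-3.3.6.tar.gz
sudo mv hadoop-3.3.6 /usr/local/hadoop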
After installation, we need to configure environment variables to make Hadoop and Java
easily executable from any terminal session. This is done by adding the Hadoop and Java
paths to ~/.bashrc. We define JAVA_HOME, HADOOP_HOME, PATH, and
HADOOP_CONF_DIR, ensuring that the system recognizes Hadoop commands without
requiring full paths.
nano ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Once the environment variables are added, they need to be applied for the current
session. Running source ~/.bashrc reloads the bash profile so that changes take effect
immediately without needing to restart the terminal. This ensures that any Hadoop-
related commands work as expected.
source ~/.bashrc
Hadoop requires SSH (Secure Shell) for communication between nodes in a distributed
environment. Even in a single-node setup, SSH is needed to start and stop Hadoop services
without manually logging in each time. This step is essential because Hadoop’s daemons
interact over SSH.
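If an SSH server is not already installed (an assumption about the base system, not a step from the original instructions), it can be added with the standard Ubuntu package:
sudo apt install openssh-server -y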
To enable password-less SSH login, we generate an SSH key pair using ssh-keygen -t rsa -P
"" -f ~/.ssh/id_rsa. The public key is then added to the authorized_keys file using cat
~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys. This setup allows Hadoop daemons to
communicate securely without repeatedly asking for passwords, which is crucial for
automation.
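The key-generation commands described above, collected for convenience:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys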
ssh localhost
Core-Site Configuration:
The core-site.xml file specifies Hadoop’s core settings. The fs.defaultFS property is set to
hdfs://localhost:9000, defining HDFS as the default Hadoop filesystem. The
hadoop.tmp.dir property sets a temporary directory for Hadoop’s intermediate
operations. This configuration is necessary to initialize and manage HDFS correctly.
nano $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
HDFS-Site Configuration:
The hdfs-site.xml file configures the Hadoop Distributed File System (HDFS). The
dfs.replication property is set to 1, meaning each file block is stored only once, which is
appropriate for a single-node setup. The dfs.namenode.name.dir and dfs.datanode.data.dir
properties specify the directories for storing metadata and actual file data, ensuring proper
data organization.
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/hdfs/datanode</value>
</property>
</configuration>
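The directories referenced in hdfs-site.xml may need to exist and be writable by the user running Hadoop. Creating them up front, as sketched below, is an assumption about your setup rather than a step from the original instructions:
sudo mkdir -p /usr/local/hadoop/hdfs/namenode /usr/local/hadoop/hdfs/datanode
sudo chown -R $USER:$USER /usr/local/hadoop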
MapReduce Configuration:
The mapred-site.xml file controls how MapReduce jobs are executed. The
mapreduce.framework.name property is set to yarn, so jobs run on the YARN resource
management layer rather than locally. The mapreduce.jobhistory.address property points to
the JobHistory Server at localhost:10020, which keeps records of completed jobs.
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>
</property>
</configuration>
YARN Configuration:
The yarn-site.xml file sets up YARN, the resource management layer of Hadoop. The
yarn.resourcemanager.hostname property is set to localhost, defining where the
ResourceManager will run. The yarn.nodemanager.aux-services property is set to
mapreduce_shuffle, enabling the shuffle phase of MapReduce jobs. These settings ensure
that YARN can schedule and execute tasks.
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Before starting Hadoop for the first time, the NameNode must be formatted using hdfs
namenode -format. This command initializes the HDFS metadata and clears any previous
data. Without formatting, the system might face inconsistencies, preventing Hadoop from
functioning correctly. This step is only required for the first setup.
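The formatting command from the paragraph above:
hdfs namenode -format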
To launch Hadoop, we run start-dfs.sh to start HDFS services (NameNode and DataNode)
and start-yarn.sh to start YARN (ResourceManager and NodeManager). These scripts
initialize the distributed storage and resource management layers of Hadoop. Running
them ensures that the cluster is up and ready for processing tasks.
start-dfs.sh
start-yarn.sh
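Because mapred-site.xml points the JobHistory Server at localhost:10020, you may also want to start that daemon; this is optional and not part of the original steps:
mapred --daemon start historyserver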
After setting up Hadoop, we use the jps command to list all running Java processes. This
helps verify if essential Hadoop daemons like NameNode, DataNode, ResourceManager,
and NodeManager are running properly. If any service is missing, troubleshooting is
needed before proceeding.
jps
Expected output:
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Step 11: Hadoop Web Interfaces
By default in Hadoop 3.x, the NameNode web UI is available at http://localhost:9870 and the
YARN ResourceManager UI at http://localhost:8088. These web UIs are useful for
troubleshooting and observing cluster activity.
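A quick way to confirm the UIs are reachable from a terminal, assuming the default Hadoop 3.x ports above:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870   # NameNode web UI
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088   # ResourceManager web UI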