Experiment 1 Hadoop Installation
Installation of Hadoop
Aim: To install Java and Hadoop.
Introduction:
The Apache Hadoop software library is a framework that allows for the distributed processing of
large data sets across clusters of computers using simple programming models. It is designed to
scale up from single servers to thousands of machines, each offering local computation and storage.
Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and
handle failures at the application layer, so delivering a highly-available service on top of a cluster of
computers, each of which may be prone to failures.
Modules:
Hadoop includes these main modules ( Components):
Hadoop Common: The common utilities and libraries that support the other Hadoop
modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data. It is responsible for persisting data to disk. HDFS is a file system that is distributed over numerous nodes.
Hadoop YARN: A framework for job scheduling and cluster resource management. Yet Another Resource Negotiator (YARN) acts as the "operating system" of Hadoop, managing and allocating cluster resources among the applications that run on it.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets. It is a framework for developing applications that handle massive amounts of data. It distributes work across the cluster (the map phase), then gathers and reduces the results from the nodes into a response to a query (the reduce phase). Many other processing models are also available for the 3.x line of Hadoop.
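The map-then-reduce flow described above can be imitated locally with ordinary shell pipes (a toy illustration only, not Hadoop itself): each word is emitted as a key (map), sorting brings equal keys together (shuffle), and counting collapses each group (reduce).

```shell
# Toy word count mirroring the MapReduce phases with plain shell tools:
#   tr = map (emit one word per line), sort = shuffle, uniq -c = reduce.
printf 'deer bear river\ncar car river\ndeer car bear\n' |
  tr ' ' '\n' |
  sort |
  uniq -c
```

A real MapReduce job performs the same three phases, but with the map and reduce steps running in parallel on many nodes.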
Who Uses Hadoop?
A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop PoweredBy wiki page.
Hadoop is a Java-based programming framework that supports the processing and storage of
extremely large datasets on a cluster of inexpensive machines. It was the first major open source
project in the big data playing field and is sponsored by the Apache Software Foundation.
Hadoop has been used for machine learning and data mining. It is also used for managing multiple dedicated servers.
ssh localhost
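If `ssh localhost` prompts for a password, passwordless login can be set up first with a passphrase-less key (a common sketch; the paths shown are the OpenSSH defaults and may differ on your system):

```shell
# Create a passphrase-less RSA key pair (if none exists) and authorize it
# for logins to this machine, so Hadoop's scripts can ssh without prompts.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -P '' -f "$HOME/.ssh/id_rsa" -q
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, `ssh localhost` should log in without asking for a password.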
6. To keep the Hadoop logs separate, create a logs directory inside /usr/local/hadoop:
sudo mkdir /usr/local/hadoop/logs
9. After executing the above command, the nano editor opens in your terminal; paste in the
following lines:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS=" -Djava.library.path=$HADOOP_HOME/lib/native"
After copying in the lines above, press CTRL + O and Enter to save (CTRL + S also works in recent versions of nano), then CTRL + X to exit.
10. After closing the nano editor, use the following command to load the environment variables.
source ~/.bashrc
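A quick self-contained check that the variables resolve as intended (the two key values are repeated inline here so the snippet stands alone even before ~/.bashrc is edited):

```shell
# Re-declare the two key variables and confirm PATH now includes the
# Hadoop tool directories added in the previous step.
export HADOOP_HOME=/usr/local/hadoop
export PATH="$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin"
echo "HADOOP_HOME=$HADOOP_HOME"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "hadoop bin is on PATH" ;;
  *)                      echo "hadoop bin missing from PATH" ;;
esac
```

If the second line prints "missing", re-check the lines pasted into ~/.bashrc and source it again.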
which javac
readlink -f /usr/bin/javac
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"
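The `which javac` and `readlink` commands above reveal where the JDK lives; the derived directory (minus the trailing /bin/javac) is what JAVA_HOME should point to. A small sketch that automates this (the fallback path is the Ubuntu OpenJDK 8 location used in this manual, an assumption for other systems):

```shell
# Derive JAVA_HOME from whichever javac is on PATH; if none is found,
# fall back to the Ubuntu OpenJDK 8 location assumed by this manual.
if command -v javac >/dev/null 2>&1; then
  JAVAC_PATH="$(readlink -f "$(command -v javac)")"
  export JAVA_HOME="${JAVAC_PATH%/bin/javac}"
else
  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
fi
echo "$JAVA_HOME"
```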
Now, copy and paste the following command into your terminal to download the javax.activation JAR:
sudo wget https://jcenter.bintray.com/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar
14. Verify your Hadoop installation by typing:
hadoop version
After a successful installation, you will see output similar to the following:
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.1.jar
15. Make a directory for node metadata storage and give it hadoop's ownership.
In the commands below, replace the word "hadoop" with your system username, e.g.
csm-1, a213a-6, etc.
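A sketch of the directory setup this step describes. The layout below is a typical choice, not one fixed by this manual; a scratch base directory is used so the snippet runs without root, whereas on a real node you would use a stable path such as /home/hadoop/hadoopdata and prefix the mkdir/chown commands with sudo, substituting your own username for "hadoop".

```shell
# Typical NameNode/DataNode metadata layout (paths are an assumption).
# BASE is a scratch directory here; on a real node use something like
#   sudo mkdir -p /home/hadoop/hadoopdata/hdfs/{namenode,datanode}
#   sudo chown -R hadoop:hadoop /home/hadoop/hadoopdata
BASE="$(mktemp -d)/hadoopdata"
mkdir -p "$BASE/hdfs/namenode" "$BASE/hdfs/datanode"
chown -R "$(id -un)" "$BASE"
ls "$BASE/hdfs"
```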
Starting resourcemanager
resourcemanager is running as process 2676. Stop it first and ensure /tmp/hadoop-dkrao-resourcemanager.pid file is empty before retry.
Starting nodemanagers
15921 NodeManager
16035 Jps
2676 ResourceManager
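The "Stop it first" message above means an earlier ResourceManager left its pid file behind in /tmp. The usual remedy is to stop YARN, delete the stale pid file, and start YARN again; the snippet below simulates this with a dummy file so it runs without a cluster (the real file name follows the /tmp/hadoop-<user>-resourcemanager.pid pattern shown in the log).

```shell
# Simulate clearing a stale ResourceManager pid file. On a real node:
#   stop-yarn.sh
#   rm -f /tmp/hadoop-<user>-resourcemanager.pid
#   start-yarn.sh
PID_FILE="$(mktemp /tmp/hadoop-demo-resourcemanager.pid.XXXXXX)"
echo 2676 > "$PID_FILE"   # stale pid recorded by the dead daemon
rm -f "$PID_FILE"         # remove it before retrying start-yarn.sh
[ ! -e "$PID_FILE" ] && echo "stale pid file cleared"
```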
20. To stop all running Hadoop daemons, execute:
stop-all.sh
References:
https://hadoop.apache.org/
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
https://cwiki.apache.org/confluence/display/HADOOP2/Hadoop2OnWindows