
Experiment 01 PDF

The experiment aimed to set up a Hadoop single-node cluster and compare Hadoop versions 1.x, 2.x and 3.x. A single-node cluster was set up with Hadoop 3.x by installing Java, configuring SSH, installing Hadoop and editing its configuration files. Hadoop 1.x introduced MapReduce and HDFS but supported only single tenancy, while 2.x added YARN for better resource management and multi-tenancy. Hadoop 3.x further improved scalability and supports resource types beyond CPU and memory.

Uploaded by

Kaushik Shukla
Copyright © All Rights Reserved

Experiment 01 -222051017

16 April 2023 13:45

Aim: Set up a Hadoop single-node cluster. Compare Hadoop 1.x, 2.x and 3.x.

Theory:
The Apache Hadoop software library provides a framework for the distributed processing of massive data volumes across clusters of computers. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware to provide high availability, the library itself is designed to detect and handle failures at the application layer. As a result, it delivers a highly available service on top of a cluster of computers, each of which may be prone to failures.
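The idea of handling failures in software rather than hardware can be illustrated with a small, single-machine Python sketch of the MapReduce pattern: the input is split into chunks, a map function runs on each chunk in parallel, a failed task is simply re-executed, and a reduce step merges the partial results. This is only an in-process simulation, not Hadoop's API. The names `map_split`, `run_with_retries`, `word_count` and `flaky_mapper` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_split(split: str) -> Counter:
    """Map phase: count the words in one input split (a stand-in for an HDFS block)."""
    return Counter(split.lower().split())


def run_with_retries(task, arg, max_attempts: int = 3):
    """Re-run a failed task: failures are handled in software, not by the hardware."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task(arg)
        except RuntimeError as err:  # a simulated worker failure
            last_error = err
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def word_count(splits, mapper=map_split, workers: int = 4) -> Counter:
    """Run the map tasks in parallel, then reduce by summing the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: run_with_retries(mapper, s), splits))
    total = Counter()
    for partial in partials:  # reduce phase
        total.update(partial)
    return total


if __name__ == "__main__":
    failed_once = set()

    def flaky_mapper(split: str) -> Counter:
        # Fail the first attempt on every split to mimic an unreliable node.
        if split not in failed_once:
            failed_once.add(split)
            raise RuntimeError("simulated worker crash")
        return map_split(split)

    print(word_count(["big data", "big cluster"], mapper=flaky_mapper))
```

Each split's first attempt fails, yet the job still completes because the retry happens in the framework code. This is the same principle Hadoop applies when it reschedules tasks from failed nodes.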

Implementation:

A) Setup of Hadoop Single Node cluster:

a. Install a supported version of Java (Hadoop 3.x requires at least Java 8).
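You can check the installed Java version against the minimum listed in the comparison table (Java 8 for Hadoop 3.x). The following is a minimal Python sketch. It assumes `java` is on the PATH, and `java_major_version` and `installed_java_major` are hypothetical helper names. `java -version` writes to standard error. The sketch checks only the lower bound, and not every newer Java release is supported by every Hadoop 3.x release.

```python
import re
import subprocess
from typing import Optional

# Minimum from the comparison table: Java 8 for Hadoop 3.x.
MIN_JAVA_FOR_HADOOP_3 = 8


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        return None
    parts = re.findall(r"\d+", match.group(1))
    if not parts:
        return None
    # Java 8 and earlier report "1.8.0_xxx"; Java 9+ report "11.0.x", "17.0.x", ...
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(parts[0])


def installed_java_major() -> Optional[int]:
    """Run `java -version` (which prints to stderr) and parse the result."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None  # java is not on PATH
    return java_major_version(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("Java not found")
    elif major < MIN_JAVA_FOR_HADOOP_3:
        print(f"Java {major} is older than the Hadoop 3.x minimum")
    else:
        print(f"Java {major} found")
```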

b. Since I want to manage the Hadoop files independently, I create a separate user named 'hadoop', switch from angela_user to the hadoop user, and make sure this newly created user is a member of the group.
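You can double-check the new user's group membership with Python's standard `pwd` and `grp` modules (Unix only). This is a minimal sketch. `groups_of` and `user_in_group` are hypothetical helpers. The group name in the usage line is an assumption, because the required group is not named above.

```python
import grp
import pwd


def groups_of(username: str) -> set:
    """Names of all groups `username` belongs to (primary plus supplementary)."""
    primary_gid = pwd.getpwnam(username).pw_gid  # KeyError if the user does not exist
    names = {grp.getgrgid(primary_gid).gr_name}
    names.update(g.gr_name for g in grp.getgrall() if username in g.gr_mem)
    return names


def user_in_group(username: str, group: str) -> bool:
    """True if the user exists and belongs to the given group."""
    try:
        return group in groups_of(username)
    except KeyError:
        return False  # unknown user (or a primary gid with no group entry)


if __name__ == "__main__":
    # "hadoop" as the group name is an assumption; substitute the group you use.
    print(user_in_group("hadoop", "hadoop"))
```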

Big Data Lab Page 1


c. Next, we configure password-less SSH.
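Password-less SSH is usually set up in two steps. First, generate a key pair with an empty passphrase. Then append the public key to `~/.ssh/authorized_keys`. The Python sketch below checks that the public key really is listed there. It assumes the key file is `~/.ssh/id_rsa.pub`, and `key_fields` and `key_is_authorized` are hypothetical helpers. The definitive test is still that `ssh localhost` logs in without a password prompt.

```python
from pathlib import Path


def key_fields(line: str):
    """Return (key type, base64 body) from an OpenSSH public-key line, ignoring comments/options."""
    parts = line.split()
    for i, part in enumerate(parts[:-1]):
        if part.startswith(("ssh-", "ecdsa-", "sk-")):
            return part, parts[i + 1]
    return None


def key_is_authorized(public_key: str, authorized_keys: str) -> bool:
    """True if the public key appears in the authorized_keys content."""
    wanted = key_fields(public_key)
    if wanted is None:
        return False
    return any(
        key_fields(line) == wanted
        for line in authorized_keys.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


if __name__ == "__main__":
    ssh_dir = Path.home() / ".ssh"
    pub = ssh_dir / "id_rsa.pub"  # assumed key file name
    auth = ssh_dir / "authorized_keys"
    if pub.exists() and auth.exists():
        print(key_is_authorized(pub.read_text(), auth.read_text()))
    else:
        print("key pair or authorized_keys not found")
```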



d. Install and configure Apache Hadoop as the hadoop user.
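After you extract the Hadoop archive for the hadoop user, a quick sanity check is that the expected scripts and configuration files are present. This is a minimal Python sketch. It assumes `HADOOP_HOME` gives the install location and falls back to a hypothetical `~/hadoop`. The file list reflects what we expect in a Hadoop 3.x binary distribution and may need adjusting for other releases. `missing_files` is a hypothetical helper.

```python
import os
from pathlib import Path

# Files we expect in a Hadoop 3.x binary distribution (relative to HADOOP_HOME).
EXPECTED = [
    "bin/hadoop",
    "bin/hdfs",
    "bin/yarn",
    "sbin/start-dfs.sh",
    "sbin/start-yarn.sh",
    "etc/hadoop/hadoop-env.sh",
    "etc/hadoop/core-site.xml",
    "etc/hadoop/hdfs-site.xml",
    "etc/hadoop/mapred-site.xml",
    "etc/hadoop/yarn-site.xml",
]


def missing_files(hadoop_home) -> list:
    """List the expected files that are absent under hadoop_home."""
    root = Path(hadoop_home)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    # The ~/hadoop fallback is an assumption about where the archive was extracted.
    home = os.environ.get("HADOOP_HOME", str(Path.home() / "hadoop"))
    missing = missing_files(home)
    print("layout OK" if not missing else f"missing: {missing}")
```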

e. Next, we configure the Java environment variables and then edit core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.
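Each `*-site.xml` file holds a list of `<property>` entries, each with a `<name>` and a `<value>`, inside a `<configuration>` root. The sketch below generates such files from Python dictionaries using `xml.etree.ElementTree`. The property values are the minimal pseudo-distributed settings from the Apache single-node setup guide (`fs.defaultFS`, `dfs.replication`, `mapreduce.framework.name`, `yarn.nodemanager.aux-services`). The JDK path is a hypothetical example. Adapt both to the actual machine.

```python
import xml.etree.ElementTree as ET

# Minimal single-node ("pseudo-distributed") settings; host, port and values are assumptions.
SITE_CONFIG = {
    "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},
    "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
    "yarn-site.xml": {"yarn.nodemanager.aux-services": "mapreduce_shuffle"},
}


def to_hadoop_xml(properties: dict) -> str:
    """Render name/value pairs in Hadoop's <configuration><property> format."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-printing; requires Python 3.9+
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def hadoop_env_line(java_home: str) -> str:
    """The JAVA_HOME export line for etc/hadoop/hadoop-env.sh."""
    return f"export JAVA_HOME={java_home}"


if __name__ == "__main__":
    for filename, props in SITE_CONFIG.items():
        print(f"--- {filename}")
        print(to_hadoop_xml(props))
    # Hypothetical JDK location; use the path of the JDK installed in step a.
    print(hadoop_env_line("/usr/lib/jvm/java-8-openjdk-amd64"))
```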



f. Format the HDFS NameNode as shown above and validate the Hadoop configuration. We then launch the NameNode, DataNode, YARN ResourceManager and NodeManager.
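You can script the format-and-start sequence. This is a minimal Python sketch. It assumes `$HADOOP_HOME/bin` and `$HADOOP_HOME/sbin` are on the PATH. `bring_up` is a hypothetical helper that defaults to a dry run. Run `hdfs namenode -format` only once, on a fresh cluster, because it wipes existing HDFS metadata.

```python
import subprocess

# Standard Hadoop 3.x commands; assumes $HADOOP_HOME/bin and $HADOOP_HOME/sbin are on PATH.
STARTUP_COMMANDS = [
    ["hdfs", "namenode", "-format"],  # one-time step: formats HDFS metadata
    ["start-dfs.sh"],                 # starts NameNode, DataNode, SecondaryNameNode
    ["start-yarn.sh"],                # starts ResourceManager, NodeManager
]


def bring_up(dry_run: bool = True, format_namenode: bool = False) -> list:
    """Return, and unless dry_run also execute, the commands in order."""
    commands = STARTUP_COMMANDS if format_namenode else STARTUP_COMMANDS[1:]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return [" ".join(cmd) for cmd in commands]


if __name__ == "__main__":
    for line in bring_up(dry_run=True, format_namenode=True):
        print(line)
```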

g. To verify the running components, we run jps (the Java Virtual Machine Process Status Tool), as shown above.
The Hadoop web dashboard can then be opened using the machine's IP address (or localhost) and the NameNode web UI port.

Example: http://localhost:9870/
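You can automate both checks in step g. Parse the `jps` output for the five Hadoop daemons, then build and probe the web UI URLs. This is a minimal Python sketch. The daemon list assumes a single-node setup started with `start-dfs.sh` and `start-yarn.sh`. The ports are the Hadoop 3.x defaults (9870 for the NameNode, 8088 for the YARN ResourceManager) and differ if you override them in the configuration. All function names are hypothetical.

```python
import subprocess
import urllib.error
import urllib.request

# Daemons expected on a single-node cluster after start-dfs.sh and start-yarn.sh.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode", "ResourceManager", "NodeManager"}

# Default Hadoop 3.x web UI ports (overridable in hdfs-site.xml / yarn-site.xml).
WEB_UI_PORTS = {"NameNode": 9870, "ResourceManager": 8088}


def running_daemons(jps_output: str) -> set:
    """Collect main-class names from `jps` lines of the form '<pid> <ClassName>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str) -> set:
    return EXPECTED_DAEMONS - running_daemons(jps_output)


def dashboard_urls(host: str = "localhost") -> dict:
    return {name: f"http://{host}:{port}/" for name, port in WEB_UI_PORTS.items()}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """True if the server answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True
    except OSError:  # refused, timed out, unresolvable (URLError is an OSError)
        return False


if __name__ == "__main__":
    try:
        jps_out = subprocess.run(["jps"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        jps_out = ""  # jps ships with the JDK but is not on PATH here
    print("missing daemons:", sorted(missing_daemons(jps_out)) or "none")
    for name, url in dashboard_urls().items():
        print(name, url, "reachable" if is_reachable(url) else "not reachable")
```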



Thus, a single-node cluster is set up.

B) Comparison of Hadoop 1.x, Hadoop 2.x and Hadoop 3.x:

Release
  Hadoop 1.x: Released in 2011.
  Hadoop 2.x: Released in 2012.
  Hadoop 3.x (currently installed above): Released in 2017.

Data processing and resource management
  Hadoop 1.x: Introduced MapReduce and HDFS. The MapReduce framework is used both for data processing and for resource management.
  Hadoop 2.x: Added YARN (Yet Another Resource Negotiator) for better resource management. This enabled multi-tenancy: the same cluster can be used by MapReduce as well as by other processes running on YARN.
  Hadoop 3.x: The YARN resource model is generalized to support user-defined resource types beyond CPU and memory. For example, the administrator can define resources such as GPUs, software licenses or locally attached storage. YARN tasks can then be scheduled based on the availability of these resources.

Tenancy
  Hadoop 1.x: Supports single tenancy only.
  Hadoop 2.x: Supports multiple tenants using YARN.
  Hadoop 3.x: Supports multiple tenants.

Architecture and NameNode availability
  Hadoop 1.x: Uses a master-slave architecture with a single master and multiple slaves. If the master node fails, the entire cluster becomes unavailable.
  Hadoop 2.x: Also uses a master-slave architecture, but with multiple masters: an active NameNode and a standby NameNode. If the active master fails, the standby takes over, so Hadoop 2.x removes the single point of failure.
  Hadoop 3.x: Adds support for multiple standby NameNodes.

Scalability
  Hadoop 1.x: Limited to 4,000 nodes per cluster.
  Hadoop 2.x: Supports up to 10,000 nodes per cluster.
  Hadoop 3.x: Scalability is further improved; a single cluster can have more than 10,000 nodes.

NameNode recovery
  Hadoop 2.x: Manual intervention is needed for NameNode recovery.
  Hadoop 3.x: No manual intervention is needed for NameNode recovery.

Minimum Java version
  Hadoop 2.x: Java 7.
  Hadoop 3.x: Java 8.

Supported file systems
  Hadoop 2.x: HDFS (default), FTP, Amazon S3 and Windows Azure Storage Blobs (WASB).
  Hadoop 3.x: All of the above, plus the Microsoft Azure Data Lake filesystem.

Storage overhead
  Hadoop 2.x: Uses a 3x replication scheme, which results in 200% storage overhead.
  Hadoop 3.x: Uses erasure coding in HDFS to reduce the storage overhead to only 50%.

GPU support
  Hadoop 3.x: Adds support for GPU hardware, which can be used to run deep learning algorithms on a Hadoop cluster.
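The 200% and 50% figures in the storage-overhead row follow from simple arithmetic. With 3x replication, each block is stored three times, so the extra copies cost (3 - 1)/1 = 200% of the raw data. With Reed-Solomon erasure coding RS(6,3), every 6 data blocks get 3 parity blocks, an overhead of 3/6 = 50%. We assume the 50% figure refers to this layout, which is used by Hadoop 3's default `RS-6-3-1024k` policy. For example, 6 blocks of data need 12 extra replica blocks under 3x replication but only 3 parity blocks under RS(6,3). This is a minimal sketch, and the function names are hypothetical.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Extra storage, as a fraction of the raw data, for n-way replication."""
    return Fraction(replicas - 1, 1)


def erasure_coding_overhead(data_units: int, parity_units: int) -> Fraction:
    """Extra storage for a Reed-Solomon RS(data, parity) layout: parity / data."""
    return Fraction(parity_units, data_units)


if __name__ == "__main__":
    print(f"3x replication: {float(replication_overhead(3)):.0%} overhead")
    print(f"RS(6,3) erasure coding: {float(erasure_coding_overhead(6, 3)):.0%} overhead")
```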

Diagrammatically, the above differences can be represented as:

Conclusion:

Thus, in this experiment, we set up a Hadoop single-node cluster and compared different versions of Hadoop (1.x, 2.x and 3.x).

