
Hdfs Architecture

Uploaded by

madhuvanthi611
Copyright: © All Rights Reserved

*HDFS ARCHITECTURE
Hadoop Distributed File System
*HDFS - FEATURES
*HDFS stores very large files on a cluster of commodity hardware.

*HDFS stores data reliably even when hardware fails. It achieves high throughput by serving data in parallel.
*HDFS ARCHITECTURE EXPLAINED
* The Hadoop Distributed File System follows a master-slave architecture.

* Each cluster comprises a single master node and multiple slave nodes.

* Internally, each file is divided into one or more blocks, and the copies of each block are stored on different slave machines; the number of copies is set by the replication factor.

* The master node is the NameNode, and the DataNodes are the slave nodes.
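
The block division described above can be sketched in a few lines of Python. This is a minimal illustration, not HDFS code: the 128 MB block size and the 300 MB file are assumed values (the slides do not state a block size), and `split_into_blocks` is a hypothetical helper.

```python
# Illustrative sketch only: sizes are assumed values, not read from a cluster.
MB = 1024 * 1024

def split_into_blocks(file_size, block_size=128 * MB):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes

blocks = split_into_blocks(300 * MB)
print([size // MB for size in blocks])  # [128, 128, 44]
```

Note that the last block is smaller than the block size; in this sketch a file only occupies as many bytes as it actually contains.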
*MASTER NODE / NAMENODE
*The NameNode is the centerpiece of the Hadoop Distributed File System.
*It maintains and manages the file system namespace and controls client access to files through permissions.
*Fsimage: Fsimage stands for File System image. It is a checkpoint of the complete file system namespace at a point in time.

*Edit log: It records every change made to the file system namespace since the most recent Fsimage.
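
The relationship between the Fsimage and the edit log can be sketched as follows: replaying the logged changes on top of the last image yields the current namespace, which is what a checkpoint writes out as the new Fsimage. The dictionary layout, the operation names (`create`, `delete`, `rename`), and the function names are assumptions made for illustration; they are not the real on-disk formats.

```python
# Illustrative sketch only: the namespace is modelled as {path: [block ids]}.

def apply_edit(namespace, edit):
    """Apply one logged change to the namespace in place."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = list(edit[2])       # edit = ("create", path, blocks)
    elif op == "delete":
        namespace.pop(path, None)             # edit = ("delete", path)
    elif op == "rename":
        namespace[edit[2]] = namespace.pop(path)  # edit = ("rename", old, new)
    else:
        raise ValueError(f"unknown operation: {op}")

def checkpoint(fsimage, edit_log):
    """Merge the last image with the edit log and return the new image."""
    namespace = dict(fsimage)  # leave the old image untouched
    for edit in edit_log:
        apply_edit(namespace, edit)
    return namespace

fsimage = {"/logs/a.txt": ["blk_1"], "/logs/b.txt": ["blk_2", "blk_3"]}
edit_log = [
    ("create", "/logs/c.txt", ["blk_4"]),
    ("delete", "/logs/a.txt"),
    ("rename", "/logs/b.txt", "/archive/b.txt"),
]
new_fsimage = checkpoint(fsimage, edit_log)
print(sorted(new_fsimage))  # ['/archive/b.txt', '/logs/c.txt']
```

After a checkpoint, the edit log can start again from empty, because the new image already contains every change it recorded.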
*HDFS DATA NODE
*DataNodes are the slave nodes in HDFS.

*DataNodes run on inexpensive commodity hardware.

*They store the blocks of files.
*HDFS DATA NODE RESPONSIBILITIES
* DataNodes are responsible for serving client read/write requests.

* Based on instructions from the NameNode, DataNodes perform block creation, replication, and deletion.

* DataNodes send heartbeats to the NameNode to report that they are alive and healthy.

* DataNodes also send block reports to the NameNode, listing the blocks they contain.
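
The heartbeat and block-report bookkeeping above can be sketched as a small in-memory model. This is a simplified illustration under stated assumptions: the class `NameNodeMonitor`, its method names, and the 30-second timeout are hypothetical and are not taken from Hadoop.

```python
# Illustrative sketch only: times are plain numbers (seconds), not real clocks.

class NameNodeMonitor:
    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds  # assumed value, chosen by caller
        self.last_heartbeat = {}  # DataNode id -> time of last heartbeat
        self.block_map = {}       # DataNode id -> set of block ids it reported

    def receive_heartbeat(self, node_id, now):
        self.last_heartbeat[node_id] = now

    def receive_block_report(self, node_id, block_ids):
        self.block_map[node_id] = set(block_ids)

    def live_nodes(self, now):
        """DataNodes whose last heartbeat is within the timeout."""
        return sorted(n for n, t in self.last_heartbeat.items()
                      if now - t <= self.timeout_seconds)

    def dead_nodes(self, now):
        """DataNodes that have been silent for longer than the timeout."""
        return sorted(n for n, t in self.last_heartbeat.items()
                      if now - t > self.timeout_seconds)

monitor = NameNodeMonitor(timeout_seconds=30)
monitor.receive_heartbeat("dn1", now=0)
monitor.receive_heartbeat("dn2", now=0)
monitor.receive_block_report("dn1", ["blk_1", "blk_2"])
monitor.receive_heartbeat("dn1", now=25)  # dn2 stays silent after time 0

print(monitor.live_nodes(now=40))  # ['dn1']
print(monitor.dead_nodes(now=40))  # ['dn2']
```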
*SECONDARY NAMENODE
*HDFS BACKUP NODES
*A Backup node provides the same checkpointing functionality as the Checkpoint node.

*In addition, the Backup node keeps an in-memory, up-to-date copy of the file system namespace that is always synchronized with the active NameNode state.
*Replication Management
* HDFS stores replicas of each block on multiple DataNodes, as determined by the replication factor.

* If the replication factor is 3, three copies of each block are stored on different DataNodes.

* If one DataNode holding a block fails, the block is still accessible from the other DataNodes that hold a replica.
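
To make the failure case concrete, the sketch below counts the surviving replicas of each block after a DataNode is lost and reports which blocks have fallen below the replication factor. The block-to-DataNode map and the function name `under_replicated` are hypothetical; this is an illustration of the idea, not the NameNode's actual logic.

```python
# Illustrative sketch only: block locations are hard-coded example data.

def under_replicated(block_locations, live_nodes, replication_factor=3):
    """Return {block id: number of missing replicas} for blocks below target."""
    live = set(live_nodes)
    missing = {}
    for block, nodes in block_locations.items():
        alive = len(live.intersection(nodes))  # replicas on nodes still running
        if alive < replication_factor:
            missing[block] = replication_factor - alive
    return missing

block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}
live_nodes = ["dn2", "dn3", "dn4"]  # dn1 has failed

print(under_replicated(block_locations, live_nodes))  # {'blk_1': 1}
```

In this example `blk_1` is still readable from `dn2` and `dn3`, but it is one replica short of the target until a new copy is made.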
*Replication Management
*If we store a 128 MB file with a replication factor of 3, then 3 × 128 = 384 MB of disk space is occupied, because three copies of each block are stored.
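
The same arithmetic as a short Python sketch. The function names and the 3000 MB cluster size are illustrative assumptions; the 128 MB file and the replication factor of 3 come from the slide.

```python
# Illustrative sketch only: sizes are in megabytes.

def raw_storage_mb(file_size_mb, replication_factor=3):
    """Disk space consumed across the cluster by one file."""
    return file_size_mb * replication_factor

def usable_capacity_mb(raw_capacity_mb, replication_factor=3):
    """How much file data fits when every block is stored several times."""
    return raw_capacity_mb / replication_factor

print(raw_storage_mb(128))       # 384
print(usable_capacity_mb(3000))  # 1000.0
```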
*HDFS Rack awareness algorithm
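*The first replica is stored on a DataNode in the local rack (the writer's own node, if the writer runs on a DataNode).

*The second replica is stored on a DataNode in a different (remote) rack.

*The third replica is stored on a different DataNode in the same remote rack as the second.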
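
The placement rules above put the three replicas on three distinct DataNodes spread over two racks, so the loss of a whole rack still leaves at least one copy. The sketch below checks that property for a given placement. The rack map, node names, and function names are hypothetical example data, not a Hadoop API.

```python
# Illustrative sketch only: the cluster topology is hard-coded example data.

def racks_used(placement, rack_of):
    """Set of racks that hold at least one replica."""
    return {rack_of[node] for node in placement}

def survives_rack_failure(placement, rack_of):
    """True if replicas sit on distinct nodes and span at least two racks."""
    distinct_nodes = len(set(placement)) == len(placement)
    return distinct_nodes and len(racks_used(placement, rack_of)) >= 2

rack_of = {
    "dn1": "rack1", "dn2": "rack1", "dn5": "rack1",
    "dn3": "rack2", "dn4": "rack2",
}

print(survives_rack_failure(["dn1", "dn3", "dn4"], rack_of))  # True
print(survives_rack_failure(["dn1", "dn2", "dn5"], rack_of))  # False
```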
*HDFS READ/WRITE OPERATION
*Study link from the web
*https://data-flair.training/blogs/hadoop-hdfs-architecture/
