*HDFS ARCHITECTURE
Hadoop Distributed File System
*HDFS - FEATURES
*HDFS runs on a cluster of commodity hardware and stores very large files.
*HDFS stores data reliably even in the case of hardware failure. It provides high throughput by serving data access in parallel.
*HDFS ARCHITECTURE EXPLAINED
* Hadoop Distributed File System follows the master-slave architecture.
* Each cluster comprises a single master node and multiple slave nodes.
* Internally, each file is divided into one or more blocks, and each block is stored on several different slave machines; the number of copies is set by the replication factor.
* The master node is the NameNode, and the DataNodes are the slave nodes.
*MASTER NODE / NAME NODE
*NameNode is the centerpiece of the Hadoop Distributed File System.
*It maintains and manages the file system namespace and grants clients the appropriate access permissions.
*Fsimage: Fsimage stands for File System image. It contains the complete namespace of the Hadoop file system since the NameNode's creation.
*Edit log: It contains all the changes made to the file system namespace since the most recent Fsimage.
*HDFS DATA NODE
*DataNodes are the slave nodes in Hadoop HDFS.
*DataNodes run on inexpensive commodity hardware.
*They store blocks of a file.
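As a rough illustration of how a file maps onto the blocks held by DataNodes, the sketch below splits a file size into fixed-size blocks and totals the raw disk space used once every block is replicated. The 128 MB block size and the replication factor of 3 are assumptions chosen for illustration (both are configurable in HDFS), and `split_into_blocks` and `raw_storage` are hypothetical helper names, not part of any Hadoop API.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size=128 * MB):
    """Return the sizes (in bytes) of the blocks a file is divided into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes

def raw_storage(file_size, replication=3, block_size=128 * MB):
    """Total bytes occupied across all DataNodes when every block is replicated."""
    return sum(split_into_blocks(file_size, block_size)) * replication

if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    print([size // MB for size in blocks])  # [128, 128, 44]
    print(raw_storage(128 * MB) // MB)      # 384
```

A 300 MB file therefore needs three blocks, the last of which is only partly filled, and a single 128 MB block occupies 384 MB of raw disk space when three copies are kept.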
*HDFS DATA NODE RESPONSIBILITIES
* DataNodes are responsible for serving client read/write requests.
* Based on instructions from the NameNode, DataNodes perform block creation, replication, and deletion.
* DataNodes send periodic heartbeats to the NameNode to report that they are alive and healthy.
* DataNodes also send block reports to the NameNode, listing the blocks they contain.
*SECONDARY NAMENODE
*HDFS BACKUP NODES
*A Backup node provides the same checkpointing functionality as the Checkpoint node.
*In Hadoop, the Backup node keeps an in-memory, up-to-date copy of the file system namespace. It is always synchronized with the active NameNode state.
*Replication Management
* HDFS stores replicas of a block on multiple DataNodes based on the replication factor.
* If the replication factor is 3, then three copies of each block are stored on different DataNodes.
* So if one DataNode containing the data block fails, the block is still accessible from another DataNode containing a replica of the block.
*If we store a file of 128 MB and the replication factor is 3, then 3 * 128 MB = 384 MB of disk space is occupied, because three copies of the block are stored.
*HDFS Rack awareness algorithm
*The first replica will get stored on the local rack.
*The second replica will get stored on another DataNode in the same rack.
*The third replica will get stored on a different rack.
*HDFS READ/WRITE OPERATION
*Study link from the web
*https://data-flair.training/blogs/hadoop-hdfs-architecture/
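The rack awareness rules listed earlier can be sketched as a small placement function. This is a minimal sketch that follows the three rules exactly as stated in this document; it is not Hadoop's actual block placement code. The `topology` mapping, the rack and DataNode names, and `place_replicas` are all hypothetical, and the function simply takes the first available node where a real cluster would also weigh load and free space.

```python
def place_replicas(topology, local_rack):
    """Pick three DataNodes for one block.

    topology:   dict mapping rack name -> list of DataNode names (hypothetical layout)
    local_rack: the rack the write originates from

    Rules, as stated in the text: replica 1 on the local rack, replica 2 on
    another DataNode in the same rack, replica 3 on a different rack.
    """
    local_nodes = topology.get(local_rack, [])
    if len(local_nodes) < 2:
        raise ValueError("local rack needs at least two DataNodes")
    remote_racks = [rack for rack, nodes in topology.items()
                    if rack != local_rack and nodes]
    if not remote_racks:
        raise ValueError("need at least one other rack with a DataNode")
    first = local_nodes[0]                 # replica 1: local rack
    second = local_nodes[1]                # replica 2: another node, same rack
    third = topology[remote_racks[0]][0]   # replica 3: a different rack
    return [first, second, third]

if __name__ == "__main__":
    topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    print(place_replicas(topology, "rack1"))  # ['dn1', 'dn2', 'dn3']
```

Because one replica always lands on a different rack, the block stays readable even if every DataNode in the local rack becomes unavailable.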