Unit 3 HDFS
Master Node
NAMENODE
• The system running the NameNode acts as the master server and performs the
following tasks:
• Manages the file system namespace.
• Regulates clients' access to files.
• Executes file system operations such as renaming, closing, and opening files
and directories.
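The namespace operations listed above can be pictured with a small in-memory model. This is a minimal sketch under our own assumptions: the Namespace class, its method names, and the tuple format of the edit log are hypothetical illustrations, not the HDFS API, and directories are left out for brevity. Each operation that changes the namespace is appended to an edit log, mirroring the edit log the NameNode keeps.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Toy model of the NameNode's file system namespace (hypothetical, not HDFS code)."""
    files: dict = field(default_factory=dict)     # path -> {"open": bool}
    edit_log: list = field(default_factory=list)  # every namespace mutation is appended here

    def create(self, path: str) -> None:
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = {"open": True}         # a newly created file starts open for writing
        self.edit_log.append(("create", path))

    def close(self, path: str) -> None:
        self.files[path]["open"] = False
        self.edit_log.append(("close", path))

    def open(self, path: str) -> dict:
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]                   # opening for read does not change the namespace

    def rename(self, src: str, dst: str) -> None:
        if src not in self.files:
            raise FileNotFoundError(src)
        if dst in self.files:
            raise FileExistsError(dst)
        self.files[dst] = self.files.pop(src)
        self.edit_log.append(("rename", src, dst))
```

For example, creating `/a`, closing it, and renaming it to `/b` leaves one entry for `/b` in `files` and three entries in `edit_log`.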
DATANODE
• A DataNode is commodity hardware running an operating system and the
DataNode software.
• Every node (commodity hardware system) in a cluster runs a DataNode, which
manages the data storage of that node.
• DataNodes perform read and write operations on the file system, as requested
by clients.
• They also perform operations such as block creation, deletion, and replication
according to the instructions of the NameNode.
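The two duties above (serving client reads and writes, and obeying NameNode instructions) can be separated in a toy model. This is a minimal sketch; the DataNode class, the command strings "delete" and "replicate", and the block_report method are hypothetical names chosen for illustration, not the real DataNode protocol. Block creation is modeled simply as writing a new block ID.

```python
class DataNode:
    """Toy DataNode: stores block bytes locally and obeys NameNode commands (hypothetical)."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.blocks: dict[int, bytes] = {}   # block ID -> block contents

    # Client-facing path: reads and writes
    def write_block(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data         # writing a new ID creates the block

    def read_block(self, block_id: int) -> bytes:
        return self.blocks[block_id]

    # NameNode-facing path: hypothetical command names
    def handle_command(self, command: str, block_id: int, target: "DataNode | None" = None) -> None:
        if command == "delete":
            self.blocks.pop(block_id, None)
        elif command == "replicate":
            if target is None:
                raise ValueError("replicate needs a target DataNode")
            target.write_block(block_id, self.blocks[block_id])  # copy the block to another node
        else:
            raise ValueError(f"unknown command: {command}")

    def block_report(self) -> list[int]:
        """Sorted list of block IDs held by this node, as reported to the NameNode."""
        return sorted(self.blocks)
```

Keeping the client path and the NameNode command path as separate methods reflects that the DataNode stores data but does not decide on its own where replicas go.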
BLOCKS
Generally, user data is stored in HDFS files. Each file is divided into one or more
segments, which are stored on individual DataNodes. These file segments are called
blocks.
In other words, a block is the minimum amount of data that HDFS can read or write.
The default block size is 128 MB, but it can be changed as needed in the HDFS
configuration (the dfs.blocksize property).
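The division of a file into blocks is simple arithmetic. The sketch below assumes the 128 MB default (taken as 128 × 1024 × 1024 bytes) and uses a hypothetical helper name, split_into_blocks; it is an illustration, not HDFS code. Every block is full-sized except possibly the last, which holds only the leftover bytes.

```python
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB default block size, in bytes


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each block a file of `file_size` bytes is divided into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be non-negative and block_size positive")
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
print(split_into_blocks(300 * 1024 * 1024))  # [134217728, 134217728, 46137344]
```

The final partial block is not padded, so a 44 MB remainder is stored as a 44 MB block rather than a full 128 MB one.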
Racks
https://www.npntraining.com/blog/anatomy-of-file-read-and-write/
HDFS Architecture
(The -R option applies the change recursively through the directory structure.)
HDFS comprises three kinds of server: NameNode, DataNode, and Secondary
NameNode.
NameNode Server (single instance)
• Maintains the file system namespace.
• Manages the files and directories in the file system tree.
• Stores this information in the namespace image and the edit log.
• Knows the DataNodes on which all the blocks of a given file are located.
• Is a critical single point of failure.

DataNode Server (multiple instances)
• Associated with the data storage locations in the file system.
• Reports periodically to the NameNode with lists of the blocks it stores.
• Stores and retrieves blocks when requested by clients or the NameNode.
• Serves read and write requests, and performs block creation, deletion, and
replication upon instruction from the NameNode.

Secondary NameNode Server (single instance)
• Not exactly a hot backup of the actual NameNode server.
• Used for recovery of the NameNode in case of NameNode failure.
• Periodically merges the edit log into the namespace image to keep a copy of it.
• Its namespace image lags behind the NameNode's, so a complete recovery is
impossible.
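The Secondary NameNode's checkpointing can be sketched as replaying the edit log onto a copy of the namespace image. This is a minimal sketch: the functions apply_edit and checkpoint and the tuple format of edit entries are hypothetical, and real HDFS keeps the image (fsimage) and edit log in its own on-disk formats. It shows why a checkpoint lags behind: edits made after it are not included.

```python
import copy


def apply_edit(image: dict, entry: tuple) -> None:
    """Replay one hypothetical edit-log entry onto a namespace image (path -> metadata)."""
    op, *args = entry
    if op == "create":
        image[args[0]] = {}
    elif op == "delete":
        image.pop(args[0], None)
    elif op == "rename":
        src, dst = args
        image[dst] = image.pop(src)
    else:
        raise ValueError(f"unknown edit: {op}")


def checkpoint(image: dict, edit_log: list) -> dict:
    """Merge the edit log into a copy of the image, leaving the original untouched."""
    merged = copy.deepcopy(image)
    for entry in edit_log:
        apply_edit(merged, entry)
    return merged


image = {"/a": {}}
log = [("create", "/b"), ("rename", "/a", "/c")]
ckpt = checkpoint(image, log)
print(ckpt)  # {'/b': {}, '/c': {}}
```

If the NameNode later logs `("delete", "/b")` and fails before the next checkpoint, `ckpt` still contains `/b`; restoring from it loses that most recent change.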