HDFS
[Figure: HDFS architecture. An HDFS client, the NameNode, the Secondary NameNode, and DataNodes placed on Rack 1 through Rack n.]
Hadoop Distributed File System
[Figure: the FsImage is copied back to the NameNode.]
Data Nodes
Stores files as data blocks
Serves read and write requests
Sends Heartbeat messages to the NameNode
Sends block reports to the NameNode
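
The Heartbeat and block report duties can be illustrated with a small in-memory model. This is only a sketch: `ToyNameNode`, its method names, and the 30-second timeout are made up for illustration and are not HDFS classes or defaults.

```python
import time


class ToyNameNode:
    """In-memory stand-in for the NameNode's view of its DataNodes."""

    def __init__(self, timeout_s=30.0):
        self.timeout_s = timeout_s      # assumed value, not an HDFS default
        self.last_heartbeat = {}        # DataNode id -> time of last Heartbeat
        self.block_map = {}             # block id -> set of DataNode ids

    def heartbeat(self, datanode, now=None):
        # Record when this DataNode was last heard from.
        self.last_heartbeat[datanode] = time.monotonic() if now is None else now

    def block_report(self, datanode, block_ids):
        # Drop stale entries for this DataNode, then record what it reports.
        for holders in self.block_map.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode)

    def live_datanodes(self, now=None):
        # A DataNode counts as live if its last Heartbeat is recent enough.
        now = time.monotonic() if now is None else now
        return sorted(dn for dn, seen in self.last_heartbeat.items()
                      if now - seen <= self.timeout_s)
```

A DataNode that stops sending Heartbeats drops out of `live_datanodes()`, and each block report replaces whatever the toy NameNode previously believed that DataNode was holding.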
Rack-Aware Placement Policy
Blocks are replicated across DataNodes
First replica on the local node
Second replica on a node in a remote rack
Third replica on a different node in the same remote rack
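
The placement rule above can be sketched in a few lines. This is a toy model, not the HDFS placement code: `choose_replicas` and the `topology` dictionary (rack name mapped to DataNode names) are hypothetical, and real placement also weighs factors such as free space and load.

```python
import random


def choose_replicas(writer, topology, rng=random):
    """Pick three DataNodes for one block.

    topology maps rack name -> list of DataNode names (hypothetical layout).
    """
    # The rack that holds the writer's own node.
    local_rack = next(rack for rack, nodes in topology.items() if writer in nodes)
    # A remote rack needs at least two nodes to hold the second and third replicas.
    remote_racks = [rack for rack, nodes in topology.items()
                    if rack != local_rack and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two DataNodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer, second, third]
```

With this layout a single rack failure cannot destroy every replica, while two of the three copies still share a rack, which keeps cross-rack write traffic down.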
HDFS Read Path
Client asks the NameNode for the block locations of the file
NameNode checks that the file exists and that the client has permission to read it
NameNode returns the block locations, sorted by distance from the client node
Client reads from the local node when it holds a replica, otherwise from the other nodes in sorted order
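
Sorting replicas by distance can be sketched as follows, assuming a simple two-level topology in which every location is a `(rack, node)` pair. The distances 0, 2, and 4 follow the usual tree-distance convention (same node, same rack, different rack); the function names are made up for this example.

```python
def distance(a, b):
    """Tree distance between two (rack, node) locations."""
    if a == b:
        return 0      # same node
    if a[0] == b[0]:
        return 2      # same rack, different node
    return 4          # different racks


def sort_replicas(client, replicas):
    # Closest replica first, which is the order the client tries them in.
    return sorted(replicas, key=lambda loc: distance(client, loc))
```

For a client at `("rack1", "dn1")`, a replica on `dn1` itself sorts first, then any replica elsewhere on `rack1`, then replicas on other racks.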
HDFS Write Path
Client asks the NameNode to create a new file in the filesystem namespace
NameNode checks that the file does not already exist and that the client has permission to write it
An output stream object is returned to the client
Client writes to the output stream, which splits the data into packets and puts them into a data queue
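
The packet-splitting step can be pictured with a minimal sketch. `enqueue_packets` is a hypothetical helper, and the packet size used in the usage note is tiny on purpose so the result is easy to read.

```python
from collections import deque


def enqueue_packets(data, packet_size, data_queue):
    """Split data into fixed-size packets and append them to data_queue."""
    for offset in range(0, len(data), packet_size):
        data_queue.append(data[offset:offset + packet_size])
    return data_queue
```

Calling `enqueue_packets(b"abcdefghij", 4, deque())` leaves three packets in the queue, the last one shorter than the rest.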
HDFS Write Path (continued)
A separate thread consumes packets from the data queue and gets block location information from the NameNode
Packets from the data queue are written to the first DataNode in the replication pipeline, which then writes to the second DataNode, and so on, until the block size is reached
Client requests new blocks from the NameNode for additional data
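
A toy version of the pipeline may help here. Everything in it is an assumption made for illustration: `ToyDataNode`, `stream_packets`, and the `allocate_block` callback (which stands in for the request to the NameNode) are not HDFS APIs, and the forwarding is a plain function call rather than a network transfer.

```python
from collections import deque


class ToyDataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}          # block id -> list of packets stored here

    def receive(self, block_id, packet, downstream):
        self.blocks.setdefault(block_id, []).append(packet)
        if downstream:
            # Forward the packet to the next DataNode in the pipeline.
            downstream[0].receive(block_id, packet, downstream[1:])


def stream_packets(data_queue, allocate_block, block_size):
    """Drain data_queue, starting a new block whenever block_size is reached."""
    block_id, pipeline, written = None, [], 0
    while data_queue:
        packet = data_queue.popleft()
        if block_id is None or written + len(packet) > block_size:
            # Stands in for asking the NameNode for a new block and its DataNodes.
            block_id, pipeline = allocate_block()
            written = 0
        # The client only talks to the first DataNode; the rest are fed by forwarding.
        pipeline[0].receive(block_id, packet, pipeline[1:])
        written += len(packet)
```

With three 4-byte packets and a block size of 8, the first two packets land in one block and the third triggers a new allocation, and every DataNode in the pipeline ends up with the same contents.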
HDFS Write Path (continued)
DataNodes send acknowledgements back to the client
The process continues until all data packets are written to the DataNodes and acknowledged by them
Client closes the output stream and asks the NameNode to close the file
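
The acknowledgement bookkeeping can be sketched with a second queue that holds packets already sent but not yet fully acknowledged. This is a simplified model under stated assumptions: `write_with_acks` is a hypothetical function, and every DataNode in the toy pipeline acknowledges immediately, which a real cluster does not guarantee.

```python
from collections import deque


def write_with_acks(packets, pipeline_size):
    """Return the packets acknowledged by every DataNode in the pipeline."""
    data_queue = deque(enumerate(packets))   # (sequence number, packet)
    ack_queue = deque()                      # sent but not yet fully acknowledged
    acks = {}                                # sequence number -> acks received
    done = []
    while data_queue or ack_queue:
        if data_queue:
            seqno, packet = data_queue.popleft()
            ack_queue.append((seqno, packet))
            # Toy pipeline: every DataNode acknowledges immediately.
            acks[seqno] = pipeline_size
        seqno, packet = ack_queue[0]
        if acks.get(seqno, 0) == pipeline_size:
            # Fully acknowledged, so the packet can be forgotten.
            ack_queue.popleft()
            done.append(packet)
    # Only once both queues are empty would the client close the stream and the file.
    return done
```

A packet leaves the ack queue only after all DataNodes in the pipeline have acknowledged it, so an empty ack queue is the signal that closing the stream is safe.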