DSECL ZG 522: Big Data Systems: Session 6: Hadoop Architecture and Filesystem
Hadoop 2 - Architecture
• Master-slave architecture for overall compute and data management
• Slaves implement peer-to-peer communication
Note: The YARN ResourceManager also uses application-level App Master processes on slave nodes for application-specific resource management
What changed from Hadoop 1 to Hadoop 2
Hadoop Distributions
Topics for today
HDFS Features (1)
• A DFS stores data over multiple nodes in a cluster and allows multi-user access
✓ Gives the user the impression that the data is on a single machine
✓ HDFS is a Java-based DFS that sits on top of the native FS
✓ Enables storage of very large files across nodes of a Hadoop cluster
✓ Data is split into large blocks: 128 MB by default
• Scale through parallel data processing
✓ 1 node with 1 TB of storage and an IO bandwidth of 400 MBps across 4 IO channels takes ~43 min to read the data
✓ 10 nodes, with the 1 TB partitioned across them, can read that data in parallel in ~4.3 min (see the worked numbers below)
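Note (rough check of the arithmetic above): 1 TB / 400 MBps ≈ 2,500-2,600 s, i.e. roughly 43 minutes for a single node; split the same 1 TB across 10 nodes reading in parallel and each node reads a tenth of it, i.e. roughly 4.3 minutes.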
HDFS Features (2)
HDFS Features (3)
HDFS Features (4)
HDFS Architecture - Master node
HDFS Architecture - Slave nodes
• Multiple slave nodes with one DataNode per slave
• Serves block read/write requests from clients
• Serves Create/Delete/Replicate requests from the NameNode
• DataNodes interact with each other for pipeline reads and writes.
Functions of a NameNode
• Maintains the HDFS namespace with 2 files
• FsImage: contains the mapping of blocks to files, the directory hierarchy, and file properties/permissions
• EditLog: transaction log of changes to the metadata in FsImage
• Does not store any data - only metadata about files
• Runs on the Master node while DataNodes run on Slave nodes
• HA can be configured (discussed later)
• Records each change made to the metadata, e.g. if a file is deleted in HDFS, the NameNode immediately records this in the EditLog
• Receives periodic heartbeats and block reports from all the DataNodes in the cluster to ensure that the DataNodes are live
• Ensures the replication factor is maintained across DataNode failures
• In case of a DataNode failure, the NameNode chooses new DataNodes for new replicas, balances disk usage, and manages the communication traffic to the DataNodes
Where are FsImage and edit logs?
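A typical layout on the NameNode's local disk, and two standard tools to inspect it (exact paths depend on dfs.namenode.name.dir in hdfs-site.xml; the names below are illustrative):
• ${dfs.namenode.name.dir}/current/ holds files such as fsimage_<txid>, edits_inprogress_<txid>, seen_txid and VERSION
• hdfs oiv -p XML -i fsimage_<txid> -o fsimage.xml dumps the FsImage into readable XML (Offline Image Viewer)
• hdfs oev -i edits_<start-txid>-<end-txid> -o edits.xml dumps an edit-log segment (Offline Edits Viewer)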
NameNode - What happens on start-up
1. Enters safe mode
✓ Checks the status of DataNodes on slaves
• Does not allow any block replication in this mode (safe mode can also be toggled manually, see the dfsadmin commands below)
• Gets heartbeats and block reports from DataNodes
• Checks that the minimum replication factor is met for a configurable majority of blocks
✓ Updates metadata (this is also done at checkpoint time)
• Reads FsImage and EditLog from disk into memory
• Applies all transactions from the EditLog to the in-memory version of FsImage
• Flushes out a new version of FsImage to disk
• Keeps the latest FsImage in memory for client requests
• Truncates the old EditLog, since its changes have been applied to the new FsImage
2. Exits safe mode
3. Continues with any further replications needed and with client requests
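For reference, safe mode can be inspected or toggled manually with the standard dfsadmin commands:
• hdfs dfsadmin -safemode get
• hdfs dfsadmin -safemode enter
• hdfs dfsadmin -safemode leave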
Functions of a DataNode (1)
Functions of a DataNode (2)
• DataNode continuously sends heartbeats to the NameNode (default every 3 sec; see the configuration sketch below)
✓ To confirm connectivity with the NameNode (NN)
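A minimal hdfs-site.xml sketch, assuming the usual property names for the heartbeat and block-report intervals:
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value> <!-- heartbeat interval in seconds -->
</property>
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>21600000</value> <!-- block report interval in ms (6 hours) -->
</property>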
Communication
Topics for today
Data Blocks in HDFS
• HDFS stores each file as blocks, which are scattered throughout the Apache Hadoop cluster.
• The default size of each block is 128 MB in Apache Hadoop 2.x (64 MB in Apache Hadoop 1.x), which you can configure as per your requirements (see the hdfs-site.xml sketch below).
• A file in HDFS need not occupy an exact multiple of the configured block size (128 MB, 256 MB, etc.).
• A file of size 514 MB can be stored as 4 x 128 MB blocks and 1 x 2 MB block
• Why a large block size of 128 MB?
• HDFS is used for TB/PB-scale files, and a small block size would create too much metadata
• Even larger blocks would further reduce block-level “indexing”, but would impact load balancing across nodes, etc.
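A sketch of how the block size is configured in hdfs-site.xml, assuming the standard Hadoop 2 key dfs.blocksize (it can also be set per file at create time through the FileSystem.create() overload that takes a blockSize argument):
<property>
  <name>dfs.blocksize</name>
  <value>268435456</value> <!-- 256 MB; a suffix form such as 256m also works -->
</property>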
How to see blocks of a file in HDFS - fsck
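A typical invocation (the path is illustrative):
hdfs fsck /user/demo/big.log -files -blocks -locations
This lists each block of the file (ids such as blk_1073741846), its size, and the DataNodes holding its replicas.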
HDFS on local FS
• Find / configure the root of HDFS data storage in hdfs-site.xml -> the dfs.data.dir property (dfs.datanode.data.dir in Hadoop 2)
• e.g. $HADOOP_HOME/data/dfs/data/hadoop-${user.name}/current
• If you want to see the files in the local FS that store the blocks of HDFS:
• cd to the HDFS root dir specified in dfs.data.dir
• go into the sub-directory whose name you got from the fsck command
• navigate into further sub-directories to find the block files
• All this mapping from HDFS files to blocks (local FS files on DataNodes) is stored on the NameNode
Replica Placement Strategy - with Rack awareness
Why rack awareness?
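With Hadoop's default placement policy, the 1st replica goes on the writer's node (or a random node), the 2nd on a node in a different rack, and the 3rd on a different node of that same remote rack. Rack awareness is typically enabled by pointing net.topology.script.file.name in core-site.xml at a script that maps a DataNode address to a rack id such as /rack1 (an illustrative rack name).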
Topics for today
HDFS Write - Overview
• The HDFS client contacts the NameNode with a Write Request for the two blocks, say Block A and Block B.
• The NameNode grants permission and gives the client the IP addresses of the DataNodes to copy the blocks to
• Selection of DataNodes is randomized but factors in availability, replication factor (RF), and rack awareness
• For 3 copies, 3 distinct DNs are needed, if possible, for each block.
◦ For Block A, list A = {DN1, DN4, DN6}
◦ For Block B, list B = {DN3, DN7, DN9}
• Each block will be copied to three different DataNodes to keep the replication factor consistent throughout the cluster.
• Now the whole data copy process happens in three stages:
1. Set up of pipeline
2. Data streaming and replication
3. Shutdown of pipeline (acknowledgement stage)
HDFS Write: Step 1. Setup pipeline
HDFS Write: Step 1. Setup pipeline for a block
HDFS Write: Step 2. Data streaming
HDFS Write: Step 3. Shutdown pipeline / ack
Multi-block writes
Sample write code
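A minimal, illustrative sketch of an HDFS write in Java (the path and contents are hypothetical), matching the call sequence described on the next slides:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path outPath = new Path("/demo/hello.txt");     // hypothetical HDFS path
        FSDataOutputStream out = fs.create(outPath);    // NameNode creates the file entry and returns a stream
        out.writeBytes("hello hdfs\n");                 // data is buffered into packets and pipelined to DataNodes
        out.close();                                    // flushes remaining packets, waits for acks, completes the file
    }
}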
HDFS Create / Write - Call sequence in code
1) The client calls create() on the FileSystem, which makes an RPC call to the NameNode to create the new file in the namespace.
2) As the client writes data, it is split into packets that are added to a “data queue” and streamed to the first DataNode of the pipeline chosen by the NameNode.
3) The first DataNode stores the packet and forwards it to the second DataNode, which then transfers it to the third DataNode.
4) FSDataOutputStream also manages an “ack queue” of packets that are waiting for acknowledgement by the DataNodes.
5) A packet is removed from the queue only when it has been acknowledged by all the DataNodes in the pipeline.
6) When the client finishes writing to the file, it calls close() on the stream.
7) This flushes all the remaining packets to the DataNode pipeline and waits for the relevant acknowledgements before contacting the NameNode to signal that the write of the file is complete.
HDFS Read
1. Client contacts the NameNode asking for the block metadata of a file
2. NameNode returns the list of DNs where each block is stored
3. Client connects to the DNs where the blocks are stored
4. The client starts reading data in parallel from the DNs (e.g. Block A from DN1, Block B from DN3)
5. Once the client has all the required file blocks, it combines them to form the file.
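A minimal read sketch in Java following the same steps (the path is hypothetical):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();                     // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = fs.open(new Path("/demo/hello.txt"));  // NameNode supplies the block locations
        IOUtils.copyBytes(in, System.out, 4096, true);                // streams block data from the DataNodes, then closes
    }
}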
Security
Topics for today
File formats - text based
File formats - sequence files
• A flat file consisting of binary key/value pairs
• Extensively used in MapReduce as input/output formats, as well as for internal temporary outputs of maps
• 3 types
1. Uncompressed key/value records
2. Record-compressed key/value records - only the 'values' are compressed
3. Block-compressed key/value records - both keys and values are collected in configurable 'blocks' and compressed; sync markers are added for random access and splitting
• Header includes information about:
• the key and value classes
• whether compression is enabled, and whether it is at block level
• the compressor codec used
ref: https://cwiki.apache.org/confluence/display/HADOOP2/SequenceFile
Example: Storing images in HDFS
• We want to store a bunch of images as key-value pairs, maybe attach some meta-data as well
• Files are stored in a binary format, e.g. a sequence file
• We can apply some image processing on the files, as well as use meta-data to retrieve specific images

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImageToSeq {
    public static void main(String args[]) throws Exception {
        Configuration confHadoop = new Configuration();
        // … details of conf …
        FileSystem fs = FileSystem.get(confHadoop);
        Path inPath = new Path("/mapin/1.png");     // source image in HDFS
        Path outPath = new Path("/mapin/11.png");   // output sequence file
        FSDataInputStream in = null;
        Text key = new Text();
        BytesWritable value = new BytesWritable();
        SequenceFile.Writer writer = null;
        try {
            in = fs.open(inPath);
            byte buffer[] = new byte[in.available()];
            in.read(buffer);
            // key = image file name, value = raw image bytes
            writer = SequenceFile.createWriter(fs, confHadoop, outPath,
                    key.getClass(), value.getClass());
            writer.append(new Text(inPath.getName()), new BytesWritable(buffer));
        } catch (Exception e) {
            System.out.println("Exception MESSAGES = " + e.getMessage());
        } finally {
            IOUtils.closeStream(writer);
            System.out.println("last line of the code....!!!!!!!!!!");
        }
    }
}
Ref: https://stackoverflow.com/questions/16546040/store-images-videos-into-hadoop-hdfs
File formats - Optimized Row Columnar (ORC) *
Improves performance when Hive is reading, writing, and processing data
More HDFS commands - config and usage
Get configuration data in general, about NameNodes, or about any specific attribute
• hdfs getconf
• returns the various configuration settings in effect
• hdfs getconf -namenodes
• returns the NameNodes in the cluster
• hdfs getconf -confKey <a.value>
• returns the value of a particular setting (e.g. dfs.replication)
• hdfs dfsadmin -report
• find out how much disk space is used, free, under-replicated, etc.
Summary
Next Session:
Hadoop MapReduce and YARN