BDS Session 5
Janardhanan PS
Professor
[email protected]
Topics for today
2
Hadoop - Data and Compute layers
• A data storage layer
• A Distributed File System - HDFS
• A data processing layer
• MapReduce programming
3
Hadoop 2 - Architecture
• Master-slave architecture for overall compute and data management
• Slaves implement peer-to-peer communication
Note: The YARN Resource Manager also uses application-level App Master
processes on slave nodes for application-specific resource management
4
What changed from Hadoop 1 to Hadoop 2
• Hadoop 1:
MapReduce was coupled with resource management
• Hadoop 2 brought in YARN as a separate resource management
capability, so MapReduce is now only about data processing.
• Hadoop 1: the single Master node running the NameNode is a Single Point of Failure (SPOF)
• Hadoop 2 introduced active-passive and other HA
configurations, besides secondary NameNodes
• Hadoop 1: only MapReduce programs could be run
• In Hadoop 2, non-MapReduce programs can be run by YARN
on slave nodes (since resource management is decoupled from
MapReduce), and there is also support for non-HDFS storage,
e.g. Amazon S3.
5
Hadoop Distributions
6
Topics for today
7
HDFS Features (1)
• A DFS stores data over multiple nodes in a cluster and allows multi-user access
✓ Gives the user the feeling that the data is on a single machine
✓ HDFS is a Java-based DFS that sits on top of the native FS
✓ Enables storage of very large files across the nodes of a Hadoop cluster
✓ Data is split into large blocks: 128 MB (default)
• Scale through parallel data processing
✓ 1 node with 1 TB of data and an aggregate IO bandwidth of 400 MB/s across 4 IO channels needs about 43 min to read the full terabyte
✓ 10 nodes, each holding a 100 GB partition of that 1 TB, can read the data in parallel in about 4.3 min
8
HDFS Features (2)
9
HDFS Features (3)
• Variety and Volume of Data: huge data, i.e. terabytes and petabytes, and
different kinds of data - structured, semi-structured or unstructured.
10
HDFS Features (4)
• Data Locality: move the processing unit to the data rather than the data to
the processing unit. The computation is brought to the DataNodes where the
data resides; you are not moving the data, you are bringing the program or
processing part to the data.
11
HDFS Architecture - Master node
12
HDFS Architecture - Slave nodes
• Multiple slave nodes with one
DataNode per slave
• Serves block read/write requests from clients
• Serves create/delete/replicate requests from the
NameNode
• DataNodes interact with each other for pipelined
block writes and replication.
13
Functions of a NameNode
• Maintains namespace in HDFS with 2 files
• FsImage: contains the mapping of files to blocks, the directory hierarchy, and file properties/permissions
• EditLog: Transaction log of changes to metadata in FsImage
• Does not store any data - only meta-data about files
• Runs on Master node while DataNodes run on Slave nodes
• HA can be configured (discussed later)
• Records each change that takes place to the meta-data. e.g. if a file is deleted in HDFS, the NameNode
will immediately record this in the EditLog.
• Receives periodic Heartbeat and a block report from all the DataNodes in the cluster to ensure that the
DataNodes are live.
• Ensures the replication factor is maintained across DataNode failures
• In case of a DataNode failure, the NameNode chooses new DataNodes for the new
replicas, balances disk usage and manages the communication traffic to the DataNodes
14
Where are fsimage and edit logs ?
15
Namenode - What happens on start-up
1. Enters into safe mode
✓ Checks the status of the DataNodes on the slaves
• Does not allow any block replication in this mode
• Gets heartbeats and block reports from the DataNodes
• Checks that a configurable majority of blocks meets the minimum replication factor
✓ Updates meta-data (this is also done at checkpoint time)
• Reads FsImage and EditLog from disk into memory
• Applies all transactions from the EditLog to the in-memory version of FsImage
• Flushes out new version of FsImage on disk
• Keeps latest FsImage in memory for client requests
• Truncates the old EditLog once its changes have been applied to the new FsImage
2. Exits safe mode
3. Continues with further replications needed and client requests
16
Functions of a DataNode (1)
17
Functions of a DataNode (2)
• DataNode continuously sends a heartbeat to the
NameNode (default: every 3 seconds)
✓ To ensure connectivity with the NameNode
• If no heartbeat message is received from a DataNode, the
NameNode re-replicates that DataNode's blocks within the
cluster and removes the DataNode from its meta-data
records
• DataNodes also send a BlockReport on start-up and
periodically, containing the list of blocks they store (see the
configuration sketch below)
18
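The heartbeat and block report intervals are governed by HDFS configuration properties, normally set in hdfs-site.xml on the cluster nodes. A minimal sketch, only to illustrate the property names and their defaults via the Hadoop Configuration API (the values shown are the standard defaults):

import org.apache.hadoop.conf.Configuration;

public class HeartbeatSettingsSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Heartbeat interval from DataNode to NameNode, in seconds (default 3)
        conf.setIfUnset("dfs.heartbeat.interval", "3");
        // Full block report interval, in milliseconds (default 21600000 = 6 hours)
        conf.setIfUnset("dfs.blockreport.intervalMsec", "21600000");
        System.out.println("heartbeat interval = " + conf.get("dfs.heartbeat.interval") + " s");
        System.out.println("block report interval = " + conf.get("dfs.blockreport.intervalMsec") + " ms");
    }
}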
Topics for today
19
Hadoop 2: Introduction of Secondary NameNode
20
HA configuration of NameNode
• An Active-Passive configuration can also be set up
with a standby NameNode
• Can use a Quorum Journal Manager (QJM) with
JournalNodes, or NFS, to maintain shared state
• DataNodes send heartbeats and block updates to both
NameNodes
• Writes to the JournalNodes happen only via the Active
NameNode - this avoids the "split brain" scenario under
network partitions
• The Standby NameNode reads from the JournalNodes to stay
up to date on state, and also receives the latest updates from
the DataNodes
• A ZooKeeper session (with an ensemble of 3 or 5 nodes) may
be used for failure detection and election of a new Active
NameNode (see the client configuration sketch below)
21
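A minimal sketch of how an HDFS client can be pointed at an HA nameservice, assuming an illustrative nameservice called mycluster with NameNodes nn1 and nn2 on hypothetical hosts (in practice these properties live in hdfs-site.xml / core-site.xml rather than in code):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HaClientConfigSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Logical nameservice instead of a single NameNode host
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");
        // Proxy provider that fails over between nn1 and nn2
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        FileSystem fs = FileSystem.get(conf);   // resolves to whichever NameNode is active
        System.out.println(fs.getUri());
        fs.close();
    }
}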
Other robustness mechanisms
22
Communication
23
Topics for today
24
Blocks in HDFS
• HDFS stores each file as blocks, which are scattered throughout the Apache Hadoop cluster.
• The default size of each block is 128 MB in Apache Hadoop 2.x (64 MB in Apache Hadoop
1.x), and it can be configured as per your requirement.
• A file need not occupy an exact multiple of the configured block size (128 MB, 256 MB,
etc.); the last block uses only as much space as the remaining data requires.
• HDFS is meant for TB/PB-sized files, and a small block size would create too much
meta-data on the NameNode.
• Much larger blocks would further reduce the per-block "indexing" overhead, but would
impact load balancing across nodes, etc. (see the sketch below for listing a file's blocks).
25
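A minimal Java sketch that lists the blocks of a file and the DataNodes holding each replica, assuming the example file path used later in these slides exists on a cluster reachable at localhost:9000:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocksSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/javareadwriteexample/read_write_hdfs_example.txt");
        FileStatus status = fs.getFileStatus(path);
        // One BlockLocation per block of the file
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        System.out.println("Block size: " + status.getBlockSize() + " bytes");
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}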
How to see blocks of a file in HDFS - fsck
26
HDFS Blocksize Vs Input Split
• e.g. $HADOOP_HOME/data/dfs/data/hadoop-${user.name}/current
28
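For the block size vs. input split contrast, a minimal, illustrative MapReduce job setup (not from the slides) showing that the split size fed to map tasks can be tuned independently of the 128 MB storage block size; the input path and the size values are assumptions:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitSizeSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-size-demo");
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path("/javareadwriteexample"));
        // Blocks are a storage concept (128 MB by default); splits are a processing
        // concept - each split is handed to one map task. The split size can be tuned:
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64 MB minimum
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);  // 256 MB maximum
        System.out.println("Configured job: " + job.getJobName());
    }
}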
Replica Placement Strategy - with Rack awareness
29
Why rack awareness ?
30
Topics for today
31
HDFS data writes
The following protocol is followed whenever data is written into
HDFS:
• The HDFS client contacts the NameNode with a write request for the two blocks,
say, Block A and Block B.
• The NameNode grants permission and returns to the client the IP addresses of the
DataNodes to which the blocks should be copied
• The selection of DataNodes is randomized, but factors in availability, replication
factor (RF), and rack awareness
• For 3 copies, 3 distinct DataNodes are needed, if possible, for each block.
◦ For Block A, list A = {DN1, DN4, DN6}
◦ For Block B, list B = {DN3, DN7, DN9}
• Each block is copied to three different DataNodes to keep the
replication factor consistent throughout the cluster.
• Now the whole data copy process will happen in three stages:
1. Set up of Pipeline
2. Data streaming and replication
3. Shutdown of Pipeline (Acknowledgement stage)
32
HDFS Write: Step 1. Setup pipeline
33
HDFS Write: Step 1. Setup pipeline for a block
34
HDFS Write: Step 2. Data streaming
35
HDFS Write: Step 3. Shutdown pipeline / ack
Acknowledgements flow back through the pipeline, from DN 6 to DN 4 and then to DN 1,
confirming that the block has been written successfully.
36
Multi-block writes
37
Sample write code
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteFileToHDFS {
    public static void main(String[] args) throws IOException {
        WriteFileToHDFS.writeFileToHDFS();
    }
    public static void writeFileToHDFS() throws IOException {
        // Point the client at the NameNode
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fileSystem = FileSystem.get(configuration);
        String fileName = "read_write_hdfs_example.txt";
        Path hdfsWritePath = new Path("/javareadwriteexample/" + fileName);
        // create() returns an FSDataOutputStream; true = overwrite if the file exists
        FSDataOutputStream fsDataOutputStream = fileSystem.create(hdfsWritePath, true);
        BufferedWriter bufferedWriter = new BufferedWriter(
                new OutputStreamWriter(fsDataOutputStream, StandardCharsets.UTF_8));
        bufferedWriter.write("Java API to write data in HDFS");
        bufferedWriter.newLine();
        bufferedWriter.close();   // flushes remaining packets to the DataNode pipeline
        fileSystem.close();
    }
}
38
HDFS Create / Write - Call sequence in code
39
HDFS Create / Write - Call sequence in code
3) The first DataNode stores the packet and forwards it to
the second DataNode, which then transfers it to the
third DataNode.
4) FSDataOutputStream also manages an "ack queue" of
packets that are waiting for acknowledgement by the
DataNodes.
5) A packet is removed from the queue only when it has been
acknowledged by all the DataNodes in the pipeline.
6) When the client finishes writing to the file, it calls
close() on the stream.
7) This flushes all the remaining packets to the DataNode
pipeline and waits for the acknowledgements, before
contacting the NameNode to signal that the file write is
complete.
40
HDFS Read
1. Client contacts the NameNode asking for the block metadata
of a file
2. NameNode returns the list of DataNodes where each block is
stored
3. Client connects to the DataNodes where the blocks are stored
4. The client starts reading data in parallel from the DNs (e.g.
Block A from DN1, Block B from DN3)
5. Once the client gets all the required file blocks, it combines
these blocks to form the file.
41
Sample read code
import java.io.IOException;
import org.apache.commons.io.IOUtils;   // Apache Commons IO provides IOUtils.toString
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadFileFromHDFS {
    public static void main(String[] args) throws IOException {
        ReadFileFromHDFS.readFileFromHDFS();
    }
    public static void readFileFromHDFS() throws IOException {
        // Point the client at the NameNode
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fileSystem = FileSystem.get(configuration);
        String fileName = "read_write_hdfs_example.txt";
        Path hdfsReadPath = new Path("/javareadwriteexample/" + fileName);
        // open() returns an FSDataInputStream - classical input stream usage from here on
        FSDataInputStream inputStream = fileSystem.open(hdfsReadPath);
        String out = IOUtils.toString(inputStream, "UTF-8");
        System.out.println(out);
        inputStream.close();
        fileSystem.close();
    }
}
42
HDFS Read - Call sequence in code
1) Client opens the file that it wishes to read by calling open() on
FileSystem
   1) FileSystem communicates with the NameNode to get the locations of the
   data blocks.
   2) NameNode returns the addresses of the DataNodes on which the blocks
   are stored.
   3) FileSystem returns an FSDataInputStream to the client to read from the
   file.
2) Client then calls read() on the stream, which has the addresses of the
DataNodes for the first few blocks of the file, and connects to the closest
DataNode for the first block in the file
   1) Client calls read() repeatedly to stream the data from the DataNode
   2) When the end of a block is reached, the stream closes the connection with
   that DataNode.
   3) The stream repeats these steps to find the best DataNode for the
   next blocks.
3) When the client completes the reading of the file, it calls close() on the
stream to close the connection.
43
Read optimizations
44
Security
45
Topics for Session 5
46
File formats
47
File formats - text based
48
File formats - sequence files
• A flat file consisting of binary key/value pairs
• Extensively used in MapReduce as input/output formats as well as internal temporary
outputs of maps
• 3 types
1. Uncompressed key/value records.
2. Record compressed key/value records - only 'values' are compressed here.
3. Block compressed key/value records - both keys and values are collected in
configurable 'blocks' and compressed. Sync markers added for random access and
splitting.
• Header includes information about :
• key, value class
• whether compression is enabled and whether at block level
• compressor codec used
ref: https://cwiki.apache.org/confluence/display/HADOOP2/SequenceFile
49
Example: Storing images in HDFS
• Images are written into a SequenceFile as key/value pairs, maybe attach some meta-data
• Files are stored as binary format (e.g. BytesWritable values); one can iterate over the files, as well as use meta-data to retrieve specific images

public class ImageToSeq {
    // imports from org.apache.hadoop.conf, org.apache.hadoop.fs and org.apache.hadoop.io omitted for brevity
    public static void main(String[] args) throws Exception {
        Configuration confHadoop = new Configuration();
        confHadoop.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(confHadoop);
        Path inPath = new Path("/mapin/1.png");
        Path outPath = new Path("/mapin/images.seq");   // illustrative output path for the SequenceFile
        Text key = new Text();
        BytesWritable value = new BytesWritable();
        SequenceFile.Writer writer = null;
        try {
            FSDataInputStream in = fs.open(inPath);
            byte[] buffer = new byte[in.available()];
            in.read(buffer);
            writer = SequenceFile.createWriter(fs, confHadoop, outPath, key.getClass(), value.getClass());
            writer.append(new Text(inPath.getName()), new BytesWritable(buffer));   // image stored as a binary value
        } catch (Exception e) {
            System.out.println("Exception MESSAGES = " + e.getMessage());
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
Ref: https://stackoverflow.com/questions/16546040/store-images-videos-into-hadoop-hdfs
50
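As a follow-up, a minimal sketch (not part of the original example) of reading the stored images back from the SequenceFile; the path /mapin/images.seq matches the illustrative output path used above:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqToImage {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        Path seqPath = new Path("/mapin/images.seq");   // illustrative path, matches the writer sketch
        SequenceFile.Reader reader = null;
        try {
            reader = new SequenceFile.Reader(fs, seqPath, conf);
            Text key = new Text();
            BytesWritable value = new BytesWritable();
            // Iterate over all key/value records in the SequenceFile
            while (reader.next(key, value)) {
                System.out.println(key + " : " + value.getLength() + " bytes");
            }
        } finally {
            IOUtils.closeStream(reader);
        }
        fs.close();
    }
}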
File formats - Optimized Row Columnar (ORC) *
Improves performance when Hive is reading, writing, and processing data
(fragment of an example table definition: state string, zip int)
52
Topics for today
55
Summary
56
Next Session:
Distributed Programming