Lecture 4 - Hadoop HDFS
DSCI 551
Wensheng Wu
1
Hadoop
• A large-scale distributed & parallel batch-
processing infrastructure
• Large-scale:
– Handle a large amount of data and computation
• Distributed:
– Distribute data & computation over multiple machines
• Batch processing
– Process a series of jobs without human intervention
2
2-10 Gbps backbone between racks
3
In 2011 it was guesstimated that Google had 1M machines, http://bit.ly/Shh0RO
(Figure source: Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu)
4
select cid, title -- clause
from Course
where semester = 'Fa23'

Unit of data partitioning in different systems:
mysql – partition
mongodb – shard
hdfs – block
5
History
• 1st version released by Yahoo! in 2006
– named after a toy elephant belonging to Doug Cutting's son
6
Roadmap
• Hadoop architecture
– HDFS
– MapReduce
7
Key components
• HDFS (Hadoop distributed file system)
– Distributed data storage with high reliability
• MapReduce
– A parallel, distributed computational paradigm
– With a simplified programming model
8
HDFS
• Data are distributed among multiple data nodes
– Data nodes may be added on demand for more
storage space
Figure: file bar (256MB) is split into two 128MB blocks
(block 3 and block 5), each spanning 32K disk blocks of 4KB,
replicated across data nodes A, B, and C, and stored as local
files, e.g., /usr/john/blk_5_1.csv, /usr/mary/blk_3_1.csv
10
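The splitting above can be sketched in a few lines. This is a toy illustration (not HDFS code), using the slide's numbers: a 256MB file yields two 128MB HDFS blocks, and each HDFS block covers 32K disk blocks of 4KB.

```python
HDFS_BLOCK = 128 * 1024 * 1024   # 128 MB HDFS block size
DISK_BLOCK = 4 * 1024            # 4 KB disk block size

def split_into_blocks(file_size):
    """Return (offset, length) pairs for each HDFS block of a file."""
    return [(off, min(HDFS_BLOCK, file_size - off))
            for off in range(0, file_size, HDFS_BLOCK)]

blocks = split_into_blocks(256 * 1024 * 1024)   # file "bar" from the slide
print(len(blocks))                  # 2 blocks of 128 MB each
print(HDFS_BLOCK // DISK_BLOCK)     # each spans 32768 (32K) disk blocks
```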
HDFS has …
• A single NameNode, storing meta data:
– A hierarchy of directories and files (name space)
– Attributes of directories and files (in inodes), e.g.,
permission, access/modification times, etc.
– Mapping of files to blocks on data nodes
• A number of DataNodes:
– Storing contents/blocks of files
11
Compute nodes
• Data nodes are compute nodes too
• Advantage:
– Allows scheduling computation close to the data
12
HDFS also has …
• A SecondaryNameNode
– Maintaining checkpoints/images of NameNode
– For recovery
– not a failover node
• In a single-machine setup
– all nodes correspond to the same machine
13
Metadata in NameNode
• NameNode has an inode for each file and dir
14
Mapping information in NameNode
• E.g., file /user/aaron/foo consists of blocks 1,
2, and 4
15
Block size
• HDFS: 128 MB (version 2 & above)
– Much larger than disk block size (4KB)
– A: HDFS block = 128MB; B: disk block = 4KB
– One HDFS block spans 128MB/4KB = 32K disk blocks
– A 1GB file needs: A: 1GB/128MB = 8 HDFS blocks vs.
B: 1GB/4KB = 2^30/2^12 = 2^18 = 256K disk blocks
• Why larger size in HDFS?
– Reduce metadata required per file
– Fast streaming reads (since a larger amount of data
is laid out sequentially on disk)
16
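The arithmetic on this slide can be checked directly; a 1GB file needs only 8 entries of block metadata at 128MB per block, versus 256K entries at 4KB per block:

```python
GB = 2**30
HDFS_BLOCK = 128 * 2**20    # 128 MB
DISK_BLOCK = 4 * 2**10      # 4 KB

hdfs_blocks = GB // HDFS_BLOCK   # blocks for a 1GB file in HDFS
disk_blocks = GB // DISK_BLOCK   # 4KB blocks for the same file

print(hdfs_blocks)               # 8
print(disk_blocks)               # 262144 = 256K
print(disk_blocks // hdfs_blocks)  # 32768: metadata reduced 32K-fold
```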
HDFS
• HDFS exposes the concept of blocks to clients
17
Client and Namenode communication
• Source code (version 2.8.1)
– Definition of protocol
• ClientNamenodeProtocol.proto
• <hadoop-src-dir>\hadoop-hdfs-project\hadoop-hdfs-
client\src\main\proto
– Implementation
• ClientProtocol.java
• <hadoop-src-dir>\hadoop-hdfs-project\hadoop-hdfs-
client\src\main\java\org\apache\hadoop\hdfs\protocol
18
Key operations
• Reading:
– getBlockLocations()
• Writing
– create()
– append()
– addBlock()
19
getBlockLocations
Before reading, the client first obtains the locations of the file's blocks
20
getBlockLocations
• Input:
– File name
– Offset (to start reading)
– Length (how much data to be read)
• Output:
– Located blocks (data nodes + offsets)
21
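A toy sketch of what the NameNode does to answer getBlockLocations (names and the in-memory block map are hypothetical, not the real HDFS implementation): given a file, an offset, and a length, return the blocks overlapping that byte range together with the data nodes holding replicas.

```python
BLOCK_SIZE = 128 * 2**20   # 128 MB

# NameNode's view: file -> ordered list of (block_id, data nodes with replicas)
block_map = {
    "/user/aaron/foo": [(1, ["dnA", "dnB"]),
                        (2, ["dnB", "dnC"]),
                        (4, ["dnA", "dnC"])],
}

def get_block_locations(path, offset, length):
    """Return blocks overlapping [offset, offset+length): each with its
    block id, its offset in the entire file, and its replica locations."""
    located = []
    for i, (block_id, nodes) in enumerate(block_map[path]):
        block_off = i * BLOCK_SIZE
        if block_off < offset + length and offset < block_off + BLOCK_SIZE:
            located.append({"block": block_id,
                            "offset": block_off,
                            "locations": nodes})
    return located

# Reading the first 200MB touches only the first two blocks.
locs = get_block_locations("/user/aaron/foo", 0, 200 * 2**20)
```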
22
../java/…hdfs/protocol/LocatedBlocks.java
(A located block records: the block, the offset of this block
in the entire file, and the data nodes with replicas of the block)
23
Create/append a file
24
Creating a file
• Needs to specify:
– Path to the file to be created, e.g., /foo/bar
– Permission mask
– Client name
– Flag on whether to overwrite (the entire file!) if it
already exists
– How many replicas
– Block size
25
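The parameters listed above can be collected into a sketch of a create request (field names and defaults are illustrative, not the real HDFS API; 3 replicas and 128MB blocks are typical Hadoop 2.x defaults):

```python
from dataclasses import dataclass

@dataclass
class CreateRequest:
    src: str                        # path of file to create, e.g. /foo/bar
    masked: int = 0o644             # permission mask
    client_name: str = "client-1"   # identifies the writer
    overwrite: bool = False         # replace the ENTIRE file if it exists
    replication: int = 3            # number of replicas per block
    block_size: int = 128 * 2**20   # 128 MB

req = CreateRequest(src="/foo/bar")
```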
A hierarchy of files and directories
26
Allocating new blocks for writing
Asking NameNode to allocate a new block
+ data nodes holding its replicas
27
28
Client and Datanode communication
• Source code (version 2.8.1)
– Definition of protocol
• datatransfer.proto
• Located at: <hadoop-src-dir>\hadoop-hdfs-
project\hadoop-hdfs-client\src\main\proto
– Implementation
• DataTransferProtocol.java
• <hadoop-src-dir>\hadoop-hdfs-project\hadoop-hdfs-
client\src\main\java\org\apache\hadoop\hdfs\protocol
\datatransfer
29
Operations
• readBlock()
• writeBlock()
30
Reading a file
1. Client first contacts the NameNode, which
informs the client of the closest DataNodes
storing the blocks of the file
– This is done by making which RPC call?
31
datatransfer.proto
Block, offset, length
32
DataTransferProtocol.java
33
Writing a file
• Blocks are written one at a time
– In a pipelined fashion through the data nodes:
client → A with targets [B, C]; A → B with targets [C];
B → C with targets []
35
(writeBlock arguments: block to be written + rest of data nodes)
36
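The pipeline above can be simulated in a few lines. This is a toy sketch (not HDFS code): the client sends the block to the first data node along with the list of remaining targets; each node stores its local replica, then forwards the data and the shortened target list downstream.

```python
stored = {}   # data node name -> block data it holds

def write_block(node, downstream, data):
    """Store a replica on `node`, then forward to the rest of the pipeline."""
    stored[node] = data
    if downstream:
        write_block(downstream[0], downstream[1:], data)

# Client contacts A with targets [B, C]; A forwards to B; B forwards to C.
write_block("A", ["B", "C"], b"block X")
```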
Data pipelining
• Consider a block X to be written to DataNode
A, B, and C (replication factor = 3)
38
Data pipelining for writing a block
(Figure: control messages set up the pipeline; data packets are
streamed from a queue (p1, …); acknowledgment messages flow back)
39
Acknowledgement
• Client does not wait for the acknowledgement of
the previous packet before sending the next one
• Advantage?
40
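A toy sketch of this non-blocking scheme (illustrative, not HDFS code): packets are sent back-to-back and tracked in an outstanding queue; acks retire them in sending order later. The advantage is that the pipeline stays full instead of paying one round trip per packet.

```python
from collections import deque

outstanding = deque()   # packets sent but not yet acknowledged
peak = 0                # most packets in flight at once

def send(packet):
    global peak
    outstanding.append(packet)          # sent immediately; no wait for ack
    peak = max(peak, len(outstanding))

def on_ack(packet):
    assert outstanding[0] == packet     # acks come back in sending order
    outstanding.popleft()

for p in ["p1", "p2", "p3"]:   # all three sent before any ack arrives
    send(p)
for p in ["p1", "p2", "p3"]:   # acks then retire them in order
    on_ack(p)
```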
Roadmap
• Hadoop architecture
– HDFS
– MapReduce
41
Hadoop & HDFS installation
• Refer to the installation note posted on course
web site on how to install Hadoop and setup
HDFS
42
Working with hdfs
• Setting up home directory in hdfs
– hdfs dfs -mkdir /user
– hdfs dfs -mkdir /user/ec2-user
(ec2-user is user name of your EC2 account)
44
Working with hdfs
• Copy data from hdfs
– hdfs dfs -get /user/ec2-user/input input1
– If local directory input1 does not exist, it is created
– If it already exists, input is copied into it as a subdirectory
45
Working with hdfs
• Remove files
– hdfs dfs -rm /user/ec2-user/input/core-site.xml
– hdfs dfs -rm /user/ec2-user/input/*
• Remove directory
– hdfs dfs -rmdir /user/ec2-user/input
– Directory "input" needs to be empty first
46
Where is hdfs located?
• /tmp/hadoop-ec2-user/dfs/
47
References
• K. Shvachko, H. Kuang, S. Radia, and R. Chansler,
"The Hadoop Distributed File System," in Proc. IEEE 26th
Symposium on Mass Storage Systems and Technologies (MSST),
2010, pp. 1-10.
48