CS 15-319
Distributed File Systems and Cloud Storage Part II
Lecture 13, Feb 27, 2012
Majd F. Sakr, Mohammad Hammoud and Suhail Rehman
Today
Last session
Distributed File Systems and Cloud Storage - Part I
Today's session
Distributed File Systems and Cloud Storage - Part II
Announcement:
Project update is due next Wednesday, Feb 29
Basics
DFS Aspects
Architecture
Processes
Communication
Naming
Synchronization
Fault Tolerance
Processes (1)
Cooperating processes in DFSs are usually the storage servers and file manager(s)
The most important aspect concerning DFS processes is whether they should be stateless or stateful
1. Stateless Approach:
Does not require that servers maintain any client state
When a server crashes, there is no need to enter a recovery phase to bring the server to a previous state
Locking a file cannot be easily done
E.g., NFSv3 and PVFS (no client-side caching)
Processes (2)
2. Stateful Approach:
Requires that a server maintains some client state
Clients can make effective use of caches, but this entails an efficient underlying cache consistency protocol
Provides a server with the ability to support callbacks (i.e., the ability to do an RPC to a client) in order to keep track of its clients
E.g., NFSv4 and HDFS
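To make the contrast concrete, the following is a minimal, hypothetical sketch (not the API of any real DFS): the stateless server takes all context with every call, while the stateful server keeps per-client handles and offsets.

```java
// Hypothetical interfaces contrasting stateless and stateful file servers.
// Neither corresponds to an actual NFS or HDFS interface.

// Stateless: every request carries the full context (file id, offset, length),
// so the server keeps no per-client state and needs no recovery after a crash.
interface StatelessFileServer {
    byte[] read(String fileId, long offset, int length);
}

// Stateful: the server hands out a handle and remembers, per client,
// which files are open and the current read position.
interface StatefulFileServer {
    long open(String path);                // returns a server-side handle
    byte[] read(long handle, int length);  // the server tracks the offset
    void close(long handle);               // releases the per-client state
}
```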
Communication
Communication in DFSs is typically based on remote procedure calls (RPCs)
The main reason for choosing RPC is to make the system independent from underlying OSs, networks, and transport protocols
In NFS, all communication between a client and a server proceeds along the Open Network Computing RPC protocol (ONC RPC)
HDFS uses RPC for the communication between clients, DataNodes, and the NameNode
In PVFS, the communication with the I/O daemons and the metadata manager is handled transparently within the native API implementation
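As a sketch of what RPC-style client-server communication looks like, the hypothetical remote interface below resembles the calls an HDFS-like client might issue to its file manager (NameNode); the method names are illustrative and are not the real HDFS ClientProtocol.

```java
import java.util.List;

// Hypothetical RPC interface between a DFS client and its file manager
// (e.g., an HDFS-like NameNode). Names and signatures are illustrative only.
interface FileManagerProtocol {
    // Metadata operation: where are the blocks of this byte range stored?
    List<String> getBlockLocations(String path, long offset, long length);

    // Add a new file to the namespace before any data is written.
    void create(String path, short replication);

    // Signal that all blocks of the file have been written successfully.
    boolean complete(String path);
}
```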
Naming
Names are used to uniquely identify entities in distributed systems
Entities may be processes, remote objects, newsgroups, etc.
Names are mapped to an entity's location using a name resolution mechanism
An example of name resolution:
Name: http://www.cdk5.net:8888/WebExamples/earth.html
A DNS lookup resolves the host name to the host's IP address (e.g., 55.55.55.55), which is in turn resolved to the MAC address of the host's network interface (e.g., 02:60:8c:02:b0:5a)
The port number 8888 identifies the web server process on that host, and the path WebExamples/earth.html identifies the resource within the server
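The DNS step of this resolution can be reproduced with standard Java; note that the IP and MAC addresses above are placeholders, and the actual result depends on live DNS records.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class NameResolutionDemo {
    public static void main(String[] args) throws UnknownHostException {
        // Resolve the host-name part of the URL to an IP address via DNS.
        InetAddress addr = InetAddress.getByName("www.cdk5.net");
        System.out.println("Host: " + addr.getHostName());
        System.out.println("IP:   " + addr.getHostAddress());
        // The port (8888) and path (WebExamples/earth.html) are interpreted
        // by the web server on that host, not by DNS.
    }
}
```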
Naming in DFSs
Naming in NFS
The fundamental idea underlying the NFS naming model is to provide clients with complete transparency
Transparency in NFS is achieved by allowing a client to mount a remote file system into its own local file system
However, instead of mounting an entire file system, NFS allows clients to mount only part of a file system
A server is said to export a directory when it makes that directory, and its entries, available to clients; a client then mounts the exported directory into its own name space
Mounting in NFS
Figure: Mounting part of a remote file system in NFS. The server exports the directory /users/steen; Client A mounts the exported steen subdirectory under /remote/vu and Client B mounts it under /work/me, so steen's files (e.g., mbox) appear in both clients' name spaces under different local paths
Example
Figure: Mounting nested directories from multiple servers in NFS. A client can build its name space (e.g., its bin directory) by mounting exported directories, such as packages, draw, and install, from more than one server (Server A and Server B)
In addition, Alice may have access to Bob's (another user's) public files by accessing Bob's directory through /home/bob
Mounting every user's home directory in advance would involve a lot of communication and administrative overhead; instead, directories can be mounted on demand by an automounter
Figure: A simple automounter for NFS. (1) The NFS client looks up /home/alice; (3) the automounter issues a mount request to the server machine holding the users directory; (4) the exported alice subdirectory is mounted from the server into the client's name space
Naming in HDFS
An HDFS cluster consists of a single NameNode (the master) and multiple DataNodes (the slaves)
The NameNode manages the HDFS namespace and regulates accesses to files by clients
The DataNodes manage storage attached to the nodes that they run on
They are responsible for serving read and write requests from clients
They perform block creation, deletion, and replication upon instructions from the NameNode
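As an illustration, a pure namespace operation such as listing a directory is served entirely by the NameNode; the sketch below uses the standard Hadoop FileSystem API with a placeholder path and default configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsDirectory {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);       // typically a DistributedFileSystem
        // Listing is a metadata-only request answered by the NameNode;
        // no DataNode is contacted.
        for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }
        fs.close();
    }
}
```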
Figure: An HDFS client reading a file. The client opens the file via DistributedFileSystem, (2) gets the block locations from the NameNode, (3-5) reads the blocks from the DataNodes through an FSDataInputStream, and (6) closes the stream
Data Reads
For each block, the NameNode returns the addresses of the DataNodes that have a copy of that block
During the read process, DFSInputStream calls the NameNode to retrieve the DataNode locations for the next batch of blocks needed
The DataNodes are sorted according to their proximity to the client in order to exploit data locality
An important aspect of this design is that the client contacts DataNodes directly to retrieve data and is guided by the NameNode to the best DataNode for each block
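A minimal read example with the Hadoop FileSystem API follows the same steps as the figure: opening the file returns an FSDataInputStream, block locations are fetched from the NameNode, and the bytes are streamed from nearby DataNodes. The file path is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadHdfsFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);                              // DistributedFileSystem
        FSDataInputStream in = fs.open(new Path("/user/demo/data.txt"));  // block locations from NameNode
        try {
            IOUtils.copyBytes(in, System.out, 4096, false);               // reads come from DataNodes
        } finally {
            IOUtils.closeStream(in);                                      // close the stream
        }
        fs.close();
    }
}
```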
Figure: An HDFS client writing a file. (2) The client asks the NameNode, via DistributedFileSystem, to create the file; (3) it writes data through an FSDataOutputStream; (4) data packets are forwarded along a pipeline of DataNodes; (5) acknowledgement packets flow back; (6) the client closes the stream; and (7) the NameNode is told that the file is complete
Data Pipelining
When a client is writing data to an HDFS file, its data is first written to a local file
When the local file accumulates a full block of user data, the client retrieves a list of DataNodes from the NameNode
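A matching write sketch: create() contacts the NameNode to add the file to the namespace, the bytes written to the FSDataOutputStream are packaged and pushed through the DataNode pipeline, and close() completes the file. The path is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteHdfsFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // create() asks the NameNode for a new namespace entry, then returns
        // a stream whose data travels through the DataNode pipeline.
        FSDataOutputStream out = fs.create(new Path("/user/demo/output.txt"));
        out.writeUTF("hello, HDFS");   // buffered into packets and sent block by block
        out.close();                   // flush remaining packets; the file is then complete
        fs.close();
    }
}
```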
Naming in PVFS
PVFS file systems may be mounted on all nodes in the same directory simultaneously
This allows all nodes to see and access all files on the PVFS file system through the same directory scheme
Once mounted, PVFS files and directories can be operated on with all the familiar tools, such as ls, cp, and rm
With PVFS, clients can avoid making requests to the file system through the kernel by linking to the PVFS native API
Figure: PVFS native API (libpvfs). Metadata access goes to the metadata manager, while data access goes directly to the I/O daemons
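Because a mounted PVFS volume appears as an ordinary directory tree, any standard file API works on it through the kernel; the short sketch below lists a hypothetical mount point /mnt/pvfs (the path is an assumption) using plain Java I/O.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ListPvfsMount {
    public static void main(String[] args) throws IOException {
        // Hypothetical PVFS mount point; requests go through the kernel VFS
        // exactly as they would for any other mounted file system.
        Path mount = Paths.get("/mnt/pvfs");
        try (Stream<Path> entries = Files.list(mount)) {
            entries.forEach(p -> System.out.println(p.getFileName()));
        }
    }
}
```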