
Distributed Systems

CS 15-440

Distributed File Systems – Part I


Lecture 24, Nov 25, 2014

Mohammad Hammoud

Today…
• Last Session:
  • Fault Tolerance – Part II

• Today's Session:
  • Distributed File Systems – Part I

• Announcements:
  • PS4 grades will be out tomorrow
  • Project 4 is due on Dec 3rd by midnight
  • PS5 is due on Dec 4th by midnight
  • Final Exam is on Monday, Dec 8th at 9:00AM in Room 1031. It will be comprehensive, but open book and notes

Intended Learning Outcomes:
Distributed File Systems

ILO7: Explain distributed file systems as a paradigm for general-purpose distributed systems, and analyze its various aspects and architectures
  • ILO7.1: Define distributed file systems (DFSs) and explain various architectures
  • ILO7.2: Analyze various aspects of DFSs including processes, communication, naming, synchronization, consistency and replication, and fault tolerance
Discussion on Distributed File Systems

Distributed File Systems (DFSs): Basics and DFS Aspects

Distributed File Systems
• Why File Systems?
  • To organize data (as files)
  • To provide a means for applications to store, access, and modify data

• Why Distributed File Systems (DFSs)?
  • To share and manage data in distributed systems
  • Big data continues to grow
  • A DFS can hold huge volumes of data (unlike a local file system) and provide access to these data for many clients dispersed across networks

NAS versus SAN
• Another term for DFS is network attached storage (NAS), referring to attaching storage to network servers that provide file systems

• A similar-sounding term that refers to a very different approach is storage area network (SAN)

• A SAN makes storage devices (not file systems) available over a network

[Figure: client computers connect over a LAN to a file server (providing NAS) and to a database server; the servers reach storage devices over a SAN]

Benefits of DFSs
• DFSs provide:

  1. File sharing over a network: without a DFS, we would have to exchange files by e-mail or use applications such as the Internet's FTP

  2. Transparent file access: a user's programs can access remote files as if they were local. The remote files need no special APIs; they are accessed just like local ones

  3. Easy file management: managing a DFS is easier than managing multiple local file systems

DFS Components
• DFS information can typically be categorized as follows (a small illustrative sketch follows this slide):

  1. The data state: the contents of files
  2. The attribute state: the information about each file (e.g., the file's size and access control list)
  3. The open-file state: which files are open or otherwise in use, and how files are locked

• Designing a DFS entails determining how its various components are placed. Specifically, by component placement we indicate:
  • What resides on the servers
  • What resides on the clients
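
To make the three categories concrete, here is a minimal, hypothetical sketch in Python of how a DFS might represent them; the class and field names are illustrative and not taken from any particular system.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataState:
    """Contents of a file (what applications read and write)."""
    contents: bytes = b""

@dataclass
class AttributeState:
    """Information about each file, e.g., its size and access control list."""
    size: int = 0
    acl: dict = field(default_factory=dict)      # e.g., {"alice": "rw"}

@dataclass
class OpenFileState:
    """Transitory state: which clients hold the file open, and how it is locked."""
    open_clients: set = field(default_factory=set)
    locked_by: Optional[str] = None
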
DFS Component Placement (1)
• The data and the attribute states permanently reside on the server's local file system, but recently accessed or modified information might reside in server and/or client caches

• The open-file state is transitory; it changes as processes open and close files

[Figure: client and server each hold a data cache, an attribute cache, and open-file state, connected over the network; the server additionally hosts the local file system]

DFS Component Placement (2)
• Three basic concerns govern the DFS component placement strategy:

  1. Access speed: caching information on clients improves performance considerably
  2. Consistency: if clients cache information, do all parties share the same view of it?
  3. Recovery: if one or more computers crash, to what extent are the others affected? How much information is lost?

Discussion on Distributed File Systems

Distributed File Systems (DFSs): Basics and DFS Aspects

DFS Aspects
• Architecture: How are DFSs generally organized?
• Processes: Who are the cooperating processes? Are processes stateful or stateless?
• Communication: What is the typical communication paradigm followed by DFSs? How do processes in DFSs communicate?
• Naming: How is naming often handled in DFSs?
• Synchronization: What are the file sharing semantics adopted by DFSs?
• Consistency and Replication: What are the various features of client-side caching as well as server-side replication?
• Fault Tolerance: How is fault tolerance handled in DFSs?

Architectures
• Client-Server Distributed File Systems
• Cluster-Based Distributed File Systems
• Symmetric Distributed File Systems

Network File System
• Many distributed file systems are organized along the lines of client-server architectures

• Sun Microsystems' Network File System (NFS) is one of the most widely deployed DFSs for Unix-based systems

• NFS comes with a protocol that describes precisely how a client can access a file stored on a (remote) NFS file server

• NFS allows a heterogeneous collection of processes, possibly running on different OSs and machines, to share a common file system

Remote Access Model
• The model underlying NFS and similar systems is the remote access model

[Figure: the file stays on the server; the client sends requests to access the remote file and receives replies from the server]

• In this model, clients:
  • Are offered transparent access to a file system that is managed by a remote server
  • Are normally unaware of the actual location of files
  • Are offered an interface to a file system similar to the interface offered by a conventional local file system

Upload/Download Model
• A contrasting model, referred to as the upload/download model, allows a client to access a file locally after having downloaded it from the server

[Figure: the file is moved to the client; all accesses are done on the client, and when the client is done the (new) file is returned to the server]

• The Internet's FTP service can be used this way when a client downloads a complete file, modifies it, and then puts it back
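
As a rough illustration of the difference between the two models, here is a hypothetical Python sketch of the two client-side access patterns; rpc_read, download, and upload are placeholder functions standing in for whatever protocol a real DFS uses.

# Remote access model: every operation is shipped to the server.
def read_remote(server, path, offset, length):
    # The file never leaves the server; the client only sees the bytes it asked for.
    return server.rpc_read(path, offset, length)   # placeholder RPC

# Upload/download model: fetch the whole file, work locally, push it back.
def edit_locally(server, path, transform):
    data = server.download(path)      # placeholder: whole-file transfer
    new_data = transform(data)        # all accesses happen on the client
    server.upload(path, new_data)     # return the (possibly new) file to the server
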
The Basic NFS Architecture
[Figure: on both client and server, a system call layer sits on top of a Virtual File System (VFS) layer. On the client, the VFS layer dispatches either to the local file system or to the NFS client, which issues the client request through an RPC client stub over the network; on the server, the RPC server stub hands the request to the NFS server interface, which goes through the VFS layer to the server's local file system]

Architectures
• Client-Server Distributed File Systems
• Cluster-Based Distributed File Systems
• Symmetric Distributed File Systems

Data-Intensive Applications
• Today there is a deluge of large data-intensive applications

• Most data-intensive applications fall into one of two styles of computing:
  • Internet services (or cloud computing)
  • High-performance computing (HPC)

• Cloud computing and HPC applications:
  • Typically run on thousands of compute nodes
  • Process huge volumes of data (or Big Data)

[Figure: visualization of entropy in the Terascale Supernova Initiative application. Image from Kwan-Liu Ma's visualization team at UC Davis]

Cluster-Based Distributed File Systems
• The underlying cluster-based file system is a key component for providing scalable data-intensive application performance

• The cluster-based file system divides and distributes Big Data, using file striping techniques, to allow concurrent data accesses

• A cluster-based file system can be either cloud computing or HPC oriented

• Examples:
  • Cloud Computing Oriented: Google File System (GFS)
  • HPC Oriented: Parallel Virtual File System (PVFS)

File Striping Techniques
• Server clusters are often used for distributed applications, and their associated file systems are adjusted to satisfy their requirements

• One well-known technique is to deploy file striping, by which a single file is distributed across multiple servers

• Hence, it becomes possible to fetch different parts of a file concurrently

[Figure: a file whose parts a–e are distributed across several servers, so that clients can access the parts in parallel]

Round-Robin Distribution (1)
• How to stripe a file over multiple machines?
  • Round-robin is typically a reasonable default solution

• Example: a logical file split into striping units 0–15 is distributed round-robin over four servers:
  • Server 1: units 0, 4, 8, 12
  • Server 2: units 1, 5, 9, 13
  • Server 3: units 2, 6, 10, 14
  • Server 4: units 3, 7, 11, 15
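
A minimal sketch of this mapping in Python, assuming a hypothetical 64KB striping unit (the unit size and function name are illustrative, not tied to any particular file system):

STRIPE_UNIT = 64 * 1024   # assumed striping-unit size (64KB)
NUM_SERVERS = 4

def locate(offset: int) -> tuple:
    """Map a byte offset to (server index, striping-unit index) under 1D round-robin."""
    unit = offset // STRIPE_UNIT          # which striping unit the byte falls into
    server = unit % NUM_SERVERS           # units are dealt out to servers in turn
    return server, unit

# Units 0, 1, 2, 3, 4, ... land on servers 0, 1, 2, 3, 0, ... matching the layout above.
assert [locate(u * STRIPE_UNIT)[0] for u in range(8)] == [0, 1, 2, 3, 0, 1, 2, 3]
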


Round-Robin Distribution (2)
• Clients perform writes/reads of the file at various regions

[Figure: Client I performs a 512K write at offset 0 (striping units 0–7) and Client II a 512K write at offset 512K (units 8–15); with plain round-robin striping, each client's write is spread across all four servers]


2D Round-Robin Distribution (1)
• What happens when we have many servers (say 1000s)?
  • A 2D distribution can help

• Example: with a group size of 2, the striping units of the logical file are distributed round-robin within one group of servers before moving to the next group:
  • Server 1: units 0, 2, 4, 6
  • Server 2: units 1, 3, 5, 7
  • Server 3: units 8, 10, 12, 14
  • Server 4: units 9, 11, 13, 15

2D Round-Robin Distribution (2)
• A 2D distribution can limit the number of servers per client

[Figure: Client I performs a 512K write at offset 0 (striping units 0–7) and Client II a 512K write at offset 512K (units 8–15); with a group size of 2, Client I's write goes only to Servers 1 and 2, and Client II's write goes only to Servers 3 and 4]
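
A minimal sketch of the 2D mapping, under the same illustrative assumptions as the earlier round-robin sketch (a hypothetical 64KB unit and an assumed "units per group" policy parameter):

STRIPE_UNIT = 64 * 1024   # assumed striping-unit size
NUM_SERVERS = 4
GROUP_SIZE = 2            # servers per group, as in the example above
UNITS_PER_GROUP = 8       # assumed: striping units placed in a group before moving on

def locate_2d(offset: int) -> int:
    """Return the server index holding the byte at `offset` under 2D round-robin."""
    unit = offset // STRIPE_UNIT
    group = (unit // UNITS_PER_GROUP) % (NUM_SERVERS // GROUP_SIZE)
    server_in_group = unit % GROUP_SIZE        # round-robin within the group
    return group * GROUP_SIZE + server_in_group

# Units 0-7 stay on servers 0 and 1; units 8-15 stay on servers 2 and 3,
# so a client writing one 512K region touches only two servers.
assert {locate_2d(u * STRIPE_UNIT) for u in range(8)} == {0, 1}
assert {locate_2d(u * STRIPE_UNIT) for u in range(8, 16)} == {2, 3}
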
GFS Data Distribution Policy
• The Google File System (GFS) is a scalable DFS for data-intensive applications

• GFS divides large files into multiple pieces called chunks or blocks (by default 64MB) and stores them on different data servers
  • This design is referred to as a block-based design

• Each GFS chunk has a unique 64-bit identifier and is stored as a file in the lower-layer local file system on the data server

• GFS distributes chunks across cluster data servers using a random distribution policy
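
A rough, hypothetical sketch of such a block-based layout in Python; the 64MB chunk size and 64-bit IDs follow the description above, while the data structures and placement call are purely illustrative (real GFS also replicates each chunk and lets the master assign chunk handles):

import os
import random

CHUNK_SIZE = 64 * 1024 * 1024   # 64MB chunks, as described above

def split_into_chunks(file_size: int) -> list:
    """Number the chunks a file of the given size occupies."""
    return list(range((file_size + CHUNK_SIZE - 1) // CHUNK_SIZE))

def place_chunks(file_size: int, servers: list) -> dict:
    """Assign each chunk a random 64-bit ID and a randomly chosen data server."""
    placement = {}
    for chunk_index in split_into_chunks(file_size):
        chunk_id = int.from_bytes(os.urandom(8), "big")   # 64-bit identifier
        placement[chunk_index] = (chunk_id, random.choice(servers))
    return placement

# Random placement gives no balance guarantee: some servers may receive more
# chunks than others (the load imbalance illustrated in the next figure).
print(place_chunks(400 * 1024 * 1024, ["server0", "server1", "server2", "server3"]))
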
GFS Random Distribution Policy
[Figure: a writer's file, viewed in 64MB extents from 0M to 384M (blocks 0–6), is distributed across Servers 0–3 by random placement; the servers end up holding different numbers of blocks, illustrating load imbalance]

GFS Architecture
• The storage and compute capabilities of a cluster can be organized in two ways:
  1. Co-locate storage and compute in the same node (the approach GFS takes)
  2. Separate storage nodes from compute nodes

[Figure: a GFS client sends a file name and chunk index to the Master, which returns a contact address; the client then sends a chunk ID and byte range to the appropriate chunk server, which returns the chunk data. Each chunk server stores chunks in its local Linux file system]

PVFS Data Distribution Policy
• The Parallel Virtual File System (PVFS) is a scalable DFS for (scientific) data-intensive applications

• PVFS divides large files into multiple pieces called stripe units (by default 64KB) and stores them on different data servers
  • This design is referred to as an object-based design

• Unlike the block-based design of GFS, PVFS stores an object (or a handle) as a file that includes all the stripe units at a data server

• PVFS distributes stripe units across cluster data servers using a round-robin policy
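
A small, hedged sketch of the object-based idea: stripe units are dealt out round-robin, and each server keeps all of its units for a given file concatenated in a single object. The 64KB default follows the slide; everything else (function name, data structures) is invented for illustration.

from collections import defaultdict

STRIPE_UNIT = 64 * 1024   # PVFS's default stripe-unit size, per the slide
NUM_SERVERS = 4

def distribute(file_data: bytes) -> dict:
    """Round-robin the stripe units of one file over the servers; each server keeps
    its units concatenated in a single object for that file (object-based design)."""
    objects = defaultdict(bytes)
    for byte_offset in range(0, len(file_data), STRIPE_UNIT):
        server = (byte_offset // STRIPE_UNIT) % NUM_SERVERS
        objects[server] += file_data[byte_offset:byte_offset + STRIPE_UNIT]
    return objects

# A 400KB file has 7 stripe units dealt out to servers 0,1,2,3,0,1,2: servers 0-2
# each end up holding two units and server 3 one, each as a single object.
sizes = {s: len(obj) for s, obj in distribute(b"x" * 400 * 1024).items()}
print(sizes)
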
PVFS Round-Robin Distribution Policy
[Figure: the same writer's file, viewed in 64MB extents from 0M to 384M (blocks 0–6), is striped round-robin across Servers 0–3, so every server holds an (almost) equal share of the data, illustrating load balance]

PVFS Architecture
• The storage and compute capabilities of a cluster can be organized in two ways:
  1. Co-locate storage and compute in the same node
  2. Separate storage nodes from compute nodes (the approach PVFS takes)

[Figure: PVFS architecture with a separate metadata manager alongside the data servers]

Architectures
• Client-Server Distributed File Systems
• Cluster-Based Distributed File Systems
• Symmetric Distributed File Systems

Ivy
• Fully symmetric organizations that are based on peer-to-peer technology also exist

• All current proposals use a DHT-based system for distributing data, combined with a key-based lookup mechanism

• As an example, Ivy is a distributed file system that is built using a Chord DHT-based system

• Data storage in Ivy is realized by a block-oriented distributed storage layer called DHash (i.e., blocks are distributed over storage nodes)

Ivy Architecture
• Ivy consists of 3 separate layers, each running on every node (the original figure highlights the node where a file system is rooted):
  • File system layer: Ivy
  • Block-oriented storage layer: DHash
  • DHT layer: Chord


Ivy
• Ivy implements an NFS-like file system

• To increase availability and improve performance, Ivy:
  • Replicates every block B to the k immediate successors of the server responsible for storing B (sketched below)
  • Caches looked-up blocks along the route that the lookup request followed
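
A small sketch of the replication rule on a Chord-style ring, with an illustrative 16-bit identifier space and made-up helper names (the real DHash/Chord APIs differ):

import hashlib

ID_BITS = 16                       # toy identifier space, for illustration only
RING = 2 ** ID_BITS

def node_id(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING

def successors(block_key: int, nodes: list, k: int) -> list:
    """The node responsible for a block is its first successor on the ring;
    the block is additionally replicated to the next k successors."""
    ring = sorted(nodes)
    start = next((i for i, n in enumerate(ring) if n >= block_key), 0)
    return [ring[(start + i) % len(ring)] for i in range(k + 1)]

nodes = [node_id("node%d" % i) for i in range(8)]
block = node_id("some-block")
print(successors(block, nodes, k=2))   # primary node plus 2 replica holders
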
DFS Aspects
• Architecture: How are DFSs generally organized?
• Processes: Who are the cooperating processes? Are processes stateful or stateless?
• Communication: What is the typical communication paradigm followed by DFSs? How do processes in DFSs communicate?
• Naming: How is naming often handled in DFSs?
• Synchronization: What are the file sharing semantics adopted by DFSs?
• Consistency and Replication: What are the various features of client-side caching as well as server-side replication?
• Fault Tolerance: How is fault tolerance handled in DFSs?

Stateless Processes
• The cooperating processes in DFSs are usually the storage servers and file manager(s)

• The most important aspect concerning DFS processes is whether they should be stateless or stateful

• The stateless approach:
  • Doesn't require that servers maintain any client state
  • When a server crashes, there is essentially no need to enter a recovery phase to bring the server to a previous state
  • Locking a file cannot be easily done
  • E.g., NFSv3

Stateful Processes
• The stateful approach:
  • Requires that a server maintain some client state
  • Clients can make effective use of caches, but this requires a cache consistency protocol
  • Provides a server with the ability to support callbacks (i.e., the ability to do an RPC to a client) in order to keep track of its clients
  • E.g., NFSv4
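
To make the contrast concrete, here is a hypothetical sketch of the two server styles in Python; the handler names and arguments are invented for illustration and do not correspond to the actual NFSv3/NFSv4 protocols.

# Stateless server: every request carries everything needed to serve it,
# so nothing about the client survives between requests (NFSv3-style).
def stateless_read(file_handle: str, offset: int, length: int) -> bytes:
    with open(file_handle, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Stateful server: the server remembers per-client open files, which enables
# locking, callbacks, and cache-consistency bookkeeping (NFSv4-style),
# but this table must be rebuilt or discarded after a crash.
open_files = {}   # (client_id, fd) -> open file object

def stateful_open(client_id: str, path: str) -> int:
    f = open(path, "rb")
    fd = f.fileno()
    open_files[(client_id, fd)] = f
    return fd

def stateful_read(client_id: str, fd: int, length: int) -> bytes:
    return open_files[(client_id, fd)].read(length)   # file position is server-side state
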
DFS Aspects
• Architecture: How are DFSs generally organized?
• Processes: Who are the cooperating processes? Are processes stateful or stateless?
• Communication: What is the typical communication paradigm followed by DFSs? How do processes in DFSs communicate?
• Naming: How is naming often handled in DFSs?
• Synchronization: What are the file sharing semantics adopted by DFSs?
• Consistency and Replication: What are the various features of client-side caching as well as server-side replication?
• Fault Tolerance: How is fault tolerance handled in DFSs?

Communication in DFSs
• Communication in DFSs is mainly based on remote procedure calls (RPCs)

• The main reason for choosing RPC is to make the system independent from underlying OSs, networks, and transport protocols

• GFS uses RPC and may break a read into multiple RPCs to increase parallelism (see the sketch below)

• PVFS currently uses TCP for all its internal communication

• In NFS, all communication between a client and a server proceeds along the Open Network Computing RPC (ONC RPC) protocol
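
As an illustration of breaking one logical read into several RPCs issued in parallel, here is a hedged Python sketch; rpc_read_chunk is a stand-in for whatever chunk-server call a real file system issues, and the 64MB chunk size is assumed for the sketch.

from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024 * 1024   # assumed chunk size

def rpc_read_chunk(server: str, chunk_index: int, offset: int, length: int) -> bytes:
    """Placeholder for a real RPC to a chunk/data server."""
    raise NotImplementedError

def parallel_read(chunk_map: dict, offset: int, length: int) -> bytes:
    """Split one logical read into per-chunk RPCs and issue them concurrently."""
    requests = []
    end = offset + length
    while offset < end:
        idx = offset // CHUNK_SIZE
        in_chunk = offset % CHUNK_SIZE
        n = min(CHUNK_SIZE - in_chunk, end - offset)
        requests.append((chunk_map[idx], idx, in_chunk, n))
        offset += n
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda r: rpc_read_chunk(*r), requests)
    return b"".join(parts)
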
RPCs in NFS
• Up until NFSv4, the client was made responsible for making the server's life as easy as possible by keeping requests simple

• The drawback becomes apparent when considering the use of NFS in a wide-area system
  • In that case, the extra latency of a second RPC leads to performance degradation

• To circumvent such a problem, NFSv4 supports compound procedures (see the sketch below)

[Figure: two client-server timelines. Before NFSv4, the client first issues a LOOKUP RPC (the server looks up the name) and then a separate READ RPC (the server reads the file data). In NFSv4, the client issues a single compound LOOKUP/OPEN/READ request and the server looks up the name, opens the file, and reads the file data before replying]
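
A rough sketch of the idea behind compound procedures; the operation names echo NFS, but the code is a generic illustration, not the actual NFSv4 wire protocol or API.

# One round trip per operation (pre-NFSv4 style): two RPCs, two network latencies.
def read_file_v3(server, dir_handle, name, length):
    fh = server.call("LOOKUP", dir_handle, name)
    return server.call("READ", fh, 0, length)

# Compound procedure (NFSv4 style): the operations are batched into a single
# RPC and executed in order on the server, so only one round trip is paid.
def read_file_v4(server, dir_handle, name, length):
    return server.compound([
        ("PUTFH", dir_handle),       # set the current file handle
        ("LOOKUP", name),            # resolve the name relative to it
        ("OPEN",),                   # open the resolved file
        ("READ", 0, length),         # read the data
    ])
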
RPC in Coda: RPC2
• Another enhancement to RPCs has been developed as part of the Coda file system (Kistler and Satyanarayanan, 1992) and is referred to as RPC2

• RPC2 is a package that offers reliable RPCs on top of the (unreliable) UDP protocol

• Each time an RPC is made, the RPC2 client code starts a new thread that sends an invocation request to the server and then blocks until it receives an answer

• As request processing may take an arbitrary time to complete, the server regularly sends back messages to the client to let it know it is still working on the request

RPC2 and Multicasting
• RPC2 supports multicasting

• An important design issue in Coda is that servers keep track of which clients have a local copy of a file (i.e., stateful servers)

• When a file is modified, a server invalidates local copies at clients in parallel using multicasting (see the sketch below)

[Figure: two timelines. Without multicast, the server sends an invalidation message to one client, waits for its reply, and only then invalidates the next client. With RPC2's multicast, the server sends the invalidations to all clients in parallel and collects their replies concurrently]
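
A hedged Python sketch of the difference; send_invalidate stands in for the real RPC2 invalidation call, and the multicast is approximated here with concurrent unicasts.

from concurrent.futures import ThreadPoolExecutor

def send_invalidate(client: str, path: str) -> str:
    """Placeholder for the invalidation RPC to one client."""
    raise NotImplementedError

# Sequential invalidation: total time grows with the number of caching clients.
def invalidate_one_at_a_time(clients: list, path: str) -> list:
    return [send_invalidate(c, path) for c in clients]

# Parallel invalidation (what RPC2's multicast achieves): all clients are
# notified at roughly the same time, so the wait is one round trip, not N.
def invalidate_in_parallel(clients: list, path: str) -> list:
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda c: send_invalidate(c, path), clients))
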
Next Class: DFS Aspects
• Architecture: How are DFSs generally organized?
• Processes: Who are the cooperating processes? Are processes stateful or stateless?
• Communication: What is the typical communication paradigm followed by DFSs? How do processes in DFSs communicate?
• Naming: How is naming often handled in DFSs?
• Synchronization: What are the file sharing semantics adopted by DFSs?
• Consistency and Replication: What are the various features of client-side caching as well as server-side replication?
• Fault Tolerance: How is fault tolerance handled in DFSs?
