CS 15-319
Distributed File Systems and Cloud Storage Part II
Lecture 13, Feb 27, 2012
Majd F. Sakr, Mohammad Hammoud and Suhail Rehman
Today
Last session
Distributed File Systems and Cloud Storage - Part I
Today's session
Distributed File Systems and Cloud Storage - Part II
Announcement:
Project update is due next Wednesday, Feb 29
Basics
DFS Aspects
Architecture
Processes
Communication
Naming
Synchronization
Fault Tolerance
Processes (1)
Cooperating processes in DFSs are usually the storage servers and file manager(s)
The most important aspect concerning DFS processes is whether they should be stateless or stateful
1. Stateless Approach:
Does not require that servers maintain any client state
When a server crashes, there is no need to enter a recovery phase to bring the server to a previous state
Locking a file cannot be easily done
E.g., NFSv3 and PVFS (no client-side caching)
Processes (2)
2. Stateful Approach:
Requires that a server maintains some client state
Clients can make effective use of caches, but this entails an efficient underlying cache consistency protocol
Provides a server with the ability to support callbacks (i.e., the ability to do an RPC to a client) in order to keep track of its clients
E.g., NFSv4 and HDFS
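To make the contrast concrete, the following is a minimal, hypothetical sketch (not the API of any real DFS): the stateless server takes all context with every call, while the stateful server keeps per-client handles and offsets.

```java
// Hypothetical interfaces contrasting stateless and stateful file servers.
// Neither corresponds to an actual NFS or HDFS interface.

// Stateless: every request carries the full context (file id, offset, length),
// so the server keeps no per-client state and needs no recovery after a crash.
interface StatelessFileServer {
    byte[] read(String fileId, long offset, int length);
}

// Stateful: the server hands out a handle and remembers, per client,
// which files are open and the current read position.
interface StatefulFileServer {
    long open(String path);                // returns a server-side handle
    byte[] read(long handle, int length);  // the server tracks the offset
    void close(long handle);               // releases the per-client state
}
```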
Communication
Communication in DFSs is typically based on remote procedure calls (RPCs)
The main reason for choosing RPC is to make the system independent from underlying OSs, networks, and transport protocols
In NFS, all communication between a client and a server proceeds along the Open Network Computing RPC protocol (ONC RPC)
HDFS uses RPC for the communication between clients, DataNodes, and the NameNode
In PVFS, the communication with the I/O daemons and the metadata manager is handled transparently within the native API implementation
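As a sketch of what RPC-style client-server communication looks like, the hypothetical remote interface below resembles the calls an HDFS-like client might issue to its file manager (NameNode); the method names are illustrative and are not the real HDFS ClientProtocol.

```java
import java.util.List;

// Hypothetical RPC interface between a DFS client and its file manager
// (e.g., an HDFS-like NameNode). Names and signatures are illustrative only.
interface FileManagerProtocol {
    // Metadata operation: where are the blocks of this byte range stored?
    List<String> getBlockLocations(String path, long offset, long length);

    // Add a new file to the namespace before any data is written.
    void create(String path, short replication);

    // Signal that all blocks of the file have been written successfully.
    boolean complete(String path);
}
```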
Naming
Names are used to uniquely identify entities in distributed systems
Entities may be processes, remote objects, newsgroups, etc.
Names are mapped to an entity's location using a name resolution mechanism
An example of name resolution:
Name: http://www.cdk5.net:8888/WebExamples/earth.html
A DNS lookup resolves the host name to the host's IP address (e.g., 55.55.55.55), which is in turn resolved to the MAC address of the host's network interface (e.g., 02:60:8c:02:b0:5a)
The port number 8888 identifies the web server process on that host, and the path WebExamples/earth.html identifies the resource within the server
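The DNS step of this resolution can be reproduced with standard Java; note that the IP and MAC addresses above are placeholders, and the actual result depends on live DNS records.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class NameResolutionDemo {
    public static void main(String[] args) throws UnknownHostException {
        // Resolve the host-name part of the URL to an IP address via DNS.
        InetAddress addr = InetAddress.getByName("www.cdk5.net");
        System.out.println("Host: " + addr.getHostName());
        System.out.println("IP:   " + addr.getHostAddress());
        // The port (8888) and path (WebExamples/earth.html) are interpreted
        // by the web server on that host, not by DNS.
    }
}
```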
Naming in DFSs
Naming in NFS
The fundamental idea underlying the NFS naming model is to provide clients with complete transparency
Transparency in NFS is achieved by allowing a client to mount a remote file system into its own local file system
However, instead of mounting an entire file system, NFS allows clients to mount only part of a file system
A server is said to export a directory when it makes that directory, and its entries, available to clients; a client then mounts the exported directory into its own name space
Mounting in NFS
Figure: Mounting part of a remote file system in NFS. The server exports the directory /users/steen; Client A mounts the exported steen subdirectory under /remote/vu and Client B mounts it under /work/me, so steen's files (e.g., mbox) appear in both clients' name spaces under different local paths
Example
Figure: Mounting nested directories from multiple servers in NFS. A client can build its name space (e.g., its bin directory) by mounting exported directories, such as packages, draw, and install, from more than one server (Server A and Server B)
In addition, Alice may have access to Bob's (another user's) public files by accessing Bob's directory through /home/bob
Mounting every user's home directory in advance would involve a lot of communication and administrative overhead; instead, directories can be mounted on demand by an automounter
Figure: A simple automounter for NFS. (1) The NFS client looks up /home/alice; (3) the automounter issues a mount request to the server machine holding the users directory; (4) the exported alice subdirectory is mounted from the server into the client's name space
Naming in HDFS
An HDFS cluster consists of a single NameNode (the master) and multiple DataNodes (the slaves)
The NameNode manages the HDFS namespace and regulates accesses to files by clients
The DataNodes manage storage attached to the nodes that they run on
They are responsible for serving read and write requests from clients
They perform block creation, deletion, and replication upon instructions from the NameNode
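As an illustration, a pure namespace operation such as listing a directory is served entirely by the NameNode; the sketch below uses the standard Hadoop FileSystem API with a placeholder path and default configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsDirectory {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);       // typically a DistributedFileSystem
        // Listing is a metadata-only request answered by the NameNode;
        // no DataNode is contacted.
        for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }
        fs.close();
    }
}
```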
Figure: An HDFS client reading a file. The client opens the file via DistributedFileSystem, (2) gets the block locations from the NameNode, (3-5) reads the blocks from the DataNodes through an FSDataInputStream, and (6) closes the stream
Data Reads
For each block, the NameNode returns the addresses of the DataNodes that have a copy of that block
During the read process, DFSInputStream calls the NameNode to retrieve the DataNode locations for the next batch of blocks needed
The DataNodes are sorted according to their proximity to the client in order to exploit data locality
An important aspect of this design is that the client contacts DataNodes directly to retrieve data and is guided by the NameNode to the best DataNode for each block
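A minimal read example with the Hadoop FileSystem API follows the same steps as the figure: opening the file returns an FSDataInputStream, block locations are fetched from the NameNode, and the bytes are streamed from nearby DataNodes. The file path is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadHdfsFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);                              // DistributedFileSystem
        FSDataInputStream in = fs.open(new Path("/user/demo/data.txt"));  // block locations from NameNode
        try {
            IOUtils.copyBytes(in, System.out, 4096, false);               // reads come from DataNodes
        } finally {
            IOUtils.closeStream(in);                                      // close the stream
        }
        fs.close();
    }
}
```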
Figure: An HDFS client writing a file. (2) The client asks the NameNode, via DistributedFileSystem, to create the file; (3) it writes data through an FSDataOutputStream; (4) data packets are forwarded along a pipeline of DataNodes; (5) acknowledgement packets flow back; (6) the client closes the stream; and (7) the NameNode is told that the file is complete
Data Pipelining
When a client is writing data to an HDFS file, its data is first written to a local file
When the local file accumulates a full block of user data, the client retrieves a list of DataNodes from the NameNode
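A matching write sketch: create() contacts the NameNode to add the file to the namespace, the bytes written to the FSDataOutputStream are packaged and pushed through the DataNode pipeline, and close() completes the file. The path is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteHdfsFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // create() asks the NameNode for a new namespace entry, then returns
        // a stream whose data travels through the DataNode pipeline.
        FSDataOutputStream out = fs.create(new Path("/user/demo/output.txt"));
        out.writeUTF("hello, HDFS");   // buffered into packets and sent block by block
        out.close();                   // flush remaining packets; the file is then complete
        fs.close();
    }
}
```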
Naming in PVFS
PVFS file systems may be mounted on all nodes in the same directory simultaneously
This allows all nodes to see and access all files on the PVFS file system through the same directory scheme
Once mounted, PVFS files and directories can be operated on with all the familiar tools, such as ls, cp, and rm
With PVFS, clients can avoid making requests to the file system through the kernel by linking to the PVFS native API
Figure: PVFS native API (libpvfs). Metadata access goes to the metadata manager, while data access goes directly to the I/O daemons
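Because a mounted PVFS volume appears as an ordinary directory tree, any standard file API works on it through the kernel; the short sketch below lists a hypothetical mount point /mnt/pvfs (the path is an assumption) using plain Java I/O.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ListPvfsMount {
    public static void main(String[] args) throws IOException {
        // Hypothetical PVFS mount point; requests go through the kernel VFS
        // exactly as they would for any other mounted file system.
        Path mount = Paths.get("/mnt/pvfs");
        try (Stream<Path> entries = Files.list(mount)) {
            entries.forEach(p -> System.out.println(p.getFileName()));
        }
    }
}
```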