Distributed File System Implementation
Setia
A Distributed File System (DFS) is simply a classical model of a file system (as discussed before) distributed across multiple machines. Its purpose is to promote sharing of dispersed files. This is an area of active research interest today.
The resources on a particular machine are local to itself. Resources on other machines are remote. A file system provides a service for clients. The server interface is the normal set of file operations: create, read, etc. on files.
DFS: Definition
Clients, servers, and storage are dispersed across machines. Configuration and implementation may vary:
- Servers may run on dedicated machines, or servers and clients can be on the same machines.
- The OS itself can be distributed (with the file system a part of that distribution).
- A distribution layer can be interposed between a conventional OS and the file system.
Clients should view a DFS the same way they would a centralized FS; the distribution is hidden at a lower level. Performance is concerned with throughput and response time.
Are clients and servers different?
- In some designs there is no distinction between clients and servers: all machines run the same basic software, and the file server and directory server are just user programs, so a system can be configured to run client and server software on the same machines or not, as it wishes.
- In other designs, clients and servers are fundamentally different machines, in terms of either hardware or software.
Should the file, directory, and other servers be combined?
- Combining them into a single server that handles all the directory and file calls itself avoids extra communication.
- Keeping them separated is flexible and makes the software simpler; however, this method requires more communication.
Naming
Naming is the mapping between logical and physical objects.
Example: a user filename maps to <cylinder, sector>. In a conventional file system, it's understood where the file actually resides; in a DFS, the file's location, somewhere in the network, is hidden.
File replication means multiple copies of a file; the mapping returns a set of the replicas' locations, and both the existence of multiple copies and their locations are hidden.
Transparency
Location transparency:
a) The name of a file does not reveal any hint of the file's physical storage location.
b) The file name still denotes a specific, although hidden, set of physical disk blocks.
c) This is a convenient way to share data.
d) It can expose correspondence between component units and machines.
Location independence:
The name of a file does not need to be changed when the file's physical storage location changes. (The file can be moved from machine to machine.)
Most DFSs today are location transparent but not location independent: files are permanently associated with specific disk blocks. Location independence would permit file migration, enabling balanced storage utilization and convenient system upgrades, but the performance is not as good.
NAMING SCHEMES: There are three main approaches to naming files:
1. Files are named with a combination of host and local name.
This guarantees a unique name, but it is neither location transparent nor location independent.
2. Remote directories are mounted onto (attached to) local directories, so the same naming works on local and remote files. The DFS is a loose collection of component file systems and is not location independent. SUN NFS is a good example of this technique.
3. A single global name structure spans all the files in the system.
The DFS is built the same way as a local file system. Location independent.
IMPLEMENTATION TECHNIQUES
Directories or larger aggregates, rather than individual files, can be mapped. A non-transparent mapping technique:

name ----> file_identifier ----> <system, disk, cylinder, sector>

When the physical location of a file changes, only the file_identifier-to-location mapping needs to change; the file_identifier itself stays stable.
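The two-level mapping above can be made concrete with a small sketch. This is a minimal Python illustration, not a real implementation; all file names, identifiers, and locations are hypothetical:

```python
from typing import NamedTuple

class Location(NamedTuple):
    system: str
    disk: int
    cylinder: int
    sector: int

# name ----> file_identifier  (what users see)
name_to_id = {"/home/ann/report.txt": 42}

# file_identifier ----> <system, disk, cylinder, sector>
id_to_location = {42: Location("serverA", 0, 10, 3)}

def resolve(name: str) -> Location:
    return id_to_location[name_to_id[name]]

def migrate(file_id: int, new_loc: Location) -> None:
    # Moving a file changes one entry in one table; every name that
    # refers to this identifier keeps working unchanged.
    id_to_location[file_id] = new_loc

migrate(42, Location("serverB", 1, 5, 7))
```

Note that `migrate` never touches `name_to_id`, which is exactly why the identifier level buys location independence at that layer.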
STATEFUL VS. STATELESS SERVERS:
Stateful: The server keeps information about its clients between requests, such as open-file tables and server caches. Memory must be reclaimed when a client closes a file or when a client dies.
Stateless: Each client request provides the complete information needed by the server (i.e., filename, file offset). The server can maintain information on behalf of the client, but it's not required. Useful things to keep include file info for the last N files touched.
Performance: A stateless server can be slower, since the full context must be supplied and re-established on every request. A stateful server can have a read-ahead cache.
Fault tolerance: A stateful server loses everything when it crashes; it must poll clients in order to rebuild its state, and client crashes force the server to clean up its encached information. A stateless server remembers nothing, so it can restart easily after a crash.
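A minimal sketch of the stateless style (illustrative names, with a local file standing in for a remote one): every request carries the full context, so nothing about the client survives on the server between requests.

```python
def handle_read(path: str, offset: int, length: int) -> bytes:
    # The request itself names the file and the position; there is no
    # open-file table, so the server can crash and restart between any
    # two requests without a recovery protocol.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# A stand-in for a file stored on the server.
with open("demo.txt", "wb") as f:
    f.write(b"hello world")
```

Because the handler is also idempotent, a client that times out can simply resend the same request.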
System Structure
Advantages of stateless servers:
- Fault tolerance
- No open/close calls needed
- No server space wasted on tables
- No limits on number of open files
- No problems if a client crashes

Advantages of stateful servers:
- Shorter request messages
- Better performance
- Read ahead possible
- Idempotency easier
- File locking possible
Cache Consistency Problem -- Keeping the cached copies consistent with the master file.
Caching
A remote service (RPC) has these characteristic steps:
a) The client makes a request for file access.
b) The request is passed to the server.
c) The server performs the access.
d) The result is returned to the client.
Caching is a mechanism for maintaining disk data on the local machine. This data can be kept in the local memory or in the local disk. Caching can be advantageous both for read ahead and read again. The cost of getting data from a cache is a few HUNDRED instructions; disk accesses cost THOUSANDS of instructions. The master copy of a file doesn't move, but caches contain replicas of portions of the file. Caching behaves just like "networked virtual memory".
CACHE LOCATION:
What should be cached: individual blocks or whole files? Bigger units give a better hit rate; smaller units give better transfer times.
Caching on the local disk gives:
- Better reliability (the cached data survives a client crash).
Caching in main memory gives:
- Faster access.
- Since the server cache is in memory anyway, it allows the use of only one caching mechanism.
Which would you use for a database file? For file editing?
CACHE CONSISTENCY:
The basic issue is, how to determine that the client-cached data is consistent with what's on the server.
Client-initiated approach:
The client asks the server if the cached data is OK. What should be the frequency of "asking"? On file open, at fixed time interval, ...?
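One common form of "asking" is validation on open, using per-file version numbers. The names and the in-memory stand-in for the server below are illustrative only:

```python
server_versions = {"/data/f": 3}            # authoritative version per file
client_cache = {"/data/f": (2, b"stale")}   # path -> (version, data)

def fetch(path: str):
    # Stand-in for a remote transfer of the current file contents.
    return server_versions[path], b"fresh contents"

def open_file(path: str) -> bytes:
    cached = client_cache.get(path)
    if cached is not None and cached[0] == server_versions[path]:
        return cached[1]            # server says the copy is OK: no transfer
    version, data = fetch(path)     # stale or missing: refetch
    client_cache[path] = (version, data)
    return data
```

Checking once per open is cheaper than checking per read but admits staleness while the file is open; a fixed time interval trades the same way.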
Server-initiated approach:
Possibilities:
- A and B both have the same file open. When A closes the file, B "discards" its copy; B must then start over. This requires that the server be notified on every open.
- If a file is opened for writing, disable caching of that file by other clients.
- Get read/write permission for each block; then disable caching only for particular blocks.
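The write-disables-caching possibility can be sketched as below. The names are hypothetical, and lost messages and crashes are ignored; this is a toy, not a real protocol:

```python
readers = {}        # path -> set of clients currently caching the file
invalidated = []    # (client, path) invalidation callbacks "sent"

def open_for_read(client: str, path: str) -> None:
    readers.setdefault(path, set()).add(client)   # server notified on open

def open_for_write(client: str, path: str) -> None:
    # Tell every other caching client to discard its copy; caching of
    # this file is disabled for them while the writer has it open.
    for other in readers.get(path, set()) - {client}:
        invalidated.append((other, path))
    readers[path] = {client}

open_for_read("A", "/f")
open_for_read("B", "/f")
open_for_write("A", "/f")   # B must discard its copy and start over
```

The server-side table `readers` is exactly the kind of per-client state that makes this approach stateful.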
FILE REPLICATION:
Duplicating files on multiple machines improves availability and performance. Replicas should be placed on failure-independent machines (ones that won't fail together). Replication management should be "location-opaque". The main problem is consistency: when one copy changes, how do the other copies reflect that change? Often there is a tradeoff: consistency versus availability and performance. Example:
"Demand replication" is like whole-file caching; reading a file causes it to be cached locally. Updates are done only on the primary file at which time all other copies are invalidated.
Atomic and serialized invalidation isn't guaranteed (a message could get lost, or a machine could crash).
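The primary-copy rule above can be sketched as follows. The invalidation loop is deliberately best-effort rather than atomic, matching the caveat, and all site names are illustrative:

```python
replicas = {"primary": b"v1", "siteB": b"v1", "siteC": b"v1"}
valid = {site: True for site in replicas}

def update(data: bytes) -> None:
    replicas["primary"] = data        # updates go only to the primary copy
    for site in replicas:
        if site != "primary":
            valid[site] = False       # best-effort invalidation of copies

def read(site: str) -> bytes:
    if not valid[site]:
        # Demand replication: refetch from the primary on first use.
        replicas[site] = replicas["primary"]
        valid[site] = True
    return replicas[site]

update(b"v2")
```

A site that never reads again simply keeps its invalid marker, which is the availability-versus-consistency tradeoff in miniature.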
AFS (ANDREW FILE SYSTEM): Development began in 1983 at Carnegie-Mellon University; it was purchased by IBM and released as Transarc DFS, and is now open sourced as OpenAFS.
OVERVIEW:
AFS tries to solve complex issues such as a uniform name space, location-independent file sharing, client-side caching (with cache consistency), and secure authentication (via Kerberos). It also includes server-side caching (via replicas) and high availability, and can span 5,000 workstations.
Clients have a partitioned space of file names:
a local name space and a shared name space
Dedicated servers, called Vice, present the shared name space to the clients
Clients and servers are structured in clusters interconnected by a backbone LAN. A cluster consists of a collection of workstations and a cluster server, and is connected to the backbone by a router. Files are grouped into volumes; a volume location database holds each volume's location information.
Volumes can migrate between servers to balance space and utilization. Old server has "forwarding" instructions and handles client updates during migration.
Read-only volumes ( system files, etc. ) can be replicated. The volume database knows how to find these.
A client workstation interacts with Vice servers only during the opening and closing of files:
- Venus caches files from Vice when they are opened, and stores modified copies of files back when they are closed.
- Reading and writing bytes of a file are done by the kernel, without Venus intervention, on the cached copy.
- Venus caches the contents of directories and symbolic links for path-name translation.
- Exceptions to the caching policy are modifications to directories, which are made directly on the server responsible for that directory.
Deflection of open/close: The client kernel is modified to detect references to Vice files. The request is forwarded to Venus with these steps:
1. Venus does pathname translation.
2. Venus asks Vice for the file.
3. Venus moves the file to the local disk.
4. Venus passes the inode of the file back to the client kernel.
Venus maintains caches for status (in memory) and data (on the local disk).
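A minimal sketch of this whole-file open/close traffic pattern, with dictionaries standing in for Vice and for Venus's local disk cache (names are illustrative): the server is contacted only at open and close, while reads and writes touch only the local copy.

```python
server_files = {"/vice/doc": b"hello"}   # stand-in for a Vice server
local_cache = {}                         # stand-in for Venus's disk cache

def afs_open(path: str) -> None:
    local_cache[path] = bytearray(server_files[path])  # fetch whole file

def afs_write(path: str, offset: int, data: bytes) -> None:
    # No server traffic: the kernel works on the cached copy.
    local_cache[path][offset:offset + len(data)] = data

def afs_close(path: str) -> None:
    server_files[path] = bytes(local_cache[path])  # store modified copy back

afs_open("/vice/doc")
afs_write("/vice/doc", 0, b"H")
afs_close("/vice/doc")
```

However many bytes are read or written between open and close, the server sees exactly two interactions, which is what lets AFS scale to thousands of clients.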
Thank you