Distributed File Systems

Distributed file systems allow files to be shared across multiple computers over a network. They provide a centralized view of files even though the files may be physically distributed across different systems. Key aspects of distributed file systems include naming and locating files, caching files locally to improve performance, maintaining consistency between cached files and the master files, and handling issues like concurrent writes.

Uploaded by

rid9


Distributed File Systems (DFS)

• A special case of distributed system
• Allows multi-computer systems to share files
• Even when no other IPC or RPC is needed
• Sharing devices
• Special case of sharing files
• E.g.,
• NFS (Sun’s Network File System)
• Windows NT, 2000, XP
• Andrew File System (AFS) & others …
Distributed File Systems (continued)
• One of the most common uses of distributed
computing
• Goal: provide the common view of a centralized file
system, but with a distributed implementation.
• Ability to open & update any file on any machine on
the network
• All of the synchronization issues and capabilities of shared
local files
DFS: Architecture
• In general, files in a DFS can be located on “any” system. We
call the sources of files servers and the machines accessing
them clients.
• Potentially, a server for one file can become a client for
another file.
• However, most distributed systems distinguish between
clients and servers more strictly:
• Clients simply access files and do not have/share local files.
• Even if clients have disks, they (disks) are used for swapping,
caching, loading the OS, etc.
• Servers are the actual sources of files.
• In most cases, servers are more powerful machines (in terms of
CPU, physical memory, disk bandwidth, ..)

Design Issues
• Naming: Locating the file/directory in a DFS based on
name.
• Location of cache: disk, main memory, both.
• Writing policy: Updating original data source when
cache content gets modified.
• Cache consistency: Modifying cache when data
source gets modified.
• Availability: More copies of files/resources.
• Scalability: Ability to handle more clients/users.
• Semantics: Meaning of different operations (read,
write,…)

Naming of Distributed Files
• Naming – mapping between logical and physical objects.
• A transparent DFS hides the location where in the network the
file is stored.
• Location transparency – file name does not reveal the file’s
physical storage location.
• File name denotes a specific, hidden, set of physical disk
blocks.
• Convenient way to share data.
• Could expose correspondence between component units
and machines.
• Location independence – file name does not need to be
changed when the file’s physical storage location changes.
• Better file abstraction.
• Promotes sharing the storage space itself.
• Separates the naming hierarchy from the storage-devices
hierarchy.
Naming
• Name space: (e.g.,) /home/students/jack, /home/staff/jill.
• Name space is a collection of names.
• Location transparency: file names do not indicate their
physical locations.
• Name resolution: mapping name space to an
object/device/file/directory.
• Naming approaches:
• Simple Concatenation: add hostname to file names.
• Guarantees unique names.
• No transparency. Moving a file to another host involves a file
name change.

DFS – Three Naming Schemes
1. Mount remote directories to local directories, giving the
appearance of a coherent local directory tree
• Mounted remote directories can be accessed transparently.
• Unix/Linux with NFS; Windows with mapped drives
2. Files named by combination of host name and local
name;
• Guarantees a unique system wide name
• Windows Network Places, Apollo Domain
3. Total integration of component file systems.
• A single global name structure spans all the files in the system.
• If a server is unavailable, some arbitrary set of directories on
different machines also becomes unavailable.
Mechanisms for DFS
• Mounting: to help in combining files/directories in different
systems and form a single file system structure.
• Caching: to reduce the response time in bringing data from
remote machines.
• Hints: modified caching
• Bulk data transfer: helps in reducing the delay due to
transfer of files over the network. Bulk transfers:
• Obtain multiple blocks with a single seek.
• Format and transfer a large number of packets in a single
context switch.
• Reduce the number of acknowledgements to be sent.
• (e.g.,) useful when downloading an OS onto a diskless
client.
• Encryption: Establish a key for encryption with the help of
an authentication server.
Mounting Remote Directories
(NFS)
Mounting Remote Directories
(continued)

• Note: names of files are not unique
• As represented by path names
• E.g.,
• Server A sees: /users/steen/mbox
• Client A sees: /remote/vu/mbox
• Client B sees: /work/me/mbox
• Consequence: cannot pass file “names” around haphazardly
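
The mbox example above can be sketched as two per-client mount tables that resolve different local path names to the same server file. This is a minimal illustration, not how NFS is implemented: the `resolve` function, the table layout, and the server name `server_a` are assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical mount table: local mount point -> (server, exported directory).
MountTable = Dict[str, Tuple[str, str]]


def resolve(mounts: MountTable, path: str) -> Tuple[str, str]:
    """Map a client-local path to (server, server-side path).

    The longest matching mount point wins, so nested mounts resolve correctly.
    """
    best = None
    for mount_point in mounts:
        prefix = mount_point.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return ("local", path)  # not under any mount: an ordinary local file
    server, export = mounts[best]
    return (server, export.rstrip("/") + path[len(best):])


# Each client mounts the same exported directory at a different local name.
client_a = {"/remote/vu": ("server_a", "/users/steen")}
client_b = {"/work/me": ("server_a", "/users/steen")}

print(resolve(client_a, "/remote/vu/mbox"))
print(resolve(client_b, "/work/me/mbox"))
```

Both calls name the same file on the server, yet Client A's path means nothing on Client B, which is why path names cannot be handed from one client to another.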
DFS – File Access Performance
• Reduce network traffic by retaining recently
accessed disk blocks in local cache
• Repeated accesses to the same information can be
handled locally.
• All accesses are performed on the cached copy.
• If needed data not already cached, copy of data
brought from the server to the local cache.
• Copies of parts of file may be scattered in different
caches.
• Cache-consistency problem – keeping the cached
copies consistent with the master file.
• Especially on write operations
Caching
• Performance of distributed file system, in terms of
response time, depends on the ability to “get” the files
to the user.
• When files are in different servers, caching might be
needed to improve the response time.
• A copy of data (in files) is brought to the client (when
referenced). Subsequent data accesses are made on the
client cache.
• Client cache can be on disk or main memory.
• Data cached may include future blocks that may be
referenced too.
• Caching implies DFS needs to guarantee consistency of
data.

DFS – File Caches
• In client memory
• Performance speed up; faster access
• Good when local usage is transient
• Enables diskless workstations
• On client disk
• Good when local usage dominates (e.g., AFS)
• Caches larger files
• Helps protect clients from server crashes
Writing Policy
• When should a modified cache content be transferred to
the server?
• Write-through policy:
• Immediate writing at server when cache content is
modified.
• Advantage: reliability, crash of cache (client) does not mean
loss of data.
• Disadvantage: Several writes for each small change.
• Delayed writing policy:
• Write at the server, after a delay.
• Advantage: small/frequent changes do not increase
network traffic.
• Disadvantage: less reliable, susceptible to client crashes.
• Write at the time of file closing.

DFS –Cache Update Policies
• When does the client update the master file?
• I.e. when is cached data written from the cache to the file?
• Write-through – write data through to disk ASAP
• I.e., following write() or put(), same as on local disks.
• Reliable, but poor performance.
• Delayed-write – data is written to the cache first and to the server later.
• Write operations complete quickly; some data may be overwritten
in cache, saving needless network I/O.
• Poor reliability
• unwritten data may be lost when client machine crashes
• Inconsistent data
• Variation – scan cache at regular intervals and flush dirty blocks.
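
The difference between the two update policies can be shown with a small sketch. The `Server` and `ClientCache` classes are hypothetical stand-ins, and the write counter exists only to make the network cost visible.

```python
class Server:
    """Stand-in for the file server: holds master copies and counts writes."""

    def __init__(self):
        self.blocks = {}
        self.writes = 0

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.writes += 1


class ClientCache:
    """Client-side block cache with a selectable update policy."""

    def __init__(self, server, write_through):
        self.server = server
        self.write_through = write_through
        self.cache = {}
        self.dirty = set()

    def write(self, block_id, data):
        self.cache[block_id] = data
        if self.write_through:
            self.server.write(block_id, data)  # propagate immediately
        else:
            self.dirty.add(block_id)           # remember for a later flush

    def flush(self):
        """Write every dirty block to the server (periodic scan or on close)."""
        for block_id in sorted(self.dirty):
            self.server.write(block_id, self.cache[block_id])
        self.dirty.clear()
```

With write-through, three writes to the same block cost three server writes. With delayed-write, the same three writes cost one server write at flush time, but anything still marked dirty is lost if the client crashes before the flush.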
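DFS Data Access
[Figure: flowchart of a data access request]
• Request to access data: check client cache.
• Data present: return data to client.
• Data not present: check local disk (if any).
• Data present: load data to client cache, then return data to client.
• Data not present: send request to file server over the network.
• At the file server: check server cache.
• Data present: load data to client cache.
• Data not present: issue disk read, load server cache, then load data to client cache.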
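
The lookup order in the data access flow can be sketched as a single function. Plain dictionaries stand in for the caches and disks here, and the `read_block` name and its return value are assumptions made for illustration.

```python
def read_block(block_id, client_cache, local_disk, server_cache, server_disk):
    """Return (data, source), checking each level in the order of the flowchart."""
    if block_id in client_cache:
        return client_cache[block_id], "client cache"
    if local_disk is not None and block_id in local_disk:
        client_cache[block_id] = local_disk[block_id]
        return client_cache[block_id], "client disk"
    # Miss on the client: the request crosses the network to the file server.
    if block_id in server_cache:
        source = "server cache"
    else:
        server_cache[block_id] = server_disk[block_id]  # disk read on the server
        source = "server disk"
    client_cache[block_id] = server_cache[block_id]
    return client_cache[block_id], source
```

Passing `None` for `local_disk` models a diskless client. Every miss fills the caches on the way back, so a repeated request for the same block is answered from the client cache.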

DFS – File Consistency
• Is locally cached copy of the data consistent with
the master copy?
• Client-initiated approach
• Client initiates a validity check with server.
• Server verifies local data with the master copy
• E.g., time stamps, etc.
• Server-initiated approach
• Server records (parts of) files cached in each client.
• When server detects a potential inconsistency, it reacts
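
A minimal sketch of the client-initiated approach follows. It assumes the server keeps a per-file modification counter that plays the role of the time stamp; the class and method names are hypothetical.

```python
class FileServer:
    """Holds master copies; each write bumps a counter used as a time stamp."""

    def __init__(self):
        self.files = {}  # name -> (data, modification counter)

    def write(self, name, data):
        _, version = self.files.get(name, (None, 0))
        self.files[name] = (data, version + 1)

    def get_mtime(self, name):
        return self.files[name][1]

    def fetch(self, name):
        return self.files[name]


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # name -> (data, counter seen at fetch time)

    def open(self, name):
        cached = self.cache.get(name)
        # Validity check: refetch only if the master copy has changed.
        if cached is None or cached[1] != self.server.get_mtime(name):
            self.cache[name] = self.server.fetch(name)
        return self.cache[name][0]
```

The check costs one round trip per open even when nothing changed, which is the overhead the server-initiated approach avoids by having the server track who caches what.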
Cache Consistency
• When should a modified source content be transferred to the
cache?
• Server-initiated policy:
• Server cache manager informs client cache managers that
can then retrieve the data.
• Client-initiated policy:
• Client cache manager checks the freshness of data before
delivering to users. Overhead for every data access.
• Concurrent-write sharing policy:
• Multiple clients open the file, at least one client is writing.
• File server asks other clients to purge/remove the cached
data for the file, to maintain consistency.

Cache Consistency ...
• Sequential-write sharing policy: a client opens a file that
was recently closed after writing.

• This client may have outdated cache blocks of the file (since
the other client might have modified the file contents).
• Use time stamps for both cache and files. Compare the
time stamps to know the freshness of blocks.

• The other client (which was writing previously) may still have
modified data in its cache that has not yet been updated on
server. (e.g.,) due to delayed writing.
• Server can force the previous client to flush its cache
whenever a new client opens the file.

DFS – Remote Service vs. Caching
• Remote Service – all file actions implemented by
server.
• RPC functions
• Use for small memory diskless machines
• Particularly applicable if large amount of write activity

• Cached System
• Many “remote” accesses handled efficiently by the local
cache
• Most served as fast as local ones.
• Servers contacted only occasionally
• Reduces server load and network traffic.
• Enhances potential for scalability.
• Reduces total network overhead
DFS – File Server Semantics
• Stateless Service
• Avoids state information in server by making
each request self-contained.
• Each request identifies the file and position in
the file.
• No need to establish and terminate a
connection by open and close operations.
• Poor support for locking or synchronization
among concurrent accesses
DFS – File Server Semantics
(continued)
• Stateful Service
• Client opens a file (as in Unix & Windows).
• Server fetches information about file from disk, stores in
server memory,
• Returns to client a connection identifier unique to
client and open file.
• Identifier used for subsequent accesses until session
ends.
• Server must reclaim space used by no longer active
clients.
• Increased performance; fewer disk accesses.
• Server retains knowledge about file
• E.g., read ahead next blocks for sequential access
• E.g., file locking for managing writes
• Windows
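
The contrast between the two service styles can be sketched with a toy in-memory file store. `FILES`, `stateless_read`, and `StatefulServer` are hypothetical names for illustration and do not correspond to any real server's interface.

```python
import itertools

FILES = {"notes.txt": b"distributed file systems"}


def stateless_read(name, offset, count):
    """Each request names the file and the position; the server remembers nothing."""
    return FILES[name][offset:offset + count]


class StatefulServer:
    """Keeps a table of open files, each with its current position."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.open_files = {}  # connection id -> [name, position]

    def open(self, name):
        conn_id = next(self._next_id)
        self.open_files[conn_id] = [name, 0]
        return conn_id

    def read(self, conn_id, count):
        name, pos = self.open_files[conn_id]
        data = FILES[name][pos:pos + count]
        self.open_files[conn_id][1] = pos + len(data)  # server tracks the offset
        return data

    def close(self, conn_id):
        del self.open_files[conn_id]  # reclaim server-side state
```

If the stateful server restarts, `open_files` is gone and every connection identifier held by a client becomes meaningless. The stateless call carries everything it needs, so the same request works before and after a restart.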
DFS –Server Semantics
Comparison
• Failure Recovery: Stateful server loses all volatile
state in a crash.
• Restore state by recovery protocol based on a dialog
with clients.
• Server needs to be aware of crashed client processes
• orphan detection and elimination.
• Failure Recovery: Stateless server failure and
recovery are almost unnoticeable.
• Newly restarted server responds to self-contained
requests without difficulty.
DFS –Server Semantics Comparison
(continued)

•…
• Penalties for using the robust stateless service: –
• longer request messages
• slower request processing
• Some environments require stateful service.
• Server-initiated cache validation cannot provide
stateless service.
• File locking (one writer, many readers).
DFS – Replication
• Replicas of the same file reside on failure-independent
machines.
• Improves availability and can shorten service time.
• Naming scheme maps a replicated file name to a particular
replica.
• Existence of replicas should be invisible to higher levels.
• Replicas must be distinguished from one another by
different lower-level names.
• Updates
• Replicas of a file denote the same logical entity
• Update to any replica must be reflected on all other
replicas.
DFS: Case Studies
• NFS (Network File System)
• Developed by Sun Microsystems (in 1985)
• Most popular, open, and widely used.
• NFS protocol standardised through IETF (RFC 1813)

• AFS (Andrew File System)
• Developed by Carnegie Mellon University as part of the
Andrew distributed computing environment (in 1986)
• A research project to create a campus-wide file system.
• Public domain implementation is available on Linux
(LinuxAFS)
• It was adopted as a basis for the DCE/DFS file system in
the Open Software Foundation (OSF,
www.opengroup.org) DCE (Distributed Computing
Environment)
Example Distributed File Systems
• NFS – Sun’s Network File System (ver. 3)
• Tanenbaum & van Steen, Chapter 11

• NFS – Sun’s Network File System (ver. 4)

• Tanenbaum & van Steen, Chapter 11

• AFS – the Andrew File System

• See Silberschatz §17.6
NFS
• Sun Network File System (NFS) has become de facto
standard for distributed UNIX file access.
• NFS runs over LAN
• even WAN (slowly)
• Any system may be both a client and server

• Basic idea:
• Remote directory is mounted onto local directory
• Remote directory may contain mounted directories
within
[Figure: Mounting Remote Directories (NFS)]
[Figure: Nested Mounting (NFS)]
[Figure: NFS Implementation]
NFS Operations
• Lookup
• Fundamental NFS operation
• Takes pathname, returns file handle
• File Handle
• Unique identifier of file within server
• Persistent; never reused
• Storable, but opaque to client
• Up to 64 bytes in NFS v3; up to 128 bytes in NFS v4
• Most other operations take file handle as argument
Other NFS Operations (version 3)
• read, write
• link, symlink
• mknod, mkdir
• rename, rmdir
• readdir, readlink
• getattr, setattr
• create, remove
• Conspicuously absent
• open, close
NFS v3 — A Stateless Service
• Server retains no knowledge of client
• Server crashes invisible to client
• All hard work done on client side
• Every operation provides file handle
• Server caching
• Performance only
• Based on recent usage
• Client caching
• Client checks validity of cached files
• Client responsible for writing out caches
•…
NFS v3 — A Stateless Service
(continued)

•…
• No locking! No synchronization!

• Unix file semantics not guaranteed

• E.g., read after write
• Session semantics not even guaranteed
• E.g., open after close
NFS v3 — A Stateless Service
(continued)

• Solution: global lock manager

• Separate from NFS

• Typical locking operations

• Lock – acquire lock (non-blocking)
• Lockt – test a lock
• Locku – unlock a lock
• Renew – renew lease on a lock
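
The four locking operations can be sketched as a toy lease-based lock manager. This is an illustration under stated assumptions, not the NFS lock protocol: the rule that a lock lapses when its lease expires, the lease length, and the explicit `now` argument (used in place of a real clock so the behaviour is reproducible) are all choices made for the example.

```python
class LockManager:
    """Toy lock manager with leases, keyed by file handle."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.locks = {}  # file handle -> (owner, expiry time)

    def _holder(self, handle, now):
        entry = self.locks.get(handle)
        if entry is not None and entry[1] > now:
            return entry[0]
        return None  # free, or the lease has expired

    def lock(self, handle, owner, now):
        """Non-blocking acquire: True on success, False if someone else holds it."""
        holder = self._holder(handle, now)
        if holder not in (None, owner):
            return False
        self.locks[handle] = (owner, now + self.lease_seconds)
        return True

    def lockt(self, handle, now):
        """Test a lock: return the current holder, or None."""
        return self._holder(handle, now)

    def locku(self, handle, owner, now):
        """Unlock, but only if `owner` really holds the lock."""
        if self._holder(handle, now) == owner:
            del self.locks[handle]

    def renew(self, handle, owner, now):
        """Extend the lease; fails if the lock was lost in the meantime."""
        if self._holder(handle, now) != owner:
            return False
        self.locks[handle] = (owner, now + self.lease_seconds)
        return True
```

The lease is what lets the manager recover from a crashed client: a holder that stops renewing loses the lock once the lease runs out, without the manager having to detect the crash.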
NFS Implementation
• Remote procedure calls for all operations
• Implemented in Sun ONC
• XDR is interface definition language
• Network communication is client-initiated
• RPC based on UDP (unreliable protocol)
• Response to remote procedure call is de facto
acknowledgement
• Lost requests are simply re-transmitted
• As many times as necessary to get a response!
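
The retransmission rule can be sketched as a retry loop around a self-contained request. No real network or RPC library is involved: `send` is any callable that returns `None` for a lost packet, `make_lossy_server` is a hypothetical test double, and the attempt limit is an assumption added so the loop terminates.

```python
def call_with_retransmit(send, request, max_attempts=5):
    """Resend `request` until `send` returns a reply; the reply is the acknowledgement."""
    for attempt in range(1, max_attempts + 1):
        reply = send(request)
        if reply is not None:
            return reply, attempt
    raise TimeoutError("no response from server")


def make_lossy_server(data, drop_first):
    """Hypothetical server that silently drops the first `drop_first` requests."""
    state = {"seen": 0}

    def send(request):
        state["seen"] += 1
        if state["seen"] <= drop_first:
            return None  # models a lost request or a lost reply
        handle, offset, count = request
        return data[handle][offset:offset + count]

    return send


send = make_lossy_server({"fh1": b"hello world"}, drop_first=2)
reply, attempts = call_with_retransmit(send, ("fh1", 6, 5))
```

Resending is safe here because the request carries the file handle, offset, and count, so repeating it produces the same result.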
NFS – Caching
• On client open(), client asks server if its cached
attribute blocks are up to date.
• Once file is open, different client processes can
write it and get inconsistent data.
• Modified data is flushed back to the server every
30 seconds.
NFS Failure Recovery
• Server crashes are transparent to client
• Each client request contains all information
• Server can re-fetch from disk if not in its caches
• Client retransmits request if interrupted by crash
• (i.e., no response)

• Client crashes are transparent to server

• Server maintains no record of which client(s) have cached files.
Summary NFS
• That was version 3 of NFS
• Stateless file system
• High performance, simple protocol
• Based on UDP

• Everything has changed in NFS version 4

• First published in 2000
• Clarifications published in 2003
• Almost complete rewrite of NFS
NFS Version 4
• Stateful file service
• Based on TCP – reliable transport protocol
• More ways to access server
• Compound requests
• I.e., multiple RPC calls in same packet

• More emphasis on security

• Mount protocol integrated with rest of NFS
protocol
NFS Version 4
NFS Version 4 (continued)
• Additional RPC operations
• Long list for managing files, caches, validating versions,
etc.
• Also security, permissions, etc.
• Also
• Open() and close().
• With a server crash, some information may have to be recovered
• See
• Silberschatz, p. 653
• http://www.tcpipguide.com/free/t_TCPIPNetworkFileSystemNFS.htm
Andrew File System (AFS)
• Completely different kind of file system
• Developed at CMU to support all student
computing.
• Consists of workstation clients and dedicated file
server machines.
Andrew File System (AFS)
• Stateful
• Single name space
• A file has the same name everywhere in the world.
• Lots of local file caching
• On workstation disks
• For long periods of time
• Originally whole files, now 64K file chunks.
• Good for distant operation because of local disk
caching
AFS
• Need for scaling led to reduction of client-server
message traffic.
• Once a file is cached, all operations are performed
locally.
• On close, if the file is modified, it is replaced on the
server.
• The client assumes that its cache is up to date!
• Server knows about all cached copies of file
• Callback messages from the server saying otherwise.
•…
AFS
• On file open()
• If client has received a callback for file, it must fetch new
copy
• Otherwise it uses its locally-cached copy.
• Server crashes
• Transparent to client if file is locally cached
• Server must contact clients to find state of files
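
The open, close, and callback behaviour described above can be sketched as follows. This is a simplified whole-file model with hypothetical class names; it is not the AFS wire protocol and it ignores chunking and server crashes.

```python
class AFSServer:
    def __init__(self):
        self.files = {}
        self.cachers = {}  # name -> set of clients holding a cached copy

    def fetch(self, name, client):
        self.cachers.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, name, data, writer):
        self.files[name] = data
        for client in self.cachers.get(name, set()) - {writer}:
            client.callback(name)  # tell other clients their copy is stale


class AFSClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.invalid = set()  # files for which a callback has arrived

    def callback(self, name):
        self.invalid.add(name)

    def open(self, name):
        # Without a callback the client trusts its cached copy.
        if name not in self.cache or name in self.invalid:
            self.cache[name] = self.server.fetch(name, self)
            self.invalid.discard(name)
        return self.cache[name]

    def close(self, name, new_data=None):
        if new_data is not None:  # modified: replace the copy on the server
            self.cache[name] = new_data
            self.server.store(name, new_data, self)
```

Between an open and the next callback the client sends nothing to the server, which is where the reduction in client-server traffic comes from.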

• See Silberschatz §17.6

Distributed File Systems —
Summary
• Performance is always an issue
• Tradeoff between performance and the semantics of file
operations (especially for shared files).
• Caching of file blocks is crucial in any file system,
distributed or otherwise.
• As memories get larger, most read requests can be
serviced out of file buffer cache (local memory).
• Maintaining coherency of those caches is a crucial
design issue.
• Current research addressing disconnected file
operation for mobile computers.
Reading Assignment
• Silberschatz, Chapter 17
or
• Tanenbaum, Modern Operating Systems
• §8.3 and §10.6.4
or
• Tanenbaum & van Steen, Chapter 11
