Distributed File Systems
Design Issues
• Naming: Locating the file/directory in a DFS based on
name.
• Location of cache: disk, main memory, both.
• Writing policy: Updating original data source when
cache content gets modified.
• Cache consistency: Modifying cache when data
source gets modified.
• Availability: More copies of files/resources.
• Scalability: Ability to handle more clients/users.
• Semantics: Meaning of different operations (read,
write,…)
Naming of Distributed Files
• Naming – mapping between logical and physical objects.
• A transparent DFS hides the location where in the network the
file is stored.
• Location transparency – file name does not reveal the file’s
physical storage location.
• File name denotes a specific, hidden, set of physical disk
blocks.
• Convenient way to share data.
• Could expose correspondence between component units
and machines.
• Location independence – file name does not need to be
changed when the file’s physical storage location changes.
• Better file abstraction.
• Promotes sharing the storage space itself.
• Separates the naming hierarchy from the storage-devices
hierarchy.
Naming
• Name space: (e.g.,) /home/students/jack, /home/staff/jill.
• Name space is a collection of names.
• Location transparency: file names do not indicate their
physical locations.
• Name resolution: mapping name space to an
object/device/file/directory.
• Naming approaches:
• Simple Concatenation: add hostname to file names.
• Guarantees unique names.
• No transparency. Moving a file to another host involves a file
name change.
DFS – Three Naming Schemes
1. Mount remote directories to local directories, giving the
appearance of a coherent local directory tree
• Mounted remote directories can be accessed transparently.
• Unix/Linux with NFS; Windows with mapped drives
2. Files named by combination of host name and local
name;
• Guarantees a unique system wide name
• Windows Network Places, Apollo Domain
3. Total integration of component file systems.
• A single global name structure spans all the files in the system.
• If a server is unavailable, some arbitrary set of directories on
different machines also becomes unavailable.
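The first scheme (mounting) amounts to a prefix lookup: the longest mount point that matches a path decides which server holds the file. A minimal sketch, assuming a hypothetical `MountTable` class and made-up server names and export paths; it is an illustration only, not how any particular NFS client is implemented.

```python
from typing import Dict, Optional, Tuple


class MountTable:
    """Hypothetical table mapping local mount points to (server, remote dir)."""

    def __init__(self) -> None:
        self._mounts: Dict[str, Tuple[str, str]] = {}

    def mount(self, local_dir: str, server: str, remote_dir: str) -> None:
        # Store mount points without a trailing slash for uniform matching.
        self._mounts[local_dir.rstrip("/")] = (server, remote_dir.rstrip("/"))

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (server, path on that server); ("local", path) if unmounted."""
        best: Optional[str] = None
        for mount_point in self._mounts:
            matches = path == mount_point or path.startswith(mount_point + "/")
            # Longest matching prefix wins, so nested mounts override outer ones.
            if matches and (best is None or len(mount_point) > len(best)):
                best = mount_point
        if best is None:
            return ("local", path)
        server, remote_dir = self._mounts[best]
        return (server, remote_dir + path[len(best):])


table = MountTable()
table.mount("/home/students", "serverA", "/export/students")
table.mount("/home/staff", "serverB", "/export/staff")
print(table.resolve("/home/students/jack"))
```

The user-visible name `/home/students/jack` never mentions `serverA`, which is the location transparency described earlier; remounting the directory from another server changes the table, not the name.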
Mechanisms for DFS
• Mounting: to help in combining files/directories in different
systems and form a single file system structure.
• Caching: to reduce the response time in bringing data from
remote machines.
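• Hints: a variation of caching in which cached data is not assumed
to be valid; the client detects and recovers from a stale hint.
• Bulk data transfer: helps in reducing the delay due to
transfer of files over the network. Bulk transfer makes it possible to:
• Fetch multiple blocks with a single seek.
• Format and transfer a large number of packets in a single
context switch.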
• Reduce the number of acknowledgements to be sent.
• (e.g.,) useful when downloading OS onto a diskless
client.
• Encryption: Establish a key for encryption with the help of
an authentication server.
Mounting Remote Directories (NFS)
[Figure: mounting remote directories]
DFS – File Caches
• In client memory
• Performance speed up; faster access
• Good when local usage is transient
• Enables diskless workstations
• On client disk
• Good when local usage dominates (e.g., AFS)
• Caches larger files
• Helps protect clients from server crashes
Writing Policy
• When should a modified cache content be transferred to
the server?
• Write-through policy:
• Immediate writing at server when cache content is
modified.
• Advantage: reliability; a crash of the client (cache) does not
lose data.
• Disadvantage: every small change causes a write to the server.
• Delayed writing policy:
• Write at the server, after a delay.
• Advantage: small/frequent changes do not increase
network traffic.
• Disadvantage: less reliable, susceptible to client crashes.
• Write at the time of file closing.
DFS – Cache Update Policies
• When does the client update the master file?
• I.e., when is cached data written from the cache to the file?
• Write-through – write data through to disk ASAP
• I.e., following write() or put(), same as on local disks.
• Reliable, but poor performance.
• Delayed-write – data is written to the cache and sent to the server later.
• Write operations complete quickly; some data may be overwritten
in cache, saving needless network I/O.
• Poor reliability
• unwritten data may be lost when client machine crashes
• Inconsistent data
• Variation – scan cache at regular intervals and flush dirty blocks.
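The trade-off between the two policies shows up in the number of writes that reach the server. A minimal sketch, assuming hypothetical `Server` and `ClientCache` classes; `flush()` stands in for whatever trigger a real system uses (a timer, a periodic scan, or file close).

```python
from typing import Dict, Set


class Server:
    """Hypothetical file server that counts the writes it receives."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.writes_received = 0

    def write(self, block_no: int, data: bytes) -> None:
        self.blocks[block_no] = data
        self.writes_received += 1


class ClientCache:
    """Client-side cache with either write-through or delayed-write policy."""

    def __init__(self, server: Server, write_through: bool) -> None:
        self.server = server
        self.write_through = write_through
        self.cache: Dict[int, bytes] = {}
        self.dirty: Set[int] = set()  # blocks modified but not yet on the server

    def write(self, block_no: int, data: bytes) -> None:
        self.cache[block_no] = data
        if self.write_through:
            self.server.write(block_no, data)   # every change goes out at once
        else:
            self.dirty.add(block_no)            # remember it for a later flush

    def flush(self) -> None:
        """Send each dirty block once; earlier overwritten versions never travel."""
        for block_no in sorted(self.dirty):
            self.server.write(block_no, self.cache[block_no])
        self.dirty.clear()


server = Server()
delayed = ClientCache(server, write_through=False)
for version in (b"a", b"b", b"c"):
    delayed.write(0, version)
delayed.flush()
print(server.writes_received, server.blocks[0])
```

If the client crashes before `flush()`, the dirty blocks are lost, which is the reliability cost of the delayed-write policy.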
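DFS Data Access
1. Request to access data: check the client cache.
• Data present: return data to client.
2. Data not present: check the local disk (if any).
• Data present: load data to client cache, then return data to client.
3. Data not present: check the server cache.
• Data present: load data to client cache.
4. Data not present: issue disk read, load server cache, then load
data to client cache.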
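The access flow on this slide can be sketched as a chain of lookups, each level filling the levels in front of it on a miss. This is a minimal sketch: the function name `read_block` and the use of plain dictionaries for the caches and disks are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def read_block(
    key: str,
    client_cache: Dict[str, bytes],
    local_disk: Optional[Dict[str, bytes]],   # None models a diskless client
    server_cache: Dict[str, bytes],
    server_disk: Dict[str, bytes],
) -> Tuple[bytes, str]:
    """Return (data, where it was found), loading caches along the way."""
    if key in client_cache:
        return client_cache[key], "client cache"
    if local_disk is not None and key in local_disk:
        client_cache[key] = local_disk[key]       # load data to client cache
        return client_cache[key], "local disk"
    if key in server_cache:
        client_cache[key] = server_cache[key]     # load data to client cache
        return client_cache[key], "server cache"
    data = server_disk[key]                       # issue disk read (KeyError if absent)
    server_cache[key] = data                      # load server cache
    client_cache[key] = data                      # load data to client cache
    return data, "server disk"


client_cache: Dict[str, bytes] = {}
server_cache: Dict[str, bytes] = {}
server_disk = {"blk7": b"payload"}
print(read_block("blk7", client_cache, None, server_cache, server_disk))
print(read_block("blk7", client_cache, None, server_cache, server_disk))
```

The second call is served from the client cache, so no network or disk access is needed.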
DFS – File Consistency
• Is the locally cached copy of the data consistent with
the master copy?
• Client-initiated approach
• Client initiates a validity check with server.
• Server checks whether the client's local data is consistent with the master copy
• E.g., time stamps, etc.
• Server-initiated approach
• Server records (parts of) files cached in each client.
• When server detects a potential inconsistency, it reacts
Cache Consistency
• When should a modified source content be transferred to the
cache?
• Server-initiated policy:
• Server cache manager informs the client cache managers, which
can then retrieve the updated data.
• Client-initiated policy:
• Client cache manager checks the freshness of data before
delivering to users. Overhead for every data access.
• Concurrent-write sharing policy:
• Multiple clients open the file, at least one client is writing.
• File server asks other clients to purge/remove the cached
data for the file, to maintain consistency.
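The concurrent-write sharing policy requires the server to track who has each file open. A minimal sketch, assuming hypothetical `Client` and `FileServer` classes, in which the server tells every holder of the file to purge its cached copy as soon as the file is open at several clients and at least one of them is a writer.

```python
from collections import defaultdict
from typing import Dict


class Client:
    """Hypothetical client holding whole-file cache entries."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, bytes] = {}

    def purge(self, filename: str) -> None:
        self.cache.pop(filename, None)


class FileServer:
    """Tracks opens per file; this bookkeeping makes the server stateful."""

    def __init__(self) -> None:
        self.openers = defaultdict(dict)   # filename -> {client: "r" or "w"}

    def open(self, client: Client, filename: str, mode: str) -> bool:
        """Register the open; return True if the client may cache the file."""
        holders = self.openers[filename]
        holders[client] = mode
        concurrent_write = len(holders) > 1 and "w" in holders.values()
        if concurrent_write:
            for holder in holders:
                holder.purge(filename)     # ask clients to drop cached data
        return not concurrent_write

    def close(self, client: Client, filename: str) -> None:
        self.openers[filename].pop(client, None)


server = FileServer()
reader, writer = Client("reader"), Client("writer")
if server.open(reader, "notes.txt", "r"):
    reader.cache["notes.txt"] = b"old contents"
print(server.open(writer, "notes.txt", "w"), reader.cache)
```

While caching is disabled for the file, every read and write goes to the server, so all clients see the same data.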
Cache Consistency ...
• Sequential-write sharing policy: a client opens a file that
was recently closed after writing.
• This client may have outdated cache blocks of the file (since
the other client might have modified the file contents).
• Use time stamps for both cache and files. Compare the
time stamps to know the freshness of blocks.
• The other client (which was writing previously) may still have
modified data in its cache that has not yet been updated on
server. (e.g.,) due to delayed writing.
• Server can force the previous client to flush its cache
whenever a new client opens the file.
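Both remedies for sequential-write sharing can be combined in one open path: the server first forces the previous writer to flush, then the opening client compares time stamps and refetches stale data. A minimal sketch, assuming hypothetical `Server` and `Client` classes, a logical clock in place of real time stamps, and whole-file caching.

```python
from typing import Dict, Optional, Tuple


class Server:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.mtime: Dict[str, int] = {}
        self.last_writer: Dict[str, "Client"] = {}
        self.clock = 0                       # logical clock used as time stamp

    def store(self, filename: str, contents: str) -> None:
        self.clock += 1
        self.data[filename] = contents
        self.mtime[filename] = self.clock

    def fetch(self, filename: str) -> Tuple[str, int]:
        return self.data.get(filename, ""), self.mtime.get(filename, 0)

    def open(self, client: "Client", filename: str) -> int:
        """Force the previous writer to flush, then report the time stamp."""
        previous: Optional["Client"] = self.last_writer.get(filename)
        if previous is not None and previous is not client:
            previous.flush(filename)
        return self.mtime.get(filename, 0)


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Dict[str, Tuple[str, int]] = {}   # name -> (contents, stamp)
        self.dirty: Dict[str, str] = {}               # delayed writes

    def open(self, filename: str) -> str:
        server_stamp = self.server.open(self, filename)
        cached = self.cache.get(filename)
        if cached is None or cached[1] < server_stamp:    # cache is stale
            self.cache[filename] = self.server.fetch(filename)
        return self.cache[filename][0]

    def write(self, filename: str, contents: str) -> None:
        stamp = self.cache.get(filename, ("", 0))[1]
        self.cache[filename] = (contents, stamp)
        self.dirty[filename] = contents                   # not yet on the server
        self.server.last_writer[filename] = self

    def flush(self, filename: str) -> None:
        if filename in self.dirty:
            self.server.store(filename, self.dirty.pop(filename))


server = Server()
server.store("report", "v1")
first, second = Client(server), Client(server)
second.open("report")            # second now caches "v1"
first.open("report")
first.write("report", "v2")      # delayed write, server still holds "v1"
print(second.open("report"))     # server forces the flush; second refetches
```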
DFS – Remote Service vs. Caching
• Remote Service – all file actions implemented by
server.
• RPC functions
• Use for small memory diskless machines
• Particularly applicable if large amount of write activity
• Cached System
• Many “remote” accesses handled efficiently by the local
cache
• Most served as fast as local ones.
• Servers contacted only occasionally
• Reduces server load and network traffic.
• Enhances potential for scalability.
• Reduces total network overhead
DFS – File Server Semantics
• Stateless Service
• Avoids state information in server by making
each request self-contained.
• Each request identifies the file and position in
the file.
• No need to establish and terminate a
connection by open and close operations.
• Poor support for locking or synchronization
among concurrent accesses
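A self-contained request can be sketched as a record that names the file and the position explicitly, so the server needs no memory of earlier requests. This is a minimal sketch; `ReadRequest`, `StatelessServer`, and the dictionary standing in for durable storage are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ReadRequest:
    """Everything the server needs: which file, where, and how much."""
    file_id: str
    offset: int
    count: int


class StatelessServer:
    """Keeps no per-client state; only the (durable) file store."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def read(self, request: ReadRequest) -> bytes:
        contents = self.files[request.file_id]
        return contents[request.offset:request.offset + request.count]


disk = {"greeting": b"hello world"}
request = ReadRequest("greeting", offset=6, count=5)

server = StatelessServer(disk)
print(server.read(request))

# Simulate a crash and restart: a fresh server object over the same storage
# answers the identical request without any recovery dialog.
restarted = StatelessServer(disk)
print(restarted.read(request))
```

No `open` or `close` appears anywhere, and the client may simply retry a request that timed out.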
DFS – File Server Semantics
(continued)
• Stateful Service
• Client opens a file (as in Unix & Windows).
• Server fetches information about file from disk, stores in
server memory,
• Returns to client a connection identifier unique to
client and open file.
• Identifier used for subsequent accesses until session
ends.
• Server must reclaim space used by no longer active
clients.
• Increased performance; fewer disk accesses.
• Server retains knowledge about file
• E.g., read ahead next blocks for sequential access
• E.g., file locking for managing writes
• Windows
DFS – Server Semantics Comparison
• Failure Recovery: Stateful server loses all volatile
state in a crash.
• Restore state by recovery protocol based on a dialog
with clients.
• Server needs to be aware of crashed client processes
• orphan detection and elimination.
• Failure Recovery: Stateless server failure and
recovery are almost unnoticeable.
• Newly restarted server responds to self-contained
requests without difficulty.
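The loss of volatile state in a stateful server can be made concrete with a small model. A minimal sketch, assuming a hypothetical `StatefulServer` whose open-file table lives only in memory; the connection identifier and the dictionary used as storage are illustrative choices.

```python
from typing import Dict, List


class StatefulServer:
    """Keeps an in-memory table of open files and their current positions."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files                         # durable storage stand-in
        self.open_table: Dict[int, List] = {}      # volatile: id -> [name, pos]
        self.next_id = 1

    def open(self, filename: str) -> int:
        connection_id = self.next_id
        self.next_id += 1
        self.open_table[connection_id] = [filename, 0]
        return connection_id

    def read(self, connection_id: int, count: int) -> bytes:
        entry = self.open_table[connection_id]     # KeyError if state was lost
        name, position = entry
        data = self.files[name][position:position + count]
        entry[1] = position + len(data)            # server remembers the position
        return data


disk = {"greeting": b"hello world"}
server = StatefulServer(disk)
conn = server.open("greeting")
print(server.read(conn, 5), server.read(conn, 6))

# After a crash the open table is empty, so the old identifier is meaningless
# until a recovery protocol rebuilds the state from the clients.
restarted = StatefulServer(disk)
try:
    restarted.read(conn, 5)
except KeyError:
    print("connection unknown after restart")
```

The short requests (identifier and count only) are the performance benefit; the failed read after the restart is the recovery cost.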
DFS – Server Semantics Comparison
(continued)
•…
• Penalties for using the robust stateless service:
• longer request messages
• slower request processing
• Some environments require stateful service.
• A server that uses server-initiated cache validation cannot be
stateless; it must record which clients cache which files.
• File locking (one writer, many readers).
DFS – Replication
• Replicas of the same file reside on failure-independent
machines.
• Improves availability and can shorten service time.
• Naming scheme maps a replicated file name to a particular
replica.
• Existence of replicas should be invisible to higher levels.
• Replicas must be distinguished from one another by
different lower-level names.
• Updates
• Replicas of a file denote the same logical entity
• Update to any replica must be reflected on all other
replicas.
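The split between one logical name and several lower-level replica names can be sketched as follows. This is a minimal sketch that assumes the simplest update rule (write synchronously to every replica); the slides do not say which update protocol is used, and the class name and replica names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


class ReplicatedFile:
    """One logical file backed by replicas with distinct lower-level names."""

    def __init__(self, logical_name: str, replica_names: Iterable[str]) -> None:
        self.logical_name = logical_name
        self.replicas: Dict[str, bytes] = {name: b"" for name in replica_names}

    def write(self, data: bytes) -> None:
        # An update to the logical file is applied to every replica.
        for name in self.replicas:
            self.replicas[name] = data

    def read(self, reachable: Optional[Set[str]] = None) -> bytes:
        """Read from the first reachable replica; callers never pick one."""
        for name, contents in self.replicas.items():
            if reachable is None or name in reachable:
                return contents
        raise RuntimeError("no replica reachable")


notes = ReplicatedFile("/home/staff/jill/notes",
                       ["serverA:/vol1/notes", "serverB:/vol7/notes"])
notes.write(b"meeting at 10")
# serverA is down; the read is served by the replica on serverB.
print(notes.read(reachable={"serverB:/vol7/notes"}))
```

Higher levels use only `/home/staff/jill/notes`; the replica names stay inside the naming layer.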
DFS: Case Studies
• NFS (Network File System)
• Developed by Sun Microsystems (in 1985)
• Most popular, open, and widely used.
• NFS protocol standardised through IETF (version 3: RFC 1813)
• Basic idea:
• Remote directory is mounted onto local directory
• Remote directory may contain mounted directories
within
[Figure: Mounting Remote Directories (NFS)]
[Figure: Nested Mounting (NFS)]
[Figure: NFS Implementation]
NFS
NFS Operations
• Lookup
• Fundamental NFS operation
• Takes pathname, returns file handle
• File Handle
• Unique identifier of file within server
• Persistent; never reused
• Storable, but opaque to client
• Up to 64 bytes in NFS v3; up to 128 bytes in NFS v4
• Most other operations take file handle as argument
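The relationship between lookup and file handles can be illustrated with a toy model. This is a minimal sketch and not the NFS wire protocol: the `ToyServer` class, the 16-byte random handles, and the nested-dictionary file tree are all assumptions, and it follows the slide's simplification that lookup takes a whole pathname.

```python
import os
from typing import Dict


class ToyServer:
    """Toy model: lookup maps a pathname to an opaque, stable handle."""

    def __init__(self, tree: dict) -> None:
        self._tree = tree                          # dicts are directories, bytes are files
        self._handle_by_path: Dict[str, bytes] = {}
        self._file_by_handle: Dict[bytes, bytes] = {}

    def lookup(self, pathname: str) -> bytes:
        parts = pathname.strip("/").split("/")
        node = self._tree
        for part in parts:
            node = node[part]                      # KeyError if a component is missing
        key = "/".join(parts)
        if key not in self._handle_by_path:
            handle = os.urandom(16)                # opaque to the client
            self._handle_by_path[key] = handle
            self._file_by_handle[handle] = node
        return self._handle_by_path[key]

    def read(self, handle: bytes, offset: int, count: int) -> bytes:
        # Like most other operations, read takes the handle, not the name.
        return self._file_by_handle[handle][offset:offset + count]


server = ToyServer({"home": {"students": {"jack": {"a.txt": b"hello"}}}})
handle = server.lookup("/home/students/jack/a.txt")
print(server.read(handle, 0, 5))
```

The client stores the handle and passes it back unchanged; it never interprets the bytes inside it.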
Other NFS Operations (version 3)
• read, write
• link, symlink
• mknod, mkdir
• rename, rmdir
• readdir, readlink
• getattr, setattr
• create, remove
• Conspicuously absent: open, close
NFS v3 — A Stateless Service
• Server retains no knowledge of client
• Server crashes invisible to client
• All hard work done on client side
• Every operation provides file handle
• Server caching
• Performance only
• Based on recent usage
• Client caching
• Client checks validity of cached files
• Client responsible for writing out caches
•…
NFS v3 — A Stateless Service
(continued)
•…
• No locking! No synchronization!