Distributed File System
11
Distributed Storage
Storage needs increase almost exponentially – widespread use of e-mail,
photos, videos, logs, …
Can’t store everything on one large disk. If the disk fails, we lose
everything!
Solution: Store the user’s information along with “some redundant
information” across many disks.
If a disk fails, then you still have enough information in the surviving
disks. Bring in a new disk and replace the information lost by the failed
disk ASAP.
Simple? No. Today’s large data centers have so many disks that multiple
disk failures are more common! Permanent data loss becomes likely.
This presentation is about these issues.
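A minimal sketch of the "redundant information" idea, assuming single XOR parity over three data disks. The slides do not name a specific code, so the scheme, the disk contents and the function name are illustrative assumptions.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, each holding one 4-byte block.
data_disks = [b"mail", b"foto", b"logs"]

# The redundant information: one parity block stored on a fourth disk.
parity_disk = xor_blocks(data_disks)

# Disk 1 fails. Rebuild its block from the surviving disks plus parity.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = xor_blocks(survivors)
```

Single parity survives only one failure at a time. If a second disk fails before the rebuild completes, the data is gone, which is why rebuild time and multiple failures matter.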
Distributed Storage: what we care about.
Performance metrics:
Storage efficiency: how much redundant information do you
store?
Saturation throughput: how many I/O requests can the system
handle before it collapses (or delay increases to infinity)?
Rebuild time: how fast can you replace information lost due to
disk failure?
Mean time to data loss: under assumptions on failure and usage
models of the system, how long do you expect to run without any
permanent loss of data?
Encoding/Decoding/Update/Rebuild complexity: the
computation power needed for all these operations; also, how
many bytes of data on how many disks do you have to update if
you just want to update 1 byte of user data?
Sequential read/write bandwidth: bandwidth the system can
provide for streaming data
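Two of these metrics can be made concrete with simple arithmetic. The sketch below assumes simple schemes (replication or parity) in which every redundant unit depends on the updated data unit; the function names are hypothetical.

```python
def storage_efficiency(data_units, redundant_units):
    """Fraction of raw capacity that holds user data."""
    return data_units / (data_units + redundant_units)

def disks_touched_per_update(redundant_units):
    """Updating one byte rewrites that byte plus every redundant unit derived from it."""
    return 1 + redundant_units

# 3-way replication: 1 data copy plus 2 redundant copies.
replication_eff = storage_efficiency(1, 2)
replication_writes = disks_touched_per_update(2)

# 4 data disks plus 1 parity disk.
parity_eff = storage_efficiency(4, 1)
parity_writes = disks_touched_per_update(1)
```

Replication stores less user data per disk and touches more disks per update than single parity, but it tolerates two failures instead of one.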
Distributed File Systems (DFS)
A special case of distributed system
Allows multi-computer systems to share files
Even when no other IPC or RPC is needed
Sharing devices
Special case of sharing files
E.g.,
NFS (Sun’s Network File System)
Windows NT, 2000, XP
Andrew File System (AFS) & others …
Distributed File Systems 4
Distributed File Systems (continued)
One of the most common uses of distributed
computing
Goal: provide common view of centralized
file system, but distributed implementation.
Ability to open & update any file on any machine
on network
All of synchronization issues and capabilities of
shared local files
DFS Structure
Service – software entity running on one or more
machines and providing a particular type of function
to a priori unknown clients
Server – service software running on a single machine
Client – process that can invoke a service using a set
of operations that forms its client interface
A client interface for a file service is formed by a set of
primitive file operations (create, delete, read, write)
Client interface of a DFS should be transparent, i.e.,
not distinguish between local and remote files
Naming of Distributed Files
Naming – mapping between logical and physical objects.
A transparent DFS hides the location where in the network the
file is stored.
Location transparency – file name does not reveal the file’s
physical storage location.
File name denotes a specific, hidden, set of physical disk blocks.
Convenient way to share data.
Could expose correspondence between component units and
machines.
Location independence – file name does not need to be
changed when the file’s physical storage location changes.
Better file abstraction.
Promotes sharing the storage space itself.
Separates the naming hierarchy from the storage-devices
hierarchy.
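Location independence can be sketched as a name service that maps a stable file name to a changeable physical location. The class, the server names and the paths below are hypothetical.

```python
class NameService:
    """Maps logical file names to (server, physical path) pairs."""

    def __init__(self):
        self._location = {}

    def register(self, name, server, path):
        self._location[name] = (server, path)

    def migrate(self, name, new_server, new_path):
        # Only the mapping changes; the name clients use stays the same.
        if name not in self._location:
            raise KeyError(name)
        self._location[name] = (new_server, new_path)

    def resolve(self, name):
        return self._location[name]


names = NameService()
names.register("/projects/report.txt", "serverA", "/disk3/obj/0042")
before = names.resolve("/projects/report.txt")
names.migrate("/projects/report.txt", "serverB", "/disk1/obj/0977")
after = names.resolve("/projects/report.txt")
```

The name reveals nothing about the server (location transparency) and survives the migration unchanged (location independence).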
DFS – Three Naming Schemes
1. Mount remote directories to local directories
1. Mounted remote directories can be accessed transparently.
2. Unix/Linux with NFS; Windows with mapped drives
2. Files named by combination of host name and local name;
1. Guarantees a unique system wide name
2. Windows Network Places
3. Total integration of component file systems.
1. A single global name structure spans all the files in the system.
2. If a server is unavailable, some arbitrary set of directories on
different machines also becomes unavailable.
3. Andrew File System
Mounting Remote Directories (NFS)
Mounting Remote Directories (continued)
Note: names of files are not unique
As represented by path names
E.g.,
Server A sees: /users/steen/mbox
Client A sees: /remote/vu/mbox
Client B sees: /work/me/mbox
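The example above can be reproduced with a per-client mount table. This is a sketch, assuming both clients mount Server A's /users/steen directory and that mount points carry no trailing slash; the function and table names are hypothetical.

```python
def resolve(mount_table, local_path):
    """Map a client path to (server, server_path) using the longest matching mount point."""
    best = None
    for mount_point in mount_table:
        if local_path == mount_point or local_path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return ("local", local_path)  # not under any mount point
    server, remote_dir = mount_table[best]
    return (server, remote_dir + local_path[len(best):])


# Each client chooses its own mount point for the same remote directory.
client_a_mounts = {"/remote/vu": ("Server A", "/users/steen")}
client_b_mounts = {"/work/me": ("Server A", "/users/steen")}

seen_by_a = resolve(client_a_mounts, "/remote/vu/mbox")
seen_by_b = resolve(client_b_mounts, "/work/me/mbox")
```

Both lookups reach the same server file even though the two clients use different path names for it.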
DFS – File Access Performance
Reduce network traffic by retaining recently
accessed disk blocks in local cache
Repeated accesses to the same information can be
handled locally.
All accesses are performed on the cached copy.
If needed data not already cached, copy of data
brought from the server to the local cache.
Copies of parts of file may be scattered in different caches.
Cache-consistency problem – keeping the cached
copies consistent with the master file.
Especially on write operations
Where to put File Caches
In client memory
Performance speed up; faster access
Good when local usage is temporary
Enables diskless workstations
On client disk
Good when local usage dominates (e.g., AFS)
Caches larger files
Helps protect clients from server crashes
Caching
We can employ caching to improve system
performance. There are four places in a distributed
system where we can hold data:
On the server's disk
In a cache in the server's memory
In the client's memory
On the client's disk
The first two places are not an issue since any interface to the
server can check the centralized cache. It is in the last two places
that problems arise and we have to consider the issue
of cache consistency.
Caching
Several approaches may be taken:
write-through
What if another client reads its own cached copy? All accesses would require
checking with the server first (adds network congestion) or require the server
to maintain state on who has what files cached. Write-through also does not
alleviate congestion on writes.
delayed writes
Data can be buffered locally (where consistency suffers) but files can be
updated periodically. A single bulk write is far more efficient than lots of little
writes every time any file contents are modified. Unfortunately the semantics
become ambiguous.
write on close
This is admitting that the file system uses session semantics.
centralized control
The server keeps track of who has what open in which mode. We would have to
support a stateful system and deal with signaling traffic.
Cache Location: Disk vs. Main Memory
Advantages of disk caches:
More reliable
Cached data kept on disk are still there during recovery and
don’t need to be fetched again
Advantages of main memory caches:
Permit workstations to be diskless
Data can be accessed more quickly
Performance speedup in bigger memories
Server caches (used to speed up disk I/O) are in main
memory regardless of where user caches are located; using
main memory caches on the user machine permits a single
caching mechanism for servers and users
File Cache Update Policies
When does the client update the master file?
I.e. when is cached data written from the cache to the file?
Write-through – write data through to disk ASAP
I.e., following write() or put(), same as on local disks.
Reliable, but poor performance.
Delayed-write – data is cached and then written to the server later.
Write operations complete quickly; some data may be
overwritten in cache, saving needless network I/O.
Poor reliability
unwritten data may be lost when client machine crashes
Inconsistent data
Variation – scan cache at regular intervals
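The two update policies can be compared with a small client cache. This is a sketch only: a dict stands in for the server's master copy, and the class and attribute names are hypothetical.

```python
class CachingClient:
    """Client block cache with a selectable update policy."""

    def __init__(self, server, write_through):
        self.server = server            # dict standing in for the master file: block id -> bytes
        self.write_through = write_through
        self.cache = {}
        self.dirty = set()              # blocks modified locally but not yet sent
        self.server_writes = 0          # counts network writes

    def read(self, block):
        if block not in self.cache:     # miss: bring a copy from the server
            self.cache[block] = self.server[block]
        return self.cache[block]

    def write(self, block, data):
        self.cache[block] = data
        if self.write_through:
            self._send(block)           # reliable, one network write per write()
        else:
            self.dirty.add(block)       # fast, but lost if the client crashes now

    def flush(self):
        """Delayed-write: push each dirty block once, e.g. on a periodic scan."""
        for block in sorted(self.dirty):
            self._send(block)
        self.dirty.clear()

    def _send(self, block):
        self.server[block] = self.cache[block]
        self.server_writes += 1
```

Writing the same block three times costs three server writes under write-through but only one under delayed-write, at the price of a window in which the server copy is stale.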
DFS – File Consistency
Is locally cached copy of the data consistent with the
master copy?
Client-initiated approach
Client initiates a validity check with server.
Server checks the local data against the master copy
E.g., time stamps, etc.
Server-initiated approach
Server records (parts of) files cached in each client.
When server detects a potential inconsistency, it reacts
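A sketch of the client-initiated approach, assuming a per-file version counter in place of the time stamps mentioned above. The class and method names are hypothetical.

```python
class FileServer:
    def __init__(self):
        self.files = {}  # name -> (version, data)

    def write(self, name, data):
        version = self.files.get(name, (0, b""))[0] + 1
        self.files[name] = (version, data)

    def get_version(self, name):
        return self.files[name][0]

    def fetch(self, name):
        return self.files[name]


class ValidatingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # name -> (version, data)
        self.fetches = 0

    def read(self, name):
        cached = self.cache.get(name)
        # Validity check: compare the cached version with the master copy.
        if cached is None or cached[0] != self.server.get_version(name):
            self.cache[name] = self.server.fetch(name)
            self.fetches += 1
        return self.cache[name][1]
```

Every read still costs a round trip for the version check, which is the network overhead the server-initiated approach tries to avoid.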
DFS – Remote Service vs. Caching
Remote Service – all file actions implemented by
server/service.
RPC functions
Use for small memory diskless machines
Particularly applicable if large amount of write activity
Cached System
Many “remote” accesses handled efficiently by the local
cache
Servers contacted only occasionally
Reduces server load and network traffic.
Enhances potential for scalability.
Reduces total network overhead
DFS – File Server Semantics
Stateless Service
Avoids state information in server by making each
request self-contained.
Each request identifies the file and position in the
file.
No need to establish and terminate a connection
by open and close operations.
Poor support for locking or synchronization
among concurrent accesses
DFS – File Server Semantics (continued)
Stateful Service
Client opens a file (as in Unix & Windows).
Server fetches information about file from disk, stores in
server memory,
Returns to client a connection identifier unique to client and
open file.
Identifier used for subsequent accesses until session ends.
Server must reclaim space used by no longer active clients.
Increased performance; fewer disk accesses.
Server retains knowledge about file
E.g., read ahead next blocks for sequential access
E.g., file locking for managing writes
Windows
DFS – Server Semantics Comparison
Failure Recovery: Stateful server loses all volatile
state in a crash.
Restore state by recovery protocol based on a dialog with
clients.
Server needs to be aware of crashed client processes:
orphan detection and elimination.
Failure Recovery: Stateless server failure and
recovery are almost unnoticeable.
Newly restarted server responds to self-contained requests
without difficulty.
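The difference in failure recovery can be sketched with two toy servers. A dict stands in for the disk, and the class and method names are hypothetical.

```python
class StatelessServer:
    def __init__(self, files):
        self.files = files  # name -> bytes, stands in for the disk

    def read(self, name, offset, count):
        # Self-contained request: the file and the position are both supplied.
        return self.files[name][offset:offset + count]

    def crash_and_restart(self):
        pass  # no volatile state to lose


class StatefulServer:
    def __init__(self, files):
        self.files = files
        self.open_table = {}  # connection id -> [name, position], volatile
        self.next_id = 1

    def open(self, name):
        conn = self.next_id
        self.next_id += 1
        self.open_table[conn] = [name, 0]
        return conn

    def read(self, conn, count):
        name, pos = self.open_table[conn]  # raises KeyError if the state was lost
        data = self.files[name][pos:pos + count]
        self.open_table[conn][1] = pos + len(data)
        return data

    def crash_and_restart(self):
        self.open_table.clear()  # all volatile state is gone
        self.next_id = 1
```

After a restart the stateless server answers the same request as before, while the stateful server no longer recognizes the connection identifier and needs a recovery dialog with the client.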
Example Distributed File Systems
NFS – Sun’s Network File System (ver. 3)
NFS – Sun’s Network File System (ver. 4)
AFS – the Andrew File System
NFS
Sun Network File System (NFS) has become
standard for distributed UNIX file access.
NFS runs over LAN
even WAN (slowly)
Any system may be both a client and server
Basic idea:
Remote directory is mounted onto local directory
Remote directory may contain mounted directories within
NFS v3 — A Stateless Service
Server retains no knowledge of client
Server crashes invisible to client
All hard work done on client side
Every operation provides file handle
Server caching
Performance only
Based on recent usage
Client caching
Client checks validity of cached files
Client responsible for writing out caches
…
NFS v3 — A Stateless Service (continued)
…
No locking! No synchronization!
Unix file semantics not guaranteed
E.g., read after write
Session semantics not even guaranteed
E.g., open after close
NFS v3 — A Stateless Service (continued)
Solution: global lock manager
Separate from NFS
Typical locking operations
Lock – acquire lock (non-blocking)
Lockt – test a lock
Locku – unlock a lock
Renew – renew lease on a lock
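A sketch of a lease-based lock manager offering the four operations listed above. The operation names come from the slide; the signatures, the per-lock lease and the explicit `now` argument are simplifying assumptions, not the wire protocol.

```python
class LockManager:
    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.locks = {}  # file name -> (owner, expiry time)

    def _holder(self, name, now):
        entry = self.locks.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]
        return None  # never locked, or the lease has expired

    def lock(self, name, owner, now):
        """Non-blocking acquire: returns False instead of waiting."""
        if self._holder(name, now) not in (None, owner):
            return False
        self.locks[name] = (owner, now + self.lease_seconds)
        return True

    def lockt(self, name, now):
        """Test a lock: returns the current holder, or None."""
        return self._holder(name, now)

    def locku(self, name, owner, now):
        """Unlock: only the current holder may release."""
        if self._holder(name, now) != owner:
            return False
        del self.locks[name]
        return True

    def renew(self, name, owner, now):
        """Extend the lease; fails if the lease has already lapsed."""
        if self._holder(name, now) != owner:
            return False
        self.locks[name] = (owner, now + self.lease_seconds)
        return True
```

The lease matters for crash handling: a client that dies while holding a lock stops renewing, so the lock frees itself once the lease expires.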
NFS procedures
Procedure  Function
LOOKUP     Returns a file handle and attributes corresponding to a file name in a specified directory.
MKDIR      Create a directory.
RMDIR      Delete a directory.
READDIR    Read a directory. Used by the Unix ls command, for example.
RENAME     Rename a file.
REMOVE     Delete a file.
CREATE     Create a file.
READ       Read from a file, by specifying the file handle, starting offset and max. no. of bytes to read (up to 8192).
WRITE      Write to a file.
GETATTR    Returns the attributes of a file: type of file, permissions, size, owner, last-access time, and so on.
SETATTR    Set the attributes of a file: permissions, owner, group, size, and last-access and last-modification time.
LINK       Create a Unix hard link to a file.
SYMLINK    Create a symbolic link to a file.
READLINK   Returns the name of the file to which the symbolic link points.
STATFS     Returns the status of a file system. Used by the Unix df command, for example.
NFS Implementation
Remote procedure calls for all operations
Implemented in Sun ONC (Open Network Computing)
Network communication is client-initiated
RPC based on UDP (non-reliable protocol)
Response to remote procedure call is
acknowledgement
Lost requests are simply re-transmitted
As many times as necessary to get a response!
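A sketch of client-side retransmission over an unreliable transport. Loss is simulated with a random number generator rather than a real UDP socket; the function name, the loss model and the `max_tries` safety bound are assumptions for illustration.

```python
import random

def call_with_retransmit(server_fn, request, rng, loss_rate, max_tries=1000):
    """Resend the request until a reply arrives. Returns (reply, attempts)."""
    for attempt in range(1, max_tries + 1):
        if rng.random() < loss_rate:
            continue                  # request lost: client times out and resends
        reply = server_fn(request)    # the server executes the request
        if rng.random() < loss_rate:
            continue                  # reply lost: the server ran it, client resends anyway
        return reply, attempt
    raise TimeoutError("no response after %d tries" % max_tries)


# A READ handler: executing it twice returns the same bytes and changes nothing.
file_data = b"hello world"

def read_handler(request):
    offset, count = request
    return file_data[offset:offset + count]

reply, attempts = call_with_retransmit(read_handler, (0, 5), random.Random(0), loss_rate=0.5)
```

Because a lost reply makes the server execute the same request more than once, blind retransmission is safe only when requests are self-contained and repeatable, as a read at an explicit offset is.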
Summary NFS
That was version 3 of NFS
Stateless file system
High performance, simple protocol
Based on UDP
Everything has changed in NFS version 4
First published in 2000
Clarifications published in 2003
Almost complete rewrite of NFS
NFS Version 4
Stateful file service
Based on TCP – reliable transport protocol
More ways to access server
Compound requests
I.e., multiple operations in a single RPC request
More emphasis on security
Mount protocol integrated with rest of NFS
protocol
NFS Version 4 (continued)
Additional RPC operations
Long list for managing files, caches, validating versions, etc.
Also security, permissions, etc.
Also
Open() and close().
With a server crash, some information may have to be recovered
Distributed File Systems — Summary
Performance is always an issue
Tradeoff between performance and the semantics of file
operations (especially for shared files).
Caching of file blocks is crucial in any file system,
distributed or otherwise.
As memories get larger, most read requests can be serviced
out of file buffer cache (local memory).
Maintaining coherency of those caches is a crucial design
issue.
Current research addressing disconnected file
operation for mobile computers.
NFS v3 and v4 compared
NFSv3                                                        | NFSv4
A collection of protocols (file access, mount, lock, status) | One protocol to a single port (2049)
Stateless                                                    | Lease-based state
UNIX-centric, but seen in Windows too                        | Supports UNIX and Windows file semantics
Deployed with weak authentication                            | Mandates strong authentication
32-bit numeric uids                                          | String-based identities
Ad-hoc caching                                               | Real caching handshake
UNIX permissions                                             | Windows-like access
Works over UDP, TCP                                          | Bans UDP
Needs a-priori agreement on character sets                   | Uses a universal character set for file names
Andrew File System (AFS)
Completely different kind of file system
Developed at Carnegie Mellon University
(CMU) to support all student computing.
Consists of workstation clients and dedicated
file server machines.
Andrew File System (AFS)
Stateful
Single name space
A file has the same name everywhere in the world.
Lots of local file caching
On workstation disks
For long periods of time
Originally whole files, now 64K file chunks.
Good for distant operation because of local disk
caching
AFS
Once a file is cached, all operations are
performed locally.
On close, if the file is modified, it is replaced
on the server.
The client assumes that its cache is up to date
unless it receives a callback message from the server saying otherwise.
The server knows about all cached copies of a file.
…
AFS
On file open()
If client has received a callback for file, it must
fetch new copy
Otherwise it uses its locally-cached copy.
Server crashes
Transparent to client if file is locally cached
Server must contact clients to find state of files
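The callback mechanism can be sketched as follows. This is a whole-file toy model in one process; the class and method names are hypothetical and do not reflect the actual AFS interfaces.

```python
class AFSServer:
    def __init__(self):
        self.files = {}      # name -> data
        self.callbacks = {}  # name -> set of clients holding a cached copy

    def fetch(self, name, client):
        # Remember who caches the file so they can be notified later.
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, name, data, writer):
        self.files[name] = data
        # Break the callback for every other client caching this file.
        for client in self.callbacks.get(name, set()) - {writer}:
            client.callback_broken(name)
        self.callbacks[name] = {writer}


class AFSClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}     # name -> data (stands in for the local disk cache)
        self.valid = set()  # cached files with no broken callback
        self.fetches = 0

    def callback_broken(self, name):
        self.valid.discard(name)

    def open(self, name):
        if name not in self.valid:  # never cached, or a callback was received
            self.cache[name] = self.server.fetch(name, self)
            self.valid.add(name)
            self.fetches += 1
        return self.cache[name]     # otherwise use the local copy

    def close(self, name, data):
        if data != self.cache.get(name):  # modified: replace the file on the server
            self.cache[name] = data
            self.valid.add(name)
            self.server.store(name, data, self)
```

Repeated opens of an unchanged file never contact the server, which is what makes the scheme suit distant clients. The cost is that the server must keep state about every cached copy.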
Network File Sharing
NFS                          | AFS
Low administrative overhead  | High administrative overhead
Standard UNIX backup/restore | "Enhanced" backup/restore
Available for most OS        | Limited OS availability
Distributed administration   | Central administration
Uses standard utilities      | Replaces standard utilities
Stateful or stateless design?
A stateless system is one in which the client sends a
request to a server, the server carries it out, and returns
the result
Between these requests, no client-specific information is
stored on the server
A stateful system is one where information about client
connections is maintained on the server
State may refer to any information that a server stores
about a client: whether a file is open, whether a file is
being modified, cached data on the client, etc.
Stateful or stateless design?
In a stateless system:
Each request must be complete — the file has to be fully identified
and any offsets specified.
If a server crashes and then recovers, no state was lost about client
connections because there was no state to maintain. This creates a
higher degree of fault tolerance.
No remote open/close calls are needed (they only serve to establish
state).
There is no server memory devoted to storing per-client data.
There is no limit on the number of open files on the server; they
aren't "open" since the server maintains no per-client state.
There are no problems if the client crashes. The server does not
have any state to clean up.
Stateful or stateless design?
In a stateful file system:
Requests are shorter (there is less information to send).
Cache coherence is possible; the server can know which clients are
caching which blocks of a file.
With shorter requests and caching, one will generally see better
performance in processing the requests.
File locking is possible; the server can keep state that a certain
client is locking a file (or portion thereof).
Although the list of stateless advantages is longer, history shows us
that the clear winner is the stateful approach. The ability to
maintain better cache coherence, lock files, and know whether files
are open by remote clients are all incredibly compelling
advantages.