Distributed-File Systems Background
• Service – software entity running on one or more machines and providing a particular type of function to a priori unknown clients.
• Server – service software running on a single machine.
• Client – process that can invoke a service using a set of operations that forms its client interface.
• A client interface for a file service is formed by a set of primitive file operations (create, delete, read, write).
• Naming – mapping between logical and physical objects.
• Ideally, the client interface should be transparent, i.e., not distinguish between local and remote files.
– In practice, this is not always possible.
– More complicated failure modes and different design goals sometimes motivate a different interface.
• A transparent DFS hides the location in the network where the file is stored.
– There is a binding from file f to server s.
– Either static or dynamic.
– S is possibly a set, for replicated files.
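The primitive client interface above can be sketched as a minimal in-memory service; the class and method names here are illustrative stand-ins, not part of any real DFS:

```python
class FileService:
    """In-memory stand-in for a file server; clients see only these primitives."""

    def __init__(self):
        self._files = {}  # file name -> bytearray of contents

    def create(self, name):
        self._files[name] = bytearray()

    def delete(self, name):
        del self._files[name]

    def read(self, name, offset, count):
        return bytes(self._files[name][offset:offset + count])

    def write(self, name, offset, data):
        buf = self._files[name]
        buf[offset:offset + len(data)] = data  # extends the file if needed

svc = FileService()
svc.create("/doc.txt")
svc.write("/doc.txt", 0, b"hello")
print(svc.read("/doc.txt", 0, 5))  # b'hello'
```

A transparent DFS would present exactly this interface to clients while routing each call to whichever server currently holds the file.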
Naming Structures
• Location transparency – file name does not reveal the file's physical storage location.
– File name still denotes a specific, although hidden, set of physical disk blocks.
– Convenient way to share data.
– Can expose correspondence between component units and machines.
• Location independence – file name does not need to be changed when the file's physical storage location changes.
– Better file abstraction.
– Promotes sharing the storage space itself.
– Separates the naming hierarchy from the storage-devices hierarchy.
Naming Schemes — Three Main Approaches
• Files named by combination of their host name and local name; guarantees a unique systemwide name.
• Attach remote directories to local directories, giving the appearance of a coherent directory tree; only previously mounted remote directories can be accessed transparently.
• Total integration of the component file systems.
– A single global name structure spans all the files in the system.
– If a server is unavailable, some arbitrary set of directories on different machines also becomes unavailable.
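The first naming scheme can be sketched in a few lines; the host and path names below are made up for illustration. Note that such a name is neither location transparent nor location independent — the server's identity is baked into the name:

```python
def make_global_name(host, local_path):
    """Combine host name and local name into a unique systemwide name."""
    return f"{host}:{local_path}"

def resolve(global_name):
    """Split a systemwide name back into (host, local_path)."""
    host, local_path = global_name.split(":", 1)
    return host, local_path

name = make_global_name("fileserver1", "/home/alice/notes.txt")
print(name)           # fileserver1:/home/alice/notes.txt
print(resolve(name))  # ('fileserver1', '/home/alice/notes.txt')
```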
• Reduce network traffic by retaining recently accessed disk blocks in a cache, so that repeated accesses to the same information can be handled locally.
– If the needed data are not already cached, a copy of the data is brought from the server to the user.
– Accesses are performed on the cached copy.
– Files are identified with one master copy residing at the server machine, but copies of (parts of) the file are scattered in different caches.
– Cache-consistency problem – keeping the cached copies consistent with the master file.
• Advantages of disk caches:
– More reliable.
– Cached data kept on disk are still there during recovery and don't need to be fetched again.
• Advantages of main-memory caches:
– Permit workstations to be diskless.
– Data can be accessed more quickly.
– Performance speedup with bigger memories.
– Server caches (used to speed up disk I/O) are in main memory regardless of where user caches are located; using main-memory caches on the user machine permits a single caching mechanism for servers and users.
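The basic caching scheme above can be sketched as a client-side block cache; the server is modeled as a plain dictionary, and all names are illustrative:

```python
class BlockCache:
    """Client-side cache: repeated accesses to a block are handled locally."""

    def __init__(self, server):
        self.server = server   # stand-in server: maps (file, block_no) -> data
        self.cache = {}
        self.fetches = 0       # counts trips over the network

    def read_block(self, file, block_no):
        key = (file, block_no)
        if key not in self.cache:          # miss: bring a copy from the server
            self.cache[key] = self.server[key]
            self.fetches += 1
        return self.cache[key]             # hit: served from the local copy

server = {("f", 0): b"abc"}
c = BlockCache(server)
c.read_block("f", 0)
c.read_block("f", 0)
print(c.fetches)  # 1 — the second access did not go over the network
```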
Cache Placement
• Two locations for a cache
– In the client
– In the server
• Client caches can reduce network traffic:
– Read-only operations on unchanged files do not need to go over the network
• Server caches can reduce server load:
– Cache is amortized across all clients (but needs to be bigger to be effective)
• In practice, need both kinds of caches
Cache Update Policy
• Write-through – write data through to disk as soon as they are placed on any cache. Reliable, but poor performance.
• Delayed-write – modifications written to the cache and then written through to the server later. Write accesses complete quickly; some data may be overwritten before they are written back, and so need never be written at all.
– Poor reliability; unwritten data will be lost whenever a user machine crashes.
– Variation – scan the cache at regular intervals and flush blocks that have been modified since the last scan.
– Variation – write-on-close: write data back to the server when the file is closed. Best for files that are open for long periods and frequently modified.
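The two update policies can be contrasted in a small sketch; `disk` stands in for the server's stable storage, and all names are illustrative:

```python
class Cache:
    """Per-block cache with either write-through or delayed-write policy."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.blocks = {}   # cached blocks
        self.disk = {}     # stand-in for the server's stable storage
        self.dirty = set() # blocks not yet written back (delayed-write only)

    def write(self, block_no, data):
        self.blocks[block_no] = data
        if self.write_through:
            self.disk[block_no] = data  # reliable, but every write pays the cost
        else:
            self.dirty.add(block_no)    # delayed-write: flush later

    def flush(self):
        """E.g. a periodic scan, or write-on-close."""
        for b in self.dirty:
            self.disk[b] = self.blocks[b]
        self.dirty.clear()

wt = Cache(write_through=True)
wt.write(0, b"x")
dw = Cache(write_through=False)
dw.write(0, b"x")
print(0 in wt.disk, 0 in dw.disk)  # True False — unflushed data is at risk
dw.flush()
print(0 in dw.disk)                # True
```

A crash of `dw` before `flush()` would lose block 0 — exactly the reliability problem noted above.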
• Is the locally cached copy of the data consistent with the master copy?
• Client-initiated approach
– Client initiates a validity check.
– Server checks whether the local data are consistent with the master copy.
• Server-initiated approach
– Server records, for each client, the (parts of) files it caches.
– When the server detects a potential inconsistency, it must react.
– File may be cached entirely on the client, invalidated by the server if there is a conflicting write.
• Mechanism.
– Client opens a file.
– Server fetches information about the file from its disk, stores it in its memory, and gives the client a connection identifier unique to the client and the open file.
– Identifier is used for subsequent accesses until the session ends.
• Increased performance.
– Fewer disk accesses.
– Stateful server knows if a file was opened for sequential access and can thus read ahead the next blocks.
– RPCs are small, contain only an identifier.
Stateless File Server
Distinctions Between Stateful & Stateless Service
• Penalties for using the robust stateless service:
– longer request messages
– slower request processing
– additional constraints imposed on DFS design
• Some environments require stateful service.
– A server employing server-initiated cache validation cannot provide stateless service, since it maintains a record of which files are cached by which clients.
– UNIX use of file descriptors and implicit offsets is inherently stateful; servers must maintain tables to map the file descriptors to inodes, and store the current offset within a file.
• Replicas of the same file reside on failure-independent machines.
• Improves availability and can shorten service time.
• Naming scheme maps a replicated file name to a particular replica.
– Existence of replicas should be invisible to higher levels.
– Replicas must be distinguished from one another by different lower-level names.
• Updates – replicas of a file denote the same logical entity, and thus an update to any replica must be reflected on all other replicas.
• Demand replication – reading a nonlocal replica causes it to be cached locally, thereby generating a new nonprimary replica.
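The update requirement can be sketched with an in-memory replica set; machine names and the eager write-all strategy are illustrative choices, not the only way to propagate updates:

```python
class ReplicatedFile:
    """One logical file with copies on several failure-independent machines."""

    def __init__(self, servers):
        self.replicas = {s: b"" for s in servers}

    def write(self, data):
        for s in self.replicas:        # an update must reach all replicas
            self.replicas[s] = data

    def read(self, server):
        return self.replicas[server]   # any replica denotes the same entity

f = ReplicatedFile(["s1", "s2", "s3"])
f.write(b"v1")
print(f.read("s2"))  # b'v1' — every replica reflects the update
```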
The Sun Network File System (NFS)
NFS Protocol
• Provides a set of remote procedure calls for remote file operations. The procedures support the following operations:
– searching for a file within a directory
– reading a set of directory entries
– manipulating links and directories
– accessing file attributes
– reading and writing blocks within files
• NFS servers are stateless; each request has to provide a full set of arguments.
• Modified data must be committed to the server's disk before results are returned to the client (lose advantages of caching).
• The NFS protocol does not provide concurrency-control mechanisms.
Three Major Layers of NFS Architecture
• UNIX file-system interface (based on the open, read, write, and close calls, and file descriptors).
• Virtual File System (VFS) layer – distinguishes local files from remote ones, and local files are further distinguished according to their file-system types.
– The VFS activates file-system-specific operations to handle local requests according to their file-system types.
– Calls the NFS protocol procedures for remote requests.
• NFS service layer – bottom layer of the architecture; implements the NFS protocol.
Three Independent File Systems
Mounting in NFS
NFS and Locking
• File locks are a useful abstraction
– Consider mail delivery
• Impossible to implement locks in a stateless way
– The whole point of a lock is to have some state that protects the file in question
– NFS makes an attempt
– Cannot offer strong guarantees
– Implementation was always 'buggy' – a euphemism
ANDREW Filesystem
• Andrew filesystem (AFS) is designed to be highly scalable
– The system is designed to be able to name and access all AFS servers in the world
• Client-server model
• Simple interface
– GET file
– PUT file
– Other calls for manipulating access controls, volumes, etc.
• Whole-file caching is the central idea behind AFS
– Later amended with block operations
– Simple, effective
• AFS is stateful
– Servers keep track of which clients have which files
– Recall files when they have been modified
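AFS's stateful design can be sketched as follows: the server remembers which clients hold which files and recalls (invalidates) cached copies when a file is modified. Class names and the invalidation detail are illustrative simplifications of the real callback mechanism:

```python
class AFSServer:
    def __init__(self):
        self.files = {"f": b"v1"}
        self.holders = {}  # file name -> set of clients caching it

    def get(self, client, name):
        self.holders.setdefault(name, set()).add(client)
        return self.files[name]  # ship the whole file

    def put(self, client, name, data):
        self.files[name] = data
        for other in self.holders.get(name, set()) - {client}:
            other.invalidate(name)  # recall copies of the modified file

class AFSClient:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def open(self, name):
        if name not in self.cache:  # whole-file fetch on open
            self.cache[name] = self.server.get(self, name)
        return self.cache[name]

    def invalidate(self, name):
        self.cache.pop(name, None)

srv = AFSServer()
a, b = AFSClient(srv), AFSClient(srv)
a.open("f")
b.open("f")
srv.put(a, "f", b"v2")  # a stores the file back; b's copy is recalled
print(b.open("f"))      # b'v2' — refetched after invalidation
```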
• Dedicated servers present a homogeneous, identical, and location-transparent file hierarchy to clients.
• Clients are required to have local disks, where they store
– their local files
– the result of GET operations
• Servers arrange storage in logical volumes.
• Files and directories are named by a fid. A fid identifies a file or directory; it is 96 bits long and has three equal-length components:
– volume number
– vnode number – index into an array containing the inodes of files in a single volume.
– uniquifier – allows reuse of vnode numbers, thereby keeping certain data structures compact.
• Fids are location transparent; therefore, file movements from server to server do not invalidate cached directory contents.
• Location information is kept on a volume basis, and the information is replicated on each server.
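A 96-bit fid with three equal-length components works out to 32 bits each; packing one can be sketched with the standard `struct` module (the byte order is an assumption for illustration):

```python
import struct

def pack_fid(volume, vnode, uniquifier):
    """Three 32-bit components -> 12 bytes (96 bits)."""
    return struct.pack(">III", volume, vnode, uniquifier)

def unpack_fid(fid):
    return struct.unpack(">III", fid)

fid = pack_fid(7, 1042, 3)
print(len(fid) * 8)     # 96
print(unpack_fid(fid))  # (7, 1042, 3)
```

Because the fid carries only a volume number, not a server address, moving a volume between servers changes the location database, not the fid itself.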
ANDREW File Operations
• Andrew caches entire files from servers. A client workstation interacts with servers only during opening and closing of files.
• AFS caches files from servers when they are opened, and stores modified copies of files back when they are closed.
• Reading and writing bytes of a file are done by the kernel, without AFS involvement, on the cached copy.
• AFS caches the contents of directories and symbolic links, for path-name translation.
ANDREW Implementation
• Client processes are interfaced to a UNIX kernel with the usual set of system calls.
• AFS carries out path-name translation component by component.
• The UNIX file system is used as a low-level storage system for both servers and clients. The client cache is a local directory on the workstation's disk.
• Both AFS and server processes access UNIX files directly by their inodes, to avoid the expensive path-name-to-inode translation routine.