
Distributed-File Systems Background

The document provides background information on distributed file systems (DFS). A DFS manages dispersed storage devices across a network and presents them as one overall storage space composed of smaller, remotely located storage spaces. It describes three naming schemes: host-qualified file names, attaching remote directories to local directories, and a single global name structure. Caching reduces network traffic by keeping recently accessed data locally, but it introduces a cache-consistency problem. The slides also cover stateful versus stateless service, file replication, NFS, and AFS.

Uploaded by

pradeepshaktawat

Distributed-File Systems

• Background
• Naming and Transparency
• Stateful versus Stateless Service
• NFS
• AFS

Background

• Distributed file system – a distributed implementation of a file system, where multiple users share files and storage resources.
• A DFS manages a set of dispersed storage devices.
• Overall storage space managed by a DFS is composed of different, remotely located, smaller storage spaces.
• There is usually a correspondence between constituent storage spaces and sets of files.


DFS Structure

• Service – software entity running on one or more machines and providing a particular type of function to a priori unknown clients.
• Server – service software running on a single machine.
• Client – process that can invoke a service using a set of operations that forms its client interface.
• A client interface for a file service is formed by a set of primitive file operations (create, delete, read, write).

Naming and Transparency

• Naming – mapping between logical and physical objects
• Ideally, client interface should be transparent, i.e., not distinguish between local and remote files
    – In practice, this is not always possible
    – More complicated failure modes, different design goals sometimes motivate a different interface
• A transparent DFS hides the location where in the network the file is stored.
    – There is a binding from file f to server s
    – Either static or dynamic
    – s is possibly a set, for replicated files

Naming Structures

• Location transparency – file name does not reveal the file’s physical storage location.
    – File name still denotes a specific, although hidden, set of physical disk blocks.
    – Convenient way to share data.
    – Can expose correspondence between component units and machines.
• Location independence – file name does not need to be changed when the file’s physical storage location changes.
    – Better file abstraction.
    – Promotes sharing the storage space itself.
    – Separates the naming hierarchy from the storage-devices hierarchy.

Naming Schemes – Three Main Approaches

• Files named by combination of their host name and local name; guarantees a unique systemwide name.
• Attach remote directories to local directories, giving the appearance of a coherent directory tree; only previously mounted remote directories can be accessed transparently.
• Total integration of the component file systems.
    – A single global name structure spans all the files in the system.
    – If a server is unavailable, some arbitrary set of directories on different machines also becomes unavailable.
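Location independence can be pictured as a two-level mapping: a name resolves to an opaque identifier, and a separate table records where that identifier currently lives. The following is a minimal sketch under that assumption; `name_to_id`, `id_to_server`, `locate`, and `migrate` are hypothetical names chosen for illustration, not part of any real DFS.

```python
# Hypothetical two-level naming: path -> file id -> current server.
name_to_id = {"/home/alice/notes.txt": 17}
id_to_server = {17: "serverA"}


def locate(path):
    """Resolve a path to the server that currently stores the file."""
    return id_to_server[name_to_id[path]]


def migrate(file_id, new_server):
    """Move a file: only the location table changes, never the name."""
    id_to_server[file_id] = new_server


before = locate("/home/alice/notes.txt")
migrate(17, "serverB")
after = locate("/home/alice/notes.txt")
```

The path stays the same across the migration; only the lower-level table is touched, which is what separates the naming hierarchy from the storage hierarchy.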


Caching

• Reduce network traffic by retaining recently accessed disk blocks in a cache, so that repeated accesses to the same information can be handled locally.
    – If needed data not already cached, a copy of data is brought from the server to the user.
    – Accesses are performed on the cached copy.
    – Files identified with one master copy residing at the server machine, but copies of (parts of) the file are scattered in different caches.
    – Cache-consistency problem – keeping the cached copies consistent with the master file.

Location – Disk Caches vs. Main Memory Cache

• Advantages of disk caches
    – More reliable.
    – Cached data kept on disk are still there during recovery and don’t need to be fetched again.
• Advantages of main-memory caches:
    – Permit workstations to be diskless.
    – Data can be accessed more quickly.
    – Performance speedup in bigger memories.
    – Server caches (used to speed up disk I/O) are in main memory regardless of where user caches are located; using main-memory caches on the user machine permits a single caching mechanism for servers and users.
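The fetch-on-miss behaviour described under Caching can be sketched as a small client-side block cache. This is an illustration only: the `BlockCache` class, its least-recently-used eviction, and the `fetch_from_server` callable are assumptions, since the slides do not prescribe a replacement policy.

```python
from collections import OrderedDict


class BlockCache:
    """Toy client cache: keeps recently accessed blocks, evicts the least recently used."""

    def __init__(self, fetch_from_server, capacity=2):
        self.fetch = fetch_from_server   # hypothetical callable: (file, block_no) -> bytes
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.server_reads = 0            # counts simulated network round trips

    def read(self, file, block_no):
        key = (file, block_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)      # hit: handled locally
            return self.blocks[key]
        self.server_reads += 1                # miss: a copy is brought from the server
        data = self.fetch(file, block_no)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used block
        return data


cache = BlockCache(lambda f, n: f"{f}:{n}".encode())
cache.read("a", 0)
cache.read("a", 0)   # repeated access, served from the cache
cache.read("a", 1)
```

Three reads cost two server round trips here, because the repeated access to block 0 never leaves the client.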

Cache Placement

• Two locations for a cache
    – In the client
    – In the server
• Client caches can reduce network traffic:
    – Read-only operations on unchanged files do not need to go over the network
• Server caches can reduce server load:
    – Cache is amortized across all clients (but needs to be bigger to be effective)
• In practice, need both kinds of caches

Cache Update Policy

• Write-through – write data through to disk as soon as they are placed on any cache. Reliable, but poor performance.
• Delayed-write – modifications written to the cache and then written through to the server later. Write accesses complete quickly; some data may be overwritten before they are written back, and so need never be written at all.
    – Poor reliability; unwritten data will be lost whenever a user machine crashes.
    – Variation – scan cache at regular intervals and flush blocks that have been modified since the last scan.
    – Variation – write-on-close, writes data back to the server when the file is closed. Best for files that are open for long periods and frequently modified.
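The difference between the two update policies shows up when the same block is written several times. The sketch below is a toy model, not an actual DFS implementation; `Server`, `WriteThroughCache`, and `DelayedWriteCache` are hypothetical classes, and `flush` stands in for either the periodic scan or write-on-close.

```python
class Server:
    def __init__(self):
        self.disk = {}
        self.writes = 0   # number of writes that reached the server

    def write(self, block, data):
        self.disk[block] = data
        self.writes += 1


class WriteThroughCache:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, block, data):
        self.cache[block] = data
        self.server.write(block, data)   # every write goes to the server at once


class DelayedWriteCache:
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()

    def write(self, block, data):
        self.cache[block] = data
        self.dirty.add(block)            # remember the block, but do not send it yet

    def flush(self):
        """Send every modified block to the server, then mark the cache clean."""
        for block in sorted(self.dirty):
            self.server.write(block, self.cache[block])
        self.dirty.clear()


wt_server, dw_server = Server(), Server()
wt, dw = WriteThroughCache(wt_server), DelayedWriteCache(dw_server)
for data in (b"v1", b"v2", b"v3"):
    wt.write(0, data)
    dw.write(0, data)
dw.flush()
```

Write-through sends all three versions; delayed-write sends only the last one, because the earlier versions were overwritten before the flush. The cost is that a client crash before `flush` would lose the data.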


Consistency

• Is locally cached copy of the data consistent with the master copy?
• Client-initiated approach
    – Client initiates a validity check.
    – Server checks whether the local data are consistent with the master copy.
• Server-initiated approach
    – Server records, for each client, the (parts of) files it caches.
    – When server detects a potential inconsistency, it must react.

Stateful File Service

• Mechanism.
    – Client opens a file.
    – Server fetches information about the file from its disk, stores it in its memory, and gives the client a connection identifier unique to the client and the open file.
    – Identifier is used for subsequent accesses until the session ends.
• Increased performance.
    – Fewer disk accesses
    – Stateful server knows if a file was opened for sequential access and can thus read ahead the next blocks
    – RPCs are small, contain only an identifier
    – File may be cached entirely on the client, invalidated by the server if there is a conflicting write
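The client-initiated approach from the Consistency slide can be sketched with a version number per file. Version numbers are an assumption made for this example (timestamps would work the same way), and `FileServer`, `Client`, and `is_current` are hypothetical names.

```python
class FileServer:
    def __init__(self):
        self.files = {}   # name -> (version, data); the master copies

    def store(self, name, data):
        version = self.files.get(name, (0, b""))[0] + 1
        self.files[name] = (version, data)

    def fetch(self, name):
        return self.files[name]

    def is_current(self, name, version):
        """Validity check: does the client's version match the master copy?"""
        return self.files[name][0] == version


class Client:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def read(self, name):
        cached = self.cache.get(name)
        if cached is None or not self.server.is_current(name, cached[0]):
            self.cache[name] = self.server.fetch(name)   # refetch a stale or missing copy
        return self.cache[name][1]


server = FileServer()
server.store("f", b"old")
client = Client(server)
first = client.read("f")
server.store("f", b"new")   # another client updates the master copy
second = client.read("f")
```

Here the client validates on every read, which is the most conservative choice; checking only on open or at intervals would trade freshness for fewer round trips.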

Stateless File Server

• Avoids state information by making each request self-contained.
• Each request identifies the file and position in the file.
• No need to establish and terminate a connection by open and close operations
• Advantage:
    – A fileserver crash does not affect any clients
• Disadvantages:
    – RPCs need to contain all state associated with the operation

Distinctions Between Stateful & Stateless Service

• Failure Recovery.
    – A stateful server loses all its volatile state in a crash.
        ◦ Restore state by recovery protocol based on a dialog with clients, or abort operations that were underway when the crash occurred.
        ◦ Server needs to be aware of client failures in order to reclaim space allocated to record the state of crashed client processes (orphan detection and elimination).
    – With stateless server, the effects of server failure and recovery are almost unnoticeable. A newly reincarnated server can respond to a self-contained request without any difficulty.


Distinctions (Cont.)

• Penalties for using the robust stateless service:
    – longer request messages
    – slower request processing
    – additional constraints imposed on DFS design
• Some environments require stateful service.
    – A server employing server-initiated cache validation cannot provide stateless service, since it maintains a record of which files are cached by which clients.
    – UNIX use of file descriptors and implicit offsets is inherently stateful; servers must maintain tables to map the file descriptors to inodes, and store the current offset within a file.

File Replication

• Replicas of the same file reside on failure-independent machines.
• Improves availability and can shorten service time.
• Naming scheme maps a replicated file name to a particular replica.
    – Existence of replicas should be invisible to higher levels.
    – Replicas must be distinguished from one another by different lower-level names.
• Updates – replicas of a file denote the same logical entity, and thus an update to any replica must be reflected on all other replicas.
• Demand replication – reading a nonlocal replica causes it to be cached locally, thereby generating a new nonprimary replica.
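The replication rules above (one logical name, distinct lower-level names, updates reaching every replica, demand replication) can be sketched in a few lines. The `ReplicatedFile` class and the `machine:path` form of the lower-level names are assumptions made for illustration, and the synchronous update loop ignores failures and concurrency.

```python
class ReplicatedFile:
    """Toy replicated file: one logical name, several lower-level replica names."""

    def __init__(self, name, machines, data=b""):
        self.name = name
        # lower-level names distinguish the replicas, e.g. "m1:/report"
        self.replicas = {f"{m}:{name}": data for m in machines}

    def read(self, preferred_machine=None):
        key = f"{preferred_machine}:{self.name}"
        if key in self.replicas:
            return self.replicas[key]
        return next(iter(self.replicas.values()))   # any replica will do

    def write(self, data):
        for key in self.replicas:                    # the update must reach all replicas
            self.replicas[key] = data

    def demand_replicate(self, machine):
        """Reading a nonlocal replica creates a new nonprimary replica locally."""
        data = self.read(machine)
        self.replicas[f"{machine}:{self.name}"] = data
        return data


f = ReplicatedFile("/report", ["m1", "m2"], b"draft")
f.write(b"final")
f.demand_replicate("m3")
```

Callers only ever use the logical name `/report`; which replica serves a read is decided inside `read`, keeping the replicas invisible to higher levels.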

The Sun Network File System (NFS)

• An implementation and a specification of a software system for accessing remote files across LANs (or WANs).
• Built on top of an unreliable datagram protocol (UDP/IP)

NFS (Cont.)

• Client-server model
    – A remote directory is mounted over a local file system directory. The mounted directory looks like an integral subtree of the local file system, replacing the subtree descending from the local directory
    – Specification of the remote directory for the mount operation is nontransparent; the host name of the remote directory has to be provided. Files in the remote directory can then be accessed in a transparent manner
    – Subject to access-rights accreditation, potentially any file system (or directory within a file system), can be mounted remotely on top of any local directory


NFS (Cont.)

• NFS is designed to operate in a heterogeneous environment of different machines, operating systems, and network architectures; the NFS specification is independent of these media.
• This independence is achieved through the use of RPC primitives built on top of an External Data Representation (XDR) protocol used between two implementation-independent interfaces.

NFS Mount Protocol

• Establishes initial logical connection between server and client.
• Mount operation includes name of remote directory to be mounted and name of server machine storing it.
    – Mount request is mapped to corresponding RPC and forwarded to mount server running on server machine.
    – Export list – specifies local file systems that server exports for mounting, along with names of machines that are permitted to mount them.
• Following a mount request that conforms to its export list, the server returns a filesystem handle: a key for further accesses.
• Filesystem handle – a file-system identifier, and an inode number to identify the mounted directory within the exported file system.
• The mount operation changes only the user’s view and does not affect the server side.
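The export-list check and the returned handle can be sketched as follows. This models only the decision logic, not the real mount RPC or on-the-wire handle format; `EXPORTS`, `FileHandle`, and `mount` are hypothetical names, and the client and directory names are made up.

```python
from typing import NamedTuple


class FileHandle(NamedTuple):
    fs_id: int   # file-system identifier
    inode: int   # inode of the mounted directory within that file system


# Hypothetical export list: exported directory -> (clients allowed, handle to return)
EXPORTS = {
    "/usr/shared": ({"clientA", "clientB"}, FileHandle(fs_id=3, inode=2)),
}


def mount(client, remote_dir):
    """Return a file handle if the request conforms to the export list."""
    entry = EXPORTS.get(remote_dir)
    if entry is None or client not in entry[0]:
        raise PermissionError(f"{client} may not mount {remote_dir}")
    return entry[1]


handle = mount("clientA", "/usr/shared")
try:
    mount("clientZ", "/usr/shared")
    denied = False
except PermissionError:
    denied = True
```

Note that `mount` does not modify `EXPORTS` or record the client anywhere, mirroring the point that mounting changes only the client's view.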

NFS Protocol

• Provides a set of remote procedure calls for remote file operations. The procedures support the following operations:
    – searching for a file within a directory
    – reading a set of directory entries
    – manipulating links and directories
    – accessing file attributes
    – reading and writing blocks within files
• NFS servers are stateless; each request has to provide a full set of arguments.
• Modified data must be committed to the server’s disk before results are returned to the client (lose advantages of caching).
• The NFS protocol does not provide concurrency-control mechanisms.

Three Major Layers of NFS Architecture

• UNIX file-system interface (based on the open, read, write, and close calls, and file descriptors).
• Virtual File System (VFS) layer – distinguishes local files from remote ones, and local files are further distinguished according to their file-system types.
    – The VFS activates file-system-specific operations to handle local requests according to their file-system types.
    – Calls the NFS protocol procedures for remote requests.
• NFS service layer – bottom layer of the architecture; implements the NFS protocol.


Schematic View of NFS Architecture

[Figure not reproduced in the extracted text]

NFS Path-Name Translation

• Performed by breaking the path into component names and performing a separate NFS lookup call for every pair of component name and directory vnode.
• To make lookup faster, a directory name lookup cache on the client’s side holds the vnodes for remote directory names.
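Component-by-component translation with a client-side directory name lookup cache can be sketched as below. The vnode numbers, the `TREE` table, and the `nfs_lookup` function are stand-ins invented for this example; a real client would issue a lookup RPC where `nfs_lookup` is called.

```python
# Hypothetical server-side directory tree: directory vnode -> {name: child vnode}
TREE = {1: {"usr": 2}, 2: {"local": 3}, 3: {"bin": 4}}
lookup_calls = 0


def nfs_lookup(dir_vnode, name):
    """Stand-in for one NFS lookup call on a (directory vnode, name) pair."""
    global lookup_calls
    lookup_calls += 1
    return TREE[dir_vnode][name]


dnlc = {}   # directory name lookup cache: (dir vnode, name) -> vnode


def translate(path, root=1):
    vnode = root
    for name in path.strip("/").split("/"):
        key = (vnode, name)
        if key not in dnlc:
            dnlc[key] = nfs_lookup(vnode, name)   # one lookup per uncached component
        vnode = dnlc[key]
    return vnode


first = translate("/usr/local/bin")
calls_after_first = lookup_calls
second = translate("/usr/local/bin")
```

The first translation costs one lookup per component; the second is answered entirely from the cache. This sketch never invalidates cache entries, which a real client would have to do.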

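Three Independent File Systems

[Figure not reproduced in the extracted text]

Mounting in NFS

[Figure not reproduced in the extracted text; panels: Mounts, Cascading mounts]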


Path-name Translation

[Figure not reproduced in the extracted text]

NFS Remote Operations

• Nearly one-to-one correspondence between regular UNIX system calls and the NFS protocol RPCs (except opening and closing files).
• NFS adheres to the remote-service paradigm, but employs buffering and caching techniques for the sake of performance.
• File-blocks cache – when a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes. Cached file blocks are used only if the corresponding cached attributes are up to date.
• File-attribute cache – the attribute cache is updated whenever new attributes arrive from the server.
• Clients do not free delayed-write blocks until the server confirms that the data have been written to disk.
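The interplay between the file-attribute cache and the file-blocks cache can be sketched as follows. Using a modification time as the only attribute is an assumption made for this example, and `NfsServer`, `NfsClient`, `getattr`, and `read_block` are hypothetical names rather than the real protocol procedures.

```python
class NfsServer:
    def __init__(self):
        self.files = {"f": {"mtime": 1, "blocks": {0: b"aaaa"}}}

    def getattr(self, name):
        return {"mtime": self.files[name]["mtime"]}

    def read_block(self, name, n):
        return self.files[name]["blocks"][n]


class NfsClient:
    def __init__(self, server):
        self.server = server
        self.attrs = {}    # file-attribute cache: name -> attributes
        self.blocks = {}   # file-blocks cache: (name, block no) -> data

    def open(self, name):
        fresh = self.server.getattr(name)
        if self.attrs.get(name) != fresh:
            # attributes changed: cached blocks of this file can no longer be trusted
            self.blocks = {k: v for k, v in self.blocks.items() if k[0] != name}
        self.attrs[name] = fresh   # updated whenever new attributes arrive

    def read(self, name, n):
        key = (name, n)
        if key not in self.blocks:
            self.blocks[key] = self.server.read_block(name, n)
        return self.blocks[key]


server = NfsServer()
client = NfsClient(server)
client.open("f")
old = client.read("f", 0)
server.files["f"] = {"mtime": 2, "blocks": {0: b"bbbb"}}
stale = client.read("f", 0)   # no reopen yet: the cached block is still served
client.open("f")
new = client.read("f", 0)
```

The `stale` read shows the window in which a client can see old data: validation happens at open time, so a change on the server is noticed only at the next open.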

NFS and Locking

• File locks are a useful abstraction
    – Consider mail delivery
• Impossible to implement locks in a stateless way
    – The whole point of a lock is to have some state that protects the file in question
    – NFS makes an attempt
    – Cannot offer strong guarantees
    – Implementation was always ‘buggy’ – a euphemism

ANDREW Filesystem

• Andrew filesystem (AFS) is designed to be highly scalable
    – The system is designed to be able to name and access all AFS servers in the world
• Client-server model
• Simple interface
    – GET file
    – PUT file
    – Other calls for manipulating access controls, volumes, etc.
• Whole file caching is the central idea behind AFS
    – Later amended with block operations
    – Simple, effective
• AFS is stateful
    – Servers keep track of which clients have which files
    – Recall files when they have been modified


ANDREW (Cont.)

• Dedicated servers present a homogeneous, identical, and location-transparent file hierarchy to clients
• Clients are required to have local disks where they store
    – their local files
    – the result of GET operations

ANDREW Shared Name Space

• Servers arrange storage in logical volumes.
• Files and directories are named by a fid. A fid identifies a file or directory. A fid is 96 bits long and has three equal-length components:
    – volume number
    – vnode number – index into an array containing the inodes of files in a single volume.
    – uniquifier – allows reuse of vnode numbers, thereby keeping certain data structures compact.
• Fids are location transparent; therefore, file movements from server to server do not invalidate cached directory contents.
• Location information is kept on a volume basis, and the information is replicated on each server.
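A 96-bit fid with three equal-length components means three 32-bit fields. The sketch below packs and unpacks such a value; the field order follows the list above, while the big-endian byte order and the `Fid` class itself are assumptions made for illustration, not the actual AFS encoding.

```python
import struct
from typing import NamedTuple


class Fid(NamedTuple):
    volume: int
    vnode: int
    uniquifier: int

    def pack(self) -> bytes:
        # three unsigned 32-bit fields; big-endian byte order is an assumption
        return struct.pack(">III", *self)

    @classmethod
    def unpack(cls, raw: bytes) -> "Fid":
        return cls(*struct.unpack(">III", raw))


fid = Fid(volume=7, vnode=42, uniquifier=1)
raw = fid.pack()
```

Nothing in the fid names a server, which is why moving a volume between servers leaves cached fids valid; only the per-volume location information has to change.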

ANDREW File Operations

• Andrew caches entire files from servers. A client workstation interacts with servers only during opening and closing of files.
• AFS caches files from servers when they are opened, and stores modified copies of files back when they are closed.
• Reading and writing bytes of a file are done by the kernel, without AFS involvement, on the cached copy.
• AFS caches contents of directories and symbolic links, for path-name translation.

ANDREW Implementation

• Client processes are interfaced to a UNIX kernel with the usual set of system calls.
• AFS carries out path-name translation component by component.
• The UNIX file system is used as a low-level storage system for both servers and clients. The client cache is a local directory on the workstation’s disk.
• Both AFS and server processes access UNIX files directly by their inodes to avoid the expensive path name-to-inode translation routine.
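Whole-file caching with server recall can be sketched as a toy model of the open/close cycle. It assumes an in-memory cache instead of a local disk directory, and `AfsServer`, `AfsClient`, `get`, and `put` are hypothetical names that only loosely mirror the GET and PUT calls mentioned earlier.

```python
class AfsServer:
    def __init__(self):
        self.files = {"f": b"v1"}
        self.holders = {}   # file -> clients holding a cached copy (server state)

    def get(self, client, name):
        self.holders.setdefault(name, set()).add(client)
        return self.files[name]

    def put(self, client, name, data):
        self.files[name] = data
        for other in self.holders.get(name, set()) - {client}:
            other.cache.pop(name, None)   # recall the now-stale copies
        self.holders[name] = {client}


class AfsClient:
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()

    def open(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.get(self, name)   # fetch the whole file

    def read(self, name):
        return self.cache[name]           # purely local, no server involved

    def write(self, name, data):
        self.cache[name] = data           # purely local, no server involved
        self.dirty.add(name)

    def close(self, name):
        if name in self.dirty:
            self.server.put(self, name, self.cache[name])    # store back on close
            self.dirty.discard(name)


server = AfsServer()
a, b = AfsClient(server), AfsClient(server)
a.open("f")
b.open("f")
a.write("f", b"v2")
a.close("f")
b_recalled = "f" not in b.cache
b.open("f")
```

The server is contacted only in `open` and `close`; the `holders` table is the state that makes this design stateful, and it is what lets the server recall client B's copy after client A stores a new version.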
