Distributed File Systems
Introduction
File service architecture
Sun Network File System (NFS)
Andrew File System (AFS)
Recent advances
Summary
Learning objectives
✓ Understand the requirements that affect the design of distributed services
✓ NFS: understand how a relatively simple, widely-used service is designed
  o Obtain a knowledge of file systems, both local and networked
  o Caching as an essential design technique
  o Remote interfaces are not the same as APIs
  o Security requires special consideration
✓ Recent advances: appreciate the ongoing research that often leads to major advances
UNIX file system operations:

filedes = open(name, mode)            Opens an existing file with the given name.
filedes = creat(name, mode)           Creates a new file with the given name.
                                      Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes)               Closes the open file filedes.
count = read(filedes, buffer, n)      Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n)     Transfers n bytes to the file referenced by filedes from buffer.
                                      Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence)  Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name)                 Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2)           Adds a new name (name2) for a file (name1).
status = stat(name, buffer)           Gets the file attributes for file name into buffer.
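A minimal sketch exercising these calls (the file names are illustrative and error handling is abbreviated):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    char buf[64];
    struct stat st;

    /* creat() delivers a descriptor for a new file, open for writing */
    int fd = creat("example.dat", 0644);
    if (fd < 0) { perror("creat"); return 1; }
    write(fd, "hello, file service\n", 20);   /* advances the r/w pointer */
    close(fd);

    /* reopen read-only, move the read-write pointer, read the rest */
    fd = open("example.dat", O_RDONLY);
    lseek(fd, 7, SEEK_SET);                   /* whence = SEEK_SET: absolute */
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n > 0) { buf[n] = '\0'; printf("%s", buf); }
    close(fd);

    stat("example.dat", &st);                 /* attributes into st */
    link("example.dat", "alias.dat");         /* second name for the file */
    unlink("example.dat");                    /* file survives: alias remains */
    unlink("alias.dat");                      /* last name gone: file deleted */
    return 0;
}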
Design requirements for distributed file services include transparency, concurrency, replication, heterogeneity, fault tolerance, consistency, security and efficiency.

Consistency
Unix offers one-copy update semantics for operations on local files: caching is completely transparent. It is difficult to achieve the same for distributed file systems while maintaining good performance and scalability.
Security
Must maintain access control and privacy as for local files:
  o based on the identity of the user making the request
  o identities of remote users must be authenticated
  o privacy requires secure communication
Service interfaces are open to all processes not excluded by a firewall, and are therefore vulnerable to impersonation and other attacks.
Efficiency
The goal for distributed file systems is usually performance comparable to that of a local file system.
Figure: file service architecture. A client module in each client computer offers applications the operations Read, Write, Create, Delete, GetAttributes and SetAttributes, relaying them to the file service.
Server operations for the model file service:

Flat file service
Read(FileId, i, n) -> Data        (i is the position of the first byte)
Write(FileId, i, Data)            (i is the position of the first byte)
Create() -> FileId
Delete(FileId)
GetAttributes(FileId) -> Attr
SetAttributes(FileId, Attr)

Directory service
Lookup(Dir, Name) -> FileId
AddName(Dir, Name, FileId)
UnName(Dir, Name)
GetNames(Dir, Pattern) -> NameSeq

FileId: a unique identifier for files anywhere in the network.

Pathname lookup: pathnames such as '/usr/bin/tar' are resolved by iterative calls to Lookup(), one call for each component of the path, starting with the ID of the root directory '/', which is known in every client.
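A minimal sketch of that iterative resolution, with a toy in-memory Lookup() standing in for the remote directory-service call (the names and IDs are illustrative):

#include <stdio.h>
#include <string.h>

typedef long FileId;            /* stand-in for the network-wide unique ID */
#define ROOT_ID 0L              /* the ID of '/' is known in every client  */

/* Toy stand-in for the directory service's Lookup(Dir, Name) -> FileId;
 * a real client would issue one RPC per call. Returns -1 if not found. */
static FileId Lookup(FileId dir, const char *name) {
    if (dir == ROOT_ID && strcmp(name, "usr") == 0) return 1;
    if (dir == 1       && strcmp(name, "bin") == 0) return 2;
    if (dir == 2       && strcmp(name, "tar") == 0) return 3;
    return -1;
}

/* Resolve a pathname by one Lookup() per component, from the root. */
static FileId resolve(const char *pathname) {
    char buf[256];
    strncpy(buf, pathname, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    FileId current = ROOT_ID;
    for (char *part = strtok(buf, "/"); part; part = strtok(NULL, "/")) {
        current = Lookup(current, part);      /* one remote call per step */
        if (current < 0) return -1;
    }
    return current;
}

int main(void) {
    printf("'/usr/bin/tar' -> FileId %ld\n", resolve("/usr/bin/tar"));
    return 0;
}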
File group
A collection of files that can be located on any server or moved between servers while maintaining the same names.
  o Similar to a UNIX filesystem.
  o Helps with distributing the load of file serving between several servers.
  o File groups have identifiers which are unique throughout the system (and hence, for an open system, they must be globally unique).
To construct a globally unique ID we use some unique attribute of the machine on which the group is created, e.g. its IP address, even though the file group may move subsequently.
File group ID (used to refer to file groups and files):
  32 bits: IP address | 16 bits: date
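A sketch of how such an ID might be packed, assuming the 32-bit IPv4 address and a 16-bit creation-date field shown above (the epoch for the date field is an assumption):

#include <stdint.h>
#include <stdio.h>

/* File group ID: 32-bit IP address of the creating machine plus a
 * 16-bit date field, giving a globally unique identifier. */
typedef struct {
    uint32_t ip;     /* IPv4 address of the machine where the group was created */
    uint16_t date;   /* creation date, e.g. days since some epoch (assumed)     */
} FileGroupId;

/* Pack into a single 64-bit key for table lookups or a wire format. */
static uint64_t fgid_pack(FileGroupId g) {
    return ((uint64_t)g.ip << 16) | g.date;
}

int main(void) {
    FileGroupId g = { .ip = 0x820D0C01 /* 130.13.12.1, illustrative */,
                      .date = 12345 };
    printf("file group id: %012llx\n", (unsigned long long)fgid_pack(g));
    return 0;
}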
NFS architecture
Figure: NFS architecture. A client computer runs application processes over the UNIX file system and an NFS client module; remote files are reached through the NFS client, which communicates with the NFS server module (itself layered over the server's UNIX file system) using the NFS protocol (remote operations). Other clients access the same server in the same way.
Mount service
✓ Mount operation:
mount(remotehost, remotedirectory, localdirectory)
Figure: two remote mounts. Note: the file system mounted at /usr/students in the client is actually the sub-tree located at /export/people on Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users on Server 2.
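As an illustration, the two mounts from the figure expressed with a hypothetical helper that mirrors the slide's mount signature (a real client would invoke the mount service and the local mount machinery; here the helper only logs):

#include <stdio.h>

/* Hypothetical helper mirroring mount(remotehost, remotedirectory,
 * localdirectory) from the slide above. */
static int nfs_mount(const char *remotehost, const char *remotedirectory,
                     const char *localdirectory) {
    printf("mount %s:%s at %s\n", remotehost, remotedirectory,
           localdirectory);
    return 0;
}

int main(void) {
    nfs_mount("Server1", "/export/people", "/usr/students");
    nfs_mount("Server2", "/nfs/users", "/usr/staff");
    return 0;
}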
Automounter
The NFS client catches attempts to access 'empty' mount points and routes them to the Automounter.
  o The Automounter has a table of mount points and multiple candidate servers for each.
  o It sends a probe message to each candidate server and then uses the mount service to mount the filesystem at the first server to respond (see the sketch below).
✓ Keeps the mount table small.
✓ Provides a simple form of replication for read-only filesystems.
  o E.g. if there are several servers with identical copies of /usr/lib, then each server will have a chance of being mounted at some clients.
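A minimal sketch of that probe-and-mount selection, probing sequentially rather than in parallel for brevity; probe_server() and mount_fs() are assumed helpers (a real automounter pings each candidate and uses the mount protocol):

#include <stdio.h>
#include <stddef.h>

/* Assumed helpers; these toy versions always succeed. */
static int probe_server(const char *server) { (void)server; return 1; }
static int mount_fs(const char *server, const char *remotedir,
                    const char *localdir) {
    printf("mounting %s:%s at %s\n", server, remotedir, localdir);
    return 0;
}

/* Mount remotedir at localdir from the first of n candidate servers
 * to answer a probe; returns 0 on success, -1 if none respond. */
static int automount(const char *candidates[], size_t n,
                     const char *remotedir, const char *localdir) {
    for (size_t i = 0; i < n; i++)
        if (probe_server(candidates[i]))
            return mount_fs(candidates[i], remotedir, localdir);
    return -1;
}

int main(void) {
    const char *servers[] = { "serverA", "serverB" };  /* identical /usr/lib copies */
    return automount(servers, 2, "/usr/lib", "/usr/lib");
}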
Kerberized NFS
✓ The Kerberos protocol is too costly to apply on each file access request.
✓ Kerberos is used in the mount service:
  o to authenticate the user's identity
  o the user's UserID and GroupID are stored at the server with the client's IP address
✓ For each file request (sketched below):
  o the UserID and GroupID sent must match those stored at the server
  o IP addresses must also match
✓ This approach has some problems:
  o it can't accommodate multiple users sharing the same client computer
  o all remote filestores must be mounted each time a user logs in
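A minimal sketch of the server's per-request check under this scheme (the credential structure and values are assumed for illustration):

#include <stdint.h>
#include <stdio.h>

/* Credentials recorded at mount time, after Kerberos authentication. */
struct mount_cred {
    uint32_t uid, gid;    /* user's UserID and GroupID */
    uint32_t client_ip;   /* client's IP address       */
};

/* Per-request check: the UserID, GroupID and source IP carried with
 * the request must all match what was stored at mount time. */
static int request_allowed(const struct mount_cred *stored,
                           uint32_t req_uid, uint32_t req_gid,
                           uint32_t src_ip) {
    return stored->uid == req_uid &&
           stored->gid == req_gid &&
           stored->client_ip == src_ip;
}

int main(void) {
    struct mount_cred c = { 1001, 100, 0x0A000001 };  /* 10.0.0.1, illustrative */
    printf("%s\n", request_allowed(&c, 1001, 100, 0x0A000001)
                   ? "allowed" : "denied");
    return 0;
}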
NFS performance
✓ Early measurements (1987) established that:
  o write() operations are responsible for only 5% of server calls in typical UNIX environments, hence write-through at the server is acceptable
  o lookup() accounts for 50% of operations, due to the step-by-step pathname resolution necessitated by the naming and mounting semantics
✓ More recent measurements (1993) show high performance:
  o 1 x 450 MHz Pentium III: > 5,000 server ops/sec, < 4 ms average latency
  o 24 x 450 MHz IBM RS64: > 29,000 server ops/sec, < 4 ms average latency
  o see www.spec.org for more recent measurements
✓ NFS provides a good solution for many environments, including:
  o large networks of UNIX and PC clients
  o multiple web server installations sharing a single file store
NFS summary
✓ An excellent example of a simple, robust, high-performance distributed service.
✓ Achievement of transparencies:
Access: Excellent; the API is the UNIX system call interface for both local and remote files.
Location: Not guaranteed but normally achieved; naming of filesystems is controlled by client mount operations, but transparency can be ensured by an appropriate system configuration.
Concurrency: Limited but adequate for most purposes; when read-write files are shared concurrently between clients, consistency is not perfect.
Replication: Limited to read-only file systems; for writable files, the Sun Network Information Service (NIS) runs over NFS and is used to replicate essential system files.
Failure: Limited but effective; service is suspended if a server fails. Recovery from failures is aided by the simple stateless design.
Mobility: Hardly achieved; relocation of files is not possible; relocation of filesystems is possible but requires updates to client configurations.
Performance: Good; multiprocessor servers achieve very high performance, but for a single filesystem it is not possible to go beyond the throughput of a multiprocessor server.
Scaling: Good; filesystems (file groups) may be subdivided and allocated to separate servers. Ultimately, the performance limit is determined by the load on the server holding the most heavily-used filesystem (file group).
Figure: distribution of processes in the Andrew File System. Venus, the client process, runs in each workstation alongside user programs, above the UNIX kernel; Vice, the server process, runs on each server; workstations and servers communicate across the network.
Figure: the file name space seen by clients of AFS; local directories such as bin are made to appear in the shared space via symbolic links.
Figure: system call interception in AFS. UNIX file system calls from a user program are intercepted in the UNIX kernel; non-local file operations are passed to Venus, while local ones go to the UNIX file system on the local disk.
Summary
✓ Sun NFS is an excellent example of a distributed service designed to meet many important design requirements
✓ Effective client caching can produce file service performance equal to or better than that of local file systems
✓ Consistency versus update semantics versus fault tolerance remains an issue
✓ Most client and server failures can be masked
✓ Superior scalability can be achieved with whole-file serving (Andrew FS) or the distributed virtual disk approach
Future requirements:
  o support for mobile users, disconnected operation, automatic re-integration
  o support for data streaming and quality of service (cf. the Tiger file system)