Lecture 5 - DFS & NFS
Distributed File
Systems & NFS
Lecture Outlines
• Distributed file system
• File service architecture
• Network File System
Background Review
• Security
Keywords
Distributed File Systems
Intro
• A distributed file system (DFS):
• Enables programs to store and access remote files exactly as
they do local ones,
• Allowing users to access files from any computer on a
network.
• The performance and reliability experienced for
access to files stored at a server should be
comparable to that for files stored on local disks.
• Several related file systems exploit new modes of
data organization on disk or across multiple servers
to achieve:
• high-performance, fault-tolerant and scalable file systems.
Distributed File Systems
Intro
• The requirements for sharing within local networks and
intranets lead to a need for a different type of service –
• Supports persistent storage of data
• Supports programs of all types on behalf of clients
• Ensures consistent distribution of up-to-date data
• Basic distributed file systems provide an essential
foundation for organizational computing based on
intranets.
• A well-designed file service:
• provides access to files stored at a server
• Offers performance and reliability similar to or better than files
stored on local disks.
Distributed File Systems
Intro
• File systems were originally developed for:
• Centralized computer systems and desktop computers as an
operating system facility.
• Providing a convenient programming interface to disk
storage.
• They subsequently acquired features such as
• access-control and file-locking mechanisms
• that made them useful for the sharing of data and programs.
• Distributed file systems support the sharing of
• information in the form of files and
• hardware resources in the form of persistent storage
throughout an intranet.
Distributed File Systems
Intro
• A file service
• Enables programs to store and access remote files exactly
as they do local ones,
• Allowing users to access their files from any computer in an
intranet.
• Concentrating persistent storage at a few servers:
• Reduces the need for local disk storage and
• (more importantly) enables economies in the management
and archiving of the persistent data owned by an organization.
• As an example: Web servers are reliant on filing systems
for the storage of the web pages that they serve.
Distributed File Systems
Intro
• Figure 12.1 provides an overview of types of storage
system.
• In addition to those already mentioned, the table
includes distributed shared memory (DSM) systems and
persistent object stores.
Figure 12.1 Storage systems and their properties
Types of consistency:
1: strict one-copy. ✓: slightly weaker guarantees. 2: considerably weaker guarantees.
Distributed File Systems
Distributed file system requirements
• Transparency
• The design of the file service should support many of the
transparency requirements for distributed systems.
• Concurrent file updates
• Changes to a file by one client should not interfere with the
operation of other clients simultaneously accessing or changing the
same file.
• File replication
• In a file service that supports replication, a file may be represented
by several copies of its contents at different locations.
• Security
• Virtually all file systems provide access-control mechanisms based
on the use of access-control lists.
Distributed File Systems
Distributed file system requirements
• Hardware and operating system heterogeneity
• The service interfaces should be defined so that client and server software
can be implemented for different operating systems and computers.
• Fault tolerance
• The central role of the file service in distributed systems makes it essential
that the service continue to operate in the face of client and server failures.
• Consistency
• This refers to a model for concurrent access to files in which the file
contents seen by all of the processes accessing or updating a given file are
those that they would see if only a single copy of the file contents existed.
• Efficiency
• A distributed file service should offer facilities that are of at least the same
power and generality as those found in conventional file systems and
should achieve a comparable level of performance.
Distributed File Systems
File service architecture
• An architecture that offers a clear separation of the main
concerns in providing access to files is obtained by structuring
the file service as three components:
1. Flat file service,
2. Directory service and
3. Client module.
• The flat file service and the directory service each export
an interface for use by client programs.
• The client module provides a single programming interface
with operations on files similar to those found in
conventional file systems.
• The relevant modules and their relationships are shown in
Figure 12.5.
Figure 12.5 File service architecture
Instructor’s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5
© Pearson Education 2012
File service architecture
Flat file service
• The flat file service:
• Is concerned with implementing operations on the contents of files.
• Uses Unique File Identifiers (UFIDs) for all requests for flat file
service operations
• Unique file identifiers (UFIDs).
• UFIDs are long sequences of bits chosen so that each file
has a UFID that is unique among all of the files in a
distributed system.
• When the flat file service receives a request to
create a file, it generates a new UFID for it and
returns the UFID to the requester.
• The division of responsibilities between the file
service and the directory service is based upon the
use of UFIDs.
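The create, read and write behaviour described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the service's actual implementation: the class name `FlatFileService`, its method names, and the use of a 128-bit random value as a UFID are all assumptions made for the example.

```python
import secrets


class FlatFileService:
    """Toy in-memory flat file service keyed by UFID (illustrative only)."""

    def __init__(self):
        self._files = {}  # UFID -> bytearray holding the file contents

    def create(self):
        # Assumption: a 128-bit random value stands in for a UFID.
        ufid = secrets.token_hex(16)
        while ufid in self._files:  # regenerate on the (unlikely) collision
            ufid = secrets.token_hex(16)
        self._files[ufid] = bytearray()
        return ufid

    def write(self, ufid, offset, data):
        contents = self._files[ufid]
        # Pad with zero bytes if the write starts beyond the current end.
        if offset > len(contents):
            contents.extend(b"\x00" * (offset - len(contents)))
        contents[offset:offset + len(data)] = data

    def read(self, ufid, offset, n):
        return bytes(self._files[ufid][offset:offset + n])
```

Note that every operation names the file by its UFID only; text names never reach this service.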
File service architecture
Directory service
• The directory service
• Provides a mapping between text names for files and
their UFIDs.
• Allows clients to obtain the UFID of a file by
quoting its text name.
• The directory service provides the functions :
• Generate directories,
• Add new file names to directories and
• Obtain UFIDs from directories.
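The three directory functions listed above might look like this in a toy form. The class and method names (`DirectoryService`, `create_directory`, `add_name`, `lookup`) and the counter-based directory identifiers are assumptions for illustration only.

```python
class DirectoryService:
    """Toy directory service: maps text names to UFIDs (illustrative only)."""

    def __init__(self):
        self._dirs = {}      # directory UFID -> {text name: UFID}
        self._next_id = 0

    def create_directory(self):
        # Assumption: a simple counter stands in for real UFID generation.
        self._next_id += 1
        dir_ufid = f"dir-{self._next_id}"
        self._dirs[dir_ufid] = {}
        return dir_ufid

    def add_name(self, dir_ufid, name, ufid):
        entries = self._dirs[dir_ufid]
        if name in entries:
            raise FileExistsError(name)
        entries[name] = ufid

    def lookup(self, dir_ufid, name):
        try:
            return self._dirs[dir_ufid][name]
        except KeyError:
            raise FileNotFoundError(name) from None
```

A client would call `lookup` to turn a text name into a UFID and then use that UFID in requests to the flat file service.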
File service architecture
Client module
• A client module
• Runs in each client computer,
• Integrates and extends the operations of the flat
file service and the directory service under a single
API.
• Makes that API available to user-level
programs in client computers.
• Holds information about the network locations of
the flat file server and directory server processes.
• Implements a cache of recently used file
blocks at the client to achieve satisfactory
performance.
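A cache of recently used file blocks could be sketched as below. The slides do not say which replacement policy is used, so least-recently-used eviction is an assumption here, as are the names `BlockCache` and `fetch_block`. The sketch also ignores the question of keeping cached blocks consistent with the server.

```python
from collections import OrderedDict


class BlockCache:
    """Least-recently-used cache of file blocks held by a client module."""

    def __init__(self, fetch_block, capacity=64):
        self._fetch = fetch_block      # callable(ufid, block_no) -> bytes
        self._capacity = capacity
        self._blocks = OrderedDict()   # (ufid, block_no) -> bytes

    def read_block(self, ufid, block_no):
        key = (ufid, block_no)
        if key in self._blocks:
            self._blocks.move_to_end(key)      # mark as most recently used
            return self._blocks[key]
        data = self._fetch(ufid, block_no)     # cache miss: ask the server
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)   # evict least recently used
        return data
```

Repeated reads of the same block are then served locally, which is where the performance gain comes from.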
File service architecture
Access control
• In the UNIX file system, the user’s access rights are checked against the
access mode (read or write) requested in the open call,
• and the file is opened only if the user has the necessary rights.
• The user identity (UID) used in the access rights
check:
• Is retrieved during the user’s earlier authenticated login.
• Cannot be tampered with in non-distributed
implementations.
• The resulting access rights are
• Retained until the file is closed, and
• No further checks are required when subsequent operations
on the same file are requested.
File service architecture
Access control
Two alternative approaches can be adopted:
• Approach 1: Capability-Based Access Control
• An access check is made whenever a file name is converted
to a UFID,
• The results are encoded in the form of a capability,
• The capability is returned to the client for submission with
subsequent requests.
• Approach 2: Identity-Based Access Control
• A user identity is submitted with every client request,
• Access checks are performed by the server for every file
operation.
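The difference between the two approaches can be sketched as follows. This is only an illustration: the slides do not say how a capability is encoded, so the HMAC-signed tuple, the `SERVER_SECRET` key, the `ACL` table and all function names are assumptions.

```python
import hashlib
import hmac

SERVER_SECRET = b"demo-secret"   # hypothetical key known only to the server
ACL = {"ufid-1": {"alice": {"read", "write"}, "bob": {"read"}}}  # made-up data


def _tag(ufid, mode):
    message = f"{ufid}:{mode}".encode()
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()


def issue_capability(user, ufid, mode):
    """Approach 1: check once at name-to-UFID time, then return a signed token."""
    if mode not in ACL.get(ufid, {}).get(user, set()):
        raise PermissionError(f"{user} may not {mode} {ufid}")
    return (ufid, mode, _tag(ufid, mode))


def check_capability(capability, mode):
    """Later requests present the capability; the server only verifies the tag."""
    ufid, granted, tag = capability
    return granted == mode and hmac.compare_digest(tag, _tag(ufid, granted))


def check_identity(user, ufid, mode):
    """Approach 2: the user identity accompanies every request and is rechecked."""
    return mode in ACL.get(ufid, {}).get(user, set())
```

With capabilities the access list is consulted once per name translation; with identity-based checks it is consulted on every operation, which is the model NFS follows.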
File service architecture
Hierarchic file system
• A hierarchic file system
• consists of a number of directories arranged in a
tree structure.
• Each directory holds the names of the files and
other directories that are accessible from it.
• Any file or directory can be referenced using a
pathname.
• a multi-part name that represents a path through
the tree
• The root has a distinguished name, and each file or
directory has a name in a directory.
File service architecture
Hierarchic file system
• A file-naming system
• can be implemented by the client module using the flat file
and directory services.
• A tree-structured network of directories
• is constructed with files at the leaves and directories at the
other nodes of the tree.
• The root of the tree is a directory with a ‘well-known’
UFID.
• Multiple names for files can be supported using the
AddName operation and the reference count field in the
attribute record.
File service architecture
Hierarchic file system
• The file attributes associated with files
• should include a type field that distinguishes
between ordinary files and directories.
• This is used when following a path to ensure
that each part of the name, except the last,
refers to a directory.
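The role of the type field when following a path can be sketched like this. The tables `ATTRIBUTES` and `ENTRIES`, the identifier strings and the function name `resolve` are hypothetical, chosen only to make the example self-contained.

```python
ROOT_UFID = "root"   # stands in for the root directory's 'well-known' UFID

# Hypothetical in-memory tables: type field per UFID, and directory entries.
ATTRIBUTES = {"root": "directory", "u1": "directory", "u2": "file"}
ENTRIES = {"root": {"home": "u1"}, "u1": {"notes.txt": "u2"}}


def resolve(pathname):
    """Walk an absolute pathname one part at a time, returning the final UFID."""
    ufid = ROOT_UFID
    for part in [p for p in pathname.split("/") if p]:
        # Anything we descend through must itself be a directory.
        if ATTRIBUTES[ufid] != "directory":
            raise NotADirectoryError(pathname)
        try:
            ufid = ENTRIES[ufid][part]
        except KeyError:
            raise FileNotFoundError(pathname) from None
    return ufid
```

Only the last part of the name is allowed to refer to an ordinary file; the check inside the loop rejects a path that tries to continue past one.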
File service architecture
File groups
• A file group is a collection of files located on a given
server.
• A server may hold several file groups, and groups can
be moved between servers,
• but a file cannot change the group to which it belongs.
• A similar construct called a filesystem is used in most
operating systems.
• Terminology note:
• the single word filesystem refers to the set of files held in a
storage device or partition,
• whereas the words file system refer to a software
component that provides access to files.
File service architecture
File groups
• File group identifiers
• must be unique throughout a distributed system.
• Since file groups can be moved, the only way to ensure that file
group identifiers will always be distinct in a given system is to
generate them with an algorithm that ensures global uniqueness.
• For example, whenever a new file group is created,
• a unique identifier can be generated by combining
the 32-bit IP address of the host creating the group
with a 16-bit integer derived from the date, resulting
in a unique 48-bit integer.
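The bit layout of such an identifier can be illustrated as follows. The slides do not specify how the 16-bit value is derived from the date, so using the number of days since 1970-01-01, truncated to 16 bits, is purely an assumption, as is the function name `file_group_id`.

```python
import ipaddress
from datetime import date


def file_group_id(host_ip, created):
    """Combine a 32-bit IPv4 address with a 16-bit date-derived value (48 bits)."""
    ip_bits = int(ipaddress.IPv4Address(host_ip))        # upper 32 bits
    # Assumption: days since 1970-01-01, truncated to 16 bits, stands in for
    # "a 16-bit integer derived from the date".
    date_bits = (created - date(1970, 1, 1)).days & 0xFFFF
    return (ip_bits << 16) | date_bits
```

Under this particular derivation, two groups created on the same host on the same day would receive the same identifier, so a real scheme would need a finer-grained or otherwise disambiguated date component.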
Network File System
Intro
• NFS Architecture Overview:
• Follows an abstract model defined earlier.
• Supports the NFS protocol.
• NFS Protocol:
• A set of Remote Procedure Calls (RPCs).
• Allows clients to perform operations on a remote file
store.
• Operating system-independent.
• Originally developed for UNIX systems.
Figure 12.8 NFS architecture
[Figure labels: application programs, UNIX system calls, UNIX kernel, virtual file system, local file system, NFS client and NFS server connected by the NFS protocol.]
Network File System
NFS server
• The NFS server module
• resides in the kernel on each computer that acts as
an NFS server.
• Client Module:
• Translates requests referring to files in a remote file system
into NFS protocol operations.
• Passes them to the NFS server module at the computer
holding the relevant file system.
Network File System
NFS server
• The NFS client and server modules communicate using
RPC.
• RPC system
• It was developed for use in NFS.
• It can be configured to use either UDP or TCP, and the NFS
protocol is compatible with both.
• A port mapper service
• It is included to enable clients to bind to services in a given
host by name.
Network File System
Virtual file system
• NFS provides access transparency:
• Allows user programs to issue file operations for
local or remote files without distinction.
• Integration with Other File Systems:
• Other distributed file systems supporting UNIX
system calls can be integrated similarly.
• The integration is achieved by a virtual file system
(VFS) module
Network File System
Virtual file system
• Virtual File System (VFS) Module
• Added to the UNIX kernel to:
• Distinguish between local and remote files and
• Translate between the UNIX-independent file
identifiers used by NFS and the internal file
identifiers normally used in UNIX and other file
systems.
• VFS keeps track of the filesystems that are
currently available both locally and remotely,
• and it passes each request to the appropriate local
system module (the UNIX file system, the NFS client
module or the service module for another file
system).
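The routing role of the VFS layer can be sketched as a table of mount points with a handler for each. This is a simplification made for illustration: the class name `VirtualFileSystem`, the handler callables and the longest-prefix rule are assumptions, not a description of the kernel data structures.

```python
class VirtualFileSystem:
    """Routes each pathname to the module that serves the filesystem holding it."""

    def __init__(self):
        self._mounts = {}   # mount point -> handler callable(relative_path)

    def mount(self, mount_point, handler):
        self._mounts[mount_point.rstrip("/") or "/"] = handler

    def open(self, path):
        # Choose the longest mount point that is a prefix of the path.
        best = None
        for point in self._mounts:
            prefix = point if point.endswith("/") else point + "/"
            if path == point or path.startswith(prefix):
                if best is None or len(point) > len(best):
                    best = point
        if best is None:
            raise FileNotFoundError(path)
        relative = path[len(best):].lstrip("/")
        return self._mounts[best](relative)
```

For example, with the local file system mounted at `/` and an NFS client handler mounted at `/usr/students`, a request for `/usr/students/jon` goes to the NFS handler while `/etc/hosts` stays local, and the calling program sees no difference.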
Network File System
File handles
• File handles:
• The file identifiers used in NFS are called file
handles.
• A file handle is opaque to clients and contains whatever
information the server needs to distinguish an
individual file.
• The filesystem identifier field is a unique
number that is allocated to each filesystem when
it is created.
Network File System
File handles
• The file handle is derived from:
• The file’s i-node number
• The i-node number of a file identifies and locates the file
within the file system in which the file is stored.
• Two extra fields added to the i-node number: the filesystem
identifier and the i-node generation number.
• The i-node generation number is needed because in the
conventional file system i-node numbers are reused after
a file is removed.
• In the VFS extensions to the file system, a generation
number is stored with each file and is incremented each
time the i-node number is reused.
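Why the generation number matters can be shown with a small model. The names `FileHandle` and `ToyFilesystem` and the in-memory tables are assumptions for illustration; a real handle is an opaque byte string whose layout is private to the server.

```python
from collections import namedtuple

FileHandle = namedtuple("FileHandle", "fsid inode generation")


class ToyFilesystem:
    """Illustrative i-node table that tracks a generation number per i-node."""

    def __init__(self, fsid):
        self.fsid = fsid
        self._generation = {}   # i-node number -> current generation
        self._in_use = set()

    def allocate(self, inode):
        # Each (re)use of an i-node number bumps its generation.
        self._generation[inode] = self._generation.get(inode, 0) + 1
        self._in_use.add(inode)
        return FileHandle(self.fsid, inode, self._generation[inode])

    def remove(self, inode):
        self._in_use.discard(inode)

    def is_valid(self, handle):
        return (handle.fsid == self.fsid
                and handle.inode in self._in_use
                and self._generation.get(handle.inode) == handle.generation)
```

A client still holding a handle to a removed file cannot accidentally reach the new file that later reuses the same i-node number, because the generation numbers differ.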
Network File System
Client integration
• The NFS client module:
• Plays the role described for the client module in our
architectural model.
Network File System
Access control and authentication
• The NFS server is stateless:
• It does not keep files open on behalf of its clients.
• The server must check the user’s identity against
the file’s access permission attributes afresh on each
request.
• The Sun RPC protocol requires clients to send user
authentication information with each request.
Network File System
NFS server interface
• A simplified representation of the RPC interface
provided by NFS version 3 servers.
• The NFS file access operations
• read, write, getattr and setattr are almost identical to the
Read, Write, GetAttributes and SetAttributes operations
defined for our flat file service model.
Network File System
NFS server interface
• File Creation and Insertion
• The creation of files and the insertion of file names in directories
are performed by a single create operation.
• It takes the text name of the new file and the file
handle for the target directory as arguments.
• The other NFS operations on directories are
• create, remove, rename, link, symlink,
readlink, mkdir, rmdir, readdir and
statfs.
• They resemble their UNIX counterparts with the exception
of readdir, which provides a representation-independent
method for reading the contents of directories.
Network File System
Mount service
• The mounting of subtrees of remote filesystems
• Is supported by a separate mount service process that runs
at user level on each NFS server computer.
• On each server, there is a file with a well-known name
(/etc/exports)
• containing the names of local filesystems that are available
for remote mounting.
• An access list
• Is associated with each filesystem name
• Indicating which hosts are permitted to mount the filesystem.
Network File System
Mount service
• Mounting Remote Filesystems:
• Clients use a modified version of the UNIX mount command.
• Specify the remote host’s name, the pathname of a directory
in the remote filesystem, and the local name for mounting.
• Remote Directory:
• Can be any subtree of the required remote filesystem.
• Enables clients to mount any part of the remote filesystem.
• RPC Protocol:
• Includes an operation that takes a directory pathname.
• Returns the file handle of the specified directory if the client
has access permission for the relevant filesystem.
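The mount operation described above, which checks the access list and then returns a file handle, can be sketched as follows. The `EXPORTS` dictionary is a stand-in for the server's export table and does not reproduce the real /etc/exports syntax; the host names, the handle format and the function name `mount_request` are hypothetical.

```python
# Hypothetical export table: exported directory -> hosts allowed to mount it.
EXPORTS = {"/export/people": {"client-a", "client-b"}}


def mount_request(client_host, directory):
    """Return an (illustrative) file handle if the client may mount the directory."""
    for exported, allowed_hosts in EXPORTS.items():
        inside = directory == exported or directory.startswith(exported + "/")
        if inside:
            if client_host not in allowed_hosts:
                raise PermissionError(f"{client_host} may not mount {exported}")
            # Assumption: a string stands in for the opaque file handle.
            return f"fh:{directory}"
    raise FileNotFoundError(directory)
```

Because any subtree of an exported filesystem is acceptable, the check is a prefix match against the exported names rather than an exact match.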
Network File System
Mount service
• Figure 12.10 illustrates a Client with two remotely
mounted file stores.
• The nodes people and users in filesystems at Server 1 and
Server 2 are mounted over nodes students and staff in the
Client’s local file store.
• The meaning of this is that programs running at Client
can access files at Server 1 and Server 2 by using
pathnames such as /usr/students/jon and
/usr/staff/ann.
Figure 12.10
Local and remote file systems accessible on an NFS client
Note:
The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1;
the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2.
Network File System
Mount service
• Remote filesystems may be hard-mounted or soft-mounted in a
client computer.
• Hard-mounted
• The user-level process is suspended until the request can be completed.
• If the remote host is unavailable for any reason, the NFS client module
continues to retry the request until it is satisfied.
• In the case of a server failure, user-level processes are suspended
until the server restarts and then they continue just as though there
had been no failure.
• Soft-mounted
• The NFS client module returns a failure indication to user-level processes
after a small number of retries.
• Properly constructed programs will detect the failure and take
appropriate recovery or reporting actions.
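The two retry policies can be contrasted in a few lines. This is a sketch of the behaviour only: the function name `call_with_mount_semantics`, the use of `ConnectionError` to model an unreachable server, and the retry count of 3 are assumptions.

```python
import time


def call_with_mount_semantics(request, hard, retries=3, delay=0.0):
    """Retry `request` forever (hard mount) or a few times (soft mount)."""
    attempts = 0
    while True:
        try:
            return request()
        except ConnectionError:
            attempts += 1
            if not hard and attempts >= retries:
                # Soft mount: give up and report the failure to the caller.
                raise TimeoutError("server not responding") from None
            time.sleep(delay)   # a real client would back off between retries
```

With `hard=True` the caller simply blocks until the server answers, which matches the description of processes continuing as though there had been no failure; with `hard=False` the caller must be prepared to handle the error.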
Network File System
Pathname translation
• In NFS, pathnames cannot be translated at a server,
because the name may cross a ‘mount point’ at the
client.
• So pathnames are parsed and translated iteratively by
the client.
• Each part of a name that refers to a remote-mounted
directory is translated to a file handle using a separate
lookup request to the remote server.
• The lookup operation looks for a single part of a pathname in
a given directory and returns the corresponding file handle
and file attributes.
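Iterative translation by the client can be sketched as one lookup call per part of the name. The table `SERVER_DIRS`, the handle strings and the function names are hypothetical, and for brevity the stand-in `lookup` returns only a file handle rather than a handle plus attributes.

```python
# Hypothetical server-side directory contents: directory handle -> {name: handle}.
SERVER_DIRS = {"fh-root": {"projects": "fh-1"}, "fh-1": {"report.txt": "fh-2"}}
lookup_calls = []   # records each simulated request, to show the iteration


def lookup(dir_handle, name):
    """Stand-in for the NFS lookup RPC: resolves one name in one directory."""
    lookup_calls.append((dir_handle, name))
    return SERVER_DIRS[dir_handle][name]


def translate(mount_handle, remote_path):
    """Client-side translation: one lookup request per pathname part."""
    handle = mount_handle
    for part in remote_path.strip("/").split("/"):
        if part:
            handle = lookup(handle, part)
    return handle
```

Translating `projects/report.txt` therefore costs two requests, which is why real clients cache the results of recent lookups.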
Network File System
Automounter
• The automounter
• Maintains a table of mount points (pathnames).
• Lists a reference to one or more NFS servers
against each mount point.
• Behaves like a local NFS server at the client
machine.
Next lecture
• Name services and the Domain Name System
Assignment
Deadline
Next lecture