Distributed File Systems & Name Services: UNIT-4

Copyright
© Attribution Non-Commercial (BY-NC)

DISTRIBUTED FILE SYSTEMS & NAME SERVICES
UNIT-4
• A distributed file system enables programs to store and access remote files exactly as they do local ones, allowing users to access files from any computer on a network.
• This unit presents a simple architecture for file systems and describes the architecture and implementation of basic distributed file systems.
Two basic distributed file service implementations have been in widespread use:
Sun Network File System (NFS)
Andrew File System (AFS)
Each emulates the UNIX file system interface, with differing degrees of scalability, fault tolerance and deviation from strict UNIX one-copy file update semantics (updates are written to the single copy and are available immediately).
• Sharing of stored information is the main aspect of distributed resource sharing.
• Web servers provide a restricted form of data sharing in which files stored locally at the server are made available to clients throughout the Internet.
• Data accessed through web servers is still managed and updated in the file system at the server.
• The requirements for sharing within local networks and intranets lead to a need for a different type of service: one that supports persistent storage of data and programs of all types on behalf of clients, and the consistent distribution of up-to-date data.
• A DFS supports the sharing of information in the form of files, and of hardware resources in the form of persistent storage, throughout an intranet.
• A file service enables programs to store and access remote files exactly as they do local ones, allowing users to access files from any computer in the intranet.
• Concentrating persistent storage at a few servers reduces the need for local disk storage and makes it more economical to manage and archive persistent data.
• In an organization that operates web servers for external and internal access via an intranet, the web servers often store and access their material from a local distributed file system.
• The figure below gives an overview of types of storage system.
Types of consistency:
1: strict one-copy. ✓: slightly weaker guarantees. 2: considerably weaker guarantees.
• One-copy consistency: programs cannot observe any discrepancies between cached copies and stored data after an update.
• When distributed replicas are used, strict consistency is more difficult to achieve.
• AFS and Sun NFS maintain an approximation to strict consistency.
• The consistency between the copies stored at web proxies and client caches and the original server is maintained only by explicit user actions.
• Clients are not notified when a page stored at the original server is updated. They must perform explicit checks to keep their local copies up to date.
• The CORBA and Persistent Java schemes maintain single copies of persistent objects, and remote invocation is needed to access them. The only consistency issue is therefore between the persistent copy of an object on disk and the active copy in memory.
Characteristics of File Systems
• File systems are responsible for the
Organization
Storage
Retrieval
Naming
Sharing
Protection of files
They provide the file abstraction, which frees programmers from concern with the details of storage allocation and layout.
• A file contains both data and attributes.
• The data consists of a sequence of data items, accessible by operations to read and write any portion of the sequence.
• The attributes are held as a single record containing information such as:
Length of file
Timestamps
File type
Owner's identity
Access control list
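Some attributes are managed by the file system and are not normally updatable by user programs. In Figure 8.3 these are the shaded entries, marked (managed) here.
Figure 8.3
File attribute record structure
File length (managed)
Creation timestamp (managed)
Read timestamp (managed)
Write timestamp (managed)
Attribute timestamp (managed)
Reference count (managed)
Owner
File type
Access control list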

Instructor’s Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4
• File systems are designed to store and manage large numbers of files, with facilities for creating, naming and deleting files.
• Naming of files is supported by the use of directories.
• A directory is a file, often of a special type, that provides a mapping from text names to internal file identifiers.
• The file system is also responsible for
-> controlling access to files
-> restricting access to files according to users' authorizations
-> restricting access according to the type of access requested.
• The term metadata refers to all the extra information stored by a file system that is needed for the management of files. It includes file attributes, directories and all other persistent information used by the file system.
• A file system typically has a layered module structure. It consists of the
Directory Module
File Module
Access Control Module
File Access Module
Block Module
Device Module
Figure 8.2
File system modules
Directory module: relates file names to file IDs

File module: relates file IDs to particular files

Access control module: checks permission for operation requested

File access module: reads or writes file data or attributes

Block module: accesses and allocates disk blocks

Device module: disk I/O and buffering

• This is the layered module structure for the implementation of a non-distributed file system.
• Each layer depends only on the layers below it.
• A DFS requires all of these modules, along with additional components that deal with client-server communication and with the distributed naming and location of files.
Figure 8.4
UNIX file system operations
These are system calls implemented by the kernel.
filedes = open(name, mode)
    Opens an existing file with the given name.
filedes = creat(name, mode)
    Creates a new file with the given name.
    Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes)
    Closes the open file filedes.
count = read(filedes, buffer, n)
    Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n)
    Transfers n bytes to the file referenced by filedes from buffer.
    Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence)
    Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name)
    Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2)
    Adds a new name (name2) for a file (name1).
status = stat(name, buffer)
    Gets the file attributes for file name into buffer.
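As a rough illustration of the operations in Figure 8.4, the sketch below uses Python's os module, whose functions are thin wrappers around the same system calls. It assumes a POSIX-like system with hard-link support; the directory and file names are made up for the example.

```python
import os
import tempfile

# Work in a throwaway directory; the file names are illustrative only.
workdir = tempfile.mkdtemp()
name = os.path.join(workdir, "demo.txt")
alias = os.path.join(workdir, "alias.txt")

# open with O_CREAT plays the role of creat; it delivers a file descriptor.
fd = os.open(name, os.O_CREAT | os.O_RDWR, 0o600)
written = os.write(fd, b"hello world")   # advances the read-write pointer
os.lseek(fd, 0, os.SEEK_SET)             # move the pointer back to the start
first = os.read(fd, 5)                   # reads bytes 0..4; pointer is now 5
second = os.read(fd, 6)                  # same kind of call, different bytes
os.close(fd)

os.link(name, alias)                     # add a second name for the file
links_before = os.stat(name).st_nlink    # stat reports the number of names
os.unlink(name)                          # remove one name; the file survives
size_after = os.stat(alias).st_size      # still reachable via the other name
os.unlink(alias)                         # last name removed: file is deleted
os.rmdir(workdir)
```

The two read calls return different data because the kernel keeps the read-write pointer as hidden per-descriptor state.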
Instructor’s Guide for Coulouris, Dollimore and Kindberg
Distributed Systems: Concepts and Design Edn. 4
© Pearson Education 2005
Distributed File System Requirements
• Transparency
• Concurrency
• Replication
• Heterogeneity
• Fault tolerance
• Consistency
• Security
• Efficiency
• Transparency: The file service is usually the most heavily loaded service in an intranet, so its functionality and performance are critical. The forms of transparency it should support are:
Access Transparency
Location Transparency
Mobility Transparency
Performance Transparency
Scaling Transparency
1. Access Transparency: Client programs should be unaware of the distribution of files. A single set of operations is provided for access to local and remote files. Programs written to operate on local files are able to access remote files without modification.
2. Location Transparency: Files or groups of files may be relocated without changing their pathnames, and user programs see the same name space wherever they are executed.
3. Mobility Transparency: Neither client programs nor system administration tables in client nodes need to be changed when files are moved. This allows file mobility.
4. Performance Transparency: Client programs should continue to perform satisfactorily while the load on the service varies within a specified range.
5. Scaling Transparency: The service can be expanded by incremental growth to deal with a wide range of loads and network sizes.
• Concurrent File Updates: Changes to a file by one client should not interfere with the operation of other clients simultaneously accessing or changing the same file.
• This is the well-known issue of concurrency control.
• Concurrency control is needed whenever several clients access shared data.
• File Replication: A file may be represented by several copies of its contents at different locations.
• This has two benefits:
It enables multiple servers to share the load of providing a service to clients accessing the same set of files, enhancing the scalability of the service.
It enhances fault tolerance by enabling clients to locate another server that holds a copy of the file when one has failed.
Hardware and OS Heterogeneity:
• The service interfaces are defined so that client and server software can be implemented for different operating systems and computers. This is an important aspect of openness.
Fault Tolerance:
1. The central role of the file service in a distributed system makes it essential that the service continues to operate in the face of client and server failures.
2. To cope with communication failures, the design can use at-least-once invocation semantics with a server protocol built from idempotent operations, so that a repeated request has the same effect as a single one.
Consistency:
• Conventional file systems offer one-copy update semantics: when files are accessed concurrently, the file contents seen by all of the processes accessing or updating a given file are those that they would see if only a single copy of the file contents existed.
• When files are replicated or cached at different sites, there is an inevitable delay in propagating modifications made at one site to all of the other sites that hold copies, and this may result in some deviation from one-copy semantics.
Security:
1. In a distributed file system there is a need to authenticate client requests, so that access control at the server is based on correct user identities, and to protect the contents of request and reply messages with digital signatures and (optionally) encryption of secret data.
Efficiency:
• A distributed file service should offer facilities of at least the same power and generality as those found in conventional file systems, and should achieve a comparable level of performance.
File Service Architecture:
• This part covers the following subtopics:
Flat file service
Directory service
Client module
Flat file service interface
Access control
Directory service interface
Hierarchic file system
File grouping
The file service is structured as three components:
• A flat file service
• A directory service
• A client module
• The flat file service and the directory service each export an interface for use by client programs. Their RPC interfaces, taken together, provide a comprehensive set of operations for access to files.
• The client module provides a single programming interface with operations on files similar to those found in conventional file systems.
Flat File Service:
• The flat file service is concerned with implementing operations on the contents of files.
• Unique file identifiers (UFIDs) are used to refer to files in all requests for flat file service operations.
• The division of responsibilities between the file service and the directory service is based upon the use of UFIDs.
• UFIDs are unique among all the files in a distributed system.
• When the flat file service receives a request to create a file, it generates a new UFID for it and returns the UFID to the requester.
Directory Service:
• The directory service provides a mapping between text names for files and their UFIDs.
• Clients may obtain the UFID of a file by quoting its text name to the directory service.
• The directory service provides the functions needed to generate directories, to add new file names to directories and to obtain UFIDs from directories.
• The directory service is a client of the flat file service; its directory files are stored in files of the flat file service.
Client Module:
• A client module runs in each client computer, integrating and extending the operations of the flat file service and the directory service.
• The client module also holds information about the network locations of the flat file server and directory server processes.
• The client module plays an important role in achieving satisfactory performance through the implementation of a cache of recently used file blocks at the client.
Flat File Service Interface:
• Read(FileId, i, n) -> Data: if 1 ≤ i ≤ Length(File), reads a sequence of up to n items from a file starting at item i and returns it in Data.
• Write(FileId, i, Data): if 1 ≤ i ≤ Length(File) + 1, writes a sequence of Data to a file starting at item i, extending the file if necessary.
• Create() -> FileId: creates a new file of length 0 and delivers a UFID for it.
• Delete(FileId): removes the file from the file store.
• A FileId is invalid if the file it refers to is not present in the server processing the request, or if its access permissions are inappropriate for the operation requested.
• The parameter i of Read and Write specifies a position in the file.
• GetAttributes and SetAttributes enable clients to access the attribute record.
• GetAttributes is normally available to any client that is allowed to read the file, and returns the file attributes.
• SetAttributes is normally restricted to the directory service that provides access to the file.
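The interface above can be sketched as a small in-memory class. This is only an illustration under stated assumptions: items are bytes, UFIDs are plain sequential integers (real UFIDs are not), an unknown FileId surfaces as a KeyError, and the class and exception names are hypothetical.

```python
import itertools

class BadPosition(Exception):
    """Raised when the position i is outside the permitted range."""

class FlatFileService:
    """In-memory stand-in for a flat file server; items are bytes."""

    def __init__(self):
        self._files = {}                 # UFID -> bytearray
        self._next = itertools.count(1)  # hypothetical UFID generator

    def create(self):
        ufid = next(self._next)
        self._files[ufid] = bytearray()  # new file of length 0
        return ufid

    def read(self, file_id, i, n):
        data = self._files[file_id]      # KeyError models an invalid FileId
        if not 1 <= i <= len(data):
            raise BadPosition(i)
        return bytes(data[i - 1:i - 1 + n])   # up to n items from item i

    def write(self, file_id, i, new_data):
        data = self._files[file_id]
        if not 1 <= i <= len(data) + 1:
            raise BadPosition(i)
        # Slice assignment overwrites existing items and extends if needed.
        data[i - 1:i - 1 + len(new_data)] = new_data

    def delete(self, file_id):
        del self._files[file_id]

ffs = FlatFileService()
fid = ffs.create()
ffs.write(fid, 1, b"abcdef")
ffs.write(fid, 4, b"XYZ123")     # overwrites items 4..6 and extends the file
```

Note that there is no open or close: every request names the file and the position explicitly.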
• The flat file service interface differs from the UNIX file system interface mainly for reasons of fault tolerance.
• Repeatable operations: With the exception of Create, the operations are idempotent, allowing the use of at-least-once RPC semantics: clients may repeat calls to which they receive no reply.
• Stateless servers: The service can be restarted after a failure and resume operation without any need for clients or the server to restore any state.
• (UNIX file operations are neither idempotent nor compatible with a stateless implementation. A read-write pointer is generated by the UNIX file system whenever a file is opened, and it is retained until the file is closed.
• The UNIX read and write operations are not idempotent: if an operation is accidentally repeated, the automatic advance of the read-write pointer results in access to a different portion of the file in the repeated operation.)
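The difference between the two styles of read can be seen in a short sketch. The names positional_read and CONTENT are invented for the example, and io.BytesIO merely stands in for an open UNIX file with its hidden read-write pointer.

```python
import io

CONTENT = b"0123456789"

def positional_read(i, n):
    """Flat-file-style read: the position travels with every request."""
    return CONTENT[i - 1:i - 1 + n]

# UNIX-style read: the position is hidden state advanced by each call.
stream = io.BytesIO(CONTENT)

# Simulate a client that repeats a request because the reply was lost.
flat_first = positional_read(1, 4)
flat_retry = positional_read(1, 4)     # same bytes again: safe to repeat

unix_first = stream.read(4)
unix_retry = stream.read(4)            # pointer has moved: different bytes
```

Because the positional form carries all the state it needs, a server that restarts between the two calls can still answer the retry correctly.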
Access Control:
• Access rights checks have to be performed at the server, because the server RPC interface is otherwise an unprotected point of access to files.
• There are two alternative approaches:
• An access check is made whenever a file name is converted to a UFID, and the result is returned to the client as a capability for submission with subsequent requests.
• A user identity is submitted with every client request, and access checks are performed by the server for every file operation.
Directory Service Interface:
• The primary purpose of the directory service is to translate text names to UFIDs.
• It maintains directory files containing the mappings between text names for files and UFIDs.
• Lookup performs a single Name -> UFID translation. Lookup(Dir, Name)
• AddName adds an entry to a directory and increments the reference count field in the file's attribute record. AddName(Dir, Name, File)
• UnName removes an entry from a directory and decrements the reference count. UnName(Dir, Name)
• GetNames is provided to enable clients to examine the contents of directories. GetNames(Dir, Pattern)
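A toy version of these four operations is sketched below. Several things are assumptions made for the example: directories are plain dictionaries rather than files in the flat file service, reference counts live in a dictionary instead of the file attribute record, Pattern is treated as a shell-style glob, and make_dir and the exception names are hypothetical.

```python
import fnmatch

class NotFound(Exception):
    pass

class NameDuplicate(Exception):
    pass

class DirectoryService:
    """Toy directory service; each directory maps names to UFIDs."""

    def __init__(self):
        self._dirs = {}        # directory UFID -> {name: file UFID}
        self.ref_counts = {}   # file UFID -> number of names referring to it

    def make_dir(self, dir_id):
        self._dirs[dir_id] = {}

    def lookup(self, dir_id, name):
        entries = self._dirs[dir_id]
        if name not in entries:
            raise NotFound(name)
        return entries[name]

    def add_name(self, dir_id, name, file_id):
        entries = self._dirs[dir_id]
        if name in entries:
            raise NameDuplicate(name)
        entries[name] = file_id
        self.ref_counts[file_id] = self.ref_counts.get(file_id, 0) + 1

    def un_name(self, dir_id, name):
        entries = self._dirs[dir_id]
        if name not in entries:
            raise NotFound(name)
        file_id = entries.pop(name)
        self.ref_counts[file_id] -= 1

    def get_names(self, dir_id, pattern):
        names = self._dirs[dir_id]
        return sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))

ds = DirectoryService()
ds.make_dir(100)
ds.add_name(100, "notes.txt", 7)
ds.add_name(100, "notes.bak", 7)   # second name for the same file
ds.add_name(100, "todo.txt", 8)
ds.un_name(100, "notes.bak")
```

A file whose reference count drops to zero has no remaining names and could then be deleted from the flat file service.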
Hierarchic File System:
• A hierarchic file system consists of a number of directories arranged in a tree structure.
• Each directory holds the names of the files and other directories that are accessible from it.
• The root has a distinguished name, and each file or directory has a name in a directory.
• A file can be given several names by a link operation, which adds a new name for a file to a specified directory.
File Grouping:
• A file group is a collection of files located on a given server.
• A server may hold several file groups, and groups can be moved between servers, but a file cannot change the group to which it belongs.
• In a distributed file service, file groups support the allocation of files to file servers in larger logical units and enable the service to be implemented with files stored on several servers.
• File group identifiers must be unique throughout a distributed system.
• Since file groups can be moved, and distributed systems that are initially separate can be merged into a single system, the only way to ensure that file group identifiers are always distinct is to generate them with an algorithm that ensures global uniqueness.
• Whenever a file group is created, a unique identifier can be generated by concatenating the 32-bit IP address of the host creating the new group with a 16-bit integer derived from the date, producing a unique 48-bit integer.
File Group Identifier
IP address (32 bits) | date (16 bits)
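The concatenation can be worked through in a few lines. How the 16-bit value is derived from the date is not specified above, so the sketch assumes the number of days since 1 January 1970, truncated to 16 bits; the function names and the example address are also made up.

```python
import datetime
import ipaddress

def file_group_id(host_ip, created):
    """Pack a 32-bit IPv4 address and a 16-bit date value into 48 bits."""
    ip_bits = int(ipaddress.IPv4Address(host_ip))        # 0 .. 2**32 - 1
    # Assumption: days since 1970-01-01, truncated to 16 bits.
    days = (created - datetime.date(1970, 1, 1)).days & 0xFFFF
    return (ip_bits << 16) | days                        # IP in the high bits

def split_group_id(group_id):
    """Recover the two fields from a 48-bit identifier."""
    return str(ipaddress.IPv4Address(group_id >> 16)), group_id & 0xFFFF

gid = file_group_id("192.0.2.1", datetime.date(2005, 1, 1))
```

The IP address is used only to make the identifier unique; it says nothing about where the group is stored after it has been moved.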
Sun Network File System:
• Topics covered under Sun NFS:
Virtual File System
Client Integration
Access Control and Authentication
NFS Server Interface
Mount Service
Path Name Translation
Automounter
Server Caching
Client Caching
• The NFS server module resides in the kernel of each computer that acts as an NFS server.
• Requests referring to files in a remote file system are translated by the client module into NFS protocol operations and then passed to the NFS server module at the computer holding the relevant file system.
• The NFS client and server modules communicate using remote procedure calls.
Virtual File System:
• NFS provides access transparency.
• User programs can issue file operations for local or remote files without distinction.
• Integration is achieved by a virtual file system (VFS) module, which has been added to the UNIX kernel to distinguish between local and remote files and to translate between the UNIX-independent file identifiers used by NFS and the internal file identifiers normally used in UNIX and other file systems.
• VFS keeps track of the filesystems that are currently available, both locally and remotely, and it passes each request to the appropriate local system module (the UNIX file system, the NFS client module or the module for another file system).
• The file identifiers used in NFS are called file handles. A file handle is opaque to clients and contains whatever information the server needs to distinguish an individual file.
File Handle:
Filesystem identifier | i-node number of file | i-node generation number

'Filesystem' refers to the set of files held in a storage device or partition, whereas 'file system' refers to the software component that provides access to files.
• The filesystem identifier is a unique number allocated to each filesystem when it is created.
• The i-node generation number is needed because in conventional UNIX, i-node numbers are reused after a file is removed.
• In the VFS extensions to the UNIX file system, the generation number is stored with each file and is incremented each time the i-node number is reused.
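The role of the generation number can be shown with a small sketch. It assumes a file handle made of a filesystem identifier, an i-node number and a generation number; the class names, the StaleHandle exception and the integer values are invented for the example and do not reflect any real NFS implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileHandle:
    filesystem_id: int
    inode_number: int
    generation: int

class StaleHandle(Exception):
    pass

class ToyServer:
    """Tracks the current generation number of each i-node in one filesystem."""

    def __init__(self, filesystem_id):
        self.filesystem_id = filesystem_id
        self._generation = {}          # i-node number -> current generation

    def allocate(self, inode_number):
        # The generation is incremented each time the i-node number is reused.
        gen = self._generation.get(inode_number, 0) + 1
        self._generation[inode_number] = gen
        return FileHandle(self.filesystem_id, inode_number, gen)

    def check(self, handle):
        current = self._generation.get(handle.inode_number)
        if (handle.filesystem_id != self.filesystem_id
                or handle.generation != current):
            raise StaleHandle(handle)
        return True

server = ToyServer(filesystem_id=1)
old = server.allocate(42)      # file created using i-node 42
new = server.allocate(42)      # file removed, i-node 42 reused for another file
```

A client still holding the old handle is rejected instead of silently reading the unrelated file that now occupies i-node 42.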
• The virtual file system layer has one VFS structure for each mounted file system and one v-node per open file.
• A VFS structure relates a remote file system to the local directory on which it is mounted.
• A v-node contains an indicator to show whether a file is local or remote. If the file is local, the v-node contains a reference to the index of the local file (an i-node in a UNIX implementation).
• If the file is remote, the v-node contains the file handle of the remote file.
Client Integration:
• The NFS client module emulates the semantics of the standard UNIX file system primitives and is integrated with the UNIX kernel.
• Because it is integrated with the kernel, user programs can access files via UNIX system calls without recompilation or reloading.
• A single client module serves all of the user-level processes, with a shared cache of recently used blocks.
• The encryption key used to authenticate user IDs passed to the server is retained in the kernel.
• The NFS client module cooperates with the VFS in each client machine.
• It transfers blocks of files to and from the server and caches the blocks in local memory whenever possible.
• It shares the same buffer cache that is used by the local input-output system.
Access Control and Authentication:
• The NFS server is stateless and does not keep files open on behalf of its clients.
• The server must therefore check the user's identity against the file's access permission attributes on each request.
• Kerberos has been integrated with Sun NFS to provide a stronger solution for user authentication and security.
NFS Server Interface:
• The NFS file access operations read, write, getattr and setattr are almost identical to the Read, Write, GetAttributes and SetAttributes operations defined in the flat file service model.
• The lookup operation and most of the other directory operations are similar to those in the directory service model.
Mount Service:
• The mounting of subtrees of remote filesystems by clients is supported by a separate mount service process.
• On each server there is a file with a well-known name (/etc/exports) containing the names of local filesystems that are available for remote mounting.
• An access list is associated with each filesystem name, indicating which hosts are permitted to mount the filesystem.
• A modified version of the mount command communicates with the mount service process on the remote host using a mount protocol.
Local and remote file systems accessible on an NFS client

Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2.
• Programs running at the client can access files at Server 1 and Server 2 by using pathnames such as /usr/students/jon and /usr/staff/ann.
• A remote file system may be hard-mounted or soft-mounted.
• Hard-mounted: a process accessing a file is suspended until the request can be completed. If the remote host is unavailable for any reason, the NFS client module continues to retry the request until it is satisfied.
• Soft-mounted: the NFS client module returns a failure indication to user-level processes after a small number of retries.
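The two mount policies differ only in when the retry loop gives up, which the following sketch illustrates. The function names, the ServerUnavailable exception and the retry limit of 3 are assumptions for the example; a real client would also wait between attempts.

```python
class ServerUnavailable(Exception):
    pass

def call_with_mount_policy(request, soft, max_retries=3):
    """Retry `request` until it succeeds (hard) or a few times only (soft)."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return request()
        except ServerUnavailable:
            if soft and attempts >= max_retries:
                raise          # soft mount: report failure to the caller
            # hard mount: fall through and try again

def make_flaky(failures):
    """Return a request that fails `failures` times and then succeeds."""
    state = {"left": failures}
    def request():
        if state["left"] > 0:
            state["left"] -= 1
            raise ServerUnavailable()
        return "data"
    return request

hard_result = call_with_mount_policy(make_flaky(5), soft=False)
try:
    soft_result = call_with_mount_policy(make_flaky(5), soft=True)
except ServerUnavailable:
    soft_result = "failed"
```

With a hard mount the caller never sees the outage, at the cost of blocking for as long as the server stays down.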
Path Name Translation:
• Pathnames are parsed, and their translation is performed in an iterative manner by the client.
• Each part of a name that refers to a remote-mounted directory is translated to a file handle using a separate lookup request to the remote server.
• The lookup operation returns the corresponding file handle and file attributes.
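The step-by-step translation can be sketched as a loop that issues one lookup per pathname component. The directory table, the handle strings and the lookup function are all stand-ins invented for the example; a real client would send each lookup to the server as an RPC.

```python
# Hypothetical server-side directory tree: handle -> {name: (handle, attrs)}
SERVER_DIRS = {
    "fh-root":   {"people": ("fh-people", {"type": "dir"})},
    "fh-people": {"jon": ("fh-jon", {"type": "dir"})},
    "fh-jon":    {"notes.txt": ("fh-notes", {"type": "file", "size": 120})},
}
lookup_calls = []   # records each simulated request

def lookup(dir_handle, name):
    """Stand-in for the NFS lookup operation: one component per request."""
    lookup_calls.append((dir_handle, name))
    return SERVER_DIRS[dir_handle][name]

def translate(root_handle, path):
    """Resolve a multi-part pathname one component at a time."""
    handle, attrs = root_handle, {"type": "dir"}
    for part in path.strip("/").split("/"):
        handle, attrs = lookup(handle, part)
    return handle, attrs

handle, attrs = translate("fh-root", "people/jon/notes.txt")
```

A three-part name costs three requests, which is one reason clients cache the results of lookup.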
Automounter:
1. The automounter mounts a remote directory dynamically whenever an empty mount point is referenced by a client.
2. It maintains a table of mount points (pathnames) with a reference to one or more NFS servers listed against each.
3. When the NFS client module attempts to resolve a pathname that includes one of these mount points, it passes a lookup() request to the local automounter, which locates the required filesystem in its table and sends a 'probe' request to each server listed.
• The filesystem on the first server to respond is then mounted at the client.
• The mounted filesystem is linked to the mount point using a symbolic link, so that accesses to it do not result in further requests to the automounter.
• If there are no references to the symbolic link for several minutes, the automounter unmounts the remote filesystem.
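The table lookup and probing step might look roughly like the sketch below. The mount point, the server names and the probe function are invented, and the servers are probed one after another for simplicity, so 'first to respond' becomes 'first listed server that answers'.

```python
AUTOMOUNT_TABLE = {"/home/projects": ["server-a", "server-b", "server-c"]}
RESPONDING = {"server-b", "server-c"}      # pretend server-a is down
mounted = {}                               # mount point -> server in use

def probe(server):
    """Stand-in for the probe request; True if the server answers."""
    return server in RESPONDING

def automount(mount_point):
    """Mount the filesystem from the first listed server that responds."""
    if mount_point in mounted:             # already linked: nothing to do
        return mounted[mount_point]
    for server in AUTOMOUNT_TABLE[mount_point]:
        if probe(server):
            mounted[mount_point] = server
            return server
    raise LookupError(f"no server available for {mount_point}")

chosen = automount("/home/projects")
```

Listing several servers against one mount point gives a simple form of read-only replication: any server that answers the probe will do.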
Server Caching:
• In conventional UNIX systems, file pages, directories and file attributes that have been read from disk are retained in a main memory buffer cache until the buffer space is required for other pages.
• If a process then issues a read or write request for a page that is already in the cache, it can be satisfied without another disk access.
• This caching technique works in conventional UNIX because read and write requests issued by user-level processes pass through a single cache, which is always kept up to date.
• In an NFS server, using the server cache to hold recently read disk blocks does not raise any consistency problems. But when a server performs write operations, extra measures are needed. The write operation offers two options:
• Write-through caching: data in write operations received from clients is stored in the memory cache at the server and written to disk before a reply is sent to the client.
• Delayed write with commit: data in write operations is stored only in the memory cache. It is written to disk when a commit operation is received for the relevant file. Clients issue a commit whenever a file that was open for writing is closed.
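The two write options can be contrasted with a toy server in which dictionaries stand in for the memory cache and the disk. The class and method names are hypothetical and the model ignores failures; it only shows when data reaches persistent storage.

```python
class ToyNFSServer:
    """Models the two server-side write options with dicts as cache and disk."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.cache = {}    # (file, block) -> data held in memory
        self.disk = {}     # (file, block) -> data on persistent storage

    def write(self, file, block, data):
        self.cache[(file, block)] = data
        if self.write_through:
            self.disk[(file, block)] = data   # on disk before the reply is sent
        return "ok"

    def commit(self, file):
        # Flush every cached block of this file to disk.
        for (f, block), data in self.cache.items():
            if f == file:
                self.disk[(f, block)] = data
        return "ok"

wt = ToyNFSServer(write_through=True)
wt.write("a", 0, b"x")

delayed = ToyNFSServer(write_through=False)
delayed.write("a", 0, b"x")
on_disk_before_commit = ("a", 0) in delayed.disk
delayed.commit("a")                 # e.g. issued when the client closes the file
```

With write-through, a reply guarantees the data is persistent; with commit, that guarantee is deferred until the commit reply, in exchange for fewer disk writes.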
Client Caching:
• The NFS client module caches the results of read, write, getattr, lookup and readdir operations in order to reduce the number of requests transmitted to servers.
• Clients are responsible for polling the server to check the validity of the cached data that they hold.
Two Timestamps:
• Tc is the time when the cache entry was last validated.
• Tm is the time when the block was last modified at the server.
• A cache entry is valid at time T if T − Tc is less than a freshness interval t, or if the value of Tm recorded at the client matches the value of Tm at the server. The validity condition is:
(T − Tc < t) ∨ (Tm_client = Tm_server)
• A validity check is performed whenever a cache entry is used. The first half of the validity condition can be evaluated without access to the server. If it is true, the second half need not be evaluated.
• If it is false, the current value of Tm_server is obtained and compared with the local value Tm_client. If they are the same, the cache entry is taken to be valid and the value of Tc for that cache entry is updated to the current time.
• If they differ, the cached data has been updated at the server and the cache entry is invalidated, resulting in a request to the server for the relevant data.
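The validity check described above can be written as a single function. This is a minimal sketch: the CacheEntry fields, the get_tm_server callback (standing in for a getattr request to the server) and the numeric times are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    data: bytes
    tc: float          # time the entry was last validated
    tm_client: float   # modification time recorded when the data was fetched

def is_valid(entry, now, freshness, get_tm_server):
    """Apply (T - Tc < t) or (Tm_client == Tm_server); refresh Tc on a match."""
    if now - entry.tc < freshness:
        return True                       # no server contact needed
    if get_tm_server() == entry.tm_client:
        entry.tc = now                    # revalidated: restart the interval
        return True
    return False                          # caller must refetch the data

server_calls = []
def tm_server():
    server_calls.append(1)                # count simulated server requests
    return 100.0

entry = CacheEntry(b"block", tc=1000.0, tm_client=100.0)
fresh = is_valid(entry, now=1002.0, freshness=3.0, get_tm_server=tm_server)
revalidated = is_valid(entry, now=1010.0, freshness=3.0, get_tm_server=tm_server)
stale_entry = CacheEntry(b"old", tc=1000.0, tm_client=90.0)
stale = is_valid(stale_entry, now=1010.0, freshness=3.0, get_tm_server=tm_server)
```

A larger freshness interval means fewer requests to the server but a longer window in which a client may use out-of-date data.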
