
Unit - Iii: Prepared by N.Susila Ap/It Skcet

The document discusses distributed file systems. It provides an overview of file system architecture, describing the roles of the flat file service, directory service, and client module. The flat file service handles file contents and uses unique IDs, while the directory service maps names to IDs. The client module provides a unified API and caches data.


UNIT – III

Prepared By
N.Susila
AP/IT
SKCET

DISTRIBUTED FILE SYSTEM

• Introduction
• File Service Architecture
• DFS: Case Studies
  – Case Study: Sun NFS
  – Case Study: The Andrew File System

• File systems were originally developed for centralized computer systems and desktop computers.
• A file system was an operating system facility providing a convenient programming interface to disk storage.

• Distributed file systems support the sharing of information in the form of files and hardware resources.
• With the advent of distributed object systems (CORBA, Java) and the web, the picture has become more complex.
• An overview of the types of storage system is shown below.

Storage systems and their properties (sharing, persistence, distributed cache/replicas, consistency maintenance, example)

Storage system                 Consistency maintenance   Example
Main memory                    1                         RAM
File system                    1                         UNIX file system
Distributed file system                                  Sun NFS
Web                                                      Web server
Distributed shared memory                                Ivy (Ch. 18)
Remote objects (RMI/ORB)       1                         CORBA
Persistent object store        1                         CORBA Persistent Object Service
Peer-to-peer storage system                              OceanStore (Ch. 10)

Types of consistency between copies:
1 - strict one-copy consistency
√ - approximate consistency
X - no automatic consistency

• A typical layered module structure for the implementation of a non-distributed file system in a conventional operating system is presented below.

File system modules

Directory module:       relates file names to file IDs
File module:            relates file IDs to particular files
Access control module:  checks permission for operation requested
File access module:     reads or writes file data or attributes
Block module:           accesses and allocates disk blocks
Device module:          disk I/O and buffering

• File systems are responsible for the organization, storage, retrieval, naming, sharing and protection of files.
• Files contain both data and attributes.
• A typical attribute record structure is shown in the figure below.

File attribute record structure

File length
Creation timestamp
Read timestamp
Write timestamp
Attribute timestamp
Reference count
Owner
File type
Access control list

• The main operations on files that are available to applications in UNIX systems are summarized below.

UNIX file system operations

filedes = open(name, mode)
    Opens an existing file with the given name.
filedes = creat(name, mode)
    Creates a new file with the given name.
    Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes)
    Closes the open file filedes.
count = read(filedes, buffer, n)
    Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n)
    Transfers n bytes to the file referenced by filedes from buffer.
    Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence)
    Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name)
    Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2)
    Adds a new name (name2) for a file (name1).
status = stat(name, buffer)
    Gets the file attributes for file name into buffer.

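As a rough illustration, the operations listed above can be exercised from Python, whose os module offers thin wrappers around the same UNIX system calls. The file names and the scratch directory are assumptions made for this demo, and os.open with O_CREAT stands in for creat.

```python
import os
import tempfile


def demo_unix_file_ops():
    """Exercise open/write/lseek/read/link/unlink/stat on a scratch file."""
    workdir = tempfile.mkdtemp()               # scratch directory (assumption)
    name1 = os.path.join(workdir, "notes.txt")
    name2 = os.path.join(workdir, "alias.txt")

    # Equivalent of creat(name, mode): create the file, get a descriptor.
    fd = os.open(name1, os.O_CREAT | os.O_RDWR, 0o644)
    written = os.write(fd, b"hello world")     # advances the read-write pointer
    os.lseek(fd, 6, os.SEEK_SET)               # absolute seek to offset 6
    tail = os.read(fd, 5)                      # transfers up to 5 bytes
    os.close(fd)

    os.link(name1, name2)                      # add a second name for the file
    links = os.stat(name1).st_nlink            # reference count after link()
    os.unlink(name1)                           # remove one name; data survives
    size_after = os.stat(name2).st_size        # still reachable through name2
    os.unlink(name2)                           # last name removed: file deleted
    os.rmdir(workdir)
    return written, tail, links, size_after
```

The returned tuple shows that the data stays reachable through the second name after the first has been unlinked, which is the behaviour described for unlink above.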
• Distributed file system requirements
• Related requirements in distributed file systems are:
  – Transparency
  – Concurrency
  – Replication
  – Heterogeneity
  – Fault tolerance
  – Consistency
  – Security
  – Efficiency

• An architecture that offers a clear separation of the main concerns in providing access to files is obtained by structuring the file service as three components:
  – A flat file service
  – A directory service
  – A client module
• The relevant modules and their relationships are shown in the figure below.

[Figure: File service architecture. The client computer runs the application programs and the client module; the server computer runs the directory service and the flat file service.]

• The client module uses the interfaces exported by the flat file and directory services on the server side.
• The responsibilities of the various modules can be defined as follows:
• Flat file service:
  – Concerned with the implementation of operations on the contents of files. Unique File Identifiers (UFIDs) are used to refer to files in all requests for flat file service operations. UFIDs are long sequences of bits chosen so that each file has a UFID that is unique among all of the files in a distributed system.

• Directory service:
  – Provides a mapping between text names for files and their UFIDs. Clients may obtain the UFID of a file by quoting its text name to the directory service. The directory service supports the functions needed to generate directories and to add new files to directories.

• Client module:
  – It runs on each computer and provides an integrated service (flat file and directory) as a single API to application programs. For example, on UNIX hosts, a client module emulates the full set of UNIX file operations.
  – It holds information about the network locations of the flat file and directory server processes, and achieves better performance by keeping a cache of recently used file blocks at the client.

• Flat file service interface:
  – The figure below contains a definition of the interface to a flat file service.

Flat file service operations

Read(FileId, i, n) -> Data, throws BadPosition
    If 1 ≤ i ≤ Length(File): reads a sequence of up to n items from a file starting at item i and returns it in Data.
Write(FileId, i, Data), throws BadPosition
    If 1 ≤ i ≤ Length(File)+1: writes a sequence of Data to a file, starting at item i, extending the file if necessary.
Create() -> FileId
    Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId)
    Removes the file from the file store.
GetAttributes(FileId) -> Attr
    Returns the file attributes for the file.
SetAttributes(FileId, Attr)
    Sets the file attributes.

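The flat file service operations can be sketched as a small in-memory class. This is only an illustration under stated assumptions: files are modelled as byte arrays, UFIDs are plain counters rather than long unique bit strings, and the Python method names are lower-case renderings of the operations above.

```python
import itertools


class BadPosition(Exception):
    """Raised when item index i lies outside the permitted range."""


class FlatFileService:
    def __init__(self):
        self._files = {}                      # UFID -> file contents
        self._attrs = {}                      # UFID -> attribute record
        self._next_id = itertools.count(1)    # stand-in for real UFID generation

    def create(self):
        ufid = next(self._next_id)
        self._files[ufid] = bytearray()       # new file of length 0
        self._attrs[ufid] = {}
        return ufid

    def read(self, file_id, i, n):
        data = self._files[file_id]
        if not 1 <= i <= len(data):
            raise BadPosition(i)
        return bytes(data[i - 1:i - 1 + n])   # items are numbered from 1

    def write(self, file_id, i, new_data):
        data = self._files[file_id]
        if not 1 <= i <= len(data) + 1:
            raise BadPosition(i)
        # Overwrites existing items and extends the file if necessary.
        data[i - 1:i - 1 + len(new_data)] = new_data

    def delete(self, file_id):
        del self._files[file_id]
        del self._attrs[file_id]

    def get_attributes(self, file_id):
        return dict(self._attrs[file_id])

    def set_attributes(self, file_id, attrs):
        self._attrs[file_id] = dict(attrs)
```

Note that every call names the file and the position explicitly; there is no open file or read-write pointer held by the service, in contrast to the UNIX interface.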
• Access control
  – In distributed implementations, access rights checks have to be performed at the server because the server RPC interface is an otherwise unprotected point of access to files.
• Directory service interface
  – The figure below contains a definition of the RPC interface to a directory service.

Directory service operations

Lookup(Dir, Name) -> FileId, throws NotFound
    Locates the text name in the directory and returns the relevant UFID. If Name is not in the directory, throws an exception.
AddName(Dir, Name, File), throws NameDuplicate
    If Name is not in the directory, adds (Name, File) to the directory and updates the file's attribute record.
    If Name is already in the directory: throws an exception.
UnName(Dir, Name)
    If Name is in the directory, the entry containing Name is removed from the directory.
    If Name is not in the directory: throws an exception.
GetNames(Dir, Pattern) -> NameSeq
    Returns all the text names in the directory that match the regular expression Pattern.

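A minimal in-memory sketch of the directory service operations follows. It assumes directories are plain dictionaries keyed by a directory identifier; make_dir is a hypothetical helper added for the demo, and raising NotFound from un_name is an assumption, since the listing above only says that an exception is thrown.

```python
import re


class NotFound(Exception):
    """Name is not in the directory."""


class NameDuplicate(Exception):
    """Name is already in the directory."""


class DirectoryService:
    def __init__(self):
        self._dirs = {}                       # directory id -> {name: UFID}

    def make_dir(self, dir_id):               # hypothetical helper for the demo
        self._dirs[dir_id] = {}

    def lookup(self, dir_id, name):
        entries = self._dirs[dir_id]
        if name not in entries:
            raise NotFound(name)
        return entries[name]

    def add_name(self, dir_id, name, file_id):
        entries = self._dirs[dir_id]
        if name in entries:
            raise NameDuplicate(name)
        entries[name] = file_id

    def un_name(self, dir_id, name):
        entries = self._dirs[dir_id]
        if name not in entries:
            raise NotFound(name)
        del entries[name]

    def get_names(self, dir_id, pattern):
        regex = re.compile(pattern)
        # Return, in sorted order, the names that match the whole pattern.
        return sorted(n for n in self._dirs[dir_id] if regex.fullmatch(n))
```

The directory service only maps names to UFIDs; a client would pass the UFID returned by lookup to the flat file service to reach the file contents.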
• Hierarchic file system
  – A hierarchic file system, such as the one that UNIX provides, consists of a number of directories arranged in a tree structure.
• File group
  – A file group is a collection of files that can be located on any server or moved between servers while maintaining the same names.
    – A similar construct is used in a UNIX file system.
    – It helps with distributing the load of file serving between several servers.
    – File groups have identifiers which are unique throughout the system (and hence, for an open system, they must be globally unique).

• To construct a globally unique ID we use some unique attribute of the machine on which it is created, e.g. its IP address, even though the file group may move subsequently.

File Group ID:
  IP address (32 bits) | date (16 bits)

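The 48-bit layout can be sketched as simple bit packing. How the date is encoded into 16 bits is not stated in the slides, so the sketch assumes the caller supplies an already-encoded 16-bit value; the function names are illustrative only.

```python
import ipaddress


def make_file_group_id(ip, date16):
    """Pack a 32-bit IPv4 address and a 16-bit date field into 48 bits."""
    if not 0 <= date16 < 2 ** 16:
        raise ValueError("date field must fit in 16 bits")
    ip_bits = int(ipaddress.IPv4Address(ip))   # 32-bit integer form
    return (ip_bits << 16) | date16            # IP in the high bits, date low


def split_file_group_id(group_id):
    """Recover the (IP address, date field) pair from a file group ID."""
    ip = ipaddress.IPv4Address(group_id >> 16)
    return str(ip), group_id & 0xFFFF
```

Because the ID only records where the group was created, it stays the same when the group later moves to another server.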
• NFS (Network File System)
  – Developed by Sun Microsystems (in 1985).
  – Most popular, open, and widely used.
  – NFS protocol standardized through the IETF (RFC 1813).
• AFS (Andrew File System)
  – Developed at Carnegie Mellon University as part of the Andrew distributed computing environment (in 1986).
  – A research project to create a campus-wide file system.
  – A public domain implementation is available on Linux (LinuxAFS).
  – It was adopted as a basis for the DCE/DFS file system in the Open Software Foundation (OSF, www.opengroup.org) DCE (Distributed Computing Environment).

[Figure: NFS architecture (remote operations). On the client computer, application programs issue UNIX system calls to the UNIX kernel, where the virtual file system passes operations on local files to the UNIX file system (or another file system) and operations on remote files to the NFS client. The NFS client talks to the NFS server over the NFS protocol; on the server computer, the NFS server sits in the UNIX kernel and reaches the UNIX file system through the virtual file system.]

• The file identifiers used in NFS are called file handles.

fh = file handle:
  Filesystem identifier | i-node number | i-node generation

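One way to picture the three fields is the sketch below. It assumes the usual purpose of the generation field, namely that i-node numbers are reused after a file is removed and the generation number lets a server reject handles that refer to the earlier file. InodeTable and StaleFileHandle are hypothetical names for this toy model, not part of NFS itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    filesystem_id: int
    inode_number: int
    inode_generation: int


class StaleFileHandle(Exception):
    """The handle refers to an i-node that has since been reused."""


class InodeTable:
    """Toy server-side table: (filesystem, i-node) -> current generation."""

    def __init__(self):
        self._generation = {}

    def allocate(self, fs_id, inode):
        key = (fs_id, inode)
        # Each reuse of the i-node number increments its generation.
        self._generation[key] = self._generation.get(key, 0) + 1
        return FileHandle(fs_id, inode, self._generation[key])

    def check(self, fh):
        current = self._generation.get((fh.filesystem_id, fh.inode_number))
        if current != fh.inode_generation:
            raise StaleFileHandle(fh)
```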
• A simplified representation of the RPC interface provided by NFS version 3 servers is shown below.

NFS server operations (NFS Version 3 protocol, simplified)

• read(fh, offset, count) -> attr, data
• write(fh, offset, count, data) -> attr
• create(dirfh, name, attr) -> newfh, attr
• remove(dirfh, name) -> status
• getattr(fh) -> attr
• setattr(fh, attr) -> attr
• lookup(dirfh, name) -> fh, attr
• rename(dirfh, name, todirfh, toname) -> status
• link(newdirfh, newname, dirfh, name) -> status
• readdir(dirfh, cookie, count) -> entries
• symlink(newdirfh, newname, string) -> status
• readlink(fh) -> string
• mkdir(dirfh, name, attr) -> newfh, attr
• rmdir(dirfh, name) -> status
• statfs(fh) -> fsstats

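Because lookup takes a single name relative to a directory file handle, a client resolves a multi-part pathname one component at a time. The sketch below illustrates this with a toy in-memory server; ToyNfsServer, its nested-dictionary tree and the integer file handles are assumptions for the demo, and lookup returns only the handle rather than the (fh, attr) pair.

```python
class ToyNfsServer:
    def __init__(self, tree):
        # tree: nested dicts for directories, bytes for files (assumption).
        self._objects = {1: tree}     # file handle -> object; 1 is the root
        self._next_fh = 2
        self.lookup_calls = 0

    def lookup(self, dirfh, name):
        """Return a file handle for `name` inside the directory `dirfh`."""
        self.lookup_calls += 1
        directory = self._objects[dirfh]
        if not isinstance(directory, dict) or name not in directory:
            raise FileNotFoundError(name)
        fh = self._next_fh            # toy scheme: fresh handle per lookup
        self._next_fh += 1
        self._objects[fh] = directory[name]
        return fh


def resolve(server, rootfh, path):
    """Resolve a pathname with one lookup() call per component."""
    fh = rootfh
    for component in path.strip("/").split("/"):
        fh = server.lookup(fh, component)
    return fh
```

Resolving a three-component path such as /usr/lib/libc.a therefore costs three lookup calls, which is why caching lookup results at the client pays off.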
• NFS access control and authentication
  – The NFS server is a stateless server, so the user's identity and access rights must be checked by the server on each request.
  – In the local file system they are checked only once, when the file is opened, against the file's access permission attribute.
  – Every client request is accompanied by the userID and groupID.
  – These are not shown in the list of NFS server operations because they are inserted by the RPC system.
  – Kerberos has been integrated with NFS to provide a stronger and more comprehensive security solution.

• Mount service
  – Mount operation:
    mount(remotehost, remotedirectory, localdirectory)
  – The server maintains a table of clients who have mounted filesystems at that server.
  – Each client maintains a table of mounted file systems holding:
    <IP address, port number, file handle>
  – Remote file systems may be hard-mounted or soft-mounted in a client computer.
  – The following figure illustrates a client with two remotely mounted file stores.

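The client-side mount table can be pictured as a mapping from local mount points to <IP address, port number, file handle> triples. The sketch below is one possible way to decide whether a pathname falls under a remote mount; the addresses, port and handle strings are made-up values, and find_mount is a hypothetical helper rather than part of any NFS implementation.

```python
def find_mount(mount_table, path):
    """Return (server entry, remaining path), or (None, path) if local."""
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest matching mount point.
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return None, path                    # handled by the local file system
    remainder = path[len(best):].lstrip("/")
    return mount_table[best], remainder


# Illustrative table: local directory -> (IP address, port, file handle).
MOUNTS = {
    "/usr/students": ("10.0.0.1", 2049, "fh-people"),
    "/usr/staff": ("10.0.0.2", 2049, "fh-users"),
}
```

For a path under /usr/students the client would send requests to the first server, starting from the file handle of the mounted directory, while any other path stays local.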
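[Figure: Local and remote file systems accessible on an NFS client. Server 1 holds (root)/export/people with the entries big, jon, bob, ...; Server 2 holds (root)/nfs/users with the entries jim, ann, jane, joe. The client's (root) contains vmunix and usr, and usr contains students, x and staff; students is a remote mount of people on Server 1 and staff is a remote mount of users on Server 2.]
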
• Automounter
  – The automounter was added to the UNIX implementation of NFS in order to mount a remote directory dynamically whenever an 'empty' mount point is referenced by a client.
  – The automounter has a table of mount points with a reference to one or more NFS servers listed against each.
  – It sends a probe message to each candidate server and then uses the mount service to mount the filesystem at the first server to respond.
  – The automounter keeps the mount table small.

  – The automounter provides a simple form of replication for read-only filesystems.
  – E.g. if there are several servers with identical copies of /usr/lib, then each server will have a chance of being mounted at some clients.

• Server caching
  – Similar to UNIX file caching for local files: pages (blocks) from disk are held in a main memory buffer cache until the space is required for newer pages. Read-ahead and delayed-write optimizations are applied.
  – For local files, writes are deferred to the next sync event (30-second intervals).
  – This works well in the local context, where files are always accessed through the local cache, but in the remote case it doesn't offer the necessary synchronization guarantees to clients.

• NFS v3 servers offer two strategies for updating the disk:
  – Write-through: altered pages are written to disk as soon as they are received at the server. When a write() RPC returns, the NFS client knows that the page is on the disk.
  – Delayed commit: pages are held only in the cache until a commit() call is received for the relevant file. This is the default mode used by NFS v3 clients. A commit() is issued by the client whenever a file is closed.

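The difference between the two strategies can be sketched with a toy server cache. This is a simplified model under stated assumptions: the cache and the disk are plain dictionaries keyed by (file handle, offset), and the class name is illustrative only.

```python
class ToyNfsServerCache:
    def __init__(self, write_through):
        self.write_through = write_through
        self.cache = {}    # (fh, offset) -> data held in main memory
        self.disk = {}     # (fh, offset) -> data on stable storage

    def write(self, fh, offset, data):
        self.cache[(fh, offset)] = data
        if self.write_through:
            # Write-through: the page is on disk before write() returns.
            self.disk[(fh, offset)] = data

    def commit(self, fh):
        # Delayed commit: flush every cached page of this file to disk.
        for key, data in self.cache.items():
            if key[0] == fh:
                self.disk[key] = data
```

With write_through=False, data written by a client is only guaranteed to be on disk after commit() has been called, for example when the client closes the file.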
• Client caching
  – Server caching does nothing to reduce RPC traffic between client and server; further optimization is essential to reduce server load in large networks.
  – The NFS client module caches the results of read, write, getattr, lookup and readdir operations.
  – Synchronization of file contents (one-copy semantics) is not guaranteed when two or more clients are sharing the same file.

• Timestamp-based validity check
  – It reduces inconsistency, but doesn't eliminate it.
  – The validity condition for cache entries at the client is:
      (T - Tc < t) ∨ (Tm_client = Tm_server)
    where
      t    freshness guarantee
      Tc   time when the cache entry was last validated
      Tm   time when the block was last updated at the server
      T    current time
  – t is configurable (per file) but is typically set to 3 seconds for files and 30 seconds for directories.
  – It remains difficult to write distributed applications that share files with NFS.

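The validity condition translates almost directly into code. In this sketch the parameter names are assumptions chosen to mirror the symbols above, and get_tm_server is a hypothetical callable standing in for a getattr() request to the server.

```python
def cache_entry_is_valid(now, t_validated, freshness, tm_client, get_tm_server):
    """Evaluate (T - Tc < t) or (Tm_client = Tm_server).

    now           -- T, the current time
    t_validated   -- Tc, when the cache entry was last validated
    freshness     -- t, the freshness guarantee
    tm_client     -- Tm as recorded with the cached copy
    get_tm_server -- callable that fetches Tm from the server
    """
    if now - t_validated < freshness:
        return True                       # recent enough: no server call
    return tm_client == get_tm_server()   # otherwise compare timestamps
```

The first test avoids any network traffic, so the server is contacted only when the entry is older than the freshness interval. An update made by another client within that interval goes unnoticed, which is why inconsistency is reduced but not eliminated.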
• Other NFS optimizations
  – Sun RPC runs over UDP by default (can use TCP if required).
  – Uses the UNIX BSD Fast File System with 8-kbyte blocks.
  – read() and write() transfers can be of any size (negotiated between client and server).
  – The guaranteed freshness interval t is set adaptively for individual files to reduce the getattr() calls needed to update Tm.
  – File attribute information (including Tm) is piggybacked in replies to all file requests.

• NFS performance
  – Early measurements (1987) established that:
    – write() operations are responsible for only 5% of server calls in typical UNIX environments, hence write-through at the server is acceptable.
    – lookup() accounts for 50% of operations, due to the step-by-step pathname resolution necessitated by the naming and mounting semantics.
  – More recent measurements (1993) show high performance.
  – See www.spec.org for more recent measurements.

• NFS summary
  – NFS is an excellent example of a simple, robust, high-performance distributed service.
  – Achieving transparency is a further goal of NFS:
  – Access transparency:
    – The API is the UNIX system call interface for both local and remote files.

  – Location transparency:
    – Naming of filesystems is controlled by client mount operations, but transparency can be ensured by an appropriate system configuration.
  – Mobility transparency:
    – Hardly achieved; relocation of files is not possible. Relocation of filesystems is possible, but requires updates to client configurations.
  – Scalability transparency:
    – File systems (file groups) may be subdivided and allocated to separate servers. Ultimately, the performance limit is determined by the load on the server holding the most heavily used filesystem (file group).

  – Replication transparency:
    – Limited to read-only file systems; for writable files, the Sun Network Information Service (NIS) runs over NFS and is used to replicate essential system files.
  – Hardware and operating system heterogeneity:
    – NFS has been implemented for almost every known operating system and hardware platform and is supported by a variety of filing systems.
  – Fault tolerance:
    – Limited but effective; service is suspended if a server fails. Recovery from failures is aided by the simple stateless design.

  – Consistency:
    – It provides a close approximation to one-copy semantics and meets the needs of the vast majority of applications.
    – But the use of file sharing via NFS for communication or close coordination between processes on different computers cannot be recommended.
  – Security:
    – Recent developments include the option to use a secure RPC implementation for authentication and for the privacy and security of the data transmitted with read and write operations.

  – Efficiency:
    – NFS protocols can be implemented for use in situations that generate very heavy loads.

• Like NFS, AFS provides transparent access to remote shared files for UNIX programs running on workstations.
• AFS is implemented as two software components that exist as UNIX processes called Vice and Venus, as shown in the figure below.

[Figure: Distribution of processes in the Andrew File System. Each workstation runs a user program and a Venus process above the UNIX kernel; each server runs a Vice process above the UNIX kernel; workstations and servers are connected by the network.]

• The files available to user processes running on workstations are either local or shared.
• Local files are handled as normal UNIX files.
• They are stored on the workstation's disk and are available only to local user processes.
• Shared files are stored on servers, and copies of them are cached on the local disks of workstations.
• The name space seen by user processes is illustrated in the figure below.

[Figure: File name space seen by clients of AFS. The local part starts at / (root) and contains tmp, bin, ..., vmunix and cmu; the shared part hangs below cmu and contains, among others, bin; symbolic links connect the two parts.]

• The UNIX kernel in each workstation and server is a modified version of BSD UNIX.
• The modifications are designed to intercept open, close and some other file system calls when they refer to files in the shared name space and pass them to the Venus process in the client computer.

[Figure: System call interception in AFS. Within a workstation, the user program issues UNIX file system calls to the UNIX kernel; the kernel passes non-local file operations to Venus and serves local ones through the UNIX file system on the local disk.]

Implementation of file system calls in AFS

open(FileName, mode)
  – UNIX kernel: If FileName refers to a file in shared file space, pass the request to Venus.
  – Venus: Check the list of files in the local cache. If the file is not present or there is no valid callback promise, send a request for the file to the Vice server that is custodian of the volume containing the file.
  – Vice: Transfer a copy of the file and a callback promise to the workstation. Log the callback promise.
  – Venus: Place the copy of the file in the local file system, enter its local name in the local cache list and return the local name to UNIX.
  – UNIX kernel: Open the local file and return the file descriptor to the application.
read(FileDescriptor, Buffer, length)
  – UNIX kernel: Perform a normal UNIX read operation on the local copy.
write(FileDescriptor, Buffer, length)
  – UNIX kernel: Perform a normal UNIX write operation on the local copy.
close(FileDescriptor)
  – UNIX kernel: Close the local copy and notify Venus that the file has been closed.
  – Venus: If the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
  – Vice: Replace the file contents and send a callback to all other clients holding callback promises on the file.

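The open and close steps above can be sketched with two toy classes. This is a deliberately simplified model, not the real Vice or Venus: whole files are byte strings held in dictionaries, there are no volumes, locks or attributes, and the method signatures are assumptions made for the demo.

```python
class Vice:
    """Toy file server that tracks callback promises per file."""

    def __init__(self):
        self.files = {}       # fid -> contents
        self.promises = {}    # fid -> set of Venus processes holding a promise

    def fetch(self, fid, venus):
        self.promises.setdefault(fid, set()).add(venus)   # log the promise
        return self.files[fid]

    def store(self, fid, data, venus):
        self.files[fid] = data
        for other in self.promises.get(fid, set()) - {venus}:
            other.break_callback(fid)     # callback to every other holder
        self.promises[fid] = {venus}


class Venus:
    """Toy client cache manager."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}       # fid -> local copy
        self.valid = set()    # fids with a valid callback promise
        self.dirty = set()    # fids changed locally since they were opened

    def open(self, fid):
        if fid not in self.cache or fid not in self.valid:
            self.cache[fid] = self.vice.fetch(fid, self)
            self.valid.add(fid)
        return self.cache[fid]

    def write(self, fid, data):
        self.cache[fid] = data            # normal write on the local copy
        self.dirty.add(fid)

    def close(self, fid):
        if fid in self.dirty:             # send changed copy to the server
            self.vice.store(fid, self.cache[fid], self)
            self.dirty.discard(fid)

    def break_callback(self, fid):
        self.valid.discard(fid)           # cached copy can no longer be trusted
```

In this model a second client keeps using its cached copy without contacting the server until the first client closes a modified file; the callback then forces a fresh fetch on the next open.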
The main components of the Vice service interface

Fetch(fid) -> attr, data
    Returns the attributes (status) and, optionally, the contents of the file identified by the fid and records a callback promise on it.
Store(fid, attr, data)
    Updates the attributes and (optionally) the contents of a specified file.
Create() -> fid
    Creates a new file and records a callback promise on it.
Remove(fid)
    Deletes the specified file.
SetLock(fid, mode)
    Sets a lock on the specified file or directory. The mode of the lock may be shared or exclusive. Locks that are not removed expire after 30 minutes.
ReleaseLock(fid)
    Unlocks the specified file or directory.
RemoveCallback(fid)
    Informs the server that a Venus process has flushed a file from its cache.
BreakCallback(fid)
    This call is made by a Vice server to a Venus process. It cancels the callback promise on the relevant file.

[Figure: Resolving a URL. The URL http://www.cdk3.net:8888/WebExamples/earth.html is translated by a DNS lookup into a resource ID (IP number, port number, pathname) = (55.55.55.55, 8888, WebExamples/earth.html). At the web server, the IP number corresponds to the network address 2:60:8c:2:b0:5a, the port number to a socket, and the pathname to a file.]

[Figure: A client iteratively contacts name servers NS1–NS3 in order to resolve a name. The client sends request 1 to NS1, request 2 to NS2 and request 3 to NS3.]
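Iterative navigation can be sketched as a client-side loop. The sketch assumes that each name server either answers the query or refers the client to another server; the ("answer", value) and ("referral", server) reply shapes and the dictionary of callables are illustrative conventions, not part of any real name service protocol.

```python
def resolve_iteratively(name, servers, start):
    """Ask one name server at a time until one returns an answer.

    servers maps a server name to a callable that takes the name to
    resolve and returns ("answer", value) or ("referral", next_server).
    """
    contacted = []
    current = start
    while True:
        contacted.append(current)
        kind, value = servers[current](name)
        if kind == "answer":
            return value, contacted
        current = value                   # referral: try the suggested server


# Toy configuration in which NS1 and NS2 only know where to ask next.
TOY_SERVERS = {
    "NS1": lambda name: ("referral", "NS2"),
    "NS2": lambda name: ("referral", "NS3"),
    "NS3": lambda name: ("answer", "attributes of " + name),
}
```

The client itself drives every step, in contrast to server-controlled navigation, where a name server contacts the other servers on the client's behalf.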


[Figure: A name server NS1 communicates with other name servers on behalf of a client. Two variants are shown: non-recursive server-controlled navigation and recursive server-controlled navigation, each with a client and name servers NS1, NS2 and NS3 and numbered messages.]

DNS name servers (each name server is listed with its domain in parentheses, followed by its name server entries)

a.root-servers.net (root):            uk, purdue.edu, yahoo.com
ns1.nic.uk (uk):                      co.uk, ac.uk
ns.purdue.edu (purdue.edu):           *.purdue.edu
ns0.ja.net (ac.uk):                   ic.ac.uk, qmw.ac.uk
alpha.qmw.ac.uk (qmw.ac.uk):          dcs.qmw.ac.uk, *.qmw.ac.uk
dns0.dcs.qmw.ac.uk (dcs.qmw.ac.uk):   *.dcs.qmw.ac.uk
dns0-doc.ic.ac.uk (ic.ac.uk):         *.ic.ac.uk

Record type   Meaning                                Main contents
A             A computer address                     IP number
NS            An authoritative name server           Domain name for server
CNAME         The canonical name for an alias        Domain name for alias
SOA           Marks the start of data for a zone     Parameters governing the zone
WKS           A well-known service description       List of service names and protocols
PTR           Domain name pointer (reverse lookups)  Domain name
HINFO         Host information                       Machine architecture and operating system
MX            Mail exchange                          List of <preference, host> pairs
TXT           Text string                            Arbitrary text

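[Figure: Service discovery over a network that connects a client, printing services, a corporate infoservice and lookup services, with registrations scoped to the groups admin and finance. Message sequence: 1. the client asks "'finance' lookup service?"; 2. a lookup service answers "Here I am: ....."; 3. the client requests printing from the lookup service; 4. the client uses the printing service.]
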
[Figure: Directory tree]
DI: 599 (EC)
  UK (DI: 543)
    AC (DI: 437)
      QMW (DI: 322)
        Peter.Smith
          mailboxes: Alpha, Beta, Gamma
          password
  FR (DI: 574)

[Figure: Directory trees merged under a new root]
DI: 633 (WORLD)
  Well-known directories: #599 = #633/EC, #642 = #633/NORTH AMERICA
  EC (DI: 599)
    UK (DI: 543)
    FR (DI: 574)
  NORTH AMERICA (DI: 642)
    US (DI: 732)
    CANADA (DI: 457)

[Figure: Restructured directory]
DI: 633 (WORLD)
  Well-known directories: #599 = #633/EC, #642 = #633/NORTH AMERICA
  EC (DI: 599)
    UK (DI: 543)
    FR (DI: 574)
    US (DI: 732)
  NORTH AMERICA (DI: 642)
    US -> #633/EC/US
    CANADA (DI: 457)

[Figure: Several DUAs, each connected to a DSA, with the DSAs connected to one another.]

[Figure: Part of the directory information tree]
X.500 Service (root)
  ... France (country), Great Britain (country), Greece (country) ...
    (under Great Britain) ... BT Plc (organization), University of Gormenghast (organization) ...
      (under University of Gormenghast) ... Computing Service (organizationalUnit), Department of Computer Science (organizationalUnit), Engineering Department (organizationalUnit) ...
        (under Department of Computer Science) ... Departmental Staff (organizationalUnit), ely (applicationProcess), Research Students (organizationalUnit) ...
          (under Departmental Staff) ... Alice Flintstone (person), ... Pat King (person), James Healey (person), Janet Papworth (person) ...

[Figure: A directory entry]
info:             Alice Flintstone, Departmental Staff, Department of Computer Science, University of Gormenghast, GB
commonName:       Alice.L.Flintstone; Alice.Flintstone; Alice Flintstone; A. Flintstone
surname:          Flintstone
telephoneNumber:  +44 986 33 4604
uid:              alf
mail:             [email protected]; [email protected]
roomNumber:       Z42
userClass:        Research Fellow