Distributed File System
Purpose of using the files:
Permanent storage of information
Sharing of information
A distributed file system provides abstraction
to the users of a distributed system & makes
it convenient to use files in a distributed
environment.
Advantages of DFS
Remote information sharing
User mobility
Availability
Diskless workstations
DFS provides the following services:
Storage service
True file service
Name service
Desirable features of a good DFS
Transparency
Structure transparency
Access transparency
Naming transparency
Replication transparency
Cont…
Performance
Usually measured as the average amount of
time needed to satisfy client requests.
In a centralized file system, this time includes the
time for accessing the secondary storage device
on which the file is stored and the CPU
processing time.
In a DFS, this time additionally includes the
network communication overhead when the
accessed file is remote.
Cont…
User mobility
Simplicity & ease of use
Scalability
High availability
High reliability
Data integrity
Security
Heterogeneity
File Models
Unstructured and structured files
Unstructured – Unix, DOS
Structured – Ordered sequence of records
Mutable and immutable files
Unstructured and Structured files
Structured file model
A file appears to the file server as an ordered
sequence of records.
Structured files are of two types
Files with Non-indexed records
Files with indexed records
Unstructured file model
Sharing of files by different applications is easier.
Mutable and immutable files
Mutable file model
In this model, an update performed on a file
overwrites on its old contents to produce the new
contents.
File is represented as a single stored sequence
that is altered by each update operation.
Immutable file model
In this model, a file cannot be modified once it has
been created except to be deleted.
File versioning approach is used.
File-accessing models
Method used for accessing remote files
Unit of data access
Accessing remote files
Remote service model
Processing of the client's request is performed at the
server's node.
Every remote file access request results in network traffic.
Data-caching model
It gives advantage of the locality feature found in file
accesses.
Data is copied from the server's node to the client's node
and is cached on the client's node.
Cache consistency problem.
Unit of data transfer
1. File-level transfer model: AFS-2, Amoeba
2. Block-level transfer model: Sun Microsystems
NFS
3. Byte-level transfer model: Cambridge File
System
4. Record-level transfer model: Structured Files
File-sharing semantics
Unix semantics
It enforces an absolute time ordering on all
operations.
Session semantics
All changes made to a file during a session are
initially visible only to the client process that
opened the session, and
invisible to other remote processes that have the
same file open simultaneously.
Immutable shared-files semantics
Transaction-like semantics
It is based on the transaction mechanism, which is
a high-level mechanism for controlling concurrent
access to shared, mutable data.
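The session semantics described above can be sketched in a few lines of Python. This is a minimal illustration only; the class and method names (`SessionFileStore`, `Session`) are invented for the example and do not come from any real DFS.

```python
# A toy in-memory file store implementing session semantics:
# changes made in a session become visible to others only at close.

class SessionFileStore:
    def __init__(self):
        self.files = {}          # current committed contents per file name

    def open(self, name):
        # Each session starts from a private snapshot of the file.
        return Session(self, name, self.files.get(name, b""))

class Session:
    def __init__(self, store, name, contents):
        self.store, self.name, self.contents = store, name, contents

    def write(self, data):
        # Changes are visible only inside this session.
        self.contents += data

    def close(self):
        # Changes become visible to other sessions only on close.
        self.store.files[self.name] = self.contents

store = SessionFileStore()
s1 = store.open("f")
s2 = store.open("f")
s1.write(b"hello")
assert store.open("f").contents == b""   # invisible until s1 closes
s1.close()
assert store.open("f").contents == b"hello"
assert s2.contents == b""                # s2 still sees its own snapshot
```

Note how a process that already has the file open (`s2`) keeps working on its snapshot, exactly the behaviour the semantics prescribe.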
File-caching schemes
Several key decisions, such as:
Granularity of cached data
Cache size
Replacement policy
Cache location
Modification propagation
Cache validation
Cache location
Place where the cached data is stored.
Possible cache locations in DFS are:
Server's main memory
Client's disk (not available on diskless workstations)
Client's main memory
Modification propagation
When client caching is used, multiple copies of
the same data at many nodes require to be
consistent.
Issues:
When to propagate modifications made to a cached
data to the corresponding file server?
How to verify the validity of cached data?
Modification Propagation Schemes
Write-through scheme
Unix-like semantics: the master copy always remains
updated
Delayed-write scheme
Write on ejection from cache
Periodic write
Write on close: like session semantics
Delayed-write Scheme
When a cache entry is modified, the new value
is written only to the cache and the client just
makes a note that the cache entry has been
updated.
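The contrast between the two propagation schemes can be sketched as follows. The code is illustrative only (the `ServerCopy` and `ClientCache` classes are invented for the example):

```python
# Write-through vs. delayed-write propagation for a client cache.

class ServerCopy:
    def __init__(self):
        self.data = {}           # the master copy at the server

class ClientCache:
    def __init__(self, server, write_through=True):
        self.server = server
        self.write_through = write_through
        self.cache = {}
        self.dirty = set()       # entries modified but not yet propagated

    def write(self, key, value):
        self.cache[key] = value
        if self.write_through:
            self.server.data[key] = value   # master copy stays updated
        else:
            self.dirty.add(key)             # just note the modification

    def flush(self, key):
        # Delayed write: propagate on ejection, periodically, or on close.
        if key in self.dirty:
            self.server.data[key] = self.cache[key]
            self.dirty.discard(key)

server = ServerCopy()
wt = ClientCache(server, write_through=True)
wt.write("x", 1)
assert server.data["x"] == 1             # visible at the server immediately

dw = ClientCache(server, write_through=False)
dw.write("y", 2)
assert "y" not in server.data            # not yet propagated
dw.flush("y")
assert server.data["y"] == 2
```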
Cache validation schemes
Client-initiated approach
Checking before every access: Unix-like semantics
Periodic checking
Check on file open: session semantics
Cont…
Server-initiated approach
A client informs the file server when opening a file,
indicating whether the file is being opened for reading,
writing, or both.
The server keeps monitoring the file usage modes by
keeping a record of which client has which file open
and in what mode.
Simultaneous read accesses are allowed, but
simultaneous reading and writing are not.
Drawbacks of the server-initiated approach
Violates the client-server principle (the server initiates
communication)
Requires stateful servers
Callback Policy
The server keeps a record of the clients that have
cached the file.
A cached entry is assumed to be valid unless the
client is notified otherwise by the server.
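The callback policy can be sketched as below. This is a minimal illustration, not the mechanism of any particular DFS; the `Client` and `Server` classes are invented for the example.

```python
# Callback policy: the server remembers which clients cached a file
# and notifies (calls back) each of them when the file changes.

class Client:
    def __init__(self):
        self.cache = {}                  # name -> contents, assumed valid

    def invalidate(self, name):          # callback from the server
        self.cache.pop(name, None)

class Server:
    def __init__(self):
        self.files = {}
        self.callbacks = {}              # name -> clients holding a copy

    def read(self, name, client):
        # Record the callback promise, then hand out the data.
        self.callbacks.setdefault(name, set()).add(client)
        client.cache[name] = self.files.get(name, "")
        return client.cache[name]

    def write(self, name, contents):
        self.files[name] = contents
        # Break the callback promise: invalidate every cached copy.
        for c in self.callbacks.pop(name, set()):
            c.invalidate(name)

srv, c1 = Server(), Client()
srv.files["f"] = "v1"
srv.read("f", c1)
assert c1.cache["f"] == "v1"             # valid until notified
srv.write("f", "v2")
assert "f" not in c1.cache               # invalidated by callback
```

Between a read and the next write, the client answers all accesses from its cache without contacting the server, which is the whole point of the policy.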
File replication
Replication and Caching
A replica is associated with a server, whereas a
cached copy is normally associated with a client.
The existence of a cached copy is primarily
dependent on the locality in the access patterns,
whereas the existence of a replica normally
depends on availability and performance
requirements.
File replication
As compared to a cached copy, a replica is more
persistent, widely known, secure, available,
complete, and accurate.
A cached copy is contingent upon a replica. Only
by periodic revalidation with respect to a replica
can a cached copy be useful.
File Replication Advantages
1. Increased availability
2. Increased reliability
3. Improved response time
4. Reduced network traffic
5. Improved system throughput
6. Better scalability
7. Autonomous operation
File replication
Replication transparency
• Naming of replicas
• Replication control
1. Explicit replication
2. Implicit / lazy replication
Multicopy Update Problem
Commonly used approaches to handle this:
1. Read-only replication
2. Read-any-write-all protocol
3. Available-copies protocol
4. Primary-copy protocol
5. Quorum-based protocols
Multicopy Update Problem
Read-only replication
Only immutable files are replicated
Read-any-write-all protocol
For mutable files
Provides Unix-like semantics
A write locks and updates all copies
Available-copies protocol
A write updates all currently available copies
(some servers may be down)
Primary-copy protocol
One copy is designated the primary copy; the rest
are secondaries
Write operations are performed only on the primary copy
Secondaries are updated by push or pull, either
immediately (Unix-like semantics) or lazily
Quorum-based protocol
Handles network partition
Let there be n copies of file F
Read op: a minimum of r copies of F are consulted
Write op: a minimum of w copies of F are written
r + w > n
This guarantees at least one up-to-date copy in
every read quorum
A version number is associated with each copy
The copy with the highest version number is the
most recent / updated one
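A minimal sketch of the quorum rule in Python (illustrative code, with copies simply held in a list; a real protocol would contact copies over the network):

```python
# Quorum-based protocol with n copies, read quorum r, write quorum w,
# where r + w > n (and w > n/2, so write quorums also overlap).

import random

class Copy:
    def __init__(self):
        self.value, self.version = None, 0

def quorum_write(copies, w, value):
    chosen = random.sample(copies, w)
    # New version number: one more than the highest seen in the quorum.
    version = max(c.version for c in chosen) + 1
    for c in chosen:
        c.value, c.version = value, version

def quorum_read(copies, r):
    chosen = random.sample(copies, r)
    # r + w > n guarantees the read quorum overlaps the last write
    # quorum, so the highest-version copy holds the most recent value.
    return max(chosen, key=lambda c: c.version).value

n, r, w = 5, 3, 3                        # r + w = 6 > n = 5
copies = [Copy() for _ in range(n)]
quorum_write(copies, w, "v1")
quorum_write(copies, w, "v2")
assert quorum_read(copies, r) == "v2"    # holds for any quorum choice
```

Because any two quorums of size 3 out of 5 must intersect, the final read returns "v2" no matter which copies the random sampling picks.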
Special Cases of Quorum Protocol
Read-any-write-all (r = 1, w = n)
Suitable when the ratio of reads to writes is large
Read-all-write-any (r = n, w = 1)
Suitable when the ratio of writes to reads is large
Majority consensus protocol
Suitable when the ratio of reads to writes is close to 1
Consensus with weighted voting
Gives higher weight to frequently accessed copies
Fault tolerance
Properties
1. Availability
2. Robustness
3. Recoverability
Fault tolerance
Storage
1. Volatile storage
2. Nonvolatile storage
3. Stable storage
Fault tolerance & Service paradigm
Stateful file servers
Maintain state information about their clients; the
period between a file open and the corresponding
close operation is called a session.
Stateless file servers
Atomic transactions
Essential properties of transactions:
1. Atomicity
Failure atomicity / all-or-nothing property
Concurrency atomicity / consistency property
2. Serializability / isolation property
3. Permanence / durability
Need for transactions in a file service
For improving the recoverability of files in the event
of failures.
For allowing the concurrent sharing of mutable files
by multiple clients in a consistent manner.
Inconsistency may be due to:
System failure or
Concurrent access
Operations for transactions-based file service
1. begin_transaction
2. end_transaction
3. abort_transaction
Recovery Techniques
File versions approach
Avoid overwriting of actual data in physical storage
When a transaction begins, the server creates a tentative
version from current version for write operation.
When transaction is committed, tentative version is made
the new current version and
Previous current version added to sequence of old version
A serializability conflict may arise when merging the
various tentative versions:
two or more transactions have accessed the same data
item and at least one of the accesses is a write operation.
Recovery Techniques
Shadow blocks technique for implementing file versions
This technique is an optimization that allows a
tentative version of a file to be created without
copying the whole file; in fact, it removes most of
the copying.
The entire disk space is partitioned into blocks.
The file system maintains an index for each file and a
list of free blocks.
A tentative version of a file is created by copying only
the index of the current version of the file.
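The idea can be sketched as follows. The block table and `alloc` helper are invented for the example; only the index-copying idea comes from the technique itself:

```python
# Shadow blocks: a tentative version copies only the file's block
# index; modified logical blocks get fresh (shadow) physical blocks,
# while unmodified blocks stay shared with the current version.

blocks = {}                              # physical block no. -> contents
next_block = 0

def alloc(contents):
    """Write contents into a fresh block; return its block number."""
    global next_block
    blocks[next_block] = contents
    next_block += 1
    return next_block - 1

current_index = [alloc("A"), alloc("B"), alloc("C")]   # current version

# Begin transaction: copy the index, not the data blocks.
tentative_index = list(current_index)

# Write to logical block 1 of the tentative version.
tentative_index[1] = alloc("B'")         # shadow block for new contents

assert blocks[current_index[1]] == "B"   # current version untouched
assert blocks[tentative_index[1]] == "B'"
assert tentative_index[0] == current_index[0]          # block 0 shared

# Commit: the tentative index becomes the new current version.
current_index = tentative_index
```

If the transaction aborts instead, the tentative index is simply discarded and the shadow blocks are returned to the free list; the current version was never modified.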
Recovery Techniques
The write-ahead log approach
A write-ahead log, maintained on stable storage, is
used to record file updates in a recoverable manner.
For each operation of a transaction that modifies a
file, a log record is first created and written to the
log; only after this is the operation performed on the
file to modify its contents.
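A minimal sketch of the write-ahead rule and the recovery it enables, with a plain Python list standing in for stable storage (all names are illustrative):

```python
# Write-ahead logging: log first, then modify the file. After a crash,
# only updates belonging to committed transactions are redone.

log = []                                 # stands in for stable storage
data = {}                                # the file's contents

def wal_write(tid, key, value):
    log.append(("update", tid, key, value))   # 1. log record first
    data[key] = value                         # 2. then modify the file

def wal_commit(tid):
    log.append(("commit", tid))

def recover(log):
    # Redo only updates whose transaction reached its commit record.
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    recovered = {}
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            recovered[rec[2]] = rec[3]
    return recovered

wal_write("T1", "x", 1)
wal_commit("T1")
wal_write("T2", "y", 2)                  # T2 never commits (crash here)
assert recover(log) == {"x": 1}          # T2's update is not redone
```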
Concurrency control
The goal is to allow maximum concurrency with
minimum overhead.
Ensures that transactions are run in a manner so
that their effects on shared data are serially
equivalent.
Cont…
Approaches used are:
Locking
Optimistic concurrency control
Timestamps
Locking
In the basic locking mechanism, a transaction locks
a data item before accessing it.
Optimized locking for better concurrency
Type-specific locking
Intention-to-write locks
Read, intention-to-write, and commit locks
(while an intention-to-write lock is held, read operations
are permitted; while a commit lock is held, they are not)
Two phase locking protocol
1. Growing phase
2. Shrinking phase
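The two-phase rule can be sketched in a few lines (the `TwoPhaseLocking` class is invented for the example; a real lock manager would also handle conflicts between transactions):

```python
# Two-phase locking: locks may be acquired only in the growing phase;
# once any lock is released, the transaction is in its shrinking phase
# and may not acquire new locks.

class TwoPhaseLocking:
    def __init__(self):
        self.held = set()
        self.shrinking = False           # True after the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after unlock")
        self.held.add(item)              # growing phase

    def unlock(self, item):
        self.shrinking = True            # shrinking phase begins
        self.held.discard(item)

t = TwoPhaseLocking()
t.lock("a")
t.lock("b")                              # growing phase: allowed
t.unlock("a")                            # shrinking phase begins
try:
    t.lock("c")                          # not allowed under 2PL
    violated = False
except RuntimeError:
    violated = True
assert violated
```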
Locking
Granularity of locking - the unit of lockable data
items.
Handling of locking deadlocks
Avoidance
Detection
Timeouts
Optimistic Concurrency Control
Transactions are allowed to proceed uncontrolled
up to the end of the first phase.
In the second phase, before a transaction is
committed, the transaction is validated to see if
any of its data items have been changed by any
other transaction since it started.
The transaction is committed if found valid;
otherwise it is aborted.
Contd..
For validation process, two records are kept
of the data items within a transaction:
read set
write set
To validate a transaction, read and write sets
are compared with write sets of all the
concurrent transactions that reached at the
end of first phase.
Contd..
If any data item present in the read set or
write set of the transaction being validated is
also present in the write set of any concurrent
transaction, the validation fails.
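The validation rule just stated amounts to a set-intersection check, which can be sketched as follows (illustrative code; a real system would also track which transactions actually overlapped in time):

```python
# Optimistic concurrency control, validation step: a transaction
# fails validation if its read set or write set intersects the write
# set of any concurrent transaction that finished its first phase.

def validate(read_set, write_set, concurrent_write_sets):
    for other_writes in concurrent_write_sets:
        if (read_set | write_set) & other_writes:
            return False                 # conflict: abort
    return True                          # no conflict: safe to commit

# T read x and y and wrote y; a concurrent transaction wrote x.
assert validate({"x", "y"}, {"y"}, [{"x"}]) is False
# If the concurrent transaction wrote only z, T validates.
assert validate({"x", "y"}, {"y"}, [{"z"}]) is True
```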
Cont…
Advantages:
Maximum parallelism
Free from deadlock
Drawbacks:
Old versions of files must be retained for the
validation process.
A transaction may starve (be repeatedly aborted
and restarted).
In an overloaded system, the number of transactions
getting aborted may rise substantially.
Timestamps
Conflicts are detected as soon as the operation
causing them is executed.
Each operation in a transaction is validated when
it is carried out.
If the validation fails, the transaction is aborted
immediately and can then be restarted.
Each transaction is assigned a unique timestamp
at the moment it does begin_transaction.
Every data item has a read timestamp and write
timestamp.
Contd..
When a transaction accesses a data item, depending
on the type of access (read/write), the data item's
timestamp is updated to the transaction's timestamp.
The write operations of transactions are recorded
tentatively and are invisible to other transactions until
the transaction commits.
Validation of Write Operation
If the timestamp of the current transaction is equal
to or more recent than both the read timestamp and
the (committed) write timestamp of the accessed
data item, the write operation passes the
validation check.
If the timestamp of the current transaction is older
than the timestamp of the last read or the last
committed write of the data item, the validation fails.
Validation of Read Operation
If the timestamp of current transaction is more
recent than the write timestamp of all
committed and tentative values of the
accessed data item, the read operation passes
validation check.
The read operation can be performed
immediately only if there are no tentative
values of the data item; otherwise it must wait
until the completion of the transactions having
tentative values of the data item.
Contd..
The validation check fails and the current
transaction is aborted in the following cases:
The timestamp of the current transaction is older
than the timestamp of the most recent
(committed) write to the data item.
The timestamp of the current transaction is older
than that of a tentative value of the data item written
by another transaction, even though it is more recent
than the timestamp of the permanent (committed) value.
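The write-validation rule above can be sketched as follows. This is a simplified illustration: tentative values and their separate handling at commit time are omitted, and the `DataItem` class is invented for the example.

```python
# Timestamp ordering, write validation: a write passes only if the
# transaction's timestamp is at least as recent as both the read
# timestamp and the committed write timestamp of the data item.

class DataItem:
    def __init__(self):
        self.read_ts = 0                 # timestamp of last read
        self.write_ts = 0                # timestamp of last committed write

def validate_write(txn_ts, item):
    if txn_ts >= item.read_ts and txn_ts >= item.write_ts:
        item.write_ts = txn_ts           # simplified: applied directly
        return True
    return False                         # too old: abort the transaction

item = DataItem()
item.read_ts, item.write_ts = 5, 3
assert validate_write(7, item) is True   # newer than both timestamps
assert validate_write(4, item) is False  # older than a later operation
```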
Distributed Transaction Service
It supports transactions involving files managed by
more than one server.
All servers need to communicate with one another
to coordinate their actions during the processing of
the transaction.
A simple approach is to pass client requests through
a single server that holds the relevant file.
Contd..
A client begins the transaction by sending a
begin_transaction request to any server.
The contacted server executes the begin_transaction
request and returns the resulting transaction identifier
(TID) to the client.
This server becomes the coordinator for the
transaction and is responsible for aborting or
committing it and for adding other servers called
workers.
Contd..
Workers are dynamically added to the transaction.
The request
add_transaction (TID, server_id of coordinator)
informs a server that it is involved in the transaction TID.
When a server receives an add_transaction request, it:
Records the server identifier of the coordinator.
Makes a new transaction record containing the TID.
Initializes a new log to record the updates made to local
files by the transaction.
It also makes a call to the coordinator to inform it of its
intention to join the transaction.
Contd..
Hence, each worker comes to know about the
coordinator and the coordinator comes to know
about and keeps a list of all the workers involved in
the transaction.
This information enables the workers and
coordinator to coordinate with each other at commit
time.
Two-Phase Multiserver Commit Protocol
When the client makes an end_transaction request,
the co-ordinator and the workers in the transaction
have tentative values in logs.
The co-ordinator decides whether the transaction
should be aborted or committed.
Hence, end_transaction is performed in two phases:
Preparation Phase &
Commit Phase
Preparation Phase
The coordinator makes an entry in its log that it is
starting the commit protocol.
It then sends a prepare message to all the workers
telling them to prepare to commit. The message has
a time-out value associated with it.
When a worker gets the prepare message, it checks
whether it is ready to commit.
If so, it makes an entry in its log and replies with a
ready message; otherwise it replies with an abort message.
Commit Phase
If all the workers are ready to commit, the
transaction is committed.
Coordinator makes an entry in its log indicating that
the transaction has been committed.
It then sends a commit message to the workers
asking them to commit.
At this point, the transaction is effectively completed,
so the coordinator can report success to the client.
Contd..
If any of the replies was abort, or a worker's prepare
message timed out, the transaction is aborted.
Coordinator makes an entry in its log indicating that
the transaction has been aborted.
It then sends an abort message to the workers asking
them to abort and reports failure to the client.
Contd..
When a worker receives the commit message, it makes
a committed entry in its log and sends a committed
reply to the coordinator.
The worker's part of the transaction is then treated as
completed, and its records of the transaction are erased.
When the coordinator has received a committed reply
from all the workers, the transaction is considered
complete and all records of it maintained by the
coordinator are erased.
The coordinator keeps resending the commit message
until it receives a committed reply from every worker.
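The two phases described above can be sketched as follows. The code is illustrative only (no networking, time-outs, or crash recovery; the `Worker` class and log contents are invented for the example):

```python
# Two-phase multiserver commit: phase 1 collects prepare votes,
# phase 2 sends the commit/abort decision to every worker.

class Worker:
    def __init__(self, ready=True):
        self.ready, self.log = ready, []

    def prepare(self):
        if self.ready:
            self.log.append("ready")     # log before voting ready
        return "ready" if self.ready else "abort"

    def decide(self, outcome):
        self.log.append(outcome)         # record the final outcome

def two_phase_commit(coordinator_log, workers):
    coordinator_log.append("start-commit")
    # Phase 1 (preparation): ask every worker to prepare.
    votes = [w.prepare() for w in workers]
    # Phase 2 (commit): commit only if all workers voted ready.
    outcome = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append(outcome)      # coordinator logs the decision
    for w in workers:
        w.decide(outcome)
    return outcome

log1, log2 = [], []
assert two_phase_commit(log1, [Worker(), Worker()]) == "commit"
assert two_phase_commit(log2, [Worker(), Worker(ready=False)]) == "abort"
```

A single worker voting abort (or timing out) is enough to abort the whole transaction, matching the rule stated above.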
Nested Transactions
Nested transactions are a generalization of
traditional transactions in which a transaction may
be composed of sub-transactions.
A sub-transaction may in turn have its own sub-
transactions.
In this way, transactions can be nested forming a
family of transactions.
Committing of Nested Transactions
In a nested transaction, a transaction may
commit only after all its descendants have
committed.
A transaction may abort at any time.
Hence, in order to commit the whole
transaction, its top level transaction must wait
for other transactions in the family to commit.
Contd..
Advantages of nested transactions
It allows concurrency within a transaction.
It provides greater protection against failures.
Design Principles for Distributed File
System
1. Clients have cycles to burn.
Preferably perform an operation on the client's machine
rather than on a server machine.
2. Cache whenever possible.
Caching of data at client sites frequently improves overall
system performance because it makes data locally available.
3. Exploit usage properties.
Depending upon the usage properties (access and
modifications), files should be grouped into small number
of identifiable classes.
Contd..
4. Minimize system wide knowledge and change.
Monitoring or automatic updating of global information
should be avoided.
5. Trust the fewest possible entities.
Security should depend on the integrity of a much smaller
number of servers rather than on thousands of clients.
6. Batch if possible
Grouping operations together can improve throughput.