Distributed File System
Purpose of using the files:
Permanent storage of information
Sharing of information
A distributed file system provides abstraction
to the users of a distributed system & makes
it convenient to use files in a distributed
environment.
Advantages of DFS
Remote information sharing
User mobility
Availability
Diskless workstations
DFS provides the following services:
Storage service
True file service
Name service
Desirable features of a good DFS
Transparency
Structure transparency
Access transparency
Naming transparency
Replication transparency
Cont…
Performance
Usually measured as the average amount of
time needed to satisfy client requests.
In a centralized file system, this time includes the
time for accessing the secondary storage device
on which the file is stored and the CPU
processing time.
In a DFS, this time additionally includes the
network communication overhead when the
accessed file is remote.
Cont…
User mobility
Simplicity & ease of use
Scalability
High availability
High reliability
Data integrity
Security
Heterogeneity
File Models
Unstructured and structured files
Unstructured – Unix, DOS
Structured – Ordered sequence of records
Mutable and immutable files
Unstructured and Structured files
Structured file model
A file appears to the file server as an ordered
sequence of records.
Structured files are of two types
Files with Non-indexed records
Files with indexed records
Unstructured file model
Sharing of files by different applications is easier.
Mutable and immutable files
Mutable file model
In this model, an update performed on a file
overwrites on its old contents to produce the new
contents.
File is represented as a single stored sequence
that is altered by each update operation.
Immutable file model
In this model, a file cannot be modified once it has
been created except to be deleted.
File versioning approach is used.
File-accessing models
Method used for accessing remote files
Unit of data access
Accessing remote files
Remote service model
Processing of the client's request is performed at the
server's node.
Every remote file access request results in network traffic.
Data-caching model
It gives advantage of the locality feature found in file
accesses.
Data is copied from the server's node to the client's node
and is cached on the client's node.
Cache consistency problem.
Unit of data transfer
1. File-level transfer model: AFS-2, Amoeba
2. Block-level transfer model: Sun Microsystems
NFS
3. Byte-level transfer model: Cambridge File
System
4. Record-level transfer model: Structured Files
File-sharing semantics
Unix semantics
It enforces an absolute time ordering on all
operations.
Session semantics
All changes made to a file during a session are
initially visible only to the client process that
opened the session, and
invisible to other remote processes that have the
same file open simultaneously.
Immutable shared-files semantics
Transaction-like semantics
It is based on the transaction mechanism, which is
a high-level mechanism for controlling concurrent
access to shared, mutable data.
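The session semantics described above can be sketched in a few lines of Python. This is a minimal illustration only; the class and method names (`SessionFileStore`, `Session`) are invented for the example and do not come from any real DFS.

```python
# A toy in-memory file store implementing session semantics:
# changes made in a session become visible to others only at close.

class SessionFileStore:
    def __init__(self):
        self.files = {}          # current committed contents per file name

    def open(self, name):
        # Each session starts from a private snapshot of the file.
        return Session(self, name, self.files.get(name, b""))

class Session:
    def __init__(self, store, name, contents):
        self.store, self.name, self.contents = store, name, contents

    def write(self, data):
        # Changes are visible only inside this session.
        self.contents += data

    def close(self):
        # Changes become visible to other sessions only on close.
        self.store.files[self.name] = self.contents

store = SessionFileStore()
s1 = store.open("f")
s2 = store.open("f")
s1.write(b"hello")
assert store.open("f").contents == b""   # invisible until s1 closes
s1.close()
assert store.open("f").contents == b"hello"
assert s2.contents == b""                # s2 still sees its own snapshot
```

Note how a process that already has the file open (`s2`) keeps working on its snapshot, exactly the behaviour the semantics prescribe.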
File-caching schemes
Several key decisions, such as:
Granularity of cached data
Cache size
Replacement policy
Cache location
Modification propagation
Cache validation
Cache location
Place where the cached data is stored.
Possible cache locations in DFS are:
Server's main memory
Client's disk (not available on diskless workstations)
Client's main memory
Modification propagation
When client caching is used, multiple copies of
the same data at many nodes require to be
consistent.
Issues:
When to propagate modifications made to a cached
data to the corresponding file server?
How to verify the validity of cached data?
Modification Propagation Schemes
Write-through scheme
Unix-like semantics: the master copy always remains
updated
Delayed-write scheme
Write on ejection from cache
Periodic write
Write on close: like session semantics
Delayed-write Scheme
When a cache entry is modified, the new value
is written only to the cache and the client just
makes a note that the cache entry has been
updated.
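The contrast between the two propagation schemes can be sketched as follows. The code is illustrative only (the `ServerCopy` and `ClientCache` classes are invented for the example):

```python
# Write-through vs. delayed-write propagation for a client cache.

class ServerCopy:
    def __init__(self):
        self.data = {}           # the master copy at the server

class ClientCache:
    def __init__(self, server, write_through=True):
        self.server = server
        self.write_through = write_through
        self.cache = {}
        self.dirty = set()       # entries modified but not yet propagated

    def write(self, key, value):
        self.cache[key] = value
        if self.write_through:
            self.server.data[key] = value   # master copy stays updated
        else:
            self.dirty.add(key)             # just note the modification

    def flush(self, key):
        # Delayed write: propagate on ejection, periodically, or on close.
        if key in self.dirty:
            self.server.data[key] = self.cache[key]
            self.dirty.discard(key)

server = ServerCopy()
wt = ClientCache(server, write_through=True)
wt.write("x", 1)
assert server.data["x"] == 1             # visible at the server immediately

dw = ClientCache(server, write_through=False)
dw.write("y", 2)
assert "y" not in server.data            # not yet propagated
dw.flush("y")
assert server.data["y"] == 2
```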
Cache validation schemes
Client-initiated approach
Checking before every access: Unix-like semantics
Periodic checking
Check on file open: session semantics
Cont…
Server-initiated approach
A client informs the file server when opening a file,
indicating whether the file is being opened for reading,
writing, or both.
The server keeps monitoring the file usage modes by
keeping a record of which client has which file open
and in what mode.
Simultaneous read accesses are allowed, but
simultaneous reading and writing are not.
Drawbacks of the server-initiated approach
Violates the client-server principle (the server initiates
communication)
Requires stateful servers
Callback Policy
The server keeps a record of the clients that have
cached the file.
A cached entry is assumed to be valid unless the
client is notified otherwise by the server.
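The callback policy can be sketched as below. This is a minimal illustration, not the mechanism of any particular DFS; the `Client` and `Server` classes are invented for the example.

```python
# Callback policy: the server remembers which clients cached a file
# and notifies (calls back) each of them when the file changes.

class Client:
    def __init__(self):
        self.cache = {}                  # name -> contents, assumed valid

    def invalidate(self, name):          # callback from the server
        self.cache.pop(name, None)

class Server:
    def __init__(self):
        self.files = {}
        self.callbacks = {}              # name -> clients holding a copy

    def read(self, name, client):
        # Record the callback promise, then hand out the data.
        self.callbacks.setdefault(name, set()).add(client)
        client.cache[name] = self.files.get(name, "")
        return client.cache[name]

    def write(self, name, contents):
        self.files[name] = contents
        # Break the callback promise: invalidate every cached copy.
        for c in self.callbacks.pop(name, set()):
            c.invalidate(name)

srv, c1 = Server(), Client()
srv.files["f"] = "v1"
srv.read("f", c1)
assert c1.cache["f"] == "v1"             # valid until notified
srv.write("f", "v2")
assert "f" not in c1.cache               # invalidated by callback
```

Between a read and the next write, the client answers all accesses from its cache without contacting the server, which is the whole point of the policy.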
File replication
Replication and Caching
A replica is associated with a server, whereas a
cached copy is normally associated with a client.
The existence of a cached copy is primarily
dependent on the locality in the access patterns,
whereas the existence of a replica normally
depends on availability and performance
requirements.
File replication
As compared to a cached copy, a replica is more
persistent, widely known, secure, available,
complete, and accurate.
A cached copy is contingent upon a replica. Only
by periodic revalidation with respect to a replica
can a cached copy be useful.
File Replication Advantages
1. Increased availability
2. Increased reliability
3. Improved response time
4. Reduced network traffic
5. Improved system throughput
6. Better scalability
7. Autonomous operation
File replication
Replication transparency
• Naming of replicas
• Replication control
1. Explicit replication
2. Implicit / lazy replication
Multicopy Update Problem
Commonly used approaches to handle this:
1. Read-only replication
2. Read-any-write-all protocol
3. Available-copies protocol
4. Primary-copy protocol
5. Quorum-based protocols
Multicopy Update Problem
Read-only replication
Only immutable files are replicated
Read-any-write-all protocol
For mutable files
Provides Unix-like semantics
A write locks and updates all copies
Available-copies protocol
A write updates all currently available copies
(some servers may be down)
Primary-copy protocol
One copy is designated the primary copy; the rest
are secondaries
Write operations are performed only on the primary copy
Secondaries are updated by push or pull, either
immediately (Unix-like semantics) or lazily
Quorum-based protocol
Handles network partition
Let there be n copies of file F
Read op: a minimum of r copies of F are consulted
Write op: a minimum of w copies of F are written
r + w > n
This guarantees at least one up-to-date copy in
every read quorum
A version number is associated with each copy
The copy with the highest version number is the
most recent / updated one
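A minimal sketch of the quorum rule in Python (illustrative code, with copies simply held in a list; a real protocol would contact copies over the network):

```python
# Quorum-based protocol with n copies, read quorum r, write quorum w,
# where r + w > n (and w > n/2, so write quorums also overlap).

import random

class Copy:
    def __init__(self):
        self.value, self.version = None, 0

def quorum_write(copies, w, value):
    chosen = random.sample(copies, w)
    # New version number: one more than the highest seen in the quorum.
    version = max(c.version for c in chosen) + 1
    for c in chosen:
        c.value, c.version = value, version

def quorum_read(copies, r):
    chosen = random.sample(copies, r)
    # r + w > n guarantees the read quorum overlaps the last write
    # quorum, so the highest-version copy holds the most recent value.
    return max(chosen, key=lambda c: c.version).value

n, r, w = 5, 3, 3                        # r + w = 6 > n = 5
copies = [Copy() for _ in range(n)]
quorum_write(copies, w, "v1")
quorum_write(copies, w, "v2")
assert quorum_read(copies, r) == "v2"    # holds for any quorum choice
```

Because any two quorums of size 3 out of 5 must intersect, the final read returns "v2" no matter which copies the random sampling picks.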
Special Cases of Quorum Protocol
Read-any-write-all (r = 1, w = n)
Suitable when the ratio of reads to writes is large
Read-all-write-any (r = n, w = 1)
Suitable when the ratio of writes to reads is large
Majority consensus protocol
Suitable when the ratio of reads to writes is close to 1
Consensus with weighted voting
Gives higher weight to frequently accessed copies
Fault tolerance
Properties
1. Availability
2. Robustness
3. Recoverability
Fault tolerance
Storage
1. Volatile storage
2. Nonvolatile storage
3. Stable storage
Fault tolerance & Service paradigm
Stateful file servers
Maintain state information about their clients; the
period between a file open and the corresponding
close operation is called a session.
Stateless file servers
Atomic transactions
Essential properties of transactions:
1. Atomicity
Failure atomicity / all-or-nothing property
Concurrency atomicity / consistency property
2. Serializability / isolation property
3. Permanence / durability
Need for transactions in a file service
For improving the recoverability of files in the event
of failures.
For allowing the concurrent sharing of mutable files
by multiple clients in a consistent manner.
Inconsistency may be due to:
System failure or
Concurrent access
Operations for transactions-based file service
1. begin_transaction
2. end_transaction
3. abort_transaction
Recovery Techniques
File versions approach
Avoid overwriting of actual data in physical storage
When a transaction begins, the server creates a tentative
version from current version for write operation.
When transaction is committed, tentative version is made
the new current version and
Previous current version added to sequence of old version
A serializability conflict may arise when merging the
various tentative versions:
two or more transactions have accessed the same data
item and at least one of the accesses is a write operation.
Recovery Techniques
Shadow blocks technique for implementing file versions
This technique is an optimization that allows a
tentative version of a file to be created without
copying the whole file; in fact, it removes most of
the copying.
The entire disk space is partitioned into blocks.
The file system maintains an index for each file and a
list of free blocks.
A tentative version of a file is created by copying only
the index of the current version of the file.
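The idea can be sketched as follows. The block table and `alloc` helper are invented for the example; only the index-copying idea comes from the technique itself:

```python
# Shadow blocks: a tentative version copies only the file's block
# index; modified logical blocks get fresh (shadow) physical blocks,
# while unmodified blocks stay shared with the current version.

blocks = {}                              # physical block no. -> contents
next_block = 0

def alloc(contents):
    """Write contents into a fresh block; return its block number."""
    global next_block
    blocks[next_block] = contents
    next_block += 1
    return next_block - 1

current_index = [alloc("A"), alloc("B"), alloc("C")]   # current version

# Begin transaction: copy the index, not the data blocks.
tentative_index = list(current_index)

# Write to logical block 1 of the tentative version.
tentative_index[1] = alloc("B'")         # shadow block for new contents

assert blocks[current_index[1]] == "B"   # current version untouched
assert blocks[tentative_index[1]] == "B'"
assert tentative_index[0] == current_index[0]          # block 0 shared

# Commit: the tentative index becomes the new current version.
current_index = tentative_index
```

If the transaction aborts instead, the tentative index is simply discarded and the shadow blocks are returned to the free list; the current version was never modified.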
Recovery Techniques
The write-ahead log approach
A write-ahead log, maintained on stable storage, is
used to record file updates in a recoverable manner.
For each operation of a transaction that modifies a
file, a log record is first created and written to the
log; only after this is the operation performed on the
file to modify its contents.
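A minimal sketch of the write-ahead rule and the recovery it enables, with a plain Python list standing in for stable storage (all names are illustrative):

```python
# Write-ahead logging: log first, then modify the file. After a crash,
# only updates belonging to committed transactions are redone.

log = []                                 # stands in for stable storage
data = {}                                # the file's contents

def wal_write(tid, key, value):
    log.append(("update", tid, key, value))   # 1. log record first
    data[key] = value                         # 2. then modify the file

def wal_commit(tid):
    log.append(("commit", tid))

def recover(log):
    # Redo only updates whose transaction reached its commit record.
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    recovered = {}
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            recovered[rec[2]] = rec[3]
    return recovered

wal_write("T1", "x", 1)
wal_commit("T1")
wal_write("T2", "y", 2)                  # T2 never commits (crash here)
assert recover(log) == {"x": 1}          # T2's update is not redone
```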
Concurrency control
The goal is to allow maximum concurrency with
minimum overhead.
Ensures that transactions are run in a manner so
that their effects on shared data are serially
equivalent.
Cont…
Approaches used are:
Locking
Optimistic concurrency control
Timestamps
Locking
In the basic locking mechanism, a transaction locks
a data item before accessing it.
Optimized locking for better concurrency
Type-specific locking
Intention-to-write locks
Read, intention-to-write, and commit locks
(while an intention-to-write lock is held, read operations
are permitted; while a commit lock is held, they are not)
Two phase locking protocol
1. Growing phase
2. Shrinking phase
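The two-phase rule can be sketched in a few lines (the `TwoPhaseLocking` class is invented for the example; a real lock manager would also handle conflicts between transactions):

```python
# Two-phase locking: locks may be acquired only in the growing phase;
# once any lock is released, the transaction is in its shrinking phase
# and may not acquire new locks.

class TwoPhaseLocking:
    def __init__(self):
        self.held = set()
        self.shrinking = False           # True after the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after unlock")
        self.held.add(item)              # growing phase

    def unlock(self, item):
        self.shrinking = True            # shrinking phase begins
        self.held.discard(item)

t = TwoPhaseLocking()
t.lock("a")
t.lock("b")                              # growing phase: allowed
t.unlock("a")                            # shrinking phase begins
try:
    t.lock("c")                          # not allowed under 2PL
    violated = False
except RuntimeError:
    violated = True
assert violated
```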
Locking
Granularity of locking - the unit of lockable data
items.
Handling of locking deadlocks
Avoidance
Detection
Timeouts
Optimistic Concurrency Control
Transactions are allowed to proceed uncontrolled
up to the end of the first phase.
In the second phase, before a transaction is
committed, the transaction is validated to see if
any of its data items have been changed by any
other transaction since it started.
The transaction is committed if found valid;
otherwise it is aborted.
Contd..
For validation process, two records are kept
of the data items within a transaction:
read set
write set
To validate a transaction, read and write sets
are compared with write sets of all the
concurrent transactions that reached at the
end of first phase.
Contd..
If any data item present in the read set or
write set of the transaction being validated is
also present in the write set of any concurrent
transaction, the validation fails.
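The validation rule just stated amounts to a set-intersection check, which can be sketched as follows (illustrative code; a real system would also track which transactions actually overlapped in time):

```python
# Optimistic concurrency control, validation step: a transaction
# fails validation if its read set or write set intersects the write
# set of any concurrent transaction that finished its first phase.

def validate(read_set, write_set, concurrent_write_sets):
    for other_writes in concurrent_write_sets:
        if (read_set | write_set) & other_writes:
            return False                 # conflict: abort
    return True                          # no conflict: safe to commit

# T read x and y and wrote y; a concurrent transaction wrote x.
assert validate({"x", "y"}, {"y"}, [{"x"}]) is False
# If the concurrent transaction wrote only z, T validates.
assert validate({"x", "y"}, {"y"}, [{"z"}]) is True
```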
Cont…
Advantages:
Maximum parallelism
Free from deadlock
Drawbacks:
Old versions of files must be retained for the
validation process.
A transaction may starve (be repeatedly aborted
and restarted).
In an overloaded system, the number of transactions
getting aborted may rise substantially.
Timestamps
Conflicts are detected as soon as the operation
causing them is executed.
Each operation in a transaction is validated when
it is carried out.
If the validation fails, the transaction is aborted
immediately and can then be restarted.
Each transaction is assigned a unique timestamp
at the moment it does begin_transaction.
Every data item has a read timestamp and write
timestamp.
Contd..
When a transaction accesses a data item, depending
on the type of access (read/write), the data item's
timestamp is updated to the transaction's timestamp.
The write operations of transactions are recorded
tentatively and are invisible to other transactions until
the transaction commits.
Validation of Write Operation
If the timestamp of the current transaction is equal
to or more recent than both the read timestamp and
the (committed) write timestamp of the accessed
data item, the write operation passes the
validation check.
If the timestamp of the current transaction is older
than the timestamp of the last read or the last
committed write of the data item, the validation fails.
Validation of Read Operation
If the timestamp of current transaction is more
recent than the write timestamp of all
committed and tentative values of the
accessed data item, the read operation passes
validation check.
The read operation can be performed
immediately only if there are no tentative
values of the data item; otherwise it must wait
until the completion of the transactions having
tentative values of the data item.
Contd..
The validation check fails and the current
transaction is aborted in the following cases:
The timestamp of the current transaction is older
than the timestamp of the most recent
(committed) write to the data item.
The timestamp of the current transaction is older
than that of a tentative value of the data item written
by another transaction, even though it is more recent
than the timestamp of the permanent (committed) value.
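The write-validation rule above can be sketched as follows. This is a simplified illustration: tentative values and their separate handling at commit time are omitted, and the `DataItem` class is invented for the example.

```python
# Timestamp ordering, write validation: a write passes only if the
# transaction's timestamp is at least as recent as both the read
# timestamp and the committed write timestamp of the data item.

class DataItem:
    def __init__(self):
        self.read_ts = 0                 # timestamp of last read
        self.write_ts = 0                # timestamp of last committed write

def validate_write(txn_ts, item):
    if txn_ts >= item.read_ts and txn_ts >= item.write_ts:
        item.write_ts = txn_ts           # simplified: applied directly
        return True
    return False                         # too old: abort the transaction

item = DataItem()
item.read_ts, item.write_ts = 5, 3
assert validate_write(7, item) is True   # newer than both timestamps
assert validate_write(4, item) is False  # older than a later operation
```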
Distributed Transaction Service
It supports transactions involving files managed by
more than one server.
All servers need to communicate with one another
to coordinate their actions during the processing of
the transaction.
A simple approach is to pass client requests through
a single server that holds the relevant file.
Contd..
A client begins the transaction by sending a
begin_transaction request to any server.
The contacted server executes the begin_transaction
request and returns the resulting transaction identifier
(TID) to the client.
This server becomes the coordinator for the
transaction and is responsible for aborting or
committing it and for adding other servers called
workers.
Contd..
Workers are dynamically added to the transaction.
The request
add_transaction (TID, server_id of coordinator)
informs a server that it is involved in the transaction TID.
When a server receives an add_transaction request, it:
Records the server identifier of the coordinator.
Makes a new transaction record containing the TID.
Initializes a new log to record the updates made to local
files by the transaction.
It also makes a call to the coordinator to inform it of its
intention to join the transaction.
Contd..
Hence, each worker comes to know about the
coordinator and the coordinator comes to know
about and keeps a list of all the workers involved in
the transaction.
This information enables the workers and
coordinator to coordinate with each other at commit
time.
Two-Phase Multiserver Commit Protocol
When the client makes an end_transaction request,
the co-ordinator and the workers in the transaction
have tentative values in logs.
The co-ordinator decides whether the transaction
should be aborted or committed.
Hence, end_transaction is performed in two phases:
Preparation Phase &
Commit Phase
Preparation Phase
The coordinator makes an entry in its log that it is
starting the commit protocol.
It then sends a prepare message to all the workers
telling them to prepare to commit. The message has
a time-out value associated with it.
When a worker gets the prepare message, it checks
whether it is ready to commit.
If so, it makes an entry in its log and replies with a
ready message; otherwise it replies with an abort message.
Commit Phase
If all the workers are ready to commit, the
transaction is committed.
Coordinator makes an entry in its log indicating that
the transaction has been committed.
It then sends a commit message to the workers
asking them to commit.
At this point, the transaction is effectively completed,
so the coordinator can report success to the client.
Contd..
If any of the replies was abort, or a worker's prepare
message timed out, the transaction is aborted.
Coordinator makes an entry in its log indicating that
the transaction has been aborted.
It then sends an abort message to the workers asking
them to abort and reports failure to the client.
Contd..
When a worker receives the commit message, it makes
a committed entry in its log and sends a committed
reply to the coordinator.
The worker's part of the transaction is then treated as
completed, and its records of the transaction are erased.
When the coordinator has received a committed reply
from all the workers, the transaction is considered
complete and all records of it maintained by the
coordinator are erased.
The coordinator keeps resending the commit message
until it receives a committed reply from every worker.
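The two phases described above can be sketched as follows. The code is illustrative only (no networking, time-outs, or crash recovery; the `Worker` class and log contents are invented for the example):

```python
# Two-phase multiserver commit: phase 1 collects prepare votes,
# phase 2 sends the commit/abort decision to every worker.

class Worker:
    def __init__(self, ready=True):
        self.ready, self.log = ready, []

    def prepare(self):
        if self.ready:
            self.log.append("ready")     # log before voting ready
        return "ready" if self.ready else "abort"

    def decide(self, outcome):
        self.log.append(outcome)         # record the final outcome

def two_phase_commit(coordinator_log, workers):
    coordinator_log.append("start-commit")
    # Phase 1 (preparation): ask every worker to prepare.
    votes = [w.prepare() for w in workers]
    # Phase 2 (commit): commit only if all workers voted ready.
    outcome = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append(outcome)      # coordinator logs the decision
    for w in workers:
        w.decide(outcome)
    return outcome

log1, log2 = [], []
assert two_phase_commit(log1, [Worker(), Worker()]) == "commit"
assert two_phase_commit(log2, [Worker(), Worker(ready=False)]) == "abort"
```

A single worker voting abort (or timing out) is enough to abort the whole transaction, matching the rule stated above.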
Nested Transactions
Nested transactions are a generalization of
traditional transactions in which a transaction may
be composed of sub-transactions.
A sub-transaction may in turn have its own sub-
transactions.
In this way, transactions can be nested forming a
family of transactions.
Committing of Nested Transactions
In a nested transaction, a transaction may
commit only after all its descendants have
committed.
A transaction may abort at any time.
Hence, in order to commit the whole
transaction, its top level transaction must wait
for other transactions in the family to commit.
Contd..
Advantages of nested transactions
It allows concurrency within a transaction.
It provides greater protection against failures.
Design Principles for Distributed File
System
1. Clients have cycles to burn.
Preferably perform an operation on the client's machine
rather than on a server machine.
2. Cache whenever possible.
Caching of data at client sites frequently improves overall
system performance because it makes data locally available.
3. Exploit usage properties.
Depending upon the usage properties (access and
modifications), files should be grouped into small number
of identifiable classes.
Contd..
4. Minimize system wide knowledge and change.
Monitoring or automatic updating of global information
should be avoided.
5. Trust the fewest possible entities.
Security should depend on the integrity of a much smaller
number of servers rather than on thousands of clients.
6. Batch if possible
Grouping operations together can improve throughput.