DFS OS Final
A Distributed File System (DFS) keeps data transparently accessible even if a server or disk fails.
Requirements of a DFS
• Transparency: access data regardless of where it is located or what form it takes.
• Concurrency: allow multiple users to perform multiple tasks at the same time.
• Replication: store data at multiple locations so that it stays readily available.
• Heterogeneity: behave the same across different kinds of hardware and operating systems.
• Fault tolerance: continue operating when a component fails.
• Consistency: copies of the same data kept at different places stay identical.
• Security: protect data from unauthorized users.
• Efficiency: handle the many operations applied to data (storage, access,
filtering, sharing) efficiently.
A transparent DFS hides where in the network a file is stored.
For a file replicated at several sites, the mapping returns the set of
locations of the file’s replicas, but the existence of multiple copies and their
locations are hidden from the user.
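The mapping described above can be sketched as follows. This is a minimal illustration, not a real DFS API: the class name ReplicaMap, the server ids, and the "handle:" string are all hypothetical. It shows an internal lookup that returns every replica location, and a client-facing call that returns a handle containing no location at all.

```python
class ReplicaMap:
    """Maps a location-independent file name to the set of its replica locations."""

    def __init__(self):
        # file name -> set of server ids (server ids are hypothetical)
        self._replicas = {}

    def add_replica(self, name, server):
        self._replicas.setdefault(name, set()).add(server)

    def locate(self, name):
        """Internal lookup: returns all replica locations for the name."""
        return frozenset(self._replicas.get(name, ()))

    def open(self, name, is_up):
        """Client-facing call: uses any reachable replica, reveals none of them."""
        for server in sorted(self.locate(name)):
            if is_up(server):
                # The handle names the file only, never the server chosen.
                return f"handle:{name}"
        raise FileNotFoundError(name)
```

A client that calls open() gets the same handle whichever replica served it, so a failed server stays invisible as long as one replica is reachable.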
Naming Structure
Location Transparency: File name does not reveal the file’s physical
storage location.
1. Files are named by a combination of their host name and local name, which
guarantees a unique system-wide name.
3. A single global name structure spans all files in the system. If a server is
unavailable, some arbitrary set of directories on different machines also
becomes unavailable.
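A rough sketch of these naming schemes, under simplifying assumptions: the "host:local" separator, the directory-to-server table, and all names below are hypothetical. It shows that a host-plus-local name is unique system-wide but exposes the host, and that in a single global tree a server failure takes out whichever directories that server happens to hold.

```python
def global_name(host, local_name):
    """Host name plus local name gives a system-wide unique name.

    Note that the result reveals the host, so it is not location transparent.
    """
    return f"{host}:{local_name}"


# Single global name structure: one tree whose directories are served by
# different machines. This placement is a made-up example.
DIRECTORY_SERVER = {
    "/home": "serverA",
    "/proj": "serverB",
    "/tmp": "serverA",
}


def unavailable_directories(down_server):
    """Directories of the global tree that vanish when one server is down."""
    return sorted(d for d, s in DIRECTORY_SERVER.items() if s == down_server)
```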
File Caching Schemes
Every distributed file system uses some form of caching.
A file-caching scheme for a distributed file system must address the following
key decisions:
Cache Location
Modification Propagation
Cache Validation
Cache Location
Server’s Main Memory
In this case a cache hit costs one network access. It does not contribute to
the scalability and reliability of the distributed file system, since every
cache hit requires accessing the server.
Advantages:
Easy to implement
Easy to keep the original file and the cached data consistent
Client’s Disk
In this case a cache hit costs one disk access. This is somewhat slower than
having the cache in the server’s main memory, which is also the simpler
option.
Advantages:
Provides reliability
Large storage capacity
Contributes to scalability and reliability
Client’s Main Memory
Eliminates both network access cost and disk access cost. This technique is
not preferred over a client’s disk cache when a large cache size and increased
reliability of cached data are desired.
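The hit costs stated above (one network access for a cache in the server’s main memory, one disk access for a cache on the client’s disk, neither for a cache in the client’s main memory) can be put in a small table. The latency defaults below are placeholders chosen only for illustration, not measurements; the names HitCost and hit_time_ms are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HitCost:
    network_accesses: int
    disk_accesses: int


# Cost of a cache HIT for each cache location, as stated in the notes.
CACHE_HIT_COST = {
    "server_main_memory": HitCost(network_accesses=1, disk_accesses=0),
    "client_disk": HitCost(network_accesses=0, disk_accesses=1),
    "client_main_memory": HitCost(network_accesses=0, disk_accesses=0),
}


def hit_time_ms(location, network_ms=1.0, disk_ms=5.0):
    """Estimated hit time; network_ms and disk_ms are assumed placeholder values."""
    cost = CACHE_HIT_COST[location]
    return cost.network_accesses * network_ms + cost.disk_accesses * disk_ms
```

Changing the two latency parameters shows how the ranking between a server memory cache and a client disk cache depends on the relative speed of the network and the disk.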
Modification Propagation
When the cache is located on clients’ nodes, a file’s data may simultaneously be
cached on multiple nodes. It is possible for caches to become inconsistent
when the file data is changed by one of the clients and the corresponding data
cached at other nodes is not changed or discarded.
The modification propagation scheme used has a critical effect on the system’s
performance and reliability. Techniques used include:
Write-Through Scheme
When a cache entry is modified, the new value is immediately sent to the server for
updating the master copy of the file.
Advantage:
The risk of updated data getting lost in the event of a client crash is very low
Disadvantage:
Every write requires a network access to the server, so write performance is poor
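The write-through scheme can be sketched as follows. This is a minimal in-memory model, not a real file system client: the Server and WriteThroughCache classes and their method names are hypothetical. It shows that each cache write is forwarded to the server at once, so the master copy survives a client crash, at the price of one server write per modification.

```python
class Server:
    """Holds the master copy of every file (in memory, for illustration)."""

    def __init__(self):
        self.master = {}
        self.writes_received = 0

    def write(self, name, data):
        self.master[name] = data
        self.writes_received += 1

    def read(self, name):
        return self.master[name]


class WriteThroughCache:
    """Client-side cache that propagates every modification immediately."""

    def __init__(self, server):
        self.server = server
        self.entries = {}

    def write(self, name, data):
        self.entries[name] = data
        # Write-through: update the master copy right away.
        self.server.write(name, data)

    def read(self, name):
        if name not in self.entries:
            # Cache miss: fetch from the server and keep a copy.
            self.entries[name] = self.server.read(name)
        return self.entries[name]
```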
Cache Validation
The modification propagation policy only specifies when the master copy of a
file on the server node is updated upon modification of a cache entry. It says
nothing about when the file data residing in the caches of other nodes is
updated.
Client-initiated approach
• Client initiates a validity check each time a file/cached data is
accessed
• Checks are initiated at fixed time intervals
• Only first copy of cached data is accessed
• Server checks whether the cached data is consistent with the master
copy
When the validity check is coupled with every access, each access is delayed
compared to a plain cache access.
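A sketch of the client-initiated approach, assuming the server keeps a version number per file (the notes do not say how consistency is checked, so the version counter, class names, and the injectable clock are all assumptions). With check_interval set to 0 the client validates on every access; with a positive value it validates only at fixed intervals and may return stale data in between.

```python
import time


class VersionedServer:
    """Master copies with a version counter (the counter is an assumption)."""

    def __init__(self):
        self.files = {}  # name -> (version, data)

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)

    def version(self, name):
        return self.files[name][0]

    def fetch(self, name):
        return self.files[name]


class ValidatingClient:
    def __init__(self, server, check_interval=0.0, clock=time.monotonic):
        self.server = server
        self.check_interval = check_interval  # 0.0 means check on every access
        self.clock = clock
        self.cache = {}  # name -> [version, data, time of last check]

    def read(self, name):
        now = self.clock()
        entry = self.cache.get(name)
        if entry is None:
            version, data = self.server.fetch(name)
            self.cache[name] = [version, data, now]
            return data
        if now - entry[2] >= self.check_interval:
            # Validity check: ask the server whether the cached copy is current.
            if self.server.version(name) != entry[0]:
                entry[0], entry[1] = self.server.fetch(name)
            entry[2] = now
        return entry[1]
```

Checking on every access never returns stale data but adds a server round trip to each read; the interval-based variant trades that delay for a window of possible staleness.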
Server-initiated approach
• For each client, the server records which files that client has cached
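A possible sketch of the server-side record keeping. The notes only state that the server records each client’s cached files; what the server does with that record is an assumption here, namely that it sends an invalidation notice to every other client caching a file when that file is written. TrackingServer and its fields are hypothetical names.

```python
from collections import defaultdict


class TrackingServer:
    def __init__(self):
        self.master = {}
        # client id -> names of the files that client currently caches
        self.cached_by = defaultdict(set)
        # (client id, file name) notices sent so far; stands in for real messages
        self.invalidations = []

    def fetch(self, client_id, name):
        """Serve a file and record that this client now caches it."""
        self.cached_by[client_id].add(name)
        return self.master[name]

    def write(self, writer_id, name, data):
        """Update the master copy, then invalidate other clients' copies (assumed policy)."""
        self.master[name] = data
        for client_id, names in self.cached_by.items():
            if client_id != writer_id and name in names:
                names.discard(name)
                self.invalidations.append((client_id, name))
```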