Distributed File Systems: Unit - V Essay Questions
Cache validation schemes: The modification propagation policy specifies only when the master copy of a file on the server node is updated after a cache entry is modified. It says nothing about when the file data residing in the caches of other nodes is updated.
File data may simultaneously reside in the caches of multiple nodes. A client's cache entry becomes stale as soon as some other client modifies the corresponding data in the master copy of the file on the server.
It becomes necessary to verify if the data cached at a client node is consistent with
the master copy. If not, the cached data must be invalidated and the updated version of the
data must be fetched again from the server.
There are two approaches to verifying the validity of cached data: the client-initiated approach and the server-initiated approach.
Client-initiated approach: The client contacts the server and checks whether its locally cached data is consistent with the master copy. Two approaches may be used:
1. Checking before every access.
This defeats the purpose of caching, because the server must be contacted on every access.
2. Periodic checking.
A check is initiated at fixed intervals of time.
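The two client-initiated variants can be sketched as one cache whose check interval is configurable; interval 0 means "check before every access". This is a minimal illustration, not any real DFS protocol — the ServerStub interface and the per-file version numbers are assumptions made for the example:

```python
import time

class ServerStub:
    """Hypothetical server holding the master copy, with a version per file."""
    def __init__(self):
        self.files = {}  # name -> (version, data)

    def read(self, name):
        return self.files[name]          # (version, data)

    def version_of(self, name):
        return self.files[name][0]

class ValidatingCache:
    """Client cache that revalidates entries against the server.
    check_interval=0 checks before every access (variant 1);
    a positive value gives periodic checking (variant 2)."""
    def __init__(self, server, check_interval=0.0):
        self.server = server
        self.check_interval = check_interval
        self.entries = {}  # name -> (version, data, last_checked)

    def read(self, name):
        now = time.monotonic()
        entry = self.entries.get(name)
        if entry is not None:
            version, data, last_checked = entry
            if now - last_checked < self.check_interval:
                return data                          # within interval: trust the copy
            if self.server.version_of(name) == version:
                self.entries[name] = (version, data, now)
                return data                          # validated: still current
        # cache miss or stale entry: fetch the updated version from the server
        version, data = self.server.read(name)
        self.entries[name] = (version, data, now)
        return data
```

With check_interval=0 every read costs a round trip just to compare versions, which shows concretely why variant 1 defeats the purpose of caching.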
Server-initiated approach: The server monitors the file-usage modes being used by different clients and reacts whenever it detects a potential for inconsistency. E.g., if a file is open for reading, other clients may be allowed to open it for reading, but opening it for writing cannot be allowed. Similarly, a new client cannot open a file in any mode if the file is already open for writing.
When a client closes a file, it sends an intimation to the server along with any modifications made to the file. The server then updates its record of which client has which file open in which mode.
When a new client requests to open an already open file and the server finds that the new open mode conflicts with the existing open mode, the server can deny the request, queue the request, or disable caching by asking all clients having the file open to remove that file from their caches.
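The server's open-mode bookkeeping and conflict check above can be sketched as follows. This is an illustrative skeleton under assumed names (FileServer, open_table, the "DENIED"/"GRANTED" replies); the deny action is shown, with queueing and cache-disabling left as noted alternatives:

```python
class FileServer:
    """Hypothetical server recording which client has which file
    open in which mode, rejecting conflicting opens."""
    def __init__(self):
        self.open_table = {}  # name -> list of (client_id, mode)

    def open(self, client_id, name, mode):
        assert mode in ("read", "write")
        opens = self.open_table.setdefault(name, [])
        # A write open conflicts with any existing open; a read open
        # conflicts only with an existing write open.
        conflict = any(m == "write" or mode == "write" for _, m in opens)
        if conflict:
            return "DENIED"   # alternatively: queue the request, or disable caching
        opens.append((client_id, mode))
        return "GRANTED"

    def close(self, client_id, name, modified_data=None):
        # The close intimation carries any modifications made to the file.
        self.open_table[name] = [
            (c, m) for c, m in self.open_table.get(name, []) if c != client_id
        ]
        if modified_data is not None:
            pass  # apply the modifications to the master copy here
```

Concurrent readers are granted, a writer is denied while readers hold the file, and once the writer holds it no other mode is granted — matching the usage-mode rules described above.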
(c) Data-caching model: This model attempts to reduce the network traffic of the previous model by caching the data obtained from the server node. This takes advantage of the locality found in file accesses. A replacement policy such as LRU is used to keep the cache size bounded.
While this model reduces network traffic, it has to deal with the cache coherency problem during writes, because the local cached copy of the data needs to be updated, the original file at the server node needs to be updated, and copies in any other caches need to be updated.
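A minimal sketch of such a client-side cache, with LRU replacement and write-through so the master copy stays current (the BlockServer stub, block granularity, and method names are assumptions for illustration; invalidating other clients' caches is out of scope here):

```python
from collections import OrderedDict

class BlockServer:
    """Hypothetical server stub holding the master copy of each block."""
    def __init__(self):
        self.store = {}

    def read_block(self, block_id):
        return self.store[block_id]

    def write_block(self, block_id, data):
        self.store[block_id] = data

class LRUFileCache:
    """Client-side data cache with an LRU replacement policy to keep
    the cache size bounded; writes go through to the server."""
    def __init__(self, server, capacity=4):
        self.server = server
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> data, least recent first

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)    # mark most recently used
            return self.blocks[block_id]
        data = self.server.read_block(block_id)  # cache miss: fetch over the network
        self._insert(block_id, data)
        return data

    def write(self, block_id, data):
        self.server.write_block(block_id, data)  # write-through to the master copy
        self._insert(block_id, data)             # update the local cached copy

    def _insert(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used block
```

Repeated reads of a cached block cost no network traffic, which is the locality benefit; the write path shows the first two coherency obligations (local copy and master copy), while propagating to other caches would need a separate invalidation scheme as discussed above.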
(d) Diskless workstations: A distributed file system, with its transparent remote-file accessing capability, allows the use of diskless workstations in a system.
Also, having all file access requests processed by a single server and disallowing caching on client
nodes is not desirable in practice due to poor performance, poor scalability, and poor reliability of
the distributed file system.
Hence distributed file systems implement more relaxed semantics of file sharing. Applications that need to guarantee UNIX semantics should provide mechanisms (e.g., mutex locks) themselves and not rely on the underlying sharing semantics provided by the file system.
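As a minimal sketch of such an application-level mechanism, the function below serializes updates to a shared file with an advisory exclusive lock (fcntl.flock, Unix-only). The function name and the use of fsync to push the update out are illustrative choices; across nodes a real application would use a distributed lock service instead:

```python
import fcntl
import os

def update_shared_file(path, new_text):
    """Replace the contents of a shared file under an exclusive
    advisory lock, so concurrent writers are serialized."""
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)    # exclusive lock: one writer at a time
        try:
            f.seek(0)
            f.truncate()                 # discard the old contents
            f.write(new_text)
            f.flush()
            os.fsync(f.fileno())         # force the update out of local buffers
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

The point is that the mutual exclusion is arranged by the application itself; the file system underneath is free to provide only relaxed sharing semantics.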
In addition, replication control should be transparent, i.e., the number and locations of replicas of a replicated file should be hidden from the user. Thus replication control must be handled automatically in a user-transparent manner.
A replicated file is a file that has multiple copies, with each copy on a separate file server.
Advantages of Replication:
1. Increased availability: Alternate copies of replicated data can be used when the primary copy is unavailable.
2. Increased Reliability: Due to the presence of redundant data files in the system,
recovery from catastrophic failure (e.g. hard drive crash) becomes possible.
3. Improved response time: It enables data to be accessed either locally or from a node whose access time is lower than the access time to the primary copy.
4. Reduced network traffic: If a file's replica is available with a file server residing on the client's node, the client's access request can be serviced locally, resulting in reduced network traffic.
5. Improved system throughput: Several clients' requests for access to a file can be serviced in parallel by different servers, resulting in improved system throughput.
6. Better scalability: Multiple file servers are available to service client requests due to file replication. This improves scalability.