Unit_6_Distributed File System
a) Concatenate the host name to the names of files that are stored on that
host.
Advantages:
Limitations:
It conflicts with the goal of network transparency.
Moving a file from one host to another requires changing the file name and the
applications that access the file; that is, the naming scheme is not location
independent.
c) Maintain a single global directory in which all the files in the system belong to a single
namespace. The main limitation of this scheme is that it is limited to one computing
facility or to a few cooperating computing facilities. This scheme is not generally
used.
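The limitation of scheme (a) can be illustrated with a minimal sketch. The "host:path" separator and the host names below are assumptions made for illustration only; the notes do not prescribe a concrete syntax.

```python
def make_name(host: str, local_path: str) -> str:
    """Build a system-wide file name by prefixing the local path with the host name."""
    return f"{host}:{local_path}"


def split_name(name: str) -> tuple:
    """Recover (host, local_path) from a name built by make_name."""
    host, _, local_path = name.partition(":")
    return host, local_path


# The same file before and after it is moved from hostA to hostB (hypothetical hosts).
old_name = make_name("hostA", "/home/user/report.txt")
new_name = make_name("hostB", "/home/user/report.txt")

# The name changes when the file moves, so every application that stored
# old_name must be updated: the scheme is not location independent.
name_changed = old_name != new_name
```

Because the host is part of the name, any client can locate the file without consulting a directory service, but a moved file can no longer be reached under its old name.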
The limitation of relying on a single name server is that if the name server fails, the
entire system is affected, and the name server becomes a bottleneck that degrades the
performance of the system.
Way 2: Use several name servers (on different hosts), where each server is
responsible for mapping objects stored in different domains. This approach is
generally used. Whenever a name is to be mapped to an object, the local name
server is queried. The local name server may point to a remote server for further
mapping of the name.
Example: Resolving "a/b/c" may require a remote server to map the "/b/c" part of the
file name. This procedure is repeated until the name is completely resolved.
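The repeated referral from one name server to the next can be sketched as follows. This is a simplified model, not a real protocol: the NameServer class, the server names, and the object identifier "file-42" are all hypothetical, and each server is assumed to resolve exactly one path component.

```python
class NameServer:
    """A name server that maps one path component to an object or to another server."""

    def __init__(self, name):
        self.name = name
        self.table = {}  # component -> object id (str) or NameServer (referral)

    def bind(self, component, target):
        self.table[component] = target


def resolve(local_server, path):
    """Resolve path starting at the local name server, following referrals.

    Returns (object_id, list of server names consulted).
    """
    server = local_server
    consulted = [server.name]
    components = [c for c in path.split("/") if c]
    for index, component in enumerate(components):
        target = server.table[component]  # raises KeyError if the name is unbound
        if isinstance(target, NameServer):
            # Referral: the rest of the name is mapped by another server.
            server = target
            consulted.append(server.name)
        elif index == len(components) - 1:
            return target, consulted
        else:
            raise ValueError(f"{component!r} is an object, not a domain")
    raise ValueError("path names a domain, not an object")


# Hypothetical setup: the local server only knows who is responsible for "a".
local = NameServer("local")
remote1 = NameServer("remote1")
remote2 = NameServer("remote2")
local.bind("a", remote1)
remote1.bind("b", remote2)
remote2.bind("c", "file-42")

object_id, consulted = resolve(local, "a/b/c")
```

In this setup the local server resolves only "a", and the remaining "/b/c" part is mapped by remote servers, one component per referral.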
3. Writing Policy:
This policy decides when a modified cache block at the client should be transferred to the
server.
b) Delayed Writing Policy: It delays writing at the server; that is, modifications due to a
write are reflected at the server after some delay. This scheme can take advantage of caching.
The main limitation is lower reliability: when a client crashes, a large amount of
information can be lost.
c) This scheme delays the updating of files at the server until the file is closed at the
client. When the average period for which files are open is short, this policy is equivalent
to write-through, and when the period is long, it is equivalent to the delayed writing
policy.
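The difference between the writing policies can be sketched by counting how many writes reach the server. The class names, the policy strings, and the tick() method (standing in for a periodic timer) are assumptions for illustration; a real client cache would also handle reads, eviction, and failures.

```python
class Server:
    """Stores blocks and counts how many writes it receives."""

    def __init__(self):
        self.blocks = {}
        self.writes_received = 0

    def store(self, block_id, data):
        self.blocks[block_id] = data
        self.writes_received += 1


class ClientCache:
    """Client-side cache with a configurable writing policy."""

    POLICIES = ("write_through", "delayed", "write_on_close")

    def __init__(self, server, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.server = server
        self.policy = policy
        self.cache = {}
        self.dirty = set()  # blocks modified locally but not yet sent to the server

    def write(self, block_id, data):
        self.cache[block_id] = data
        if self.policy == "write_through":
            self.server.store(block_id, data)  # every write goes to the server at once
        else:
            self.dirty.add(block_id)

    def _flush(self):
        for block_id in sorted(self.dirty):
            self.server.store(block_id, self.cache[block_id])
        self.dirty.clear()

    def tick(self):
        """Periodic timer event: the delayed policy writes back dirty blocks here."""
        if self.policy == "delayed":
            self._flush()

    def close(self):
        """File close: the write-on-close policy writes back dirty blocks here."""
        if self.policy == "write_on_close":
            self._flush()


# Two writes to the same block under the delayed policy reach the server as one write.
server = Server()
client = ClientCache(server, "delayed")
client.write(1, "v1")
client.write(1, "v2")
writes_before_tick = server.writes_received
client.tick()
writes_after_tick = server.writes_received
```

If the client crashed before tick() was called, both modifications would be lost, which is the reliability cost described above.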
4. Cache Consistency:
When multiple clients want to modify or access the same data, the cache consistency
problem arises. Two schemes are used to guarantee that the data returned to the client is
valid.
a) Server-initiated approach: The server informs the cache managers whenever data in client
caches becomes stale. The cache managers at the clients then retrieve the new data or invalidate
the blocks containing the old data in their caches. The server has to maintain reliable records of
which data blocks are cached by which cache managers. Cooperation between servers and
cache managers is required.
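The record keeping required by the server-initiated approach can be sketched as follows. This is a single-process model under stated assumptions: the FileServer and CacheManager classes are hypothetical, invalidation is a direct method call rather than a network message, and stale blocks are invalidated rather than refreshed.

```python
class CacheManager:
    """Client-side cache manager holding copies of data blocks."""

    def __init__(self, name):
        self.name = name
        self.cache = {}

    def invalidate(self, block_id):
        # Drop the stale copy; the next read must go back to the server.
        self.cache.pop(block_id, None)


class FileServer:
    """Server that records which cache managers hold which blocks."""

    def __init__(self):
        self.blocks = {}
        self.cached_by = {}  # block_id -> set of CacheManager objects

    def read(self, block_id, manager):
        data = self.blocks[block_id]
        self.cached_by.setdefault(block_id, set()).add(manager)
        manager.cache[block_id] = data
        return data

    def write(self, block_id, data, writer):
        self.blocks[block_id] = data
        # Notify every other cache manager that its copy is now stale.
        for manager in self.cached_by.get(block_id, set()):
            if manager is not writer:
                manager.invalidate(block_id)
        # After the write, only the writer holds a valid copy.
        self.cached_by[block_id] = {writer}
        writer.cache[block_id] = data


file_server = FileServer()
file_server.blocks[7] = "old"
client_a = CacheManager("A")
client_b = CacheManager("B")
file_server.read(7, client_a)
file_server.read(7, client_b)
file_server.write(7, "new", client_a)  # client B's copy of block 7 is invalidated
```

The cached_by table is the state the server must keep reliably; if it loses track of a cache manager, that client may keep serving stale data.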
5. Availability:
It is one of the important issues in the design of a distributed file system.
Limitations:
1. Extra storage space is required to store replicas.
2. The file servers storing the replicas of a file may not all be reachable from all clients because
of a network partition, and replicas of a file in different partitions may be updated differently.
Unit of Replication: The main design issue in replication is the unit of replication. The
following units can be used for replication of data:
a) The most basic unit is the file: This unit allows the replication of only those files that need to
have higher availability. This unit results in expensive replica management. Protection rights
associated with a directory have to be stored individually with each replica. Replicas of files in a
common directory may not have common file servers, so extra name resolution is required to locate
each replica when a file or directory is modified.
b) A group of files, called a volume, can be used: A volume may represent the files of a single
user or the files that are on a server. This scheme is used in the Coda file system. In this scheme,
replica management is simple; that is, protection rights can be associated with the volume instead of
with each individual file replica. Volume replication can be wasteful, as a user typically needs higher
availability for only a few files in the volume.
c) A combination of volume and single-file replication can be used: All the files of a user
form a file group called the primary pack. A replica of the primary pack, called a pack, is allowed to
contain a subset of the files in the primary pack. Corresponding to a primary pack, one or more
packs can be created according to requirements.
6. Scalability:
The design of a DFS should be such that new systems can be easily introduced without
affecting it. Generally, a client-server organisation is used to define the DFS structure. Caching is
used in this organisation to improve performance. Server-initiated cache invalidation is used
to maintain cache consistency. In this approach, the server maintains a record of
all the clients sharing the files stored on it. This information represents the server state. As
the system grows, both the size of the server state and the load due to invalidations increase on the server.
The following schemes can be used to reduce server state and server load:
a) Exploit knowledge about the usage of files: it is found that the most commonly used and
shared files are accessed in read-only mode. So, there is no need to check the validity of these
files or to maintain the list of clients at servers for validation purposes.
b) Often, the data required by a client is found in another client's cache, so a client can obtain
the required data from another client rather than from the server.
The structure of the server process plays an important role. If the server is designed with a single process,
then many clients have to wait for a long time whenever a disk input/output operation is initiated. This
can be avoided if a separate process is assigned to each client.
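The effect of the server structure can be sketched by comparing a server that handles requests one at a time with one that gives each request its own worker. This sketch uses threads as a stand-in for the separate processes mentioned above, and time.sleep as a stand-in for blocking disk input/output; both substitutions, and all function names, are assumptions made to keep the example short.

```python
import threading
import time


def disk_read(block_id, delay=0.01):
    """Simulate a blocking disk read (time.sleep stands in for real disk I/O)."""
    time.sleep(delay)
    return f"data-{block_id}"


def serve_sequentially(requests):
    """Single-process server: each client waits for all earlier disk reads to finish."""
    return [disk_read(block_id) for block_id in requests]


def serve_concurrently(requests):
    """One worker per client request: disk waits overlap instead of adding up."""
    results = [None] * len(requests)

    def worker(index, block_id):
        results[index] = disk_read(block_id)

    threads = [
        threading.Thread(target=worker, args=(index, block_id))
        for index, block_id in enumerate(requests)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


sequential_results = serve_sequentially([1, 2, 3])
concurrent_results = serve_concurrently([1, 2, 3])
```

Both servers return the same data; the difference is that in the sequential version the total waiting time grows with the number of pending requests, while in the concurrent version the simulated disk waits overlap.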
7. Semantics:
The semantics of a file system characterise the effects of accesses on files. The basic semantic is
that a read operation returns the data stored by the latest write operation. This semantic
can be guaranteed in two ways: either all reads and writes from the various clients go
through the server, or sharing is disallowed, either by the server or by the use of locks
by applications. In the first way, the server becomes a bottleneck, and in the second way, the file is not
available to certain clients.