
Unit_6_Distributed File System

The document discusses the architecture and design issues of Distributed File Systems (DFS), focusing on naming, caching, writing policies, cache consistency, availability, scalability, and semantics. It outlines various strategies for name resolution, caching mechanisms, and replication to enhance file availability and consistency. The document emphasizes the importance of balancing performance, reliability, and system transparency in the design of DFS.

Uploaded by vkst2417
Copyright © All Rights Reserved

Unit 6: Distributed File System

Distributed File System and its goal:

Architecture of File Systems:

Name Server and Cache Manager:


Fig 1: Architecture of a distributed file system

Fig 2: Typical data access flow in a distributed file system


Design Issues of Distributed File System
1. Naming and Name Resolution:

A name refers to an object such as a file or a directory.


Name resolution refers to the process of mapping a name to an object, that is, to its physical storage.
A name space is a collection of names.
Names can be assigned to files in a distributed file system in three ways:

a) Concatenate the host name to the names of files that are stored on that host.
Advantages:

- The file name is unique systemwide.
- Name resolution is simple, as a file can be located easily.

Limitations:
- It conflicts with the goal of network transparency.
- Moving a file from one host to another requires changes to the file name and to the applications accessing that file; that is, the naming scheme is not location independent.

b) Mount remote directories onto local directories. Mounting a remote directory requires the host of the directory to be known only once. Once a remote directory is mounted, its files can be referenced in a location-transparent way. This approach resolves file names without consulting any host.

c) Maintain a single global directory, where all the files in the system belong to a single name space. The main limitation of this scheme is that it is limited to one computing facility or to a few cooperating computing facilities, so it is not generally used.
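Schemes (a) and (b) can be contrasted with the minimal Python sketch below. It assumes a hypothetical `host:/path` name format. The `resolve_host_prefixed` function and the `MountTable` class are illustrations only, not the interface of any real DFS. In scheme (a) the host is part of the name itself. In scheme (b) a local mount table maps a local directory to a remote host and directory once, after which application paths carry no host information.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def resolve_host_prefixed(name: str) -> tuple[str, str]:
    """Scheme (a): split a hypothetical 'host:/path' name into (host, path).

    The host is part of the name, so names are unique systemwide and no
    lookup is needed, but moving the file to another host changes its name.
    """
    host, sep, path = name.partition(":")
    if not sep or not host or not path.startswith("/"):
        raise ValueError(f"not a host-prefixed name: {name!r}")
    return host, path


@dataclass
class MountTable:
    """Scheme (b): client-side map from local mount points to (host, remote dir)."""
    mounts: dict[str, tuple[str, str]] = field(default_factory=dict)

    def mount(self, local_dir: str, host: str, remote_dir: str) -> None:
        # The remote host is named only here, once, at mount time.
        self.mounts[local_dir.rstrip("/")] = (host, remote_dir.rstrip("/"))

    def resolve(self, path: str) -> tuple[str, str]:
        # Pick the longest mount point that is a prefix of the path.
        matches = [m for m in self.mounts if path == m or path.startswith(m + "/")]
        if not matches:
            raise FileNotFoundError(path)
        best = max(matches, key=len)
        host, remote_dir = self.mounts[best]
        return host, remote_dir + path[len(best):]


# Scheme (a): the application must know the host.
print(resolve_host_prefixed("hostA:/usr/alice/notes.txt"))  # ('hostA', '/usr/alice/notes.txt')

# Scheme (b): applications use /home/... regardless of where the files live.
table = MountTable()
table.mount("/home", "hostA", "/export/home")
print(table.resolve("/home/alice/notes.txt"))  # ('hostA', '/export/home/alice/notes.txt')
table.mount("/home", "hostB", "/export/home")  # the files moved to hostB
print(table.resolve("/home/alice/notes.txt"))  # ('hostB', '/export/home/alice/notes.txt')
```

When the files move, only the mount entry changes, so the name used by applications stays the same. With scheme (a), every application would have to switch to a `hostB:` name.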

Name Server: A name server is responsible for name resolution in a distributed system. Generally, two approaches are used for maintaining name resolution information.
Way 1: Use a single name server; that is, all clients send their queries to a single server, which maps names to objects.

Its limitations are that if the name server fails, the entire system is affected, and that the name server becomes a bottleneck and degrades the performance of the system.
Way 2: Use several name servers (on different hosts), where each server is responsible for mapping objects stored in a different domain. This approach is generally used. Whenever a name is to be mapped to an object, the local name server is queried. The local name server may point to a remote server for further mapping of the name.

Example: resolving "a/b/c" may require the local server to map "a" and then a remote server to map the "/b/c" part of the file name. This procedure is repeated until the name is completely resolved.
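The iterative resolution of Way 2 can be sketched in Python as follows. The `NameServer` class and the `resolve` function are hypothetical. Each server holds a table whose entries are one of three things:

- an object identifier,
- a nested table for a directory that the server manages itself, or
- a referral to a remote server that maps the rest of the name.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NameServer:
    """Hypothetical name server for one domain.

    Each table entry is an object id (str), a nested dict for a directory this
    server manages itself, or another NameServer (a referral to a remote server).
    """
    domain: str
    entries: dict = field(default_factory=dict)


def resolve(local: NameServer, name: str) -> tuple[str, list[str]]:
    """Resolve a name such as 'a/b/c', starting at the local name server.

    Returns the object id and the domains of the servers that were consulted.
    """
    parts = [p for p in name.split("/") if p]
    node, consulted = local.entries, [local.domain]
    for i, component in enumerate(parts):
        target = node.get(component)
        if target is None:
            raise FileNotFoundError(f"{component!r} not found in {consulted[-1]!r}")
        if isinstance(target, NameServer):
            node = target.entries          # referral: the remote server maps the rest
            consulted.append(target.domain)
        elif isinstance(target, dict):
            node = target                  # directory handled by the same server
        elif i == len(parts) - 1:
            return target, consulted       # fully resolved to an object
        else:
            raise NotADirectoryError(component)
    raise ValueError(f"{name!r} does not name an object")


# 'a' is mapped by the local server, which refers '/b/c' to a remote server.
remote = NameServer("remote", {"b": {"c": "object-42"}})
local = NameServer("local", {"a": remote})
print(resolve(local, "a/b/c"))  # ('object-42', ['local', 'remote'])
```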

2. Caches on Disk or Main Memory:


Caching refers to the storage of data, either in main memory or on disk, after its first reference by a client machine.

Advantages of having the cache in main memory:

- Diskless workstations can also take advantage of caching.
- Accessing a cache in main memory is much faster than accessing a cache on a local disk.
- Since the server cache is in main memory at the server, a single caching mechanism design can be used for both clients and servers.

Limitations:
- Large files cannot be cached completely, so caching must be done block by block (block-oriented caching), which is more complex.
- The cache competes with the virtual memory system for physical memory space, so a scheme to deal with memory contention between the cache and the virtual memory system is necessary. Thus, a more complex cache manager and memory management are required.

Advantages of having the cache on a local disk:

- Large files can be cached without affecting performance.
- Virtual memory management is simple.
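Block-oriented caching in main memory can be sketched in Python as follows. The sketch makes these assumptions:

- a fixed 4 KB block size;
- least-recently-used (LRU) replacement, which is one common choice that the text does not prescribe;
- a hypothetical `fetch_block` callback that stands in for a request to the file server.

The `capacity_blocks` limit bounds how much physical memory the cache takes from the virtual memory system.

```python
from collections import OrderedDict
from typing import Callable

BLOCK_SIZE = 4096  # assumed block size in bytes


class BlockCache:
    """Main-memory client cache that stores files block by block with LRU eviction."""

    def __init__(self, capacity_blocks: int, fetch_block: Callable[[str, int], bytes]):
        self.capacity = capacity_blocks    # bounds memory taken from virtual memory
        self.fetch_block = fetch_block     # hypothetical call to the file server
        self.blocks = OrderedDict()        # (file_id, block_no) -> bytes, LRU first
        self.hits = 0
        self.misses = 0

    def read_block(self, file_id: str, block_no: int) -> bytes:
        key = (file_id, block_no)
        if key in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(key)   # now the most recently used block
            return self.blocks[key]
        self.misses += 1
        data = self.fetch_block(file_id, block_no)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

    def read(self, file_id: str, offset: int, length: int) -> bytes:
        """Read a byte range, touching only the blocks that cover it."""
        out = bytearray()
        end = offset + length
        for block_no in range(offset // BLOCK_SIZE, (end - 1) // BLOCK_SIZE + 1):
            block = self.read_block(file_id, block_no)
            base = block_no * BLOCK_SIZE
            out += block[max(offset - base, 0):min(end - base, BLOCK_SIZE)]
        return bytes(out)


# Hypothetical server-side file of four blocks; the cache holds only two.
server_file = bytes(range(256)) * 64


def fetch(file_id: str, block_no: int) -> bytes:
    return server_file[block_no * BLOCK_SIZE:(block_no + 1) * BLOCK_SIZE]


cache = BlockCache(capacity_blocks=2, fetch_block=fetch)
print(cache.read("big.dat", 4090, 10) == server_file[4090:4100])  # True (spans blocks 0 and 1)
print(cache.hits, cache.misses)  # 0 2
```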

3. Writing Policy:

This policy decides when a modified cache block at a client should be transferred to the server.

The following policies are used:

a) Write-Through Policy: All writes requested by client applications are also carried out at the server immediately. The main advantage is reliability: when a client crashes, little information is lost. However, this scheme cannot take advantage of caching for write operations, since every write still goes to the server.

b) Delayed Writing Policy: This policy delays writing at the server; that is, modifications due to a write are reflected at the server after some delay. This scheme can take advantage of caching. The main limitation is lower reliability: when a client crashes, a large amount of information can be lost.

c) Write-on-Close Policy: This scheme delays the updating of files at the server until the file is closed at the client. When the average period for which files are open is short, this policy is equivalent to write-through; when the period is long, it is equivalent to the delayed writing policy.
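The three policies differ only in when dirty blocks leave the client. The Python sketch below is a hypothetical model: the `ClientFile` class, the logical `tick` clock and the `delay_ticks` flush interval are all assumptions. It counts block transfers to the server and the number of modified blocks lost if the client crashes.

```python
from __future__ import annotations

from enum import Enum


class Policy(Enum):
    WRITE_THROUGH = "write-through"
    DELAYED = "delayed writing"
    WRITE_ON_CLOSE = "write-on-close"


class ClientFile:
    """Hypothetical cached file at a client; 'server' stands in for the server copy."""

    def __init__(self, server: dict[int, bytes], policy: Policy, delay_ticks: int = 3):
        self.server = server
        self.policy = policy
        self.delay_ticks = delay_ticks    # assumed flush interval for delayed writing
        self.cache: dict[int, bytes] = {}
        self.dirty: set[int] = set()
        self.ticks = 0
        self.server_writes = 0            # block transfers from client to server

    def _flush(self) -> None:
        for block_no in sorted(self.dirty):
            self.server[block_no] = self.cache[block_no]
            self.server_writes += 1
        self.dirty.clear()

    def write(self, block_no: int, data: bytes) -> None:
        self.cache[block_no] = data
        self.dirty.add(block_no)
        if self.policy is Policy.WRITE_THROUGH:
            self._flush()                 # carried out at the server immediately

    def tick(self) -> None:
        self.ticks += 1                   # logical clock
        if self.policy is Policy.DELAYED and self.ticks % self.delay_ticks == 0:
            self._flush()                 # modifications reach the server after a delay

    def close(self) -> None:
        if self.policy is Policy.WRITE_ON_CLOSE:
            self._flush()                 # server updated only when the file is closed

    def crash(self) -> int:
        """Simulate a client crash and return the number of modified blocks lost."""
        lost = len(self.dirty)
        self.cache.clear()
        self.dirty.clear()
        return lost


for policy in Policy:
    f = ClientFile({}, policy)
    for i in range(5):                    # five updates to the same block
        f.write(0, bytes([i]))
        f.tick()
    print(f"{policy.value}: {f.server_writes} transfers, {f.crash()} block(s) lost")
```

In this run, the client writes the same block five times and then crashes before closing the file:

- Write-through makes five transfers and loses nothing.
- Delayed writing (flushing every third tick) makes one transfer and loses the unflushed block.
- Write-on-close makes no transfer, because the file was never closed.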

4. Cache Consistency:
When multiple clients want to access or modify the same data, the cache consistency problem arises. Two schemes are used to guarantee that data returned to the client is valid.

a) Server-initiated approach: The server informs the cache managers whenever data in client caches becomes stale. The cache managers at the clients then retrieve the new data or invalidate the blocks containing old data in their caches. The server has to maintain reliable records of which data blocks are cached by which cache managers, so cooperation between the servers and the cache managers is required.

b) Client-initiated approach: It is the responsibility of the cache manager at the client to validate data with the server before returning it to the client. This approach negates much of the benefit of caching, as the cache manager consults the server to validate a cached block on each access.
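Both approaches can be compared in one minimal Python sketch. The sketch rests on three assumptions:

- The `FileServer` and `CacheManager` classes are hypothetical.
- Blocks carry integer version numbers.
- Invalidation and validation are direct method calls standing in for messages.

In server-initiated mode the server records who caches each block and invalidates the stale copies on a write. In client-initiated mode the cache manager asks the server for the current version on every read.

```python
from __future__ import annotations


class FileServer:
    """Hypothetical server storing versioned blocks."""

    def __init__(self):
        self.data: dict[str, bytes] = {}
        self.version: dict[str, int] = {}
        self.cached_by: dict[str, set] = {}   # server-initiated: block -> cache managers
        self.validation_requests = 0          # client-initiated: validation round trips

    def read(self, block, manager=None):
        if manager is not None:               # record who caches the block
            self.cached_by.setdefault(block, set()).add(manager)
        return self.data[block], self.version[block]

    def write(self, block, value, writer=None):
        self.data[block] = value
        self.version[block] = self.version.get(block, 0) + 1
        holders = self.cached_by.setdefault(block, set())
        for manager in list(holders):
            if manager is not writer:
                manager.invalidate(block)     # their copies are now stale
                holders.discard(manager)
        if writer is not None:
            holders.add(writer)

    def current_version(self, block):
        self.validation_requests += 1
        return self.version[block]


class CacheManager:
    def __init__(self, server: FileServer, server_initiated: bool):
        self.server = server
        self.server_initiated = server_initiated
        self.cache: dict[str, tuple[bytes, int]] = {}   # block -> (value, version)

    def invalidate(self, block):
        self.cache.pop(block, None)

    def read(self, block):
        entry = self.cache.get(block)
        if entry is not None and not self.server_initiated:
            # Client-initiated: check with the server before every use of the block.
            if self.server.current_version(block) != entry[1]:
                entry = None
        if entry is None:
            me = self if self.server_initiated else None
            value, version = self.server.read(block, me)
            entry = self.cache[block] = (value, version)
        return entry[0]

    def write(self, block, value):
        me = self if self.server_initiated else None
        self.server.write(block, value, writer=me)
        self.cache[block] = (value, self.server.version[block])


srv = FileServer()
srv.write("b1", b"v1")
a, b = CacheManager(srv, True), CacheManager(srv, True)
a.read("b1")
b.read("b1")
b.write("b1", b"v2")                   # the server invalidates a's copy
print("b1" in a.cache, a.read("b1"))   # False b'v2'
```

The trade-off from the text shows up in the counters. The server-initiated mode needs no validation round trips but keeps per-block state at the server. The client-initiated mode keeps no such state but contacts the server on every cached read.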

5. Availability:
Availability is one of the important issues in the design of a distributed file system.

Server failures or communication network failures can affect the availability of files.

Replication: The primary mechanism used for enhancing the availability of files is replication. In this mechanism, many copies, or replicas, of files are maintained at different servers.

Limitations:
- Extra storage space is required to store the replicas.
- Extra overhead is required to keep all replicas up to date.

The following situations cause inconsistency among replicas:

1. A replica is not updated due to the failure of the server storing that replica.
2. Not all file servers storing the replicas of a file are reachable from all clients, due to a network partition, and the replicas of the file in different partitions are updated differently.
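One way to detect both situations is for each replica to record which updates it has seen. The text does not say how inconsistency is detected, so the Python sketch below uses a simplified version-vector scheme as an assumption. The servers `S1` to `S3` are hypothetical.

- If one replica's vector is dominated by another's, that replica missed updates, as in situation 1.
- If the two vectors are incomparable, the replicas were updated differently in different partitions, as in situation 2.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SERVERS = ("S1", "S2", "S3")   # hypothetical servers holding replicas of one file


@dataclass
class Replica:
    server: str
    data: bytes = b""
    # vv[s]: number of updates this replica knows were applied at server s
    vv: dict[str, int] = field(default_factory=lambda: {s: 0 for s in SERVERS})


def update(replicas: list[Replica], reachable: set[str], data: bytes) -> None:
    """Apply an update to the replicas the client can reach; the rest miss it."""
    targets = [r for r in replicas if r.server in reachable]
    for r in targets:
        r.data = data
        for t in targets:
            r.vv[t.server] += 1


def compare(a: Replica, b: Replica) -> str:
    a_ge = all(a.vv[s] >= b.vv[s] for s in SERVERS)
    b_ge = all(b.vv[s] >= a.vv[s] for s in SERVERS)
    if a_ge and b_ge:
        return "consistent"
    if a_ge:
        return f"{b.server} is stale"
    if b_ge:
        return f"{a.server} is stale"
    return "conflict"           # updated differently in different partitions


# Situation 1: S3 has failed while the file is updated.
reps = [Replica(s) for s in SERVERS]
update(reps, {"S1", "S2"}, b"v1")
print(compare(reps[0], reps[2]))   # S3 is stale

# Situation 2: partition {S1, S2} | {S3}, each side updated separately.
reps = [Replica(s) for s in SERVERS]
update(reps, {"S1", "S2"}, b"x")
update(reps, {"S3"}, b"y")
print(compare(reps[0], reps[2]))   # conflict
```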
Unit of Replication: The main design issue in replication is the unit of replication. The following units can be used for replication of data:
a) The most basic unit is a file: This unit allows replication of only those files that need higher availability. However, it results in expensive replica management. Protection rights associated with a directory have to be stored individually with each replica. Replicas of files in a common directory may not have common file servers, so extra name resolution is required to locate each replica when a file or directory is modified.
b) A group of files, called a volume, can be used: A volume may represent the files of a single user or the files that are in a server. This scheme is used in the Coda file system. In this scheme, replica management is simple, as protection rights can be associated with the volume instead of with each individual file replica. However, volume replication can be wasteful, as a user needs higher availability for only a few files in the volume.
c) A combination of volume and single-file replication can be used: All the files of a user form a file group called a primary pack. A replica of a primary pack, called a pack, is allowed to contain a subset of the files in the primary pack. Corresponding to a primary pack, one or more packs can be created according to requirements.
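The primary pack and pack structure of scheme (c) can be modelled in a few lines of Python. This is a hypothetical data-structure sketch:

- The `Pack` and `PrimaryPack` classes, the server names and the file names are invented for illustration.
- The primary pack is assumed to reside on a single server.

The sketch enforces the rule that a pack may hold only a subset of its primary pack's files. It also reports which servers hold a copy of a given file.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pack:
    server: str
    files: frozenset[str]


class PrimaryPack:
    """Hypothetical model: all of one user's files plus partial replicas (packs)."""

    def __init__(self, owner: str, server: str, files: set[str]):
        self.owner = owner
        self.primary = Pack(server, frozenset(files))
        self.packs: list[Pack] = []

    def add_pack(self, server: str, files: set[str]) -> None:
        files = frozenset(files)
        if not files <= self.primary.files:
            raise ValueError("a pack may contain only files of its primary pack")
        self.packs.append(Pack(server, files))

    def servers_for(self, name: str) -> list[str]:
        """Servers holding a copy of the file: the primary plus every pack with it."""
        if name not in self.primary.files:
            raise FileNotFoundError(name)
        return [self.primary.server] + [p.server for p in self.packs if name in p.files]


alice = PrimaryPack("alice", "S1", {"thesis.tex", "notes.txt", "old.log"})
alice.add_pack("S2", {"thesis.tex"})              # replicate only what needs availability
alice.add_pack("S3", {"thesis.tex", "notes.txt"})
print(alice.servers_for("thesis.tex"))  # ['S1', 'S2', 'S3']
print(alice.servers_for("old.log"))     # ['S1']
```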
6. Scalability:
The design of a DFS should be such that new systems can be easily introduced without affecting it. Generally, a client-server organization is used to structure a DFS. Caching is used in this organization to improve performance, and server-initiated cache invalidation is used to maintain cache consistency. In this approach, the server maintains a record of all the clients sharing each file stored on it. This information represents the server state. As the system grows, both the size of the server state and the load on the server due to invalidations increase. The following schemes can be used to reduce server state and server load:
a) Exploit knowledge about the usage of files: it is found that the most commonly used and shared files are accessed in read-only mode. So there is no need to check the validity of these files or to maintain the list of clients at the servers for validation purposes.
b) Often, data required by a client is found in another client's cache, so a client can obtain the required data from another client rather than from the server.
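Both load-reduction schemes appear in the minimal Python sketch below. The `Server` and `Client` classes and the file names are hypothetical. The server skips the sharer record for files it knows are read-only, following scheme (a). A client first looks in its peers' caches, following scheme (b). Peer fetching is restricted here to read-only files so that the server's sharer records stay complete for writable files; that restriction is an assumption of this sketch, not something the text prescribes.

```python
from __future__ import annotations


class Server:
    """Hypothetical file server that records sharers only for writable files."""

    def __init__(self, files: dict[str, bytes], read_only: set[str]):
        self.files = files
        self.read_only = set(read_only)
        self.sharers: dict[str, set[str]] = {}   # server state used for invalidation
        self.requests = 0

    def fetch(self, client_id: str, name: str) -> bytes:
        self.requests += 1
        if name not in self.read_only:           # (a) no record needed for read-only files
            self.sharers.setdefault(name, set()).add(client_id)
        return self.files[name]


class Client:
    def __init__(self, client_id: str, server: Server, peers: list[Client]):
        self.id = client_id
        self.server = server
        self.peers = peers                       # other clients whose caches may help
        self.cache: dict[str, bytes] = {}

    def read(self, name: str) -> bytes:
        if name in self.cache:
            return self.cache[name]
        data = None
        # (b) try another client's cache first; restricted here to read-only files
        # so the server's sharer records stay complete (an assumption of this sketch).
        if name in self.server.read_only:
            for peer in self.peers:
                if peer is not self and name in peer.cache:
                    data = peer.cache[name]
                    break
        if data is None:
            data = self.server.fetch(self.id, name)
        self.cache[name] = data
        return data


srv = Server({"/bin/ls": b"ls-binary", "report.txt": b"draft"}, read_only={"/bin/ls"})
clients: list[Client] = []
c1, c2 = Client("c1", srv, clients), Client("c2", srv, clients)
clients.extend([c1, c2])
for c in (c1, c2):
    c.read("/bin/ls")
    c.read("report.txt")
print(srv.requests, sorted(srv.sharers["report.txt"]))  # 3 ['c1', 'c2']
```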
The structure of the server process plays an important role. If a server is designed with a single process, then many clients have to wait for a long time whenever a disk input/output operation is initiated. This can be avoided if a separate process is assigned to each client.
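The effect of the server structure can be sketched in Python. Threads stand in for the separate per-client processes of the text, which works here because a sleeping thread does not block the others. `time.sleep` stands in for a blocking disk read. The function names are hypothetical. The sketch records the order in which requests complete.

```python
import threading
import time


def handle_request(client, needs_disk_io, done, lock):
    """Serve one request; a cache miss blocks on (simulated) disk I/O."""
    if needs_disk_io:
        time.sleep(0.2)                     # stand-in for a slow disk read
    with lock:
        done.append(client)                 # record completion order


def serve(requests, thread_per_client):
    done, lock = [], threading.Lock()
    if not thread_per_client:
        for client, io in requests:         # single process: one request at a time
            handle_request(client, io, done, lock)
        return done
    threads = [threading.Thread(target=handle_request, args=(client, io, done, lock))
               for client, io in requests]  # a separate thread for each client
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


requests = [("A", True), ("B", False)]      # A misses in the cache, B does not
print(serve(requests, thread_per_client=False))  # ['A', 'B']: B waits for A's disk I/O
print(serve(requests, thread_per_client=True))   # B typically finishes first
```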
7. Semantics:
The semantics of a file system characterize the effects of accesses on files. The basic semantics is that a read operation returns the data stored by the latest write operation. These semantics can be guaranteed in two ways: (1) all reads and writes from the various clients go through the server, or (2) sharing is disallowed, either by the server or by the use of locks by applications. In the first way, the server becomes a bottleneck; in the second way, the file is not available to certain clients.
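Both ways of guaranteeing the basic semantics can be sketched in Python. The `SerializingServer` and `ExclusiveServer` classes are hypothetical illustrations, not a real file-system API.

- The first class funnels every operation through one lock, so a read always sees the latest completed write. Every operation also lands on that one server.
- The second class refuses to open a file for a client while another client holds it. This preserves the semantics by making the file unavailable to the second client.

```python
import threading


class SerializingServer:
    """Way 1: every read and write goes through the server, one at a time."""

    def __init__(self):
        self._files = {}
        self._lock = threading.Lock()
        self.operations = 0                 # all client traffic lands here

    def write(self, name, data):
        with self._lock:
            self.operations += 1
            self._files[name] = data

    def read(self, name):
        with self._lock:                    # a read sees the latest completed write
            self.operations += 1
            return self._files[name]


class ExclusiveServer:
    """Way 2: sharing is disallowed; a file is open for at most one client."""

    def __init__(self):
        self._holder = {}

    def open(self, client, name):
        if self._holder.get(name, client) != client:
            return False                    # file unavailable to this client
        self._holder[name] = client
        return True

    def close(self, client, name):
        if self._holder.get(name) == client:
            del self._holder[name]


s = SerializingServer()
s.write("f", b"v1")
s.write("f", b"v2")
print(s.read("f"), s.operations)            # b'v2' 3

x = ExclusiveServer()
print(x.open("c1", "f"), x.open("c2", "f"))  # True False
x.close("c1", "f")
print(x.open("c2", "f"))                     # True
```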
