
DISTRIBUTED FILE SYSTEM

(DFS)

PRESENTING TO: DR. SHOIAB


GROUP MEMBERS

Abu Zer Ghaffari (2021-MSCS-39)


Amna Awais (2021-MSCS-08)
Faiza Butt (2021-MSCS-41)
Fatima Khan (2021-MSCS-35)
Sameen Khalid (2021-MSCS-46)
This presentation will discuss:

• Basic principles of currently available distributed file system architectures
• The two main architectural models of DFS
• The challenges faced by these models
• Naming and transparency
• Remote file access
• Caching and cache consistency
What Is a File System?
A file system is a method of organizing and storing data that serves as the medium through which programs receive input and produce output.

More broadly, a file system manages operations such as storage management, file naming, directories/folders, metadata, and access rules and privileges. Commonly used file systems include File Allocation Table 32 (FAT32), New Technology File System (NTFS), and Hierarchical File System (HFS).
File system taxonomy
Distributed File System (DFS)
A DFS allows data to be shared remotely. It improves file availability, access time, and network efficiency; it also makes it easier to scale the amount of stored data and improves the ability to exchange data.

A DFS provides transparency of data even if a server or disk fails.
Requirements of a DFS
• Transparency—access data regardless of its location or nature.
• Concurrency—allow multiple users to perform multiple tasks at the same time.
• Replication—store data in multiple locations for easy availability.
• Heterogeneity—operate across different hardware and operating systems.
• Fault tolerance—continue operating when a component fails.
• Consistency—keep copies of the same data identical across locations.
• Security—protect data from unauthorized users.
• Efficiency—support many operations on data, such as storage, access, filtering, and sharing.
Distributed File System (Cont.)

• Two widely used architectural models are the client-server model and the cluster-based model
Client-Server DFS Model
• Server(s) store both files and metadata on attached storage
• Clients contact the server to request files
• Server is responsible for authentication, checking file permissions, and delivering the file
• Changes a client makes to a file must be propagated back to the server
• Popular examples include NFS and OpenAFS
• Challenges: scalability and bandwidth
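The client-server flow above can be sketched in a few lines. This is an illustrative toy, not the NFS or OpenAFS protocol; all class and method names are invented for the example.

```python
# Minimal sketch of the client-server DFS model: the server holds the
# files, checks permissions, and delivers data; clients request files
# and push their changes back.

class FileServer:
    def __init__(self):
        self.files = {}        # path -> file contents (master copies)
        self.permissions = {}  # path -> set of users allowed access

    def create(self, path, data, allowed_users):
        self.files[path] = data
        self.permissions[path] = set(allowed_users)

    def read(self, user, path):
        # Server authenticates the user and checks file permissions
        # before delivering the file.
        if user not in self.permissions.get(path, set()):
            raise PermissionError(f"{user} may not read {path}")
        return self.files[path]

    def write(self, user, path, data):
        # Changes a client makes must be propagated back to the server.
        if user not in self.permissions.get(path, set()):
            raise PermissionError(f"{user} may not write {path}")
        self.files[path] = data


server = FileServer()
server.create("/docs/report.txt", "v1", allowed_users={"alice"})
print(server.read("alice", "/docs/report.txt"))  # v1
server.write("alice", "/docs/report.txt", "v2")
print(server.read("alice", "/docs/report.txt"))  # v2
```

Note how every read and write goes through the single server, which is exactly where the scalability and bandwidth challenges come from.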
Client-Server DFS Model (Cont.)
Cluster-based DFS Model

• Built to be more fault-tolerant and scalable than client-server DFS
• Examples include the Google File System (GFS) and the Hadoop Distributed File System (HDFS)
• Clients connect to a master metadata server and several data servers that hold “chunks” (portions) of files
• The metadata server keeps a mapping of which data servers hold chunks of which files, as well as a hierarchical mapping of directories and files
• File chunks are replicated n times
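The chunk-mapping role of the metadata server can be sketched as follows. The chunk size, replication factor, and round-robin placement here are simplifying assumptions for illustration, not the actual GFS or HDFS policies.

```python
# Toy metadata server for a cluster-based DFS: it records which data
# servers hold which chunks of each file, and clients ask it where to
# fetch a chunk before contacting a data server directly.

CHUNK_SIZE = 4    # bytes per chunk (tiny, for illustration)
REPLICATION = 3   # each chunk replicated n = 3 times

class MetadataServer:
    def __init__(self, data_servers):
        self.data_servers = data_servers   # names of the data servers
        self.chunk_map = {}                # (path, chunk_index) -> [servers]

    def place_file(self, path, size):
        """Split a file into chunks and record replica placement."""
        n_chunks = -(-size // CHUNK_SIZE)  # ceiling division
        for i in range(n_chunks):
            # Round-robin placement of replicas across data servers.
            start = (i * REPLICATION) % len(self.data_servers)
            replicas = [self.data_servers[(start + r) % len(self.data_servers)]
                        for r in range(REPLICATION)]
            self.chunk_map[(path, i)] = replicas

    def locate(self, path, chunk_index):
        """Return the data servers holding a given chunk."""
        return self.chunk_map[(path, chunk_index)]


meta = MetadataServer(["ds1", "ds2", "ds3", "ds4"])
meta.place_file("/logs/app.log", size=10)   # 10 bytes -> 3 chunks
print(meta.locate("/logs/app.log", 0))      # ['ds1', 'ds2', 'ds3']
```

Because each chunk lives on several data servers, the loss of one data server leaves every chunk still reachable, which is the fault-tolerance advantage over the client-server model.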
Cluster-based DFS Model (Cont.)
Challenges faced by these models

• Naming and transparency


• Remote file access
• Caching and cache consistency
Naming and Transparency

Naming is the mapping between logical and physical objects.

Multi-level mapping provides an abstraction of a file that hides the details of how and where on disk the file is actually stored.

A transparent DFS hides where in the network a file is stored.

For a file replicated across several sites, the mapping returns a set of locations of the file's replicas, but the existence of multiple copies and their locations are hidden.
Naming Structure

In a DFS, we use two notions regarding name mapping:

Location transparency: the file name does not reveal the file's physical storage location.

Location independence: when the file's physical storage location changes, the file name does not need to change.
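The difference between the two notions can be sketched with a toy name service; the `NameService` API below is hypothetical, invented only to illustrate the idea.

```python
# Sketch of location independence: a name service maps a logical file
# name to its physical location, so a file can move between servers
# without its logical name ever changing.

class NameService:
    def __init__(self):
        self.mapping = {}  # logical name -> (host, local_path)

    def register(self, name, host, local_path):
        self.mapping[name] = (host, local_path)

    def resolve(self, name):
        # Clients see only the logical name (location transparency):
        # nothing in "/shared/notes.txt" reveals where the file lives.
        return self.mapping[name]

    def migrate(self, name, new_host, new_path):
        # The file moves; the name stays the same (location independence).
        self.mapping[name] = (new_host, new_path)


ns = NameService()
ns.register("/shared/notes.txt", "serverA", "/disk1/notes.txt")
print(ns.resolve("/shared/notes.txt"))  # ('serverA', '/disk1/notes.txt')
ns.migrate("/shared/notes.txt", "serverB", "/disk2/notes.txt")
print(ns.resolve("/shared/notes.txt"))  # ('serverB', '/disk2/notes.txt')
```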
Naming Scheme

There are three naming approaches:

1. Files are named by a combination of their host name and local name, which guarantees a unique system-wide name.

2. Remote directories are attached (mounted) to local directories, giving the appearance of a coherent directory tree; only previously mounted remote directories can be accessed transparently.

3. A single global name structure spans all files in the system. If a server is unavailable, some arbitrary set of directories on different machines also becomes unavailable.
File Caching Schemes

Every distributed file system uses some form of caching. The reason is better performance: repeated accesses to the same information can be served from the cache, avoiding additional network accesses and disk transfers. This works because of locality in file access patterns.

A file-caching scheme for a distributed file system must address the following issues:

 Cache location

 Modification Propagation

 Cache Validation
Cache Location

This refers to the place where the cached data is stored.


There are three possible cache locations.

 Server’s Main Memory

 Client’s Disk

 Client’s Main Memory


Server’s Main Memory

In this case a cache hit costs one network access. This location does not contribute to the scalability or reliability of the distributed file system, since every cache hit still requires accessing the server.

Advantages:

 Easy to implement

 Totally transparent to clients

 Easy to keep the original file and the cached data consistent
Client’s Disk

In this case a cache hit costs one disk access. This is somewhat slower than having the cache in the server’s main memory, which is also simpler to implement.

Advantages:

 Provides reliability
 Large storage capacity
 Contributes to scalability and reliability

Disadvantages:

 Cannot be used if the system must support diskless workstations

 Access time is considerably larger
Client’s Main Memory

This eliminates both the network access cost and the disk access cost. However, a client’s disk cache is preferred over this technique when a large cache size and increased reliability of cached data are desired.

Advantages:

 Maximum performance gain.

 Permits workstations to be diskless.

 Contributes to reliability and scalability


Modification Propagation

When the cache is located on client nodes, a file’s data may be cached simultaneously on multiple nodes. Caches can become inconsistent when the file data is changed by one of the clients and the corresponding data cached at other nodes is not updated or discarded.

There are two design issues involved:

 When to propagate modifications made to cached data to the corresponding file server

 How to verify the validity of cached data


Techniques

The modification propagation scheme used has a critical effect on the system’s performance and reliability. Techniques used include:

Write-Through Scheme

When a cache entry is modified, the new value is immediately sent to the server for
updating the master copy of the file.
Advantages:

 High degree of reliability

 The risk of updated data being lost if a client crashes is very low

Disadvantage:

 Poor write performance
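A minimal sketch of write-through, with illustrative class names: every modification of a cache entry is sent to the server's master copy at the same time.

```python
# Write-through scheme: the client cache forwards every write to the
# server immediately, so the master copy is never stale.

class Server:
    def __init__(self):
        self.master = {}   # path -> data (master copies)

class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, path, data):
        self.cache[path] = data
        # Write-through: update the master copy immediately.
        # This extra network round trip per write is the source of
        # the poor write performance noted above.
        self.server.master[path] = data

    def read(self, path):
        if path not in self.cache:               # cache miss
            self.cache[path] = self.server.master[path]
        return self.cache[path]


srv = Server()
client = WriteThroughCache(srv)
client.write("/a.txt", "hello")
# Even if the client crashes now, the server already has the update.
print(srv.master["/a.txt"])   # hello
```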


Delayed-Write Scheme
To reduce network traffic for writes, the delayed-write scheme is used. In this case, the new data value is only written to the cache, and all updated cache entries are sent to the server at a later time. There are three commonly used delayed-write approaches:

 Write on ejection from cache


 Periodic write
 Write on close

Advantage:

 Write accesses complete more quickly, resulting in a performance gain

Disadvantage:

 Reliability can be a problem, since modifications not yet sent to the server are lost if the client crashes
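For contrast with write-through, here is a sketch of delayed-write using the "write on close" policy; the class names are again illustrative.

```python
# Delayed-write scheme ("write on close" variant): writes go only to
# the local cache; dirty entries are flushed to the server when the
# file is closed.

class Server:
    def __init__(self):
        self.master = {}   # path -> data (master copies)

class DelayedWriteCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()    # paths modified but not yet flushed

    def write(self, path, data):
        self.cache[path] = data
        self.dirty.add(path)  # no network traffic yet

    def close(self, path):
        # Flush on close. A client crash before this point loses the
        # update, which is the reliability risk noted above.
        if path in self.dirty:
            self.server.master[path] = self.cache[path]
            self.dirty.discard(path)


srv = Server()
client = DelayedWriteCache(srv)
client.write("/b.txt", "draft")
print("/b.txt" in srv.master)   # False: still only in the cache
client.close("/b.txt")
print(srv.master["/b.txt"])     # draft
```

Swapping the `close` trigger for a timer gives the "periodic write" variant, and flushing when an entry is evicted gives "write on ejection from cache".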


Cache Validation Schemes

The modification propagation policy only specifies when the master copy of a file on the server node is updated after a cache entry is modified. It says nothing about when the file data residing in the caches of other nodes is updated.

It therefore becomes necessary to verify whether the data cached at a client node is consistent with the master copy. There are two approaches to verify the validity of cached data:

 The client-initiated approach

 The server-initiated approach


Consistency
Is the local copy of the cached data consistent with the master copy or not?

Client-initiated approach
• The client initiates a validity check each time it accesses the file/cached data
• Alternatively, checks are initiated at fixed time intervals, or only on the first access to a file (e.g., when the file is opened)
• The server checks whether the cached data is consistent with the master copy
• When a validity check is coupled with every access, each access is delayed compared to a plain cache hit
Consistency Contd…

Server-initiated approach
• For each client, the server records which files it has cached

• The server reacts when it detects a potential inconsistency

• When a file is opened, the server must be notified, along with whether the file is

opened in read or write mode.
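The client-initiated approach can be sketched with version numbers as the validity mechanism; versioning is one illustrative way to implement the check, not something mandated by any particular DFS, and all names below are invented.

```python
# Client-initiated cache validation: the client tags each cached entry
# with the version it fetched, and asks the server whether that version
# is still current before using the cached data.

class Server:
    def __init__(self):
        self.master = {}     # path -> (version, data)

    def put(self, path, data):
        version = self.master.get(path, (0, None))[0] + 1
        self.master[path] = (version, data)

    def is_current(self, path, version):
        return self.master[path][0] == version

    def fetch(self, path):
        return self.master[path]

class ValidatingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # path -> (version, data)

    def read(self, path):
        if path in self.cache:
            version, data = self.cache[path]
            if self.server.is_current(path, version):  # validity check
                return data
        # Stale or missing: re-fetch the master copy.
        self.cache[path] = self.server.fetch(path)
        return self.cache[path][1]


srv = Server()
srv.put("/c.txt", "v1")
client = ValidatingClient(srv)
print(client.read("/c.txt"))   # v1
srv.put("/c.txt", "v2")        # another client updates the file
print(client.read("/c.txt"))   # v2: stale cache detected and refreshed
```

In the server-initiated approach the roles are reversed: the server would keep the per-client cache records and push an invalidation to clients instead of answering `is_current` queries.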


Thank You
