3. Distributed File System

Distributed File Systems (DFS) enable clients to access files without needing to know their physical location, ensuring network transparency and high availability. The architecture includes file servers, clients, and services like name servers and cache managers to optimize performance and data access. DFS supports various file sharing semantics and replication strategies to maintain consistency and reliability across distributed environments.

Uploaded by

dipeshjohree

Distributed File System

DISTRIBUTED FILE SYSTEMS

Clients, servers, and storage are dispersed across machines. Configuration and
implementation may vary:
a) Servers may run on dedicated machines.
b) Servers and clients can be on the same machines.
c) The OS itself can be distributed (with the file system a part of that
distribution).
d) A distribution layer can be interposed between a conventional OS and
the file system.
Clients should view a DFS the same way they would a centralized FS; the
distribution is hidden at a lower level.
Performance is measured in terms of throughput and response time.
Goals
1 Network transparency: users do not need to be aware of the location of files
to access them
– location transparency: the name of a file does not reveal any hint of the
file's physical storage location.
• /server1/dir1/dir2/X
• server1 can be moved anywhere without changing the name
– location independence: the name of a file does not need to be changed when
the file's physical storage location changes.
• The above file X cannot be moved to server2 (e.g. because server1 is full
and server2 is not) without changing its name, so this scheme is location
transparent but not location independent.
2 High availability: files remain accessible despite system failures or
scheduled activities such as backups and the addition of nodes
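The difference between the two naming properties can be sketched with a toy name server. This is a minimal illustration under assumed names: `NameServer`, `Location`, and all paths are hypothetical, and a real DFS would keep this mapping in a replicated service rather than a Python dict. Because the user-visible name omits the server, migrating the file only updates the mapping, which is what location independence asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    server: str      # machine currently holding the file (hypothetical)
    local_path: str  # path on that machine's local file system


class NameServer:
    """Hypothetical name server mapping location-free names to locations."""

    def __init__(self) -> None:
        self._table: dict[str, Location] = {}

    def register(self, name: str, loc: Location) -> None:
        self._table[name] = loc

    def resolve(self, name: str) -> Location:
        return self._table[name]

    def migrate(self, name: str, new_server: str) -> None:
        # Location independence: the file moves, its user-visible name does not.
        old = self._table[name]
        self._table[name] = Location(new_server, old.local_path)


ns = NameServer()
ns.register("/dir1/dir2/X", Location("server1", "/export/dir1/dir2/X"))
ns.migrate("/dir1/dir2/X", "server2")  # e.g. server1 is full
print(ns.resolve("/dir1/dir2/X").server)  # server2
```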
Architecture
• Computation model
– file servers -- machines dedicated to storing files and
performing storage and retrieval operations (for high
performance)
– clients -- machines used for computational activities; they may have
a local disk for caching remote files
• Two most important services
– name server -- maps user-specified names to stored objects,
files and directories
– cache manager -- reduces network delay and disk delay;
its main problem is inconsistency
• Typical data access actions
– open, close, read, write, etc.
Data access in a distributed system
Client side
• Client requests access to data
• Check client cache
– If data present: return to client
– Else check local disk
• If data present: return to client
• Else send request to file server through the network
Server side
• Check server cache
– If data present: load data into client cache through the
network; return to client.
– Else issue disk read; load data into server cache; load data
into client cache through the network; return to client.
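The lookup order above can be sketched in a few lines. This is a minimal model under stated assumptions: caches are plain dicts keyed by a block id, the "network" is a direct method call, and the class and attribute names (`FileServer`, `Client`, `disk_reads`, `network_requests`) are hypothetical. The counters show which level satisfied each request.

```python
class FileServer:
    def __init__(self, disk: dict[str, bytes]) -> None:
        self.disk = disk
        self.cache: dict[str, bytes] = {}  # server main-memory cache
        self.disk_reads = 0

    def fetch(self, block: str) -> bytes:
        if block not in self.cache:                # server cache miss
            self.disk_reads += 1                   # issue disk read
            self.cache[block] = self.disk[block]   # load into server cache
        return self.cache[block]


class Client:
    def __init__(self, server: FileServer) -> None:
        self.server = server
        self.cache: dict[str, bytes] = {}       # client main-memory cache
        self.local_disk: dict[str, bytes] = {}  # optional local disk cache
        self.network_requests = 0

    def read(self, block: str) -> bytes:
        if block in self.cache:                 # 1. client cache
            return self.cache[block]
        if block in self.local_disk:            # 2. local disk
            return self.local_disk[block]
        self.network_requests += 1              # 3. request to the file server
        data = self.server.fetch(block)
        self.cache[block] = data                # load into client cache
        return data


server = FileServer({"b1": b"hello"})
c1, c2 = Client(server), Client(server)
c1.read("b1")
c1.read("b1")  # client cache hit
c2.read("b1")  # server cache hit, no second disk read
print(c1.network_requests, c2.network_requests, server.disk_reads)  # 1 1 1
```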
Distributed File system design
File service vs. file server
– File service interface: the specification of what
the file system offers to its clients.
– Implemented by a user/kernel process called file
server
– File server: a process that runs on some machine
and helps implement the file service.
– A system may have one or several file servers
running at the same time
Remote Files
• What is a file?
– Uninterpreted sequence of bytes
– Can be structured as a sequence of records
• Files can have attributes
– Owner, size, creation date and access permissions
• File model
– Files can be mutable or immutable
• File Protection
– Capability
• corresponds to a row of the access matrix: for each user, the
objects that user can access and in what ways
– Access control list
• corresponds to a column of the access matrix: for each object, the
users who are allowed to perform operations on it, stored as a list
of <user, privilege> pairs
• File Service Model
– upload/download model
– files move between server and clients, few operations (read
file & write file), simple, requires storage at client, good if
whole file is accessed
– remote access model
– files stay at server, rich interface for many operations,
less space at client, efficient for small accesses
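The two protection schemes store the same access matrix in different directions. The sketch below assumes the matrix is a nested dict; the users, file names, and rights are hypothetical. A capability list is read off one row (per user), an access control list off one column (per object), and both answer the same access question.

```python
# Hypothetical access matrix: rows are users, columns are files.
ACCESS_MATRIX: dict[str, dict[str, set[str]]] = {
    "alice": {"prog.c": {"read", "write"}, "notes.txt": {"read"}},
    "bob": {"prog.c": {"read"}},
}


def capability_list(user: str) -> dict[str, set[str]]:
    # One row of the matrix: every object this user may access, and how.
    return ACCESS_MATRIX.get(user, {})


def access_control_list(obj: str) -> list[tuple[str, set[str]]]:
    # One column of the matrix: <user, privilege> pairs for a single object.
    return [(user, row[obj]) for user, row in ACCESS_MATRIX.items() if obj in row]


def check_with_acl(user: str, obj: str, op: str) -> bool:
    # The server consults the object's ACL.
    return any(u == user and op in rights for u, rights in access_control_list(obj))


def check_with_capability(user: str, obj: str, op: str) -> bool:
    # The user presents its capabilities for the object.
    return op in capability_list(user).get(obj, set())


print(check_with_acl("bob", "prog.c", "write"))           # False
print(check_with_capability("alice", "prog.c", "write"))  # True
```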
Directory Service

The directory service provides operations for:

– creating and deleting directories
– naming and renaming files
– moving files
Clients can have the same view (global root directory)
or different views of the file system (remote mounting)
Hierarchical File system
Directory Trees
Directory Graphs
Naming and Name Resolution
• a name space -- collection of names
• name resolution -- mapping a name to an object
– same or different view of a directory hierarchy
• 3 traditional ways to name files in a distributed environment
– concatenate the host name to the names of files stored on that host:
system-wide uniqueness guaranteed, simple to locate a file; however,
not network transparent, not location independent, e.g.,
/machine/usr/foo
– mount remote directories onto local directories:
once mounted, files can be referenced in a location-transparent manner
– provide a single global directory:
requires a unique file name for every file, location independent,
cannot encompass heterogeneous environments and wide geographical
areas
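The remote-mounting approach can be sketched as a per-client mount table consulted on every path. This is an illustration under assumptions: the table format, server names, and export paths are hypothetical, and longest-prefix matching is one reasonable way to handle nested mounts. Two clients mounting the same remote directory at different points get different views of the same files.

```python
import posixpath


def resolve(mount_table: dict[str, tuple[str, str]], path: str) -> tuple[str, str]:
    """Map a client path to (server, remote path) using a hypothetical mount table."""
    path = posixpath.normpath(path)
    # Try the longest mount point first so nested mounts take precedence.
    for mount_point in sorted(mount_table, key=len, reverse=True):
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            server, remote_dir = mount_table[mount_point]
            rest = posixpath.relpath(path, mount_point)
            return server, posixpath.normpath(posixpath.join(remote_dir, rest))
    return "local", path  # not under any mount point


client1 = {"/usr/shared": ("serverA", "/export/shared")}
client2 = {"/mnt/proj": ("serverA", "/export/shared")}
print(resolve(client1, "/usr/shared/docs/a.txt"))  # ('serverA', '/export/shared/docs/a.txt')
print(resolve(client2, "/mnt/proj/docs/a.txt"))    # ('serverA', '/export/shared/docs/a.txt')
```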
Two-Level Naming
•Symbolic name (external), e.g. prog.c; binary name (internal), e.g. local
i-node number as in Unix
•Directories provide the translation from symbolic to binary names
•Binary name format
– i-node: no cross references among servers
– (server, i-node): a directory in one server can refer to a file on a
different server
– Capability specifying address of server, number of file, access
permissions, etc
– {binary_name+}: binary names refer to the original file and all of its
backups
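The (server, i-node) and {binary_name+} formats can be sketched as a directory whose entries hold one or more binary names. Assumptions: the `BinaryName` tuple, the directory contents, and the set of reachable servers are all hypothetical; the sample numbers echo the "server.file" style (1.14, 2.16, 3.19) used in the replication figure later in these slides.

```python
from typing import NamedTuple


class BinaryName(NamedTuple):
    server: int  # which file server holds this copy
    inode: int   # i-node number on that server


# Hypothetical directory: symbolic name -> binary names (original first, then backups).
directory: dict[str, list[BinaryName]] = {
    "prog.c": [BinaryName(1, 14), BinaryName(2, 16), BinaryName(3, 19)],
    "notes.txt": [BinaryName(2, 7)],  # may refer to a file on a different server
}


def lookup(name: str, up_servers: set[int]) -> BinaryName:
    """Translate a symbolic name to the binary name of a reachable copy."""
    for binary in directory[name]:
        if binary.server in up_servers:
            return binary
    raise FileNotFoundError(f"no reachable copy of {name}")


print(lookup("prog.c", {2, 3}))  # BinaryName(server=2, inode=16)
```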
File Sharing
File Sharing Semantics
UNIX semantics:
•Total ordering of R/W events
•Value read is the value stored by last write
•Writes to an open file are visible immediately to
others that have this file opened at the same time.
•Easy to achieve in a non-distributed system, or in a distributed system
with one server, multiple clients, and no caching at the clients.
Session semantics:
•Writes to an open file by a user are not visible immediately to
other users that already have the file open. Once a file is closed,
the changes made to it are visible to sessions started later.
•Writes are guaranteed to become visible only when the file is
closed
•Allows caching at client with lazy updating -> better performance
•If two or more clients write simultaneously: one file (the last one
closed, or one chosen non-deterministically) replaces the other
•Immutable files:
create and read file operations only (no write), i.e. a
sharable file cannot be modified.
File names cannot be reused and file contents may not
be altered.
Simple to implement.
Writing a file means creating a new one and entering it
into the directory, replacing the previous one with the
same name: an atomic operation
Collision in writing: the last copy wins, or the winner is
chosen nondeterministically
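Session semantics can be sketched by giving each open() a private copy of the file and writing it back only at close(). Assumptions: the master copy is a single string, there is no real network, and `MasterCopy`/`Session` are hypothetical names. Under UNIX semantics, by contrast, every read would go to the master copy and see writes immediately.

```python
class MasterCopy:
    def __init__(self, content: str) -> None:
        self.content = content  # the server's copy of the file


class Session:
    """One open() ... close() session under session semantics (hypothetical API)."""

    def __init__(self, master: MasterCopy) -> None:
        self.master = master
        self.copy = master.content  # whole file cached at open
        self.dirty = False

    def read(self) -> str:
        return self.copy

    def write(self, data: str) -> None:
        self.copy = data  # visible only inside this session for now
        self.dirty = True

    def close(self) -> None:
        if self.dirty:
            self.master.content = self.copy  # the last writer to close wins


master = MasterCopy("v0")
a, b = Session(master), Session(master)
a.write("v1")
print(b.read())              # v0: b opened before a closed
a.close()
print(b.read())              # still v0 for the already-open session
print(Session(master).read())  # v1: sessions started after the close see it
```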
Transaction semantics:
• mutual exclusion on file accesses; either all file operations are
completed or none is. Good for banking systems

• All changes have the all-or-nothing property. With P1 = W1; W2 and
P2 = R1; R2, the interleaving W1, R1, R2, W2 is not allowed, since P2
would observe only part of P1's update.
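The all-or-nothing property can be sketched with a private workspace per transaction: writes are buffered and installed in one step on commit. This is a simplified illustration, not a full transaction system: the record names and classes are hypothetical, reads come from a snapshot taken at begin(), and conflicting concurrent writers are not detected. It is enough to show that the W1, R1, R2, W2 interleaving from the slide cannot expose half of P1's update.

```python
import threading


class TransactionalFile:
    """Hypothetical file of named records with all-or-nothing updates."""

    def __init__(self, records: dict[str, int]) -> None:
        self._records = dict(records)
        self._lock = threading.Lock()

    def begin(self) -> "Transaction":
        with self._lock:
            return Transaction(self, dict(self._records))  # private snapshot

    def _install(self, writes: dict[str, int]) -> None:
        with self._lock:
            self._records.update(writes)  # every write becomes visible at once


class Transaction:
    def __init__(self, file: TransactionalFile, snapshot: dict[str, int]) -> None:
        self._file = file
        self._view = snapshot
        self._writes: dict[str, int] = {}

    def read(self, key: str) -> int:
        return self._view[key]

    def write(self, key: str, value: int) -> None:
        self._view[key] = value
        self._writes[key] = value  # buffered until commit

    def commit(self) -> None:
        self._file._install(self._writes)

    def abort(self) -> None:
        self._writes.clear()  # none of the writes ever reach the file


accounts = TransactionalFile({"a": 100, "b": 0})
p1 = accounts.begin()
p1.write("a", 50)                    # W1
p2 = accounts.begin()
total = p2.read("a") + p2.read("b")  # R1, R2
p1.write("b", 50)                    # W2
p1.commit()
print(total)  # 100: P2 never sees half of the transfer
```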
Distributed File system
Implementation
• System Structure
– Clients and servers on different machines?
• Combine File and directory services
• Keep them separate
– Lookups
• Iterative lookup
• Automatic lookup
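The two lookup styles differ in who walks the path. In the sketch below, iterative lookup has the client contact each directory server in turn, while automatic lookup (read here as: the client sends the whole path once and servers forward the remainder among themselves) costs the client a single request. The server layout, directory ids, and file id are hypothetical, and message counts cover client-to-server requests only.

```python
# Hypothetical directory servers: server -> directory id -> component -> entry.
# An entry is ("dir", server, dir_id) or ("file", file_id).
SERVERS: dict[str, dict[str, dict[str, tuple]]] = {
    "S1": {"root": {"usr": ("dir", "S2", "usr")}},
    "S2": {"usr": {"ast": ("dir", "S3", "ast")}},
    "S3": {"ast": {"mbox": ("file", 42)}},
}


def lookup_step(server: str, dir_id: str, component: str) -> tuple:
    # One request/reply exchanged with a single directory server.
    return SERVERS[server][dir_id][component]


def iterative_lookup(path: str) -> tuple[int, int]:
    """Client walks the path itself; returns (file_id, client messages)."""
    server, dir_id, messages = "S1", "root", 0
    for component in path.strip("/").split("/"):
        messages += 1
        entry = lookup_step(server, dir_id, component)
        if entry[0] == "file":
            return entry[1], messages
        _, server, dir_id = entry
    raise IsADirectoryError(path)


def automatic_lookup(path: str) -> tuple[int, int]:
    """Client sends the whole path once; servers forward the remainder."""
    def forward(server: str, dir_id: str, components: list[str]) -> int:
        entry = lookup_step(server, dir_id, components[0])
        if entry[0] == "file":
            return entry[1]
        return forward(entry[1], entry[2], components[1:])

    return forward("S1", "root", path.strip("/").split("/")), 1


print(iterative_lookup("/usr/ast/mbox"))  # (42, 3)
print(automatic_lookup("/usr/ast/mbox"))  # (42, 1)
```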
Stateless vs. Stateful

Stateless Server                    | Stateful Server
requests are self-contained         | shorter messages
better fault tolerance              | better performance (info in memory until close)
open/close at client (fewer msgs)   | open/close at server
no space reserved for tables        | file locking possible
thus, no limit on open files        | read-ahead possible
no problem if client crashes        |
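The trade-off in the table can be sketched with two toy servers. Assumptions: files are in-memory byte strings, a "crash" just clears volatile memory, and all class and method names are hypothetical. The stateless read carries the file name and offset in every request; the stateful read is shorter because the server remembers them, but that memory is lost on a crash.

```python
class StatelessServer:
    def __init__(self, files: dict[str, bytes]) -> None:
        self.files = files

    def read(self, path: str, offset: int, count: int) -> bytes:
        # Self-contained request: the client supplies name and offset every time.
        return self.files[path][offset:offset + count]


class StatefulServer:
    def __init__(self, files: dict[str, bytes]) -> None:
        self.files = files
        self.open_table: dict[int, list] = {}  # fd -> [path, current offset]
        self.next_fd = 0

    def open(self, path: str) -> int:
        self.next_fd += 1
        self.open_table[self.next_fd] = [path, 0]
        return self.next_fd

    def read(self, fd: int, count: int) -> bytes:
        # Shorter request: the server remembers the path and offset.
        path, offset = self.open_table[fd]
        data = self.files[path][offset:offset + count]
        self.open_table[fd][1] = offset + len(data)
        return data

    def close(self, fd: int) -> None:
        del self.open_table[fd]

    def crash_and_restart(self) -> None:
        self.open_table.clear()  # volatile state is lost


files = {"f": b"abcdef"}
stateless = StatelessServer(files)
print(stateless.read("f", 2, 2))  # b'cd'
stateful = StatefulServer(files)
fd = stateful.open("f")
print(stateful.read(fd, 2), stateful.read(fd, 2))  # b'ab' b'cd'
stateful.crash_and_restart()
# stateful.read(fd, 2) would now raise KeyError: the open-file table is gone,
# while the same stateless request still succeeds after a restart.
```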
Caching
Four places to store files
– server’s disk: slow performance
• eliminates coherence problem
– server caching: in main memory
• cache management issue, how much to
cache, replacement strategy
• still slow due to network delay
• Used in high-performance web-search
engine servers
– Client caching in main memory
• can be used by diskless workstations
• faster to access from main memory than disk
• competes with the virtual memory system for physical
memory space
• avoids disk access but still network access
Three Options
• inside each process address space: no sharing at client
• in the kernel: kernel involvement on hits
• in a separate user-level cache manager: flexible and
efficient if paging can be controlled from user-level
– client-cache on a local disk
• large files can be cached
• the virtual memory management is simpler
• a workstation can function even when it is
disconnected from the network
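The slides leave the replacement strategy open. Least-recently-used (LRU) is one common choice and is used here purely as an example; the class name, capacity, and `fetch` callback are hypothetical. The same structure fits a server cache (miss goes to disk) or a client cache (miss goes over the network).

```python
from collections import OrderedDict
from typing import Callable


class LRUBlockCache:
    """Fixed-size block cache with least-recently-used replacement (illustrative)."""

    def __init__(self, capacity: int, fetch: Callable[[int], bytes]) -> None:
        self.capacity = capacity
        self.fetch = fetch  # called on a miss, e.g. a disk read or a network request
        self.blocks: OrderedDict = OrderedDict()
        self.misses = 0

    def get(self, block_id: int) -> bytes:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.fetch(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


cache = LRUBlockCache(2, lambda b: f"block-{b}".encode())
for b in [1, 2, 1, 3, 2]:
    cache.get(b)
print(cache.misses, list(cache.blocks))  # 4 [3, 2]
```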
Update algorithms for client caching

• write-through:
– all writes are carried out immediately
– writes sent to the server as soon as they are performed at the client ->
high traffic; requires cache managers to check (modification time) with the
server before they can provide cached content to any client
– Reliable: little information is lost in the event of a client crash
– Slow: the cache is of little help for writes
• delayed-write:
– delays writing at the server
– coalesces multiple writes; better performance but ambiguous semantics
– possible to perform many writes to a block in the cache before it is
written
– if data is written and then deleted soon afterwards, it need not be
written at all (20-30 % of new data is deleted within 30 secs)
• write-on-close:
– delay writing until the file is closed at the client
– Implements session semantics
– if file is open for short duration, works fine
– if file is open for long, susceptible to losing data in the event of client
crash
• Central control:
– file server keeps a directory of open/cached files at clients -> Unix
semantics, but problems with robustness and scalability; problem also
with invalidation messages because clients did not solicit them
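The cost difference between write-through and delayed-write can be sketched by counting write messages sent to the server. Assumptions: blocks are numbered, `flush()` stands in for the periodic flush of delayed-write (or for close() under write-on-close), and all names are hypothetical. The workload rewrites one block three times and writes a temporary block that is deleted before any flush.

```python
class ServerStore:
    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.writes = 0  # number of write messages received

    def write(self, block: int, data: bytes) -> None:
        self.writes += 1
        self.blocks[block] = data

    def delete(self, block: int) -> None:
        self.blocks.pop(block, None)


class ClientCache:
    def __init__(self, server: ServerStore, write_through: bool) -> None:
        self.server = server
        self.write_through = write_through
        self.dirty: dict[int, bytes] = {}  # blocks not yet sent (delayed-write)

    def write(self, block: int, data: bytes) -> None:
        if self.write_through:
            self.server.write(block, data)  # sent as soon as it is performed
        else:
            self.dirty[block] = data  # repeated writes to a block coalesce

    def delete(self, block: int) -> None:
        self.dirty.pop(block, None)  # unsent data never needs to be written
        self.server.delete(block)

    def flush(self) -> None:
        # Delayed-write: called periodically; write-on-close: called at close().
        for block, data in self.dirty.items():
            self.server.write(block, data)
        self.dirty.clear()


def workload(cache: ClientCache) -> int:
    for version in (b"v1", b"v2", b"v3"):
        cache.write(1, version)
    cache.write(2, b"temp")
    cache.delete(2)  # short-lived data, deleted soon after being written
    cache.flush()
    return cache.server.writes


print(workload(ClientCache(ServerStore(), write_through=True)))   # 4
print(workload(ClientCache(ServerStore(), write_through=False)))  # 1
```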
Cache Coherence
• How to maintain consistency between locally cached data and
the master copy when the data has been modified by another
client?
1 Client-initiated approach -- the client checks validity with the server:
– on every access: too much overhead
– on first access to a file (e.g., file open)
– at every fixed time interval
2 Server-initiated approach -- server records, for each client,
the (parts of) files it caches. After the server detects a
potential inconsistency, it reacts.
3 Disallow caching when concurrent-write sharing occurs:
allow many readers, but if a client opens a file for writing, tell
all the clients to purge their cached data.
• Potential inconsistency:
– In session semantics, a client closes a modified
file.
– In UNIX semantics, the server must be notified
whenever a file is opened and the intended mode
(read or write mode) must be indicated for every
open.
– Disable cache when a file is opened in conflicting
modes.
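The server-initiated approach (item 2 above) can be sketched as a server that records which clients cache each file and invalidates the others when one of them writes. Assumptions: files are strings, clients write through to the server for simplicity, and `CoherentServer`/`CachingClient` are hypothetical names; real systems would also handle lost invalidation messages and client crashes.

```python
class CoherentServer:
    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.cachers: dict[str, set] = {}  # file -> clients caching it

    def fetch(self, client: "CachingClient", name: str) -> str:
        self.cachers.setdefault(name, set()).add(client)  # remember who caches it
        return self.files[name]

    def store(self, writer: "CachingClient", name: str, data: str) -> None:
        self.files[name] = data
        for client in self.cachers.get(name, set()) - {writer}:
            client.invalidate(name)  # react to the potential inconsistency
        self.cachers[name] = {writer}


class CachingClient:
    def __init__(self, server: CoherentServer) -> None:
        self.server = server
        self.cache: dict[str, str] = {}

    def read(self, name: str) -> str:
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def write(self, name: str, data: str) -> None:
        self.cache[name] = data
        self.server.store(self, name, data)  # write-through for simplicity

    def invalidate(self, name: str) -> None:
        self.cache.pop(name, None)


srv = CoherentServer()
srv.files["f"] = "v1"
a, b = CachingClient(srv), CachingClient(srv)
a.read("f")
b.read("f")
b.write("f", "v2")
print("f" in a.cache, a.read("f"))  # False v2
```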
Replication
File Replication
•Multiple copies are maintained, each copy on a
separate file server
• Reasons:
To improve reliability, availability, and performance.

•Replication transparency
– explicit file replication: programmer controls
replication
– lazy file replication: copies made by the server in
background
– use group communication: all copies made at the
same time in the foreground
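Lazy replication can be sketched as a server that applies a write to its own copy, queues it, and pushes it to the other copies later. Assumptions: replicas are in-memory dicts, `propagate()` stands in for the background task, and all names are hypothetical. With group communication, `write()` would instead send the update to every replica before returning.

```python
from collections import deque


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}


class LazyReplicatedFile:
    def __init__(self, primary: Replica, others: list[Replica]) -> None:
        self.primary = primary
        self.others = others
        self.pending: deque = deque()  # updates not yet copied to the others

    def write(self, key: str, value: str) -> None:
        self.primary.data[key] = value  # the client waits only for this copy
        self.pending.append((key, value))

    def propagate(self) -> None:
        # Background step: push queued updates to the other copies, in order.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.others:
                replica.data[key] = value


s1, s2, s3 = Replica("S1"), Replica("S2"), Replica("S3")
f = LazyReplicatedFile(s1, [s2, s3])
f.write("x", "1")
print(s2.data.get("x"))  # None: not yet propagated
f.propagate()
print(s2.data["x"], s3.data["x"])  # 1 1
```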
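Explicit File Replication
[Figure: client C and servers S1, S2, S3; a directory entry lists one binary name per copy, e.g. File : 1.14 : 2.16 : 3.19]
Lazy replication
[Figure: client C and servers S2, S3]
File replication using Group Communication
[Figure: client C and servers S2, S3]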
Update protocols
• Primary Copy Replication

• Voting

• Voting with ghosts

Modifying Replicas: Voting Protocol
•Updating all replicas through a coordinator works but is not robust (if the
coordinator is down, no updates can be performed) => Voting: updates (and
reads) can be performed if some specified number of servers agree.
•Voting Protocol:
– A version # (incremented at each write) is associated with each file
– To perform a read, a client has to assemble a read quorum of Nr
servers; similarly, a write quorum of Nw servers for a write
– If Nr + Nw > N, then any read quorum will contain at least one copy
with the most recently updated file version
– For reading, the client contacts Nr active servers and chooses the copy
with the largest version #
– For writing, the client contacts Nw active servers asking them to write.
The write succeeds if they all say yes.
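The voting rules can be sketched with in-memory replicas holding (version, value) pairs. Assumptions: "active" servers are passed in as a list of indices, each quorum is simply the first Nr or Nw of them, and all servers vote yes. Besides the slide's Nr + Nw > N, the sketch also enforces Nw > N/2, the usual companion condition that makes any two write quorums overlap, so a writer can derive the next version number from its own quorum.

```python
class VotingFile:
    def __init__(self, n: int, nr: int, nw: int) -> None:
        if nr + nw <= n:
            raise ValueError("need Nr + Nw > N so read and write quorums overlap")
        if 2 * nw <= n:
            raise ValueError("need Nw > N/2 so two write quorums overlap")
        self.replicas = [(0, None)] * n  # (version, value) per server
        self.nr, self.nw = nr, nw

    def read(self, active: list[int]):
        if len(active) < self.nr:
            raise RuntimeError("cannot assemble a read quorum")
        quorum = active[: self.nr]
        # Choose the copy with the largest version number.
        return max((self.replicas[i] for i in quorum), key=lambda r: r[0])

    def write(self, active: list[int], value) -> int:
        if len(active) < self.nw:
            raise RuntimeError("cannot assemble a write quorum")
        quorum = active[: self.nw]
        # Next version = highest version held by the write quorum + 1.
        version = max(self.replicas[i][0] for i in quorum) + 1
        for i in quorum:
            self.replicas[i] = (version, value)
        return version


f = VotingFile(n=5, nr=2, nw=4)
f.write([0, 1, 2, 3], "v1")
print(f.read([3, 4]))  # (1, 'v1')
f.write([1, 2, 3, 4], "v2")
print(f.read([0, 4]))  # (2, 'v2')
```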
Voting Protocol with Ghosts
• Nr is usually small (reads are frequent), but Nw is usually close
to N (want to make sure all replicas are updated). Problem: achieving
a write quorum in the presence of server failures
• Voting with ghosts: allows a write quorum to be established when
several servers are down, by temporarily creating dummy (ghost)
servers (at least one member must be real)
• Ghost servers are not permitted in a read quorum (they don't
have any files)
• When a server comes back, it must first restore its copy by
obtaining a read quorum
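Quorum assembly with ghosts can be sketched as two small functions that follow the rules above: ghosts may fill a write quorum for crashed servers as long as at least one member is real, but a read quorum may contain only real servers. Assumptions: server ids are integers, the caller knows which servers are up or down, and the function names are hypothetical.

```python
def write_quorum_with_ghosts(up: list[int], down: list[int], nw: int) -> list[tuple[int, bool]]:
    """Return (server, is_ghost) pairs; ghosts stand in for crashed servers."""
    if not up:
        raise RuntimeError("at least one real server is required")
    members = [(s, False) for s in up[:nw]]
    for s in down:
        if len(members) == nw:
            break
        members.append((s, True))  # ghost: counts toward the quorum, stores no file
    if len(members) < nw:
        raise RuntimeError("not enough servers, even counting ghosts")
    return members


def read_quorum_real_only(up: list[int], nr: int) -> list[int]:
    # Ghosts hold no file, so only real servers may vote in a read quorum.
    if len(up) < nr:
        raise RuntimeError("cannot assemble a read quorum")
    return up[:nr]


print(write_quorum_with_ghosts([0, 1], [2, 3, 4], nw=4))
# [(0, False), (1, False), (2, True), (3, True)]
```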
Features of Distributed File System (DFS)
• Transparency:
– Structure transparency: There is no need for the client to know about
the number or locations of file servers and the storage devices.
Multiple file servers should be provided for performance, adaptability,
and dependability.
– Access transparency: Both local and remote files should be accessible
in the same manner. The file system should automatically locate the
accessed file and deliver it to the client's side.
– Naming transparency: There should not be any hint in the name of
the file to the location of the file. Once a name is given to the file, it
should not change when the file is moved from one node to another.
– Replication transparency: If a file is copied on multiple nodes, both
the copies of the file and their locations should be hidden from
the clients.
• User mobility: The system should automatically bring the user's home
directory to the node where the user logs in.
• Performance: Performance is measured by the average amount of time
needed to satisfy client requests. This time covers CPU time +
time taken to access secondary storage + network access time. It is
desirable that the performance of a Distributed File System be similar
to that of a centralized file system.
• Simplicity and ease of use: The user interface of a file system should be
simple and the number of commands should be small.
• High availability: A Distributed File System should be able to continue
operating in case of partial failures such as a link failure, a node failure,
or a storage drive crash.
History
• The server component of the Distributed File System was initially
introduced as an add-on feature. It was added to Windows NT 4.0 Server
and was known as "DFS 4.1". Later it was included as a standard
component of all editions of Windows 2000 Server. Client-side support
has been included in Windows NT 4.0 and in later versions of
Windows.
• Linux kernels 2.6.14 and later come with an SMB client VFS
known as "cifs", which supports DFS. Mac OS X 10.7 (Lion) and later
also support DFS.
Applications
• NFS: NFS stands for Network File System. It is a client-server architecture
that allows a computer user to view, store, and update files remotely. The
protocol of NFS is one of the several distributed file system standards for
Network-Attached Storage (NAS).
• SMB: SMB stands for Server Message Block. It is a file-sharing protocol
that was invented by IBM. The SMB protocol was created to allow
computers to perform read and write operations on files on a remote host
over a Local Area Network (LAN). The directories on the remote host
that can be accessed via SMB are called "shares".
• CIFS: CIFS stands for Common Internet File System. CIFS is a dialect of
SMB; that is, CIFS is a particular implementation of the SMB protocol,
designed by Microsoft.
• Hadoop: Hadoop is a collection of open-source software utilities. It provides a
software framework for distributed storage and processing of big data
using the MapReduce programming model. The core of Hadoop consists of
a storage part, known as the Hadoop Distributed File System (HDFS), and a
processing part, which is the MapReduce programming model.
• NetWare: NetWare is a discontinued computer network operating system
developed by Novell, Inc. It initially used cooperative multitasking to run
various services on a personal computer, using the Internetwork Packet
Exchange (IPX) network protocol.
