Unit-5.2 Distributed File System (DFS)
Unit-5 / Dr Komarasamy G 2
Distributed File System (DFS)
• A Distributed File System (DFS), as the name suggests, is a file
system that is distributed across multiple file servers or multiple
locations. It allows programs to access or store remote files just as
they do local ones, allowing programmers to access files from any
computer on the network.
• The main purpose of the Distributed File System (DFS) is to
allow users of physically distributed systems to share their
data and resources by using a common file system.
• A typical DFS configuration is a collection of workstations and
mainframes connected by a Local Area Network (LAN). A DFS is
implemented as part of the operating system.
• In DFS, a namespace is created, and this process is transparent to
the clients.
Distributed File System (DFS)
• DFS has two components:
• Location Transparency
• Location transparency is achieved through the namespace
component.
• Redundancy
• Redundancy is achieved through the file replication component.
• In the case of failure and heavy load, these components together
improve data availability by allowing shared data in different
locations to be logically grouped under one folder, which is known
as the “DFS root”.
• It is not necessary to use both components of DFS together. The
namespace component can be used without the file replication
component, and the file replication component can be used between
servers without the namespace component.
Distributed File System (DFS)
• File system replication:
• Early iterations of DFS used Microsoft’s File Replication
Service (FRS), which allowed straightforward file replication
between servers. FRS recognizes new or updated files and distributes
the most recent version of the whole file to all servers.
• “DFS Replication” (DFSR) was introduced in Windows Server 2003 R2.
It improves on FRS by copying only the portions of files that have
changed and by minimising network traffic with data compression.
Additionally, it provides flexible configuration options to manage
network traffic on a configurable schedule.
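To make the difference between whole-file replication (FRS) and changed-portion replication (DFSR) concrete, here is a minimal Python sketch. It assumes fixed-size blocks compared by SHA-256 hash; real DFSR uses Remote Differential Compression (RDC), which is more sophisticated, so the block size and the helper names `block_hashes`, `changed_blocks` and `apply_delta` are hypothetical simplifications.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, kept tiny so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks and return the SHA-256 hash of each block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Return {block_index: block_bytes} for every block of `new` that the
    replica's copy (`old`) does not already hold at the same index."""
    old_h = block_hashes(old, block_size)
    delta = {}
    for idx, h in enumerate(block_hashes(new, block_size)):
        if idx >= len(old_h) or old_h[idx] != h:
            delta[idx] = new[idx * block_size:(idx + 1) * block_size]
    return delta


def apply_delta(old: bytes, delta: dict, new_len: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the updated file on the replica from its old copy plus the changed blocks."""
    blocks = [old[i:i + block_size] for i in range(0, len(old), block_size)]
    for idx, chunk in delta.items():  # indices arrive in ascending order
        if idx < len(blocks):
            blocks[idx] = chunk
        else:
            blocks.append(chunk)
    return b"".join(blocks)[:new_len]  # trim if the file got shorter


old = b"AAAABBBBCCCCDDDD"
new = b"AAAABXBBCCCCDDDD"  # one byte changed, inside block 1
delta = changed_blocks(old, new)
print(sorted(delta))                                              # [1]
print(sum(len(b) for b in delta.values()), "of", len(new), "bytes sent")  # 4 of 16 bytes sent
print(apply_delta(old, delta, len(new)) == new)                   # True
```

Whole-file replication in the FRS style would send all 16 bytes for the same one-byte edit; the delta approach sends only the 4-byte block that changed.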
Features of DFS
• Transparency :
– Structure transparency
There is no need for the client to know the number or locations of file
servers and storage devices. Multiple file servers should be provided for
performance, adaptability, and dependability.
– Access transparency
Both local and remote files should be accessible in the same manner. The
file system should automatically locate the accessed file and send it to
the client’s side.
– Naming transparency
The name of a file should give no hint of the file’s location. Once a name
is given to a file, it should not change when the file is moved from one
node to another.
– Replication transparency
If a file is replicated on multiple nodes, the existence of the copies and
their locations should be hidden from the clients.
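Naming, location and replication transparency can be illustrated with a small namespace table. The following is a minimal Python sketch, assuming a hypothetical `Namespace` class that maps a logical folder name to a list of replica targets; it is not the Windows DFS API, and the server names are made up. Clients use only the logical name, so a replica can fail or move without that name changing.

```python
from typing import Callable, Dict, List


class Namespace:
    """Hypothetical DFS-style namespace: maps a logical folder name to the
    physical targets (replicas) that hold its data."""

    def __init__(self) -> None:
        self.links: Dict[str, List[str]] = {}

    def add_link(self, logical: str, *targets: str) -> None:
        self.links[logical] = list(targets)

    def resolve(self, logical: str, is_up: Callable[[str], bool]) -> str:
        """Return the first reachable target. The client only ever names
        `logical`, so it does not see which replica answers."""
        for target in self.links[logical]:
            if is_up(target):
                return target
        raise OSError(f"no reachable target for {logical}")

    def migrate(self, logical: str, old: str, new: str) -> None:
        """Move one replica to another server; the logical name stays the same."""
        self.links[logical] = [new if t == old else t for t in self.links[logical]]


ns = Namespace()
ns.add_link("/public/reports", "server1:/reports", "server2:/reports")
print(ns.resolve("/public/reports", lambda t: True))                     # server1:/reports
print(ns.resolve("/public/reports", lambda t: t != "server1:/reports"))  # server2:/reports
ns.migrate("/public/reports", "server1:/reports", "server3:/reports")
print(ns.resolve("/public/reports", lambda t: True))                     # server3:/reports
```

The `is_up` callback stands in for whatever failure detection a real system would use; passing it in keeps the sketch deterministic.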
Features of DFS
• User mobility :
The system should automatically bring the user’s home directory to the node
where the user logs in.
• Performance :
Performance is measured as the average time needed to satisfy client
requests. This time is the sum of CPU time, secondary-storage access time, and
network access time. Ideally, the performance of a Distributed File System
should be comparable to that of a centralized file system.
• Simplicity and ease of use :
The user interface of the file system should be simple, and the number of
commands should be small.
• High availability :
A Distributed File System should continue to operate in the face of partial
failures such as a link failure, a node failure, or a storage drive crash.
A highly available and reliable distributed file system should have multiple
independent file servers controlling multiple independent storage devices.
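The performance definition above (CPU time + storage access time + network access time, averaged over requests) can be written as a short calculation. This is a minimal Python sketch; the timings are hypothetical numbers chosen for illustration, not measurements, and the "centralized" case simply drops the network term.

```python
from statistics import mean


def response_time(cpu_ms: float, disk_ms: float, net_ms: float) -> float:
    """Time to satisfy one client request: CPU + storage access + network access."""
    return cpu_ms + disk_ms + net_ms


def average_response_time(requests) -> float:
    """Average over (cpu_ms, disk_ms, net_ms) tuples, one tuple per request."""
    return mean(response_time(*r) for r in requests)


# Hypothetical timings in milliseconds; the second request is served locally (no network).
dfs_requests = [(1.0, 8.0, 2.0), (1.0, 8.0, 0.0), (1.0, 5.0, 4.0)]
# Same work on a centralized file system: no network hop at all.
centralized = [(cpu, disk, 0.0) for cpu, disk, _ in dfs_requests]

print(average_response_time(dfs_requests))  # 10.0
print(average_response_time(centralized))   # 8.0
```

The gap between the two averages is the network cost a DFS design tries to keep small.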
Features of DFS
• Scalability :
Since growing the network by adding new machines or joining two networks
together is routine, the distributed system will inevitably grow over time. As a
result, a good distributed file system should be designed to scale easily as the
number of nodes and users in the system grows, without substantial disruption
of service.
• High reliability :
The likelihood of data loss should be minimized as far as feasible in a
suitable distributed file system. Users should not feel forced to make backup
copies of their files because the system is unreliable. Rather, the file
system should create backup copies of key files that can be used if the
originals are lost. Many file systems employ stable storage as a high-reliability
strategy.
• Data integrity :
Multiple users frequently share a file system. The file system must guarantee
the integrity of data saved in a shared file. That is, concurrent access
requests from many users competing for access to the same file must be
correctly synchronized using a concurrency control method. Atomic
transactions are a high-level concurrency control mechanism for data
integrity that file systems frequently offer to users.
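The need for concurrency control can be shown with several clients updating one shared record. This minimal Python sketch uses a simple mutual-exclusion lock, which is a simpler, swapped-in mechanism rather than the atomic transactions mentioned above; `SharedFile` is a hypothetical in-memory stand-in for a shared file.

```python
import threading


class SharedFile:
    """Hypothetical in-memory stand-in for a shared file holding one counter record."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()  # serializes read-modify-write sequences

    def read(self) -> int:
        with self._lock:
            return self._value

    def update(self, delta: int) -> None:
        # The whole read-modify-write runs under the lock, so concurrent
        # updates cannot interleave and overwrite each other (no lost updates).
        with self._lock:
            current = self._value
            self._value = current + delta


def client(f: SharedFile, n: int) -> None:
    """One user issuing n increments to the shared file."""
    for _ in range(n):
        f.update(1)


f = SharedFile()
threads = [threading.Thread(target=client, args=(f, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f.read())  # 8000
```

Without the lock, two clients could both read the same old value and each write back value + 1, losing one update.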
Features of DFS
• Security :
A distributed file system should be secure so that its users may trust that their
data will be kept private. To safeguard the information contained in the file
system from unwanted and unauthorized access, security mechanisms must be
implemented.
• Heterogeneity :
Heterogeneity in distributed systems is unavoidable as a result of huge scale.
Users of heterogeneous distributed systems have the option of using multiple
computer platforms for different purposes.
Distributed File System (DFS)
• History :
• The server component of the Distributed File System was initially
introduced as an add-on feature. It was added to Windows NT 4.0
Server and was known as “DFS 4.1”.
• Then later on it was included as a standard component for all editions
of Windows 2000 Server.
• Client-side support was included in Windows NT 4.0 and in later
versions of Windows.
• Linux kernels 2.6.14 and later include an SMB client VFS known as
“cifs”, which supports DFS. Mac OS X 10.7 (Lion) and later also
support DFS.
Distributed File System (DFS)
• Properties:
• File transparency: users can access files without knowing where they
are physically stored on the network.
• Load balancing: the file system can distribute file access requests
across multiple computers to improve performance and reliability.
• Data replication: the file system can store copies of files on multiple
computers to ensure that the files are available even if one of the
computers fails.
• Security: the file system can enforce access control policies to ensure
that only authorized users can access files.
• Scalability: the file system can support a large number of users and a
large number of files.
• Concurrent access: multiple users can access and modify the same file
at the same time.
Distributed File System (DFS)
• Fault tolerance: the file system can continue to operate even if one or
more of its components fail.
• Data integrity: the file system can ensure that the data stored in the
files is accurate and has not been corrupted.
• File migration: the file system can move files from one location to
another without interrupting access to the files.
• Data consistency: changes made to a file by one user become visible to
all other users according to the file system’s consistency semantics (ideally, immediately).
• Support for different file types: the file system can support a wide
range of file types, including text files, image files, and video files.
Distributed File System (DFS)
• Applications :
• NFS –
NFS stands for Network File System. It is a client-server architecture
that allows a computer user to view, store, and update files remotely.
The NFS protocol is one of several distributed file system standards
for Network-Attached Storage (NAS).
• CIFS –
CIFS stands for Common Internet File System. CIFS is a dialect of SMB;
that is, CIFS is an implementation of the SMB protocol, designed by
Microsoft.
• SMB –
SMB stands for Server Message Block. It is a file-sharing protocol
originally invented by IBM. The SMB protocol was created to allow
computers to perform read and write operations on files on a remote
host over a Local Area Network (LAN). Directories on the remote host
that are made accessible via SMB are called “shares”.
Distributed File System (DFS)
• Hadoop –
Hadoop is a collection of open-source software utilities. It provides a
software framework for distributed storage and processing of big
data using the MapReduce programming model. The core of
Hadoop consists of a storage part, known as the Hadoop Distributed
File System (HDFS), and a processing part, which is the MapReduce
programming model.
• NetWare –
NetWare is a discontinued computer network operating system
developed by Novell, Inc. It primarily used cooperative multitasking
to run various services on a personal computer, using the IPX
network protocol.
Distributed File System (DFS)
• Working of DFS :
• There are two ways in which DFS can be implemented:
• Standalone DFS namespace –
• It allows only DFS roots that exist on the local computer and do not
use Active Directory. A standalone DFS can be accessed only on the
computer on which it is created. It does not provide any fault tolerance
and cannot be linked to any other DFS. Standalone DFS roots are rarely
encountered because of their limited advantages.
• Domain-based DFS namespace –
• It stores the DFS configuration in Active Directory, making the DFS
namespace root accessible at
\\<domainname>\<dfsroot> or \\<FQDN>\<dfsroot>
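The path form `\\<domainname>\<dfsroot>` can be built and taken apart with plain string handling. This is a minimal Python sketch; the helper names and the values `example.com`, `public`, `reports` and `2024` are hypothetical, and the sketch only manipulates strings without contacting Active Directory.

```python
def namespace_path(domain: str, dfs_root: str, *folders: str) -> str:
    r"""Build a domain-based namespace path of the form \\<domainname>\<dfsroot>\<folder>..."""
    return "\\\\" + "\\".join([domain, dfs_root, *folders])


def split_namespace_path(path: str):
    r"""Split \\<domainname>\<dfsroot>\<folder>... into (domain, dfs_root, [folders])."""
    if not path.startswith("\\\\"):
        raise ValueError("not a UNC path: " + path)
    domain, dfs_root, *folders = path[2:].split("\\")
    return domain, dfs_root, folders


p = namespace_path("example.com", "public", "reports", "2024")
print(p)                        # \\example.com\public\reports\2024
print(split_namespace_path(p))  # ('example.com', 'public', ['reports', '2024'])
```

The same helpers work for the `\\<FQDN>\<dfsroot>` form, since only the first path component changes.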
Distributed File System (DFS)
• Advantages :
• DFS allows multiple users to access or store data.
• It allows data to be shared remotely.
• It improves file availability, access time, and network efficiency.
• It improves the ability to change the size of the data and to exchange
data.
• A Distributed File System provides transparent access to data even if a
server or disk fails.
• Disadvantages :
• In a Distributed File System, nodes and connections need to be secured,
so security is a concern.
• Messages and data may be lost in the network while moving from one
node to another.
• Database connection in a Distributed File System is complicated.
• Handling a database is also harder in a Distributed File System than in
a single-user system.
• Overloading may occur if all nodes try to send data at once.
File Models in Distributed System
• In Distributed File Systems (DFS), multiple machines are used to provide the
file system’s facility. Different file systems often employ different conceptual
models of a file. The models based on structure and modifiability are commonly
used for modeling files.
File Models in Distributed System
• File models are classified by two criteria:
• Unstructured and Structured Files
• Mutable and Immutable Files
• Based on the structure criteria, file models are of two types:
• 1. Unstructured Files:
• This is the simplest and most commonly used model. In the unstructured model,
a file is an unstructured sequence of data with no substructure associated
with it. To the file system, each file is simply an uninterrupted sequence of
bytes; how the contents are structured depends on the application, as in UNIX
or DOS.
• Most modern operating systems prefer the unstructured file model to the
structured file model because it makes sharing files between different
applications easier. Since the file system imposes no structure, different
applications can interpret the same file in different ways.
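The point that an unstructured file is only a sequence of bytes, interpreted differently by different applications, can be shown in a few lines of Python. The byte values and the three "applications" below are illustrative assumptions.

```python
import struct

data = bytes([72, 105, 33, 0])  # what the file system stores: 4 bytes, no structure

as_text = data[:3].decode("ascii")     # a text editor might see the characters "Hi!"
as_int = struct.unpack("<I", data)[0]  # another program might read one little-endian 32-bit integer
as_bytes = list(data)                  # a hex viewer just shows the raw byte values

print(as_text)   # Hi!
print(as_int)    # 2189640
print(as_bytes)  # [72, 105, 33, 0]
```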
File Models in Distributed System
• 2. Structured Files:
• The structured file model is rarely used today. In the structured file model,
the file system sees a file as an ordered sequence of records. Files can be of
different types, sizes, and properties, and even files belonging to the same
file system can have records of different sizes.
• Files possess different properties even though they belong to the same file
system. The smallest unit of data that can be retrieved is a record, and read
and write operations are performed on a set of records.
• In a structured file system, various “File Attributes” describe each file.
Each attribute consists of a name and a value, and the set of attributes
depends on the file system used. Attributes hold information about the file,
such as file size, file owner, date of last modification, date of creation,
access permissions, and date of last access. A directory service is used to
maintain file attributes because they are subject to different access
permissions than the file contents.
File Models in Distributed System
• The structured files further consist of two types:
• Files with Non-Indexed records:
• In files with non-indexed records, a record is retrieved by its position in
the file, for example the third record from the beginning or the third record
from the end.
• Files with Indexed records:
• In files with indexed records, each record has one or more key fields, each
of which can be addressed by specifying its value. To locate records quickly,
the file is maintained as a B-tree, hash table, or similar data structure.
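The two kinds of structured file can be contrasted with a list of records. This minimal Python sketch uses a list for positional (non-indexed) access and a dictionary, which is a hash table, as the key index; the `Record` type, its `emp_id` key field and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    emp_id: int  # hypothetical key field
    name: str


records = [Record(101, "Asha"), Record(205, "Ravi"), Record(150, "Meena"), Record(310, "John")]


# Non-indexed records: addressed only by position in the file.
def nth_from_start(recs, n):
    return recs[n - 1]


def nth_from_end(recs, n):
    return recs[-n]


# Indexed records: key value -> record (a hash table here; a B-tree would also work).
index = {r.emp_id: r for r in records}

print(nth_from_start(records, 3).name)  # Meena
print(nth_from_end(records, 3).name)    # Ravi
print(index[310].name)                  # John
```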
File Models in Distributed System
• Based on the modifiability criteria, file models are of two types:
• 3. Mutable Files: Most existing operating systems use the mutable file model.
Updating a file overwrites its existing contents with the new contents. Because
the same file is updated in place again and again, a file is represented as a
single sequence of records.
• 4. Immutable Files: The Cedar File System (CFS) uses the immutable file model.
In this model, a file cannot be changed once it has been created; it can only
be deleted. To implement file updates, multiple versions of the same file are
created: every update produces a new version. Sharing is consistent in this
model because only immutable files are shared. This also makes it easier for
distributed systems to support caching and replication, since the problem of
keeping multiple copies consistent is avoided.
File Models in Distributed System
• Drawbacks of the immutable file model are increased space utilization and
increased disk allocation activity. CFS uses the “Keep” parameter to specify
the number of current versions of a file to retain.
• When the value of the parameter is 1, creating a new file version deletes the
existing version, and its disk space is reused.
• When the value of the parameter is greater than 1, multiple versions of a file
can exist.
• A specific version of a file can be accessed by giving its full name.
• If the version number is not specified, CFS uses the lowest version number for
operations like “delete” and the highest version number for other operations
like “open”.
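The immutable, versioned model with a Keep limit can be sketched as a small in-memory store. This is a minimal Python illustration under stated assumptions: `ImmutableStore` is hypothetical, versions are selected by an integer argument rather than by a full file name as in CFS, and storage is a dictionary rather than disk.

```python
from typing import Dict, Optional


class ImmutableStore:
    """Hypothetical immutable, versioned file store with a CFS-style Keep limit."""

    def __init__(self, keep: int = 1) -> None:
        if keep < 1:
            raise ValueError("keep must be >= 1")
        self.keep = keep
        self.versions: Dict[str, Dict[int, bytes]] = {}

    def write(self, name: str, data: bytes) -> int:
        """An update never changes an existing version; it creates the next one."""
        vers = self.versions.setdefault(name, {})
        new_no = max(vers, default=0) + 1
        vers[new_no] = data
        while len(vers) > self.keep:  # discard the oldest versions beyond Keep
            del vers[min(vers)]
        return new_no

    def open(self, name: str, version: Optional[int] = None) -> bytes:
        """No version given: use the highest (newest) version."""
        vers = self.versions[name]
        return vers[max(vers) if version is None else version]

    def delete(self, name: str, version: Optional[int] = None) -> None:
        """No version given: delete the lowest (oldest) version."""
        vers = self.versions[name]
        del vers[min(vers) if version is None else version]
        if not vers:
            del self.versions[name]


store = ImmutableStore(keep=2)
store.write("report.txt", b"v1")
store.write("report.txt", b"v2")
store.write("report.txt", b"v3")             # version 1 is discarded because Keep = 2
print(sorted(store.versions["report.txt"]))  # [2, 3]
print(store.open("report.txt"))              # b'v3'
store.delete("report.txt")                   # removes version 2, the lowest
print(sorted(store.versions["report.txt"]))  # [3]
```

With `keep=1`, each write immediately discards the previous version, matching the Keep = 1 behaviour described above.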
File Accessing models in Distributed System
• In Distributed File Systems (DFS), multiple machines are used to provide the
file system’s facility.
• Different file systems use different conceptual models of a file. The two
most commonly used criteria for file modeling are structure and
modifiability. File models based on these criteria were described earlier.
File Accessing models in Distributed System
• File Accessing Models:
• The file accessing model of a DFS mainly depends on two factors:
• The unit of data access/transfer
• The method used for accessing remote files
• Based on the unit of data access, the following file access models may be
used to access a particular file.
• 1. File-level transfer model: In the file-level transfer model, when an
operation requires file data, the whole file is moved across the network
between client and server. This model is efficient and scales better.
• 2. Block-level transfer model: In the block-level transfer model, file data
is transferred between client and server in units of file blocks. Thus, the
unit of data transfer in the block-level transfer model is the file block. The
block-level transfer model can be used in distributed computing environments
containing diskless workstations.
File Accessing models in Distributed System
• 3. Byte-level transfer model: In the byte-level transfer model, file data is
transferred between client and server in units of bytes. Thus, the unit of
data transfer in the byte-level transfer model is the byte.
• The byte-level transfer model offers more flexibility than the other file
transfer models because it allows storage and retrieval of an arbitrary
sequential subrange of a file. The major disadvantage of the byte-level
transfer model is the difficulty of cache management caused by the
variable-length data of different access requests.
• 4. Record-level transfer model: The record-level transfer model can be used
with file models in which the file contents are structured as records.
• In the record-level transfer model, file data is transferred between client
and server in units of records. The unit of data transfer in the record-level
transfer model is the record.
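The four transfer models differ in how much data crosses the network for the same request. This minimal Python sketch counts the bytes moved for one read of 150 bytes at offset 5,000; the file size, block size and fixed record size are hypothetical values, and the sketch ignores any caching of the transferred data for later requests.

```python
FILE_SIZE = 10_000   # hypothetical file size in bytes
BLOCK_SIZE = 4_096   # hypothetical block size
RECORD_SIZE = 100    # hypothetical fixed record size (record-level needs a structured file)


def _aligned_span(offset: int, end: int, unit: int) -> int:
    """Bytes covered by every whole unit that overlaps [offset, end)."""
    first, last = offset // unit, (end - 1) // unit
    return min((last + 1) * unit, FILE_SIZE) - first * unit


def bytes_transferred(model: str, offset: int, length: int) -> int:
    """Bytes moved between server and client to satisfy one read of
    `length` bytes starting at `offset`, under the given transfer model."""
    if length <= 0 or offset >= FILE_SIZE:
        raise ValueError("request must cover at least one byte of the file")
    end = min(offset + length, FILE_SIZE)
    if model == "file":
        return FILE_SIZE                          # the whole file is shipped
    if model == "block":
        return _aligned_span(offset, end, BLOCK_SIZE)
    if model == "record":
        return _aligned_span(offset, end, RECORD_SIZE)
    if model == "byte":
        return end - offset                       # exactly the requested sub-range
    raise ValueError(f"unknown model: {model}")


for model in ("file", "block", "record", "byte"):
    print(model, bytes_transferred(model, 5_000, 150))
# file 10000
# block 4096
# record 200
# byte 150
```

The byte-level model moves the least data for this request, but each request has a different length, which is why the text notes that it makes cache management harder.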
File Accessing models in Distributed System
• The Method Used for Accessing Remote Files:
• A distributed file system may use one of the following models to service a
client’s file access request when the accessed file is remote:
• 1. Remote service model: The client’s request is processed at the server’s
node. The client’s file access request is sent across the network as a
message to the server, the server machine performs the access request, and
the result is sent back to the client. The number of messages sent and the
overhead per message need to be limited.
• Every remote access is handled across the network, so access is slower.
• Server load and network traffic increase, which degrades performance.
• Transmitting a series of responses to specific requests leads to higher
network overhead.
• To maintain consistency, the client communicates with the server, which
keeps the master copy of the data.
• Remote service is better when the client’s main memory is small.
• It is simply an extension of the local file system interface across the
network.
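In the remote service model every access becomes a message to the server. This minimal Python sketch counts those messages; `FileServer` and `RemoteServiceClient` are hypothetical classes, and a method call stands in for a real network round trip.

```python
from typing import Dict


class FileServer:
    """Hypothetical server holding the master copy; counts request messages it receives."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files
        self.requests = 0

    def read(self, name: str, offset: int, length: int) -> bytes:
        self.requests += 1  # one message per access
        return self.files[name][offset:offset + length]


class RemoteServiceClient:
    """Every access is forwarded to the server; nothing is kept locally."""

    def __init__(self, server: FileServer) -> None:
        self.server = server

    def read(self, name: str, offset: int, length: int) -> bytes:
        return self.server.read(name, offset, length)


server = FileServer({"notes.txt": b"distributed file systems"})
client = RemoteServiceClient(server)
for _ in range(3):
    data = client.read("notes.txt", 0, 11)
print(data)             # b'distributed'
print(server.requests)  # 3: repeated reads of the same data still cost one message each
```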
File Accessing models in Distributed System
• 2. Data-caching model: This model attempts to reduce the network traffic of
the previous model by caching data obtained from the server node. It exploits
the locality of reference found in file accesses. A replacement policy such as
LRU (least recently used) is used to keep the cache size bounded.
• Remote accesses can be served locally, so access is faster.
• Network traffic and server load are reduced, which improves scalability.
• Network overhead is lower than with remote service when large amounts of
data are transmitted.
• For maintaining consistency, performance is good if writes are infrequent
and poor if writes are frequent.
• Caching is better for machines with a local disk or a large main memory.
• The lower-level machine interface is different from the upper-level user
interface (UI).
• Benefit of the data-caching model over the remote service model:
• The data-caching model offers the possibility of increased performance and
greater system scalability because it reduces network traffic, contention for
the network, and contention for the file servers. Hence almost all distributed
file systems implement some form of caching.
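The data-caching model with LRU replacement can be sketched as a client-side block cache in front of a server. This is a minimal Python illustration; `FileServer`, `CachingClient`, the 4-byte block size and the capacity of 2 blocks are hypothetical, and the sketch handles reads only, so it ignores the write-consistency problem mentioned above.

```python
from collections import OrderedDict
from typing import Dict


class FileServer:
    """Hypothetical server holding the master copy; counts block requests it receives."""

    def __init__(self, files: Dict[str, bytes], block_size: int = 4) -> None:
        self.files = files
        self.block_size = block_size
        self.requests = 0

    def read_block(self, name: str, block_no: int) -> bytes:
        self.requests += 1
        start = block_no * self.block_size
        return self.files[name][start:start + self.block_size]


class CachingClient:
    """Client-side block cache with least-recently-used (LRU) replacement."""

    def __init__(self, server: FileServer, capacity: int = 2) -> None:
        self.server = server
        self.capacity = capacity
        self.cache = OrderedDict()  # (name, block_no) -> bytes, oldest entry first

    def read_block(self, name: str, block_no: int) -> bytes:
        key = (name, block_no)
        if key in self.cache:
            self.cache.move_to_end(key)  # hit: mark as most recently used
            return self.cache[key]
        data = self.server.read_block(name, block_no)  # miss: one message to the server
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data


server = FileServer({"a.txt": b"AAAABBBBCCCC"})
client = CachingClient(server, capacity=2)
for block in [0, 1, 0, 1, 0, 2, 0]:
    client.read_block("a.txt", block)
print(server.requests)  # 3 (the remote service model would have sent 7 messages)
```

Block 1 is evicted when block 2 arrives because it is the least recently used entry at that moment; block 0 stays cached because it keeps being read.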