Distributed File System
In this article, you will learn about the distributed file system in the operating system and its features,
components, advantages, and disadvantages.
The primary goal of a distributed file system (DFS) is to enable users of physically distributed systems to
share resources and information through a Common File System (CFS). It is a file system that runs as a part
of the operating system. Its typical configuration is a set of workstations and mainframes connected by a
LAN. The process of creating a namespace in DFS is transparent to the clients.
DFS has two components in its services, and these are as follows:
1. Location Transparency, which is achieved through the namespace component
2. Redundancy, which is achieved through the file replication component
In the case of failure or heavy load, these components work together to increase data availability by allowing
data from multiple places to be logically combined under a single folder known as the "DFS root".
It is not required to use both DFS components simultaneously; the namespace component can be used
without the file replication component, and the file replication component can be used between servers
without the namespace component.
Features
There are various features of the DFS. Some of them are as follows:
Transparency
1. Structure Transparency
The client does not need to be aware of the number or location of file servers and storage devices. For
structure transparency, multiple file servers should be provided for adaptability, dependability, and
performance.
2. Naming Transparency
There should be no hint of the file's location in the file's name. When the file is transferred from one node to
another, the file name should not change.
3. Access Transparency
Local and remote files must be accessible in the same way. The file system must automatically locate the
accessed file and deliver it to the client.
4. Replication Transparency
When a file is replicated across several nodes, the existence of the copies and their locations must be hidden
from the clients.
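Naming and replication transparency can be illustrated with a minimal sketch. The Namespace class, the node names, and the file path below are hypothetical and exist only for illustration; the point is that the logical name a client uses stays the same while the nodes holding the file change.

```python
class Namespace:
    """Maps location-independent logical names to the nodes that hold the file.

    Hypothetical illustration: a real DFS keeps this mapping in a
    metadata or namespace service, not in a local dictionary.
    """

    def __init__(self):
        self._locations = {}  # logical name -> list of node names

    def add(self, logical_name, node):
        """Record that `node` holds a copy of `logical_name`."""
        self._locations.setdefault(logical_name, []).append(node)

    def move(self, logical_name, src, dst):
        """Relocate one copy; the logical name is untouched."""
        nodes = self._locations[logical_name]
        nodes[nodes.index(src)] = dst

    def resolve(self, logical_name):
        """Return the nodes currently holding the file (hidden from clients)."""
        return list(self._locations[logical_name])


ns = Namespace()
ns.add("/projects/report.txt", "node-a")
ns.add("/projects/report.txt", "node-c")              # a second replica
ns.move("/projects/report.txt", "node-a", "node-b")   # file migrates
print(ns.resolve("/projects/report.txt"))  # ['node-b', 'node-c']
```

The client keeps using "/projects/report.txt" throughout; only the internal mapping changes.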
Scalability
A distributed system inevitably grows over time as more machines are added to the network or as two
networks are linked together. A good DFS must be designed to scale rapidly as the number of nodes and
users in the system increases.
Data Integrity
Many users usually share a file system. The file system needs to protect the integrity of data stored in a
shared file. A concurrency control mechanism must correctly synchronize concurrent access requests from
several users who are competing for access to the same file. A file system commonly provides users with
atomic transactions, a high-level concurrency control mechanism, to preserve data integrity.
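The idea of synchronizing concurrent access can be sketched as follows. This is a simplified, single-process illustration: the SharedFile class and the client function are hypothetical, and an in-memory threading.Lock stands in for whatever locking or transaction mechanism a real DFS would provide across machines.

```python
import threading


class SharedFile:
    """An in-memory stand-in for a file shared by several clients."""

    def __init__(self):
        self._lines = []
        self._lock = threading.Lock()

    def append(self, line):
        # The lock makes the read-modify-write sequence atomic, so
        # concurrent appends cannot overwrite each other's updates.
        with self._lock:
            updated = self._lines + [line]
            self._lines = updated

    def read(self):
        with self._lock:
            return list(self._lines)


def client(shared, client_id, count):
    """Simulates one user appending `count` records to the shared file."""
    for i in range(count):
        shared.append(f"client{client_id}-{i}")


shared = SharedFile()
threads = [threading.Thread(target=client, args=(shared, c, 100)) for c in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared.read()))  # 400
```

Without the lock, two clients could read the same old contents and one update could be lost, which is the kind of integrity violation concurrency control is meant to prevent.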
High Reliability
The risk of data loss must be limited as much as feasible in an effective DFS. Users must not feel compelled to
make backups of their files due to the system's unreliability. Instead, a file system should back up key files so
that they may be restored if the originals are lost. As a high-reliability strategy, many file systems use stable
storage.
High Availability
A DFS should be able to keep functioning in the case of a partial failure, such as a node failure, a storage
device crash, or a link failure.
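One common way to tolerate a node failure is to try another replica when a read fails. The sketch below assumes a hypothetical Replica class and NodeDown exception; it is an illustration of the failover idea, not the behavior of any particular DFS.

```python
class NodeDown(Exception):
    """Raised when a (simulated) storage node cannot serve a request."""


class Replica:
    """Hypothetical storage node holding a dictionary of files."""

    def __init__(self, name, files, online=True):
        self.name = name
        self.files = files
        self.online = online

    def read(self, path):
        if not self.online:
            raise NodeDown(f"{self.name} is unreachable")
        return self.files[path]


def read_with_failover(replicas, path):
    """Return the file from the first replica that responds."""
    failures = []
    for replica in replicas:
        try:
            return replica.read(path)
        except NodeDown as exc:
            failures.append(str(exc))  # remember why, then try the next node
    raise NodeDown("all replicas failed: " + "; ".join(failures))


data = {"/docs/plan.txt": b"quarterly plan"}
replicas = [Replica("node-a", data, online=False), Replica("node-b", data)]
print(read_with_failover(replicas, "/docs/plan.txt"))  # b'quarterly plan'
```

The read succeeds even though node-a is down; the request fails only when every replica is unavailable.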
Ease of Use
The user interface of a file system must be simple, and the number of commands should be as small as possible.
Performance
Performance is assessed by the average time it takes to satisfy client requests. A DFS should perform
comparably to a centralized file system.
Distributed File System Replication
Initial versions of DFS used Microsoft's File Replication Service (FRS), enabling basic file replication among
servers. FRS detects new or altered files and distributes the most recent versions of the full file to all servers.
Windows Server 2003 R2 introduced "DFS Replication" (DFSR). It improves on FRS by copying only
the parts of files that have changed and by reducing network traffic with data compression. It also provides
flexible configuration options for limiting network traffic on a configurable schedule.
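The benefit of copying only the changed parts of a file can be sketched with fixed-size block hashing. This is an assumption chosen for brevity and is not claimed to be the algorithm DFSR uses; the function names and the tiny block size are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return {block index: new bytes} for blocks that differ from `old`."""
    old_hashes = block_hashes(old, block_size)
    changes = {}
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(old_hashes) or old_hashes[index] != digest:
            changes[index] = block
    return changes


def apply_changes(old, changes, new_length, block_size=BLOCK_SIZE):
    """Rebuild the new file on the receiver from its old copy plus the changes."""
    buf = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for index, block in changes.items():
        start = index * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)


old = b"aaaabbbbcccc"
new = b"aaaaXbbbcccc"
delta = changed_blocks(old, new)
print(delta)                                      # {1: b'Xbbb'}
print(apply_changes(old, delta, len(new)) == new)  # True
```

Only one of the three blocks has to cross the network here, whereas a full-file approach such as the one described for FRS would resend all twelve bytes.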
The server component of DFS was first introduced as an add-on to Windows NT 4.0 Server, where it was
called "DFS 4.1". Later, it became a standard component of all Windows 2000 Server editions.
Windows NT 4.0 and later versions of Windows have client-side support.
Linux kernels 2.6.14 and later include a DFS-compatible SMB client VFS known as "cifs". DFS is also
supported in Mac OS X 10.7 (Lion) and later.
A standalone DFS namespace does not use Active Directory and only permits DFS roots that exist on the
local system. A standalone DFS can only be accessed on the system on which it was created. It offers no
fault tolerance and cannot be linked to any other DFS.
A domain-based DFS namespace stores the DFS configuration in Active Directory, making the namespace
root accessible at \\<domainname>\<dfsroot> or \\<FQDN>\<dfsroot>.
DFS namespace
Traditional file shares, which are linked to a single server, use SMB paths of the form:
\\<SERVER>\<path>\<subpath>
Domain-based DFS file share paths are distinguished by using the domain name in place of the server name,
in the form:
\\<DOMAIN.NAME>\<dfsroot>\<path>
When users access such a share, either directly or by mapping a drive, their computer connects to one of
the available servers associated with that share, based on rules defined by the network administrator. For
example, the default behavior is for users to access the server nearest to them; however, this can be changed
to prefer a certain server.
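The path forms and the server selection rule described above can be sketched as follows. The function names, server names, and cost values are hypothetical; real clients obtain the list of targets from the DFS namespace rather than from a hard-coded dictionary.

```python
def split_dfs_path(path):
    """Split \\\\<host>\\<root>\\<rest> into (host, root, rest).

    `host` is a server name for a traditional share or a domain name
    for a domain-based DFS path.
    """
    if not path.startswith("\\\\"):
        raise ValueError("expected a path starting with two backslashes")
    parts = path[2:].split("\\")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("expected at least \\\\<host>\\<root>")
    return parts[0], parts[1], "\\".join(parts[2:])


def pick_target(targets, preferred=None):
    """Choose a server for a share.

    `targets` maps server name -> cost (lower means nearer; the numbers
    are made up for this sketch). An administrator-preferred server wins
    if it is available; otherwise the nearest server is used.
    """
    if preferred in targets:
        return preferred
    return min(targets, key=targets.get)


host, root, rest = split_dfs_path(r"\\example.com\public\docs\a.txt")
targets = {"srv-far": 40, "srv-near": 10}

print(host, root)                                # example.com public
print(pick_target(targets))                      # srv-near
print(pick_target(targets, preferred="srv-far"))  # srv-far
```

The default branch models "nearest server first", and the preferred argument models an administrator overriding that default.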
Hadoop
Hadoop is a collection of open-source software utilities. It is a software framework that uses the MapReduce
programming model to allow distributed storage and processing of large amounts of data. Hadoop is made
up of a storage component, known as the Hadoop Distributed File System (HDFS), and a processing
component based on the MapReduce programming model.
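The MapReduce programming model can be sketched in a few lines of plain Python. This runs in a single process and does not use the Hadoop API; the function names are hypothetical and only show the map, shuffle, and reduce steps on a word count example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["the quick fox", "the lazy dog"]))
# {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

In a real cluster the map and reduce calls would run on many nodes, each close to the blocks of data it processes, while the framework handles the shuffle between them.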
NFS
The Network File System (NFS) is a client-server architecture that enables a computer user to store, update,
and view files remotely. It is one of several DFS standards for Network-Attached Storage.
SMB
IBM developed the Server Message Block (SMB) protocol for file sharing. It was designed to allow systems
to read and write files on a remote host across a LAN. The remote host's directories that can be accessed
through SMB are known as "shares".
NetWare
It is a discontinued computer network operating system developed by Novell, Inc. It mainly used
cooperative multitasking to run various services on a computer system, using the IPX network protocol.
CIFS
CIFS is a dialect of SMB. The CIFS protocol is a Microsoft-designed implementation of the SMB protocol.
Advantages
There are various advantages of the distributed file system. Some of the advantages are as follows:
1. It allows the users to access and store the data.
2. It helps to improve the access time, network efficiency, and availability of files.
3. It makes it easier to change the amount of data stored and to exchange data.
Disadvantages
There are various disadvantages of the distributed file system. Some of the disadvantages are as follows:
1. If all nodes try to transfer data simultaneously, the network may become overloaded.
2. Messages and data may be lost in the network while moving from one node to another.