Distributed File System (DFS)
A Distributed File System (DFS), as the name suggests, is a file system that is distributed across multiple file servers or locations. It allows programs to access and store remote files exactly as they do local ones, so users can access files from any computer on the network.
The main purpose of the Distributed File System (DFS) is to allow users of physically distributed systems to share their data and resources through a common file system. A typical configuration for a DFS is a collection of workstations and mainframes connected by a Local Area Network (LAN). A DFS is implemented as part of the operating system. In a DFS, a namespace is created, and this process is transparent to the clients.
DFS has two components:
Location Transparency –
Location transparency is achieved through the namespace component.
Redundancy –
Redundancy is provided through the file replication component.
In the case of failure or heavy load, these components together improve data availability by allowing data stored in different locations to be logically grouped under one folder, known as the “DFS root”.
It is not necessary to use both components of DFS together: it is possible to use the namespace component without the file replication component, and it is equally possible to use the file replication component between servers without the namespace component.
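To make the two components concrete, here is a minimal Python sketch (all server and share names are invented for illustration) of a namespace that groups shares on different servers under one DFS root, with multiple targets per folder standing in for replication:

```python
class DfsNamespace:
    """Maps logical folders under a DFS root to physical server shares."""

    def __init__(self, root):
        self.root = root              # e.g. r"\\corp\public" (the "DFS root")
        self.links = {}               # logical folder -> list of share targets

    def add_link(self, folder, targets):
        # Several targets for one folder means the data is replicated.
        self.links[folder] = list(targets)

    def resolve(self, folder):
        # The client sees one logical folder; the namespace hides where it lives.
        targets = self.links.get(folder)
        if not targets:
            raise KeyError(f"{folder} is not published under {self.root}")
        return targets

# Hypothetical example: two replicated targets and one single-copy target.
ns = DfsNamespace(r"\\corp\public")
ns.add_link("reports", [r"\\fs1\reports", r"\\fs2\reports"])  # replicated
ns.add_link("tools", [r"\\fs3\tools"])                        # single copy
```

The client only ever names `\\corp\public\reports`; which physical server actually serves the data is an implementation detail hidden by `resolve`.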
File system replication:
Early versions of DFS used Microsoft’s File Replication Service (FRS), which allowed straightforward file replication between servers. FRS detects new or changed files and distributes the latest version of the entire file to all servers.
Windows Server 2003 R2 introduced “DFS Replication” (DFSR). It improves on FRS by copying only the portions of files that have changed and by minimizing network traffic with data compression. It also gives administrators flexible configuration options to manage network traffic on a configurable schedule.
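The idea of copying only the changed portions of a file can be sketched as follows. This is a simplified, hypothetical illustration of block-level delta replication with compression, not the actual DFSR wire format; the block size and helper names are assumptions:

```python
import hashlib
import zlib

BLOCK = 4096  # assumed fixed block size for this illustration

def block_hashes(data):
    # Hash each fixed-size block so changed regions can be detected cheaply.
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def delta(old, new):
    """Return only the blocks of `new` that differ from `old`, compressed."""
    old_h = block_hashes(old)
    changes = []
    for i in range(0, len(new), BLOCK):
        idx = i // BLOCK
        blk = new[i:i + BLOCK]
        if idx >= len(old_h) or hashlib.sha256(blk).hexdigest() != old_h[idx]:
            changes.append((i, zlib.compress(blk)))   # send compressed block
    return changes, len(new)

def apply_delta(old, changes, new_len):
    # Rebuild the new file on the replica from its old copy plus the delta.
    buf = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for offset, comp in changes:
        blk = zlib.decompress(comp)
        buf[offset:offset + len(blk)] = blk
    return bytes(buf[:new_len])

# Demo: only the middle block changed, so only one block is transferred.
old = b"a" * BLOCK * 3
new = b"a" * BLOCK + b"b" * BLOCK + b"a" * BLOCK
changes, new_len = delta(old, new)
```

Sending one compressed block instead of the whole file is what reduces replication traffic relative to FRS-style whole-file copying.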
Features of DFS :
Transparency :
Structure transparency –
There is no need for the client to know about the number or locations of file
servers and the storage devices. Multiple file servers should be provided for
performance, adaptability, and dependability.
Access transparency –
Both local and remote files should be accessible in the same manner. The file system should automatically locate the accessed file and deliver it to the client.
Naming transparency –
The name of a file should give no hint of its location. Once a name is given to a file, it should not change when the file is transferred from one node to another.
Replication transparency –
If a file is replicated on multiple nodes, the existence of the copies and their locations should be hidden from clients.
User mobility :
The system should automatically bring a user’s home directory to the node where the user logs in.
Performance :
Performance is measured as the average time needed to satisfy client requests. This time covers CPU time plus the time taken to access secondary storage plus network access time. The performance of a Distributed File System should ideally be comparable to that of a centralized file system.
Simplicity and ease of use :
The user interface of a file system should be simple, and the number of commands should be small.
High availability :
A Distributed File System should be able to continue operating in the face of partial failures such as a link failure, a node failure, or a storage drive crash.
A highly reliable and adaptable distributed file system should have multiple independent file servers controlling multiple independent storage devices.
Scalability :
Since growing the network by adding new machines or joining two networks together
is routine, the distributed system will inevitably grow over time. As a result, a good
distributed file system should be built to scale quickly as the number of nodes and
users in the system grows. Service should not be substantially disrupted as the number
of nodes and users grows.
High reliability :
The likelihood of data loss should be minimized as much as possible in a good distributed file system. Users should not feel forced to make backup copies of their files because the system is unreliable; rather, the file system should itself create backup copies of key files that can be used if the originals are lost. Many file systems employ stable storage as a high-reliability strategy.
Data integrity :
Multiple users frequently share a file system. The integrity of data saved in a shared
file must be guaranteed by the file system. That is, concurrent access requests from
many users who are competing for access to the same file must be correctly
synchronized using a concurrency control method. Atomic transactions are a high-
level concurrency management mechanism for data integrity that is frequently offered
to users by a file system.
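As a rough illustration of the synchronization idea, the following Python sketch serializes concurrent appends to a shared file with a lock, so interleaved writers cannot corrupt the data. It is an in-memory stand-in for a file server, not a real concurrency-control protocol:

```python
import threading

class SharedFile:
    """In-memory stand-in for a shared file with synchronized access."""

    def __init__(self):
        self._data = []
        self._lock = threading.Lock()

    def append(self, record):
        with self._lock:            # only one writer mutates at a time
            self._data.append(record)

    def read_all(self):
        with self._lock:
            return list(self._data)

# Four concurrent "clients" each append 100 records; the lock guarantees
# that no append is lost or interleaved mid-update.
shared = SharedFile()
threads = [
    threading.Thread(target=lambda i=i: [shared.append(i) for _ in range(100)])
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, concurrent mutation of shared state can lose updates; with it, all 400 appends survive, which is the essence of the synchronization requirement described above.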
Security:
A distributed file system should be secure so that its users may trust that their data
will be kept private. To safeguard the information contained in the file system from
unwanted & unauthorized access, security mechanisms must be implemented.
Heterogeneity:
Heterogeneity in distributed systems is unavoidable as a result of huge scale. Users of
heterogeneous distributed systems have the option of using multiple computer
platforms for different purposes.
History:
The server component of the Distributed File System was initially introduced as an add-on feature for Windows NT 4.0 Server, where it was known as “DFS 4.1”. It was later included as a standard component in all editions of Windows 2000 Server. Client-side support has been included in Windows NT 4.0 and all later versions of Windows.
Linux kernels 2.6.14 and later include an SMB client VFS known as “cifs” which supports DFS. Mac OS X 10.7 (Lion) and later also support DFS.
Applications:
NFS –
NFS stands for Network File System. It is a client-server architecture that allows a computer user to view, store, and update files remotely. The NFS protocol is one of several distributed file system standards for Network-Attached Storage (NAS).
CIFS –
CIFS stands for Common Internet File System. CIFS is a dialect of SMB; that is, CIFS is an implementation of the SMB protocol designed by Microsoft.
SMB –
SMB stands for Server Message Block. It is a file-sharing protocol that was developed by IBM. The SMB protocol was created to allow computers to perform read and write operations on files on a remote host over a Local Area Network (LAN). The directories on the remote host that can be accessed via SMB are called “shares”.
Hadoop –
Hadoop is a collection of open-source software utilities. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. The core of Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part based on the MapReduce programming model.
NetWare –
NetWare is a discontinued computer network operating system developed by Novell, Inc. It primarily used cooperative multitasking to run various services on a personal computer, using the IPX network protocol.
Working of DFS :
There are two ways in which DFS can be implemented:
Standalone DFS namespace –
It allows only DFS roots that exist on the local computer and do not use Active Directory. A standalone DFS namespace can be accessed only on the computer on which it is created. It offers no fault tolerance and cannot be linked to any other DFS. Standalone DFS roots are rarely encountered because of their limited usefulness.
Domain-based DFS namespace –
It stores the DFS configuration in Active Directory, making the DFS namespace root accessible at \\<domainname>\<dfsroot> or \\<FQDN>\<dfsroot>.
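A client’s resolution of such a path can be sketched as below. The referral table, domain, and server names are all invented for illustration; a real client obtains referrals from Active Directory and handles failover, which this naive sketch omits:

```python
# Hypothetical referral table: (namespace root, link) -> physical targets.
REFERRALS = {
    (r"\\example.com\dfsroot", "sales"): [r"\\fs1\sales", r"\\fs2\sales"],
    (r"\\example.com\dfsroot", "hr"):    [r"\\fs3\hr"],
}

def resolve(path):
    """Rewrite a \\domain\dfsroot\link\... path to a physical target path."""
    parts = path.lstrip("\\").split("\\")
    root = "\\\\" + "\\".join(parts[:2])    # e.g. \\example.com\dfsroot
    link, rest = parts[2], parts[3:]
    targets = REFERRALS[(root, link)]
    return "\\".join([targets[0]] + rest)   # naive: always pick first target
```

The user only ever sees the domain-based path; the rewrite to a specific file server happens behind the scenes, which is exactly the location transparency the namespace provides.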
Advantages :
DFS allows multiple users to access or store data.
It allows data to be shared remotely.
It improves file availability, access time, and network efficiency.
It improves the ability to change the size of the data and to exchange data.
A Distributed File System provides transparency of data even if a server or disk fails.
Disadvantages :
In a Distributed File System, nodes and connections need to be secured, so security is a concern.
Messages and data may be lost in the network while moving from one node to another.
Database connections are complicated in a Distributed File System.
Handling a database is also harder in a Distributed File System than in a single-user system.
Overloading may occur if all nodes try to send data at once.
A Distributed File System (DFS) is a file system that is distributed across multiple file servers or locations. It lets programs access and store remote files as if they were local, allowing users to access files from any computer on the network. It manages files and folders on different computers, and it is mainly designed to provide file storage and controlled access to files over LAN and WAN.
A DFS can also be described as a client-server application that allows clients to access data from the server as if it were stored on their own computers. It provides location transparency and redundancy, which help improve data availability, and it uses a data replication strategy across multiple servers to prevent data-access failure.
The challenges of traditional file systems that distributed file systems aim to address are as follows −
Data redundancy and inconsistency.
Difficulty in accessing data.
Data isolation
Integrity problems
Unauthorized access is not restricted.
It coordinates only physical access.
Components
The components of DFS are as follows −
Block Storage provider
Client Driver
Security provider
Metadata service
Object service.
Features
The features of DFS are as follows −
User mobility
Easy to use
High availability
Performance
Coherent access
Location independence
File locking
Multi-networking access
Local gateways
Multi-protocol access
Benefits
The benefits of DFS are as follows −
Flexibility in storage management − In DFS, storage management is very flexible and can easily be modified according to our needs.
Load sharing advantage − Load sharing can be done with optimal results using
the DFS. Load sharing is one of the best benefits of DFS.
Security Integration − If we want to implement security then it can be easily
done in the DFS.
Graphical way of administration − A graphical administration window is available, which reduces the cost of administrator training.
High Availability − High availability is also one of the best benefits of DFS. It
keeps all the important data available all the time.
Mechanism for building Distributed file system
Distributed systems present a single-system image to users of the network: the failure of one system in the network is not visible to the other users. Each system plays a dual role, acting as both client and server. The distributed file system provides a similar abstraction to the users of a distributed system and makes it convenient for them to use files in a distributed environment.
Characteristic of distributed file system
Remote data/file sharing: It allows a file to be transparently accessed by processes on any node of the system, irrespective of the file’s location. For example, a process ‘A’ can create a file and share it with processes ‘B’ or ‘C’, and the same file can be accessed and modified by processes running on other nodes.
User mobility: Users in the distributed systems are allowed to work in any system at
any time. So, users need not relocate secondary storage devices in distributed file
systems.
Availability: Distributed file systems keep multiple copies of the same file in multiple
places. Hence, the availability of the distributed file system is high and it maintains a
better fault tolerance for the system.
Data Integrity: A file system is typically shared by several users, and it must protect the integrity of data saved in a shared file. Correct synchronisation of concurrent access requests from multiple users vying for access to the same file requires a concurrency-control method. Atomic transactions, a high-level concurrency-management mechanism for data integrity, are frequently made available to users by file systems.
Performance: Performance is evaluated using the average time it takes to satisfy client requests. It should be comparable to that of a centralised file system.
Diskless workstations: Distributed file systems allow the use of diskless workstations to reduce noise and heat in the system. Diskless workstations are also more economical than workstations with disks.
Desirable features to build distributed file system
Scalable networks: Even as the number of users in the network increases, performance should remain the same. For example, if 100 users are initially sharing a 100 Mbps network and the system administrator then increases the number of users to 150, the performance of the network should remain the same.
Replications: The services should be replicated in many systems to avoid a single point
of failure. For example, an email server should be available in multiple systems to
reach the service to users 24×7.
Openness: Systems with different architectures and operating systems can be connected to the distributed system environment, making message passing possible. For example, a user on a 32-bit system can interact seamlessly with a user on a 64-bit system.
Reliability and availability: The systems should be built to provide reliability and availability as close to 100% as possible for the utilization of networks.
Mechanism to build distributed file systems
Use of File Models: The DFS uses different conceptual models of a file. There are two basic criteria for file modeling: file structure and modifiability. Files can be unstructured or structured, depending on the applications using the file system, and the modifiability of a file can be categorized as mutable or immutable.
Use of File-Accessing Models: A distributed file system may use one of the following models to service a client’s file access request when the accessed file is remote: the remote service model and the data-caching model.
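The difference between the two access models can be sketched in a few lines of Python. The “server” here is just a dict standing in for a remote file store, and the call counter shows how caching reduces traffic:

```python
server = {"notes.txt": b"hello"}   # stand-in for a remote file store
server_calls = 0

def remote_read(name):
    # Remote service model: every access goes to the server.
    global server_calls
    server_calls += 1
    return server[name]

cache = {}

def cached_read(name):
    # Data-caching model: the first access fetches from the server,
    # later accesses are served from the local cache.
    if name not in cache:
        cache[name] = remote_read(name)
    return cache[name]

for _ in range(3):
    cached_read("notes.txt")       # three reads, but only one server round trip
```

The trade-off is that the cached copy may become stale if the server’s copy changes, which is why caching schemes also need modification propagation and cache validation (discussed below).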
Use of File-Sharing Semantics: A shared file may be accessed simultaneously by multiple users. Several kinds of file-sharing semantics can be used, such as Unix semantics, session semantics, immutable shared-file semantics, and transaction-like semantics.
Use of File-Caching Schemes: The key criteria in a file-caching scheme are cache location, modification propagation, and cache validation.
Use of File Replication: File replication is the primary mechanism for improving file availability in a distributed systems environment. A replicated file has multiple copies, with each copy located on a separate file server.
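Of the sharing semantics listed above, session semantics are easy to illustrate: a writer works on a private copy taken when the session is opened, and other clients see the changes only when the session is closed. A minimal sketch (the store and file names are invented):

```python
store = {"doc.txt": "v1"}          # the globally visible copy

class Session:
    """A client session under session semantics: write privately, publish on close."""

    def __init__(self, name):
        self.name = name
        self.copy = store[name]    # private working copy taken at open time

    def write(self, text):
        self.copy = text           # not yet visible to other clients

    def close(self):
        store[self.name] = self.copy   # changes become visible on close

s = Session("doc.txt")
s.write("v2")
before = store["doc.txt"]   # other clients still see the old version
s.close()
after = store["doc.txt"]    # now they see the new version
```

Contrast this with Unix semantics, where every write would be immediately visible to all other clients reading the same file.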
File Service Architecture in Distributed System
A file system is a system where data is stored persistently. In this article, we look at the concept of file service architecture in distributed systems. An access control mechanism is used for sharing data between multiple users, and concurrent access is always possible for read-only access.
The following modules are the part of the file system:
Directory module: This module gives the relation to file names by connecting them
to their corresponding file IDs.
File module: This module serves the purpose of relating file IDs to specific files.
Access control module: This module provides the controlling feature by validating
the requested operation.
File access module: This module refers to the reading and writing of file data or
attributes.
Block module: This module accesses and allocates the disk block.
Device module: This module refers to the disk I/O and buffering.
File Service Requirements:
File Service Architecture is an architecture that provides the facility of file accessing by
designing the file service as the following three components:
A client module
A flat file service
A directory service
The implementation of exported interfaces by the client module is carried out by flat-file
and directory services on the server-side.
Let’s discuss the functions of these components in file service architecture in detail.
1. Flat file service: A flat file service is used to perform operations on the contents of a file. A Unique File Identifier (UFID) is associated with each file in this service: a long sequence of bits that uniquely identifies the file among all files in the distributed system. When the flat file service receives a request to create a new file, it generates a new UFID and returns it to the requester.
Flat File Service Model Operations:
Read(FileId, i, n) -> Data: Reads up to n items from a file starting at item i and returns them in Data.
Write(FileId, i, Data): Writes a sequence of Data to a file, starting at item i and extending the file if necessary.
Create() -> FileId: Creates a new file with length 0 and assigns it a UFID.
Delete(FileId): Removes the file from the file store.
GetAttributes(FileId) -> Attr: Returns the file’s attributes.
SetAttributes(FileId, Attr): Sets the attributes of the file.
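A toy in-memory implementation of this interface might look as follows. UFIDs are plain integers here rather than long bit sequences, and “items” are single bytes; this is an illustrative sketch, not a real service:

```python
import itertools

class FlatFileService:
    """In-memory sketch of the flat file service operations listed above."""

    def __init__(self):
        self._files = {}                      # UFID -> bytearray of contents
        self._attrs = {}                      # UFID -> attribute dict
        self._ufids = itertools.count(1)      # toy UFID generator

    def create(self):
        ufid = next(self._ufids)
        self._files[ufid] = bytearray()       # new file of length 0
        self._attrs[ufid] = {"length": 0}
        return ufid

    def read(self, ufid, i, n):
        return bytes(self._files[ufid][i:i + n])

    def write(self, ufid, i, data):
        f = self._files[ufid]
        if i > len(f):
            f.extend(b"\0" * (i - len(f)))    # extend the file if necessary
        f[i:i + len(data)] = data
        self._attrs[ufid]["length"] = len(f)

    def delete(self, ufid):
        del self._files[ufid], self._attrs[ufid]

    def get_attributes(self, ufid):
        return dict(self._attrs[ufid])

    def set_attributes(self, ufid, attr):
        self._attrs[ufid].update(attr)

svc = FlatFileService()
u = svc.create()
svc.write(u, 0, b"hello")
```

Note that every operation takes a UFID, never a text name: name-to-UFID translation is the directory service’s job, described next.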
2. Directory Service: The directory service serves the purpose of relating file text names
with their UFIDs (Unique File Identifiers). The fetching of UFID can be made by
providing the text name of the file to the directory service by the client. The directory
service provides operations for creating directories and adding new files to existing
directories.
Directory Service Model Operations:
Lookup(Dir, Name) -> FileId : Returns the relevant UFID after finding the text
name in the directory. Throws an exception if Name is not found in the directory.
AddName(Dir, Name, File): Adds (Name, File) to the directory and updates the file’s attribute record if Name is not already in the directory. If the name already exists in the directory, an exception is thrown.
UnName(Dir, Name): If Name is in the directory, the directory entry containing
Name is removed. An exception is thrown if the Name is not found in the directory.
GetNames(Dir, Pattern) -> NameSeq: Returns all the text names that match the
regular expression Pattern in the directory.
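These operations can likewise be sketched as a small in-memory service. Directories are plain dicts mapping text names to UFIDs, and Pattern is treated as a Python regular expression; all names are invented for illustration:

```python
import re

class DirectoryService:
    """In-memory sketch of the directory service operations listed above."""

    def __init__(self):
        self._dirs = {}                       # directory name -> {text name: UFID}

    def make_dir(self, d):
        self._dirs[d] = {}

    def lookup(self, d, name):
        if name not in self._dirs[d]:
            raise KeyError(f"{name} not found in {d}")   # name-not-found exception
        return self._dirs[d][name]

    def add_name(self, d, name, ufid):
        if name in self._dirs[d]:
            raise KeyError(f"{name} already exists in {d}")
        self._dirs[d][name] = ufid

    def un_name(self, d, name):
        if name not in self._dirs[d]:
            raise KeyError(f"{name} not found in {d}")
        del self._dirs[d][name]

    def get_names(self, d, pattern):
        # Return all text names in the directory matching the pattern.
        return [n for n in self._dirs[d] if re.fullmatch(pattern, n)]

ds = DirectoryService()
ds.make_dir("/docs")
ds.add_name("/docs", "a.txt", 1)
ds.add_name("/docs", "b.log", 2)
```

The directory service holds only the name-to-UFID mapping; the file contents themselves live behind the flat file service.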
3. Client Module: The client module runs on each computer and delivers an integrated service (flat file and directory services) to application programs through a single API. It stores information about the network locations of the flat file and directory server processes. Recently used file blocks are held in a cache on the client side, resulting in improved performance.
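Putting the pieces together, the client module’s job can be sketched like this: one read call that performs a directory lookup (name to UFID) and then reads file blocks through a client-side block cache. Both services are stubbed as dicts, and all names and sizes are invented:

```python
directory = {"report.txt": 42}                 # directory service stub: name -> UFID
flat_files = {42: b"annual results 2024"}      # flat file service stub: UFID -> bytes
block_cache = {}                               # (UFID, block#) -> cached block
BLOCK = 8                                      # toy block size

def read_block(ufid, blk):
    # Fetch a block from the "server" once, then serve it from the cache.
    key = (ufid, blk)
    if key not in block_cache:
        data = flat_files[ufid]
        block_cache[key] = data[blk * BLOCK:(blk + 1) * BLOCK]
    return block_cache[key]

def open_read(name, offset, n):
    # The single API the application sees: name in, bytes out.
    ufid = directory[name]                     # directory service lookup
    first, last = offset // BLOCK, (offset + n - 1) // BLOCK
    data = b"".join(read_block(ufid, b) for b in range(first, last + 1))
    start = offset - first * BLOCK
    return data[start:start + n]
```

The application never sees UFIDs, blocks, or server locations; the client module hides all of them behind one call, which is exactly the integration role described above.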
What are the influential trends responsible for this paradigm shift?
Modern Internet
The modern internet is a vast interconnected collection of computer networks of many different types, with the range of types increasing all the time, including a wide range of wireless communication technologies such as Wi-Fi, WiMAX, Bluetooth and third-generation mobile phone networks.
The net result is that networking has become a pervasive resource: devices can be connected (if desired) at any time and in any place.
Ubiquitous computing
Ubiquitous computing is the harnessing of many small, cheap computational devices that are present in users’ physical environments, including the home, office and even natural settings. The term ‘ubiquitous’ is intended to suggest that small computing devices will eventually become so pervasive in everyday objects that they are scarcely noticed. That is, their computational behavior will be transparently and intimately tied up with their physical function.
Device miniaturization and wireless networking led to the integration of portable computing
devices into distributed systems
Devices like laptops, PDAs, smartphones, tablets, wearable devices, and embedded devices.
Ubiquitous computing also includes the need to deal with variable connectivity, and indeed disconnection.
Teleconferencing
Webcasting is the ability to broadcast continuous media, typically audio or video, over the
internet.
It is now commonplace for major sporting or music events to be broadcast in this way, often attracting large numbers of viewers (for example, the Live 8 concert in 2005 attracted around 170,000 simultaneous users at its peak).
Example:
Grid computing is also a form of cloud computing primarily focused towards support for
scientific applications
Cloud Computing:
A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service-provider interaction.
Cluster computing:
A set of interconnected computers that cooperate closely to provide a single, integrated high-performance computing capability.
Earlier we introduced the sharing of compute, storage and network capabilities.
To book a Tatkal ticket, users must share a common reservation database.
Not all, but much, resource sharing is achieved using client-server models.
A complete interaction between a client and a server, from the point when the client sends its request to when it receives the server’s response, is called a remote invocation.
A group of users may also cooperate directly to share resources such as documents in a small, closed group.