
Distributed File Systems (DFS)

Introduction: What Is a Distributed File System (DFS)?

Let's start to answer that question by reminding ourselves about regular, vanilla flavored,
traditional file systems. What are they? What do they do? Let's begin by asking ourselves a more
fundamental question: What is a file?

A file is a collection of data organized by the user. The data within a file isn't necessarily
meaningful to the operating system. Instead, a file is created by the user and meaningful to the
user. It is the job of the operating system to maintain this unit, without understanding or caring
why.

A file system is the component of an operating system that is responsible for managing files. File
systems typically implement persistent storage, although volatile file systems are also possible
(/proc is such an example). So what do we mean when we say manage files? Well, here are some
examples:

 Name files in meaningful ways. The file system should allow a user to locate a
file using a human-friendly name. In many cases, this is done using a hierarchical
naming scheme like those we know and love in Windows, UNIX, Linux, &c.
 Access files. Create, destroy, read, write, append, truncate, keep track of position
within, &c
 Physical allocation. Decide where to store things. Reduce fragmentation. Keep
related things close together, &c.
 Security and protection. Ensure privacy; prevent accidental (or malicious)
damage, &c.
 Resource administration. Enforce quotas, implement priorities, &c.

So what is it that a DFS does? Well, basically the same thing. The big difference isn't what it
does but the environment in which it lives. A traditional file system typically has all of the users
and the entire storage resident on the same machine. A distributed file system typically operates
in an environment where the data may be spread out across many, many hosts on a network --
and the users of the system may be equally distributed.

For the purposes of our conversation, we'll assume that each node of our distributed system has a
rudimentary local file system. Our goal will be to coordinate these file systems and hide their
existence from the user. The user should, as often as possible, believe that they are using a local
file system. This means that the naming scheme needs to hide the machine names, &c. This also
means that the system should mitigate the latency imposed by network communication. And the
system should reduce the increased risk of accidental and malicious damage to user data
associated with vulnerabilities in network communication. We could actually continue this list.
To do so, pick a property of a local file system and append transparency.

Why Would We Want a DFS?


There are many reasons that we might want a DFS. Some of the more common ones are listed
below:

 More storage than can fit on a single system


 More fault tolerance than can be achieved if "all of the eggs are in one basket."
 The user is "distributed" and needs to access the file system from many places

How Do We Build a DFS?

Let's take a look at how to design a DFS. What questions do we need to ask? Well,
let's walk through some of the bigger design decisions.

Grafting a Name Space Into the Tree

For the most part, it isn't too hard to tie a distributed file system into the name space of the local
file system. The process of associating a remote directory tree with a particular point in the local
directory tree is known as mounting. The local directory that represents the root of the remote
directory is typically known as the mount point. Directories are often mounted as the result of an
action on the part of an administrator. In the context of UNIX systems, this can be via the
"mount" command. The mounts may be done "by hand" or as part of the system initialization.
Many UNIX systems automatically associate remote directories with local ones based on
"/etc/vfstab". Other mechanisms may also be available. Solaris, for example, can bind remote
directories to local mount points "as needed" and "on demand" through the "automounter".

Since mounting hides the location of files from the user, it allows the files to be spread out across
multiple servers and/or replicated, without the user's knowledge. AFS implements read-only
replication. Coda supports read-write replication.

Implementing The Operations

Once the name spaces are glued together, the operations on the distributed file system need to be defined.
In other words, once the system knows that the user wants to access a directory entry that lives
somewhere else, how does it know what to do? In the context of UNIX and UNIX-like operating
systems, this is typically done via the virtual file system (VFS) interface. Recall from operating
systems that VFS and vnodes provide an easy way to keep the same interface and the same
operations, but to implement them differently. Many other operating systems provide similar
object-oriented interfaces.

In a traditional operating system, the operations on a vnode access a local device and local cache.
In a similar fashion, we can write operations to implement the VFS and vnode interfaces that will
go across the network to get our data. On systems that don't have as nice an interface, we have to
do more heavy kernel hacking -- but implementation is still possible. We just implement the DFS
using whatever tools are provided to implement a local file system, but ultimately look to the
network for the data.
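
To make this concrete, here is a minimal Python sketch of the vnode idea, assuming a hypothetical
RPC stub (rpc_read/rpc_write) for the remote case; it is not any particular kernel's VFS API, just an
illustration that local and remote implementations can hide behind one interface.

# Minimal sketch (not any real kernel's VFS API) of the idea that local and
# remote files can share one vnode-style interface. Names are illustrative.
from abc import ABC, abstractmethod

class Vnode(ABC):
    """Common interface the rest of the OS programs against."""
    @abstractmethod
    def read(self, offset: int, length: int) -> bytes: ...
    @abstractmethod
    def write(self, offset: int, data: bytes) -> None: ...

class LocalVnode(Vnode):
    def __init__(self, path: str):
        self.path = path                      # file assumed to already exist
    def read(self, offset, length):
        with open(self.path, "rb") as f:      # backed by the local disk
            f.seek(offset)
            return f.read(length)
    def write(self, offset, data):
        with open(self.path, "r+b") as f:
            f.seek(offset)
            f.write(data)

class RemoteVnode(Vnode):
    def __init__(self, server, file_id):
        self.server, self.file_id = server, file_id   # hypothetical RPC stub
    def read(self, offset, length):
        return self.server.rpc_read(self.file_id, offset, length)
    def write(self, offset, data):
        self.server.rpc_write(self.file_id, offset, data)

# Callers neither know nor care which implementation they hold:
def copy_prefix(src: Vnode, dst: Vnode, n: int) -> None:
    dst.write(0, src.read(0, n))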

Unit of Transit
So, now that we can move data across the network, how much do we move? Well, there are two
intuitive answers to this question: whole files and blocks. Since a file is the unit of storage
known to the user, it might make sense to move a whole file each time the user asks to access
one. This makes sense, because we don't know how the file is organized, so we don't know what
the user might want to do -- except that the user has indicated that the data within the file is
related. Another option is to use a unit that is convenient to the file system. Perhaps we agree to a
particular block size, and move only one block at a time, as demanded.

Whole-file systems are convenient in the sense that once the file arrives, the user will not
encounter any delays. Block-based systems are convenient because the user can "get started"
after the first block arrives and does not need to wait for the whole (and perhaps very large) file
to arrive. As we'll talk about soon, it is easier to maintain file-level coherency if a whole file
based system is used and block-level coherency if a block-based system is used. But, in practice,
the user is not likely to notice the difference between block-level and file-level systems -- the
overwhelming majority of files are smaller than a single block.

Reads, Writes and Coherency

For the moment, let's assume that there is no caching and that a user can atomically read a
file, write to it, and write it back. If there is only one user, it doesn't matter what she or he
might do, or how she or he does it -- the file will always be as expected when all is said and done.

But if multiple users are playing with the file, this is not the case. In a file-based system, if two
users write to two different parts of the file, the final result will be the file from the perspective
of one user, or the other, but not both. This is because the earlier write is destroyed by the
subsequent write. In a block-based system, if the writes lie in different blocks, the final file may
represent the changes of both users -- this is because only the block containing each write is
written -- not the whole file.

This might be good, because neither user's change is lost. This might also be bad, because the
final file isn't what either user expects. It might be better to have one broken application than two
broken applications. This is completely a semantic decision.
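
As a toy illustration of this semantic difference, the following Python sketch simulates two clients
editing different 4-byte blocks of the same cached file and then writing back, once at whole-file
granularity and once at block granularity; the block size and data are made up for the example.

# Illustrative sketch of the coherency difference described above.
# Assumes a toy 4-byte block size; no real DFS protocol is modeled.
BLOCK = 4
server = bytearray(b"AAAABBBB")           # blocks 0 and 1

# Both clients start from the same cached copy.
client1 = bytearray(server)
client2 = bytearray(server)
client1[0:4] = b"XXXX"                    # client 1 edits block 0
client2[4:8] = b"YYYY"                    # client 2 edits block 1

# Whole-file write-back: the second writer overwrites the first entirely.
whole_file = bytearray(server)
whole_file[:] = client1                   # client 1 writes back the whole file
whole_file[:] = client2                   # then client 2 does too
assert bytes(whole_file) == b"AAAAYYYY"   # client 1's change is lost

# Block write-back: only the dirty block travels, so both changes survive.
block_based = bytearray(server)
block_based[0:BLOCK] = client1[0:BLOCK]                # client 1 writes block 0
block_based[BLOCK:2*BLOCK] = client2[BLOCK:2*BLOCK]    # client 2 writes block 1
assert bytes(block_based) == b"XXXXYYYY"  # a merged result neither client ever saw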

In a traditional UNIX file system, the unit of sharing is typically a byte. Unless flock(), lockf(),
or some other voluntary locking is used, files are not protected from inconsistency caused by
multiple users and conflicting writes.

The Andrew File System (AFS) versions 1 and 2 implemented whole-file semantics, as does Coda.
NFS and AFS version 3 implement block-level semantics. None of these popular systems
implement byte-level semantics as does a local file system. This is because the network overhead
of sending a message is too high to justify such a small payload.

AFS changed from file level to block level access to reduce the lead time associated with
opening and closing a file, and also to improve cache efficiency (more soon). It is unclear if
actual workloads, or simply perception, prompted the change. My own belief is that it was
simply perception (good NFS salespeople?) This is because most files are smaller than a block --
databases should not be distributed via a DFS!

Caching and Server State

There is typically a fairly high latency associated with the network. We really don't want to ask
the file server for help each time the user accesses a file -- we need caching. If only one user
would access a unit (file or block) at a time, this wouldn't be hard. Upon an open, we could
simply move it to the client, let the client play with it, and then, upon a close, move it back (if
written) or simply invalidate the cached copy (if read).

But life gets much harder in a multiuser system. What happens if two different users want to read
the same file? Well, I suppose we could just give both of them copies of the file, or block, as
appropriate. But then, what happens if one of them, or another user somewhere else writes to the
file? How do the clients holding cached copies know?

Well, we can apply one of several solutions to this problem. The first solution is to apply the
Ostrich principle -- we could just hope it doesn't happen and live with the inconsistency if it
does. This isn't a bad technique, because, in practice, files are rarely written by multiple users --
never mind concurrently.

Or, perhaps we can periodically validate our cache by checking with the server: "I have
checksum [blah]. Is this still valid?" or "I have timestamp [blah]. Is this still valid?" In this case
we can eventually detect a change and invalidate the cache, but there is a window of vulnerability.
But, the good thing is that the user can access the data without bringing it back from the network
each time. (And, even that approach can still allow for inconsistency, since it takes non-zero
time). The cache and validate approach is the technique used by NFS.

An alternative to this approach is used by AFS and Coda. These systems employ smarter servers
that keep track of the users. They do this by issuing a callback promise to each client as it
collects a file/block. The server promises the client that it will be informed immediately if the
file changes. This notice allows the client to invalidate its cache entry.

The callback-based approach is optimistic and saves the network overhead of the periodic
validations, while ensuring a more strict consistency model. But it does complicate the server
and reduce its robustness. This is because the server must keep track of the callbacks and
problems can arise if the server fails, forgets about these promises, and restarts. Typically,
callbacks have time limits associated with them, to reduce the amount of damage in the event that
a client crashes and doesn't close() the file, or the server fails and cannot issue a callback.
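
The following is a simplified Python sketch of callback promises on the server side, in the spirit
of AFS/Coda but not their real protocol; the time limit, the promises table, and the client-side
invalidate() call are all illustrative assumptions.

# Sketch of callback promises on the server side (AFS/Coda-style, simplified).
import time

CALLBACK_TTL = 600.0   # illustrative time limit on a promise, in seconds

class CallbackServer:
    def __init__(self):
        self.files = {}        # path -> contents
        self.promises = {}     # path -> {client: expiry time}

    def fetch(self, path, client):
        # Hand out the file and promise to tell this client if it changes.
        self.promises.setdefault(path, {})[client] = time.time() + CALLBACK_TTL
        return self.files[path]

    def store(self, path, data):
        self.files[path] = data
        # Break every unexpired promise so cached copies get invalidated.
        for client, expiry in self.promises.pop(path, {}).items():
            if time.time() < expiry:
                client.invalidate(path)   # hypothetical client-side RPC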

Coda and Disconnected Operation

Let's talk a little more about how AFS-2 (whole file) is organized. Typically files are collected
into volumes which are nothing more than a collection of files grouped together for
administrative purposes. A server can hold one or more volumes. Read-only volumes may be
replicated, so that a back-up copy is used if the primary copy becomes unreachable. Since
writable volumes cannot be replicated, write conflicts aren't a problem. Additionally, a client
might have a cached copy of certain files.

Now, if a volume server should become unreachable, a client can continue to work out of its
cache -- at least until it misses or tries to write a file back to the server. At this point, things break
down, either because the client cannot get the files it needs, or because the client cannot inform
the server of the changes, so the client cannot close the file -- or the server might violate a
callback promise to other clients.

Now, let's hypothesize, for just a moment, that we make sure that clients have all of the files that
they need, and also allow them to close files, even if they cannot contact the server to inform
it of the changes. How could we make this work?

Well, Coda has something called the hoard daemon whose job is to keep certain files in the
client's cache, requesting them as necessary, just in case the client should later find itself unable
to communicate with the server. The user can explicitly specify which files should be hoarded, or
build a list by watching his/her activity using a tool called spy. This is the easy part (well, in
some sense).

But what do we do about the delayed writes? When we can eventually tell the server about them,
what do we do? Someone else may have found themselves in the same position. In order to solve
this problem, Coda keeps a version number with each file and another associated with each
volume. This version number simply tracks the number of times a file has been modified. Before
a client writes a file to the server, it checks the version of the file on the server. If that version
number matches the version number of the file that the client read before the write, the client is
safe and can send the new version of the file. The server can then increment the version number.

If the version number has increased, the client missed a callback promise. It cannot send its
version -- it may be older than what is already there. This is called a conflict. Coda allows the
user to send the file, but labels it as a conflict. These conflicts typically must be resolved by the
user, before the file can be used. The user is basically asked, "Which one is the 'real' one?" The
other one can, of course, be saved under a different name. There are some tools that can
automatically merge multiple versions of certain files to resolve conflicts, but this
is only possible for certain, well-structured files -- and each one must be hand coded on an
application-by-application basis.
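
A minimal sketch of the reintegration check described above, reduced to a single per-file version
number and hypothetical server calls (get_version, store, set_version, store_conflict), might look
like this:

# Sketch of the version check a reconnected Coda client might perform before
# reintegrating a delayed write (simplified; the server calls are assumptions).
def reintegrate(server, path, new_data, version_seen_at_read):
    current = server.get_version(path)          # assumed metadata RPC
    if current == version_seen_at_read:
        server.store(path, new_data)            # safe: nobody wrote meanwhile
        server.set_version(path, current + 1)
        return "stored"
    else:
        # The file changed while we were disconnected: we missed a callback.
        server.store_conflict(path, new_data)   # label it, let the user resolve
        return "conflict"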

Conflicts aren't typically a big problem, because shared files are typically not written by multiple
users. As a result, the very small chance of a conflict isn't a bad price to pay for working through
a failure.

Coda and Replication

Coda actually implements replication for writeable volumes. The collection of servers that maintain
copies of a particular volume is known as a Volume Storage Group (VSG). Let's talk about how
this is implemented. Coda actually doesn't maintain a single version number for each file and
volume. It maintains a vector of version numbers known as the Coda Version Vector (CVV) for
each file and Volume Version Vector (VVV) for each volume.

This version vector contains one entry for each replica. Each entry is the version number of the
file on that replica. In the perfect case, the entry for each replica will be identical. The client
actually requests the file in a multi-step process:

1. It asks all replicas for their version number


2. It then asks the replica with the greatest version number for the file
3. If the servers don't agree about the file's version, the client can direct the servers to
update a replica that is behind, or inform them of a conflict. CVVs are compared
just like vector timestamps. A conflict exists if two CVVs are concurrent, because
concurrent vectors indicate that each server involved has seen some changes to
the file, but not all changes.

In the perfect case, when the client writes a file, it does it in a multi-step process:

1. The client sends the file to all servers, along with the original CVV.
2. Each server increments its entry in the file's CVV and ACKs the client.
3. The client merges the entries from all of the servers and sends the new CVV back
to each server.
4. If a conflict is detected, the client can inform the servers, so that it can be resolved
automatically, or flagged for mitigation by the user. (A sketch of this compare-and-merge
step appears after this list.)
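
Here is a small Python sketch of how CVVs might be compared (exactly like vector timestamps)
and merged after a successful write; the function names and the printed examples are illustrative,
not Coda's actual code.

# Sketch of Coda Version Vector (CVV) handling: comparison works like vector
# timestamps, and a successful write merges the per-replica counters.
def compare(a, b):
    """Return 'equal', 'a_newer', 'b_newer', or 'conflict' (concurrent)."""
    a_ge = all(x >= y for x, y in zip(a, b))
    b_ge = all(y >= x for x, y in zip(a, b))
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"
    if b_ge:
        return "b_newer"
    return "conflict"            # each side has updates the other missed

def merge(vectors):
    """Element-wise maximum, as the client does after all servers ACK."""
    return [max(column) for column in zip(*vectors)]

# The partitioned-write scenario discussed below produces concurrent vectors:
print(compare([2, 2, 1, 1], [1, 1, 2, 2]))   # -> conflict
# The normal case: every reachable server bumped its own entry, then we merge.
print(merge([[2, 1, 1, 1], [1, 2, 1, 1], [1, 1, 2, 1], [1, 1, 1, 2]]))
# -> [2, 2, 2, 2], which the client sends back to each server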

Given this process, let's consider what happens if one or more servers should fail. In this case,
the client cannot contact the server, so it temporarily forgets about it. The collections of volume
servers that the client can communicate with is known as the Available Volume Storage Group
(AVSG). The AVSG is a subset of the VSG.

In the event that the AVSG is smaller than the VSG, the client does nothing special. It goes
through the same process as before, but only involves those servers in the AVSG.

Eventually when the partitioned or failed server becomes accessible, it will be added back to the
AVSG. At this point, it will be involved in reads and writes. When this happens, the client will
begin to notice any writes the returning server has missed, because its CVV will be behind the
others in the group. This will be automatically fixed by a subsequent write operation.

Coda clients also periodically poll the members of their VSG. If they find that hosts have
appeared that are not currently in their AVSG, they add them. When they add a server in the
VSG back to the AVSG, they must compare the VVVs. If the new server's VVV does not match
the client's copy of the VVV, there is a conflict. To force a resolution of this conflict, the client
drops all callbacks in the volume. This is because the server had updates while it was
disconnected, but the client (because it couldn't talk to the server) missed the callbacks.

Now, let's assume that the network is partitioned. Let's say that half of the network is accessible
to one client and the other half to the other client. If these clients play with different files,
everything works as it did above. But if they play with the same files, a write-write conflict will
occur. The servers in each partition will update their own version numbers, but not those in the
other partition. For example, we could see the following:

                        Server 1    Server 2    Server 3    Server 4
Initial:                <1,1,1,1>   <1,1,1,1>   <1,1,1,1>   <1,1,1,1>

--------- Partition: servers 1/2 and servers 3/4 ----------

Write seen by 1/2:      <2,2,1,1>   <2,2,1,1>
Write seen by 3/4:                              <1,1,2,2>   <1,1,2,2>

--------- Partition repaired ----------

Read (ouch!):           <2,2,1,1>   <2,2,1,1>   <1,1,2,2>   <1,1,2,2>

The next time a client does a read (or a write), it will detect the inconsistency. This inconsistency
cannot be resolved automatically and must be repaired by the user. Coda simply flags it and
requests the user's involvement before permitting a subsequent access.

Notice that replication does not affect the model we discussed before -- Coda's disconnected
mode is not affected. The version vectors work just like the individual version numbers in this
respect.

Coda and Weakly Connected Mode

Coda operates under the assumption that servers are busy and clients have plenty of free time.
This is the reason that replication and conflict detection are client-driven in Coda. But this
assumption does not always hold. It would be prohibitively time consuming to implement client-
based replication over a modem line. It might also be prohibitively expensive over a low-
bandwidth wireless connection.

Coda handles this situation with what is known as weakly connected mode. If the Coda client
finds itself with a limited connection to the servers, it picks one of the servers and sends it the
update. The server then propagates the change to the other servers, if possible, and the version
vectors are updated on the servers, as appropriate. Basically the update generates a server-server
version conflict and triggers the server to resolve it.
Lustre: A File System for Cluster Computing

The name Lustre is a contraction of "Linux cluster". Lustre is a scalable cluster file system
designed for use by any organization that needs a large, scalable, general-purpose file system.

It is basically a network-based RAID. Imagine a system where each node of the cluster file
system provides a huge chunk of storage. Now, imagine managing these chunks of storage like a
RAID. Files are broken into objects, very similar to stripes. These stripes can be stored by
different nodes. The result is that we get the same type of scalability in capacity that we saw with
RAIDs -- but an order of magnitude larger. Each node might, itself, behind the scenes, be
connected to a large RAID, or even a storage area network. And, as with RAIDs, we see the same
performance improvement correlated to the number of stripes in flight at a time.

Under the hood, when a client opens a file using the standard POSIX function, e.g. open(), a
request is sent to a metadata server. This server responds by giving the client the metadata about
this file -- including the mapping of the file to objects on various nodes. For those who happen to
be familiar with the internals of a traditional UNIX file system, this mapping essentially replaces
the block-to-storage mapping present in an inode.
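
As a rough illustration of such a mapping, the sketch below stripes byte offsets across a set of
hypothetical object servers; the stripe size, node names, and round-robin layout are assumptions
for the example, not Lustre's actual layout policy.

# Toy sketch of striping a file across object storage nodes, in the spirit of
# the metadata map described above (not Lustre's actual layout or wire format).
STRIPE_SIZE = 1 << 20                       # 1 MiB stripes, illustrative
nodes = ["oss0", "oss1", "oss2", "oss3"]    # hypothetical object servers

def locate(offset):
    """Map a byte offset to (node, stripe index, offset within the stripe)."""
    stripe = offset // STRIPE_SIZE
    return nodes[stripe % len(nodes)], stripe, offset % STRIPE_SIZE

# A 5 MiB read starting at byte 0 touches stripes on oss0..oss3 and oss0 again,
# so the transfers can proceed in parallel, RAID-style:
for off in range(0, 5 * STRIPE_SIZE, STRIPE_SIZE):
    print(off, locate(off))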

Robustness isn't intrinsically a concern, because each node is internally robust, with its own
storage based on a RAID or SAN, and its own back-up system. Lustre does, as you might
imagine, provide tools to facilitate backup, repair, and recovery of the metadata server. Since
even the temporary loss of this server can disable the entire system, Lustre supports a standby
server, which is essentially an always-available mirror.

Beyond a UNIX-like File System Interface

The file system interface, the way a file system interacts with an application program, has long
been dominated by the early UNIX file systems. These early systems are the basis for the
modern day POSIX standard, as well as the landscape in which many non-POSIX-compliant
systems, such as AFS, were developed.

It is easy to think about file systems only in terms of random-access reads and writes and
user/group/world permissions, with the prevailing cases being that reads dominate writes, certain
reads are dramatically more common than others, and many operations are
sequential reads, such as loading or processing all data, or sequential writes, such as logging
events over time.

But, many distributed systems are used to tackle very different problems, with very different
modalities. If we realize the differences between these applications and the cases to which we are
accustomed, we can often make dramatically different design decisions and obtain better
performance.

MogileFS and HDFS


Today, we are going to take a really quick, somewhat shallow look at two of these file systems:
MogileFS and HDFS. We'll take a more detailed look at HDFS a little later, when we discuss
Hadoop.

These two file systems have (at least) two important things in common. They are both
implemented at the user-level, rather than in the kernel, and they both support semantics that are
quite different, and in some important ways more limited, than POSIX's. What they gain is
performance for the special class of applications they are intended to support. In the case of
MogileFS, it was designed to support a file hosting site, e.g. a photo hosting site. In the case of
HDFS, it was designed to support a paradigm of distributed computing, known as the Map-
Reduce model. In-place edits, e.g. random access writes, are not important in either model. And,
additionally, HDFS benefits from location-awareness -- exactly the opposite of most
applications, and the POSIX interface, which are based on location transparency.

Both applications are implemented at the user-level, rather than in-kernel. In other words, they
are written as application-level programs that make use of the existing POSIX-compliant file
system supported by each local operating system. This is done because it gives them much more
flexibility to support a variety of underlying hardware.

Real-world distributed systems are often heterogeneous. It is impossible to keep sufficiently
large systems running the same version of the OS at all times. If the file system depends on the
OS, it requires a lot of coordination to support the same interface across multiple kernel
versions. HDFS can support many different operating systems and local file systems. Although
MogileFS does this in principle, in practice some of the details are Linux-centric. Nonetheless,
the ability to run across many Linux distributions, and versions of distributions, with different
local file systems, is huge, and is good enough in practice for many.

MogileFS: Serving Objects

A big part of our world is sharing. We share all sorts of things. Let's think about photo sharing
sites, video sharing sites, music sharing sites. Indeed, we could solve the problem of storing the
objects, e.g. photos, videos, and music, using a traditional file system. But, we can do better if
we recognize how this workload is likely to be different.

First, these objects are accessed in fewer, simpler ways than in other models. They are never
edited in place. No one is changing photos -- they are uploading them. In fact, they are never
edited at all -- no one is adding on to existing videos or music -- even if they might subsequently
upload a longer or extended version. And, no one wants a chunk from the middle of a photo. The
same is true for music or videos. Instead, these objects are downloaded, from start-to-finish. So,
what we need is a file system that can efficiently support the uploading and downloading of files
from start-to-finish, even if it doesn't allow random reads, random writes, or appends.

The world will interact with the MogileFS only through the very rich interface of the Web site. It
isn't necessary to support nested directory trees to create a hierarchy for human convenience.
Instead, it is only necessary to allow some simpler way to disambiguate files from different
applications (or higher-level domains of some kind).
And, finally, since the number of users is small and under the same administrative domain,
protections are enforced by the applications, not the file system.

But, what we do need to do is to deliver many of these files very rapidly and very reliably from
many clients. We might have, for example, a bunch of Web servers from our farm hitting the
file system at the same time, each of which will want a fat pipe. And, we don't want to lose any
of the objects we are charged with preserving and distributing.

How does MogileFS accomplish this? It works a lot like LustreFS. At a high level, it has the
same idea -- a distributed RAID. But, it is different in some of the details. It doesn't rely on each
node providing robust storage, instead it replicates objects across servers. The number of replicas
is associated with the class of the file, so, for example, photos might have three replicas each,
but thumbnails, which can be recreated from the original photos, might only have one replica
each. This reduces the cost of the storage by allowing less expensive components.
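
A toy sketch of this class-based replication policy might look like the following; the class names,
replica counts, node names, and random placement are all invented for illustration.

# Sketch of per-class replica counts in the spirit of MogileFS (names invented).
import random

REPLICAS_BY_CLASS = {"photo": 3, "thumbnail": 1}       # illustrative policy
storage_nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]

def place(key, file_class):
    """Pick distinct nodes to hold the copies of this object."""
    count = REPLICAS_BY_CLASS.get(file_class, 2)        # assumed default of 2
    return {key: random.sample(storage_nodes, count)}

print(place("user42/cat.jpg", "photo"))                 # three copies
print(place("user42/cat-thumb.jpg", "thumbnail"))       # one copy; can be rebuilt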

Additionally, MogileFS uses HTTP to serve objects from each replica, as opposed to a home-
grown protocol, for portability. For the same reason, it keeps its metadata in a standard MySQL
database. Since, unlike in Lustre, in-place writes aren't permitted, locks aren't very frequently
needed, so the database can maintain a sufficiently high throughput.

Lastly, it maintains simple namespaces, rather than directory trees. We can imagine that several
different applications, e.g. several different Web sites of different kinds with different objects to
serve, might use the same MogileFS, and accidentally have objects with the same name. As long
as they use different namespaces, this isn't a problem -- and is much simpler and more efficient
than a full-blown directory system. Similarly, the lack of a complex permission/ownership
scheme, though less important, helps to keep things simple.

Hadoop File System (HDFS)

The Hadoop File System (HDFS) is designed to support Hadoop, an open framework for a
specialized type of very scalable distributed computing often known as the Map-
Reduce paradigm. Both Hadoop and HDFS are based on Google's papers describing early,
presumably landmark, versions of their framework. These days, there are many champions and
users of Hadoop. It is worth mentioning Yahoo!, by name, because of their particularly deep
involvement and advocacy.

We'll spend a good chunk of time talking about the Map-Reduce paradigm, Hadoop, and HDFS
later in the semester. But let's see if we can sketch out the problem. Imagine that you've got a
truly huge number of records. Oh, for example, records capturing descriptive observations of
some approximation of each and every Web page in the world.

The bad news is that we've got a huge amount of data. The good news is that it won't be edited in
place. We'll just be collecting it, adding to it. And, the better news is that since the observations
are somewhat independent, we don't have to be too careful about the order in which we append
them -- just as long as we don't lose or corrupt any. And, as we'll discuss a little bit in a minute,
and more later this semester, we'll also be reading them in fairly limited ways, too.
Now, let's suggest that you want to look at these and decide if they match some search criteria. It
would be impossible to look at them sequentially. So, you want to look at them in parallel. So,
somehow, the records are going to need to be very spread out for processing. If we keep the data
and the storage segregated, for example by maintaining separate computing and storage farms, a
huge amount of data will need to move, placing impossible demands upon any deployable
network.

Instead we see that it would be ideal to scatter the records across many systems, and have the
distributed computing scattered in the same way. We want the computing to be local to the data
upon which it is operating, for example attached to the same switch.

Because there is so much data, it isn't practical to use any type of off-line backup. So, instead it
should simply use replication -- which has the added benefit of providing extra replicas for
parallel processing.

And, this leads us to some important aspects of the design of HDFS. It has to allow appends, but
not in-place edits. Files are written once but read many times. The data and the processing have to be local to
each other, so the system requires location-awareness. And, the data needs to be very heavily
distributed, to allow for very heavily distributed processing. And, as we discussed earlier, it
needs to be implemented in a portable way at the user-level, to be maintainable at scale.
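
To tie these requirements together, here is a small, hedged Python sketch of an append-only,
rack-aware placement policy in the spirit of HDFS; the class, the rack map, and the "one replica
near the writer, the rest on another rack" rule are simplifications, not HDFS's actual block
placement code.

# Sketch of location-aware, append-only block placement (greatly simplified;
# block ids, rack names, and the policy below are invented for illustration).
from collections import defaultdict
import itertools

class TinyAppendOnlyFS:
    def __init__(self, datanodes_by_rack, replicas=3):
        self.racks = datanodes_by_rack       # rack -> list of datanode names
        self.replicas = replicas
        self.blocks = defaultdict(list)      # filename -> [(block_id, nodes)]
        self._ids = itertools.count()

    def append_block(self, filename, writer_rack):
        """Place one replica near the writer, the rest on another rack."""
        local = self.racks[writer_rack][0]
        # assumes at least two racks exist
        other_rack = next(r for r in self.racks if r != writer_rack)
        remote = self.racks[other_rack][: self.replicas - 1]
        placement = (next(self._ids), [local] + remote)
        self.blocks[filename].append(placement)   # append-only: no in-place edit
        return placement

fs = TinyAppendOnlyFS({"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]})
print(fs.append_block("crawl/pages.log", "rack1"))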

Characteristics of a distributed file system


 Remote data/file sharing: It allows a file to be transparently accessed by processes on any
node of the system, irrespective of the file’s location. Example: a process ‘A’ can create
a file and share it with other processes ‘B’ or ‘C’, and the same file can be
accessed/modified by processes running on other nodes.
 User mobility: Users in a distributed system are allowed to work on any system at any
time. So, users need not relocate secondary storage devices in distributed file systems.
 Availability: Distributed file systems keep multiple copies of the same file in multiple
places. Hence, the availability of the distributed file system is high and it maintains a better
fault tolerance for the system.
 Data Integrity: A file system is typically shared by several users. Data saved in a
transferred file must have its integrity protected by the file system. The correct
synchronisation of concurrent access requests from multiple users vying for access to the
same file requires a concurrency control method. Atomic transactions, which are high-level
concurrency management systems for data integrity, are frequently made available to users
by file systems. 
 Performance: Performance is evaluated using the average amount of time it takes to
satisfy client requests. It should be comparable to that of a centralised file system.
 Diskless workstations: Distributed file systems allow the use of diskless workstations to
reduce noise and heat in the system. Also, diskless workstations are more economical than
diskful workstations.
Desirable features when building a distributed file system
 Scalable networks: Even with an increase in the number of users in the network, the
performance should remain the same. For example, if initially 100 users are using a 100 Mbps
network and the system administrator suddenly increases the number of users to 150, the
performance of the network should remain the same.
 Replication: The services should be replicated across many systems to avoid a single point of
failure. For example, an email server should be available on multiple systems so that the
service reaches users 24×7.
 Openness: Systems with different architectures as well as operating systems can be
connected to the distributed system environment, and thus message passing is possible. A
person with a 32-bit system can interact seamlessly with a person with a 64-bit system.
 Reliability and availability: The systems should be built with 100% reliability and 100%
availability for the utilization of networks.
Mechanisms to build distributed file systems
 Use of file models: The DFS uses different conceptual models of a file. The following are
the two basic criteria for file modeling: file structure and modifiability. The
files can be unstructured or structured based on the applications used in file systems. Also,
the modifiability of the file can be categorized as mutable and immutable files.
 Use of file-accessing models: A distributed file system may use one of the following
models to service a client’s file access request when the accessed file is a remote file. There
are two such models: the remote service model and the data-caching model.
 Use of file-sharing semantics: A shared file may be simultaneously accessed by multiple
users. Several types of file-sharing semantics can be used, such as UNIX semantics, session
semantics, immutable shared-files semantics, and transaction-like semantics.
 Use of file-caching schemes: Basically, the following key criteria are used in a file-caching
scheme: cache location, modification propagation, and cache validation.
 Use of file replication: File replication is the primary mechanism for improving file
availability in a distributed systems environment. A replicated file is a file that has multiple
copies, with each copy located on a separate file server.

Distributed File System (DFS) Architecture


Distributed file systems are used to store and share files across different computers or servers.
They allow people to share information with others without having it stored on one computer.
This is done by dividing a file into pieces, which are then distributed to different servers so that
they can be accessed from any other computer on the network. This is what distinguishes a
distributed file system (DFS) from typical local file systems (e.g., NTFS and HFS): it allows direct
host access to the same file data from multiple locations.
As noted, files are distributed among multiple storage servers and across multiple locations,
which allows users to share data and storage resources. In case of a disaster or high load, the two
components, namespaces and replication, work together to improve data availability. Basically,
this allows data from several locations to be logically combined into one folder, known as the
DFS root.
There are a few reasons one may want to look at a DFS solution for their environment, but they
all boil down to a need to access the same data from multiple locations. Especially in an
unstructured data world, a DFS plays a critical role by providing a single, logical view of data
scattered between local and remote locations, including in the cloud.
Besides, a DFS makes information and files easily shared between users across the network, with
controlled permissions, so that users of the network share information and files in a controlled
and authorized way.
Applications of Distributed File System
Some of the major applications of the distributed file system are shown below:
NFS
Network File System (NFS) is a file-sharing protocol that works in a client-server architecture. As
a matter of fact, it allows users to access and mount directories located on a remote system. It is
one of various DFS standards for network-attached storage. Chiefly, NFS uses a file-locking
system that allows many clients to share the same files. The NFS server manages multiple
application or compute threads for its operation.
Hadoop
Hadoop provides a free, open-source distributed file system used to store, process, and analyse
very large volumes of data. It is designed to process large data sets across clusters of computers
using simple programming models. Using Hadoop, you can scale up from single servers to
thousands of machines, each offering local computation and storage.
SMB
Server Message Block (SMB) is a file-sharing protocol originally developed at IBM. All in all, it
allows you to read and write files on a remote server over the local area network. With SMB, you
can share files, directories, printers and other resources on a company's internal network.
NetWare
NetWare is a network operating system developed by Novell. NetWare uses the IPX network
protocol to run different services on a personal computer. Additionally, it supports several
operating systems, including Microsoft Windows, DOS, IBM OS/2, and Unix.
Features of DFS
 Easy to use, with high availability.
 File Locking feature.
 Coherent access.
 Supports multi-networking and multi-protocol access.
 User mobility.
 Scalable and reliable.
 Data integrity.
 Secure and protects information from unwanted and unauthorized access.
DFS Namespaces and DFS Replication are part of the File and Storage Services role.
Namespaces are role services on Windows Server that allow shared folders located on various
servers to be grouped into a single, or multiple, logically structured namespaces.
DFS Replication is the multi-master replication mechanism in Microsoft Windows Server that is
used for synchronizing folders across servers over a low-bandwidth network connection. The
Enterprise and Datacenter editions of Windows Server can host multiple DFS roots on a single
server.
It is not required that you use these components together: you can use a namespace without the
file replication component, and it is entirely possible to use the file replication component
without the namespace component. Below, along with the Distributed File System (DFS)
architecture, we have listed a few more components of distributed file systems.
DFS Components
Distributed file systems are a set of computers that work together to store and retrieve data.
They are used for storing data in a secure way, as well as for sharing that data across the
network. A DFS has several components, which we have listed below.
Also below, in this section, we have discussed the cache manager, DCE file server machines,
administrative server processes, and tools that help in keeping track of the DFS use and its
activities. We have also explained the DFS/NFS secure gateway, which offers authorized access
to DFS from NFS clients.
Cache Manager


A cache manager is a program that stores data in the form of a cache, acting as an intermediary
between a computer's main memory and the permanent storage system. The cache manager has two
main functions: it improves performance by storing copies of recently accessed data in memory,
and it protects against data loss by keeping backup copies of data on permanent storage.
Cache managers are used for many different purposes, such as caching web pages for faster access,
caching files for faster loading times, and optimizing disk usage.
The cache manager is the client side of DFS: on receiving a user's request, it first checks the local
cache. If the file is not found in the local cache, the cache manager forwards the request to the file
server machine and caches the returned data on disk or in memory.
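
The lookup order just described might be sketched in Python as follows, with an invented
in-memory cache, disk cache, and server stub standing in for the real components:

# Sketch of the cache-manager lookup order described above. The memory cache,
# "disk" cache, and file-server stub are all invented for illustration.
class CacheManager:
    def __init__(self, file_server):
        self.memory = {}           # fast, small
        self.disk = {}             # stand-in for an on-disk cache directory
        self.server = file_server  # hypothetical RPC stub to the file exporter

    def read(self, path):
        if path in self.memory:
            return self.memory[path]
        if path in self.disk:
            self.memory[path] = self.disk[path]   # promote to memory
            return self.memory[path]
        data = self.server.fetch(path)            # miss: ask the file exporter
        self.disk[path] = data                    # cache on disk ...
        self.memory[path] = data                  # ... and in memory
        return data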
File Exporter
The File Exporter is the server-side component of DFS that exports files from the DFS to a
location outside the DFS. You can use it to export entire folders or individual files from the DFS,
and it can also be used to export groups of files. The component runs on the file server machine,
where it receives requests and manages files.
When a file exporter receives an RPC request, it accesses its own local file system to fulfil the
request. This local file system is the DCE Local File System (LFS) or a UNIX File System
(UFS). It handles the synchronization of multiple clients accessing the same file simultaneously
using the token manager and provides the client with the needed information.
Token Manager

A token manager issues tokens that clients use to carry out operations, and it
helps synchronize access to files by numerous clients. The access privileges associated with the
tokens that a token manager issues to DFS clients are typically read or write. The token manager
can issue four types of tokens: data tokens, status tokens, lock tokens, and open tokens.
To manage tokens, the token-management layer in the cache manager works with the token
manager that runs on a file server machine. If a client requests an operation that clashes with a
token that another client possesses, then the token manager must revoke the existing token
and issue a new token before completing the desired action.
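
A very rough sketch of that issue-and-revoke logic, with invented token modes and a hypothetical
client-side revoke() callback, could look like this:

# Sketch of token issue/revoke along the lines described above. Token modes
# and the revoke call are illustrative, not the DCE DFS wire protocol.
class TokenManager:
    def __init__(self):
        self.tokens = {}   # path -> list of (client, mode), mode "read"/"write"

    def request(self, client, path, mode):
        held = self.tokens.setdefault(path, [])
        for other, other_mode in list(held):
            # A write clashes with anything; reads only clash with a write.
            if other is not client and (mode == "write" or other_mode == "write"):
                other.revoke(path)            # client must drop its cached token
                held.remove((other, other_mode))
        held.append((client, mode))
        return (path, mode)                   # the newly issued token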
DCE Local File System

A DCE Local File System is a file system that allows users to store and retrieve files from the
computer’s hard drive. The design goal of the DCE Local File System is to provide an open and
reliable file system that is used by all applications in the distributed computing environment.
The DCE Local File System provides a high-level interface for storing and retrieving files. It also
provides support for directories, access control lists, and data integrity checks.
This type of file system is used to share files among workgroups, departments, or whole
organizations. The DCE Local File System also provides a way to control access to files so that
only authorized users can read or change them.
Fileset Server
The Fileset Server is a Windows service that stores files in the DFS namespace. The service is
installed on an existing Windows Server machine or on a new server running the Windows
Server operating system. Using this component, administrators can create, delete, transfer, or
perform other operations on filesets. It provides a centralized location for storing files that are
shared among multiple clients.
Basic Overseer Server

A Basic Overseer Server is a server that contains one or more DFS replicas. Moreover, it has the
ability to create and delete standalone or domain-based replicas on the same server as well as on
other servers in your organization. It is also used to provide an overview of all of the DFS
replica servers that are part of your organization's replication topology, and it
helps monitor the health of your DFS replica servers.
Replication Server

A Replication Server is an administrative server that allows you to replicate and synchronize
databases between different servers. It manages the replication of filesets and synchronizes the
changes made on one server with another server. This is done by copying the data from one
database to another.
Replicas can be updated manually or automatically. Further, if one copy of a fileset becomes
unavailable, you can still access another copy of the fileset from another file server machine.
Update Server

Update servers are used to distribute binary files or administrative data to DFS-configured
servers. The upclient and upserver programmes make up the update server. The upclient
software is installed on any system that needs to receive updated binary files or administrative
data. The upserver programme, which runs on a master system, propagates any updates to
binaries or administrative data to the workstations running the upclient software.
Fileset Location Server
The fileset location server (FL Server) is a server that manages filesets and their locations. At
this point, it also offers a replicated directory service that maintains a record of each fileset and
the place where it resides. You can then easily access a fileset just by its name; it is not necessary
to know the fileset's location in order to access it. Also, the fileset location database (FLDB) is
automatically updated by DFS.
Backup Server

The backup server in DFS is used to create data backups on file server machines and maintain
schedules for the same. This component helps to keep the replicated backup database’s backup
records up to date and provides the ability to run full and incremental dumps. The fileset serves
as the backup unit.
Scout
This administrative tool is another vital component that helps collect and show data about the file
exporters running on file server computers. It helps administrators to keep track of how DFS is
being used.
The dfstrace Utility
Using the dfstrace utility, administrators and system developers can keep track of DFS processes
running in user space or in the kernel. The component also offers a suite of commands for
low-level diagnostic and debugging data.
DFS/NFS Secure Gateway

The DFS/NFS Secure Gateway is a gateway that provides secure access to files stored on DFS or
NFS servers. It offers a number of features that make it easy for users to work with the files they
have stored on the server.
The gateway provides access to files and folders on the server plus file system navigation and
file management operations, i.e., copy, delete, rename, and create a folder. Also, it has the ability
to upload and download files from the server.
You can also perform file system tasks such as splitting archive files, synchronizing directories,
and backing up data from the server onto local drives.
Distributed File System (DFS) Architecture Components Explained Conclusion
Distributed file systems (DFS) are file systems that extend over several file servers or several
locations, for example, file servers located at various physical sites. They are highly scalable,
offer high performance, are fault tolerant, and can be used as a replacement for a centralized
server.
A DFS provides transparency of data and allows it to be shared remotely. It is also highly secure
and helps protect data in the file system from unauthorized access. It supports load sharing and
file locking as well.
Distributed file systems are designed to overcome the limits of traditional local storage, where
data resides on only one computer. They are used for both large-scale and small-scale storage, as
well as for backup purposes. The cache manager, file exporter, token manager, replication
server, and backup server are a few of the main components of a DFS; their uses and roles are
described above.
