
DISTRIBUTED FILE SYSTEMS
BY:
T.GUNA SEKHAR – 18BCI0002
U. CHAITANYA – 18BCE0292
M.SAI KRISHNA – 18BCE0783
S.SUSANTH – 18BCE0552
ABSTRACT
Distributed file systems provide a fundamental abstraction for
location-transparent, permanent storage. They allow distributed
processes to cooperate on hierarchically organized data beyond
the lifetime of each individual process. The great power of the
file system interface lies in the fact that applications do not need
to be modified in order to use distributed storage. On the other
hand, the general and simple file system interface makes it
notoriously difficult for a distributed file system to perform well
under a variety of different workloads. This has led to today's
landscape with a number of popular distributed file systems,
each tailored to a specific use case.
ABSTRACT
Early distributed file systems merely execute file system calls on a
remote server, which limits scalability and resilience to failures. Such
limitations have been greatly reduced by modern techniques such
as distributed hash tables, content-addressable storage, distributed
consensus algorithms, and erasure codes. In light of upcoming
scientific data volumes at the exabyte scale, two trends are
emerging. First, the previously monolithic design of distributed file
systems is decomposed into services that independently provide a
hierarchical namespace, data access, and distributed
coordination. Second, the segregation of storage and computing
resources gives way to a storage architecture in which every
compute node also participates in providing persistent storage.
INTRODUCTION
A distributed file system is a client/server-based application that
allows clients to access and process data stored on the server as if it
were on their own computer. When a user accesses a file on the
server, the server sends the user a copy of the file, which is cached
on the user's computer while the data is being processed and is
then returned to the server. Ideally, a distributed file system
organizes file and directory services of individual servers into a
global directory in such a way that remote data access is not
location-specific but is identical from any client. All files are
accessible to all users of the global file system, and the organization is
hierarchical and directory-based.
INTRODUCTION:

Since more than one client may access the same data
simultaneously, the server must have a mechanism in place (such as
maintaining information about the times of access) to organize
updates so that the client always receives the most current version
of the data and data conflicts do not arise. Distributed file systems
typically use file or database replication (distributing copies of data
on multiple servers) to protect against data access failures. Sun
Microsystems' Network File System (NFS), Novell NetWare, Microsoft's
Distributed File System, and IBM/Transarc's DFS are some examples
of distributed file systems.
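To make this concrete, the following is a minimal sketch, not taken from any particular system, of how a server could keep a version number per file so that clients always receive the most current copy and conflicting updates are detected rather than silently overwriting newer data. The class and method names are hypothetical.

```python
# Minimal sketch (hypothetical names): a server that tracks one version
# number per file. Reads return the latest copy; writes are accepted only
# if they were based on that latest version, so stale updates are rejected.

class SimpleFileServer:
    def __init__(self):
        self._files = {}  # path -> (version, content)

    def read(self, path):
        """Return the current version number and content of a file."""
        version, content = self._files.get(path, (0, b""))
        return version, content

    def write(self, path, content, base_version):
        """Accept the write only if it is based on the latest version."""
        current_version, _ = self._files.get(path, (0, b""))
        if base_version != current_version:
            raise ValueError("conflict: file changed since it was read")
        self._files[path] = (current_version + 1, content)
        return current_version + 1


server = SimpleFileServer()
version, data = server.read("/home/alice/notes.txt")
server.write("/home/alice/notes.txt", b"updated text", base_version=version)
```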
LITERATURE SURVEY:
HOW ARE THESE FILE SYSTEMS USED?
Even though the file system interface is general and fits a broad spectrum of applications,
most distributed file system implementations are optimized for a particular class of
applications. For instance, the Andrew File System (AFS) is optimized for users' home
directories, XrootD is optimized for high-throughput access to high-energy physics data sets,
and the Hadoop File System (HDFS) is designed as a storage layer for the MapReduce framework.
These use cases differ both quantitatively and qualitatively. Consider a multi-dimensional
vector describing different levels of properties or requirements for a particular class of data,
with dimensions for data value, data confidentiality, redundancy, volume, median file size,
change frequency, and request rate. Every single use case above poses high requirements in
only some of the dimensions. All of the use cases combined, however, would require a
distributed file system with outstanding performance in every dimension. Moreover, some
requirements contradict each other: a high level of redundancy (e.g. for recorded
experiment data) inevitably reduces the write throughput in cases where redundancy is not
needed (e.g. for a scratch area). The file system interface provides no standard way to
specify quality of service properties for particular files or directories. Instead, we have to resort
to using a number of distributed file systems, each with implicit quality of service guarantees
and mounted at a well-known location (/afs, /eos, /cvmfs, /data, /scratch, ...). Quantitative
file system studies, which are unfortunately rare, provide precise workload characterizations
to guide file system implementers.
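As an illustration of such a requirement vector, the sketch below rates three use cases along the dimensions listed above. The numeric ratings are invented purely for illustration, not measurements from the source.

```python
# Illustrative only: each use case is rated 1 (low) to 5 (high) along the
# requirement dimensions named above. The ratings are invented examples.

DIMENSIONS = ["data value", "confidentiality", "redundancy", "volume",
              "median file size", "change frequency", "request rate"]

home_directories = dict(zip(DIMENSIONS, [4, 5, 4, 2, 1, 4, 3]))
physics_datasets = dict(zip(DIMENSIONS, [5, 1, 3, 5, 5, 1, 4]))
scratch_area     = dict(zip(DIMENSIONS, [1, 1, 1, 3, 3, 5, 5]))

# No single profile is demanding in every dimension, but a file system
# serving the combination would need to score high everywhere at once.
combined = {d: max(home_directories[d], physics_datasets[d], scratch_area[d])
            for d in DIMENSIONS}
print(combined)
```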
LITERATURE SURVEY:
ARCHITECTURE EVOLUTION:
The simplest architecture for a distributed file system is a single server that exports a local
directory tree to a number of clients (e.g. NFSv3). This architecture is obviously limited
by the capabilities of the exporting server. An approach to overcome some of these
limitations is to delegate ownership and responsibility of certain file system subtrees to
different servers, as done by AFS. In order to provide access to remote servers, AFS
allows for loose coupling of multiple file system trees (“cells”). Across cells, this
architecture is not network-transparent: moving a file from one cell to another requires a
change of path. It also involves a copy through the node that triggers the move, i.e.
a move is not a namespace-only operation. Furthermore, the partitioning of a file system
tree is static and changing it requires administrative intervention. In object-based file
systems, data management and meta-data management are separated (e.g. GFS). Files
are spread over a number of servers that handle read and write operations. A meta-
data server maintains the directory tree and takes care of data placement. As long as
meta-data load is much smaller than data operations (i.e. files are large), this
architecture allows for incremental scaling. As the load increases, data servers can be
added one by one with minimal administrative overhead. The architecture is refined by
parallel file systems (e.g. Lustre) that cut every file into small blocks and distribute the
blocks over many nodes. Thus read and write operations are executed in parallel on
multiple servers for better maximum throughput.
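The block distribution used by parallel file systems can be sketched as simple round-robin striping. The block size and server names below are assumptions chosen for illustration, not taken from any particular system.

```python
# Minimal sketch of round-robin striping: a file is cut into fixed-size
# blocks and each block is assigned to one of several data servers, so
# reads and writes of a large file can proceed in parallel.

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB stripes (assumed)
SERVERS = ["data01", "data02", "data03", "data04"]  # hypothetical servers

def place_blocks(file_size, block_size=BLOCK_SIZE, servers=SERVERS):
    """Map each block index of a file to the server that stores it."""
    num_blocks = (file_size + block_size - 1) // block_size
    return {block: servers[block % len(servers)] for block in range(num_blocks)}

# A 10 MiB file is split into three blocks spread over three different
# servers, so the client can read them concurrently.
print(place_blocks(10 * 1024 * 1024))
```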
LITERATURE SURVEY:
FILE SYSTEM INTEGRITY:
Global file systems often need to transfer data via untrusted connections and
still ensure integrity and authenticity of the data. Cryptographic hashes of the
content of files are often used to ensure data integrity. Cryptographic hashes
provide a short, constant-length, unique identifier for data of any size. Collisions
are virtually impossible, whether by chance or by deliberate crafting,
which makes cryptographic hashes a means to protect against data
tampering. Many globally distributed file systems use cryptographic hashes in
the form of content-addressable storage, where the name of a file is derived
from its cryptographic content hash. This allows for verification of the data
independently of the meta-data. It also results in immutable data, which
eliminates the problem of detecting stale cache entries and keeping caches
consistent. Furthermore, redundant data and duplicated files are
automatically de-duplicated, which in some use cases (backups, scientific
software binaries) reduces the actual storage space utilization by a large
factor.
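A minimal sketch of content-addressable storage, assuming SHA-256 as the content hash: the address of an object is the hash of its bytes, so integrity can be verified on every read and identical content is stored only once. The class and method names are hypothetical.

```python
import hashlib

# Minimal sketch of content-addressable storage: the "name" of an object is
# the SHA-256 hash of its content, so data can be verified independently of
# any meta-data and duplicate content collapses onto a single stored copy.

class ContentAddressedStore:
    def __init__(self):
        self._objects = {}  # hex digest -> content

    def put(self, content: bytes) -> str:
        """Store the content and return its address (the content hash)."""
        digest = hashlib.sha256(content).hexdigest()
        self._objects[digest] = content  # duplicates map to the same key
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch content by address and verify it before returning it."""
        content = self._objects[digest]
        if hashlib.sha256(content).hexdigest() != digest:
            raise IOError("content does not match its address (corruption)")
        return content


store = ContentAddressedStore()
address = store.put(b"experiment results")
assert store.get(address) == b"experiment results"
```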
LITERATURE SURVEY:
FILE SYSTEM INTEGRITY:
Cryptographic hashes are also used to protect the integrity of the file
system tree when combined with a Merkle tree. In a Merkle tree, nodes
recursively hash their children's cryptographic hashes, so that the root hash
uniquely identifies the state of the entire file system. Copies of this root hash
created at various points in time provide access to previous snapshots of the
file system, which effectively allows for backups and for versioned file
systems. The hashes in the tree can also be cryptographically signed in
order to ensure data authenticity of a file system or a subtree (who
created the content). An elegant way to solve the problem of key
distribution inherent to digital signatures is to encode the public key
as part of the path name. To protect against silent corruption (the
probabilistic decay of physical storage media over time), simple
checksums such as CRC32 provide an easy means. Checksums can be
verified faster than cryptographic hashes, fast enough to be computed
on the fly on every read access.
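A minimal sketch of a Merkle tree over a directory hierarchy, assuming SHA-256; the helper names and the example tree are hypothetical. Any change to any file changes the root hash, which is why signing only the root hash is enough to authenticate the whole tree.

```python
import hashlib

# Minimal sketch of a Merkle tree: a file node is hashed by its content,
# a directory node by the concatenation of its (sorted) children's names
# and hashes. The root hash therefore identifies the state of the whole tree.

def file_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def merkle_root(node) -> str:
    """node is either bytes (a file) or a dict name -> node (a directory)."""
    if isinstance(node, bytes):
        return file_hash(node)
    child_hashes = "".join(name + merkle_root(child)
                           for name, child in sorted(node.items()))
    return hashlib.sha256(child_hashes.encode()).hexdigest()

# Hypothetical example tree: two directories with a few small files.
tree = {"etc": {"motd": b"hello"},
        "data": {"run1.dat": b"0101", "run2.dat": b"0110"}}
root_before = merkle_root(tree)
tree["data"]["run1.dat"] = b"1111"        # any change anywhere in the tree...
assert merkle_root(tree) != root_before   # ...changes the root hash
```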
