
DHVSU LUBAO CAMPUS

Wireframe Documentation: Critical Study of Performance Parameters on Distributed File Systems using MapReduce

In Partial Fulfillment
of the Requirements for the Course of
Human Computer Interaction and SIA

John Bernard C. Tungol

April 10, 2021



Abstract

Enormous volumes of data are generated over the network every day. MapReduce is a parallel programming model well suited to processing such large data sets. The paper under review surveys several distributed file systems and distributed storage systems with respect to a number of parameters: fault tolerance, replication, checkpointing, security, and optimization of small-file access using MapReduce. The distributed file systems reviewed were Ceph, GlusterFS, HDFS, and Lustre, all of which are open source. Distributed file systems also play an important role in protecting application data in cloud computing. The authors argue that their approach is efficient and scalable because it is built on MapReduce.

Introduction and Discussions

The amount of data generated over the network grows every day, and the inability of traditional databases to process and store data at that scale is well documented. A distributed file system stores data on multiple nodes; to remove the single-server bottleneck, clients are allowed to access data in parallel from the storage nodes. Distributed file systems are therefore sometimes called network file systems. They provide persistent storage of unstructured data, organized in a hierarchical namespace of files that is shared among networked nodes. This data model and its interface to applications distinguish distributed file systems from other storage types such as databases. To applications, it should be transparent whether data resides on a distributed file system or on a local file system.

Motivation and Context



Even though the file system interface is general and fits a broad spectrum of applications, most distributed file system implementations are optimized for particular classes of applications, so there is often a need to survey and study the available systems. Use cases differ both qualitatively and quantitatively, and each one poses high requirements in only some of the dimensions. Taken together, however, they would require a distributed file system whose performance is outstanding in every dimension. Some requirements even contradict each other: a high level of redundancy (e.g., for recorded experiment data) inevitably reduces write throughput in cases where redundancy is not needed (e.g., for a scratch area). Moreover, the file system interface provides no standard way to specify quality-of-service properties for particular directories or files.

Related Work

Several published works are closely related to the paper under review. Among them are: (1) "Cassandra File System Over Hadoop Distributed File System" by Mr. Ashish A. Mutha and Miss Vaishali M. Deshmukh, both from PRMIT&R College, Amravati, India; and (2) "A Survey on Distributed File System Technology" by J. Blomer of Switzerland. These studies are well regarded and have been used as references by subsequent researchers.

Overview of Modelling Method

Fault Tolerance and Replication



Data is distributed over multiple machines, where network failures can occur, so a simple distributed file system with integrated fault tolerance is needed for efficient handling of small records of data. Fault-tolerant data storage is becoming more important as data moves to the cloud. In practice, fault tolerance is achieved by dividing a file into smaller chunks or fragments, which are processed and managed by a set of servers. Fault tolerance is an important aspect of cloud storage, where the robustness of data is a major concern.
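The sketch below is a generic illustration of this idea, not the method of the reviewed paper: a file is split into fixed-size chunks and each chunk is assigned to several servers. The chunk size, replication factor, and server names are assumptions made here for illustration.

    import os

    CHUNK_SIZE = 64 * 1024 * 1024   # assumed 64 MB chunks, as in many DFS designs
    REPLICAS = 3                    # assumed replication factor

    def split_into_chunks(path, chunk_size=CHUNK_SIZE):
        """Yield (index, bytes) pairs for each fixed-size chunk of the file."""
        with open(path, "rb") as f:
            index = 0
            while True:
                data = f.read(chunk_size)
                if not data:
                    break
                yield index, data
                index += 1

    def place_replicas(chunk_index, servers, replicas=REPLICAS):
        """Pick `replicas` distinct servers for a chunk (simple round-robin placement)."""
        return [servers[(chunk_index + i) % len(servers)] for i in range(replicas)]

    servers = ["node1", "node2", "node3", "node4"]   # hypothetical storage nodes
    if os.path.exists("example.dat"):                # hypothetical input file
        for idx, _ in split_into_chunks("example.dat"):
            print(idx, place_replicas(idx, servers))

Because every chunk exists on several servers, the loss of any single node leaves all chunks readable from the remaining replicas.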

Checkpointing

Checkpointing is an essential fault-tolerance mechanism adopted by long-running, data-intensive applications. At regular intervals, an application alternates between compute operations and checkpoint operations. Periodic checkpointing saves the state of the application and is combined with frequent checks of the cluster's health and the job's progress. Checkpoints can be written in various places: to free disk space in the cloud, to a central file server or parallel file system, or to a temporary buffer.
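A minimal sketch of this compute/checkpoint alternation follows; the interval, the compute_step() callback, and the local checkpoint directory are assumptions for illustration, not details from the reviewed paper.

    import json
    import os
    import time

    CHECKPOINT_INTERVAL = 60  # assumed interval in seconds between checkpoints

    def save_checkpoint(state, directory="checkpoints"):
        """Write the application state to a timestamped file."""
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, "state-%d.json" % int(time.time()))
        with open(path, "w") as f:
            json.dump(state, f)
        return path

    def run(compute_step, state):
        """Alternate between compute operations and periodic checkpoint operations."""
        last_checkpoint = time.time()
        while not state.get("done"):
            compute_step(state)                        # hypothetical unit of work
            if time.time() - last_checkpoint >= CHECKPOINT_INTERVAL:
                save_checkpoint(state)                 # save state so the job can restart here
                last_checkpoint = time.time()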

Dealing with Small Files

In work on log-structured file systems, the authors claim that such a system improves small-file write performance by an order of magnitude while matching the performance of non-log-structured file systems for large files. Managing free space is an important issue: the problem is how to keep large extents of free space available for writing after many overwrites and deletions of small files. Object-based storage systems such as Ceph, in which data is organized and accessed as objects, struggle with workloads that access large numbers of small files, such as those produced by user workspaces and software. There are two reasons for this: loss of namespace locality at the storage devices and a per-file interaction with the metadata server.
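One common way to mitigate the small-file problem, sketched here as a generic technique rather than the approach of any specific system named above, is to pack many small files into one larger container file plus an index of offsets.

    import json

    def pack(small_files, container_path, index_path):
        """Concatenate small files into one container and record (offset, length) per name."""
        index = {}
        offset = 0
        with open(container_path, "wb") as out:
            for name, data in small_files.items():
                out.write(data)
                index[name] = (offset, len(data))
                offset += len(data)
        with open(index_path, "w") as f:
            json.dump(index, f)

    def read_packed(name, container_path, index_path):
        """Read one logical small file back out of the container."""
        with open(index_path) as f:
            index = json.load(f)
        offset, length = index[name]
        with open(container_path, "rb") as f:
            f.seek(offset)
            return f.read(length)

Packing trades per-file metadata operations for a single large object plus a cheap index lookup, which restores namespace locality on the storage devices.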

Security

Recent efforts recognize the importance of self-protection in big data management systems, but they focus mainly on correctness and data privacy. The growing popularity of storing analytics data and big data creates a need for efficient and secure data management mechanisms. One of the most relevant security topics for such big data is preventing users from damaging the stored data or from breaking data-access protocols and security policies. Securing large data sets requires techniques such as logging, encryption, and privacy protection. Because distributed file systems underpin cloud computing systems, their security issues and technologies are directly applicable to cloud computing. IBM researchers, for instance, recommend using Kerberos to secure the data environment.

Tool Support

The following tools support the study of distributed file systems using the MapReduce framework.

Ceph

Ceph is free, easy to use, and reliable, and it can manage vast amounts of data. Ceph delivers extraordinary scalability: thousands of clients can access petabytes to exabytes of data. A Ceph node runs smoothly on commodity hardware, and the system accommodates large numbers of nodes that communicate with each other to dynamically redistribute and replicate data. Placement policies can separate object replicas across different failure domains while still maintaining the desired distribution, using the CRUSH (Controlled Replication Under Scalable Hashing) algorithm.
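The sketch below is not the CRUSH algorithm itself, only a simplified illustration of the idea it implements: replicas of an object are placed pseudo-randomly and deterministically, but never in the same failure domain. The cluster map and rack labels are hypothetical.

    import hashlib

    # Hypothetical cluster map: storage node -> failure domain (rack)
    CLUSTER = {
        "osd1": "rackA", "osd2": "rackA",
        "osd3": "rackB", "osd4": "rackB",
        "osd5": "rackC", "osd6": "rackC",
    }

    def place(object_name, replicas=3):
        """Deterministically choose `replicas` nodes in distinct failure domains."""
        ranked = sorted(
            CLUSTER,
            key=lambda node: hashlib.sha256((object_name + node).encode()).hexdigest(),
        )
        chosen, used_domains = [], set()
        for node in ranked:
            if CLUSTER[node] not in used_domains:
                chosen.append(node)
                used_domains.add(CLUSTER[node])
            if len(chosen) == replicas:
                break
        return chosen

    print(place("my-object"))   # the same object name always maps to the same placement

Because placement is computed from the object name and the cluster map, clients can locate replicas without consulting a central lookup table.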

GlusterFS

In GlusterFS, the elementary storage units are called bricks. A server can host more than one brick, and bricks store data through translators on top of lower-level file systems. GlusterFS distributes load using a distributed hash translator (DHT) that maps filenames to its subvolumes; subvolumes are replicated to provide fault tolerance and load handling in a scale-out distributed file system that supports thousands of clients.
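A minimal sketch of this filename-hashing idea, under the simplifying assumption of a fixed list of subvolumes (the real translator hashes into per-directory ranges, which is omitted here):

    import zlib

    SUBVOLUMES = ["replica-0", "replica-1", "replica-2"]   # hypothetical subvolumes

    def subvolume_for(filename):
        """Map a filename to a subvolume by hashing the name, not the file contents."""
        return SUBVOLUMES[zlib.crc32(filename.encode()) % len(SUBVOLUMES)]

    print(subvolume_for("results.csv"))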

Lustre

Lustre is a file system built for high-performance computing (HPC) and capable of processing big data. It is a cluster file system based on the client/server model. Lustre achieves high performance and scalability by separating metadata operations from normal data operations: data is stored on Object Storage Servers (OSSs), while metadata is stored on Metadata Servers (MDSs). Lustre and the Hadoop Distributed File System (HDFS) are similar in terms of performance and storage capabilities.

Industrial Case Study and Lessons Learned



Application Integrity

Some distributed file systems are implemented as an extension of the operating system kernel (e.g., NFS, AFS, Lustre). This can provide better performance than interposition systems, but deployment is difficult and implementation errors typically crash the operating system kernel. Distributed file systems do not fully comply with the POSIX file system standard, so each distributed file system needs to be tested with real applications. From the applications' point of view, there are different levels of integration a distributed file system can provide.

Decomposition

There is a tendency towards decomposition and modularization in distributed file systems. The grid, for instance, federates globally distributed cluster file systems, with the namespace controlled by the experiments' file catalogs in combination with grid middleware. Other examples are the offloading of authorization to Kerberos in AFS, the offloading of distributed consensus to Chubby in GFS (or ZooKeeper in HDFS), and the layered implementation of Ceph, with the independent RADOS key-value store as the building block beneath the file system.

File System Integrity

Cryptographic hashes of file content are often used to ensure data integrity. A cryptographic hash provides a short, constant-length, practically unique identifier for data of any size. Collisions are virtually impossible to produce, whether by chance or by clever crafting, which makes cryptographic hashes a means of protection against data tampering. Content addressing also results in immutable data, which simplifies cache consistency and eliminates the problem of detecting stale cache entries. Furthermore, redundant data and duplicated files are automatically de-duplicated, which in some use cases (backups, scientific software binaries) reduces the actual storage space used by a large factor.
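The following sketch shows, in a generic way not tied to any particular file system named above, how content hashes yield both integrity checking and de-duplication; the in-memory dictionary stands in for an object store.

    import hashlib

    store = {}  # maps content hash -> data (an in-memory stand-in for object storage)

    def put(data):
        """Store data under its SHA-256 digest; identical content is stored only once."""
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)
        return digest

    def get(digest):
        """Fetch data and verify it still matches its identifier (tamper detection)."""
        data = store[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("data integrity violation")
        return data

    h1 = put(b"same bytes")
    h2 = put(b"same bytes")
    assert h1 == h2          # duplicate content de-duplicates to a single object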

Efficiency

Caching and file striping are standard techniques for improving the speed of distributed file systems. Caches can be located in memory, in flash memory, or on hard disks, and they are most often managed per file system node. Cache sizes need to be tuned manually according to the working-set size of applications. Co-operative caches between nodes in a local network have been discussed, but they are not implemented in today's production file systems. Dynamic workload adaptation is a technique used in the Ceph file system to change the metadata-to-metadata-server mapping based on server load.
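A minimal per-node cache sketch follows; as the text notes, the capacity is a manual assumption chosen to match the application's working set, and the block-level interface is illustrative only.

    from collections import OrderedDict

    class NodeCache:
        """Least-recently-used cache for file blocks, managed per file system node."""

        def __init__(self, capacity_blocks):
            self.capacity = capacity_blocks   # manually tuned to the working-set size
            self.blocks = OrderedDict()

        def get(self, block_id):
            if block_id in self.blocks:
                self.blocks.move_to_end(block_id)   # mark as recently used
                return self.blocks[block_id]
            return None                             # caller must read from storage

        def put(self, block_id, data):
            self.blocks[block_id] = data
            self.blocks.move_to_end(block_id)
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)     # evict the least recently used block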

Highlights

Title and Content

Critical Study of Performance Parameters on Distributed File Systems using MapReduce.

Handling Large Data - Both MapReduce and parallel DBMSs provide a means to process large volumes of data. As the volume of data captured continues to rise, questions have been asked about whether the parallel DBMS paradigm can scale to meet demand. Parallel DBMSs were developed to improve the performance of database systems, but as processor performance has improved it has outstripped disk throughput, and critics have predicted from time to time that the I/O bottleneck would become a major problem. MapReduce was designed to be inherently fault tolerant and to run on thousands of nodes.

Analytics - Algorithms in which the output of one subprocess is the input to the next are difficult to implement in SQL, and performing these tasks in many steps reduces the performance benefits gained from a parallel DBMS. Both MapReduce and parallel DBMSs can be used to produce analytical results from big data.

Replication - Replication and erasure codes are the techniques used to avoid data loss and to continue operation in the case of hardware failures. An engineering challenge is placing redundant data in such a way that the redundancy crosses multiple failure domains. While replication is simple and fast, it results in a large storage overhead.
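To make the overhead comparison concrete (the figures below are illustrative assumptions, not numbers from the reviewed paper): 3-way replication stores every byte three times, while a Reed-Solomon-style (k, m) erasure code stores k data fragments plus m parity fragments.

    def replication_overhead(copies):
        """Extra storage as a fraction of the raw data size."""
        return copies - 1            # e.g. 3 copies -> 2.0 (200% extra)

    def erasure_code_overhead(k, m):
        """Extra storage for a (k, m) code that tolerates the loss of m fragments."""
        return m / k                 # e.g. (10, 4) -> 0.4 (40% extra)

    print(replication_overhead(3))       # 2.0
    print(erasure_code_overhead(10, 4))  # 0.4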

Impact

• MapReduce is transparently scalable: it has no dependency on the underlying hardware, and the user does not need to manage data placement or the number of nodes used for a job.

• Because processing is independent, failover is trivial: a failed process can simply be restarted, provided the underlying file system is redundant, as HDFS is.

• Data flow is highly defined and moves in one direction, from the map to the reduce, with no communication between independent mapper or reducer processes (a minimal word-count sketch follows this list).

• MapReduce, though powerful, does not fit all problem types.
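The word-count sketch referenced above illustrates this one-directional map-to-reduce data flow. It is a single-process stand-in for the model; a real Hadoop job would distribute the mappers and reducers across nodes, and the input lines here are arbitrary examples.

    from collections import defaultdict

    def map_phase(lines):
        """Mapper: emit a (word, 1) pair for every word, with no shared state."""
        for line in lines:
            for word in line.split():
                yield word, 1

    def reduce_phase(pairs):
        """Reducer: sum the counts per key after an implicit shuffle/group step."""
        grouped = defaultdict(int)
        for word, count in pairs:
            grouped[word] += count
        return dict(grouped)

    lines = ["the quick brown fox", "the lazy dog"]
    print(reduce_phase(map_phase(lines)))   # {'the': 2, 'quick': 1, ...}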



Strengths

Ceph is unable to provide coherent file-chunk replicas and is therefore bandwidth limited. MapReduce is a computational paradigm in which an application is divided into many small fragments of work, each of which may be executed on any node in the cluster. The cluster can be an HPC cluster, or the client can be a node of a distributed file system or of a centralized system. The data is replicated, making it fault tolerant. Many researchers have written that MapReduce can run on GlusterFS and will give better performance than on HDFS.

Weaknesses

In the upcoming years, the computing landscape will move towards the exascale: data sets that routinely sum up to exabytes, and supercomputers that provide computing power in the exaflop range. While the gap between capacity and bandwidth has widened by one to two orders of magnitude over the last 20 years, the bandwidth of Ethernet networks has scaled at a similar pace to the capacity of hard drives. Raicu et al. predict the collapse of exaflop supercomputing applications due to the limited storage bandwidth and the architecture of today's distributed file systems. Experts suggest breaking the segregation between compute networks and storage networks and building distributed file systems accordingly. Integrating MapReduce on the distributed nodes or on an HPC cluster will be part of future study and implementation.

Conclusions

Distributed file systems provide a relatively well-defined and general-purpose interface for applications to use large-scale persistent storage. The implementation of a distributed file system, however, is always tailored to a particular class of applications. To conclude, MapReduce is a programming model for processing large data sets with a distributed, parallel algorithm on a cluster. Whatever the choice of backing file system or HPC cluster, and whatever the placement and data security concerns, MapReduce supports data availability. Comparative testing on a much larger and wider scale should, however, be undertaken in the future.

Storyboard

References

1. Sandberg R, Goldberg D, Kleiman S, Walsh D and Lyon B 1985 Proc. of the Summer USENIX Conference pp 119–130
2. Morris J H, Satyanarayanan M, Conner M H, Howard J H, Rosenthal D S H and Smith F D 1986 Communications of the ACM 29 184–201
3. Carns P H, Ligon III W B, Ross R B and Thakur R 2000 Proc. 4th Annual Linux Showcase and Conference (ALS'00) pp 317–328
4. Schmuck F and Haskin R 2002 Proc. 1st USENIX Conf. on File and Storage Technologies (FAST'02) pp 231–244
5. "Designing performance monitoring tool for NoSQL Cassandra distributed database", IEEE, 2012. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6360579&queryText%3DCASSANDRA
6. "Cassandra: flexible trust management, applied to electronic health records", IEEE. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=1310738&queryText%3DCASSANDRA
7. David A. Patterson, Garth A. Gibson, and Randy H. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). In: Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data; 1988, pp. 109–116.
8. T. Kosar, Data Intensive Distributed Computing: Challenges and Solutions for Large Scale Information Management, IGI Publications; 2011.
9. A. Silberschatz, P. Baer Galvin, G. Gagne, Operating Systems, John Wiley & Sons.
10. L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM; 1978.
11. J. Bent, G. Gibson, G. Grider, B. McClelland, P. Nowoczynski, J. Nunez, M. Polte, M. Wingate. PLFS: A Checkpoint Filesystem for Parallel Applications. Proc. SC'09; 2009.
