
St. Francis Institute of Technology
SV Road, Borivali (West), Mumbai 400103

Department of Computer Engineering

Academic Year: 2023-2024 Semester: VIII


Subject: Distributed Computing Class / Division: BE/CMPN A2
Name: Vedant Pednekar Roll Number: 36

Experiment No.: 09

Aim: Case Study: Distributed File System

Pre-requisites: Distributed File System

Theory:
Give an overview of the Distributed File System and explain any one case study of DFS.

A Distributed File System (DFS) stores files across multiple machines in a network while presenting clients with a single, unified view, as if the files were local. It aims to provide location transparency, scalability, and fault tolerance, typically by replicating data across servers and coordinating access through dedicated metadata services. One influential case study of a DFS is the Google File System, described below.
1. Google File System
Google File System (GFS) is a distributed file system designed by Google to meet the
storage requirements of its large, data-intensive applications. Introduced in a seminal
paper by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung in 2003, GFS adopts
a chunk-based architecture, breaking files into fixed-size chunks and employing a
master-chunk server model. The master server maintains metadata and control, while
multiple chunk servers store data with fault tolerance achieved through replication. GFS
prioritizes scalability, fault tolerance, and high throughput, making it suitable for
large-scale data processing. Operations involve file creation, read/write processes, and
chunk migration, with the system designed to gracefully handle failures. Despite potential
concerns such as a single point of failure and limited POSIX compliance, GFS has
significantly influenced the development of distributed file systems, leaving an indelible
mark on the storage solutions landscape and serving as the backbone for Google's
data-intensive applications.

Key Features of GFS:

● Scalability: GFS is designed to scale horizontally, allowing it to accommodate the
ever-growing volume of data and increasing user demands. This scalability is achieved by
adding more commodity hardware to the system.
● Fault Tolerance: GFS ensures fault tolerance through the replication of data across multiple
chunk servers. Each chunk, a fixed-size unit of data, is replicated on different servers,
minimizing the impact of hardware failures and enhancing data reliability.
● High Throughput: The system is optimized for high-throughput access to large datasets.
This is particularly important for data-intensive applications that require efficient processing
of vast amounts of data, and GFS achieves this through its design and operation.
● Streaming Writes: GFS is well-suited for applications with significant write-intensive
workloads, such as those involving large-scale log processing. It efficiently handles streaming
writes, contributing to its performance in scenarios where data is continuously generated.
● Chunk-Based Architecture: GFS organizes data into fixed-size chunks (typically 64 MB).
This chunk-based architecture facilitates efficient storage and retrieval of data and simplifies
the management of large datasets (a short sketch of the offset-to-chunk mapping appears after this list).
● Master-Chunk Server Model: GFS follows a master-chunk server model, where a central
master server oversees the metadata and control of the file system, while multiple chunk
servers store and manage the actual data. This separation of responsibilities contributes to the
system's scalability and manageability.
● Relaxed Consistency Model: GFS employs a relaxed consistency model, prioritizing high
throughput and availability over strict consistency. This allows for quicker data access and
manipulation, especially in scenarios involving concurrent read and write operations.
● Data Migration and Balancing: The system supports the migration of data chunks between
servers for load balancing and optimization purposes. The master server manages this
process, ensuring that data is efficiently distributed across the available resources.
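
To make the chunk-based layout concrete, below is a minimal sketch of how a byte offset within a file maps to a chunk index and an offset inside that chunk. The 64 MB chunk size matches the GFS default; the function and variable names are illustrative assumptions, not Google's code.

# Minimal sketch of GFS-style offset-to-chunk arithmetic (illustrative only).

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the GFS default chunk size

def locate(offset: int) -> tuple[int, int]:
    """Map a byte offset in a file to (chunk_index, offset_within_chunk)."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE

# Example: byte 200,000,000 of a file lands in the third chunk (index 2),
# 65,782,272 bytes into that chunk.
chunk_index, chunk_offset = locate(200_000_000)
print(chunk_index, chunk_offset)  # -> 2 65782272

The client performs this arithmetic itself before contacting the master, which helps keep the master out of the data path.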

GFS Design Requirements

● High Fault Tolerance: In the Google File System (GFS), high fault tolerance is a
fundamental design principle, motivated by the acknowledgement that hardware failures are
inevitable in large-scale distributed systems. GFS achieves fault tolerance through data
replication: each chunk of data is replicated across multiple chunk servers. In the event of a
hardware failure or node crash, the system seamlessly redirects operations to the available
replicas, ensuring continuous data accessibility and system functionality. This design choice
is crucial for maintaining the integrity and reliability of data, safeguarding against
disruptions caused by hardware or server failures (a minimal replica-placement sketch appears after this list).
● High Performance: GFS is engineered to deliver high-performance access to large datasets,
aligning with the demands of Google's data processing applications. The architecture is
optimized to support both high-speed read and write operations, making it well-suited for
scenarios where large volumes of data need to be processed efficiently. By prioritizing
performance, GFS contributes to the rapid and effective execution of tasks such as web
indexing, search operations, and other data-intensive processes. This focus on high
performance is a key factor in enhancing the overall efficiency and responsiveness of the file
system.
● Scalability: Scalability is a pivotal requirement for GFS due to the continuously expanding
volumes of data generated and processed by Google's applications. The system is designed to
scale horizontally, allowing for seamless expansion by adding more commodity hardware to
the distributed infrastructure. This scalability ensures that GFS can handle the ever-increasing
storage needs and processing demands, making it adaptable to the dynamic and growing
nature of Google's data-centric operations. The distributed nature of GFS, coupled with its
scalability, enables it to effectively manage massive datasets without sacrificing performance
or reliability.
● Simplicity: Simplicity in both design and operations is a guiding principle for GFS. This
design choice is intended to ensure easy manageability and maintenance of the distributed file
system. By favoring simplicity, GFS reduces the likelihood of errors, streamlines system
operations, and enhances overall reliability. The file system's simplicity is evident in its
architecture, file organization, and mechanisms for data access. This approach makes GFS
more robust and user-friendly, contributing to its successful integration into Google's
infrastructure and facilitating efficient management of large-scale data processing tasks.
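
As a rough illustration of the replication idea above, the following toy sketch (an assumption for illustration, not GFS's actual placement policy, which also weighs disk utilization and rack placement) assigns each new chunk to three distinct, least-loaded chunk servers:

# Toy replica placement: pick REPLICATION_FACTOR distinct chunk servers,
# preferring those that currently hold the fewest chunks.

REPLICATION_FACTOR = 3  # GFS default: three replicas per chunk

def place_replicas(chunk_counts: dict[str, int]) -> list[str]:
    """Return the three least-loaded servers and record the new replicas."""
    chosen = sorted(chunk_counts, key=chunk_counts.get)[:REPLICATION_FACTOR]
    for server in chosen:
        chunk_counts[server] += 1
    return chosen

counts = {"cs1": 10, "cs2": 4, "cs3": 7, "cs4": 2}
print(place_replicas(counts))  # -> ['cs4', 'cs2', 'cs3']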

GFS Architecture

Components of GFS
GFS runs on clusters of computers. A cluster is simply a group of connected machines, and
each cluster may contain hundreds or even thousands of them. Every GFS cluster includes
three basic entities:

GFS Clients: These are the programs or applications that request files. A request may access
or modify an existing file, or add a new file to the system.

GFS Master Server: It serves as the cluster’s coordinator. It preserves a record of the cluster’s
actions in an operation log and keeps track of the data that describes chunks, i.e., the
metadata. The metadata tells the master server which file each chunk belongs to and where it
sits within that file.

GFS Chunk Servers: They are the workhorses of GFS. They store file chunks of 64 MB each.
Chunk data never passes through the master server; instead, chunk servers deliver the
requested chunks directly to the client. To ensure reliability, GFS keeps multiple copies of
each chunk on different chunk servers, three by default; each copy is referred to as a replica.
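
The interaction between these three entities can be sketched as a simplified, in-memory mock of the read path. All class and variable names here are illustrative assumptions, not Google's API; the point is that the master serves only metadata, while chunk bytes travel directly from a chunk server to the client.

# Simplified mock of the GFS read path: the master returns metadata
# (chunk handle + replica locations); the client then reads the bytes
# straight from one of the chunk servers.

CHUNK_SIZE = 64 * 1024 * 1024

class Master:
    def __init__(self):
        # filename -> list of (chunk_handle, [replica server names])
        self.files = {"/logs/web.log": [("h42", ["cs1", "cs2", "cs3"])]}

    def lookup(self, path: str, chunk_index: int):
        return self.files[path][chunk_index]  # metadata only, never file data

class ChunkServer:
    def __init__(self, chunks: dict[str, bytes]):
        self.chunks = chunks  # chunk_handle -> chunk contents

    def read(self, handle: str, offset: int, length: int) -> bytes:
        return self.chunks[handle][offset:offset + length]

def client_read(master, servers, path, offset, length):
    handle, replicas = master.lookup(path, offset // CHUNK_SIZE)
    server = servers[replicas[0]]  # pick any replica, e.g. the closest
    return server.read(handle, offset % CHUNK_SIZE, length)

servers = {"cs1": ChunkServer({"h42": b"GET /index.html 200\n" * 100})}
print(client_read(Master(), servers, "/logs/web.log", 0, 19))

In the real system the client also caches the replica locations it receives, so repeated reads of the same chunk do not touch the master again.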

GFS mainly stores two types of data:

File Metadata: The file metadata in GFS consists of information about the structure and
properties of files. This metadata includes details such as file names, directory structures, access
permissions, and timestamps. The master server in the GFS architecture is responsible for
managing and maintaining this metadata. It keeps track of the location of chunks that make up a
file, the number of replicas for each chunk, and other relevant attributes. The file metadata is
crucial for organizing and managing the file system's structure.

File Data: The actual content of the files, referred to as file data, is stored in GFS. This data is
divided into fixed-size chunks, typically 64 megabytes in size. Each chunk is treated as a
separate unit for storage and retrieval purposes. The chunk servers are responsible for storing
and managing these chunks of file data. Data replication is used to ensure fault tolerance, with
multiple replicas of each chunk distributed across different chunk servers. This redundancy
allows for continued data access even in the event of hardware failures or other issues.
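
The split between the two kinds of data can be illustrated with a minimal sketch; the field names are assumptions for illustration, since the master's real data structures are more elaborate:

# Sketch of the two kinds of state in GFS: FileMetadata lives on the
# master, while the chunk contents live on the chunk servers.

from dataclasses import dataclass, field

@dataclass
class ChunkInfo:
    handle: str            # globally unique chunk identifier
    replicas: list[str]    # chunk servers holding a copy of this chunk

@dataclass
class FileMetadata:
    path: str              # file name within the namespace
    permissions: str       # access permissions
    mtime: float           # last-modified timestamp
    chunks: list[ChunkInfo] = field(default_factory=list)

meta = FileMetadata("/logs/web.log", "rw-r--r--", 1700000000.0,
                    [ChunkInfo("h42", ["cs1", "cs2", "cs3"])])
print(meta.chunks[0].replicas)  # -> ['cs1', 'cs2', 'cs3']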

Advantages of GFS

● High availability: data remains accessible even if a few nodes fail, because every chunk is
replicated; as the GFS designers put it, component failures are the norm rather than the exception.
● High throughput: many nodes operate concurrently, so reads and writes proceed in parallel.
● Reliable storage: corrupted data can be detected and re-replicated from healthy copies.

Disadvantages of GFS

● Not the best fit for small files.
● The master may act as a bottleneck.
● Not efficient for random writes.
● Best suited for data that is written once and then only read or appended.
