GFS Summary

GFS is a scalable and fault-tolerant file system designed to meet Google's storage needs, characterized by large file sizes, high throughput, and a master-slave architecture. It employs relaxed consistency, automatic recovery mechanisms, and chunk replication to ensure data integrity and availability. GFS effectively supports large-scale applications by managing petabytes of data across numerous machines.

Uploaded by

MANGAL KALE

1. Introduction
GFS was created to meet Google’s unique storage needs, where conventional file systems proved
inefficient. The key characteristics that shaped its design include:
• Component Failures as the Norm: Hardware failures are frequent and must be managed
transparently.
• Large File Sizes: Multi-gigabyte files are common.
• Workload Characteristics: Workloads consist of large streaming reads, frequent appends,
and minimal random writes.
• High Throughput: The system prioritizes throughput over low latency.

2. Design Overview
GFS follows a master-slave architecture, where:
• Master Server: Maintains metadata and manages file system operations.
• Chunkservers: Store actual file data in fixed-size chunks (typically 64 MB) and replicate
them.
• Clients: Interact with the master for metadata and communicate directly with chunkservers
for data operations.
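As a minimal sketch of what a fixed chunk size implies for clients, the snippet below splits a byte range of a file into per-chunk pieces. It assumes the 64 MB chunk size cited above; the function name `split_range` is hypothetical and not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size cited in this summary
MB = 1024 * 1024


def split_range(offset: int, length: int, chunk_size: int = CHUNK_SIZE):
    """Split a file byte range into (chunk_index, offset_in_chunk, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        chunk_index = offset // chunk_size       # which chunk holds this byte
        within = offset % chunk_size             # position inside that chunk
        nbytes = min(chunk_size - within, end - offset)
        pieces.append((chunk_index, within, nbytes))
        offset += nbytes
    return pieces


if __name__ == "__main__":
    # A 10 MB read starting 60 MB into a file touches chunk 0 and chunk 1.
    print(split_range(60 * MB, 10 * MB))
```

Each piece corresponds to one chunk, so a client would need one chunk location per piece.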
Key design decisions include:
• File Mutability: Files are mostly appended, not modified in place.
• Relaxed Consistency Model: The system favors high availability and throughput over strict
consistency guarantees.
• Automatic Recovery Mechanisms: Self-healing replication and rebalancing of chunks
across chunkservers.

3. Architecture
3.1 Master Server
• Stores namespace, file-to-chunk mapping, and chunk metadata.
• Keeps all metadata in memory for fast access.
• Logs changes persistently and periodically checkpoints the state.
• Assigns and reassigns chunks to chunkservers dynamically.
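The master's in-memory metadata can be pictured roughly as a few dictionaries plus an append-only log. The sketch below is a toy model under that assumption; the class and method names (`MasterMetadata`, `create_file`, `add_chunk`, `lookup`) are illustrative and do not come from GFS itself.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ChunkInfo:
    version: int = 1
    locations: set = field(default_factory=set)  # names of chunkservers holding it


class MasterMetadata:
    """Toy in-memory metadata store; all names here are hypothetical."""

    def __init__(self):
        self.files = {}    # path -> ordered list of chunk handles
        self.chunks = {}   # chunk handle -> ChunkInfo
        self.log = []      # stand-in for the persistent change log
        self._next_handle = count(1)

    def create_file(self, path):
        if path in self.files:
            raise FileExistsError(path)
        self.log.append(("create", path))         # record the change, then apply it
        self.files[path] = []

    def add_chunk(self, path, servers):
        handle = next(self._next_handle) & 0xFFFFFFFFFFFFFFFF  # fits in 64 bits
        self.log.append(("add_chunk", path, handle))
        self.files[path].append(handle)
        self.chunks[handle] = ChunkInfo(locations=set(servers))
        return handle

    def lookup(self, path, chunk_index):
        """Return (chunk handle, sorted replica locations) for one chunk of a file."""
        handle = self.files[path][chunk_index]
        return handle, sorted(self.chunks[handle].locations)
```

Keeping everything in plain dictionaries mirrors the point that lookups are served from memory, while the list `log` stands in for the durable record of changes.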

3.2 Chunkservers
• Store file chunks, each identified by a unique 64-bit chunk handle.
• Chunks are replicated (default: 3 replicas) for fault tolerance.
• Periodically exchange heartbeat messages with the master to report the chunks they hold
and their state.

3.3 Clients
• Query the master for chunk locations and cache this information.
• Interact directly with chunkservers for reading/writing data.
• Minimize interaction with the master to reduce bottlenecks.
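A client-side location cache might look like the following sketch. The time-to-live value, the injectable clock, and the `LocationCache` name are assumptions made for illustration; the source only says that clients cache chunk locations.

```python
import time


class LocationCache:
    """Client-side cache of chunk locations with a time-to-live (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch        # callable: (path, chunk_index) -> locations
        self._ttl = ttl_seconds    # assumed value, not taken from the source
        self._clock = clock        # injectable so the cache can be tested
        self._entries = {}         # (path, chunk_index) -> (expiry, locations)

    def get(self, path, chunk_index):
        key = (path, chunk_index)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]          # served from cache, no master round trip
        locations = self._fetch(path, chunk_index)   # ask the master
        self._entries[key] = (now + self._ttl, locations)
        return locations
```

Repeated reads of the same chunk then reach the master only once per expiry window, which is the bottleneck reduction described above.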

4. Data Consistency Model
4.1 Consistency Guarantees
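GFS provides a relaxed consistency model, meaning:
• File namespace mutations (for example, file creation) are atomic and handled by the master.
• A file region is consistent if all clients see the same data from every replica, and defined
if it is consistent and reflects a mutation in its entirety.
• Concurrent successful writes leave a region consistent but possibly undefined; a failed
mutation leaves it inconsistent.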

4.2 Types of Writes

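• Record Append: Clients append data to a file, and GFS guarantees the data is written
atomically at least once, at an offset that GFS chooses.
• Write: Writes data at an offset chosen by the application; concurrent writes to the same
region can leave it containing mixed fragments from multiple clients.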

5. System Interactions
5.1 File Reads
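1. Client converts the file name and byte offset into a chunk index and requests the chunk
location from the master.
2. Master returns the chunk handle and the locations of its replicas.
3. Client reads directly from one of the chunkservers, typically the closest.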
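The read steps above can be simulated in a few lines. This is a sketch only: the `master` and `chunkservers` dictionaries are hypothetical in-process stand-ins for remote services, and the chunk size is shrunk to 4 bytes so the example stays readable.

```python
CHUNK_SIZE = 4  # tiny chunk size for the demo only; the summary cites 64 MB

# Hypothetical master state: path -> list of (chunk handle, replica names)
master = {"/logs/a": [(101, ["cs1", "cs2"]), (102, ["cs2", "cs3"])]}

# Hypothetical chunkserver state: name -> {chunk handle: bytes}
chunkservers = {
    "cs1": {101: b"abcd"},
    "cs2": {101: b"abcd", 102: b"ef"},
    "cs3": {102: b"ef"},
}


def read(path, offset, length):
    """Read bytes by asking the 'master' for locations, then a 'chunkserver' for data."""
    out = bytearray()
    end = offset + length
    while offset < end:
        index, within = divmod(offset, CHUNK_SIZE)
        if index >= len(master[path]):
            break                                   # past the end of the file
        handle, replicas = master[path][index]      # steps 1-2: metadata from master
        data = chunkservers[replicas[0]][handle]    # step 3: data from one replica
        piece = data[within:within + min(CHUNK_SIZE - within, end - offset)]
        if not piece:
            break                                   # short last chunk: end of file
        out += piece
        offset += len(piece)
    return bytes(out)


if __name__ == "__main__":
    print(read("/logs/a", 2, 4))
```

Note that file data never passes through the `master` dictionary; it only supplies handles and locations.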

5.2 File Writes

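1. Master grants a lease to one replica, designating it the primary chunkserver.
2. Client pushes the data to all replicas.
3. Client sends the write request to the primary, which applies the change and forwards the
request to the secondaries in the same order.
4. Primary notifies the client once all secondaries have acknowledged.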
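The write steps above can be sketched as a small in-process simulation. The `Replica`, `Primary`, and `client_write` names are hypothetical, failures and leases are not modeled, and the serial number is simply a counter that fixes the order in which every replica applies mutations.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.buffer = {}          # data_id -> bytes pushed by the client, not yet applied
        self.chunk = bytearray()  # the chunk contents held by this replica
        self.applied = []         # serial numbers in the order they were applied

    def push(self, data_id, data):
        self.buffer[data_id] = data

    def apply(self, serial, data_id, offset):
        data = self.buffer.pop(data_id)
        needed = offset + len(data)
        if len(self.chunk) < needed:
            self.chunk.extend(b"\0" * (needed - len(self.chunk)))
        self.chunk[offset:needed] = data
        self.applied.append(serial)


class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id, offset):
        serial = self.next_serial          # the primary picks the mutation order
        self.next_serial += 1
        self.apply(serial, data_id, offset)
        for s in self.secondaries:         # secondaries apply in the same order
            s.apply(serial, data_id, offset)
        return serial


def client_write(primary, data_id, data, offset):
    for r in [primary, *primary.secondaries]:
        r.push(data_id, data)              # push the data to every replica first
    return primary.write(data_id, offset)  # then ask the primary to apply it
```

Separating the data push from the apply request is what lets a single node, the primary, decide the order for all replicas.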

5.3 Record Append

1. Master assigns the primary chunkserver.
2. Primary chooses an offset and writes the data, and the secondaries write at the same offset.
3. If a replica fails, the client retries the append, so some replicas may hold duplicate or
partial records.
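To illustrate how retries produce duplicates, and how a reader might cope with them, here is a sketch under simplifying assumptions: replicas are plain Python lists, a failure is injected through a hypothetical `fail_first_attempt_on` parameter, and each record carries an ID that the reader uses to drop repeats. Padding and byte offsets are not modeled.

```python
def append_with_retry(replicas, record, fail_first_attempt_on=None):
    """Append `record` to every replica, retrying the whole append after a failure.

    `replicas` maps a replica name to its list of records. The failure
    injection parameter is a hypothetical test hook, not a GFS feature.
    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        failed = False
        for name, records in replicas.items():
            if attempts == 1 and name == fail_first_attempt_on:
                failed = True          # this replica missed the first attempt
                continue
            records.append(record)
        if not failed:
            return attempts


def unique_records(records):
    """Reader-side filter: drop duplicates using the ID carried in each record."""
    seen, out = set(), []
    for record_id, payload in records:
        if record_id not in seen:
            seen.add(record_id)
            out.append((record_id, payload))
    return out
```

After a retry, every replica holds the record at least once, but replicas that succeeded on the first attempt hold it twice, which is why the reader-side filter is useful.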

6. Fault Tolerance
6.1 Chunk Replication
• Chunks are replicated across multiple chunkservers.
• Master ensures replication levels are maintained.
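One way the master could notice that replication has dropped below target is sketched here. The function name, the input shapes, and the ordering rule (chunks missing the most replicas first) are assumptions for illustration.

```python
REPLICATION_TARGET = 3  # default replica count cited in this summary


def under_replicated(chunk_locations, live_servers, target=REPLICATION_TARGET):
    """Return {chunk handle: missing replica count} for chunks below the target.

    `chunk_locations` maps a chunk handle to the servers believed to hold it;
    `live_servers` is the set of servers that are currently reachable.
    """
    live_servers = set(live_servers)
    missing = {}
    for handle, servers in chunk_locations.items():
        live = len(set(servers) & live_servers)
        if live < target:
            missing[handle] = target - live
    # Chunks furthest from the target come first.
    return dict(sorted(missing.items(), key=lambda kv: -kv[1]))


if __name__ == "__main__":
    locations = {1: ["a", "b", "c"], 2: ["a", "b", "d"], 3: ["d", "e", "a"]}
    print(under_replicated(locations, live_servers={"a", "b", "c"}))
```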

6.2 Master Recovery

• Master state is checkpointed frequently and can be reconstructed by loading the latest
checkpoint and replaying the changes logged after it.
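Recovery from a checkpoint plus a log can be sketched as follows. The operation tuples (`create`, `add_chunk`, `delete`) and the function names are hypothetical; the point is only that replay starts at the log position the checkpoint reflects.

```python
import copy


def apply_op(state, op):
    """Apply one logged operation to a {path: [chunk handles]} namespace."""
    kind = op[0]
    if kind == "create":
        state[op[1]] = []
    elif kind == "add_chunk":
        state[op[1]].append(op[2])
    elif kind == "delete":
        state.pop(op[1], None)
    else:
        raise ValueError(f"unknown operation: {kind}")


def checkpoint(state, log_length):
    """Snapshot the state together with the log position it reflects."""
    return {"state": copy.deepcopy(state), "log_position": log_length}


def recover(snapshot, log):
    """Rebuild state: load the snapshot, then replay only the later log records."""
    state = copy.deepcopy(snapshot["state"])
    for op in log[snapshot["log_position"]:]:
        apply_op(state, op)
    return state
```

Frequent checkpoints keep the replayed suffix of the log short, which is what makes restart fast.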

6.3 Data Integrity

• Each chunk is divided into blocks, and a checksum for each block is stored separately from
the data.
• Chunkservers verify data against these checksums before returning it, to detect corruption.
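Checksum-based corruption detection can be sketched with Python's standard `zlib.crc32`. The 64 KB block size and the choice of CRC-32 are illustrative assumptions here, and the helper names are hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # illustrative block size for per-block checksums


def block_checksums(chunk: bytes, block_size: int = BLOCK_SIZE):
    """Compute one CRC-32 per fixed-size block of the chunk."""
    return [
        zlib.crc32(chunk[i:i + block_size])
        for i in range(0, len(chunk), block_size)
    ]


def verify(chunk: bytes, checksums, block_size: int = BLOCK_SIZE):
    """Return the indexes of blocks whose checksum no longer matches."""
    current = block_checksums(chunk, block_size)
    if len(current) != len(checksums):
        raise ValueError("block count changed")
    return [i for i, (a, b) in enumerate(zip(current, checksums)) if a != b]
```

Storing the checksum list apart from the chunk bytes means that damage to the data does not silently damage the evidence needed to detect it.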

7. Performance Optimizations
7.1 Caching
• Clients cache metadata to reduce master load.

7.2 Load Balancing

• Master distributes chunks based on storage and workload patterns.

7.3 Rebalancing
• Underutilized chunkservers are assigned additional chunks.
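A rebalancing pass could be planned along these lines. This is a simplified sketch that balances only chunk counts; the `plan_moves` name and the `tolerance` parameter are assumptions, and real placement would also weigh factors such as disk space and workload.

```python
def plan_moves(chunks_per_server, tolerance=1):
    """Plan (source, destination) moves until chunk counts differ by at most `tolerance`.

    `chunks_per_server` maps a server name to how many chunks it holds.
    The input is not modified; a list of planned moves is returned.
    """
    if tolerance < 1:
        raise ValueError("tolerance must be at least 1")
    counts = dict(chunks_per_server)
    moves = []
    while counts:
        busiest = max(counts, key=lambda s: (counts[s], s))
        idlest = min(counts, key=lambda s: (counts[s], s))
        if counts[busiest] - counts[idlest] <= tolerance:
            break
        counts[busiest] -= 1      # move one chunk off the fullest server
        counts[idlest] += 1       # onto the emptiest one
        moves.append((busiest, idlest))
    return moves


if __name__ == "__main__":
    print(plan_moves({"a": 5, "b": 1, "c": 0}))
```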

7.4 Garbage Collection

• Deleted files are not reclaimed immediately; they are marked for deletion and purged later by
a background scan.
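Lazy deletion can be sketched as a two-step process: mark now, reclaim later. The `Namespace` class, the explicit `now` arguments, and the grace period value are assumptions for illustration and should be treated as configurable.

```python
GRACE_SECONDS = 3 * 24 * 3600  # assumed grace period; treat it as configurable


class Namespace:
    def __init__(self):
        self.files = {}   # path -> list of chunk handles
        self.trash = {}   # path -> (deleted_at, list of chunk handles)

    def delete(self, path, now):
        """Mark a file as deleted without reclaiming its chunks yet."""
        self.trash[path] = (now, self.files.pop(path))

    def undelete(self, path):
        """Restore a file that has been marked but not yet purged."""
        _, handles = self.trash.pop(path)
        self.files[path] = handles

    def collect(self, now, grace=GRACE_SECONDS):
        """Purge entries older than the grace period; return the orphaned chunk handles."""
        expired = [p for p, (t, _) in self.trash.items() if now - t >= grace]
        orphaned = []
        for p in expired:
            orphaned.extend(self.trash.pop(p)[1])
        return orphaned
```

Deferring the purge leaves a window in which an accidental deletion can still be undone.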

8. Real-World Deployment
• GFS powers Google’s large-scale applications, including indexing and data processing tasks.
• Handles petabytes of data across thousands of machines.

9. Conclusion
GFS is a highly scalable and fault-tolerant file system tailored to Google’s needs. Its design
principles, including relaxed consistency, replication, and self-healing mechanisms, make it
well-suited for large-scale distributed data processing.
