This document summarizes the key points of the paper "The Google File System" by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. It discusses the need for a large, distributed, highly fault-tolerant file system to support Google's applications and data storage. The Google File System (GFS) is introduced as the solution, with an architecture that includes a master node to store metadata and chunkservers to store file data in large chunks. Reading and writing procedures are outlined that minimize the master's involvement for high performance and fault tolerance.


The Google File System

By Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
(Presented at SOSP 2003)

Introduction
• Google – a search engine.
• Its applications process lots of data.
• They need a good file system.
• Solution: the Google File System (GFS).
Motivational Facts
• More than 15,000 commodity-class PCs.
• Multiple clusters distributed worldwide.
• Thousands of queries served per second.
• One query reads hundreds of MB of data.
• One query consumes tens of billions of CPU cycles.
• Google stores dozens of copies of the entire Web!

Conclusion: Need a large, distributed, highly fault-tolerant file system.

Topics
• Design Motivations
• Architecture
• Read/Write/Record Append
• Fault-Tolerance
• Performance Results
Design Motivations
1. Fault-tolerance and auto-recovery need to be built into the system.
2. Standard I/O assumptions (e.g. block size) have to be re-examined.
3. Record appends are the prevalent form of writing.
4. Google applications and GFS should be co-designed.

GFS Architecture (Analogy)


• On a single-machine FS:
  • An upper layer maintains the metadata.
  • A lower layer (i.e. the disk) stores the data in units called “blocks”.
• In GFS:
  • A master process maintains the metadata.
  • A lower layer (i.e. a set of chunkservers) stores the data in units called “chunks”.
GFS Architecture

[Diagram: the Client sends metadata requests to the Master and receives metadata responses; the Client sends read/write requests to the Chunkservers and receives read/write responses. Each chunkserver stores its chunks on a local Linux file system.]

GFS Architecture
What is a chunk?
• Analogous to a block, except larger.
• Size: 64 MB!
• Stored on a chunkserver as a file (see the sketch below).
• A chunk handle (~ chunk file name) is used to reference the chunk.
• Each chunk is replicated across multiple chunkservers.
• Note: there are hundreds of chunkservers in a GFS cluster, distributed over multiple racks.
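
A minimal sketch (Python, with invented names such as CHUNK_SIZE, chunk_path and read_chunk) of how a chunkserver might map a chunk handle to a local Linux file and serve a byte range from it; the paper describes this only at a high level, and the real implementation is not published.

    import os

    CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the fixed GFS chunk size

    def chunk_path(root: str, handle: int) -> str:
        """Map a 64-bit chunk handle to a file on the chunkserver's local FS.
        The naming scheme here is hypothetical."""
        return os.path.join(root, "chunk_%016x" % handle)

    def read_chunk(root: str, handle: int, offset: int, length: int) -> bytes:
        """Serve a (chunk handle, byte range) read from the local chunk file."""
        with open(chunk_path(root, handle), "rb") as f:
            f.seek(offset)
            return f.read(length)
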
GFS Architecture
What is a master?
• A single process running on a separate machine.
• Stores all metadata (a rough sketch of these structures follows below):
  • File namespace
  • File-to-chunk mappings
  • Chunk location information
  • Access control information
  • Chunk version numbers
  • Etc.
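
A rough sketch, again in Python with hypothetical names, of the tables the master keeps in memory; the paper specifies what the metadata contains but not the exact data structures.

    from dataclasses import dataclass, field
    from typing import Dict, List, Set

    @dataclass
    class ChunkInfo:
        version: int                                       # chunk version number
        locations: Set[str] = field(default_factory=set)   # chunkservers holding a replica

    @dataclass
    class MasterMetadata:
        # file namespace: file name -> ordered list of chunk handles
        files: Dict[str, List[int]] = field(default_factory=dict)
        # chunk handle -> version number and replica locations
        chunks: Dict[int, ChunkInfo] = field(default_factory=dict)
        # access control info per file (representation invented here)
        acls: Dict[str, str] = field(default_factory=dict)

        def chunk_for(self, filename: str, chunk_index: int) -> int:
            """Translate (filename, chunk index) -> chunk handle."""
            return self.files[filename][chunk_index]
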

GFS Architecture
Master <-> Chunkserver Communication:
• The master and the chunkservers communicate regularly (via heartbeat messages, sketched below) to obtain state:
  • Is the chunkserver down?
  • Are there disk failures on the chunkserver?
  • Are any replicas corrupted?
  • Which chunk replicas does the chunkserver store?
• The master sends instructions to the chunkserver:
  • Delete an existing chunk.
  • Create a new chunk.
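
A minimal sketch of the state a heartbeat exchange might carry in each direction; the field names are assumptions, since the paper describes these messages only at a high level.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class HeartbeatReport:
        """What a chunkserver reports to the master in a periodic heartbeat."""
        chunkserver_id: str
        held_chunks: Dict[int, int]   # chunk handle -> version number stored locally
        disk_ok: bool                 # whether the local disks report failures
        corrupted_chunks: List[int]   # replicas whose checksums failed verification

    @dataclass
    class HeartbeatReply:
        """Instructions the master sends back with the heartbeat reply."""
        chunks_to_delete: List[int]   # stale or orphaned replicas to remove
        chunks_to_create: List[int]   # new chunks this server should host
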
GFS Architecture
Serving Requests:
• The client retrieves the metadata for an operation from the master.
• Read/write data flows directly between the client and the chunkservers.
• The single master is not a bottleneck, because its involvement in read/write operations is minimized.

Overview
• Design Motivations
• Architecture
  • Master
  • Chunkservers
  • Clients
• Read/Write/Record Append
• Fault-Tolerance
• Performance Results
And now for the Meat…

Read Algorithm

[Diagram, steps 1-3: (1) the Application passes (file name, byte range) to the GFS client; (2) the client sends (file name, chunk index) to the Master; (3) the Master returns (chunk handle, replica locations).]
Read Algorithm

[Diagram, steps 4-6: (4) the client sends (chunk handle, byte range) to one of the chunkservers; (5) that chunkserver returns the data from the chunk file; (6) the client passes the data to the Application.]
Read Algorithm
1. The application originates the read request.
2. The GFS client translates the request from (filename, byte range) -> (filename, chunk index) and sends it to the master.
3. The master responds with the chunk handle and replica locations (i.e. the chunkservers where the replicas are stored).
4. The client picks a location and sends the (chunk handle, byte range) request to that location.
5. The chunkserver sends the requested data to the client.
6. The client forwards the data to the application.
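
A sketch of the client side of steps 1-6 in Python. The RPC helpers (call_master, call_chunkserver) and the replica-selection policy are stand-ins, and the sketch assumes the byte range does not cross a chunk boundary; the actual GFS client library is not published.

    import random

    CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

    def gfs_read(filename: str, offset: int, length: int,
                 call_master, call_chunkserver) -> bytes:
        """Hypothetical client-side read path following steps 1-6 above."""
        # Step 2: translate (filename, byte range) -> (filename, chunk index).
        chunk_index = offset // CHUNK_SIZE
        # Step 3: the master returns the chunk handle and replica locations.
        handle, replicas = call_master("lookup", filename, chunk_index)
        # Step 4: pick one replica (e.g. at random, or the closest one).
        server = random.choice(replicas)
        # Steps 5-6: read the byte range within that chunk and hand it back.
        offset_in_chunk = offset % CHUNK_SIZE
        return call_chunkserver(server, "read", handle, offset_in_chunk, length)
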
Read Algorithm (Example)

[Diagram: (1) the Indexer asks the GFS client for 2048 bytes of file crawl_99; (2) the client sends (crawl_99, chunk index 3) to the Master, whose table maps crawl_99 to Ch_1001 on chunkservers {3,8,12}, Ch_1002 on {1,8,14}, and Ch_1003 on {4,7,9}; (3) the Master returns (ch_1003, chunkservers 4,7,9).]
Read Algorithm (Example)


Calculating the chunk index from the byte range:
(Assumption: the file position is 201,359,161 bytes.)
• Chunk size = 64 MB.
• 64 MB = 1024 * 1024 * 64 bytes = 67,108,864 bytes.
• 201,359,161 bytes = 67,108,864 * 3 + 32,569 bytes.
• So the client translates the 2048-byte range -> chunk index 3.
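
The same calculation, written out as a short Python check:

    CHUNK_SIZE = 64 * 1024 * 1024          # 67,108,864 bytes

    offset = 201_359_161                   # byte position within the file
    chunk_index = offset // CHUNK_SIZE     # -> 3
    offset_in_chunk = offset % CHUNK_SIZE  # -> 32,569

    print(chunk_index, offset_in_chunk)    # prints: 3 32569
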
Read Algorithm (Example)

[Diagram: (4) the client sends the ch_1003 request to one of chunkservers 4, 7, and 9; (5) the chosen chunkserver returns 2048 bytes of data; (6) the client passes the 2048 bytes to the application.]
Write Algorithm

[Diagram, steps 1-3: (1) the Application passes (file name, data) to the GFS client; (2) the client sends (file name, chunk index) to the Master; (3) the Master returns the chunk handle plus the primary and secondary replica locations.]
Write Algorithm

[Diagram, step 4: the client pushes the data to the primary and the two secondary chunkservers, each of which holds it in an internal buffer.]
Write Algorithm

[Diagram, steps 5-7: (5) the client sends the write command to the primary; (6) the primary writes the buffered data D1 | D2 | D3 | D4 to its chunk in the serial order it chooses; (7) the primary forwards the write command and the serial order to the secondaries, which apply the same writes.]
Write Algorithm

[Diagram, steps 8-9: (8) the secondaries respond to the primary once they have applied the writes (their buffers are now shown empty); (9) the primary responds to the client.]
Write Algorithm
1. The application originates the write request.
2. The GFS client translates the request from (filename, data) -> (filename, chunk index) and sends it to the master.
3. The master responds with the chunk handle and the (primary + secondary) replica locations.
4. The client pushes the write data to all locations. The data is stored in the chunkservers’ internal buffers.
5. The client sends the write command to the primary.
Write Algorithm
6. The primary determines a serial order for the data instances stored in its buffer and writes the instances in that order to the chunk.
7. The primary sends the serial order to the secondaries and tells them to perform the write.
8. The secondaries respond to the primary.
9. The primary responds back to the client.
Note: if the write fails at one of the chunkservers, the client is informed and retries the write.
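
A condensed sketch of steps 1-9 from the client's point of view; the RPC helpers (call_master, push_data, call_primary) and the retry policy are hypothetical, and the data push is shown as one call per replica rather than the pipelined chain the paper describes.

    def gfs_write(filename: str, chunk_index: int, data: bytes,
                  call_master, push_data, call_primary,
                  max_retries: int = 3) -> bool:
        """Hypothetical client-side write path following steps 1-9 above."""
        for _ in range(max_retries):
            # Steps 2-3: get the chunk handle plus primary and secondary replicas.
            handle, primary, secondaries = call_master("lookup_for_write",
                                                       filename, chunk_index)
            # Step 4: push the data to all replicas; it sits in their buffers.
            for server in [primary] + secondaries:
                push_data(server, handle, data)
            # Steps 5-9: ask the primary to apply the write; it picks the serial
            # order, writes locally, forwards the order to the secondaries, and
            # reports the combined outcome back to the client.
            if call_primary(primary, "write", handle, secondaries):
                return True
            # A failure at any chunkserver surfaces here; retry the whole write.
        return False
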

Record Append Algorithm


Record append is an important operation at Google:
• Merging results from multiple machines into one file.
• Using a file as a producer-consumer queue.

1. The application originates the record append request.
2. The GFS client translates the request and sends it to the master.
3. The master responds with the chunk handle and the (primary + secondary) replica locations.
4. The client pushes the write data to all locations.
Record Append Algorithm
5. The primary checks whether the record fits in the specified chunk.
6. If the record does not fit, then the primary:
   • pads the chunk,
   • tells the secondaries to do the same,
   • and informs the client.
   • The client then retries the append with the next chunk.
7. If the record fits, then the primary:
   • appends the record,
   • tells the secondaries to do the same,
   • receives the responses from the secondaries,
   • and sends the final response to the client.
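
A sketch of the primary's decision in steps 5-7, with invented helper names (used_bytes, pad_chunk, append_at, forward_to); the real protocol also involves leases and offset-assignment details that are omitted here.

    CHUNK_SIZE = 64 * 1024 * 1024

    def primary_record_append(chunk, record: bytes, secondaries,
                              pad_chunk, append_at, forward_to) -> dict:
        """Hypothetical primary-side handling of a record append."""
        used = chunk.used_bytes  # bytes of the 64 MB chunk already written
        # Step 6: the record would cross the chunk boundary -> pad the chunk,
        # tell the secondaries to pad, and have the client retry on the next chunk.
        if used + len(record) > CHUNK_SIZE:
            pad_chunk(chunk)
            for s in secondaries:
                forward_to(s, "pad", chunk.handle)
            return {"status": "retry_next_chunk"}
        # Step 7: the record fits -> append locally at a chosen offset, tell the
        # secondaries to append at the same offset, and collect their responses.
        offset = append_at(chunk, record)
        acks = [forward_to(s, "append", chunk.handle, offset, record)
                for s in secondaries]
        return {"status": "ok" if all(acks) else "failed", "offset": offset}
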

Observations
• Clients can read in parallel.
• Clients can write in parallel.
• Clients can append records in parallel.
Overview
• Design Motivations
• Architecture
• Algorithms:
  • Read
  • Write
  • Record Append
• Fault-Tolerance
• Performance Results

Fault Tolerance
• Fast Recovery: the master and chunkservers are designed to restart and restore state in a few seconds.
• Chunk Replication: across multiple machines, across multiple racks.
• Master Mechanisms:
  • Log of all changes made to the metadata.
  • Periodic checkpoints of the log.
  • Log and checkpoints replicated on multiple machines.
  • Master state is replicated on multiple machines.
  • “Shadow” masters for reading data if the “real” master is down.
• Data Integrity:
  • Each chunk has an associated checksum (see the sketch below).
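
A sketch of checksum-based integrity checking. In the paper each chunk is broken into 64 KB blocks, each with its own 32-bit checksum; the code below uses Python's zlib.crc32 as a stand-in for whatever checksum function GFS actually uses.

    import zlib

    BLOCK_SIZE = 64 * 1024  # 64 KB blocks, each with its own 32-bit checksum

    def block_checksums(chunk_data: bytes) -> list:
        """Compute a checksum for every 64 KB block of a chunk."""
        return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
                for i in range(0, len(chunk_data), BLOCK_SIZE)]

    def verify_block(chunk_data: bytes, block_index: int, stored: list) -> bool:
        """Before returning a block to a reader, check it against the stored
        checksum; a mismatch means this replica is corrupted and the data must
        be read from (and re-replicated from) another chunkserver."""
        start = block_index * BLOCK_SIZE
        block = chunk_data[start:start + BLOCK_SIZE]
        return zlib.crc32(block) == stored[block_index]
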
Performance (Test Cluster)
• Performance was measured on a cluster with:
  • 1 master
  • 16 chunkservers
  • 16 clients
• Server machines are connected to a central switch by 100 Mbps Ethernet.
• The same holds for the client machines.
• The two switches are connected by a 1 Gbps link.

Performance (Test Cluster)

[Benchmark charts not reproduced.]
Performance (Real-world Cluster)


• Cluster A:
  • Used for research and development.
  • Used by over a hundred engineers.
  • A typical task is initiated by a user and runs for a few hours.
  • A task reads MBs to TBs of data, transforms/analyzes the data, and writes the results back.
• Cluster B:
  • Used for production data processing.
  • A typical task runs much longer than a Cluster A task.
  • Continuously generates and processes multi-TB data sets.
  • Human users are rarely involved.
• The clusters had been running for about a week when the measurements were taken.
Performance (Real-world Cluster)

[Table of cluster characteristics not reproduced.]
Performance (Real-world Cluster)


• Many computers in each cluster (227 and 342!).
• On average, a Cluster B file is three times the size of a Cluster A file.
• Metadata at the chunkservers:
  • Chunk checksums.
  • Chunk version numbers.
• Metadata at the master is small (48 MB and 60 MB) -> the master recovers from a crash within seconds.
Performance (Real-world Cluster)

[Table of read/write rates and master operations not reproduced.]
Performance (Real-world Cluster)


• Many more reads than writes.
• Both clusters were in the middle of heavy read activity.
• Cluster B was in the middle of a burst of write activity.
• In both clusters, the master was receiving 200-500 operations per second -> the master is not a bottleneck.
Performance (Real-world Cluster)
Experiment in recovery time:
• One chunkserver in Cluster B was killed.
• The chunkserver held 15,000 chunks containing 600 GB of data.
• Limits imposed:
  • The cluster can only perform 91 concurrent clonings.
  • Each clone operation can consume at most 6.25 MB/s.
• It took 23.2 minutes to restore all the chunks.
• That is an effective rate of about 440 MB/s (600 * 1024 MB / (23.2 * 60 s) ≈ 441 MB/s).

Conclusion
• Design Motivations
• Architecture
• Algorithms
• Fault-Tolerance
• Performance Results
