Lecture 11 Google Architecture Design

Google Architecture Design

Main Content

❖ Introduction to the case study: Google

❖ Overall architecture and design
philosophy
❖ Underlying communication paradigms
❖ Data storage and coordination
services
❖ Distributed computation services
❖ Summary
Google
Google case study

❖ Google is a US-based corporation
with its headquarters in Mountain View
❖ Offering Internet search and broader
web applications and earning revenue
largely from advertising
❖ Google ~ googol (10^100)
❖ Google was born out of a research
project at Stanford University; the
company was launched in 1998
➢ It is a fascinating case study with
extremely demanding requirements,
particularly in terms of scalability,
reliability, performance and openness
Google as Search Engine

❖ Crawling
▪ To locate and retrieve the contents of the
Web and pass the contents onto the
indexing subsystem.
▪ Performed by Googlebot using deep
searching: it recursively reads a page,
harvests its links and schedules further
crawls for the harvested links.
▪ Caffeine in 2010 (continuous process of
crawling intended to offer more
freshness)
Google as Search Engine

❖ Indexing
▪ To produce an index for the contents of
the Web (like index at the back of books)
▪ Inverted index mapping words appearing
in web pages and other textual web
resources onto the positions occurring in
documents
▪ Also keep track of which pages link to a
given site
▪ Using the index to narrow down the set of
candidate web pages from billions to
perhaps tens of thousands
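The inverted index described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and three made-up pages; the names `build_inverted_index` and `candidates` are hypothetical and not part of any Google API.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to {doc_id: [positions where the word occurs]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index


def candidates(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    doc_sets = [set(index.get(word, {})) for word in words]
    return set.intersection(*doc_sets)


# Illustrative data only (assumption): three tiny "web pages".
docs = {
    "page1": "distributed systems concepts and design",
    "page2": "design of the google file system",
    "page3": "google search engine design",
}
index = build_inverted_index(docs)
matches = candidates(index, "google design")
```

Intersecting the per-word document sets is what narrows the candidate pages; ranking is then applied only to that reduced set.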
Google as Search Engine

❖ Ranking
▪ Indexing provides no information about
the relative importance of the web pages
containing a particular set of keywords
▪ Ranking assigns each page a rank,
whereby a higher rank is an indication of
the importance of a page
▪ Google uses the PageRank algorithm
▪ Ranking in Google is affected by many
factors (up to 200 factors)
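As a rough illustration of the idea behind PageRank, the following sketch runs the simplified textbook iteration on a four-page link graph. The damping factor 0.85, the iteration count and the graph itself are assumptions for the example; this is not Google's production ranking, which combines many other factors.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to.

    Assumes every link target also appears as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: C is linked to by A, B and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

Page C ends up with the highest rank because three pages link to it, while D, which nothing links to, keeps only the baseline share.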
Google as Search Engine

❖ Anatomy of a search engine

Google as Cloud Provider

❖ Offering other web-based applications

❖ Google is a major player in the area
of cloud computing
❖ Software as a service:
▪ Offering application-level software over
the Internet as web applications.
▪ Google Apps: Docs, Calendar, Sites…
❖ Platform as a service
▪ Offering distributed system APIs as
services across the Internet
▪ APIs used to support the development
and hosting of web applications
Overall Architecture and
Design Philosophy
• Physical model
• Overall system architecture
Physical Model

❖ Key philosophy
▪ Use very large numbers of commodity PCs to
produce a cost-effective environment for
distributed storage and computation
▪ Individual commodity PCs will fail, so design
the infrastructure using a range of strategies
to tolerate such failures
❖ Physical structures
▪ Commodity PCs are organized in racks with
between 40 and 80 PCs in a given rack.
▪ Racks are organized into clusters, each with
two high-bandwidth switches
▪ Each rack is connected to both switches for
redundancy
▪ Clusters are housed in Google data centres that
are spread around the world
Overall system architecture

❖Requirements
▪ Scalability
• Being able to deal with more data
• Being able to deal with more queries
• Seeking better results
▪ Reliability:
• Stringent reliability requirements, especially with
regard to availability of services.
• A 99.9% service level agreement (effectively, a system
guarantee) to paying customers of Google Apps,
covering Gmail, Google Calendar, Google Docs,
Google Sites and Google Talk.
▪ Performance
• Overall performance of the system is critical
• Target of completing web search operations in 0.2
seconds
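To make the 99.9% figure concrete, the short calculation below converts an availability percentage into permitted downtime. It assumes a 365-day year and a 30-day month as the measurement periods; the source does not say how the agreement is actually measured.

```python
def allowed_downtime_hours(availability_percent, period_hours):
    """Hours of downtime permitted in a period at the given availability."""
    return period_hours * (1.0 - availability_percent / 100.0)


HOURS_PER_YEAR = 365 * 24   # assumption: non-leap year
HOURS_PER_MONTH = 30 * 24   # assumption: 30-day month

yearly_hours = allowed_downtime_hours(99.9, HOURS_PER_YEAR)
monthly_minutes = allowed_downtime_hours(99.9, HOURS_PER_MONTH) * 60
```

Under these assumptions, 99.9% availability leaves roughly 8.76 hours of downtime per year, or about 43 minutes per month.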
Overall system architecture

❖Requirements (continued)
▪ Openness
• Support further development in the
range of web applications
• an infrastructure that is extensible and
provides support for the development
of new applications
Overall system architecture

❖ Google infrastructure
▪ Communication paradigms
• Protocol buffers component
• Google publish-subscribe service
▪ Data and coordination services: GFS,
Chubby, Bigtable
▪ Distributed computation services
• MapReduce
• Sawzall
Overall system architecture

❖Associated design principles

▪ Simplicity: do one thing and do it well
▪ Performance: every millisecond counts
▪ Stringent testing regimes on software
Underlying
Communication
Paradigms
• Remote invocation
• Publish-subscribe
• Summary of key design choices for communication
Remote invocation

❖ Protocol buffers are used for a substantial
majority of interactions within the
infrastructure
❖ Provide a language- and platform-neutral
way to specify and serialize data
❖ A language is provided for the specification
of messages
❖ The most common use of protocol buffers is to
specify RPC exchanges across the network
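One small piece of what "serialize data" in a language-neutral way involves can be sketched in a few lines. The protocol buffers wire format encodes integers as base-128 varints; the functions below are a hand-written illustration of that encoding only, with hypothetical names, and are not the protocol buffers library or its API.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data):
    """Decode a single varint from the start of `data`."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


encoded = encode_varint(300)
decoded = decode_varint(encoded)
```

Because the byte layout is fixed by the format rather than by any one language, a sender and receiver written in different languages can exchange the same message.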
Remote invocation
Publish-subscribe

❖ A publish-subscribe system intended to
be used where distributed events need to
be disseminated in real time and with
reliability guarantees to potentially large
numbers of recipients.
❖ Google adopts a topic-based publish-
subscribe system, providing several
channels for event streams
❖ A strong emphasis on both reliable and
timely delivery
▪ Reliability: the system maintains redundant
trees;
▪ Timely delivery: a quality of service management
technique to control message flows
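The topic-based model can be sketched with a toy in-process broker. The class `TopicBroker` and the topic names are hypothetical; the sketch shows only how events on a named channel reach that channel's subscribers and deliberately omits the redundant trees and quality of service management mentioned above.

```python
from collections import defaultdict


class TopicBroker:
    """Toy topic-based publish-subscribe broker (single process, no persistence)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register `callback` to receive every event published on `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver `event` to all subscribers of `topic`; return how many got it."""
        delivered = 0
        for callback in self._subscribers.get(topic, []):
            callback(event)
            delivered += 1
        return delivered


broker = TopicBroker()
received = []
broker.subscribe("crawl-events", received.append)
broker.subscribe("crawl-events", lambda event: received.append(event.upper()))

count = broker.publish("crawl-events", "page fetched")
missed = broker.publish("index-events", "ignored")  # no subscribers on this topic
```

Publishers name only the topic, never the recipients, which is what lets the number of subscribers grow without changing the publishing code.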
Key design choices
Key design choices
Data storage and
coordination services
• Google File System (GFS)
• Chubby
• Bigtable
Google File System (GFS)

❖ GFS is a kind of distributed file system

❖ It offers abstractions similar to those of
other distributed file systems but is
specialized for Google's very particular
requirements
❖ GFS Requirements
▪ Run reliably on the physical architecture
▪ Be optimized for the patterns of usage within
Google (the types of files stored and the
patterns of access to those files)
▪ Meet all the requirements for the Google
infrastructure as a whole
Google File System (GFS)

❖ GFS interface:
▪ Provides a conventional file system
interface offering a hierarchical
namespace with individual files identified
by pathnames
▪ Many of the operations will be familiar to
users of such file systems
Google File System (GFS)

❖ GFS architecture:
▪ storage of files in fixed-size chunks
▪ the job of GFS is to provide a mapping
from files to chunks and then to support
standard operations on files
▪ Each GFS cluster has a single master
and multiple chunkservers
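The file-to-chunk mapping can be illustrated by translating a byte range within a file into chunk indexes. The 64-megabyte chunk size is taken from the published GFS design rather than from these slides, and the function name `chunk_ranges` is hypothetical.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumption: 64 MB fixed-size chunks


def chunk_ranges(offset, length, chunk_size=CHUNK_SIZE):
    """Split a byte range of a file into (chunk_index, offset_in_chunk, n_bytes)."""
    pieces = []
    end = offset + length
    while offset < end:
        index, within = divmod(offset, chunk_size)
        # Read up to the end of this chunk or the end of the request.
        n_bytes = min(chunk_size - within, end - offset)
        pieces.append((index, within, n_bytes))
        offset += n_bytes
    return pieces


# A 10-byte read that starts 4 bytes before a chunk boundary spans two chunks.
pieces = chunk_ranges(CHUNK_SIZE - 4, 10)
```

In this picture a client would ask the master which chunkservers hold each chunk index and then fetch the bytes from those chunkservers.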
Chubby

❖ Chubby is a crucial service at the heart of
the Google infrastructure offering storage
and coordination services for other
infrastructure services
❖ Four distinct capabilities
▪ Provides coarse-grained distributed locks to
synchronize distributed activities
▪ Provides a file system offering the reliable
storage of small files
▪ Supports the election of a primary in a set of
replicas
▪ Used as a name service within Google
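Three of these capabilities (locks, small files and primary election) fit together naturally, as the toy sketch below suggests. `LockService` is an in-memory stand-in written for this example, not the Chubby client library, and the lock path is a made-up name.

```python
class LockService:
    """Toy in-memory stand-in for a coarse-grained lock and small-file service."""

    def __init__(self):
        self._holders = {}
        self._files = {}

    def try_acquire(self, lock_name, client):
        """Grant the lock if it is free; never block."""
        if lock_name in self._holders:
            return False
        self._holders[lock_name] = client
        return True

    def release(self, lock_name, client):
        """Release the lock, but only if `client` actually holds it."""
        if self._holders.get(lock_name) == client:
            del self._holders[lock_name]

    def write_file(self, path, contents):
        self._files[path] = contents

    def read_file(self, path):
        return self._files.get(path)


def elect_primary(service, lock_name, replicas):
    """Whichever replica obtains the lock becomes primary and records its name."""
    for replica in replicas:
        if service.try_acquire(lock_name, replica):
            service.write_file(lock_name, replica)
            return replica
    # Lock already held: look up the current primary in the associated file.
    return service.read_file(lock_name)


service = LockService()
LOCK = "/ls/example-cell/index-service/primary"  # hypothetical path
primary = elect_primary(service, LOCK, ["replica-a", "replica-b", "replica-c"])
later = elect_primary(service, LOCK, ["replica-b", "replica-c"])
```

Other processes can find the primary simply by reading the file, which is how a lock service can also act as a name service.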
Chubby

❖ Chubby interface
▪ Provides an abstraction based on a file system
▪ Files are organized into a hierarchical
namespace using directory structures
Chubby

❖ Chubby architecture:
▪ A single instance of a Chubby system is known
as a cell;
▪ Each cell consists of a relatively small number
of replicas (typically five) with one designated
as the master
▪ Client applications access this set of replicas
via a Chubby library
▪ Each replica maintains a small database whose
elements are entities in the Chubby namespace
▪ A Chubby session is a relationship between a
client and a Chubby cell
Chubby

❖ Chubby architecture:
Bigtable

❖ One option for Google would be to implement
(or reuse) a distributed database with a full
set of relational operators
❖ The achievement of good performance and
scalability in such distributed databases is
recognized as a difficult problem
❖ Google therefore has introduced Bigtable
which retains the table model offered by
relational databases but with a much
simpler interface designed to support the
efficient storage and retrieval of quite
massive structured datasets
Bigtable

❖ Bigtable interface
▪ Distributed storage system that supports the
storage of potentially vast volumes of structured
data
▪ A given table is a three-dimensional structure
containing cells indexed by a row key, a column
key and a timestamp
▪ Bigtable API:
• creation, deletion of tables, column families within
tables;
• accessing data from given rows;
• writing or deleting cell values
• carrying out atomic row mutations
• iterating over different column families…
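The three-dimensional cell structure can be modelled with nested dictionaries. The class `TinyTable`, its methods and the sample row and column keys are assumptions made for illustration and do not reproduce the real Bigtable API.

```python
class TinyTable:
    """Toy model of a table whose cells are indexed by (row, column, timestamp)."""

    def __init__(self):
        # row key -> column key -> timestamp -> value
        self._rows = {}

    def write(self, row, column, timestamp, value):
        self._rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def read(self, row, column, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def delete(self, row, column):
        """Remove every version stored in the given cell."""
        self._rows.get(row, {}).pop(column, None)


table = TinyTable()
table.write("com.example.www", "contents:", 1, "<html>v1</html>")
table.write("com.example.www", "contents:", 2, "<html>v2</html>")
table.write("com.example.www", "anchor:news", 1, "Example")

latest = table.read("com.example.www", "contents:")
older = table.read("com.example.www", "contents:", timestamp=1)
```

The timestamp dimension lets several versions of a cell coexist, so a reader can ask for the newest value or for the value as of an earlier time.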
Bigtable

❖ Bigtable interface
Bigtable

❖ Bigtable architecture
▪ A Bigtable is broken up into tablets, with a given
tablet being approximately 100–200 megabytes
in size
▪ The job of Bigtable is to manage tablets and to
support the operations described above for
accessing and changing the associated
structured data
▪ A single instance of a Bigtable implementation
is known as a cluster
▪ Each cluster can store a number of tables
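A sketch of how a row key might be routed to a tablet follows. It assumes that a table is split into tablets by contiguous row-key ranges, which matches the published Bigtable design but is not stated on these slides; `TabletMap` and the split keys are hypothetical.

```python
import bisect


class TabletMap:
    """Toy lookup from row key to tablet number, assuming row-range partitioning."""

    def __init__(self, split_keys):
        # Tablet 0 holds rows below split_keys[0]; tablet i holds rows from
        # split_keys[i-1] up to (not including) split_keys[i]; the last
        # tablet holds everything from the final split key onwards.
        self._splits = sorted(split_keys)

    def tablet_for(self, row_key):
        return bisect.bisect_right(self._splits, row_key)

    def tablet_count(self):
        return len(self._splits) + 1


tablets = TabletMap(["g", "n", "t"])
assignment = {key: tablets.tablet_for(key) for key in ["apple", "google", "n", "zebra"]}
```

Keeping the split keys sorted makes the lookup a binary search, so finding the tablet for a row stays cheap as the number of tablets grows.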
Bigtable

❖ Bigtable architecture
Design choices
Distributed computation
services
• MapReduce
• Sawzall
MapReduce

❖ MapReduce is a simple programming
model to support the development of
large-scale distributed computations,
hiding underlying detail from the
programmer including:
▪ details related to the parallelization of the
computation
▪ monitoring and recovery from failure
▪ data management and load balancing
onto the underlying physical
infrastructure
MapReduce

❖ MapReduce interface: the key principle
behind MapReduce is the recognition
that many parallel computations share
the same overall pattern
▪ break the input data into a number of
chunks;
▪ carry out initial processing on these
chunks of data to produce intermediary
results;
▪ combine the intermediary results to
produce the final output.
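The three-step pattern above can be sketched as a single-process word count. The driver `run_mapreduce` is a hypothetical stand-in for the library: it runs everything sequentially in one process, whereas the real system distributes the chunks across workers.

```python
from collections import defaultdict


def map_fn(_chunk_id, text):
    """Initial processing: emit (word, 1) for every word in the chunk."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Combine the intermediary results for one key."""
    return word, sum(counts)


def run_mapreduce(chunks, map_fn, reduce_fn):
    """Apply map_fn to each chunk, group values by key, then reduce each group."""
    intermediate = defaultdict(list)
    for chunk_id, chunk in enumerate(chunks):
        for key, value in map_fn(chunk_id, chunk):
            intermediate[key].append(value)
    return dict(reduce_fn(key, values) for key, values in sorted(intermediate.items()))


# Input already broken into chunks (illustrative data).
chunks = ["the web is big", "the index is inverted"]
counts = run_mapreduce(chunks, map_fn, reduce_fn)
```

Only `map_fn` and `reduce_fn` are specific to word counting; the driver is generic, which is the part the library takes off the programmer's hands.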
MapReduce

❖ MapReduce interface
MapReduce

❖ MapReduce architecture
▪ Implemented by a library focusing on specifying the
map and reduce functions. Key phases:
• The first stage is to split the input file into M pieces
• The MapReduce library then starts a set of worker machines
(workers) from the pool available in the cluster
• A worker assigned a map task will first read the contents of
the input piece allocated to that task, extract the key-value
pairs and supply them as input to the map function
• The intermediary buffers are periodically written to a file local
to the map computation
• When a worker is assigned to carry out a reduce function, it
reads its corresponding partition from the local disk of the
map workers using RPC
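How intermediary results might be divided into partitions for the reduce workers can be sketched as follows. Hashing the key modulo the number of reduce tasks is the default described in the published MapReduce design; the use of `zlib.crc32` and the function names here are assumptions for the example.

```python
import zlib


def partition(key, num_reducers):
    """Choose a reduce partition for a key with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def split_into_partitions(pairs, num_reducers):
    """Place each intermediary (key, value) pair into its partition."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return partitions


# Intermediary output of one map task (illustrative data).
pairs = [("the", 1), ("web", 1), ("the", 1), ("index", 1)]
parts = split_into_partitions(pairs, 3)
```

Since the partition depends only on the key, every map worker sends a given key to the same reduce worker, so each reduce worker sees all values for its keys.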
MapReduce

❖ MapReduce architecture
Sawzall

❖ Sawzall is an interpreted
programming language for performing
parallel data analysis over very large
datasets in highly distributed
environments such as that provided by
the Google physical infrastructure.
Design choices
Thank You !
