
MapReduce

and its use for indexing


The Programming Model and Practice

Enrique Alfonseca
Manager, Natural Language Understanding
Google Research Zurich
Tutorial Overview

● MapReduce programming model


○ Brief intro to MapReduce
○ Use of MapReduce inside Google
○ MapReduce programming examples
○ MapReduce, similar and alternatives

● Practical indexing examples in IR


○ Inverted index construction
○ PageRank computation

● Implementation of Google MapReduce


○ Dealing with failures
○ Performance & scalability
○ Usability
What is MapReduce?

A programming model for large-scale distributed data


processing
● Simple, elegant concept
● Restricted, yet powerful programming construct
● Building block for other parallel programming tools
● Extensible for different applications

Also an implementation of a system to execute such


programs
● Take advantage of parallelism
● Tolerate failures and jitters
● Hide messy internals from users
● Provide tuning knobs for different applications
Programming Model

Inspired by the map/reduce primitives in functional programming languages such as LISP from the 1960s, but not equivalent.

Map(k, v) --> (k', v')
Group (k', v') pairs by k'
Reduce(k', v'[]) --> v''

[Figure: Input --> Mapper --> grouped intermediate (k', v') pairs --> Reducer --> Output]
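
The flow above can be simulated in a few lines of ordinary Python, which may make the data movement concrete; this is only an illustration (using the word-count task developed on later slides), not the Google MapReduce API:

# A minimal, in-memory simulation of Map -> Group by key -> Reduce.
from collections import defaultdict

def map_fn(doc_name, doc_contents):
    for word in doc_contents.split():
        yield (word, 1)                              # (k', v') pairs

def reduce_fn(word, counts):
    return (word, sum(counts))                       # v''

def run_mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for key, value in inputs:                        # Map phase
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)                    # group (k', v') by k'
    return [reduce_fn(k2, values)                    # Reduce phase
            for k2, values in sorted(groups.items())]

print(run_mapreduce([("doc1", "I saw the cat"), ("doc2", "I saw the dog")],
                    map_fn, reduce_fn))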
MapReduce Execution Overview

[Figure: execution overview. (1) The user program forks a master and worker processes. (2) The master assigns map tasks and reduce tasks to workers. (3) Map workers read the input splits and (4) write intermediate files to local disk. (5) Reduce workers read that data remotely, and (6) write the final output files.]
Tutorial Overview

● MapReduce programming model


○ Brief intro to MapReduce
○ Use of MapReduce inside Google
○ MapReduce programming examples
○ MapReduce, similar and alternatives

● Practical indexing examples in IR


○ Inverted index construction
○ PageRank computation

● Implementation of Google MapReduce


○ Dealing with failures
○ Performance & scalability
○ Usability
Use of MapReduce inside Google

Stats for month                Aug. '04    Mar. '06     Sep. '07

Number of jobs                   29,000     171,000    2,217,000
Avg. completion time (secs)         634         874          395
Machine years used                  217       2,002       11,081
Map input data (TB)               3,288      52,254      403,152
Map output data (TB)                758       6,743       34,774
Reduce output data (TB)             193       2,970       14,018
Avg. machines per job               157         268          394

Unique implementations
  Mapper                            395       1,958        4,083
  Reducer                           269       1,208        2,418

From "MapReduce: simplified data processing on large clusters"


MapReduce inside Google

Googlers' hammer for 80% of our data crunching


● Large-scale web search indexing
● Clustering problems for Google News
● Produce reports for popular queries, e.g. Google Trends
● Processing of satellite imagery data
● Language model processing for statistical machine
translation
● Large-scale machine learning problems
● Just a plain tool to reliably spawn a large number of tasks
○ e.g. parallel data backup and restore
The other 20%? e.g. Pregel
Use of MR in System Health Monitoring
● Monitoring service talks to every
server frequently
● Collect
○ Health signals
○ Activity information
○ Configuration data
● Store time-series data forever
● Parallel analysis of repository data
○ MapReduce/Sawzall
Investigating System Health Issues

● Case study
○ Higher DRAM errors observed in a new GMail cluster
○ Similar servers running GMail elsewhere not affected
■ Same version of the software, kernel, firmware, etc.
○ Bad DRAM is the initial culprit
■ ... but that same DRAM model was fairly healthy elsewhere
○ Actual problem: bad motherboard batch
■ Poor electrical margin in some memory bus signals
■ GMail got more than its fair share of the bad batch
■ Analysis of this batch allocated to other services confirmed the
theory

● Analysis possible by having all relevant data in one place


and processing power to digest it
○ MapReduce is part of the infrastructure
Tutorial Overview

● MapReduce programming model


○ Brief intro to MapReduce
○ Use of MapReduce inside Google
○ MapReduce programming examples
○ MapReduce, similar and alternatives

● Practical indexing examples in IR


○ Inverted index construction
○ PageRank computation

● Implementation of Google MapReduce


○ Dealing with failures
○ Performance & scalability
○ Usability
Application Examples

● Word count and frequency in a large set of documents


○ Power of sorted keys and values
○ Combiners for map output

● Computing average income in a city for a given year


○ Using customized readers to
■ Optimize MapReduce
■ Mimic rudimentary DBMS functionality

● Overlaying satellite images


○ Handling various input formats using protocol buffers
Word Count Example

● Input: Large number of text documents


● Task: Compute word count across all the documents

Solution
● Mapper:
○ For every word in a document output (word, "1")
● Reducer:
○ Sum all occurrences of words and output (word, total_count)
Word Count Solution

//Pseudo-code for "word counting"


map(String key, String value):
// key: document name,
// value: document contents
for each word w in value:
EmitIntermediate(w, "1");

reduce(String key, Iterator values):


// key: a word
// values: a list of counts
int word_count = 0;
for each v in values:
word_count += ParseInt(v);
Emit(key, AsString(word_count));

No types, just strings*


Word Count Optimization: Combiner

● Apply the reduce function to map output before it is sent to the reducer
○ Reduces the number of records output by the mapper! (a word-count combiner is sketched after the diagram below)

Map(k,v) --> (k', v')    Partition (k', v') pairs from Mappers to Reducers according to k'    Reduce(k',v'[]) --> v''

[Figure: input splits feed Mappers, each followed by a Combiner (C); the combined map output is partitioned across Reducers, which write the final outputs.]
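
For word count, the combiner can simply reuse the reduce logic, summing the partial counts produced within one map task; a minimal sketch (illustrative Python, not the actual MapReduce API):

# Combiner for word count: runs on the map worker over one task's output,
# so the shuffle carries one partial count per (task, word) instead of one
# record per word occurrence.
def combine(word, partial_counts):
    yield (word, sum(int(c) for c in partial_counts))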
Word Probability Example

● Input: Large number of text documents


● Task: Compute word probabilities across all the documents
○ Frequency is calculated using the total word count

● A naive solution with the basic MapReduce model requires two MapReduce jobs
○ MR1: count number of all words in these documents
■ Use combiners
○ MR2: count number of each word and divide it by the total
count from MR1
Word Probability Example

● Can we do better?

● Two nice features of Google's MapReduce implementation


○ Ordering guarantee of reduce key
○ Auxiliary functionality: EmitToAllReducers(k, v)

● A nice trick: To compute the total number of words in all


documents
○ Every map task sends its total word count with key ""
to ALL reducer splits
○ Key "" will be the first key processed by reducer
■ Sum of its values → total number of words!
Word Probability Solution:
Mapper with Combiner
map(String key, String value):
// key: document name, value: document contents
int word_count = 0;
for each word w in value:
EmitIntermediate(w, "1");
word_count++;
EmitIntermediateToAllReducers("", AsString(word_count));

combine(String key, Iterator values):


// Combiner for map output
// key: a word, values: a list of counts
int partial_word_count = 0;
for each v in values:
partial_word_count += ParseInt(v);
Emit(key, AsString(partial_word_count));
Word Probability Solution: Reducer

reduce(String key, Iterator values):


// Actual reducer
// key: a word
// values: a list of counts
if (is_first_key):
assert("" == key); // sanity check
total_word_count_ = 0;
for each v in values:
total_word_count_ += ParseInt(v)
else:
assert("" != key); // sanity check
double word_count = 0;
for each v in values:
word_count += ParseInt(v);
Emit(key, AsString(word_count / total_word_count_)); // fraction of all words
Application Examples

● Word frequency in a large set of documents


○ Power of sorted keys and values
○ Combiners for map output

● Computing average income in a city for a given year


○ Using customized readers to
■ Optimize MapReduce
■ Mimic rudimentary DBMS functionality

● Overlaying satellite images


○ Handling various input formats using protocol buffers
Average Income In a City

SSTable 1: (SSN, {Personal Information})


123456:(John Smith;Sunnyvale, CA)
123457:(Jane Brown;Mountain View, CA)
123458:(Tom Little;Mountain View, CA)

SSTable 2: (SSN, {year, income})


123456:(2007,$70000),(2006,$65000),(2005,$6000),...
123457:(2007,$72000),(2006,$70000),(2005,$6000),...
123458:(2007,$80000),(2006,$85000),(2005,$7500),...

Task: Compute average income in each city in 2007

Note: Both inputs sorted by SSN


Average Income in a City Basic Solution
Mapper 1a: Mapper 1b:
Input: SSN → Personal Information Input: SSN → Annual Incomes
Output: (SSN, City) Output: (SSN, 2007 Income)

Reducer 1:
Input: SSN → {City, 2007 Income}
Output: (SSN, [City, 2007 Income])

Mapper 2:
Input: SSN → [City, 2007 Income]
Output: (City, 2007 Income)

Reducer 2:
Input: City → 2007 Incomes
Output: (City, AVG(2007 Incomes))
Average Income in a City Basic Solution
Mapper 1a: Mapper 1b:
Input: SSN → Personal Information Input: SSN → Annual Incomes
Output: (SSN, City) Output: (SSN, 2007 Income)

Reducer 1:
Input: SSN → {City, 2007 Income}
Output: (SSN, [City, 2007 Income])
Note: our inputs are sorted by SSN, so custom input readers can exploit this (see the joined solution below).

Mapper 2:
Input: SSN → [City, 2007 Income]
Output: (City, 2007 Income)

Reducer 2:
Input: City → 2007 Incomes
Output: (City, AVG(2007 Incomes))
Average Income in a City: Joined Solution

Mapper:
Input: SSN → Personal Information and Incomes
Output: (City, 2007 Income)

Reducer
Input: City → 2007 Income
Output: (City, AVG(2007 Incomes))
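
A sketch of this joined solution in Python, assuming hypothetical readers that yield each input in SSN order (the real sorted-input reader API is internal); because both SSTables are sorted by SSN, a single pass can pair the records and one MapReduce suffices:

# 'personal' yields (ssn, city); 'incomes' yields (ssn, {year: income});
# both are assumed to be sorted by, and aligned on, SSN.
def joined_mapper(personal, incomes):
    for (ssn, city), (ssn2, income_by_year) in zip(personal, incomes):
        assert ssn == ssn2
        if 2007 in income_by_year:
            yield (city, income_by_year[2007])

def average_income_reducer(city, incomes_2007):
    incomes_2007 = list(incomes_2007)
    yield (city, sum(incomes_2007) / len(incomes_2007))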
Application Examples

● Word frequency in a large set of documents


○ Power of sorted keys and values
○ Combiners for map output

● Computing average income in a city for a given year


○ Using customized readers to
■ Optimize MapReduce
■ Mimic rudimentary DBMS functionality

● Overlaying satellite images


○ Handling various input formats using protocol buffers
Stitch Imagery Data for Google Maps

A simplified version could be:


● Imagery data from different content providers
○ Different formats
○ Different coverages
○ Different timestamps
○ Different resolutions
○ Different exposures/tones
● Large amount of data to be processed
● Goal: produce data to serve a "satellite" view to users
Stitch Imagery Data Algorithm

1. Split the whole territory into "tiles" with fixed location IDs
2. Split each source image according to the tiles it covers

3. For a given tile, stitch contributions from different sources,


based on its freshness and resolution, or other preference

4. Serve the merged imagery data for each tile, so they can be
loaded into and served from an image server farm.
Using Protocol Buffers
to Encode Structured Data
● Open sourced from Google, among many others:
http://code.google.com/p/protobuf/
● It supports C++, Java and Python.
● A way of encoding structured data in an efficient yet extensible
format. e.g. we can define
message Tile {
required int64 location_id = 1;
group coverage {
double latitude = 2;
double longitude = 3;
double width = 4; // in km
double length = 5; // in km
}
required bytes image_data = 6; // Bitmap Image data
required int64 timestamp = 7;
optional float resolution = 8 [default = 10];
optional string debug_info = 10;
}

Google uses Protocol Buffers for almost all its internal RPC
protocols, file formats and of course in MapReduce.
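
For illustration only: once a message like Tile has been compiled with protoc, the generated class can be serialized to a compact byte string and parsed back, which is what makes it convenient as a MapReduce key/value payload. The module name below is hypothetical; SerializeToString/ParseFromString are the standard protobuf Python calls:

# Hypothetical use of the class generated from the Tile message above
# (e.g. via: protoc --python_out=. tile.proto).
from tile_pb2 import Tile

tile = Tile()
tile.location_id = 42
tile.image_data = b"...bitmap bytes..."
tile.timestamp = 1251763200

blob = tile.SerializeToString()      # compact wire format, usable as a MapReduce value

parsed = Tile()
parsed.ParseFromString(blob)
assert parsed.location_id == 42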
Stitch Imagery Data Solution: Mapper
map(String key, String value):
// key: image file name
// value: image data
Tile whole_image;

switch (file_type(key)):
FROM_PROVIDER_A: Convert_A(value, &whole_image);
FROM_PROVIDER_B: Convert_B(...);
...

// split whole_image according to the grid into tiles


for each Tile t in whole_image
string v;
t.SerializeToString(&v);
EmitIntermediate(IntToStr(t.location_id()), v);
Stitch Imagery Data Solution: Reducer

reduce(String key, Iterator values):


// key: location_id,
// values: tiles from different sources

sort values according to v.resolution() and v.timestamp();

Tile merged_tile;
for each v in values:
overlay pixels in v to merged_tile based on
v.coverage();

Normalize merged_tile to the serving tile size;

Emit(key, ProtobufToString(merged_tile));
Tutorial Overview

● MapReduce programming model


○ Brief intro to MapReduce
○ Use of MapReduce inside Google
○ MapReduce programming examples
○ MapReduce, similar and alternatives

● Practical indexing examples in IR


○ Inverted index construction
○ PageRank computation

● Implementation of Google MapReduce


○ Dealing with failures
○ Performance & scalability
○ Usability
Distributed Computing Landscape

Dimensions to compare Apples and Oranges

● Data organization
● Programming model
● Execution model

● Target applications
● Assumed computing environment
● Overall operating cost
My Basket of Fruit
[Figure: the three systems plotted along two axes. Vertical axis, programming model: declarative at the top, procedural at the bottom. Horizontal axis, data organization: flat raw files on the left, structured data on the right. DBMS/SQL sits in the declarative/structured corner, MPI in the procedural/flat-files corner, and MapReduce in between.]
Nutritional Information of My Basket
                   MPI                             MapReduce                             DBMS/SQL

What they are      A general parallel              A programming paradigm and its        A system to store, manipulate
                   programming paradigm            associated execution system           and serve data

Programming model  Message passing between         Restricted to Map/Reduce              Declarative data query/retrieval;
                   nodes                           operations                            stored procedures

Data organization  No assumption                   "files" can be sharded                Organized data structures

Data to be         Any                             k,v pairs: string/protomsg            Tables with rich types
manipulated

Execution model    Nodes are independent           Map/Shuffle/Reduce;                   Transactions;
                                                   checkpointing/backup;                 query/operation optimization;
                                                   physical data locality                materialized views

Usability          Steep learning curve*;          Simple concept;                       Declarative interface;
                   difficult to debug              could be hard to optimize             could be hard to debug at runtime

Key selling point  Flexible to accommodate         Plow through large amounts of         Interactive querying of the data;
                   various applications            data with commodity hardware          maintains a consistent view
                                                                                         across clients
See what others say: [1], [2], [3]


Taste Them with Your Own Grain of Salt
Dimensions to choose between Apples and Oranges for an
application developer:

● Target applications
○ Complex operations run frequently vs. a one-time plow through the data
○ Off-line processing vs. real-time serving

● Assumed computing environment


○ Off-the-shelf, custom-made or donated
○ Formats and sources of your data

● Overall operating cost


○ Hardware maintenance, license fee
○ Manpower to develop, monitor and debug
Existing MapReduce and Similar Systems
Google MapReduce
● Support C++, Java, Python, Sawzall, etc.
● Based on proprietary infrastructures
○ GFS(SOSP'03), MapReduce(OSDI'04) , Sawzall(SPJ'05), Chubby
(OSDI'06), Bigtable(OSDI'06)
○ and some open source libraries

Hadoop Map-Reduce
● Open Source!
● Plus the whole equivalent package, and more
○ HDFS, Map-Reduce, Pig, Zookeeper, HBase, Hive
● Used by Yahoo!, Facebook, Amazon and Google-IBM NSF cluster

Dryad
● Proprietary, based on Microsoft SQL servers
● Dryad(EuroSys'07), DryadLINQ(OSDI'08)
● Michael's Dryad TechTalk@Google (Nov.'07)
And others
Tutorial Overview

● MapReduce programming model


○ Brief intro to MapReduce
○ Use of MapReduce inside Google
○ MapReduce programming examples
○ MapReduce, similar and alternatives

● Practical indexing examples in IR


○ Inverted index construction
○ PageRank computation

● Implementation of Google MapReduce


○ Dealing with failures
○ Performance & scalability
○ Usability
Inverted Index Construction

● Input: Large number of text documents


● Task: Compute the postings list for every term in the collection
○ For every word, list all documents that contain the word and the positions at which it occurs.
http://www.cat.com/: "I saw the cat on the mat"        http://www.dog.com/: "I saw the dog on the mat"

I    (http://www.cat.com, 0)   (http://www.dog.com, 0)
saw  (http://www.cat.com, 1)   (http://www.dog.com, 1)
the  (http://www.cat.com, 2)   (http://www.dog.com, 2)
cat  (http://www.cat.com, 3)
mat  (http://www.cat.com, 6)   (http://www.dog.com, 6)

Inverted Index Construction

Solution:

● Mapper:
○ For every word in a document output (word, [URL, position])

● Reducer:
○ Aggregate all the information that we have about each word.
Inverted Index Solution

//Pseudo-code for "inverted index"


map(String key, String value):
// key: document URL
// value: document contents
vector words = tokenize(value);
for position from 0 to len(words) - 1:
EmitIntermediate(words[position], {key, position});

reduce(String key, Iterator values):


// key: a word
// values: a list of {URL, position} tuples.
postings_list = [];
for each v in values:
postings_list.append(v);
sort(postings_list); // Sort by URL, then position
Emit(key, AsString(postings_list));
Inverted Index Optimization: Combiner

● Combiners can also be used to reduce the number of intermediate records, by aggregating all occurrences of a word within a document before the shuffle (a sketch follows the diagram below).

Map(k,v) --> (k', v')    Partition (k', v') pairs from Mappers to Reducers according to k'    Reduce(k',v'[]) --> v''

[Figure: same pipeline as before — input splits feed Mappers, each followed by a Combiner (C); the combined output is partitioned across Reducers, which write the final outputs.]
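
As an illustration (Python sketch, not the actual implementation), a combiner for the inverted index can collapse the per-occurrence records emitted within one map task into one partial postings entry per (word, URL):

# Combiner: merge the {URL, position} records produced for one word within
# a single map task, so the reducer receives one (URL, [positions]) entry
# per document instead of one record per occurrence.
from collections import defaultdict

def combine(word, url_position_pairs):
    postings = defaultdict(list)
    for url, position in url_position_pairs:
        postings[url].append(position)
    yield (word, sorted(postings.items()))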
Tutorial Overview

● MapReduce programming model


○ Brief intro to MapReduce
○ Use of MapReduce inside Google
○ MapReduce programming examples
○ MapReduce, similar and alternatives

● Implementation of Google MapReduce


○ Dealing with failures
○ Performance & scalability
○ (Operational) Usability
■ monitoring, debugging, profiling, etc.

● Practical indexing examples in IR


○ Inverted index construction
○ PageRank computation
PageRank computation

● Input: Large number of documents with hyperlinks structured


as a graph.
● Task:

○ Algorithm to compute the probability that a random walk


on the graph will land in a given page.

○ Used as a measure of the importance of the page.

○ With a small probability, the user can jump to any page in


the graph (not following hyperlinks).
PageRank computation

[Figure: example graph of pages with their PageRank scores. Source: http://en.wikipedia.org/wiki/PageRank]
PageRank computation

Algorithm:
● N = total number of web pages
● Matrix M defined as:
○ M[i][j] is 0 if the j-th page has no links to the i-th page.
○ M[i][j] is the probability to move from page j to page i,
assuming the same probability for all outgoing links.

● Vector R defined as:


○ R[i] is the estimated PageRank value for page i.

● Iterative algorithm (a one-iteration sketch is given below):
R' = (1 - d) · M · R + d/N
where d is the decay term (the probability of jumping to a random page)
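
A minimal Python sketch of one step of this update, using the sparse row representation described a few slides later (one entry per page listing its outgoing links); illustrative only:

# links: {page: [outgoing pages]}, rank: {page: current PageRank value}
# One step of R' = (1 - d) * M * R + d/N with decay term d.
def pagerank_step(links, rank, d=0.18):
    n = len(rank)
    new_rank = {page: d / n for page in rank}        # random-jump mass
    for page, outgoing in links.items():
        if not outgoing:
            continue                                 # dangling pages ignored in this sketch
        share = rank[page] / len(outgoing)           # M[i][j] = 1/outdegree(j) for each link j->i
        for target in outgoing:
            new_rank[target] += (1 - d) * share
    return new_rank

rank = {p: 1 / 4 for p in "ABCD"}
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank_step(links, rank)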
PageRank computation

No decay term, one iteration.


Most probability mass in B and E
     A    B    C    D    E    F    G    H    I    J    K   PR (initial)  PR (after)

A 0 0 0 0.5 0 0 0 0 0 0 0 0.09 0.05

B 0 0 1 0.5 0.33 0.5 0.5 0.5 0.5 0 0 0.09 0.34

C 0 1 0 0 0 0 0 0 0 0 0 0.09 0.09

D 0 0 0 0 0.33 0 0 0 0 0 0 0.09 0.03

E 0 0 0 0 0 0.5 0.5 0.5 0.5 1 1 0.09 0.36

F 0 0 0 0 0.33 0 0 0 0 0 0 0.09 0.03

G 0 0 0 0 0 0 0 0 0 0 0 0.09 0.00

H 0 0 0 0 0 0 0 0 0 0 0 0.09 0.00

I 0 0 0 0 0 0 0 0 0 0 0 0.09 0.00

J 0 0 0 0 0 0 0 0 0 0 0 0.09 0.00

K 0 0 0 0 0 0 0 0 0 0 0 0.09 0.00
PageRank computation

No decay term, three iterations.


Most probability mass in B and C
     A    B    C    D    E    F    G    H    I    J    K   PR (initial)  PR (after)

A 0 0 0 0.5 0 0 0 0 0 0 0 0.09 0.06

B 0 0 1 0.5 0.33 0.5 0.5 0.5 0.5 0 0 0.09 0.47

C 0 1 0 0 0 0 0 0 0 0 0 0.09 0.24

D 0 0 0 0 0.33 0 0 0 0 0 0 0.09 0.01

E 0 0 0 0 0 0.5 0.5 0.5 0.5 1 1 0.09 0.06

F 0 0 0 0 0.33 0 0 0 0 0 0 0.09 0.01

G 0 0 0 0 0 0 0 0 0 0 0 0.09 0.00

H 0 0 0 0 0 0 0 0 0 0 0 0.09 0.00

I 0 0 0 0 0 0 0 0 0 0 0 0.09 0.00

J 0 0 0 0 0 0 0 0 0 0 0 0.09 0.00

K 0 0 0 0 0 0 0 0 0 0 0 0.09 0.00
PageRank computation

Decay term 0.18, three iterations.

     A    B    C    D    E    F    G    H    I    J    K   PR (initial)  PR (after)

A 0 0 0 0.5 0 0 0 0 0 0 0 0.09 0.04

B 0 0 1 0.5 0.33 0.5 0.5 0.5 0.5 0 0 0.09 0.45

C 0 1 0 0 0 0 0 0 0 0 0 0.09 0.57

D 0 0 0 0 0.33 0 0 0 0 0 0 0.09 0.06

E 0 0 0 0 0 0.5 0.5 0.5 0.5 1 1 0.09 0.09

F 0 0 0 0 0.33 0 0 0 0 0 0 0.09 0.01

G 0 0 0 0 0 0 0 0 0 0 0 0.09 0.01

H 0 0 0 0 0 0 0 0 0 0 0 0.09 0.01

I 0 0 0 0 0 0 0 0 0 0 0 0.09 0.01

J 0 0 0 0 0 0 0 0 0 0 0 0.09 0.01

K 0 0 0 0 0 0 0 0 0 0 0 0.09 0.01
PageRank computation

● Matrix M is sparse:
○ We can store one <key, value> pair per row.
○ key = URL, value = URLs of outgoing links.

● Vector R:
○ one <key, value> pair per element.

● Matrix multiplication:
○ Join both sets (aggregate by key).
○ Multiply to produce each new value of R’ in the reduce
step.
PageRank computation

● Joins are trivial to implement in MapReduce:


○ For the first dataset, one mapper function maps
(key1, value1) to (key1, value1)

○ For the second dataset, other mapper function maps


(key2, value2) to (key2, value2)

○ The reducer aggregates, for the same key, the two


values, if both are present.
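
For illustration, such a reduce-side join of the link table with the current rank vector might look like this in Python (a sketch; the map_matrix pseudocode on the next slide assumes this join has already produced joined_input_value):

# Two mappers emit under the same key (the URL); the reducer pairs them up.
def map_links(url, outgoing_links):
    yield (url, ("links", outgoing_links))

def map_rank(url, rank_value):
    yield (url, ("rank", rank_value))

def join_reducer(url, tagged_values):
    links, rank = None, None
    for tag, value in tagged_values:
        if tag == "links":
            links = value
        elif tag == "rank":
            rank = value
    if links is not None and rank is not None:   # emit only when both sides are present
        yield (url, (links, rank))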
PageRank computation

//Pseudo-code for "PageRank" (no decay factor)


map_matrix(String key,
String input_value,
String joined_input_value):
// key: document URL,
// input_value: URLs of outgoing links and weights
// joined_input_value: Current PageRank.
for (URL, weight) in input_value:
EmitIntermediate(URL, (weight * joined_input_value));

reduce_pagerank(String key, Iterator values):


// key: a URL
// values: the incoming PageRank from each node with
// an incoming link.
double sum = 0;
for each v in values:
sum += v;
Emit(key, AsString(sum));
PageRank computation

//Pseudo-code for "PageRank" (with decay factor)


map_matrix(String key,
String input_value,
String joined_input_value):
// key: document URL,
// input_value: URLs of outgoing links and weights
// joined_input_value: Current PageRank.
for (URL, weight) in input_value:
EmitIntermediate(URL, (weight * joined_input_value));

reduce_pagerank(String key, Iterator values):


// key: a URL
// values: the incoming PageRank from each node with
// an incoming link.
double sum = 0;
for each v in values:
sum += v;

// N is the graph size, assumed to be known from when the input
// sparse matrix was constructed; d is the decay term.
sum = (1 - d) * sum + d / N;
Emit(key, AsString(sum));
Tutorial Overview

● MapReduce programming model


○ Brief intro to MapReduce
○ Use of MapReduce inside Google
○ MapReduce programming examples
○ MapReduce, similar and alternatives

● Practical indexing examples in IR


○ Inverted index construction
○ PageRank computation

● Implementation of Google MapReduce


○ Dealing with failures
○ Performance & scalability
○ Usability
Google Computing Infrastructure
● Infrastructure must support
○ Diverse set of applications
■ Increasing over time
○ Ever-increasing application usage
○ Ever-increasing computational requirements
○ Cost effectiveness

● Data centers
○ Google-specific mechanical, thermal and electrical design
○ Highly-customized PC-class motherboards
○ Running Linux
○ In-house management & application software
Sharing is the Way of Life

[Figure: many production services share the same cluster infrastructure, plus batch processing (MapReduce, Sawzall) running alongside them.]
Major Challenges
To organize the world’s information and make it universally
accessible and useful.

● Failure handling
○ Bad apples appear now and then
● Scalability
○ Fast growing dataset
○ Broad extension of Google services
● Performance and utilization
○ Minimizing run-time for individual jobs
○ Maximizing throughput across all services
● Usability
○ Troubleshooting
○ Performance tuning
○ Production monitoring
Failures in Literature

● LANL data (DSN 2006)


○ Data collected over 9 years
○ Covered 4750 machines and 24101 CPUs
○ Distribution of failures
■ Hardware ~ 60%, Software ~ 20%, Network/Environment/Humans ~ 5%, Aliens ~ 25%*
■ Depending on the system, failures occurred between once a day and once a month
○ Most of the systems in the survey were the cream of the crop at
their time

● PlanetLab (SIGMETRICS 2008 HotMetrics Workshop)


○ Average frequency of failures per node in a 3-month period
■ Hard failures: 2.1
■ Soft failures: 41
■ Approximately one failure every 4 days
Failures in Google Data Centers

● DRAM errors analysis (SIGMETRICS 2009)


○ Data collected over 2.5 years
○ 25,000 to 70,000 errors per billion device hours per Mbit
■ Order of magnitude more than under lab conditions
○ 8% of DIMMs affected by errors
○ Hard errors are dominant cause of failure

● Disk drive failure analysis (FAST 2007)


○ Annualized Failure Rates vary from 1.7% for one year old
drives to over 8.6% in three year old ones
○ Utilization affects failure rates only in very young and very old disk drive populations
○ Temperature change can cause an increase in failure rates, but mostly for old drives
Failures in Google

● Failures are a part of everyday life


○ Mostly due to the scale and shared environment

● Sources of job failures


○ Hardware
○ Software
○ Preemption by a more important job
○ Unavailability of a resource due to overload

● Failure types
○ Permanent
○ Transient
Different Failures Require Different Actions

● Fatal failure (the whole job dies)


○ Simplest case around :)
○ You'd prefer to resume computation rather than recompute

● Transient failures
○ You'd want your job to adjust and finish when
issues resolve

● Program hangs.. forever.


○ Define "forever"
○ Can we figure out why?
○ What to do?

● "It's-Not-My-Fault" failures
MapReduce: Task Failure
[Figure: the execution-overview diagram again; here one of the map/reduce worker tasks fails.]
Recover from Task Failure by Re-execution

[Figure: the master reassigns the failed task to another worker, which re-executes it; the rest of the pipeline is unchanged.]
Recover by Checkpointing Map Output

[Figure: same pipeline, but map output is written to GFS (step 4) instead of local disk, so completed map output survives a map worker failure.]
MapReduce: Master Failure
[Figure: the execution-overview diagram again, highlighting the master process.]
Master as a Single Point of Failure
[Figure: the same diagram; there is only one master, so losing it loses the whole job.]
Resume from Execution Log on GFS
[Figure: the master writes an execution log to GFS; a replacement master can resume the job from that log. Intermediate map output is also on GFS.]
MapReduce: Slow Worker/Task
[Figure: the execution-overview diagram again; one slow worker (straggler) holds up the whole job.]
Handle Unfixable Failures

● Input data is in a partially wrong format or is corrupted


○ Data is mostly well-formatted, but there are instances where
your code crashes
○ Corruptions happen rarely, but they are possible at scale

● Your application depends on an external library which you


do not control
○ Which happens to have a bug for a particular, yet very rare,
input pattern

● What would you do?


○ Your job is critical to finish as soon as possible
○ The problematic records are very rare
○ IGNORE IT!
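
Google's MapReduce library can skip such bad records automatically; the same idea can be sketched at the application level (illustrative Python; emit and increment_counter stand in for whatever the framework provides):

# Wrap per-record processing so that rare, unfixable records are counted
# and skipped instead of crashing the whole job.
def parse_and_map(key, value):
    # Stand-in for the real parsing + map logic; assume it may raise
    # ValueError on the rare corrupt record.
    yield (key, len(value.split()))

def safe_map(key, value, emit, increment_counter):
    try:
        for out_key, out_value in parse_and_map(key, value):
            emit(out_key, out_value)
    except ValueError:
        increment_counter("skipped_bad_records", 1)   # make the skips visible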
Tutorial Overview

● MapReduce programming model


○ Brief intro to MapReduce
○ Use of MapReduce inside Google
○ MapReduce programming examples
○ MapReduce, similar and alternatives

● Practical indexing examples in IR


○ Inverted index construction
○ PageRank computation

● Implementation of Google MapReduce


○ Dealing with failures
○ Performance & scalability
■ Some techniques and tuning tips
○ Usability
Performance and Scalability of
MapReduce

Terasort and Petasort with MapReduce in Nov 2008


● Not particularly representative of production MRs
● An important benchmark to evaluate the whole stack
● Sorted 1 TB (10 billion 100-byte records, as uncompressed text)
on 1,000 computers in 68 seconds
● Sorted 1PB (10 trillion 100-byte records) on 4,000
computers in 6 hours and 2 minutes

With Open-source Hadoop in May 2009 (TechReport)


● Terasort: 62 seconds on 1460 nodes
● Petasort: 16 hours and 15 minutes on 3658 nodes
Built up on Great Google Infrastructure

Google MapReduce is built upon a set of high-performance infrastructure components:
● Google file system (GFS) (SOSP'03)
● Chubby distributed lock service (OSDI'06)
● Bigtable for structured data storage (OSDI'06)
● Google cluster management system
● Powerful yet energy-efficient* hardware and fine-tuned platform software
● Other house-built libraries and services
Take Advantage of Locality Hints from
GFS

● Files in GFS
○ Divided into chunks (default 64MB)
○ Stored with replications, typical r=3
○ Reading from local disk is much faster and cheaper
than reading from a remote server

● MapReduce uses the locality hints from GFS


○ Try to assign a task to a machine with a local copy of
input
○ Or, less preferably, to a machine where a copy is stored on a server behind the same network switch
○ Or, assign to any available worker
Tuning Task Granularity

Questions often asked in production:


● How many Map tasks should I split my input into?
● How many Reduce splits should I have?

Implications on scalability
● Master has to make O(M+R) decisions
● System has to keep O(M*R) metadata for distributing
map output to reducers

To balance locality, performance and scalability


● By default, each map task is 64MB (== GFS chunksize)
● Usually, #reduce tasks is a small multiple of #machines (a worked example follows)
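
As a hypothetical worked example of these trade-offs: a 1 TB input at 64 MB per map task gives M ≈ 16,000 map tasks; with 2,000 machines and R = 2 × 2,000 = 4,000 reduce tasks, the master makes O(M + R) ≈ 20,000 scheduling decisions but must keep O(M × R) ≈ 64 million pieces of map-output distribution metadata, which is why R is kept much smaller than M.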
More on Map Task Size

● Small map tasks allow fast failure recovery


○ Define "small": input size, output size or processing time

● Big map tasks may force mappers to read from


multiple remote chunkservers

● Too many small map shards might lead to excessive


overhead in map output distribution
Reduce Task Partitioning Function

It is relatively easy to control Map input granularity


● Each map task is independent

For Reduce tasks, we can tweak the partitioning function instead (a sketch follows the table below).
Map(k,v) --> (k', v')    Partition (k', v') pairs from Mappers to Reducers according to k'    Reduce(k',v'[]) --> v''

Reduce key                      Reduce input size
*.blogspot.com                  82.9 GB
cgi.ebay.com                    58.2 GB
profile.myspace.com             56.3 GB
yellowpages.superpages.com      49.6 GB
www.amazon.co.uk                41.7 GB

Average reduce input size for a given key: 300 KB
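
For illustration, a partitioning function is just a mapping from reduce key to reducer index. Below is a minimal Python sketch of a default hash partitioner plus a variant that spreads one known hot key over several reducers (the hot-key table and helper are hypothetical); the outputs of the extra shards would have to be merged afterwards:

import zlib

def default_partition(key, num_reducers):
    # Default behaviour: hash the reduce key and pick a reducer.
    return zlib.crc32(key.encode()) % num_reducers

HOT_KEYS = {"*.blogspot.com": 8}    # hypothetical: split this key over 8 reducers

def skew_aware_partition(key, record_id, num_reducers):
    shards = HOT_KEYS.get(key)
    if shards:
        # Spread the hot key's records over 'shards' consecutive reducers.
        subshard = zlib.crc32(record_id.encode()) % shards
        return (zlib.crc32(key.encode()) + subshard) % num_reducers
    return default_partition(key, num_reducers)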
Tutorial Overview

● MapReduce programming model


○ Brief intro to MapReduce
○ Use of MapReduce inside Google
○ MapReduce programming examples
○ MapReduce, similar and alternatives

● Practical indexing examples in IR


○ Inverted index construction
○ PageRank computation

● Implementation of Google MapReduce


○ Dealing with failures
○ Performance & scalability
■ Dealing with stragglers
○ Usability
Dealing with Reduce Stragglers

Many reasons lead to stragglers, but reducing is inherently expensive:
● The reducer retrieves data remotely from many servers
● Sorting is expensive on local resources
● Reducing usually cannot start until mapping is done
Re-execution due to machine failures could double the runtime.
[Figure: map workers feed a reduce worker, which sorts the fetched map output and runs the reducer to produce output file 0; a second reduce worker produces output file 1.]
Dealing with Reduce Stragglers
Technique 1:
Create a backup instance as early and as necessary as
possible

[Figure: a backup reduce worker R' is started for the straggling reduce task; whichever instance finishes first writes output file 0.]
Steal Reduce Input for Backups
Technique 2:
Retrieving map output and sorting are expensive, but we
can transport the sorted input to the backup reducer

[Figure: the backup reduce worker R' receives the already-sorted input from the original reduce worker instead of re-fetching and re-sorting the map output.]
Reduce Task Splitting

Technique 3:
Divide a reduce task into smaller ones to take advantage of
more parallelism.

[Figure: the straggling reduce task is divided into smaller tasks; backup workers R' produce output files 0.0, 0.1 and 0.2 in parallel.]
Tutorial Overview

● MapReduce programming model


○ Brief intro to MapReduce
○ Use of MapReduce inside Google
○ MapReduce programming examples
○ MapReduce, similar and alternatives

● Practical indexing examples in IR


○ Inverted index construction
○ PageRank computation

● Implementation of Google MapReduce


○ Dealing with failures
○ Performance & scalability
○ (Operational) Usability
■ monitoring, debugging, profiling, etc.
Tools for Google MapReduce

Local run mode for debugging/profiling MapReduce


applications

Status page to monitor and track progress of MapReduce


executions, also
● Email notification
● Replay progress postmortem

Distributed counters used by MapReduce library and


application for validation, debugging and tuning
● System invariants
● Performance profiling
MapReduce Counters

Lightweight stats with only "increment" operations
● Per-task counters: contributed by each M/R task
○ only counted once, even if there are backup instances
● Per-worker counters: contributed by each worker process
○ aggregated contributions from all instances
● Can be easily added by developers

Examples:
● num_map_output_records == num_reduce_input_records
● CPU time spent in Map() and Reduce() functions
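
Google's counter API is internal, but the idea carries over to the open-source world; for example, a Hadoop Streaming mapper written in Python can increment a counter by writing to stderr, and the framework aggregates it across all tasks:

import sys

def emit_counter(group, counter, amount=1):
    # Hadoop Streaming convention: counter updates are reported on stderr.
    sys.stderr.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))

for line in sys.stdin:
    for word in line.split():
        print("%s\t1" % word)                          # map output record
        emit_counter("WordCount", "map_output_records")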
MapReduce Development inside Google

Support C++, Java, Python, Sawzall, etc.

Nurtured greatly by Google engineer community


● Friendly internal user discussion groups
● Fix-it! instead of complain-about-it! attitude
● Users contribute to both the core library and contrib
○ Thousands of Mapper Reducer implementations
○ Tens of Input/Output formats
○ Endless new ideas and proposals
Summary

● MapReduce is a flexible programming framework for many applications, built from a pair of restricted Map()/Reduce() constructs

● Google invented and implemented MapReduce around its infrastructure to allow our engineers to scale with the growth of the Internet and the growth of Google products/services

● Open-source implementations of MapReduce, such as Hadoop, are creating a new ecosystem that enables large-scale computing on off-the-shelf clusters

● MapReduce has many applications in web information retrieval, where it is used to parallelize work on large-scale datasets
Thank you!
