The MapReduce Paradigm
Michael Kleber
with most slides shamelessly stolen from Jeff Dean and Yonatan Zunger
Google, Inc.
Jan. 14, 2008
A typical exercise for a new Google engineer in his or her first week
Input: files with one document per record
Specify a map function that takes a key/value pair
key = document URL
value = document contents
Output of map function is (potentially many) key/value pairs.
In our case, output (word, “1”) once per word in the document
“to”, “1”
“be”, “1”
“or”, “1”
…
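The map step above can be sketched in a few lines of Python. This is only an illustration: the name `map_fn`, the example URL, and the whitespace-based tokenization are assumptions, not part of the MapReduce library.

```python
from typing import Iterator, Tuple


def map_fn(url: str, contents: str) -> Iterator[Tuple[str, str]]:
    """Emit (word, "1") once per word occurrence in the document.

    `url` is the input key (unused here); `contents` is the input value.
    Splitting on whitespace is an assumption made for this sketch.
    """
    for word in contents.split():
        yield (word, "1")


# Hypothetical input record: one document keyed by its URL.
pairs = list(map_fn("http://example.com/hamlet", "to be or not to be"))
print(pairs)
```

Emitting the count as the string "1" mirrors the slide, where both keys and values are plain strings.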
Except as otherwise noted, this presentation is released
under the Creative Commons Attribution 2.5 License.
Example: Word Frequencies in Web Pages
MapReduce library gathers together all pairs with the same key
(shuffle/sort)
Specify a reduce function that combines the values for a key
In our case, compute the sum
“be”, “2”
“not”, “1”
“or”, “1”
“to”, “2”
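The grouping and summing steps above can be sketched as follows. The in-memory `shuffle` function only stands in for the library's distributed shuffle/sort, and the names `shuffle` and `reduce_fn` are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group values by key; a single-process stand-in for shuffle/sort."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(word: str, counts: Iterable[str]) -> Tuple[str, str]:
    """Sum the string-encoded counts emitted for one word."""
    return (word, str(sum(int(c) for c in counts)))


# Map output for a document containing "to be or not to be".
mapped = [("to", "1"), ("be", "1"), ("or", "1"),
          ("not", "1"), ("to", "1"), ("be", "1")]

grouped = shuffle(mapped)
# Process keys in sorted order, as the sort phase would deliver them.
result = [reduce_fn(word, grouped[word]) for word in sorted(grouped)]
print(result)
# [('be', '2'), ('not', '1'), ('or', '1'), ('to', '2')]
```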
Under the hood: Scheduling
Master
Worker failure:
• Detect failure via periodic heartbeats
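• Re-execute completed and in-progress map tasks (map output lives on the failed worker's local disk)
• Re-execute only in-progress reduce tasks (completed reduce output is already in the global file system)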
• Task completion committed through master
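The worker-failure rules above can be sketched as a small piece of master-side bookkeeping. Everything named here is hypothetical: the `Task` record, the `tasks_to_reschedule` function, and the 10-second `HEARTBEAT_TIMEOUT` are assumptions for illustration, not the actual master implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

HEARTBEAT_TIMEOUT = 10.0  # seconds; an assumed value for this sketch


@dataclass
class Task:
    kind: str                      # "map" or "reduce"
    state: str = "idle"            # "idle", "in_progress", or "completed"
    worker: Optional[str] = None   # worker currently or last assigned


def tasks_to_reschedule(tasks: List[Task],
                        last_heartbeat: Dict[str, float],
                        now: float) -> List[Task]:
    """Reset tasks owned by workers whose heartbeat has timed out."""
    dead = {w for w, t in last_heartbeat.items()
            if now - t > HEARTBEAT_TIMEOUT}
    rescheduled = []
    for task in tasks:
        if task.worker not in dead:
            continue
        # Map tasks are redone even if completed; reduce tasks only if
        # they were still in progress.
        lost_map = task.kind == "map" and task.state in ("in_progress", "completed")
        lost_reduce = task.kind == "reduce" and task.state == "in_progress"
        if lost_map or lost_reduce:
            task.state, task.worker = "idle", None
            rescheduled.append(task)
    return rescheduled


tasks = [
    Task("map", "completed", "w1"),
    Task("map", "in_progress", "w1"),
    Task("reduce", "in_progress", "w1"),
    Task("reduce", "completed", "w1"),
    Task("map", "completed", "w2"),
]
# Worker w1 was last heard from at t=0, w2 at t=95; the check runs at t=100.
reset = tasks_to_reschedule(tasks, {"w1": 0.0, "w2": 95.0}, now=100.0)
print([(t.kind, t.state) for t in tasks])
```

In this run only the three lost tasks on `w1` return to the idle pool; the completed reduce task on `w1` and the healthy worker's map task are left alone.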
Master failure:
• State is checkpointed to replicated file system
• New master recovers & continues
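A rough sketch of the checkpoint-and-recover idea above, assuming the master's state is a simple task-to-status mapping. A local temporary directory stands in for the replicated file system, and the names `checkpoint`, `recover`, and `master.ckpt` are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict


def checkpoint(state: Dict[str, str], path: str) -> None:
    """Write state to a temp file, then rename it over the target.

    The rename keeps a reader from ever seeing a half-written checkpoint.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def recover(path: str) -> Dict[str, str]:
    """Load the most recent checkpoint, as a new master would on startup."""
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as d:
    ckpt_path = os.path.join(d, "master.ckpt")
    checkpoint({"map-0": "completed", "map-1": "in_progress"}, ckpt_path)
    recovered = recover(ckpt_path)

print(recovered)
```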
Very robust: once lost 1600 of 1800 machines, and the job still finished fine