MapReduce Example
Franco-German Summer University for Young Researchers 2011
Cloud Computing: Challenges and Opportunities
Hadoop in Practice
Pietro Michiardi (Eurecom)
Introduction
Overview of this Lecture

Hadoop: Architecture Design and Implementation Details [45 minutes]
  HDFS
  Hadoop MapReduce
  Hadoop MapReduce I/O

Exercise Session:
  Warm up: WordCount and first design patterns [45 minutes]
  Exercises on various design patterns [60 minutes]
    Pairs
    Stripes
    Order Inversion [Homework]
Hadoop MapReduce
Preliminaries
Terminology
MapReduce:
Job: an execution of a Mapper and Reducer across a data set
Task: an execution of a Mapper or a Reducer on a slice of data
Task Attempt: an instance of an attempt to execute a task
Example:
Running WordCount across 20 files is one job
20 files to be mapped = 20 map tasks, plus some number of reduce tasks
At least 20 task attempts will be performed... more if a machine crashes
HDFS in details
Secondary NameNode
Merges the namespace image with the edit log
A useful trick to recover from a failure of the NameNode is to use the NFS copy of the metadata and switch the secondary to primary
DataNode
They store data and talk to clients
They report periodically to the NameNode the list of blocks they hold
External clients
For each block, the NameNode returns a set of DataNodes holding a copy of it
DataNodes are sorted according to their proximity to the client
MapReduce clients
TaskTrackers and DataNodes are colocated
For each block, the NameNode usually returns the local DataNode
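As an illustrative sketch (not from the original deck): a client-side read through the FileSystem API, where the NameNode lookup and the proximity-sorted DataNode list are handled behind fs.open(). The path is hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf); // talks to the NameNode
            // open() fetches block locations from the NameNode; the returned
            // stream then reads each block from the closest DataNode
            FSDataInputStream in = fs.open(new Path("/user/data/sample.txt"));
            try {
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }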
Details on replication
Clients ask the NameNode for a list of suitable DataNodes
This list forms a pipeline: the first DataNode stores a copy of a block, then forwards it to the second, and so on
Replica Placement
Tradeoff between reliability and bandwidth
Default placement:
  First copy on the same node as the client, second replica is off-rack, third replica is on the same rack as the second but on a different node
Since Hadoop 0.21, replica placement can be customized
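A minimal write sketch under an explicit replication factor; the dfs.replication value mirrors the default of three copies, and the path is hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Per-client override of the cluster-wide default replication factor
            conf.setInt("dfs.replication", 3);
            FileSystem fs = FileSystem.get(conf);
            // Each written block travels down the DataNode pipeline described above
            FSDataOutputStream out = fs.create(new Path("/user/data/out.txt"));
            out.writeUTF("hello pipeline");
            out.close();
        }
    }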
Job Submission
JobClient class
The runJob() method creates a new instance of a JobClient
Then it calls submitJob() on this class
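A minimal driver sketch using the classic mapred API; IdentityMapper and IdentityReducer are stand-ins for real job logic, and the input/output paths come from the command line.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class MinimalDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MinimalDriver.class);
            conf.setJobName("minimal-job");
            conf.setMapperClass(IdentityMapper.class);
            conf.setReducerClass(IdentityReducer.class);
            conf.setOutputKeyClass(LongWritable.class);
            conf.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            // runJob() creates a JobClient, calls submitJob(), and polls
            // the JobTracker for progress until the job completes
            JobClient.runJob(conf);
        }
    }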
Selecting a task
The JobTracker first needs to select a job (i.e., scheduling)
TaskTrackers have a fixed number of slots for map and reduce tasks
The JobTracker gives priority to map tasks (WHY?)
Data locality
JobTracker is topology aware
Useful for map tasks
Unused for reduce tasks
Task Execution
Shuffle and Sort

The MapReduce framework guarantees that the input to every reducer is sorted by key
The process by which the system sorts and transfers map outputs to reducers is known as the shuffle
The shuffle is the most important part of the framework, where the magic happens
A good understanding allows optimizing both the framework and the execution time of MapReduce jobs
Shuffle and Sort: the Map Side

The output of a map task is not simply written to disk
In-memory buffering
Pre-sorting
Disk spills:
  Written in round-robin to a local directory
  Output data is partitioned according to the reducers they will be sent to
  Within each partition, data is sorted (in memory)
  Optionally, if there is a combiner, it is executed just after the sort phase
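The partitioning step can be pictured with the logic of the default HashPartitioner, reproduced here as a sketch: a given key always lands in the same partition, so all its values meet at one reducer.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Same logic as Hadoop's default HashPartitioner
    public class HashLikePartitioner implements Partitioner<Text, IntWritable> {
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // mask the sign bit so the modulo result is non-negative
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }

        public void configure(JobConf job) { /* no configuration needed */ }
    }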
Shuffle and Sort: the Reduce Side

The map output file is located on the local disk of the TaskTracker
Another TaskTracker (in charge of a reduce task) requires input from many other TaskTrackers (that finished their map tasks)
How do reducers know which TaskTrackers to fetch map output from?
When a map task finishes, it notifies the parent TaskTracker
The TaskTracker notifies the JobTracker (through the heartbeat mechanism)
A thread in the reducer periodically polls the JobTracker
TaskTrackers do not delete local map output as soon as a reduce task has fetched it (WHY?)
The map outputs are copied to the TaskTracker running the reducer in memory (if they fit)
Otherwise, they are copied to disk
Input consolidation
A background thread merges all partial inputs into larger, sorted files
Note that if compression was used (for map outputs, to save bandwidth), decompression will take place in memory
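A sketch of switching on map output compression with the classic JobConf API; the codec choice is illustrative.

    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.JobConf;

    public class CompressMapOutput {
        // Called from a driver before job submission
        static void enable(JobConf conf) {
            // Compress intermediate map output to save shuffle bandwidth;
            // reducers decompress the fetched map segments in memory
            conf.setCompressMapOutput(true);
            conf.setMapOutputCompressorClass(GzipCodec.class);
        }
    }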
Hadoop I/O
What's next

Overview of what Hadoop offers
For in-depth knowledge, refer to [7]
MapReduce Types
Types:
K types implement WritableComparable
V types implement Writable
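A sketch of what these constraints look like in the classic API; the mapper name and its logic (emitting each line with its byte length) are made up for illustration.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Mapper<K1, V1, K2, V2>: both key types (LongWritable, Text here)
    // implement WritableComparable; value types only need Writable
    public class LineLengthMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable offset, Text line,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // emit (line, byte length of the line)
            output.collect(line, new IntWritable(line.getLength()));
        }
    }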
What is a Writable?

Hadoop defines its own classes for strings (Text), integers (IntWritable), etc.
All keys are instances of WritableComparable
Why comparable?
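Keys are comparable because the framework sorts map output by key during the shuffle, so every key type must define a total order. A sketch of a custom key follows; the WordPair class is made up for illustration (it also serves the Pairs pattern later in this lecture).

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;

    // write()/readFields() define the wire format, compareTo() the sort
    // order used during the shuffle, hashCode() the default partitioning
    public class WordPair implements WritableComparable<WordPair> {
        private final Text left = new Text();
        private final Text right = new Text();

        public void set(String l, String r) { left.set(l); right.set(r); }

        public Text getLeft() { return left; }

        public void write(DataOutput out) throws IOException {
            left.write(out);
            right.write(out);
        }

        public void readFields(DataInput in) throws IOException {
            left.readFields(in);
            right.readFields(in);
        }

        public int compareTo(WordPair other) {
            int cmp = left.compareTo(other.left);
            return cmp != 0 ? cmp : right.compareTo(other.right);
        }

        @Override
        public int hashCode() {
            return left.hashCode() * 163 + right.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof WordPair)) return false;
            WordPair p = (WordPair) o;
            return left.equals(p.left) && right.equals(p.right);
        }

        @Override
        public String toString() { return "(" + left + ", " + right + ")"; }
    }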
Reading Data
KeyValueTextInputFormat
  Maps newline-terminated text lines of the form key SEPARATOR value
SequenceFileInputFormat
  Binary file of key-value pairs with some additional metadata
SequenceFileAsTextInputFormat
  Same as before, but maps (k.toString(), v.toString())
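A sketch of selecting an input format in the driver with the classic API; the separator property name below is the one used by Hadoop 0.20/1.x, stated here as an assumption.

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;

    public class InputFormatSetup {
        static void configure(JobConf conf) {
            // Treat each line as key SEPARATOR value instead of (offset, line)
            conf.setInputFormat(KeyValueTextInputFormat.class);
            // The default separator is the tab character; it can be overridden
            conf.set("key.value.separator.in.input.line", "\t");
        }
    }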
Record Readers
KeyValueLineRecordReader
Used by KeyValueTextInputFormat
Example: with TextInputFormat, the text

  On the top of the Crumpetty Tree
  The Quangle Wangle sat,
  But his face you could not see,
  On account of his Beaver Hat.

is mapped to the records (key = byte offset of the line)

  (0, On the top of the Crumpetty Tree)
  (33, The Quangle Wangle sat,)
  (57, But his face you could not see,)
  (89, On account of his Beaver Hat.)
Writing Data

Analogous to InputFormat
TextOutputFormat writes key SEPARATOR value <newline> strings to the output file (the separator is a tab by default)
SequenceFileOutputFormat uses a binary format to pack key-value pairs
NullOutputFormat discards output
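Symmetrically, a sketch of selecting the output format (TextOutputFormat is the default when nothing is set).

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;

    public class OutputFormatSetup {
        static void configure(JobConf conf) {
            // Pack key-value pairs in a binary, splittable container
            conf.setOutputFormat(SequenceFileOutputFormat.class);
            // Alternatively, discard all output:
            // conf.setOutputFormat(org.apache.hadoop.mapred.lib.NullOutputFormat.class);
        }
    }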
Algorithm Design
Preliminaries
Learn by examples
Design patterns
Synchronization is perhaps the most tricky aspect
Aspects that are not under the control of the designer:
  Where a mapper or reducer will run
  When a mapper or reducer begins or finishes
  Which input key-value pairs are processed by a specific mapper
  Which intermediate key-value pairs are processed by a specific reducer
Optimizations
Scalability (linear)
Resource requirements (storage and bandwidth)
Local Aggregation
Motivations

In the context of data-intensive distributed processing, the most important aspect of synchronization is the exchange of intermediate results
This involves copying intermediate results from the processes that produced them to those that consume them
In general, this involves data transfers over the network
In Hadoop, disk I/O is also involved, as intermediate results are written to disk
Example: Word Count (baseline)

The mapper:
Takes an input key-value pair and tokenizes the document
Emits intermediate key-value pairs: the word is the key and the integer one is the value
The framework:
Guarantees all values associated with the same key (the word) are brought to the same reducer
The reducer:
Receives all values associated with the same key
Sums the values and writes output key-value pairs: the key is the word and the value is the number of occurrences
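A sketch of this baseline algorithm with the classic mapred API (the canonical Word Count):

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class WordCount {
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                // tokenize the line and emit (word, 1) for each term
                StringTokenizer tokenizer = new StringTokenizer(value.toString());
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    output.collect(word, ONE);
                }
            }
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                // sum all the ones for this word
                int sum = 0;
                while (values.hasNext()) sum += values.next().get();
                output.collect(key, new IntWritable(sum));
            }
        }
    }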
Technique 1: Combiners

Combiners are a general mechanism to reduce the amount of intermediate data
They could be thought of as mini-reducers
Note: due to the Zipfian nature of term distributions, not all mappers will see all terms
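For Word Count, the reducer itself can serve as the combiner, since summing partial counts is associative and commutative; a sketch of the driver-side setup, referring to the WordCount sketch above:

    import org.apache.hadoop.mapred.JobConf;

    public class CombinerSetup {
        static void configure(JobConf conf) {
            // Run the reducer logic locally on each mapper's output
            // before the shuffle, to cut intermediate data volume
            conf.setCombinerClass(WordCount.Reduce.class);
        }
    }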
In-Mapper Combiners
With ordinary combiners, mappers still need to emit all key-value pairs; combiners only reduce network traffic
Scalability bottleneck
The in-mapper combining technique strictly depends on having sufficient memory to store intermediate results
  And you don't want the OS to deal with swapping
  Multiple threads compete for the same resources
A possible solution: block and flush
Implemented with a simple counter
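A sketch of in-mapper combining with block and flush: counts are buffered in an associative array inside the mapper and spilled whenever a simple size counter crosses a threshold (the threshold value is illustrative; saving the OutputCollector so close() can emit the tail is a common pattern, assumed here).

    import java.io.IOException;
    import java.util.HashMap;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class InMapperCombiningMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final int FLUSH_THRESHOLD = 100000; // illustrative bound
        private final HashMap<String, Integer> counts = new HashMap<String, Integer>();
        private OutputCollector<Text, IntWritable> collector;

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            collector = output; // saved so close() can emit the tail
            for (String term : value.toString().split("\\s+")) {
                if (term.isEmpty()) continue;
                Integer c = counts.get(term);
                counts.put(term, c == null ? 1 : c + 1);
            }
            // block and flush: bound memory by spilling the dictionary
            // whenever it grows past a simple counter threshold
            if (counts.size() > FLUSH_THRESHOLD) flush();
        }

        private void flush() throws IOException {
            for (java.util.Map.Entry<String, Integer> e : counts.entrySet()) {
                collector.collect(new Text(e.getKey()), new IntWritable(e.getValue()));
            }
            counts.clear();
        }

        @Override
        public void close() throws IOException {
            if (collector != null) flush(); // emit whatever is left at the end
        }
    }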
Next, we focus on a particular problem that benefits from these two methods
Problem statement

The problem: building word co-occurrence matrices for large corpora
The co-occurrence matrix of a corpus is a square n × n matrix:
  n is the number of unique words (i.e., the vocabulary size)
  A cell m_ij contains the number of times word w_i co-occurs with word w_j within a specific context
  Context: a sentence, a paragraph, a document, or a window of m words
  NOTE: the matrix may be symmetric in some cases
Motivation
This problem is a basic building block for more complex operations
Estimating the distribution of discrete joint events from a large number of observations
A similar problem arises in other domains:
  Customers who buy this tend to also buy that
The Pairs approach

The mapper:
Processes each input document
Emits key-value pairs with:
  each co-occurring word pair as the key
  the integer one (the count) as the value
The reducer:
Receives pairs relative to co-occurring words
Computes an absolute count of the joint event
Emits the pair and the count as the final key-value output
Basically, reducers emit the cells of the matrix
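A sketch of the Pairs approach, reusing the hypothetical WordPair key from the Writable discussion; here the co-occurrence context is the input line.

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class Pairs {
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, WordPair, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final WordPair pair = new WordPair();

            public void map(LongWritable key, Text doc,
                    OutputCollector<WordPair, IntWritable> output, Reporter reporter)
                    throws IOException {
                String[] terms = doc.toString().split("\\s+");
                // two nested loops: one emission per co-occurring pair
                for (int i = 0; i < terms.length; i++) {
                    for (int j = 0; j < terms.length; j++) {
                        if (i == j || terms[i].isEmpty() || terms[j].isEmpty()) continue;
                        pair.set(terms[i], terms[j]);
                        output.collect(pair, ONE);
                    }
                }
            }
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<WordPair, IntWritable, WordPair, IntWritable> {
            public void reduce(WordPair pair, Iterator<IntWritable> counts,
                    OutputCollector<WordPair, IntWritable> output, Reporter reporter)
                    throws IOException {
                // absolute count of the joint event: one cell of the matrix
                int sum = 0;
                while (counts.hasNext()) sum += counts.next().get();
                output.collect(pair, new IntWritable(sum));
            }
        }
    }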
The Stripes approach

The mapper:
Same two nested loops structure as before
Co-occurrence information is first stored in an associative array
Emits key-value pairs with words as keys and the corresponding arrays as values
The reducer:
Receives all associative arrays related to the same word
Performs an element-wise sum of all associative arrays with the same key
Emits key-value output in the form of (word, associative array)
Basically, reducers emit rows of the co-occurrence matrix
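A sketch of the Stripes approach using MapWritable as the associative array:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class Stripes {
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, MapWritable> {
            public void map(LongWritable key, Text doc,
                    OutputCollector<Text, MapWritable> output, Reporter reporter)
                    throws IOException {
                String[] terms = doc.toString().split("\\s+");
                for (int i = 0; i < terms.length; i++) {
                    if (terms[i].isEmpty()) continue;
                    MapWritable stripe = new MapWritable(); // associative array
                    for (int j = 0; j < terms.length; j++) {
                        if (i == j || terms[j].isEmpty()) continue;
                        Text neighbor = new Text(terms[j]);
                        IntWritable c = (IntWritable) stripe.get(neighbor);
                        stripe.put(neighbor, new IntWritable(c == null ? 1 : c.get() + 1));
                    }
                    output.collect(new Text(terms[i]), stripe);
                }
            }
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<Text, MapWritable, Text, MapWritable> {
            public void reduce(Text word, Iterator<MapWritable> stripes,
                    OutputCollector<Text, MapWritable> output, Reporter reporter)
                    throws IOException {
                MapWritable sum = new MapWritable();
                while (stripes.hasNext()) {
                    // element-wise sum of the associative arrays
                    for (java.util.Map.Entry<Writable, Writable> e : stripes.next().entrySet()) {
                        IntWritable cur = (IntWritable) sum.get(e.getKey());
                        int add = ((IntWritable) e.getValue()).get();
                        sum.put(e.getKey(), new IntWritable(cur == null ? add : cur.get() + add));
                    }
                }
                output.collect(word, sum); // one row of the co-occurrence matrix
            }
        }
    }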
The values are more complex and have serialization/deserialization overhead
Greatly benefits from combiners, as the key space is the vocabulary
Suffers from memory paging problems, if not properly engineered
Order Inversion
Computing relative frequencies

f(w_j | w_i) = N(w_i, w_j) / Σ_w' N(w_i, w')

N(w_i, w_j) is the number of times a co-occurring word pair is observed
The denominator is called the marginal
Computing relative frequencies: a basic approach

We must define the sort order of the pair:
  In this way, the keys are first sorted by the left word, and then by the right word (in the pair)
  Hence, we can detect when all pairs associated with the word we are conditioning on (w_i) have been seen
  At this point, we can use the in-memory buffer, compute the relative frequencies, and emit
What we want is that all pairs with the same left word are sent to the same reducer
Limitations of this approach
Computing relative frequencies: order inversion

Recall that mappers emit pairs of co-occurring words as keys
The mapper:
  additionally emits a special key of the form (w_i, *)
  The value associated with the special key is one; it represents the contribution of the word pair to the marginal
  Using combiners, these partial marginal counts will be aggregated before being sent to the reducers
The reducer:
We must make sure that the special key-value pairs are processed before any other key-value pairs where the left word is w_i
We also need to modify the partitioner: it must take into account only the first word of the pair, as sketched below
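A sketch of such a partitioner, again assuming the hypothetical WordPair key (with its getLeft() accessor); note that with byte-wise Text ordering the '*' character sorts before letters, so the special marginal keys naturally arrive first.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Partition on the left word only, so the special (w_i, *) key and
    // every (w_i, w_j) pair reach the same reducer
    public class LeftWordPartitioner implements Partitioner<WordPair, IntWritable> {
        public int getPartition(WordPair key, IntWritable value, int numPartitions) {
            return (key.getLeft().hashCode() & Integer.MAX_VALUE) % numPartitions;
        }

        public void configure(JobConf job) { /* nothing to configure */ }
    }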
Memory requirements:
Minimal, because only the marginal (an integer) needs to be stored
No buffering of individual co-occurring word counts
No scalability bottleneck
PageRank
P(n) = α (1/|G|) + (1 − α) Σ_{m ∈ L(n)} P(m)/C(m)

|G| is the number of nodes in the graph
α is a random jump factor
L(n) is the set of pages that link to n
C(m) is the out-degree of node m
PageRank in MapReduce
The reducer updates the value of the PageRank of every single node
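A heavily simplified reducer sketch consistent with the formula above: values are assumed to carry only the rank mass P(m)/C(m) emitted by mappers along links, and re-emitting the graph structure for the next iteration is omitted; ALPHA and NUM_NODES are hypothetical job constants.

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class PageRankReducer extends MapReduceBase
            implements Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        private static final double ALPHA = 0.15;     // random jump factor (assumed)
        private static final long NUM_NODES = 1000000; // |G|, assumed known

        public void reduce(Text node, Iterator<DoubleWritable> contributions,
                OutputCollector<Text, DoubleWritable> output, Reporter reporter)
                throws IOException {
            double mass = 0.0;
            while (contributions.hasNext()) mass += contributions.next().get();
            // P(n) = alpha * (1/|G|) + (1 - alpha) * sum of incoming mass
            double rank = ALPHA / NUM_NODES + (1 - ALPHA) * mass;
            output.collect(node, new DoubleWritable(rank));
        }
    }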
References
References I

[1] Adversarial Information Retrieval Workshop.
[2] Monica Bianchini, Marco Gori, and Franco Scarselli. Inside PageRank. ACM Transactions on Internet Technology, 2005.
[3] Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. Filtering: a method for solving graph problems in MapReduce. In Proc. of SPAA, 2011.
[4] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proc. of SIGKDD, 2005.
References II

[5] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Stanford Digital Library Working Paper, 1999.
[6] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The Hadoop Distributed File System. In Proc. of the 26th IEEE Symposium on Massive Storage Systems and Technologies (MSST). IEEE, 2010.
[7] Tom White. Hadoop: The Definitive Guide. O'Reilly, 2010.