MapReduce Tutorial
Pietro Michiardi
Eurecom
Introduction
What is MapReduce
A programming model:
- Inspired by functional programming
- Allows expressing distributed computations on massive amounts of data
An execution framework:
- Designed for large-scale data processing
- Designed to run on clusters of commodity hardware
Motivations
Big Ideas
For data-intensive workloads, a large number of commodity servers is preferred over a small number of high-end servers
- Cost of super-computers is not linear
- But datacenter efficiency is a difficult problem to solve [3, 5]
Some numbers:
- Data processed by Google every day: 20 PB
- Data processed by Facebook every day: 15 TB
Sharing is difficult:
- Synchronization, deadlocks
- Finite bandwidth to access data from a SAN
- Temporal dependencies are complicated (restarts)
Implications of Failures
Sources of failures:
- Hardware / software
- Preemption
- Unavailability of a resource due to overload
Failure types:
- Permanent
- Transient
Typically, data is collected "elsewhere" and copied to the distributed filesystem
Data-intensive applications:
- Read and process the whole Internet dataset from a crawler
- Read and process the whole Social Graph
Auxiliary components:
- Hadoop Pig
- Hadoop Hive
- Cascading
Seamless Scalability
Part One
MapReduce Framework
Preliminaries
- Decompose the original problem into smaller, parallel tasks
- Schedule tasks on workers distributed in a cluster, accounting for:
  - Data locality
  - Resource availability
- Ensure workers get the data they need
- Coordinate synchronization among workers
- Share partial results
- Handle failures
Programming Model
map phase:
- Given a list, map takes as an argument a function f (that takes a single argument) and applies it to all elements in the list
fold phase:
- Given a list, fold takes as arguments a function g (that takes two arguments) and an initial value
- g is first applied to the initial value and the first item in the list
- The result is stored in an intermediate variable, which is used as an input together with the next item to a second application of g
- The process is repeated until all items in the list have been consumed
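A minimal sketch of the two primitives in Java (using Java 8 streams as a stand-in for the functional pseudo-code; not part of the original slides): map squares each element, and reduce plays the role of fold.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapFoldDemo {
    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4, 5);

        // map: apply f (here, squaring) to every element of the list
        List<Integer> squares = xs.stream()
                                  .map(x -> x * x)
                                  .collect(Collectors.toList());

        // fold: repeatedly apply g (here, addition), starting from the
        // initial value 0 and consuming one list item per application
        int sumOfSquares = squares.stream().reduce(0, (acc, x) -> acc + x);

        System.out.println(squares + " -> " + sumOfSquares); // [1, 4, 9, 16, 25] -> 55
    }
}
```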
In practice:
- A user-specified computation is applied (in parallel) to all input records of a dataset
- Intermediate results are aggregated by another user-specified computation
The Framework
Data Structures
Key-value pairs are the basic data structure in MapReduce:
- In some algorithms, input keys are not used; in others, they uniquely identify a record
- Keys can be combined in complex ways to design various algorithms
A MapReduce job
- map: (k1, v1) -> [(k2, v2)]
- reduce: (k2, [v2]) -> [(k3, v3)]
Where the "magic" happens:
- Implicit between the map and reduce phases is a distributed "group by" operation on intermediate keys
- Intermediate data arrive at each reducer in order, sorted by the key
- No ordering is guaranteed across reducers
Output keys from reducers are written back to the distributed filesystem:
- The output may consist of r distinct files, where r is the number of reducers
- Such output may be the input to a subsequent MapReduce phase
Figure: Mappers are applied to all input key-value pairs, to generate an arbitrary number of intermediate pairs. Reducers are applied to all intermediate values associated with the same intermediate key. Between the map and reduce phase lies a barrier that involves a large distributed sort and group by.
Example: word count
The mapper:
- Takes an input key-value pair and tokenizes the document
- Emits intermediate key-value pairs: the word is the key and the integer is the value
The framework:
- Guarantees all values associated with the same key (the word) are brought to the same reducer
The reducer:
- Receives all values associated to some keys
- Sums the values and writes output key-value pairs: the key is the word and the value is the number of occurrences
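A sketch of this word count in Java, using the "old" org.apache.hadoop.mapred API that later slides in this tutorial refer to (Hadoop 0.20-era class names):

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

  // Mapper: tokenizes each input line and emits (word, 1) pairs
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, ONE);
      }
    }
  }

  // Reducer: sums all the counts associated with the same word
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }
}
```

Since the reducer's input and output types coincide, the same class can also serve as a combiner.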
Side effects:
- Not allowed in functional programming
- E.g.: preserving state across multiple inputs
- State is kept internal
The number of tasks may exceed the number of available machines in a cluster
The scheduler takes care of maintaining something similar to a queue of pending tasks to be assigned to machines with available resources
Data/code co-location
IMPORTANT: the reduce operation cannot start until all mappers have finished
- This is different from functional programming, which allows "lazy" aggregation
- In practice, a common optimization is for reducers to pull data from mappers as soon as they finish
Errors and faults
Using quite simple mechanisms, the MapReduce framework deals with:
Hardware failures:
- Individual machines: disks, RAM
- Networking equipment
- Power / cooling
Software failures:
- Exceptions, bugs
Hash-based partitioner:
- Computes the hash of the key modulo the number of reducers r
- This ensures a roughly even partitioning of the key space
- However, it ignores values: this can cause imbalance in the data processed by each reducer
- When dealing with complex keys, even the base partitioner may need customization
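This is essentially what Hadoop's default hash-based partitioner computes; a minimal sketch against the old mapred API (the class name here is illustrative):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Hash-based partitioning: assign a key to one of numReduceTasks reducers.
// Masking with Integer.MAX_VALUE keeps the result non-negative even when
// hashCode() returns a negative value.
public class HashLikePartitioner<K, V> implements Partitioner<K, V> {
  public void configure(JobConf job) {}

  public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
```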
Figure: Complete view of MapReduce illustrating combiners and partitioners. Note: in Hadoop, partitioners are executed before combiners.
Colocate data and computation!
- As dataset sizes increase, more computing capacity is required for processing
- As compute capacity grows, the link between the compute nodes and the storage nodes becomes a bottleneck
- One could eventually think of special-purpose interconnects for high-performance networking
- This is often a costly solution, as cost does not increase linearly with performance
Key idea: abandon the separation between compute and storage nodes
- This is exactly what happens in current implementations of the MapReduce framework
- A distributed filesystem is not mandatory, but highly desirable
Distributed filesystems
In this tutorial we will focus on HDFS, the Hadoop implementation of the Google distributed filesystem (GFS)
Master-slave architecture:
- NameNode (master): maintains the namespace (metadata, file-to-block mapping, location of blocks) and the overall health of the filesystem
- DataNodes (slaves): manage the data blocks
HDFS, an Illustration
File read operations:
1. Contact the NameNode to determine where the actual data is stored
2. The NameNode replies with block identifiers and locations (i.e., which DataNodes)
3. Contact the DataNodes to fetch the data
File write operations:
1. Contact the NameNode to update the namespace and verify permissions
2. The NameNode allocates a new block on a suitable DataNode
3. The client directly streams data to the selected DataNode
Currently, HDFS files are immutable
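A minimal sketch of the client side of a read, using the standard Hadoop FileSystem API; the NameNode/DataNode interactions above happen behind open(), and the input path is hypothetical:

```java
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsCat {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml
    FileSystem fs = FileSystem.get(conf);
    InputStream in = null;
    try {
      // open() asks the NameNode for block locations; the returned
      // stream then reads the blocks directly from the DataNodes
      in = fs.open(new Path(args[0]));          // e.g. /user/foo/data.txt
      IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
    }
  }
}
```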
HDFS Replication
Replication policy:
- Spread replicas across different racks
- Robust against cluster node failures
- Robust against rack failures
HDFS: more on operational assumptions
A small number of large files is preferred over a large number of small files:
- Metadata may explode
- Input splits for MapReduce are based on individual files:
  - Mappers are launched for every file
  - High startup costs
  - Inefficient shuffle and sort
Part Two
Hadoop MapReduce
Preliminaries
Hadoop Deployments
Terminology
MapReduce:
- Job: an execution of a Mapper and Reducer across a data set
- Task: an execution of a Mapper or a Reducer on a slice of data
- Task Attempt: an instance of an attempt to execute a task
Example:
- Running "Word Count" across 20 files is one job
- 20 files to be mapped = 20 map tasks + some number of reduce tasks
- At least 20 attempts will be performed... more if a machine crashes
Task attempts:
- Each task is attempted at least once, possibly more
- Multiple crashes on the same input imply discarding it
- Multiple attempts may occur in parallel (speculative execution)
- The Task ID from TaskInProgress is not a unique identifier
HDFS in details
Secondary NameNode:
- Merges the namespace with the edit log
- A useful trick to recover from a NameNode failure is to use the NFS copy of the metadata and switch the secondary to primary
DataNodes:
- They store data and talk to clients
- They report periodically to the NameNode the list of blocks they hold
External clients:
- For each block, the NameNode returns a set of DataNodes holding a copy thereof
- DataNodes are sorted according to their proximity to the client
MapReduce clients:
- TaskTrackers and DataNodes are colocated
- For each block, the NameNode usually returns the local DataNode
Details on replication:
- Clients ask the NameNode for a list of suitable DataNodes
- This list forms a pipeline: the first DataNode stores a copy of a block, then forwards it to the second, and so on
Replica placement:
- Tradeoff between reliability and bandwidth
- Default placement: the first copy is on the same node as the client, the second replica is off-rack, and the third replica is on the same rack as the second but on a different node
- Since Hadoop 0.21, replica placement can be customized
Hadoop I/O
What's next:
- Overview of what Hadoop offers
- For in-depth knowledge, use [11]
Data Integrity
- Every I/O operation on disks or the network may corrupt data
- Users expect data not to be corrupted during storage or processing
- Data integrity is usually achieved with checksums
Compression
SequenceFiles
- Provide a persistent data structure for binary key-value pairs
- Also work well as containers for smaller files, so that the framework is happier (remember: better few large files than lots of small files)
- They come with the sync() method to introduce sync points, which help manage InputSplits for MapReduce
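A sketch of writing a SequenceFile with the standard SequenceFile.createWriter() call (path and record contents are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs many small records into one container file: each append() writes
// a binary (key, value) pair, and the resulting file can later be split
// for MapReduce at the sync points.
public class SequenceFileWriteDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]);   // e.g. /user/foo/data.seq

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, path, IntWritable.class, Text.class);
    try {
      for (int i = 0; i < 100; i++) {
        writer.append(new IntWritable(i), new Text("record-" + i));
      }
    } finally {
      writer.close();
    }
  }
}
```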
Job Submission
JobClient class:
- The runJob() method creates a new instance of a JobClient
- Then it calls submitJob() on this class
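A driver sketch, reusing the mapper and reducer classes from the earlier word count example:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Configure the job and hand it to JobClient.runJob(), which submits the
// job and polls for progress until completion.
public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCountDriver.class);
    conf.setJobName("wordcount");

    conf.setMapperClass(WordCount.Map.class);      // from the earlier sketch
    conf.setReducerClass(WordCount.Reduce.class);  // from the earlier sketch
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);  // blocks until the job completes
  }
}
```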
Selecting a task:
- The JobTracker first needs to select a job (i.e., scheduling)
- TaskTrackers have a fixed number of slots for map and reduce tasks
- The JobTracker gives priority to map tasks (WHY?)
Data locality:
- The JobTracker is topology aware:
  - Useful for map tasks
  - Unused for reduce tasks
Handling Failures
In the real world, code is buggy, processes crash and machines fail.
Task failure:
- Case 1: a map or reduce task throws a runtime exception
  - The child JVM reports back to the parent TaskTracker
  - The TaskTracker logs the error and marks the TaskAttempt as failed
  - The TaskTracker frees up a slot to run another task
- Note: with streaming, you need to take care of the orphaned process; an exception is made for speculative execution
JobTracker failure:
- Currently, Hadoop has no mechanism for this kind of failure
- In future releases:
  - Multiple JobTrackers
  - Use ZooKeeper as a coordination mechanism
Fair Scheduler:
- Every user gets a fair share of the cluster capacity over time
- Jobs are placed into pools, one for each user
  - Users that submit more jobs get no more resources than others
  - Can guarantee a minimum capacity per pool
Capacity Scheduler:
- Hierarchical queues (mimic an organization)
- FIFO scheduling within each queue
- Supports priorities
Shuffle and Sort
- The MapReduce framework guarantees the input to every reducer to be sorted by key
- The process by which the system sorts and transfers map outputs to reducers is known as the shuffle
- The shuffle is the most important part of the framework, where the "magic" happens
- A good understanding allows optimizing both the framework and the execution time of MapReduce jobs
Shuffle and Sort: the Map Side
The output of a map task is not simply written to disk:
- In-memory buffering
- Pre-sorting
Disk spills:
- Written in round-robin fashion to a local directory
- Output data is partitioned corresponding to the reducers they will be sent to
- Within each partition, data is sorted (in memory)
- Optionally, if there is a combiner, it is executed just after the sort phase
Shuffle and Sort: the Reduce Side
- The map output file is located on the local disk of the TaskTracker that ran the map task
- Another TaskTracker (in charge of a reduce task) requires input from many other TaskTrackers (that finished their map tasks)
How do reducers know which TaskTrackers to fetch map output from?
- When a map task finishes, it notifies the parent TaskTracker
- The TaskTracker notifies the JobTracker (via the heartbeat mechanism)
- A thread in the reducer periodically polls the JobTracker
- TaskTrackers do not delete local map output as soon as a reduce task has fetched it (WHY?)
- The map outputs are copied to the TaskTracker running the reducer in memory (if they fit); otherwise, they are copied to disk
Input consolidation:
- A background thread merges all partial inputs into larger, sorted files
- Note that if compression was used (for map outputs, to save bandwidth), decompression will take place in memory
MapReduce Types
- Key types implement WritableComparable
- Value types implement Writable
What is a Writable
- Hadoop defines its own classes for strings (Text), integers (IntWritable), etc.
- All keys are instances of WritableComparable
- Why comparable?
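A sketch of a custom key type (the class and its fields are hypothetical): write()/readFields() define the compact binary wire format, while compareTo() defines the sort order used during the shuffle, which is why keys must be comparable:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class YearTemperaturePair
    implements WritableComparable<YearTemperaturePair> {
  private int year;
  private int temperature;

  // Serialize the fields to Hadoop's binary wire format
  public void write(DataOutput out) throws IOException {
    out.writeInt(year);
    out.writeInt(temperature);
  }

  // Deserialize the fields, in the same order they were written
  public void readFields(DataInput in) throws IOException {
    year = in.readInt();
    temperature = in.readInt();
  }

  // Sort order: by year first, then by temperature
  public int compareTo(YearTemperaturePair o) {
    int cmp = Integer.compare(year, o.year);
    return (cmp != 0) ? cmp : Integer.compare(temperature, o.temperature);
  }
}
```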
- Each split is divided into records, and the map processes each record (a key-value pair) in turn
- Splits and records are logical: they are not physically bound to a file
KeyValueTextInputFormat:
- Maps newline-terminated text lines of "key SEPARATOR value"
SequenceFileInputFormat:
- Binary file of key-value pairs with some additional metadata
SequenceFileAsTextInputFormat:
- Same as before, but maps (k.toString(), v.toString())
FileInputFormat reads all files out of a specified directory and sends them to the mapper
Record Readers
KeyValueRecordReader:
- Used by KeyValueTextInputFormat
- Any (WritableComparable, Writable) pair can be used
- By default, the mapper output types are assumed to be the same as the reducer output types
WritableComparator
Partitioner
The Reducer
void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output, Reporter reporter)
- Keys and values sent to one partition all go to the same reduce task
- Calls are sorted by key: "early" keys are reduced and output before "late" keys
Analogous to InputFormat:
- TextOutputFormat writes "key value <newline>" strings to the output file
- SequenceFileOutputFormat uses a binary format to pack key-value pairs
- NullOutputFormat discards output
Configuration
Before writing a MapReduce program, we need to set up and configure the development environment:
- Components in Hadoop are configured with an ad hoc API
- The Configuration class is a collection of properties and their values
- Resources can be combined into a configuration
Alternatives:
- Switching configurations (local, cluster)
- The "alternatives" mechanism (see the Cloudera documentation for Ubuntu) is very effective
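A minimal sketch of combining resources (file and property names are illustrative; properties defined in later resources override earlier ones):

```java
import org.apache.hadoop.conf.Configuration;

public class ConfigurationDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.addResource("configuration-default.xml");  // base settings
    conf.addResource("configuration-site.xml");     // site-specific overrides

    // Read back properties, with inline defaults as a fallback
    String color = conf.get("color", "unknown");
    int weight = conf.getInt("weight", 0);
    System.out.println(color + " / " + weight);
  }
}
```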
Cluster Execution
- Packaging
- Launching a Job
- The WebUI
- Hadoop Logs
- Running Dependent Jobs, and Oozie
Hadoop Deployments
Cluster deployment:
- Private cluster
- Cloud-based cluster
- AWS Elastic MapReduce
Outlook:
- Cluster specification: hardware, network topology
- Hadoop configuration
- Memory considerations
Commodity ≠ high-end:
- High-end machines perform better, which would imply a smaller cluster
- But then a single machine failure would compromise a large fraction of the cluster
A 2010 specification:
- 2 quad-core CPUs
- 16-24 GB ECC RAM
- 4 × 1 TB SATA disks
- Gigabit Ethernet
Cluster Specification
Typical configuration:
- 30-40 servers per rack
- A 1 Gb switch per rack
- Core switch or router with 1 Gb/s or better
Features:
- Aggregate bandwidth between nodes on the same rack is much larger than for nodes on different racks
- Rack awareness: Hadoop should know the cluster topology
- This benefits both HDFS (data placement) and MapReduce (locality)
Hadoop Configuration
There are a handful of files for controlling the operation of a Hadoop cluster; see the summary table below (file names as in [11]).
File             Format                     Description
hadoop-env.sh    Bash script                Environment variables that are used in the scripts to run Hadoop
core-site.xml    Hadoop configuration XML   I/O settings that are common to HDFS and MapReduce
hdfs-site.xml    Hadoop configuration XML   Settings for the namenode, the secondary namenode, and the datanodes
mapred-site.xml  Hadoop configuration XML   Settings for the jobtracker and the tasktrackers
masters          Plain text                 A list of machines that each run a secondary namenode
slaves           Plain text                 A list of machines that each run a datanode and a tasktracker
- All the moving parts of Hadoop (HDFS and MapReduce) can be individually configured
- This is true for cluster-wide configuration, but also for job-specific configurations
Example: launch a cluster named test-hadoop-cluster, with one master node (JobTracker and NameNode) and 5 worker nodes (DataNodes and TaskTrackers):
hadoop-ec2 launch-cluster test-hadoop-cluster 5
See the project webpage and Chapter 9, page 290 of [11]
Hadoop as a service:
- Amazon handles everything, which becomes transparent to the user
- How this is done remains a mystery
Part Three
Algorithm Design
Preliminaries
Learn by examples:
- Design patterns
- Synchronization is perhaps the most tricky aspect
Algorithm Design
Aspects that are not under the control of the designer:
- Where a mapper or reducer will run
- When a mapper or reducer begins or finishes
- Which input key-value pairs are processed by a specific mapper
- Which intermediate key-value pairs are processed by a specific reducer
Optimizations:
- Scalability (linear)
- Resource requirements (storage and bandwidth)
Outline:
- Local Aggregation
- Pairs and Stripes
- Order Inversion
- Graph Algorithms
Local Aggregation
In the context of data-intensive distributed processing, the most important aspect of synchronization is the exchange of intermediate results:
- This involves copying intermediate results from the processes that produced them to those that consume them
- In general, this involves data transfers over the network
- In Hadoop, disk I/O is also involved, as intermediate results are written to disk
Combiners
Combiners are a general mechanism to reduce the amount of intermediate data:
- They could be thought of as "mini-reducers"
- Note: due to the Zipfian nature of term distributions, not all mappers will see all terms
In-Mapper Combiners
With regular combiners, mappers still need to emit all key-value pairs; combiners only reduce network traffic
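A sketch of the in-mapper combining pattern applied to word count (old mapred API): counts are buffered in a plain HashMap across all input records, and flushed once in close(), after the mapper has consumed its whole split:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class InMapperCombiningWordCount extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final HashMap<String, Integer> counts = new HashMap<String, Integer>();
  private OutputCollector<Text, IntWritable> out;

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    out = output;  // keep a handle for close()
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      String w = itr.nextToken();
      Integer c = counts.get(w);
      counts.put(w, c == null ? 1 : c + 1);  // aggregate across records
    }
  }

  // Called once after all input records: flush the aggregated counts
  public void close() throws IOException {
    if (out == null) return;
    for (java.util.Map.Entry<String, Integer> e : counts.entrySet()) {
      out.collect(new Text(e.getKey()), new IntWritable(e.getValue()));
    }
  }
}
```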
Scalability bottleneck:
- The in-mapper combining technique strictly depends on having sufficient memory to store intermediate results
- And you don't want the OS to deal with swapping
- Multiple threads compete for the same resources
- A possible solution: "block and flush", implemented with a simple counter
Further Remarks
- The extent to which efficiency can be increased with local aggregation depends on the size of the intermediate key space
- Opportunities for aggregation arise when multiple values are associated with the same keys
Algorithmic correctness with local aggregation
The use of combiners must be thought through carefully:
- In Hadoop, they are optional: the correctness of the algorithm cannot depend on the computation (or even the execution) of the combiners
- In MapReduce, the reducer input key-value types must match the mapper output key-value types
- Hence, for combiners, both input and output key-value types must match the output key-value types of the mapper
Pairs and Stripes
Next, we focus on a particular problem that benefits from these two methods
Problem statement
The problem: building word co-occurrence matrices for large corpora
- The co-occurrence matrix of a corpus is a square n × n matrix, where n is the number of unique words (i.e., the vocabulary size)
- A cell m_ij contains the number of times word w_i co-occurs with word w_j within a specific context
- Context: a sentence, a paragraph, a document, or a window of m words
- NOTE: the matrix may be symmetric in some cases
Motivation:
- This problem is a basic building block for more complex operations
- Estimating the distribution of discrete joint events from a large number of observations
- Similar problems arise in other domains: "customers who buy this tend to also buy that"
Compression
- Compression techniques can help in solving the problem on a single machine
- However, there are scalability problems
The Pairs approach
The mapper:
- Processes each input document
- Emits key-value pairs with:
  - Each co-occurring word pair as the key
  - The integer one (the count) as the value
The reducer:
- Receives pairs relative to co-occurring words (this requires modifying the partitioner)
- Computes an absolute count of the joint event
- Emits the pair and the count as the final key-value output
- Basically, reducers emit the cells of the matrix
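A sketch of the pairs mapper; for brevity, the "context" here is a line of text and the pair is encoded as a Text key, where a production version would use a custom WritableComparable pair type:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Pairs approach: two nested loops over the terms of the context; each
// co-occurring (wi, wj) pair becomes a key, with count one as the value.
public class PairsMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    String[] terms = value.toString().split("\\s+");
    for (int i = 0; i < terms.length; i++) {
      for (int j = 0; j < terms.length; j++) {
        if (i == j) continue;                  // skip self co-occurrence
        output.collect(new Text(terms[i] + ":" + terms[j]), ONE);
      }
    }
  }
}
```

The reducer is the same summing reducer as in word count.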
The Stripes approach
The mapper:
- Same two nested loops structure as before
- Co-occurrence information is first stored in an associative array
- Emits key-value pairs with words as keys and the corresponding arrays as values
The reducer:
- Receives all associative arrays related to the same word
- Performs an element-wise sum of all associative arrays with the same key
- Emits key-value output in the form of (word, associative array)
- Basically, reducers emit rows of the co-occurrence matrix
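A sketch of the stripes mapper, using Hadoop's MapWritable as the associative array:

```java
import java.io.IOException;
import java.util.HashMap;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Stripes approach: for each term wi, build an associative array
// { wj -> count } of its co-occurring terms, and emit (wi, stripe).
public class StripesMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, MapWritable> {

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, MapWritable> output, Reporter reporter)
      throws IOException {
    String[] terms = value.toString().split("\\s+");
    for (int i = 0; i < terms.length; i++) {
      HashMap<String, Integer> stripe = new HashMap<String, Integer>();
      for (int j = 0; j < terms.length; j++) {
        if (i == j) continue;
        Integer c = stripe.get(terms[j]);
        stripe.put(terms[j], c == null ? 1 : c + 1);
      }
      MapWritable mw = new MapWritable();
      for (java.util.Map.Entry<String, Integer> e : stripe.entrySet()) {
        mw.put(new Text(e.getKey()), new IntWritable(e.getValue()));
      }
      output.collect(new Text(terms[i]), mw);
    }
  }
}
```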
Tradeoffs of the stripes approach:
- The values are more complex and have serialization/deserialization overhead
- Greatly benefits from combiners, as the key space is the vocabulary
- Suffers from memory paging problems, if not properly engineered
Order Inversion
Computing relative frequencies: f(w_j | w_i) = N(w_i, w_j) / Σ_{w'} N(w_i, w')
- N(·, ·) is the number of times a co-occurring word pair is observed
- The denominator is called the marginal
Computing relative frequencies: a basic approach
- We must define the sort order of the pair, so that keys are first sorted by the left word, and then by the right word
- Hence, we can detect if all pairs associated with the word we are conditioning on (w_i) have been seen
- At this point, we can use an in-memory buffer, compute the relative frequencies, and emit
- What we want is that all pairs with the same left word are sent to the same reducer
Limitation of this approach: the reducer needs to buffer, in memory, all the pairs associated with the word being conditioned on
Computing relative frequencies: order inversion
Recall that mappers emit pairs of co-occurring words as keys.
The mapper:
- Additionally emits a "special" key of the form (w_i, *)
- The value associated with the special key is one, and represents the contribution of the word pair to the marginal
- Using combiners, these partial marginal counts will be aggregated before being sent to the reducers
The reducer:
- We must make sure that the special key-value pairs are processed before any other key-value pairs where the left word is w_i
- We also need to modify the partitioner as before, i.e., it should take into account only the first word
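A sketch of the reducer side of order inversion, under the two assumptions stated above: keys are encoded so that the special pair (w_i, *) sorts before every (w_i, w_j), and a custom partitioner assigns keys to reducers based on the left word only:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Keys are Text pairs "wi:wj", with the special key "wi:*"; since '*'
// sorts before letters, the marginal arrives first for each wi.
public class RelativeFrequencyReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, DoubleWritable> {

  private double marginal = 0.0;  // N(wi, .), carried across reduce() calls

  public void reduce(Text pair, Iterator<IntWritable> values,
                     OutputCollector<Text, DoubleWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) sum += values.next().get();

    if (pair.toString().endsWith(":*")) {
      marginal = sum;                      // store the marginal for wi
    } else {
      // N(wi, wj) / N(wi, .): relies on "wi:*" being processed first
      output.collect(pair, new DoubleWritable(sum / marginal));
    }
  }
}
```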
Memory requirements:
- Minimal, because only the marginal (an integer) needs to be stored
- No buffering of individual co-occurring words
- No scalability bottleneck
Graph Algorithms
Graph Representations
Parallel Breadth-First Search
The pseudo-code
- We use n to denote the node id (an integer)
- We use N to denote the node's adjacency list and current distance
- The algorithm works by mapping over all nodes
- Mappers emit a key-value pair for each neighbor on the node's adjacency list
  - The key: the node id of the neighbor
  - The value: the current distance to the node plus one
  - If we can reach node n with a distance d, then we must be able to reach all the nodes connected to n with distance d + 1
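A sketch of one BFS iteration; node records are encoded as "distance|comma-separated adjacency list" in Text values, and the driver would re-run this job until distances stop changing (convergence check omitted):

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class ParallelBfs {

  public static class Map extends MapReduceBase
      implements Mapper<IntWritable, Text, IntWritable, Text> {
    public void map(IntWritable nodeId, Text nodeData,
                    OutputCollector<IntWritable, Text> output, Reporter reporter)
        throws IOException {
      String[] parts = nodeData.toString().split("\\|");
      int d = Integer.parseInt(parts[0]);

      output.collect(nodeId, nodeData);  // pass the graph structure along

      if (d == Integer.MAX_VALUE || parts.length < 2) return;  // undiscovered
      for (String m : parts[1].split(",")) {
        if (m.isEmpty()) continue;
        // reachable at distance d implies neighbors reachable at d + 1
        output.collect(new IntWritable(Integer.parseInt(m)),
                       new Text(Integer.toString(d + 1)));
      }
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<IntWritable, Text, IntWritable, Text> {
    public void reduce(IntWritable nodeId, Iterator<Text> values,
                       OutputCollector<IntWritable, Text> output, Reporter reporter)
        throws IOException {
      int best = Integer.MAX_VALUE;
      String adjacency = "";
      while (values.hasNext()) {
        String v = values.next().toString();
        if (v.contains("|")) {                 // the node's structure record
          String[] parts = v.split("\\|");
          adjacency = parts.length > 1 ? parts[1] : "";
          best = Math.min(best, Integer.parseInt(parts[0]));
        } else {
          best = Math.min(best, Integer.parseInt(v));  // a tentative distance
        }
      }
      output.collect(nodeId, new Text(best + "|" + adjacency));
    }
  }
}
```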
MapReduce iterations:
- The first time we run the algorithm, we discover all nodes connected to the source
- The second iteration, we discover all nodes connected to those
- Each iteration expands the "search frontier" by one hop
- How many iterations before convergence?
Extensions:
- Storing the actual shortest path
- Weighted edges (as opposed to unit distance)
PageRank
The PageRank of a page n:
P(n) = α (1/|G|) + (1 - α) Σ_{m ∈ L(n)} P(m)/C(m)
where:
- |G| is the number of nodes in the graph
- α is the random jump factor
- L(n) is the set of pages that link to n
- C(m) is the out-degree of node m
PageRank in MapReduce
The reducer updates the value of the PageRank of every single node
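A simplified sketch of one PageRank iteration (same "value|adjacency" record encoding as the BFS sketch; the random jump factor and dangling nodes are ignored for brevity):

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SimplePageRank {

  public static class Map extends MapReduceBase
      implements Mapper<IntWritable, Text, IntWritable, Text> {
    public void map(IntWritable nodeId, Text nodeData,
                    OutputCollector<IntWritable, Text> output, Reporter reporter)
        throws IOException {
      String[] parts = nodeData.toString().split("\\|");
      double rank = Double.parseDouble(parts[0]);

      // pass the graph structure along, tagged with a '#' marker
      output.collect(nodeId, new Text("#" + (parts.length > 1 ? parts[1] : "")));

      if (parts.length > 1 && !parts[1].isEmpty()) {
        String[] adjacency = parts[1].split(",");
        double share = rank / adjacency.length;     // P(n) / C(n)
        for (String m : adjacency) {
          output.collect(new IntWritable(Integer.parseInt(m)),
                         new Text(Double.toString(share)));
        }
      }
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<IntWritable, Text, IntWritable, Text> {
    public void reduce(IntWritable nodeId, Iterator<Text> values,
                       OutputCollector<IntWritable, Text> output, Reporter reporter)
        throws IOException {
      double sum = 0.0;              // sum of incoming rank contributions
      String adjacency = "";
      while (values.hasNext()) {
        String v = values.next().toString();
        if (v.startsWith("#")) adjacency = v.substring(1);  // structure record
        else sum += Double.parseDouble(v);
      }
      output.collect(nodeId, new Text(sum + "|" + adjacency));
    }
  }
}
```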
References

[1] Adversarial information retrieval workshop.
[2] Michele Banko and Eric Brill. Scaling to very very large corpora for natural language disambiguation. In Proc. of the 39th Annual Meeting of the Association for Computational Linguistics (ACL), 2001.
[3] Luiz Andre Barroso and Urs Holzle. The datacenter as a computer: An introduction to the design of warehouse-scale machines. Morgan & Claypool Publishers, 2009.
[4] Monica Bianchini, Marco Gori, and Franco Scarselli. Inside PageRank. In ACM Transactions on Internet Technology, 2005.
[5] James Hamilton. Cooperative expendable micro-slice servers (CEMS): Low cost, low power servers for internet-scale services. In Proc. of the 4th Biennial Conference on Innovative Data Systems Research (CIDR), 2009.
[6] Tony Hey, Stewart Tansley, and Kristin Tolle. The fourth paradigm: Data-intensive scientific discovery. Microsoft Research, 2009.
[7] Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. Filtering: a method for solving graph problems in MapReduce. In Proc. of SPAA, 2011.
[8] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proc. of SIGKDD, 2005.
[9] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. In Stanford Digital Library Working Paper, 1999.
[10] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The Hadoop distributed file system. In Proc. of the 26th IEEE Symposium on Massive Storage Systems and Technologies (MSST). IEEE, 2010.
[11] Tom White. Hadoop, The Definitive Guide. O'Reilly, Yahoo Press, 2010.