BDP 2024 08
Jiaul Paik
Lecture 8
Programming on Hadoop
Map-reduce Model
Example Problem: Counting Words
• Sample application
• Case 1:
• Files too large for memory, but all <word, count> pairs fit in memory
• You can create a big string array OR you can create a hash table
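A minimal sketch of Case 1, assuming words are separated by whitespace and the input is read one line at a time (the function name `count_words` and the sample text are illustrative, not from the lecture). Only the hash table of <word, count> pairs is held in memory, never the whole file:

```python
import io
from collections import Counter

def count_words(lines):
    """Stream the input line by line; only the <word, count> table stays in memory."""
    counts = Counter()              # hash table: word -> count
    for line in lines:
        counts.update(line.split()) # assumption: whitespace-separated words
    return counts

# A small in-memory stand-in for a large file opened with open(path).
sample = io.StringIO("deep learning\ndeep neural networks\n")
counts = count_words(sample)
print(counts["deep"], counts["networks"])
```

A hash table gives constant-time lookup per word on average, whereas a plain string array would need a search or a sort to find an existing word.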
Word Count
• Case 2: The <word, count> pairs do not all fit in memory, but they fit on disk
• Map: extract something you care about (here, word and count)
• Group by key: sort and shuffle
• Reduce: aggregate, summarize
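One way to picture Case 2 on a single machine is an external sort: spill sorted batches of words to disk, merge the sorted runs, and count each run of equal words. The sketch below is an illustration under stated assumptions, not the Hadoop implementation; `max_words_in_memory`, `spill_sorted_run`, and `external_word_count` are hypothetical names, and the tiny batch size only forces spilling in the example.

```python
import heapq
import os
import tempfile
from itertools import groupby

def spill_sorted_run(words, run_dir):
    """Sort one memory-sized batch of words and write it to a run file on disk."""
    fd, path = tempfile.mkstemp(dir=run_dir, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for word in sorted(words):
            f.write(word + "\n")
    return path

def read_run(path):
    """Yield the words of one sorted run file, one at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

def external_word_count(lines, run_dir, max_words_in_memory=4):
    runs, batch = [], []
    for line in lines:
        for word in line.split():               # map: extract the words
            batch.append(word)
            if len(batch) >= max_words_in_memory:
                runs.append(spill_sorted_run(batch, run_dir))
                batch = []
    if batch:
        runs.append(spill_sorted_run(batch, run_dir))
    # group by key: merging sorted runs brings equal words together
    merged = heapq.merge(*(read_run(p) for p in runs))
    for word, group in groupby(merged):
        yield word, sum(1 for _ in group)       # reduce: count each group

lines = ["deep learning", "deep neural networks", "deep belief networks"]
with tempfile.TemporaryDirectory() as run_dir:
    result = dict(external_word_count(lines, run_dir))
print(result)
```

At any moment, memory holds at most one batch plus one word per run during the merge, so the total number of distinct words may exceed memory.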
Summary
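[Figure: the map step. Inputs labelled c1, c2, c3 (and f1, f2, f3) each pass through a map call, producing key-value pairs (k1, v1), (k2, v2), (k3, v3), (k4, v4), …]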
MapReduce: Reduce Step
• Key-value pairs (produced by Map step): (k1, v1), (k2, v2), (k1, v3), (k3, v4), (k2, v5), (k1, v6), …
• Group by key → key-value groups: (k1: v1, v3, v6), (k2: v2, v5), (k3: v4), …
• reduce (one call per group) → output key-value pairs: (k1, v′), (k2, v″), (k3, v‴), …
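The grouping and reduce step can be sketched in a few lines, using the same keys and values as the diagram. The helper names `group_by_key` and `reduce_groups` are illustrative. The reduce function used here (counting the values per key) is only one possible choice, since the slide leaves v′, v″, v‴ unspecified.

```python
from collections import defaultdict

def group_by_key(pairs):
    """Collect all values that share a key, preserving arrival order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_groups(groups, reduce_fn):
    """Call reduce_fn once per key with that key's list of values."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}

pairs = [("k1", "v1"), ("k2", "v2"), ("k1", "v3"),
         ("k3", "v4"), ("k2", "v5"), ("k1", "v6")]
groups = group_by_key(pairs)
# Illustrative reduce: the output value is the number of values for the key.
output = reduce_groups(groups, lambda key, values: len(values))
print(groups)
print(output)
```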
Map-reduce: Word Count
• MAP (provided by the programmer): read input and produce a set of key-value pairs
• Group by key (handled by the MR system): collect all pairs with the same key
• Reduce (provided by the programmer): collect all values belonging to the key and output
Big document: "deep learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks and convolutional neural nets have been applied to fields including computer vision …"
Map output (key, value): (deep, 1), (learning, 1), (architectures, 1), (such, 1), (as, 1), (…, 1), (networks, 1), (the, 1), (reinforcement, 1), (and, 1), (vision, 1), …
After group by key (key, value): (deep, 1), (deep, 1), (networks, 1), (networks, 1), (networks, 1), (the, 1), (the, 1), (the, 1), (reinforcement, 1), (reinforcement, 1), (vision, 1), (vision, 1), …
Reduce output (key, value): (deep, 2), (networks, 3), (the, 3), …
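The division of labour above can be sketched as follows: `map_fn` and `reduce_fn` are the two pieces the programmer would supply, while `run_word_count` is a single-process stand-in for the grouping that the MR system performs. All three names are hypothetical, and whitespace tokenisation is an assumption.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(document):
    """Programmer-provided: emit (word, 1) for every word in the input."""
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Programmer-provided: sum all the counts collected for one word."""
    return (word, sum(counts))

def run_word_count(documents):
    """Stand-in for the MR system: run map, group by key, then run reduce."""
    intermediate = [pair for doc in documents for pair in map_fn(doc)]
    intermediate.sort(key=itemgetter(0))        # sort so equal keys are adjacent
    return [reduce_fn(word, (count for _, count in group))
            for word, group in groupby(intermediate, key=itemgetter(0))]

result = run_word_count(["deep neural networks", "deep belief networks"])
print(result)
```

In a real cluster the map calls run in parallel on different chunks and the grouping involves moving data between machines, but the two programmer-supplied functions keep this shape.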
Map-Reduce Execution
Detailed Look
Map-reduce System: Inside Look
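[Figure: map task 1, map task 2 and map task 3 each run several map calls (M); every map call emits key-value pairs, shown as (), which are routed to the reduce calls (R). The diagram also carries the label "m-1".]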
Shuffle and Sort
[Figure, map side: input data (chunk) → map → buffer in memory → partitions → merge on disk (on local disk)]
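The "partitions" stage can be illustrated with a small sketch that splits map output so that every pair with the same key lands in the same partition, one partition per reducer. Hashing the key modulo the number of reducers is an assumption made for illustration; `partition_for`, `partition_map_output`, and `num_reducers` are hypothetical names, and `zlib.crc32` is used only because it gives the same value on every run.

```python
import zlib
from collections import defaultdict

def partition_for(key, num_reducers):
    """Pick a partition from a deterministic hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def partition_map_output(pairs, num_reducers):
    """Split buffered map output into per-reducer partitions, sorted by key."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return {p: sorted(kv) for p, kv in partitions.items()}

buffered = [("deep", 1), ("the", 1), ("deep", 1), ("vision", 1)]
parts = partition_map_output(buffered, num_reducers=3)
print(parts)
```

Because the partition depends only on the key, a reducer that fetches its partition from every map task sees all values for each of its keys.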