Lecture 03
Map
• Process a large number of individual records in parallel to generate
intermediate key/value pairs.
Input <filename, file text>:
Welcome Everyone
Hello Everyone
Intermediate key/value pairs:
Welcome 1
Everyone 1
Hello 1
Everyone 1
MAP TASK 1: Welcome Everyone -> Welcome 1, Everyone 1
MAP TASK 2: Hello Everyone -> Hello 1, Everyone 1
MAP TASKS
Input lines:
Why are you here
I am also here
They are also here
Yes, it’s THEM!
The same people we were thinking of
…….
Intermediate key/value pairs:
Everyone 1
Why 1
Are 1
You 1
Here 1
…….
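The map step above can be sketched in a few lines of Python. This is only an illustration of the word-count example on the slide, not a real MapReduce API: the function name `map_word_count` and the filename `"doc"` are assumptions made for the example.

```python
from typing import Iterator, List, Tuple


def map_word_count(filename: str, text: str) -> Iterator[Tuple[str, int]]:
    """Emit one <word, 1> pair for every word in the input record."""
    for word in text.split():
        yield (word, 1)


# Each <filename, file text> record could be handled by a separate map task;
# here they are processed one after another. "doc" is a hypothetical filename.
records: List[Tuple[str, str]] = [
    ("doc", "Welcome Everyone"),
    ("doc", "Hello Everyone"),
]
intermediate = [
    pair for name, line in records for pair in map_word_count(name, line)
]
print(intermediate)
# [('Welcome', 1), ('Everyone', 1), ('Hello', 1), ('Everyone', 1)]
```

Because each record is mapped independently of the others, the calls can run on different machines without coordination.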
Reduce
• Reduce processes and merges all intermediate
values associated with each key
Input to Reduce (Key Value):
Welcome 1
Everyone 1
Hello 1
Everyone 1
Output of Reduce (Key Value):
Everyone 2
Hello 1
Welcome 1
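A minimal Python sketch of the reduce step for the same word-count example. The helper names `group_by_key` and `reduce_word_count` are assumptions for illustration; in a real framework the grouping is done by the shuffle phase rather than by user code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect all intermediate values that share the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_word_count(key: str, values: List[int]) -> Tuple[str, int]:
    """Merge all values for one key into a single count."""
    return (key, sum(values))


intermediate = [("Welcome", 1), ("Everyone", 1), ("Hello", 1), ("Everyone", 1)]
counts = sorted(
    reduce_word_count(key, values)
    for key, values in group_by_key(intermediate).items()
)
print(counts)
# [('Everyone', 2), ('Hello', 1), ('Welcome', 1)]
```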
Reduce
• Each key is assigned to one Reduce task
• Processes and merges all intermediate values in parallel by partitioning keys
REDUCE TASK 1: Everyone 1, Everyone 1 -> Everyone 2
REDUCE TASK 2: Hello 1, Welcome 1 -> Hello 1, Welcome 1
• Popular: hash partitioning, i.e., a key is assigned to
reduce # = hash(key) % number of reduce servers
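Hash partitioning can be sketched as follows. The function name `partition` is hypothetical, and `zlib.crc32` is used here as a stand-in for hash(key): Python's built-in `hash()` on strings is randomized per process, so it would not give the same reduce # on every machine.

```python
import zlib
from typing import Dict, List


def partition(key: str, num_reducers: int) -> int:
    """Return the reduce # for a key: hash(key) % number of reduce servers."""
    # crc32 is an assumed stand-in for the hash function; any hash that is
    # stable across processes would serve the same purpose.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


keys: List[str] = ["Welcome", "Everyone", "Hello", "Everyone"]
assignment: Dict[str, int] = {key: partition(key, 2) for key in keys}

# Both occurrences of "Everyone" map to the same reduce task, so one task
# sees every value for that key.
print(assignment)
```

The exact task each key lands on depends on the hash function, so it will not necessarily match the assignment drawn on the slide.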
N.B.: Grep is used to search for a string pattern in a file. MapReduce is a programming
model which takes care.
Some Applications of MapReduce (2)
Reverse Web-Link Graph
– Input: Web graph: tuples (a, b) where (page a → page b)
– Output: For each page, list of pages that link to it
– Map – processes the web log and, for each input <source, target>, outputs
<target, source>
– Reduce – emits <target, list(source)>
Reverse Web-Link Graph: The map function outputs <target, source> pairs for each link to a target URL
found in a page named “source”. The reduce function concatenates the list of all source URLs associated
with a given target URL and emits the pair <target, list(source)>.
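The reverse web-link graph described above, as a small Python sketch. The function names and the three-link sample graph are made up for illustration, and the grouping loop stands in for the framework's shuffle phase.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_reverse_link(source: str, target: str) -> Iterator[Tuple[str, str]]:
    """For each link <source, target>, emit <target, source>."""
    yield (target, source)


def reduce_reverse_link(target: str, sources: List[str]) -> Tuple[str, List[str]]:
    """Concatenate all sources that link to one target."""
    return (target, sorted(sources))


# Hypothetical web graph: a -> b, c -> b, a -> c
links = [("a", "b"), ("c", "b"), ("a", "c")]

groups: Dict[str, List[str]] = defaultdict(list)
for source, target in links:
    for key, value in map_reverse_link(source, target):
        groups[key].append(value)

reversed_graph = dict(
    reduce_reverse_link(target, sources) for target, sources in groups.items()
)
print(reversed_graph)
# {'b': ['a', 'c'], 'c': ['a']}
```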
Some Applications of MapReduce (3)
Count of URL access frequency
– Input: Log of accessed URLs, e.g., from proxy server
– Output: For each URL, % of total accesses for that URL
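One possible way to get from the log to percentages is two passes: first count accesses per URL, then divide each count by the overall total. The slide does not say how the job is structured, so this two-pass layout, the function names, and the sample log are all assumptions.

```python
from collections import Counter
from typing import Dict, List


def count_accesses(log: List[str]) -> Dict[str, int]:
    """Pass 1: map emits <URL, 1>, reduce sums the 1s per URL."""
    counts: Counter = Counter()
    for url in log:
        counts[url] += 1  # stands in for emitting <url, 1> and summing
    return dict(counts)


def to_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Pass 2: divide each URL count by the overall total."""
    total = sum(counts.values())
    return {url: 100.0 * n / total for url, n in counts.items()}


# Hypothetical proxy log of accessed URLs
log = ["/home", "/about", "/home", "/home"]
percentages = to_percentages(count_accesses(log))
print(percentages)
# {'/home': 75.0, '/about': 25.0}
```

The second pass needs the total over all URLs, which is why a single map and reduce per URL is not enough on its own.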
Sort
– Input: Series of (key, value) pairs
– Output: Sorted <value>s
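A hedged sketch of sorting with map and reduce, assuming the map emits each value as the intermediate key and the framework hands keys to the reduce in sorted order. The slide does not spell this out, so the approach and the names below are assumptions; `list.sort` stands in for the framework's shuffle.

```python
from itertools import groupby
from typing import Iterator, List, Tuple


def map_sort(key: str, value: int) -> Iterator[Tuple[int, None]]:
    """Emit the value as the intermediate key so ordering is done on it."""
    yield (value, None)


def reduce_sort(key: int, values: List[None]) -> Iterator[int]:
    """Identity reduce: emit the key once per occurrence."""
    for _ in values:
        yield key


# Hypothetical input series of (key, value) pairs
pairs = [("k1", 42), ("k2", 7), ("k3", 19), ("k4", 7)]

intermediate = [kv for k, v in pairs for kv in map_sort(k, v)]
intermediate.sort(key=lambda kv: kv[0])  # stand-in for the sorted shuffle

sorted_values: List[int] = []
for key, group in groupby(intermediate, key=lambda kv: kv[0]):
    sorted_values.extend(reduce_sort(key, [v for _, v in group]))

print(sorted_values)
# [7, 7, 19, 42]
```

With more than one reduce task, hash partitioning would scatter neighbouring values across tasks, so a globally sorted output would need keys partitioned by range instead.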