3.4 Map Scheduler
Map
Input <filename, file text>:
    Welcome Everyone
    Hello Everyone
Map output <key, value>:
    Welcome 1
    Everyone 1
    Hello 1
    Everyone 1
• Processes a large number of individual records in parallel to generate
intermediate key/value pairs.
Input records:
    Welcome Everyone
    Hello Everyone
    Why are you here
    I am also here
    They are also here
    Yes, it’s THEM!
    The same people we were thinking of
    …….
Output of the MAP TASKS (intermediate key/value pairs):
    Welcome 1
    Everyone 1
    Hello 1
    Everyone 1
    Why 1
    Are 1
    You 1
    Here 1
    …….
Reduce
• Reduce processes and merges all intermediate
values associated with each key
Intermediate pairs      Reduce output
Key Value               Key Value
Welcome 1               Everyone 2
Everyone 1              Hello 1
Hello 1                 Welcome 1
Everyone 1
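The merge performed by Reduce in the example above can be sketched in plain Java, without Hadoop. This is only a single-process illustration: the class name ReduceSketch and the method reduce are hypothetical names chosen for this sketch, and the in-memory list stands in for the intermediate pairs that the framework would deliver.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of the reduce step for word count.
public class ReduceSketch {

    // Merge all intermediate <word, count> pairs by summing the values per key.
    // A TreeMap is used so the result is listed in key order.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> intermediate) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : intermediate) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // The four intermediate pairs from the example.
        List<Map.Entry<String, Integer>> intermediate = List.of(
            Map.entry("Welcome", 1),
            Map.entry("Everyone", 1),
            Map.entry("Hello", 1),
            Map.entry("Everyone", 1));
        System.out.println(reduce(intermediate)); // prints {Everyone=2, Hello=1, Welcome=1}
    }
}
```

The two "Everyone 1" pairs collapse into "Everyone 2", while "Hello" and "Welcome" each keep a count of 1.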
Reduce
• Each key is assigned to exactly one Reduce task
• Reduce tasks process and merge all intermediate values in parallel by partitioning the keys among them
Intermediate pairs: Welcome 1, Everyone 1, Hello 1, Everyone 1
REDUCE TASK 1 output: Everyone 2
REDUCE TASK 2 output: Hello 1, Welcome 1
• Popular: hash partitioning, i.e., a key is assigned to reduce # = hash(key) % number of reduce servers
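The hash partitioning rule can be sketched as follows. This is a minimal illustration, not the framework's code: HashPartitionSketch and partition are hypothetical names, and String.hashCode() is assumed as the hash function. The sign bit is masked off because hashCode() may be negative, and a negative operand would give a negative remainder.

```java
// Hypothetical sketch of hash partitioning of keys across reduce tasks.
public class HashPartitionSketch {

    // reduce # = hash(key) % number of reduce servers.
    // Masking with Integer.MAX_VALUE clears the sign bit, so the result
    // always lies in the range [0, numReduceTasks).
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 2; // assumed number of reduce servers
        for (String key : new String[] {"Welcome", "Everyone", "Hello", "Everyone"}) {
            System.out.println(key + " -> reduce task index " + partition(key, numReduceTasks));
        }
    }
}
```

Because the rule depends only on the key, both "Everyone" pairs land on the same reduce task, which is what allows that task to merge them. The indices produced by this particular hash function need not match the task assignment drawn in the example.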
Hadoop Code - Map
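public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();
  // Emit <word, 1> for each whitespace-separated token in the input line.
  public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
      Reporter reporter) throws IOException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) { word.set(itr.nextToken()); output.collect(word, one); }
  }
}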
– Map: processes a web log and, for each input <source, target>, outputs <target, source>
– Reduce: emits <target, list(source)>
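The Map and Reduce steps just described, which invert <source, target> link records, can be sketched in plain Java. This is a single-process illustration under stated assumptions: ReverseLinksSketch, map, and reduce are hypothetical names, and the page names in main are made-up sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch: inverting <source, target> link records.
public class ReverseLinksSketch {

    // Map: for each input <source, target>, output <target, source>.
    static Map.Entry<String, String> map(String source, String target) {
        return Map.entry(target, source);
    }

    // Reduce: gather every source seen for a target, giving <target, list(source)>.
    static Map<String, List<String>> reduce(List<Map.Entry<String, String>> intermediate) {
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, String> pair : intermediate) {
            result.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up <source, target> records standing in for a web log.
        String[][] links = {{"a.html", "c.html"}, {"b.html", "c.html"}, {"a.html", "b.html"}};
        List<Map.Entry<String, String>> intermediate = new ArrayList<>();
        for (String[] link : links) {
            intermediate.add(map(link[0], link[1]));
        }
        System.out.println(reduce(intermediate)); // prints {b.html=[a.html], c.html=[a.html, b.html]}
    }
}
```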
Some Applications of MapReduce
Count of URL access frequency
– Input: Log of accessed URLs, e.g., from proxy server
– Output: For each URL, % of total accesses for that URL
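One way to picture this computation is the single-process sketch below. It assumes the log is a simple list of URL strings and takes the total number of accesses from the list size. In a distributed job that total would have to be computed separately. The names UrlFrequencySketch and percentages, and the sample URLs, are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch: percentage of total accesses per URL.
public class UrlFrequencySketch {

    static Map<String, Double> percentages(List<String> log) {
        // Counting phase: conceptually, map emits <URL, 1> and reduce sums per URL.
        Map<String, Integer> counts = new TreeMap<>();
        for (String url : log) {
            counts.merge(url, 1, Integer::sum);
        }
        // Percentage phase: divide each count by the overall number of accesses.
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            result.put(entry.getKey(), 100.0 * entry.getValue() / log.size());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up log of accessed URLs.
        List<String> log = List.of("/a", "/b", "/a", "/a");
        System.out.println(percentages(log)); // prints {/a=75.0, /b=25.0}
    }
}
```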
Sort
– Input: Series of (key, value) pairs
– Output: Sorted <value>s