MapReduce
Source: Hadoop.apache.org
Source: edureka.com
How MapReduce Word Count Works
• Divide the input into three splits, as shown in the figure, so that the work is
distributed among the map nodes.
• Each mapper tokenizes its split into words and assigns the hardcoded value 1 to
every token (word).
• Mapper phase: each mapper emits a list of key-value pairs in which the key is an
individual word and the value is 1.
• Sorting and shuffling: the intermediate pairs are partitioned by key, then shuffled and
sorted, so that all tuples with the same key are sent to the same reducer.
• Each key reaches exactly one reducer, together with the list of all values for that key,
for example Bear, [1,1] and Car, [1,1,1].
• Each reducer counts the values in its list. As shown in the figure, the reducer for the
key Bear receives the list [1,1], counts the ones, and emits the final output Bear, 2.
• Finally, all output key/value pairs are collected and written to the output file.
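The steps above can be illustrated with a small single-process Python sketch. It is not Hadoop code and runs no cluster; it only mimics the data flow: map each split to (word, 1) pairs, group the pairs by key and sort them (standing in for shuffle and sort), then sum each key's list in a reduce step. The function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) and the three input splits are hypothetical, chosen so that Bear and Car produce the value lists mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Tokenize one input split and emit (word, 1) for every token."""
    for word in split.split():
        yield (word, 1)


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Group values by key and return the groups sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Add up the list of ones for a key to get that word's count."""
    return (key, sum(values))


def word_count(splits: List[str]) -> List[Tuple[str, int]]:
    # Map: each split is processed independently (one mapper per split).
    mapped = [pair for split in splits for pair in map_phase(split)]
    # Shuffle and sort: bring together all values that share a key.
    grouped = shuffle_and_sort(mapped)
    # Reduce: one reduce call per unique key.
    return [reduce_phase(key, values) for key, values in grouped]


if __name__ == "__main__":
    # Hypothetical input: three splits, one per mapper.
    splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    for word, count in word_count(splits):
        print(word, count)
```

With this assumed input the sketch prints Bear 2, Car 3, Deer 2 and River 2, one per line. In a real job the three stages run on different machines and the grouped data is streamed rather than held in one dictionary.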
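The guarantee that all tuples with the same key reach the same reducer usually comes from a partition function that maps a key to a reducer index. Hadoop's default HashPartitioner follows this principle (hash of the key modulo the number of reduce tasks). The Python sketch below is only an illustration of the idea: the names `partition` and `route` are hypothetical, and CRC32 is an assumed stand-in hash, used because Python's built-in `hash()` on strings is randomized per process.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index for a key; equal keys always get the same index.

    Assumption: CRC32 is a stable stand-in hash, not Hadoop's hashCode().
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs: List[Tuple[str, int]],
          num_reducers: int) -> Dict[int, Dict[str, List[int]]]:
    """Send each (key, value) pair to its reducer and group values by key there."""
    reducers: Dict[int, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for key, value in pairs:
        reducers[partition(key, num_reducers)][key].append(value)
    return reducers


if __name__ == "__main__":
    # Hypothetical mapper output for three splits.
    splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    pairs = [(word, 1) for split in splits for word in split.split()]
    for reducer_id, groups in sorted(route(pairs, 2).items()):
        for key in sorted(groups):
            print(f"reducer {reducer_id}: {key}, {groups[key]} -> {sum(groups[key])}")
```

Which reducer a given word lands on depends on the hash, so a reducer may receive several keys; what matters is that every occurrence of a key ends up in one place, where its list of ones can be counted.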