
MapReduce

What is MapReduce and what does it do?


Source: edureka.com
Overview of MapReduce
• Hadoop MapReduce is a programming model for processing large datasets in a
distributed manner, primarily used within the Hadoop ecosystem.

• Two Main Phases:
  • Map Phase: Processes input data and produces key-value pairs.
  • Reduce Phase: Aggregates the key-value pairs and generates the final output.

• The MapReduce component distributes the computational tasks and may redistribute data between the "map" and "reduce" phases for processing. It also handles gathering the results back together.

• Minimally, applications specify the input/output locations and supply map and reduce functions via implementations of the appropriate interfaces and/or abstract classes, as sketched in the driver below.
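
As a rough illustration of that last point, the sketch below shows a driver class that names the Mapper and Reducer implementations and the input/output locations. It is only a sketch, assuming the Java org.apache.hadoop.mapreduce API (Hadoop 2.x/3.x); the class names WordCountDriver, WordCount.TokenizerMapper and WordCount.IntSumReducer are not from these slides, the latter two being the word-count classes sketched in the next section.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        // Supply the map and reduce functions by naming the classes that
        // extend the Mapper and Reducer abstract classes.
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setReducerClass(WordCount.IntSumReducer.class);

        // Declare the types of the final output key-value pairs.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Specify the input and output locations (HDFS paths given on the
        // command line).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar and launched with the hadoop jar command, a job like this reads every file under the input directory and writes one part file per reducer under the output directory.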

Source: Hadoop.apache.org
Source: edureka.com
How MapReduce Word Count Works
• Divide the input into three splits as shown in the figure. This will distribute the work
among all the map nodes.
• Tokenize the words in each of the mappers and give a hardcoded value (1) to each of
the tokens or words.
• Mapper phase: A list of key-value pairs will be created, where the key is an individual word and the value is one.
• Sorting and shuffling: A partitioning process takes place in which sorting and shuffling happen so that all tuples with the same key are sent to the corresponding reducer.
• Each reducer will have a unique key and a list of values corresponding to that key. For example, Bear, [1,1]; Car, [1,1,1]; etc.
• Each reducer counts the values present in its list. As shown in the figure, the reducer gets the list [1,1] for the key Bear; it counts the number of ones in that list and gives the final output as Bear, 2.
• Finally, all the output key-value pairs are collected and written to the output file.
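
The mapper and reducer described in these steps correspond to the classic Hadoop word-count example. The sketch below assumes the same org.apache.hadoop.mapreduce API as the driver above and TextInputFormat, so each map call receives one line of text keyed by its byte offset; it is a generic illustration, not code taken from these slides.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map phase: tokenize each input line and emit (word, 1) for every token.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);      // e.g. (Bear, 1)
            }
        }
    }

    // Reduce phase: after sorting and shuffling, each call receives one key
    // and the list of its values, e.g. Bear, [1, 1]; summing gives Bear, 2.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            result.set(sum);
            context.write(key, result);        // e.g. (Bear, 2)
        }
    }
}

Plugged into the driver shown earlier, these classes produce one line per unique word in the output file, for example Bear 2 and Car 3.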
