What Is MapReduce in Hadoop
A MapReduce program works in two phases, Map and Reduce. Map tasks deal with splitting and mapping the data, while Reduce tasks shuffle and reduce it.
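As a rough formalization of the general MapReduce model (not something specific to this example; the symbols k and v are introduced here for illustration), the map function turns one input key-value pair into a list of intermediate pairs, and the reduce function turns one intermediate key plus all of its values into output pairs:

```latex
\begin{aligned}
\text{map}    &: (k_1, v_1) \;\rightarrow\; \text{list}(k_2, v_2) \\
\text{reduce} &: (k_2, \text{list}(v_2)) \;\rightarrow\; \text{list}(k_3, v_3)
\end{aligned}
```

With Hadoop's default text input, k1 is a line's byte offset and v1 is the line's text; in a word count, k2 is a word and v2 is the count 1.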
Consider the following input data for a MapReduce program that counts how often each word occurs:
[Figure: the example input traced through the MapReduce phases; final output words: bad, Class, good, Hadoop, is, to, Welcome]
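Input Splits
The input to a MapReduce job is divided into fixed-size pieces called input splits. Each input split is a chunk of the input that is consumed by a single map task.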
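A minimal sketch of input splitting, in plain Python rather than Hadoop's API. It assumes a hypothetical sample input (built from the words in the example's output, not the original input text) and measures split size in lines; real Hadoop input splits are sized in bytes, typically matching the HDFS block size. `make_splits` and `SAMPLE_INPUT` are illustrative names.

```python
# Hypothetical sample input; not the original example's exact input text.
SAMPLE_INPUT = [
    "Welcome to Hadoop Class",
    "Hadoop is good",
    "Hadoop is bad",
]


def make_splits(lines, split_size):
    """Divide the input records (lines) into fixed-size chunks, one per map task.

    split_size is counted in lines here; Hadoop sizes splits in bytes.
    """
    if split_size < 1:
        raise ValueError("split_size must be at least 1")
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


splits = make_splits(SAMPLE_INPUT, 1)
print(len(splits))  # 3
```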
Mapping
This is the first processing phase in the execution of a MapReduce program. In this phase, the data in each split is passed to a mapping function, which produces intermediate key-value pairs as output.
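The mapping function for a word count could look like the following plain-Python sketch. It assumes a word-count job; `map_words` is a hypothetical name, and in Hadoop's Java API the same logic would live in the `map` method of a `Mapper` subclass.

```python
def map_words(split):
    """Mapping function for word count: emit (word, 1) for every word in the split."""
    pairs = []
    for line in split:
        for word in line.split():
            pairs.append((word, 1))
    return pairs


split = ["Hadoop is good"]  # hypothetical one-line split
print(map_words(split))  # [('Hadoop', 1), ('is', 1), ('good', 1)]
```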
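Shuffling
This phase consumes the output of the Mapping phase. It groups the intermediate key-value pairs by key, so that all values for the same key are brought together (and sorted by key) before they reach the Reducing phase.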
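A plain-Python sketch of the grouping performed during shuffling, assuming the (word, 1) pairs of a word count. `shuffle` is a hypothetical helper; Hadoop performs this step itself and moves data between nodes, whereas this version only groups an in-memory list.

```python
from collections import defaultdict


def shuffle(mapped_pairs):
    """Group intermediate (key, value) pairs by key, returning keys in sorted order."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    # Hadoop also sorts keys before handing them to reducers; sorted() mimics that here.
    return {key: groups[key] for key in sorted(groups)}


mapped = [("Hadoop", 1), ("is", 1), ("good", 1), ("Hadoop", 1), ("is", 1), ("bad", 1)]
print(shuffle(mapped))  # {'Hadoop': [1, 1], 'bad': [1], 'good': [1], 'is': [1, 1]}
```

Note that plain string sorting is case-sensitive, so capitalized words come first.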
Reducing
In this phase, the output values from the Shuffling phase are aggregated: for each key, the grouped values are combined into a single output value.
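A minimal reducing function for the word-count case, again in plain Python and assuming the grouped input produced by shuffling. `reduce_counts` is a hypothetical name; in Hadoop's Java API this logic would go in the `reduce` method of a `Reducer` subclass.

```python
def reduce_counts(key, values):
    """Reducing function for word count: sum all the 1s emitted for a word."""
    return key, sum(values)


grouped = {"Hadoop": [1, 1, 1], "is": [1, 1], "good": [1]}  # hypothetical shuffled input
results = [reduce_counts(key, values) for key, values in grouped.items()]
print(results)  # [('Hadoop', 3), ('is', 2), ('good', 1)]
```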
Hadoop divides a job into two types of tasks: Map tasks, which handle splitting and mapping, and Reduce tasks, which handle shuffling and reducing. These tasks are then run on multiple data nodes in a cluster.
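To tie the phases together, here is a single-machine simulation, assuming a word-count job and using hypothetical names throughout (`run_job`, `map_task`, `partition`, `reduce_task`). Threads from `concurrent.futures` stand in for data nodes, one map task runs per split, and a hash partitioner decides which reduce task receives each key. It is a sketch of the idea, not a use of any Hadoop API.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(split):
    # One map task per input split: emit (word, 1) for each word.
    return [(word, 1) for line in split for word in line.split()]


def partition(key, num_reducers):
    # Route a key to a reduce task by hashing it. crc32 is deterministic across runs,
    # unlike Python's built-in str hash, which is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_task(pairs):
    # Group this reduce task's pairs by key, then sum each group.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in sorted(groups.items())}


def run_job(lines, split_size=1, num_reducers=2, workers=4):
    splits = [lines[i:i + split_size] for i in range(0, len(lines), split_size)]
    # Map tasks run concurrently, standing in for map tasks on separate data nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_task, splits))
    # Send every intermediate pair to the reduce task that owns its key.
    buckets = [[] for _ in range(num_reducers)]
    for pairs in mapped:
        for key, value in pairs:
            buckets[partition(key, num_reducers)].append((key, value))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reduced = list(pool.map(reduce_task, buckets))
    # Each reduce task produces its own output; merge them for display.
    output = {}
    for part in reduced:
        output.update(part)
    return dict(sorted(output.items()))


sample = ["Welcome to Hadoop Class", "Hadoop is good", "Hadoop is bad"]  # hypothetical input
print(run_job(sample))
# {'Class': 1, 'Hadoop': 3, 'Welcome': 1, 'bad': 1, 'good': 1, 'is': 2, 'to': 1}
```

Hadoop's default `HashPartitioner` works along the same lines, assigning a key to a reduce task from the key's hash code modulo the number of reduce tasks. Because every value for a given key lands in the same bucket, the result does not depend on the split size or the number of reduce tasks.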