Big Data Analytics (2171607) : Chapter - 1 Mapreduce
CHAPTER – 1
PROF. M DHANALAKSHMI,
ASST. PROF.,
IT DEPT,
SCET, SURAT.
MapReduce
1. Map Step:
The master node takes a large problem input, slices it into
smaller sub-problems, and distributes these to the worker nodes.
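The slicing described in the Map step can be sketched in plain Python. This is only an illustration, not Hadoop code: the function name `split_input`, the worker count, and the three sample lines are assumptions made for the example.

```python
from typing import List


def split_input(records: List[str], num_workers: int) -> List[List[str]]:
    """Slice the input into at most num_workers contiguous chunks."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Ceiling division, so that no record is left without a chunk.
    chunk_size = max(1, -(-len(records) // num_workers))
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]


# Illustrative input (an assumption, not taken from the slides).
lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
chunks = split_input(lines, 3)
for worker_id, chunk in enumerate(chunks):
    print(worker_id, chunk)
```

Each inner list stands for the sub-problem that one worker node would receive.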
2. Reduce Step:
Reduce Job:
The reducer receives the key-value pairs from multiple map jobs.
1. Mapper Class
2. Reducer Class
3. Driver Class
Input Split:
RecordReader:
3. Driver Class:
A major component of a MapReduce job is the Driver class.
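In Hadoop, the driver is typically where the job is configured (which mapper and reducer to use, where the input comes from) and then submitted. A rough plain-Python analogue of that role is sketched below. The names `run_job`, `temperature_mapper`, and `max_reducer`, and the sample records, are hypothetical and are not part of any Hadoop API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def run_job(records: Iterable[str],
            mapper: Callable[[str], Iterable[Pair]],
            reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Hypothetical driver: wires a mapper and a reducer together and runs them."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for record in records:                  # map phase
        for key, value in mapper(record):
            grouped[key].append(value)      # collect the values for each key
    # Reduce phase, visiting the keys in sorted order.
    return {key: reducer(key, grouped[key]) for key in sorted(grouped)}


def temperature_mapper(record: str) -> List[Pair]:
    """Turn a 'city,temperature' record into one (city, temperature) pair."""
    city, temp = record.split(",")
    return [(city, int(temp))]


def max_reducer(key: str, values: List[int]) -> int:
    """Keep the highest value seen for a key."""
    return max(values)


result = run_job(["surat,31", "pune,28", "surat,35"],
                 temperature_mapper, max_reducer)
print(result)
```

Keeping the mapper and reducer as separate arguments mirrors the three-class layout listed above: the driver only connects the pieces and starts the job.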
Mapping
The splitting step takes the input data set from the source and breaks it
up into smaller data sets.
The output of the map function is a set of key-value pairs of the form
<Key, Value>.
Shuffle Function:
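Also called the Combine Function in some texts. This is distinct from
Hadoop's optional Combiner, which pre-aggregates map output on the mapper node.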
Sorting
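A minimal sketch of shuffling and sorting in plain Python, assuming the map output is already available as a list of (key, value) tuples. The function name `shuffle_and_sort` is an assumption for illustration. A real cluster performs this step across machines.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple


def shuffle_and_sort(pairs: List[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Sort the pairs by key, then gather the values of each key into one list."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


map_output = [("Car", 1), ("Bear", 1), ("Car", 1), ("Bear", 1), ("Car", 1)]
grouped = shuffle_and_sort(map_output)
print(grouped)  # [('Bear', [1, 1]), ('Car', [1, 1, 1])]
```

Sorting first is what allows `groupby` to see all pairs with the same key next to each other.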
Splitting the input distributes the work among all the map nodes.
Each mapper creates a list of key-value pairs in which the key is an
individual word and the value is one.
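The word-count mapping step can be sketched as follows. The function name `map_words` and the sample line are assumptions, and splitting on whitespace is a simplification of real tokenization.

```python
from typing import List, Tuple


def map_words(line: str) -> List[Tuple[str, int]]:
    """Emit one (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


print(map_words("Car Car River"))  # [('Car', 1), ('Car', 1), ('River', 1)]
```

Repeated words are emitted once per occurrence. Adding them up is left to the reducer.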
After the mapper phase, a partition process takes place in which sorting
and shuffling ensure that all the tuples with the same key are sent
to the same reducer.
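One common way to decide which reducer receives a key is to hash the key and take the remainder modulo the number of reducers. The sketch below assumes that scheme and uses CRC32 as a stand-in hash. The names `partition` and `route` are hypothetical.

```python
import zlib
from typing import List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Map a key to a reducer index; the same key always gets the same index."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs: List[Tuple[str, int]],
          num_reducers: int) -> List[List[Tuple[str, int]]]:
    """Place every pair in the bucket of the reducer chosen for its key."""
    buckets: List[List[Tuple[str, int]]] = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


pairs = [("Bear", 1), ("Car", 1), ("Bear", 1), ("Car", 1), ("Car", 1)]
buckets = route(pairs, 2)
for reducer_id, bucket in enumerate(buckets):
    print(reducer_id, bucket)
```

Because the index depends only on the key, all tuples for Bear land in one bucket and all tuples for Car land in one bucket, though the two keys may share a bucket.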
After the sorting and shuffling phase, each reducer holds its unique
keys, each with the list of values corresponding to that key. For
example: Bear, [1,1]; Car, [1,1,1]; etc.
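For word count, the reduce step then adds up the list of values for each key. A minimal sketch follows; the function name `reduce_counts` is an assumption, and the input is the grouped example given above.

```python
from typing import List, Tuple


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse the list of ones for a word into a single total."""
    return key, sum(values)


grouped = [("Bear", [1, 1]), ("Car", [1, 1, 1])]
results = [reduce_counts(key, values) for key, values in grouped]
print(results)  # [('Bear', 2), ('Car', 3)]
```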