MapReduce
By
Isha Shrestha
Jebina Maharjan
Manisha Bhandari
Sarah Gorkhali
Introduction
● MapReduce is designed to process large amounts of data in
parallel by dividing the work.
● The whole job is taken from the user and divided into smaller
tasks, which are then assigned to the worker nodes.
● MapReduce programs take a list as input and produce a list
as output.
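The list-in, list-out idea can be sketched with a small word count. This is a minimal single-process illustration, not Hadoop code; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key so each key is reduced once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts per word and return a sorted list of pairs."""
    return sorted((key, sum(values)) for key, values in groups.items())


def word_count(lines):
    """Run the three steps in order: list of lines in, list of pairs out."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data big", "data map reduce"]))
    # [('big', 2), ('data', 2), ('map', 1), ('reduce', 1)]
```

In a real cluster the map and reduce steps would run on different nodes; here they run one after another only to show the data flow.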
Why MapReduce
● Distribute the load across the worker nodes.
● Reduce big data and extract the meaningful data
from it.
Working of MapReduce
● The input data is first split into smaller blocks. Each block is
then assigned to a mapper for processing.
● For example, if a file has 100 records to be processed, 100
mappers can run together to process one record each.
Alternatively, 50 mappers can run together to process two
records each. The Hadoop framework decides how many mappers
to use, based on the size of the data to be processed and the
memory available on each mapper server.
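The 100-record example above can be sketched as follows. This is a simplified illustration under stated assumptions: the number of mappers is passed in by hand as a hypothetical num_mappers parameter (the framework normally chooses it), and threads stand in for mapper servers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_records(records, num_mappers):
    """Split records into num_mappers contiguous blocks of near-equal size."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    base, extra = divmod(len(records), num_mappers)
    blocks, start = [], 0
    for i in range(num_mappers):
        # The first `extra` blocks take one additional record each.
        size = base + (1 if i < extra else 0)
        blocks.append(records[start:start + size])
        start += size
    return blocks


def run_mappers(records, num_mappers, mapper):
    """Apply mapper to every record, one block per worker thread."""
    blocks = split_records(records, num_mappers)

    def process(block):
        return [mapper(record) for record in block]

    with ThreadPoolExecutor(max_workers=num_mappers) as pool:
        # pool.map returns results in the same order as the input blocks.
        return list(pool.map(process, blocks))


if __name__ == "__main__":
    records = list(range(100))
    print(len(split_records(records, 100)[0]))  # 1 record per mapper
    print(len(split_records(records, 50)[0]))   # 2 records per mapper
```

With 100 records, split_records(records, 100) gives one record per block and split_records(records, 50) gives two, matching the two cases described above.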
Reduce
● Before the reducers run, a partitioning step translates the <key, value> pairs
resulting from the mappers to another set of <key, value> pairs to feed into the reducers.
● The partitioner decides how the data has to be presented to the reducers and also
assigns it to a particular reducer.
● The default partitioner computes a hash value for each key emitted by the
mappers and assigns a partition based on this hash value. There are as many partitions
as there are reducers. So, once the partitioning is complete, the data from each
partition is sent to a specific reducer.
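Hash-based partitioning can be sketched as below. This is an illustration only: zlib.crc32 is used as a stand-in hash because it gives the same value on every run, and the function names partition_for and partition_pairs are made up for this example.

```python
import zlib


def partition_for(key, num_reducers):
    """Return the reducer index for a key: stable hash modulo reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_pairs(pairs, num_reducers):
    """Place each (key, value) pair into the partition chosen by its key."""
    # One partition per reducer, as described above.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return partitions


if __name__ == "__main__":
    # Made-up mapper output for illustration.
    pairs = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
    for index, partition in enumerate(partition_pairs(pairs, 3)):
        print("reducer", index, "gets", partition)
```

Because the partition depends only on the key, every pair with the same key lands on the same reducer, which is what lets that reducer compute a complete result for the key.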
Implementation
MapReduce programs can be written in Java, C, C++, Python, Ruby, Perl, etc.
Uses
MapReduce can be applied to any complex problem that can be solved through
parallelization.
A social media site could use it to determine how many new sign-ups it received
over the past month from different countries, to gauge its increasing popularity
among different geographies.
A trading firm could perform its batch reconciliations faster and also determine
which scenarios often cause trades to break.
Search engines could determine page views, and marketers could perform
sentiment analysis using MapReduce.
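The sign-ups-by-country example above can be sketched in the same map and reduce style. The record layout (a dict with "country" and "signed_up" fields) and the sample data are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import date


def map_signup(record, since):
    """Map: emit (country, 1) if the sign-up happened on or after `since`."""
    if record["signed_up"] >= since:
        yield (record["country"], 1)


def count_signups(records, since):
    """Reduce: add up the emitted counts per country."""
    totals = defaultdict(int)
    for record in records:
        for country, count in map_signup(record, since):
            totals[country] += count
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample records for illustration.
    records = [
        {"country": "NP", "signed_up": date(2024, 5, 3)},
        {"country": "NP", "signed_up": date(2024, 5, 20)},
        {"country": "IN", "signed_up": date(2024, 5, 10)},
        {"country": "IN", "signed_up": date(2024, 3, 1)},
    ]
    print(count_signups(records, date(2024, 5, 1)))  # {'NP': 2, 'IN': 1}
```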