Map Reduce

MapReduce is a programming model for processing large data sets using a map function to generate intermediate key/value pairs and a reduce function to merge values. It allows for automatic parallelization and execution on commodity machines, simplifying the use of distributed systems for programmers. The model has been widely adopted, with many programs implemented and thousands of jobs executed daily on Google's clusters.

Uploaded by

mdhasan.ansari

1.1 MapReduce

MapReduce [14] is a programming model and an associated implementation for processing and
generating large data sets. Users specify a map function that processes a key/value pair to
generate a set of intermediate key/value pairs and a reduce function that merges all intermediate
values associated with the same intermediate key. Programs written in this functional style are
automatically parallelized and executed on a large cluster of commodity machines. The run-time
system takes care of the details of partitioning the input data, scheduling the program’s execution
across a set of machines, handling machine failures and managing the required inter-machine
communication. This allows programmers without any experience with parallel and distributed
systems to easily utilize the resources of a large distributed system. A typical MapReduce
computation processes many terabytes of data on thousands of machines. Programmers find the
system easy to use: hundreds of MapReduce programs have been implemented and upwards of
one thousand MapReduce jobs are executed on Google's clusters every day. MapReduce provides
an abstraction in which the programmer defines a “mapper” and a “reducer” with the
following signatures:

• Map: (key1, value1) → list(key2, value2)

• Reduce: (key2, list(value2)) → list(key3, value3)
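
To make the two signatures concrete, the following is a minimal single-process sketch of a word count in Python. It is an illustration under stated assumptions, not the implementation described in [14]: the names map_word_count, reduce_word_count and run_mapreduce are hypothetical, and the driver runs sequentially on one machine, so it shows only the grouping of intermediate values by key and none of the partitioning, scheduling, or failure handling that the run-time system provides.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_word_count(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: (key1, value1) -> list(key2, value2).

    key1 is a document name (unused here), value1 is its text.
    Emits one (word, 1) pair per word occurrence.
    """
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: List[int]) -> Iterator[Tuple[str, int]]:
    """Reduce: (key2, list(value2)) -> list(key3, value3).

    Merges all intermediate counts for one word into a single total.
    """
    yield word, sum(counts)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Iterable[Tuple[str, int]]],
) -> List[Tuple[str, int]]:
    """Hypothetical sequential driver (assumption: everything fits in memory)."""
    # Map phase: apply the mapper to every input pair and group the
    # intermediate values by their intermediate key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key1, value1 in inputs:
        for key2, value2 in mapper(key1, value1):
            groups[key2].append(value2)

    # Reduce phase: call the reducer once per intermediate key,
    # visiting keys in sorted order so the output is deterministic.
    results: List[Tuple[str, int]] = []
    for key2 in sorted(groups):
        results.extend(reducer(key2, groups[key2]))
    return results


documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(documents, map_word_count, reduce_word_count)
for word, count in word_counts:
    print(word, count)
```

In this sketch the mapper and reducer are the only problem-specific code; the driver is generic. That separation mirrors the division of labour described above, where the programmer supplies the two functions and the run-time system handles everything else.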
