Map Reduce Summary

This document summarizes a research paper on MapReduce and how it provides a simplified model for processing large datasets in parallel across clusters of computers. It describes MapReduce's map and reduce functions and provides results on sorting and searching large datasets. The summary concludes that MapReduce hides parallelization complexity and has been widely adopted at Google for various applications.

Uploaded by

karim ben hassen

Research paper summary Page 1

Ahmad Afiq Bin Che Johari

Rachid Chelouah

Initiation to Research

1 Apr 2014

MapReduce: Simplified Data Processing on Large Clusters Summary

Introduction

The authors, Jeffrey Dean and Sanjay Ghemawat, are both employees at Google. At Google, it is very common to analyse large volumes of input data in order to perform computations.

Examples include outputting the set of most frequent queries against a database on a given day, analysing web request logs, and computing the graph structure of web documents. Since the input data is large, the computations have to be distributed across many machines so that the answer is obtained faster.

Most of the time the computations themselves are straightforward. The main challenge lies in parallelising the computation, distributing the data, and handling failures on the distributed machines.

Therefore, we seek a programming model that hides the complexity of parallelisation on distributed machines from programmers while the computations are performed.

Method

Below is the schematic representation of the authors' MapReduce model.

The input data is split into a number of partitions, and each partition is passed through a map function independently of the others. The map function outputs intermediate key-value pairs. The reduce function then receives the intermediate values grouped by key and merges them into a smaller set of key-value pairs, possibly a single one. Finally, the outputs of the reduce function on the independent machines together form the final result.
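
The data flow described above can be sketched in a few lines of Python. This is a minimal single-process illustration using word counting, not the authors' implementation: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and a real library would distribute the partitions across machines and handle failures.

```python
from collections import defaultdict

def map_fn(partition):
    """Emit an intermediate (word, 1) pair for every word in one partition."""
    for word in partition.split():
        yield word, 1

def reduce_fn(key, values):
    """Merge all intermediate values that share a key into one pair."""
    return key, sum(values)

def run_mapreduce(partitions):
    # Map phase: each partition is processed independently of the others.
    grouped = defaultdict(list)
    for partition in partitions:
        for key, value in map_fn(partition):
            grouped[key].append(value)  # group intermediate values by key
    # Reduce phase: one call per distinct intermediate key.
    return dict(reduce_fn(key, values) for key, values in grouped.items())

# Hypothetical toy input: three small partitions of text.
counts = run_mapreduce(["the quick fox", "the lazy dog", "the end"])
```

Here `counts` maps each word to its total, so `counts["the"]` is 3. The programmer writes only `map_fn` and `reduce_fn`; the grouping and scheduling inside `run_mapreduce` stand in for the part that the library hides.
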
"
"
Results

To benchmark MapReduce, the authors measure the performance of their implementation on two computations. The first searches for a pattern through roughly one terabyte of data, and the second sorts roughly one terabyte of data. These two examples are meant to be representative of MapReduce programs run against large datasets: the sort represents programs that shuffle a large amount of data, and the search represents programs that extract information from a large dataset. The following is the result for the search implementation of MapReduce.

The Grep program (search)

The figure above shows the result of the distributed grep program implemented with the MapReduce model in the authors' paper. It plots the progress of the search computation over time.
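
To make the distributed grep concrete, here is a small single-process sketch of how a search fits the model: the map function emits a line when it matches the pattern, and the reduce function is an identity that copies the matches to the output. The names `grep_map` and `grep_reduce` and the toy input are hypothetical, chosen only for illustration.

```python
import re
from itertools import chain

def grep_map(partition, pattern):
    """Emit every line of one partition that matches the pattern."""
    for line in partition:
        if re.search(pattern, line):
            yield line

def grep_reduce(matches):
    """Identity reduce: copy the matching lines to the single output."""
    return list(matches)

# Hypothetical toy input: two partitions of lines.
partitions = [["alpha xyz", "beta"], ["gamma", "xyz delta"]]
output = grep_reduce(
    chain.from_iterable(grep_map(p, "xyz") for p in partitions)
)
```

`output` holds the two matching lines in partition order. Because almost all the work happens in the map phase, the progress of such a job is dominated by how quickly the input can be scanned.
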
"
The input consists of 10^10 100-byte records, and the grep searches for a three-character pattern that is relatively rare (it occurs in 92,337 of the 10^10 records). The input is split into pieces of about 64MB, that is 15,000 pieces, and the resulting output is stored in one file.
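
As a quick sanity check, the figures above can be verified with a few lines of arithmetic. The variable names are illustrative, and reading 64MB as 64 × 2^20 bytes is an assumption.

```python
RECORDS = 10**10          # number of input records
RECORD_BYTES = 100        # size of each record in bytes
PIECES = 15_000           # number of input pieces
PIECE_BYTES = 64 * 2**20  # assumption: 64MB read as 64 MiB

total_bytes = RECORDS * RECORD_BYTES   # total input size
split_bytes = PIECES * PIECE_BYTES     # capacity of all pieces together
match_fraction = 92_337 / RECORDS      # share of records that match
```

Under these assumptions the input is 10^12 bytes (one terabyte in decimal units), the 15,000 pieces cover about 1.007 × 10^12 bytes, which is consistent with pieces of roughly 64MB, and the pattern matches about 9 records per million.
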
"
The rate gradually increases at the beginning as more machines are assigned to the computation during the map phase. It peaks at over 30GB/s once 1764 workers have been assigned. As the map phase terminates, the rate decreases quite dramatically.

Perspectives

The first version of the MapReduce library was implemented in February 2003, and enhancements were made in August 2003, dealing mainly with the more technical parallelisation problems such as locality optimisation and dynamic load balancing of task execution across worker machines.

The authors were also surprised by how broadly the MapReduce paradigm could be applied at Google. Notable examples are:
• large-scale machine learning problems
• clustering problems for the Google News and Froogle products
• large-scale graph computations

There has been significant growth in the number of distinct MapReduce programs since its introduction at Google.

The main factor contributing to its success is that a program written in the MapReduce model is simple and highly scalable: it can be executed across many machines while the complexity of parallelisation stays hidden. Therefore, a programmer with no experience of distributed or parallel systems can easily exploit the resources of the distributed machines.

Below are some interesting statistics on MapReduce instances, provided by the authors from Google's code management system.

Conclusion

The MapReduce programming model has been widely adopted at Google for many different purposes.

The proposed model hides the parallelisation details, giving programmers a higher level of abstraction over a given problem. We can therefore focus on the problem itself rather than on the parallelisation details.

Many problems, though not all, are expressible as MapReduce computations.

Bibliography

The oldest reference provided by the authors dates back to 1989, but most of the references are from the late nineties (e.g. 1996, 1997) and the early 2000s (e.g. 2001, 2003). Most of the references are related to parallel computation.
