
[Figure 2.2 diagram: input key-value pairs (A α, B β, C γ, D δ, E ε, F ζ) are processed in parallel by mappers, which emit intermediate pairs such as (a 1), (b 2), (c 3); the shuffle-and-sort stage aggregates values by key, e.g., a → [1, 5], b → [2, 7], c → [2, 9, 8]; reducers then produce the final output pairs (X 5), (Y 7), (Z 9).]

Figure 2.2: Simplified view of MapReduce. Mappers are applied to all input key-value pairs, which generate an arbitrary number of intermediate key-value pairs. Reducers are applied to all values associated with the same key. Between the map and reduce phases lies a barrier that involves a large distributed sort and group by.

Algorithm 2.1 Word count

The mapper emits an intermediate key-value pair for each word in a document. The reducer sums up all counts for each word.

1: class Mapper
2:   method Map(docid a, doc d)
3:     for all term t ∈ doc d do
4:       Emit(term t, count 1)

1: class Reducer
2:   method Reduce(term t, counts [c1, c2, . . .])
3:     sum ← 0
4:     for all count c ∈ counts [c1, c2, . . .] do
5:       sum ← sum + c
6:     Emit(term t, count sum)
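The pseudocode in Algorithm 2.1 can be sketched as a single-process Python simulation. The toy driver below (`run_mapreduce`, the sample documents, and the function names are illustrative, not part of any Hadoop API) makes the three stages of Figure 2.2 concrete: apply mappers, group intermediate values by key, then apply reducers.

```python
from collections import defaultdict

def map_word_count(docid, doc):
    """Mapper: emit an intermediate (term, 1) pair for each word."""
    for term in doc.split():
        yield (term, 1)

def reduce_word_count(term, counts):
    """Reducer: sum all partial counts associated with a term."""
    yield (term, sum(counts))

def run_mapreduce(documents, mapper, reducer):
    """Toy single-process driver mimicking the MapReduce stages."""
    intermediate = defaultdict(list)
    for docid, doc in documents.items():          # map phase
        for key, value in mapper(docid, doc):
            intermediate[key].append(value)       # shuffle: group values by key
    output = {}
    for key in sorted(intermediate):              # sort keys, as the barrier does
        for k, v in reducer(key, intermediate[key]):  # reduce phase
            output[k] = v
    return output

docs = {"d1": "a b c c", "d2": "a c b c"}
print(run_mapreduce(docs, map_word_count, reduce_word_count))
# {'a': 2, 'b': 2, 'c': 4}
```

Note that all the parallelism, partitioning, and fault tolerance of a real execution framework is elided here; the point is only the dataflow between mappers, the shuffle-and-sort barrier, and reducers.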
number of map tasks to run, but the execution framework (see next section)
makes the final determination based on the physical layout of the data (more
details in Section 2.5 and Section 2.6). The situation is similar for the reduce
phase: a reducer object is initialized for each reduce task, and the Reduce
method is called once per intermediate key. In contrast with the number of
map tasks, the programmer can precisely specify the number of reduce tasks.
We will return to discuss the details of Hadoop job execution in Section 2.6,
which is dependent on an understanding of the distributed file system (covered
in Section 2.5). To reiterate: although the presentation of algorithms in this
book closely mirrors the way they would be implemented in Hadoop, our focus is on algorithm design and conceptual understanding, not actual Hadoop
programming. For that, we would recommend Tom White’s book [154].
What are the restrictions on mappers and reducers? Mappers and reduc-
ers can express arbitrary computations over their inputs. However, one must
generally be careful about use of external resources since multiple mappers or
reducers may be contending for those resources. For example, it may be unwise
for a mapper to query an external SQL database, since that would introduce a
scalability bottleneck on the number of map tasks that could be run in parallel
(since they might all be simultaneously querying the database).10 In general,
mappers can emit an arbitrary number of intermediate key-value pairs, and
they need not be of the same type as the input key-value pairs. Similarly,
reducers can emit an arbitrary number of final key-value pairs, and they can
differ in type from the intermediate key-value pairs. Although not permitted
in functional programming, mappers and reducers can have side effects. This
is a powerful and useful feature: for example, preserving state across multiple
inputs is central to the design of many MapReduce algorithms (see Chapter 3).
Such algorithms can be understood as having side effects that only change
state that is internal to the mapper or reducer. While the correctness of such
algorithms may be more difficult to guarantee (since the function’s behavior
depends not only on the current input but on previous inputs), most potential
synchronization problems are avoided since internal state is private to individual mappers and reducers. In other cases (see Section 4.4 and Section 7.5),
it may be useful for mappers or reducers to have external side effects, such as
writing files to the distributed file system. Since many mappers and reducers
are run in parallel, and the distributed file system is a shared global resource,
special care must be taken to ensure that such operations avoid synchroniza-
tion conflicts. One strategy is to write a temporary file that is renamed upon
successful completion of the mapper or reducer [45].
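The write-then-rename strategy can be sketched as follows. This is a minimal local-filesystem illustration (the function name and paths are hypothetical, and real Hadoop tasks rely on per-attempt output directories committed by the framework rather than hand-rolled renames): because the rename happens only after the write completes, a failed or duplicated task never leaves a partial file at the final path.

```python
import os
import tempfile

def write_output_atomically(final_path, lines):
    """Write lines to a temporary file, then rename it into place on success.

    A task that crashes mid-write leaves only an orphaned temporary file,
    never a truncated file at final_path, so concurrent readers and retried
    tasks observe either no output or complete output.
    """
    directory = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # temp file on same filesystem
    try:
        with os.fdopen(fd, "w") as f:
            for line in lines:
                f.write(line + "\n")
        os.replace(tmp_path, final_path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.remove(tmp_path)  # clean up the partial file before re-raising
        raise
```

The temporary file must live on the same filesystem as the destination, since a rename across filesystems degrades to a non-atomic copy.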
In addition to the “canonical” MapReduce processing flow, other variations
are also possible. MapReduce programs can contain no reducers, in which case
mapper output is directly written to disk (one file per mapper). For embarrassingly parallel problems, e.g., parsing a large text collection or independently analyzing a large number of images, this is a common pattern. The
converse—a MapReduce program with no mappers—is not possible, although
10 Unless, of course, the database itself is highly scalable.
