MapReduce: Simple Programming for Big Results

MapReduce provides a simplified programming model for processing large datasets in parallel across clusters of computers. The programmer writes two functions, Map and Reduce. Map applies a function to every input element and emits intermediate key-value pairs. Reduce applies a summary operation to all intermediate values that share a key. Between the two, a shuffle and sort step moves pairs with the same key to the same node. This allows massively parallel processing without requiring expertise in threads and locks. A common example is WordCount, in which Map emits each word as a key with a count of 1 and Reduce adds up the counts for identical keys. MapReduce hides the parallelization details and is well suited to applications with independent, data-parallel tasks on large datasets.

Uploaded by Kosuru ratnasai

MapReduce: Simple Programming for Big Results
After this video you will be able to:
• Explain how MapReduce simplifies creating parallel programs
• Design a WordCount application using the MapReduce programming model
MapReduce = Programming Model for the Hadoop Ecosystem
[Figure: Hadoop ecosystem stack. Components shown: Hive, Pig, Giraph, Spark, Storm, Flink, MapReduce, HBase, Cassandra, MongoDB, Zookeeper, YARN, HDFS.]
Parallel Programming = Requires Expertise
• Semaphores
• Threads
• Monitors
• Message Passing
• Shared Memory
• Locks
MapReduce = Only Map and Reduce!
The programmer does not work directly with semaphores, threads, monitors, message passing, shared memory, or locks.
Based on Functional Programming
• Map = apply an operation f(x) = y to all elements
• Reduce = apply a summarizing operation to the elements
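As a small illustration of the two functional building blocks, the sketch below uses Python's built-in `map` and `functools.reduce`. The operation `f` (squaring) and the summary (addition) are arbitrary choices made for this example, not something the slides prescribe.

```python
from functools import reduce

def f(x):
    # Hypothetical operation f(x) = y, chosen only for illustration.
    return x * x

elements = [1, 2, 3, 4]

# Map: apply f to every element independently.
mapped = list(map(f, elements))

# Reduce: summarize all mapped elements into one value (here, their sum).
total = reduce(lambda a, b: a + b, mapped, 0)

print(mapped)  # [1, 4, 9, 16]
print(total)   # 30
```

Because each call to `f` depends only on its own element, the map step can be split across machines without coordination, which is the property MapReduce builds on.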
Example MapReduce Application: WordCount
[Figure: File 1, File 2, ..., File N are fed into WordCount, which produces a Result File.]
Step 0: File is stored in HDFS
Step 1: Map on each node
Each node holds part of the file, for example:
• My apple is red and my rose is blue....
• You are the apple of my eye....
• …
Map generates key-value pairs

Input: My apple is red and my rose is blue....
my, my → (my, 1), (my, 1)
apple → (apple, 1)
is, is → (is, 1), (is, 1)
red → (red, 1)
and → (and, 1)
rose → (rose, 1)
blue → (blue, 1)

Input: You are the apple of my eye....
You → (You, 1)
are → (are, 1)
the → (the, 1)
apple → (apple, 1)
of → (of, 1)
my → (my, 1)
eye → (eye, 1)
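The map step for WordCount can be sketched as a function that turns one line of text into (word, 1) pairs. This is a minimal sketch, not Hadoop's API: the name `map_words` is hypothetical, and lowercasing plus treating runs of letters as words are assumptions made here so that "My" and "my" share a key.

```python
import re

def map_words(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    # Assumption: a word is a run of letters, compared case-insensitively.
    return [(word, 1) for word in re.findall(r"[a-z]+", line.lower())]

pairs = map_words("My apple is red and my rose is blue....")
print(pairs)
```

Note that the function emits ("my", 1) twice rather than ("my", 2). Adding up the counts is left to the reduce step.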
Step 2: Sort and Shuffle
Pairs with the same key are moved to the same node. Each line below is one node:
• (You, 1), (apple, 1), (apple, 1)
• (is, 1), (is, 1)
• (rose, 1), (red, 1)
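On a single machine, the effect of the shuffle and sort step can be imitated by grouping values under their key and ordering the keys. The sketch below assumes an in-memory list of pairs and a hypothetical helper `shuffle_and_sort`. A real cluster would instead send each key to a node chosen by a partitioning function.

```python
from collections import defaultdict

def shuffle_and_sort(pairs):
    """Collect the values of identical keys together and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Sorting the keys mirrors the "sort" half of the step.
    return dict(sorted(groups.items()))

# Pairs as they might arrive from several map tasks, in no particular order.
mapped = [("my", 1), ("apple", 1), ("is", 1), ("apple", 1), ("my", 1), ("is", 1)]
grouped = shuffle_and_sort(mapped)
print(grouped)  # {'apple': [1, 1], 'is': [1, 1], 'my': [1, 1]}
```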
Step 3: Reduce
Add values for same keys
(You, 1) → (You, 1)
(apple, 1), (apple, 1) → (apple, 2)
(my, 1), (my, 1), (my, 1) → (my, 3)
(red, 1) → (red, 1)
(rose, 1) → (rose, 1)
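The reduce step for WordCount can be sketched as a function that receives one key with all of its values and returns the sum. The function name `reduce_counts` and the dictionary of grouped values are assumptions for this example. The keys and counts themselves are the ones shown on the slide.

```python
def reduce_counts(key, values):
    """Add up all the values that were emitted for one key."""
    return (key, sum(values))

# Grouped pairs as they would look after shuffle and sort.
grouped = {"You": [1], "apple": [1, 1], "my": [1, 1, 1], "red": [1], "rose": [1]}

counts = [reduce_counts(key, values) for key, values in grouped.items()]
print(counts)
# [('You', 1), ('apple', 2), ('my', 3), ('red', 1), ('rose', 1)]
```

Each call works on a single key, so different keys can be reduced on different nodes at the same time.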
Map → Shuffle and Sort → Reduce
Represents a large number of applications.
Sort and Shuffle
Each line below is one node:
• (You, http://you1.fake), (apple, http://apple1.fake), (apple, http://apple2.fake)
• (is, http://apple2.fake), (is, http://apple2.fake)
• (rose, http://apple2.fake), (red, http://apple2.fake)
Reduce Results for “apple”
(apple -> http://apple1.fake, http://apple2.fake)
Key: apple
Value: http://apple1.fake, http://apple2.fake
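The same three steps can build a word-to-URL index when the map step emits the page URL as the value instead of the number 1. The sketch below is an assumption about how such a job might look: the function names and the page contents are hypothetical, and only the two "apple" URLs come from the slides.

```python
from collections import defaultdict

def map_page(url, text):
    """Emit a (word, url) pair for every word on a page."""
    return [(word, url) for word in text.lower().split()]

def reduce_urls(word, urls):
    """Collapse all URLs seen for one word into a sorted list without repeats."""
    return (word, sorted(set(urls)))

# Hypothetical page contents, keyed by URL.
pages = {
    "http://apple1.fake": "apple",
    "http://apple2.fake": "apple is is rose red",
}

# Shuffle and sort: gather the URLs emitted for each word.
groups = defaultdict(list)
for url, text in pages.items():
    for word, page_url in map_page(url, text):
        groups[word].append(page_url)

index = dict(reduce_urls(word, urls) for word, urls in sorted(groups.items()))
print(index["apple"])  # ['http://apple1.fake', 'http://apple2.fake']
```

Removing repeated URLs in the reducer is a design choice made here. A job that needs term frequencies would keep them.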
Map → Shuffle and Sort → Reduce
• Map: parallelization over the input
• Shuffle and Sort: parallelization over intermediate data (data sorting)
• Reduce: parallelization over data groups
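To make the sources of parallelism concrete, the sketch below runs the map tasks and the reduce tasks through a thread pool that stands in for cluster nodes. This is only an analogy under stated assumptions: the partitions and function names are hypothetical, the grouping step is done sequentially for brevity, and Python threads do not give the speedup a real cluster would.

```python
from concurrent.futures import ThreadPoolExecutor

def map_partition(lines):
    """Map task: emit (word, 1) for every word in one input partition."""
    return [(word, 1) for line in lines for word in line.lower().split()]

def reduce_group(item):
    """Reduce task: add up the values of one key."""
    key, values = item
    return key, sum(values)

# Hypothetical input partitions, one per node.
partitions = [["my apple is red"], ["you are the apple of my eye"]]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Parallelization over the input: one map task per partition.
    mapped = list(pool.map(map_partition, partitions))

    # Shuffle and sort: group intermediate pairs by key (sequential here).
    groups = {}
    for pairs in mapped:
        for key, value in pairs:
            groups.setdefault(key, []).append(value)

    # Parallelization over data groups: one reduce task per key.
    counts = dict(pool.map(reduce_group, sorted(groups.items())))

print(counts["apple"], counts["my"])  # 2 2
```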
MapReduce is bad for:
• Frequently changing data
• Dependent tasks
• Interactive analysis
MapReduce
• Simplified parallel programming
• Applications with independent data-parallel tasks
