MapReduce: Simple Programming for Big Results

MapReduce provides a simplified programming model for processing large datasets in parallel across clusters of computers. It involves two steps - Map and Reduce. In Map, a function is applied to all elements to generate intermediate key-value pairs. In Reduce, a summary operation is performed on all intermediate values with the same key. This allows for massively parallel processing without requiring expertise in threads and locks. A common example is WordCount, where words are counted by mapping words to keys and reducing the counts of identical keys. MapReduce abstracts parallelization details and is well-suited for applications with independent data-parallel tasks on large datasets.

Uploaded by Kosuru ratnasai
Copyright © All Rights Reserved

MapReduce: Simple Programming for Big Results
After this video you will be able to:
• Explain how MapReduce simplifies creating parallel programs
• Design a WordCount application using the MapReduce programming model
MapReduce = Programming Model for Hadoop Ecosystem
[Figure: Hadoop ecosystem stack showing Hive, Pig, Giraph, Spark, Storm, Flink, MapReduce, HBase, Cassandra, MongoDB, Zookeeper, YARN, and HDFS]
Parallel Programming = Requires Expertise
Semaphores, Threads, Monitors, Message Passing, Shared Memory, Locks
MapReduce = Only Map and Reduce!
No expertise required in semaphores, threads, monitors, message passing, shared memory, or locks.
Based on Functional Programming

Map = apply operation f(x) = y to all elements

Reduce = summarize operation on elements
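
The two functional ideas above can be sketched with Python's built-in `map` and `functools.reduce`. The operation `f` (squaring) and the sample list are hypothetical choices made only for illustration; they are not from the slides.

```python
from functools import reduce

def f(x):
    # Hypothetical example operation: square the input.
    return x * x

elements = [1, 2, 3, 4]

# Map: apply f to every element independently of the others.
mapped = list(map(f, elements))

# Reduce: summarize the mapped elements into a single value (here, a sum).
total = reduce(lambda acc, y: acc + y, mapped, 0)

print(mapped)
print(total)
```

Because each call to `f` depends only on its own element, the map step could in principle run on different machines without coordination.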
Example MapReduce Application: WordCount
[Figure: File 1, File 2, ..., File N -> WordCount -> Result File]
Step 0: File is stored in HDFS
Step 1: Map on each node
My apple is red and my rose is blue....

You are the apple of my eye....



Map generates key-value pairs
My apple is red and my rose is blue....

my, my -> (my, 1), (my, 1)
apple -> (apple, 1)
is, is -> (is, 1), (is, 1)
red -> (red, 1)
and -> (and, 1)
rose -> (rose, 1)
blue -> (blue, 1)
Map generates key-value pairs
You are the apple of my eye....

You -> (You, 1)
are -> (are, 1)
the -> (the, 1)
apple -> (apple, 1)
of -> (of, 1)
my -> (my, 1)
eye -> (eye, 1)
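
A minimal sketch of what the map step for WordCount might look like in Python. The function name `map_words` and the tokenization rule (lowercase everything, treat runs of letters as words) are assumptions for illustration; because of the lowercasing, "You" comes out as "you" here.

```python
import re

def map_words(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    # Assumption: words are runs of letters or apostrophes, lowercased.
    words = re.findall(r"[a-z']+", line.lower())
    return [(word, 1) for word in words]

pairs = map_words("My apple is red and my rose is blue....")
for pair in pairs:
    print(pair)
```

The pairs are emitted in the order the words appear in the line, so repeated words such as "my" show up as separate (my, 1) pairs rather than being grouped.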
Step 2: Sort and Shuffle
Pairs with same key moved to same node

(You, 1)
(apple, 1)
(apple, 1)

(is, 1)
(is, 1)

(rose, 1)
(red, 1)
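
One way to picture "pairs with same key moved to same node" is a deterministic hash of the key that selects the destination node. The sketch below simulates this in a single process; the helper names `partition` and `shuffle_and_sort`, the node count, and the use of `zlib.crc32` as the hash are all assumptions for illustration, not a description of Hadoop's actual partitioner.

```python
import zlib
from collections import defaultdict

def partition(key, num_nodes):
    # Deterministic hash, so the same key always maps to the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def shuffle_and_sort(pairs, num_nodes):
    """Group values by key on the node chosen by partition(), sorted by key."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for key, value in pairs:
        nodes[partition(key, num_nodes)][key].append(value)
    # Sort the keys within each simulated node.
    return [dict(sorted(groups.items())) for groups in nodes]

pairs = [("my", 1), ("apple", 1), ("you", 1), ("apple", 1), ("my", 1), ("my", 1)]
nodes = shuffle_and_sort(pairs, num_nodes=2)
for i, groups in enumerate(nodes):
    print(i, groups)
```

Which node a given key lands on depends on the hash, but every occurrence of that key ends up in the same place, which is all the reduce step needs.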
Step 3: Reduce
Add values for same keys

(You, 1) -> (You, 1)
(apple, 1), (apple, 1) -> (apple, 2)
(my, 1), (my, 1), (my, 1) -> (my, 3)
(red, 1) -> (red, 1)
(rose, 1) -> (rose, 1)
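
The reduce step for WordCount can be sketched as a function that receives one key with all of its values and adds them up. The name `reduce_counts` and the hand-built `grouped` dictionary are hypothetical; the dictionary mirrors the grouped pairs shown above.

```python
def reduce_counts(key, values):
    """Add the values that share one key and emit a single (key, total) pair."""
    return (key, sum(values))

# Grouped output of the shuffle step, written out by hand for illustration.
grouped = {"You": [1], "apple": [1, 1], "my": [1, 1, 1], "red": [1], "rose": [1]}

results = [reduce_counts(key, values) for key, values in grouped.items()]
for result in results:
    print(result)
```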
Map -> Shuffle and Sort -> Reduce

Represents a large number of applications.
Sort and Shuffle

(You, http://you1.fake)
(apple, http://apple1.fake)
(apple, http://apple2.fake)

(is, http://apple2.fake)
(is, http://apple2.fake)

(rose, http://apple2.fake)
(red, http://apple2.fake)
Reduce Results for “apple”

(apple -> http://apple1.fake, http://apple2.fake)
Key: apple
Value: http://apple1.fake, http://apple2.fake
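
The word-to-URL example follows the same Map, Shuffle and Sort, Reduce pattern as WordCount, only with URLs as values instead of counts. A rough sketch, assuming each page is a (URL, text) pair: the page texts, and the names `map_page` and `reduce_urls`, are invented for illustration, and only the `.fake` URL style comes from the slides.

```python
from collections import defaultdict

def map_page(url, text):
    """Emit a (word, url) pair for each distinct word on a page."""
    return [(word, url) for word in sorted(set(text.lower().split()))]

def reduce_urls(key, urls):
    """Collect the distinct URLs that contain the key."""
    return (key, sorted(set(urls)))

# Hypothetical page contents, keyed by URL.
pages = {
    "http://apple1.fake": "apple pie",
    "http://apple2.fake": "apple is red",
}

# Shuffle and sort, simulated in one process: group URLs by word.
grouped = defaultdict(list)
for url, text in pages.items():
    for word, page_url in map_page(url, text):
        grouped[word].append(page_url)

index = dict(reduce_urls(word, urls) for word, urls in sorted(grouped.items()))
print(index["apple"])
```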
Map: Parallelization over the input
Shuffle and Sort: Parallelization over intermediate data (data sorting)
Reduce: Parallelization over data groups
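
To make the points of parallelism concrete, here is a single-machine sketch that runs one map task per input split and one reduce task per key group on a thread pool. The names `map_split`, `reduce_group`, and `run_wordcount` are hypothetical, and the shuffle is done serially for brevity. In standard CPython, threads mainly illustrate the structure rather than delivering a CPU speedup; a real cluster distributes these tasks across nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_split(line):
    # One map task: emit (word, 1) for each word in its input split.
    return [(word, 1) for word in line.lower().split()]

def reduce_group(item):
    # One reduce task: add up the values of a single key group.
    key, values = item
    return key, sum(values)

def run_wordcount(splits, workers=2):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map: parallelization over the input splits.
        mapped = list(pool.map(map_split, splits))
        # Shuffle and sort: group values by key (serial in this sketch).
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)
        # Reduce: parallelization over the key groups.
        return dict(pool.map(reduce_group, sorted(groups.items())))

counts = run_wordcount([
    "my apple is red and my rose is blue",
    "you are the apple of my eye",
])
print(counts)
```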
MapReduce is bad for:
• Frequently changing data
• Dependent tasks
• Interactive analysis
MapReduce
• Simplified parallel programming
• Applications with independent data-parallel tasks
