
MapReduce

What is MapReduce and what does it do?


Source: edureka.com
Overview of MapReduce
• Hadoop MapReduce is a programming model for processing large datasets in a
distributed manner, primarily used within the Hadoop ecosystem.

• Two Main Phases:
  • Map Phase: Processes input data and produces key-value pairs.
  • Reduce Phase: Aggregates the key-value pairs and generates the final output.

• The MapReduce component distributes the computational tasks and may redistribute data between the "map" and "reduce" phases for processing. It also handles gathering the results back together.

• Minimally, applications specify the input/output locations and supply map and reduce functions via implementations of the appropriate interfaces and/or abstract classes, as sketched in the driver below.
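
As a rough illustration of that last point, the sketch below shows a driver class that names the Mapper and Reducer implementations and the input/output locations. It is only a sketch, assuming the Java org.apache.hadoop.mapreduce API (Hadoop 2.x/3.x); the class names WordCountDriver, WordCount.TokenizerMapper and WordCount.IntSumReducer are not from these slides, the latter two being the word-count classes sketched in the next section.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        // Supply the map and reduce functions by naming the classes that
        // extend the Mapper and Reducer abstract classes.
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setReducerClass(WordCount.IntSumReducer.class);

        // Declare the types of the final output key-value pairs.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Specify the input and output locations (HDFS paths given on the
        // command line).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar and launched with the hadoop jar command, a job like this reads every file under the input directory and writes one part file per reducer under the output directory.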

Source: Hadoop.apache.org
Source: edureka.com
How MapReduce Word Count Works
• Divide the input into three splits as shown in the figure. This will distribute the work
among all the map nodes.
• Tokenize the words in each of the mappers and give a hardcoded value (1) to each of
the tokens or words.
• Mapper phase: A list of key-value pairs will be created, where the key is an individual word and the value is one.
• Sorting and shuffling: A partitioning process takes place in which sorting and shuffling happen so that all tuples with the same key are sent to the corresponding reducer.
• Each reducer will have a unique key and a list of values corresponding to that key. For example, Bear, [1,1]; Car, [1,1,1]; etc.
• Each reducer counts the values present in its list. As shown in the figure, the reducer gets the list [1,1] for the key Bear; it counts the number of ones in that list and gives the final output as Bear, 2.
• Finally, all the output key-value pairs are collected and written to the output file.
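
The mapper and reducer described in these steps correspond to the classic Hadoop word-count example. The sketch below assumes the same org.apache.hadoop.mapreduce API as the driver above and TextInputFormat, so each map call receives one line of text keyed by its byte offset; it is a generic illustration, not code taken from these slides.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map phase: tokenize each input line and emit (word, 1) for every token.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);      // e.g. (Bear, 1)
            }
        }
    }

    // Reduce phase: after sorting and shuffling, each call receives one key
    // and the list of its values, e.g. Bear, [1, 1]; summing gives Bear, 2.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            result.set(sum);
            context.write(key, result);        // e.g. (Bear, 2)
        }
    }
}

Plugged into the driver shown earlier, these classes produce one line per unique word in the output file, for example Bear 2 and Car 3.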
