Understanding MapReduce
MapReduce
Key/value pairs
The Hadoop Java API for MapReduce
The Mapper class
The Reducer class
Presented by Gopikrishna PP
UNDERSTANDING MAPREDUCE FUNDAMENTALS
■ MapReduce is a software framework and programming model used for processing huge amounts of data. MapReduce programs work in two phases, namely Map and Reduce. Map tasks deal with splitting and mapping the data, while Reduce tasks shuffle and reduce the data.
■ The input to each phase is key-value pairs. In addition, every programmer needs to specify two functions: the map function and the reduce function.
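Conceptually, the two functions have the following general form (the standard MapReduce type notation, added here for clarity; it is not from the original slides):

map: (k1, v1) → list(k2, v2)
reduce: (k2, list(v2)) → list(k2, v3)

The mapper transforms each input record into zero or more intermediate pairs, and the reducer folds the list of values collected under each intermediate key into the final output.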
Phases of execution
■ The whole process goes through four phases of execution:
1. Mapper
This is the first phase of MapReduce programming and contains the coding logic of the mapper function.
The conditional logic is applied to the ‘n’ data blocks spread across the various data nodes.
The mapper function accepts key-value pairs (k, v) as input, where the key represents the offset address of each record and the value represents the entire record content.
The output of the Mapper phase is also in key-value format, as (k’, v’).
■ 2. Shuffle and Sort
The output of the various mappers (k’, v’) then goes into the Shuffle and Sort phase.
Values with duplicate keys are merged: the values are grouped together based on their common keys.
The output of the Shuffle and Sort phase is again key-value pairs, as a key and an array of values (k, v[]).
For example, the mapper outputs ("apple", 1), ("banana", 1), ("apple", 1) are grouped into ("apple", [1, 1]) and ("banana", [1]).
■ 3. Reducer
The output of the Shuffle and Sort phase (k, v[]) becomes the input of the Reducer phase.
In this phase the reducer function’s logic is executed and all the values are aggregated against their corresponding keys.
The reducer consolidates the outputs of the various mappers and computes the final job output.
The final output is then written into a single file in an output directory of HDFS.
■ 4. Combiner
In this phase, the various outputs of the mappers are locally reduced at the node level.
For example, if different mapper outputs (k, v) coming from a single node contain duplicates, they get combined, i.e. locally reduced into a single (k, v[]) output.
This phase makes the Shuffle and Sort phase work even quicker, thereby enabling additional performance gains.
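In the Hadoop Java API, the combiner is optional and is registered on the job; when the reduce logic is associative and commutative (as in a word count), the reducer class itself is commonly reused as the combiner. A minimal sketch, assuming a configured Job object named job and a WordCountReducer class like the one shown later in this section:

// Optional: locally reduce each mapper's output before the shuffle
job.setCombinerClass(WordCountReducer.class);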
■ The following illustration shows how Twitter manages its tweets with the help of MapReduce.
■ As shown in the illustration, the MapReduce algorithm performs the following actions −
Tokenize − Tokenizes the tweets into maps of tokens and writes them as key-value pairs.
Filter − Filters unwanted words from the maps of tokens and writes the filtered maps as key-value pairs.
Count − Generates a token counter per word.
Aggregate Counters − Prepares an aggregate of similar counter values into small manageable units.
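As an illustration of the Tokenize and Filter steps, a hedged mapper sketch follows; the class name, the stop-word list, and the tokenization scheme are hypothetical, not from the original slides:

import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TweetTokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Hypothetical stop-word list for the Filter step
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "to"));
    private static final IntWritable ONE = new IntWritable(1);
    private final Text token = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Tokenize: split the tweet text into tokens
        StringTokenizer tokens = new StringTokenizer(value.toString().toLowerCase());
        while (tokens.hasMoreTokens()) {
            String t = tokens.nextToken();
            if (!STOP_WORDS.contains(t)) {   // Filter: drop unwanted words
                token.set(t);
                context.write(token, ONE);   // feed (token, 1) pairs to the Count step
            }
        }
    }
}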
Intermediate keys
The key-value pairs generated by the mapper are known as intermediate keys.
Mapper in MapReduce
■ Map-Reduce is a programming model that is mainly divided into two phases: the Map phase and the Reduce phase. It is designed to process data in parallel, with the data divided across various machines (nodes).
■ Hadoop Java programs consist of a Mapper class and a Reducer class, along with a driver class.
Understand the Mapper in Map-Reduce:
Mapper is a simple user-defined program that performs some operations on input splits as per its design.
Mapper is a base class that needs to be extended by the developer or programmer in their own code according to the organization’s requirements. The input and output key/value types are declared as type arguments to the Mapper class, which the developer fills in.
■ For Example:
class MyMapper extends Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
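Filling in the four type arguments gives a complete Mapper. The following word-count sketch uses the standard org.apache.hadoop.mapreduce API; the class name WordCountMapper is hypothetical:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// KEYIN = LongWritable (record offset), VALUEIN = Text (record content),
// KEYOUT = Text (word), VALUEOUT = IntWritable (count)
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);  // emit an intermediate (k', v') pair
        }
    }
}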
Five components
Mapper is the first code that interacts with the input dataset.
The Mapper mainly consists of 5 components:
Input
Input Splits
Record Reader
Map
Intermediate output disk
• Here, in the above image, we can observe that there are multiple Mappers generating key-value pairs as output. The output of each mapper is sent to the sorter, which sorts the key-value pairs by key.
• Shuffling also takes place during the sorting process, and the output is sent to the Reducer, where the final output is produced.
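For the Reducer side (listed in the agenda), a matching hedged sketch: this reducer sums the grouped values (k, v[]) produced by the shuffle and sort; the class name WordCountReducer is hypothetical:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {  // values is the grouped v[] for this key
            sum += v.get();
        }
        result.set(sum);
        context.write(key, result);  // emit (word, total count)
    }
}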
The Hadoop Java API for MapReduce
The MapReduce API consists of the classes and interfaces that define the different jobs in MapReduce.
The Hadoop MapReduce API is implemented in Java, so MapReduce applications are generally Java-based.
In this section, we focus on the MapReduce APIs. Here, we learn about the classes and methods used in MapReduce programming.
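Tying the pieces together, a minimal driver-class sketch; it assumes the hypothetical WordCountMapper and WordCountReducer classes from the earlier examples, with input and output paths passed on the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Such a driver is typically packaged into a jar and launched with hadoop jar wordcount.jar WordCountDriver <input> <output>.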