Understanding MapReduce
MAPPER
1. RecordReader
Function: Converts a byte-oriented view of the input into a record-oriented view.
Input Split: Data is divided into smaller chunks (input splits) before being
passed to the mapper.
Output: Presents data as key-value pairs to the mapper.
o The key typically represents positional information (e.g., an offset
in the file).
o The value represents a chunk of data (e.g., a line in a text file).
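To make this concrete, here is a minimal Python sketch (not the Hadoop API itself) of what a line-based RecordReader does: it walks a byte stream and emits (byte offset, line) pairs for the mapper.

```python
import io

def line_records(stream):
    """Yield (byte_offset, line) pairs from a byte stream,
    mimicking how a text RecordReader turns bytes into records."""
    offset = 0
    for raw in stream:
        # Key: byte offset where the line starts; value: the line text.
        yield offset, raw.decode("utf-8").rstrip("\n")
        offset += len(raw)

data = io.BytesIO(b"hello world\nfoo bar\n")
records = list(line_records(data))
# records[0] is (0, "hello world"); records[1] is (12, "foo bar")
```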
2. Map
Core Function: The mapper function processes the input key-value pairs
produced by RecordReader and generates zero or more intermediate
key-value pairs.
Logic: The transformation logic is user-defined and varies depending on
the problem.
o For example, in word count applications, the mapper generates
(word, 1) for each word found.
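The word-count mapper mentioned above can be sketched in a few lines of Python (illustrative only; a real Hadoop mapper would implement the Java Mapper interface):

```python
def word_count_mapper(offset, line):
    """Emit an intermediate (word, 1) pair for every word in the line.
    The input key (byte offset) is ignored for word count."""
    for word in line.split():
        yield word.lower(), 1

pairs = list(word_count_mapper(0, "the quick the"))
# pairs is [("the", 1), ("quick", 1), ("the", 1)]
```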
3. Combiner (Optional)
Purpose: Acts as a local reducer to aggregate mapper output before
sending it to the reducer.
Performance Benefit: Reduces the amount of data transferred over the
network, saving bandwidth and disk space.
Functionality: Combines multiple intermediate key-value pairs (e.g.,
summing counts for words) before sending them to the reducer.
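A combiner for word count can be sketched as a local sum over one mapper's output, assuming (as the text notes) that the operation is a simple aggregation:

```python
from collections import defaultdict

def combine(pairs):
    """Locally sum counts per key on the mapper node, shrinking the
    data before it crosses the network to the reducers."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

combined = combine([("the", 1), ("quick", 1), ("the", 1)])
# combined is [("quick", 1), ("the", 2)] -- 3 pairs shrunk to 2
```

Note that a combiner is only safe when the reduce operation is commutative and associative (like summation), since Hadoop may apply it zero, one, or many times.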
4. Partitioner
Function: Divides intermediate key-value pairs into partitions (shards)
and assigns each partition to a reducer.
Key Assignment: Ensures that all intermediate pairs with the same key are sent to the same reducer.
Data Storage: The partitioned data is written to the local disk and pulled
by the corresponding reducer for further processing.
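The default assignment strategy can be sketched as hashing the key modulo the number of reducers, which mirrors the behavior of Hadoop's HashPartitioner (the sketch below uses a CRC32 checksum purely so the example is deterministic; Hadoop uses the key's hashCode):

```python
import zlib

def partition(key, num_reducers):
    """Map a key to a reducer index in [0, num_reducers).
    Identical keys always hash to the same partition, so every
    occurrence of a key ends up at the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

p1 = partition("the", 3)   # from mapper 1
p2 = partition("the", 3)   # from mapper 2 -- same key, same partition
```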
Reducer
1. Shuffle and Sort
Function: The shuffle phase fetches the reducer's assigned partition from every mapper's output and copies it to the reducer's local machine.
Sorting: Data is sorted by keys to group similar keys together. This
grouping is necessary so the reducer can process all values associated
with a key in a single pass.
Purpose: Ensures that all key-value pairs for a particular key are
processed together, facilitating efficient reduction.
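The merge-sort-group sequence above can be sketched as follows (a simplified in-memory stand-in for Hadoop's external merge sort):

```python
from itertools import groupby
from operator import itemgetter

def shuffle_and_sort(mapper_outputs):
    """Merge pairs fetched from all mappers, sort them by key, and
    group values so the reducer sees each key exactly once with
    all of its values together."""
    merged = sorted((pair for output in mapper_outputs for pair in output),
                    key=itemgetter(0))
    return [(key, [v for _, v in group])
            for key, group in groupby(merged, key=itemgetter(0))]

groups = shuffle_and_sort([[("b", 1), ("a", 1)], [("a", 2)]])
# groups is [("a", [1, 2]), ("b", [1])]
```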
2. Reduce
Core Task: The reducer iterates through the sorted data, applies user-defined logic, and processes one key-value group at a time.
Operations: It can perform operations like aggregation, filtering, and
combining. For example, in a word count problem, it aggregates word
counts from all mappers.
Output: The output can be zero or more key-value pairs, depending on
the logic applied in the reduce function.
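For the word-count case, the reduce function is a sum over each key's grouped values; a minimal sketch:

```python
def sum_reducer(key, values):
    """Collapse all counts for one key into a single total.
    Called once per key group produced by shuffle-and-sort."""
    yield key, sum(values)

# "the" appeared with counts [1, 2] across mappers:
result = list(sum_reducer("the", [1, 2]))
# result is [("the", 3)]
```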
3. Output Format
Writing the Output: The default format separates each key and value with a tab and writes the final results to a file in the Hadoop Distributed File System (HDFS).
Custom Formatting: Users can customize the output format as needed.
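The default tab-separated layout described above can be sketched as (writing to an in-memory buffer here instead of HDFS):

```python
import io

def write_tab_separated(groups, out):
    """Write one 'key<TAB>value' line per reduced pair, like the
    default text output format described above."""
    for key, value in groups:
        out.write(f"{key}\t{value}\n")

buf = io.StringIO()
write_tab_separated([("quick", 1), ("the", 3)], buf)
# buf now holds "quick\t1\nthe\t3\n"
```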