UNDERSTANDING MAPREDUCE IN HADOOP

Introduction
MapReduce is a component of the Apache Hadoop ecosystem, a framework for processing massive amounts of data. There are two primary tasks in MapReduce: map and reduce, and the map task is performed before the reduce task. In the map task, the input dataset is split into chunks. Reducers then process the intermediate data produced by the map tasks into smaller tuples, which form the final output of the framework.
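To make the two tasks concrete, here is a minimal, self-contained sketch that simulates the map, shuffle, and reduce steps of a word count in plain Java, outside of Hadoop itself; the class name, the sample input, and the structure are illustrative and are not the Hadoop API.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory simulation of the map and reduce tasks
// for a word count, without Hadoop. The input "chunks" are plain strings.
public class MapReduceConcept {
    public static void main(String[] args) {
        List<String> chunks = List.of("deer bear river", "car car river", "deer car bear");

        // Map task: each chunk is turned into intermediate (word, 1) pairs.
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String chunk : chunks) {
            for (String word : chunk.split("\\s+")) {
                intermediate.add(Map.entry(word, 1));
            }
        }

        // Shuffle: sort and group the intermediate values by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : intermediate) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // Reduce task: collapse each group of values into a smaller result (a sum).
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = entry.getValue().stream().mapToInt(Integer::intValue).sum();
            System.out.println(entry.getKey() + "\t" + sum);
        }
    }
}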
How MapReduce in Hadoop works
The MapReduce architecture consists of the following components:

Job: This is the actual work that needs to be executed or processed.
Task: This is a piece of the actual work that needs to be executed or processed.
Job Tracker: This tracker schedules jobs and tracks all jobs assigned to the task trackers.
Task Tracker: This tracker tracks tasks and reports the status of each task to the job tracker.
Input data: This is the data to be processed in the mapping phase.
Output data: This is the result of mapping and reducing.
Client: This is a program that submits jobs to MapReduce.
Hadoop MapReduce Master: This divides jobs into job-parts.
Job-parts: These are sub-jobs that result from the division of the main job.
division of the main job. How MapReduce in Hadoop works In the MapReduce architecture, clients submit jobs to the MapReduce Master.
This master will then sub-divide the job into equal
sub-parts. How MapReduce in Hadoop works The job-parts will be used for the two main tasks in MapReduce: mapping and reducing.
The developer will write logic that satisfies the
requirements of the organization or company.
The input data will be split and mapped.
How MapReduce in Hadoop works The intermediate data will then be sorted and merged.
The reducer that will generate a final output stored
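As a rough sketch of what job submission looks like from the client side, the driver below uses the Hadoop Java API to configure and submit a word-count job. The class names (WordCountDriver, WordCountMapper, WordCountReducer) are illustrative, the HDFS input and output paths come from the command-line arguments, and the mapper and reducer themselves are sketched in the phase sections further below.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative driver: the client program that submits a job to MapReduce.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");    // the job to be executed
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCountMapper.class);        // mapping logic
        // job.setCombinerClass(WordCountReducer.class);  // optional combiner phase
        job.setReducerClass(WordCountReducer.class);      // reducing logic

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input data to be split and mapped; the final output is written to HDFS.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}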
How job trackers and task trackers work
The job tracker acts as a master. It ensures that all jobs are executed. The job tracker schedules jobs that have been submitted by clients and assigns them to task trackers. Task trackers report the status of each assigned job back to the job tracker.

Phases of MapReduce
The MapReduce program is executed in three main phases: mapping, shuffling, and reducing.
There is also an optional phase known as the combiner phase.

Mapping Phase
In the splitting step, the dataset is split into equal units called chunks (input splits). Hadoop provides a RecordReader that uses TextInputFormat to transform the input splits into key-value pairs. The mapping step then applies coding logic to these data blocks: the mapper processes the input key-value pairs and produces output of the same form (key-value pairs).
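A minimal sketch of the mapping logic in the Hadoop Java API, assuming a word-count job: the RecordReader supplied by TextInputFormat hands the mapper each line as a (byte offset, line text) pair, and the mapper emits a (word, 1) pair for every word. The class name WordCountMapper is illustrative.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: consumes the key-value pairs produced by the
// RecordReader (offset, line) and emits intermediate key-value pairs
// of the same form (word, 1).
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // intermediate (word, 1) pair
            }
        }
    }
}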
Shuffling phase
This is the second phase, which takes place after the completion of the mapping phase. It consists of two main steps: sorting and merging. In the sorting step, the intermediate key-value pairs are sorted by key. Merging then combines pairs that share a key: duplicate keys are consolidated and their values are grouped together, so that all values with the same key are passed to the reducer as a single group.
Reducer phase
In the reducer phase, the output of the shuffling phase is used as the input. The reducer processes this input further, reducing the intermediate values into a smaller set of values.
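A matching sketch of the reducing logic, again with illustrative names: after shuffling, the reducer receives each key together with all of its grouped values and reduces them to a single, smaller value, in this case a sum.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer: receives (word, [1, 1, ...]) groups from the
// shuffling phase and reduces each group to a single (word, total) pair.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);   // final output pair written to HDFS
    }
}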
MR Word Count Process
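Taken together, the driver, mapper, and reducer sketched above form one possible word-count job of this kind. Assuming they are compiled and packaged into a jar (the jar and class names are illustrative), such a job is typically launched with the hadoop jar command, passing the HDFS input and output paths as the two arguments; the final word counts are then written as part files in the output directory.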