
MapReduce Architecture

MapReduce and HDFS are the two major components of Hadoop that make it so powerful and efficient to use. MapReduce is a programming model for processing large data sets in parallel across a distributed cluster: the data is first split, processed independently, and then combined to produce the final result. MapReduce libraries have been written in many programming languages, each with its own optimizations. The purpose of MapReduce in Hadoop is to map each job onto smaller, equivalent tasks, which lowers the overhead on the cluster network and reduces the processing power required. The MapReduce task is mainly divided into two phases: the Map phase and the Reduce phase.
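
To make the split-map-combine idea concrete, here is a toy, single-process sketch of the model in plain Java. This is not Hadoop itself, just an illustration of the pattern; the sample input lines are invented for this example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MiniMapReduce {
    public static void main(String[] args) {
        // Invented sample input: in Hadoop, these lines would come from
        // splits of a large file stored on HDFS.
        List<String> lines = List.of("hadoop stores data", "hadoop processes data");

        // "Map": split each line into words (one record per word),
        // then group by word (the "shuffle") and count occurrences (the "reduce").
        Map<String, Long> counts = lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));

        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

In real MapReduce the same three steps happen, but the map and reduce work runs on many machines at once and the shuffle moves data between them over the network.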

MapReduce Architecture:
Components of MapReduce Architecture:

1. Client: The MapReduce client is the one who submits a job to MapReduce for processing. There can be multiple clients that continuously send jobs for processing to the Hadoop MapReduce Master.
2. Job: The MapReduce job is the actual work that the client wants done, and it is composed of many smaller tasks that the client wants to process or execute.
3. Hadoop MapReduce Master: It divides a particular job into subsequent job-parts.
4. Job-Parts: The tasks or sub-jobs obtained by dividing the main job. The results of all the job-parts are combined to produce the final output.
5. Input Data: The data set that is fed to MapReduce for processing.
6. Output Data: The final result obtained after processing.

In MapReduce, a client submits a job of a particular size to the Hadoop MapReduce Master. The master then divides this job into further equivalent job-parts, which are made available to the Map and Reduce tasks. The Map and Reduce tasks contain the program written for the use case the particular company is solving; the developer writes the logic that fulfills the requirement. The input data is fed to the Map task, and the Map generates intermediate key-value pairs as its output. These key-value pairs are then fed to the Reducer, and the final output is stored on HDFS. Any number of Map and Reduce tasks can be made available for processing the data, as the workload requires. The Map and Reduce algorithms are written in a highly optimized way so that time and space complexity are kept to a minimum.
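
As a concrete illustration of this flow, here is a minimal driver sketch using the standard org.apache.hadoop.mapreduce API. The class names WordCountDriver, WordCountMapper, and WordCountReducer and the input/output paths are placeholders chosen for illustration; the Mapper and Reducer themselves are sketched after the phase descriptions below:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        // The client side of the flow described above: configure a job
        // and submit it to the cluster for scheduling.
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);    // Map phase (sketched below)
        job.setReducerClass(WordCountReducer.class);  // Reduce phase (sketched below)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Placeholder paths: input is read from HDFS, output is written back to HDFS.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Blocks until the job finishes; the final result lands on HDFS.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```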

Let’s discuss the MapReduce phases to get a better understanding of its architecture:

The MapReduce task is mainly divided into two phases, i.e. the Map phase and the Reduce phase.

1. Map: As the name suggests, its main use is to map the input data into key-value pairs. The input to the map may itself be a key-value pair, where the key can be the ID of some kind of address and the value is the actual value it holds. The Map() function is executed on each of these input key-value pairs and generates intermediate key-value pairs, which work as the input for the Reducer or the Reduce() function (both phases are sketched in code after this list).

2. Reduce: The intermediate key-value pairs that work as input for the Reducer are shuffled, sorted, and sent to the Reduce() function. The Reducer aggregates or groups the data based on its key as per the reducer algorithm written by the developer.
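
Here is a minimal sketch of both phases, assuming the canonical word-count example and the org.apache.hadoop.mapreduce API; these are the WordCountMapper and WordCountReducer classes referenced in the driver sketch above:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: turns each input line into intermediate (word, 1) pairs.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The input key is the byte offset of the line; the value is the line itself.
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE); // emit an intermediate key-value pair
        }
    }
}

// Reduce phase: receives (word, [1, 1, ...]) after the shuffle and sort,
// and aggregates the values for each key.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum)); // final output, stored on HDFS
    }
}
```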

How the Job Tracker and the Task Tracker deal with MapReduce:

1. Job Tracker: The work of the Job Tracker is to manage all the resources and all the jobs across the cluster, and to schedule each map on a Task Tracker running on the same data node, since there can be hundreds of data nodes available in the cluster.

2. Task Tracker: The Task Trackers can be considered the actual slaves that work on the instructions given by the Job Tracker. A Task Tracker is deployed on each of the nodes available in the cluster and executes the Map and Reduce tasks as instructed by the Job Tracker (see the configuration sketch after this list).
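
The Job Tracker and Task Tracker belong to classic (Hadoop 1.x) MapReduce; in Hadoop 2.x their roles were taken over by YARN. As a rough sketch under that assumption, a classic client located the Job Tracker through the mapred.job.tracker configuration property; the host and port below are placeholders, not values from this article:

```java
import org.apache.hadoop.mapred.JobConf;

public class ClassicClientConfig {
    public static void main(String[] args) {
        // Hadoop 1.x (classic) API: clients find the JobTracker via this
        // property; the value "local" instead of a host:port would run
        // the job in-process rather than on the cluster.
        JobConf conf = new JobConf();
        conf.set("mapred.job.tracker", "jobtracker.example.com:8021"); // placeholder address
        System.out.println("JobTracker: " + conf.get("mapred.job.tracker"));
    }
}
```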

There is also one important component of the MapReduce architecture known as the Job History Server. The Job History Server is a daemon process that saves and stores historical information about tasks and applications: the logs generated during or after job execution are stored on the Job History Server.
