Unit-2 (MapReduce-II)
Unit-II
MapReduce
Anatomy of a MapReduce Job in Apache Hadoop
The Hadoop framework comprises two main components:
• Hadoop Distributed File System (HDFS) for Data Storage
• MapReduce for Data Processing.
A typical Hadoop MapReduce job is divided into a set of Map
and Reduce tasks that execute on a Hadoop cluster. The
execution flow occurs as follows:
• Input data is split into small subsets of data.
• Map tasks work on these data splits.
• The intermediate output data from the Map tasks is then passed to the Reduce task(s) after an intermediate process called ‘shuffle’.
• The Reduce task(s) work on this intermediate data to generate the result of the MapReduce job.
• Hadoop MapReduce jobs are divided into a set of map
tasks and reduce tasks
• The input to a MapReduce job is a set of files in the data store
that are spread out over the HDFS. In Hadoop, these files are
split with an input format, which defines how to separate a file into input splits. An input split can be thought of as a byte-oriented view of a chunk of the file to be loaded by a map task.
• Each map task in Hadoop is broken into the following phases: record reader, mapper, combiner, and partitioner. The output of the map phase, called intermediate keys and values, is sent to the reducers.
• The reduce tasks are broken into the following phases: shuffle, sort, reducer, and output format. The driver sketch after this list shows where each of these pieces is configured in a job.
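For concreteness, a minimal job driver is sketched below, assuming a word-count job. The class names WordCountDriver, TokenCounterMapper and IntSumReducer are illustrative (the mapper and reducer themselves are sketched later in this unit); the Hadoop classes and Job methods are the standard ones. It shows where the input format, mapper, combiner, partitioner, reducer and output format are set.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCountDriver.class);

    // Input format: splits the input files and supplies the record reader
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // Map side: mapper, combiner, partitioner
    job.setMapperClass(TokenCounterMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setPartitionerClass(HashPartitioner.class);   // the default, set here only for clarity

    // Reduce side: reducer and output format
    job.setReducerClass(IntSumReducer.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Types of the final key/value pairs written by the output format
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}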
The map tasks are assigned by the Hadoop framework to those DataNodes where the actual data to be processed resides. This ensures that the data typically doesn’t have to move over the network, which saves network bandwidth; the data is computed on the local machine itself, so the map task is said to be data local.
Mapper
Record Reader:
The record reader translates an input split generated by the input format into records. The purpose of the record reader is to parse the data into records, but it doesn’t parse the records themselves. It passes the data to the mapper in the form of key/value pairs. Usually the key in this context is positional information and the value is the chunk of data that composes a record.
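As an illustration, assume the default TextInputFormat, whose record reader (LineRecordReader) produces one record per line of the split; the file contents below are hypothetical.

// key   = LongWritable byte offset of the line within the file (the positional information)
// value = Text holding the line itself (the chunk of data that composes the record)
//
// Input split contents:          Records handed to map():
//   hello hadoop\n                 (0,  "hello hadoop")
//   hadoop mapreduce\n             (13, "hadoop mapreduce")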
Map:
The map function is the heart of the mapper task; it is executed on each key/value pair from the record reader to produce zero or more key/value pairs, called intermediate pairs. What to use as the key and value depends on what the MapReduce job is accomplishing: the data is grouped on the key, and the value is the information pertinent to the analysis in the reducer.
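A minimal sketch of a word-count map function, assuming the (byte offset, line of text) input that TextInputFormat’s record reader provides; the class name TokenCounterMapper is illustrative.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
public class TokenCounterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // key = byte offset from the record reader; value = one line of the input
    StringTokenizer tokens = new StringTokenizer(value.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      context.write(word, ONE);   // emit an intermediate (word, 1) pair
    }
  }
}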
Combiner:
• The combiner is not applicable to all MapReduce algorithms, but wherever it can be applied it is recommended. It takes the intermediate keys from the mapper and applies a user-provided method to aggregate values within the small scope of that one mapper, e.g. sending (hadoop, 3) requires fewer bytes than sending (hadoop, 1) three times over the network.
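As a sketch, inside the job driver shown earlier: because integer addition is associative and commutative, the word-count reducer class can double as the combiner.

// A combiner has the same interface as a reducer, so IntSumReducer (sketched later) can be reused.
job.setCombinerClass(IntSumReducer.class);
// Shuffled without a combiner: (hadoop, 1) (hadoop, 1) (hadoop, 1)  -> three records over the network
// Shuffled with a combiner:    (hadoop, 3)                          -> one record over the network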
Partitioner:
• The partitioner takes the intermediate key/value pairs from the mapper and splits them into shards, one shard per reducer. By default it distributes the keyspace roughly evenly over the reducers, while still ensuring that the same key emitted by different mappers ends up at the same reducer. The partitioned data is written to the local filesystem of each map task and waits to be pulled by its respective reducer.
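The default HashPartitioner simply hashes the key modulo the number of reduce tasks, which is why equal keys from different mappers meet at the same reducer. A custom partitioner can be sketched as below; FirstLetterPartitioner is a hypothetical example, not part of Hadoop.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// The built-in HashPartitioner effectively computes:
//   (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    // Hypothetical scheme: send words to a shard chosen by their first character
    String s = key.toString();
    char first = s.isEmpty() ? 'a' : s.charAt(0);
    return (first & Integer.MAX_VALUE) % numPartitions;
  }
}
// Registered in the driver with: job.setPartitionerClass(FirstLetterPartitioner.class);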
Reducer
Shuffle and Sort:
• The reduce task starts with the shuffle and sort step. This step takes the output files written by all of the partitioners and downloads them to the local machine on which the reducer is running. These individual data pieces are then sorted by key into one larger data list. The purpose of this sort is to group equivalent keys together so that their values can be iterated over easily in the reduce task.
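Hadoop lets a job tune how this sorting and grouping behaves; a minimal sketch inside the driver, assuming Text intermediate keys:

// How intermediate keys are ordered while the pulled map outputs are merged
job.setSortComparatorClass(Text.Comparator.class);
// Which keys are treated as equivalent and therefore grouped into a single reduce() call
job.setGroupingComparatorClass(Text.Comparator.class);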
Reduce:
• The reducer takes the grouped data as input and runs a reduce function once per key grouping. The function is passed the key and an iterator over all the values associated with that key. A wide range of processing can happen in this function: the data can be aggregated, filtered, and combined in a number of ways. Once it is done, it sends zero or more key/value pairs to the final step, the output format.
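A minimal sketch of a word-count reduce function that sums the grouped values of each key; the class name IntSumReducer is illustrative.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable result = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {   // iterate over all values grouped under this key
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);        // emit the final (word, count) pair to the output format
  }
}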
Output Format:
• The output format translates the final key/value pairs from the reduce function and writes them out to a file via a record writer. By default, it separates the key and value with a tab character and separates records with a newline character.
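A sketch of configuring the default TextOutputFormat inside the driver; the separator property name is the one used by Hadoop 2.x and later, and the output path shown is hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

Configuration conf = new Configuration();
// By default the record writer emits "key<TAB>value" followed by a newline;
// the separator can be overridden before the job is created.
conf.set("mapreduce.output.textoutputformat.separator", ",");
Job job = Job.getInstance(conf, "word count");
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/wordcount/output"));  // hypothetical path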
Anatomy of Hadoop MapReduce Execution: