
DEPARTMENT OF INFORMATION TECHNOLOGY

20CS601 BIG DATA ANALYTICS


UNIT IV - MAPREDUCE PROGRAMMING AND HIVE
• Mapper and Reducer

• MapReduce is a programming model that is divided into two phases: the Map phase and the Reduce phase. It is designed to process, in parallel, data that is distributed across various machines (nodes). A Hadoop Java program consists of a Mapper class and a Reducer class along with a driver class. The Hadoop Mapper is a function or task that processes all input records from a file and generates output that serves as the input for the Reducer.
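
• As a concrete illustration, here is a minimal sketch of the classic word-count Mapper and Reducer (the class names TokenMapper and SumReducer are illustrative, not from the slides):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: reads one line at a time and emits a (word, 1) pair per token.
class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE); // map output becomes the Reducer's input
        }
    }
}

// Reducer: sums the counts for each word emitted by the mappers.
class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}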
• How to calculate the number of Mappers in Hadoop:

• The number of blocks of the input file determines the number of map tasks in the Hadoop Map phase, which can be calculated with the formula below.

• Number of Mappers = (total data size) / (input split size)
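
• As a worked example (assuming the common default input split size of 128 MB, which matches the default HDFS block size), a 1 GB input file gives:

Number of Mappers = 1024 MB / 128 MB = 8 map tasks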

• Combiner: The Combiner class is used between the Map class and the Reduce class to reduce the volume of data transferred between Map and Reduce. Usually the output of the map task is large, so the amount of data transferred to the reduce task is high.
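
• A minimal sketch of wiring in a combiner. When the reduce function is associative and commutative (as in word count), the Reducer class itself can serve as the combiner; the class names come from the illustrative sketch above:

// Driver fragment: run SumReducer locally on each mapper's output
// before it is shuffled across the network to the reducers.
job.setMapperClass(TokenMapper.class);
job.setCombinerClass(SumReducer.class); // combiner = mini-reducer on the map side
job.setReducerClass(SumReducer.class);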
• The Mapper produces its output by emitting new key-value pairs. The input data must first be converted into key-value pairs, because the Mapper cannot process raw input records directly. While processing the input records, the Mapper also generates small blocks of intermediate data as key-value pairs. Below we discuss the processes that occur in the Mapper, its key features, and how the key-value pairs are generated.
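
• A small illustration of that conversion with the default TextInputFormat: the key is the byte offset of the line (a LongWritable) and the value is the line's contents (a Text). Offsets below assume Unix line endings:

// For a file containing:
//   hello world
//   hello hadoop
// the framework calls map() with pairs like:
//   map(new LongWritable(0),  new Text("hello world"), context)
//   map(new LongWritable(12), new Text("hello hadoop"), context)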
• Partitioner: The number of partitioners is equal to the number of reducers; a partitioner divides the data according to the number of reducers, so the data passed from a single partition is processed by a single Reducer.

• A partitioner partitions the key-value pairs of the intermediate map outputs. It partitions the data using a user-defined condition, which works like a hash function. The total number of partitions is the same as the number of Reducer tasks for the job. Let us take an example to understand how the partitioner works.
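
• A minimal sketch of a custom Partitioner consistent with the driver code shown later (the CaderPartitioner it references). The age-based splitting logic here is an assumption modeled on the standard Hadoop partitioner example, not taken from the slides:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each (gender, record) pair to one of three reducers by age band.
// The record layout (tab-separated, age in the third field) is assumed.
public class CaderPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        String[] fields = value.toString().split("\t");
        int age = Integer.parseInt(fields[2].trim());

        if (numReduceTasks == 0) {
            return 0; // no reduce tasks: everything goes to one partition
        }
        if (age <= 20) {
            return 0;                  // partition 0 -> first reducer
        } else if (age <= 30) {
            return 1 % numReduceTasks; // partition 1 -> second reducer
        } else {
            return 2 % numReduceTasks; // partition 2 -> third reducer
        }
    }
}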


• Map Tasks
• The map task accepts key-value pairs as input, even though the raw data sits in a text file. The input for this map task is as follows:
• Input: the key would be a pattern such as "any special key + filename + line number" (example: key = @input1) and the value would be the data in that line (example: value = 1201 \t gopal \t 45 \t Male \t 50000).
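
• A sketch of what the MapClass referenced in the driver fragment below might look like for this record format; its body is an assumption modeled on the standard partitioner example (emit the gender field as the key and the whole record as the value), not taken from the slides:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper for records like: 1201 \t gopal \t 45 \t Male \t 50000
// Emits (gender, full record) so the partitioner can split by age.
class MapClass extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        String gender = fields[3].trim();       // fourth field is gender
        context.write(new Text(gender), value); // key = gender, value = record
    }
}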
// Driver fragment: wire up the mapper, partitioner, reducers, and
// I/O formats, then submit the job and wait for it to finish.
job.setMapperClass(MapClass.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

// set partitioner statement
job.setPartitionerClass(CaderPartitioner.class);
job.setReducerClass(ReduceClass.class);
job.setNumReduceTasks(3); // three reducers, one per partition

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);

System.exit(job.waitForCompletion(true) ? 0 : 1);
return 0; // required by the compiler in a Tool.run() method
• Counter output for the job:

File System Counters
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=480
  HDFS: Number of bytes written=72
  HDFS: Number of read operations=12
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=6

Job Counters
  Launched map tasks=1
  Launched reduce tasks=3
  Data-local map tasks=1
  Total time spent by all maps in occupied slots (ms)=8212
  Total time spent by all reduces in occupied slots (ms)=59858
  Total time spent by all map tasks (ms)=8212
Thank You
