
DEPARTMENT OF INFORMATION TECHNOLOGY

20CS601 BIG DATA ANALYTICS


UNIT IV - MAPREDUCE PROGRAMMING AND HIVE
• Mapper and Reducer

• MapReduce is a programming model that is divided into two phases: the Map phase and the Reduce phase. It is designed to process, in parallel, data that is distributed across various machines (nodes). A Hadoop Java program consists of a Mapper class and a Reducer class along with a driver class. The Hadoop Mapper is a function or task that processes all input records from a file and generates output that serves as the input for the Reducer.
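
• As a concrete illustration, here is a minimal sketch of the classic word-count Mapper and Reducer (the class names TokenMapper and SumReducer are illustrative, not from the slides):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: reads one line at a time and emits a (word, 1) pair per token.
class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE); // map output becomes the Reducer's input
        }
    }
}

// Reducer: sums the counts for each word emitted by the mappers.
class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}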
• How to calculate the number of Mappers in Hadoop:

• The number of blocks of the input file determines the number of map tasks in the Hadoop Map phase, which can be calculated with the formula below.

• Number of Mappers = (total data size) / (input split size)
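
• As a worked example (assuming the common default input split size of 128 MB, which matches the default HDFS block size), a 1 GB input file gives:

Number of Mappers = 1024 MB / 128 MB = 8 map tasks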

• Combiner: The Combiner class is used between the Map class and the Reduce class to reduce the volume of data transferred between Map and Reduce. Usually the output of the map task is large, so the amount of data transferred to the reduce task is high.
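
• A minimal sketch of wiring in a combiner. When the reduce function is associative and commutative (as in word count), the Reducer class itself can serve as the combiner; the class names come from the illustrative sketch above:

// Driver fragment: run SumReducer locally on each mapper's output
// before it is shuffled across the network to the reducers.
job.setMapperClass(TokenMapper.class);
job.setCombinerClass(SumReducer.class); // combiner = mini-reducer on the map side
job.setReducerClass(SumReducer.class);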
• The Mapper produces its output by emitting new key-value pairs. The input data must first be converted into key-value pairs, because the Mapper cannot process raw input records directly. While processing the input records, the Mapper also generates small blocks of intermediate data as key-value pairs. Below we discuss the processes that occur in the Mapper, its key features, and how the key-value pairs are generated.
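
• A small illustration of that conversion with the default TextInputFormat: the key is the byte offset of the line (a LongWritable) and the value is the line's contents (a Text). Offsets below assume Unix line endings:

// For a file containing:
//   hello world
//   hello hadoop
// the framework calls map() with pairs like:
//   map(new LongWritable(0),  new Text("hello world"), context)
//   map(new LongWritable(12), new Text("hello hadoop"), context)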
• Partitioner: The number of partitioners is equal to the number of reducers; a partitioner divides the data according to the number of reducers, so the data passed from a single partition is processed by a single Reducer.

• A partitioner partitions the key-value pairs of the intermediate map outputs. It partitions the data using a user-defined condition, which works like a hash function. The total number of partitions is the same as the number of Reducer tasks for the job. Let us take an example to understand how the partitioner works.
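
• A minimal sketch of a custom Partitioner consistent with the driver code shown later (the CaderPartitioner it references). The age-based splitting logic here is an assumption modeled on the standard Hadoop partitioner example, not taken from the slides:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each (gender, record) pair to one of three reducers by age band.
// The record layout (tab-separated, age in the third field) is assumed.
public class CaderPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        String[] fields = value.toString().split("\t");
        int age = Integer.parseInt(fields[2].trim());

        if (numReduceTasks == 0) {
            return 0; // no reduce tasks: everything goes to one partition
        }
        if (age <= 20) {
            return 0;                  // partition 0 -> first reducer
        } else if (age <= 30) {
            return 1 % numReduceTasks; // partition 1 -> second reducer
        } else {
            return 2 % numReduceTasks; // partition 2 -> third reducer
        }
    }
}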


• Map Tasks
• The map task accepts key-value pairs as input, even though the raw data sits in a text file. The input for this map task is as follows:
• Input: the key would be a pattern such as "any special key + filename + line number" (example: key = @input1) and the value would be the data in that line (example: value = 1201 \t gopal \t 45 \t Male \t 50000).
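
• A sketch of what the MapClass referenced in the driver fragment below might look like for this record format; its body is an assumption modeled on the standard partitioner example (emit the gender field as the key and the whole record as the value), not taken from the slides:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper for records like: 1201 \t gopal \t 45 \t Male \t 50000
// Emits (gender, full record) so the partitioner can split by age.
class MapClass extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        String gender = fields[3].trim();       // fourth field is gender
        context.write(new Text(gender), value); // key = gender, value = record
    }
}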
// Driver fragment: wire up the mapper, partitioner, reducers, and
// I/O formats, then submit the job and wait for it to finish.
job.setMapperClass(MapClass.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

// set partitioner statement
job.setPartitionerClass(CaderPartitioner.class);
job.setReducerClass(ReduceClass.class);
job.setNumReduceTasks(3); // three reducers, one per partition

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);

System.exit(job.waitForCompletion(true) ? 0 : 1);
return 0; // required by the compiler in a Tool.run() method
• Counter output for the job:

File System Counters
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=480
  HDFS: Number of bytes written=72
  HDFS: Number of read operations=12
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=6

Job Counters
  Launched map tasks=1
  Launched reduce tasks=3
  Data-local map tasks=1
  Total time spent by all maps in occupied slots (ms)=8212
  Total time spent by all reduces in occupied slots (ms)=59858
  Total time spent by all map tasks (ms)=8212
Thank You
