6. Map Reduce Programming
Contents
• Map-Reduce Programming
• Exercises
• Mappers & Reducers
• Hadoop combiners
• Hadoop partitioners
Overview
• Hadoop MapReduce is a software framework for easily writing applications
which process vast amounts of data (multi-terabyte data-sets) in-parallel
on large clusters (thousands of nodes) of commodity hardware in a
reliable, fault-tolerant manner.
• A MapReduce job usually splits the input data-set into independent chunks
which are processed by the map tasks in a completely parallel manner.
• The framework sorts the outputs of the maps, which are then input to the
reduce tasks.
• Typically both the input and the output of the job are stored in a file-system.
• The framework takes care of scheduling tasks, monitoring them, and re-executing the failed tasks.
• Typically the compute nodes and the storage nodes are the same, that
is, the MapReduce framework and the Hadoop Distributed File System
are running on the same set of nodes.
• This configuration allows the framework to effectively schedule tasks on
the nodes where data is already present, resulting in very high aggregate
bandwidth across the cluster.
• The MapReduce framework consists of a single master ResourceManager, one worker NodeManager per cluster node, and one MRAppMaster per application.
What is Map Reduce?
Word count Job
Input : Text file
Output : count of words
File.txt (Size :: 500MB):
Hi how are you
how is your job
how is your family
how is your sister
how is your brother
what is the time now
what is the strength of the Hadoop
The input file is read by an input format, which turns it into (key, value) records for the mappers:
• TextInputFormat (key = byte offset, value = record/line)
• KeyValueTextInputFormat
• SequenceFileInputFormat
• SequenceFileAsTextInputFormat
The input file is divided into splits, and each split is processed by a separate Mapper.
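With TextInputFormat, each record handed to a mapper has the line's byte offset as the key and the line itself as the value. A minimal plain-Java sketch of how those (byte offset, line) records are produced (class and method names are hypothetical, not the Hadoop API):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ByteOffsetRecords {
    // Mimic TextInputFormat: key = byte offset of each line, value = the line.
    static List<Object[]> toRecords(String fileContents) {
        List<Object[]> records = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n", -1)) {
            records.add(new Object[]{offset, line});
            // advance past the line's bytes plus the '\n' separator
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String file = "Hi how are you\nhow is your job";
        for (Object[] r : toRecords(file)) {
            System.out.println(r[0] + "\t" + r[1]);
        }
        // prints: 0	Hi how are you
        //         15	how is your job
    }
}
```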
2024/9/22 9
MapReduce: In Parallel
Steps of MapReduce
Three core steps of MapReduce, wrapped by input and output:
• Sequentially read a lot of data
• Map: extract something you care about
• Group by key: sort and shuffle
• Reduce: aggregate, summarize, filter, or transform
• Output the result
MapReduce Examples #example1
• Word count using MapReduce:

map(key, value):
    // key: document name; value: text of document
    for each word w in value:
        emit(w, 1)

reduce(key, values):
    // key: a word; values: an iterator over counts
    result = 0
    for each count v in values:
        result += v
    emit(key, result)

For the input "hi how are you", the map phase emits (hi,1)(how,1)(are,1)(you,1); pairs are then grouped by key before reduce.
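The word-count pseudocode can be simulated end to end in plain Java without a Hadoop cluster; the class below is a hypothetical sketch, with the map phase emitting (word, 1) pairs and the grouping-plus-sum standing in for shuffle and reduce:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSim {
    static Map<String, Integer> wordCount(String document) {
        // Map phase: emit (word, 1) for every word in the input.
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String w : document.split("\\s+")) {
            if (!w.isEmpty()) emitted.add(new SimpleEntry<>(w, 1));
        }
        // Group-by-key + reduce phase: collect pairs per key and sum the counts.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            counts.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("hi how are you hi how"));
        // prints: {are=1, hi=2, how=2, you=1}
    }
}
```

In real Hadoop, the grouping step is done by the framework's sort-and-shuffle across the cluster rather than by a single in-memory map.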
6. SequenceFileAsBinaryInputFormat
By using SequenceFileAsBinaryInputFormat we can retrieve the sequence file's keys and values as opaque binary objects.
7. NLineInputFormat
• It is another form of TextInputFormat: the keys are the byte offsets of the lines and the values are the contents of the lines.
• With TextInputFormat and KeyValueTextInputFormat, each mapper receives a variable number of lines of input; the number depends on the size of the split and on the length of the lines.
• If we want each mapper to receive a fixed number of lines of input, we use NLineInputFormat.
• N is the number of lines of input that each mapper receives.
• By default (N = 1), each mapper receives exactly one line of input.
• Suppose N = 2; then each split contains two lines, so one mapper receives the first two key-value pairs and another mapper receives the next two key-value pairs.
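The way NLineInputFormat carves the input into splits of N lines each can be sketched in plain Java (a hypothetical helper, not the Hadoop API); with N = 2 it reproduces the example above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NLineSplitter {
    // Group input lines into splits of at most n lines each,
    // as NLineInputFormat does; each split feeds one mapper.
    static List<List<String>> split(List<String> lines, int n) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            splits.add(lines.subList(i, Math.min(i + n, lines.size())));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("l1", "l2", "l3", "l4", "l5");
        // With N = 2, three mappers receive [l1, l2], [l3, l4], [l5].
        System.out.println(split(lines, 2));
    }
}
```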
8. DBInputFormat
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);   // emit (word, 1)
        }
    }
}

public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();           // add up the 1s emitted for this word
        }
        result.set(sum);
        context.write(key, result);     // emit (word, total count)
    }
}
In the driver class:
• job.setCombinerClass(ReduceClass.class);
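A combiner runs the reduce logic locally on each mapper's output, shrinking the amount of data that must cross the network during the shuffle. A plain-Java sketch of that effect (hypothetical helper, not the Hadoop API):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CombinerDemo {
    // Combine a single mapper's (word, 1) output by summing per key,
    // which is what setting the reducer as the combiner achieves locally.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> partial = new HashMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            partial.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return new ArrayList<>(partial.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : "hi how hi hi how".split(" ")) out.add(new SimpleEntry<>(w, 1));
        // 5 records before combining, 2 after: (hi, 3) and (how, 2).
        System.out.println(out.size() + " -> " + combine(out).size());
        // prints: 5 -> 2
    }
}
```

Using the reducer as a combiner is safe here because summing is associative and commutative; not every reduce function has that property.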
The combiner class can also be defined in a separate Java file.
• A MapReduce job takes an input data set and produces a list of key-value pairs. In the map phase, the input data is split, each map task processes one split, and each map outputs a list of key-value pairs. The output of the map phase is then sent to the reduce tasks, which apply the user-defined reduce function to the map outputs.
How many Partitioners?