6. Map Reduce Programming

MapReduce is a programming model for processing large data sets using a map function to generate intermediate key/value pairs and a reduce function to merge values. It automatically parallelizes tasks across a cluster, handling data partitioning and machine failures. The document also covers practical examples, including word count and inverted index, and details the roles of Mapper, Reducer, and Driver classes in a MapReduce job.

Map Reduce Programming
Contents
• Map-Reduce Programming
• Exercises
• Mappers & Reducers
• Hadoop combiners
• Hadoop partitioners
• MapReduce is a programming model and associated implementation for processing and generating large data sets.
• Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
• Many real-world tasks are expressible in this model.
• Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines.
• The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication.
• A typical MapReduce computation processes many terabytes of data on thousands of machines.
MapReduce Examples
#example1: word count using MapReduce

map(key, value):
  // key: document name; value: text of document
  for each word w in value:
    emit(w, 1)              // e.g. (hi,1) (how,1) (hi,1) (you,1)

reduce(key, values):
  // key: a word; values: an iterator over counts
  result = 0                // e.g. key = hi, values = (1,1)
  for each count v in values:
    result += v
  emit(key, result)
#example2: Counting words of different lengths
• Input file:
  hi how are you?
  Welcome to Nirma University.
• Output file:
  2:2, 3:3, 5:1, 7:1, 10:1
• How? The word lengths are hi:2, how:3, are:3, you:3, welcome:7, to:2, Nirma:5, University:10
• Mapper task: emit (2,hi), (2,to), (3,how), ...
• Reducer task: receives (2:[hi,to]), (3:[how,are,you]), ... and emits the count of words per length (a sketch follows below)
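A minimal Java sketch of this example, using the same Hadoop types the later slides introduce (the class names LengthMapper and LengthReducer are illustrative, not from the deck; each class would live in its own .java file with a driver like the one shown later):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emit (word length, word) for every word in the line, e.g. (2, hi), (3, how)
public class LengthMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String token : value.toString().split("\\s+")) {
      String word = token.replaceAll("\\W", "");   // drop punctuation such as '?' and '.'
      if (!word.isEmpty())
        context.write(new IntWritable(word.length()), new Text(word));
    }
  }
}

// Reducer: for each length, count how many words arrived, e.g. (2, [hi, to]) -> (2, 2)
public class LengthReducer extends Reducer<IntWritable, Text, IntWritable, IntWritable> {
  @Override
  protected void reduce(IntWritable length, Iterable<Text> words, Context context)
      throws IOException, InterruptedException {
    int count = 0;
    for (Text w : words) count++;
    context.write(length, new IntWritable(count));
  }
}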
Recap: HDFS
• HDFS: A specialized distributed file system
• Good for large amounts of data, sequential reads
• Bad for lots of small files, random access, non-append writes
• Architecture: Blocks, namenode, datanodes
• File data is broken into large blocks (64MB default)
• Blocks are stored & replicated by datanodes
• Single namenode manages all the metadata
• Secondary namenode: Housekeeping & (some) redundancy
• Usage: Special command-line interface
• Example: hadoop fs -ls /path/in/hdfs
Example
• Input: Dear, Bear, River, Car, Car, River, Deer, Car and Bear
• Word count job: the input is a text file, the output is the count of each word.
• Sample input file (File.txt, size 200 MB):
  Hi how are you
  how is your job
  how is your family
  how is your sister
  how is your brother
  what is the time now
  what is the strength of the Hadoop
• Input formats that turn the input file into (key, value) pairs:
  TextInputFormat
  SequenceFileInputFormat
  SequenceFileAsTextInputFormat
• Data flow (diagram): input file → input splits → record readers (emit (byte offset, record)) → mappers.
• The Java collections framework does not work with primitive types, so Java provides wrapper classes; collections work with objects, so an object of the wrapper class has to be created.
• Just as Java introduced a wrapper class for each primitive type, Hadoop has introduced Box classes (the Writable types).
• In Java the conversion from a primitive to its wrapper happens automatically (autoboxing), but in Hadoop the conversion has to be written explicitly, e.g. new IntWritable(int) to box an int and get() to get it back (see the sketch below).

  Primitive type   Java wrapper class   Hadoop Box class
  int              Integer              IntWritable
  long             Long                 LongWritable
  float            Float                FloatWritable
  double           Double               DoubleWritable
  String           String               Text
  …                …                    …
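A tiny standalone illustration of explicit boxing and unboxing with Hadoop's Box classes (the class name BoxDemo is illustrative):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class BoxDemo {
  public static void main(String[] args) {
    IntWritable boxed = new IntWritable(42);   // explicit boxing: int -> IntWritable
    int unboxed = boxed.get();                 // explicit unboxing: IntWritable -> int
    Text word = new Text("hadoop");            // Hadoop's box class for strings
    System.out.println(unboxed + " " + word);
  }
}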
More Details
• Input: a set of key-value pairs
• Programmer specifies two methods:
• Map(k, v) → <k’, v’>*
• Takes a key-value pair and outputs a set of key-value pairs
• E.g., key is the filename, value is a single line in the file
• There is one Map call for every (k,v) pair

• Reduce(k’, <v’>*) → <k’, v’’>*


• All values v’ with the same key k’ are reduced together and processed in v’ order
• There is one Reduce function call per unique key k’

MapReduce: A Diagram

What is Map Reduce?
• Sum of squares (in functional-programming notation, with a Java equivalent below):
• (map square ‘(1 2 3 4))
  Output: (1 4 9 16)
• (reduce + ‘(1 4 9 16))
  Evaluates as (+ 16 (+ 9 (+ 4 1)))
  Output: 30
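The same map/reduce idea expressed with Java streams, as an illustrative aside (this is plain Java, not the Hadoop API):

import java.util.stream.IntStream;

public class SumOfSquares {
  public static void main(String[] args) {
    // map: square each element; reduce: sum the squares
    int result = IntStream.of(1, 2, 3, 4)
                          .map(x -> x * x)          // (1 4 9 16)
                          .reduce(0, Integer::sum); // 30
    System.out.println(result);
  }
}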
Mapper Class

• The first stage in data processing using MapReduce is the Mapper class. Here, the RecordReader processes each input record and generates the respective key-value pair. Hadoop saves this intermediate mapper output to the local disk.
• Input Split
  It is the logical representation of data. It represents a block of work that corresponds to a single map task in the MapReduce program.
• RecordReader
  It interacts with the input split and converts the obtained data into key-value pairs.
Reducer Class

• The intermediate output generated by the mapper is fed to the reducer, which processes it and generates the final output, which is then saved in HDFS.
Driver Class

• The major component in a MapReduce job is the Driver class. It is responsible for setting up a MapReduce job to run in Hadoop. We specify the names of the Mapper and Reducer classes along with the data types and their respective job names.
More examples
• Distributed grep – all lines matching a pattern
• Map: filter by pattern (a mapper sketch follows below)
• Reduce: output set
• Count URL access frequency
• Map: output each URL as key, with count 1
• Reduce: sum the counts
• Reverse web-link graph
• Map: output (target, source) pairs when a link to target is found in source
• Reduce: concatenate values and emit (target, list(source))
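A possible mapper for the distributed-grep example (a sketch; it assumes the pattern is passed through the job configuration under a made-up key "grep.pattern", and the class name is illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GrepMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
  private String pattern;

  @Override
  protected void setup(Context context) {
    // read the pattern from the job configuration (the key name is an assumption)
    pattern = context.getConfiguration().get("grep.pattern", "");
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // emit only the lines that match the pattern; the reducer can simply output the set
    if (value.toString().matches(".*" + pattern + ".*"))
      context.write(value, NullWritable.get());
  }
}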

What do we need to write a MR program?

• A mapper
• Accepts (key,value) pairs from the input
• Produces intermediate (key,value) pairs, which are then shuffled
• A reducer
• Accepts intermediate (key,value) pairs
• Produces final (key,value) pairs for the output
• A driver
• Specifies which inputs to use, where to put the outputs
• Chooses the mapper and the reducer to use
• Hadoop takes care of the rest!!
• Default behaviors can be customized by the driver
The Mapper
• The input format is (file offset, line); the intermediate format can be freely chosen.

import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.io.*;

public class WCMapper extends Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable key, Text value, Context context)
      throws java.io.IOException, InterruptedException {
    context.write(new Text("foo"), value);
  }
}

• Extends the abstract 'Mapper' class
• Input/output types are specified as type parameters
• Implements a 'map' function
• Accepts a (key,value) pair of the specified type
• Writes output pairs by calling the 'write' method on the context
• Mixing up the types will cause problems at runtime (!)
The Reducer
• The intermediate format (the same as the mapper output) and the output format are given as type parameters.
• Note: we may get multiple values for the same key!

import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.io.*;

public class WCReducer extends Reducer<Text, Text, IntWritable, Text> {

  public void reduce(Text key, Iterable<Text> values, Context context)
      throws java.io.IOException, InterruptedException {
    for (Text value : values)
      context.write(new IntWritable(4711), value);
  }
}

• Extends the abstract 'Reducer' class
• Must specify the types again (they must be compatible with the mapper!)
• Implements a 'reduce' function
• Values are passed in as an 'Iterable'
• Caution: these are NOT normal Java classes. Do not store them in collections - their content can change between iterations!
The Driver

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCDriver {

  public static void main(String[] args) throws Exception {
    Configuration c = new Configuration();
    Job j = Job.getInstance(c, "Word count Example");
    j.setJarByClass(WordCount.class);          // Mapper & Reducer are in the same jar as WCDriver
    j.setMapperClass(WCMapper.class);
    j.setNumReduceTasks(3);
    j.setReducerClass(WCReducer.class);
    j.setOutputKeyClass(Text.class);           // format of the (key,value) pairs
    j.setOutputValueClass(IntWritable.class);  // output by the reducer
    FileInputFormat.addInputPath(j, new Path(args[0]));     // input path
    FileOutputFormat.setOutputPath(j, new Path(args[1]));   // output path
    System.exit(j.waitForCompletion(true) ? 0 : 1);
  }
}

• Specifies how the job is to be executed
• Input and output directories; mapper & reducer classes
Find out the word-length histogram #example3
• A particular document is given, and we have to find out how many big, medium, small, and tiny words appear in it; this gives the word-length histogram.
• Categories (a mapper sketch follows below):
  Big: Yellow: 10+ letters
  Medium: Red: 5 to 9 letters
  Small: Blue: 2 to 4 letters
  Tiny: Pink: 1 letter
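A minimal mapper sketch for the histogram (the class and category names are illustrative); unlike #example2, it emits the category rather than the exact length, and the word-count reducer pattern shown later can then sum the 1s per category:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LengthHistogramMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String word : value.toString().split("\\s+")) {
      int len = word.length();
      if (len == 0) continue;
      String category = (len >= 10) ? "big"
                      : (len >= 5)  ? "medium"
                      : (len >= 2)  ? "small" : "tiny";
      context.write(new Text(category), ONE);   // e.g. ("medium", 1)
    }
  }
}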
Inverted Index #example4
• Finding a given word, as a search engine does
• Input:
  Tweet1, “I love pancakes for breakfast”
  Tweet2, “I dislike pancakes”
  Tweet3, “What should I eat for breakfast?”
  Tweet4, “I love to eat”
• Output:
  pancakes (tweet1, tweet2)
  breakfast (tweet1, tweet3)
  eat (tweet3, tweet4)
  love (tweet1, tweet4)
• Find out the Mapper and Reducer (one possible sketch follows below)
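One possible solution sketch, assuming each input line looks like "Tweet1, I love pancakes ..." so the tweet id can be split off at the first comma (class names are illustrative; each class would go in its own .java file):

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emit (word, tweetId) for every word of the tweet text
public class IndexMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] parts = value.toString().split(",", 2);   // [tweet id, tweet text]
    if (parts.length < 2) return;
    Text tweetId = new Text(parts[0].trim());
    for (String word : parts[1].toLowerCase().split("\\W+"))
      if (!word.isEmpty())
        context.write(new Text(word), tweetId);
  }
}

// Reducer: collect the distinct tweet ids for each word
public class IndexReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text word, Iterable<Text> tweetIds, Context context)
      throws IOException, InterruptedException {
    Set<String> ids = new HashSet<>();      // copy the values: Hadoop reuses the Text objects
    for (Text id : tweetIds) ids.add(id.toString());
    context.write(word, new Text(String.join(",", ids)));
  }
}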
In Detail
• Hadoop Mapper
• Hadoop Reducer
• Key-Value Pairs
• Input Format
• Record Reader
• Partitioner
Mapper in Hadoop Map-Reduce
• How is key value pair generated in Hadoop?
1. Input Split
2. Record Reader
InputSplit
• InputSplit in Hadoop MapReduce is the logical representation of data.
It describes a unit of work that contains a single map task in a
MapReduce program.
• As a user, we don’t need to deal with InputSplit directly, because they
are created by an InputFormat
• The split size can be controlled with the mapred.min.split.size parameter in mapred-site.xml, or by overriding the parameter in the Job object used to submit a particular MapReduce job (see the sketch after this list).
• The client running the job calculates the splits for the job by calling getSplits(); the splits are then sent to the application master, which uses their storage locations to schedule map tasks that will process them on the cluster.
• Each map task then passes its split to the createRecordReader() method on the InputFormat to get a RecordReader for the split, and the RecordReader generates records (key-value pairs), which it passes to the map function.
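A hedged sketch of overriding the split size for a particular job from the driver, using FileInputFormat's setters (the 128 MB and 256 MB values are only examples):

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// in the driver (e.g. WCDriver.main), after the Job 'j' has been created:
FileInputFormat.setMinInputSplitSize(j, 128L * 1024 * 1024);   // no split smaller than 128 MB
FileInputFormat.setMaxInputSplitSize(j, 256L * 1024 * 1024);   // no split larger than 256 MB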
What is Hadoop InputFormat?
• The InputFormat class is one of the fundamental classes in the Hadoop
MapReduce framework which provides the following functionality:
1. The files or other objects that should be used for input are selected by the InputFormat.
2. The InputFormat defines the data splits, which define both the size of the individual map tasks and their potential execution servers.
3. The InputFormat defines the RecordReader, which is responsible for reading actual records from the input files.
How do we get the data to the mapper?

public abstract class InputFormat<K, V> {
  public abstract List<InputSplit> getSplits(JobContext context)
      throws IOException, InterruptedException;
  public abstract RecordReader<K, V> createRecordReader(InputSplit split,
      TaskAttemptContext context) throws IOException, InterruptedException;
}
Types of InputFormat in MapReduce
Record Reader
• The MapReduce RecordReader in Hadoop takes the byte-oriented view of the input provided by the InputSplit and presents it as a record-oriented view to the Mapper.
• The map task passes the split to the createRecordReader() method on the InputFormat to obtain a RecordReader for that split. The RecordReader loads data from its source and converts it into key-value pairs suitable for reading by the mapper.
Types of Hadoop RecordReader in MapReduce
i. LineRecordReader
ii. SequenceFileRecordReader

• Maximum size for a single record:
  conf.setInt("mapred.linerecordreader.maxlength", Integer.MAX_VALUE);
• A line larger than this maximum value (the default is 2,147,483,647) will be ignored.
Hadoop Record Writer
• The RecordWriter writes the output key-value pairs from the Reducer phase to output files. Hadoop provides several output formats (choosing one is shown in the sketch after this list):
• TextOutputFormat
• SequenceFileOutputFormat
• SequenceFileAsBinaryOutputFormat
• MapFileOutputFormat
• MultipleOutputs
• LazyOutputFormat
• DBOutputFormat
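A small sketch of choosing one of these output formats in the driver (TextOutputFormat is already the default; the commented line shows how another format from the list would be selected):

import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// in the driver, after the Job 'j' has been created:
j.setOutputFormatClass(TextOutputFormat.class);            // the default: "key <tab> value" text files
// j.setOutputFormatClass(SequenceFileOutputFormat.class); // alternative: binary SequenceFile output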
Java Programs for the Word Count Problem
• Driver Code
• Mapper Code
• Reducer Code
$hadoop jar wc.jar wordcount i/p o/p

public class WCDriver {

  public static void main(String[] args)
      throws IOException, InterruptedException, ClassNotFoundException {
    Configuration c = new Configuration();
    Job j = Job.getInstance(c, "Word count Example");
    j.setJarByClass(WordCount.class);
    j.setMapperClass(WCMapper.class);
    j.setNumReduceTasks(3);
    j.setReducerClass(WCReducer.class);
    j.setOutputKeyClass(Text.class);
    j.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(j, new Path(args[0]));
    FileOutputFormat.setOutputPath(j, new Path(args[1]));
    System.exit(j.waitForCompletion(true) ? 0 : 1);
  }
}
Add the following lines // Importing libraries

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String x = value.toString();              // x = "hi how are you"
    for (String word : x.split(" ")) {        // ["hi", "how", "are", "you"]
      context.write(new Text(word), new IntWritable(1));
    }
  }
}
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable>

• We have created a class WCMapper that extends the class Mapper, which is already defined in the MapReduce framework.
• We define the data types of the input and output key/value pairs after the class declaration using angle brackets.
• Both the input and the output of the Mapper are key/value pairs.
• Input:
  • The key is nothing but the offset of each line in the text file: LongWritable
  • The value is each individual line: Text
• Output:
  • The key is the tokenized word: Text
  • The value is hardcoded in our case to 1: IntWritable
  • Example – Dear 1, Bear 1, etc.
• We have written Java code that tokenizes each word and assigns it a hardcoded value equal to 1.
Add the following lines // Importing libraries

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int count = 0;
    for (IntWritable val : values) {   // e.g. A -> <1,1,1>
      count += val.get();
    }
    context.write(key, new IntWritable(count));
  }
}
• We have created a class WCReducer which extends the class Reducer, like that of the Mapper.
• We define the data types of the input and output key/value pairs after the class declaration using angle brackets, as done for the Mapper.
• Both the input and the output of the Reducer are key-value pairs.
• Input:
  • The key is nothing but the unique words which have been generated after the sorting and shuffling phase: Text
  • The value is a list of integers corresponding to each key: IntWritable
  • Example – Bear, [1, 1], etc.
• Output:
  • The key is one of the unique words present in the input text file: Text
  • The value is the number of occurrences of that word: IntWritable
  • Example – Bear, 2; Car, 3, etc.
• We have aggregated the values present in the list corresponding to each key and produced the final answer.
• In general, a single reduce call is made for each unique key, but you can specify the number of reducers in mapred-site.xml (or via the job, as shown later).
Add the following lines // Importing libraries

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
NCDC data (weather data example; the figures, omitted here, show the data at each stage)
• Raw format of the weather data
• Mapper input data
• Mapper output data
• Reducer input data
• Reducer output data
Hadoop/Map Reduce Combiners
• When we run a MapReduce job on a large dataset, large chunks of intermediate data are generated by the Mapper, and this intermediate data is passed on to the Reducer for further processing, which leads to enormous network congestion. The MapReduce framework provides a facility known as the Hadoop Combiner that plays a key role in reducing this network congestion.
• The combiner in MapReduce is also known as a ‘mini-reducer’. The primary job of the Combiner is to process the output data from the Mapper before passing it to the Reducer. It runs after the mapper and before the Reducer, and its use is optional.
MapReduce program without Combiner (diagram)
MapReduce program with Combiner (diagram)
Advantages of MapReduce Combiner
• Hadoop Combiner reduces the time taken for data transfer between
mapper and reducer.
• It decreases the amount of data that needs to be processed by the reducer.
• The Combiner improves the overall performance of the reducer.
Disadvantages of MapReduce
Combiner
• MapReduce jobs cannot depend on the Hadoop Combiner's execution, because there is no guarantee that it will run.
• Hadoop stores the intermediate key-value pairs in the local filesystem and runs the combiner later, which causes expensive disk I/O.
Where to define it?
• In the Driver class:
  job.setCombinerClass(ReduceClass.class);
• Or as a separate Java file:
  public class CombinersHadoop extends Reducer<...> { ... }
(a concrete sketch for the word count job follows below)
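For the word count job shown earlier (WCMapper/WCReducer with Text and IntWritable types), the reducer's input and output types both match the mapper output, so the reducer itself can double as the combiner; a minimal sketch of the one extra driver line:

// in WCDriver.main(), next to setMapperClass()/setReducerClass():
j.setCombinerClass(WCReducer.class);   // runs WCReducer as a mini-reducer on each mapper's local output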


Hadoop Partitioner / MapReduce Partitioner
• Partitioning of the keys of the intermediate map output is controlled
by the Partitioner.
Need of the Hadoop MapReduce Partitioner
• A MapReduce job takes an input data set and produces a list of key-value pairs as the result of the map phase: the input data is split, each map task processes its split, and each map outputs a list of key-value pairs. The output of the map phase is then sent to the reduce tasks, which run the user-defined reduce function on the map outputs.

• Before the reduce phase, the map output is partitioned on the basis of the key and sorted.
• Partitioning groups all the values for each key together and makes sure that all the values of a single key go to the same reducer, which allows an even distribution of the map output over the reducers.
• Partitioner in Hadoop MapReduce redirects the mapper output to the
reducer by determining which reducer is responsible for the particular
key.
• The default partitioner in Hadoop MapReduce is HashPartitioner, which computes a hash value of the key and assigns the partition based on this result (see the sketch below).
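The behaviour of the default HashPartitioner boils down to the following logic (a minimal sketch of the idea, written here as a custom Partitioner):

import org.apache.hadoop.mapreduce.Partitioner;

public class HashLikePartitioner<K, V> extends Partitioner<K, V> {
  @Override
  public int getPartition(K key, V value, int numReduceTasks) {
    // mask the sign bit so the hash is non-negative, then take it modulo the number of reducers
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}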
How many partitions?
• The total number of partitions is equal to the number of reducers, i.e. the Partitioner divides the data according to the number of reducers, which is set by the JobConf.setNumReduceTasks() method.
• Thus, the data in a single partition is processed by a single reducer, and a Partitioner is created only when there are multiple reducers.
Poor Partitioning in Hadoop MapReduce
• If, in the input data, one key appears far more often than any other key, then:
  1. The key appearing more often will be sent to one partition.
  2. All the other keys will be sent to partitions according to their hashCode().
• If the hashCode() method does not uniformly distribute the other keys over the partition range, data will not be evenly sent to the reducers.
• Poor partitioning of the data means that some reducers will have more input data than others, i.e. they will have more work to do than the other reducers. The entire job will then wait for one reducer to finish its extra-large share of the load.
• We can create a custom Partitioner, which allows sharing the workload uniformly across the reducers.
• Partitioner provides the getPartition() method, which you implement yourself if you want to declare a custom partitioning for your job. For example:

public static class MyPartitioner extends Partitioner<Text, Text> {
  @Override
  public int getPartition(Text key, Text value, int numReduceTasks) {
    if (numReduceTasks == 0)
      return 0;
    if (key.equals(new Text("Male")))
      return 0;
    if (key.equals(new Text("Female")))
      return 1;
    return 0;   // default partition for any other key (added so the method always returns)
  }
}
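To use the custom partitioner, the driver has to register it and set a matching number of reducers; a hedged sketch of the extra driver lines (the variable name j follows the drivers shown earlier):

// in the driver, alongside setMapperClass()/setReducerClass():
j.setPartitionerClass(MyPartitioner.class);   // send "Male"/"Female" keys to fixed partitions
j.setNumReduceTasks(2);                       // one reducer per partition returned by getPartition()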
How to set the number of reducers?
• By default, the number of reducers is 1.
• If you call JobConf.setNumReduceTasks(0), the number of reducers is 0 and the job is executed using mappers only; no sorting and shuffling is applied.
• Methods to set the number of reducers:
  1. On the command line (bin/hadoop jar -Dmapreduce.job.maps=5 yourapp.jar ..)
     mapred.map.tasks --> mapreduce.job.maps (old property name --> new property name)
     mapred.reduce.tasks --> mapreduce.job.reduces
  2. In the code, by configuring JobConf variables:
     job.setNumMapTasks(5); // 5 mappers
     job.setNumReduceTasks(2); // 2 reducers
MapReduce: In Parallel
