Exp 4 Word Count
Objective: Run a basic Word Count MapReduce program to understand the MapReduce paradigm.
Description:
MapReduce is a processing technique and programming model for distributed computing; Hadoop's
implementation of it is written in Java. A MapReduce job consists of two main tasks, Map and Reduce.
The map task takes a set of data and converts it into another set of data in which individual
elements are broken down into tuples (key/value pairs). The reduce task takes the output of the map
task as its input and combines those tuples into a smaller set of tuples. As the order of the words
in the name MapReduce implies, the reduce task always runs after the map task. WordCount is a simple
program that counts the number of occurrences of each word in a given set of input text.
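The map, group, and reduce steps described above can be sketched in plain Java without Hadoop. This is an illustration only, not the lab program: the class name WordCountSketch, its method names, and the sample input are assumptions, and a TreeMap stands in for the framework's sort-and-group step that runs between map and reduce.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the MapReduce flow; no Hadoop classes are used.
final class WordCountSketch {

    // Map phase: emit one (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group the values by key; TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the list of counts for one key into a single total.
    static int reduce(List<Integer> values) {
        int count = 0;
        for (int v : values) {
            count += v;
        }
        return count;
    }

    // Runs the three phases in order over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            totals.put(group.getKey(), reduce(group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(wordCount(Arrays.asList("to be or not", "to be")));
    }
}
```

In a real Hadoop job the map and reduce steps run on different nodes and the framework performs the grouping; here all three steps run in one process so the data flow is easy to follow.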
Program:
Mapper Class:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
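Reducer Class:
// Reduce function: sums the counts emitted for each word
public void reduce(Text key, Iterator<IntWritable> value,
        OutputCollector<Text, IntWritable> output,
        Reporter rep) throws IOException
{
    int count = 0;
    while (value.hasNext()) {
        count += value.next().get();
    }
    output.collect(key, new IntWritable(count));
}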
Driver Class:
// Main method: runs the job through ToolRunner and prints its exit code
public static void main(String[] args) throws Exception
{
    int exitCode = ToolRunner.run(new WCDriver(), args);
    System.out.println(exitCode);
}
}
Output:
Input File:
Welcome everyone.
Welcome to Hadoop lab.
Today we are going to work on Hadoop MapReduce concept.
Output File:
Hadoop 2
MapReduce 1
Today 1
Welcome 2
are 1
concept. 1
everyone. 1
going 1
lab. 1
on 1
to 2
we 1
work 1
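The keys lab. and concept. in the output file keep their trailing period, which is what a mapper that splits each line on whitespace alone would produce. A minimal plain-Java sketch of that behaviour follows; the class name TokenizeSketch and the method tokenize are hypothetical, and whitespace-only splitting is an assumption about the mapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing whitespace-only tokenization of one input line.
final class TokenizeSketch {

    // Splits on runs of whitespace; punctuation stays attached to its word.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Welcome, to, Hadoop, lab.]
        System.out.println(tokenize("Welcome to Hadoop lab."));
    }
}
```

Because "lab." and "lab" would be counted as different keys, a job that should ignore punctuation would need to strip it in the map step before emitting each word.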