Experiment 6 BDA
AIM: To implement a word count program using MapReduce.
THEORY:
WordCount is a simple program that counts the number of occurrences of each word in a
given text input data set. It fits the MapReduce programming model well, which makes it
a good first example for understanding the Hadoop MapReduce programming style.
The implementation consists of three main parts:
1. Mapper
2. Reducer
3. Driver
Pseudo-code
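void Map(key, value) {
    for each word x in value:
        output.collect(x, 1);
}
void Reduce(keyword, <list of values>) {
    sum = 0;
    for each x in <list of values>:
        sum += x;
    final_output.collect(keyword, sum);
}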
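The map and reduce steps outlined in the pseudo-code can be tried without a Hadoop cluster. The following is a minimal in-memory sketch, not the Hadoop implementation: the class name WordCountSketch, its method names, and the sample input lines are assumptions made for illustration, and the shuffle step is simulated with a sorted map rather than performed by the framework.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory sketch of the WordCount data flow; it does not use Hadoop.
// All names here are hypothetical and chosen only for illustration.
final class WordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle step (simulated): group emitted values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the list of values collected for one keyword.
    static int reduce(String keyword, List<Integer> values) {
        int sum = 0;
        for (int x : values) {
            sum += x;
        }
        return sum;
    }

    // Driver: wires the map, shuffle, and reduce steps together.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("deer bear river", "car car river", "deer car bear");
        System.out.println(run(lines)); // prints {bear=2, car=3, deer=2, river=2}
    }
}
```

In a real job, Hadoop runs the map calls in parallel across input splits and performs the grouping itself between the map and reduce phases; the sketch only mirrors the order of the steps.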
Code:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.fs.Path;
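public class WordCount
{
    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws
        IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts collected for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws
        IOException, InterruptedException {
            int sum = 0;
            for (IntWritable x : values) {
                sum += x.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Driver: configures the job and submits it to the cluster.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputPath = new Path(args[1]);
        // Configuring the input/output path from the filesystem into the job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        // Delete any existing output directory so a rerun does not fail
        outputPath.getFileSystem(conf).delete(outputPath, true);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}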
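The program imports java.util.StringTokenizer, which by default splits on whitespace only. Assuming the mapper tokenizes each line that way, punctuation and letter case stay attached to the tokens, so "Hello," and "hello" would be counted as different words. The sketch below illustrates this in plain Java; the class TokenizerSketch and the normalizedTokens helper are hypothetical additions for illustration and are not part of the program above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

// Illustrates default StringTokenizer behaviour; names are hypothetical.
final class TokenizerSketch {

    // Splits on whitespace only, the default for new StringTokenizer(line).
    static List<String> rawTokens(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Assumed normalization: lower-case each token and drop every character
    // that is not a letter or digit, skipping tokens that become empty.
    static List<String> normalizedTokens(String line) {
        List<String> tokens = new ArrayList<>();
        for (String raw : rawTokens(line)) {
            String cleaned = raw.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "Hello, hello world!";
        System.out.println(rawTokens(line));        // prints [Hello,, hello, world!]
        System.out.println(normalizedTokens(line)); // prints [hello, hello, world]
    }
}
```

Whether to normalize tokens depends on the data set; for the experiment as stated, whitespace splitting is sufficient.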
CONCLUSION:
We studied the MapReduce programming model and implemented a word count
program using it.
Date: 23/09/2024
Roll No.: 53