Practical-2
Aim: Write a program for Word Count in MapReduce over HDFS.

Description:
MapReduce is a framework for processing large datasets using a large number of computers (nodes), collectively referred to as a cluster. Processing can occur on data stored in a distributed file system (HDFS). MapReduce is a method for distributing computation across multiple nodes: each node processes the data that is stored locally at that node.
Mapper Phase
The input dataset is split into independent blocks that are processed in parallel. Each input split is converted into key-value pairs. The mapper logic processes each key-value pair and produces intermediate key-value pairs according to the implementation logic. The resulting key-value pairs can be of a different type from the input key-value pairs. The output of the mapper is passed to the reducer.

Reducer Phase
The output of the mapper function is the input to the reducer. The framework sorts the intermediate key-value pairs by key; the reducer then applies its logic to each key and its associated values and produces the output in the desired format. The output is stored in HDFS.
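As a concrete illustration of this flow, consider a hypothetical two-line input (not part of the original practical):

Input lines:        hello world
                    hello hdfs
Mapper output:      (hello, 1) (world, 1) (hello, 1) (hdfs, 1)
After shuffle/sort: (hdfs, [1]) (hello, [1, 1]) (world, [1])
Reducer output:     (hdfs, 1) (hello, 2) (world, 1)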
Execution Steps:
Download and start the Cloudera training VM:
http://content.udacity-data.com/courses/ud617/Cloudera-Udacity-Training-VM-4.1.1.c.zip
Program:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Driver class: configures and submits the word-count job.
public class WordCount extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Expect an HDFS input path and an output path on the command line.
        if (args.length < 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            return -1;
        }

        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        conf.setMapperClass(WordMapper.class);
        conf.setReducerClass(WordReducer.class);

        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitcode = ToolRunner.run(new WordCount(), args);
        System.exit(exitcode);
    }
}
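Extending Configured and implementing Tool lets ToolRunner parse generic Hadoop command-line options (such as -D property overrides) before run() is invoked, so the driver only has to handle its own positional arguments.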
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Mapper: emits (word, 1) for every word on each input line.
public class WordMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String s = value.toString();
        // Split the line on non-word characters and emit each non-empty word with a count of 1.
        for (String word : s.split("\\W+")) {
            if (word.length() > 0)
                output.collect(new Text(word.toLowerCase()), new IntWritable(1));
        }
    }
}
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Reducer: sums the 1s emitted for each word and writes (word, total).
public class WordReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int count = 0;
        // Accumulate the partial counts for this word.
        while (values.hasNext()) {
            IntWritable i = values.next();
            count += i.get();
        }
        output.collect(key, new IntWritable(count));
    }
}
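Output:
Inspecting output/part-00000 after the job completes prints one word per line with its total count. For the hypothetical two-line input shown in the Description above, the result would be:

hdfs	1
hello	2
world	1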