
Practical-2

Aim: Write a program of Word Count in Map Reduce over HDFS.


Description:

MapReduce is a framework for processing large datasets across a large number of computers
(nodes), collectively referred to as a cluster. It operates on data stored in a distributed file
system such as HDFS and distributes the computation across the nodes, so that each node
processes the data stored locally on it.

A MapReduce job consists of two main phases:

Mapper phase

Reducer phase

The input dataset is divided into independent splits, which are processed in parallel. Each
input split is converted into key-value pairs. The mapper logic processes each key-value pair
and produces intermediate key-value pairs according to the implementation; these intermediate
pairs may be of a different type from the input pairs. The output of the mapper is then passed
to the reducer: the framework sorts the intermediate key-value pairs by key, and the reducer
applies its logic to each key and its associated values, producing output in the desired
format. The final output is stored in HDFS.
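Before turning to the Hadoop program itself, the data flow above can be illustrated with a minimal local sketch in plain Java (not Hadoop code): the "map" step emits a 1 for every word, and the "reduce" step sums those 1s per key. The class and method names here (WordCountSketch, countWords) are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {

    // Simulates the mapper + reducer logic of word count on a single string:
    // map emits (word, 1) pairs; reduce sums the values for each distinct key.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.split(" ")) {        // map: split line into words
            if (word.length() > 0) {
                counts.merge(word, 1, Integer::sum); // reduce: sum the 1s per key
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("to be or not to be");
        System.out.println(counts.get("to")); // 2
        System.out.println(counts.get("be")); // 2
        System.out.println(counts.get("or")); // 1
    }
}
```

In the real job, the same two steps run on different machines, with Hadoop performing the sort-and-shuffle between them.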
Execution Steps:

Download the Cloudera training VM:

http://content.udacity-data.com/courses/ud617/Cloudera-Udacity-Training-VM-4.1.1.c.zip

Create the jar file of this program and name it wordcount.jar.

Run the jar file

hadoop fs -mkdir /input

hadoop fs -put /home/training/Desktop/sample.txt /input

hadoop jar /home/training/Desktop/wordcount.jar wordcount /input/sample.txt /output

Output:

hadoop fs -cat /output/part-00000

Word Count Java Program

// wordcount.java (driver class)
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class wordcount extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Please give the input and output directories correctly");
            return -1;
        }

        // Configure the job: input/output paths, mapper, reducer, and key/value types.
        JobConf conf = new JobConf(wordcount.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(wordmapper.class);
        conf.setReducerClass(wordreducer.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitcode = ToolRunner.run(new wordcount(), args);
        System.exit(exitcode);
    }
}
// wordmapper.java (mapper class)
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class wordmapper extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, IntWritable> {

    // Emits (word, 1) for every non-empty word in the input line.
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter r)
            throws IOException {
        String s = value.toString();
        for (String word : s.split(" ")) {
            if (word.length() > 0) {
                output.collect(new Text(word), new IntWritable(1));
            }
        }
    }
}

// wordreducer.java (reducer class)
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class wordreducer extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {

    // Sums all the counts for a word and emits (word, total) once per key.
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter r)
            throws IOException {
        int count = 0;
        while (values.hasNext()) {
            IntWritable i = values.next();
            count += i.get();
        }
        output.collect(key, new IntWritable(count));
    }
}
