Word Count Program
driver.java
package wordcount;
import java.io.*;
import java.util.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.fs.Path;
public class driver
{
    public static void main(String[] args) throws IOException
    {
        // Configure the job with this class so Hadoop can locate the jar.
        JobConf conf = new JobConf(driver.class);
        conf.setMapperClass(mapper.class);
        conf.setReducerClass(reducer.class);
        // Output records are (word, count) pairs.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // args[0] is the input path; args[1] is the output path, which must not already exist.
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // Submit the job and wait for it to finish.
        JobClient.runJob(conf);
    }
}
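mapper.java
package wordcount;
import java.io.*;
import java.util.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.io.*;
// The class body is missing from the source; this is an assumed, conventional word-count mapper that emits (word, 1) per token.
public class mapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) output.collect(new Text(tokens.nextToken()), new IntWritable(1));
    }
}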
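reducer.java
package wordcount;
import java.io.*;
import java.util.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.io.*;
// Only "int sum = 0;" survives in the source; the rest is an assumed, conventional word-count reducer that sums the counts per word.
public class reducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) sum += values.next().get();
        output.collect(key, new IntWritable(sum));
    }
}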
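The counting logic of a word-count job can be tried without a Hadoop installation. The following is a minimal sketch in plain Java, assuming tokens are separated by whitespace only, so punctuation stays attached to a word (for example "morning,"). The class name WordCountSketch and the method count are hypothetical and are not part of the wordcount package.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-alone class: simulates the map and reduce steps
// of a word count in plain Java, without any Hadoop dependency.
class WordCountSketch {

    // Map step: split the line into whitespace-separated tokens.
    // Reduce step: add 1 to the running total of each token.
    // A TreeMap keeps the keys sorted, which mimics reducer input arriving sorted by key.
    static Map<String, Integer> count(String line) {
        Map<String, Integer> totals = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            totals.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String line = "hello good morning, hello have a nice day";
        // Print one "word<TAB>count" pair per line.
        for (Map.Entry<String, Integer> e : count(line).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running the sketch on the sample sentence is a quick way to check what counts to expect from the Hadoop job before packaging the jar.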
Steps to run
1. Create a new file named Bash.sh.
2. Copy the code below into Bash.sh and save the file.
export JAVA_HOME=$(readlink -f $(which javac) | awk 'BEGIN {FS="/bin"} {print $1}')
export PATH=$PATH:$(pwd)/bin
export CLASSPATH=$(hadoop classpath)
3. Load Bash.sh into the current shell with the following command:
source Bash.sh
4. Verify that JAVA_HOME is set to the Java installation path and that PATH includes your USN
Hadoop folder.
If .bashrc already adds a Hadoop folder to PATH, remove that entry from .bashrc.
5. Verify that Hadoop is installed by running the hadoop command. If the command prints
usage information for hadoop, Hadoop is installed successfully.
6. Create a folder named wordcount and change into it.
7. Create the driver.java, mapper.java and reducer.java files in that folder.
8. Compile all Java files (driver.java, mapper.java, reducer.java)
javac -d . *.java
9. Set driver class in manifest
echo Main-Class: wordcount.driver > Manifest.txt
10. Create an executable jar file
jar cfm wordcount.jar Manifest.txt wordcount/*.class
11. Create the input file input.txt
echo "hello good morning, hello have a nice day" > input.txt
12. Run the jar file
hadoop jar wordcount.jar input.txt output
13. View the output
cat output/*
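Because the driver does not set an output format, the job writes with Hadoop's default text output format, where each result line holds a key and a value separated by a tab. The following is a minimal sketch of reading such lines back into a map, assuming the default tab separator is unchanged. The class name OutputLineSketch and the method parse are hypothetical, and the sample lines in main are illustrative input rather than captured job output.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses "word<TAB>count" lines such as those
// written by a word-count job with the default text output format.
class OutputLineSketch {

    // Returns the counts in the order the lines were given.
    // Lines that are empty or contain no tab are skipped.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // not a key/value line
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative lines only; real lines would come from the files under output/.
        List<String> sample = Arrays.asList("good\t1", "hello\t2");
        System.out.println(parse(sample));
    }
}
```

Splitting at the last tab rather than the first keeps the sketch working even if a key itself contains a tab.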