✅ Install Java and Hadoop on Ubuntu: WordCount MapReduce Tutorial

This document provides a step-by-step guide to install Java and Hadoop on Ubuntu, configure environment variables, and write a WordCount Java program. It includes instructions for compiling the program, creating a JAR file, and running a MapReduce job to count word occurrences in a text file. The final output displays the count of each word processed by the job.

Uploaded by ayeshagujrati00

✅ PART 1: Install Java and Hadoop on Ubuntu

🧰 Step 1: Install Java (JDK)


sudo apt update
sudo apt install openjdk-11-jdk -y
java -version
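Step 2 below hardcodes `JAVA_HOME` as `/usr/lib/jvm/java-11-openjdk-amd64`. If you are unsure where apt placed the JDK, you can derive the value from the `javac` path. This sketch assumes the typical Ubuntu install location and mainly demonstrates the path manipulation:

```shell
# In a live shell you would resolve the real path with:
#   jdk_bin=$(readlink -f "$(command -v javac)")
# Here we use the path the openjdk-11-jdk package typically installs to:
jdk_bin=/usr/lib/jvm/java-11-openjdk-amd64/bin/javac

# Strip the trailing /bin/javac to obtain the JAVA_HOME value
echo "${jdk_bin%/bin/javac}"
# → /usr/lib/jvm/java-11-openjdk-amd64
```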

📦 Step 2: Download and Configure Hadoop (Standalone Mode)


🔽 Download Hadoop
cd ~
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
mv hadoop-3.3.6 hadoop

🔧 Set Environment Variables


Edit ~/.bashrc:

nano ~/.bashrc

Add these at the end:

export HADOOP_HOME=~/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar

Apply the changes:


source ~/.bashrc

✅ Test:
hadoop version

✅ PART 2: Write the WordCount Java Code


Create a folder and Java file:

mkdir ~/wordcount
cd ~/wordcount
nano WordCount.java

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1)
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configures the job and submits it, taking input/output paths as args
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
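Before compiling, it may help to see what the job computes. The map → shuffle → reduce data flow of the class above can be mimicked on the sample line with plain coreutils; this is only an illustrative sketch of the data flow, not how Hadoop executes it:

```shell
line="hadoop mapreduce hadoop word count word count"

# Map phase: emit one (word, 1) pair per token, as TokenizerMapper does
printf '%s\n' $line | awk '{print $1 "\t1"}' > pairs.txt

# Shuffle/sort: Hadoop groups the pairs by key between map and reduce
sort -o pairs.txt pairs.txt

# Reduce phase: sum the values for each key, as IntSumReducer does
awk -F'\t' '{sum[$1] += $2} END {for (w in sum) print w "\t" sum[w]}' pairs.txt | sort
# → count 2, hadoop 2, mapreduce 1, word 2 (tab-separated)
```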

✅ PART 3: Compile and Run the Program


🔧 Step 1: Compile
mkdir classes
javac -classpath "$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/mapreduce/*" \
  -d classes WordCount.java

📦 Step 2: Create a JAR


jar -cvf wordcount.jar -C classes/ .

✅ PART 4: Run WordCount Job (Standalone)


📁 Step 1: Create Input File
mkdir input
echo "hadoop mapreduce hadoop word count word count" > input/test.txt

▶️ Step 2: Run MapReduce Job


hadoop jar wordcount.jar WordCount input output

📄 Step 3: View Output


cat output/part-r-00000

count	2
hadoop	2
mapreduce	1
word	2
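As a sanity check, the same counts can be reproduced without Hadoop using coreutils. The snippet recreates the input file so it runs standalone:

```shell
mkdir -p input
echo "hadoop mapreduce hadoop word count word count" > input/test.txt

# One word per line, count duplicates, print as word<TAB>count
tr -s ' ' '\n' < input/test.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
# → count 2, hadoop 2, mapreduce 1, word 2
```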
