
Lab 3 Big Data

MapReduce

Objective:
The objective of this lab is to implement a basic Word Count program using Hadoop MapReduce.
Students will go through the process of setting up a Hadoop project, defining dependencies, writing
Mapper and Reducer classes, running the job, and verifying the results.

Prerequisites:
− Java Development Environment: Ensure that Java is installed on your machine and that the
Java development environment is set up.
− Apache Maven: Maven should be installed to manage the project build and dependencies.
Participants should have a basic understanding of Maven.
− Hadoop Installation: A Hadoop cluster or a local Hadoop installation should be available,
with the Hadoop binaries and configurations properly set up.
− Text Editor or IDE: Choose a text editor or integrated development environment (IDE) for
editing code and managing the project.
− Basic Understanding of Hadoop MapReduce: Participants should have a basic understanding of
the MapReduce programming model and its key components, such as Mapper, Reducer, and
the overall workflow.
Note:
− Adjust paths based on your specific project setup.
− Ensure that you have the necessary permissions to perform the operations.

Lab Tasks:
1. Open your Java IDE and create a Maven project named “WordCount”.
2. Open the pom.xml file and add the Hadoop dependencies shown in the file below.
3. Save the pom.xml file and update the Maven project.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.codenouhayla</groupId>
    <artifactId>WordCount</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.2.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.2.2</version>
        </dependency>
    </dependencies>
</project>
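After saving, you can let your IDE re-import the Maven project, or verify from a terminal that the dependencies resolve and the sources compile. A minimal check, assuming Maven is on your PATH and you run it from the project root:

mvn clean compile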

4. Create the WC_Mapper class and add the following code

package org.codenouhayla;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WC_Mapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    // Reusable Writable objects; the emitted count is always 1 per token.
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key,
                    Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {

        // Split the input line on whitespace and emit (word, 1) for each token.
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
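For example, given the input line “big data is big”, this mapper emits the pairs (big, 1), (data, 1), (is, 1), and (big, 1); aggregating the duplicated keys is left to the combiner and the reducer.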

5. Create the WC_Reducer class and add the following code

package org.codenouhayla;

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WC_Reducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key,
                       Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {

        // Sum all counts emitted for this word and write the total once.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
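Continuing the example above, after the shuffle the reducer receives (big, [1, 1]) and emits (big, 2), while (data, [1]) and (is, [1]) each yield a count of 1.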

6. Create the WC_Runner class and add the following code
package org.codenouhayla;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WC_Runner {
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: WC_Runner <input path> <output path>");
            System.exit(-1);
        }
        // Configure the job: name, output types, and the Mapper/Combiner/Reducer classes.
        JobConf conf = new JobConf(WC_Runner.class);
        conf.setJobName("WordCount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(WC_Mapper.class);
        conf.setCombinerClass(WC_Reducer.class);
        conf.setReducerClass(WC_Reducer.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // Input and output paths come from the command line; the output
        // directory must not already exist in HDFS.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
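Note that the same WC_Reducer class can serve as both combiner and reducer because summing counts is associative and commutative; the combiner pre-aggregates the map output locally, reducing the amount of data shuffled across the network.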

7. Create the jar file and verify its existence.
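A minimal sketch, assuming Maven is on your PATH and you run it from the project root:

mvn clean package

The target directory should then contain WordCount-1.0-SNAPSHOT.jar (the name follows the artifactId and version declared in the pom.xml).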


8. Create a directory named “input” in HDFS:
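A command along these lines should work, assuming the Hadoop binaries are on your PATH:

hdfs dfs -mkdir -p /input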

9. Upload a local file sample.txt to the “input” directory in HDFS:
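For example, assuming sample.txt is in your current local directory:

hdfs dfs -put sample.txt /input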

10. Run the WordCount job using the following command:

hadoop jar <localpath>\WordCount\target\WordCount-1.0-SNAPSHOT.jar org.codenouhayla.WC_Runner /input/sample.txt /output
11. Open the /output directory and view its content.
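For example, assuming the default single reducer, the classic org.apache.hadoop.mapred API writes the result to a part-00000 file:

hdfs dfs -ls /output
hdfs dfs -cat /output/part-00000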
12. Open http://localhost:8088/cluster in your browser to track the job in the YARN web UI.
