
Lab 3 Big Data

MapReduce

Objective:
The objective of this lab is to implement a basic Word Count program using Hadoop MapReduce.
Students will go through the process of setting up a Hadoop project, defining dependencies, writing
Mapper and Reducer classes, running the job, and verifying the results.

Prerequisites:
− Java Development Environment: Ensure that Java is installed on your machine and that the
Java development environment is set up.
− Apache Maven: Maven should be installed to manage the project build and dependencies.
Participants should have a basic understanding of Maven.
− Hadoop Installation: A Hadoop cluster or a local Hadoop installation should be available,
with the Hadoop binaries and configurations properly set up.
− Text Editor or IDE: Choose a text editor or integrated development environment (IDE) for
editing code and managing the project.
− Basic Understanding of Hadoop MapReduce: Participants should have a basic understanding of
the MapReduce programming model and its key components, such as Mapper, Reducer, and
the overall workflow.
Note:
− Adjust paths based on your specific project setup.
− Ensure that you have the necessary permissions to perform the operations.

Lab Tasks:
1. Open your Java IDE and create a Maven project named “WordCount”.
2. Open the pom.xml file and add the Hadoop dependencies shown in the file below.
3. Save the pom.xml file and update the Maven project.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.codenouhayla</groupId>
    <artifactId>WordCount</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.2.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.2.2</version>
        </dependency>
    </dependencies>
</project>
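After saving, you can let your IDE re-import the Maven project, or verify from a terminal that the dependencies resolve and the sources compile. A minimal check, assuming Maven is on your PATH and you run it from the project root:

mvn clean compile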

4. Create the WC_Mapper class and add the following code

package org.codenouhayla;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WC_Mapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    // Reusable Writable objects; the emitted count is always 1 per token.
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key,
                    Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {

        // Split the input line on whitespace and emit (word, 1) for each token.
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
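For example, given the input line “big data is big”, this mapper emits the pairs (big, 1), (data, 1), (is, 1), and (big, 1); aggregating the duplicated keys is left to the combiner and the reducer.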

5. Create the WC_Reducer class and add the following code

package org.codenouhayla;

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WC_Reducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key,
                       Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {

        // Sum all counts emitted for this word and write the total once.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
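Continuing the example above, after the shuffle the reducer receives (big, [1, 1]) and emits (big, 2), while (data, [1]) and (is, [1]) each yield a count of 1.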

6. Create the WC_Runner class and add the following code
package org.codenouhayla;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WC_Runner {
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: WC_Runner <input path> <output path>");
            System.exit(-1);
        }
        // Configure the job: name, output types, and the Mapper/Combiner/Reducer classes.
        JobConf conf = new JobConf(WC_Runner.class);
        conf.setJobName("WordCount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(WC_Mapper.class);
        conf.setCombinerClass(WC_Reducer.class);
        conf.setReducerClass(WC_Reducer.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // Input and output paths come from the command line; the output
        // directory must not already exist in HDFS.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
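Note that the same WC_Reducer class can serve as both combiner and reducer because summing counts is associative and commutative; the combiner pre-aggregates the map output locally, reducing the amount of data shuffled across the network.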

7. Create the jar file and verify its existence.
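A minimal sketch, assuming Maven is on your PATH and you run it from the project root:

mvn clean package

The target directory should then contain WordCount-1.0-SNAPSHOT.jar (the name follows the artifactId and version declared in the pom.xml).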


8. Create a directory named “input” in HDFS:
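A command along these lines should work, assuming the Hadoop binaries are on your PATH:

hdfs dfs -mkdir -p /input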

9. Upload a local file sample.txt to the “input” directory in HDFS:
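For example, assuming sample.txt is in your current local directory:

hdfs dfs -put sample.txt /input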

10. Run the WordCount job using the following command:

hadoop jar <localpath>\WordCount\target\WordCount-1.0-SNAPSHOT.jar org.codenouhayla.WC_Runner /input/sample.txt /output
11. Open the /output directory and view its content.
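For example, assuming the default single reducer, the classic org.apache.hadoop.mapred API writes the result to a part-00000 file:

hdfs dfs -ls /output
hdfs dfs -cat /output/part-00000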
12. Open http://localhost:8088/cluster in your browser to track the job in the YARN web UI.
