B1 Instructions

The document provides a step-by-step guide for creating a simple Word Count application using the Hadoop Map-Reduce framework in Java. It includes instructions for setting up the environment, writing the mapper, reducer, and runner classes, compiling the code, and executing the MapReduce job. The final output will display the count of occurrences for each word in the provided input text file.


Group B - Big Data Analytics

Write a code in JAVA for a simple Word Count application that counts
the number of occurrences of each word in a given input set using the
Hadoop Map-Reduce framework on local-standalone set-up.

Execute the following commands in a terminal:

java -version

To check the version of Java installed on your system, you can use the command
java -version.

su - hadoop

The command su - hadoop is used to switch the current user to the user named
"hadoop" in a Unix-based operating system. This is typically used in environments
where multiple users have access to a system, and "hadoop" might be a user
account associated with the Hadoop distributed computing framework.

When you run this command, you will be prompted to enter the password for the
"hadoop" user account. If the password is entered correctly, you will then be logged
in as the "hadoop" user, inheriting its environment settings and permissions.

cd hadoop

hadoop version

To check the version of Hadoop installed on your system, you can use the hadoop
version command. This command will display the installed version of Hadoop along
with some additional information.

nano data1.txt

Put the following text into data1.txt:

HDFS is a storage unit of Hadoop
MapReduce is a processing tool of Hadoop

Press Ctrl+S to save, then Ctrl+X to exit.

start-all.sh
The start-all.sh script is typically used in Hadoop environments to start all the
Hadoop daemons simultaneously.
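
To confirm that the daemons came up, you can list the running Java processes with jps; on a standalone setup you would typically expect to see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager alongside Jps itself.

jps
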
hdfs dfs -mkdir /test_wc

The command hdfs dfs -mkdir /test_wc creates a directory named "test_wc" in the Hadoop Distributed File System (HDFS).

Here's what each part of the command does:

●	hdfs: This command is used to interact with the Hadoop Distributed File System (HDFS).
●	dfs: This sub-command specifies that the operation is related to the Hadoop Distributed File System.
●	-mkdir: This option specifies that you want to create a directory.
●	/test_wc: This is the path of the directory you want to create. The leading forward slash (/) indicates that the directory should be created in the root directory of HDFS.

So, when you run hdfs dfs -mkdir /test_wc, it creates a directory named "test_wc" in the root directory of HDFS.
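
To verify that the directory was actually created, you can list the HDFS root directory:

hdfs dfs -ls /

The listing should include an entry for /test_wc.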

hdfs dfs -put /home/hadoop/hadoop/data1.txt /test_wc

This copies data1.txt from the local file system into the /test_wc directory in HDFS. Adjust the local path if your data1.txt is stored elsewhere.
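
To confirm that the file landed in HDFS, you can list the directory and print the file back:

hdfs dfs -ls /test_wc
hdfs dfs -cat /test_wc/data1.txt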

ifconfig

Copy the machine's IP address and open the following URL in Firefox: the IP address followed by :9870 (the NameNode web UI). Go to Utilities, choose "Browse the file system", and type /test_wc to confirm that data1.txt is present.

nano WC_Mapper.java​

​ package com.javatpoint;

/* This line specifies the package name where this Java class belongs. Packages are
used for organizing classes into namespaces.*/

​ import java.io.IOException;
​ import java.util.StringTokenizer;
​ import org.apache.hadoop.io.IntWritable;
​ import org.apache.hadoop.io.LongWritable;
​ import org.apache.hadoop.io.Text;
​ import org.apache.hadoop.mapred.MapReduceBase;
​ import org.apache.hadoop.mapred.Mapper;
​ import org.apache.hadoop.mapred.OutputCollector;
​ import org.apache.hadoop.mapred.Reporter;

/* These lines import necessary Java and Hadoop libraries that are required for the
functionalities used in this class. For instance, java.io.IOException is imported for
handling input/output exceptions, java.util.StringTokenizer is used to tokenize strings,
and the org.apache.hadoop.io packages contain classes for various data types used in
Hadoop MapReduce jobs. */

public class WC_Mapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable> {

/* This line declares a Java class named WC_Mapper. It extends MapReduceBase, which
is a Hadoop class used as a base class for MapReduce mapper and reducer classes.
It also implements the Mapper interface, specifying the input and output key-value
types for the mapper. Here, LongWritable represents the input key type (offset of a
line in the input file), Text represents the input value type (a line of text), Text
represents the output key type (a word), and IntWritable represents the output value
type (count of occurrences of the word). */

​ private final static IntWritable one = new IntWritable(1);

/* This line declares a constant variable named one of type IntWritable, initialized
with the value 1. It is used to represent the count of each word, initialized to 1. */

​ private Text word = new Text();

/* This line declares a variable named word of type Text. It is used to store each word
extracted from the input text during the mapping process. */

public void map(LongWritable key, Text value, OutputCollector<Text,IntWritable> output, Reporter reporter) throws IOException {

/* This line defines the map method required by the Mapper interface. It takes four
parameters: key (representing the offset of a line in the input file), value
(representing a line of text), output (used to collect output key-value pairs), and
reporter (used for reporting progress and status). It throws IOException to handle
input/output exceptions. */

​ String line = value.toString();

/* This line converts the Text object value (representing a line of text) into a Java
string named line. */

​ StringTokenizer tokenizer = new StringTokenizer(line);

/* This line creates a StringTokenizer object named tokenizer, which is used to
tokenize the line into individual words based on whitespace. */
​ while (tokenizer.hasMoreTokens()){
​ word.set(tokenizer.nextToken());
​ output.collect(word, one);
/* This block of code iterates over each tokenized word using the StringTokenizer.
For each word, it sets the word variable to the current word using word.set(), and
then it collects the key-value pair (word, one) using the OutputCollector. This
effectively emits each word with a count of 1, which is later aggregated in the
reducer phase to calculate the total count of each word. */
​ }
​ }

​ }

Press Ctrl+S to save, then Ctrl+X to exit.
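
To make the mapping step concrete, consider the first line of data1.txt. The mapper tokenizes it on whitespace and emits one (word, 1) pair per token, roughly as follows (an illustration of the intermediate key-value pairs, not literal program output):

(HDFS, 1), (is, 1), (a, 1), (storage, 1), (unit, 1), (of, 1), (Hadoop, 1)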

nano WC_Reducer.java​

​ package com.javatpoint;
​ import java.io.IOException;
​ import java.util.Iterator;
​ import org.apache.hadoop.io.IntWritable;
​ import org.apache.hadoop.io.Text;
​ import org.apache.hadoop.mapred.MapReduceBase;
​ import org.apache.hadoop.mapred.OutputCollector;
​ import org.apache.hadoop.mapred.Reducer;
​ import org.apache.hadoop.mapred.Reporter;

​ public class WC_Reducer extends MapReduceBase implements
Reducer<Text,IntWritable,Text,IntWritable> {

public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text,IntWritable> output, Reporter reporter) throws IOException {

/* This line defines the reduce method, which is a part of the reducer class in a
Hadoop MapReduce program.

●​ Text key: This parameter represents the key for this particular invocation of
the reduce method. In a word count example like this, it represents a unique
word.
●​ Iterator<IntWritable> values: This parameter represents an iterator over
the list of values associated with the key. In this case, it iterates over the
counts of occurrences of the word represented by the key.
●​ OutputCollector<Text, IntWritable> output: This parameter is used to
collect the output key-value pairs produced by the reducer. The reducer
aggregates the values for each key (word) and emits the final key-value pairs.
●​ Reporter reporter: This parameter is used for reporting progress and status
of the reducer job to the Hadoop framework.
●​ throws IOException: This method may throw an IOException in case of
input/output errors. */

​ int sum=0;
​ while (values.hasNext()) {
​ sum+=values.next().get();
​ }

/* This loop iterates through all the values associated with the given key. For each
value, it adds the integer value retrieved by get() method of IntWritable to the sum
variable. This effectively calculates the total count of occurrences of the word
represented by the key. */

​ output.collect(key,new IntWritable(sum));

/* After summing up all the counts for the current word, this line emits the final
key-value pair. The key remains the same (representing the word), while the value is
the total count of occurrences (sum). This key-value pair is collected using the output
object, which is an instance of OutputCollector. This output will be passed on to the
Hadoop framework to be written to the output file. */

​ }
​ }

nano WC_Runner.java

​ package com.javatpoint;

​ import java.io.IOException;
​ import org.apache.hadoop.fs.Path;
​ import org.apache.hadoop.io.IntWritable;
​ import org.apache.hadoop.io.Text;
​ import org.apache.hadoop.mapred.FileInputFormat;
​ import org.apache.hadoop.mapred.FileOutputFormat;
​ import org.apache.hadoop.mapred.JobClient;
​ import org.apache.hadoop.mapred.JobConf;
​ import org.apache.hadoop.mapred.TextInputFormat;
​ import org.apache.hadoop.mapred.TextOutputFormat;
​ public class WC_Runner {

/* This line declares a Java class named WC_Runner, which serves as the entry point
for running the MapReduce job. */

​ public static void main(String[] args) throws IOException{

/* This line defines the main method, which is the starting point of execution for the
Java program. It takes an array of strings args as input arguments and may throw an
IOException. */

​ JobConf conf = new JobConf(WC_Runner.class);

/* This line creates a new instance of JobConf, which is a configuration class for a
MapReduce job. The constructor takes the class name of the job as an argument
(WC_Runner.class in this case). */

​ conf.setJobName("WordCount");

/* This line sets the name of the MapReduce job to "WordCount" using the setJobName
method of the JobConf object conf. */

​ conf.setOutputKeyClass(Text.class);
​ conf.setOutputValueClass(IntWritable.class);

/* These lines set the output key and value classes for the MapReduce job. In this
case, the output key is of type Text (representing words) and the output value is of
type IntWritable (representing counts). */

​ conf.setMapperClass(WC_Mapper.class);
​ conf.setCombinerClass(WC_Reducer.class);

/* These lines set WC_Mapper as the mapper class and WC_Reducer as the combiner class. Using the reducer as the combiner means the same aggregation logic is applied locally to each mapper's output for intermediate aggregation before the data is sent to the reducers. */

conf.setReducerClass(WC_Reducer.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

/* These lines set WC_Reducer as the reducer class and configure the job to read its input as plain text (TextInputFormat) and write its output as plain text (TextOutputFormat). */

​ FileInputFormat.setInputPaths(conf,new Path(args[0]));
​ FileOutputFormat.setOutputPath(conf,new Path(args[1]));​

/* These lines specify the input and output paths for the MapReduce job. The input
path is taken from the first argument (args[0]) provided in the command-line
arguments, and the output path is taken from the second argument (args[1]). Both
paths are converted to Path objects using the Path class constructor. */

​ JobClient.runJob(conf);

/* Finally, this line runs the MapReduce job with the configuration specified in the conf object using the runJob method of JobClient. This initiates the execution of the MapReduce job. */
​ }
​ }​

javac -classpath "$(hadoop classpath)" -d . WC_Mapper.java WC_Reducer.java WC_Runner.java

/* This command compiles the three Java files (WC_Mapper.java, WC_Reducer.java, and WC_Runner.java) using the Hadoop classpath for resolving dependencies and places the compiled .class files in the current directory. */
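
Because -d . mirrors the package structure, the compiled classes end up under a com/javatpoint directory. You can check that all three were produced:

ls com/javatpoint

This should list WC_Mapper.class, WC_Reducer.class, and WC_Runner.class.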

jar -cvf wordcount.jar com

/* By running this command, a JAR file named wordcount.jar will be created, containing all the contents of the com package directory and its subdirectories. This JAR file can then be used to distribute and execute the Hadoop MapReduce application. */
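
If you want to double-check what went into the archive before running it, you can list its contents:

jar -tvf wordcount.jar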

hadoop jar /home/hadoop/hadoop/wordcount.jar com.javatpoint.WC_Runner /test_wc/data1.txt /r_output

/* This command executes the Hadoop MapReduce job defined in the JAR file
/home/hadoop/hadoop/wordcount.jar, using the com.javatpoint.WC_Runner class as
the main entry point. It takes /test_wc/data1.txt as input and writes the output to
/r_output. Note that /r_output must not already exist; Hadoop creates it and will fail
if it does. */
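
Once the job finishes, you can list the output directory; a successful run normally leaves a _SUCCESS marker plus one or more part files:

hdfs dfs -ls /r_output
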
hdfs dfs -cat /r_output/part-00000

/* When you run hdfs dfs -cat /r_output/part-00000, you'll see the contents of the
/r_output/part-00000 file displayed in the terminal. This file contains the output of
your MapReduce job, which, for a word count program, consists of tab-separated
word-count pairs. */
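
For the sample data1.txt used above, the output should look roughly like the following (word and count separated by a tab; the exact ordering depends on how the keys sort):

HDFS	1
Hadoop	2
MapReduce	1
a	2
is	2
of	2
processing	1
storage	1
tool	1
unit	1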
