How to Execute Character Count Program in MapReduce Hadoop?
Last Updated: 10 Sep, 2020
Prerequisites: Hadoop and MapReduce
Required setup for completing the task below:
- Java Installation
- Hadoop installation
Our task is to count the frequency of each character present in our input file. We will use Java to implement this scenario, though a MapReduce program can also be written in Python or C++. The mapper emits a (character, 1) pair for every character it reads, and the reducer sums the 1s for each character. Execute the steps below to find the number of occurrences of each character.
Example:
Input
GeeksforGeeks
Output
G 2
e 4
f 1
k 2
o 1
r 1
s 2
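For this input, each map() call emits one (character, 1) pair per character; the shuffle phase then groups those pairs by key, and the reducer sums each group (a conceptual trace, not actual program output):
Map output: (G,1) (e,1) (e,1) (k,1) (s,1) (f,1) (o,1) (r,1) (G,1) (e,1) (e,1) (k,1) (s,1)
After shuffle: (G,[1,1]) (e,[1,1,1,1]) (f,[1]) (k,[1,1]) (o,[1]) (r,[1]) (s,[1,1])
Reduce output: (G,2) (e,4) (f,1) (k,2) (o,1) (r,1) (s,2)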
Step 1: First open Eclipse -> then select File -> New -> Java Project -> name it CharCount -> under "Use an execution environment" choose JavaSE-1.8 -> then Next -> Finish.

Step 2: Create three Java classes in the project. Name them CharCountDriver (containing the main method), CharCountMapper, and CharCountReducer.
Mapper Code: Copy and paste the following program into the CharCountMapper Java class file.
Java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CharCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // Split the incoming line into single-character tokens
        // and emit a (character, 1) pair for each one.
        String line = value.toString();
        String[] chars = line.split("");
        for (String singleChar : chars) {
            output.collect(new Text(singleChar), new IntWritable(1));
        }
    }
}
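Note the call to split(""): on Java 8 (the execution environment chosen in Step 1) it breaks a line into one-character tokens with no leading empty string, whereas Java 7 and earlier produced a leading empty token. A quick standalone check, runnable outside Hadoop:
Java
public class SplitCheck {
    public static void main(String[] args) {
        // On Java 8+, split("") yields exactly one token per character.
        for (String ch : "GeeksforGeeks".split("")) {
            System.out.println(ch);
        }
    }
}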
Reducer Code: Copy and paste the following program into the CharCountReducer Java class file.
Java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CharCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        // Sum all the 1s emitted by the mappers for this character.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
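Since the driver below also registers this reducer as a combiner, partial sums are computed on the map side before the shuffle; that is only correct because addition is associative and commutative. A minimal plain-Java sketch of what reduce() computes for the key "e" from our sample input (runnable outside Hadoop):
Java
import java.util.Arrays;
import java.util.Iterator;

public class ReduceSketch {
    public static void main(String[] args) {
        // After the shuffle, the reducer receives ("e", [1, 1, 1, 1])
        // for the input "GeeksforGeeks" and sums the values.
        Iterator<Integer> values = Arrays.asList(1, 1, 1, 1).iterator();
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        System.out.println("e\t" + sum); // prints: e	4
    }
}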
Driver Code: Copy and paste the following program into the CharCountDriver Java class file.
Java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class CharCountDriver {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(CharCountDriver.class);
        conf.setJobName("CharCount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(CharCountMapper.class);
        // The reducer doubles as a combiner, which is safe because
        // summation is associative and commutative.
        conf.setCombinerClass(CharCountReducer.class);
        conf.setReducerClass(CharCountReducer.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // args[0] is the HDFS input path, args[1] the output directory.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
Step 3: Now we need to add external jars for the packages we have imported. Download the Hadoop Common and Hadoop MapReduce Core jar packages that match your Hadoop version. You can check your Hadoop version with the command below:
hadoop version
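Alternatively, if Hadoop is already installed on your machine, the same jars usually ship with it (the paths below assume the standard Hadoop 2.x/3.x layout; substitute your actual installation directory for $HADOOP_HOME):
ls $HADOOP_HOME/share/hadoop/common/hadoop-common-*.jar
ls $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-*.jar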

Step 4: Now add these external jars to the CharCount project. Right-click on CharCount -> select Build Path -> Configure Build Path -> Add External JARs... -> add the jars from their download location -> then click Apply and Close.

Step 5: Now export the project as a jar file. Right-click on CharCount, choose Export... and go to Java -> JAR file, click Next and choose your export destination, then click Next. Choose the main class as CharCountDriver by clicking Browse, then click Finish -> Ok.



Now the jar file has been created successfully and saved, in my case, in the /Documents directory with the name charectercount.jar.
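If you prefer the command line to Eclipse, the same jar can be built with javac and jar (a sketch assuming the three .java files sit in the current directory; hadoop classpath prints the jars the compiler needs):
mkdir -p classes
javac -classpath "$(hadoop classpath)" -d classes CharCountDriver.java CharCountMapper.java CharCountReducer.java
jar cfe charectercount.jar CharCountDriver -C classes .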
Step 6: Create a simple text file and add some data to it.
nano test.txt
You can add the text manually with nano or use another editor such as Vim or gedit.
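For example, the sample input used above can be written in one shot from the shell:
echo "GeeksforGeeks" > test.txt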
To see the content of the file, use the cat command available in Linux.
cat test.txt

Step 7: Start the Hadoop daemons.
start-dfs.sh
start-yarn.sh
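You can verify that the daemons are up with the jps command, which lists the running Java processes such as NameNode, DataNode, ResourceManager, and NodeManager:
jps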

Step 8: Move your test.txt file to HDFS.
Syntax:
hdfs dfs -put /file_path /destination
In the command below, / denotes the root directory of HDFS.
hdfs dfs -put /home/dikshant/Documents/test.txt /
Check whether the file is present in the root directory of HDFS:
hdfs dfs -ls /

Step 9: Now run your jar file with the command below; the results are written to the /CharCountResult directory.
Syntax:
hadoop jar /jar_file_location /dataset_location_in_HDFS /output_directory_name
Command:
hadoop jar /home/dikshant/Documents/charectercount.jar /test.txt /CharCountResult
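Note that the output directory must not already exist in HDFS, otherwise the job fails. If you rerun the job, remove the old result first:
hdfs dfs -rm -r /CharCountResult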

Step 10: Now browse to localhost:50070/, under Utilities select Browse the file system, and download part-00000 from the /CharCountResult directory to see the result. We can also check the result, i.e. the part-00000 file, with the cat command as shown below.
hdfs dfs -cat /CharCountResult/part-00000
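If you prefer a local copy over viewing the file in the browser, hdfs dfs -get downloads it (the destination path here is just an example):
hdfs dfs -get /CharCountResult/part-00000 /home/dikshant/Documents/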
