BDA - LAB Manual
AIM:
To work with HDFS commands.
PROCEDURE:
Step 1: Open the command prompt and start Hadoop by typing the following command in the
specified path, then press Enter.
C:\hadoop-2.8.0\sbin>start-all
Step 2: Open a new command prompt and start working with HDFS commands.
COMMANDS:
Objective: To copy a file from the local file system to HDFS using the copyFromLocal command.
Act:
C:\Users\CSE>hdfs dfs -copyFromLocal C:/salary.txt /bigdata/salary.txt
Objective: To get the list of complete directories and files of HDFS.
Act:
C:\Users\CSE>hdfs dfs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 13:57 /analytics
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:51 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt
Objective: To display the contents of the HDFS file word.txt on the console.
Act:
C:\Users\CSE>hdfs dfs -cat /bigdata/word.txt
Hadoop,action,BigData,Export,update,visualization,word,count,file,mongodb,aggregate,hdfs,insert,MapReduce,dataset,tableau,tool,action,Hadoop,count,hdfs,planning,administration,update,file,mongodb,dataset,Hadoop,MapReduce,food,e-health,cancer,diagnosis
Objective: To count the number of directories, files, and bytes under the specified path; the output columns are the directory count, file count, content size (in bytes) and path name.
Act:
C:\Users\CSE>hdfs dfs -count /
3 5 16510 /
Objective: To show disk usage, in bytes, for all the files present on the path specified.
Act:
C:\Users\CSE>hdfs dfs -du /bigdata
8092 /bigdata/salary.txt
0 /bigdata/sample.txt
258 /bigdata/word.txt
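A summarised, human-readable form of the same information can be shown with the -s and -h options (an optional variant, not part of the original listing):
C:\Users\CSE>hdfs dfs -du -s -h /bigdata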
Objective: To append the content of a local file to the specified destination file on HDFS. The
destination file will be created if it does not exist. If the local file is specified as -, the input is
read from stdin.
Act:
C:\Users\CSE>hdfs dfs -appendToFile C:/word.txt /analytics/student.txt
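The effect of the append can be verified by displaying the destination file (an optional check, not part of the original listing):
C:\Users\CSE>hdfs dfs -cat /analytics/student.txt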
RESULT:
Thus the HDFS commands have been executed successfully.
Ex No: 02
AIM:
To perform the file management tasks in Hadoop.
PROCEDURE:
Step 1: Open the command prompt and start Hadoop by typing the following command in the
specified path, then press Enter.
C:\hadoop-2.8.0\sbin>start-all
Step 2: Open a new command prompt and start performing file management tasks in Hadoop.
COMMANDS:
Objective: To copy a file from the local file system to HDFS using the copyFromLocal command.
Act:
C:\Users\CSE>hadoop fs -copyFromLocal C:/salary.txt /bigdata/salary.txt
Objective: To display the contents of the HDFS file word.txt on the console.
Act:
C:\Users\CSE>hadoop fs -cat /bigdata/word.txt
Hadoop,action,BigData,Export,update,visualization,word,count,file,mongodb,aggregate,hdfs,insert,MapReduce,dataset,tableau,tool,action,Hadoop,count,hdfs,planning,administration,update,file,mongodb,dataset,Hadoop,MapReduce,food,e-health,cancer,diagnosis
C:\Users\CSE>hadoop fs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:56 /analytics
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:56 /analytics/salary.txt
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:54 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 68 2019-09-02 14:54 /bigdata/student.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt
C:\Users\CSE>hadoop fs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:58 /analytics
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:56 /analytics/salary.txt
-rw-r--r-- 1 CSE supergroup 68 2019-09-02 14:54 /analytics/student.txt
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:58 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt
Objective: To show disk usage, in bytes, for all the files present on the path specified.
Act:
C:\Users\CSE>hadoop fs -du /bigdata
8092 /bigdata/salary.txt
0 /bigdata/sample.txt
258 /bigdata/word.txt
Objective: To append the content of a local file to the specified destination file on HDFS. The
destination file will be created if it does not exist. If the local file is specified as -, the input is
read from stdin.
Act:
C:\Users\CSE>hadoop fs -appendToFile C:/word.txt /analytics/student.txt
C:\Users\CSE>hadoop fs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 15:00 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 0 2019-09-02 15:00 /bigdata/sample.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt
C:\Users\CSE>hadoop fs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 15:14 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt
RESULT:
Thus the file management tasks in Hadoop have been performed successfully.
Ex No: 03
MAP REDUCE PROGRAM FOR RETRIEVING THE SUM AND AVERAGE SALARY
OF EMPLOYEES IN EVERY DEPARTMENT
AIM:
To write a Map Reduce program for retrieving the sum and average salary of employees
in every department.
PROCEDURE:
Step 1: Open Eclipse and select File -> New -> Java Project -> (Name it -MRDemo) -> Finish.
Step 2: Right Click the project name and select New -> Package (Name it - com.app) -> Finish.
Step 3: Right Click the package name and select New -> Class (Name it - SumAvgSalary).
Step 4: Add the following reference libraries by right-clicking the project name and selecting
Properties -> Java Build Path -> Add External JARs:
hadoop-core-0.20.0.jar
org.apache.commons.cli-1.2.0.jar
Step 5: Write the Map Reduce program in SumAvgSalary.java file and save the file.
Step 6: Build the project JAR file by right-clicking the project name and selecting Export ->
JAR file (under Java) -> Next (give the JAR file name as MRDemo.jar) -> Finish.
Step 7: Open the command prompt and start Hadoop by typing the following command in the
specified path, then press Enter.
C:\hadoop-2.8.0\sbin>start-all
Step 8: Move the input file into HDFS by typing the following in the command prompt
hadoop fs -put C:/salary.txt /salary.txt
Step 9: Run the jar file by typing the following in the command prompt
hadoop jar MRDemo.jar com.app.SumAvgSalary /salary.txt /salarysumavg
Step 10: Check the output by typing the following in the command prompt
hadoop fs -ls /salarysumavg
hadoop fs -cat /salarysumavg/part-r-00000
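Optionally, the result file can also be copied back to the local file system for inspection (a suggested extra command, not part of the original listing; C:\salarysumavg.txt is only an example local path):
hadoop fs -get /salarysumavg/part-r-00000 C:\salarysumavg.txt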
PROGRAM:
SumAvgSalary.java:
package com.app;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.LongWritable;
public class SumAvgSalary
{
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException
    {
        Job job = new Job();
        job.setJobName("SumAvgSalary");
        job.setJarByClass(SumAvgSalary.class);
        job.setMapperClass(SumAvgSalaryMap.class);
        job.setReducerClass(SumAvgSalaryRed.class);
        // The mapper emits FloatWritable values while the reducer emits Text values,
        // so the map output and final output value classes must be declared separately.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(FloatWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("/salary.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/salarysumavg"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class SumAvgSalaryMap extends Mapper<LongWritable, Text, Text, FloatWritable>
    {
        @Override
        public void map(LongWritable key, Text empRecord, Context con) throws IOException, InterruptedException
        {
            // Each record of salary.txt is tab-separated, with the department/unit
            // name at field index 7 and the salary at field index 8.
            String[] word = empRecord.toString().split("\\t");
            String un = word[7];
            float salary = Float.parseFloat(word[8]);
            con.write(new Text(un), new FloatWritable(salary));
        }
    }

    public static class SumAvgSalaryRed extends Reducer<Text, FloatWritable, Text, Text>
    {
        @Override
        public void reduce(Text key, Iterable<FloatWritable> valueList, Context con) throws IOException, InterruptedException
        {
            // Accumulate the total salary and the number of employees per department,
            // then report the sum and the average together as a text value.
            float total = 0;
            int count = 0;
            for (FloatWritable var : valueList)
            {
                total += var.get();
                count++;
            }
            float avg = total / count;
            String out = "Total: " + total + " Average: " + avg;
            con.write(key, new Text(out));
        }
    }
}
INPUT:
OUTPUT:
RESULT:
Thus the Map Reduce program for retrieving the sum and average salary of employees in
every department has been written, executed and the output is verified successfully.
Ex No: 04
AIM:
To write a Map Reduce program for finding the unit wise salary.
PROCEDURE:
Step 1: Open Eclipse and select File -> New -> Java Project -> (Name it -MRDemo) -> Finish.
Step 2: Right Click the project name and select New -> Package (Name it -com.app) -> Finish.
Step 3: Right Click the package name and select New -> Class (Name it - UnitWiseSalary).
Step 4: Add the following reference libraries by right-clicking the project name and selecting
Properties -> Java Build Path -> Add External JARs:
hadoop-core-0.20.0.jar
org.apache.commons.cli-1.2.0.jar
Step 5: Write the Map Reduce program in UnitWiseSalary.java file and save the file.
Step 6: Build the project JAR file by right-clicking the project name and selecting Export ->
JAR file (under Java) -> Next (give the JAR file name as MRDemo.jar) -> Finish.
Step 7: Open the command prompt and start Hadoop by typing the following command in the
specified path, then press Enter.
C:\hadoop-2.8.0\sbin>start-all
Step 8: Move the input file into HDFS by typing the following in the command prompt
hadoop fs -put C:/salary.txt /salary.txt
Step 9: Run the jar file by typing the following in the command prompt
hadoop jar MRDemo.jar com.app.UnitWiseSalary /salary.txt /salarysum
Step 10: Check the output by typing the following in the command prompt
hadoop fs -ls /salarysum
hadoop fs -cat /salarysum/part-r-00000
PROGRAM:
UnitWiseSalary.java:
package com.app;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.LongWritable;
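The remainder of UnitWiseSalary.java is not reproduced in this listing. The following is a minimal sketch that completes it, assuming the same tab-separated salary.txt layout as the previous exercise (unit/department name at field index 7, salary at field index 8) and emitting the total salary per unit:
public class UnitWiseSalary
{
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException
    {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "UnitWiseSalary");
        job.setJarByClass(UnitWiseSalary.class);
        job.setMapperClass(UnitWiseSalaryMap.class);
        job.setReducerClass(UnitWiseSalaryRed.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);
        FileInputFormat.addInputPath(job, new Path("/salary.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/salarysum"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class UnitWiseSalaryMap extends Mapper<LongWritable, Text, Text, FloatWritable>
    {
        @Override
        public void map(LongWritable key, Text empRecord, Context con) throws IOException, InterruptedException
        {
            // Assumed layout: tab-separated record, unit name at index 7, salary at index 8.
            String[] field = empRecord.toString().split("\\t");
            con.write(new Text(field[7]), new FloatWritable(Float.parseFloat(field[8])));
        }
    }

    public static class UnitWiseSalaryRed extends Reducer<Text, FloatWritable, Text, FloatWritable>
    {
        @Override
        public void reduce(Text key, Iterable<FloatWritable> valueList, Context con) throws IOException, InterruptedException
        {
            // Sum the salaries emitted for each unit.
            float total = 0;
            for (FloatWritable val : valueList)
            {
                total += val.get();
            }
            con.write(key, new FloatWritable(total));
        }
    }
}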
INPUT:
OUTPUT:
RESULT:
Thus the Map Reduce program for finding the unit wise salary has been written, executed
and the output is verified successfully.
Ex No: 05
AIM:
To demonstrate CRUD (Create, Read, Update and Delete) operations in MongoDB.
PROCEDURE:
Step 1: Open a command prompt and navigate to the bin directory of the MongoDB installation
folder. The installation folder is C:\Program Files\MongoDB\Server\4.2.
Step 2: To get into the MongoDB shell, type the command “mongo.exe” in the command
prompt.
CRUD OPERATIONS:
CREATE:
To create a collection in a database, use the db.createCollection() method. A collection can
also be created automatically when a document is inserted into it.
>db.createCollection("Students")
{ "ok" : 1 }
INSERT:
To insert a document into a collection, use the insert() method.
>db.Students.insert({_id:1,StudName:"Michelle Jacintha",Grade:"VII",Hobbies:"Internet Surfing"});
WriteResult({ "nInserted" : 1 })
>db.Students.insert({_id:2,StudName:"Mabel Mathews",Grade:"VII",Hobbies:"Baseball"});
WriteResult({ "nInserted" : 1 })
>db.Students.insert({_id:3,StudName:"Aryan David",Grade:"VII",Hobbies:"Skatting"});
WriteResult({ "nInserted" : 1 })
>db.Students.insert({_id:4,StudName:"Herseh Gibbs",Grade:"VII",Hobbies:"Graffiti"});
WriteResult({ "nInserted" : 1 })
READ:
To read or display the documents of a collection, use the find() method. The pretty()
method formats the result.
>db.Students.find().pretty();
{
"_id" : 1,
"StudName" : "Michelle Jacintha",
"Grade" : "VII",
"Hobbies" : "Internet Surfing"
}
{
"_id" : 2,
"StudName" : "Mabel Mathews",
"Grade" : "VII",
"Hobbies" : "Baseball"
}
{
"_id" : 3,
"StudName" : "Aryan David",
"Grade" : "VII",
"Hobbies" : "Skatting"
}
{
"_id" : 4,
"StudName" : "Herseh Gibbs",
"Grade" : "VII",
"Hobbies" : "Graffiti"
}
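INSERT AND UPDATE:
The second READ below shows one more document (_id 5) and the Hobbies value of _id 3 changed from "Skatting" to "Chess". The commands that produce this state are not part of the original listing; a plausible reconstruction, inferred from that output, is:
>db.Students.insert({_id:5,StudName:"Vamsi Bapat",Grade:"VII",Hobbies:"Cricket"});
>db.Students.update({_id:3},{$set:{Hobbies:"Chess"}});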
READ:
To display all the documents after the insert and update shown above.
>db.Students.find().pretty();
{
"_id" : 1,
"StudName" : "Michelle Jacintha",
"Grade" : "VII",
"Hobbies" : "Internet Surfing"
}
{
"_id" : 2,
"StudName" : "Mabel Mathews",
"Grade" : "VII",
"Hobbies" : "Baseball"
}
{
"_id" : 3,
"StudName" : "Aryan David",
"Grade" : "VII",
"Hobbies" : "Chess"
}
{
"_id" : 4,
"StudName" : "Herseh Gibbs",
"Grade" : "VII",
"Hobbies" : "Graffiti"
}
{
"_id" : 5,
"StudName" : "Vamsi Bapat",
"Grade" : "VII",
"Hobbies" : "Cricket"
}
READ:
To find a document wherein the “StudName” has the value “Aryan David”.
>db.Students.find({StudName:"Aryan David"}).pretty();
{
"_id" : 3,
"StudName" : "Aryan David",
"Grade" : "VII",
"Hobbies" : "Chess"
}
READ:
To display only the StudName field from all the documents of the Students collection. The
"_id" field is suppressed by specifying it as 0.
>db.Students.find({},{StudName:1,_id:0});
{ "StudName" : "Michelle Jacintha" }
{ "StudName" : "Mabel Mathews" }
{ "StudName" : "Aryan David" }
{ "StudName" : "Herseh Gibbs" }
{ "StudName" : "Vamsi Bapat" }
READ:
To find those documents where the Grade is set to “VII”. The relational operator $eq can
be used.
>db.Students.find({Grade:{$eq:'VII'}}).pretty();
{
"_id" : 1,
"StudName" : "Michelle Jacintha",
"Grade" : "VII",
"Hobbies" : "Internet Surfing"
}
{
"_id" : 2,
"StudName" : "Mabel Mathews",
"Grade" : "VII",
"Hobbies" : "Baseball"
}
{
"_id" : 3,
"StudName" : "Aryan David",
"Grade" : "VII",
"Hobbies" : "Chess"
}
{
"_id" : 4,
"StudName" : "Herseh Gibbs",
"Grade" : "VII",
"Hobbies" : "Graffiti"
}
{
"_id" : 5,
"StudName" : "Vamsi Bapat",
"Grade" : "VII",
"Hobbies" : "Cricket"
}
READ:
To find those documents where the Hobbies is not equal to “Baseball”. The relational
operator $ne can be used.
>db.Students.find({Hobbies:{$ne:'Baseball'}}).pretty();
{
"_id" : 1,
"StudName" : "Michelle Jacintha",
"Grade" : "VII",
"Hobbies" : "Internet Surfing"
}
{
"_id" : 3,
"StudName" : "Aryan David",
"Grade" : "VII",
"Hobbies" : "Chess"
}
{
"_id" : 4,
"StudName" : "Herseh Gibbs",
"Grade" : "VII",
"Hobbies" : "Graffiti"
}
{
"_id" : 5,
"StudName" : "Vamsi Bapat",
"Grade" : "VII",
"Hobbies" : "Cricket"
}
READ:
To find those documents where Hobbies is set to either "Chess" or "Skating". The operator
$in can be used.
>db.Students.find({Hobbies:{$in:['Chess','Skating']}});
{ "_id" : 3, "StudName" : "Aryan David", "Grade" : "VII", "Hobbies" : "Chess" }
READ:
To find those documents where Hobbies is set to neither "Chess" nor "Skating". The operator
$nin can be used.
>db.Students.find({Hobbies:{$nin:['Chess','Skating']}});
{ "_id" : 1, "StudName" : "Michelle Jacintha", "Grade" : "VII", "Hobbies" : "Internet Surfing" }
{ "_id" : 2, "StudName" : "Mabel Mathews", "Grade" : "VII", "Hobbies" : "Baseball" }
{ "_id" : 4, "StudName" : "Herseh Gibbs", "Grade" : "VII", "Hobbies" : "Graffiti" }
{ "_id" : 5, "StudName" : "Vamsi Bapat", "Grade" : "VII", "Hobbies" : "Cricket" }
DELETE:
To delete a document where the “_id” is set to 4.
>db.Students.remove({_id:4});
WriteResult({ "nRemoved" : 1 })
DELETE:
To delete all documents from the collection “Students”.
>db.Students.remove({});
WriteResult({ "nRemoved" : 4 })
RESULT:
Thus the CRUD (Create, Read, Update and Delete) operations in MongoDB have been
executed and the output is verified successfully.
Ex No: 06
AIM:
To demonstrate import, export and aggregation operations in MongoDB.
PROCEDURE:
Step 1: Pick any public dataset from the site www.kdnuggets.com with at least two numeric
columns, convert it into CSV format, name the file sample.csv and save it in the C: drive.
Step 2: Open a command prompt and navigate to the MongoDB installation folder. The
installation folder is C:\Program Files\MongoDB\Server\4.2.
Step 3: Use mongoimport to import data from the CSV file into the MongoDB collection
"sample" in the test database.
mongoimport --db test --collection sample --type csv --headerline --file C:\sample.csv
Step 4: Navigate to the bin directory present in the MongoDB installation folder specified above.
Step 5: To get into the MongoDB shell, type the command mongo.exe in the command prompt.
Step 6: Compute the average of the values in the second numeric column using aggregate
operations in MongoDB.
Step 7: Type exit to come out of the MongoDB shell.
Step 8: Use mongoexport to export the documents of the sample collection in the test database
into a CSV file.
IMPORT IN MONGODB:
At the command prompt, execute the following command:
mongoimport --db test --collection sample --type csv --headerline --file C:\sample.csv
INPUT:
OUTPUT:
AGGREGATION IN MONGODB:
db.sample.aggregate({$group:{_id:"$_unit_id",Avg_choose_one:{$avg:"$choose_one"}}});
OUTPUT:
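Note that the $group stage above produces one average per _unit_id value. If a single overall average of the numeric column is wanted instead (as Step 6 describes), the documents can be grouped under a constant _id; the field name choose_one is taken from the command above and depends on the dataset that was imported:
db.sample.aggregate({$group:{_id:null,Avg_choose_one:{$avg:"$choose_one"}}});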
EXPORT IN MONGODB:
At the command prompt, execute the following command. The --fieldFile option must point to a
plain-text file listing the field names to export, one per line (C:\fields.txt here is only an
example), and --type=csv makes the output a CSV file:
mongoexport --db test --collection sample --type=csv --fieldFile C:\fields.txt --out d:\output.csv
OUTPUT:
RESULT:
Thus the import, export and aggregation operations in MongoDB have been executed and the
output is verified successfully.
Ex No: 07
AIM:
To demonstrate the Top N and Bottom N view on the worksheet for a dataset using
Tableau visualization tool.
PROCEDURE:
Step 1: Open the Tableau visualization tool by clicking the Tableau icon on the desktop.
Step 2: Load the data into Tableau by choosing the Sample-Superstore.xls file listed under the
Saved Data Sources menu on the home page. A worksheet opens where the chart is created
and displayed.
Step 3: Drag the dimension Product Name (under Product) to the Rows shelf and the measure
Profit to the Columns shelf. Choose Bar as the mark type in the Marks card. Tableau displays
the resulting bar chart.
Step 4: Right-click the field Product Name and select Sort. Choose Sort By as Field and Sort
Order as Descending.
Step 5: Right-click Top Customers under Parameters and click Edit. Rename it Top and Bottom
Products, change the current value to 10 and click OK.
Step 6: Right-click the field Product Name and select Filter. In the General tab, choose the Use
all radio option. In the Top tab, choose the second radio option, By Field. In the second
drop-down, choose the Top and Bottom Products parameter, click Apply and then OK. The top
10 products by profit are now shown in the chart.
Step 7: Right-click the filter Product Name in the Filters section and select Create Set. Name the
set Top 10 Products. In the Top tab, choose the second radio option, By Field. In the first
drop-down choose Top, in the second drop-down choose the Top and Bottom Products
parameter, and click OK.
Step 8: Right-click the filter Product Name in the Filters section and select Create Set. Name the
set Bottom 10 Products. In the Top tab, choose the second radio option, By Field. In the first
drop-down choose Bottom, in the second drop-down choose the Top and Bottom Products
parameter, and click OK.
Step 9: Right-click the Top 10 Products set under Sets and select Create Combined Set. Name
the set Top 10 and Bottom 10 Profit Products. In the second drop-down, select Bottom 10
Products and click OK.
Step 10: Right-click Product Name in the Filters section and select Remove.
Step 11: Drag the Top 10 and Bottom 10 Profit Products set from the Sets section to the Filters
section.
Step 12: Now click the presentation icon or press F7 to view the Top 10 and Bottom 10 products
by profit in a single view.
RESULT:
Thus the Top N and Bottom N view on the worksheet for a dataset using Tableau
visualization tool has been demonstrated and the output is verified successfully.
Ex No: 08
AIM:
To write a Map Reduce program for counting the occurrences of similar words in a file.
PROCEDURE:
Step 1: Open Eclipse and select File -> New -> Java Project -> (Name it -MRDemo) -> Finish.
Step 2: Right Click the project name and select New -> Package (Name it - com.app) -> Finish.
Step 3: Right Click the package name and select New -> Class (Name it - WordCounter).
Step 4: Add the following reference libraries by right-clicking the project name and selecting
Properties -> Java Build Path -> Add External JARs:
hadoop-core-0.20.0.jar
org.apache.commons.cli-1.2.0.jar
Step 5: Write the Map Reduce program in WordCounter.java file and save the file.
Step 6: Build the project JAR file by right-clicking the project name and selecting Export ->
JAR file (under Java) -> Next (give the JAR file name as MRDemo.jar) -> Finish.
Step 7: Open the command prompt and start Hadoop by typing the following command in the
specified path, then press Enter.
C:\hadoop-2.8.0\sbin>start-all
Step 8: Move the input file into HDFS by typing the following in the command prompt
hadoop fs -put C:/word.txt /word.txt
Step 9: Run the jar file by typing the following in the command prompt
hadoop jar MRDemo.jar com.app.WordCounter /word.txt /wordcount
Step 10: Check the output by typing the following in the command prompt
hadoop fs -ls /wordcount
hadoop fs -cat /wordcount/part-r-00000
PROGRAM:
WordCounter.java:
package com.app;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.LongWritable;
public class WordCounter
{
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException
    {
        Job job = new Job();
        job.setJobName("WordCounter");
        job.setJarByClass(WordCounter.class);
        job.setMapperClass(WordCounterMap.class);
        job.setReducerClass(WordCounterRed.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/word.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/wordcount"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class WordCounterMap extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
        {
            // word.txt is comma-separated, so each line is split on commas and
            // every word is emitted with a count of 1.
            String[] words = value.toString().split(",");
            for (String word : words)
            {
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }

    public static class WordCounterRed extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        @Override
        public void reduce(Text word, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            // Sum the counts received for each word.
            int count = 0;
            for (IntWritable val : values)
            {
                count += val.get();
            }
            context.write(word, new IntWritable(count));
        }
    }
}
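Because word counts are simply summed, the same reducer class can also act as a combiner to reduce the data shuffled between the map and reduce phases. This is an optional optimisation, not part of the original listing; it would be enabled by adding one line in main():
job.setCombinerClass(WordCounterRed.class);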
INPUT:
OUTPUT:
RESULT:
Thus the Map Reduce program for finding the occurrences of similar words in a file has
been written, executed and the output is verified successfully.