BDA - LAB Manual

Ex No:01 Roll no:

Date: Page no:

WORKING WITH HDFS COMMANDS

AIM:
To work with HDFS commands.

PROCEDURE:

Step 1: Open the command prompt and start Hadoop by typing the following command at the
specified path, then press Enter.
C:\hadoop-2.8.0\sbin>start-all
Step 2: Open a new command prompt and start working with HDFS commands.
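Optional check (an addition, not part of the original procedure): before running HDFS commands, the Hadoop daemons can be verified with the JDK's jps tool. Each line of its output is a process id followed by a daemon name; on a working single-node setup the listing should include NameNode, DataNode, ResourceManager and NodeManager.
C:\Users\CSE>jps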

COMMANDS:

Objective: To print the Hadoop version.


Act:
C:\Users\CSE>hadoop version
Hadoop 2.8.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r
91f2b7a13d1e97be65db92ddabc627cc29ac0009
Compiled by jdu on 2017-03-17T04:12Z
Compiled with protoc 2.5.0
From source with checksum 60125541c2b3e266cbf3becc5bda666
This command was run using /C:/hadoop-2.8.0/share/hadoop/common/hadoop-common-2.8.0.jar

Objective: To create directories named 'bigdata' and 'analytics' in HDFS.


Act:
C:\Users\CSE>hdfs dfs -mkdir /bigdata
C:\Users\CSE>hdfs dfs -mkdir /analytics

Objective: To get the list of directories at the root of HDFS.


Act:
C:\Users\CSE>hdfs dfs -ls /
Found 2 items
drwxr-xr-x - CSE supergroup 0 2019-09-02 13:57 /analytics
drwxr-xr-x - CSE supergroup 0 2019-09-02 13:57 /bigdata
Objective: To copy a file from local file system to HDFS.
Act:
C:\Users\CSE>hdfs dfs -put C:/word.txt /bigdata/word.txt

Objective: To copy a file from local file system to HDFS via copyFromLocal command.
Act:
C:\Users\CSE>hdfs dfs -copyFromLocal C:/salary.txt /bigdata/salary.txt
Objective: To get the list of complete directories and files of HDFS.
Act:
C:\Users\CSE>hdfs dfs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 13:57 /analytics
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:51 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt

Objective: To display the contents of an HDFS file by the name word.txt on console.
Act:
C:\Users\CSE>hdfs dfs -cat /bigdata/word.txt
Hadoop,action,BigData,Export,update,visualization,word,count,file,mongodb,aggregate,hdfs,ins
ert,MapReduce,dataset,tableau,tool,action,Hadoop,count,hdfs,planning,administration,update,file
,mongodb,dataset,Hadoop,MapReduce,food,e-health,cancer,diagnosis

Objective: To copy a file from one directory to another on HDFS.


Act:
C:\Users\CSE>hdfs dfs -cp /bigdata/salary.txt /analytics/salary.txt

C:\Users\CSE>hdfs dfs -ls -R /


drwxr-xr-x - CSE supergroup 0 2019-09-02 14:56 /analytics
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:56 /analytics/salary.txt
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:54 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 68 2019-09-02 14:54 /bigdata/student.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt

Objective: To move a file from one directory to another on HDFS.


Act:
C:\Users\CSE>hdfs dfs -mv /bigdata/student.txt /analytics/student.txt

C:\Users\CSE>hdfs dfs -ls -R /


drwxr-xr-x - CSE supergroup 0 2019-09-02 14:58 /analytics
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:56 /analytics/salary.txt
-rw-r--r-- 1 CSE supergroup 68 2019-09-02 14:54 /analytics/student.txt
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:58 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt

Objective: To create a file in HDFS with file size 0 bytes.


Act:
C:\Users\CSE>hdfs dfs -touchz /bigdata/sample.txt

C:\Users\CSE>hdfs dfs -ls -R /


drwxr-xr-x - CSE supergroup 0 2019-09-02 14:58 /analytics
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:56 /analytics/salary.txt
-rw-r--r-- 1 CSE supergroup 68 2019-09-02 14:54 /analytics/student.txt
drwxr-xr-x - CSE supergroup 0 2019-09-02 15:00 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 0 2019-09-02 15:00 /bigdata/sample.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt

Objective: To count the number of directories, files, and bytes under the specified path.
Act:
C:\Users\CSE>hdfs dfs -count /
3 5 16510 /

Objective: To show the last 1 KB of the file on console or stdout.


Act:
C:\Users\CSE>hdfs dfs -tail /bigdata/salary.txt
H_CLERK 3900 123 50
194 Samuel McCain F SMCCAIN 650.501.3876 01-JUL-06 SH_CLERK 3200
123 50
195 Vance Jones M VJONES 650.501.4876 17-MAR-07 SH_CLERK 2800
123 50
196 Alana Walsh M AWALSH 650.507.9811 24-APR-06 SH_CLERK 3100
124 50
197 Kevin Feeney M KFEENEY 650.507.9822 23-MAY-06 SH_CLERK 3000
124 50
198 Donald OConnell M DOCONNEL 650.507.9833 21-JUN-07
SH_CLERK 2600
124 50
199 Douglas Grant F DGRANT 650.507.9844 13-JAN-08 SH_CLERK 2600
124 50
200 Jennifer Whalen M JWHALEN 515.123.4444 17-SEP-03 AD_ASST
4400 101 10
201 Michael Hartstein M MHARTSTE 515.123.5555 17-FEB-04
MK_MAN 13000 100
20
202 Pat Fay M PFAY 603.123.6666 17-AUG-05 MK_REP 6000 201
20
203 Susan Mavris F SMAVRIS 515.123.7777 07-JUN-02 HR_REP 6500 101
40
204 Hermann Baer M HBAER 515.123.8888 07-JUN-02 PR_REP 10000 101
70
205 Shelley Higgins M SHIGGINS 515.123.8080 07-JUN-02 AC_MGR
12008 101 110
206 William Gietz M WGIETZ 515.123.8181 07-JUN-02 AC_ACCOUNT 8300
205 110
7 NIKHIL RANJAN M NIKSR 515.123.4568 21-SEP-11 AD_VP 20400
100 90

Objective: To show disk usage, in bytes, for all the files present on the path specified.
Act:
C:\Users\CSE>hdfs dfs -du /bigdata
8092 /bigdata/salary.txt
0 /bigdata/sample.txt
258 /bigdata/word.txt

Objective: To empty the trash in the Hadoop file system.


Act:
C:\Users\CSE>hdfs dfs -expunge

Objective: To append the content of a local file to the specified destination file on HDFS. The
destination file will be created if it does not exist. If the local file is specified as -, the input is
read from stdin.
Act:
C:\Users\CSE>hdfs dfs -appendToFile C:/word.txt /analytics/student.txt

C:\Users\CSE>hdfs dfs -cat /analytics/student.txt


1001,John,45
1002,Jack,39
1003,Alex,44
1004,Smith,38
1005,bob,33Hadoop,action,Big Data,Export,update,visualization,word,count,file,mongodb,
aggregate,hdfs,insert,MapReduce,dataset,tableau,tool,action,Hadoop,count,hdfs,
planning,administration,update,file,mongodb,dataset,Hadoop,MapReduce,food,
e-health,cancer,diagnosis

C:\Users\CSE>hdfs dfs -appendToFile - /analytics/student.txt

NoSQL database used here is MongoDB
(The line above is typed as standard input; on Windows, press Ctrl+Z followed by Enter to end the input.)

C:\Users\CSE>hdfs dfs -cat /analytics/student.txt


1001,John,45
1002,Jack,39
1003,Alex,44
1004,Smith,38
1005,bob,33Hadoop,action,Big Data,Export,update,visualization,word,count,file,mongodb,
aggregate,hdfs,insert,MapReduce,dataset,tableau,tool,action,Hadoop,count,hdfs,
planning,administration,update,file,mongodb,dataset,Hadoop,MapReduce,food,
e-health,cancer,diagnosis
NoSQL database used here is MongoDB

Objective: To remove an empty directory from HDFS.


Act:
C:\Users\CSE>hdfs dfs -rmdir /lab

Objective: To remove a directory and its contents from HDFS.


Act:
C:\Users\CSE>hdfs dfs -rm -r /analytics
Deleted /analytics

C:\Users\CSE>hdfs dfs -ls -R /


drwxr-xr-x - CSE supergroup 0 2019-09-02 15:00 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 0 2019-09-02 15:00 /bigdata/sample.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt

Objective: To remove a file from HDFS.


Act:
C:\Users\CSE>hdfs dfs -rm /bigdata/sample.txt
Deleted /bigdata/sample.txt
C:\Users\CSE>hdfs dfs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 15:14 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt
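For reference (an addition, not part of the original exercise), the same file operations can be performed from Java through the HDFS FileSystem API that the later MapReduce exercises build on. The following is a minimal sketch, assuming the Hadoop 2.8.0 client libraries are on the classpath and fs.defaultFS points at the running cluster; the paths reuse the ones from the commands above.

package com.app;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
public class HdfsDemo
{
public static void main(String[] args) throws Exception
{
Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml from the classpath
FileSystem fs = FileSystem.get(conf);       // handle to HDFS
fs.mkdirs(new Path("/bigdata"));            // equivalent of: hdfs dfs -mkdir /bigdata
fs.copyFromLocalFile(new Path("C:/word.txt"), new Path("/bigdata/word.txt"));   // hdfs dfs -put
for (FileStatus status : fs.listStatus(new Path("/bigdata")))   // hdfs dfs -ls /bigdata
{
System.out.println(status.getPath());
}
IOUtils.copyBytes(fs.open(new Path("/bigdata/word.txt")), System.out, 4096, false);   // hdfs dfs -cat
fs.close();
}
}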

RESULT:
Thus working with HDFS commands has been completed successfully.
Ex No:02 Roll no:
Date: Page no:

FILE MANAGEMENT TASKS IN HADOOP

AIM:
To perform the file management tasks in Hadoop.

PROCEDURE:

Step 1: Open the command prompt and start Hadoop by typing the following command at the
specified path, then press Enter.
C:\hadoop-2.8.0\sbin>start-all
Step 2: Open a new command prompt and start performing file management tasks in Hadoop.

COMMANDS:

Objective: To create directories named 'bigdata' and 'analytics' in HDFS.


Act:
C:\Users\CSE>hadoop fs -mkdir /bigdata
C:\Users\CSE>hadoop fs -mkdir /analytics

Objective: To get the list of directories at the root of HDFS.


Act:
C:\Users\CSE>hadoop fs -ls /
Found 2 items
drwxr-xr-x - CSE supergroup 0 2019-09-02 13:57 /analytics
drwxr-xr-x - CSE supergroup 0 2019-09-02 13:57 /bigdata

Objective: To copy a file from local file system to HDFS.


Act:
C:\Users\CSE>hadoop fs -put C:/word.txt /bigdata/word.txt

Objective: To copy a file from local file system to HDFS via copyFromLocal command.
Act:
C:\Users\CSE>hadoop fs -copyFromLocal C:/salary.txt /bigdata/salary.txt

Objective: To get the list of complete directories and files of HDFS.


Act:
C:\Users\CSE>hadoop fs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 13:57 /analytics
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:51 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt

Objective: To display the contents of an HDFS file by the name word.txt on console.
Act:
C:\Users\CSE>hadoop fs -cat /bigdata/word.txt
Hadoop,action,BigData,Export,update,visualization,word,count,file,mongodb,aggregate,hdfs,ins
ert,MapReduce,dataset,tableau,tool,action,Hadoop,count,hdfs,planning,administration,update,file
,mongodb,dataset,Hadoop,MapReduce,food,e-health,cancer,diagnosis

Objective: To copy a file from one directory to another on HDFS.


Act:
C:\Users\CSE>hadoop fs -cp /bigdata/salary.txt /analytics/salary.txt

C:\Users\CSE>hadoop fs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:56 /analytics
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:56 /analytics/salary.txt
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:54 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 68 2019-09-02 14:54 /bigdata/student.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt

Objective: To move a file from one directory to another on HDFS.


Act:
C:\Users\CSE>hadoop fs -mv /bigdata/student.txt /analytics/student.txt

C:\Users\CSE>hadoop fs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:58 /analytics
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:56 /analytics/salary.txt
-rw-r--r-- 1 CSE supergroup 68 2019-09-02 14:54 /analytics/student.txt
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:58 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt

Objective: To create a file in HDFS with file size 0 bytes.


Act:
C:\Users\CSE>hadoop fs -touchz /bigdata/sample.txt
C:\Users\CSE>hadoop fs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 14:58 /analytics
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:56 /analytics/salary.txt
-rw-r--r-- 1 CSE supergroup 68 2019-09-02 14:54 /analytics/student.txt
drwxr-xr-x - CSE supergroup 0 2019-09-02 15:00 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 0 2019-09-02 15:00 /bigdata/sample.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt

Objective: To show the last 1 KB of the file on console or stdout.


Act:
C:\Users\CSE>hadoop fs -tail /bigdata/salary.txt
197 Kevin Feeney M KFEENEY 650.507.9822 23-MAY-06 SH_CLERK 3000
124 50
198 Donald OConnell M DOCONNEL 650.507.9833 21-JUN-07
SH_CLERK 2600
124 50
199 Douglas Grant F DGRANT 650.507.9844 13-JAN-08 SH_CLERK 2600
124 50
200 Jennifer Whalen M JWHALEN 515.123.4444 17-SEP-03 AD_ASST
4400 101 10
201 Michael Hartstein M MHARTSTE 515.123.5555 17-FEB-04
MK_MAN 13000 100
20
202 Pat Fay M PFAY 603.123.6666 17-AUG-05 MK_REP 6000 201
20
203 Susan Mavris F SMAVRIS 515.123.7777 07-JUN-02 HR_REP 6500 101
40
204 Hermann Baer M HBAER 515.123.8888 07-JUN-02 PR_REP 10000 101
70
205 Shelley Higgins M SHIGGINS 515.123.8080 07-JUN-02 AC_MGR
12008 101 110
206 William Gietz M WGIETZ 515.123.8181 07-JUN-02 AC_ACCOUNT 8300
205 110
7 NIKHIL RANJAN M NIKSR 515.123.4568 21-SEP-11 AD_VP 20400
100 90

Objective: To show disk usage, in bytes, for all the files present on the path specified.
Act:
C:\Users\CSE>hadoop fs -du /bigdata
8092 /bigdata/salary.txt
0 /bigdata/sample.txt
258 /bigdata/word.txt

Objective: To append the content of a local file to the specified destination file on HDFS. The
destination file will be created if it does not exist. If the local file is specified as -, the input is
read from stdin.
Act:
C:\Users\CSE>hadoop fs -appendToFile C:/word.txt /analytics/student.txt

C:\Users\CSE>hadoop fs -cat /analytics/student.txt


1001,John,45
1002,Jack,39
1003,Alex,44
1004,Smith,38
1005,bob,33Hadoop,action,Big Data,Export,update,visualization,word,count,file,mongodb,
aggregate,hdfs,insert,MapReduce,dataset,tableau,tool,action,Hadoop,count,hdfs,
planning,administration,update,file,mongodb,dataset,Hadoop,MapReduce,food,
e-health,cancer,diagnosis

C:\Users\CSE>hadoop fs -appendToFile - /analytics/student.txt

NoSQL database used here is MongoDB
(The line above is typed as standard input; on Windows, press Ctrl+Z followed by Enter to end the input.)

C:\Users\CSE>hadoop fs -cat /analytics/student.txt


1001,John,45
1002,Jack,39
1003,Alex,44
1004,Smith,38
1005,bob,33Hadoop,action,Big Data,Export,update,visualization,word,count,file,mongodb,
aggregate,hdfs,insert,MapReduce,dataset,tableau,tool,action,Hadoop,count,hdfs,
planning,administration,update,file,mongodb,dataset,Hadoop,MapReduce,food,
e-health,cancer,diagnosis
NoSQL database used here is MongoDB

Objective: To remove an empty directory from HDFS.


Act:
C:\Users\CSE>hadoop fs -rmdir /lab

Objective: To remove a directory and its contents from HDFS.


Act:
C:\Users\CSE>hadoop fs -rm -r /analytics
Deleted /analytics

C:\Users\CSE>hadoop fs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 15:00 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 0 2019-09-02 15:00 /bigdata/sample.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt

Objective: To remove a file from HDFS.


Act:
C:\Users\CSE>hadoop fs -rm /bigdata/sample.txt
Deleted /bigdata/sample.txt

C:\Users\CSE>hadoop fs -ls -R /
drwxr-xr-x - CSE supergroup 0 2019-09-02 15:14 /bigdata
-rw-r--r-- 1 CSE supergroup 8092 2019-09-02 14:51 /bigdata/salary.txt
-rw-r--r-- 1 CSE supergroup 258 2019-09-02 14:49 /bigdata/word.txt

RESULT:
Thus the file management tasks in Hadoop have been performed successfully.
Ex No:03 Roll no:
Date: Page no:

MAP REDUCE PROGRAM FOR RETRIEVING THE SUM AND AVERAGE SALARY
OF EMPLOYEES IN EVERY DEPARTMENT

AIM:
To write a Map Reduce program for retrieving the sum and average salary of employees
in every department.

PROCEDURE:
Step 1: Open Eclipse and select File -> New -> Java Project -> (Name it -MRDemo) -> Finish.
Step 2: Right Click the project name and select New -> Package (Name it - com.app) -> Finish.
Step 3: Right Click the package name and select New -> Class (Name it - SumAvgSalary).
Step 4: Add the following reference libraries by right clicking the project name and select
properties -> Java Build Path -> Add External JARs
• hadoop-core-0.20.0.jar
• org.apache.commons.cli-1.2.0.jar
Step 5: Write the Map Reduce program in SumAvgSalary.java file and save the file.
Step 6: Make the project jar file by right clicking the project name and select Export -> Select
export destination as Jar File under Java ->click next (Give the JAR file name as
MRDemo.jar) ->Finish.
Step 7: Open the command prompt and start Hadoop by typing the following command at the
specified path, then press Enter.
C:\hadoop-2.8.0\sbin>start-all
Step 8: Move the input file into HDFS by typing the following in the command prompt
hadoop fs -put C:/salary.txt /salary.txt
Step 9: Run the jar file by typing the following in the command prompt
hadoop jar MRDemo.jar com.app.SumAvgSalary /salary.txt /salarysumavg
Step 10: Check the output by typing the following in the command prompt
hadoop fs -ls /salarysumavg
hadoop fs -cat /salarysumavg/part-r-00000

PROGRAM:
SumAvgSalary.java:

package com.app;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.LongWritable;
public class SumAvgSalary
{
public static void main(String[] args) throws IOException, InterruptedException,
ClassNotFoundException
{
Job job=new Job();
job.setJobName("SumAvgSalary");
job.setJarByClass(SumAvgSalary.class);
job.setMapperClass(SumAvgSalaryMap.class);
job.setReducerClass(SumAvgSalaryRed.class);
// declare the map and reduce output types explicitly: the mapper emits FloatWritable values, the reducer emits Text values
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(FloatWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path("/salary.txt"));
FileOutputFormat.setOutputPath(job, new Path("/salarysumavg"));
System.exit(job.waitForCompletion(true)?0:1);
}
public static class SumAvgSalaryMap extends Mapper<LongWritable, Text, Text,
FloatWritable>
{
public void map(LongWritable key, Text empRecord, Context con) throws
IOException, InterruptedException
{
String[] word = empRecord.toString().split("\\t");
String un = word[7];
Float salary = Float.parseFloat(word[8]);
con.write(new Text(un), new FloatWritable(salary));
}
}
public static class SumAvgSalaryRed extends Reducer<Text, FloatWritable, Text, Text>
{
public void reduce(Text key, Iterable<FloatWritable> valueList, Context con)
throws IOException, InterruptedException
{
Float total = (float) 0;
int count=0;
for (FloatWritable var : valueList)
{
total += var.get();
count++;
}
Float avg = total / count;
String out = "Total: " + total + " " + "Average: " + avg;
con.write(key, new Text(out));
}
}
}
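The mapper above assumes that each salary.txt record is tab-separated, with the job id in field 8 (index 7) and the salary in field 9 (index 8); this matches the record layout visible in the -tail output of Ex No:01. The following stand-alone sketch illustrates only that parsing step; the sample line and its tab separators are an assumption made for illustration, not part of the original program.

package com.app;
public class ParseSalaryRecord
{
public static void main(String[] args)
{
// One record in the layout shown by hdfs dfs -tail /bigdata/salary.txt, written with explicit tabs between fields
String record = "194\tSamuel\tMcCain\tF\tSMCCAIN\t650.501.3876\t01-JUL-06\tSH_CLERK\t3200\t123\t50";
String[] word = record.split("\\t");
String un = word[7];                        // grouping key emitted by the mapper
Float salary = Float.parseFloat(word[8]);   // value emitted by the mapper
System.out.println(un + " -> " + salary);   // prints: SH_CLERK -> 3200.0
}
}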

INPUT:

OUTPUT:

RESULT:
Thus the Map Reduce program for retrieving the sum and average salary of employees in
every department has been written, executed and the output is verified successfully.
Ex No:04 Roll no:
Date: Page no:

MAP REDUCE PROGRAM FOR FINDING THE UNIT WISE SALARY

AIM:
To write a Map Reduce program for finding the unit wise salary.

PROCEDURE:

Step 1: Open Eclipse and select File -> New -> Java Project -> (Name it -MRDemo) -> Finish.
Step 2: Right Click the project name and select New -> Package (Name it -com.app) -> Finish.
Step 3: Right Click the package name and select New -> Class (Name it - UnitWiseSalary).
Step 4: Add the following reference libraries by right clicking the project name and select
properties -> Java Build Path -> Add External JARs
• hadoop-core-0.20.0.jar
• org.apache.commons.cli-1.2.0.jar
Step 5: Write the Map Reduce program in UnitWiseSalary.java file and save the file.
Step 6: Make the project jar file by right clicking the project name and select Export -> Select
export destination as Jar File under Java ->click next (Give the JAR file name as
MRDemo.jar) ->Finish.
Step 7: Open the command prompt and start Hadoop by typing the following command at the
specified path, then press Enter.
C:\hadoop-2.8.0\sbin>start-all
Step 8: Move the input file into HDFS by typing the following in the command prompt
hadoop fs -put C:/salary.txt /salary.txt
Step 9: Run the jar file by typing the following in the command prompt
hadoop jar MRDemo.jar com.app.UnitWiseSalary /salary.txt /salarysum
Step 10: Check the output by typing the following in the command prompt
hadoop fs -ls /salarysum
hadoop fs -cat /salarysum/part-r-00000

PROGRAM:
UnitWiseSalary.java:

package com.app;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.LongWritable;

public class UnitWiseSalary


{
public static void main(String[] args) throws IOException, InterruptedException,
ClassNotFoundException
{
Job job=new Job();
job.setJobName("UnitWiseSalary");
job.setJarByClass(UnitWiseSalary.class);
job.setMapperClass(UnitWiseSalaryMap.class);
job.setReducerClass(UnitWiseSalaryRed.class);
// declare the map and reduce output types explicitly: the mapper emits FloatWritable values, the reducer emits Text values
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(FloatWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path("/salary.txt"));
FileOutputFormat.setOutputPath(job, new Path("/salarysum"));
System.exit(job.waitForCompletion(true)?0:1);
}
public static class UnitWiseSalaryMap extends Mapper<LongWritable, Text, Text,
FloatWritable>
{
public void map(LongWritable key, Text empRecord, Context con)
throws IOException, InterruptedException
{
String[] word = empRecord.toString().split("\\t");
String un = word[7];
Float salary = Float.parseFloat(word[8]);
con.write(new Text(un), new FloatWritable(salary));
}
}
public static class UnitWiseSalaryRed extends Reducer<Text, FloatWritable, Text, Text>
{
public void reduce(Text key, Iterable<FloatWritable> valueList, Context con)
throws IOException, InterruptedException
{
Float total = (float) 0;
for (FloatWritable var : valueList)
{
total += var.get();
}
String out = "Total: " + total;
con.write(key, new Text(out));
}
}
}
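Note that Step 9 passes /salary.txt and /salarysum on the command line, while the driver above (like the one in Ex No:03) hard-codes those paths, so the extra arguments are ignored. A small optional variant, shown here as an assumption rather than as part of the original program, would honour the command-line arguments by replacing the two path lines in main() with:

FileInputFormat.addInputPath(job, new Path(args[0]));     // first argument: HDFS input file
FileOutputFormat.setOutputPath(job, new Path(args[1]));   // second argument: HDFS output directory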

INPUT:

OUTPUT:

RESULT:
Thus the Map Reduce program for finding the unit wise salary has been written, executed
and the output is verified successfully.
Ex No:05 Roll no:
Date: Page no:

CRUD (CREATE, READ, UPDATE AND DELETE) OPERATIONS IN MONGODB

AIM:
To demonstrate CRUD (Create, Read, Update and Delete) operations in MongoDB.

PROCEDURE:
Step 1: Open a command prompt and navigate to the bin directory present in the MongoDB
installation folder. Installation folder is C:\Program Files\MongoDB\Server\4.2.
Step 2: To get into the MongoDB shell, type the command “mongo.exe” in the command
prompt.

CRUD OPERATIONS:
CREATE:
To create a collection in a database, use db.createCollection() method. Collection can
also be created automatically, when some document is inserted.
>db.createCollection("Students")
{ "ok" : 1 }

INSERT:
To insert a document into a collection, use insert() method.
>db.Students.insert({_id:1,StudName:"Michelle Jacintha",Grade:"VII",Hobbies:"Internet
Surfing"});
WriteResult({ "nInserted" : 1 })
>db.Students.insert({_id:2,StudName:"Mabel Mathews",Grade:"VII",Hobbies:"Baseball"});
WriteResult({ "nInserted" : 1 })
>db.Students.insert({_id:3,StudName:"Aryan David",Grade:"VII",Hobbies:"Skatting"});
WriteResult({ "nInserted" : 1 })
>db.Students.insert({_id:4,StudName:"Herseh Gibbs",Grade:"VII",Hobbies:"Graffiti"});
WriteResult({ "nInserted" : 1 })

READ:
To read or display the documents from a collection, use find() method. The pretty()
method is used to format the result.
>db.Students.find().pretty();
{
"_id" : 1,
"StudName" : "Michelle Jacintha",
"Grade" : "VII",
"Hobbies" : "Internet Surfing"
}
{
"_id" : 2,
"StudName" : "Mabel Mathews",
"Grade" : "VII",
"Hobbies" : "Baseball"
}
{
"_id" : 3,
"StudName" : "Aryan David",
"Grade" : "VII",
"Hobbies" : "Skatting"
}
{
"_id" : 4,
"StudName" : "Herseh Gibbs",
"Grade" : "VII",
"Hobbies" : "Graffiti"
}

INSERT USING UPDATE:


Documents can also be inserted into a collection using update() method with upsert set to
true. Upsert means Update else insert.
>db.Students.update({_id:3,StudName:"Aryan David",Grade:"VII"},{$set:{Hobbies:"Chess"}},
{upsert:true});
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

READ AFTER INSERT USING UPDATE:


>db.Students.find({_id:3});
{ "_id" : 3, "StudName" : "Aryan David", "Grade" : "VII", "Hobbies" : "Chess" }

INSERT USING SAVE:


Documents can also be inserted into a collection using save() method.
>db.Students.save({_id:5,StudName:"Vamsi Bapat",Grade:"VII",Hobbies:"Cricket"})
WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : 5 })

READ AFTER INSERT USING SAVE:


>db.Students.find({_id:5});
{ "_id" : 5, "StudName" : "Vamsi Bapat", "Grade" : "VII", "Hobbies" : "Cricket" }

READ:
>db.Students.find().pretty();
{
"_id" : 1,
"StudName" : "Michelle Jacintha",
"Grade" : "VII",
"Hobbies" : "Internet Surfing"
}
{
"_id" : 2,
"StudName" : "Mabel Mathews",
"Grade" : "VII",
"Hobbies" : "Baseball"
}
{
"_id" : 3,
"StudName" : "Aryan David",
"Grade" : "VII",
"Hobbies" : "Chess"
}
{
"_id" : 4,
"StudName" : "Herseh Gibbs",
"Grade" : "VII",
"Hobbies" : "Graffiti"
}
{
"_id" : 5,
"StudName" : "Vamsi Bapat",
"Grade" : "VII",
"Hobbies" : "Cricket"
}

READ:
To find a document wherein the “StudName” has the value “Aryan David”.
>db.Students.find({StudName:"Aryan David"}).pretty();
{
"_id" : 3,
"StudName" : "Aryan David",
"Grade" : "VII",
"Hobbies" : "Chess"
}

READ:
To display only the StudName from all the documents of the Students collection. The
identifier “_id” can be suppressed by specifying it as 0.
>db.Students.find({},{StudName:1,_id:0});
{ "StudName" : "Michelle Jacintha" }
{ "StudName" : "Mabel Mathews" }
{ "StudName" : "Aryan David" }
{ "StudName" : "Herseh Gibbs" }
{ "StudName" : "Vamsi Bapat" }

READ:
To find those documents where the Grade is set to “VII”. The relational operator $eq can
be used.
>db.Students.find({Grade:{$eq:'VII'}}).pretty();
{
"_id" : 1,
"StudName" : "Michelle Jacintha",
"Grade" : "VII",
"Hobbies" : "Internet Surfing"
}
{
"_id" : 2,
"StudName" : "Mabel Mathews",
"Grade" : "VII",
"Hobbies" : "Baseball"
}
{
"_id" : 3,
"StudName" : "Aryan David",
"Grade" : "VII",
"Hobbies" : "Chess"
}
{
"_id" : 4,
"StudName" : "Herseh Gibbs",
"Grade" : "VII",
"Hobbies" : "Graffiti"
}
{
"_id" : 5,
"StudName" : "Vamsi Bapat",
"Grade" : "VII",
"Hobbies" : "Cricket"
}

READ:
To find those documents where Hobbies is not equal to “Baseball”. The relational
operator $ne can be used.
>db.Students.find({Hobbies:{$ne:'Baseball'}}).pretty();
{
"_id" : 1,
"StudName" : "Michelle Jacintha",
"Grade" : "VII",
"Hobbies" : "Internet Surfing"
}
{
"_id" : 3,
"StudName" : "Aryan David",
"Grade" : "VII",
"Hobbies" : "Chess"
}
{
"_id" : 4,
"StudName" : "Herseh Gibbs",
"Grade" : "VII",
"Hobbies" : "Graffiti"
}
{
"_id" : 5,
"StudName" : "Vamsi Bapat",
"Grade" : "VII",
"Hobbies" : "Cricket"
}
READ:
To find those documents where Hobbies is set either to “Chess” or to “Skating”.
The operator $in can be used.
>db.Students.find({Hobbies:{$in:['Chess','Skating']}});
{ "_id" : 3, "StudName" : "Aryan David", "Grade" : "VII", "Hobbies" : "Chess" }

READ:
To find those documents where Hobbies is set neither to “Chess” nor to “Skating”.
The operator $nin can be used.
>db.Students.find({Hobbies:{$nin:['Chess','Skating']}});
{ "_id" : 1, "StudName" : "Michelle Jacintha", "Grade" : "VII", "Hobbies" : "Internet Surfing" }
{ "_id" : 2, "StudName" : "Mabel Mathews", "Grade" : "VII", "Hobbies" : "Baseball" }
{ "_id" : 4, "StudName" : "Herseh Gibbs", "Grade" : "VII", "Hobbies" : "Graffiti" }
{ "_id" : 5, "StudName" : "Vamsi Bapat", "Grade" : "VII", "Hobbies" : "Cricket" }

DELETE:
To delete a document where the “_id” is set to 4.
>db.Students.remove({_id:4});
WriteResult({ "nRemoved" : 1 })

READ AFTER DELETE:


>db.Students.find().pretty();
{
"_id" : 1,
"StudName" : "Michelle Jacintha",
"Grade" : "VII",
"Hobbies" : "Internet Surfing"
}
{
"_id" : 2,
"StudName" : "Mabel Mathews",
"Grade" : "VII",
"Hobbies" : "Baseball"
}
{
"_id" : 3,
"StudName" : "Aryan David",
"Grade" : "VII",
"Hobbies" : "Chess"
}
{
"_id" : 5,
"StudName" : "Vamsi Bapat",
"Grade" : "VII",
"Hobbies" : "Cricket"
}

DELETE:
To delete all documents from the collection “Students”.
>db.Students.remove({});
WriteResult({ "nRemoved" : 4 })

READ AFTER DELETE:


>db.Students.find().pretty();
>

RESULT:
Thus the CRUD (Create, Read, Update and Delete) operations in MongoDB have been
executed and the output is verified successfully.
Ex No:06 Roll no:
Date: Page no:

IMPORT, EXPORT AND AGGREGATION IN MONGODB


AIM:
To demonstrate import, export and aggregation in MongoDB.

PROCEDURE:
Step 1: Pick any public dataset from the site www.kdnuggets.com with at least two numeric
columns, convert it into CSV format, name the file sample.csv and save it in the C drive.
Step 2: Open a command prompt and navigate to the MongoDB installation folder. Installation
folder is C:\Program Files\MongoDB\Server\4.2.
Step 3: Use mongoimport to import data from the CSV file into the MongoDB collection
“sample” in the test database.
mongoimport --db test --collection sample --type csv --headerline --file C:\sample.csv
Step 4: Navigate to the bin directory present in the MongoDB installation folder specified above.
Step 5: To get into the MongoDB shell, type the command mongo.exe in the command prompt.
Step 6: Compute the average of the values in the second numeric column using aggregate
operations in MongoDB.
Step 7: Type exit and come out of the MongoDB shell.
Step 8: Use mongoexport to export the documents of the sample collection in the test database
into a CSV format file.

IMPORT IN MONGODB:
At the command prompt, execute the following command:
mongoimport --db test --collection sample --type csv --headerline --file C:\sample.csv
INPUT:
OUTPUT:

AGGREGATION IN MONGODB:
The following pipeline groups the documents of the sample collection by _unit_id and computes
the average of the choose_one field for each group:

db.sample.aggregate({$group:{_id:"$_unit_id",Tot_choose_one:{$avg:"$choose_one"}}});

OUTPUT:
EXPORT IN MONGODB:
At the command prompt, execute the following command (the --fieldFile option expects a file
listing the field names to export, one per line):
mongoexport --db test --collection sample --fieldFile C:\sample.xlsx --out d:\output.txt

OUTPUT:

RESULT:
Thus the import, export and aggregation in MongoDB have been executed and the output
is verified successfully.
Ex No:07 Roll no:
Date: Page no:

TOP N AND BOTTOM N VIEW ON THE WORKSHEET USING TABLEAU VISUALIZATION TOOL

AIM:
To demonstrate the Top N and Bottom N view on the worksheet for a dataset using
Tableau visualization tool.

PROCEDURE:

Step 1: Open Tableau visualization tool by clicking on the Tableau icon in the desktop.

Step 2: Load the data into Tableau by choosing the Sample-Superstore.xls file present under
the Saved Data Sources menu on the home page. A worksheet opens where the chart is created
and displayed.
Step 3: Drag the dimension Product Name under Product to the Rows shelf and the measure
Profit to the Columns shelf. Choose the chart type as Bar in the Marks section. Tableau displays
the resulting chart.
Step 4: Right click on the field Product Name and select sort. Choose Sort by as field, Sort Order
as Descending.
Step 5: Right click on Top Customers under Parameters and click Edit. Give the name as Top
and Bottom Products, change the current value to 10 and click OK.
Step 6: Right click on the field Product Name and select Filter. In the General tab, choose the
Use all radio option. In the Top tab, choose the second radio option, By Field. In the second
drop-down, choose the Top and Bottom Products parameter, click Apply and then OK. The top
10 products by profit are now shown in the chart.
Step 7: Right click on the filter Product Name under Filter section, select Create Set. Name the
set as Top 10 Products. In the Top tab, choose the second radio option By Field. In the first drop-
down, choose Top and in second drop-down, choose the Top and Bottom Products, click OK.
Step 8: Right click on the filter Product Name under Filter section, select Create Set. Name the
set as Bottom 10 Products. In the Top tab, choose the second radio option By Field. In the first
drop-down, choose Bottom and in second drop-down, choose the Top and Bottom Products,
click OK.
Step 9: Right click on the Top 10 Products set under Sets and select Create Combined Set. Name
the set as Top 10 and Bottom 10 Profit Products. In the second drop-down, select Bottom 10
Products and click OK.
Step 10: Right click on the Product Name in the Filters section and select remove.
Step 11: Drag Top 10 and Bottom 10 Profit Products Set from Sets section to the Filters section.
Step 12: Now click on the presentation icon or press F7 to view the Top 10 and Bottom 10
Products based on the Profit in a single view.

RESULT:
Thus the Top N and Bottom N view on the worksheet for a dataset using Tableau
visualization tool has been demonstrated and the output is verified successfully.
Ex No:08 Roll no:
Date: Page no:

MAP REDUCE PROGRAM FOR WORD COUNTER

AIM:
To write a Map Reduce program for counting the occurrences of similar words in a file.

PROCEDURE:
Step 1: Open Eclipse and select File -> New -> Java Project -> (Name it -MRDemo) -> Finish.
Step 2: Right Click the project name and select New -> Package (Name it - com.app) -> Finish.
Step 3: Right Click the package name and select New -> Class (Name it - WordCounter).
Step 4: Add the following reference libraries by right clicking the project name and select
properties -> Java Build Path -> Add External JARs
• hadoop-core-0.20.0.jar
• org.apache.commons.cli-1.2.0.jar
Step 5: Write the Map Reduce program in WordCounter.java file and save the file.
Step 6: Make the project jar file by right clicking the project name and select Export -> Select
export destination as Jar File under Java ->click next (Give the JAR file name as
MRDemo.jar) ->Finish.
Step 7: Open the command prompt and start Hadoop by typing the following command at the
specified path, then press Enter.
C:\hadoop-2.8.0\sbin>start-all
Step 8: Move the input file into HDFS by typing the following in the command prompt
hadoop fs -put C:/word.txt /word.txt
Step 9: Run the jar file by typing the following in the command prompt
hadoop jar MRDemo.jar com.app.WordCounter /word.txt /wordcount
Step 10: Check the output by typing the following in the command prompt
hadoop fs -ls /wordcount
hadoop fs -cat /wordcount/part-r-00000

PROGRAM:
WordCounter.java:

package com.app;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.LongWritable;
public class WordCounter
{
public static void main(String [] args) throws IOException, InterruptedException,
ClassNotFoundException
{
Job job=new Job();
job.setJobName("WordCounter");
job.setJarByClass(WordCounter.class);
job.setMapperClass(WordCounterMap.class);
job.setReducerClass(WordCounterRed.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("/word.txt"));
FileOutputFormat.setOutputPath(job, new Path("/wordcount"));
System.exit(job.waitForCompletion(true)?0:1);
}
public static class WordCounterMap extends Mapper<LongWritable, Text, Text,
IntWritable>
{
@Override
public void map(LongWritable key, Text value, Context context) throws
IOException, InterruptedException
{
String[] words=value.toString().split(",");
for(String word: words )
{
context.write(new Text(word), new IntWritable(1));
}
}
}
public static class WordCounterRed extends Reducer<Text, IntWritable, Text,
IntWritable>
{
@Override
public void reduce(Text word, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException
{
Integer count=0;
for (IntWritable val : values)
{
count += val.get();
}
context.write(word, new IntWritable(count));
}
}
}
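As an optional cross-check (an addition, not part of the original procedure), the same counting logic can be tried locally on a comma-separated line before submitting the job; the sample line below is a shortened, illustrative subset of the word.txt contents shown in Ex No:01.

package com.app;
import java.util.HashMap;
import java.util.Map;
public class LocalWordCount
{
public static void main(String[] args)
{
String line = "Hadoop,action,BigData,action,Hadoop,count,Hadoop";   // illustrative subset of word.txt
Map<String, Integer> counts = new HashMap<String, Integer>();
for (String word : line.split(","))
{
Integer old = counts.get(word);
counts.put(word, old == null ? 1 : old + 1);   // same "sum of ones per word" idea as the reducer
}
System.out.println(counts);   // e.g. {BigData=1, action=2, count=1, Hadoop=3}
}
}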

INPUT:

OUTPUT:

RESULT:
Thus the Map Reduce program for finding the occurrences of similar words in a file has
been written, executed and the output is verified successfully.
