21CP059 DAV Lab Manual
BIRLA VISHVAKARMA
MAHAVIDYALAYA
(An Autonomous Institution)
Computer Department
Course Code: 4CP02
Course Name: Data Analysis and
Visualization
Faculty: Divyang sir
Name: Devarshi Chinivar
ID: 21CP059
21CP059 4CP02 - DAV
Practical List
No. Date Practical Sign
11. Write a java program to insert, update and delete records from HBase.
Practical - 1 :
Configure Hadoop cluster in distributed mode.
To download Cloudera, visit https://community.cloudera.com/t5/Support-Questions/Cloudera-QuickStart-VM-Download/
Open Oracle VirtualBox, go to the File menu, select Import Appliance, and give the
location of the downloaded Cloudera files.
After the import completes, start the Cloudera virtual machine in VirtualBox.
Practical - 2 :
Example Output:
Example Output :
Example Output :
(No output if successful)
4. cd (Change Directory)
• Changes the current directory to the specified path.
Command:
• cd new_directory
Example Output :
(No output if successful)
Example Output :
(No output if successful)
Example Output :
(The file opens in the respective text editor; no command-line output)
Example Output :
Command:
• more file.txt
Example Output :
Example Output :
Example Output :
(No output if successful)
Example Output :
(No output if successful)
Command:
• rm file.txt
Example Output :
(No output if successful; will prompt for confirmation if using -i option)
Example Output :
(No output if successful; will error if the directory is not empty)
Example Output :
Command:
• hdfs dfs -mkdir /path/to/new_directory
Example Output :
Example Output :
Command:
• hdfs dfs -get /path/to/hdfs_file.txt local_directory/
Example Output :
Example Output :
Command:
• hdfs dfs -mv /path/to/old_location /path/to/new_location
Example Output :
Example Output :
Command:
• hdfs dfs -expunge
Example Output :
• (No output if successful)
Practical - 3 :
Write MapReduce code to count the frequency of words.
Steps:
First open Eclipse -> select File -> New -> Java Project -> name
it WordCount -> click Finish.
Create three Java classes in the project. Name them WCDriver (which has the main
function), WCMapper, and WCReducer.
In the above figure, you can see the Add External JARs option on the right-hand
side. Click on it and add the files mentioned below. You can find these files
in /usr/lib/
1. /usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.6.0-mr1-cdh5.13.0.jar
2. /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.13.0.jar
Mapper Code:
// Importing libraries
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
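public class WCMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    // Map function
    public void map(LongWritable key, Text value, OutputCollector<Text,
            IntWritable> output, Reporter rep) throws IOException
    {
        // Assumed body (the listing is cut off here):
        // emit (word, 1) for every non-empty token of the line
        for (String word : value.toString().split(" ")) {
            if (word.length() > 0) {
                output.collect(new Text(word), new IntWritable(1));
            }
        }
    }
}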
Reducer Code:
// Importing libraries
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
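public class WCReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    // Reduce function
    public void reduce(Text key, Iterator<IntWritable> value,
            OutputCollector<Text, IntWritable> output,
            Reporter rep) throws IOException
    {
        int count = 0;
        // Assumed continuation (the listing is cut off here):
        // add up the 1s emitted for this word
        while (value.hasNext()) {
            count += value.next().get();
        }
        output.collect(key, new IntWritable(count));
    }
}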
Driver Code:
// Importing libraries
import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
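public class WCDriver extends Configured implements Tool {
    // Assumed run() method (missing from the listing):
    // the usual JobConf setup for a word-count job
    public int run(String args[]) throws IOException
    {
        if (args.length < 2) {
            System.out.println("Please give valid inputs");
            return -1;
        }
        JobConf conf = new JobConf(WCDriver.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(WCMapper.class);
        conf.setReducerClass(WCReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        JobClient.runJob(conf);
        return 0;
    }
    // Main Method
    public static void main(String args[]) throws Exception
    {
        int exitCode = ToolRunner.run(new WCDriver(), args);
        System.out.println(exitCode);
    }
}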
_________________________________________________________________
Now you have to make a jar file. Right-click on the project -> click Export ->
select JAR file as the export destination -> name the jar file (WordCount.jar) ->
click Next -> finally click Finish. Now copy this file into the workspace
directory of Cloudera.
Open the terminal on CDH and change the directory to the workspace. You can do
this with the "cd workspace/" command. Now create a text file (WCFile.txt) and
move it to HDFS. To do that, open the terminal and enter the commands below (remember
that you should be in the same directory as the jar file you have just created).
Now, run this command to copy the input file into HDFS.
After executing the code, you can see the result in the WCOutput file or by running
the following command in the terminal.
Input:
Hello I am DevarshiChinivar
Hello I am a Student
Output:
DevarshiChinivar 1
Hello 2
I 2
Student 1
a 1
am 2
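The map, shuffle and reduce steps of this practical can be traced without a cluster. The following is a minimal plain-Java sketch, not the Hadoop job itself: the class name WordCountSketch and the method countWords are hypothetical, and a TreeMap stands in for the framework's grouping and sorting of keys. It runs the same logic on the sample input above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Simulates map -> shuffle -> reduce in memory (no Hadoop involved).
    static Map<String, Integer> countWords(List<String> lines) {
        // Map + shuffle: emit a 1 for every non-empty token and group the
        // emitted values by word. TreeMap keeps the keys in sorted order.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) {
                    grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
                }
            }
        }

        // Reduce: sum the grouped values for each word.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int count = 0;
            for (int one : entry.getValue()) {
                count += one;
            }
            counts.put(entry.getKey(), count);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Hello I am DevarshiChinivar",
                "Hello I am a Student");
        // Print one "word<TAB>count" line per distinct word.
        countWords(input).forEach((word, count) ->
                System.out.println(word + "\t" + count));
    }
}
```

Keeping the grouping step separate from the summing step mirrors how WCMapper only emits pairs and WCReducer only adds them up.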
Practical - 4 :
Develop a MapReduce program to analyze a weather
data set and print whether the day is shiny or cool.
Steps:
An example of our dataset, in which columns 6 and 7 show the maximum and
minimum temperature, respectively.
First open Eclipse -> select File -> New -> Java Project -> name
it MyProject -> select Use an execution environment ->
choose JavaSE-1.8 -> click Next -> Finish.
In this project, create a Java class named MyMaxMin -> then click Finish.
MyMaxMin.java:
// importing Libraries
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
// Mapper
/**
* @method map
* This method takes the input as a text data type.
* Skipping the first five tokens, the
* 6th token is taken as temp_max and the
* 7th token is taken as temp_min. Values with
* temp_max > 30 and values with temp_min < 15
* are passed to the reducer.
*/
@Override
public void map(LongWritable arg0, Text Value, Context context)
throws IOException, InterruptedException {
// if maximum temperature is
// greater than 30, it is a hot day
if (temp_Max > 30.0) {
// Hot day
context.write(new Text("The Day is Hot Day :" + date),
new Text(String.valueOf(temp_Max)));
}
// if minimum temperature is
// less than 15, it is a cold day
if (temp_Min < 15.0) {
// Cold day
context.write(new Text("The Day is Cold Day :" + date),
new Text(String.valueOf(temp_Min)));
}
}
}
// Reducer
/**
* @method reduce
* This method takes the input as key and
* list of values pair from the mapper,
* it does aggregation based on keys and
* produces the final context.
*/
/**
* @method main
* This method is used for setting
* all the configuration properties.
* It acts as a driver for map-reduce
* code.
*/
}
}
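The classification rule described for the mapper can be tried out without Hadoop. The following is a minimal plain-Java sketch under stated assumptions: the class name HotColdSketch, the method classify and the sample records are hypothetical, and the record layout (whitespace-separated columns with the date in column 2) is assumed for illustration. Only the thresholds and the output labels come from the listing above.

```java
import java.util.ArrayList;
import java.util.List;

public class HotColdSketch {
    // Thresholds taken from the mapper's description above.
    static final double HOT_ABOVE = 30.0;
    static final double COLD_BELOW = 15.0;

    /**
     * Classifies one record. Assumed layout (hypothetical): whitespace-separated
     * columns with the date in column 2, temp_max in column 6 and temp_min in
     * column 7. Returns the "key<TAB>value" lines the mapper would emit.
     */
    static List<String> classify(String record) {
        String[] cols = record.trim().split("\\s+");
        String date = cols[1];
        double tempMax = Double.parseDouble(cols[5]);
        double tempMin = Double.parseDouble(cols[6]);

        List<String> emitted = new ArrayList<>();
        if (tempMax > HOT_ABOVE) {
            emitted.add("The Day is Hot Day :" + date + "\t" + tempMax);
        }
        if (tempMin < COLD_BELOW) {
            emitted.add("The Day is Cold Day :" + date + "\t" + tempMin);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Made-up sample records, not taken from the real data set.
        String[] sample = {
            "S1 20200101 0 0 0 22.5 4.0",
            "S1 20200615 0 0 0 36.5 21.0"
        };
        for (String record : sample) {
            for (String line : classify(record)) {
                System.out.println(line);
            }
        }
    }
}
```

A record can produce zero, one or two lines, because the hot and cold checks are independent of each other.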
Now we need to add external jars for the packages that we have imported. Download
the Hadoop Common and Hadoop MapReduce Core jar packages that match
your Hadoop version.
You can check your Hadoop version with:
hadoop version
Now we add these external jars to MyProject. Right-click on MyProject ->
select Build Path -> click Configure Build Path -> select Add
External JARs... and add the jars from their download location -> then click Apply
and Close.
Now export the project as a jar file. Right-click on MyProject, choose Export..., and
go to Java -> JAR file -> click Next, choose your export destination, then
click Next.
Choose MyMaxMin as the Main Class by clicking Browse, and then click
Finish -> OK.
start-dfs.sh
start-yarn.sh
Syntax:
Now run your jar file with the command below; it writes the output
to the MyOutput directory.
Syntax:
Command:
Now go to localhost:50070/, select Browse the file system under Utilities, and
download part-r-00000 from the /MyOutput directory to see the result.
In the above image, you can see the top 10 results showing the cold days. The
second column is the day in yyyymmdd format. For example, 20200101 means
year = 2020
month = 01
day = 01
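The date breakdown above can also be done in code. The following is a minimal sketch using the standard java.time API, whose built-in BASIC_ISO_DATE formatter reads the yyyymmdd form; the class name DateKeySketch and the method parse are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateKeySketch {

    // Parses a compact yyyymmdd string such as "20200101" into a LocalDate.
    static LocalDate parse(String yyyymmdd) {
        return LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        LocalDate day = parse("20200101");
        // Zero-pad month and day so they match the two-digit form used above.
        System.out.println("year = " + day.getYear());
        System.out.println(String.format("month = %02d", day.getMonthValue()));
        System.out.println(String.format("day = %02d", day.getDayOfMonth()));
    }
}
```

Parsing into a LocalDate also rejects impossible values such as month 13, which a plain substring split would let through.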