Big Data Fundamentals and Platforms Assignment 3
ASSIGNMENT
Question 3: Show a practical example of listing files, inserting data, retrieving data, and
shutting down HDFS.
Step 1
Initially, you have to format the configured HDFS file system. Open the NameNode (HDFS
server) and execute the following command.
$ hadoop namenode -format
After formatting the HDFS, start the distributed file system. The following command will
start the namenode as well as the data nodes as a cluster.
$ start-dfs.sh
Step 2
Transfer and store a data file from the local system to the Hadoop file system using the
put command.
$ $HADOOP_HOME/bin/hadoop fs -put /home/file.txt /user/input
Step 3
You can verify the file using the ls command.
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input
Step 4
Get the file from HDFS to the local file system using the get command.
$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/
Step 5
Finally, shut down HDFS using the following command. This stops the namenode and all data nodes.
$ stop-dfs.sh
Q4. Building on the simple WordCount example done in class and the Hadoop
tutorial, your task is to perform simple processing on the provided COVID-19
dataset.
The task is to count the total number of reported cases for every country/location
up to April 8th, 2020 (NOTE: the data also contains case rows for Dec 2019, which
you will have to filter out).
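Before wiring this into MapReduce, the per-row filter and per-country aggregation can be sketched as plain Java. This is only a local sketch: the column layout `date,location,new_cases` and the ISO date format are assumptions about the provided CSV, and the class name `CovidCount` is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CovidCount {
    // Sum reported cases per country from CSV rows, keeping only dates
    // from 2020-01-01 through 2020-04-08 (Dec 2019 rows are dropped).
    public static Map<String, Integer> totals(String[] rows) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (String row : rows) {
            String[] f = row.split(",");
            // skip the header row and malformed lines
            if (f.length < 3 || f[0].equals("date")) continue;
            String date = f[0];
            // ISO dates compare correctly as strings
            if (date.startsWith("2019") || date.compareTo("2020-04-08") > 0) continue;
            int cases = f[2].isEmpty() ? 0 : Integer.parseInt(f[2]);
            out.merge(f[1], cases, Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] rows = {
            "date,location,new_cases",
            "2019-12-31,Afghanistan,0",   // filtered out (Dec 2019)
            "2020-02-24,Afghanistan,1",
            "2020-04-08,Afghanistan,34",
            "2020-04-09,Afghanistan,50",  // filtered out (after cutoff)
            "2020-04-01,Albania,16"
        };
        System.out.println(totals(rows)); // {Afghanistan=35, Albania=16}
    }
}
```

The same filter-then-sum logic is what the mapper (filter, emit) and reducer (sum) split between them below.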
COVID-19 analysis from the CSV file using MapReduce programming.
File: MapperClass.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class MapperClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, OutputCollector
            <Text, IntWritable> output, Reporter reporter) throws IOException {
        // split the CSV row (columns assumed to be date,location,new_cases)
        String[] fields = value.toString().split(",");
        // skip the header row and any malformed lines
        if (fields.length < 3 || fields[0].equals("date")) return;
        // filter out the Dec 2019 rows; keep everything up to 2020-04-08
        String date = fields[0];
        if (date.startsWith("2019") || date.compareTo("2020-04-08") > 0) return;
        int cases;
        try {
            cases = fields[2].isEmpty() ? 0 : Integer.parseInt(fields[2].trim());
        } catch (NumberFormatException e) {
            return; // skip rows with a non-numeric case count
        }
        // emit (country, reported cases for that day)
        output.collect(new Text(fields[1]), new IntWritable(cases));
    }
}

File: ReducerClass.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class ReducerClass extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text t_key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // determine key object and counter variable
        Text key = t_key;
        int counter = 0;
        // as long as there are values for this key, keep adding
        // the daily case counts reported for the same country
        while (values.hasNext()) {
            counter += values.next().get();
        }
        output.collect(key, new IntWritable(counter));
    }
}

File: MainClass.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class MainClass {
    public static void main(String[] args) {
        JobClient my_client = new JobClient();
        // create a job configuration and register the mapper and reducer
        JobConf job_conf = new JobConf(MainClass.class);
        job_conf.setJobName("CovidCaseCount");
        job_conf.setOutputKeyClass(Text.class);
        job_conf.setOutputValueClass(IntWritable.class);
        job_conf.setMapperClass(MapperClass.class);
        job_conf.setReducerClass(ReducerClass.class);
        // set the input path for the file and define the output path
        FileInputFormat.setInputPaths(job_conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));
        my_client.setConf(job_conf);
        try {
            // Run the job
            JobClient.runJob(job_conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
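The summation inside the reduce step can be exercised locally by feeding it a plain Iterator, without any Hadoop types. This is a sketch for checking the logic only: `Integer` stands in for `IntWritable`, and the class name `ReduceCheck` is illustrative.

```java
import java.util.Arrays;
import java.util.Iterator;

public class ReduceCheck {
    // Mirrors the reduce loop: sum every value that shares one key.
    static int sum(Iterator<Integer> values) {
        int counter = 0;
        while (values.hasNext()) {
            counter += values.next();
        }
        return counter;
    }

    public static void main(String[] args) {
        // e.g. daily case counts shuffled to one country key
        Iterator<Integer> values = Arrays.asList(1, 34, 332).iterator();
        System.out.println(sum(values)); // 367
    }
}
```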
How to execute:
Step 1: Create a directory named classes/
$ mkdir classes
Step 2: Compile the Java files into the classes/ folder (the Hadoop client jars must be
on the classpath).
$ javac -classpath $(hadoop classpath) -d classes/ MapperClass.java ReducerClass.java MainClass.java
Step 3: Create a jar file from the classes stored in the classes/ folder.
$ jar -cvf CountMe.jar -C classes/ .
# do not forget to put a <space> and a dot at the end of the above command.
Output:
added manifest
adding: MainClass.class(in = 1609) (out= 806)(deflated 49%)
adding: MapperClass.class(in = 1911) (out= 754)(deflated 60%)
adding: ReducerClass.class(in = 1561) (out= 628)(deflated 59%)
Step 4: Submit the jar with the input dataset and an output directory in HDFS (the paths
below are examples), then read the result from the output directory.
$ hadoop jar CountMe.jar MainClass /user/input /user/covid_output
$ hadoop fs -cat /user/covid_output/part-00000
Output:
Afghanistan 367
Albania 383
Algeria 1468
Andorra 545
Angola 17
Anguilla 3
Antigua and Barbuda 15
Argentina 1715
Armenia 853
Aruba 74
Australia 5956
Austria 12640
Azerbaijan 717
Bahamas 36
Bahrain 811
Bangladesh 164
Barbados 63