
Big Data Fundamentals and Platforms

ASSIGNMENT

Question 3: Show a practical example of listing files, inserting data, retrieving data, and
shutting down HDFS.
Initially, you have to format the configured HDFS file system. Open the NameNode (HDFS
server) and execute the following command.

$ hadoop namenode -format
After formatting the HDFS, start the distributed file system. The following command will
start the NameNode as well as the DataNodes as a cluster.
$ start-dfs.sh 

Listing Files in HDFS


After loading the information into the server, we can list the files in a directory, or check
the status of a file, using ‘ls’. Given below is the syntax of ls; you can pass a
directory or a filename as an argument.
$ $HADOOP_HOME/bin/hadoop fs -ls <args>
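For example, assuming a directory named /user already exists on HDFS, the following would list its contents:
$ $HADOOP_HOME/bin/hadoop fs -ls /user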

Inserting Data into HDFS


Assume we have data in a file called file.txt on the local system that ought to be
saved in the HDFS file system. Follow the steps given below to insert the required file into
the Hadoop file system.
Step 1
You have to create an input directory.
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input 

Step 2
Transfer and store a data file from the local system to the Hadoop file system using the
put command.
$ $HADOOP_HOME/bin/hadoop fs -put /home/file.txt /user/input 

Step 3
You can verify the file using the ls command.
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input 

Retrieving Data from HDFS


Assume we have a file in HDFS called outfile. Given below is a simple demonstration
of retrieving the required file from the Hadoop file system.
Step 1
Initially, view the data from HDFS using the cat command.
$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/outfile

Step 2
Get the file from HDFS to the local file system using the get command.
$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/ 

Shutting Down the HDFS


You can shut down the HDFS by using the following command.
$ stop-dfs.sh

Q4. Building on the simple WordCount example done in class and in the Hadoop
tutorial, your task is to perform simple processing on the provided COVID-19
dataset.

The task is to count the total number of reported cases for every country/location
till April 8th, 2020 (NOTE: the data does contain case rows for Dec 2019; you
will have to filter that data).
COVID-19 analysis from the CSV file using MapReduce programming.
File: MapperClass.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class MapperClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

    // column indices in the CSV file and the field separator
    private final static int LOCATION = 1;
    private final static int NEW_CASES = 2;
    private final static String CSV_SEPARATOR = ",";

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {

        // convert the input line to a string
        String valueString = value.toString();

        // split the line on CSV_SEPARATOR
        String[] columnData = valueString.split(CSV_SEPARATOR);

        // emit (location, new_cases) for this row
        output.collect(new Text(columnData[LOCATION]), new IntWritable(Integer.parseInt(columnData[NEW_CASES])));
    }
}
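
A note on the date filter: the Mapper above emits every row it reads, while the task asks for counts only till April 8th, 2020 and flags the Dec 2019 rows. Below is a minimal sketch of how map() could restrict the rows; it assumes the date sits in the first CSV column (index 0) in yyyy-MM-dd form, which you should verify against the provided file before using it.

    // Sketch only: the DATE index and date format are assumptions, not taken from the assignment code.
    private final static int DATE = 0;
    private final static String START = "2020-01-01";   // drops the Dec 2019 rows mentioned in the NOTE
    private final static String CUTOFF = "2020-04-08";  // last date to include

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String[] columnData = value.toString().split(CSV_SEPARATOR);
        String date = columnData[DATE];

        // yyyy-MM-dd dates compare correctly as plain strings, so no date parsing is needed
        if (date.compareTo(START) < 0 || date.compareTo(CUTOFF) > 0) {
            return;   // row is outside the requested window; ignore it
        }
        output.collect(new Text(columnData[LOCATION]), new IntWritable(Integer.parseInt(columnData[NEW_CASES])));
    }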
 
File: ReducerClass.java
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class ReducerClass extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text t_key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        // the output key and a running total for that key
        Text key = t_key;
        int counter = 0;

        // iterate over all values that share the same key
        // and add up the reported new cases
        while (values.hasNext()) {
            IntWritable value = values.next();
            counter += value.get();
        }
        output.collect(key, new IntWritable(counter));
    }
}
 
File: MainClass.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.*;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class MainClass {

    public static void main(String[] args) {

        // create a new JobClient
        JobClient my_client = new JobClient();

        // create a configuration object for the job
        JobConf job_conf = new JobConf(MainClass.class);

        // set the name of the job
        job_conf.setJobName("MapReduceCSV");

        // specify the data types of the output key and value
        job_conf.setOutputKeyClass(Text.class);
        job_conf.setOutputValueClass(IntWritable.class);

        // specify the Mapper and Reducer classes
        job_conf.setMapperClass(MapperClass.class);
        job_conf.setReducerClass(ReducerClass.class);

        // specify the input and output formats
        job_conf.setInputFormat(TextInputFormat.class);
        job_conf.setOutputFormat(TextOutputFormat.class);

        // set the input and output directories from the command-line arguments:
        // args[0] = name of the input directory on HDFS, and
        // args[1] = name of the output directory to be created to store the output file
        FileInputFormat.setInputPaths(job_conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));

        my_client.setConf(job_conf);
        try {
            // run the job
            JobClient.runJob(job_conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
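
Because the reduce step is a plain summation, the same ReducerClass could also serve as a combiner to shrink the data shuffled between the map and reduce phases. This is an optional tweak, not part of the assignment code above; with the old mapred API it would be a single extra line in MainClass:

        // optional: pre-aggregate each mapper's output locally before the shuffle
        job_conf.setCombinerClass(ReducerClass.class);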
 
How to execute:
Step 1: Create a directory named classes/
$ mkdir classes

Step 2: Compile the source files using the following command.


$ javac -cp hadoop-common-2.2.0.jar:hadoop-mapreduce-client-core-2.7.1.jar:classes:. -d classes/ *.java

Step 3: Create a JAR file for the classes created above, which are stored in the classes/
folder.
$ jar -cvf CountMe.jar -C classes/ .
# do not forget to put a <space> dot at the end of the above command.

Output: 
added manifest
adding: MainClass.class(in = 1609) (out= 806)(deflated 49%)
adding: MapperClass.class(in = 1911) (out= 754)(deflated 60%)
adding: ReducerClass.class(in = 1561) (out= 628)(deflated 59%)

Step 4: Upload the CSV file to the Hadoop distributed file system.

$ hadoop fs -put covid.csv .
# do not forget to remove the header line from the provided covid file before uploading it to HDFS
# do not forget to put the <space> dot in the above command so the file goes to the home folder.

Step 5: Run the JAR file using the hadoop jar command.


$ hadoop jar CountMe.jar MainClass covid.csv output/
Step 6: Check whether the output folder has been populated and print the output on the
terminal.

$ hadoop fs -ls output/


$ hadoop fs -cat output/part-00000

Afghanistan     367
Albania 383
Algeria 1468
Andorra 545
Angola  17
Anguilla        3
Antigua and Barbuda     15
Argentina       1715
Armenia 853
Aruba   74
Australia       5956
Austria 12640
Azerbaijan      717
Bahamas 36
Bahrain 811
Bangladesh      164
Barbados        63
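
To spot-check a single country instead of printing the whole file, the output can also be piped through grep, for example:

$ hadoop fs -cat output/part-00000 | grep Australia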
