Big Data Lab

JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY ANANTAPUR

B.Tech (CSE) – III-II                                L T P C
                                                     0 0 3 1.5

(19A05602P) BIG DATA ANALYTICS LABORATORY

Course Objectives:

This course is designed to:

1. Get familiar with Hadoop distributions, configuring Hadoop and performing file management
tasks
2. Experiment with MapReduce in Hadoop frameworks
3. Implement MapReduce programs in a variety of applications
4. Explore MapReduce support for debugging
5. Understand different approaches for building Hadoop MapReduce programs for real-time
applications

Experiments:

1. Install Apache Hadoop

2. Develop a MapReduce program to calculate the frequency of a given word in a given file.

3. Develop a MapReduce program to find the maximum temperature in each year.

4. Develop a MapReduce program to find the grades of students.

5. Develop a MapReduce program to implement Matrix Multiplication.

6. Develop a MapReduce program to find the maximum electrical consumption in each year, given the electrical
consumption for each month in each year.

7. Develop a MapReduce program to analyze a weather data set and print whether the day is sunny or cool.

8. Develop a MapReduce program to find the number of products sold in each country by considering
sales data containing fields like

Transaction_Date | Product | Price | Payment_Type | Name | City | State | Country | Account_Created | Last_Login | Latitude | Longitude

9. Develop a MapReduce program to find the tags associated with each movie by analyzing movie
lens data.



10. XYZ.com is an online music website where users listen to various tracks; the data collected is
given below.

The data comes in log files and looks as shown below.

UserId | TrackId | Shared | Radio | Skip

111115 | 222 | 0 | 1 | 0
111113 | 225 | 1 | 0 | 0
111117 | 223 | 0 | 1 | 1
111115 | 225 | 1 | 0 | 0

Write a MapReduce program to get the following


• Number of unique listeners
• Number of times the track was shared with others
• Number of times the track was listened to on the radio
• Number of times the track was listened to in total
• Number of times the track was skipped on the radio

11. Develop a MapReduce program to find the frequency of books published each year and find in which
year the maximum number of books were published, using the following data.

Title | Author | Published Year | Author Country | Language | No of pages

12. Develop a MapReduce program to analyze Titanic ship data and to find the average age of the people
(both male and female) who died in the tragedy, and how many persons survived in each class.

The Titanic data will be:

Column 1 : PassengerId        Column 2 : Survived (survived=0 & died=1)
Column 3 : Pclass             Column 4 : Name
Column 5 : Sex                Column 6 : Age
Column 7 : SibSp              Column 8 : Parch
Column 9 : Ticket             Column 10 : Fare
Column 11 : Cabin             Column 12 : Embarked

13. Develop a MapReduce program to analyze an Uber data set to find the days on which each
base has more trips, using the following dataset.

The Uber dataset consists of four columns; they are:

dispatching_base_number Date active_vehicles trips

14. Develop a program in Pig Latin to calculate the maximum recorded temperature year-wise for the weather
dataset.



15. Write queries to sort and aggregate the data in a table using HiveQL.

16. Develop a Java application to find the maximum temperature using Spark.
Text Books:

1. Tom White, "Hadoop: The Definitive Guide", Fourth Edition, O'Reilly Media, 2015.

Reference Books:

1. Glenn J. Myatt, Making Sense of Data, John Wiley & Sons, 2007.
2. Pete Warden, Big Data Glossary, O'Reilly, 2011.
3. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
4. Chris Eaton, Dirk DeRoos, Tom Deutsch, George Lapis, Paul Zikopoulos, Understanding Big Data:
Analytics for Enterprise Class Hadoop and Streaming Data, McGraw-Hill Publishing, 2012.
5. Anand Rajaraman and Jeffrey David Ullman, Mining of Massive Datasets, Cambridge
University Press, 2012.

Course Outcomes:

Upon completion of the course, the students should be able to:

1. Configure Hadoop and perform File Management Tasks (L2)


2. Apply MapReduce programs to real-time issues like word count, weather dataset and sales of
a company (L3)
3. Critically analyze huge data sets using Hadoop distributed file systems and MapReduce (L5)
4. Apply different data processing tools like Pig, Hive and Spark (L6)



Experiment: 1

Install Apache Hadoop

AIM:
To install a single-node Hadoop cluster backed by the Hadoop Distributed File System on Ubuntu
using VMware.

PRE REQUISITES:

• Download VMware Player 15.5.7
• Download Ubuntu 20.04.4 ISO file

Description:

1. Installing VMware
i. Double-click to launch the VMware-workstation-full-15 application.
ii. A security warning panel appears; click Run to continue.
iii. An initial screen will appear; wait for the process to complete.
iv. The VMware Workstation setup wizard opens; click Next.
v. Select "I accept the terms in the License Agreement" and click Next.
vi. Select the directory in which you would like to install the application. Also select the Enhanced
Keyboard Driver checkbox.
vii. Leave the default settings and click Next.
viii. Select both the options Desktop and Start Menu Programs Folder and click Next.
ix. Click Install to start the installation process.
x. Installation is in progress; wait for it to complete.

Output:



2. Install Ubuntu in VMware
i. Open VMware Workstation and click on "Create a New Virtual Machine".
ii. Choose "Installer disc image file (iso)" so the workstation can detect whether the ISO file is
appropriate.
iii. Fill in the full name, user name and password.
iv. Then click Next and give your virtual machine a relevant name.
v. Allocate the size of the hard disk.
vi. Run the virtual machine.
vii. Install Ubuntu 20.04 LTS Desktop.
viii. To begin the installation, click "Install Ubuntu".
ix. Choose the keyboard layout.
x. This will take a while to complete.

Output:

3. Run Ubuntu in VMware
i. Open VMware and run the Ubuntu 64-bit virtual machine
ii. Log on to the Ubuntu OS
iii. Open a terminal, or use the shortcut "Ctrl+Alt+T"

Output:

4. Check that Ubuntu is updated
$sudo apt update
Output:

5- Installing Java

$sudo apt install openjdk-8-jdk -y

#checking java version


$java -version
$javac -version

Output:



6- Installing SSH
$sudo apt install openssh-server openssh-client -y

Output:



7- Create and Setup SSH Certificates
$ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

$chmod 0600 ~/.ssh/authorized_keys

$ssh localhost

Output:



8- Downloading Hadoop

$wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz

Output:



9- Editing 6 important files
==========================
1st file
==========================

$sudo nano .bashrc


#Add below lines in this file

#Hadoop Related Options


export HADOOP_HOME=/home/bigdata/hadoop-3.3.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Now type :-
$source ~/.bashrc

==========================
2nd File
==========================
$sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

#Add below line in this file in the end

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

==========================
3rd File
============================
$sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml

#Add below lines in this file (between <configuration> and </configuration>)


<property>
<name>hadoop.tmp.dir</name>
<value>/home/bigdata/tmpdata</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system</description>
</property>
==========================
4th File
===========================
$sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml



#Add below lines in this file (between <configuration> and </configuration>)

<property>
<name>dfs.name.dir</name>
<value>/home/bigdata/dfsdata/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/bigdata/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

==========================
5th File
===========================

$sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

#Add below lines in this file (between <configuration> and </configuration>)

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
==========================
6th File
==========================
$sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

#Add below lines in this file (between <configuration> and </configuration>)

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>



<property>
<name>yarn.nodemanager.env-whitelist</name>

<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>

#to know java jdk installation location


$readlink -f $(which java)
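On Ubuntu with openjdk-8 this typically resolves to a path like the one below (illustrative); the directory before /jre/bin/java (or /bin/java) is the value to use for JAVA_HOME:

/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java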

#change the directory first and then:-


$cd ~/hadoop-3.3.1/sbin/
$ls -lrt

Output:

10- Launching Hadoop

$hdfs namenode -format

#start all services


$start-all.sh
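#optionally verify the daemons with jps (part of the JDK); the process IDs will differ
$jps

A healthy single-node setup normally lists NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager and Jps.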



#open browser and type
localhost:8088
Output:



Final Output:

RESULT:

The installation of a single-node Hadoop cluster backed by the Hadoop Distributed File System on
Ubuntu using VMware was successfully executed.



Experiment: 2
Develop a MapReduce program to calculate the frequency of a given word in a given file.

Aim:
To write a MapReduce program for counting the number of occurrences of each word in a text
file using MapReduce concepts.

PROCEDURE:
1. Make sure Hadoop and Java are installed and running

$hadoop version
$javac -version

2. Write below program and save as “WordCount.java” in Desktop

Program: WordCount.java

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

public static class TokenizerMapper


extends Mapper<Object, Text, Text, IntWritable>{

private final static IntWritable one = new IntWritable(1);


private Text word = new Text();

public void map(Object key, Text value, Context context


) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}

public static class IntSumReducer


extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();



public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}

public static void main(String[] args) throws Exception {


Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

3. Create folder for input data as filename “input_data”


4. Create a text file containing some data and name it "input.txt"
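For example (illustrative content only), input.txt could contain:

hello hadoop
hello mapreduce
hadoop is fast

For this input, the word counts printed in step 13 would be: fast 1, hadoop 2, hello 2, is 1, mapreduce 1.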

Output:



5. Create a new folder to store java classes and name as “bigdata_classes”

Output:

6. set HADOOP classpath environment variable


$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH



Output;

7. Create a directory inside HDFS for the input

$hadoop fs -mkdir /WordCount
$hadoop fs -mkdir /WordCount/Input

#Check in browser: localhost:50070 or localhost:9870

Output:



8. Upload the input file to that directory
$hadoop fs -put /home/bigdata/Desktop/input_data/input.txt /WordCount/Input

Output:

Check in browser



9. Change current directory to WordCount directory

$ cd /home/bigdata/Desktop/

10. Compile the java code


$javac -classpath ${HADOOP_CLASSPATH} -d /home/bigdata/Desktop/bigdata_classes /home/bigdata/Desktop/WordCount.java

Check compiled java files in folder name “bigdata_classes”

11. Put the compiled class files into a jar file

$jar -cvf first.jar -C bigdata_classes/ .

12. Now we run the jar file

$hadoop jar /home/bigdata/Desktop/first.jar WordCount /WordCount/Input /WordCount/Output

13. Check output

$hadoop fs -cat /WordCount/Output/*

Final Output:

Experiment: 3

Develop a MapReduce program to find the maximum temperature in each year.

Aim:
To find the maximum temperature per year from a sensor temperature data set, using the Hadoop
MapReduce framework.

Description:

1. Make sure you have installed and running hadoop

$start-all.sh

Output:

2. Write below program and save as “MaxTemp.java” in Desktop

Program: MaxTemp.java

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;



import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemp {

public static class MaxTempMapper extends Mapper<LongWritable , Text, Text, IntWritable>{

private final static IntWritable one = new IntWritable(1);


private Text word = new Text();

public void map(LongWritable key, Text value, Context context) throws IOException,
InterruptedException {

String line = value.toString();


String year = line.substring(15,19);
int temperature;
final int MISSING = 9999;   // NCDC uses 9999 for a missing reading

if (line.charAt(87)=='+')
temperature =Integer.parseInt(line.substring(88, 92));
else
temperature = Integer.parseInt(line.substring(87, 92));
String quality = line.substring(92, 93);
if(temperature != MISSING && quality.matches("[01459]"))
context.write(new Text(year),new IntWritable(temperature));
}
}

public static class MaxTempReducer


extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable maxTempResult = new IntWritable();

public void reduce(Text key, Iterable<IntWritable> values,


Context context
) throws IOException, InterruptedException {
int max_temp = Integer.MIN_VALUE;
for (IntWritable val : values) {
int temp = val.get();
if (temp > max_temp)
max_temp = temp;
}
maxTempResult.set(max_temp);
context.write(key, maxTempResult);
}
}

public static void main(String[] args) throws Exception {


Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Max Temp");
job.setJarByClass(MaxTemp.class);
job.setMapperClass(MaxTempMapper.class);
job.setCombinerClass(MaxTempReducer.class);
job.setReducerClass(MaxTempReducer.class);



job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Output:

3. Check folder for input data with name as “input_data”

Output:



4. Create a text file containing some NCDC data and name it "input1.txt"

Output:

5. Check folder to store java classes and with name as “bigdata_classes”


Output:



6. set HADOOP classpath environment variable
$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH

Output:

7. Create a directory inside HDFS for the input

$hadoop fs -mkdir /MaxTemp
$hadoop fs -mkdir /MaxTemp/Input1

#Check in browser: localhost:50070 or localhost:9870
Output:

8. Upload the input file to that directory

$hadoop fs -put /home/bigdata/Desktop/input_data/input1.txt /MaxTemp/Input1



Output:

9. Change current directory to “MaxTemp.java” directory


$ cd /home/bigdata/Desktop/

Output:

10. Compile the java code

$javac -classpath ${HADOOP_CLASSPATH} -d /home/bigdata/Desktop/bigdata_classes /home/bigdata/Desktop/MaxTemp.java

Check compiled java files in folder name “bigdata_classes”


Output:

11. Put the output file in jar file


$jar -cvf second.jar -C bigdata_classes/ .
Output:

12. Now we run the jar file

$hadoop jar /home/bigdata/Desktop/second.jar MaxTemp /MaxTemp/Input1 /MaxTemp/Output

# check output
$hadoop fs -cat /MaxTemp/Output/*

Final Output:



Experiment: 4

Develop a MapReduce program to find the grades of students.

Aim:
To find the grades of students using a MapReduce program.

Description:

1. Make sure you have installed and running hadoop

$start-all.sh

Output:

2. Write below program and save as “StudentGrade.java” in Desktop

Program: StudentGrade.java

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;



public class StudentGrade {

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

	private Text name = new Text();

	public void map(LongWritable key, Text value, Context context) throws IOException,
	InterruptedException
	{
		// each input line is assumed to be of the form: <student name>, <marks>
		String line = value.toString();
		String str[] = line.split(", ");
		name.set(str[0]);
		context.write(name, new IntWritable(Integer.parseInt(str[1].trim())));
	}
}

	public static class Reduce extends Reducer<Text, IntWritable, Text, Text> {

		public void reduce(Text key, Iterable<IntWritable> values, Context context)
				throws IOException, InterruptedException
		{
			int sum = 0;
			int count = 0;
			for (IntWritable val : values) {
				count += 1;
				sum += val.get();
			}
			int avg = sum / count;

			// assign a letter grade based on the average marks
			String grade;
			if (avg >= 80)
				grade = "A";
			else if (avg >= 60)
				grade = "B";
			else if (avg >= 40)
				grade = "C";
			else
				grade = "D";

			context.write(key, new Text(avg + " " + grade));
		}
}

public static void main(String[] args) throws Exception


{
Configuration conf = new Configuration();



Job job = new Job(conf, "StudentGrade");
job.setJarByClass(StudentGrade.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
Path out = new Path(args[1]);
out.getFileSystem(conf).delete(out);
job.waitForCompletion(true);
}
}

Output:

3. Check folder for input data with name as “input_data”

Output:



4. Create a text file containing student data and name it "input4.txt"
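A minimal illustrative input4.txt, matching the "name, marks" format assumed by the mapper above (one line per subject mark):

Anil, 85
Anil, 90
Bhavya, 65
Bhavya, 55
Charan, 35

For this data the job would emit each student with the average marks and a letter grade, for example Anil 87 A, Bhavya 60 B and Charan 35 D.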

Output:

5. Check folder to store java classes and with name as “bigdata_classes”


Output:



6. set HADOOP classpath environment variable
$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH

Output:

7. Create a directory inside HDFS for the input

$hadoop fs -mkdir /StudentGrade
$hadoop fs -mkdir /StudentGrade/Input4

#Check in browser: localhost:50070 or localhost:9870
Output:

8. Upload the input file to that directory

$hadoop fs -put /home/bigdata/Desktop/input_data/input4.txt /StudentGrade/Input4
Output:

9. Change current directory to “StudentGrade.java” directory


$ cd /home/bigdata/Desktop/

Output:



10. Compile the java code
$javac -classpath ${HADOOP_CLASSPATH} -d /home/bigdata/Desktop/bigdata_classes /home/bigdata/Desktop/StudentGrade.java

Check compiled java files in folder name “bigdata_classes”


Output:



11. Put the output file in jar file
$jar -cvf four.jar -C bigdata_classes/ .
Output:



12. Now we run the jar file
$hadoop jar /home/bigdata/Desktop/four.jar StudentGrade /StudentGrade/Input4 /StudentGrade/Output

# check output
$hadoop fs -cat /StudentGrade/Output/*

Final Output:



Experiment: 5

Develop a MapReduce program to implement Matrix Multiplication.

Aim:
To implement matrix multiplication using a MapReduce program.

Description:

1. Make sure you have installed and running hadoop

$start-all.sh

Output:

2. Write below program and save as “MatrixMul.java” in Desktop

Program: MatrixMul.java

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import java.io.IOException;



import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixMul {

public static class MatrixMapper extends Mapper<LongWritable, Text,


Text, Text> {
public void map(LongWritable key, Text value, Context
context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
int m = Integer.parseInt(conf.get("m"));
int p = Integer.parseInt(conf.get("p"));
String line = value.toString();
String[] indicesAndValue = line.split(",");
Text outputKey = new Text();
Text outputValue = new Text();
if (indicesAndValue[0].equals("A")) {
for (int k = 0; k < p; k++) {
outputKey.set(indicesAndValue[1] + "," + k);
outputValue.set("A," + indicesAndValue[2] + "," +
indicesAndValue[3]);
context.write(outputKey, outputValue);
}
} else {
for (int i = 0; i < m; i++) {
outputKey.set(i + "," + indicesAndValue[2]);
outputValue.set("B," + indicesAndValue[1] + "," +
indicesAndValue[3]);
context.write(outputKey, outputValue);
}
}
}
}

public static class MatrixReducer extends Reducer<Text, Text, Text, Text> {


public void reduce(Text key, Iterable<Text> values, Context
context) throws IOException, InterruptedException {
String[] value;
HashMap<Integer, Float> hashA = new HashMap<Integer,
Float>();
HashMap<Integer, Float> hashB = new HashMap<Integer,
Float>();
for (Text val : values) {
value = val.toString().split(",");
if (value[0].equals("A")) {



hashA.put(Integer.parseInt(value[1]),
Float.parseFloat(value[2]));
} else {
hashB.put(Integer.parseInt(value[1]),
Float.parseFloat(value[2]));
}
}
int n =Integer.parseInt(context.getConfiguration().get("n"));
float result = 0.0f;
float a_ij;
float b_jk;
for (int j = 0; j < n; j++) {
a_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
b_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
result += a_ij * b_jk;
}
if (result != 0.0f) {
context.write(null, new Text(key.toString() + "," +
Float.toString(result)));
}
}
}

public static void main(String[] args) throws Exception {


Configuration conf = new Configuration();
// A is an m-by-n matrix; B is an n-by-p matrix.
conf.set("m", "2");
conf.set("n", "5");
conf.set("p", "3");
Job job = new Job(conf, "MatrixMul");
job.setJarByClass(MatrixMul.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(MatrixMapper.class);
job.setReducerClass(MatrixReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}

Output:



3. Check folder for input data with name as “input_data”

Output:

4. Create a text file containing matrices A & B and name it "input5.txt"
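The mapper above expects one matrix element per line in the form MatrixName,row,column,value, with A of size m x n (2 x 5) and B of size n x p (5 x 3) to match the dimensions set in main(). An illustrative input5.txt (only a few elements shown; elements that are omitted are treated as 0):

A,0,0,1.0
A,0,1,2.0
A,1,4,3.0
B,0,0,4.0
B,1,2,5.0
B,4,1,6.0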



Output:

5. Check folder to store java classes and with name as “bigdata_classes”


Output:

6. set HADOOP classpath environment variable


$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH



Output:

7. Create a directory inside HDFS for the input

$hadoop fs -mkdir /MatrixMul
$hadoop fs -mkdir /MatrixMul/Input5

#Check in browser: localhost:50070 or localhost:9870


Output:



8. Upload the input file to that directory

$hadoop fs -put /home/bigdata/Desktop/input_data/input5.txt /MatrixMul/Input5


Output:



9. Change current directory to “MatrixMul.java” directory
$ cd /home/bigdata/Desktop/

Output:

10. Compile the java code


$javac -classpath ${HADOOP_CLASSPATH} -d /home/bigdata/Desktop/bigdata_classes /home/bigdata/Desktop/MatrixMul.java



Check compiled java files in folder name “bigdata_classes”
Output:

11. Put the output file in jar file


$jar -cvf five.jar -C bigdata_classes/ .
Output:



12. Now we run the jar file
$hadoop jar /home/bigdata/Desktop/five.jar MatrixMul /MatrixMul/Input5 /MatrixMul/Output

# check output
$hadoop fs -cat /MatrixMul/Output/*

Final Output:

Experiment: 6

Develop a MapReduce program to find the maximum electrical consumption in each year given
electrical consumption for each month in each year.

Aim:
To find the maximum electrical consumption in each year, given the electrical consumption for each month
in each year, using a MapReduce program.

Description:

1. Make sure you have installed and running hadoop

$start-all.sh

Output:

2. Write below program and save as “ProcessUnits.java” in Desktop

Program: ProcessUnits.java

import java.util.*;
import java.io.IOException;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;



import org.apache.hadoop.util.*;
public class ProcessUnits
{
public static class E_EMapper extends MapReduceBase implements
Mapper<LongWritable ,
Text,
Text,
IntWritable>
{

public void map(LongWritable key, Text value,


OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException
{
			String line = value.toString();
			StringTokenizer s = new StringTokenizer(line, "\t");
			String year = s.nextToken();

			// emit every reading for the year; the reducer keeps only the maximum
			while (s.hasMoreTokens())
			{
				int units = Integer.parseInt(s.nextToken());
				output.collect(new Text(year), new IntWritable(units));
			}
		}
}

public static class E_EReduce extends MapReduceBase implements


Reducer< Text, IntWritable, Text, IntWritable >
{

public void reduce( Text key, Iterator <IntWritable> values,


OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException
{
			int max = Integer.MIN_VALUE;

			while (values.hasNext())
			{
				int val = values.next().get();
				if (val > max)
					max = val;
			}

			// emit only the maximum consumption recorded for the year
			output.collect(key, new IntWritable(max));

}
}

public static void main(String args[])throws Exception


{



JobConf conf = new JobConf(ProcessUnits.class);

conf.setJobName("max_eletricityunits");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(E_EMapper.class);
conf.setCombinerClass(E_EReduce.class);
conf.setReducerClass(E_EReduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

FileInputFormat.setInputPaths(conf, new Path(args[0]));


FileOutputFormat.setOutputPath(conf, new Path(args[1]));

JobClient.runJob(conf);
}
}

Output:

3. Check folder for input data with name as “input_data”

Output:



4. Create a text file containing the electricity consumption data for every month in each year and name it "input6.txt"
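An illustrative input6.txt (each line holds the year followed by its monthly consumption figures, separated by tab characters; the mapper above emits every reading on the line and the reducer keeps the maximum per year):

1979	23	23	2	43	24	25	26	26	26	26	25	26
1980	26	27	28	28	28	30	31	31	31	30	30	30

For these two lines the final output would be 1979 43 and 1980 31.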

Output:

5. Check folder to store java classes and with name as “bigdata_classes”


Output:



6. set HADOOP classpath environment variable
$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH

Output:

7. Create a directory inside HDFS for the input

$hadoop fs -mkdir /ProcessUnits
$hadoop fs -mkdir /ProcessUnits/Input6

#Check in browser: localhost:50070 or localhost:9870
Output:

8. Upload the input file to that directory

$hadoop fs -put /home/bigdata/Desktop/input_data/input6.txt /ProcessUnits/Input6
Output:

9. Change current directory to “ProcessUnits.java” directory


$ cd /home/bigdata/Desktop/

Output:



10. Compile the java code
$javac -classpath ${HADOOP_CLASSPATH} -d /home/bigdata/Desktop/bigdata_classes /home/bigdata/Desktop/ProcessUnits.java

Check compiled java files in folder name “bigdata_classes”


Output:

11. Put the output file in jar file


$jar -cvf six.jar -C bigdata_classes/ .
Output:



12. Now we run the jar file
$hadoop jar /home/bigdata/Desktop/six.jar ProcessUnits /ProcessUnits/Input6 /ProcessUnits/Output

# check output
$hadoop fs -cat /ProcessUnits/Output/*

Final Output:

Experiment: 7

Develop a MapReduce program to analyze a weather data set and print whether each day is sunny or cool.

Aim:
To analyze a weather data set and print whether each day is sunny or cool, using a MapReduce program.

Description:

1. Make sure you have installed and running hadoop

$start-all.sh

Output:

2. Write below program and save as “MyMaxMin.java” in Desktop

Program: MyMaxMin.java

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;



import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class MyMaxMin {

public static class MaxTemperatureMapper extends MapReduceBase implements


Mapper<LongWritable, Text, Text, Text> {

@Override
public void map(LongWritable arg0, Text Value,
OutputCollector<Text, Text> output, Reporter arg3)
throws IOException {

String line = Value.toString();

			// Example of an input record (fixed-width, NCDC-style):
			// 25380 20130101 2.514 -135.69 58.43 8.3 1.1 4.7 4.9 5.6 0.01 ...
			// The date is at characters 7-14, the max temperature at 40-45 and the min temperature at 48-53.

float temp_Max = Float.parseFloat(line.substring(39, 45).trim());


float temp_Min = Float.parseFloat(line.substring(47, 53).trim());

if (temp_Max > 40.0) {


				// sunny day
				output.collect(new Text("Sunny Day " + date),
new Text(String.valueOf(temp_Max)));
}

if (temp_Min < 10) {


// Cool day
output.collect(new Text("Cool Day " + date),
new Text(String.valueOf(temp_Min)));
}
		}
	}

public static class MaxTemperatureReducer extends MapReduceBase implements


Reducer<Text, Text, Text, Text> {



@Override
public void reduce(Text Key, Iterator<Text> Values,
OutputCollector<Text, Text> output, Reporter arg3)
throws IOException {

// Find Max temp yourself ?


String temperature = Values.next().toString();
output.collect(Key, new Text(temperature));
		}
	}

public static void main(String[] args) throws Exception {

JobConf conf = new JobConf(MyMaxMin .class);


conf.setJobName("temp");

// Note:- As Mapper's output types are not default so we have to define


// the
// following properties.
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(Text.class);

conf.setMapperClass(MaxTemperatureMapper.class);
conf.setReducerClass(MaxTemperatureReducer.class);

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

FileInputFormat.setInputPaths(conf, new Path(args[0]));


FileOutputFormat.setOutputPath(conf, new Path(args[1]));

JobClient.runJob(conf);

}
}

Output:



3. Check folder for input data with name as “input_data”

Output:

4. Create a text file containing some NCDC data and name it "input7.txt"



Output:

5. Check folder to store java classes and with name as “bigdata_classes”


Output:

6. set HADOOP classpath environment variable


$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH



Output:

7. Create a directory inside HDFS for the input

$hadoop fs -mkdir /MyMaxMin
$hadoop fs -mkdir /MyMaxMin/Input7

#Check in browser: localhost:50070 or localhost:9870


Output:



8. Upload the input file to that directory

$hadoop fs -put /home/bigdata/Desktop/input_data/input7.txt /MyMaxMin/Input7


Output:



9. Change current directory to “MyMaxMin.java” directory
$ cd /home/bigdata/Desktop/

Output:

10. Compile the java code


$javac -classpath ${HADOOP_CLASSPATH} -d /home/bigdata/Desktop/bigdata_classes /home/bigdata/Desktop/MyMaxMin.java

Check compiled java files in folder name “bigdata_classes”


Output:



11. Put the output file in jar file
$jar -cvf seven.jar -C bigdata_classes/ .
Output:



12. Now we run the jar file
$hadoop jar /home/bigdata/Desktop/seven.jar MyMaxMin /MyMaxMin/Input7 /MyMaxMin/Output

# check output
$hadoop fs -cat /MyMaxMin/Output/*

Final Output:

Experiment: 8

Develop a MapReduce program to find the number of products sold in each country by considering sales data
containing fields like

Transaction_Date | Product | Price | Payment_Type | Name | City | State | Country | Account_Created | Last_Login | Latitude | Longitude

Aim:
To develop a MapReduce program to find the number of products sold in each country by considering sales
data containing fields like

Transaction_Date | Product | Price | Payment_Type | Name | City | State | Country | Account_Created | Last_Login | Latitude | Longitude

Description:

1. Make sure you have installed and running hadoop

$start-all.sh

Output:



2. Write below program and save as “SalesCountry.java” in Desktop

Program: SalesCountry.java

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class SalesCountry


{
public static class SalesMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text,
IntWritable> {
private final IntWritable one = new IntWritable(1);

public void map(LongWritable key, Text value, OutputCollector <Text, IntWritable> output,
Reporter reporter) throws IOException {

String valueString = value.toString();


String[] SingleCountryData = valueString.split(",");
output.collect(new Text(SingleCountryData[7]), one);
}
}
public static class SalesCountryReducer extends MapReduceBase implements Reducer<Text,
IntWritable, Text, IntWritable> {

public void reduce(Text t_key, Iterator<IntWritable> values,


OutputCollector<Text,IntWritable> output, Reporter reporter) throws IOException {
Text key = t_key;
int frequencyForCountry = 0;
while (values.hasNext()) {
// replace type of value with the actual type of our value
IntWritable value = (IntWritable) values.next();
frequencyForCountry += value.get();

}
output.collect(key, new IntWritable(frequencyForCountry));
}
}
public static void main(String[] args)
{
JobClient my_client = new JobClient();
// Create a configuration object for the job



JobConf job_conf = new JobConf(SalesCountry.class);

// Set a name of the Job


job_conf.setJobName("SalePerCountry");

// Specify data type of output key and value


job_conf.setOutputKeyClass(Text.class);
job_conf.setOutputValueClass(IntWritable.class);

// Specify names of Mapper and Reducer Class


job_conf.setMapperClass(SalesMapper.class);
job_conf.setReducerClass(SalesCountryReducer.class);

// Specify formats of the data type of Input and output


job_conf.setInputFormat(TextInputFormat.class);
job_conf.setOutputFormat(TextOutputFormat.class);

// Set input and output directories using command line arguments,


//arg[0] = name of input directory on HDFS, and arg[1] = name of output directory to be
created to store the output file.

FileInputFormat.setInputPaths(job_conf, new Path(args[0]));


FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));

my_client.setConf(job_conf);
try {
// Run the job
JobClient.runJob(job_conf);
} catch (Exception e) {
e.printStackTrace();
}
	}
}

Output:



3. Check folder for input data with name as “input_data”

Output:

4. Create a text file containing some sales data and name it "input8.txt"
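An illustrative comma-separated input8.txt, following the field order listed above (the mapper above counts the 8th field, the country):

01-02-2009,Product1,1200,Mastercard,Carolina,Basildon,England,United Kingdom,1/2/09 6:00,1/2/09 6:08,51.5,-1.12
01-02-2009,Product1,1200,Visa,Betina,Parkville,MO,United States,1/2/09 4:42,1/2/09 7:49,39.19,-94.68
01-02-2009,Product2,3600,Visa,Federica,Astoria,OR,United States,1/2/09 9:00,1/2/09 9:05,46.18,-123.83

For these three lines the job would output: United Kingdom 1 and United States 2.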

Output:
5. Check folder to store java classes and with name as “bigdata_classes”
Output:

6. set HADOOP classpath environment variable


$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH
Output:

7. Create a directory inside HDFS for the input

$hadoop fs -mkdir /SalesCountry
$hadoop fs -mkdir /SalesCountry/Input8

#Check in browser: localhost:50070 or localhost:9870


Output:



8. Upload the input file to that directory

$hadoop fs -put /home/bigdata/Desktop/input_data/input8.txt /SalesCountry/Input8


Output:

9. Change current directory to “SalesCountry.java” directory



$ cd /home/bigdata/Desktop/

Output:

10. Compile the java code


$javac -classpath ${HADOOP_CLASSPATH} -d /home/bigdata/Desktop/bigdata_classes /home/bigdata/Desktop/SalesCountry.java

Check compiled java files in folder name “bigdata_classes”


Output:



11. Put the output file in jar file
$jar -cvf eight.jar -C bigdata_classes/ .
Output:

12. Now we run the jar file

$hadoop jar /home/bigdata/Desktop/eight.jar SalesCountry /SalesCountry/Input8 /SalesCountry/Output

# check output
$hadoop fs -cat /SalesCountry/Output/*



Final Output:



Experiment: 9

Develop a MapReduce program to find the tags associated with each movie by analyzing movie lens
data.

Aim:
To develop a MapReduce program to find the tags associated with each movie by analyzing movie lens
data.

Description:

1. Make sure you have installed and running hadoop

$start-all.sh

Output:

2. Write below program and save as “MovieLens.java” in Desktop

Program: MovieLens.java

import java.io.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MovieLens {

public static class RatingMapper


extends Mapper<Object, Text, Text, DoubleWritable>{

private Text word = new Text();


private DoubleWritable rating=new DoubleWritable();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
String line = value.toString();
if(line.charAt(0)!='u')
{
String[] line_values = line.split(",");
word.set(line_values[1]);
rating.set(Double.parseDouble(line_values[2]));
context.write(word, rating);
}
}
}

public static class AverageReducer


extends Reducer<Text,DoubleWritable,Text,DoubleWritable> {
private DoubleWritable result = new DoubleWritable();

public void reduce(Text key, Iterable<DoubleWritable> values,


Context context
) throws IOException, InterruptedException {
double sum=0.0;
int count=0;
for (DoubleWritable val : values) {
sum += val.get();
count++;
}
result.set(sum/count);
context.write(key, result);
}
}

public static void main(String[] args) throws Exception {


Configuration conf = new Configuration();
		Job job = Job.getInstance(conf, "MovieLens");
		job.setJarByClass(MovieLens.class);
job.setMapperClass(RatingMapper.class);
job.setReducerClass(AverageReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));



FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Output:

3. Check folder for input data with name as “input_data”

Output:



4. Create a text file containing some MovieLens ratings data and name it "input9.txt"
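An illustrative input9.txt in the MovieLens ratings.csv layout assumed by the mapper above (the header line starting with 'u' is skipped; fields are userId,movieId,rating,timestamp):

userId,movieId,rating,timestamp
1,31,2.5,1260759144
2,31,3.0,835355493
1,1029,3.0,1260759179

For these lines the job would output the average rating per movieId: 31 2.75 and 1029 3.0.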

Output:

5. Check folder to store java classes and with name as “bigdata_classes”


Output:



6. set HADOOP classpath environment variable
$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH

Output:



7. Create a directory inside HDFS for the input
$hadoop fs -mkdir /MovieLens
$hadoop fs -mkdir /MovieLens/Input9

#Check in browser: localhost:50070 or localhost:9870


Output:

8. Upload the input file to that directory

$hadoop fs -put /home/bigdata/Desktop/input_data/input9.txt /MovieLens/Input9
Output:

9. Change current directory to “MovieLens.java” directory


$ cd /home/bigdata/Desktop/

Output:



10. Compile the java code
$javac -classpath ${HADOOP_CLASSPATH} -d /home/bigdata/Desktop/bigdata_classes /home/bigdata/Desktop/MovieLens.java

Check compiled java files in folder name “bigdata_classes”


Output:

11. Put the compiled class files into a jar file

$jar -cvf nine.jar -C bigdata_classes/ .
Output:

12. Now we run the jar file

$hadoop jar /home/bigdata/Desktop/nine.jar MovieLens /MovieLens/Input9 /MovieLens/Output

# check output
$hadoop fs -cat /MovieLens/Output/*

Final Output:



Experiment: 10

XYZ.com is an online music website where users listen to various tracks; the data collected is
given below.

The data comes in log files and looks as shown below.

UserId | TrackId | Shared | Radio | Skip

111115 | 222 | 0 | 1 | 0
111113 | 225 | 1 | 0 | 0
111117 | 223 | 0 | 1 | 1
111115 | 225 | 1 | 0 | 0

Write a MapReduce program to get the following


• Number of unique listeners
• Number of times the track was shared with others
• Number of times the track was listened to on the radio
• Number of times the track was listened to in total
• Number of times the track was skipped on the radio

Aim:
XYZ.com is an online music website where users listen to various tracks; the data collected is
given below.

The data comes in log files and looks as shown below.

UserId | TrackId | Shared | Radio | Skip

111115 | 222 | 0 | 1 | 0
111113 | 225 | 1 | 0 | 0
111117 | 223 | 0 | 1 | 1
111115 | 225 | 1 | 0 | 0

To Write a MapReduce program to get the following


• Number of unique listeners
• Number of times the track was shared with others
• Number of times the track was listened to on the radio
• Number of times the track was listened to in total
• Number of times the track was skipped on the radio

Description:

1. Make sure you have installed and running hadoop

$start-all.sh

Output:
2. Write below program and save as “MusicTrack.java” in Desktop

Program: MusicTrack.java

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
//import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class MusicTrack
{
public static class MusicMapper extends Mapper<Object,Text,Text,Text>
{
public void map(Object key,Text value,Context context) throws
IOException,InterruptedException
{
String[] tokens=value.toString().split("\\|");
String trackid = /*"1";*/tokens[1];
String others = tokens[0]+"\t"+tokens[2]+"\t"+tokens[3]+"\t"+tokens[4];
context.write(new Text(trackid),new Text(others));
}
}



public static class MusicReduceer extends Reducer<Text,Text,Text,Text>
{
public void reduce(Text Key,Iterable<Text> value,Context context) throws
IOException,InterruptedException
{

Set<Integer> userIdSet = new HashSet<Integer>();


int shared = 0;
int radio =0;
int skip= 0;
int listen=0;

			for(Text val:value)
			{
				String[] valTokens = val.toString().split("\t");

				int sh = Integer.parseInt(valTokens[1].trim());
				int ra = Integer.parseInt(valTokens[2].trim());
				int sk = Integer.parseInt(valTokens[3].trim());

				shared = shared + sh;
				radio = radio + ra;
				skip = skip + sk;

				int cus = Integer.parseInt(valTokens[0].trim());
				userIdSet.add(cus);
			}

			// emit the aggregated counts once per track
			listen = shared + radio;
			IntWritable size = new IntWritable(userIdSet.size());
			context.write(new Text(Key), new Text("customerId- " + size + "\t" + "Shared- " + shared
					+ "\t" + "Radio- " + radio + "\t" + "Skipped- " + skip + "\t" + "Listen- " + listen));
		}
	}
public static void main(String args[]) throws Exception
{
Configuration conf=new Configuration();
Job job=new Job(conf,"MusicTrack");
job.setNumReduceTasks(1);
job.setJarByClass(MusicTrack.class);
job.setMapperClass(MusicMapper.class);

job.setReducerClass(MusicReduceer.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);



Path outputpath= new Path(args[1]);
FileInputFormat.addInputPath(job,new Path(args[0]));
FileOutputFormat.setOutputPath(job,new Path(args[1]));
outputpath.getFileSystem(conf).delete(outputpath,true);
System.exit(job.waitForCompletion(true)?0:1);
}
}

Output:

3. Check folder for input data with name as “input_data”

Output:



4. Create a text file containing the music log data and name it "input10.txt"
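An illustrative input10.txt, one pipe-separated record per play with no header line and no spaces around the delimiters (UserId|TrackId|Shared|Radio|Skip), since the reducer above parses the numeric fields directly:

111115|222|0|1|0
111113|225|1|0|0
111117|223|0|1|1
111115|225|1|0|0

For these records, track 225, for example, would show 2 unique listeners, 2 shares, 0 radio plays and 0 skips.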

Output:

5. Check folder to store java classes and with name as “bigdata_classes”


Output:



6. set HADOOP classpath environment variable
$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH

Output:



7. Create a directory inside HDFS for the input
$hadoop fs -mkdir /MusicTrack
$hadoop fs -mkdir /MusicTrack/Input10

#Check in browser: localhost:50070 or localhost:9870


Output:

8. Upload the input file to that directory

$hadoop fs -put /home/bigdata/Desktop/input_data/input10.txt /MusicTrack/Input10
Output:

9. Change current directory to “MusicTrack.java” directory


$ cd /home/bigdata/Desktop/

Output:



10. Compile the java code
$javac -classpath ${HADOOP_CLASSPATH} -d /home/bigdata/Desktop/bigdata_classes /home/bigdata/Desktop/MusicTrack.java

Check compiled java files in folder name “bigdata_classes”


Output:

11. Put the compiled class files into a jar file

$jar -cvf ten.jar -C bigdata_classes/ .
Output:

12. Now we run the jar file

$hadoop jar /home/bigdata/Desktop/ten.jar MusicTrack /MusicTrack/Input10 /MusicTrack/Output

# check output
$hadoop fs -cat /MusicTrack/Output/*

Final Output:



Experiment: 11

Develop a MapReduce program to find the frequency of books published each year and find in which year
the maximum number of books were published, using the following data.

Title | Author | Published Year | Author Country | Language | No of pages

Aim:
To develop a MapReduce program to find the frequency of books published each year and find in which
year the maximum number of books were published, using the following data.

Title | Author | Published Year | Author Country | Language | No of pages

Description:

1. Make sure you have installed and running hadoop

$start-all.sh

Output:

2. Write below program and save as “BookMax.java” in Desktop



Program: BookMax.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BookMax {

public static class BookMaxMapper extends Mapper<LongWritable , Text, Text, IntWritable>{

		private final static IntWritable one = new IntWritable(1);

		private Text year = new Text();

		public void map(LongWritable key, Text value, Context context) throws IOException,
		InterruptedException {

			// each record is assumed to be tab-separated in the order:
			// Title, Author, Published Year, Author Country, Language, No of pages
			String line = value.toString();
			String[] fields = line.split("\t");

			if (fields.length >= 3) {
				// emit (published year, 1) so the reducer can count the books per year
				year.set(fields[2].trim());
				context.write(year, one);
			}
		}
}

public static class BookMaxReducer


extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable maxBookResult = new IntWritable();

public void reduce(Text key, Iterable<IntWritable> values,


Context context
) throws IOException, InterruptedException {
			// sum the counts to get how many books were published in this year;
			// the year with the largest count in the final output is the required answer
			int count = 0;
			for (IntWritable val : values) {
				count += val.get();
			}
			maxBookResult.set(count);
			context.write(key, maxBookResult);



}
}

public static void main(String[] args) throws Exception {


Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "BookMax");
job.setJarByClass(BookMax.class);
job.setMapperClass(BookMaxMapper.class);
job.setCombinerClass(BookMaxReducer.class);
job.setReducerClass(BookMaxReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
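
Note: the BookMax.java listing above reuses fixed-width record parsing in the style of the earlier weather examples and emits a per-year maximum value. For the "frequency of books published each year" part of the aim, a straightforward count per year is closer to the requirement. The following is only a minimal sketch (not part of the original listing; the class name and field positions are assumptions): it treats each input line as a comma-separated record with the published year in the third field, and the year with the largest count can then be read off the job output.

Program (sketch): BookCount.java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BookCount {

	public static class YearMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
		private final static IntWritable one = new IntWritable(1);

		public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			// assumed record layout: Title,Author,PublishedYear,AuthorCountry,Language,NoOfPages
			String[] fields = value.toString().split(",");
			if (fields.length >= 3)
				context.write(new Text(fields[2].trim()), one); // emit (year, 1)
		}
	}

	public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
		public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
			int count = 0;
			for (IntWritable val : values)
				count += val.get(); // number of books published in this year
			context.write(key, new IntWritable(count));
		}
	}

	public static void main(String[] args) throws Exception {
		Job job = Job.getInstance(new Configuration(), "BookCount");
		job.setJarByClass(BookCount.class);
		job.setMapperClass(YearMapper.class);
		job.setCombinerClass(CountReducer.class);
		job.setReducerClass(CountReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}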

Output:

3. Check folder for input data with name as “input_data”

Output:



4. Create a text file containing some book data and name it "input11.txt"

Output:

5. Check folder to store java classes and with name as “bigdata_classes”


Output:



6. set HADOOP classpath environment variable
$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH

Output:



7. create a directory inside hadoop for input
$hadoop fs -mkdir /BookMax
$hadoop fs -mkdir /BookMax/Input11

#Check in the browser: http://localhost:50070 or http://localhost:9870


Output:

8. Upload the input file to that directory



$hadoop fs -put '/home/bigdata/Desktop/input_data/input11.txt' /BookMax/Input11
Output:

9. Change the current directory to the folder containing "BookMax.java"


$ cd /home/bigdata/Desktop/

Output:



10. Compile the java code
$javac -classpath ${HADOOP_CLASSPATH} -d '/home/bigdata/Desktop/bigdata_classes' '/home/bigdata/Desktop/BookMax.java'

Check compiled java files in folder name “bigdata_classes”


Output:

11. Package the compiled class files into a jar file



$jar -cvf eleven.jar -C bigdata_classes/ .
Output:

12. Now we run the jar file


$hadoop jar '/home/bigdata/Desktop/eleven.jar' BookMax /BookMax/Input11 /BookMax/Output

# check output
$hadoop fs -cat /BookMax/Output/*

Final Output:



Experiment: 12

Develop a MapReduce program to analyze Titanic ship data and to find the average age of the people (both male and female) who died in the tragedy, and how many persons survived in each class.

The titanic data will be..


Column 1: PassengerId        Column 2: Survived (0 = died, 1 = survived)
Column 3: Pclass             Column 4: Name
Column 5: Sex                Column 6: Age
Column 7: SibSp              Column 8: Parch
Column 9: Ticket             Column 10: Fare
Column 11: Cabin             Column 12: Embarked

Aim:
To develop a MapReduce program to analyze Titanic ship data and to find the average age of the people (both male and female) who died in the tragedy, and how many persons survived in each class.

The titanic data will be..


Column 1: PassengerId        Column 2: Survived (0 = died, 1 = survived)
Column 3: Pclass             Column 4: Name
Column 5: Sex                Column 6: Age
Column 7: SibSp              Column 8: Parch
Column 9: Ticket             Column 10: Fare
Column 11: Cabin             Column 12: Embarked

Description:

1. Make sure Hadoop is installed and running

$start-all.sh

Output:



2. Write below program and save as “Average_age.java” in Desktop

Program: Average_age.java

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Average_age {

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

private Text gender = new Text();


private IntWritable age = new IntWritable();
public void map(LongWritable key, Text value, Context context ) throws IOException,
InterruptedException {
String line = value.toString();
String str[]=line.split(",");
		if(str.length>6){
			gender.set(str[4]);
			// Survived column: 0 = did not survive (died in the tragedy)
			if((str[1].equals("0")) ){
				// emit (gender, age) only when the age field is a valid number
				if(str[5].matches("\\d+")){
					int i=Integer.parseInt(str[5]);
					age.set(i);
					context.write(gender, age);
				}
			}
		}
	}
}

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

public void reduce(Text key, Iterable<IntWritable> values, Context context)


throws IOException, InterruptedException {
int sum = 0;
int l=0;
for (IntWritable val : values) {
l+=1;
sum += val.get();
}
sum=sum/l;
context.write(key, new IntWritable(sum));
}
}

public static void main(String[] args) throws Exception {


Configuration conf = new Configuration();

@SuppressWarnings("deprecation")
Job job = new Job(conf, "Averageage_survived");
job.setJarByClass(Average_age.class);

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// job.setNumReduceTasks(0);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.addInputPath(job, new Path(args[0]));


FileOutputFormat.setOutputPath(job, new Path(args[1]));
Path out=new Path(args[1]);
out.getFileSystem(conf).delete(out);
job.waitForCompletion(true);
}

}
Output:



3. Check folder for input data with name as “input_data”

Output:

4. Create a text file containing Titanic data and name it "input12.txt"
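
Because the mapper splits each record on commas (str[1] = Survived, str[4] = Sex, str[5] = Age), each line of input12.txt should follow the column order given above, with names that do not themselves contain commas; a few hypothetical sample rows:

1,0,3,Braund Mr. Owen Harris,male,22,1,0,A/5 21171,7.25,,S
2,1,1,Cumings Mrs. John Bradley,female,38,1,0,PC 17599,71.2833,C85,C
3,0,3,Heikkinen Miss Laina,female,26,0,0,STON/O2. 3101282,7.925,,S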



Output:

5. Check folder to store java classes and with name as “bigdata_classes”


Output:

6. set HADOOP classpath environment variable


$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH



Output:

7. create a directory inside hadoop for input


$hadoop fs -mkdir /Average_age
$hadoop fs -mkdir /Average_age/Input12

#Check in the browser: http://localhost:50070 or http://localhost:9870


Output:



8. Upload the input file to that directory

$hadoop fs -put '/home/bigdata/Desktop/input_data/input12.txt' /Average_age/Input12


Output:

9. Change the current directory to the folder containing "Average_age.java"


$ cd /home/bigdata/Desktop/



Output:

10. Compile the java code


$javac -classpath ${HADOOP_CLASSPATH} -d '/home/bigdata/Desktop/bigdata_classes' '/home/bigdata/Desktop/Average_age.java'

Check compiled java files in folder name “bigdata_classes”


Output:



11. Package the compiled class files into a jar file
$jar -cvf twelve.jar -C bigdata_classes/ .
Output:

12. Now we run the jar file


$hadoop jar '/home/bigdata/Desktop/twelve.jar' Average_age /Average_age/Input12 /Average_age/Output

# check output
$hadoop fs -cat /Average_age/Output/*

Final Output:

Experiment: 13

Develop a MapReduce program to analyze the Uber data set to find the days on which each basement has more trips using the following dataset.

The Uber dataset consists of four columns they are

dispatching_base_number Date active_vehicles trips

Aim:
To develop a MapReduce program to analyze the Uber data set to find the days on which each basement has more trips using the following dataset.

The Uber dataset consists of four columns they are

dispatching_base_number Date active_vehicles trips

Description:

1. Make sure Hadoop is installed and running

$start-all.sh

Output:

2. Write below program and save as “UberTrack.java” in Desktop



Program: UberTrack.java

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class UberTrack extends Configured implements Tool {

	public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

		private Text basement = new Text();
		private Calendar calendar = Calendar.getInstance();
		private Date date = null;
		// the Date column is assumed to be in month/day/year form, e.g. 01/01/2015
		private SimpleDateFormat format = new SimpleDateFormat("MM/dd/yyyy");
		private String[] days = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };

		public void map(Object key, Text record, Context context) throws IOException, InterruptedException {
			String[] parts = record.toString().split(",");
			basement.set(parts[0]);

			try {
				date = format.parse(parts[1]);
				calendar.setTime(date);
			} catch (ParseException e) {
				e.printStackTrace();
				return; // skip records (e.g. a header line) whose date cannot be parsed
			}

			int trips = Integer.parseInt(parts[3]);

			// key is "<basement> <day-of-week>", e.g. "B02512 Mon"
			String keys = basement.toString() + " " + days[calendar.get(Calendar.DAY_OF_WEEK) - 1];
			context.write(new Text(keys), new IntWritable(trips));
		}
	}

	public static class Sum_reducer extends Reducer<Text, IntWritable, Text, IntWritable> {

		private IntWritable result = new IntWritable();

		public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
			int sum = 0;
			for (IntWritable val : values) {
				sum += val.get();
			}
			result.set(sum);
			context.write(key, result);
		}
	}

	public static void main(String[] args) throws Exception {
		int exitcode = ToolRunner.run(new UberTrack(), args);
		System.exit(exitcode);
	}

	public int run(String[] args) throws Exception {
		Job job = Job.getInstance(getConf(), "Uber trip tracking to find the days with more trips for each basement");
		job.setJarByClass(getClass());

		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		job.setMapperClass(TokenizerMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);

		job.setCombinerClass(Sum_reducer.class);
		job.setReducerClass(Sum_reducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(TextOutputFormat.class);

		return job.waitForCompletion(true) ? 0 : 1;
	}
}

Output:

3. Check folder for input data with name as “input_data”

Output:



4. Create a text file containing Uber trip data and name it "input13.txt"
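
The mapper expects each record to be comma-separated in the order dispatching_base_number, date (MM/dd/yyyy), active_vehicles, trips; a few hypothetical sample rows:

B02512,01/01/2015,190,1132
B02765,01/01/2015,225,1765
B02512,01/02/2015,205,1401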

Output:

5. Check folder to store java classes and with name as “bigdata_classes”


Output:



6. set HADOOP classpath environment variable
$export HADOOP_CLASSPATH=$(hadoop classpath)
$echo $HADOOP_CLASSPATH

Output:

7. create a directory inside hadoop for input


$hadoop fs -mkdir /UberTrack
$hadoop fs -mkdir /UberTrack/Input13



#Check in the browser: http://localhost:50070 or http://localhost:9870
Output:

8. Upload the input file to that directory

$hadoop fs -put '/home/bigdata/Desktop/input_data/input13.txt' /UberTrack/Input13



Output:

9. Change the current directory to the folder containing "UberTrack.java"


$ cd /home/bigdata/Desktop/

Output:

10. Compile the java code


$javac -classpath ${HADOOP_CLASSPATH} -d '/home/bigdata/Desktop/bigdata_classes' '/home/bigdata/Desktop/UberTrack.java'

Check compiled java files in folder name “bigdata_classes”


Output:

11. Package the compiled class files into a jar file

$jar -cvf thirteen.jar -C bigdata_classes/ .
Output:



12. Now we run the jar file
$hadoop jar '/home/bigdata/Desktop/thirteen.jar' UberTrack /UberTrack/Input13 /UberTrack/Output

# check output
$hadoop fs -cat /UberTrack/Output/*

Final Output:



Experiment: 14

Develop a program to calculate the maximum recorded temperature year-wise for the weather dataset in Pig Latin

Aim:
To develop a program to calculate the maximum recorded temperature year-wise for the weather dataset in Pig Latin

Description:

1. Install java
2. Install hadoop
3. Run hadoop
$start-all.sh

Output:

4. Download pig from apache

$wget http://www.apache.org/dist/pig/pig-0.16.0/pig-0.16.0.tar.gz

5. Unzip Pig

$tar xvzf '/home/bigdata/pig-0.16.0.tar.gz'

Output:

6. Change directory to the pig-0.16.0 folder

$cd /home/bigdata/pig-0.16.0

7. Open the .bashrc file

$nano ~/.bashrc
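
Add lines along the following at the end of the .bashrc file (a minimal sketch, assuming Pig was extracted to /home/bigdata/pig-0.16.0):

export PIG_HOME=/home/bigdata/pig-0.16.0
export PATH=$PIG_HOME/bin:$PATH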



8. Update the .bashrc file
$source ~/.bashrc



10. Set JAVA_HOME

$export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

OUTPUT:

11. Run pig

$pig



12. Move weather data set input to hdfs root directory

$hdfs dfs -copyFromLocal /home/bigdata/Desktop/input_data/input1.txt hdfs:/
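
The LOAD statement in the next step reads tab-separated fields (year, temperature, quality), so each line of input1.txt is expected to look like the following (hypothetical sample records, consistent with the final output shown below):

1949	111	1
1949	78	1
1950	22	1
1950	0	1
1950	-11	1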



13. Execute the below lines of code in grunt shell

grunt> records = LOAD '/input1.txt' AS (year:chararray, temperature:int, quality:int);
grunt> filtered_records = FILTER records BY temperature != 9999 AND (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
grunt> grouped_records = GROUP filtered_records BY year;
grunt> max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
grunt> DUMP max_temp;

Final output:

1949 111
1950 22



Experiment: 15

Write queries to sort and aggregate the data in a table using HiveQL.

Aim:
To write queries to sort and aggregate the data in a table using HiveQL.

Description:

1. Install java
2. Install hadoop
3. Run hadoop
$start-all.sh

Output:

4. Download hive from apache

$wget "https://downloads.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz"



5. Unzip hive

$tar xvzf '/home/bigdata/apache-hive-3.1.2-bin.tar.gz'


Output:



6. Change directory to the apache-hive-3.1.2-bin folder

$cd /home/bigdata/apache-hive-3.1.2-bin


7. Open the .bashrc file

$nano ~/.bashrc
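
Add lines along the following at the end of the .bashrc file (a minimal sketch, assuming Hive was extracted to /home/bigdata/apache-hive-3.1.2-bin):

export HIVE_HOME=/home/bigdata/apache-hive-3.1.2-bin
export PATH=$HIVE_HOME/bin:$PATH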



8. Update the .bashrc file
$source ~/.bashrc

9. Set JAVA_HOME

$export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

OUTPUT:



10. Run hive

$hive



11. Create and Select the database in which we want to create a table.
hive> create database svcn;
hive> use svcn;

12. Now, create a table by using the following command

hive> create table emp (Id int, Name string, Salary float, Department string)
row format delimited
fields terminated by ',';

13. Load the data into the table.

hive> load data local inpath '/home/bigdata/hive/emp_data' into table emp;
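
The table above uses ',' as the field terminator, so each line of /home/bigdata/hive/emp_data is expected to be a comma-separated employee record, for example (hypothetical rows):

1,Ravi,45000.0,CSE
2,Priya,52000.0,ECE
3,Arun,38000.0,CSE
4,Deepa,61000.0,CSE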

Output:

14. Now, fetch the data in the descending order (sorting) by using the following command.

hive> select * from emp sort by salary desc;



15. Now, fetch the sum of employee salaries department-wise (aggregate) by using the following command
hive> select department, sum(salary) from emp group by department;



Experiment: 16

Develop a Java application to find the maximum temperature using Spark.

Aim:
To Develop a Java application to find the maximum temperature using Spark.

Description:

1. Install java
2. Install hadoop
3. Run hadoop
$start-all.sh

Output:

4. Download spark from apache

$wget "https://www.apache.org/dyn/closer.lua/spark/spark-2.4.1/spark-2.4.1-bin-hadoop2.7.tgz"

5. Unzip spark

$tar xvzf '/home/bigdata/spark-2.4.1-bin-hadoop2.7.tgz'

6. Change directory to the spark-2.4.1-bin-hadoop2.7 folder

$cd /home/bigdata/spark-2.4.1-bin-hadoop2.7

7. Open the .bashrc file

$nano ~/.bashrc
export SPARK_HOME=/home/bigdata/spark-2.4.1-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH

8. Update the .bashrc file

$source ~/.bashrc

9. Set JAVA_HOME

$export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

10. Run the spark shell

$spark-shell

11. Create a text file "input11.txt" on your local machine and write some weather data into it.

12. Write the below Scala program to find the maximum temperature

import org.apache.spark.SparkContext._



import org.apache.spark.{SparkConf, SparkContext}
object MaxTemperature {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Max Temperature").setMaster("local")
val sc = new SparkContext(conf)
val lines = sc.textFile("input11.txt")
val records = lines.map(_.split("\t"))
val filtered = records.filter(rec => (rec(1) != "9999"
&& rec(2).matches("[01459]")))
val tuples = filtered.map(rec => (rec(0).toInt, rec(1).toInt))
val maxTemps = tuples.reduceByKey((a, b) => Math.max(a, b))
maxTemps.foreach(println(_))
}
}
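
Since the spark-shell already provides a SparkContext bound to sc, one way to try the same logic interactively (a sketch; the input file path is an assumption) is to paste the body of main using sc directly instead of creating a new context:

scala> val lines = sc.textFile("/home/bigdata/Desktop/input_data/input11.txt")
scala> val records = lines.map(_.split("\t"))
scala> val filtered = records.filter(rec => rec(1) != "9999" && rec(2).matches("[01459]"))
scala> filtered.map(rec => (rec(0).toInt, rec(1).toInt)).reduceByKey((a, b) => Math.max(a, b)).foreach(println(_))

Alternatively, the standalone MaxTemperature object can be compiled and submitted as an application with spark-submit.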

13. Final output

Scala> 1949 111

1950 22

