Big Data Lab
B.Tech (CSE) – III-II L T P C
0 0 3 1.5
Course Objectives:
1. Get familiar with Hadoop distributions, configuring Hadoop and performing file management tasks
2. Experiment with MapReduce in the Hadoop framework
3. Implement MapReduce programs for a variety of applications
4. Explore MapReduce support for debugging
5. Understand different approaches for building Hadoop MapReduce programs for real-time applications
Experiments:
2. Develop a MapReduce program to calculate the frequency of a given word in a given file.
6. Develop a MapReduce program to find the maximum electrical consumption in each year, given the electrical consumption for each month of each year.
7. Develop a MapReduce program to analyze a weather data set and print whether the day is sunny or a cool day.
8. Develop a MapReduce program to find the number of products sold in each country by considering sales data containing fields like
9. Develop a MapReduce program to find the tags associated with each movie by analyzing MovieLens data.
The data is coming in log files and looks like as shown below.
111115 | 222 | 0 | 1 | 0
111113 | 225 | 1 | 0 | 0
111117 | 223 | 0 | 1 | 1
111115 | 225 | 1 | 0 | 0
11. Develop a MapReduce program to find the frequency of books published each year and find in which year the maximum number of books were published, using the following data.
12. Develop a MapReduce program to analyze Titanic ship data and to find the average age of the people (both male and female) who died in the tragedy. Also find how many persons survived in each class.
13. Develop a MapReduce program to analyze the Uber data set to find the days on which each basement has more trips, using the following dataset.
14. Develop a program to calculate the maximum recorded temperature year-wise for the weather dataset in Pig Latin.
16. Develop a Java application to find the maximum temperature using Spark.
Text Books:
1. Tom White, “Hadoop: The Definitive Guide”, Fourth Edition, O’Reilly Media, 2015.
Reference Books:
1. Glenn J. Myatt, Making Sense of Data, John Wiley & Sons, 2007; Pete Warden, Big Data Glossary, O’Reilly, 2011.
2. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer, 2007.
3. Chris Eaton, Dirk DeRoos, Tom Deutsch, George Lapis, Paul Zikopoulos, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, McGraw-Hill Publishing, 2012.
4. Anand Rajaraman and Jeffrey David Ullman, Mining of Massive Datasets, Cambridge University Press, 2012.
Course Outcomes:
AIM:
To install a single-node Hadoop cluster backed by the Hadoop Distributed File System on Ubuntu
using VMware.
PREREQUISITES:
Description:
1. Installing VMware
i. Double-click to launch the VMware-workstation-full-15 application.
ii. A security warning panel appears; click Run to continue.
iii. The initial screen will appear; wait for the process to complete.
iv. The VMware Workstation setup wizard opens; click Next.
v. Select "I accept the terms in the License Agreement" and click Next.
vi. Select the directory in which you would like to install the application; also select the Enhanced Keyboard Driver checkbox.
vii. Leave the default settings and click Next.
viii. Select both the options Desktop and Start Menu Programs Folder and click Next.
ix. Click Install to start the installation process.
x. Installation is in progress; wait for this to complete.
Output:
5. Installing Java
Output:
$ssh localhost
Output:
$wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
Output:
Now type:
$source ~/.bashrc
==========================
2nd File
==========================
$sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
$export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
==========================
3rd File
============================
$sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml
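The actual contents entered into core-site.xml are not reproduced in this manual. For a single-node setup the file typically carries the default filesystem URI and a temporary-data directory; the values below (the port 9000 and the /home/bigdata/tmpdata path) are assumptions, not the original configuration:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/bigdata/tmpdata</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>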
==========================
4th File
==========================
$sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/home/bigdata/dfsdata/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/bigdata/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
==========================
5th File
===========================
$sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
==========================
6th File
==========================
$sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
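Once all the configuration files are saved, the NameNode is formatted and the Hadoop daemons are started. The standard commands for this step (shown here as a reminder; any additional options used in the original run are not known) are:
$hdfs namenode -format
$start-dfs.sh
$start-yarn.sh
$jps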
Output:
RESULT:
The installation of a single-node Hadoop cluster backed by the Hadoop Distributed File System on Ubuntu using VMware was executed successfully.
Aim:
To write a MapReduce program for counting the number of occurrences of each word in a text file, using MapReduce concepts.
PROCEDURE:
1. Make sure Java and Hadoop are installed and running:
$hadoop version
$javac -version
Program: WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
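Only the import statements of WordCount.java survive above. The remainder of the program, consistent with these imports and following the stock Apache word-count example, is sketched below:
public class WordCount {
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one); // emit (word, 1) for every token
}
}
}
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get(); // add up the counts for this word
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}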
Output:
# Check the NameNode web UI in a browser: http://localhost:9870 (http://localhost:50070 on Hadoop 2.x)
Output:
Check in browser
$ cd /home/bigdata/Desktop/
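The compile-and-run steps between the screenshots are not reproduced above. A typical sequence (the input file name, the bigdata_classes folder and the HDFS paths are assumptions consistent with the rest of this manual) is:
$javac -classpath `hadoop classpath` -d bigdata_classes WordCount.java
$jar -cvf WordCount.jar -C bigdata_classes/ .
$hadoop fs -mkdir -p /WordCount/Input
$hadoop fs -put input1.txt /WordCount/Input
$hadoop jar WordCount.jar WordCount /WordCount/Input /WordCount/Output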
Final Output:
Aim:
To find the maximum temperature per year from a sensor temperature data set, using the Hadoop MapReduce framework.
Description:
$start-all.sh
Output:
Program: MaxTemp.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
class MaxTempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private static final int MISSING = 9999; // sentinel for a missing temperature reading
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String year = line.substring(15, 19); // year field of the NCDC record
int temperature;
if (line.charAt(87) == '+') // sign of the temperature reading
temperature = Integer.parseInt(line.substring(88, 92));
else
temperature = Integer.parseInt(line.substring(87, 92));
String quality = line.substring(92, 93);
if (temperature != MISSING && quality.matches("[01459]"))
context.write(new Text(year), new IntWritable(temperature));
}
}
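The reducer and driver of MaxTemp.java are not reproduced above. A sketch consistent with the mapper follows; it additionally needs imports for Reducer, FileInputFormat and FileOutputFormat, and the class names are assumptions:
class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int maxValue = Integer.MIN_VALUE;
for (IntWritable value : values) {
maxValue = Math.max(maxValue, value.get()); // keep the highest temperature for this year
}
context.write(key, new IntWritable(maxValue));
}
}
public class MaxTemp {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "max temperature");
job.setJarByClass(MaxTemp.class);
job.setMapperClass(MaxTempMapper.class);
job.setReducerClass(MaxTempReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}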
Output:
# check output
$hadoop dfs -cat /MaxTemp/Output/*
Final Output:
Aim:
To find the grades of students using a MapReduce program.
Description:
$start-all.sh
Output:
Program: StudentGrade.java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
class GradeReducer extends Reducer<Text, IntWritable, Text, Text> {
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
int l = 0;
for (IntWritable val : values) {
l += 1;
sum += val.get();
}
int avg = sum / l; // average marks of this student
String grade;
if (avg >= 80)
grade = "A";
else if (avg >= 60 && avg < 80)
grade = "B";
else if (avg >= 40 && avg < 60)
grade = "C";
else
grade = "D";
context.write(key, new Text(grade + " (avg = " + avg + ")")); // student name -> grade and average
}
}
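The mapper and the driver of StudentGrade.java are not reproduced above. A minimal sketch, assuming each input line holds a student name followed by tab-separated marks (an assumption about the data file), wired to the GradeReducer shown above:
class MarksMapper extends Mapper<Object, Text, Text, IntWritable> {
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String[] parts = value.toString().split("\t");
String name = parts[0]; // student name
for (int i = 1; i < parts.length; i++) {
context.write(new Text(name), new IntWritable(Integer.parseInt(parts[i]))); // one record per subject mark
}
}
}
public class StudentGrade {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "student grade");
job.setJarByClass(StudentGrade.class);
job.setMapperClass(MarksMapper.class);
job.setReducerClass(GradeReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}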
Output:
# check output
$hadoop dfs -cat /StudentGrade/Output/*
Final Output:
Aim:
To perform matrix multiplication using a MapReduce program.
Description:
$start-all.sh
Output:
Program: MatrixMul.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import java.io.IOException;
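Only the imports of MatrixMul.java appear above. A hedged sketch of a one-step matrix-multiplication mapper and reducer follows; it assumes the input format "MatrixName,row,column,value" (e.g. M,0,1,5) and that the dimensions m (rows of M) and n (columns of N) are passed through the job Configuration. All of these details are assumptions, and a standard driver that sets conf.set("m", ...) and conf.set("n", ...) is still needed:
class MatMapper extends Mapper<Object, Text, Text, Text> {
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
int m = Integer.parseInt(conf.get("m")); // rows of M
int n = Integer.parseInt(conf.get("n")); // columns of N
String[] t = value.toString().split(","); // e.g. M,i,j,value
if (t[0].equals("M")) {
for (int k = 0; k < n; k++)
context.write(new Text(t[1] + "," + k), new Text("M," + t[2] + "," + t[3]));
} else {
for (int i = 0; i < m; i++)
context.write(new Text(i + "," + t[2]), new Text("N," + t[1] + "," + t[3]));
}
}
}
class MatReducer extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
java.util.HashMap<Integer, Float> a = new java.util.HashMap<>(); // M[i][j] keyed by j
java.util.HashMap<Integer, Float> b = new java.util.HashMap<>(); // N[j][k] keyed by j
for (Text v : values) { // separate the tagged M and N entries
String[] t = v.toString().split(",");
if (t[0].equals("M")) a.put(Integer.parseInt(t[1]), Float.parseFloat(t[2]));
else b.put(Integer.parseInt(t[1]), Float.parseFloat(t[2]));
}
float sum = 0;
for (java.util.Map.Entry<Integer, Float> e : a.entrySet())
sum += e.getValue() * b.getOrDefault(e.getKey(), 0f); // sum over j of M[i][j] * N[j][k]
context.write(key, new Text(Float.toString(sum))); // value of C[i][k]
}
}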
Output:
# check output
$hadoop dfs -cat /MatrixMul/Output/*
Final Output:
Develop a MapReduce program to find the maximum electrical consumption in each year given
electrical consumption for each month in each year.
Aim:
To find the maximum electrical consumption in each year, given the electrical consumption for each month of each year, using a MapReduce program.
Description:
$start-all.sh
Output:
Program: ProcessUnits.java
import java.util.*;
import java.io.IOException;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
while(s.hasMoreTokens())
{
lasttoken=s.nextToken();
}
while (values.hasNext())
{
if((val=values.next().get())>maxavg)
{
output.collect(key, new IntWritable(val));
}
}
}
}
conf.setJobName("max_eletricityunits");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(E_EMapper.class);
conf.setCombinerClass(E_EReduce.class);
conf.setReducerClass(E_EReduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
JobClient.runJob(conf);
}
}
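Only fragments of the mapper, reducer and driver of ProcessUnits.java survive above. A hedged sketch of the surrounding class structure is given below; the class names E_EMapper and E_EReduce come from the driver fragment, while the input layout (a year followed by consumption values, with the yearly figure as the last tab-separated column) and the threshold of 30 are assumptions:
public static class E_EMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
String line = value.toString();
StringTokenizer s = new StringTokenizer(line, "\t");
String year = s.nextToken(); // first token is the year
String lasttoken = null;
while (s.hasMoreTokens()) {
lasttoken = s.nextToken(); // keep walking to the last column
}
int avgprice = Integer.parseInt(lasttoken); // yearly consumption figure (assumed last column)
output.collect(new Text(year), new IntWritable(avgprice));
}
}
public static class E_EReduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
int maxavg = 30, val; // only values above this threshold are emitted
while (values.hasNext()) {
if ((val = values.next().get()) > maxavg) {
output.collect(key, new IntWritable(val));
}
}
}
}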
Output:
# check output
$hadoop dfs -cat /ProcessUnits/Output/*
Final Output:
Develop a MapReduce program to analyze a weather data set and print whether the day is sunny or a cool day.
Aim:
To analyze a weather data set and print whether the day is sunny or a cool day, using a MapReduce program.
Description:
$start-all.sh
Output:
Program: MyMaxMin.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
@Override
public void map(LongWritable arg0, Text Value,
OutputCollector<Text, Text> output, Reporter arg3)
throws IOException {
// Example of Input
// Date Max Min
// 25380 20130101 2.514 -135.69 58.43 8.3 1.1 4.7 4.9 5.6 0.01 C 1.0 -0.1 0.4 97.3 36.0 69.4 -99.000 -99.000 -99.000 -99.000 -99.000 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
conf.setMapperClass(MaxTemperatureMapper.class);
conf.setReducerClass(MaxTemperatureReducer.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
JobClient.runJob(conf);
}
}
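The body of the map() method of MyMaxMin.java is missing above. A hedged sketch, based on the fixed-width record layout shown in the comment, is given below; the character positions, the 30-degree and 15-degree thresholds, and the old-style org.apache.hadoop.mapred imports (OutputCollector, Reporter) are assumptions:
public void map(LongWritable arg0, Text Value, OutputCollector<Text, Text> output, Reporter arg3) throws IOException {
String line = Value.toString();
if (!(line.length() == 0)) {
String date = line.substring(6, 14); // date field of the record
float temp_Max = Float.parseFloat(line.substring(39, 45).trim()); // daily maximum temperature
float temp_Min = Float.parseFloat(line.substring(47, 53).trim()); // daily minimum temperature
if (temp_Max > 30.0) {
output.collect(new Text("Sunny Day " + date), new Text(String.valueOf(temp_Max)));
}
if (temp_Min < 15) {
output.collect(new Text("Cool Day " + date), new Text(String.valueOf(temp_Min)));
}
}
}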
Output:
4. Create a text file containing some NCDC data and name it “input7.txt”.
Output:
# check output
$hadoop dfs -cat /MyMaxMin/Output/*
Final Output:
Develop a MapReduce program to find the number of products sold in each country by considering sales data containing fields like
Aim:
To develop a MapReduce program to find the number of products sold in each country by considering sales data containing fields like
Description:
$start-all.sh
Output:
Program: SalesCountry.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public void map(LongWritable key, Text value, OutputCollector <Text, IntWritable> output,
Reporter reporter) throws IOException {
}
output.collect(key, new IntWritable(frequencyForCountry));
}
}
public static void main(String[] args)
{
JobClient my_client = new JobClient();
// Create a configuration object for the job
my_client.setConf(job_conf);
try {
// Run the job
JobClient.runJob(job_conf);
} catch (Exception e) {
e.printStackTrace();
}
}
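The mapper body, the reducer loop and the job configuration of SalesCountry.java are not fully reproduced above. A hedged sketch follows; the class names SalesMapper and SalesCountryReducer and the position of the country field (the 8th comma-separated column) are assumptions about the original program and data layout:
public static class SalesMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
String[] fields = value.toString().split(",");
String country = fields[7]; // country column (assumed position)
output.collect(new Text(country), new IntWritable(1));
}
}
public static class SalesCountryReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
int frequencyForCountry = 0;
while (values.hasNext()) {
frequencyForCountry += values.next().get(); // count products sold in this country
}
output.collect(key, new IntWritable(frequencyForCountry));
}
}
// Inside main(), the job_conf object referenced above would be prepared roughly as:
// JobConf job_conf = new JobConf(SalesCountry.class);
// job_conf.setJobName("per_country_sales");
// job_conf.setOutputKeyClass(Text.class);
// job_conf.setOutputValueClass(IntWritable.class);
// job_conf.setMapperClass(SalesMapper.class);
// job_conf.setReducerClass(SalesCountryReducer.class);
// FileInputFormat.setInputPaths(job_conf, new Path(args[0]));
// FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));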
Output:
4. Create a text file containing some sales data and name it “input8.txt”.
Output:
5. Create a folder to store the compiled Java classes and name it “bigdata_classes”.
Output:
Output:
# check output
$hadoop dfs -cat /SalesCountry/Output/*
Develop a MapReduce program to find the tags associated with each movie by analyzing MovieLens data.
Aim:
To develop a MapReduce program to find the tags associated with each movie by analyzing MovieLens data.
Description:
$start-all.sh
Output:
Program: MovieLens.java
import java.io.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
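Only the imports of MovieLens.java survive above. A minimal sketch of a mapper and reducer that collect the tags for each movie is given below; it assumes the MovieLens tags file format userId,movieId,tag,timestamp (an assumption about the input), and the class names are illustrative:
class TagMapper extends Mapper<Object, Text, Text, Text> {
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String[] fields = value.toString().split(",");
if (fields.length >= 3) {
context.write(new Text(fields[1]), new Text(fields[2])); // movieId -> tag
}
}
}
class TagReducer extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
StringBuilder tags = new StringBuilder();
for (Text t : values) {
if (tags.length() > 0) tags.append(",");
tags.append(t.toString());
}
context.write(key, new Text(tags.toString())); // movieId -> comma-separated list of tags
}
}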
Output:
# check output
$hadoop dfs -cat /MovieLens/Output/*
Final Output:
XYZ.com is an online music website where users listen to various tracks, and the data gets collected as given below.
The data comes in log files and looks as shown below (fields: UserId | TrackId | Shared | Radio | Skip).
111115 | 222 | 0 | 1 | 0
111113 | 225 | 1 | 0 | 0
111117 | 223 | 0 | 1 | 1
111115 | 225 | 1 | 0 | 0
Aim:
To develop a MapReduce program that analyzes the XYZ.com online music website log data shown below and computes, for each track, statistics such as the number of unique listeners, shares, radio plays and skips.
The data comes in log files and looks as shown below.
111115 | 222 | 0 | 1 | 0
111113 | 225 | 1 | 0 | 0
111117 | 223 | 0 | 1 | 1
111115 | 225 | 1 | 0 | 0
Description:
$start-all.sh
Output:
2. Write the below program and save it as “MusicTrack.java” on the Desktop.
Program: MusicTrack.java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
//import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class MusicTrack
{
public static class MusicMapper extends Mapper<Object,Text,Text,Text>
{
public void map(Object key,Text value,Context context) throws
IOException,InterruptedException
{
String[] tokens=value.toString().split("\\|");
String trackid = /*"1";*/tokens[1];
String others = tokens[0]+"\t"+tokens[2]+"\t"+tokens[3]+"\t"+tokens[4];
context.write(new Text(trackid),new Text(others));
}
}
public static class MusicReducer extends Reducer<Text, Text, Text, Text>
{
public void reduce(Text key, Iterable<Text> value, Context context) throws IOException, InterruptedException
{
Set<String> userIdSet = new HashSet<String>();
int shared = 0, radio = 0, skip = 0, listen = 0;
for(Text val:value)
{
String[] valTokens = val.toString().split("\t");
String cus = valTokens[0]; // user id
int sh = Integer.parseInt(valTokens[1]); // shared flag
int ra = Integer.parseInt(valTokens[2]); // radio flag
int sk = Integer.parseInt(valTokens[3]); // skip flag
shared = shared+sh;
radio = radio+ra;
skip = skip+sk;
listen = shared + radio;
userIdSet.add(cus);
}
// emit per-track statistics: unique listeners, shares, radio plays, listens and skips
context.write(key, new Text(userIdSet.size() + "\t" + shared + "\t" + radio + "\t" + listen + "\t" + skip));
}
}
public static void main(String args[]) throws Exception
{
Configuration conf=new Configuration();
Job job=new Job(conf,"MusicTrack");
job.setNumReduceTasks(1);
job.setJarByClass(MusicTrack.class);
job.setMapperClass(MusicMapper.class);
job.setReducerClass(MusicReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
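// The main() method above stops before the I/O paths are configured. A possible
// completion (taking the paths from the command line is an assumption):
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}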
Output:
# check output
$hadoop dfs -cat /MusicTrack/Output/*
Final Output:
Develop a MapReduce program to find the frequency of books published each year and find in which year the maximum number of books were published, using the following data.
Aim:
To develop a MapReduce program to find the frequency of books published each year and find in which year the maximum number of books were published, using the following data.
Description:
$start-all.sh
Output:
public void map(LongWritable key, Text value, Context context) throws IOException,
InterruptedException {
if (line.charAt(87)=='+')
author =Integer.parseInt(line.substring(25, 32));
else
author = Integer.parseInt(line.substring(20, 25));
String quality = line.substring(92, 93);
if(author != MISSING && quality.matches("[01200]"))
context.write(new Text(year),new IntWritable(author));
}
}
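The exact layout of the book data is not shown in this manual, so a simpler hedged sketch of a mapper and reducer that count books per publication year is given below; it assumes each record carries the year as its last tab-separated field and the usual org.apache.hadoop.io / mapreduce imports, all of which are assumptions:
class YearMapper extends Mapper<Object, Text, Text, IntWritable> {
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String[] fields = value.toString().split("\t");
String year = fields[fields.length - 1].trim(); // publication year (assumed last field)
context.write(new Text(year), new IntWritable(1)); // one book published in this year
}
}
class YearReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int count = 0;
for (IntWritable v : values) count += v.get();
context.write(key, new IntWritable(count)); // frequency of books for the year
}
}
// The year with the maximum count can then be read off the small output,
// or found with a second pass that tracks the running maximum in a single reducer.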
Output:
# check output
$hadoop dfs -cat /BookMax/Output/*
Final Output:
Develop a MapReduce program to analyze Titanic ship data and to find the average age of the people (both male and female) who died in the tragedy. Also find how many persons survived in each class.
Aim:
To develop a MapReduce program to analyze Titanic ship data and to find the average age of the people (both male and female) who died in the tragedy, and to find how many persons survived in each class.
Description:
$start-all.sh
Output:
Program: Average_age.java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
@SuppressWarnings("deprecation")
Job job = new Job(conf, "Averageage_survived");
job.setJarByClass(Average_age.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// job.setNumReduceTasks(0);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
}
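The Map and Reduce classes referenced by the driver above are not reproduced. A minimal sketch, nested inside the Average_age class, is given below; the column positions follow the standard Kaggle Titanic CSV layout (Survived at index 1, Sex at index 4, Age at index 5), which is an assumption about the input file:
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String[] str = value.toString().split(",");
if (str.length > 5 && str[1].equals("0") && !str[5].isEmpty()) { // person died and age is present
context.write(new Text(str[4]), new IntWritable((int) Float.parseFloat(str[5]))); // sex -> age
}
}
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0, count = 0;
for (IntWritable v : values) { sum += v.get(); count++; }
context.write(key, new IntWritable(count == 0 ? 0 : sum / count)); // average age of the deceased, per gender
}
}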
Output:
# check output
$hadoop dfs -cat /Average_age/Output/*
Final Output:
Develop a MapReduce program to analyze the Uber data set to find the days on which each basement has more trips, using the following dataset.
Aim:
To develop a MapReduce program to analyze the Uber data set to find the days on which each basement has more trips, using the following dataset.
Description:
$start-all.sh
Output:
Program: Trip_tracking.java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public void map(Object key, Text record, Context context) throws IOException,
InterruptedException {
String[] parts = record.toString().split("[,]");
basement.set(parts[0]);
try {
date = format.parse(parts[1]);
calendar.setTime(date);
} catch (ParseException e) {
e.printStackTrace();
}
}
public class Sum_reducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get(); // total trips for this basement/day key
}
result.set(sum);
context.write(key, result);
}
}
public class Trip_tracking extends Configured implements Tool {
Job job = Job.getInstance(getConf(), "Uber Trip tracking to find the days with more trips for each basement");
job.setJarByClass(getClass());
job.setMapperClass(TokenizerMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setCombinerClass(Sum_reducer.class);
job.setReducerClass(Sum_reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
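// The run() method above ends before the input/output paths are set and the job is
// submitted. A possible completion (command-line paths and the ToolRunner-based main()
// are assumptions, not the original code):
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
System.exit(ToolRunner.run(new Trip_tracking(), args));
}
}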
Output:
# check output
$hadoop dfs -cat /UberTrack/Output/*
Final Output:
Develop a program to calculate the maximum recorded temperature year-wise for the weather dataset in Pig Latin.
Aim:
To develop a program to calculate the maximum recorded temperature year-wise for the weather dataset in Pig Latin.
Description:
1. Install java
2. Install hadoop
3. Run hadoop
$start-all.sh
Output:
$wget http://www.apache.org/dist/pig/pig-0.16.0/pig-0.16.0.tar.gz
Output:
$nano .bashrc
$export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
OUTPUT:
$pig
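The Pig Latin statements themselves are not reproduced in this manual. A minimal sketch that produces output like the one shown below, assuming a tab-separated file of (year, temperature, quality) records at an illustrative HDFS path, is:
grunt> records = LOAD '/PigInput/sample.txt' AS (year:chararray, temperature:int, quality:int);
grunt> filtered = FILTER records BY temperature != 9999 AND quality IN (0, 1, 4, 5, 9);
grunt> grouped = GROUP filtered BY year;
grunt> max_temp = FOREACH grouped GENERATE group, MAX(filtered.temperature);
grunt> DUMP max_temp;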
final output:
1949 111
1950 22
Write queries to sort and aggregate the data in a table using HiveQL.
Aim:
To write queries to sort and aggregate the data in a table using HiveQL.
Description:
1. Install java
2. Install hadoop
3. Run hadoop
$start-all.sh
Output:
$wget “https://downloads.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz”
$nano .bashrc
$export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
OUTPUT:
$hive
hive> create table emp (Id int, Name string, Salary float, Department string)
row format delimited
fields terminated by ',' ;
Final output:
14. Now, fetch the data in the descending order (sorting) by using the following command.
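The sorting and aggregation queries themselves are not reproduced; typical HiveQL examples on the emp table created above (column names follow that table) are:
hive> SELECT * FROM emp ORDER BY Salary DESC;
hive> SELECT Department, COUNT(*) AS emp_count, AVG(Salary) AS avg_salary FROM emp GROUP BY Department;
hive> SELECT Department, MAX(Salary) AS max_salary FROM emp GROUP BY Department;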
Aim:
To Develop a Java application to find the maximum temperature using Spark.
Description:
1. Install java
2. Install hadoop
3. Run hadoop
$start-all.sh
Output:
$wget “https://www.apache.org/dyn/closer.lua/spark/spark-2.4.1/spark-2.4.1-bin-hadoop2.7.tgz”
5. Extract the Spark archive
$nano .bashrc
export SPARK_HOME=/home/bigdata/spark-2.4.1-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
$source .bashrc
09. Set JAVA_HOME
$export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
11. Create a text file “input11.txt” in your local machine and write some weather data set into it.
import org.apache.spark.SparkContext._
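Only a single Scala import (as used in spark-shell) is shown above. Since the aim asks for a Java application, a hedged Java sketch that finds the maximum temperature per year is given below; the tab-separated (year, temperature) record layout, the class name and the command-line paths are assumptions:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
public class MaxTemperatureSpark {
public static void main(String[] args) {
SparkConf conf = new SparkConf().setAppName("MaxTemperatureSpark");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> lines = sc.textFile(args[0]); // e.g. the input11.txt weather file
JavaPairRDD<String, Integer> yearTemp = lines.mapToPair(line -> {
String[] parts = line.split("\t"); // assumed layout: year<TAB>temperature
return new Tuple2<>(parts[0], Integer.parseInt(parts[1]));
});
JavaPairRDD<String, Integer> maxByYear = yearTemp.reduceByKey(Math::max); // keep the highest value per year
maxByYear.saveAsTextFile(args[1]); // or collect() and print to the console
sc.close();
}
}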
Final output:
1950 22