Short Programs
EXP NO: 3
MapReduce program to find the maximum temperature in each year
Date:
AIM: To develop a MapReduce program to find the maximum temperature in each year.
Description: MapReduce is a programming model designed for processing large volumes of data
in parallel by dividing the work into a set of independent tasks. Our previous article gave an
introduction to MapReduce; this one explains how to design a MapReduce program. The
aim of the program is to find the maximum temperature recorded in each year of the NCDC data.
The input for our program is the weather data files for each year. This weather data is collected by the
National Climatic Data Center (NCDC) from weather sensors all over the world. You can find
weather data for each year at ftp://ftp.ncdc.noaa.gov/pub/data/noaa/. All files are gzipped by year
and weather station: for each year, there are multiple files for different weather stations.
Here is an example for 1990 (ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1901/).
010080-99999-1990.gz
010100-99999-1990.gz
010150-99999-1990.gz
...
MapReduce is based on a set of key/value pairs, so first we have to decide on the types of the
key/value pairs for the input.
Map Phase: The input for the Map phase is the set of weather data files listed above. The types
of the input key/value pairs are LongWritable and Text (the byte offset of each line and the line itself),
and the types of the output key/value pairs are Text and IntWritable. Each Map task extracts the
temperature data from the given year file. The output of the Map phase is a set of key/value pairs:
the keys are the years and the values are the temperatures recorded in each year.
Reduce Phase: The Reduce phase takes all the values associated with a particular key, that is, all the
temperature values belonging to a particular year are fed to the same reducer. Each reducer then finds
the highest recorded temperature for its year. The output key/value types of the Map phase
(Text and IntWritable) are the same as the input key/value types of the Reduce phase. The output
key/value types of the Reduce phase are also Text and IntWritable.
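The three phases can be tried out without a cluster. The following is a minimal plain-Java sketch, not Hadoop code. It assumes the map phase has already turned each NCDC record into a (year, temperature) pair. Map.Entry stands in for Hadoop's Text/IntWritable pair, and a TreeMap stands in for the shuffle-and-sort step. The class name MaxTempSimulation is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the max-temperature job (not Hadoop code).
final class MaxTempSimulation {
    // Shuffle: group every temperature under its year. The TreeMap keeps the years
    // sorted, much as Hadoop sorts keys before they reach the reducers.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            groups.computeIfAbsent(pair.getKey(), y -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce: for each year, keep the highest temperature in its value list.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            int max = Integer.MIN_VALUE; // not 0, so all-negative years still work
            for (int t : e.getValue()) {
                max = Math.max(max, t);
            }
            result.put(e.getKey(), max);
        }
        return result;
    }
}
```

Take the made-up pairs (1950, 0), (1950, 22), (1949, 111), (1949, 78), (1951, -11) and (1951, -50). They reduce to 1949=111, 1950=22 and 1951=-11. Starting the maximum at Integer.MIN_VALUE is what keeps the all-negative year 1951 correct.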
So, in this example we write three Java classes:
HighestMapper.java
HighestReducer.java
HighestDriver.java
Program: HighestMapper.java
Department of CSE
BIG DATA ANALYTICS LABORATORY (19A05602P)
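import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// Old (org.apache.hadoop.mapred) API mapper: (byte offset, line) -> (year, temperature)
public class HighestMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable>
{
    // NCDC uses 9999 to mark a missing temperature reading
    public static final int MISSING = 9999;

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
            Reporter reporter) throws IOException
    {
        String line = value.toString();
        // Fixed-width NCDC record: the year is at columns 15-18
        String year = line.substring(15, 19);
        int temperature;
        // The signed temperature (tenths of a degree Celsius) is at columns 87-91;
        // skip a leading '+' because Integer.parseInt rejects it before Java 7
        if (line.charAt(87) == '+')
            temperature = Integer.parseInt(line.substring(88, 92));
        else
            temperature = Integer.parseInt(line.substring(87, 92));
        // Column 92 is the quality code; keep readings with code 0, 1, 4, 5 or 9
        String quality = line.substring(92, 93);
        if (temperature != MISSING && quality.matches("[01459]"))
            output.collect(new Text(year), new IntWritable(temperature));
    }
}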
HighestReducer.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class HighestReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable>
            output, Reporter reporter) throws IOException
    {
        // Start from the smallest int so that years with only sub-zero readings
        // are reported correctly (starting from 0 would hide them)
        int max_temp = Integer.MIN_VALUE;
        while (values.hasNext())
        {
            int current = values.next().get();
            if (max_temp < current)
                max_temp = current;
        }
        // NCDC temperatures are in tenths of a degree; divide by 10 for whole degrees Celsius
        output.collect(key, new IntWritable(max_temp / 10));
    }
}
HighestDriver.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class HighestDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception
    {
        JobConf conf = new JobConf(getConf(), HighestDriver.class);
        conf.setJobName("HighestDriver");
        // Output types (year, temperature); the map output defaults to the same types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(HighestMapper.class);
        conf.setReducerClass(HighestReducer.class);
        // args[0] is the input directory, args[1] the output directory (must not exist yet)
        Path inp = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.addInputPath(conf, inp);
        FileOutputFormat.setOutputPath(conf, out);
        JobClient.runJob(conf);
        return 0;
    }
    public static void main(String[] args) throws Exception
    {
        // ToolRunner parses generic Hadoop options (such as -D) before calling run()
        int res = ToolRunner.run(new Configuration(), new HighestDriver(), args);
        System.exit(res);
    }
}
Output:
Result:
EXP NO: 4
MapReduce program to find the grades of students
Date:
import java.util.Scanner;
public class JavaExample
{
    public static void main(String args[])
    {
        /* This program assumes that the student has 6 subjects,
         * which is why the array has size 6. You can
         * change this as per the requirement.
         */
        int marks[] = new int[6];
        int i;
        float total = 0, avg;
        Scanner scanner = new Scanner(System.in);
        for (i = 0; i < 6; i++) {
            System.out.print("Enter Marks of Subject " + (i + 1) + ": ");
            marks[i] = scanner.nextInt();
            total = total + marks[i];
        }
        scanner.close();
        // Calculating the average
        avg = total / 6;
        System.out.print("The student Grade is: ");
        // The branches are checked top-down, so each one only needs its lower bound
        if (avg >= 80)
        {
            System.out.println("A");
        }
        else if (avg >= 60)
        {
            System.out.println("B");
        }
        else if (avg >= 40)
        {
            System.out.println("C");
        }
        else
        {
            System.out.println("D");
        }
    }
}
Expected Output:
Actual Output:
Result:
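Input data (tab-separated; the first column is the year and the last column is the annual average read by the mapper):
1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45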
Source code:
import java.util.*;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class ProcessUnits
{
    // Mapper class: (offset, line) -> (year, annual average)
    public static class E_EMapper extends MapReduceBase implements
            Mapper<LongWritable, /* Input key type */ Text, /* Input value type */
                   Text, /* Output key type */ IntWritable> /* Output value type */
    {
        // Map function
        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException
        {
            String line = value.toString();
            String lasttoken = null;
            StringTokenizer s = new StringTokenizer(line, "\t");
            // The first field is the year
            String year = s.nextToken();
            // Walk to the last field, which holds the annual average
            while (s.hasMoreTokens())
            {
                lasttoken = s.nextToken();
            }
            int avg = Integer.parseInt(lasttoken);
            output.collect(new Text(year), new IntWritable(avg));
        }
    }

    // Reducer class: emits only the years whose average exceeds maxavg
    public static class E_EReduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable>
    {
        // Reduce function
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text,
                IntWritable> output, Reporter reporter) throws IOException
        {
            int maxavg = 30;
            int val;
            while (values.hasNext())
            {
                if ((val = values.next().get()) > maxavg)
                {
                    output.collect(key, new IntWritable(val));
                }
            }
        }
    }

    // Main function
    public static void main(String args[]) throws Exception
    {
        JobConf conf = new JobConf(ProcessUnits.class);
        conf.setJobName("max_electricityunits");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(E_EMapper.class);
        // The filter is safe to reuse as a combiner: applying it twice gives the same result
        conf.setCombinerClass(E_EReduce.class);
        conf.setReducerClass(E_EReduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
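The filtering done by E_EReduce can be checked on the sample rows above with a small plain-Java sketch. The class name UnitsAboveThreshold is hypothetical, and this is not Hadoop code. It splits on any whitespace. That is an assumption, made because the rows are shown space-separated here while E_EMapper expects tabs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch of the ProcessUnits logic (not Hadoop code).
final class UnitsAboveThreshold {
    static Map<String, Integer> yearsAbove(List<String> lines, int threshold) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String line : lines) {
            // Split on any whitespace so both tab- and space-separated rows work
            String[] tokens = line.trim().split("\\s+");
            String year = tokens[0];
            // The last column holds the annual average, as in E_EMapper
            int avg = Integer.parseInt(tokens[tokens.length - 1]);
            // Like E_EReduce, keep only values strictly greater than the threshold
            if (avg > threshold) {
                result.put(year, avg);
            }
        }
        return result;
    }
}
```

On the five sample rows with a threshold of 30, it keeps 1981=34, 1984=40 and 1985=45. 1980 is dropped because its average (29) is not above 30.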
Expected Output:
Input:
Kolkata,56
Jaipur,45
Delhi,43
Mumbai,34
Goa,45
Kolkata,35
Jaipur,34
Delhi,32
Output:
Kolkata 56
Jaipur 45
Delhi 43
Mumbai 34
Actual Output:
Result:
AIM: Develop a MapReduce program to find the number of products sold in each
country by considering sales data containing the following fields:
Transaction_Date, Product, Price, Payment_Type, Name, City, State, Country, Account_Created, Last_Login, Latitude, Longitude
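Counting products per country is a standard count-by-key job: the mapper emits (Country, 1) for each sales record, and a sum reducer adds up the ones. The plain-Java sketch below shows that logic under stated assumptions:

- each row is one product sold;
- fields are separated by plain commas, with no quoted commas inside a field;
- Country is the eighth field (index 7), following the order listed above;
- rows without 12 fields are discarded.

The class name SalesPerCountry is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of "products sold per country" (not Hadoop code).
final class SalesPerCountry {
    // Assumed position of Country in the 12-field sales record
    static final int COUNTRY = 7;

    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            // Skip malformed rows, in the spirit of a DISCARDED_ENTRY counter
            if (fields.length != 12) {
                continue;
            }
            // The map step emits (country, 1); merge() plays the role of a sum reducer
            totals.merge(fields[COUNTRY].trim(), 1, Integer::sum);
        }
        return totals;
    }
}
```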
Source code:
public class Driver extends Configured implements Tool {
    enum Counters { DISCARDED_ENTRY }
    public static void main(String[] args) throws Exception {
        // Pass the job's exit status back to the shell
        System.exit(ToolRunner.run(new Driver(), args));
    }
job.setMapperClass(Mapper.class);
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
job.setCombinerClass(Combiner.class);
job.setReducerClass(Reducer.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
Text value,
org.apache.hadoop.mapreduce.Mapper<
LongWritable,
Text,
LongWritable,
Text
>.Context context
.substring(values.get(2).length() - 4);
// convert time to minutes (e.g. 1542 -> 942)
* 60 + Integer.parseInt(time.substring(2,4));
new LongWritable(Integer.parseInt(year)),
);
} else
{
}}
}
}
Expected Output:
Actual Output:
Result:
AIM: XYZ.com is an online music website where users listen to various tracks, and data about each listen is collected.
Write a MapReduce program to get statistics from this data.
The data comes in log files and looks as shown below:
UserId|TrackId|Shared|Radio|Skip
111115|222|0|1|0
111113|225|1|0|0
111117|223|0|1|1
111115|225|1|0|0
In this tutorial we are going to solve the first problem, that is, finding the number of unique listeners per
track.
First of all we need to understand the data: the first column is UserId and the second one
is TrackId. So we need to write a mapper class that emits TrackId and UserId as
intermediate key/value pairs. To make the column positions easy to remember, let's create a
constants class as shown below:
public static final int USER_ID = 0;
public static final int TRACK_ID = 1;
public static final int IS_SHARED = 2;
public static final int RADIO = 3;
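Now, let's create the mapper class, which emits intermediate key/value pairs as
(TrackId, UserId), as shown below:
public static class UniqueListenersMapper
        extends Mapper<Object, Text, IntWritable, IntWritable> {
    IntWritable trackId = new IntWritable();
    IntWritable userId = new IntWritable();
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // '|' is a regex metacharacter, so it must be escaped for split()
        String[] parts = value.toString().split("\\|");
        if (parts.length == 5) {
            // USER_ID and TRACK_ID are the constants defined above (assumed to be in scope)
            userId.set(Integer.parseInt(parts[USER_ID].trim()));
            trackId.set(Integer.parseInt(parts[TRACK_ID].trim()));
            context.write(trackId, userId);
        } else {
            // Count malformed lines instead of failing the task
            context.getCounter(COUNTERS.INVALID_RECORD_COUNT).increment(1L);
        }
    }
}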
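Now let's write a Reducer class to aggregate the results. Here we cannot simply use a sum
reducer, because the records we receive are not unique (the same user can play the same track
many times) and we have to count only unique users. The reducer therefore has to collect the
distinct user IDs for each track and emit how many there are. The driver that wires the mapper
and reducer together looks like this: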
// Counter used by the mapper to report malformed lines
enum COUNTERS { INVALID_RECORD_COUNT }

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    if (args.length != 2) {
        System.err.println("Usage: uniquelisteners <in> <out>");
        System.exit(2);
    }
    Job job = Job.getInstance(conf, "Unique listeners per track");
    job.setJarByClass(UniqueListeners.class);
    job.setMapperClass(UniqueListenersMapper.class);
    job.setReducerClass(UniqueListenersReducer.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Wait for the job, then read its counters; System.exit never returns,
    // so it must come last or the counter report is never printed
    boolean success = job.waitForCompletion(true);
    org.apache.hadoop.mapreduce.Counters counters = job.getCounters();
    System.out.println("No. of Invalid Records: "
            + counters.findCounter(COUNTERS.INVALID_RECORD_COUNT).getValue());
    System.exit(success ? 0 : 1);
}
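The same user can play a track many times, so the reducer has to count distinct user IDs, not records. The plain-Java sketch below shows that logic. Its class name, UniqueListenersSketch, is hypothetical, and it is not the UniqueListenersReducer referenced by the driver. It collects each track's user IDs in a HashSet, which silently drops repeats, and reports the size of the set. It assumes the header line has already been skipped.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of the unique-listeners computation (not Hadoop code).
final class UniqueListenersSketch {
    static Map<Integer, Integer> uniqueListeners(List<String> lines) {
        // Shuffle stand-in: trackId -> set of userIds; the Set drops repeat listens
        Map<Integer, Set<Integer>> byTrack = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("\\|");
            if (parts.length != 5) {
                continue; // the job would count these as INVALID_RECORD_COUNT
            }
            int userId = Integer.parseInt(parts[0].trim());
            int trackId = Integer.parseInt(parts[1].trim());
            byTrack.computeIfAbsent(trackId, t -> new HashSet<>()).add(userId);
        }
        // Reduce: the answer for each track is the size of its user set
        Map<Integer, Integer> result = new TreeMap<>();
        byTrack.forEach((track, users) -> result.put(track, users.size()));
        return result;
    }
}
```

The test input is the four sample log lines plus one made-up repeat listen (user 111115 on track 222). It returns 222=1, 223=1 and 225=2.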
Expected Output:
UserId |TrackId |Shared |Radio | Skip
111115 | 222 | 0 | 1 |0
111113 | 225 | 1 | 0 |0
111117 | 223 | 0 | 1 |1
111115 | 225 | 1 | 0 |0
Actual Output:
Result:
AIM: Develop a MapReduce program to find the frequency of books published each year and
find in which year the maximum number of books were published using the following data.
Description:
MapReduce is a software framework for easily writing applications that process vast amounts
of data residing on multiple systems. Although it is a very powerful framework, it doesn't provide
a solution for all big data problems.
Before discussing MapReduce further, let us first understand frameworks in general. A framework is a
set of rules which we follow, or should follow, to obtain the desired result. So whenever we write
a MapReduce program, we should fit our solution into the MapReduce framework.
Although MapReduce is very powerful, it has its limitations. Some problems, such as graph
algorithms and algorithms that require iterative processing, are tricky and challenging to
implement in MapReduce. To overcome such problems we can use MapReduce design patterns.
[Note: A Design pattern is a general repeatable solution to a commonly occurring problem in
software design. A design pattern isn’t a finished design that can be transformed directly into
code. It is a description or template for how to solve a problem that can be used in many different
situations.]
We generally use MapReduce for data analysis. An important part of data analysis is finding
outliers. An outlier is any value that is numerically distant from most of the other data points
in a set of data. These records are often the most interesting and unique pieces of data in the set.
The point of this blog is to develop a MapReduce design pattern that finds the top K
records for a specific criterion, so that we can take a look at them and perhaps figure out
what made them special.
This can be achieved by defining a ranking function or comparison function between two records
that determines whether one is higher than the other. We can apply this pattern in
MapReduce to find the records with the highest values across the entire data set.
Before discussing the MapReduce approach, let's understand the traditional approach of finding the
top K records in a file located on a single machine.
1985,ATL,NL,barkele01,870000
1985,ATL,NL,bedrost01,550000
1985,ATL,NL,benedbr01,545000
1985,ATL,NL,campri01,633333
1985,ATL,NL,ceronri01,625000
1985,ATL,NL,chambch01,800000
The above data set contains 5 columns: yearID, teamID, lgID, playerID and salary. In this example we
find the top K records based on salary.
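For the single-machine case, one common technique is a bounded min-heap. It is swapped in here for comparison and is not what the MapReduce code below uses. Keep at most K rows in a PriorityQueue ordered by salary, and evict the smallest whenever the heap grows past K. The sketch assumes comma-separated rows with the salary in the fifth column. TopKSalaries is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Single-machine Top-K sketch using a bounded min-heap.
final class TopKSalaries {
    // Returns the k rows with the highest salary (column index 4), highest first.
    static List<String> topK(List<String> rows, int k) {
        Comparator<String> bySalary =
                Comparator.comparingInt(r -> Integer.parseInt(r.split(",")[4].trim()));
        // Min-heap: the smallest of the current top k sits at the head
        PriorityQueue<String> heap = new PriorityQueue<>(bySalary);
        for (String row : rows) {
            heap.offer(row);
            if (heap.size() > k) {
                heap.poll(); // evict the lowest salary so the heap never exceeds k
            }
        }
        List<String> result = new ArrayList<>(heap);
        result.sort(bySalary.reversed());
        return result;
    }
}
```

With K = 3 on the six rows above, it returns barkele01 (870000), chambch01 (800000) and campri01 (633333), highest first. Unlike a default TreeMap, the heap also keeps rows with equal salaries.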
To sort the data easily we can use java.util.TreeMap, which keeps its keys sorted automatically.
But by default a TreeMap treats two keys that compare as equal as the same key: inserting a second
record with the same salary replaces the first one, which does not give correct results.
To overcome this we should create the TreeMap with our own comparator, so that duplicate
values are kept and sorted.
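The duplicate problem is easy to see in isolation. In this sketch (hypothetical class TreeMapDuplicates; the second player ID is made up), a naturally ordered TreeMap keeps only one of two rows with the same salary. A comparator that never returns 0 keeps both. The caveat, noted in the code, is that such a comparator breaks the Comparator contract. The map can then only be iterated; get and remove(key) cannot find its entries.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Demonstrates why the Top-K code needs a comparator that never returns 0.
final class TreeMapDuplicates {
    // Natural ordering: a second put with an equal key replaces the first value.
    static TreeMap<Integer, String> naturalOrder() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(550000, "bedrost01");
        map.put(550000, "otherplayer"); // hypothetical second player, same salary
        return map;
    }

    // Never-zero comparator: equal salaries compare as "less", so both entries survive.
    // Caveat: this violates the Comparator contract, so get()/remove(key) cannot
    // find entries in such a map; only iterate over it.
    static TreeMap<Integer, String> keepDuplicates() {
        Comparator<Integer> neverEqual = (a, b) -> a > b ? 1 : -1;
        TreeMap<Integer, String> map = new TreeMap<>(neverEqual);
        map.put(550000, "bedrost01");
        map.put(550000, "otherplayer");
        return map;
    }
}
```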
Below is the implementation of the comparator that sorts salaries and keeps duplicate values:
Comparator code:
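import java.util.Comparator;

// Minimal Salary key class, assumed from how it is used: new Salary(int) and getSum()
class Salary {
    private final int sum;
    public Salary(int sum) {
        this.sum = sum;
    }
    public int getSum() {
        return sum;
    }
}

// Orders salaries in ascending order and never returns 0, so two records with
// the same salary are both kept. Because it never reports equality, a TreeMap
// using it cannot find entries with get() or remove(key); only iterate over it.
class MySalaryComp1 implements Comparator<Salary> {
    @Override
    public int compare(Salary e1, Salary e2) {
        if (e1.getSum() > e2.getSum()) {
            return 1;
        } else {
            return -1;
        }
    }
}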
Mapper Code:
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;

public class Top20Mapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    // Tree map ordered by MySalaryComp1 (ascending salary, duplicates kept)
    private TreeMap<Salary, Text> ToRecordMap = new TreeMap<Salary, Text>(new MySalaryComp1());

    public void map(LongWritable key, Text value, Context context) throws IOException,
            InterruptedException {
        String line = value.toString();
        // Rows are comma-separated: yearID,teamID,lgID,playerID,salary
        String[] tokens = line.split(",");
        // The salary is the fifth column (index 4)
        int salary = Integer.parseInt(tokens[4]);
        // Insert the salary as key and a copy of the row as value (Hadoop reuses 'value')
        ToRecordMap.put(new Salary(salary), new Text(value));
        // If we have more than ten records, remove the one with the lowest salary.
        // The map is sorted in ascending order, so the lowest salary is the first key.
        Iterator<Map.Entry<Salary, Text>> iter = ToRecordMap.entrySet().iterator();
        while (ToRecordMap.size() > 10) {
            iter.next();
            iter.remove();
        }
    }

    // After the whole input split is processed, send this mapper's top ten to the reducer
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        for (Text t : ToRecordMap.values()) {
            context.write(NullWritable.get(), t);
        }
    }
}
Reducer Code:
import java.io.IOException;
import java.util.Iterator;
import java.util.TreeMap;
import java.util.Map.Entry;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Top20Reducer extends Reducer<NullWritable, Text, NullWritable, Text> {
    private TreeMap<Salary, Text> ToRecordMap = new TreeMap<Salary, Text>(new MySalaryComp1());

    public void reduce(NullWritable key, Iterable<Text> values, Context context) throws
            IOException, InterruptedException {
        for (Text value : values) {
            String line = value.toString();
            if (line.length() > 0) {
                // Rows are comma-separated; the salary is the fifth column (index 4)
                String[] tokens = line.split(",");
                int salary = Integer.parseInt(tokens[4]);
                // Insert the salary as key and a copy of the row as value
                // (Hadoop reuses the same Text object across iterations)
                ToRecordMap.put(new Salary(salary), new Text(value));
            }
        }
        // If we have more than ten records, remove the ones with the lowest salary.
        // The map is sorted in ascending order, so the lowest salary is the first key.
        Iterator<Entry<Salary, Text>> iter = ToRecordMap.entrySet().iterator();
        while (ToRecordMap.size() > 10) {
            iter.next();
            iter.remove();
        }
        // Output our ten records, highest salary first, with a null key
        for (Text t : ToRecordMap.descendingMap().values()) {
            context.write(NullWritable.get(), t);
        }
    }
}
Expected Output:
The output of the job is the top K records.
This way we can obtain the top K records using MapReduce.
I hope this blog was helpful in giving you a better understanding of
implementing a MapReduce design pattern.
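The two-stage approach gives the right answer for a simple reason. Any record in the global top K must also be in the top K of the split it came from. So merging every mapper's local top K and ranking once more in a single reducer loses nothing. The plain-Java simulation below illustrates this; its class name, TwoStageTopK, is hypothetical. It does not use a comparator that never returns 0. Instead, it keys a TreeMap by salary and stores a list of rows per salary, an alternative way to keep duplicates that respects the Comparator contract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical simulation of the two-stage Top-K pattern (not Hadoop code):
// each "mapper" keeps its local top k, then one "reducer" ranks the union again.
final class TwoStageTopK {
    // The k rows with the highest salary (fifth column), highest first.
    static List<String> localTopK(List<String> rows, int k) {
        // Key = salary, value = every row with that salary, so ties are not lost
        TreeMap<Integer, List<String>> bySalary = new TreeMap<>();
        for (String row : rows) {
            int salary = Integer.parseInt(row.split(",")[4].trim());
            bySalary.computeIfAbsent(salary, s -> new ArrayList<>()).add(row);
        }
        List<String> top = new ArrayList<>();
        for (List<String> sameSalary : bySalary.descendingMap().values()) {
            for (String row : sameSalary) {
                if (top.size() == k) {
                    return top;
                }
                top.add(row);
            }
        }
        return top;
    }

    // Map stage: local top k per split. Reduce stage: top k of all local results.
    static List<String> topK(List<List<String>> splits, int k) {
        List<String> candidates = new ArrayList<>();
        for (List<String> split : splits) {
            candidates.addAll(localTopK(split, k));
        }
        return localTopK(candidates, k);
    }
}
```

For example, split the six sample salary rows into two splits of three and ask for K = 3. The result is the same as ranking all six rows at once: barkele01, chambch01, campri01.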
Actual Output:
Result:
EXP NO: 12
MapReduce program to analyze Titanic ship data and to find the average age of the people
Date:
AIM: Develop a MapReduce program to analyze Titanic ship data, to find the average
age of the people (both male and female) who died in the tragedy, and to find how many persons
survived in each class.
Description:
There have been huge disasters in history, but the magnitude of the Titanic’s
disaster ranks as high as the depth it sank to. So much so that subsequent disasters have always
been described as “titanic in proportion”, implying huge losses.
Anyone who has read about the Titanic knows that a perfect combination of natural events and
human errors led to the sinking of the Titanic on its fateful maiden voyage from Southampton to
New York: it struck an iceberg late on April 14, 1912 and sank in the early hours of April 15.
Several questions have been put forward to understand the cause(s) of the tragedy. Foremost
among them is what made it sink and, even more intriguing, how a 46,000 ton ship could sink to
a depth of 13,000 feet in a matter of 3 hours. This is a mind-boggling question indeed!
There have been as many inquiries as there have been questions raised, and equally many
types of analysis methods applied to arrive at conclusions. But this blog is not about analyzing
why or what made the Titanic sink; it is about analyzing the publicly available data about the
Titanic. It uses Hadoop MapReduce to find:
• The average age of the people (both male and female) who died in the tragedy.
• How many persons survived, by travelling class.
This Titanic data is publicly available, and the Titanic data set is described below under
the heading Data Set Description.
Using that dataset we will perform some analysis and draw out some insights,
like finding the average age of the males and females who died in the Titanic and the number of
males and females who died in each compartment.
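The average-age computation can be checked on a few rows without Hadoop. This plain-Java sketch (hypothetical class TitanicAverageAge) uses the same column positions as the mapper code below: index 1 is the survived flag, 4 the gender and 5 the age. Those positions are an assumption about the layout of the data file. Like the reducer, the sketch truncates the average to a whole number.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of the average-age-of-victims job (not Hadoop code).
final class TitanicAverageAge {
    static Map<String, Integer> averageAgeOfDead(List<String> rows) {
        Map<String, int[]> sumAndCount = new TreeMap<>(); // gender -> {sum of ages, count}
        for (String row : rows) {
            String[] str = row.split(",");
            // Same checks as the mapper: enough columns, died ("0"), numeric age
            if (str.length > 6 && str[1].equals("0") && str[5].matches("\\d+")) {
                int[] acc = sumAndCount.computeIfAbsent(str[4], g -> new int[2]);
                acc[0] += Integer.parseInt(str[5]);
                acc[1] += 1;
            }
        }
        // Reduce: integer average per gender, matching sum / l in the reducer
        Map<String, Integer> result = new TreeMap<>();
        sumAndCount.forEach((gender, acc) -> result.put(gender, acc[0] / acc[1]));
        return result;
    }
}
```

The test uses made-up rows in which the victims are two males aged 22 and 35 and two females aged 26 and 35. It returns male=28 and female=30 (57 / 2 and 61 / 2, truncated).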
Mapper code:
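public class Average_age {
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private Text gender = new Text();
        private IntWritable age = new IntWritable();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String str[] = line.split(",");
            if (str.length > 6) {
                // Column 4 holds the gender
                gender.set(str[4]);
                // Column 1 is the survival flag; "0" means the person died
                if (str[1].equals("0")) {
                    // Column 5 holds the age; skip rows where it is missing or not a whole number
                    if (str[5].matches("\\d+")) {
                        age.set(Integer.parseInt(str[5]));
                        // Emit only inside the checks, so survivors and rows
                        // without an age are not counted
                        context.write(gender, age);
                    }
                }
            }
        }
    }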
Reducer Code:
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            int l = 0;
            // Sum the ages and count the people for this gender
            for (IntWritable val : values) {
                l += 1;
                sum += val.get();
            }
            // Integer division: the average age is truncated to a whole number
            sum = sum / l;
            context.write(key, new IntWritable(sum));
        }
    }
Configuration Code:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
https://github.com/kiran0541/Map-Reduce/blob/master/Average%20age%20of%20male%20and%20female%20people%20died%20in%20titanic
Way to execute the jar file to get the result of the first problem statement:
Here ‘hadoop’ specifies that we are running a Hadoop command, ‘jar’ specifies
which type of application we are running, and average.jar is the jar file we
have created, which contains the above source code. It is followed by the path of the input
file, in our case TitanicData.txt, and the output directory where the output is stored,
which we have given as avg out.
Result:
EXP NO: 13
MapReduce program to analyze Uber data set
Date:
AIM: To develop a MapReduce program to analyze the Uber data set to find the days on
which each dispatching base has the most trips, using the following dataset.
The Uber dataset consists of four columns:
dispatching_base_number, date, active_vehicles, trips
Problem Statement 1: In this problem statement, we will find the days on which each
dispatching base has the most trips.
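The core of this job is building a key of the form "<base> <weekday>" and summing the trips under it. Below is a plain-Java sketch of that logic. It assumes dates such as 1/1/2015 in month/day/year order, and a comma-separated header line that should be skipped. It uses java.time.LocalDate instead of the deprecated Date.getDay() used in the mapper below. The class name UberTripsPerDay and the sample rows are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of the Uber job: total trips per (base, weekday).
final class UberTripsPerDay {
    static final String[] DAYS = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("M/d/yyyy");

    static Map<String, Integer> tripsPerBaseAndDay(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] splits = line.split(",");
            LocalDate date;
            try {
                date = LocalDate.parse(splits[1].trim(), FORMAT);
            } catch (DateTimeParseException e) {
                continue; // header row or bad date: skip instead of crashing
            }
            // getValue() is 1 (Monday) .. 7 (Sunday); % 7 maps Sunday to index 0
            String key = splits[0] + " " + DAYS[date.getDayOfWeek().getValue() % 7];
            // The map step emits (key, trips); merge() sums them like IntSumReducer would
            totals.merge(key, Integer.parseInt(splits[3].trim()), Integer::sum);
        }
        return totals;
    }
}
```

The test input has two made-up B02512 rows on Thursdays 1/1/2015 and 1/8/2015 (1132 and 1000 trips). It also has one B02765 row on Friday 1/2/2015 (1765 trips). The result is "B02512 Thu"=2132 and "B02765 Fri"=1765.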
Source Code
Mapper Class:
public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {
    java.text.SimpleDateFormat format = new java.text.SimpleDateFormat("MM/dd/yyyy");
    String[] days = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
    private Text basement = new Text();
    Date date = null;
    private int trips;

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] splits = line.split(",");
        // Column 0 is the dispatching base number
        basement.set(splits[0]);
        try {
            // Column 1 is the date
            date = format.parse(splits[1]);
        } catch (ParseException e) {
            // Header row or malformed date: skip the line rather than reuse a
            // stale date or fail with a NullPointerException below
            return;
        }
        // Column 3 is the number of trips
        trips = Integer.parseInt(splits[3]);
        // Key = "<base> <weekday>"; Date.getDay() returns 0 for Sunday (deprecated but works)
        String keys = basement.toString() + " " + days[date.getDay()];
        context.write(new Text(keys), new IntWritable(trips));
    }
}
Reducer Class:
job.setJarByClass(Uber1.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
Expected Output:
Result:
Description:
The National Climatic Data Center (NCDC) is the world's largest active archive of weather data.
I downloaded the NCDC data for the year 1930 and loaded it into HDFS. I implemented a
MapReduce program and Pig and Hive scripts to find the minimum, maximum and average temperature
for different stations.
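Min, max and average per station can all come from one reducer pass. Below is a plain-Java sketch of that idea. The class name StationStats is hypothetical. The sketch assumes the records have already been parsed into (station id, temperature) pairs, with NCDC temperatures in tenths of a degree Celsius. It uses java.util.IntSummaryStatistics, which tracks count, min, max, sum and average together.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch: min, max and average temperature per station.
final class StationStats {
    // readings: (station id, temperature in tenths of a degree Celsius) pairs
    static Map<String, IntSummaryStatistics> summarize(List<Map.Entry<String, Integer>> readings) {
        // The TreeMap groups readings by station (the shuffle's job) in sorted order
        Map<String, IntSummaryStatistics> stats = new TreeMap<>();
        for (Map.Entry<String, Integer> r : readings) {
            // accept() updates count, sum, min and max in one step
            stats.computeIfAbsent(r.getKey(), k -> new IntSummaryStatistics()).accept(r.getValue());
        }
        return stats;
    }
}
```

Each station's entry then answers all three questions: getMin(), getMax() and getAverage().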
PIG Script
Hive Script
MaxTemperature.java
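import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class MaxTemperature {
    public static void main(String[] args) throws Exception {
        // Assumed invocation: MaxTemperature <input path> <output path>
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        Job job = Job.getInstance();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}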
MaxTemperatureMapper.java
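import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed to use the same fixed-width NCDC parsing as HighestMapper (Exp 3): year at
        // columns 15-18, signed temperature at 87-91, quality code at 92; 9999 means missing
        String line = value.toString();
        String year = line.substring(15, 19);
        int temperature = Integer.parseInt(line.substring(line.charAt(87) == '+' ? 88 : 87, 92));
        if (temperature != 9999 && line.substring(92, 93).matches("[01459]"))
            context.write(new Text(year), new IntWritable(temperature));
    }
}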
MaxTemperatureReducer.java
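import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
            Context context)
            throws IOException, InterruptedException {
        // Keep the highest temperature seen for this key
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}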
Expected Output:
1921 -222
1921 -144
1921 -122
1921 -139
1921 -122
1921 -89
1921 -72
1921 -61
1921 -56
1921 -44
1921 -61
1921 -72
1921 -67
1921 -78
1921 -78
1921 -133
1921 -189
1921 -250
1921 -200
1921 -150
1921 -156
1921 -144
1921 -133
1921 -139
1921 -161
1921 -233
1921 -139
1921 -94
1921 -89
1921 -122
1921 -100
1921 -100
1921 -106
1921 -117
1921 -144
1921 -128
1921 -139
1921 -106
1921 -100
1921 -94
1921 -83
1921 -83
1921 -106
1921 -150
1921 -200
1921 -178
1921 -72
1921 -156
Actual Output:
Result:
EXP NO: 16
Java application to find the maximum temperature using Spark
Date:
AIM: To develop a Java application to find the maximum temperature using Spark.
Source code:
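import org.apache.spark.{SparkConf, SparkContext}

object testfilter {
  def main(args: Array[String]): Unit = {
    // On Windows, point Hadoop at the winutils folder before Spark starts
    System.setProperty("hadoop.home.dir", "c://winutil//")
    val conf = new SparkConf().setMaster("local[2]").setAppName("testfilter")
    val sc = new SparkContext(conf)
    val input = sc.textFile("file:///D://sparkprog//temp//stats.txt")
    // Split each tab-separated line into its fields
    val line = input.map(x => x.split("\t"))
    // Pair column 3 (the city) with column 4 (the temperature) parsed as a number,
    // so that the maximum is numeric rather than alphabetical
    val maintemp = line.map(x => (x(3), x(4).toDouble))
    // Keep the highest temperature for each city
    val maxTemp = maintemp.reduceByKey((a, b) => math.max(a, b))
    // Collect the (city, max) pairs to the driver and print one per line
    maxTemp.collect().foreach(println)
    sc.stop()
  }
}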
OUTPUT: