0% found this document useful (0 votes)

48 views53 pages

Notes

The document describes how to install Apache Hadoop in standalone mode and run a sample MapReduce word count program to verify the installation. Key steps include installing Java, downloading and extracting Hadoop files, configuring environment variables, formatting HDFS, starting core Hadoop services, and executing the sample word count program on sample input data.

Uploaded by

Radheshyam Shah

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

48 views53 pages

Notes

Uploaded by

Radheshyam Shah

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 53

Big data

EXPERIMENT FILE

B.TECH
(IV YEAR – VII SEM)

(2022‐2023)

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING

SUBMITTED BY :Naresh Baiga

SUBMITTED TO: Vandana Gautam
BIG DATA LABORATORY

Course Objectives:

This course is designed to:

1. Get familiar with Hadoop distributions, configuring Hadoop and performing File
management tasks
2. Experiment MapReduce in Hadoop frameworks
3. Implement MapReduce programs in variety applications
4. Explore MapReduce support for debugging
5. Understand different approaches for building Hadoop MapReduce programs for real-time
applications

Experiments:

1. Install Apache Hadoop

2. Develop a MapReduce program to calculate the frequency of a given word in agiven file.

3. Develop a MapReduce program to find the maximum temperature in each year.

4. Develop a MapReduce program to find the grades of student’s.

5. Develop a MapReduce program to implement Matrix Multiplication.

6. Develop a MapReduce to find the maximum electrical consumption in each year given
electrical consumption for each month in each year.

7. Develop a MapReduce to analyze weather data set and print whether the day is shinny or
cool day.

8. Develop a MapReduce program to find the number of products sold in each country by
considering sales data containing fields like

Tranction Prod Pri Payment Na Ci St Cou Account_ Last_L Latit Longi

_Date uct ce _Type Me ty ate ntry Created ogin ude tude
9. Develop a MapReduce program to find the tags associated with each movie by analyzing
movie lens data.
10. XYZ.com is an online music website where users listen to various tracks, the data gets
collected which is given below.

The data is coming in log files and looks like as shown below.
UserId | TrackId | Shared | Radio | Skip

111115 | 222 | 0 | 1 | 0
111113 | 225 | 1 | 0 | 0
111117 | 223 | 0 | 1 | 1
111115 | 225 | 1 | 0 | 0

Write a MapReduce program to get the following

• Number of unique listeners
• Number of times the track was shared with others
• Number of times the track was listened to on the radio
• Number of times the track was listened to in total
• Number of times the track was skipped on the radio

11. Develop a MapReduce program to find the frequency of books published eachyear and find
in which year maximum number of books were published usingthe following data.
Title Author Published Author Language No of pages
year country
12. Develop a MapReduce program to analyze Titanic ship data and to find the average age of
the people (both male and female) who died in the tragedy. How many persons are survived
in each class.
The titanic data will be..
Column 1 :PassengerI d Column 2 : Survived (survived=0
&died=1)
Column 3 :Pclass Column 4 : Name
Column 5 : Sex Column 6 : Age
Column 7 :SibSp Column 8 :Parch
Column 9 : Ticket Column 10 : Fare
Column 11 :Cabin Column 12 : Embarked
13. Develop a MapReduce program to analyze Uber data set to find the days on which each
basement has more trips using the following dataset.

The Uber dataset consists of four columns they are

dispatching_base_number date active_vehicles trips
14. Develop a program to calculate the maximum recorded temperature by yearwise for the
weather dataset in Pig Latin
15. Write queries to sort and aggregate the data in a table using HiveQL.

16. Develop a Java application to find the maximum temperature using Spark. Text Books:

1. Tom White, “Hadoop: The Definitive Guide” Fourth Edition, O’reilly Media, 2015.
Reference Books:

1. Glenn J. Myatt, Making Sense of Data , John Wiley & Sons, 2007 Pete Warden, Big Data
Glossary, O’Reilly, 2011.
2. Michael Berthold, David J.Hand, Intelligent Data Analysis, Spingers, 2007.
3. Chris Eaton, Dirk DeRoos, Tom Deutsch, George Lapis, Paul Zikopoulos, Uderstanding Big
Data : Analytics for Enterprise Class Hadoop and Streaming Data, McGrawHill Publishing,
2012.
4. AnandRajaraman and Jeffrey David UIIman, Mining of Massive Datasets Cambridge
University Press, 2012.

Course Outcomes:

Upon completion of the course, the students should be able to:

1. Configure Hadoop and perform File Management Tasks (L2)

2. Apply MapReduce programs to real time issues like word count, weather dataset and
sales of a company (L3)
3. Critically analyze huge data set using Hadoop distributed file systems and MapReduce
(L5)
4. Apply different data processing tools like Pig, Hive and Spark.(L6)
BIG DATA LABORATORY

EXP NO: 1
Install Apache Hadoop
Date:
AIM: To Install Apache Hadoop.

Hadoop software can be installed in three modes of

Hadoop is a Java-based programming framework that supports the processing and storage of
extremely large datasets on a cluster of inexpensive machines. It was the first major open
source project in the big data playing field and is sponsored by the Apache Software
Foundation.

Hadoop-2.7.3 is comprised of four main layers:

 Hadoop Common is the collection of utilities and libraries that support other Hadoop
modules.
 HDFS, which stands for Hadoop Distributed File System, is responsible for persisting
data to disk.
 YARN, short for Yet Another Resource Negotiator, is the "operating system" for HDFS.
 MapReduce is the original processing model for Hadoop clusters. It distributes work
within the cluster or map, then organizes and reduces the results from the nodes into a
response to a query. Many other processing models are available for the 2.x version of
Hadoop.
Hadoop clusters are relatively complex to set up, so the project includes a stand-alone mode
which is suitable for learning about Hadoop, performing simple operations, and debugging.

Procedure:

we'll install Hadoop in stand-alone mode and run one of the example example MapReduce
programs it includes to verify the installation.

Prerequisites:

Step1: Installing Java 8 version.

Openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
This output verifies that OpenJDK has been successfully installed.
Note: To set the path for environment variables. i.e. JAVA_HOME

Step2: Installing Hadoop

With Java in place, we'll visit the Apache Hadoop Releases page to find the most recent
stable release. Follow the binary for the current release:
1

Department of CSE
BIG DATA LABORATORY

Download Hadoop from www.hadoop.apache.org

Department of CSE
BIG DATA LABORATORY

2
Procedure to Run Hadoop

1. Install Apache Hadoop 2.2.0 in Microsoft Windows OS

If Apache Hadoop 2.2.0 is not already installed then follow the post Build, Install,
Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS.

2. Start HDFS (Namenode and Datanode) and YARN (Resource Manager and Node
Manager)

Run following commands.

Command Prompt
C:\Users\abhijitg>cd c:\hadoop
c:\hadoop>sbin\start-dfs c:\
hadoop>sbin\start-yarn starting
yarn daemons

Namenode, Datanode, Resource Manager and Node Manager will be started in few
minutes and ready to execute Hadoop MapReduce job in the Single Node
(pseudo-distributed mode) cluster.

Resource Manager & Node Manager:

Department of CSE
BIG DATA LABORATORY

Run wordcount MapReduce job

Now we'll run wordcount MapReduce job available
in %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-
2.2.0.jar

Create a text file with some content. We'll pass this file as input to the
wordcount MapReduce job for counting words.
C:\file1.txt
Install Hadoop

Run Hadoop Wordcount Mapreduce Example

Create a directory (say 'input') in HDFS to keep all the text files (say 'file1.txt') to be used for counting
words.
C:\Users\abhijitg>cd c:\hadoop
C:\hadoop>bin\hdfs dfs -mkdir input
Copy the text file(say 'file1.txt') from local disk to the newly created 'input' directory in HDFS.

C:\hadoop>bin\hdfs dfs -copyFromLocal c:/file1.txt input

Check content of the copied file.
5

Department of CSE
BIG DATA LABORATORY

C:\hadoop>hdfs dfs -ls input

Found 1 items
-rw-r--r-- 1 ABHIJITG supergroup 55 2014-02-03 13:19 input/file1.txt

C:\hadoop>bin\hdfs dfs -cat input/file1.txt

Install Hadoop
Run Hadoop Wordcount Mapreduce Example
Run the wordcount MapReduce job provided in %HADOOP_HOME%\share\hadoop\
mapreduce\hadoop-mapreduce-examples-2.2.0.jar

C:\hadoop>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-

2.2.0.jar wordcount input output
14/02/03 13:22:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/03 13:22:03 INFO input.FileInputFormat: Total input paths to process : 1
14/02/03 13:22:03 INFO mapreduce.JobSubmitter: number of splits:1
:
:
14/02/03 13:22:04 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1391412385921_0002
14/02/03 13:22:04 INFO impl.YarnClientImpl: Submitted application
application_1391412385921_0002 to ResourceManager at /0.0.0.0:8032
14/02/03 13:22:04 INFO mapreduce.Job: The url to track the job:
http://ABHIJITG:8088/proxy/application_1391412385921_0002/
14/02/03 13:22:04 INFO mapreduce.Job: Running job: job_1391412385921_0002
14/02/03 13:22:14 INFO mapreduce.Job: Job job_1391412385921_0002 running in
uber mode : false
14/02/03 13:22:14 INFO mapreduce.Job: map 0% reduce 0%
14/02/03 13:22:22 INFO mapreduce.Job: map 100% reduce 0%
14/02/03 13:22:30 INFO mapreduce.Job: map 100% reduce 100%
14/02/03 13:22:30 INFO mapreduce.Job: Job job_1391412385921_0002 completed
successfully
14/02/03 13:22:31 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=89
FILE: Number of bytes written=160142
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=171
6

Department of CSE
BIG DATA LABORATORY

HDFS: Number of bytes written=59

HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5657
Total time spent by all reduces in occupied slots (ms)=6128
Map-Reduce Framework
Map input records=2
Map output records=7
Map output bytes=82
Map output materialized bytes=89
Input split bytes=116
Combine input records=7
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=89
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=145
CPU time spent (ms)=1418
Physical memory (bytes) snapshot=368246784
Virtual memory (bytes) snapshot=513716224
Total committed heap usage (bytes)=307757056
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0 WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters Bytes
Read=55
7

Department of CSE
BIG DATA LABORATORY

File Output Format Counters

Bytes Written=59
http://abhijitg:8088/cluster

Result: We've installed Hadoop in stand-alone mode and verified it by running an example
program it provided.
EXP NO: 2
MapReduce program to calculate the frequency
Date:
AIM: To Develop a MapReduce program to calculate the frequency of a given word in agiven

file Map Function – It takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (Key-Value pair).

Example – (Map function in Word Count)

Input
Set of data
Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN,BUS, buS, caR, CAR, car, BUS, TRAIN

Output
Convert into another set of data
(Key,Value)
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1),
8

Department of CSE
BIG DATA LABORATORY

(TRAIN,1),(BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)

Reduce Function – Takes the output from Map as an input and combines those data tuples
into a smaller set of tuples.

Example – (Reduce function in Word Count)

Input Set of Tuples
(output of Map function)
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1),
(TRAIN,1),(BUS,1),
(buS,1),(caR,1),(CAR,1), (car,1), (BUS,1), (TRAIN,1)

Output Converts into smaller set of tuples

(BUS,7), (CAR,7), (TRAIN,4)

Work Flow of Program

Workflow of MapReduce consists of 5 steps

Department of CSE
BIG DATA LABORATORY

1. Splitting – The splitting parameter can be anything, e.g. splitting by space, comma,
semicolon, or even by a new line (‘\n’).
2. Mapping – as explained above
3. Intermediate splitting – the entire process in parallel on different clusters. In order to
group them in “Reduce Phase” the similar KEY data should be on same cluster.
4. Reduce – it is nothing but mostly group by phase
5. Combining – The last phase where all the data (individual result set from each cluster) is
combine together to form a Result

Now Let’s See the Word Count Program in Java

Make sure that Hadoop is installed on your system with java idk

Steps to follow

Step 1. Open Eclipse> File > New > Java Project > (Name it – MRProgramsDemo) >
Finish
Step 2. Right Click > New > Package ( Name it - PackageDemo) > Finish
Step 3. Right Click on Package > New > Class (Name it - WordCount)
Step 4. Add Following Reference Libraries –
Right Click on Project > Build Path> Add External Archivals
• /usr/lib/hadoop-0.20/hadoop-core.jar
• Usr/lib/hadoop-0.20/lib/Commons-cli-1.2.jar

Program: Step 5. Type following Program :

package PackageDemo; import

java.io.IOException;
import org.apache.hadoop.conf.Configuration; import
org.apache.hadoop.fs.Path; import
org.apache.hadoop.io.IntWritable; import
org.apache.hadoop.io.LongWritable; import
org.apache.hadoop.io.Text; import
org.apache.hadoop.mapreduce.Job; import
org.apache.hadoop.mapreduce.Mapper; import
org.apache.hadoop.mapreduce.Reducer; import
org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import
org.apache.hadoop.util.GenericOptionsParser; public class
WordCount {

Department of CSE
BIG DATA LABORATORY

public static void main(String [] args) throws Exception

{
Configuration c=new Configuration();
String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
Path input=new Path(files[0]);
Path output=new Path(files[1]);
Job j=new Job(c,"wordcount");
j.setJarByClass(WordCount.class);
j.setMapperClass(MapForWordCount.class);
j.setReducerClass(ReduceForWordCount.class);
j.setOutputKeyClass(Text.class);
j.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPath(j,
input);
FileOutputFormat.setOutputPath(j, output);
System.exit(j.waitForCompletion(true)?0:1);
}
public static class MapForWordCount extends Mapper<LongWritable, Text, Text,
IntWritable>{
public void map(LongWritable key, Text value, Context con) throws IOException,
InterruptedException
{
String line = value.toString();
String[] words=line.split(","); for(String
word: words )
{
Text outputKey = new Text(word.toUpperCase().trim());
IntWritable outputValue = new IntWritable(1);
con.write(outputKey, outputValue);
}
}
}
public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text,
IntWritable> {
public void reduce(Text word, Iterable<IntWritable> values, Context con) throws
IOException,
InterruptedException
{
int sum = 0;
for(IntWritable value : values)
11

Department of CSE
BIG DATA LABORATORY

{
sum += value.get();
}
con.write(word, new IntWritable(sum));
}
}
}

Make Jar File

Right Click on Project> Export> Select export destination as Jar File > next> Finish

Department of CSE
BIG DATA LABORATORY

To Move this into Hadoop directly, open the terminal and enter the following commands:
13

Department of CSE
BIG DATA LABORATORY

[training@localhost ~]$ hadoop fs -put wordcountFile wordCountFile

Run Jar file

(Hadoop jar jarfilename.jar packageName.ClassName PathToInputTextFile
PathToOutputDirectry)

[training@localhost ~]$ Hadoop jar MRProgramsDemo.jar

PackageDemo.WordCount wordCountFile MRDir1

Result: Open Result

[training@localhost ~]$ hadoop fs -ls MRDir1

Found 3 items
-rw-r--r-- 1 training supergroup
0 2016-02-23 03:36 /user/training/MRDir1/_SUCCESS
drwxr-xr-x - training supergroup
0 2016-02-23 03:36 /user/training/MRDir1/_logs
-rw-r--r-- 1 training supergroup
20 2016-02-23 03:36 /user/training/MRDir1/part-r-00000
[training@localhost ~]$ hadoop fs -cat MRDir1/part-r-00000
BUS 7
CAR 4
TRAIN 6
EXP NO: 3
MapReduce program to find the maximum temperature in each year
Date:
AIM: To Develop a MapReduce program to find the maximum temperature in each year.

Description: MapReduce is a programming model designed for processing large volumes of data
in parallel by dividing the work into a set of independent tasks.Our previous traversal has given
an introduction about MapReduce This traversal explains how to design a MapReduce program.
The aim of the program is to find the Maximum temperature recorded for each year of NCDC
data. The input for our program is weather data files for each year This weather data is collected
by National Climatic Data Center – NCDC from weather sensors at all over the world. You can
find weather data for each year from ftp://ftp.ncdc.noaa.gov/pub/data/noaa/.All files are zipped
by year and the weather station. For each year, there are multiple files for different weather
stations. Here is an example for 1990 (ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1901/).

• 010080-99999-1990.gz
• 010100-99999-1990.gz
• 010150-99999-1990.gz
14

Department of CSE
BIG DATA LABORATORY

• …………………………………

MapReduce is based on set of key value pairs. So first we have to decide on the types for the
key/value pairs for the input.
Map Phase: The input for Map phase is set of weather data files as shown in snap shot. The
types of input key value pairs are LongWritable and Text and the types of output key value pairs
are Text and IntWritable. Each Map task extracts the temperature data from the given year file.
The output of the map phase is set of key value pairs. Set of keys are the years. Values are the
temperature of each year.
Reduce Phase: Reduce phase takes all the values associated with a particular key. That is all the
temperature values belong to a particular year is fed to a same reducer. Then each reducer finds
the highest recorded temperature for each year. The types of output key value pairs in Map phase
is same for the types of input key value pairs in reduce phase (Text and IntWritable). The types
of output key value pairs in reduce phase is too Text and IntWritable. So, in this example we
write three java classes:

• HighestMapper.java
• HighestReducer.java
• HighestDriver.java

Program: HighestMapper.java
import java.io.IOException; import
org.apache.hadoop.io.*; import
org.apache.hadoop.mapred.*;
public class HighestMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text,
IntWritable>
{ public static final int MISSING = 9999;
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter
reporter) throws IOException
{
String line = value.toString();
String year =
line.substring(15,19); int
temperature; if
(line.charAt(87)=='+')
temperature = Integer.parseInt(line.substring(88, 92));
else

Department of CSE
BIG DATA LABORATORY

temperature = Integer.parseInt(line.substring(87, 92)); String

quality = line.substring(92, 93); if(temperature != MISSING
&& quality.matches("[01459]")) output.collect(new
Text(year),new IntWritable(temperature));
}
}

HighestReducer.java
import java.io.IOException; import
java.util.Iterator; import
org.apache.hadoop.io.*; import
org.apache.hadoop.mapred.*;
public class HighestReducer extends MapReduceBase implements Reducer<Text, IntWritable,
Text, IntWritable>
{
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable>
output, Reporter reporter) throws IOException
{
int max_temp = 0;
;
while (values.hasNext())
{
int current=values.next().get();
if ( max_temp < current)
max_temp = current;
}
output.collect(key, new
IntWritable(max_temp/10)); }

HighestDriver.java
import org.apache.hadoop.fs.Path; import
org.apache.hadoop.conf.*; import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*; import
org.apache.hadoop.util.*; public class HighestDriver
extends Configured implements
Tool{public int run(String[] args) throws Exception
{
16

Department of CSE
BIG DATA LABORATORY

JobConf conf = new JobConf(getConf(), HighestDriver.class);

conf.setJobName("HighestDriver");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(HighestMapper.class);
conf.setReducerClass(HighestReducer.class);
Path inp = new Path(args[0]);
Path out = new Path(args[1]);
FileInputFormat.addInputPath(conf, inp);
FileOutputFormat.setOutputPath(conf, out);
JobClient.runJob(conf)
; return 0; }
public static void main(String[] args) throws Exception
{
int res = ToolRunner.run(new Configuration(), new HighestDriver(),args);
System.exit(res);
}
}

Output:

Department of CSE
BIG DATA LABORATORY

Result:

17
EXP NO: 4
MapReduce program to find the grades of student’s
Date:
AIM: To Develop a MapReduce program to find the grades of student’s.

import java.util.Scanner;
public class JavaExample
{ public static void main(String args[])
{
/* This program assumes that the student has 6 subjects,
* thats why I have created the array of size 6. You can
* change this as per the requirement.
*/ int marks[] = new
int[6]; int i; float
total=0, avg;
Scanner scanner = new Scanner(System.in);
for(i=0; i<6; i++) {
System.out.print("Enter Marks of Subject"+
(i+1)+":"); marks[i] = scanner.nextInt(); total = total +
marks[i];
}
scanner.close();
//Calculating average
here avg = total/6;
System.out.print("The student Grade is: ");
if(avg>=80)
{
System.out.print("A");
}
else if(avg>=60 && avg<80)
{
System.out.print("B");
}
else if(avg>=40 && avg<60)
{

Department of CSE
BIG DATA LABORATORY

System.out.print("C");
}
else
{

System.out.print("D");
}
}
}

Expected Output:

Enter Marks of Subject1:40

Enter Marks of Subject2:80
Enter Marks of Subject3:80
Enter Marks of Subject4:40
Enter Marks of Subject5:60
Enter Marks of Subject6:60
The student Grade is: B Actual
Output:

Result:

EXP NO: 5
MapReduce program to implement Matrix Multiplication
Date:
19

Department of CSE
BIG DATA LABORATORY

AIM: To Develop a MapReduce program to implement Matrix Multiplication.

In mathematics, matrix multiplication or the matrix product is a binary

operation that produces a matrix from two matrices. The definition is motivated by
linear equations and linear transformations on vectors, which have numerous
applications in applied mathematics, physics, and engineering. In more detail, if A
is an n×m matrix and B is an m×p matrix, their matrix product AB is an n× p
matrix, in which the m entries across a row of A are multiplied with the m entries
down a column of B and summed to produce an entry of AB. When two linear
transformations are represented by matrices, then the matrix product represents the
composition of the two transformations.

Algorithm for Map Function.

a. for each element mij of M do

produce (key,value) pairs as ((i,k), (M,j,m ij), for k=1,2,3,.. upto the number of
columns of N

Department of CSE
BIG DATA LABORATORY

b. for each element njk of N do produce (key,value) pairs as ((i,k),(N,j,Njk), for i =

1,2,3,.. Upto the number of rows of M.
c. return Set of (key,value) pairs that each key (i,k), has list with values (M,j,mij)
and (N, j,njk) for all possible values of j.
Algorithm for Reduce Function.

d. for each key (i,k) do

e. sort values begin with M by j in listM sort values begin with N by j in listN
multiply mij and njk for jth value of each list
f. sum up mij x njk return (i,k), Σj=1 mij x njk

Step 1. Download the hadoop jar files with these links.

DownloadHadoopCommon Jar files: https://goo.gl/G4MyHp
$ wget https://goo.gl/G4MyHp -O hadoop-common-2.2.0.jar
DownloadHadoop Mapreduce Jar File: https://goo.gl/KT8yfB $ wget
https://goo.gl/KT8yfB -O hadoop-mapreduce-client-core-2.7.1.jar

Step 2. Creating Mapper file for Matrix Multiplication.

import java.io.DataInput;
import java.io.DataOutput;
import
java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path; import
org.apache.hadoop.io.DoubleWritable; import
org.apache.hadoop.io.IntWritable; import
org.apache.hadoop.io.Text; import
org.apache.hadoop.io.Writable; import
org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.*;
import
21

Department of CSE
BIG DATA LABORATORY

org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.ReflectionUtils;

class Element implements Writable

{int tag; int
index;
double
value;
Element() { tag =
0; index =
0; value =
0.0;
}
Element(int tag, int index, double value)
{this.tag = tag;
this.index =
index; this.value =
value;
}
@Override
public void readFields(DataInput input) throws IOException
{tag = input.readInt();
index = input.readInt();
value =
input.readDouble();
}
@Override
public void write(DataOutput output) throws IOException
{output.writeInt(tag);
output.writeInt(index);
output.writeDouble(value);
}
}
class Pair implements WritableComparable<Pair>
{int i;
int j;

Pair() { i =
0;
22

Department of CSE
BIG DATA LABORATORY

j = 0;
}
Pair(int i, int j)
{ this.i = i;
this.j = j;
}
@Override
public void readFields(DataInput input) throws IOException
{i = input.readInt();
j = input.readInt();
}
@Override
public void write(DataOutput output) throws IOException
{output.writeInt(i);
output.writeInt(j);
}
@Override
public int compareTo(Pair compare)
{if (i > compare.i)
{ return 1;
} else if ( i < compare.i)
{return -1;
} else { if(j > compare.j)
{ return 1;
} else if (j < compare.j)
{return -1;
}
} return
0;
}
public String toString()
{ return i + " " + j + "
";
}
}
public class Multiply { public static class MatriceMapperM extends
Mapper<Object,Text,IntWritable,Element> {

23
@Override

Department of CSE
BIG DATA LABORATORY

public void map(Object key, Text value, Context context)

throws IOException, InterruptedException { String
readLine = value.toString(); String[] stringTokens
= readLine.split(",");

int index = Integer.parseInt(stringTokens[0]);

double elementValue = Double.parseDouble(stringTokens[2]);
Element e = new Element(0, index, elementValue);
IntWritable keyValue = new
IntWritable(Integer.parseInt(stringTokens[1]));
context.write(keyValue, e);
}
}
public static class MatriceMapperN extends Mapper<Object,Text,IntWritable,Element>
{@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException { String
readLine = value.toString(); String[] stringTokens
= readLine.split(","); int index =
Integer.parseInt(stringTokens[1]);
double elementValue = Double.parseDouble(stringTokens[2]);
Element e = new Element(1,index, elementValue);
IntWritable keyValue = new
IntWritable(Integer.parseInt(stringTokens[0]));
context.write(keyValue, e);
}
}
public static class ReducerMxN extends Reducer<IntWritable,Element, Pair,
DoubleWritable> { @Override
public void reduce(IntWritable key, Iterable<Element> values, Context context) throws
IOException, InterruptedException {
ArrayList<Element> M = new ArrayList<Element>();
ArrayList<Element> N = new ArrayList<Element>();
Configuration conf = context.getConfiguration();
for(Element element : values) {
Element tempElement = ReflectionUtils.newInstance(Element.class,
conf);
ReflectionUtils.copy(conf, element, tempElement);

Department of CSE
BIG DATA LABORATORY

if (tempElement.tag == 0)
{ M.add(tempEleme
n
t);
} else if(tempElement.tag == 1)
{N.add(tempElement);
}
}
for(int i=0;i<M.size();i++) { for(int
j=0;j<N.size();j++) {

Pair p = new Pair(M.get(i).index,N.get(j).index); double

multiplyOutput = M.get(i).value * N.get(j).value;

context.write(p, new
DoubleWritable(multiplyOutput)); }
}
}
}
public static class MapMxN extends Mapper<Object, Text, Pair, DoubleWritable>
{@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String readLine = value.toString();
String[] pairValue = readLine.split(" ");
Pair p = new
Pair(Integer.parseInt(pairValue[0]),Integer.parseInt(pairValue[1]));
DoubleWritable val = new
DoubleWritable(Double.parseDouble(pairValue[2]));
context.write(p, val);
}
}
public static class ReduceMxN extends Reducer<Pair, DoubleWritable, Pair,
DoubleWritable>
{ @Override
public void reduce(Pair key, Iterable<DoubleWritable> values, Context context)
throws IOException, InterruptedException {
double sum = 0.0;
for(DoubleWritable value : values) {
25

Department of CSE
BIG DATA LABORATORY

sum += value.get();
}
context.write(key, new
DoubleWritable(sum)); }
}
public static void main(String[] args) throws Exception
{Job job = Job.getInstance();
job.setJobName("MapIntermediate");
job.setJarByClass(Project1.class);
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class,
MatriceMapperM.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class,
MatriceMapperN.class); job.setReducerClass(ReducerMxN.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(Element.class);
job.setOutputKeyClass(Pair.class);
job.setOutputValueClass(DoubleWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.waitForCompletion(true); Job
job2 = Job.getInstance();
job2.setJobName("MapFinalOutput")
; job2.setJarByClass(Project1.class);

job2.setMapperClass(MapMxN.class);
job2.setReducerClass(ReduceMxN.class);

job2.setMapOutputKeyClass(Pair.class);
job2.setMapOutputValueClass(DoubleWritable.class);

job2.setOutputKeyClass(Pair.class);
job2.setOutputValueClass(DoubleWritable.class);

job2.setInputFormatClass(TextInputFormat.class);
job2.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.setInputPaths(job2, new Path(args[2]));

FileOutputFormat.setOutputPath(job2, new Path(args[3]));

Department of CSE
BIG DATA LABORATORY

job2.waitForCompletion(true);
}
}

Step 5. Compiling the program in particular folder named as operation

#!/bin/bash rm -rf

multiply.jar classes

module load

hadoop/2.6.0

mkdir -p classes javac -d classes -cp classes:`$HADOOP_HOME/bin/hadoop

classpath` Multiply.java jar cf multiply.jar -C classes .

echo "end"

Step 6. Running the program in particular folder named as operation

export HADOOP_CONF_DIR=/home/$USER/cometcluster
module load hadoop/2.6.0
myhadoop-
configure.sh start-
dfs.sh start-yarn.sh

hdfs dfs -mkdir -p /user/$USER hdfs dfs -put M-matrix-

large.txt /user/$USER/M-matrix-large.txt hdfs dfs -put N-matrix-
large.txt /user/$USER/N-matrix-large.txt
hadoop jar multiply.jar edu.uta.cse6331.Multiply /user/$USER/M-matrix-large.txt
/user/$USER/N-matrix-large.txt /user/$USER/intermediate /user/$USER/output
rm -rf output-distr mkdir output-distr
hdfs dfs -get /user/$USER/output/part* output-distr

stop-yarn.sh
stop-dfs.sh myhadoop-
cleanup.sh Expected Output:

Department of CSE
BIG DATA LABORATORY

module load hadoop/2.6.0

rm -rf output intermediate

hadoop --config $HOME jar multiply.jar edu.uta.cse6331.Multiply M-matrix-small.txt N-

matrixsmall.txt intermediate output

Actual Output:

Result:
EXP NO: 6 MapReduce to find the maximum electrical consumption in
Date: each year
AIM: To Develop a MapReduce to find the maximum electrical consumption in each year
given electrical consumption for each month in each year.

Department of CSE
BIG DATA LABORATORY

Given below is the data regarding the electrical consumption of an organization. It contains the
monthly electrical consumption and the annual average for various years.

If the above data is given as input, we have to write applications to process it and produce
results such as finding the year of maximum usage, year of minimum usage, and so on. This is a
walkover for the programmers with finite number of records. They will simply write the logic
to produce the required output, and pass the data to the application written.
But, think of the data representing the electrical consumption of all the largescale industries of a
particular state, since its formation.

When we write applications to process such bulk data,

• They will take a lot of time to execute.

• There will be a heavy network traffic when we move data from source to network server and
so on.
To solve these problems, we have the MapReduce framework

Input Data
The above data is saved as sample.txt and given as input. The input file looks as shown below.

1979 23 23 2 43 24 25 26 26 26 26 25 26 25

1980 26 27 28 28 28 30 31 31 31 30 30 30 29

1981 31 32 32 32 33 34 35 36 36 34 34 34 34

1984 39 38 39 39 39 41 42 43 40 39 38 38 40

1985 38 39 39 39 39 41 41 41 00 40 39 39 45

Source code:
import java.util.*; import
java.io.IOException; import
java.io.IOException; import
org.apache.hadoop.fs.Path; import
org.apache.hadoop.conf.*; import
org.apache.hadoop.io.*; import
org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class ProcessUnits
{
//Mapper class
29

Department of CSE
BIG DATA LABORATORY

public static class E_EMapper extends MapReduceBase implements

Mapper<LongWritable ,/*Input key Type */ Text, /*Input value Type*/
Text, /*Output key Type*/ IntWritable> /*Output value Type*/
{
//Map function
publicvoid map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException
{
String line = value.toString(); String lasttoken = null;
StringTokenizer s = new StringTokenizer(line,"\t");
String year = s.nextToken(); while(s.hasMoreTokens())
{
lasttoken=s.nextToken();
}
int avgprice = Integer.parseInt(lasttoken);
output.collect(new Text(year), new IntWritable(avgprice));
}
}
//Reducer class
public static class E_EReduce extends MapReduceBase implements
Reducer< Text, IntWritable, Text, IntWritable >
{
//Reduce function
public void reduce( Text key, Iterator <IntWritable> values, OutputCollector<Text,
IntWritable> output, Reporter reporter) throws
IOException
{
int maxavg=30; int
val=Integer.MIN_VALUE;
while (values.hasNext())
{
if((val=values.next().get())>maxavg)
{
output.collect(key, new IntWritable(val));
}
}
}
}
//Main function
public static void main(String args[])throws Exception

Department of CSE
BIG DATA LABORATORY

{
JobConf conf = new JobConf(ProcessUnits.class);
conf.setJobName("max_eletricityunits"); conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(E_EMapper.class);
conf.setCombinerClass(E_EReduce.class);
conf.setReducerClass(E_EReduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}

Expected OUTPUT:
Input:
Kolkata,56
Jaipur,45
Delhi,43
Mumbai,34
Goa,45
Kolkata,35
Jaipur,34
Delhi,32
Output:
Kolkata 56
Jaipur 45
Delhi 43
Mumbai 34

Actual Output:
Result:

EXP NO: 7 MapReduce to analyze weather data set and print whether the
Date: day is shinny or cool
AIM: To Develop a MapReduce to analyze weather data set and print whether the day is shinny
or cool day.

Department of CSE
BIG DATA LABORATORY

NOAA’s National Climatic Data Center (NCDC) is responsible for preserving,

monitoring, assessing, and providing public access to weather data.

NCDC provides access to daily data from the U.S. Climate Reference Network /
U.S. Regional Climate Reference Network (USCRN/USRCRN) via anonymous ftp at:

Dataset ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01

After going through wordcount mapreduce guide, you now have the basic idea of
how a mapreduce program works. So, let us see a complex mapreduce program
on weather dataset. Here I am using one of the dataset of year 2015 of Austin,
Texas . We will do analytics on the dataset and classify whether it was a hot day or
a cold day depending on the temperature recorded by NCDC.
NCDC gives us all the weather data we need for this

mapreduce project. The dataset which we will be using

looks like below snapshot.

Department of CSE
BIG DATA LABORATORY

ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2015/CRND
0103-2015- TX_Austin_33_NW.txt

Step 1

Download the complete project using below link.

https://drive.google.com/file/d/0B2SFMPvhXPQ5bUdoVFZsQjE2ZDA/view?

usp=sharing
import java.io.IOException; import
java.util.Iterator; import
org.apache.hadoop.fs.Path; import
org.apache.hadoop.io.LongWritable; import
org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; import
org.apache.hadoop.mapreduce.lib.input.TextInputFormat; import
org.apache.hadoop.mapreduce.Job; import
org.apache.hadoop.mapreduce.Mapper; import
org.apache.hadoop.mapreduce.Reducer; import
org.apache.hadoop.conf.Configuration; public class MyMaxMin {

Department of CSE
BIG DATA LABORATORY

public static class MaxTemperatureMapper extends

Mapper<LongWritable, Text, Text, Text> {
/**
* @method map
* This method takes the input as text data type
* Now leaving the first five tokens,it takes 6th token is taken as temp_max and
* 7th token is taken as temp_min. Now temp_max > 35 and temp_min < 10 are passed to the reducer.
*/ @Override
public void map(LongWritable arg0, Text Value, Context 2 context) throws IOException,
InterruptedException {
//Converting the record (single line) to String and storing it in a String variable line

String line = Value.toString();

//Checking if the line is not empty
if (!(line.length() == 0)) {

//date

String date = line.substring(6, 14);

//maximum temperature float
temp_Max = Float
parseFloat(line.substring(39, 45).trim());
//minimum temperature
float temp_Min = Float
parseFloat(line.substring(47, 53).trim());
//if maximum temperature is greater than 35 , its a hot day
if (temp_Max > 35.0) { // Hot day
context.write(new Text("Hot Day " + date),
new Text(String.valueOf(temp_Max)));
}
//if minimum temperature is less than 10, it’s a cold day if

(temp_Min < 10) {

Department of CSE
BIG DATA LABORATORY

// Cold day context.write(new

Text("Cold Day " + date),

new Text(String.valueOf(temp_Min)));
}
}
}
}
//Reducer
*MaxTemperatureReducer class is static and extends Reducer abstract having

four hadoop generics type Text, Text, Text, Text.

public static class MaxTemperatureReducer extends Reducer<Text, Text, Text,

Text> {

public void reduce (Text Key, Iterator<Text> Values, Context context) throws
IOException, Interrupted Exception { String
temperature = Values.next().toString();
context.write(Key, new Text(temperature));
}
}
public static void main(String[] args) throws Exception
{Configuration conf = new Configuration();
Job job = new Job(conf, "weather example");

job.setJarByClass(MyMaxMin.class);

job.setMapOutputKeyClass(Text.class);

Department of CSE
BIG DATA LABORATORY

job.setMapOutputValueClass(Text.class);

job.setMapperClass(MaxTemperatureMapper.class);

job.setReducerClass(MaxTemperatureReducer.class);

job.setInputFormatClass(TextInputFormat.class);

job.setOutputFormatClass(TextOutputFormat.class);

Path OutputPath = new Path(args[1]);

FileInputFormat.addInputPath(job, new Path(args[0]));

FileOutputFormat.setOutputPath(job, new Path(args[1]));

OutputPath.getFileSystem(conf).delete(OutputPath);

System.exit(job.waitForCompletion(true) ? 0 : 1);

Import the project in eclipse IDE in the same way it was told in earlier guide and
change the jar paths with the jar files present in the lib directory of this project.

When the project is not having any error, we will export it as a jar file, same as we
did in wordcount mapreduce guide. Right Click on the Project file and click on
Export. Select jar file.

Give the path where you want to save the file.

Department of CSE
BIG DATA LABORATORY

Click on Finish to export.

Department of CSE
BIG DATA LABORATORY

You can download the jar file directly using below link temperature.jar

https://drive.google.com/file/d/0B2SFMPvhXPQ5RUlZZDZSR3FYVDA/view?us

p=sharing

Download Dataset used by me using below

link weather_data.txt

https://drive.google.com/file/d/0B2SFMPvhXPQ5aFVILXAxbFh6ejA/view?usp=s
haring

OUTPUT:

Department of CSE
BIG DATA LABORATORY

Result:

Department of CSE
BIG DATA LABORATORY

EXP NO: 8 MapReduce program to find the number of products sold in

Date: each country
AIM: Develop a MapReduce program to find the number of products sold in each country by
considering sales data containing fields like
Tranction Prod Pri Payment Na Ci St Cou Account_ Last_L Latit Longi
_Date uct ce _Type me ty ate ntry Created ogin ude tude

Source code:
public class Driver extends Configured implements Tool
{enum Counters { DISCARDED_ENTRY
}
public static void main(String[] args) throws Exception { ToolRunner.run(new Driver(), args);
} public int run(String[] args) throws Exception { Configuration configuration =

getConf();

Job job = Job.getInstance(configuration);

job.setJarByClass(Driver.class);

job.setMapperClass(Mapper.class); job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);

job.setCombinerClass(Combiner.class); job.setReducerClass(Reducer.class);
job.setOutputKeyClass(LongWritable.class); job.setOutputValueClass(Text.class);

FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job,

new Path(args[1]));

return job.waitForCompletion(true) ? 0 : -1;

}
}

public class Mapper extends org.apache.hadoop.mapreduce.Mapper< LongWritable,

Text, LongWritable, Text
{
@Override

protected void
map( LongWritabl
e key,

Text value,
42

Department of CSE
BIG DATA LABORATORY

org.apache.hadoop.mapreduce.Mapper<

43
LongWritable,
Text,
LongWritable,
Text

>.Context context

) throws IOException, InterruptedException {

// parse the CSV line

ArrayList<String> values = this.parse(value.toString());

// validate the parsed values if

(this.isValid(values)) {

// fetch the third and the fourth column

String time = values.get(3);

String year = values.get(2)

.substring(values.get(2).length() - 4);

// convert time to minutes (e.g. 1542 -> 942) int

minutes = Integer.parseInt(time.substring(0, 2))

* 60 + Integer.parseInt(time.substring(2,4));

// create the aggregate atom (a/n)

// with a = time in minutes and n = 1

context.write( new

LongWritable(Integer.parseInt(year)), new

Text(Integer.toString(minutes) + ":1")

);

} else
{

// invalid line format, so we increment a counter

Department of CSE
BIG DATA LABORATORY

context.getCounter(Driver.Counters.DISCARDED_ENTRY).increment(1);
}} protected boolean
isValid(ArrayList<String> values)

{return values.size() > 3

&& values.get(2).length() == 10
&& values.get(3).length() == 4;
} protected ArrayList<String> parse(String
line)
{ ArrayList<String> values = new
ArrayList<>();String current = "";
boolean escaping = false; for (int i
= 0; i < line.length(); i++){ char c =
line.charAt(i); if (c == '"')
{ escaping = !escaping;
} else if (c == ',' && !
escaping)
{values.add(current);
current = "";
} else { current
+= c;
}
}
values.add(current);
return values;
}
}
public class Combiner extends org.apache.hadoop.mapreduce.Reducer< LongWritable,Text,
LongWritable, Text
{
@Override
protected void reduce( LongWritable key,
Iterable<Text> values,
Context context
) throws IOException, InterruptedException
{ Long n = 0l; Long a = 0l;
Iterator<Text> iterator =
values.iterator(); // calculate intermediate
aggregates while (iterator.hasNext()) {
44

Department of CSE
BIG DATA LABORATORY

String[] atom = iterator.next().toString().split(":");

a += Long.parseLong(atom[0]);
n += Long.parseLong(atom[1]);
}
context.write(key, new Text(Long.toString(a) + ":" + Long.toString(n)));
}
}
public class Reducer extends org.apache.hadoop.mapreduce.Reducer<
LongWritable,
Text,
LongWritable,
Text
{
@Override protected
void
reduce( LongWritable
key, Iterable<Text>
values,Context context
) throws IOException, InterruptedException
{ Long n = 0l; Long a = 0l;
Iterator<Text> iterator =
values.iterator(); // calculate the finale
aggregate while (iterator.hasNext()) {
String[] atom = iterator.next().toString().split(":");
a += Long.parseLong(atom[0]);
n += Long.parseLong(atom[1]);
}
// cut of seconds int average =
Math.round(a / n);

// convert the average minutes back to time

context.write( key
,
new Text( Integer.toString(average
/ 60)
+ ":" + Integer.toString(average % 60)
)
);
46
}

Department of CSE
BIG DATA LABORATORY

Expected Output:

Actual Output:

47
EXP NO: 9 MapReduce program to find the tags associated with each
Date: movie by analyzing movie lens data

Department of CSE
BIG DATA LABORATORY

AIM: To Develop a MapReduce program to find the tags associated with each movie by
analyzing movie lens data.

For this analysis the Microsoft R Open distribution was used. The reason for this was its
multithreaded performance as described here. Most of the packages that were used come from
the tidyverse - a collection of packages that share common philosophies of tidy data. The tidytext
and wordcloud packages were used for some text processing. Finally, the doMC package was
used to embrace the multithreading in some of the custom functions which will be described
later. doMC package is not available on Windows. Use doParallel package instead. Driver1.java
package KPI_1;

import org.apache.hadoop.conf.Configuration; import

org.apache.hadoop.fs.Path; import
org.apache.hadoop.io.LongWritable; import
org.apache.hadoop.io.Text; import
org.apache.hadoop.mapreduce.Job; import
org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import
org.apache.hadoop.mapreduce.lib.input.MultipleInputs; import
org.apache.hadoop.mapreduce.lib.input.TextInputFormat; import
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser; public class
Driver1
{
public static void main(String[] args) throws Exception
{Path firstPath = new Path(args[0]);
Path sencondPath = new Path(args[1]);

Path outputPath_1 = new Path(args[2]);

Path outputPath_2 = new Path(args[3]);

Configuration conf = new Configuration();

Job job = Job.getInstance(conf, "Most Viewed Movies");
49
//set Driver class
job.setJarByClass(Driver1.class);

Department of CSE
BIG DATA LABORATORY

//output format for mapper

job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);

//output format for reducer

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);

//use MultipleOutputs and specify different Record class and Input formats
MultipleInputs.addInputPath(job, firstPath, TextInputFormat.class,
movieDataMapper.class);
MultipleInputs.addInputPath(job, sencondPath, TextInputFormat.class,
ratingDataMapper.class); //set
Reducer class
job.setReducerClass(dataReducer.class);
FileOutputFormat.setOutputPath(job, outputPath_1);
job.waitForCompletion(true)
Job job1 = Job.getInstance(conf, "Most Viewed Movies2");
job1.setJarByClass(Driver1.class);
job1.setMapperClass(topTenMapper.class);
job1.setReducerClass(topTenReducer.class);
job1.setMapOutputKeyClass(Text.class);
job1.setMapOutputValueClass(LongWritable.class);
job1.setOutputKeyClass(LongWritable.class);
job1.setOutputValueClass(Text.class);

FileInputFormat.addInputPath(job1, outputPath_1);
FileOutputFormat.setOutputPath(job1, outputPath_2);
job1.waitForCompletion(true);
}
}
50

Department of CSE

CCS334 BDA Lab Manual
No ratings yet
CCS334 BDA Lab Manual
35 pages
Big Data & Analytics Lab Manual
No ratings yet
Big Data & Analytics Lab Manual
51 pages
Functions, Limits & Continuity, Derivatives
100% (1)
Functions, Limits & Continuity, Derivatives
3 pages
VL5PJ - Datasheet (Low) - LG Video Wall - 211203 (20211220 - 140913)
No ratings yet
VL5PJ - Datasheet (Low) - LG Video Wall - 211203 (20211220 - 140913)
3 pages
Advance Music Technology Bje-440-1
No ratings yet
Advance Music Technology Bje-440-1
69 pages
E-Mail Function For Correspondence
No ratings yet
E-Mail Function For Correspondence
4 pages
Big Data Lab Manual
No ratings yet
Big Data Lab Manual
32 pages
Ccs334 - Big Data Analytics
75% (4)
Ccs334 - Big Data Analytics
2 pages
Big Data Analysis 3170722 Lab Manual
No ratings yet
Big Data Analysis 3170722 Lab Manual
68 pages
CSF443 Lab-Report Nimish Shandilya 1000016934
No ratings yet
CSF443 Lab-Report Nimish Shandilya 1000016934
17 pages
F75 Ver.6.0-DCS
0% (1)
F75 Ver.6.0-DCS
286 pages
BDA Practical File
No ratings yet
BDA Practical File
61 pages
Rush
No ratings yet
Rush
90 pages
Practical Final Ai
No ratings yet
Practical Final Ai
90 pages
Bda Lab S
No ratings yet
Bda Lab S
92 pages
Bda Lab Manual - Cse 8 Sem - Compl
No ratings yet
Bda Lab Manual - Cse 8 Sem - Compl
57 pages
Data Analytics Lab Manual
No ratings yet
Data Analytics Lab Manual
43 pages
BDA Lab ManuaL
No ratings yet
BDA Lab ManuaL
83 pages
Big Data Lab Manual Printout
No ratings yet
Big Data Lab Manual Printout
51 pages
Bda Record
No ratings yet
Bda Record
83 pages
Bda Lab Manual 2024
No ratings yet
Bda Lab Manual 2024
45 pages
Big Data Analytics - Sem 7 CVMU
No ratings yet
Big Data Analytics - Sem 7 CVMU
4 pages
BDA Journal
No ratings yet
BDA Journal
52 pages
Os Notes
No ratings yet
Os Notes
65 pages
Bad601 Lab Maual
No ratings yet
Bad601 Lab Maual
34 pages
Bda Megh
No ratings yet
Bda Megh
50 pages
KCC Institute of Technology and Management: Big Data and Analytics Lab File BCDS651
No ratings yet
KCC Institute of Technology and Management: Big Data and Analytics Lab File BCDS651
30 pages
BDA Manual
No ratings yet
BDA Manual
41 pages
BDA Lab Manual - Organized
No ratings yet
BDA Lab Manual - Organized
69 pages
BDA Lab Manual
No ratings yet
BDA Lab Manual
62 pages
Data Science
No ratings yet
Data Science
82 pages
Big Data Manual
No ratings yet
Big Data Manual
82 pages
Chapter 5 - AISe - Student
No ratings yet
Chapter 5 - AISe - Student
78 pages
CS702 Big Data Programs
No ratings yet
CS702 Big Data Programs
59 pages
Big Data Lab
No ratings yet
Big Data Lab
159 pages
Big Data Analytics Lab Manual (BE AI&DS)
No ratings yet
Big Data Analytics Lab Manual (BE AI&DS)
29 pages
Bda Lab
No ratings yet
Bda Lab
94 pages
CS702 Big Data Programs
No ratings yet
CS702 Big Data Programs
58 pages
Lesson 5.1: Slide Design and Elements, and Animation and Media
No ratings yet
Lesson 5.1: Slide Design and Elements, and Animation and Media
7 pages
Bda Lab Record
No ratings yet
Bda Lab Record
32 pages
Bda Lab
No ratings yet
Bda Lab
36 pages
Ba Lab Record-It b2022-26
No ratings yet
Ba Lab Record-It b2022-26
43 pages
BDA 1.docx PUGALARASAN
No ratings yet
BDA 1.docx PUGALARASAN
13 pages
Cp5261 Da Lab Me-Cse 2021 - Edit
No ratings yet
Cp5261 Da Lab Me-Cse 2021 - Edit
88 pages
BDH Lab Manual FINAL (Hadoop)
No ratings yet
BDH Lab Manual FINAL (Hadoop)
29 pages
Bigdata Lab
No ratings yet
Bigdata Lab
55 pages
CS-702 (D) BigData
No ratings yet
CS-702 (D) BigData
61 pages
How To Clear Bios Info
No ratings yet
How To Clear Bios Info
5 pages
Practical-1: Aim:-Make A Single Node Cluster in Hadoop. Solution
No ratings yet
Practical-1: Aim:-Make A Single Node Cluster in Hadoop. Solution
49 pages
Nosql Not Only SQL: Databases That Don't Require A Fixed Schema
No ratings yet
Nosql Not Only SQL: Databases That Don't Require A Fixed Schema
21 pages
Course: Big Data Analytics Lab Scheme: 2017
No ratings yet
Course: Big Data Analytics Lab Scheme: 2017
25 pages
20dce017 Bda Pracfil
No ratings yet
20dce017 Bda Pracfil
41 pages
BDT Lab Manual
No ratings yet
BDT Lab Manual
48 pages
@bigdatalabfile 09
No ratings yet
@bigdatalabfile 09
35 pages
Big Data Analytics IT
No ratings yet
Big Data Analytics IT
55 pages
CP5261 Data Analytics Laboratory LTPC0042 Objectives
No ratings yet
CP5261 Data Analytics Laboratory LTPC0042 Objectives
80 pages
Bda Da1
No ratings yet
Bda Da1
14 pages
BDA Practicalfile
No ratings yet
BDA Practicalfile
19 pages
Cardless Transaction in ATM
No ratings yet
Cardless Transaction in ATM
6 pages
BGM220P Wireless Gecko Bluetooth Module Data Sheet
No ratings yet
BGM220P Wireless Gecko Bluetooth Module Data Sheet
49 pages
Revision 2 QP
No ratings yet
Revision 2 QP
6 pages
Profile: Kevin Njoroge - It Graduate
No ratings yet
Profile: Kevin Njoroge - It Graduate
3 pages
Unit 1 Class and Objects: Structure Page Nos
No ratings yet
Unit 1 Class and Objects: Structure Page Nos
22 pages
Lenovo BIOS Setup Using Windows Management Instrumentation Deployment Guide V
No ratings yet
Lenovo BIOS Setup Using Windows Management Instrumentation Deployment Guide V
3 pages
PST Unit 4
No ratings yet
PST Unit 4
10 pages
Bda Record 18071a0597-1
No ratings yet
Bda Record 18071a0597-1
28 pages
ITT107 Jelise Hayden Assignment#2
No ratings yet
ITT107 Jelise Hayden Assignment#2
27 pages
Mobilett XP Digital - Brochure
No ratings yet
Mobilett XP Digital - Brochure
14 pages
Laboratory Exercise: SAP Business One As An ERP System and Its Basic Navigation
No ratings yet
Laboratory Exercise: SAP Business One As An ERP System and Its Basic Navigation
9 pages
Lab Manual Big Data Analytics Lab (LC-CSE-410G) : Department of Computer Science and Engineering
No ratings yet
Lab Manual Big Data Analytics Lab (LC-CSE-410G) : Department of Computer Science and Engineering
28 pages
The Invigilator App-Student User Guide-UNISA
No ratings yet
The Invigilator App-Student User Guide-UNISA
20 pages
MAA Data Sheet
No ratings yet
MAA Data Sheet
6 pages
How To Remove Objects in DaVinci Resolve (In Color, Fusion or Edit Tabs) - EasyEdit
No ratings yet
How To Remove Objects in DaVinci Resolve (In Color, Fusion or Edit Tabs) - EasyEdit
16 pages
Sap Ui 5 and Js Java Script and Fiori Contents
No ratings yet
Sap Ui 5 and Js Java Script and Fiori Contents
16 pages
Virtual Robotics Contest - Annex Forms
No ratings yet
Virtual Robotics Contest - Annex Forms
7 pages
Lab Syllabus Format
No ratings yet
Lab Syllabus Format
4 pages
STEM 1 Assignment 1.odt
No ratings yet
STEM 1 Assignment 1.odt
3 pages
162 SWF Ma6 Prospektus
No ratings yet
162 SWF Ma6 Prospektus
2 pages
ED-2002-045 ModbusRTU Library Ver 1.2
100% (1)
ED-2002-045 ModbusRTU Library Ver 1.2
26 pages
DAN Lab ManuaL
No ratings yet
DAN Lab ManuaL
53 pages
Big Data Testing
100% (1)
Big Data Testing
34 pages
AAU3940 Technical Specifications (V100R016C10 - 02) (PDF) - EN
No ratings yet
AAU3940 Technical Specifications (V100R016C10 - 02) (PDF) - EN
28 pages
Big Data
No ratings yet
Big Data
4 pages
Radar Unix: A Complete Package For GPR Data Processing: Gilles Grandjean, Herve Durand
No ratings yet
Radar Unix: A Complete Package For GPR Data Processing: Gilles Grandjean, Herve Durand
9 pages
Big Data & Hadoop - Course Curriculum
No ratings yet
Big Data & Hadoop - Course Curriculum
6 pages
Big Data Hadoop - Course Curriculum - V1
No ratings yet
Big Data Hadoop - Course Curriculum - V1
7 pages
Hadoop Course Content
No ratings yet
Hadoop Course Content
3 pages
Big Data Analytics
From Everand
Big Data Analytics
Nitin Kumar Yadav
No ratings yet
Exploring Hadoop Ecosystem (Volume 1): Batch Processing
From Everand
Exploring Hadoop Ecosystem (Volume 1): Batch Processing
Wei Liu
No ratings yet
Hadoop Blueprints
From Everand
Hadoop Blueprints
Anurag Shrivastava
No ratings yet