Assignment No.
Title: Design a distributed application using MapReduce (Using Java) which processes a log file of a system.
List out the users who have logged in for the maximum period on the system. Use a simple log file from the
Internet and process it in pseudo-distributed mode on the Hadoop platform.
Objectives: To learn the concepts of Mapper and Reducer and implement them for log file processing.
Aim: To implement a MapReduce program that will process a log file of a system.
Theory:
Introduction
MapReduce is a framework with which we can write applications that process huge amounts of data,
in parallel, on large clusters of commodity hardware in a reliable manner. MapReduce is a processing
technique and a programming model for distributed computing based on Java.
The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map takes a set of
data and converts it into another set of data, where individual elements are broken down into tuples
(key/value pairs).
The Reduce task takes the output from a map as its input and combines those data tuples into a smaller
set of tuples. As the name MapReduce implies, the reduce task is always performed after the map job.
Under the MapReduce model, the data processing primitives are called mappers and reducers.
Once we write an application in the MapReduce form, scaling the application to run over hundreds,
thousands, or even tens of thousands of machines in a cluster is merely a configuration change.
Algorithm
A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce
stage (a small worked example is given after the list).
o Input: a file or directory
o Output: a sorted file of <key, value> pairs
1. Map stage:
o The map or mapper's job is to process the input data.
o Generally, the input data is in the form of a file or directory and is stored in the
Hadoop Distributed File System (HDFS).
o The input file is passed to the mapper function line by line.
o The mapper processes the data and creates several small chunks of data.
2. Shuffle stage:
o This phase consumes the output of the mapping phase.
o Its task is to consolidate the relevant records from the mapping phase output, grouping
all values for the same key together.
3. Reduce stage:
o The reducer's job is to process the grouped data that comes from the shuffle stage.
o After processing, it produces a new, smaller set of output, which is stored in
HDFS.
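For example, suppose (hypothetically) the map stage emits the pairs (user1, 10), (user2, 5) and
(user1, 7) from three log records. The shuffle stage groups these by key into (user1, [10, 7]) and
(user2, [5]), and the reduce stage combines each group, for instance by summing, to produce
(user1, 17) and (user2, 5).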
Inputs and Outputs:
The MapReduce framework operates on <key, value> pairs; that is, the framework views the input to
the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job,
conceivably of different types.
The key and value classes have to be serializable by the framework and hence need to implement
the Writable interface. Additionally, the key classes have to implement the WritableComparable
interface to facilitate sorting by the framework.
Input and output types of a MapReduce job: (Input) <k1, v1> -> map -> <k2, v2> ->
reduce -> <k3, v3> (Output).
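In the example program below, k1 is the byte offset of a line (LongWritable) and v1 is the line itself
(Text); the mapper emits k2 as a Text key (the country name) and v2 as an IntWritable count of 1; the
reducer emits k3 as the same Text key and v3 as the IntWritable total for that key.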
Figure: An example program to illustrate the working of a MapReduce program
#Mapper Class
package SalesCountry;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String valueString = value.toString();
        // Split the input line on '-' and emit the first field (the country) with a count of 1
        String[] singleCountryData = valueString.split("-");
        output.collect(new Text(singleCountryData[0]), one);
    }
}
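The SalesMapper above emits a count of 1 for each record. For the assignment itself (finding the users
logged in for the maximum period), a mapper along the following lines could be used instead. This is only
a sketch: the class name LogDurationMapper, the assumed line format user-duration, and the use of
minutes as the duration unit are assumptions that must be adapted to whatever log file is downloaded.

package SalesCountry;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// Hypothetical mapper sketch: emits (userName, sessionDuration) for every log record,
// assuming each line looks like "user-duration" with the duration in minutes.
public class LogDurationMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String[] fields = value.toString().split("-");
        if (fields.length == 2) {
            try {
                int duration = Integer.parseInt(fields[1].trim());
                output.collect(new Text(fields[0].trim()), new IntWritable(duration));
            } catch (NumberFormatException e) {
                // Skip malformed lines instead of failing the whole job
            }
        }
    }
}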
#Reducer Class
package SalesCountry;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesCountryReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text t_key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        Text key = t_key;
        int frequencyForCountry = 0;
        while (values.hasNext()) {
            // Add up the counts emitted by the mapper for this key
            IntWritable value = values.next();
            frequencyForCountry += value.get();
        }
        output.collect(key, new IntWritable(frequencyForCountry));
    }
}
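A matching reducer (again only a sketch, with the hypothetical name LogDurationReducer) would add up
all durations emitted for one user; the user with the largest total in the final output is the one who was
logged in for the maximum period.

package SalesCountry;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// Hypothetical reducer sketch: sums the per-session durations for each user.
public class LogDurationReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text user, Iterator<IntWritable> durations,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int total = 0;
        while (durations.hasNext()) {
            total += durations.next().get();
        }
        output.collect(user, new IntWritable(total));
    }
}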
#Driver Class
package SalesCountry;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class SalesCountryDriver {
    public static void main(String[] args) {
        JobClient my_client = new JobClient();

        // Create a configuration object for the job
        JobConf job_conf = new JobConf(SalesCountryDriver.class);

        // Set a name for the job
        job_conf.setJobName("SalePerCountry");

        // Specify the data types of the output key and value
        job_conf.setOutputKeyClass(Text.class);
        job_conf.setOutputValueClass(IntWritable.class);

        // Specify the Mapper and Reducer classes
        job_conf.setMapperClass(SalesCountry.SalesMapper.class);
        job_conf.setReducerClass(SalesCountry.SalesCountryReducer.class);

        // Specify the input and output formats
        job_conf.setInputFormat(TextInputFormat.class);
        job_conf.setOutputFormat(TextOutputFormat.class);

        // Set input and output directories from the command-line arguments:
        // args[0] = input directory on HDFS, args[1] = output directory to be created for the result
        FileInputFormat.setInputPaths(job_conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));

        my_client.setConf(job_conf);
        try {
            // Run the job
            JobClient.runJob(job_conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
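If the job is adapted to the hypothetical LogDurationMapper and LogDurationReducer sketched above,
only the setJobName, setMapperClass, and setReducerClass calls in this driver need to change; the rest
of the driver and the compilation and execution steps below stay the same.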
Steps for Compilation & Execution of Program:
Create a working folder and set its permissions:
# sudo mkdir analyzelogs
# sudo chmod -R 777 analyzelogs/
# sudo chown -R hduser analyzelogs/
Copy the source files (SalesMapper.java, SalesCountryReducer.java, SalesCountryDriver.java) to the analyzelogs folder:
# sudo cp /home/mde/Desktop/count_logged_users/* ~/analyzelogs/
Start Hadoop
# start-dfs.sh
# start-yarn.sh
# jps
Move into the analyzelogs folder and make the copied files readable:
# cd analyzelogs/
# ls -ltr
# sudo chmod +r *.*
Set the classpath (the jar names below are for Hadoop 2.9.0; adjust the version numbers to the installed release):
# export CLASSPATH="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.9.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.9.0.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.9.0.jar:~/analyzelogs/SalesCountry/*:$HADOOP_HOME/lib/*"
Compile Java Files
# javac -d . SalesMapper.java SalesCountryReducer.java SalesCountryDriver.java
Create the Manifest file:
# sudo gedit Manifest.txt
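The Manifest.txt typically needs only one entry naming the driver class of the jar (assuming the package
and class names used in this example); make sure the line ends with a newline, otherwise the jar tool may
ignore it:
Main-Class: SalesCountry.SalesCountryDriver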
Build the jar from the manifest and the compiled classes:
# jar -cfm analyzelogs.jar Manifest.txt SalesCountry/*.class
Create the Input Directory and Copy the Log File onto HDFS
# sudo mkdir ~/input2000
# sudo cp access_log_short.csv ~/input2000/
# $HADOOP_HOME/bin/hdfs dfs -put ~/input2000 /
Run the Job and View the Output
# $HADOOP_HOME/bin/hadoop jar analyzelogs.jar /input2000 /output2000
# $HADOOP_HOME/bin/hdfs dfs -cat /output2000/part-00000
Stop Hadoop
# stop-all.sh
# jps
Conclusion: Thus, we have learnt how to design a distributed application using MapReduce and process
a log file of a system.