MapReduce - Notes
MapReduce is an execution model in the Hadoop framework. It divides processing into two separate phases:
1. Mapper
2. Reducer
Mapper
The Mapper takes its input from the raw input file in HDFS, and the output of the Mapper is called shuffled data
(the intermediate result). This output is sent to the Reducer, which aggregates the data and writes the result into
HDFS.
The Mapper's input is a collection of <key, value> pairs, and its output is also a collection of
<key, value> pairs. When an input file is submitted, the framework converts each row of the file into a <key, value>
pair.
Note: The line's offset (its byte position in the file) becomes the key and the entire line becomes the value.
Developer logic has to extract the required key and value from the value part of this input pair.
Developer logic in the Mapper separates out the <key, value> pair.
Developer logic in the Reducer generates an aggregate (sum(), avg(), min(), max(), count(), etc.).
Example
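To make this flow concrete, here is a small hand-worked illustration; the two-line input file and the word-counting logic are hypothetical, chosen only to show how keys and values move through the phases.

Input file in HDFS:
hello world
hello hadoop

Framework gives the Mapper:    <0, "hello world">   <12, "hello hadoop">   (key = byte offset, value = line)
Mapper output:                 <hello, 1>  <world, 1>  <hello, 1>  <hadoop, 1>
Shuffled data (intermediate):  <hadoop, <1>>  <hello, <1, 1>>  <world, <1>>
Reducer output written to HDFS: <hadoop, 1>  <hello, 2>  <world, 1>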
Reducer
The Reducer receives the Mapper's shuffled output, aggregates it, and writes the final result into HDFS.
Map-Reduce programs can be written in several languages:
1. Java
2. Python
3. C++
4. Ruby
The following map-reduce programs are implemented in JAVA
Structure of Map-Reduce program in Java:
Note: The custom Mapper class and Reducer class should be static.
In the Mapper class, developer logic is implemented in the map() method.
In the Reducer class, developer logic is implemented in the reduce() method.
When the program is compiled we get 3 separate classes:
1. Main class (or) Driver class
2. Mapper class
3. Reducer class
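A bare skeleton of this structure is sketched below; the class names, type parameters, and package are illustrative, and the method bodies are left empty.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
public class MyJob                                    // 1. Main class (or) Driver class
{
    public static class MyMap                         // 2. Mapper class (static)
            extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        public void map(LongWritable k, Text v, Context con)
                throws IOException, InterruptedException
        {
            // developer logic goes here
        }
    }
    public static class MyReducer                     // 3. Reducer class (static)
            extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text k, Iterable<IntWritable> vals, Context con)
                throws IOException, InterruptedException
        {
            // developer logic goes here
        }
    }
    public static void main(String[] args) throws Exception
    {
        // job configuration and submission go here
    }
}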
The following are the common Map-Reduce classes used in every program:
1. Mapper
2. Reducer
3. Job
4. Text
5. IntWritable
6. LongWritable
7. Path
8. Configuration
9. FileInputFormat
10. FileOutputFormat
11. GenericOptionsParser
Note:
The first 10 classes above are available in hadoop-core.jar; to access them you must configure hadoop-core.jar
on the classpath. The path for this jar file is /usr/lib/hadoop-0.20. The 11th class is available in
commons-cli-1.2.jar, which must also be configured; the path for this jar file is /usr/lib/hadoop-0.20/lib.
Mapper: To define a custom Mapper class we need org.apache.hadoop.mapreduce.Mapper. Here Mapper is the class and
the remaining part is the package. Whenever a Java class extends the Mapper class, that class gets the Mapper
functionality.
Example:
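A minimal sketch of such a declaration, as it would appear inside the driver class (the class name and the four type parameters are illustrative):

// Type parameters: input key, input value, output key, output value
public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable>
{
    // map() is overridden here
}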
Context
This class is part of the Map-Reduce API; it is an inner class of both Mapper and Reducer.
Context of Mapper
It shuffles the data, meaning duplicate keys are not allowed in the key place; the values of a repeated key are
grouped together.
Example
context(k1, v1) ---------> <k1, v1>
context(k1, v2) ---------> <k1, <v1, v2>>   (the duplicate key k1 is grouped with both values)
context(k2, v1) ---------> <k2, v1>
Context of Reducer
It writes <key, value> pairs into HDFS. It does not check whether a key is duplicated or not.
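For instance, inside reduce() a write such as the following (the key and value here are illustrative)

con.write(new Text("male"), new IntWritable(50000));

appears in the job's HDFS output file (part-r-00000, with the default TextOutputFormat) as the tab-separated line:

male	50000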
Reducer Class:
To define a custom Reducer class we need org.apache.hadoop.mapreduce.Reducer. When a Java class extends the
Reducer class, that class gets the Reducer functionality.
Example:
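A minimal sketch of such a declaration, as it would appear inside the driver class (names and type parameters are illustrative; note that the input types match the Mapper's output types):

// Type parameters: input key, input value, output key, output value
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    // reduce() is overridden here
}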
Note: The input key type of the Reducer is equal to the output key type of the Mapper, and the input value type of
the Reducer is equal to the output value type of the Mapper.
The commonly used Map-Reduce data types are:
1. Text
2. IntWritable
3. LongWritable
Java data types are compatible with the operating system, whereas Map-Reduce types are compatible with
HDFS. While writing results into HDFS through the Context, we should use the Map-Reduce types.
Text :
It is equivalent to Java “String” type. Package is org.apache.hadoop.io.Text
IntWritable
It is equivalent to Java “int”. Package is org.apache.hadoop.io.IntWritable
LongWritable
It is equivalent to Java “long”. Package is org.apache.hadoop.io.LongWritable
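A small sketch of moving between Java types and these Map-Reduce types, as one might do inside map() or reduce() (the values are illustrative):

// Java -> Map-Reduce types (wrap the Java value)
Text t = new Text("hello");
IntWritable iw = new IntWritable(25);
LongWritable lw = new LongWritable(100L);
// Map-Reduce -> Java types (unwrap with toString()/get())
String s = t.toString();   // "hello"
int i = iw.get();          // 25
long l = lw.get();         // 100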
Example
public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable>
{
    public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException
    {
        String line = v.toString();
        String y = line.substring(5, 9);                    // key field extracted from the line
        int t = Integer.parseInt(line.substring(12, 14));   // value field extracted from the line
        con.write(new Text(y), new IntWritable(t));
    }
}
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text y, Iterable<IntWritable> vals, Context con)
            throws IOException, InterruptedException
    {
        int m = 0;
        for (IntWritable v : vals)
            m = Math.max(m, v.get());         // find the maximum value for this key
        con.write(y, new IntWritable(m));     // write once per key, after the loop
    }
}
Configuration
Used to load the default parameters of HDFS and Map-Reduce.
Package org.apache.hadoop.conf.Configuration
Example: Configuration con = new Configuration();
GenericOptionsParser
Explanation
GenericOptionsParser separates the generic Hadoop options from the remaining application arguments (here, the
input and output file paths).
GenericOptionsParser gop = new GenericOptionsParser(con, args);
String[] files = gop.getRemainingArgs();
files[0] = /user/Myself/file1.txt
files[1] = /user/urself
In the above explanation, files[0] and files[1] are plain operating-system style path strings; the framework
cannot use them directly, so we must convert them into HDFS Path objects.
Path class
Converts an operating-system file path into a Path compatible with HDFS.
Package org.apache.hadoop.fs.Path
Path p1 = new Path(files[0]);
Path p2 = new Path(files[1]);
FileInputFormat class
Used to specify the input file path for the job.
Package org.apache.hadoop.mapreduce.lib.input.FileInputFormat
Job j = new Job(con, "Myjob");
FileInputFormat.addInputPath(j, p1);
FileOutputFormat class
Used to specify the output file path for the job.
Package org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
FileOutputFormat.setOutputPath(j, p2);   // uses the same Job j created above
Note: The FileInputFormat and FileOutputFormat classes contain static methods such as
1. addInputPath()
2. setOutputPath()
These are static methods, so they are called directly on the class name without creating an instance.
j.setJarByClass(MaxTemperature.class);     //Main class
j.setMapperClass(MyMap.class);             //Mapper
j.setReducerClass(MyReducer.class);        //Reducer
j.setCombinerClass(MyReducer.class);       //Combiner (the Reducer class is reused)
j.setOutputKeyClass(Text.class);           //output key of Mapper
j.setOutputValueClass(IntWritable.class);  //output value of Mapper
Word Count example
Input file (sample):
Hadoop is a framework
and processing
package my.map.reduce;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount   // class name not given in the notes
{
    public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException
        {
            StringTokenizer t = new StringTokenizer(v.toString());
            while (t.hasMoreTokens())
            {
                String word = t.nextToken();
                con.write(new Text(word), new IntWritable(1));   // emit <word, 1> for every word
            } //end of loop
        } //end of map()
    } //end of Mapper
    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text word, Iterable<IntWritable> vals, Context con) throws IOException, InterruptedException
        {
            int count = 0;
            for (IntWritable v : vals)
                count += v.get();                    // sum the 1s emitted for this word
            con.write(word, new IntWritable(count));
        } //end of reduce()
    } //end of Reducer
    public static void main(String[] args) throws Exception
    {
        Configuration con = new Configuration();
        GenericOptionsParser gop = new GenericOptionsParser(con, args);
        String[] files = gop.getRemainingArgs();
        Path p1 = new Path(files[0]);
        Path p2 = new Path(files[1]);
        Job j = new Job(con, "wordcount");
        j.setJarByClass(WordCount.class);
        j.setMapperClass(MyMap.class);
        j.setReducerClass(MyReducer.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, p1);
        FileOutputFormat.setOutputPath(j, p2);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    } //end of main function
} //end of main class.
Output: 17 words
Working with delimited files using Map-Reduce
Aim: Single grouping with single aggregation
Input:
Code:
package my.mr.analytics;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;
public class Emp
{
    // Mapper: extracts salary (field 3) and sex (field 4) from each comma-delimited row
    public static class MapForEmp
            extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        int sal;
        String sex;
        public void map(LongWritable k, Text v, Context con)
                throws IOException, InterruptedException
        {
            String line = v.toString();
            StringTokenizer t = new StringTokenizer(line, ",");
            int i = 1;
            while (t.hasMoreTokens())
            {
                String word = t.nextToken();
                if (i == 3)
                    sal = Integer.parseInt(word);
                if (i == 4)
                    sex = word;
                i++;
            }
            if (sex.matches("f"))
                sex = "female";
            else
                sex = "male";
            con.write(new Text(sex), new IntWritable(sal));
        }
    }
    // Reducer: sums the salaries for each group (female / male)
    public static class ReducerForEmp
            extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text sex, Iterable<IntWritable> salaries, Context con)
                throws IOException, InterruptedException
        {
            int tot = 0;
            for (IntWritable sal : salaries)
                tot += sal.get();
            con.write(sex, new IntWritable(tot));
        }
    }
    // Driver: configures and submits the job (same pattern as the WordCount driver above)
    public static void main(String[] args) throws Exception
    {
        Configuration con = new Configuration();
        GenericOptionsParser gop = new GenericOptionsParser(con, args);
        String[] files = gop.getRemainingArgs();
        Job j = new Job(con, "emp");
        j.setJarByClass(Emp.class);
        j.setMapperClass(MapForEmp.class);
        j.setReducerClass(ReducerForEmp.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, new Path(files[0]));
        FileOutputFormat.setOutputPath(j, new Path(files[1]));
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }
}
Output:
First time
Step1
Create Java Project
File -> New -> Other -> Java Project
Project Name -> MyTestscr
Step2
Create Package under src
src -> New -> Package -> Package name (my.mapreduce)
Step3
Create Java class
Package -> New -> Class -> Class Name (staff.java)
Step4
Supply Java code and Save.
Step5
Create JAR file
Project name -> Export -> Java JAR file
Step6
Submit the Map-Reduce JAR
Execution step
$ hadoop jar <jar file path> <fully qualified main class name> <input file path> <output file path>
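For example (the jar path and the output directory below are hypothetical; the package, class, and input path follow the names used earlier in these notes):

$ hadoop jar /home/training/MyTestscr.jar my.mapreduce.staff /user/Myself/file1.txt /user/Myself/output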