Java Custom Writables
Defining value class “name”
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
// Our value class must implement Writable and override two methods:
// 1) readFields()  2) write()
public class name implements Writable {
	private Text first;
	private Text middle;
	private Text last;
	public name() {
		this.first = new Text();
		this.middle = new Text();
		this.last = new Text();
	}
	// Overloaded constructors for convenience
	public name(Text f, Text m, Text l) {
		this.first = f;
		this.middle = m;
		this.last = l;
	}
	public name(String f, String m, String l) {
		this.first = new Text(f);
		this.middle = new Text(m);
		this.last = new Text(l);
	}
	// Deserialize the fields, in the same order they were written
	@Override
	public void readFields(DataInput in) throws IOException {
		first.readFields(in);
		middle.readFields(in);
		last.readFields(in);
	}
	// Serialize each field to the output stream
	@Override
	public void write(DataOutput out) throws IOException {
		first.write(out);
		middle.write(out);
		last.write(out);
	}
	public Text getName() {
		return new Text(first.toString() + " " + middle.toString() + " " + last.toString());
	}
}
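Not shown on the slides, but a quick way to sanity-check the Writable contract is to round-trip a name through plain Java streams; a minimal sketch (the harness class and sample values are illustrative, not part of the original code):
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
// Hypothetical test harness, not part of the original slides
public class NameRoundTrip {
	public static void main(String[] args) throws Exception {
		name original = new name("JOHN", "Q", "PUBLIC");
		// write() serializes the three Text fields to a byte stream
		ByteArrayOutputStream bytes = new ByteArrayOutputStream();
		original.write(new DataOutputStream(bytes));
		// readFields() must consume them in exactly the same order
		name copy = new name();
		copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
		System.out.println(copy.getName()); // prints: JOHN Q PUBLIC
	}
}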
Defining key class “patent”
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
// A key class must implement the WritableComparable interface,
// since keys are compared and sorted during the shuffle.
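The body of patent is not shown on the slide; below is a minimal sketch, assuming the key wraps the patent number and country that the mapper passes to new patent(pat_no, country). Field names and the compareTo/hashCode details are assumptions.
// Sketch of the "patent" key class (body not shown on the original slide)
public class patent implements WritableComparable<patent> {
	private LongWritable pat_no;
	private Text country;
	public patent() {
		this.pat_no = new LongWritable();
		this.country = new Text();
	}
	public patent(long pat_no, String country) {
		this.pat_no = new LongWritable(pat_no);
		this.country = new Text(country);
	}
	@Override
	public void readFields(DataInput in) throws IOException {
		pat_no.readFields(in);
		country.readFields(in);
	}
	@Override
	public void write(DataOutput out) throws IOException {
		pat_no.write(out);
		country.write(out);
	}
	// Keys are sorted by patent number during the shuffle
	@Override
	public int compareTo(patent other) {
		return pat_no.compareTo(other.pat_no);
	}
	// hashCode() is used by the default HashPartitioner
	@Override
	public int hashCode() {
		return Long.hashCode(pat_no.get()) * 163 + country.hashCode();
	}
	@Override
	public boolean equals(Object o) {
		if (!(o instanceof patent)) return false;
		patent other = (patent) o;
		return pat_no.equals(other.pat_no) && country.equals(other.country);
	}
	@Override
	public String toString() {
		return pat_no.toString() + "\t" + country.toString();
	}
}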
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
// Counters for records with missing fields
enum missing {
	COUNTRY,
	FIRST,
	MIDDLE,
	LAST
}
// The "Total" counters keep track of written and skipped records
enum Total {
	COUNT,
	WRITTEN,
	SKIPPED
}
// Our custom key and value classes appear as type parameters to Mapper
public class map_class extends Mapper<LongWritable, Text, patent, name> {
	public void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		String line = value.toString();
		StringTokenizer tokens = new StringTokenizer(line, ",");
		long pat_no = 0;
		String last = " ", first = " ", middle = " ", country = " ";
		String token = null;
		token = tokens.nextToken();
		// A first token of exactly 8 characters (presumably the header row) is skipped
		if (token.length() != 8) {
			pat_no = Long.parseLong(token);
			// Extract the fields for our custom key and value classes,
			// stripping the surrounding quotes with substring()
			token = tokens.nextToken();
			if (token.length() > 1)
				last = token.substring(1, token.length() - 1);
			else
				context.getCounter(missing.LAST).increment(1);
			token = tokens.nextToken();
			if (token.length() > 1)
				first = token.substring(1, token.length() - 1);
			else
				context.getCounter(missing.FIRST).increment(1);
			token = tokens.nextToken();
			if (token.length() > 1)
				middle = token.substring(1, token.length() - 1);
			else
				context.getCounter(missing.MIDDLE).increment(1);
			// Advance five tokens; the last one read is the country field
			for (int i = 0; i < 5; i++) {
				token = tokens.nextToken();
			}
			if (token.length() > 1)
				country = token.substring(1, token.length() - 1);
			else
				context.getCounter(missing.COUNTRY).increment(1);
			patent p = new patent(pat_no, country);
			name n = new name(first, middle, last);
			context.write(p, n);
			context.getCounter(Total.WRITTEN).increment(1);
		} else {
			context.getCounter(Total.SKIPPED).increment(1);
		}
		context.getCounter(Total.COUNT).increment(1);
	}
}
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
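The slide shows only the reducer's imports; below is a minimal sketch of reduce_class (the name comes from the driver further down), assuming it simply emits the patent key and each reconstructed full name as Text. The exact output types are an assumption; the original also imports NullWritable, which would fit a key-only output instead.
// Sketch of reduce_class (body not shown on the original slide)
public class reduce_class extends Reducer<patent, name, Text, Text> {
	@Override
	public void reduce(patent key, Iterable<name> values, Context context)
			throws IOException, InterruptedException {
		// One output line per inventor name grouped under this patent key
		for (name n : values) {
			context.write(new Text(key.toString()), n.getName());
		}
	}
}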
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
// The driver class and job name are not given on the slide; "driver_class" is assumed here
public class driver_class {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf, "patent names");
		job.setJarByClass(driver_class.class);
		FileInputFormat.setInputPaths(job, args[0]);
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		job.setMapperClass(map_class.class);
		job.setReducerClass(reduce_class.class);
		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(TextOutputFormat.class);
		// Register our custom key and value classes for the map output
		job.setMapOutputKeyClass(patent.class);
		job.setMapOutputValueClass(name.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
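When the job finishes, Hadoop prints all counters (missing.*, Total.*) to the console. A hedged sketch, assuming you keep the Job handle instead of exiting immediately, of reading them programmatically (variable names are illustrative):
boolean ok = job.waitForCompletion(true);
// Counters are printed automatically at completion; they can also be
// read from the completed Job
long written = job.getCounters().findCounter(Total.WRITTEN).getValue();
long skipped = job.getCounters().findCounter(Total.SKIPPED).getValue();
System.out.println("written=" + written + ", skipped=" + skipped);
System.exit(ok ? 0 : 1);
The job is then submitted as usual, e.g. hadoop jar patentnames.jar driver_class <input> <output> (jar and class names assumed).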
Sample Input