Hadoop Week 4
Skype ID – edureka.hadoop
Email – [email protected]
Venkat – [email protected]
Course Topics
Week 1 – Introduction to HDFS
Week 2 – Setting Up Hadoop Cluster
Week 3 – Map-Reduce Basics, types and formats
Week 4 – PIG
Week 5 – HIVE
Week 6 – HBASE
Week 7 – ZOOKEEPER
Week 8 – SQOOP
Recap of Weeks 1, 2 and 3
WordCount Example
• Mapper
• Reducer
• Driver
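To recap the data flow the Mapper, Reducer and Driver implement, here is a minimal local sketch of the WordCount logic in plain Java collections (no Hadoop dependency) — the class and method names are illustrative, not part of the Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local sketch of WordCount's map -> shuffle -> reduce data flow.
public class WordCountLocal {

    // Map phase: emit a (word, 1) pair for every token of every line,
    // just like the WordCount Mapper.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by key and sum the counts,
    // which is what the framework's sort/group step plus the Reducer do.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("hello hadoop", "hello world");
        System.out.println(reduce(map(input))); // {hadoop=1, hello=2, world=1}
    }
}
```

In the real job, the Driver wires these two phases together via Job configuration; the framework, not user code, performs the grouping between them.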
FileSplit is the default InputSplit
Note: some InputFormats, such as NLineInputFormat, override getSplits().

public abstract class FileInputFormat<K, V> extends InputFormat<K, V> {
    ...
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
        long maxSize = getMaxSplitSize(job);
        // generate splits
        List<InputSplit> splits = new ArrayList<InputSplit>();
        List<FileStatus> files = listStatus(job);
        for (FileStatus file : files) {
            Path path = file.getPath();
            FileSystem fs = path.getFileSystem(job.getConfiguration());
            long length = file.getLen();
            BlockLocation[] blkLocations = fs.getFileBlockLocations(file, 0, length);
            if ((length != 0) && isSplitable(job, path)) {
                long blockSize = file.getBlockSize();
                long splitSize = computeSplitSize(blockSize, minSize, maxSize);
                ...
            }
        }
        ...
    }

    protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }
}
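The clamp in computeSplitSize() can be checked in isolation. Below is a standalone copy of that one-line formula (the demo class name is ours, not Hadoop's) showing how the split size tracks the HDFS block size until the min/max limits are tightened:

```java
public class SplitSizeDemo {
    // Same formula as FileInputFormat.computeSplitSize():
    // clamp blockSize into the [minSize, maxSize] range.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // a 128 MB HDFS block

        // With the defaults (minSize = 1, maxSize = Long.MAX_VALUE),
        // the split size equals the block size: one split per block.
        System.out.println(computeSplitSize(blockSize, 1, Long.MAX_VALUE));

        // Lowering maxSize to 64 MB forces smaller, more numerous splits.
        System.out.println(computeSplitSize(blockSize, 1, 64L * 1024 * 1024));

        // Raising minSize above the block size forces larger splits,
        // each spanning more than one block.
        System.out.println(computeSplitSize(blockSize, 256L * 1024 * 1024, Long.MAX_VALUE));
    }
}
```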
TextInputFormat is the default InputFormat; its createRecordReader() returns a LineRecordReader
public class TextInputFormat extends FileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text>
            createRecordReader(InputSplit split, TaskAttemptContext context) {
        return new LineRecordReader();
    }
}
• NLineInputFormat
• SequenceFileInputFormat
• KeyValueTextInputFormat
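To recap why NLineInputFormat overrides getSplits(): it assigns a fixed number of input lines, rather than a byte range, to each mapper. A plain-Java sketch of that grouping idea (illustrative names, no Hadoop classes):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of NLineInputFormat's splitting rule: each split holds
// exactly N input lines (the last split may hold fewer).
public class NLineSplitDemo {
    static List<List<String>> splitIntoNLineGroups(List<String> lines, int n) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            splits.add(lines.subList(i, Math.min(i + n, lines.size())));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("l1", "l2", "l3", "l4", "l5");
        // With N = 2: three splits, so three mappers would run.
        System.out.println(splitIntoNLineGroups(lines, 2));
    }
}
```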
• Custom partitioner
• Custom combiner
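A custom Partitioner decides which reducer receives each key; Hadoop's default HashPartitioner reduces to a single line. Below is a standalone copy of that routing rule (the demo class is ours, not a Hadoop class), which a custom partitioner would replace with its own logic:

```java
public class PartitionDemo {
    // Same routing rule as Hadoop's default HashPartitioner:
    // mask off the sign bit of the key's hash, then take it
    // modulo the number of reduce tasks.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Every occurrence of the same key maps to the same partition,
        // so one reducer sees all values for that key.
        System.out.println(getPartition("hadoop", 4));
        System.out.println(getPartition("hadoop", 4)); // same partition again
    }
}
```

A custom combiner, by contrast, runs reducer-style aggregation on the map side to shrink the data shuffled across the network; for WordCount the Reducer class itself can serve as the combiner because summation is associative.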
Display running jobs
Q & A..?
Thank You
See You in Class Next Week