
Hadoop MapReduce Flow Chart
BEN FTIMA MOHAMED
WhatsApp: 21692971305
Map Flow Chart: my job in Hadoop is WordCountJob.

I take one small file, big.txt, whose lines look like this:

Hi how are you
Hi how is your job
How is your family
How is your sister
How is your brother
What is time now
What is the strength of hadoop

If it is 400 MB and I keep this file in HDFS, how many blocks will be given? 128 + 128 + 128 = 384 MB, plus the remaining 16 MB: 4 blocks will be given. Right, ok. So let's split this file into 4 input splits, because your HDFS block size is 128 MB (Hadoop 2.0).
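As a quick check, assuming big.txt has already been copied into HDFS (the path /user/benftima/big.txt below is hypothetical), the real hdfs fsck command lists a file's blocks:

>benftima$ hdfs fsck /user/benftima/big.txt -files -blocks

For a 400 MB file with a 128 MB block size it should report 4 blocks: three full 128 MB blocks and one 16 MB block.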
I have to count the number of times each word occurs in this file big.txt:
how many times "hi" occurs: 2 times
how many times "how" occurs: 5 times

The output should be:
(hi,2)
(how,5)
(are,1)
(is,6)
(you,3)
(your,1)
(job,1)
(family,1)
(father,1)
……
(what,3)
>benftima$ hadoop jar wordcount.jar word.class big.txt outputDir (outputDir is optional)
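Before going further, here is a minimal sketch of what the entry point inside wordcount.jar might look like, assuming the Hadoop 2.x Java API. The class names WordCount, WordMapper and WordReducer are placeholders of mine, not names from the slides; WordMapper and WordReducer are sketched later in this deck.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordMapper.class);    // sketched later in this deck
        job.setReducerClass(WordReducer.class);  // sketched later in this deck
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // big.txt
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // outputDir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}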

Generally, when applying the wordcount.jar file to the whole big.txt file, if I want to process something I have to process the entire file, not each block separately.
Input split 1 -> one mapper will be created (and likewise one mapper per split). This is parallel processing.
Hadoop can only run MapReduce on key-value pairs (K, V), so we have to convert the lines into key-value pairs. How? With the RecordReader interface; lines are called records in Hadoop terminology. We need to tell the RecordReader the input file format; by default it is TextInputFormat.
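TextInputFormat is the default, so the driver does not have to set it; but if you want to state it explicitly, this one call (a real Hadoop 2.x method) goes inside main() of the driver sketch above, right after the Job is created:

job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.TextInputFormat.class);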
take a break
How do we convert a record into key-value pairs, given that our MapReduce can only take key-value pairs? The RecordReader interface will do the job for us. But how?

We have to know the file input format in the RecordReader.

The RecordReader only knows how to read one record (one line) at a time from the input split:

1st record: hi how are you

(byteoffset, entireline) = (0, hi how are you)

2nd record: hi how is your job

(byteoffset, entireline) = (16, hi how is your job)

byteoffset: the first record "hi how are you" plus its line terminator occupies 16 bytes, so the second record starts at offset 16.
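To make the offset arithmetic concrete, here is a small standalone Java sketch (plain Java, not Hadoop code) that prints each record keyed by its byte offset, the way the record reader does. The second offset depends on the line terminator: with a one-byte \n it would be 15; the slides' figure of 16 matches a two-byte \r\n ending, which this sketch assumes.

import java.nio.charset.StandardCharsets;

public class OffsetDemo {
    public static void main(String[] args) {
        String[] records = { "hi how are you", "hi how is your job" };
        String terminator = "\r\n"; // assumption: two-byte line ending, matching the slides' offset of 16
        long offset = 0;
        for (String record : records) {
            System.out.println("(" + offset + ", " + record + ")");
            // the next record starts after this record's bytes plus its terminator
            offset += record.getBytes(StandardCharsets.UTF_8).length
                    + terminator.getBytes(StandardCharsets.UTF_8).length;
        }
    }
}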

For each key-value pair created, i.e. each (byte offset, line), the mapper runs once. This is parallel processing: instead of sending the whole task to one system, we send multiple tasks to multiple systems. The map interface works on key-value pairs, and just as the entire collections framework in Java is based on object types (no primitive types in collections), the key and value must be of object (box-class) types. In the pair (0, hi how are you), the key is the byte offset into the file, so it must be LongWritable, and the value can be Text. Ok.
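Under those assumptions, a minimal sketch of the mapper: the input key is the LongWritable byte offset, the input value is the Text line, and it emits a (word, 1) pair per word occurrence. WordMapper is the placeholder name used in the driver sketch above.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable byteOffset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString().toLowerCase());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE); // e.g. (how, 1), emitted once per occurrence
        }
    }
}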
To summarize: your file input format is TextInputFormat, and your RecordReader converts that file into (key, value) = (byteoffset, line), where byteoffset is of type LongWritable and line is of type Text (Hadoop box types).
The mapper will accept duplicate keys. A mapper can take only a (key, value) pair as input and give only (key, value) pairs as output for the reducer; the output format depends on the job you are running.
Now comes the reducer, which will combine all your (key, value) pairs. Here the map function's output comes into the picture: a key should not be duplicated, but a value can be.


Shuffling

All wrapper classes implement the Comparable interface and provide an implementation of the compare method, so one key can be compared to another to decide the sorting order. All Hadoop box classes implement the WritableComparable interface. After shuffling there are no duplicate keys.
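A quick standalone illustration of that point, using the real Text box class (the demo itself is mine, not from the slides):

import org.apache.hadoop.io.Text;

public class CompareDemo {
    public static void main(String[] args) {
        Text a = new Text("hi");
        Text b = new Text("how");
        // prints a negative number: "hi" sorts before "how", which is the
        // order the framework uses for keys between the map and reduce phases
        System.out.println(a.compareTo(b));
    }
}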
Now sorting can be done automatically. After shuffling and sorting, the result is given to the reducer. The reducer is executed as many times as it has inputs, just as in Java web programming the doGet and doPost methods are executed as many times as requests reach the controller. Hadoop provides an identity reducer, which does the sorting only, and it is used if you don't write your own reducer; if you write your own reducer, both shuffling and sorting are done. The mapper doesn't know how to sort your key-value pairs; only the reducer side knows that. The mapper emits (how, 1) once per occurrence, so (how, 1) appears twice for two occurrences. When we write the reducer code, shuffling and sorting come into play because of the iterator interface in the reducer class, as we will see when we discuss that logic. The identity reducer built into Hadoop knows how to sort but not how to shuffle, whereas your own reducer gets both sorting and shuffling.
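Under the same assumptions, a minimal sketch of the reducer: for each unique word it receives an Iterable over all the 1s the mappers emitted and sums them, e.g. (how, [1,1,1,1,1]) becomes (how, 5). WordReducer is the placeholder name used in the driver sketch.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        context.write(word, new IntWritable(sum)); // handed to the RecordWriter
    }
}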
The reducer gives its result to the RecordWriter, which knows how to write these key-value pairs to the output.
