Hadoop MapReduce Flow Chart
BEN FTIMA MOHAMED
Whatsapp 21692971305
Map Flow Chart: my job in Hadoop: WordCountJob

I take one small file:

Hi
How is your job
How is your family
How is your sister
How is your brother
What is time now
What is the strength of hadoop

If the file is 400 MB and I keep it in HDFS, how many blocks will it be given? 128 + 128 + 128 = 384 MB, plus the remaining 16 MB, so 4 blocks will be given.
So let us split this file into 4 input splits, because the HDFS block size is 128 MB (Hadoop 2.0).
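The block arithmetic above can be checked with a few lines of plain Java. This is only a sketch of the calculation, not Hadoop code: the class name BlockLayout and the method blockSizes are made up for illustration, and sizes are in whole megabytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): lists the block sizes a file would occupy.
class BlockLayout {
    // Returns the size in MB of each block needed for a file of fileMb megabytes.
    static List<Long> blockSizes(long fileMb, long blockMb) {
        List<Long> sizes = new ArrayList<>();
        long remaining = fileMb;
        while (remaining > 0) {
            // Every block is full except possibly the last one.
            long size = Math.min(blockMb, remaining);
            sizes.add(size);
            remaining -= size;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 400 MB file, 128 MB block size
        System.out.println(blockSizes(400, 128)); // prints [128, 128, 128, 16]
    }
}
```

The last block only holds the 16 MB that are left over, which is why the file needs 4 blocks and not 3.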
I have to count the number of occurrences of each word in this file, big.txt:
how many times “hi” occurs: 2 times
how many times “how” occurs: 5 times
The output should be:
(hi,2)
(how,5)
(are,1)
(is,6)
(you,3)
(your,1)
(job,1)
(family,1)
(father,1)
……
(what,3)
>benftima$ hadoop jar wordcount.jar word.class big.txt outputDir (outputDir is optional)
The RecordReader knows how to read only one record (one line) at a time from the input split.
----> for each (key, value) pair created, the mapper runs once ---->
The map interface works on (key, value) pairs. Collections in Java are based on object types; there are no primitive types in a collection.
For each (key, value) pair, i.e. (byte offset, line), the mapper runs once. This is parallel processing: instead of sending the whole task to one system, we send multiple tasks to multiple systems.
The (key, value) pair = (0, hi how are you) must be of object type, so both parts use Hadoop's box classes. The key is the byte offset of the line in the file, so it must be LongWritable; the value can be Text.
To summarize: your file format is TextInputFormat. Your RecordReader converts that file into (key, value) = (byte offset, line) pairs, where the byte offset is of type LongWritable and the line is of type Text (Hadoop box types).
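To make the (byte offset, line) idea concrete, here is a small plain-Java imitation of what the RecordReader produces. It does not use the Hadoop API: LineRecords and toRecords are hypothetical names, Long and String stand in for LongWritable and Text, and the sketch assumes UTF-8 text with a one-byte "\n" line ending.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a line-oriented record reader (not Hadoop code).
class LineRecords {
    // Turns file content into (byte offset, line) records, one per line.
    static Map<Long, String> toRecords(String content) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.put(offset, line);
            // The next line starts after this line's bytes plus 1 byte for '\n' (assumed).
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String content = "Hi\nHow is your job\nHow is your family\n";
        toRecords(content).forEach((k, v) -> System.out.println(k + " -> " + v));
        // prints:
        // 0 -> Hi
        // 3 -> How is your job
        // 19 -> How is your family
    }
}
```

Each printed pair is what one call of the mapper would receive: the key is where the line starts in the file, the value is the line itself.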
The mapper accepts duplicate keys.
A mapper can take only (key, value) pairs as input and give only (key, value) pairs as output to the reducer; the output format depends on the job you are running.
Now comes the reducer, which will combine all your (key, value) pairs.
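The whole flow (one map call per record, duplicate keys allowed, then the reducer combining the values per key) can be imitated in plain Java. This is a sketch under assumptions, not the actual WordCountJob: WordCountSketch and its methods are hypothetical names, String and Integer stand in for Text and IntWritable, words are lower-cased to match the expected output (hi,2), and a TreeMap plays the role of the grouping step between map and reduce.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the word count flow (not Hadoop code).
class WordCountSketch {
    // Map step: called once per (byte offset, line) record; emits (word, 1) for each word.
    static List<Map.Entry<String, Integer>> map(long byteOffset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1)); // duplicate keys are fine here
            }
        }
        return out;
    }

    // Reduce step: called once per key with all values emitted for that key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs map on every line, groups the values by key, then reduces each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(offset, line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            offset += line.length() + 1; // assumes ASCII text and a 1-byte newline
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Hi", "How is your job", "How is your family");
        System.out.println(run(lines));
        // prints {family=1, hi=1, how=2, is=2, job=1, your=2}
    }
}
```

On a real cluster the map calls run in parallel on different machines, one group of calls per input split; here they simply run in a loop.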
Here the map function comes into the picture: