Hadoop Notes
What is Hadoop?
Hadoop is a big data framework that helps store and process huge amounts of data across multiple computers. It is open-source and built to handle data that is too large for a single computer to process efficiently.
Hadoop Components:
Imagine we have a huge library with millions of books (data). One day we need to count them. We divide the work among a group of friends. Each friend counts some of the books, and in the end they combine their results to get the total count quickly and efficiently.
Hadoop works in a similar way for big data. It divides data and tasks among multiple computers, making processing faster and storage more reliable.
Hadoop has four main parts that work together like a team:
Hadoop Distributed File System (HDFS): The storage system
HDFS is like a big bookshelf that stores a large amount of data. But instead of keeping all the books (data) in one place, it splits them into small pieces and stores them in different sections (computers).
- Stores big data efficiently.
- Data is duplicated, so if one computer fails, another has a copy.
- Fast access because data is spread across multiple computers.
MapReduce: The processing system of Hadoop
Now that our books (data) are stored in different sections (computers), how do we count the words efficiently? Instead of one computer reading all the books, MapReduce sends the task to multiple computers and collects the results.
Map phase: Each computer counts the words in the books assigned to it.
Reduce phase: All the results are combined to get the final count.
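The two phases above can be sketched in plain Python. This is only an illustration of the idea, not Hadoop's actual API: the function names `map_phase` and `reduce_phase` and the sample "books" are made up for this example, and everything runs on one machine.

```python
from collections import Counter
from functools import reduce


def map_phase(book: str) -> Counter:
    """Count the words in one book (the work a single computer would do)."""
    return Counter(book.lower().split())


def reduce_phase(partial_counts) -> Counter:
    """Combine the per-book counts into one final count."""
    return reduce(lambda left, right: left + right, partial_counts, Counter())


# Hypothetical sample data: each string stands in for one book.
books = ["big data is big", "data is everywhere"]

partials = [map_phase(book) for book in books]  # map: one count per book
total = reduce_phase(partials)                  # reduce: merge all counts

print(total["big"], total["data"], total["is"])  # prints: 2 2 2
```

In a real cluster each `map_phase` call would run on the computer that already holds that piece of data, and only the small partial counts would travel over the network.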
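Yet Another Resource Negotiator (YARN):
Imagine you are the manager of the library. You manage its resources (tables, books, work). In the same way, YARN manages the cluster's resources and decides which task gets to use them.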
Hadoop Common:
This is the toolbox that helps all the other parts work smoothly. It provides the necessary files and utilities needed for Hadoop to work.
Other tools in the Hadoop Ecosystem:
Hadoop works with additional tools that make data processing even easier.
- Hive: Converts SQL-like queries into MapReduce jobs.
- Pig: A scripting language for easier data transformation.
- HBase: A NoSQL database for real-time big data access.
- Spark: A faster processing engine that works with Hadoop.
1. HDFS (Hadoop Distributed File System): Stores large files across multiple computers.
2. MapReduce: A processing model that breaks tasks into smaller parts and runs them in parallel.
[Figure: master-slave layout. A master node (NameNode) sits above several slave nodes (DataNodes).]
Hadoop follows a master-slave architecture designed to handle large-scale data storage and processing efficiently. It consists of three primary layers:
1) HDFS (Hadoop Distributed File System) - Storage layer
2) MapReduce - Processing layer
3) YARN (Yet Another Resource Negotiator) - Resource management layer
Data is spread across multiple machines to improve fault tolerance.
Components of HDFS
1. NameNode (Master) - The file manager: keeps track of where data is stored in the cluster.
Hadoop Clusters
A cluster is a group of interconnected computers that work together as a single unit. Similarly, a Hadoop cluster consists of multiple commodity hardware devices (low-cost and widely available) working together.
- Master nodes (NameNode & ResourceManager): manage and control the system.
- Slave nodes (DataNode & NodeManager): store and process data.
1. Single Node Hadoop Cluster: In a Single Node cluster, as the name suggests, the cluster has only a single node, which means all our Hadoop daemons, i.e. NameNode, DataNode, Secondary NameNode, ResourceManager and NodeManager, run on the same machine. It also means that all of our processes will be handled by a single JVM (Java Virtual Machine) process instance. (This holds for standalone mode; in pseudo-distributed mode each daemon gets its own JVM on that one machine.)
2. Multi Node Hadoop Cluster: A Multi Node cluster, as the name suggests, contains multiple nodes. In this kind of cluster setup, all of our Hadoop daemons are placed on different nodes in the same cluster. In general, in a multi-node Hadoop cluster setup we try to use our higher-processing nodes for the master (NameNode and ResourceManager), and we use the cheaper systems for the slave daemons (NodeManager and DataNode).
Suppose we have 100 GB of data. Processing 100 GB of data on Hadoop and Spark involves different approaches due to the underlying architectures and processing models of these frameworks.
Processing 100 GB of Data in Hadoop:
Hadoop is a distributed storage and processing framework that uses the MapReduce programming model. It is designed for batch processing and is optimized for handling large-scale data across a distributed cluster.
Processing in Hadoop:
... produce the final output.
Step 4: Submit the job to the cluster
- Package the MapReduce program into a JAR file (for Java) or prepare the script (for Hadoop Streaming).
- Submit the job to the cluster using the hadoop jar command:
    hadoop jar <jar-file> <main-class> <hdfs path to input>
Step 5: Monitor the job
- Use the Hadoop ResourceManager UI to monitor the progress of the job.
- Check logs for errors and issues.
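The hadoop jar submit step could be scripted along these lines. This is a minimal sketch that only assembles the command; the helper name `build_hadoop_jar_command`, the jar name, the class name and the input path are all hypothetical placeholders, not values from these notes.

```python
import shlex


def build_hadoop_jar_command(jar_file, main_class, job_args):
    """Assemble the argument list for `hadoop jar <jar-file> <main-class> <args...>`."""
    return ["hadoop", "jar", jar_file, main_class, *job_args]


# Hypothetical placeholder values, for illustration only.
cmd = build_hadoop_jar_command("wordcount.jar", "WordCount", ["/user/demo/input"])

# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
```

On a machine where Hadoop is installed, the list could be passed to `subprocess.run(cmd, check=True)` to actually submit the job; that call is left out here so the sketch runs anywhere.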
Step 4:
- Use caching/persistence to store intermediate results in memory: data.cache()
- Adjust the number of partitions to optimize parallelism: data = data.repartition(200)
Step 5: Execute the job
- Submit the Spark job to the cluster using spark-submit.
So, the file is split into 3 blocks:
Block A: 128 MB
Block B: 128 MB
Block C: 44 MB
Each block is replicated 3 times and stored on different DataNodes:
Block A: stored on DataNode 1, 2, 3
Block B: stored on DataNode 4, 5, 6
Block C: stored on DataNode 7, 8, 9
If DataNode 1 fails, then Block A can still be accessed from DataNode 2 & 3.
If an entire rack fails, replicas on other racks ensure data availability.
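The block arithmetic and the failure case above can be checked with a small Python sketch. It assumes a 300 MB file (the total implied by 128 + 128 + 44) and a 128 MB block size. The sequential placement simply mirrors the layout in these notes; real HDFS chooses DataNodes with a rack-aware policy. All function names are made up for this example.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes of the blocks a file is cut into."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        size = min(block_size_mb, remaining)
        blocks.append(size)
        remaining -= size
    return blocks


def place_replicas(num_blocks, replication=3):
    """Give each block `replication` distinct DataNode ids (simplified, sequential)."""
    return {
        block: [block * replication + i + 1 for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_from(placement, block, failed_nodes):
    """List the DataNodes that can still serve a block after some nodes fail."""
    return [node for node in placement[block] if node not in failed_nodes]


blocks = split_into_blocks(300)          # assumed 300 MB file
placement = place_replicas(len(blocks))

print(blocks)                             # [128, 128, 44]
print(placement)                          # {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]}
print(readable_from(placement, 0, {1}))   # [2, 3]
```

Block 0 here corresponds to Block A: with DataNode 1 down, two replicas remain, which is why the read still succeeds.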
Hadoop 2.x introduced YARN for resource management, enabling support for multiple processing paradigms.
Optimization (Hive):
b) Skew Join (Broadcast Join): if one table is small, use MAPJOIN.
    SET hive.auto.convert.join = true;
    SET hive.optimize.skewjoin = true;
   Increase the no. of reducers:
    SET mapreduce.job.reduces = <number of reducers>;
c) Use bucketing: instead of plain hash-based distribution, use Hive bucketing on skewed columns.
    CREATE TABLE sales_bucket (id INT, amount FLOAT)
    CLUSTERED BY (id) INTO 10 BUCKETS;
d) Dynamic partitioning: instead of relying on pre-defined partitions, enable dynamic partitioning.
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
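To see why a broadcast (map-side) join and bucketing help, here is a rough Python sketch of both ideas. It is not Hive code. The sample tables and the names `map_side_join` and `bucket_for` are hypothetical, the small table is assumed to have unique keys, and the modulo rule is a simplification: Hive's real bucket hash depends on the column type and Hive version.

```python
def map_side_join(large_rows, small_rows, key):
    """Join by loading the small table into memory, so the large table is
    streamed once and never has to be shuffled by key."""
    lookup = {row[key]: row for row in small_rows}  # assumes unique keys
    joined = []
    for row in large_rows:
        match = lookup.get(row[key])
        if match is not None:
            joined.append({**match, **row})
    return joined


def bucket_for(id_value, num_buckets=10):
    """Pick a bucket for a non-negative integer id (simplified: id mod buckets)."""
    return id_value % num_buckets


# Hypothetical sample data.
sales = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 5.5},
    {"id": 1, "amount": 7.0},
]
customers = [{"id": 1, "name": "Asha"}]

joined = map_side_join(sales, customers, "id")
print(len(joined))                     # 2
print(bucket_for(23), bucket_for(10))  # 3 0
```

Because the lookup table sits in memory on every worker, even a heavily repeated key (such as id 1 here) is matched locally instead of piling up on one reducer.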
(5)  Hive                                  | HBase
     SQL-like (HQL) for batch processing   | NoSQL columnar DB for real-time access
     Sits on HDFS                          | Sits on HDFS
     High latency                          | Low latency