BDA 1
[Figure: MapReduce flow. Big data (input) -> map() in the Map task -> reduce() in the Reduce task]
Shuffle and sort - Using the shuffling process, the system can sort the data by its key value. Once some of the mapping tasks are done, shuffling begins; that is why it is a faster process and does not wait for completion of the task performed by the mapper.
Reduce - It gathers the tuples generated from map and then performs some sorting and aggregation sort of process on those key-value pairs, depending on the key element.
Output format - Once all operations are performed, the key-value pairs are written into a file with the help of the record writer, each record on a new line, with key and value in a space-separated manner.
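The map, shuffle and sort, reduce, and output steps above can be sketched as a small word count. This is a single-process illustration only, not Hadoop code: the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `format_output`, `run_job`) are made up for this example, and the shuffle here runs after all mapping is finished rather than overlapping with it.

```python
from collections import defaultdict


def map_phase(lines):
    """Map task: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_and_sort(pairs):
    """Shuffle and sort: group the values by key, then order by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce task: aggregate the values gathered for each key."""
    for key, values in grouped:
        yield key, sum(values)


def format_output(results):
    """Output format: one record per line, key and value space-separated."""
    return "\n".join(f"{key} {value}" for key, value in results)


def run_job(lines):
    """Chain the four stages together (hypothetical helper)."""
    return format_output(reduce_phase(shuffle_and_sort(map_phase(lines))))


if __name__ == "__main__":
    print(run_job(["big data is big", "data is messy"]))
```

Each output line holds one record, for example `big 2`, which matches the space-separated layout described for the record writer.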
FOR EDUCATIONAL USE
2. Difference between SQL and NoSQL
SQL: Structured Query Language
NoSQL: Not only SQL, or non-relational database
3. Explain the 5 V's of BDA
a. Veracity
- Veracity refers to the reliability and accuracy of data, as big data can sometimes be messy or incomplete.
- Ensuring data quality is essential to derive meaningful and actionable insights.
- Techniques like data cleaning, validation and verification are employed to address inaccuracies.
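As a rough illustration of cleaning, validation and verification, the sketch below filters a handful of messy records. The field names (`id`, `age`), the 0 to 120 age range and the function `clean_records` are assumptions chosen for the example, not part of the notes.

```python
def clean_records(records):
    """Split records into (valid, rejected) after basic quality checks."""
    valid, rejected = [], []
    seen_ids = set()
    for rec in records:
        # Cleaning: strip stray whitespace from string fields.
        rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Validation: required fields must be present and non-empty.
        if not rec.get("id") or rec.get("age") in (None, ""):
            rejected.append((rec, "missing field"))
            continue
        # Verification: the value must be a plausible number.
        try:
            age = int(rec["age"])
        except (TypeError, ValueError):
            rejected.append((rec, "age is not a number"))
            continue
        if not 0 <= age <= 120:  # assumed plausible range
            rejected.append((rec, "age out of range"))
            continue
        if rec["id"] in seen_ids:
            rejected.append((rec, "duplicate id"))
            continue
        seen_ids.add(rec["id"])
        valid.append({"id": rec["id"], "age": age})
    return valid, rejected


raw = [
    {"id": " u1 ", "age": "34"},
    {"id": "u2", "age": ""},
    {"id": "u3", "age": "abc"},
    {"id": "u1", "age": "29"},
    {"id": "u4", "age": 250},
]
valid, rejected = clean_records(raw)
```

Keeping the rejected records together with a reason, instead of silently dropping them, makes it possible to measure how messy the input was.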
Explain
1. Structured data
- Structured data can be crudely defined as the data that resides in a fixed field within a record.
- It is the type of data most familiar to our everyday lives.
- A certain schema binds it, so all the data has the same set of properties. Structured data is also called relational data.
- Relationships are enforced by the application or table constraints.
- The business value of structured data lies within how well an organization can utilize its existing systems and processes for analysis purposes.
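The idea of a fixed schema with relationships enforced by table constraints can be sketched with Python's built-in `sqlite3` module. The tables `customer` and `purchase`, their columns, and the helper `try_insert` are hypothetical names used only for this example.

```python
import sqlite3


def build_database():
    """Create an in-memory database with a fixed schema and constraints."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE purchase ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " amount REAL NOT NULL CHECK (amount > 0))"
    )
    return conn


def try_insert(conn, sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_database()
ok_customer = try_insert(conn, "INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Asha"))
ok_purchase = try_insert(conn, "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)", (1, 250.0))
# The next two rows break the schema: unknown customer, then a negative amount.
bad_reference = try_insert(conn, "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)", (99, 10.0))
bad_amount = try_insert(conn, "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)", (1, -5.0))
```

Every accepted row has the same set of properties, which is what makes structured data straightforward to query with existing systems.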
2. Semi-structured data
- Semi-structured data is not bound by any rigid schema for data storage and handling.
- Since semi-structured data doesn't need a structured query language, it is commonly called NoSQL data.
- A data serialization language is used to exchange semi-structured data across systems that may even have varied underlying infrastructure.
- Semi-structured content is often used to store metadata about business processes, but it can also include files.
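JSON is one common data serialization language for this kind of exchange. The sketch below, with made-up records and helper names (`serialize`, `deserialize`, `field_names`), shows two records that share no fixed schema surviving a round trip through text.

```python
import json

# Two records of the same kind that carry different fields.
records = [
    {"id": 1, "name": "Asha", "email": "asha@example.com"},
    {"id": 2, "name": "Ravi", "tags": ["vip"], "address": {"city": "Pune"}},
]


def serialize(recs):
    """Turn the records into text that another system can read."""
    return json.dumps(recs, sort_keys=True)


def deserialize(payload):
    """Rebuild the records from the serialized text."""
    return json.loads(payload)


def field_names(recs):
    """Collect the union of field names, since no fixed schema applies."""
    names = set()
    for rec in recs:
        names.update(rec)
    return sorted(names)


payload = serialize(records)
restored = deserialize(payload)
```

The receiving side needs no table definition in advance; it discovers the fields from the data itself, which is the main contrast with structured data.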
3. Unstructured data
- Unstructured data is the kind of data that doesn't adhere to any definite schema or set of rules. Its arrangement is unplanned and haphazard.
- Photos, videos, text documents and log files can be considered unstructured data.
- Additionally, unstructured data is also known as "dark data" because it cannot be analyzed without proper software tools.
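A small example of such a software tool, assuming free-form log text that happens to contain level words such as INFO, WARN and ERROR. The sample text, the pattern and the function `count_levels` are invented for illustration.

```python
import re
from collections import Counter

# Assumed convention: log levels appear as whole upper-case words.
LOG_PATTERN = re.compile(r"\b(INFO|WARN|ERROR)\b")


def count_levels(text):
    """Count log levels found anywhere in free-form text."""
    return Counter(LOG_PATTERN.findall(text))


raw_log = """service started INFO
something odd happened, ERROR code 17
free text line with no level
WARN disk almost full; ERROR again"""

counts = count_levels(raw_log)
```

The text itself has no schema; any structure comes only from the pattern the tool imposes on it.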
3.
YARN (Yet Another Resource Negotiator)
- Resource management layer introduced in Hadoop 2.x to improve the scalability and efficiency of Hadoop clusters.
- Resource Manager (master): manages the cluster's resources and handles scheduling.
- Node Manager (slaves): manages the resources on a single node and monitors their usage, including the containers where the tasks run.
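The master and slave split can be pictured with a toy model. This is not the YARN API: the classes below only borrow the names `ResourceManager` and `NodeManager`, memory is the only resource tracked, and the "most free memory first" placement rule is an assumption made for the example, not how YARN's real schedulers decide.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy per-node agent that tracks local resource usage."""
    name: str
    total_mb: int
    used_mb: int = 0
    containers: list = field(default_factory=list)

    def free_mb(self):
        return self.total_mb - self.used_mb

    def launch(self, task, mb):
        # A container is modelled as a (task, memory) pair on this node.
        self.used_mb += mb
        self.containers.append((task, mb))


@dataclass
class ResourceManager:
    """Toy cluster-wide master that decides where containers go."""
    nodes: list

    def allocate(self, task, mb):
        # Assumed policy: pick the fitting node with the most free memory.
        candidates = [n for n in self.nodes if n.free_mb() >= mb]
        if not candidates:
            return None  # nothing fits; the request is not placed
        node = max(candidates, key=NodeManager.free_mb)
        node.launch(task, mb)
        return node.name


rm = ResourceManager([NodeManager("node1", 4096), NodeManager("node2", 2048)])
placements = [
    rm.allocate("map-0", 3000),
    rm.allocate("map-1", 2048),
    rm.allocate("reduce-0", 1000),
    rm.allocate("reduce-1", 512),
]
```

The master only decides placement, while each node keeps its own usage figures, which mirrors the division of work described above.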
Hadoop ecosystem
- Hive: SQL-like interface for Hadoop
- Pig: scripting language for complex data transformation
- HBase: NoSQL DB that runs on top of HDFS
- ZooKeeper: coordination service for distributed systems
- Oozie: workflow scheduling system