DS Assignment-1
Explain HDFS:
→ Hadoop comes with a distributed file system called HDFS.
→ In HDFS, data is distributed over several machines and replicated to ensure durability against failure and high availability to parallel applications.
→ The Hadoop distributed file system is a block-structured file system where each file is divided into blocks of a predetermined size. These blocks are stored across a cluster of one or several machines.
Example: blocks of 128 MB each.
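The block idea above can be sketched in a few lines of Python. This is only an illustration, not how HDFS itself is implemented: the 128 MB block size and replication factor of 3 are assumed values, the node names are made up, and the round-robin placement is a hypothetical stand-in for the real placement policy.

```python
BLOCK_SIZE_MB = 128   # assumed block size
REPLICATION = 3       # assumed replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would be divided into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller
    return sizes


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Hypothetical round-robin placement of each block on several machines."""
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(300)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(blocks)      # [128, 128, 44]
print(placement)
```

A 300 MB file gives two full blocks and one 44 MB block, and each block is kept on three different machines, so losing one machine does not lose the data.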
HDFS Architecture:
→ The Hadoop architecture is a package of the file system, the MapReduce engine and the HDFS.
→ HDFS follows the master-slave architecture and it has these components: the NameNode (metadata) and the DataNodes.
[Figure: HDFS architecture showing the NameNode (metadata), the client and the DataNodes]
→ The file process can also be called a mapper.
2) Big Data layers
Data sources layer:
→ Organizations generate a huge amount of data on a daily basis. The basic function of the data sources layer is to absorb the data coming from various sources, at varying velocity.
Ingestion layer:
→ The data obtained from the data sources has to be validated and cleaned before introducing it for any logical use in the enterprise.
→ The task of validating, sorting and cleaning the data is done by the ingestion layer.
→ The role of the ingestion layer is to absorb the huge inflow of data and sort it out in different categories.
→ This layer separates noise from relevant information.
→ It can handle huge volumes, high velocity and a variety of data. The ingestion layer validates, cleanses, transforms, reduces and integrates the unstructured data into the Big Data stack for further processing.
The functions of the ingestion layer:
Identification, Filtration, Validation, Noise reduction, Transformation, Compression, Integration.
→ Data ingestion in the Hadoop world means ELT (Extract, Load and Transform), as opposed to ETL (Extract, Transform and Load) in traditional warehouses.
Visualization layer:
→ The visualization layer handles the task of interpreting and visualizing Big Data.
→ Visualization of data is done by data analysts to have a look at different aspects of the data in various visual modes.
[Figure: Flow of the visualization layer. Labels: visualization tools, traditional BI tools, analysis tools, operational data stores, data warehouses, data scoop, data lakes, databases, NoSQL databases, structured data, unstructured data]
3) Explain MapReduce with a suitable example
MapReduce:
→ MapReduce is a data processing tool which is used to process the data in parallel in a distributed form. It is an important component of Hadoop.
→ A MapReduce program works in two phases:
1. Map phase
2. Reduce phase
→ Map tasks deal with splitting and mapping of data, while reduce tasks shuffle and reduce the data.
→ The input to each phase is key-value pairs.
→ To the mapper, the input is a key-value pair.
→ The output of the mapper is fed to the reducer as input.
→ The reducer too takes input in key-value format, and the output of the reducer is the final output.
Architecture:
→ The MapReduce framework operates on <key, value> pairs.
→ The whole process goes through 4 phases of execution (splitting, mapping, shuffling and reducing).
[Figure: MapReduce word-count example. The input data ("Deer Bear River", "Car Car River", "Deer Car Bear") is split across map() tasks, and the map output is passed on to the reduce tasks]
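The word-count example from the figure can be imitated on a single machine. This is a minimal sketch in plain Python, not the Hadoop API: the function names `map_phase`, `shuffle_phase` and `reduce_phase` are made up for illustration, and a real job would run the map and reduce tasks on different machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) key-value pair for every word in one input split."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into the final count."""
    return key, sum(values)


def word_count(text):
    splits = text.splitlines()                              # splitting
    mapped = [p for line in splits for p in map_phase(line)]  # mapping
    grouped = shuffle_phase(mapped)                         # shuffling
    return dict(reduce_phase(k, v) for k, v in grouped.items())  # reducing


data = "Deer Bear River\nCar Car River\nDeer Car Bear"
result = word_count(data)
print(result)  # {'Deer': 2, 'Bear': 2, 'River': 2, 'Car': 3}
```

Every stage takes and returns key-value pairs, which is why the mapper output can be fed directly to the reducer.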
4) CAP theorem
→ The CAP theorem states that a distributed database has to make a trade-off between consistency and availability when a partition occurs.
→ A distributed database system is bound to have partitions in the real world due to network failure or some other reason.
→ Therefore partition tolerance is a property we cannot avoid while building our system. So, a distributed system will either choose to give up on consistency or on availability, but not on partition tolerance.
→ The theorem provides a way of thinking about the trade-offs involved in designing and building distributed systems. It helps to explain why certain types of systems may be more appropriate for certain use cases.
→ According to Brewer, the theorem states that a distributed system can have at most two of these three guarantees.
Properties of CAP theorem:
The 3 distributed system characteristics to which the CAP theorem refers are:
1. Consistency
→ It means that all clients see the same data simultaneously, no matter which node they connect to in a distributed system.
→ For eventual consistency, the guarantees are a bit loose. An eventual consistency guarantee means clients will eventually see the same data on all the nodes at some point of time in the future.
[Figure: all nodes see the same data]
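Eventual consistency can be pictured with a toy model. The sketch below is an assumption-laden illustration, not any real database's protocol: the `Replica` class, the version numbers and the `sync` step (newest version wins) are all hypothetical.

```python
class Replica:
    """One node holding a copy of the data plus a version number."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0

    def write(self, value, version):
        # Accept a write only if it is newer than what the node already holds.
        if version > self.version:
            self.value, self.version = value, version


def sync(replicas):
    """Hypothetical background sync: copy the newest version to every node."""
    newest = max(replicas, key=lambda r: r.version)
    for replica in replicas:
        replica.write(newest.value, newest.version)


replicas = [Replica("n1"), Replica("n2"), Replica("n3")]
replicas[0].write("x=1", version=1)       # the write reaches only one node at first
before = [r.value for r in replicas]
sync(replicas)                            # later, the nodes exchange state
after = [r.value for r in replicas]
print(before)  # ['x=1', None, None]
print(after)   # ['x=1', 'x=1', 'x=1']
```

Between the write and the sync, clients reading from n2 or n3 see old data; after the sync all nodes agree, which is the "eventually" in the guarantee.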
2. Availability
3. Partition Tolerance
→ It means that the system continues to operate despite arbitrary message loss or failure in parts of the system.
→ Distributed systems guaranteeing partition tolerance can recover from partitions once the partition heals.
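The choice a system makes during a partition, and the recovery once the partition heals, can be sketched as follows. This is a simplified model under stated assumptions: the `Node` class, its "CP"/"AP" mode flag and the `heal` method are hypothetical and do not correspond to any particular database.

```python
class Node:
    """A replica configured to prefer consistency (CP) or availability (AP)."""

    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP"
        self.value = "v1"           # last value this node knows about
        self.partitioned = False

    def read(self):
        # A CP node refuses to answer when it cannot confirm the latest value.
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")
        # An AP node always answers, even if its value may be stale.
        return self.value

    def heal(self, latest_value):
        # Once the partition heals, the node catches up with the latest value.
        self.partitioned = False
        self.value = latest_value


def read_or_error(node):
    try:
        return node.read()
    except RuntimeError as err:
        return "error: " + str(err)


latest = "v2"                        # written on the other side of the partition
cp, ap = Node("CP"), Node("AP")
cp.partitioned = ap.partitioned = True
during = (read_or_error(cp), read_or_error(ap))
cp.heal(latest)
ap.heal(latest)
after_heal = (cp.read(), ap.read())
print(during)      # ('error: unavailable during partition', 'v1')
print(after_heal)  # ('v2', 'v2')
```

During the partition the CP node gives up availability and the AP node gives up consistency (it returns the stale "v1"); after healing, both serve the latest value again.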