
21261a05a8 - Assignment 2

The document discusses the MapReduce programming model used in big data processing, detailing its components such as Map and Reduce tasks, and the role of Apache Pig in executing these tasks. It explains the process of data input, processing, and output storage in a distributed system, as well as challenges in collaborative filtering and potential solutions. Additionally, it highlights the advantages and disadvantages of machine learning techniques in data analysis and recommendation systems.

Uploaded by

vakiti96

Assignment, Set 3
Midhun Reddy, 21261A05A8
Q1) Understand about MapReduce.
A)
Big Data processing employs the MapReduce programming model. A job means a MapReduce program. Each job consists of several smaller units, called MapReduce tasks.
* The implementation of Hadoop MapReduce uses the Java framework.

[Figure: MapReduce programming model. Input -> Map tasks -> Reduce tasks -> Output]
* The model defines two important tasks: 1) Map and 2) Reduce.
* Map takes the input data set as pieces of data and processes them across nodes in parallel.
* The Reduce task takes the output of the map tasks as its input and then combines those data pieces into a smaller set of data. A reduce task always runs after the map task(s).

* The input data is in the form of an HDFS file. Also, the output of the task gets stored in HDFS.
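* The Map task means a task that implements a map(), which runs the user's application code for each key-value pair (k1, v1). The output of map() is a set of intermediate key-value pairs (k2, v2).
* The Reduce task refers to a task which takes the output v2 from the map as input and combines those data pieces into a smaller set of data using a combiner.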
* The combiner is an optional class.
* Combiners optimize the MapReduce task by aggregating the map output locally before the data is passed on to the reduce tasks.
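The map, combine and reduce roles described above can be sketched as a word count in plain Python. This is only an illustration under stated assumptions: it runs in a single process rather than on Hadoop, and the names `map_task`, `combine`, `shuffle`, `reduce_task` and `run_job` are hypothetical, not part of the Hadoop API.

```python
from collections import defaultdict


def map_task(line):
    """Map: emit an intermediate (k2, v2) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Optional combiner: sum the counts locally, on the map side."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(all_pairs):
    """Group every intermediate value under its key before the reduce phase."""
    groups = defaultdict(list)
    for key, value in all_pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: combine the values of one key into a smaller result."""
    return key, sum(values)


def run_job(splits, use_combiner=True):
    """Run map (and optionally combine) per split, then shuffle and reduce."""
    intermediate = []
    for split in splits:  # each split stands in for one map task
        pairs = []
        for line in split:
            pairs.extend(map_task(line))
        if use_combiner:
            pairs = combine(pairs)
        intermediate.extend(pairs)
    moved = len(intermediate)  # pairs handed over to the reduce side
    grouped = shuffle(intermediate)
    result = dict(reduce_task(key, values) for key, values in grouped.items())
    return result, moved


splits = [["big data big"], ["data node", "big node"]]
counts, moved = run_job(splits)
counts_plain, moved_plain = run_job(splits, use_combiner=False)
```

Both runs produce the same counts, but the run with the combiner hands fewer intermediate pairs to the reduce side, which is the optimization the combiner is meant to provide.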
Diagram:
[Figure: Apache Pig architecture. Pig Latin script -> Grunt shell / Pig server -> Parser -> Optimizer -> Compiler -> Execution Engine -> MapReduce]
Apache Pig consists of various components:
1) Parser: Pig scripts go through the parser component. It performs various checks, which include syntax checks, type checks and other miscellaneous checks. The output of the parser is a Directed Acyclic Graph (DAG).
2) Optimizer: The DAG is passed to the optimizer, which performs logical optimizations such as pushdown and projection.
3) Compiler: The compiler component transforms the optimized logical plan into a sequence of MapReduce jobs.
4) Execution Engine: This component submits all the MapReduce jobs in sorted order to Hadoop.

* Finally, the MapReduce jobs are executed on Apache Hadoop to produce the desired result.

[Figure: A client reading data from HDFS. On the client node, the HDFS client in the client JVM uses DistributedFileSystem and FSDataInputStream; it gets block locations from the NameNode and reads from the DataNodes.]
Step 1: The client opens the file it wishes to read by calling open() on the FileSystem object, which for HDFS is an instance of DistributedFileSystem.
Step 2: Now the DistributedFileSystem calls the namenode, using remote procedure calls (RPCs), to determine the locations of the blocks.
* The DistributedFileSystem returns an FSDataInputStream to the client.
Step 3: The client then calls read() on the stream. The DFSInputStream, which has stored the datanode addresses for the first few blocks in the file, then connects to the closest (first) datanode.
Step 4: Data is streamed from the datanode back to the client, which calls read() repeatedly on the stream.
Step 5: Now, when the end of the block is reached, DFSInputStream will close the connection to the datanode.
Step 6: When the client has finished reading, it calls close() on the FSDataInputStream.
* One thing to note is that if the DFSInputStream encounters an error, it will try the next closest datanode for that block. The DFSInputStream also verifies checksums for the data transferred to it from the datanode.
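The read path above, including the fallback to the next closest datanode and the checksum check, can be imitated with a small in-memory simulation. Everything here is a hypothetical stand-in: `DataNode`, `block_locations` and `read_file` are illustrative names, not Hadoop classes, and keeping the expected checksums in a plain dictionary is a simplification of how HDFS handles them.

```python
import zlib


class DataNode:
    """Hypothetical stand-in for a datanode that holds some block replicas."""

    def __init__(self, name, distance, blocks, alive=True):
        self.name = name
        self.distance = distance  # smaller means closer to the client
        self.blocks = blocks      # block id -> bytes
        self.alive = alive

    def read(self, block_id):
        if not self.alive:
            raise IOError(f"{self.name} is unreachable")
        return self.blocks[block_id]


def block_locations(datanodes, block_ids):
    """Namenode role (step 2): list each block's replica holders, closest first."""
    return {
        b: sorted((d for d in datanodes if b in d.blocks), key=lambda d: d.distance)
        for b in block_ids
    }


def read_file(datanodes, block_ids, checksums):
    """Client role (steps 3 to 5): read block by block from the closest healthy replica."""
    locations = block_locations(datanodes, block_ids)
    data, served_by = b"", []
    for block_id in block_ids:
        for node in locations[block_id]:
            try:
                chunk = node.read(block_id)
            except IOError:
                continue  # error: try the next closest datanode
            if zlib.crc32(chunk) != checksums[block_id]:
                continue  # checksum mismatch: treat the replica as corrupt
            data += chunk
            served_by.append(node.name)
            break  # end of block reached, move on to the next block
        else:
            raise IOError(f"no healthy replica for block {block_id}")
    return data, served_by


good = {"b0": b"hello ", "b1": b"world"}
checksums = {b: zlib.crc32(v) for b, v in good.items()}
datanodes = [
    DataNode("dn1", 1, {"b0": good["b0"], "b1": b"w0rld"}),  # corrupt copy of b1
    DataNode("dn2", 2, dict(good), alive=False),             # unreachable node
    DataNode("dn3", 3, {"b1": good["b1"]}),
]
data, served_by = read_file(datanodes, ["b0", "b1"], checksums)
```

In this run the first block comes from the closest node, while the second block skips a corrupt replica and an unreachable node before it is served by the third datanode.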
Q2) Explain how collaborative filtering works and discuss the challenges and potential solutions for improving it.
A) Collaborative filtering works by leveraging the idea that if two users have agreed in the past, they are likely to agree on future items.
* Collaborative filtering is of two types:
* User-based collaborative filtering
* Item-based collaborative filtering
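As a minimal sketch of the user-based variant, the snippet below predicts a missing rating as a similarity-weighted average of other users' ratings. The rating data, the function names and the choice of cosine similarity over co-rated items are all assumptions made for illustration; the source does not prescribe a particular similarity measure.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1 to 5 scale}
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "cat": {"A": 1, "C": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Weight each neighbour's rating of the item by their similarity to the user."""
    weighted_sum = 0.0
    total_similarity = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        similarity = cosine(ratings[user], their_ratings)
        weighted_sum += similarity * their_ratings[item]
        total_similarity += similarity
    if total_similarity == 0:
        return None  # nobody comparable has rated this item
    return weighted_sum / total_similarity


ann_on_d = predict("ann", "D")
```

Because ann has agreed with bob far more than with cat, the prediction for item D lands closer to bob's rating than to cat's. An item nobody has rated yields no prediction at all, which is the cold start problem in miniature.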

Challenges in collaborative filtering:
* Cold start problem: New users or items with no interaction history pose a challenge to making accurate predictions, as the system lacks data.
* Sparsity: In large datasets, users interact with only a small portion of the items, leading to sparse matrices.
* Scalability: In e-commerce, the algorithm must be able to search through millions of potential neighbours.
* False negatives: products that the system fails to recommend although the consumer would like them.
* False positives: products that are recommended although the consumer does not like them. False positives are undesirable for the consumer.
The potential solutions for improving the issues with collaborative filtering:
* Hybrid approaches that combine content-based methods with collaborative filtering, and transfer learning techniques that apply knowledge from similar domains, can help.
* Also, crowdsourcing data for tags can help create initial profiles, which can be used to make recommendations despite the cold start problem.

* Matrix factorization techniques like Singular Value Decomposition (SVD) or Alternating Least Squares (ALS) can help reduce sparsity by decomposing the user-item interaction matrix into lower-dimensional matrices. This allows the system to predict missing interactions and improve recommendation quality.
* User feedback and continuous learning: allowing users to provide feedback on recommendations can improve recommendation quality. This can help adjust the model and improve the system's accuracy.
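To make the matrix factorization idea concrete, here is a small sketch that factors a sparse user-item matrix into two lower-dimensional matrices and uses them to fill in a missing rating. Note the swap: it trains with plain stochastic gradient descent, not SVD or ALS, because that fits in a few lines of standard-library Python. The ratings, the hyperparameters and all function names are illustrative assumptions.

```python
import random

# Hypothetical observed ratings: (user index, item index) -> rating
observed = {
    (0, 0): 5, (0, 1): 4, (0, 3): 1,
    (1, 0): 4, (1, 1): 5, (1, 2): 1,
    (2, 2): 5, (2, 3): 4,
    (3, 0): 1, (3, 2): 4, (3, 3): 5,
}


def predict_rating(P, Q, u, i):
    """Predicted rating is the dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings only."""
    total = sum((r - predict_rating(P, Q, u, i)) ** 2 for (u, i), r in ratings.items())
    return (total / len(ratings)) ** 0.5


def factorize(ratings, n_users, n_items, k=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn n_users x k and n_items x k factor matrices by SGD (assumed settings)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - predict_rating(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Move both factors along the error gradient, with L2 shrinkage.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


P0, Q0 = factorize(observed, 4, 4, epochs=0)  # untrained factors, for comparison
P, Q = factorize(observed, 4, 4)
missing = predict_rating(P, Q, 0, 2)          # user 0 never rated item 2
```

Only the observed cells drive the training, yet the learned factors give a prediction for every cell, which is how the decomposition works around sparsity.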
Q3) Discuss the advantages and disadvantages of supervised learning.
A)
Advantages:

1) High accuracy
* With well-labeled data, supervised models can provide accurate predictions.
2) Wide application
* Supervised learning is used in application problems like fraud detection, email classification and sales forecasting.
3) Easy to interpret (regression and decision trees)
* Models, especially linear regression and decision trees, are straightforward to understand and implement.
4) Predictive power
* Provides effective outputs for known classes of labeled data.
Disadvantages of supervised learning:
1) Dependency on labeled datasets
* Requires large labeled datasets, which can be expensive and time-consuming to generate.
2) Limited scope
* Cannot handle new/unseen data effectively and relies on predefined labels.
3) Overfitting risk
* A model that is not properly regularized may fit the training data well but perform poorly on unseen data.
4) Computation intensive
* Training models can be slow, in both time and computation, for large datasets or complex models like neural networks.
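The overfitting risk can be seen on a toy regression task. The sketch below compares a least-squares line with a model that simply memorizes the nearest training point. The data set (y = 2x with alternating noise of plus or minus 1) and the function names are made up for illustration.

```python
# Hypothetical data: training labels are y = 2x with +1/-1 noise, test labels are noise-free.
train = [(0, 1), (1, 1), (2, 5), (3, 5), (4, 9), (5, 9)]
test = [(0.5, 1), (1.5, 3), (2.5, 5), (3.5, 7), (4.5, 9)]


def fit_line(data):
    """Ordinary least squares for a single feature: returns a predict function."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum(
        (x - mean_x) ** 2 for x, _ in data
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(data):
    """Memorize the training set: answer with the label of the nearest training x."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]


def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


line = fit_line(train)
memorizer = fit_memorizer(train)

train_errors = (mse(line, train), mse(memorizer, train))
test_errors = (mse(line, test), mse(memorizer, test))
```

The memorizer reproduces every training label exactly, noise included, so its training error is zero while its error on the unseen points is much larger than the line's. The simpler model gives up a little training accuracy and generalizes better.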
