ML Assignment - 1

Uploaded by

dohifi2695
ATHARVA EDUCATIONAL TRUST'S

ATHARVA COLLEGE OF ENGINEERING, MUMBAI

M.L. Assignment - 1
Q1 a) Enlist & explain types of Machine Learning. What are the issues in Machine Learning? Explain any 5 business applications of ML.
Ans. The following are the main types of ML:
i) Supervised Learning:
- It is defined as when a model gets trained on a labelled dataset. Labelled datasets have both input and output parameters.
- There are 2 main categories of supervised learning:
- Classification: deals with predicting categorical target values.
- Regression: deals with predicting continuous target variables.
- Example: Consider a scenario where you have to build an image classifier to differentiate between cats and dogs. If you feed the dataset of labelled dog and cat images to the algorithm, the machine will learn to classify between a dog and a cat.
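
The idea of learning from labelled examples can be sketched in a few lines. This is only an illustration, not the image classifier described above: it assumes a hypothetical dataset of two numeric features per animal (weight and ear length) and uses a 1-nearest-neighbour rule, where the function name and the data are made up for the example.

```python
import math

def nearest_neighbour_predict(train_points, train_labels, query):
    """Return the label of the labelled training point closest to `query`."""
    distances = [math.dist(p, query) for p in train_points]
    return train_labels[distances.index(min(distances))]

# Hypothetical labelled dataset: (weight in kg, ear length in cm) -> label
train_points = [(4.0, 6.5), (3.5, 7.0), (20.0, 10.0), (25.0, 12.0)]
train_labels = ["cat", "cat", "dog", "dog"]

# Classify a new, unlabelled animal from its features.
prediction = nearest_neighbour_predict(train_points, train_labels, (22.0, 11.0))
print(prediction)
```

The input features play the role of the input parameters and the "cat"/"dog" strings are the labelled outputs.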

ii) Unsupervised Learning:
- It is a type of ML technique in which an algorithm discovers patterns and relationships using an unlabelled dataset. Unlike supervised learning, it does not involve providing the algorithm with labelled target outputs.
- The primary goal of unsupervised learning is often to discover hidden patterns, similarities, or clusters within the data, which can then be used for various purposes such as data exploration, visualization, etc.
- 2 main categories of unsupervised learning:
- Clustering: process of grouping data points into clusters based on their similarities.
- Association: technique for discovering relationships between items in a dataset.
- Example: Consider that you have a dataset that contains information about the purchases you made from a shop. Through clustering, the algorithm can group similar purchasing behaviour among you and other customers, which reveals potential customers without predefined labels. This type of information can help businesses target customers as well as identify outliers.
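
Grouping customers without labels can be sketched as below. This is a minimal one-dimensional k-means, assuming a hypothetical list of monthly spend values and hand-picked starting centroids; the notes do not name a specific clustering algorithm, so k-means is used here only as a common example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical monthly spend per customer; no labels are provided.
spend = [12, 15, 14, 90, 95, 110]
centroids, clusters = kmeans_1d(spend, centroids=[0.0, 100.0])
print(clusters)
```

The two groups that emerge correspond to low-spending and high-spending customers, even though no such labels were given.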

iii) Semi-Supervised Learning:
- It is a machine learning approach that works between supervised and unsupervised learning, so it uses both labelled and unlabelled data.
- It is particularly useful when obtaining labelled data is costly, time-consuming, or resource-intensive.
- Example: Consider that we are building a language translation model. Having labelled translations for every sentence pair can be resource-intensive. Semi-supervised learning allows the model to learn from both the labelled and unlabelled sentence pairs, making it more accurate. This technique has led to significant improvements in the quality of machine translation services.

iv) Reinforcement Learning:
- It is a learning method that interacts with the environment by producing actions and discovering errors. Trial, error, and delay are the most relevant characteristics of reinforcement learning.
- The model keeps increasing its performance using reward feedback to learn the behaviour or pattern.
- These algorithms are specific to a particular problem, e.g. Google self-driving cars and AlphaGo (where a bot competes with humans and itself to get better). Each time we feed in data, they learn and add the data to their knowledge, which is the training data. So the more it learns, the better it gets trained.

The following are the issues in Machine Learning:
i) Poor Quality of Data: Ensure meticulous data preprocessing to remove outliers, filter missing values, and remove unwanted features for enhanced outputs.
ii) Underfitting & Overfitting of Training Data: Address underfitting by maximizing training time and increasing model complexity. Combat overfitting by using data augmentation, removing outliers, and selecting models with fewer features.
iii) Lack of Training Data: Ensure ML algorithms are trained with sufficient amounts of data to avoid inaccurate or biased predictions.
iv) Slow Implementation: ML models require significant time, along with constant monitoring and maintenance, to provide the best output.

The following are the applications of Machine Learning:
i) Image Recognition: It has significantly advanced, starting from simple tasks like classifying cats and dogs to complex applications such as face recognition.
ii) Speech Recognition: Systems like Alexa and Siri convert voice instructions into text, facilitating communication with smart devices.
iii) Recommender Systems: Analyze user preferences and search history to provide personalized content and services.
iv) Stock Market Trading: Intelligent systems use time series forecasting to predict stock market trends.
v) Medical Diagnosis: ML models achieve high accuracy in diagnosing diseases such as breast cancer, Parkinson's disease, pneumonia, etc.

Q1 b) Explain the steps of developing Machine Learning applications.
Ans. i) Collection of Data:
- You could collect the samples from a website and extract data from an RSS feed.
- From a device, you could collect wind speed measurements.
- You could use publicly available data.


ii) Preparation of Input Data:
- Once you have the input data, you need to check whether it is in a usable format or not.
- Some algorithms can accept target variables and features as strings; some need them to be integers.
- Some algorithms accept features only in a special format.
iii) Analyze the Input Data:
- Look at the data you have parsed in a text editor to check that the collection and preparation of input data steps are working properly and that you don't have a bunch of empty values.
- Plotting data in 1, 2, or 3 dimensions can also help.
- Distil multiple dimensions down to 2 or 3 so that you can visualize the data.

iv) Train the Algorithm:
- Good clean data from the first two steps is given as input to the algorithm. The algorithm extracts information or knowledge. This knowledge is mostly stored in a format that is readily usable by the machine for the next 2 steps.

v) Test the Algorithm:
- In this step, the information learned in the previous step is used. When you are checking an algorithm, you test it to find out whether it works properly or not. In the supervised case, you have some known values that can be used to evaluate the algorithm.

vi) Use It:
- Here, a real program is developed to do some task, and once again it is checked whether all the previous steps worked as you expected. You might encounter some new data and have to revisit the Train the Algorithm step.
Q2 a) Explain the steps required for selecting the right machine learning algorithm.
Ans: The following are the steps:

i) If you are trying to predict or forecast a target value, then look into supervised learning. Otherwise, you have to use unsupervised learning.
ii) If you have chosen supervised learning, then next you need to focus on what your target value is:
- If the target value is discrete, then use Classification.
- If the target value is continuous, then use Regression.
iii) If you have chosen unsupervised learning, you need to focus on what your aim is:
- If you want to fit your data into some discrete groups, then use Clustering.
- If you also need a numerical estimate of how strong the fit into each group is, then use a density estimation algorithm.
iv) Data: Are the features nominal or continuous? Are there missing values in the features? Why are there missing values? Are there outliers in the data? All of these features of your data can help you narrow the algorithm selection process.
[Figure: Selecting an algorithm. Supervised Learning: Classification, Regression. Unsupervised Learning: Clustering, Density Estimation.]
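
The selection steps above can be sketched as a small decision function. The function name and its boolean parameters are hypothetical and only mirror the questions asked in the steps.

```python
def suggest_algorithm_family(has_target, target_is_discrete=False,
                             need_fit_strength=False):
    """Map the answers to the selection questions onto an algorithm family."""
    if has_target:
        # Supervised learning: the choice depends on the type of target value.
        return "classification" if target_is_discrete else "regression"
    # Unsupervised learning: the choice depends on the aim.
    return "density estimation" if need_fit_strength else "clustering"

print(suggest_algorithm_family(has_target=True, target_is_discrete=True))
print(suggest_algorithm_family(has_target=False))
```

It does not cover the data questions in step iv), which narrow the choice further within a family.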

Q2 b) Explain Logistic Regression and Linear Regression with example.
Ans.
i) Logistic Regression: It is used for binary classification, where we use a sigmoid function that takes the independent variables as input and produces a probability value between 0 and 1.
- Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must be a categorical or discrete value. It can be either Yes or No, true or false, etc., but instead of giving the exact value as 0 or 1, it gives a probabilistic value which lies between 0 and 1.
- Example: We have 2 classes, Class 0 and Class 1. If the value of the logistic function for an input is greater than 0.5 (threshold value), then it belongs to Class 1; otherwise it belongs to Class 0.
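
The thresholding rule in the example can be sketched as follows. The weights and bias are made-up values standing in for a trained model; only the sigmoid and the 0.5 threshold come from the description above.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(features, weights, bias, threshold=0.5):
    """Return (class, probability) for one input using a logistic model."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    return (1 if probability > threshold else 0), probability

# Hypothetical one-feature model: weight 1.5, bias -1.0.
label, probability = predict_class([2.0], weights=[1.5], bias=-1.0)
print(label, round(probability, 3))
```

The probability itself is kept alongside the class, since logistic regression produces a value between 0 and 1 before the threshold is applied.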

ii) Linear Regression: It is a type of supervised machine learning algorithm that computes the linear relationship between the dependent variable and one or more independent features by fitting a linear equation to observed data.
- Example: One of the most common examples where it is used is when predicting the price of a house by analyzing sales data of that region.
- Linear Regression is classified into 2 main categories:
- Simple Linear Regression
- Multiple Linear Regression
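
A minimal sketch of simple linear regression, using the closed-form least-squares estimates for slope and intercept. The house areas and prices are hypothetical numbers chosen so that the fitted line is easy to check by hand.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sales data: area in square metres, price in thousands.
areas = [50, 75, 100, 125]
prices = [110, 160, 210, 260]

slope, intercept = fit_simple_linear_regression(areas, prices)
predicted_price = slope * 90 + intercept  # estimate for a 90 square metre house
print(slope, intercept, predicted_price)
```

Multiple linear regression follows the same idea with more than one independent feature, but needs a matrix solve instead of these two formulas.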
Q3 a) Explain performance evaluation metrics for binary classification with suitable example.
Ans: Classification is the problem of identifying to which of a set of categories/classes a new observation belongs, based on a training set of data containing records whose class label is known.
            Predicted 0   Predicted 1
Actual 0        TN            FP
Actual 1        FN            TP

TN: True Negatives (actual 0, predicted 0)
TP: True Positives (actual 1, predicted 1)
FP: False Positives (actual 0, predicted 1)
FN: False Negatives (actual 1, predicted 0)
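
The four cells can be counted directly from paired actual and predicted labels. The label lists below are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Count TN, FP, FN, TP for binary labels encoded as 0 and 1."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:  # a == 1 and p == 0
            counts["FN"] += 1
    return counts

# Hypothetical labels for eight observations.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```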
Consider the following values for the confusion matrix: TP = 500, FP = 50, FN = 150.

i) Accuracy:
- Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions. It lies between 0 and 1.
  Accuracy = (TP + TN) / (TP + FP + FN + TN)

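ii) Precision & Recall:
- Precision is useful when we want to minimize the number of false positives.
  Precision = TP / (TP + FP) = 500 / 550 = 0.9091
- Recall is useful in cases such as cancer detection, where we want to minimize the number of false negatives.
  Recall = TP / (TP + FN) = 500 / 650 = 0.7692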

iii) Specificity:
- Specificity is defined as the ratio of true negatives to the sum of true negatives and false positives.
  Specificity = TN / (TN + FP)
iv) F1-Score:
- F1-Score is a metric that combines both Precision and Recall and equals the harmonic mean of Precision and Recall. Its value lies between 0 and 1 (the higher the value, the better the F1-Score).
  F1 = (2 × Precision × Recall) / (Precision + Recall)
     = (2 × 0.9091 × 0.7692) / (0.9091 + 0.7692)
     = 0.8333 = 83.33%
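
The precision, recall, and F1 arithmetic can be checked with a short sketch. It assumes counts of TP = 500, FP = 50, and FN = 150, which are chosen because they reproduce the precision, recall, and 83.33% F1 figures quoted in this answer; the function name is illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Assumed counts, consistent with the worked figures in the answer.
precision, recall, f1 = precision_recall_f1(tp=500, fp=50, fn=150)
print(round(precision, 4), round(recall, 4), round(f1 * 100, 2))
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values (recall here) than a plain average would.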

v) AUC - ROC:
- The AUC (Area Under the Curve) - ROC (Receiver Operating Characteristics) curve is one of the most important evaluation metrics for checking any classification model's performance.
- It is plotted between FPR (X-axis) and TPR (Y-axis). If the AUC value is less than 0.5, then the model is even worse than a random guessing model.
  TPR = TP / (TP + FN)
  FPR = FP / (FP + TN)
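
One way to compute the area under the ROC curve without plotting is the pairwise-ranking view: AUC equals the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The sketch below uses that equivalence; the labels and scores are made up.

```python
def roc_auc(actual, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Hypothetical true labels and predicted probabilities for Class 1.
actual = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(actual, scores)
print(auc)
```

A model that scores at random would rank about half the pairs correctly, which is why 0.5 is the reference value.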

Q3 b) Explain issues in decision tree.
Ans: The following are the issues in decision tree:
i) Overfitting the Data: Overfitting occurs when a decision tree model becomes too complex and captures noise in the training data, leading to poor generalization to new, unseen data. Techniques like pruning and cross-validation help mitigate this issue.
ii) Incorporating Continuous-Valued Attributes: A decision tree must handle continuous data by converting it into discrete intervals. This can be achieved through techniques like binning, where the continuous range is divided into bins, or by using algorithms that can handle continuous values directly, such as CART.
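
Binning a continuous attribute can be sketched as below, assuming equal-width bins over a known range. The function name, the age values, and the choice of three bins are illustrative only.

```python
def bin_index(value, low, high, n_bins):
    """Return the index of the equal-width bin that `value` falls into."""
    width = (high - low) / n_bins
    index = int((value - low) / width)
    # Clamp so that values at or beyond the edges land in the first or last bin.
    return min(max(index, 0), n_bins - 1)

# Hypothetical continuous attribute: age, split into 3 bins over 0 to 60.
ages = [18, 25, 40, 59, 60]
binned = [bin_index(age, low=0, high=60, n_bins=3) for age in ages]
print(binned)
```

After binning, the tree can split on the bin index as if it were a discrete attribute.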
iii) Handling Training Examples with Missing Attribute Values: When training data has missing values, the decision tree algorithm needs strategies to manage these gaps. Common methods include imputing missing values, using surrogate splits, or modifying the algorithm to handle missing data during the learning process.

iv) Handling Attributes with Different Costs: Different attributes may have varying costs associated with their measurement.

v) Alternative Measures for Selecting Attributes: Decision trees use metrics like Information Gain or Gini Index to select attributes for splitting. Alternative measures such as Gain Ratio or Chi-Square can be used to address specific issues, like bias toward attributes with many values, and to improve the tree's performance on different types of data.
Q4 a) Explain regression line, scatter plot, error in prediction.
Ans. i) Regression Line:
- It is the straight line that best represents the data on a scatter plot.
- In Simple Linear Regression, this line is defined by the equation y = mx + b, where:
  y is the dependent variable
  x is the independent variable
  m is the slope of the line
  b is the y-intercept
ii) Scatter Plot:
- A scatter plot is a type of graph used to display and compare two numerical variables. Each point on the scatter plot represents one observation in the dataset.
- A scatter plot helps in visualizing the relationship between the variables and can indicate the presence of correlations and trends.

iii) Error in Prediction:
- It refers to the difference between the actual value and the predicted value of the dependent variable.
- In regression analysis, the prediction error for each observation is called the residual, which is calculated as:
  residual = y_actual - y_predicted

iv) Best Fitting Line:
- The best fitting line is the line that minimizes the sum of the squared differences (errors) between the observed values and the values predicted by the line.
- The objective is to find the line where the sum of the squares of the residuals is as small as possible.
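
The sum of squared residuals gives a way to compare candidate lines on the same data. The sketch below scores two hypothetical lines on made-up points; the one with the smaller total is the better fit of the two.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (actual - predicted)^2 for the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

ssr_line_a = sum_squared_residuals(xs, ys, slope=2, intercept=0)
ssr_line_b = sum_squared_residuals(xs, ys, slope=1, intercept=1)
better = "A" if ssr_line_a < ssr_line_b else "B"
print(ssr_line_a, ssr_line_b, better)
```

Neither candidate is claimed to be the least-squares line; the comparison only shows how the criterion ranks lines.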

Q4 b) Explain the Random Forest algorithm in detail.
Ans.
[Figure: Random Forest. Dataset -> Decision Tree-1, Decision Tree-2, ..., Decision Tree-N -> Result-1, Result-2, ..., Result-N -> Averaging -> Final Result.]

The Random Forest algorithm is an ensemble learning method used for classification and regression tasks. It builds multiple decision trees during training and merges them to get a more accurate result.

The following are the steps in the Random Forest algorithm:
i) Bootstrap Sampling: It is used to create multiple subsets of the training data. Each subset is created by randomly selecting data points with replacement.
ii) Building Decision Trees: For each bootstrap sample, a decision tree is constructed. Instead of considering all features for splitting at each node, random forest randomly selects a subset of features. This technique is called feature bagging.
iii) Making Predictions: For classification, each tree votes and the forest predicts the class of an input data point by majority vote. For regression, the final prediction is the average of the predictions made by all the trees.
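
Two of these steps, bootstrap sampling and combining the trees' outputs, can be sketched without building actual trees. The helper names are hypothetical, and the per-tree predictions are supplied by hand rather than produced by trained trees.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Classification: return the class predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Regression: return the mean of the trees' predictions."""
    return sum(predictions) / len(predictions)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [("r1",), ("r2",), ("r3",), ("r4",), ("r5",)]
sample = bootstrap_sample(rows, rng)

# Hand-written stand-ins for the outputs of three trained trees.
final_class = majority_vote(["cat", "dog", "cat"])
final_value = average([2.0, 4.0, 6.0])
print(sample, final_class, final_value)
```

Because sampling is with replacement, a bootstrap sample usually repeats some rows and leaves others out, which is what makes the trees differ from one another.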

Advantages:
- Reduction of Overfitting
- Handling of Missing Values

Disadvantages:
- Complexity
- Training Time
- Interpretability
Q5 a) Compare Bagging and Boosting with reference to ensemble learning. Explain how these methods help to improve the model's performance.
Ans. Ensemble learning helps improve machine learning results by combining several models. This approach allows the production of better predictive performance compared to a single model. Bagging and Boosting are 2 types of Ensemble Learning.
i) Bagging:
Bootstrap Aggregating, also known as bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression.
a) Reduces Overfitting: By training multiple models on different subsets of data and averaging their predictions, bagging smooths out the predictions and reduces the impact of outliers and noise.
b) Stabilizes Models: Multiple models trained on different data subsets ensure that the final prediction is less sensitive to variations in the training data.

ii) Boosting:
Boosting is an ensemble modelling technique designed to create a strong classifier by combining multiple weak classifiers. The process involves building models sequentially, where each new model aims to correct the errors made by the previous ones.
a) Enhances Accuracy: By sequentially focusing on the errors of previous models, boosting creates a series of models that correct each other's mistakes, leading to a highly accurate final model.
b) Reduces Bias: Boosting techniques like gradient boosting iteratively minimize the loss function, addressing both bias and variance and producing more accurate and generalizable models.

Q5 b) Write short note on Cross Validation in Machine Learning.


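Ans. Cross-validation is a technique in machine learning used to evaluate how a model will perform on an independent dataset. It involves training the model on a subset of the data and testing it on a previously unseen subset. This ensures that the model's performance is stable and not overly dependent on the data it was trained on.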

Basic Cross-Validation:
i) Reserve a subset of the dataset as a validation set.
ii) Train the model using the training dataset.
iii) Evaluate model performance using the validation set.
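
The first of these steps, reserving a validation subset, can be sketched as below. The 25% validation fraction, the seed, and the function name are assumptions made for the example; training and evaluation are left out because they depend on the chosen model.

```python
import random

def holdout_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and reserve a fraction as the validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]
    training = shuffled[n_validation:]
    return training, validation

rows = list(range(8))  # hypothetical dataset of 8 observations
training, validation = holdout_split(rows)
print(training, validation)
```

Every observation ends up in exactly one of the two sets, so the model is never evaluated on data it was trained on.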

Applications:
i) Comparing the performance of different predictive modelling methods.
ii) Medical Research: Cross-validation has significant application in the medical research field.
iii) Meta-Analysis: It is used by data scientists working with medical statistics in meta-analysis.
