
Data Science Module - 4

The document discusses various data selection and feature extraction techniques in machine learning, emphasizing the importance of relevant variables for model accuracy. It covers methods such as decision trees, random forests, and statistical evaluations like p-values and information gain. Additionally, it highlights the significance of avoiding overfitting and ensuring the selection of meaningful features to improve predictive performance.

Uploaded by

kanti chandrakar

Module 4

Feature Selection
- It is the process of constructing a subset of the data, or functions of the data, to be the predictors or variables for models and algorithms.
- Redundancy or correlation between variables. Ex: data can have many variables.
- New variables can be constructed by transforming existing variables, e.g. transforming variables with logs, or turning a continuous variable into a binary variable.
- Too many features are a problem when the number of features is more than the number of observations, or when the data is sparse.
- For computational reasons, such data is hard to manipulate.

Feature Generation or Feature Extraction
- Feature extraction refers to taking the raw dump of data and creating features from it carefully, to avoid garbage.
- It is more of an art than a science: it helps to have a domain expert, and we also need imagination.

- Ex: log each user action, or run a survey to get respondents' answers.

Information can be:
1. Relevant and useful, but not possible to capture.
2. Relevant and useful, possible to log, and you did.
3. Relevant and useful, possible to log, but you did not.
4. Not relevant or useful, but you do not know that and log it anyway.
5. Not relevant or useful, and you either cannot capture it or it did not occur to you.
Filtering
- Orders the possible features with respect to a ranking based on a metric or statistic, such as correlation with the outcome variable.
- Evaluates the predictive power of individual features.
- It does not take possible interactions into account: filters treat features independently, yet a feature can be more powerful when combined with other features.
- Ex: run a linear regression with only one feature at a time as predictor, then rank-order the features according to lowest p-value or highest R-squared.
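
The filtering idea can be illustrated with a minimal sketch in plain Python. It ranks features by the R-squared of a one-feature linear regression, which for a single predictor equals the squared Pearson correlation. The feature names and numbers are made up for illustration, and the p-value ranking mentioned in the notes is not shown.

```python
def r_squared(x, y):
    """R-squared of a one-feature linear regression of y on x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column explains nothing
    return (sxy * sxy) / (sxx * syy)


def rank_features(features, y):
    """Return (name, R-squared) pairs, highest R-squared first."""
    scores = [(name, r_squared(col, y)) for name, col in features.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up data: three candidate features and one outcome variable.
features = {
    "points": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, -1.0, 0.5, 3.0, -2.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
}
y = [2.1, 3.9, 6.2, 8.0, 9.8]
ranking = rank_features(features, y)
for name, score in ranking:
    print(f"{name}: R^2 = {score:.3f}")
```

Because each feature is scored on its own, a feature that is only useful in combination with another would rank low here, which is the limitation of filtering noted above.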

Wrappers
- Wrapper feature selection tries to find subsets of features, of some fixed size, that work well.
- The number of possible size-k subsets of n things (called "n choose k", $\binom{n}{k}$) grows exponentially, which can lead to overfitting.
- 2 aspects to consider in wrappers:
  1. Selecting an algorithm to use to select features.
  2. Deciding on a selection criterion or filter to decide that your set of features is good.
Selecting an Algorithm
- Focus on stepwise regression: a systematic way of selecting features, either by adding or by removing features.
- Forward selection: start with no features and add one feature at a time. Try all possibilities and pick the best; stop when the model no longer improves.
- Backward elimination: include all features and gradually remove one feature at a time; stop when removing a feature makes the selection criterion worse.
- Combined approach (hybrid approach): based on maximum relevance and minimum redundancy; a greedy algorithm starts with the best feature and takes a few more highly ranked ones.
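
A minimal sketch of forward selection follows. The `score` function is a hypothetical stand-in for whatever selection criterion is chosen (higher is better); the toy criterion and feature names below are invented for illustration and do not fit any real model.

```python
def forward_selection(candidates, score):
    """Greedy forward selection.

    candidates: list of feature names.
    score: function taking a tuple of feature names and returning a number
           where higher is better (hypothetical; in practice it would come
           from a model fitted on those features).
    """
    selected = []
    best = score(tuple(selected))
    remaining = list(candidates)
    while remaining:
        # Try every remaining feature and keep the one that scores best.
        trial_scores = [(score(tuple(selected + [f])), f) for f in remaining]
        trial_best, feature = max(trial_scores)
        if trial_best <= best:
            break  # no candidate improves the criterion: stop
        selected.append(feature)
        remaining.remove(feature)
        best = trial_best
    return selected, best


# Toy criterion: each feature has a made-up usefulness, "b" is redundant
# with "a", and every extra feature costs a small complexity penalty.
USEFULNESS = {"a": 5.0, "b": 4.5, "c": 2.0, "d": 0.1}


def toy_score(subset):
    total = sum(USEFULNESS[f] for f in subset)
    if "a" in subset and "b" in subset:
        total -= 4.5  # redundancy: "b" adds nothing once "a" is in
    return total - 0.5 * len(subset)


chosen, final_score = forward_selection(["a", "b", "c", "d"], toy_score)
print(chosen, final_score)
```

Backward elimination has the same shape in reverse: start from all features and repeatedly drop the one whose removal improves the criterion most.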
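Selection Criterion
- The data scientist has to select which selection criterion to use.
- Try different selection criteria and select a robust model.

1. R-squared
   $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$
   The proportion of variance explained by the model.

2. p-values
- In regression, we estimate the coefficients ($\beta$'s); the null hypothesis assumes the $\beta$'s are zero.
- The p-value captures the probability of observing the data we observed under the null hypothesis.
- A low p-value means the observed data is highly unlikely under the null hypothesis, so the coefficient is highly likely to be non-zero.

3. Entropy

4. AIC (Akaike Information Criterion)
   $AIC = 2k - 2\ln(L)$
   where k is the number of parameters in the model and L is the maximized value of the likelihood.
   Goal: minimize AIC.

5. BIC (Bayesian Information Criterion)
   $BIC = k \ln(n) - 2\ln(L)$
   where n is the number of observations.
   Goal: minimize BIC.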
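
As a small worked example, the two information criteria can be computed directly from their definitions, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L). The two models and their maximized log-likelihoods below are hypothetical numbers chosen only to show the comparison.

```python
from math import log


def aic(k, log_likelihood):
    """AIC = 2k - 2 ln(L); log_likelihood is ln(L) at its maximum."""
    return 2 * k - 2 * log_likelihood


def bic(k, n, log_likelihood):
    """BIC = k ln(n) - 2 ln(L), with n the number of observations."""
    return k * log(n) - 2 * log_likelihood


# Hypothetical fits on n = 100 observations: (parameters, maximized ln L).
n = 100
models = {"small": (2, -120.0), "large": (6, -118.5)}

aic_scores = {name: aic(k, ll) for name, (k, ll) in models.items()}
bic_scores = {name: bic(k, n, ll) for name, (k, ll) in models.items()}

# Lower is better for both criteria.
best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
print(aic_scores, best_by_aic)
print(bic_scores, best_by_bic)
```

In this made-up case the larger model fits slightly better, but not by enough to pay for its four extra parameters under either criterion.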

Decision Trees
- Breaking a big decision down into a series of questions.
- A classification algorithm.
- Ex: Chasing Dragons: Yes, going to come back next month / No, not going to come back next month.
- The tree branches on the more informative things.
- Random variable X: P(X=1) and P(X=0) are the probabilities that X is true or false.

Entropy
- A measure of how mixed up something is.
  $H(X) = -P(X=1)\log_2 P(X=1) - P(X=0)\log_2 P(X=0)$
- The entropy vanishes if P(X=1) = 0 or P(X=0) = 0.
Information Gain
For a given attribute a:

$IG(X, a) = H(X) - H(X \mid a)$

For the specific conditional entropy, given $a = a_0$:

$H(X \mid a = a_0) = -P(X=1 \mid a=a_0)\log_2 P(X=1 \mid a=a_0) - P(X=0 \mid a=a_0)\log_2 P(X=0 \mid a=a_0)$

$H(X \mid a) = \sum_i P(a = a_i)\, H(X \mid a = a_i)$
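
The entropy and information gain formulas can be checked on a tiny example. This is a minimal sketch for a binary X; the data and the attribute names (`filled_profile`, `big_screen`) are made up for illustration.

```python
from math import log2


def entropy(labels):
    """H(X) in bits for a list of 0/1 labels; 0 * log2(0) is taken as 0."""
    n = len(labels)
    h = 0.0
    for value in (0, 1):
        p = sum(1 for label in labels if label == value) / n
        if p > 0:
            h -= p * log2(p)
    return h


def information_gain(labels, attribute):
    """IG(X, a) = H(X) - sum_i P(a = a_i) * H(X | a = a_i)."""
    n = len(labels)
    conditional = 0.0
    for a_i in set(attribute):
        subset = [x for x, a in zip(labels, attribute) if a == a_i]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


# Made-up data: did the user return (1) or not (0), plus two attributes.
returned = [1, 1, 1, 1, 0, 0, 0, 0]
filled_profile = [1, 1, 1, 1, 0, 0, 0, 0]  # perfectly informative
big_screen = [1, 0, 1, 0, 1, 0, 1, 0]      # uninformative

print(entropy(returned))
print(information_gain(returned, filled_profile))
print(information_gain(returned, big_screen))
```

A tree built on this data would split on `filled_profile`, since it has the larger information gain.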

Decision Tree Algorithm
- An algorithm to decide which attribute to split the tree on.
- At each node, select the attribute that maximizes information gain.
- Prune the tree to avoid overfitting.

Handling continuous variables in a decision tree
- For a binary variable, the split is on its two values.
- For a continuous variable, a threshold needs to be decided in order to partition it in two, e.g. predictor less than 10 or at least 10.
Random Forests
- It is a supervised algorithm; it generalizes decision trees with bagging.
- It contains a number of decision trees built on subsets of the dataset, and takes the average to improve the predictive accuracy.
- It has 2 hyperparameters:
  1. Number of trees in the forest (N).
  2. Number of features to randomly select for each tree (F).
- Bootstrap sample: a sample with replacement, i.e. the same data point may be sampled more than once.
- The sample size is usually about 80% of the size of the dataset; this is a third hyperparameter that can be adjusted.

Algorithm (to construct N decision trees):
1. For each tree, take a bootstrap sample.
2. For each node, randomly select F features (say 5 out of 100).
3. Use entropy / information gain to decide which feature to split on.
4. Repeat at each stage of the tree.
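
The sampling steps of the random forest algorithm can be sketched with the standard library alone. This sketch only draws the bootstrap samples and the random feature subsets; it does not fit any trees. The 0.8 sample fraction, the seed, and the sizes (50 rows, 100 features, N = 3, F = 5) are assumptions chosen for illustration.

```python
import random


def bootstrap_sample(rows, fraction, rng):
    """Draw round(fraction * len(rows)) rows with replacement."""
    size = round(fraction * len(rows))
    return [rng.choice(rows) for _ in range(size)]


def random_feature_subset(feature_names, f, rng):
    """Pick f distinct features to consider at one node."""
    return rng.sample(feature_names, f)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = list(range(50))  # stand-in for 50 data points
feature_names = [f"x{i}" for i in range(100)]

N, F = 3, 5  # number of trees, features considered per node
plan = []
for _ in range(N):
    sample = bootstrap_sample(rows, 0.8, rng)  # 0.8 is an assumed fraction
    node_features = random_feature_subset(feature_names, F, rng)
    plan.append((sample, node_features))

for sample, node_features in plan:
    print(len(sample), len(set(sample)), node_features)
```

Because sampling is with replacement, the number of distinct rows in each sample is normally smaller than the sample size, which is what makes the trees differ from one another.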
The Kaggle Model
- Kaggle is a company whose tagline is "We're making data science a sport".
- Kaggle forms relationships with companies and with data scientists.
- For a fee, Kaggle hosts competitions for businesses that have data problems they want to solve.
- The data scientists are the competitors; the companies are the customers and pay the fee.

A Single Contest
- A training set and a test set are given. In the test set the x's are given but the y's are withheld.
- The predicted y's are uploaded into the Kaggle system.
- Participants are encouraged to submit up to 5 times a day.
- As predictions are submitted, the Kaggle leaderboard updates in real time.
- A competition should not last too long. Ex: the Netflix Prize lasted 2 years, and the final winning model was too complicated to put into production.
Customers Who Need Analysis
- Kaggle fills the gap between those who need analysis and those with the skills.
- Companies share proprietary data.
- Ex: an auto insurance company: given attributes of drivers, predict the chance of a crash.
Essay Scoring Competition
- The essay dataset has columns including an id, the essay and its grade.
- Problem/goal: ethical implications of automated essay grading:
  1. Students become more standardized, which is inhibiting.
  2. Machines mimicking humans.
  3. Is the goal to write a good creative essay, or to do well on a standardized test?

User Retention
- Build a model to predict user behavior.
- Ex: Chasing Dragons, where the cost is a subscription and users pay monthly.
- More users means more money: how many users come back next month? Retain current users and add new users.

Features to consider:
1. Number of days the user visited in the first month.
2. Amount of time until the second visit.
3. Number of points on day j, for j = 1, ..., 30.
4. Total number of points in the first month.
5. Age and gender of the user.
6. Screen size of the device.
7. Did the user fill out the Chasing Dragons profile?
Privacy and Being Studied
- Privacy deals with what information about you is shared, and how much control you have over it.

Challenges of privacy:
- Physical harm
- Identity theft
- Financial loss
- Harassment
- Access to personal information
- Unwanted spam
- Provocative photos (employment risk)
- Unwanted ads
- Ruined reputation

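Recommendation Engine
- A real-world recommendation engine: e.g. movie, book or vacation recommendations.
- Users U and items V, with a weight for each edge between a user and an item.
- This forms a bipartite graph.

Nearest Neighbors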
Problems with nearest neighbors:
1. Curse of dimensionality
2. Overfitting
3. Correlated features
4. Relative importance of features
5. Sparseness
6. Measurement errors
7. Computational complexity
8. Sensitivity of distance metrics
9. Preferences change over time
10. Cost to update
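Beyond Nearest Neighbor: Machine Learning Classification
- Denote by $p_i$ the user's preference for item i.
- Assume 3 numeric attributes for each item i: $f_{i1}, f_{i2}, f_{i3}$. Then
  $p_i = \beta_1 f_{i1} + \beta_2 f_{i2} + \beta_3 f_{i3} + \epsilon$
- Estimate the linear coefficients, then use them to predict the preference for a new item.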

The Dimensionality Problem
- Use SVD and PCA to tackle over-dimensionality.
- Reduce dimensions by creating latent features.
- Ex: the coolness of a person is likely not measurable directly. It is a latent aspect that reduces many dimensions to one.

Singular Value Decomposition (SVD)
- From linear algebra: an m × n matrix X of rank k can be decomposed into a product of 3 matrices:
  $X = U S V^T$
  where U is m × k, S is k × k (diagonal), and $V^T$ is k × n.
- The columns of U and V are pairwise orthogonal.
- X is the original dataset, which has the users' ratings of items: m users and n items.
- k is the rank, an upper bound on the number of latent variables (a tuning parameter).
- Each row of U corresponds to a user; each row of V corresponds to an item.
- The diagonal of S holds the singular values, which give the importance of each latent variable.

Properties of SVD
- The columns of U and V are orthogonal to each other.
- The columns can be ordered based on the singular values, from highest to lowest importance.
- The least important dimensions, those with the smallest singular values, can be cut off, keeping only a chosen number d of dimensions.
- Once the SVD is computed, $X = U S V^T$, a rating is predicted by looking up the entry for the user and item.

Problems
- Missing data.
- The computation is expensive, in SVD as in kNN.

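Principal Component Analysis (PCA)
- Another method of dimension reduction.
- Find U and V such that $X \approx U V^T$, minimizing the discrepancy between X and its approximation.
- The dot product $u_i \cdot v_j$ is the predicted preference of user i for item j; $p_{ij}$ is the actual preference of user i for item j.
- Find the best U and V by minimizing the squared difference:
  $\min_{U,V} \sum_{i,j} (p_{ij} - u_i \cdot v_j)^2$
- Choose d, the number of latent features to use. U will have a row for each user and a column for each latent feature; V will have a row for each item and a column for each latent feature.
- d is usually about 100: more than 20, but increasing d makes the computation more complex.
- The resulting latent features will be uncorrelated.

Proof
- Find U and V such that the squared entries of the error are small.
- Take any invertible d × d matrix G. Modify U with G and V with the inverse of G:
  $(U G)(G^{-1} V^T) = U V^T \approx X$
- The fit is unchanged, so G can be chosen to make the latent features uncorrelated.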
Algorithm
1. Pick a random V.
2. Optimize U while V is fixed.
3. Optimize V while U is fixed.
4. Repeat steps 2 and 3 until things are not changing much: choose an $\epsilon$, and if the coefficients change by less than $\epsilon$, the algorithm has converged.

Fix V and update U
- For user i, find
  $\arg\min_{u_i} \sum_{j \in P_i} (p_{ij} - u_i \cdot v_j)^2$
  where $P_i$ is the set of items for which there is a preference from user i, so only the corresponding subset of V is used.
- When updating U, we need to consider only those movies that user i has rated.
- Each user's update depends only on V, so the updates of U can be parallelized across users.
- When new users or items are added, we can optimize by selecting only those users or items and updating only their rows of U and V, which saves computing time.
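
The alternating scheme above (often called alternating least squares) can be sketched in plain Python for the simplest case of a single latent feature, d = 1, where each least-squares update has a closed form. This is an illustration under stated assumptions, not a production implementation: V starts at all ones rather than random so the run is repeatable, and the ratings come from a made-up rank-one table with two entries left out.

```python
def als_rank_one(ratings, n_users, n_items, epsilon=1e-9, max_rounds=1000):
    """Alternating least squares with a single latent feature (d = 1).

    ratings: dict mapping (user, item) -> observed preference p_ij.
    Returns lists u and v such that u[i] * v[j] approximates p_ij.
    """
    u = [0.0] * n_users
    v = [1.0] * n_items  # fixed start instead of a random V, for repeatability
    for _ in range(max_rounds):
        biggest_change = 0.0
        # Fix V, update each u[i] using only the items user i has rated.
        for i in range(n_users):
            rated = [(j, p) for (a, j), p in ratings.items() if a == i]
            denom = sum(v[j] ** 2 for j, _ in rated)
            new = sum(p * v[j] for j, p in rated) / denom if denom else 0.0
            biggest_change = max(biggest_change, abs(new - u[i]))
            u[i] = new
        # Fix U, update each v[j] using only the users who rated item j.
        for j in range(n_items):
            raters = [(i, p) for (i, b), p in ratings.items() if b == j]
            denom = sum(u[i] ** 2 for i, _ in raters)
            new = sum(p * u[i] for i, p in raters) / denom if denom else 0.0
            biggest_change = max(biggest_change, abs(new - v[j]))
            v[j] = new
        if biggest_change < epsilon:
            break  # coefficients changed by less than epsilon: converged
    return u, v


# Made-up preferences p_ij = a_i * b_j with a = [1, 2, 3], b = [2, 1, 4];
# entries (0, 2) and (2, 1) are treated as missing.
ratings = {
    (0, 0): 2.0, (0, 1): 1.0,
    (1, 0): 4.0, (1, 1): 2.0, (1, 2): 8.0,
    (2, 0): 6.0, (2, 2): 12.0,
}
u, v = als_rank_one(ratings, n_users=3, n_items=3)
predicted_0_2 = u[0] * v[2]
predicted_2_1 = u[2] * v[1]
print(predicted_0_2, predicted_2_1)
```

Each user's update reads only V and that user's own ratings, so the inner loop over users could be run in parallel, as the notes point out; the same holds for the loop over items.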
