Chapter 3 - Introduction Via Linear Regression
LINEAR REGRESSION
Study key concepts based on the problem of supervised learning.
3.1 Supervised learning
Training set D = {(x_n, t_n)}_{n=1}^N:
x_n: explanatory variables (covariates, domain points)
t_n: dependent variables (labels, responses)
[Figure: scatter plot of the training data points (x_n, t_n).]
Introduce a non-negative loss function ℓ(t, t̂): the cost (loss, risk) incurred if the correct value is t and the estimate is t̂.
ℓ_q loss: ℓ_q(t, t̂) = |t − t̂|^q
Ex.: quadratic loss ℓ_2(t, t̂) = (t − t̂)²
0-1 loss: ℓ_0(t, t̂) = 1(t ≠ t̂), i.e. 0 if t̂ = t and 1 otherwise.
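A minimal code sketch of these losses (the function names ell_q and ell_0 are illustrative, not part of the notes):

```python
import numpy as np

def ell_q(t, t_hat, q=2):
    """General ell_q loss |t - t_hat|^q; q = 2 gives the quadratic loss."""
    return np.abs(t - t_hat) ** q

def ell_0(t, t_hat):
    """0-1 loss: 0 if the estimate is exactly right, 1 otherwise."""
    return np.float64(t != t_hat)

print(ell_q(1.0, 0.6, q=2))  # ≈ 0.16
print(ell_0(1.0, 0.6))       # 1.0
```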
Optimal predictor: t̂*(x) = argmin_t̂ E[ℓ(T, t̂) | x].
For the quadratic loss:
d/dt̂ E[(T − t̂)² | x] = 2 t̂ − 2 E[T | x] = 0  ⇒  t̂*(x) = E[T | x].
Example: if p(t | x) puts probability 0.8 on one value of t and 0.2 on another, then under the quadratic loss t̂*(x) = E[T | x] is the corresponding weighted average.
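A quick numerical check of the above, using a hypothetical discrete p(t | x) with probabilities 0.8 and 0.2 (the values are illustrative): the expected quadratic loss is minimized by the conditional mean, while the expected 0-1 loss is minimized by the conditional mode.

```python
import numpy as np

# Hypothetical discrete p(t|x) for a fixed x: values of t and their probabilities.
t_values = np.array([0.0, 1.0])
probs    = np.array([0.2, 0.8])

def expected_loss(t_hat, loss):
    return np.sum(probs * loss(t_values, t_hat))

quad     = lambda t, t_hat: (t - t_hat) ** 2
zero_one = lambda t, t_hat: (t != t_hat).astype(float)

candidates = np.linspace(0.0, 1.0, 101)
best_quad = candidates[np.argmin([expected_loss(c, quad) for c in candidates])]
best_01   = candidates[np.argmin([expected_loss(c, zero_one) for c in candidates])]

print(best_quad)  # ≈ 0.8, the conditional mean E[T|x]
print(best_01)    # 1.0, the conditional mode
```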
3.3 Frequentist approach
Assumption: the training data points (x_n, t_n) ∈ D are drawn i.i.d. from a true but unknown distribution p(x, t):
(x_n, t_n) ~_{i.i.d.} p(x, t),  n = 1, …, N.
In the running example, t_n = f(x_n) + noise, with
p(t | x) = N(t | sin(2πx), 0.1).
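A minimal sketch of a sampler for this assumed example distribution (x uniform on [0, 1], t | x ~ N(sin(2πx), 0.1); the uniform p(x) and the variance value 0.1 are assumptions of the sketch):

```python
import numpy as np

def sample_dataset(N, noise_var=0.1, rng=None):
    """Draw N i.i.d. pairs (x_n, t_n) from the assumed example distribution p(x, t)."""
    rng = np.random.default_rng(rng)
    x = rng.uniform(0.0, 1.0, size=N)                            # x ~ p(x) = U(0, 1)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(noise_var), size=N)
    return x, t

x_train, t_train = sample_dataset(N=10, rng=0)
print(x_train.shape, t_train.shape)  # (10,) (10,)
```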
[Figure: training data points (x_n, t_n) drawn i.i.d. from p(x, t).]
Under the ℓ_2 loss, the optimal predictor is t̂*(x) = E[T | x] = sin(2πx).
Minimum generalization loss:
L_p(t̂) = E_{(x,t)~p(x,t)}[ℓ(t, t̂(x))] = ∫ ℓ(t, t̂(x)) p(x, t) dx dt
For the quadratic loss:
E[(T − t̂(x))² | x] = E[T² | x] − 2 t̂(x) E[T | x] + t̂(x)² = Var[T | x] + (E[T | x] − t̂(x))²
so the minimum is attained at t̂*(x) = E[T | x], with L_p(t̂*) = E_x[Var[T | x]] = 0.1.
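A Monte Carlo check, under the same assumed example distribution, that the optimal predictor t̂*(x) = sin(2πx) attains a generalization loss equal to the conditional variance:

```python
import numpy as np

rng = np.random.default_rng(0)
noise_var = 0.1                       # assumed noise variance of the example
N_test = 1_000_000

x = rng.uniform(0.0, 1.0, size=N_test)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(noise_var), size=N_test)

t_hat_star = np.sin(2 * np.pi * x)    # optimal predictor under the ell_2 loss
L_p = np.mean((t - t_hat_star) ** 2)  # empirical generalization loss
print(L_p)                            # ≈ 0.1 = Var[T|x]
```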
Discriminative model: learn the conditional distribution p(t | x, θ) directly, with θ = (w, β):
p(t | x, w, β) = N(t | μ(x, w), β⁻¹),   μ(x, w) = Σ_{j=0}^{M} w_j x^j = wᵀ φ(x).
Remark: a generative model would also model the distribution p(x).
Likelihood of the training labels:
p(t_D | x_D, w, β) = Π_{n=1}^{N} N(t_n | μ(x_n, w), β⁻¹).
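A sketch of the polynomial feature map φ(x) and mean function μ(x, w) of this model (the helper names are illustrative):

```python
import numpy as np

def poly_features(x, M):
    """Feature vectors phi(x) = [1, x, x^2, ..., x^M] for each entry of x."""
    x = np.atleast_1d(x)
    return np.vander(x, M + 1, increasing=True)   # shape (len(x), M + 1)

def mu(x, w):
    """Mean function mu(x, w) = w^T phi(x) of the Gaussian model N(t | mu, 1/beta)."""
    M = len(w) - 1
    return poly_features(x, M) @ w

w_example = np.array([0.0, 1.0, -2.0])            # arbitrary weights, M = 2
print(mu(np.array([0.0, 0.5, 1.0]), w_example))   # [ 0.  0. -1.]
```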
Take the log on both sides:
log-likelihood (LL) function:
ln p(t_D | x_D, w, β) = Σ_{n=1}^{N} ln p(t_n | x_n, w, β) = −(β/2) Σ_{n=1}^{N} (t_n − μ(x_n, w))² + (N/2) ln(β/2π)
Maximum likelihood (ML) learning:
w_ML = argmax_w ln p(t_D | x_D, w, β) = argmin_w L_D(w),
where
L_D(w) = (1/N) Σ_{n=1}^{N} (t_n − μ(x_n, w))²   is the training loss.
The minimization of L_D(w) can be solved in closed form:
w_ML = (Φᵀ Φ)⁻¹ Φᵀ t_D,   Φ = [φ(x_1), …, φ(x_N)]ᵀ,  t_D = [t_1, …, t_N]ᵀ,
in the overdetermined case N ≥ M + 1 (assuming Φ has full column rank).
w_ML is also known under the name of least squares (LS) solution.
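A minimal sketch of the closed-form ML/LS solution, assuming the example data distribution and polynomial model above (N, M and the noise variance are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, noise_var = 10, 3, 0.1

# Training data from the assumed example distribution.
x = rng.uniform(0.0, 1.0, size=N)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(noise_var), size=N)

Phi = np.vander(x, M + 1, increasing=True)    # N x (M+1) design matrix

# w_ML = (Phi^T Phi)^{-1} Phi^T t_D, computed stably via least squares.
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)

L_D = np.mean((t - Phi @ w_ml) ** 2)          # training loss L_D(w_ML)
print(w_ml)
print(L_D)
```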
[Figure: fitted predictor for M = 1; the predictor underfits the data, resulting in a large training loss.]
What happens with a large training set?
[Figure: training loss L_D(w_ML) and test (generalization) loss L_p(w_ML) versus the number of training points N.]
Remark: If N is large enough compared to the number of parameters in θ, then L_D(w) ≈ L_p(w) (in fact, L_D(w) → L_p(w) as N → ∞ by the law of large numbers), so the weight vector w_ML that minimizes L_D also approximately minimizes L_p:
L_D(w_ML) ≈ L_p(w_ML) ≈ min_w L_p(w)   for large N.
(We will make this precise later.)
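An empirical illustration of this remark under the same assumed setup: as N grows, the training loss of w_ML approaches its generalization loss (estimated here on a large held-out sample).

```python
import numpy as np

rng = np.random.default_rng(1)
M, noise_var = 3, 0.1

def sample(n):
    x = rng.uniform(0.0, 1.0, size=n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(noise_var), size=n)
    return x, t

x_test, t_test = sample(100_000)                   # proxy for the true p(x, t)
Phi_test = np.vander(x_test, M + 1, increasing=True)

for N in [10, 100, 10_000]:
    x, t = sample(N)
    Phi = np.vander(x, M + 1, increasing=True)
    w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    L_D = np.mean((t - Phi @ w_ml) ** 2)                     # training loss
    L_p = np.mean((t_test - Phi_test @ w_ml) ** 2)           # estimated generalization loss
    print(N, round(L_D, 3), round(L_p, 3))                   # gap shrinks as N grows
```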
Error analysis: two error types, bias and estimation error:
L_p(w_ML) − L_p(t̂*) = [L_p(w*) − L_p(t̂*)] + [L_p(w_ML) − L_p(w*)]
                          (bias)                (estimation error)
where t̂* is the optimal predictor, w* = argmin_w L_p(w) is the best predictor within the model (the ML estimate as N → ∞), and w_ML is the ML estimate for finite N.
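A rough numerical sketch of this decomposition under the same assumptions: w* is approximated by an LS fit on a very large dataset, and L_p(t̂*) is taken equal to the assumed noise variance 0.1.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, noise_var = 1, 10, 0.1

def sample(n):
    x = rng.uniform(0.0, 1.0, size=n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(noise_var), size=n)
    return x, t

def fit(x, t):
    Phi = np.vander(x, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def gen_loss(w, n=200_000):
    x, t = sample(n)
    Phi = np.vander(x, M + 1, increasing=True)
    return np.mean((t - Phi @ w) ** 2)

w_star = fit(*sample(500_000))     # ≈ argmin_w L_p(w), approximated on a huge dataset
w_ml   = fit(*sample(N))           # ML estimate from a small training set

L_p_opt = noise_var                # L_p(t_hat*) = Var[T|x] for the assumed model
bias = gen_loss(w_star) - L_p_opt
estimation_error = gen_loss(w_ml) - gen_loss(w_star)
print(round(bias, 3), round(estimation_error, 3))
```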
[Figure: square-root loss √L_p(w_ML) versus N, approaching the optimum √L_p(t̂*) as N grows.]
ML: maximize the LL, i.e., p(t_D | x_D, w, β).
MAP: maximize
p(w) Π_{n=1}^{N} p(t_n | x_n, w, β),
or, taking the log,
Σ_{n=1}^{N} ln p(t_n | x_n, w, β) + ln p(w).
With a Gaussian prior p(w) = N(w | 0, α⁻¹ I) this is equivalent to regularized (ridge) least squares:
w_MAP = argmin_w [ L_D(w) + (λ/N) ‖w‖² ],   λ = α/β.
As N increases, the relative contribution of the prior term ln p(w) becomes negligible, and w_MAP → w_ML.
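A minimal sketch of the MAP solution for an assumed Gaussian prior p(w) = N(0, α⁻¹I), which reduces to ridge regression with λ = α/β (the values of α and β are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, noise_var = 10, 9, 0.1
beta = 1.0 / noise_var        # assumed noise precision
alpha = 1e-3                  # assumed prior precision on the weights
lam = alpha / beta            # equivalent ridge regularization constant

x = rng.uniform(0.0, 1.0, size=N)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(noise_var), size=N)
Phi = np.vander(x, M + 1, increasing=True)

# w_MAP = (lambda I + Phi^T Phi)^{-1} Phi^T t  (regularized least squares)
w_map = np.linalg.solve(lam * np.eye(M + 1) + Phi.T @ Phi, Phi.T @ t)
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)

print(np.round(w_map, 2))     # moderate weights
print(np.round(w_ml, 2))      # typically much larger weights (overfitting with M = 9)
```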
[Figure: training loss L_D and generalization loss L_p.]
3.4 Bayesian approach
(i) Data points are jointly distributed according to a known distribution.
(ii) Model parameters w are jointly distributed with the data:
p(w, t_D, t | x_D, x) = p(w) p(t_D | x_D, w) p(t | x, w)
likelihood term: p(t_D | x_D, w) = Π_{n=1}^{N} N(t_n | μ(x_n, w), β⁻¹)
probability of the new label: p(t | x, w) = N(t | μ(x, w), β⁻¹)
prior on the weight vector: p(w), e.g. p(w) = N(w | 0, α⁻¹ I).
All quantities involved are treated as random variables (r.v.s).
The Bayesian approach is an inference-based approach: the learning step is implicit, since prediction is obtained by averaging the model over the posterior:
p(t | x, D) = ∫ p(w | D) p(t | x, w) dw
where p(w | D) is the a posteriori distribution and p(t | x, D) is the predictive distribution.
For our model, using Bayes' theorem:
p(w | D) = p(w | t_D, x_D) = p(w) p(t_D | x_D, w) / p(t_D | x_D)    (a posteriori / posterior belief)
For the Gaussian model, both the posterior and the predictive distribution are Gaussian; in particular
p(t | x, D) = N(t | μ(x, w_MAP), s²(x)),
where the variance s²(x) accounts for both the observation noise and the residual uncertainty about w.
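A sketch of the closed-form Gaussian posterior and predictive distribution for this model, assuming the prior p(w) = N(0, α⁻¹I) and the standard Bayesian linear-regression formulas; the values of α and β are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 10, 3
alpha, beta = 1e-2, 10.0              # assumed prior and noise precisions

x = rng.uniform(0.0, 1.0, size=N)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(1.0 / beta), size=N)
Phi = np.vander(x, M + 1, increasing=True)

# Posterior p(w | D) = N(w | m_N, S_N) for the Gaussian prior/likelihood pair.
S_N_inv = alpha * np.eye(M + 1) + beta * Phi.T @ Phi
S_N = np.linalg.inv(S_N_inv)
m_N = beta * S_N @ Phi.T @ t          # posterior mean (= w_MAP for this model)

# Predictive p(t | x, D) = N(t | phi(x)^T m_N, 1/beta + phi(x)^T S_N phi(x)).
x_new = 0.25
phi_new = np.vander(np.array([x_new]), M + 1, increasing=True)[0]
pred_mean = phi_new @ m_N
pred_var = 1.0 / beta + phi_new @ S_N @ phi_new
print(pred_mean, pred_var)
```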
[Figure: Bayesian predictive distribution over x ∈ [0, 1] together with the training points.]
Marginal likelihood:
p(t_D | x_D) = ∫ p(w) Π_{n=1}^{N} p(t_n | x_n, w) dw
[Figure: marginal likelihood p(t_D | x_D) versus the model order M.]
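A sketch of the marginal likelihood computation under the same assumptions: integrating out w ~ N(0, α⁻¹I) makes t_D jointly Gaussian with zero mean and covariance β⁻¹I + α⁻¹ΦΦᵀ, so the (log) evidence can be evaluated directly and compared across model orders M.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 20
alpha, beta = 1e-2, 10.0              # assumed prior and noise precisions

x = rng.uniform(0.0, 1.0, size=N)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(1.0 / beta), size=N)

for M in range(10):
    Phi = np.vander(x, M + 1, increasing=True)
    # t_D | x_D ~ N(0, beta^{-1} I + alpha^{-1} Phi Phi^T) once w is integrated out.
    cov = np.eye(N) / beta + Phi @ Phi.T / alpha
    sign, logdet = np.linalg.slogdet(cov)
    log_evidence = -0.5 * (t @ np.linalg.solve(cov, t) + logdet + N * np.log(2 * np.pi))
    print(M, round(log_evidence, 2))  # compare evidence across model orders M
```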
3.5 Minimum description length (MDL)
See Chapter 2.