
INTRODUCTION TO MACHINE LEARNING VIA LINEAR REGRESSION

Goal: study the key concepts of supervised learning based on a linear regression example.

3.1 Supervised learning

Problem formulation: given a training set D containing N training points (x_n, t_n), n = 1, ..., N, where
- x_n are the independent variables (covariates, domain points, explanatory variables)
- t_n are the dependent variables (labels, responses)

[Figure: example training set of N = 10 points (x_n, t_n).]

Goal: predict t for a new, unobserved domain point x.
- We need to make assumptions about the mechanism generating the data (inductive bias).
- This is the difference between memorizing and learning.
3.2 Statistical inference

Task: predict t given an observation x and a known joint distribution p(x, t).

Introduce a non-negative loss function ℓ(t, t̂): the cost (risk) incurred if the correct value is t and the estimate is t̂.

Examples:
- ℓ_q loss: ℓ_q(t, t̂) = |t − t̂|^q
- quadratic loss: ℓ_2(t, t̂) = (t − t̂)^2
- 0-1 loss: ℓ_0(t, t̂) = 1(t ≠ t̂) ∈ {0, 1}

Generalization loss (risk): average loss of a predictor t̂(x) over p(x, t):

  L_p(t̂) = E_{(x,t) ~ p(x,t)}[ ℓ(t, t̂(x)) ]

The optimal prediction t̂*(x) is obtained by minimizing the conditional risk:

  t̂*(x) = argmin_t̂ E_{t ~ p(t|x)}[ ℓ(t, t̂) | x ]        (1)

Only the posterior distribution p(t | x) needs to be known; the joint distribution p(x, t) is not required.
Under the ℓ_2 loss, the optimal predictor is the posterior mean, t̂*(x) = E[t | x], since

  E_{t|x}[ (t − t̂)^2 ] = E[t^2 | x] − 2 t̂ E[t | x] + t̂^2

  d/dt̂ E_{t|x}[ (t − t̂)^2 ] = −2 E[t | x] + 2 t̂ = 0  ⟹  t̂* = E[t | x]

Ex.: p(t | x) = 0.8 δ(t − 1) + 0.2 δ(t + 1)  ⟹  t̂*(x) = 0.8 · 1 + 0.2 · (−1) = 0.6
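As a quick numerical check of this example, here is a minimal NumPy sketch (the grid search over candidate estimates is my own addition): the expected quadratic loss is indeed minimized at the posterior mean 0.6.

```python
import numpy as np

# Two-point posterior from the example: p(t = 1 | x) = 0.8, p(t = -1 | x) = 0.2
values, probs = np.array([1.0, -1.0]), np.array([0.8, 0.2])

candidates = np.linspace(-1, 1, 2001)                            # candidate estimates t_hat
risk = [(probs * (values - c) ** 2).sum() for c in candidates]   # E[(t - t_hat)^2 | x]
print(candidates[np.argmin(risk)])                               # ~0.6 = E[t | x]
```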

The performance of a predictor t̂(x) is measured by the difference between L_p(t̂) and the minimum generalization loss L_p(t̂*) achieved by the optimal predictor.

In the following, three learning approaches for obtaining t̂(x) are discussed: the frequentist approach, the Bayesian approach, and the MDL approach.

3.3 Frequentist approach

Assumption: the training data points (x_n, t_n) ∈ D are drawn i.i.d. from a true but unknown distribution p(x, t):

  (x_n, t_n) ~ i.i.d. p(x, t),  n = 1, ..., N

Since the distribution is unknown, the strategy (1) from above cannot be applied directly. Two ways to approach the problem of the unknown distribution:

1. Separate learning and inference: learn an approximation p̂(t | x) of the distribution based on the data and use

   t̂(x) = argmin_t̂ E_{t ~ p̂(t|x)}[ ℓ(t, t̂) | x ]

2. Direct inference via empirical risk minimization (ERM): learn an approximation t̂_D(·) of the optimal decision as

   t̂_D(·) = argmin_{t̂(·)} L_D(t̂(·))

   with the empirical risk (loss)

   L_D(t̂) = (1/N) Σ_{n=1}^{N} ℓ(t_n, t̂(x_n))

Remark: In contrast to the generalization loss, where the expectation is taken over the true distribution, here we take the average over the available data.

We first look at a linear regression example. Assume

  p(x, t) = p(x) p(t | x),  x_n ~ Unif(0, 1),  and  p(t | x) = N(t | sin(2πx), 0.1).
[Figure: training set of N = 10 points drawn from this distribution.]

If the distribution is known, the optimal predictor under the ℓ_2 loss is t̂*(x) = E[t | x] = sin(2πx).

Minimum generalization loss:

  L_p(t̂*) = ∫∫ (t − sin(2πx))^2 p(x, t) dx dt
           = E[t^2 | x] − 2 sin(2πx) E[t | x] + sin^2(2πx) = Var(t | x) = 0.1
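As a sanity check of this example, here is a minimal NumPy sketch (variable and function names are my own) that samples from the assumed distribution and estimates the generalization loss of the optimal predictor sin(2πx) by Monte Carlo; the result is close to the noise variance 0.1.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_data(n):
    """Draw n points from the assumed model: x ~ Unif(0,1), t | x ~ N(sin(2*pi*x), 0.1)."""
    x = rng.uniform(0.0, 1.0, size=n)
    t = rng.normal(np.sin(2 * np.pi * x), np.sqrt(0.1))  # 0.1 is the noise variance
    return x, t

# Monte Carlo estimate of L_p(t*) for the optimal predictor t*(x) = sin(2*pi*x):
x, t = sample_data(100_000)
print(np.mean((t - np.sin(2 * np.pi * x)) ** 2))  # ~0.1 = Var(t | x)
```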

3.3.1 Discriminative vs. generative models

Formulate a hypothesis class, i.e., a family of parametric probabilistic models, and learn the parameters of the model which best fits the data.

Model the label t as a polynomial of the domain point x plus Gaussian noise, with mean

  μ(x, w) = Σ_{j=0}^{M} w_j x^j = w^T φ(x)

with weight vector w = [w_0, ..., w_M]^T and feature vector φ(x) = [1, x, x^2, ..., x^M]^T; M is the model order.

Now define the parametric probabilistic model

  p(t | x, θ) = N(t | μ(x, w), β^{-1}),  model parameters θ = (w, β),

where β is the precision (inverse variance).

Discriminative model:
- learn the conditional distribution p(t | x, θ) by learning the parameter vector θ directly from the data; the estimator in (1) can then be calculated directly;
- the model discriminates between values of t based on the inputs x;
- main focus in this section.

Generative probabilistic model:
- learn the joint distribution p(x, t), parameterized by θ, i.e., p(x, t | θ).
- Remark: this also models the distribution p(x) of the covariates, so the model can produce realizations of x.
- Use Bayes' theorem (computing the marginal distribution of x) to obtain p(t | x, θ) and then the estimate t̂(x) as in (1).
3.3.2 ML learning

Assume the model order M is fixed; we want to learn the model parameters θ by maximum likelihood (ML) from the data points.

Discriminative model:

  p(t_D | x_D, w, β) = Π_{n=1}^{N} p(t_n | x_n, w, β) = Π_{n=1}^{N} N(t_n | μ(x_n, w), β^{-1})

Taking the log on both sides gives the log-likelihood (LL) function

  ln p(t_D | x_D, w, β) = Σ_{n=1}^{N} ln p(t_n | x_n, w, β) = Σ_{n=1}^{N} ln N(t_n | μ(x_n, w), β^{-1})

The ML learning problem is defined as the minimization of the negative LL (NLL) over the model parameters,

  min_{w, β}  −(1/N) Σ_{n=1}^{N} ln p(t_n | x_n, w, β),

i.e., the log-loss (cross-entropy) criterion.

Why? By the strong law of large numbers,

  −(1/N) Σ_{n=1}^{N} ln p(t_n | x_n, w, β)  →  E_{(x,t) ~ p(x,t)}[ −ln p(t | x, w, β) ]   as N → ∞,

which is the expected cross-entropy: the ML problem attempts to make the model-based p(t | x, w, β) close to the actual p(t | x).

Since the optimal predictor under the ℓ_2 loss requires only the posterior mean μ(x, w), the β term can be ignored when learning w:

  min_w  L_D(w) = (1/N) Σ_{n=1}^{N} (t_n − μ(x_n, w))^2        (2)

where L_D is the training loss.
(2) can be solved in closed form. Write

  L_D(w) = (1/N) ‖t_D − Φ_D w‖^2        (3)

with t_D = [t_1, ..., t_N]^T and the N × (M + 1) feature matrix

  Φ_D = [φ(x_1), ..., φ(x_N)]^T.

Minimization of (3) is a least-squares (LS) problem with the solution

  w_ML = (Φ_D^T Φ_D)^{-1} Φ_D^T t_D

(overdetermined case N ≥ M + 1). The matrix (Φ_D^T Φ_D)^{-1} Φ_D^T is also known as the pseudo-inverse of Φ_D.

Differentiating the NLL with respect to β yields 1/β_ML = L_D(w_ML).
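A minimal NumPy sketch of this ML/LS fit (the helper names poly_features and fit_ml are my own; np.linalg.pinv computes the pseudo-inverse):

```python
import numpy as np

def poly_features(x, M):
    """Feature matrix Phi_D with rows phi(x_n) = [1, x_n, ..., x_n^M]."""
    return np.vander(np.asarray(x), M + 1, increasing=True)

def fit_ml(x, t, M):
    """ML estimate: w_ML via the pseudo-inverse, beta_ML as 1 / training loss."""
    Phi = poly_features(x, M)
    w_ml = np.linalg.pinv(Phi) @ t               # = (Phi^T Phi)^{-1} Phi^T t for N >= M+1
    train_loss = np.mean((t - Phi @ w_ml) ** 2)  # L_D(w_ML)
    return w_ml, 1.0 / train_loss

# Example usage on a toy training set of N = 10 points:
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 10)
t = rng.normal(np.sin(2 * np.pi * x), np.sqrt(0.1))
w_ml, beta_ml = fit_ml(x, t, M=3)
```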
Overfitting and underfitting

Assumption: ℓ_2 loss.

Going back to the example with p(t | x) = N(t | sin(2πx), 0.1) and the optimal but unknown predictor t̂*(x) = sin(2πx): how does it compare with the ML predictor t̂(x) = μ(x, w_ML)?
[Figure: ML fits μ(x, w_ML) for N = 10 training points and model orders M = 1, M = 3, M = 9.]
- M = 1: the predictor underfits the data: large training loss.
- M = 9: the predictor overfits the data: small training loss but large generalization loss
    L_p(w_ML) = E_{(x,t) ~ p(x,t)}[ (t − μ(x, w_ML))^2 ];
  the model merely memorizes the training set.
- M = 3: good choice.
[Figure: training loss L_D and generalization loss L_p vs. model order M, showing the underfitting (small M) and overfitting (large M) regimes.]

What happens with a large training set?

[Figure: square-root training loss and test (generalization) loss vs. the number of training points N.]
Remark: If N is large enough compared to the number of parameters in w, the weight vector w_ML that minimizes L_D also approximately minimizes L_p:

  L_D(w_ML) ≈ L_p(w_ML) ≈ L_p(w*)   for large N

(we make this precise later). L_D(w_ML) increases with N, as it becomes more difficult to fit all the data.

Error analysis

Two error types: bias and estimation error. Write the generalization loss as

  L_p(w_ML) = L_p(t̂*) + ( L_p(w*) − L_p(t̂*) ) + ( L_p(w_ML) − L_p(w*) )

where L_p(w*) = min_w L_p(w) is the best generalization loss achievable within the given hypothesis class (model):
- L_p(w*) − L_p(t̂*) is the bias, due to the choice of the hypothesis class;
- L_p(w_ML) − L_p(w*) is the estimation error, due to w_ML being estimated from a finite data set.
[Figure: square-root generalization loss vs. N: the loss of the ML estimate for finite N decreases towards L_p(w*), the best within the hypothesis class, which remains above L_p(t̂*).]

Validation and testing

Problem: how to select the model order M? The distribution p(x, t) is unknown, so L_p(w) cannot be evaluated.

Solution: divide the available data into two parts, a training set and a hold-out (validation) set. The validation set is used to obtain the empirical average

  L_p(w) ≈ (1/N_v) Σ_{n=1}^{N_v} ℓ(t_n, μ(x_n, w))

over the validation set of size N_v, where w has been obtained from the training set. This estimate of L_p(w) is used for model order selection.

A test set is additionally needed to compute an estimate of L_p for the final choice of M and θ.
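A minimal sketch of hold-out model-order selection under the same assumed data model (helper names and the candidate range of M are my own):

```python
import numpy as np

rng = np.random.default_rng(2)

def poly_features(x, M):
    return np.vander(np.asarray(x), M + 1, increasing=True)

def sample_data(n):
    x = rng.uniform(0, 1, n)
    return x, rng.normal(np.sin(2 * np.pi * x), np.sqrt(0.1))

x_tr, t_tr = sample_data(10)    # training set
x_va, t_va = sample_data(100)   # hold-out / validation set

for M in range(10):
    w = np.linalg.pinv(poly_features(x_tr, M)) @ t_tr              # w_ML from the training set
    val_loss = np.mean((t_va - poly_features(x_va, M) @ w) ** 2)   # empirical estimate of L_p(w)
    print(M, val_loss)
# Select the M with the smallest validation loss; a separate test set would then
# be used to estimate L_p for that final choice.
```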
3.3.3 MAP learning

A priori information on the vector of weights can be used to reduce the effects of overfitting: ‖w_ML‖ explodes for increasing M (see Section 2.3.4).

Remedy: apply an a priori distribution which puts less weight on large values, i.e.,

  w ~ N(0, α^{-1} I),   i.i.d. entries with variance α^{-1}.

ML maximizes the likelihood p(t_D | x_D, w, β); MAP instead maximizes

  p(t_D | x_D, w, β) p(w) = [ Π_{n=1}^{N} p(t_n | x_n, w, β) ] p(w).

MAP learning criterion:

  min_{w, β}  −Σ_{n=1}^{N} ln p(t_n | x_n, w, β) − ln p(w)

With the training loss L_D(w) = (1/N) ‖t_D − Φ_D w‖^2 and λ = α/β we obtain

  min_w  L_D(w) + (λ/N) ‖w‖^2,

i.e., the ML criterion plus a regularization term R(w). The solution follows from standard LS analysis:

  w_MAP = (λ I + Φ_D^T Φ_D)^{-1} Φ_D^T t_D

As N → ∞, the contribution of the regularization term becomes negligible and w_MAP approaches the ML estimate.


[Figure: generalization loss L_p and training loss L_D vs. ln λ.]

Increasing λ has the same effect as reducing the model order M.
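A minimal NumPy sketch of this MAP (ridge-regularized LS) solution, reusing the polynomial feature matrix from before (helper names are my own):

```python
import numpy as np

def poly_features(x, M):
    return np.vander(np.asarray(x), M + 1, increasing=True)

def fit_map(x, t, M, lam):
    """MAP estimate w_MAP = (lam*I + Phi^T Phi)^{-1} Phi^T t."""
    Phi = poly_features(x, M)
    A = lam * np.eye(M + 1) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ t)

# lam -> 0 recovers the ML/LS solution; a larger lam shrinks the weights,
# acting like a smaller effective model order.
```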

Another example of an a priori distribution is the Laplace pdf, which leads to the regularizer

  R(w) = ‖w‖_1 = Σ_{j=0}^{M} |w_j|.

The resulting MAP problem,

  min_w  L_D(w) + (λ/N) ‖w‖_1,

is known as the LASSO (least absolute shrinkage and selection operator).
3.4 Bayesian approach

Frequentist approach: the existence of a true (but unknown) distribution p(x, t) is assumed, and the ML/MAP problem tries to find a parameter vector θ such that the model distribution p(t | x, w, β) is close to the true distribution.

Bayesian approach:
(i) the data points are jointly distributed according to a known distribution;
(ii) the model parameters w are jointly distributed with the data.

(We ignore β in the following and treat only w as a random parameter.)

The approach is characterized by the joint distribution p(t_D, w, t | x_D, x), where x is a new domain point and t the new label to be predicted.

The Bayesian model evaluates the a posteriori (posterior) distribution p(t | x, x_D, t_D) = p(t | x, D), given x and D, to predict the new label. The posterior distribution is obtained by manipulating the joint distribution.

Using the chain rule of conditional probability, p(a, b | c) = p(a | b, c) p(b | c), we obtain

  p(t_D, w, t | x_D, x) = p(w) · p(t_D | x_D, w) · p(t | x, w),

i.e., a priori distribution × likelihood × distribution of the new label, with

  likelihood term:       p(t_D | x_D, w) = Π_{n=1}^{N} N(t_n | μ(x_n, w), β^{-1})
  pdf of the new label:  p(t | x, w) = N(t | μ(x, w), β^{-1})

This factorization can be graphically represented by a Bayesian network. We first drop the dependency on the domain points:

  p(t_D, w, t) = p(w) p(t_D | w) p(t | w)

Draw a vertex for each involved r.v. and a directed edge from each conditioning r.v. to the main r.v. of each distribution.

[Figure: Bayesian network with node w pointing to the nodes t_1, ..., t_N and t.]
The Bayesian approach is inference-based: the learning standpoint is hidden, since all quantities in the model are r.v.s. The posterior distribution of the new label is obtained by manipulating the joint distribution:

  p(t | D, x) = p(t, t_D | x_D, x) / p(t_D | x_D)
              = ∫ p(w | D) p(t | x, w) dw        (4)

i.e., the posterior distribution of the weights times the predictive distribution of the new label, integrated over w. Using Bayes' theorem,

  p(w | D) = p(w) p(t_D | x_D, w) / p(t_D | x_D)

is the a posteriori (posterior) belief over the weights.

The computation is in general difficult, but in our example p(t_D | x_D, w) and p(t | x, w) are Gaussian, so (4) amounts to a convolution of Gaussian distributions. Assuming the ℓ_2 loss, we obtain

  p(t | x, D) = N(t | μ(x, w_MAP), s^2(x))   with
  s^2(x) = β^{-1} ( 1 + φ(x)^T (λ I + Φ_D^T Φ_D)^{-1} φ(x) )

(cf. Bishop, eqns. (3.58), (3.59), (2.115)). Hence the optimal predictor under the ℓ_2 loss is the MAP predictor μ(x, w_MAP); this is not true in general.
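A minimal sketch of this Gaussian predictive distribution for the polynomial model (helper names are my own):

```python
import numpy as np

def poly_features(x, M):
    return np.vander(np.asarray(x), M + 1, increasing=True)

def bayes_predict(x_new, x, t, M, alpha, beta):
    """Predictive mean mu(x, w_MAP) and variance s^2(x) of p(t | x, D); x_new is a 1-D array."""
    lam = alpha / beta
    Phi = poly_features(x, M)
    A = lam * np.eye(M + 1) + Phi.T @ Phi
    w_map = np.linalg.solve(A, Phi.T @ t)
    Phi_new = poly_features(x_new, M)
    mean = Phi_new @ w_map
    # s^2(x) = (1/beta) * (1 + phi(x)^T (lam*I + Phi^T Phi)^{-1} phi(x))
    var = (1.0 + np.einsum('ij,ji->i', Phi_new, np.linalg.solve(A, Phi_new.T))) / beta
    return mean, var
```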
3.4.1 Comparison with ML and MAP

The Bayesian posterior distribution p(t | x, D) allows for a more refined prediction of the labels t.

Compared to ML: the predictive distribution p(t | x, w_ML) = N(t | μ(x, w_ML), β^{-1}) (and similarly for MAP) has the same variance for all data points.

Bayesian: the accuracy of the prediction depends on the value of x and is set by the uneven distribution of the data points.

[Figure: Bayesian predictive mean with uncertainty bands over x, together with the training points; the uncertainty adapts to where data is available.]

For N → ∞: w_MAP → w_ML and s^2(x) → β^{-1}, so the Bayesian approach approaches ML.
3.4.2 Marginal likelihood and model selection

Advantage: Bayesian model selection is possible without validation, via the marginal likelihood

  p(t_D | x_D) = ∫ p(w) Π_{n=1}^{N} p(t_n | x_n, w) dw

A larger M can result in a smaller marginal likelihood, in contrast to the ML approach, where p(t_D | x_D, w_ML) can only increase with increasing M. This allows the model order to be selected.

[Figure: marginal likelihood p(t_D | x_D) vs. model order M, peaking at the selected model order.]
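For the linear-Gaussian model above, the marginal likelihood is available in closed form: with w ~ N(0, α^{-1} I), the labels are jointly Gaussian, t_D ~ N(0, β^{-1} I + α^{-1} Φ_D Φ_D^T). A minimal sketch (helper names and the hyperparameter values are my own):

```python
import numpy as np
from scipy.stats import multivariate_normal

def poly_features(x, M):
    return np.vander(np.asarray(x), M + 1, increasing=True)

def log_marginal_likelihood(x, t, M, alpha, beta):
    """log p(t_D | x_D) for t_D ~ N(0, beta^{-1} I + alpha^{-1} Phi Phi^T)."""
    Phi = poly_features(x, M)
    cov = np.eye(len(x)) / beta + Phi @ Phi.T / alpha
    return multivariate_normal(mean=np.zeros(len(x)), cov=cov).logpdf(t)

# Evaluate over candidate model orders and pick the maximizer, e.g.:
# best_M = max(range(10), key=lambda M: log_marginal_likelihood(x, t, M, alpha=1.0, beta=10.0))
```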

3.5 Minimum description length (MDL)

Compression: for an r.v. x ∈ X with distribution p(x), a lossless compression scheme can be designed which needs ⌈−log p(x)⌉ bits to represent x.

MDL selects the model which losslessly compresses D to the shortest possible description. In our setting, the model order M is selected which minimizes the description length

  −Σ_{n=1}^{N} log p(t_n | x_n, w_ML, β_ML) + C(M),

where C(M) is the (smallest) number of bits needed to quantize and describe the parameters w_ML, β_ML. This term acts as a regularizer.
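A minimal sketch of this two-part description length as a model-selection score, assuming (purely for illustration) a fixed per-parameter bit budget for C(M):

```python
import numpy as np

def poly_features(x, M):
    return np.vander(np.asarray(x), M + 1, increasing=True)

def description_length(x, t, M, bits_per_param=16):
    """-sum_n log2 p(t_n | x_n, w_ML, beta_ML) + C(M), with an assumed C(M)."""
    Phi = poly_features(x, M)
    w_ml = np.linalg.pinv(Phi) @ t
    var_ml = np.mean((t - Phi @ w_ml) ** 2)                       # 1 / beta_ML
    nll_nats = 0.5 * len(x) * (np.log(2 * np.pi * var_ml) + 1.0)  # Gaussian NLL at the ML fit
    return nll_nats / np.log(2) + bits_per_param * (M + 2)        # C(M): M+1 weights plus beta
```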

Not discussed in class; see Chapter 2 of the text, and read about the pitfalls associated with learned distributions (Chapter 2.7).
