lecture03c_maximum_likelihood_annotated

The document discusses the concept of Maximum Likelihood Estimation (MLE) in the context of a machine learning course, specifically focusing on its relationship with least-squares problems. It explains how MLE can be derived from a probabilistic model where data is generated with Gaussian noise, and demonstrates that maximizing the log-likelihood is equivalent to minimizing the mean squared error. Additionally, it outlines the properties of MLE, including consistency, asymptotic normality, and efficiency.


Machine Learning Course - CS-433

Maximum Likelihood

Sept 25, 2024

Martin Jaggi
Last updated on: September 24, 2024
credits to Mohammad Emtiyaz Khan & Rüdiger Urbanke
Motivation
In the previous lecture 3a we arrived at the least-squares problem in the following way: we postulated a particular cost function (the square loss) and then, given data, found the model that minimizes this cost function. In the current lecture we will take an alternative route. The final answer will be the same, but our starting point will be probabilistic. In this way we find a second interpretation of the least-squares problem.

[Figure: left, the data y plotted against the input x together with a linear fit and the prediction errors e_n = y_n − x_n^⊤w; right, a histogram of these errors in prediction.]
Gaussian distribution and independence
Recall the definition of a Gaussian random variable y ∈ R with mean µ and variance σ². It has the density

p(y \mid \mu, \sigma^2) = \mathcal{N}(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right).

In a similar manner, the density of a Gaussian random vector y ∈ R^D with mean µ ∈ R^D and covariance Σ ∈ R^{D×D} (which must be a positive semi-definite matrix) is

\mathcal{N}(y \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^D \det(\Sigma)}} \exp\left( -\frac{1}{2} (y-\mu)^\top \Sigma^{-1} (y-\mu) \right).

Also recall that two random variables X and Y are called independent when p(x, y) = p(x) p(y).
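As a quick sanity check of these two formulas, the following minimal Python sketch evaluates both densities directly with NumPy and compares the results against scipy.stats; the particular values of µ, σ² and Σ below are arbitrary example choices.

import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_pdf(y, mu, sigma2):
    # univariate N(y | mu, sigma^2)
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def gaussian_vector_pdf(y, mu, Sigma):
    # multivariate N(y | mu, Sigma) for y, mu in R^D
    D = len(mu)
    diff = y - mu
    quad = diff @ np.linalg.solve(Sigma, diff)          # (y - mu)^T Sigma^{-1} (y - mu)
    norm_const = np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

# arbitrary example values, chosen only for illustration
mu, sigma2, y = 1.0, 0.5, 1.3
print(gaussian_pdf(y, mu, sigma2), norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))

mu_vec = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
y_vec = np.array([0.5, 0.8])
print(gaussian_vector_pdf(y_vec, mu_vec, Sigma),
      multivariate_normal.pdf(y_vec, mean=mu_vec, cov=Sigma))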
A probabilistic model for least-squares
We assume that our data is generated by the model

y_n = x_n^\top w + \varepsilon_n,

where the noise ε_n is a zero-mean Gaussian random variable with variance σ², i.e. p(\varepsilon_n) = \mathcal{N}(\varepsilon_n \mid 0, \sigma^2), the noise added to the various samples is independent across samples and independent of the input. Note that the model w is unknown. Under this assumption,

p(y_n \mid x_n, w) = \mathcal{N}(y_n \mid x_n^\top w, \sigma^2).

Therefore, given N samples, the likelihood of the data vector y = (y_1, \dots, y_N) given the input X (each row is one input) and the model w is, by the independence assumption on the noise, equal to

p(y \mid X, w) = \prod_{n=1}^{N} p(y_n \mid x_n, w) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid x_n^\top w, \sigma^2).

The probabilistic viewpoint is that we should maximize this likelihood over the choice of model w. I.e., the "best" model is the one that maximizes this likelihood.
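To make this generative model concrete, here is a minimal Python sketch that samples data from y_n = x_n^⊤w + ε_n and evaluates the likelihood p(y | X, w) for two candidate models; the problem size, the true w and the noise level σ are arbitrary choices made only for illustration.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# arbitrary problem size and "true" parameters, chosen only for illustration
N, D = 100, 3
sigma = 2.0
w_true = np.array([1.0, -2.0, 0.5])

X = rng.normal(size=(N, D))              # each row is one input x_n
eps = rng.normal(scale=sigma, size=N)    # i.i.d. zero-mean Gaussian noise
y = X @ w_true + eps                     # y_n = x_n^T w + eps_n

def likelihood(w, X, y, sigma):
    # p(y | X, w) = prod_n N(y_n | x_n^T w, sigma^2)
    return np.prod(norm.pdf(y, loc=X @ w, scale=sigma))

print(likelihood(w_true, X, y, sigma))       # likelihood at the true model
print(likelihood(np.zeros(D), X, y, sigma))  # much smaller for a poor model

Note how the product of N densities becomes extremely small; for larger N it underflows, which is one practical reason to work with the logarithm of the likelihood, as done in the next section.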
Defining cost with log-likelihood


Instead of maximizing the likelihood, we can take the logarithm of the likelihood and maximize it instead. The resulting expression is called the log-likelihood (LL):

\mathcal{L}_{LL}(w) := \log p(y \mid X, w) = \sum_{n=1}^{N} \log \mathcal{N}(y_n \mid x_n^\top w, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - x_n^\top w)^2 + \text{cnst},

where the constant is independent of w.

Compare the LL (to be maximized) to the MSE (mean squared error, to be minimized):

\mathcal{L}_{LL}(w) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - x_n^\top w)^2 + \text{cnst}

\mathcal{L}_{MSE}(w) = \frac{1}{2N} \sum_{n=1}^{N} (y_n - x_n^\top w)^2

From this comparison it is already apparent that the maximizer of the LL coincides with the minimizer of the MSE.
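The relation between the two costs can also be checked numerically. In the following minimal sketch (with arbitrary synthetic data, as before), L_LL(w) + (N/σ²)·L_MSE(w) evaluates to the same constant for every w, confirming that the two costs differ only by a sign, a scaling and a constant.

import numpy as np

rng = np.random.default_rng(0)
N, D, sigma = 100, 3, 2.0
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(N, D))
y = X @ w_true + rng.normal(scale=sigma, size=N)

def log_likelihood(w):
    resid = y - X @ w
    # sum_n log N(y_n | x_n^T w, sigma^2)
    return -0.5 * np.sum(resid**2) / sigma**2 - 0.5 * N * np.log(2 * np.pi * sigma**2)

def mse(w):
    resid = y - X @ w
    return 0.5 * np.mean(resid**2)

# L_LL(w) + (N / sigma^2) * L_MSE(w) is the same constant for every w
for w in [w_true, np.zeros(D), rng.normal(size=D)]:
    print(log_likelihood(w) + (N / sigma**2) * mse(w))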
Maximum-likelihood estimator (MLE)
It is clear that maximizing the LL is equivalent to minimizing the MSE:

\arg\min_w \mathcal{L}_{MSE}(w) = \arg\max_w \mathcal{L}_{LL}(w).

This gives us another way to design cost functions.

MLE can also be interpreted as finding the model under which the observed data is most likely to have been generated (probabilistically). This interpretation has some advantages that we discuss now.
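As a minimal sketch of this equivalence in code (again with arbitrary synthetic data), numerically maximizing the log-likelihood recovers the same w as the closed-form least-squares solution.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, D, sigma = 100, 3, 2.0
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(N, D))
y = X @ w_true + rng.normal(scale=sigma, size=N)

def neg_log_likelihood(w):
    resid = y - X @ w
    return 0.5 * np.sum(resid**2) / sigma**2 + 0.5 * N * np.log(2 * np.pi * sigma**2)

# MLE: maximize the LL, i.e. minimize the negative LL
w_mle = minimize(neg_log_likelihood, x0=np.zeros(D)).x

# least squares: minimize the MSE in closed form
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_mle)   # the two estimates agree up to numerical precision
print(w_ls)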
Properties of MLE
MLE is a sample approximation to the expected log-likelihood:

\frac{1}{N}\mathcal{L}_{LL}(w) = \frac{1}{N}\sum_{n=1}^{N} \log p(y_n \mid x_n, w) \;\approx\; \mathbb{E}_{p(y,x)}\left[ \log p(y \mid x, w) \right].

MLE is consistent, i.e., it will give us the correct model assuming that we have a sufficient amount of data (this can be proven under some weak conditions):

w_{\text{MLE}} \;\rightarrow_p\; w_{\text{true}} \quad \text{(convergence in probability)}.

The MLE is asymptotically normal (optional), i.e.,

(w_{\text{MLE}} - w_{\text{true}}) \;\rightarrow_d\; \mathcal{N}\!\left( w_{\text{MLE}} \,\middle|\, 0, \tfrac{1}{N} F^{-1}(w_{\text{true}}) \right),

where F(w) = \mathbb{E}_{p(y)}\!\left[ \frac{\partial^2 \mathcal{L}}{\partial w \, \partial w^\top} \right] is the Fisher information.

MLE is efficient, i.e. it achieves the Cramér-Rao lower bound:

\text{Covariance}(w_{\text{MLE}}) = F^{-1}(w_{\text{true}}).
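The consistency property can be made visible with a small simulation; the generating model and sample sizes below are arbitrary choices. Since the MLE under Gaussian noise is the least-squares solution, its distance to w_true should shrink as N grows.

import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
sigma = 2.0

for N in [10, 100, 1_000, 10_000, 100_000]:
    X = rng.normal(size=(N, len(w_true)))
    y = X @ w_true + rng.normal(scale=sigma, size=N)
    w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)   # = MLE under Gaussian noise
    print(N, np.linalg.norm(w_mle - w_true))        # error shrinks roughly like 1/sqrt(N)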
Another example
We can replace the Gaussian distribution by a Laplace distribution:

p(y_n \mid x_n, w) = \frac{1}{2b} \exp\left( -\frac{1}{b} \left| y_n - x_n^\top w \right| \right).

The corresponding log-likelihood is, up to a constant and a scaling factor, the negative of the MAE (mean absolute error) \frac{1}{N}\sum_{n=1}^{N} |y_n - x_n^\top w|, so maximizing it amounts to minimizing the MAE.
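A minimal sketch (with arbitrary synthetic data) of why Laplace noise leads to the MAE: the negative log-likelihood differs from (N/b) times the MAE only by a constant, so both have the same minimizer.

import numpy as np

rng = np.random.default_rng(0)
N, D, b = 100, 3, 1.5
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(N, D))
y = X @ w_true + rng.laplace(scale=b, size=N)   # Laplace noise instead of Gaussian

def neg_log_likelihood_laplace(w):
    # -sum_n log( 1/(2b) * exp(-|y_n - x_n^T w| / b) )
    return np.sum(np.abs(y - X @ w)) / b + N * np.log(2 * b)

def mae(w):
    return np.mean(np.abs(y - X @ w))

# the difference is the same constant for every w, so argmin NLL = argmin MAE
for w in [w_true, np.zeros(D), rng.normal(size=D)]:
    print(neg_log_likelihood_laplace(w) - (N / b) * mae(w))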

You might also like