
Machine Learning Course - CS-433

Maximum Likelihood

Sept 25, 2024

Martin Jaggi
Last updated on: September 24, 2024
credits to Mohammad Emtiyaz Khan & Rüdiger Urbanke
Motivation
In the previous lecture 3a we arrived at the least-squares
problem in the following way: we postulated a particular
cost function (the square loss) and then, given data, found the
model that minimizes this cost function. In the current lecture
we will take an alternative route. The final answer will
be the same, but our starting point will be probabilistic. In
this way we find a second interpretation of the least-squares
problem.
[Figure: two panels. Left: the data, y plotted against the input x. Right: a histogram of the error in prediction.]
Gaussian distribution and independence
Recall the definition of a Gaussian random variable in R with mean µ
and variance σ². It has a density of

\[
p(y \mid \mu, \sigma^2) = \mathcal{N}(y \mid \mu, \sigma^2)
= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right).
\]

In a similar manner, the density of a Gaussian random vector in R^D with mean
µ and covariance Σ (which must be a positive semi-definite matrix) is

\[
\mathcal{N}(\mathbf{y} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
= \frac{1}{\sqrt{(2\pi)^D \det(\boldsymbol{\Sigma})}}
\exp\left(-\frac{1}{2} (\mathbf{y}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{y}-\boldsymbol{\mu})\right).
\]
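As a quick sanity check (not spelled out in the notes): for D = 1, setting Σ = σ² recovers the scalar density above,

\[
\mathcal{N}(y \mid \mu, \sigma^2)
= \frac{1}{\sqrt{2\pi\,\sigma^2}}
\exp\left(-\frac{1}{2}\,\frac{(y-\mu)^2}{\sigma^2}\right).
\]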

Also recall that two random vari-


ables X and Y are called indepen-
dent when p(x, y) = p(x)p(y).
A probabilistic model for least-squares
We assume that our data is generated by the model

\[
y_n = \mathbf{x}_n^\top \mathbf{w} + \epsilon_n,
\]

where ε_n (the noise) is a zero-mean Gaussian random variable
with variance σ². The noise terms added to the different samples are
independent of each other and independent of the input. Note that
the model w is unknown.
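To make the generative assumption concrete, here is a minimal sketch (Python/NumPy; the particular values of N, D, w_true and sigma are illustrative assumptions, not from the lecture) that draws samples from this model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the lecture): sample size, input dimension,
# a "true" model vector w, and the noise standard deviation sigma.
N, D = 100, 3
w_true = np.array([1.0, -2.0, 0.5])
sigma = 0.3

# Each row of X is one input x_n; y_n = x_n^T w + eps_n with i.i.d. Gaussian noise.
X = rng.normal(size=(N, D))
eps = rng.normal(loc=0.0, scale=sigma, size=N)
y = X @ w_true + eps
```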
Therefore, given N samples, the likelihood of the data vector
y = (y_1, ..., y_N) given the input X (each row is one input) and the
model w is equal to

\[
p(\mathbf{y} \mid \mathbf{X}, \mathbf{w})
= \prod_{n=1}^{N} p(y_n \mid \mathbf{x}_n, \mathbf{w})
= \prod_{n=1}^{N} \mathcal{N}\big(y_n \mid \mathbf{x}_n^\top \mathbf{w}, \sigma^2\big).
\]

The probabilistic viewpoint is that we should maximize this likelihood
over the choice of model w. That is, the "best" model is the one that
maximizes this likelihood.
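To make this concrete, here is a minimal sketch (Python/NumPy; the data X, y and the values of w and sigma are illustrative assumptions, not from the lecture) of evaluating the likelihood as a product of per-sample Gaussian densities:

```python
import numpy as np

def gaussian_pdf(y, mean, sigma):
    # Density N(y | mean, sigma^2), applied elementwise.
    return np.exp(-(y - mean) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def likelihood(y, X, w, sigma):
    # p(y | X, w) = prod_n N(y_n | x_n^T w, sigma^2)
    return np.prod(gaussian_pdf(y, X @ w, sigma))

# Tiny illustrative example (values are assumptions, not from the lecture).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
w = np.array([1.0, -1.0])
y = X @ w + rng.normal(scale=0.1, size=5)
print(likelihood(y, X, w, sigma=0.1))
```

In practice this product underflows to zero for even moderately large N, which is one more reason to work with the log-likelihood introduced next.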
Defining cost with log-likelihood
Instead of maximizing the likelihood, we can take the logarithm of
the likelihood and maximize that instead. The resulting expression is called the
log-likelihood (LL):

\[
\mathcal{L}_{LL}(\mathbf{w}) := \log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w})
= -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \big(y_n - \mathbf{x}_n^\top \mathbf{w}\big)^2 + \text{cnst}.
\]

Compare the LL to the MSE (mean squared error):

\[
\mathcal{L}_{LL}(\mathbf{w}) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \big(y_n - \mathbf{x}_n^\top \mathbf{w}\big)^2 + \text{cnst},
\qquad
\mathcal{L}_{MSE}(\mathbf{w}) = \frac{1}{2N} \sum_{n=1}^{N} \big(y_n - \mathbf{x}_n^\top \mathbf{w}\big)^2.
\]
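For completeness, the LL above follows directly from the Gaussian density and the factorization of the likelihood; a short derivation, using only the definitions given earlier, is

\[
\begin{aligned}
\mathcal{L}_{LL}(\mathbf{w})
&= \log \prod_{n=1}^{N} \mathcal{N}\big(y_n \mid \mathbf{x}_n^\top \mathbf{w}, \sigma^2\big)
 = \sum_{n=1}^{N} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}}
   \exp\left(-\frac{(y_n - \mathbf{x}_n^\top \mathbf{w})^2}{2\sigma^2}\right) \right] \\
&= -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \big(y_n - \mathbf{x}_n^\top \mathbf{w}\big)^2
   - \frac{N}{2}\log\big(2\pi\sigma^2\big),
\end{aligned}
\]

where the last term does not depend on w and is the constant "cnst" above.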
Maximum-likelihood estimator (MLE)
It is clear that maximizing the LL is equivalent to minimizing the MSE:

\[
\arg\min_{\mathbf{w}} \mathcal{L}_{MSE}(\mathbf{w}) = \arg\max_{\mathbf{w}} \mathcal{L}_{LL}(\mathbf{w}).
\]

This gives us another way to design cost functions.
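As a quick numerical check of this equivalence, here is a minimal sketch (Python/NumPy; the data-generating choices are illustrative assumptions, not from the lecture) that fits w by ordinary least squares and confirms that the same w minimizes the MSE and maximizes the LL:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, sigma = 200, 3, 0.5
w_true = np.array([2.0, -1.0, 0.3])
X = rng.normal(size=(N, D))
y = X @ w_true + rng.normal(scale=sigma, size=N)

def mse(w):
    # L_MSE(w) = (1/2N) * sum_n (y_n - x_n^T w)^2
    return 0.5 * np.mean((y - X @ w) ** 2)

def log_likelihood(w):
    # L_LL(w) = -(1/(2 sigma^2)) * sum_n (y_n - x_n^T w)^2 + cnst
    return -0.5 / sigma**2 * np.sum((y - X @ w) ** 2) \
           - 0.5 * N * np.log(2 * np.pi * sigma**2)

# The least-squares solution minimizes the MSE ...
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]

# ... and any perturbation of it has higher MSE and lower LL.
w_perturbed = w_ls + 0.1 * rng.normal(size=D)
assert mse(w_ls) <= mse(w_perturbed)
assert log_likelihood(w_ls) >= log_likelihood(w_perturbed)
```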

MLE can also be interpreted as find-


ing the model under which the ob-
served data is most likely to have
been generated from (probabilisti-
cally). This interpretation has some
advantages that we discuss now.
Properties of MLE
The LL is (up to normalization by N) a sample approximation to the
expected log-likelihood:

\[
\frac{1}{N}\,\mathcal{L}_{LL}(\mathbf{w}) \approx \mathbb{E}_{p(y,\mathbf{x})}\big[\log p(y \mid \mathbf{x}, \mathbf{w})\big].
\]

MLE is consistent, i.e., it will give us the correct model assuming that
we have a sufficient amount of data (this can be proven under some weak
conditions):

\[
\mathbf{w}_{\text{MLE}} \;\longrightarrow\; \mathbf{w}_{\text{true}} \quad \text{in probability.}
\]
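Consistency can also be observed empirically; in the sketch below (Python/NumPy, with illustrative data-generating choices that are not from the lecture), the MLE under the Gaussian model, i.e. the least-squares fit, gets closer to w_true as N grows:

```python
import numpy as np

rng = np.random.default_rng(2)
D, sigma = 3, 1.0
w_true = np.array([1.0, -0.5, 2.0])

for N in [10, 100, 1000, 10000]:
    X = rng.normal(size=(N, D))
    y = X @ w_true + rng.normal(scale=sigma, size=N)
    # Under Gaussian noise the MLE coincides with the least-squares solution.
    w_mle = np.linalg.lstsq(X, y, rcond=None)[0]
    print(N, np.linalg.norm(w_mle - w_true))  # error shrinks roughly like 1/sqrt(N)
```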

The MLE is asymptotically normal, i.e.,

\[
(\mathbf{w}_{\text{MLE}} - \mathbf{w}_{\text{true}})
\;\xrightarrow{d}\;
\frac{1}{\sqrt{N}}\, \mathcal{N}\big(\mathbf{w}_{\text{MLE}} \mid \mathbf{0}, \mathbf{F}^{-1}(\mathbf{w}_{\text{true}})\big),
\]

where

\[
\mathbf{F}(\mathbf{w}) = -\mathbb{E}_{p(y)}\!\left[\frac{\partial^2 \mathcal{L}}{\partial \mathbf{w}\, \partial \mathbf{w}^\top}\right]
\]

is the Fisher information.

MLE is efficient, i.e. it achieves the Cramér-Rao lower bound:

\[
\operatorname{Covariance}(\mathbf{w}_{\text{MLE}}) = \mathbf{F}^{-1}(\mathbf{w}_{\text{true}}).
\]
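As a concrete illustration (not worked out in these notes), the per-sample Fisher information can be computed in closed form for the Gaussian model above: with log p(y | x, w) = −(y − x⊤w)²/(2σ²) + cnst,

\[
\frac{\partial^2}{\partial \mathbf{w}\,\partial \mathbf{w}^\top} \log p(y \mid \mathbf{x}, \mathbf{w})
= -\frac{\mathbf{x}\mathbf{x}^\top}{\sigma^2}
\qquad\Longrightarrow\qquad
\mathbf{F}(\mathbf{w}) = \frac{\mathbb{E}\big[\mathbf{x}\mathbf{x}^\top\big]}{\sigma^2},
\]

so a larger noise variance σ² means less information per sample and, correspondingly, a larger asymptotic covariance F⁻¹ for the MLE.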
Another example
We can replace the Gaussian distribution by a Laplace distribution:

\[
p(y_n \mid \mathbf{x}_n, \mathbf{w}) = \frac{1}{2b} \exp\left(-\frac{1}{b}\,\big|y_n - \mathbf{x}_n^\top \mathbf{w}\big|\right).
\]
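Following the same steps as in the Gaussian case (a short derivation, using only the density above), the log-likelihood under this Laplace model becomes

\[
\log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w})
= -\frac{1}{b} \sum_{n=1}^{N} \big|y_n - \mathbf{x}_n^\top \mathbf{w}\big| - N \log(2b),
\]

so maximizing the likelihood now corresponds to minimizing the mean absolute error (MAE) instead of the MSE.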
