Maximum Likelihood
Martin Jaggi
Last updated on: September 24, 2024
Credits to Mohammad Emtiyaz Khan & Rüdiger Urbanke
Motivation
In the previous lecture 3a we arrived at the least-squares problem in the following way: we postulated a particular cost function (the square loss) and then, given data, found the model that minimizes this cost function. In the current lecture we take an alternative route. The final answer will be the same, but our starting point will be probabilistic. In this way we obtain a second interpretation of the least-squares problem.
[Figure: left panel, data $y$ plotted against $x$; right panel, histogram of the errors in prediction.]
Gaussian distribution and independence
Recall the definition of a Gaussian random variable in $\mathbb{R}$ with mean $\mu$ and variance $\sigma^2$. It has the density
$$
p(y \mid \mu, \sigma^2) = \mathcal{N}(y \mid \mu, \sigma^2)
= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y-\mu)^2}{2\sigma^2}\right).
$$
In a similar manner, the density of a Gaussian random vector in $\mathbb{R}^D$ with mean $\mu$ and covariance $\Sigma$ (which must be positive definite for the density to exist) is
$$
\mathcal{N}(y \mid \mu, \Sigma)
= \frac{1}{\sqrt{(2\pi)^D \det(\Sigma)}} \exp\!\left(-\frac{1}{2}(y-\mu)^\top \Sigma^{-1} (y-\mu)\right).
$$
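As a quick sanity check on these formulas, here is a minimal Python sketch (all variable names are illustrative) that evaluates the multivariate Gaussian log-density directly from the definition above and compares it with SciPy's reference implementation:

import numpy as np
from scipy.stats import multivariate_normal

def gaussian_logpdf(y, mu, Sigma):
    # Log of N(y | mu, Sigma), computed from the formula above.
    D = y.shape[0]
    diff = y - mu
    sign, logdet = np.linalg.slogdet(Sigma)      # log det(Sigma), numerically stable
    quad = diff @ np.linalg.solve(Sigma, diff)   # (y - mu)^T Sigma^{-1} (y - mu)
    return -0.5 * (D * np.log(2.0 * np.pi) + logdet + quad)

rng = np.random.default_rng(0)
D = 3
mu = rng.normal(size=D)
A = rng.normal(size=(D, D))
Sigma = A @ A.T + np.eye(D)   # positive definite by construction
y = rng.normal(size=D)

print(gaussian_logpdf(y, mu, Sigma))                      # direct formula
print(multivariate_normal.logpdf(y, mean=mu, cov=Sigma))  # SciPy; should agree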
For regression, we model each output as a linear function of the input plus Gaussian noise,
$$
y_n = x_n^\top w + \epsilon_n,
$$
where the $\epsilon_n$ are i.i.d. $\mathcal{N}(0, \sigma^2)$.
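To make the promised connection to least squares explicit, here is a sketch of the standard derivation, under the i.i.d. Gaussian noise assumption above:
$$
\begin{aligned}
\log p(y \mid X, w)
&= \sum_{n=1}^{N} \log \mathcal{N}(y_n \mid x_n^\top w, \sigma^2) \\
&= -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - x_n^\top w)^2 + \text{const},
\end{aligned}
$$
so maximizing the likelihood over $w$ is the same as minimizing the sum of squared errors:
$w_{\text{MLE}} = \arg\min_w \sum_n (y_n - x_n^\top w)^2$.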
Asymptotically, in the limit of many samples, the covariance of the maximum-likelihood estimator is given by the inverse of the Fisher information matrix evaluated at the true parameter:
$$
\operatorname{Cov}(w_{\text{MLE}}) = F^{-1}(w_{\text{true}}).
$$
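For the linear-Gaussian model above, the Fisher information is $F = X^\top X / \sigma^2$, so this predicts $\operatorname{Cov}(w_{\text{MLE}}) = \sigma^2 (X^\top X)^{-1}$. A minimal simulation sketch (names and constants are illustrative) that checks this empirically:

import numpy as np

rng = np.random.default_rng(0)
N, D, sigma = 200, 3, 0.5
X = rng.normal(size=(N, D))
w_true = np.array([1.0, -2.0, 0.5])

# Repeatedly draw noisy data and refit w by least squares (= MLE here).
estimates = []
for _ in range(5000):
    y = X @ w_true + sigma * rng.normal(size=N)
    w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates.append(w_mle)

empirical_cov = np.cov(np.array(estimates).T)
predicted_cov = sigma**2 * np.linalg.inv(X.T @ X)      # F^{-1}(w_true)
print(np.max(np.abs(empirical_cov - predicted_cov)))   # should be small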
Another example
We can replace the Gaussian distribution by a Laplace distribution:
$$
p(y_n \mid x_n, w) = \frac{1}{2b} \exp\!\left(-\frac{1}{b}\,\bigl|y_n - x_n^\top w\bigr|\right).
$$
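Repeating the derivation above for this model (a sketch, again assuming i.i.d. noise):
$$
\log p(y \mid X, w) = -\frac{1}{b} \sum_{n=1}^{N} \bigl|y_n - x_n^\top w\bigr| - N \log(2b),
$$
so the maximum-likelihood estimate under Laplace noise minimizes the sum of absolute errors, i.e. the mean absolute error (MAE) cost.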