
Maximum Likelihood Estimation

Foundations of Data Analysis


March 4, 2021

The purpose of these notes is to review the definition of a maximum likelihood estimate (MLE), and show
that the sample mean is the MLE of the µ parameter in a Gaussian. For more details about MLEs, see the
Wikipedia article:
https://en.wikipedia.org/wiki/Maximum_likelihood
Consider a random sample X1 , X2 , . . . , Xn coming from a distribution with parameter θ (for example, they
could be from a Gaussian distribution with parameter µ). Remember that the terminology “random sample” means that the Xi random variables are independent and identically distributed (i.i.d.). Furthermore, let’s
assume that each Xi has a probability density function pXi (x; θ). Given a realization of our random sample,
x1 , x2 , . . . , xn , (remember, these are the actual numbers that we have observed), we define the likelihood
function L(θ) as follows:
$$
\begin{aligned}
L(\theta) &= p_{X_1,\ldots,X_n}(x_1, x_2, \ldots, x_n; \theta) \\
          &= \prod_{i=1}^{n} p_{X_i}(x_i; \theta), \quad \text{using independence of the } X_i.
\end{aligned}
$$

Here, pX1 ,...,Xn is the joint pdf of all of the Xi variables. This pdf depends on the value of the parameter θ for the distribution, which is why θ appears in the notation after the semicolon. Notice an important point: we are treating the xi as constants (they are the data that we have observed), and L is a function of θ. Maximum likelihood now says that we want to maximize this likelihood function as a function of θ.
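To make the definition concrete, here is a minimal Python sketch of the idea: the likelihood is simply the product of the per-observation densities evaluated at the observed data, viewed as a function of θ. The function names, the sample values, and the choice of a Gaussian density with known σ = 1 are all illustrative assumptions, not part of the notes.

```python
import numpy as np
from scipy.stats import norm

def likelihood(theta, data, density):
    """L(theta): product of the per-observation densities p(x_i; theta)."""
    return np.prod([density(x, theta) for x in data])

# Illustrative choice: Gaussian density with unknown mean theta, known sigma = 1.
gaussian_density = lambda x, theta: norm.pdf(x, loc=theta, scale=1.0)

data = [2.1, 3.4, 2.8, 3.9, 2.5]                 # hypothetical observed sample
print(likelihood(3.0, data, gaussian_density))   # L(3.0)
print(likelihood(0.0, data, gaussian_density))   # much smaller: theta = 0 fits the data poorly
```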

MLE of Gaussian mean parameter, µ


Now, let’s work this out for the Gaussian case, i.e., let X1 , X2 , . . . , Xn ∼ N(µ, σ²). We will focus only on the MLE of the µ parameter, essentially treating σ² as a known constant for simplicity of the example. The
likelihood function looks like this:
$$
\begin{aligned}
L(\mu) &= p_{X_1,\ldots,X_n}(x_1, x_2, \ldots, x_n; \mu) \\
       &= \prod_{i=1}^{n} p_{X_i}(x_i; \mu), \quad \text{using independence of the } X_i, \\
       &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right), \quad \text{using the Gaussian pdf for each } X_i, \\
       &= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right), \quad \text{the product turns into a sum inside the exp.}
\end{aligned}
$$

To maximize this function, it is easier to maximize its natural log. We can do this because ln is a monotonically increasing function, so the value of µ that maximizes L also maximizes ln L. So, the log-likelihood function is defined as
$$\ell(\mu) = \ln L(\mu) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 + C,$$

where $C = n \ln\!\big(1/(\sqrt{2\pi}\,\sigma)\big)$ is constant in µ (we don’t need it to maximize ℓ). Now, defining our estimate of µ to maximize the log-likelihood, we get
$$\hat{\mu} = \arg\max_{\mu} \ell(\mu) = \arg\min_{\mu} \sum_{i=1}^{n}(x_i - \mu)^2.$$

Notice we changed the sign in the last equality, which turns the max problem into a min problem. This is called least squares, since we are minimizing the sum of squared differences between µ and our data xi. We can solve this problem exactly using the fact (from calculus) that the derivative of ℓ with respect to µ is zero at the maximum. Because ℓ differs from the negated sum of squares only by the positive constant factor 1/(2σ²), this is the same as setting the derivative of the sum of squares to zero:

$$0 = \frac{d}{d\mu}\sum_{i=1}^{n}(x_i - \mu)^2 = -2\sum_{i=1}^{n}(x_i - \mu) = 2n\mu - 2\sum_{i=1}^{n} x_i.$$

Solving for µ, we get the sample mean as the MLE:


$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
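This closed-form result is easy to check numerically. The sketch below (a rough illustration assuming numpy; the grid of candidate µ values and the random seed are arbitrary choices) compares the sample mean against a brute-force maximization of ℓ(µ) over a grid, using the same setup as the plots that follow (n = 20 points from N(3, 1)):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
x = rng.normal(loc=3.0, scale=sigma, size=20)   # random sample from N(mu=3, sigma=1)

mus = np.linspace(0.0, 6.0, 2001)               # candidate values of mu
# Log-likelihood up to the constant C, which does not affect the arg max.
loglik = np.array([-np.sum((x - mu) ** 2) / (2 * sigma ** 2) for mu in mus])

print(mus[np.argmax(loglik)])   # grid-based maximizer of the log-likelihood
print(np.mean(x))               # the sample mean, i.e., the closed-form MLE
```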

Here are some plots demonstrating the above MLE of the mean of a Gaussian. First, we generated a random
sample, x1 , . . . , x20 from a normal distribution with µ = 3, σ = 1.
Next, we plot the likelihood functions, p(xi ; µ), for each of the points separately. Note that the xi points are
plotted on the bottom (x-axis) and each one has its own Gaussian pdf “hill” centered above it. These are the
p(xi ; µ).

[Figure: Individual Likelihoods Per Point; the curves L(µ ; xi), one per data point, plotted for µ from 0 to 6.]

Next, we plot the likelihood function for all of the data, which is just the product of all of the p(xi ; µ). The
vertical line is at the average of the xi data. You can see that the maximum of the likelihood curve is indeed
at the average.

[Figure: Likelihood Function; L(µ ; x) plotted for µ from 0 to 6, with a vertical line at the average of the xi.]

Finally, we plot the log-likelihood function (the log of the previous plot, which is just a quadratic). The
maximum is still at the same place.

[Figure: Average Log-Likelihood Function; ℓ(µ ; x) plotted for µ from 0 to 6.]
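The three plots above could be reproduced with a short matplotlib sketch along the following lines. This is only an illustration of the construction, not the original plotting code; the random seed, grid, and styling are arbitrary, and it assumes numpy, scipy, and matplotlib are available.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=20)      # sample from N(mu=3, sigma=1)
mus = np.linspace(0.0, 6.0, 500)                 # grid of mu values to plot over

fig, axes = plt.subplots(3, 1, figsize=(6, 9))

# 1. Individual likelihoods p(x_i; mu): one Gaussian "hill" centered over each point.
for xi in x:
    axes[0].plot(mus, norm.pdf(xi, loc=mus, scale=1.0), color="gray")
axes[0].plot(x, np.zeros_like(x), "k|")          # the data points on the x-axis
axes[0].set_title("Individual Likelihoods Per Point")

# 2. Full likelihood: the product of the individual likelihoods.
L = np.prod([norm.pdf(xi, loc=mus, scale=1.0) for xi in x], axis=0)
axes[1].plot(mus, L)
axes[1].axvline(np.mean(x), color="red")         # vertical line at the sample mean
axes[1].set_title("Likelihood Function")

# 3. Log-likelihood: the log of the product, a downward-opening quadratic in mu.
axes[2].plot(mus, np.log(L))
axes[2].axvline(np.mean(x), color="red")
axes[2].set_title("Log-Likelihood Function")

plt.tight_layout()
plt.show()
```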

MLE of a Bernoulli probability


The Bernoulli distribution is the distribution of a binary random variable. If our random variables Xi are now binary, the notation is Xi ∼ Ber(θ). The parameter θ gives the probability that Xi is a one. In other words:

$$P(X_i = 1) = \theta, \qquad P(X_i = 0) = 1 - \theta.$$

Now, what is the MLE for θ? The likelihood for a single xi is:

$$p(x_i; \theta) = \theta^{x_i}(1 - \theta)^{1 - x_i}.$$

Notice this is θ when xi = 1 and 1 − θ when xi = 0. Now the joint likelihood of all xi is just the product of
these individual likelihoods:
$$
\begin{aligned}
L(\theta) &= p(x_1, \ldots, x_n; \theta) \\
          &= p(x_1; \theta) \times p(x_2; \theta) \times \cdots \times p(x_n; \theta) \\
          &= \prod_{i=1}^{n} \theta^{x_i}(1 - \theta)^{1 - x_i} \\
          &= \theta^{\sum_i x_i}(1 - \theta)^{\sum_i (1 - x_i)} \\
          &= \theta^{k}(1 - \theta)^{n - k}, \quad \text{where } k = \sum_{i=1}^{n} x_i.
\end{aligned}
$$
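The last simplification is easy to verify numerically. Here is a tiny sketch (the binary data and the value of θ are hypothetical) checking that the product of the individual Bernoulli likelihoods equals θ^k (1 − θ)^(n−k):

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])       # hypothetical binary data
theta = 0.4                                        # an arbitrary value of theta
n, k = len(x), int(np.sum(x))

product_form   = np.prod(theta ** x * (1 - theta) ** (1 - x))
collapsed_form = theta ** k * (1 - theta) ** (n - k)
print(product_form, collapsed_form)                # the two forms agree
```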

To maximize L(θ), we can take the derivative (without first taking log this time):

$$
\begin{aligned}
\frac{dL}{d\theta} &= k\theta^{k-1}(1 - \theta)^{n-k} - (n - k)\theta^{k}(1 - \theta)^{n-k-1} \\
                   &= \big(k(1 - \theta) - (n - k)\theta\big)\,\theta^{k-1}(1 - \theta)^{n-k-1} \\
                   &= (k - n\theta)\,\theta^{k-1}(1 - \theta)^{n-k-1}.
\end{aligned}
$$

Setting this to zero (dL/dθ = 0), and then solving for θ, gives the maximum likelihood estimate:

$$\hat{\theta} = \frac{k}{n}.$$

This is what we intuitively expect. The value k is the number of ones appearing in our data, so θ̂ is the
proportion of ones in our data.
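As a final sanity check, here is a short sketch (again with hypothetical binary data) comparing the proportion of ones against a brute-force maximization of L(θ) = θ^k (1 − θ)^(n−k) over a grid of θ values:

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])       # hypothetical binary data
n, k = len(x), int(np.sum(x))

thetas = np.linspace(0.001, 0.999, 999)            # grid of candidate theta values
L = thetas ** k * (1 - thetas) ** (n - k)          # Bernoulli likelihood on the grid

print(thetas[np.argmax(L)])   # approximately k / n
print(k / n)                  # the closed-form MLE: the proportion of ones
```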
