
Maximum Likelihood Estimation
CSE 312 Summer 21, Lecture 21
Important Dates!

• Real World 2 – Wednesday, Aug 11


• Review Summary 3 – Friday, Aug 13
• Problem Set 7 – Monday, Aug 16
• Final Released – Friday, Aug 13
• Final Due & Key Released – Tuesday, Aug 17
Asking The Opposite Question
So far:
Give you rules for an experiment.
Give you the event/outcome we’re interested in.
You calculate/estimate/bound what the probability is.
Today:
Give you some of the rules of the experiment.
Tell you what happened.
You estimate what the rest of the rules of the experiment were.
Example
Suppose you flip a coin independently 10 times, and you see

HTTTHHTHHH

What is your estimate of the probability the coin comes up heads?


a) 2/5
b) 1/2
c) 3/5
d) 55/100
Fill out the poll everywhere so Kushal knows how long to explain.
Go to pollev.com/cse312su21
Maximum Likelihood Estimation
Idea: we got the results we got.
High probability events happen more often than low probability events.
So, guess the rules that maximize the probability of the events we saw
(relative to other choices of the rules).

Since that event happened, might as well guess the set of rules for
which that event was most likely.
Maximum Likelihood Estimation
Formally, we are trying to estimate a parameter of the experiment (here:
the probability of a coin flip being heads).
The likelihood of an event 𝐸 given a parameter 𝜃, written ℒ(𝐸; 𝜃), is ℙ(𝐸) when the experiment is run with parameter 𝜃.
We’ll use the notation ℙ(𝐸; 𝜃) for probability when run with parameter 𝜃
where the semicolon means “extra rules” rather than conditioning

We will choose 𝜃̂ = argmax_𝜃 ℒ(𝐸; 𝜃)

argmax is the argument that produces the maximum, i.e., the 𝜃 that causes ℒ(𝐸; 𝜃) to be maximized.
Notation comparison
ℙ(𝑋|𝑌) probability of 𝑋, conditioned on the event 𝑌 having happened
(𝑌 is a subset of the sample space)
ℙ(𝑋; 𝜃) probability of 𝑋, where to properly define our probability space
we need to know the extra piece of information 𝜃. Since 𝜃 isn’t an event,
this is not conditioning
ℒ(𝑋; 𝜃) the likelihood of event 𝑋, given that an experiment was run with
parameter 𝜃. Likelihoods don’t have all the properties we associate with
probabilities (e.g. they don’t all sum up to 1) and this isn’t conditioning
on an event (𝜃 is a parameter/rule of how the event could be
generated).
MLE

Maximum Likelihood Estimator


The maximum likelihood estimator of the parameter 𝜃 is:

𝜃̂ = argmax_𝜃 ℒ(𝐸; 𝜃)

𝜃 is a variable, 𝜃̂ is a number (or a formula given the event).

We’ll also use the notation 𝜃̂_MLE if we want to emphasize how we found this estimator.
The Coin Example
ℒ(HTTTHHTHHH; 𝜃) = 𝜃^6 (1 − 𝜃)^4

Where is this likelihood maximized?
How do we usually find a maximum?
Calculus!!

d/d𝜃 [𝜃^6 (1 − 𝜃)^4] = 6𝜃^5 (1 − 𝜃)^4 − 4𝜃^6 (1 − 𝜃)^3

Set equal to 0 and solve:

6𝜃̂^5 (1 − 𝜃̂)^4 − 4𝜃̂^6 (1 − 𝜃̂)^3 = 0 ⇒ 6(1 − 𝜃̂) − 4𝜃̂ = 0 ⇒ −10𝜃̂ = −6 ⇒ 𝜃̂ = 3/5
The Coin Example
For this problem, 𝜃 must be in the closed interval [0, 1]. Since ℒ(⋅; 𝜃) is a continuous function, the maximum must occur at an endpoint or where the derivative is 0.

Evaluate the endpoints: ℒ(⋅; 0) = 0 and ℒ(⋅; 1) = 0.

At 𝜃 = 0.6 we get a positive value,
so 𝜃 = 0.6 is the maximizer on the interval [0, 1].
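Not on the original slide, but here is a minimal numeric sanity check (a sketch in Python with numpy; the grid size is an arbitrary choice): evaluate ℒ(𝜃) = 𝜃^6 (1 − 𝜃)^4 on a fine grid over [0, 1] and confirm the maximizer lands near 0.6 while both endpoints give 0.

```python
# Grid search over theta in [0, 1] for the coin likelihood L(theta) = theta^6 * (1 - theta)^4.
import numpy as np

theta = np.linspace(0.0, 1.0, 10_001)        # fine grid over the closed interval [0, 1]
likelihood = theta**6 * (1.0 - theta)**4     # L(HTTTHHTHHH; theta)

best = theta[np.argmax(likelihood)]
print(f"numeric maximizer ~= {best:.4f}")                        # ~0.6000
print(f"L at the endpoints: {likelihood[0]}, {likelihood[-1]}")  # both 0.0
```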
Maximizing a Function
CLOSED INTERVALS
Set the derivative equal to 0 and solve.
Evaluate the likelihood at the endpoints and any critical points.
The largest value found is the maximum on that interval.

SECOND DERIVATIVE TEST
Set the derivative equal to 0 and solve.
Take the second derivative. If it is negative everywhere, then the critical point is the maximizer.
A Math Trick
We’re going to be taking the derivative of products a lot.
The product rule is not fun. There has to be a better way!
Take the log!
ln(𝑎 ⋅ 𝑏) = ln(𝑎) + ln(𝑏)
We don’t need the product rule if our expression is a sum!

Can we still take the max? ln() is an increasing function, so


argmax_𝜃 ln(ℒ(𝐸; 𝜃)) = argmax_𝜃 ℒ(𝐸; 𝜃)
Coin flips is easier
ℒ(HTTTHHTHHH; 𝜃) = 𝜃^6 (1 − 𝜃)^4

ln(ℒ(HTTTHHTHHH; 𝜃)) = 6 ln(𝜃) + 4 ln(1 − 𝜃)

d/d𝜃 ln(ℒ(⋅)) = 6/𝜃 − 4/(1 − 𝜃)

Set to 0 and solve:

6/𝜃̂ − 4/(1 − 𝜃̂) = 0 ⇒ 6/𝜃̂ = 4/(1 − 𝜃̂) ⇒ 6 − 6𝜃̂ = 4𝜃̂ ⇒ 𝜃̂ = 3/5

d²/d𝜃² ln(ℒ) = −6/𝜃² − 4/(1 − 𝜃)² < 0 everywhere, so any critical point must be a maximum.
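A quick symbolic double-check of this derivation (a sketch only; it assumes the sympy package is available, which the lecture itself does not use):

```python
# Symbolically differentiate ln L = 6 ln(theta) + 4 ln(1 - theta) and solve for the critical point.
import sympy as sp

theta = sp.symbols('theta', positive=True)
log_lik = 6 * sp.log(theta) + 4 * sp.log(1 - theta)

critical_points = sp.solve(sp.diff(log_lik, theta), theta)
print(critical_points)              # [3/5]

second_derivative = sp.diff(log_lik, theta, 2)
print(second_derivative)            # -6/theta**2 - 4/(1 - theta)**2 (up to sympy's formatting), negative on (0, 1)
```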
What about continuous random variables?
Can’t use probability, since the probability is going to be 0.
Can use the density!
It’s supposed to show relative chances, which is all we’re trying to find anyway.

ℒ(𝑥_1, 𝑥_2, …, 𝑥_𝑛; 𝜃) = ∏_{i=1}^{n} 𝑓_𝑋(𝑥_𝑖; 𝜃)


Continuous Example
Suppose you get values 𝑥_1, 𝑥_2, …, 𝑥_𝑛 from independent draws of a normal random variable 𝒩(𝜇, 1) (for 𝜇 unknown).
We’ll also call these “realizations” of the random variable.

ℒ(𝑥_1, …, 𝑥_𝑛; 𝜇) = ∏_{i=1}^{n} (1/√(2𝜋)) exp(−(1/2)(𝑥_𝑖 − 𝜇)²)

ln(ℒ(𝑥_1, …, 𝑥_𝑛; 𝜇)) = ∑_{i=1}^{n} [ ln(1/√(2𝜋)) − (1/2)(𝑥_𝑖 − 𝜇)² ]
Finding 𝜇̂

ln(ℒ) = ∑_{i=1}^{n} [ ln(1/√(2𝜋)) − (1/2)(𝑥_𝑖 − 𝜇)² ]

d/d𝜇 ln(ℒ) = ∑_{i=1}^{n} (𝑥_𝑖 − 𝜇)

Setting the derivative equal to 0 and solving:

∑_{i=1}^{n} (𝑥_𝑖 − 𝜇̂) = 0 ⇒ ∑_{i=1}^{n} 𝑥_𝑖 = 𝜇̂ ⋅ 𝑛 ⇒ 𝜇̂ = (∑_{i=1}^{n} 𝑥_𝑖) / 𝑛

Check using the second derivative test:

d²/d𝜇² ln(ℒ) = −𝑛

The second derivative is negative everywhere, so the log-likelihood is concave down
and the average of the 𝑥_𝑖 is a maximizer.
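A small simulation sketch of this result (the true mean, sample size, and seed below are made-up values for illustration; it assumes numpy and scipy are available):

```python
# Draw n realizations from N(mu=2, 1), then check that the sample mean
# maximizes the log-likelihood numerically.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(312)
x = rng.normal(loc=2.0, scale=1.0, size=1_000)

def neg_log_likelihood(mu):
    # ln L(x; mu) = sum_i [ ln(1/sqrt(2*pi)) - (x_i - mu)^2 / 2 ], negated so a
    # minimizer of this function is a maximizer of the log-likelihood.
    return -np.sum(np.log(1.0 / np.sqrt(2 * np.pi)) - 0.5 * (x - mu) ** 2)

result = minimize_scalar(neg_log_likelihood, bounds=(-10, 10), method="bounded")
print(f"sample mean    = {x.mean():.5f}")
print(f"numeric argmax = {result.x:.5f}")   # agrees with the sample mean
```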
Summary
Given: an event 𝐸 (usually 𝑛 i.i.d. samples from a distribution with
unknown parameter 𝜃).
1. Find likelihood ℒ(𝐸; 𝜃)
Usually ∏ ℙ(𝑥_𝑖; 𝜃) for discrete and ∏ 𝑓(𝑥_𝑖; 𝜃) for continuous
2. Maximize the likelihood. Usually:
A. Take the log (if it will make the math easier)
B. Take the derivative
C. Set the derivative to 0 and solve
3. Use the second derivative test to confirm you have a maximizer
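When the calculus is inconvenient, the same recipe can be followed numerically. Here is a minimal sketch (my own illustration, reusing the coin data from earlier and substituting a plain grid search for steps 2B–3):

```python
# Build the log-likelihood from per-sample log-probabilities (steps 1 and 2A)
# and maximize it with a grid search instead of calculus.
import numpy as np

flips = np.array([1, 0, 0, 0, 1, 1, 0, 1, 1, 1])   # H T T T H H T H H H (1 = heads)

def log_likelihood(theta, data):
    # sum of ln P(x_i; theta) for independent Bernoulli(theta) flips
    return np.sum(data * np.log(theta) + (1 - data) * np.log(1 - theta))

grid = np.linspace(0.001, 0.999, 999)               # avoid log(0) at the endpoints
theta_hat = grid[np.argmax([log_likelihood(t, flips) for t in grid])]
print(f"theta_hat ~= {theta_hat:.3f}")              # ~0.6, matching the calculus
```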
Two Parameter Estimation
Two Parameter Estimation Setup
We just saw that to estimate 𝜇 for 𝒩(𝜇, 1) we get:

𝜇̂ = (∑_{i=1}^{n} 𝑥_𝑖) / 𝑛

Now what happens if we know our data is normal but nothing else? Both the mean and the variance are unknown.
Log-likelihood
Let 𝜃_𝜇 and 𝜃_𝜎² be the unknown mean and variance of a normal distribution. Suppose we get independent draws 𝑥_1, 𝑥_2, …, 𝑥_𝑛.

ℒ(𝑥_1, …, 𝑥_𝑛; 𝜃_𝜇, 𝜃_𝜎²) = ∏_{i=1}^{n} (1/√(2𝜋 𝜃_𝜎²)) exp(−(1/2) ⋅ (𝑥_𝑖 − 𝜃_𝜇)²/𝜃_𝜎²)

ln ℒ(𝑥_𝑖; 𝜃_𝜇, 𝜃_𝜎²) = ∑_{i=1}^{n} [ ln(1/√(2𝜋 𝜃_𝜎²)) − (1/2) ⋅ (𝑥_𝑖 − 𝜃_𝜇)²/𝜃_𝜎² ]
Expectation
(Arithmetic is nearly identical to the known-variance case.)

ln ℒ(𝑥_𝑖; 𝜃_𝜇, 𝜃_𝜎²) = ∑_{i=1}^{n} [ ln(1/√(2𝜋 𝜃_𝜎²)) − (1/2) ⋅ (𝑥_𝑖 − 𝜃_𝜇)²/𝜃_𝜎² ]

∂/∂𝜃_𝜇 ln ℒ = ∑_{i=1}^{n} (𝑥_𝑖 − 𝜃_𝜇)/𝜃_𝜎²

Setting equal to 0 and solving:

∑_{i=1}^{n} (𝑥_𝑖 − 𝜃̂_𝜇)/𝜃̂_𝜎² = 0 ⇒ ∑_{i=1}^{n} (𝑥_𝑖 − 𝜃̂_𝜇) = 0 ⇒ ∑_{i=1}^{n} 𝑥_𝑖 = 𝑛 ⋅ 𝜃̂_𝜇 ⇒ 𝜃̂_𝜇 = (∑_{i=1}^{n} 𝑥_𝑖) / 𝑛

∂²/∂𝜃_𝜇² ln ℒ = −𝑛/𝜃_𝜎²

𝜃_𝜎² is an estimate of a variance. It’ll never be negative (and as long as the draws aren’t identical it won’t be 0). So, the second derivative is negative, and we really have a maximizer.
Variance
ln ℒ(𝑥_𝑖; 𝜃_𝜇, 𝜃_𝜎²) = ∑_{i=1}^{n} [ ln(1/√(2𝜋 𝜃_𝜎²)) − (1/2) ⋅ (𝑥_𝑖 − 𝜃_𝜇)²/𝜃_𝜎² ]

= ∑_{i=1}^{n} [ −(1/2) ln(𝜃_𝜎²) − (1/2) ln(2𝜋) − (1/2) ⋅ (𝑥_𝑖 − 𝜃_𝜇)²/𝜃_𝜎² ]

= −(𝑛/2) ln(𝜃_𝜎²) − (𝑛 ln(2𝜋))/2 − (1/(2𝜃_𝜎²)) ∑_{i=1}^{n} (𝑥_𝑖 − 𝜃_𝜇)²

∂/∂𝜃_𝜎² ln ℒ = −𝑛/(2𝜃_𝜎²) + (1/(2(𝜃_𝜎²)²)) ∑_{i=1}^{n} (𝑥_𝑖 − 𝜃_𝜇)²
Variance
∂/∂𝜃_𝜎² ln ℒ = −𝑛/(2𝜃_𝜎²) + (1/(2(𝜃_𝜎²)²)) ∑_{i=1}^{n} (𝑥_𝑖 − 𝜃_𝜇)²

Setting equal to 0 and solving:

−𝑛/(2𝜃̂_𝜎²) + (1/(2(𝜃̂_𝜎²)²)) ∑_{i=1}^{n} (𝑥_𝑖 − 𝜃_𝜇)² = 0

⇒ −(𝑛/2) 𝜃̂_𝜎² + (1/2) ∑_{i=1}^{n} (𝑥_𝑖 − 𝜃_𝜇)² = 0   (multiply through by (𝜃̂_𝜎²)²)

⇒ 𝜃̂_𝜎² = (1/𝑛) ∑_{i=1}^{n} (𝑥_𝑖 − 𝜃_𝜇)²

To get the overall maximum, we’ll plug in 𝜃̂_𝜇 for 𝜃_𝜇.
Summary
If you get independent samples 𝑥_1, 𝑥_2, …, 𝑥_𝑛 from a 𝒩(𝜇, 𝜎²) where 𝜇 and 𝜎² are unknown, the maximum likelihood estimates are:

𝜃̂_𝜇 = (∑_{i=1}^{n} 𝑥_𝑖) / 𝑛   and   𝜃̂_𝜎² = (1/𝑛) ∑_{i=1}^{n} (𝑥_𝑖 − 𝜃̂_𝜇)²

The maximum likelihood estimator of the mean is the sample mean; that is, the estimate of 𝜇 is the average value of all the data points.
The MLE for the variance is the variance of the experiment “choose one of the 𝑥_𝑖 at random.”
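A quick simulation sketch of these formulas (the true 𝜇, 𝜎², sample size, and seed are made-up values for illustration; it assumes numpy is available):

```python
# Check the two-parameter MLE on simulated normal data.
import numpy as np

rng = np.random.default_rng(21)
x = rng.normal(loc=5.0, scale=3.0, size=10_000)      # N(mu=5, sigma^2=9)

mu_hat = np.sum(x) / len(x)                          # MLE of the mean: the sample mean
sigma2_hat = np.sum((x - mu_hat) ** 2) / len(x)      # MLE of the variance: divide by n, not n-1

print(f"mu_hat     = {mu_hat:.3f}")                  # close to 5
print(f"sigma2_hat = {sigma2_hat:.3f}")              # close to 9
# np.var(x) uses the same 1/n convention by default (ddof=0), so it matches sigma2_hat.
```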
