CPSC340: Entropy and Maximum Likelihood

This document outlines a lecture on maximum likelihood and entropy. It introduces maximum likelihood as a strategy for learning parameters from data by choosing parameters that make the observed data most probable. Maximum likelihood is applied to Bernoulli random variables by differentiating the log likelihood and setting it equal to zero. Entropy is also introduced as a measure of uncertainty in a random variable. The next lecture will cover Bayesian learning.


CPSC340

Entropy and maximum likelihood

Nando de Freitas
September, 2012
University of British Columbia
Outline of the lecture
This lecture introduces our first strategy for learning: maximum likelihood.
The goal is for you to learn:

• The definition of the maximum likelihood learning strategy.
• How to apply maximum likelihood to Bernoulli random variables.
• The concepts of information and entropy.
• The connection between maximum likelihood and differential entropy.
• Maximum likelihood as a contrasting principle (the world vs. the hallucinations of the mind).
Frequentist learning
Frequentist learning assumes that there exists a true model, say with
parameters θ0.

The estimate (learned value) will be denoted θ̂.

Given n data points, x1:n = {x1, x2, …, xn}, we choose the value of θ with
the highest probability of generating the data. That is,

θ̂ = arg max_θ p(x1:n | θ)
Frequentist learning
Example: Suppose we observe the data x1:n = {1, 1, 1, 1, 1, 1}, where each xi
comes from the same Bernoulli distribution and the observations are independent
and identically distributed (iid). What is a good guess of θ?
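As a quick sanity check (a sketch added here, not part of the original slides), the Python snippet below evaluates the Bernoulli likelihood of this all-ones dataset on a grid of candidate θ values; the likelihood is largest at θ = 1, which matches the intuitive guess.

```python
import numpy as np

x = np.array([1, 1, 1, 1, 1, 1])       # observed data: six ones
thetas = np.linspace(0.01, 1.0, 100)   # candidate parameter values

# Likelihood of the whole iid dataset for each candidate theta:
# p(x_1:n | theta) = prod_i theta^(x_i) * (1 - theta)^(1 - x_i)
likelihood = np.array([np.prod(t ** x * (1 - t) ** (1 - x)) for t in thetas])

best = thetas[np.argmax(likelihood)]
print(f"theta maximizing the likelihood: {best:.2f}")   # -> 1.00
```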
Maximum Likelihood procedure
Step 1: Given n data points, x1:n = {x1, x2, …, xn}, write down the expression
for the joint distribution of the data. Since the data are iid, it factorizes:

p(x1:n | θ) = ∏i=1..n p(xi | θ)

Step 2: Compute the log-likelihood, log p(x1:n | θ).

Step 3: Differentiate and equate to zero to find the estimate of θ.


Bernoulli MLE
Step 1: Write down the specific distribution for each datum (Bernoulli in
our case):
p(xi | θ) = θ^xi (1 − θ)^(1 − xi)

p(x1:n | θ) = ∏i θ^xi (1 − θ)^(1 − xi) = θ^(Σi xi) (1 − θ)^(n − Σi xi)

Step 2: Compute the log-likelihood:

log p(x1:n | θ) = (Σi xi) log θ + (n − Σi xi) log(1 − θ)

Bernoulli MLE
Step 3: Differentiate and equate to zero to find the estimate of θ:

d/dθ log p(x1:n | θ) = (Σi xi)/θ − (n − Σi xi)/(1 − θ) = 0

Solving for θ gives the maximum likelihood estimate

θ̂ = (1/n) Σi xi,

i.e. the fraction of ones observed in the data.
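This derivation can be checked symbolically. The sketch below (added here, not part of the original slides) uses sympy, writing s for the number of ones Σi xi and n for the number of observations:

```python
import sympy as sp

theta, n, s = sp.symbols('theta n s', positive=True)

# Bernoulli log-likelihood with s = sum_i x_i ones out of n observations
log_lik = s * sp.log(theta) + (n - s) * sp.log(1 - theta)

# Step 3: differentiate with respect to theta and equate to zero
stationary_points = sp.solve(sp.Eq(sp.diff(log_lik, theta), 0), theta)
print(stationary_points)   # [s/n], i.e. the fraction of ones in the data
```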
Entropy
In information theory, entropy H is a measure of the uncertainty
associated with a random variable. It is defined as:

H(X) = - Σx p(x) log p(x)


Example: For a Bernoulli variable X with parameter θ, the entropy is:

H(X) = −θ log θ − (1 − θ) log(1 − θ)

It is largest at θ = 0.5 (a fair coin is the most uncertain case) and zero at
θ = 0 or θ = 1.
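The short Python sketch below (added here, not part of the original slides) evaluates the Bernoulli entropy at a few parameter values, using scipy's xlogy to handle the 0 · log 0 = 0 convention:

```python
from scipy.special import xlogy  # xlogy(a, b) = a * log(b), with xlogy(0, 0) = 0

def bernoulli_entropy(theta):
    """Entropy of a Bernoulli(theta) random variable, in nats."""
    h = -(xlogy(theta, theta) + xlogy(1 - theta, 1 - theta))
    return h + 0.0  # turn -0.0 into 0.0 at the endpoints

for t in [0.0, 0.1, 0.5, 0.9, 1.0]:
    print(f"theta = {t:.1f}  ->  H = {bernoulli_entropy(t):.3f} nats")
# Entropy peaks at theta = 0.5 and is zero at theta = 0 or theta = 1.
```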
MLE - advanced

[Several slides with this title follow in the original deck; their worked content was not captured in this text version.]
Next lecture
In the next lecture, we introduce Bayesian learning.
