
Statistical Modeling and Analysis of Neural Data (NEU 560)

Princeton University, Spring 2018


Jonathan Pillow

Lecture 16 notes:
Latent variable models and EM

Tues, 4.10

1 Latent variable models

In this section we discuss latent variable models for unsupervised learning, where instead
of trying to learn a mapping from regressors to responses (e.g., from stimuli to responses), we are
simply trying to capture structure in a set of observed responses.
The word latent simply means unobserved: latent variables are random variables that we
posit to exist underlying our data. We could also refer to such models as doubly stochastic, because
they involve two stages of noise: noise in the latent variable, and then noise in the mapping from
latent variable to observed variable.
Specifically, we will specify latent variable models in terms of two pieces:

• Prior over the latent: z ∼ p(z)


• Conditional probability of observed data: x|z ∼ p(x|z)

The probability of the observed data x is given by an integral over the latent variable:
p(x) = ∫ p(x|z) p(z) dz    (1)

or a sum in the case of discrete latent variables:


p(x) = Σ_{i=1}^{m} p(x|z = αi) p(z = αi),    (2)

where the latent variable takes on a finite set of values z ∈ {α1 , α2 , . . . , αm }.
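
To make the two-stage ("doubly stochastic") structure concrete, here is a minimal sampling sketch in Python for a discrete latent with m = 3 values; the prior weights, component means, and variance below are made up purely for illustration and are not from the notes:

import numpy as np

rng = np.random.default_rng(0)

# Prior over a discrete latent z ∈ {α_1, α_2, α_3}, encoded as indices 0, 1, 2
# (weights, means, and sigma are arbitrary illustrative values)
prior_probs = np.array([0.5, 0.3, 0.2])
means = np.array([-2.0, 0.0, 3.0])   # mean of p(x|z = α_i)
sigma = 1.0                          # shared standard deviation

def sample(n):
    """Two stages of noise: first sample the latent, then the observation given the latent."""
    z = rng.choice(len(prior_probs), size=n, p=prior_probs)  # z ~ p(z)
    x = rng.normal(means[z], sigma)                          # x|z ~ p(x|z)
    return z, x

def marginal_px(x):
    """p(x) = Σ_i p(x|z = α_i) p(z = α_i), eq. (2), with Gaussian conditionals."""
    px_given_z = np.exp(-0.5 * ((x[:, None] - means) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return px_given_z @ prior_probs

z, x = sample(1000)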

2 Two key things we want to do with latent variable models


1. Recognition / inference - refers to the problem of inferring the latent variable z from the
data x. The posterior over the latent given the data is specified by Bayes’ rule:
p(z|x) = p(x|z) p(z) / p(x),    (3)

where the model is specified by the terms in the numerator, and the denominator is the
marginal probability obtained by integrating the numerator, p(x) = ∫ p(x|z) p(z) dz.

2. Model fitting - refers to the problem of learning the model parameters, which we have so
far suppressed. In fact we should write the model as specified by
p(x, z|θ) = p(x|z, θ)p(z|θ) (4)
where θ are the parameters governing both the prior over the latent and the conditional
distribution of the data.
Maximum likelihood fitting involves computing and maximizing the marginal probability:
θ̂ = arg max_θ p(x|θ) = arg max_θ ∫ p(x, z|θ) dz.    (5)

3 Example: binary mixture of Gaussians (MoG)

(Also commonly known as a Gaussian mixture model (GMM)).


This model is specified by:
z ∼ Ber(p)    (6)

x|z ∼ N(µ0, C0) if z = 0,    N(µ1, C1) if z = 1    (7)

So z is a binary random variable that takes value 1 with probability p and value 0 with probability
(1 − p). The datapoint x is then drawn either from the Gaussian N0(x) = N(µ0, C0) if z = 0, or from a
different Gaussian N1(x) = N(µ1, C1) if z = 1.
For this simple model, the recognition distribution (the conditional distribution of the latent given the data) is:
p(z = 0|x) = (1 − p) N0(x) / [(1 − p) N0(x) + p N1(x)]    (8)

p(z = 1|x) = p N1(x) / [(1 − p) N0(x) + p N1(x)]    (9)

The likelihood (or marginal likelihood) is simply the normalizer in the expressions above:
p(x|θ) = (1 − p)N0 (x) + pN1 (x), (10)
where the model parameters are θ = {p, µ0 , C0 , µ1 , C1 }.
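
As a concrete sketch, the recognition distribution and marginal likelihood (eqs. 8-10) can be computed directly from the two Gaussian densities; the code below assumes scipy.stats.multivariate_normal, and the function name is ours, not from the notes:

import numpy as np
from scipy.stats import multivariate_normal

def recognition_and_likelihood(x, p, mu0, C0, mu1, C1):
    """For each row of x, return p(z=1|x) (eq. 9) and the marginal likelihood p(x|θ) (eq. 10)."""
    N0 = multivariate_normal.pdf(x, mean=mu0, cov=C0)   # N0(x) = N(µ0, C0) evaluated at x
    N1 = multivariate_normal.pdf(x, mean=mu1, cov=C1)   # N1(x) = N(µ1, C1) evaluated at x
    marginal = (1 - p) * N0 + p * N1                     # p(x|θ), eq. (10)
    post_z1 = p * N1 / marginal                          # p(z=1|x), eq. (9); p(z=0|x) = 1 - post_z1
    return post_z1, marginal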
For an entire dataset X = {x1, . . . , xN}, the likelihood is a product of independent terms, since we assume each
latent zi is drawn independently from the prior, giving:
p(X|θ) = ∏_{i=1}^{N} [(1 − p) N0(xi) + p N1(xi)]    (11)

and hence
log p(X|θ) = Σ_{i=1}^{N} log[(1 − p) N0(xi) + p N1(xi)].    (12)

Clearly we could write a function to compute this sum and use an off-the-shelf algorithm to optimize
it numerically if we wanted to. However, we will next discuss an alternative iterative approach to
maximizing the likelihood.
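
For concreteness, here is one minimal sketch of that direct approach, assuming 1-D data, scipy.optimize.minimize, and an arbitrary unconstrained reparameterization of θ (a logit for p and log standard deviations); this is our own illustrative choice, not the iterative method developed below:

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic sigmoid
from scipy.stats import norm

def neg_log_lik(params, x):
    """Negative of eq. (12) for 1-D data, with unconstrained parameters
    (a logit for p and log standard deviations) so the optimizer can run unconstrained."""
    logit_p, mu0, log_s0, mu1, log_s1 = params
    p = expit(logit_p)
    L = (1 - p) * norm.pdf(x, mu0, np.exp(log_s0)) + p * norm.pdf(x, mu1, np.exp(log_s1))
    return -np.sum(np.log(L))

def fit_direct(x, init=(0.0, -1.0, 0.0, 1.0, 0.0)):
    """Maximize the log-likelihood with an off-the-shelf optimizer."""
    res = minimize(neg_log_lik, np.array(init), args=(x,), method="Nelder-Mead")
    return res.x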

4 The Expectation-Maximization (EM) algorithm

4.1 Jensen’s inequality

Before we proceed to the algorithm, let’s first describe one of the tools used in its derivation.
Jensen’s inequality: for any concave function f and p ∈ [0, 1],

f ((1 − p)x1 + px2 ) ≥ (1 − p)f (x1 ) + pf (x2 ). (13)

The left hand side is the function f evaluated at a point somewhere between x1 and x2 , while
the right hand side is a point on the straight line (a chord) connecting f (x1 ) and f (x2 ). Since a
concave function lies above any chord, this follows straightforwardly from the definition of concave
functions. (For convex functions the inequality is reversed!)
In our hands we will use the concave function f(x) = log(x), in which case we can think of Jensen's
inequality as equivalent to the statement that "the log of the average is greater than or equal to
the average of the logs".
The inequality can be extended to any continuous probability distribution p(x) and implies that:
f(∫ p(x) g(x) dx) ≥ ∫ p(x) f(g(x)) dx    (14)

for any concave f (x), or in our case:


log ∫ p(x) g(x) dx ≥ ∫ p(x) log g(x) dx.    (15)
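
A quick numerical sanity check of this statement, using an arbitrary made-up discrete distribution p and positive values g:

import numpy as np

g = np.array([0.2, 1.0, 5.0])   # arbitrary positive values g(x)
w = np.array([0.3, 0.5, 0.2])   # a probability distribution p(x)

lhs = np.log(np.sum(w * g))     # log of the average
rhs = np.sum(w * np.log(g))     # average of the logs
assert lhs >= rhs               # Jensen's inequality for the concave log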

4.2 EM

The expectation-maximization algorithm is an iterative method for finding the maximum likelihood
estimate for a latent variable model. It consists of iterating between two steps (“Expectation step”
and “Maximization step”, or “E-step” and “M-step” for short) until convergence. Both steps
involve maximizing a lower bound on the log-likelihood.
Before deriving this lower bound, recall that p(x|z, θ) p(z|θ) = p(x, z|θ) = p(z|x, θ) p(x|θ). The joint
p(x, z|θ) is a quantity known in the EM literature as the total data likelihood.
The log-likelihood can be lower-bounded through a straightforward application of Jensen's inequality:

log p(x|θ) = log ∫ p(x, z|θ) dz    (definition of log-likelihood)    (16)

           = log ∫ q(z|φ) [p(x, z|θ) / q(z|φ)] dz    (multiply and divide by q)    (17)

           ≥ ∫ q(z|φ) log[p(x, z|θ) / q(z|φ)] dz    (apply Jensen)    (18)

           ≜ F(φ, θ)    (negative free energy)    (19)

Here q(z|φ) is an arbitrary distribution over the latent z, with parameters φ. The quantity we have
obtained in (eq. 18) is known as the negative free energy (NFE), F(φ, θ).
We will now write the negative free energy in two different forms. First:
F(φ, θ) = ∫ q(z|φ) log[p(x, z|θ) / q(z|φ)] dz    (20)

        = ∫ q(z|φ) log[p(x|θ) p(z|x, θ) / q(z|φ)] dz    (21)

        = ∫ q(z|φ) log p(x|θ) dz + ∫ q(z|φ) log[p(z|x, θ) / q(z|φ)] dz    (22)

        = log p(x|θ) − KL(q(z|φ) || p(z|x, θ))    (23)

This last line makes clear that the NFE is indeed a lower bound on log p(x|θ) because the KL
divergence is always non-negative. Moreover, it shows how to make the bound tight, namely by
setting φ such that the q distribution is equal to the conditional distribution over the latent given
the data and the current parameters θ, i.e., q(z|φ) = p(z|x, θ).
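
A small numerical check of eq. (23) for the binary MoG (where the integral over z is a two-term sum); all parameter values below are invented for illustration:

import numpy as np
from scipy.stats import norm

# One 1-D datapoint x under the binary MoG with made-up parameters
x, p, mu0, s0, mu1, s1 = 0.5, 0.3, -1.0, 1.0, 2.0, 1.5
joint = np.array([(1 - p) * norm.pdf(x, mu0, s0),    # p(x, z=0 | θ)
                  p * norm.pdf(x, mu1, s1)])         # p(x, z=1 | θ)
log_px = np.log(joint.sum())                         # log p(x|θ)
post = joint / joint.sum()                           # p(z|x, θ)

for q in [np.array([0.9, 0.1]), np.array([0.5, 0.5]), post]:
    F = np.sum(q * np.log(joint / q))                # NFE, eq. (20), with a sum over z
    KL = np.sum(q * np.log(q / post))                # KL(q || p(z|x, θ))
    assert np.isclose(F, log_px - KL)                # eq. (23)
    assert F <= log_px + 1e-12                       # lower bound; tight when q = post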
A second way to write the NFE that will prove useful is:
F(φ, θ) = ∫ q(z|φ) log[p(x, z|θ) / q(z|φ)] dz    (24)

        = ∫ q(z|φ) log p(x, z|θ) dz − ∫ q(z|φ) log q(z|φ) dz.    (25)

Here we observe that the second term is independent of θ. We can therefore maximize the NFE
for θ by simply maximizing the first term.
We are now ready to define the two steps of the EM algorithm:

• E-step: Update φ by setting q(z|φ) = p(z|x, θ) (eq. 23), with θ held fixed.
• M-step: Update θ by maximizing the expected total data likelihood, ∫ q(z|φ) log p(x, z|θ) dz
(eq. 25), with φ held fixed.

Note that the lower bound on the log-likelihood will be tight after each E-step.
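
Putting this together for the binary MoG example, a minimal EM sketch for 1-D data might look as follows; the closed-form M-step updates are the standard responsibility-weighted means and variances for a Gaussian mixture (not derived in these notes), and the initialization is arbitrary:

import numpy as np
from scipy.stats import norm

def em_binary_mog(x, n_iter=100):
    """EM for the 1-D binary mixture of Gaussians (a sketch; initialization is arbitrary)."""
    # Initialize θ = {p, mu0, s0, mu1, s1}
    p, mu0, s0, mu1, s1 = 0.5, np.min(x), np.std(x), np.max(x), np.std(x)
    for _ in range(n_iter):
        # E-step: set q(z) = p(z|x, θ) via eqs. (8)-(9), the "responsibilities"
        r0 = (1 - p) * norm.pdf(x, mu0, s0)
        r1 = p * norm.pdf(x, mu1, s1)
        w = r1 / (r0 + r1)                  # w_i = p(z_i = 1 | x_i, θ)
        # M-step: maximize the expected total data likelihood over θ (closed form here)
        p = np.mean(w)
        mu0 = np.sum((1 - w) * x) / np.sum(1 - w)
        mu1 = np.sum(w * x) / np.sum(w)
        s0 = np.sqrt(np.sum((1 - w) * (x - mu0) ** 2) / np.sum(1 - w))
        s1 = np.sqrt(np.sum(w * (x - mu1) ** 2) / np.sum(w))
    return p, mu0, s0, mu1, s1

Each iteration of this loop can only increase (or leave unchanged) the log-likelihood in eq. (12), since the E-step makes the bound tight and the M-step then maximizes the bound over θ.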
