
Probabilistic Models with Latent Variables

Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya
Density Estimation Problem
• Learning from unlabeled data
• Unsupervised learning, density estimation

• Empirical distribution typically has multiple modes

Density Estimation Problem

[Figures omitted: examples of multimodal empirical distributions.
Sources: http://yulearning.blogspot.co.uk and http://courses.ee.sun.ac.za/Pattern_Recognition_813]

Density Estimation Problem
• Convex combination of unimodal pdfs gives a multimodal pdf:
  p(x) = Σ_k π_k p_k(x),  where π_k ≥ 0 and Σ_k π_k = 1   (numerical sketch below)

• Physical interpretation
• Sub-populations
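A minimal numerical sketch of this idea (all parameter values are illustrative, not from the slides): a convex combination of two unimodal Gaussian pdfs yields a density with two modes.

import numpy as np
from scipy.stats import norm

# Illustrative mixing weights (non-negative, summing to 1) and component parameters
pis = np.array([0.3, 0.7])
mus = np.array([-2.0, 3.0])
sigmas = np.array([0.5, 1.0])

xs = np.linspace(-5.0, 7.0, 200)
# p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2): with well-separated components,
# the convex combination has one mode per component
p = sum(pi * norm.pdf(xs, mu, s) for pi, mu, s in zip(pis, mus, sigmas))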

Latent Variables
• Introduce a new latent variable Z_i ∈ {1, …, K} for each observation X_i
• Latent / hidden: not observed in the data

• Probabilistic interpretation
• Mixing weights: π_k = p(Z_i = k)
• Mixture densities: p_k(x) = p(X_i = x | Z_i = k)

Generative Mixture Model
• For i = 1, …, N: draw Z_i ~ Categorical(π), then draw X_i | Z_i = k from p_k(x)
  (sampling sketch below)

• Marginalizing out Z_i recovers the mixture distribution: p(x) = Σ_k π_k p_k(x)

[Plate notation: node Z_i with an arrow to node X_i, inside a plate replicated N times]
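A short sketch of this sampling process with hypothetical 1-D Gaussian components (values chosen only for illustration): draw Z_i from the mixing weights, then X_i from the selected component.

import numpy as np

rng = np.random.default_rng(0)
pis = np.array([0.3, 0.7])          # mixing weights p(Z_i = k)
mus = np.array([-2.0, 3.0])         # illustrative component means
sigmas = np.array([0.5, 1.0])       # illustrative component standard deviations

N = 1000
z = rng.choice(len(pis), size=N, p=pis)   # Z_i ~ Categorical(pi)
x = rng.normal(mus[z], sigmas[z])         # X_i | Z_i = k ~ N(mu_k, sigma_k^2)
# A histogram of x approximates the mixture density sum_k pi_k N(x | mu_k, sigma_k^2)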

Tasks in a Mixture Model
• Inference: compute the posterior over the latent variables, p(Z_i = k | X_i = x_i)

• Parameter estimation
• Find parameters that, e.g., maximize the likelihood
• Does not decouple according to components (classes), since the Z_i are unobserved
• Non-convex; many local optima

Example: Gaussian Mixture Model
• Model: for i = 1, …, N,
  Z_i ~ Categorical(π),  X_i | Z_i = k ~ N(μ_k, Σ_k)

• Inference: p(Z_i = k | x_i) ∝ π_k N(x_i | μ_k, Σ_k)

• Soft-max function of log π_k + log N(x_i | μ_k, Σ_k)   (sketch below)
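A sketch of this inference step under the notation assumed above: the responsibility r_ik = p(Z_i = k | x_i) is a soft-max over log π_k + log N(x_i | μ_k, Σ_k), computed stably with log-sum-exp.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, covs):
    """Posterior p(Z_i = k | x_i) for each row x_i of X and each component k."""
    # log p(x_i, Z_i = k) = log pi_k + log N(x_i | mu_k, Sigma_k)
    log_joint = np.stack(
        [np.log(pis[k]) + multivariate_normal.logpdf(X, mus[k], covs[k])
         for k in range(len(pis))], axis=1)                 # shape (N, K)
    # soft-max across components: subtract log p(x_i) = logsumexp_k log p(x_i, Z_i = k)
    return np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))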

Example: Gaussian Mixture Model
• Log-likelihood: ℓ(θ) = Σ_i log Σ_k π_k N(x_i | μ_k, Σ_k)   (sketch below)
• Which training instance comes from which component?

• No closed-form solution for maximizing ℓ(θ)

• Possibility 1: gradient descent, etc.

• Possibility 2: Expectation Maximization
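A small sketch of this log-likelihood for a 1-D GMM (notation assumed as above). The sum over components sits inside the logarithm, which couples the parameters and rules out a closed-form maximizer.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def gmm_loglik(x, pis, mus, sigmas):
    # log_joint[i, k] = log pi_k + log N(x_i | mu_k, sigma_k^2)
    log_joint = np.log(pis) + norm.logpdf(x[:, None], mus, sigmas)
    # marginalize the component inside the log, then sum over data points
    return logsumexp(log_joint, axis=1).sum()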

Expectation Maximization Algorithm
• Observation: if the values of the Z_i were known, the likelihood would be easy to maximize

• Key idea: iterative updates
• Given parameter estimates, "infer" all latent variables
• Given inferred variables, maximize the likelihood with respect to the parameters

• Questions
• Does this converge?
• What does this maximize?

Expectation Maximization Algorithm
• Complete log-likelihood: ℓ_c(θ) = Σ_i log p(x_i, z_i | θ)

• Problem: the z_i are not known

• Possible solution: replace ℓ_c(θ) with its conditional expectation

• Expected complete log-likelihood: Q(θ, θ^old) = E[ℓ_c(θ)],
  with the expectation taken w.r.t. p(z | x, θ^old), where θ^old are the current parameters

Expectation Maximization Algorithm
• For a mixture model:
  Q(θ, θ^old) = Σ_i Σ_k r_ik [ log π_k + log p(x_i | θ_k) ]

  where r_ik = p(Z_i = k | x_i, θ^old) are the responsibilities   (sketch below)

• Compare with the likelihood of a generative classifier: the same form, with soft responsibilities r_ik in place of observed labels
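A sketch (assumed 1-D Gaussian components and the notation above) of the expected complete log-likelihood, the quantity the M step will maximize given fixed responsibilities.

import numpy as np
from scipy.stats import norm

def expected_complete_loglik(x, r, pis, mus, sigmas):
    """Q(theta, theta_old) = sum_i sum_k r_ik [log pi_k + log N(x_i | mu_k, sigma_k^2)],
    where r (shape N x K) holds responsibilities computed under theta_old."""
    # log_joint[i, k] = log pi_k + log N(x_i | mu_k, sigma_k^2)
    log_joint = np.log(pis) + norm.logpdf(x[:, None], mus, sigmas)
    return np.sum(r * log_joint)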

Expectation Maximization Algorithm
• Expectation step
• Update the responsibilities r_ik based on the current parameters

• Maximization step
• Maximize the expected complete log-likelihood Q with respect to the parameters

• Overall algorithm
• Initialize all latent variables
• Iterate until convergence:
  • M step
  • E step

Example: EM for GMM
• E step: remains the same for all mixture models: r_ik ∝ π_k N(x_i | μ_k, Σ_k)

• M step: π_k = N_k / N,  μ_k = (1/N_k) Σ_i r_ik x_i,  Σ_k = (1/N_k) Σ_i r_ik (x_i − μ_k)(x_i − μ_k)ᵀ,  with N_k = Σ_i r_ik
  (full EM sketch below)

• Compare with the generative classifier: the same estimates, but weighted by soft responsibilities instead of observed labels
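A compact sketch of the full EM loop for a GMM with full covariances (assumed notation, not the slide's exact equations; the small ridge added to the covariances is only for numerical stability).

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, K, replace=False)]                 # random data points as initial means
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(D)] * K)    # shared initial covariance
    for _ in range(n_iter):
        # E step: r_ik proportional to pi_k N(x_i | mu_k, Sigma_k)
        log_r = np.stack([np.log(pis[k]) + multivariate_normal.logpdf(X, mus[k], covs[k])
                          for k in range(K)], axis=1)
        log_r -= logsumexp(log_r, axis=1, keepdims=True)
        r = np.exp(log_r)
        # M step: weighted MLE with soft counts N_k = sum_i r_ik
        Nk = r.sum(axis=0)
        pis = Nk / N
        mus = (r.T @ X) / Nk[:, None]
        for k in range(K):
            Xc = X - mus[k]
            covs[k] = (r[:, k, None] * Xc).T @ Xc / Nk[k] + 1e-6 * np.eye(D)
    return pis, mus, covs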

Analysis of EM Algorithm
• Expected complete LL is a lower bound on LL
• EM iteratively maximizes this lower bound

• Converges to a local maximum of the loglikelihood

Bayesian / MAP Estimation
• Maximum-likelihood EM can overfit (e.g., a Gaussian component collapsing onto a single data point)
• Possible to perform MAP estimation instead of MLE in the M step

• EM is partially Bayesian
• Posterior distribution over latent variables
• Point estimate over parameters

• Fully Bayesian approach is called Variational Bayes

(Lloyd’s) K Means Algorithm
• Hard EM for the Gaussian mixture model
• Point estimate of parameters (as usual)
• Point estimate of latent variables: z_i = argmin_k ||x_i − μ_k||²
• Spherical Gaussian mixture components (shared covariance σ²I)

• Most popular "hard" clustering algorithm   (sketch below)
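A sketch of Lloyd's algorithm viewed as hard EM: the E step assigns each point to its nearest mean (a point estimate of Z_i instead of a posterior), and the M step recomputes each mean from its assigned points. Initialization and empty-cluster handling here are assumptions, not from the slides.

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), K, replace=False)]   # initialize means at random data points
    for _ in range(n_iter):
        # hard E step: assign each x_i to the closest mean
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        z = d.argmin(axis=1)
        # hard M step: each mean becomes the average of its assigned points
        means = np.array([X[z == k].mean(axis=0) if np.any(z == k) else means[k]
                          for k in range(K)])
    return means, z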

K Means Problem
• Given data x_1, …, x_N, find k "means" μ_1, …, μ_k and data assignments z_1, …, z_N that minimize
  Σ_i Σ_j z_ij ||x_i − μ_j||²

• Note: each z_i is a k-dimensional binary (one-hot) indicator vector

Model selection: Choosing K for GMM
• Cross validation
• Plot the likelihood on the training set and the validation set for increasing values of k   (illustrative sketch below)
• Likelihood on the training set keeps improving
• Likelihood on the validation set drops after the "optimal" k

• Does not work for k-means! Why?
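An illustrative sketch of this procedure, not code from the slides: it uses scikit-learn's GaussianMixture as a stand-in EM estimator on synthetic two-component data, and compares average log-likelihood on training and held-out data across values of K.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic 1-D data from two well-separated components
X = np.concatenate([rng.normal(-2.0, 0.5, (500, 1)), rng.normal(3.0, 1.0, (500, 1))])
rng.shuffle(X)
X_train, X_val = X[:800], X[800:]

for K in range(1, 6):
    gmm = GaussianMixture(n_components=K, random_state=0).fit(X_train)
    # average log-likelihood per point: training keeps rising, validation peaks near the true K
    print(K, gmm.score(X_train), gmm.score(X_val))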

Principal Component Analysis: Motivation
• Dimensionality reduction
• Reduces the number of parameters to estimate
• Data often resides in a much lower dimension, e.g., on a line in a 3D space
• Provides "understanding"

• Mixture models are very restricted
• Latent variables restricted to a small discrete set
• Can we "relax" the latent variable to be continuous?

Classical PCA: Motivation
• Revisit K-means as matrix factorization: minimize the reconstruction error ||X − Z Wᵀ||²_F

• W: matrix whose columns are the cluster means

• Z: matrix whose rows are the (one-hot) cluster-membership vectors

• How can we relax Z and W?

Classical PCA: Problem
• Relax both: minimize the reconstruction error ||X − Z Wᵀ||²_F with

• X: N × D data matrix (one row per observation)
• Arbitrary real-valued Z of size N × L
• Orthonormal W of size D × L

Classical PCA: Optimal Solution
• Empirical covariance matrix Σ̂ = (1/N) Σ_i x_i x_iᵀ of the scaled and centered data
• Optimal Ŵ contains the L eigenvectors corresponding to the L largest eigenvalues of Σ̂

• Alternative solution via the Singular Value Decomposition (SVD) of the centered data matrix

• W contains the "principal components": the directions that capture the largest variance in the data   (sketch below)
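A sketch of classical PCA via the SVD route (assumed convention: rows of X are data points); the columns of W are the principal components and Z holds the low-dimensional scores.

import numpy as np

def pca(X, L):
    Xc = X - X.mean(axis=0)                    # center the data
    # right singular vectors of Xc are eigenvectors of the empirical covariance Xc.T @ Xc / N
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:L].T                               # D x L orthonormal principal components
    Z = Xc @ W                                 # N x L projections (scores)
    return W, Z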

Probabilistic PCA
• Generative model:
  p(z_i) = N(z_i | 0, I),  p(x_i | z_i) = N(x_i | W z_i + μ, Ψ), with Ψ forced to be diagonal
  (sampling sketch below)

• Latent linear models

• Factor analysis: general diagonal Ψ
• Special case: probabilistic PCA, with Ψ = σ²I
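A sampling sketch of this generative model with illustrative dimensions and parameters (not from the slides), using the probabilistic-PCA choice Ψ = σ²I.

import numpy as np

rng = np.random.default_rng(0)
D, L, N = 3, 1, 500
W = rng.normal(size=(D, L))                # factor loadings
mu = np.zeros(D)
sigma2 = 0.1                               # PPCA: isotropic noise, Psi = sigma^2 I

Z = rng.normal(size=(N, L))                # latent z_i ~ N(0, I)
X = Z @ W.T + mu + rng.normal(scale=np.sqrt(sigma2), size=(N, D))
# Marginally, x_i ~ N(mu, W W^T + sigma^2 I): a low-rank-plus-noise Gaussian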

Visualization of Generative Process

[Figure omitted: generative view of probabilistic PCA. From Bishop, PRML]

Relationship with Gaussian Density
• Marginal density: p(x_i) = N(x_i | μ, W Wᵀ + Ψ)

• Why does Ψ need to be restricted? With an unrestricted (full) Ψ, the noise term alone could model any covariance and the latent variables would add nothing

• Intermediate low-rank parameterization of the Gaussian covariance matrix, between full rank and diagonal
• Compare the number of parameters   (sketch below)
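An illustrative parameter count for a D-dimensional Gaussian covariance (D and L chosen arbitrarily): full vs. diagonal vs. the low-rank-plus-diagonal form W Wᵀ + Ψ used by latent linear models with L factors.

D, L = 100, 5
full = D * (D + 1) // 2        # symmetric full covariance
diag = D                       # diagonal covariance
low_rank = D * L + D           # W (D x L) plus a diagonal Psi
print(full, diag, low_rank)    # 5050 100 600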

EM for PCA: Rod and Springs

[Figure omitted: "rod and springs" physical analogy for EM in PCA. From Bishop, PRML]


Advantages of EM
• Simpler than gradient methods, which need explicit constraints (e.g., mixing weights on the simplex, positive-definite covariances)

• Handles missing data

• Easy path for handling more complex models

• Not always the fastest method

Summary of Latent Variable Models
• Learning from unlabeled data

• Latent variables
• Discrete: Clustering / Mixture models ; GMM
• Continuous: Dimensionality reduction ; PCA

• Summary / “Understanding” of data

• Expectation Maximization Algorithm

