Chapter 13: Linear Factor Models
Jen-Tzung Chien
Linear Factor Models Deep Learning
Outline
• Introduction
• Probabilistic PCA and Factor Analysis
• Independent Component Analysis (ICA)
• Slow Feature Analysis
• Sparse Coding
• Manifold Interpretation of PCA
• A probabilistic model of the data can, in principle, use probabilistic inference to predict any of the variables in its environment given any of the other variables
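Concretely, a linear factor model is defined by a simple data-generating process: sample the explanatory factors h from a factorial prior, then obtain the observations by a noisy linear transformation:

h ∼ p(h), where p(h) = ∏_i p(h_i)
x = W h + b + noise

where the noise is typically Gaussian and diagonal (independent across the dimensions of x).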
Graphical Model
Probabilistic PCA and Factor Analysis
Factor Analysis
• In factor analysis (Bartholomew, 1987; Basilevsky, 1994), the latent variable
prior is just the unit variance Gaussian
h ∼ N (h; 0, I)
• The role of the latent variables is thus to capture the dependencies between the different observed variables x_i. Marginalizing out h, and letting ψ denote the diagonal covariance of the per-variable noise,
x ∼ N(x; b, W W^⊤ + ψ)
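As a sanity check, the marginal covariance can be verified numerically by ancestral sampling; all dimensions and parameter values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, d_x, n = 2, 4, 200_000

W = rng.normal(size=(d_x, d_h))          # factor loadings
b = rng.normal(size=d_x)                 # bias (mean of x)
psi = np.array([0.1, 0.2, 0.3, 0.4])     # diagonal noise variances

# Ancestral sampling: h ~ N(0, I), then x = W h + b + per-variable noise.
h = rng.normal(size=(n, d_h))
x = h @ W.T + b + rng.normal(size=(n, d_x)) * np.sqrt(psi)

# The empirical covariance of x should approach W W^T + diag(psi).
C_model = W @ W.T + np.diag(psi)
C_emp = np.cov(x, rowvar=False)
print(np.max(np.abs(C_emp - C_model)))   # small
```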
Probabilistic PCA
• This probabilistic PCA model takes advantage of the observation that most variations in the data can be captured by the latent variables h, up to some small residual reconstruction error σ². It is factor analysis with the noise variances tied, ψ = σ²I, so that
x ∼ N(x; b, W W^⊤ + σ²I)
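The maximum-likelihood parameters of probabilistic PCA have a closed form in terms of the eigendecomposition of the covariance (Tipping and Bishop, 1999): σ² is the mean of the discarded eigenvalues, and W spans the top eigenvectors. A minimal sketch, assuming the covariance is known exactly (the numbers are made up for illustration):

```python
import numpy as np

# Build a synthetic "true" covariance C = W W^T + sigma2 * I.
d_x, d_h = 5, 2
rng = np.random.default_rng(1)
W_true = rng.normal(size=(d_x, d_h))
sigma2_true = 0.25
C = W_true @ W_true.T + sigma2_true * np.eye(d_x)

# ML solution: eigendecompose C; sigma^2 is the mean of the discarded
# eigenvalues, W spans the top-d_h eigenvectors (scaled appropriately).
eigvals, eigvecs = np.linalg.eigh(C)              # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
sigma2_ml = eigvals[d_h:].mean()
W_ml = eigvecs[:, :d_h] * np.sqrt(eigvals[:d_h] - sigma2_ml)

print(round(sigma2_ml, 6))                        # ≈ 0.25
print(np.allclose(W_ml @ W_ml.T + sigma2_ml * np.eye(d_x), C))
```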
Independent Component Analysis (ICA)
• Many variants of ICA exist. The one most similar to the other generative models described here (Pham et al., 1992) trains a fully parametric generative model
• The prior distribution over the underlying factors, p(h), must be fixed ahead of time by the user
Motivation of ICA
Non-Gaussian
• In the maximum likelihood approach, the user explicitly specifies the prior p(h), which must be non-Gaussian: with an independent Gaussian prior, W would only be identifiable up to rotation
• A typical choice is p(h_i) = (d/dh_i) σ(h_i), the derivative of the logistic sigmoid, which has heavier tails than the Gaussian
Variants of ICA
• Some add some noise in the generation of x rather than using a deterministic
decoder
• Most do not use the maximum likelihood criterion, but instead aim to make the elements of h = W^{-1}x independent from each other
• Many variants of ICA only know how to transform between x and h, but do not have any way of representing p(h), and thus do not define a distribution p(x)
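A minimal sketch of one such non-maximum-likelihood estimator, the FastICA fixed-point iteration with a tanh nonlinearity; the two sources and the mixing matrix below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
t = np.linspace(0, 8, n)

# Two independent non-Gaussian sources, mixed linearly.
s = np.vstack([np.sign(np.sin(3 * t)),      # square wave (sub-Gaussian)
               rng.laplace(size=n)])        # heavy-tailed (super-Gaussian)
A = np.array([[1.0, 0.5], [0.5, 1.0]])      # mixing matrix
x = A @ s                                   # observed mixtures

# Whiten: zero mean, identity covariance.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = E @ np.diag(d ** -0.5) @ E.T @ x

# FastICA fixed-point updates with symmetric decorrelation of W.
W = rng.normal(size=(2, 2))
for _ in range(200):
    g = np.tanh(W @ z)
    W = g @ z.T / n - np.diag((1 - g ** 2).mean(axis=1)) @ W
    u, _, vt = np.linalg.svd(W)
    W = u @ vt                              # re-orthogonalize
h = W @ z                                   # recovered sources

# Recovered components match the true sources up to sign/permutation:
corr = np.abs(np.corrcoef(np.vstack([h, s]))[:2, 2:])
print(corr.max(axis=1))                     # both near 1
```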
Generalization of ICA
Slow Feature Analysis
• Slow feature analysis (SFA) is a linear factor model that uses information from
time signals to learn invariant features (Wiskott and Sejnowski, 2002)
• The idea is that the important characteristics of scenes change very slowly
compared to the individual measurements that make up a description of a
scene
• The slowness principle predates slow feature analysis and has been applied to
a wide variety of models (Hinton, 1989; Földiák, 1989; Mobahi et al., 2009;
Bergstra and Bengio, 2009)
Objective Function
• The SFA algorithm (Wiskott and Sejnowski, 2002) consists of defining f (x; θ)
to be a linear transformation, and solving the optimization problem
min_θ E_t [ (f(x^(t+1))_i − f(x^(t))_i)² ]

subject to the constraints

E_t [ f(x^(t))_i ] = 0
E_t [ f(x^(t))_i² ] = 1
• The constraint that the learned feature have zero mean is necessary to make
the problem have a unique solution
• The constraint that the features have unit variance is necessary to prevent the
pathological solution where all features collapse to 0
∀ i < j, E_t [ f(x^(t))_i f(x^(t))_j ] = 0
– this specifies that the learned features must be linearly decorrelated from
each other
– without this constraint, all of the learned features would simply capture the
one slowest signal
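For a linear f, the constrained problem reduces to two eigendecompositions: whiten the data (which enforces the zero-mean, unit-variance, and decorrelation constraints), then take the directions along which the whitened signal changes most slowly. A minimal sketch on a made-up two-source signal:

```python
import numpy as np

n = 2000
t = np.linspace(0, 2 * np.pi, n)

# Latent sources: one slow, one fast; observe a linear mixture.
slow, fast = np.sin(t), np.sin(29 * t)
x = np.vstack([slow + fast, slow - 0.5 * fast]).T    # shape (n, 2)

# 1) Center and whiten, so any rotation of z satisfies the constraints.
x = x - x.mean(axis=0)
d, E = np.linalg.eigh(np.cov(x, rowvar=False))
z = x @ E @ np.diag(d ** -0.5)

# 2) Minimize E[(f(x^(t+1)) - f(x^(t)))^2]: eigenvectors of the
#    covariance of temporal differences, slowest (smallest) first.
dz = np.diff(z, axis=0)
w_eigvals, w_eigvecs = np.linalg.eigh(np.cov(dz, rowvar=False))
features = z @ w_eigvecs            # columns ordered slow -> fast

# The slowest feature should track the slow source (up to sign).
r = np.corrcoef(features[:, 0], slow)[0, 1]
print(abs(r))                       # close to 1
```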
Conclusion of SFA
• Deep SFA has also been used to learn features for object recognition and pose
estimation (Franzius et al., 2008)
Sparse Coding
• Sparse coding (Olshausen and Field, 1996) is a linear factor model that has
been heavily studied as an unsupervised feature learning and feature extraction
mechanism
• Sparse coding models typically assume that the observations are generated from the linear factors with isotropic Gaussian noise of precision β:
p(x | h) = N(x; W h + b, (1/β) I)
• The distribution p(h) is chosen to be one with sharp peaks near 0 (Olshausen
and Field, 1996)
– factorized Laplace distributions
– Cauchy distributions
– factorized Student-t distributions
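For example, with a factorized Laplace prior

p(h_i) = Laplace(h_i; 0, 2/λ) = (λ/4) e^{−(λ/2)|h_i|}

MAP inference of the code reduces to an L1-regularized least-squares problem:

h* = arg max_h p(h | x) = arg min_h λ ||h||_1 + β ||x − W h||_2^2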
• The training alternates between encoding the data and training the decoder
to better reconstruct the data given the encoding
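The encoding half of that alternation can be sketched with ISTA (iterative shrinkage-thresholding), one standard optimizer for the L1-regularized inference problem; the dictionary and signal below are made up for illustration, and the decoder update is omitted:

```python
import numpy as np

def ista(x, W, lam, n_steps=200):
    """MAP inference for one example: minimize
    0.5 * ||x - W h||^2 + lam * ||h||_1 by gradient steps
    followed by soft-thresholding."""
    L = np.linalg.norm(W, 2) ** 2            # Lipschitz constant of the gradient
    h = np.zeros(W.shape[1])
    for _ in range(n_steps):
        grad = W.T @ (W @ h - x)
        h = h - grad / L
        h = np.sign(h) * np.maximum(np.abs(h) - lam / L, 0.0)
    return h

rng = np.random.default_rng(0)
n_x, n_h = 8, 16                             # overcomplete code
W = rng.normal(size=(n_x, n_h))
W /= np.linalg.norm(W, axis=0)               # unit-norm dictionary atoms

# A signal built from 3 atoms should receive a sparse code.
h_true = np.zeros(n_h)
h_true[[1, 5, 11]] = [1.0, -2.0, 1.5]
x = W @ h_true

h = ista(x, W, lam=0.05)
print(np.sum(np.abs(h) > 1e-3))              # few active units
print(np.linalg.norm(x - W @ h))             # small reconstruction error
```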
Advantages
• The sparse coding approach combined with the use of the non-parametric
encoder can in principle minimize the combination of reconstruction error and
log-prior better than any specific parametric encoder
• For the vast majority of formulations of sparse coding models, where the
inference problem is convex, the optimization procedure will always find the
optimal code
Disadvantages
• Inferring the code h requires running an iterative optimization algorithm, so encoding is computationally expensive
• The non-parametric encoder is also not straightforward to back-propagate through
Poor Samples
• Even when a sparse coding model reconstructs data well, its samples tend to be poor: each individual feature may be learned well, but the factorial prior generates random subsets of features that rarely co-occur in real data
Manifold Interpretation of PCA
• Linear factor models including PCA and factor analysis can be interpreted as
learning a manifold (Hinton et al., 1997)
• The encoder computes a low-dimensional code h = f(x) = W^⊤(x − b), and the decoder reconstructs
x̂ = g(h) = b + W h
Optimization
• The choices of linear encoder and decoder that minimize reconstruction error
E[||x − x̂||²]
– correspond to V = W
– µ = b = E[x]
– columns of W form an orthonormal basis which spans the same subspace as the principal eigenvectors of the covariance matrix
C = E[(x − µ)(x − µ)^⊤]
Discussions
• Keeping d of the D components, the minimum reconstruction error is the sum of the discarded eigenvalues of C (with λ_1 ≥ … ≥ λ_D):

min E[||x − x̂||²] = Σ_{i=d+1}^{D} λ_i

• One can also show that the above solution can be obtained by maximizing the variances of the elements of h, under orthogonal W
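This identity is easy to verify numerically; the sketch below builds synthetic data with known variances, projects onto the top d principal directions, and compares the mean squared reconstruction error with the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, d = 100_000, 5, 2

# Synthetic data with distinct variances along random orthogonal directions.
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))
stds = np.array([3.0, 2.0, 1.0, 0.5, 0.2])
x = (rng.normal(size=(n, D)) * stds) @ Q.T

mu = x.mean(axis=0)
C = np.cov(x - mu, rowvar=False)
lam, U = np.linalg.eigh(C)                   # ascending eigenvalues
W = U[:, ::-1][:, :d]                        # top-d principal directions

# Encode / decode with the optimal linear maps (V = W, b = mu).
h = (x - mu) @ W
x_hat = mu + h @ W.T

err = np.mean(np.sum((x - x_hat) ** 2, axis=1))
print(err, lam[:D - d].sum())                # approximately equal
```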
References