
Deep Learning Tutorial:

Linear Factor Models

Jen-Tzung Chien

Department of Electrical and Computer Engineering


Department of Computer Science
National Chiao Tung University, Hsinchu

September 19, 2016


Outline

• Introduction
• Probabilistic PCA and Factor Analysis
• Independent Component Analysis (ICA)
• Slow Feature Analysis
• Sparse Coding
• Manifold Interpretation of PCA


Outline

• Introduction
• Probabilistic PCA and Factor Analysis
• Independent Component Analysis (ICA)
• Slow Feature Analysis
• Sparse Coding
• Manifold Interpretation of PCA


Model with Latent Variables

• Many of the research frontiers in deep learning involve building a probabilistic model of the input, p_model(x)

• Such a model can, in principle, use probabilistic inference to predict any of the
variables in its environment given any of the other variables

• Many of these models also have latent variables h

• These latent variables provide another means of representing the data


Linear Factor Model

• A linear factor model is defined by the use of a stochastic, linear decoder function that generates x by adding noise to a linear transformation of h

• These models are interesting because they allow us to discover explanatory factors that have a simple joint distribution

• A linear factor model describes the data generation process as follows


– sample the explanatory factors h from a factorial distribution p(h) = ∏_i p(h_i)
– sample the real-valued observable variables given the factors:

  x = W h + b + noise
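A minimal NumPy sketch of this generative process (an illustration, not from the slides; the standard-normal prior, the isotropic Gaussian noise, and all shapes are assumptions made here for concreteness):

import numpy as np

rng = np.random.default_rng(0)
n_factors, n_obs = 3, 5
W = rng.normal(size=(n_obs, n_factors))   # factor loading matrix
b = rng.normal(size=n_obs)                # offset b

# sample the explanatory factors from a factorial prior p(h) = prod_i p(h_i)
h = rng.normal(size=n_factors)            # assumption: p(h_i) = N(0, 1)

# sample the observation given the factors: x = W h + b + noise
noise = 0.1 * rng.normal(size=n_obs)      # assumption: isotropic Gaussian noise
x = W @ h + b + noise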


Graphical Model


Outline

• Introduction
• Probabilistic PCA and Factor Analysis
• Independent Component Analysis (ICA)
• Slow Feature Analysis
• Sparse Coding
• Manifold Interpretation of PCA


Factor Analysis
• In factor analysis (Bartholomew, 1987; Basilevsky, 1994), the latent variable
prior is just the unit variance Gaussian

h ∼ N (h; 0, I)

• The observed variables x_i are assumed to be conditionally independent given h

• The noise is assumed to be drawn from a diagonal covariance Gaussian distribution
– covariance matrix: ψ = diag(σ²), where σ² = [σ_1², σ_2², ..., σ_n²]

• The role of the latent variables is thus to capture the dependencies between the different observed variables x_i

  x ∼ N(x; b, W W⊤ + ψ)
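A short NumPy check of this marginal (a sketch with arbitrary parameters, not from the slides): sampling h ∼ N(0, I) and diagonal Gaussian noise, the empirical covariance of x should approach W W⊤ + ψ.

import numpy as np

rng = np.random.default_rng(1)
n_factors, n_obs, n_samples = 2, 4, 200_000
W = rng.normal(size=(n_obs, n_factors))
b = rng.normal(size=n_obs)
sigma2 = rng.uniform(0.1, 0.5, size=n_obs)             # per-dimension noise variances
psi = np.diag(sigma2)

H = rng.normal(size=(n_samples, n_factors))             # h ~ N(0, I)
noise = rng.normal(size=(n_samples, n_obs)) * np.sqrt(sigma2)
X = H @ W.T + b + noise                                 # x = W h + b + noise

empirical_cov = np.cov(X, rowvar=False)
print(np.abs(empirical_cov - (W @ W.T + psi)).max())    # should be close to 0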


Probabilistic PCA

• In order to cast PCA in a probabilistic framework, we can make a slight modification
– making the conditional variances σ_i² equal to each other

• This yields the marginal distribution: x ∼ N(x; b, W W⊤ + σ²I)

• This probabilistic PCA model takes advantage of the observation that most variations in the data can be captured by the latent variables h, up to some small residual reconstruction error σ²
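As a tooling aside (an assumption about scikit-learn, not something stated in the slides), sklearn's PCA exposes exactly this quantity: after fitting, the attribute noise_variance_ is the maximum-likelihood estimate of the shared residual variance σ² under the probabilistic PCA model.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 10))   # toy data

pca = PCA(n_components=3).fit(X)
print(pca.noise_variance_)   # ML estimate of sigma^2 under probabilistic PCA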


Outline

• Introduction
• Probabilistic PCA and Factor Analysis
• Independent Component Analysis (ICA)
• Slow Feature Analysis
• Sparse Coding
• Manifold Interpretation of PCA


Independent Component Analysis

• ICA is an approach to modeling linear factors that seeks to separate an observed signal into many underlying signals that are scaled and added together to form the observed data

• The variant that is most similar to the other generative models we have
described here is a variant (Pham et al., 1992) that trains a fully parametric
generative model

• The prior distribution over the underlying factors, p(h), must be fixed ahead
of time by the user.

• The model then deterministically generates x = W h.


Motivation of ICA

• The motivation for this approach is that by choosing p(h) to be independent, we can recover underlying factors that are as close as possible to independent; ICA is commonly used to recover low-level signals that have been mixed together

• Each xi is one sensor’s observation of the mixed signals

• Each hi is one estimate of one of the original signals
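A small blind source separation sketch in this spirit, using scikit-learn's FastICA (one particular ICA variant, chosen here purely for convenience; the sources, mixing matrix, and settings are made up for illustration): each column of X plays the role of one sensor observation x_i, and each recovered component estimates one original signal h_i.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
t = np.linspace(0, 8, 2000)
S = np.column_stack([np.sin(2 * t),                # source 1: sinusoid
                     np.sign(np.sin(3 * t)),       # source 2: square wave
                     rng.laplace(size=t.size)])    # source 3: non-Gaussian noise
A = rng.normal(size=(3, 3))                        # unknown mixing matrix
X = S @ A.T                                        # each column: one sensor's mixture

ica = FastICA(n_components=3, random_state=0)
H = ica.fit_transform(X)    # estimated sources (up to permutation and scaling)
A_hat = ica.mixing_         # estimated mixing matrix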


Non-Gaussian

• All variants of ICA require that p(h) be non-Gaussian

• This is because if p(h) is an independent prior with Gaussian components, then W is not identifiable

• In the maximum likelihood approach where the user explicitly specifies the distribution, a typical choice is to use p(h_i) = (d/dh_i) σ(h_i), where σ is the logistic sigmoid
– this density has larger peaks near 0 than the Gaussian distribution
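A quick numerical check of this claim (a sketch, not from the slides; the Gaussian is matched to the logistic distribution's variance π²/3 so that the peak heights are comparable): the derivative of the logistic sigmoid is the logistic density σ(h)(1 − σ(h)), whose peak at 0 exceeds that of the variance-matched Gaussian.

import numpy as np

def logistic_prior(h):
    s = 1.0 / (1.0 + np.exp(-h))   # logistic sigmoid sigma(h)
    return s * (1.0 - s)           # d/dh sigma(h): the standard logistic density

var = np.pi ** 2 / 3.0             # variance of the standard logistic distribution

def gaussian_same_var(h):
    return np.exp(-0.5 * h ** 2 / var) / np.sqrt(2 * np.pi * var)

print(logistic_prior(0.0))         # 0.25
print(gaussian_same_var(0.0))      # ~0.22, a lower peak at 0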


Variants of ICA

• Some add some noise in the generation of x rather than using a deterministic
decoder

• Most do not use the maximum likelihood criterion, but instead aim to make the elements of h = W⁻¹x independent from each other

• Many variants of ICA only know how to transform between x and h, but do
not have any way of representing p(h), and thus do not impose a distribution
over p(x)


Generalization of ICA

• ICA can be generalized to a nonlinear generative model


– Hyvärinen and Pajunen (1999) for the initial work on nonlinear ICA
– its successful use with ensemble learning by Roberts and Everson (2001)
and Lappalainen et al. (2000)

• Another nonlinear extension of ICA is the approach of nonlinear independent components estimation, or NICE (Dinh et al., 2014)

• Another generalization of ICA is to learn groups of features, with statistical dependence allowed within a group but discouraged between groups (Hyvärinen and Hoyer, 1999; Hyvärinen et al., 2001b)


Outline

• Introduction
• Probabilistic PCA and Factor Analysis
• Independent Component Analysis (ICA)
• Slow Feature Analysis
• Sparse Coding
• Manifold Interpretation of PCA


Slow Feature Analysis

• Slow feature analysis (SFA) is a linear factor model that uses information from
time signals to learn invariant features (Wiskott and Sejnowski, 2002)

• The idea is that the important characteristics of scenes change very slowly
compared to the individual measurements that make up a description of a
scene

• The slowness principle predates slow feature analysis and has been applied to
a wide variety of models (Hinton, 1989; Földiák, 1989; Mobahi et al., 2009;
Bergstra and Bengio, 2009)


Model with Slow Principle

• The slowness principle may be introduced by adding a term to the cost function of the form

  λ Σ_t L(f(x_{t+1}), f(x_t))

– λ is a hyperparameter determining the strength of the slowness regularization term
– t is the index into a time sequence of examples
– f is the feature extractor to be regularized
– L is a loss function measuring the distance between f(x_{t+1}) and f(x_t)
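A minimal sketch of this regularizer (illustrative assumptions: f is a linear feature extractor and L is the squared Euclidean distance):

import numpy as np

def slowness_penalty(X_seq, W, lam=1.0):
    # X_seq: (T, n_inputs) time sequence of examples; W: (n_inputs, n_features)
    F = X_seq @ W                                     # f(x_t) for every time step
    diffs = F[1:] - F[:-1]                            # f(x_{t+1}) - f(x_t)
    return lam * np.mean(np.sum(diffs ** 2, axis=1))  # L: squared Euclidean distance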


Objective Function

• The SFA algorithm (Wiskott and Sejnowski, 2002) consists of defining f(x; θ) to be a linear transformation, and solving the optimization problem

  min_θ E_t[(f(x^(t+1))_i − f(x^(t))_i)²]

– subject to the constraints

  E_t[f(x^(t))_i] = 0
  E_t[f(x^(t))_i²] = 1


Discussion about Constraints

• The constraint that the learned feature have zero mean is necessary to make
the problem have a unique solution

• The constraint that the features have unit variance is necessary to prevent the
pathological solution where all features collapse to 0

• To learn multiple features, we must also add the constraint

  ∀ i < j, E_t[f(x^(t))_i f(x^(t))_j] = 0

– this specifies that the learned features must be linearly decorrelated from
each other
– without this constraint, all of the learned features would simply capture the
one slowest signal
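Putting the objective and the three constraints together, a compact linear-SFA sketch (my own illustrative implementation, assuming a full-rank input covariance): whitening the centered data enforces the zero-mean, unit-variance, and decorrelation constraints, and the slowest features are the eigenvectors of the covariance of temporal differences with the smallest eigenvalues.

import numpy as np

def linear_sfa(X_seq, n_features):
    # X_seq: (T, n_inputs) time sequence; returns (W, mean) with f(x) = W.T @ (x - mean)
    mean = X_seq.mean(axis=0)
    Xc = X_seq - mean                          # enforce E_t[f(x^(t))_i] = 0
    cov = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    S = evecs / np.sqrt(evals)                 # whitening: unit variance, decorrelated
    Z = Xc @ S
    dZ = Z[1:] - Z[:-1]                        # temporal differences
    devals, devecs = np.linalg.eigh(np.cov(dZ, rowvar=False))  # ascending eigenvalues
    W = S @ devecs[:, :n_features]             # slowest directions first
    return W, mean

# usage sketch: W, mu = linear_sfa(X_seq, 2); features = (X_seq - mu) @ W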


Conclusion of SFA

• SFA is typically used to learn nonlinear features by applying a nonlinear basis expansion to x before running SFA

• A major advantage of SFA is that it is possible to theoretically predict which features SFA will learn, even in the deep, nonlinear setting

• Deep SFA has also been used to learn features for object recognition and pose
estimation (Franzius et al., 2008)


Outline

• Introduction
• Probabilistic PCA and Factor Analysis
• Independent Component Analysis (ICA)
• Slow Feature Analysis
• Sparse Coding
• Manifold Interpretation of PCA


Sparse Coding

• Sparse coding (Olshausen and Field, 1996) is a linear factor model that has
been heavily studied as an unsupervised feature learning and feature extraction
mechanism

• Sparse coding models typically assume that the linear factors have Gaussian noise with isotropic precision β

  p(x|h) = N(x; W h + b, (1/β) I)

• The distribution p(h) is chosen to be one with sharp peaks near 0 (Olshausen
and Field, 1996)
– factorized Laplace distributions
– Cauchy distributions
– factorized Student-t distributions


Encoder and Decoder

• Training sparse coding with maximum likelihood is intractable

• The training alternates between encoding the data and training the decoder
to better reconstruct the data given the encoding

• The encoder is an optimization algorithm that solves an optimization problem in which we seek the single most likely code value

  h* = arg max_h p(h|x)
     = arg max_h log p(h|x)
     = arg min_h λ||h||₁ + β||x − W h||₂²
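A sketch of this encoding step for a fixed dictionary W, using scikit-learn's Lasso to solve the equivalent L1-regularized least-squares problem (illustrative only: the dictionary and data are random, and the exact correspondence between λ, β and Lasso's alpha scaling is glossed over):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n_obs, n_factors = 20, 50
W = rng.normal(size=(n_obs, n_factors))        # fixed (already trained) dictionary
h_true = np.zeros(n_factors)
h_true[rng.choice(n_factors, 3, replace=False)] = rng.normal(size=3)
x = W @ h_true + 0.01 * rng.normal(size=n_obs)

# encoder: solve min_h lambda*||h||_1 + beta*||x - W h||_2^2 (up to Lasso's own scaling)
enc = Lasso(alpha=0.01, fit_intercept=False, max_iter=10_000)
h_star = enc.fit(W, x).coef_                   # most likely code h* for this x
print(np.count_nonzero(h_star))                # only a few active factors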


Advantages

• The sparse coding approach combined with the use of the non-parametric
encoder can in principle minimize the combination of reconstruction error and
log-prior better than any specific parametric encoder

• Another advantage is that there is no generalization error to the encoder

• For the vast majority of formulations of sparse coding models, where the
inference problem is convex, the optimization procedure will always find the
optimal code


Disadvantages

• The primary disadvantage of the non-parametric encoder is that it requires greater time to compute h given x

• It is not straightforward to back-propagate through the non-parametric encoder
– makes it difficult to pretrain a sparse coding model with an unsupervised criterion
– and then fine-tune it using a supervised criterion


Poor Samples


Outline

• Introduction
• Probabilistic PCA and Factor Analysis
• Independent Component Analysis (ICA)
• Slow Feature Analysis
• Sparse Coding
• Manifold Interpretation of PCA


Manifold Interpretation of PCA

• Linear factor models including PCA and factor analysis can be interpreted as
learning a manifold (Hinton et al., 1997)

• Probabilistic PCA defines a thin pancake-shaped region of high probability; PCA can be interpreted as aligning this pancake with a linear manifold in a higher-dimensional space


Encoder and Decoder

• Let the encoder be

  h = f(x) = W⊤(x − µ)

– The encoder computes a low-dimensional representation h of x

• We have a decoder computing the reconstruction

x̂ = g(h) = b + W h


Optimization

• The choices of linear encoder and decoder that minimize reconstruction error

  E[||x − x̂||²]

– correspond to using the same matrix W in both the encoder and the decoder
– µ = b = E[x]
– columns of W form an orthonormal basis which spans the same subspace as the principal eigenvectors of the covariance matrix

  C = E[(x − µ)(x − µ)⊤]


Discussions

• One can also show that the eigenvalue λ_i of C corresponds to the variance of x in the direction of eigenvector v^(i)

• If x ∈ R^D and h ∈ R^d with d < D, then the optimal reconstruction error is

  min E[||x − x̂||²] = Σ_{i=d+1}^{D} λ_i

• One can also show that the above solution can be obtained by maximizing the variances of the elements of h, under orthogonal W
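A NumPy sketch tying these statements together (random toy data; purely illustrative): build W from the top d principal eigenvectors of C, encode and decode as above, and check that the mean squared reconstruction error matches the sum of the discarded eigenvalues.

import numpy as np

rng = np.random.default_rng(5)
D, d, n = 6, 2, 100_000
X = rng.normal(size=(n, D)) @ rng.normal(size=(D, D))  # data with non-trivial covariance

mu = X.mean(axis=0)
C = np.cov(X - mu, rowvar=False)
evals, evecs = np.linalg.eigh(C)                  # ascending eigenvalues
order = np.argsort(evals)[::-1]                   # sort descending
evals, evecs = evals[order], evecs[:, order]

W = evecs[:, :d]                                  # principal eigenvectors as columns
H = (X - mu) @ W                                  # encoder: h = W.T (x - mu)
X_hat = mu + H @ W.T                              # decoder: x_hat = b + W h, with b = mu

err = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(err, evals[d:].sum())                       # the two values should nearly agree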


References

