
Advanced Multimodal Machine Learning

Lecture 7.1: Multivariate Statistics and Coordinated Representations
Louis-Philippe Morency

* Original version co-developed with Tadas Baltrusaitis

1
Lecture Objectives

▪ Quick recap
▪ Temporal Joint Representation
▪ Multivariate statistical analysis
▪ Basic concepts (multivariate, covariance,…)
▪ Principal component analysis (+SVD)
▪ Canonical Correlation Analysis
▪ Deep Correlation Networks
▪ Deep CCA, DCCA-AutoEncoder
▪ (Deep) Correlational neural networks
▪ Matrix Factorization
▪ Nonnegative Matrix Factorization
Temporal Joint Representation
3
Sequence Representation with LSTM

[Diagram: an LSTM unrolled over time. Inputs 𝒙𝟏, 𝒙𝟐, 𝒙𝟑, …, 𝒙𝜏 are fed to the LSTM cell at each time step, producing outputs 𝒚𝟏, 𝒚𝟐, 𝒚𝟑, …, 𝒚𝜏]
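As a concrete reference point, here is a minimal sketch of a unimodal LSTM sequence encoder, assuming PyTorch (the class name SequenceEncoder and all dimensions are illustrative):

```python
import torch
import torch.nn as nn

# Minimal sketch: encode a sequence x_1..x_tau into per-step outputs y_1..y_tau
# with a single-layer LSTM (PyTorch assumed; all names are illustrative).
class SequenceEncoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):                 # x: (batch, tau, input_dim)
        h, _ = self.lstm(x)               # h: (batch, tau, hidden_dim)
        return self.out(h)                # y: (batch, tau, output_dim)

# y = SequenceEncoder(64, 128, 10)(torch.randn(8, 20, 64))
```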
Multimodal Sequence Representation – Early Fusion

[Diagram: early fusion. At each time step the features of three modalities 𝒙𝒕(1), 𝒙𝒕(2), 𝒙𝒕(3) are concatenated into one input vector and fed to a single LSTM, producing outputs 𝒚𝟏, …, 𝒚𝜏]
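For comparison with the multi-view model on the following slides, a minimal early-fusion sketch (PyTorch assumed; names and dimensions are illustrative) simply concatenates the per-step features of each modality before a single LSTM:

```python
import torch
import torch.nn as nn

# Early-fusion sketch: concatenate the per-step features of each modality,
# then encode the fused sequence with one LSTM (illustrative names).
class EarlyFusionLSTM(nn.Module):
    def __init__(self, dims, hidden_dim, output_dim):
        super().__init__()
        self.lstm = nn.LSTM(sum(dims), hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, views):             # list of (batch, tau, dim_m) tensors
        x = torch.cat(views, dim=-1)      # fuse the modalities at the input level
        h, _ = self.lstm(x)
        return self.out(h)

# model = EarlyFusionLSTM([40, 20, 300], 128, 1)
# y = model([torch.randn(8, 50, 40), torch.randn(8, 50, 20), torch.randn(8, 50, 300)])
```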
Multi-View Long Short-Term Memory (MV-LSTM)

[Diagram: a Multi-View LSTM unrolled over time. Each modality 𝒙𝒕(1), 𝒙𝒕(2), 𝒙𝒕(3) enters its own region of the MV-LSTM cell at every time step, and the model produces outputs 𝒚𝟏, …, 𝒚𝜏]
[Shyam, Morency, et al. Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV, 2016]
Multi-View Long Short-Term Memory

[Diagram: MV-LSTM cell internals. The multi-view topology determines how the candidate inputs 𝒈𝒕(1), 𝒈𝒕(2), 𝒈𝒕(3) (tanh) and the sigmoid gates are computed from the view inputs 𝒙𝒕(1), 𝒙𝒕(2), 𝒙𝒕(3) and the previous hidden states 𝒉𝒕−𝟏(1), 𝒉𝒕−𝟏(2), 𝒉𝒕−𝟏(3). The cell keeps multiple memory cells 𝒄𝒕(1), 𝒄𝒕(2), 𝒄𝒕(3), one per view, yielding view-specific hidden states 𝒉𝒕(1), 𝒉𝒕(2), 𝒉𝒕(3)]
[Shyam, Morency, et al. Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV, 2016]
Topologies for Multi-View LSTM

Design parameters (a schematic code sketch follows after this slide):
▪ α: memory from the current view
▪ β: memory from the other views

Multi-view topologies:
▪ View-specific: α = 1, β = 0
▪ Fully-connected: α = 1, β = 1
▪ Coupled: α = 0, β = 1
▪ Hybrid: α = 2/3, β = 1/3

[Diagram: for each topology, the candidate input 𝒈𝒕(k) of view k is computed from 𝒙𝒕(k) and the selected subset of previous hidden states 𝒉𝒕−𝟏(1), 𝒉𝒕−𝟏(2), 𝒉𝒕−𝟏(3)]
[Shyam, Morency, et al. Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV, 2016]
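The following is a schematic sketch of this gating idea, not the authors' exact MV-LSTM formulation: each view keeps its own memory cell, and α and β weight how much a view's gates attend to its own versus the other views' previous hidden states (PyTorch assumed; all names are illustrative):

```python
import torch
import torch.nn as nn

class MVLSTMCellSketch(nn.Module):
    """Schematic multi-view LSTM cell: one memory cell and hidden state per view.

    alpha weights a view's own previous hidden state, beta weights the other
    views' previous hidden states when forming the gate inputs (illustrative,
    not the authors' exact formulation).
    """
    def __init__(self, input_dims, hidden_dim, alpha=1.0, beta=1.0):
        super().__init__()
        self.alpha, self.beta = alpha, beta
        n_views = len(input_dims)
        # One gate block per view: input, forget, output gates and candidate memory.
        self.gates = nn.ModuleList(
            nn.Linear(d + n_views * hidden_dim, 4 * hidden_dim) for d in input_dims
        )

    def forward(self, xs, hs, cs):
        # xs, hs, cs: lists with one (batch, dim) tensor per view.
        new_h, new_c = [], []
        for k, x in enumerate(xs):
            # Mix own-view and other-view memories according to the topology.
            mixed = [(self.alpha if j == k else self.beta) * h for j, h in enumerate(hs)]
            z = self.gates[k](torch.cat([x] + mixed, dim=-1))
            i, f, o, g = z.chunk(4, dim=-1)
            c = torch.sigmoid(f) * cs[k] + torch.sigmoid(i) * torch.tanh(g)
            new_c.append(c)
            new_h.append(torch.sigmoid(o) * torch.tanh(c))
        return new_h, new_c
```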
Multi-View Long Short-Term Memory (MV-LSTM)

Multimodal prediction of children's engagement

[Shyam, Morency, et al. Extending Long Short-Term Memory for Multi-View Structured Learning, ECCV, 2016]
Quick Recap
10
Multimodal Representation Learning

Learn (unsupervised) a joint representation between multiple
modalities where similar unimodal concepts are closely projected.

❑ Deep Multimodal Boltzmann machines

[Diagram: text input 𝑿 and image input 𝒀 pass through modality-specific layers into a shared joint layer, topped by a softmax]
11
Multimodal Representation Learning

Learn (unsupervised) a joint representation between multiple
modalities where similar unimodal concepts are closely projected.

❑ Deep Multimodal Boltzmann machines
❑ Stacked Autoencoder

[Diagram: stacked autoencoder. Text 𝑿 and image 𝒀 are encoded into a shared layer and decoded back into reconstructions 𝑿′ and 𝒀′]
12
Multimodal Representation Learning

Learn (unsupervised) a joint representation between multiple
modalities where similar unimodal concepts are closely projected.

❑ Deep Multimodal Boltzmann machines
❑ Stacked Autoencoder
❑ Encoder-Decoder

[Diagram: an encoder-decoder architecture connecting text 𝑿 and image 𝒀 through a shared intermediate representation]
13
Multimodal Representation Learning

Learn (unsupervised) a joint representation between multiple
modalities where similar unimodal concepts are closely projected.

❑ Deep Multimodal Boltzmann machines
❑ Stacked Autoencoder
❑ Encoder-Decoder
❑ Tensor Fusion representation

[Diagram: unimodal representations 𝒉𝒙 and 𝒉𝒚 of text 𝑿 and image 𝒀 are combined into a bimodal representation 𝒉𝒎, followed by a softmax for the prediction task (e.g., sentiment)]

How Can We Learn Better Representations?
14
Coordinated Multimodal Representations
15
Coordinated multimodal embeddings

▪ Instead of projecting to a joint space, enforce similarity between the
  unimodal embeddings

[Diagram: Modality 1 and Modality 2 are each encoded into their own representations (Repres. 1 and Repres. 2), which are coordinated through a similarity constraint]
Coordinated Multimodal Representations

Learn (unsupervised) two or more coordinated representations from
multiple modalities. A loss function is defined to bring these
representations closer together, using a similarity metric
(e.g., cosine distance).

[Diagram: text 𝑿 and image 𝒀 are encoded by separate networks into coordinated representations connected by the similarity metric]
17
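As an illustration of how such a similarity constraint can be trained, here is a minimal max-margin sketch over cosine similarity (PyTorch assumed; this is a generic ranking loss in the spirit of coordinated embeddings, not the exact loss of any cited paper):

```python
import torch
import torch.nn.functional as F

# Coordinated-embedding loss sketch (illustrative): pull matching text/image
# pairs together under cosine similarity and push mismatched pairs apart by a margin.
def coordinated_margin_loss(h_x, h_y, margin=0.2):
    h_x = F.normalize(h_x, dim=-1)            # (batch, d) text embeddings
    h_y = F.normalize(h_y, dim=-1)            # (batch, d) image embeddings
    sim = h_x @ h_y.t()                       # pairwise cosine similarities
    pos = sim.diag().unsqueeze(1)             # similarity of matching pairs
    mask = 1.0 - torch.eye(sim.size(0), device=sim.device)
    cost = (margin + sim - pos).clamp(min=0) * mask   # hinge on mismatched pairs only
    return cost.mean()

# loss = coordinated_margin_loss(torch.randn(32, 128), torch.randn(32, 128))
```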
Coordinated Multimodal Embeddings

[Huang et al., Learning Deep Structured Semantic Models for Web Search using Clickthrough Data, 2013]
Multimodal Vector Space Arithmetic

[Kiros et al., Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, 2014]
Multimodal Vector Space Arithmetic

[Kiros et al., Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, 2014]
Structured coordinated embeddings

▪ Instead of (or in addition to) a similarity constraint, enforce an
  alternative structure on the coordinated space

[Vendrov et al., Order-Embeddings of Images and Language, 2016]
[Jiang and Li, Deep Cross-Modal Hashing]
Multivariate Statistical Analysis
22
Multivariate Statistical Analysis

“Statistical approaches to understand the


relationships in high dimensional data”

▪ Example of multivariate analysis approaches:


▪ Multivariate analysis of variance (MANOVA)
▪ Principal components analysis (PCA)
▪ Factor analysis
▪ Linear discriminant analysis (LDA)
▪ Canonical correlation analysis (CCA)
Random Variables

Definition: A variable whose possible values are numerical outcomes
of a random phenomenon.
❑ A discrete random variable may take on only a countable number of
  distinct values, such as 0, 1, 2, 3, 4, …
❑ A continuous random variable takes an uncountably infinite number
  of possible values.

Examples of random variables (discrete or continuous? correlated?):
• Someone's age
• Someone's height
• Someone's weight
24
Definitions
Given two random variables X and Y:

Expected value: the probability-weighted average of all possible values
    \mu = E[X] = \sum_i x_i P(x_i)
➢ If all observations x_i are equally probable, this equals the arithmetic mean

Variance: measures the spread of the observations
    \sigma^2 = \mathrm{Var}(X) = E[(X - \mu)(X - \mu)] = E[\bar{X}\bar{X}]  (if the data is centered, \bar{X} = X - \mu)
➢ The variance is the square of the standard deviation \sigma

Covariance: measures how much two random variables change together
    \mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[\bar{X}\bar{Y}]

25
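A quick numerical check of these definitions (NumPy assumed; the sample values are arbitrary):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

mu_x = x.mean()                                    # expected value (equal weights)
var_x = ((x - mu_x) ** 2).mean()                   # population variance E[X_bar X_bar]
cov_xy = ((x - x.mean()) * (y - y.mean())).mean()  # covariance E[X_bar Y_bar]

print(mu_x, var_x, cov_xy)                         # 5.0 5.0 2.75
```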
Definitions

Pearson correlation: measures the extent to which two variables have
a linear relationship with each other
    \rho_{X,Y} = \mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}

26
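The same check extended to the Pearson correlation (NumPy assumed):

```python
import numpy as np

# Pearson correlation computed directly from the definition above.
def pearson_corr(x, y):
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).mean() / np.sqrt((xc ** 2).mean() * (yc ** 2).mean())

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])
print(pearson_corr(x, y))          # ~0.83, matches np.corrcoef(x, y)[0, 1]
```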
Pearson Correlation Examples

27
Definitions
Multivariate (multidimensional) random variables (aka random vectors):
    X = [X_1, X_2, X_3, \dots, X_M]
    Y = [Y_1, Y_2, Y_3, \dots, Y_N]

Covariance matrix: generalizes the notion of variance
    \Sigma_X = \Sigma_{X,X} = \mathrm{var}(X) = E[(X - E[X])(X - E[X])^T] = E[\bar{X}\bar{X}^T]

Cross-covariance matrix: generalizes the notion of covariance
    \Sigma_{X,Y} = \mathrm{cov}(X, Y) = E[(X - E[X])(Y - E[Y])^T] = E[\bar{X}\bar{Y}^T]

28
Definitions
Multivariate (multidimensional) random variables (aka random vectors):
    X = [X_1, X_2, X_3, \dots, X_M]
    Y = [Y_1, Y_2, Y_3, \dots, Y_N]

Covariance matrix: generalizes the notion of variance
    \Sigma_X = \Sigma_{X,X} = \mathrm{var}(X) = E[(X - E[X])(X - E[X])^T] = E[\bar{X}\bar{X}^T]

Cross-covariance matrix: generalizes the notion of covariance
    \Sigma_{X,Y} = \mathrm{cov}(X, Y) =
    \begin{bmatrix}
      \mathrm{cov}(X_1, Y_1) & \mathrm{cov}(X_2, Y_1) & \cdots & \mathrm{cov}(X_M, Y_1) \\
      \mathrm{cov}(X_1, Y_2) & \mathrm{cov}(X_2, Y_2) & \cdots & \mathrm{cov}(X_M, Y_2) \\
      \vdots                 & \vdots                 & \ddots & \vdots                 \\
      \mathrm{cov}(X_1, Y_N) & \mathrm{cov}(X_2, Y_N) & \cdots & \mathrm{cov}(X_M, Y_N)
    \end{bmatrix}
29
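In code, the covariance and cross-covariance matrices of centered data reduce to simple matrix products (NumPy assumed; rows are dimensions, columns are samples):

```python
import numpy as np

n = 1000
X = np.random.randn(3, n)            # M = 3 dimensional random vector, n samples
Y = np.random.randn(2, n)            # N = 2 dimensional random vector, n samples
Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)

Sigma_XX = Xc @ Xc.T / n             # (M x M) covariance matrix, E[X_bar X_bar^T]
Sigma_XY = Xc @ Yc.T / n             # (M x N) cross-covariance matrix, E[X_bar Y_bar^T]
print(Sigma_XX.shape, Sigma_XY.shape)   # (3, 3) (3, 2)
```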
Definitions – Matrix Operations

Trace: the sum of the elements on the main diagonal of a square matrix X
    \mathrm{tr}(X) = \sum_{i=1}^{n} x_{ii}

30
Principal component analysis

PCA converts a set of observations of possibly correlated variables
into a set of values of linearly uncorrelated variables called
principal components
▪ The eigenvectors are orthogonal to each other and have unit length
▪ The first few eigenvectors explain most of the variance observed in
  the data
▪ Components with low eigenvalues can be omitted with little loss of
  information

31
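A minimal PCA-via-SVD sketch on centered data (NumPy assumed; the function name and return values are illustrative):

```python
import numpy as np

# Project centered data onto its top-k principal components using the SVD.
def pca(X, k):
    """X: (n_samples, n_features). Returns (projections, components, explained variances)."""
    Xc = X - X.mean(axis=0)                   # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                       # top-k principal directions
    explained_var = (S ** 2) / (len(X) - 1)   # eigenvalues of the covariance matrix
    return Xc @ components.T, components, explained_var[:k]

Z, W, ev = pca(np.random.randn(200, 5), k=2)
print(Z.shape, W.shape)                       # (200, 2) (2, 5)
```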
Eigenvalues and Eigenvectors
Eigenvalue decomposition
If A is an n×n matrix, do there exist nonzero vectors x in R^n such
that Ax is a scalar multiple of x?
➢ (The term eigenvalue is from the German word Eigenwert,
meaning "proper value")

Eigenvalue equation:
    A x = \lambda x
    A: an n×n matrix
    \lambda: a scalar eigenvalue (could be zero)
    x: a nonzero eigenvector in R^n

[Diagram: geometric interpretation. Applying A to an eigenvector x only rescales it by \lambda]
Singular Value Decomposition (SVD)

▪ SVD expresses any matrix A as
    A = U S V^T

▪ The columns of U are eigenvectors of A A^T, and the columns of V
  are eigenvectors of A^T A:
    A A^T u_i = s_i^2 u_i
    A^T A v_i = s_i^2 v_i
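These relations are easy to verify numerically (NumPy assumed):

```python
import numpy as np

# Quick numerical check of the SVD relations above (illustrative).
A = np.random.randn(4, 3)
U, S, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(A, U @ np.diag(S) @ Vt))               # A = U S V^T
# Columns of U are eigenvectors of A A^T with eigenvalues s_i^2:
print(np.allclose(A @ A.T @ U[:, 0], S[0] ** 2 * U[:, 0]))
# Columns of V are eigenvectors of A^T A:
print(np.allclose(A.T @ A @ Vt[0], S[0] ** 2 * Vt[0]))
```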
Canonical Correlation Analysis
34
Multi-view Learning

    𝑿                            𝒀
    demographic properties       responses to a survey
    audio features at time i     video features at time i

35
Canonical Correlation Analysis

"canonical": reduced to the simplest or clearest schema possible

1  Learn two linear projections, one for each view, that are
   maximally correlated:
    u^*, v^* = \mathrm{argmax}_{u,v}\, \mathrm{corr}(H_x, H_y)
             = \mathrm{argmax}_{u,v}\, \mathrm{corr}(u^T X, v^T Y)

[Diagram: text 𝑿 and image 𝒀 are linearly projected by 𝑼 and 𝑽 into 𝑯𝒙 and 𝑯𝒚; a scatter plot shows the projection of X against the projection of Y]
36
Correlated Projection

1  Learn two linear projections, one for each view, that are
   maximally correlated:
    u^*, v^* = \mathrm{argmax}_{u,v}\, \mathrm{corr}(u^T X, v^T Y)

[Diagram: two views 𝑿 and 𝒀, where the same instances share the same color; the projection directions 𝒖 and 𝒗 align the corresponding instances]

37
Canonical Correlation Analysis

1  Learn two linear projections, one for each view, that are
   maximally correlated:
    u^*, v^* = \mathrm{argmax}_{u,v}\, \mathrm{corr}(u^T X, v^T Y)

    = \mathrm{argmax}_{u,v}\, \frac{\mathrm{cov}(u^T X, v^T Y)}{\sqrt{\mathrm{var}(u^T X)\,\mathrm{var}(v^T Y)}}

    = \mathrm{argmax}_{u,v}\, \frac{u^T X Y^T v}{\sqrt{u^T X X^T u \; v^T Y Y^T v}}
      (if both X, Y have zero mean, \mu_X = 0 and \mu_Y = 0, then \Sigma_{XY} = \mathrm{cov}(X, Y) = X Y^T)

    = \mathrm{argmax}_{u,v}\, \frac{u^T \Sigma_{XY} v}{\sqrt{u^T \Sigma_{XX} u \; v^T \Sigma_{YY} v}}
38
Canonical Correlation Analysis

We want to learn multiple projection pairs (u_{(i)}^T X, v_{(i)}^T Y):

    u_{(i)}^*, v_{(i)}^* = \mathrm{argmax}_{u_{(i)}, v_{(i)}}\, \frac{u_{(i)}^T \Sigma_{XY} v_{(i)}}{\sqrt{u_{(i)}^T \Sigma_{XX} u_{(i)} \; v_{(i)}^T \Sigma_{YY} v_{(i)}}}

2  We want these multiple projection pairs to be orthogonal
   ("canonical") to each other:

    u_{(i)}^T \Sigma_{XY} v_{(j)} = u_{(j)}^T \Sigma_{XY} v_{(i)} = 0 \quad \text{for } i \neq j

    \sum_i u_{(i)}^T \Sigma_{XY} v_{(i)} = \mathrm{tr}(U^T \Sigma_{XY} V)
    \quad \text{where } U = [u_{(1)}, u_{(2)}, \dots, u_{(k)}] \text{ and } V = [v_{(1)}, v_{(2)}, \dots, v_{(k)}]
39
Canonical Correlation Analysis

    U^*, V^* = \mathrm{argmax}_{U,V}\, \frac{\mathrm{tr}(U^T \Sigma_{XY} V)}{\sqrt{U^T \Sigma_{XX} U \; V^T \Sigma_{YY} V}}

3  Since this objective function is invariant to scaling, we can
   constrain the projections to have unit variance:
    U^T \Sigma_{XX} U = I \qquad V^T \Sigma_{YY} V = I

Canonical Correlation Analysis:
    maximize:   \mathrm{tr}(U^T \Sigma_{XY} V)
    subject to: U^T \Sigma_{XX} U = V^T \Sigma_{YY} V = I, \quad u_{(j)}^T \Sigma_{XY} v_{(i)} = 0 \text{ for } i \neq j

40
Canonical Correlation Analysis
    maximize:   \mathrm{tr}(U^T \Sigma_{XY} V)
    subject to: U^T \Sigma_{XX} U = V^T \Sigma_{YY} V = I, \quad u_{(j)}^T \Sigma_{XY} v_{(i)} = 0 \text{ for } i \neq j

After projecting with U and V, the joint covariance matrix
    \Sigma = \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix}
becomes
    \begin{bmatrix} I & \Lambda \\ \Lambda & I \end{bmatrix}
    \quad \text{with } \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3, \dots)
so the within-view blocks are identity and the cross-view block is diagonal
with the canonical correlations.

41
Canonical Correlation Analysis
    maximize:   \mathrm{tr}(U^T \Sigma_{XY} V)
    subject to: U^T \Sigma_{XX} U = V^T \Sigma_{YY} V = I, \quad u_{(j)}^T \Sigma_{XY} v_{(i)} = 0 \text{ for } i \neq j

How to solve it? ➢ Lagrange multipliers!

Lagrange function:
    L = \mathrm{tr}(U^T \Sigma_{XY} V) + \alpha (U^T \Sigma_{XX} U - I) + \beta (V^T \Sigma_{YY} V - I)

➢ Then find the stationary points of L:  \partial L / \partial U = 0, \quad \partial L / \partial V = 0

    \Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^T \, U = \lambda U
    \Sigma_{YY}^{-1} \Sigma_{XY}^T \Sigma_{XX}^{-1} \Sigma_{XY} \, V = \lambda V \qquad \text{where } \lambda = 4\alpha\beta

42
Canonical Correlation Analysis
    maximize:   \mathrm{tr}(U^T \Sigma_{XY} V)
    subject to: U^T \Sigma_{XX} U = V^T \Sigma_{YY} V = I, \quad u_{(j)}^T \Sigma_{XY} v_{(i)} = 0 \text{ for } i \neq j

Eigenvalue equations:
    \Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^T \, U = \lambda U
    \Sigma_{YY}^{-1} \Sigma_{XY}^T \Sigma_{XX}^{-1} \Sigma_{XY} \, V = \lambda V \qquad \text{where } \lambda = 4\alpha\beta

➢ These eigenvalue equations can be solved with a Singular Value
  Decomposition (SVD). Define
    T \triangleq \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2}
  The singular values of T are the canonical correlations (eigenvalues)
  and its singular vectors give the projections (eigenvectors):
    U^*, V^* = (\Sigma_{XX}^{-1/2} U_{SVD}, \; \Sigma_{YY}^{-1/2} V_{SVD})

43
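Putting the recipe together, a compact linear-CCA sketch (NumPy assumed; the small ridge term reg is an implementation choice for numerical stability, not part of the derivation):

```python
import numpy as np

def inv_sqrt(S, eps=1e-8):
    # Inverse matrix square root via eigendecomposition (S symmetric PSD).
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ Q.T

def linear_cca(X, Y, k, reg=1e-4):
    """X: (dx, n), Y: (dy, n) with samples in columns. Returns U, V, canonical correlations."""
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Sxx = Xc @ Xc.T / n + reg * np.eye(X.shape[0])
    Syy = Yc @ Yc.T / n + reg * np.eye(Y.shape[0])
    Sxy = Xc @ Yc.T / n
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)      # T = Sxx^{-1/2} Sxy Syy^{-1/2}
    Usvd, s, Vt = np.linalg.svd(T)
    U = inv_sqrt(Sxx) @ Usvd[:, :k]              # U* = Sxx^{-1/2} U_SVD
    V = inv_sqrt(Syy) @ Vt.T[:, :k]              # V* = Syy^{-1/2} V_SVD
    return U, V, s[:k]                           # s holds the canonical correlations

# U, V, corrs = linear_cca(np.random.randn(10, 500), np.random.randn(8, 500), k=3)
```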
Canonical Correlation Analysis
    maximize:   \mathrm{tr}(U^T \Sigma_{XY} V)
    subject to: U^T \Sigma_{XX} U = V^T \Sigma_{YY} V = I, \quad u_{(j)}^T \Sigma_{XY} v_{(i)} = 0 \text{ for } i \neq j

1  Linear projections maximizing correlation
2  Orthogonal projections
3  Unit variance of the projection vectors

[Diagram: text 𝑿 and image 𝒀 are linearly projected by 𝑼 and 𝑽 into 𝑯𝒙 and 𝑯𝒚; a scatter plot shows the projection of X against the projection of Y]
44
Exploring Deep Correlation Networks
45
Deep Canonical Correlation Analysis

Same objective function as CCA:
    \mathrm{argmax}_{V, U, W_x, W_y}\, \mathrm{corr}(H_x, H_y)

And we need to compute the gradients:
    \frac{\partial\, \mathrm{corr}(H_x, H_y)}{\partial U}, \qquad \frac{\partial\, \mathrm{corr}(H_x, H_y)}{\partial V}

[Diagram: text 𝑿 and image 𝒀 are encoded by deep networks with weights 𝑾𝒙 and 𝑾𝒚, then projected by 𝑼 and 𝑽 into views 𝑯𝒙 and 𝑯𝒚 whose correlation is maximized]

Andrew et al., ICML 2013
46
Deep Canonical Correlation Analysis

Training procedure:
1. Pre-train the model parameters using denoising autoencoders

[Diagram: each modality network is first trained to reconstruct its input (text 𝑿 → 𝑿′, image 𝒀 → 𝒀′) before the CCA objective is applied to 𝑯𝒙 and 𝑯𝒚]

Andrew et al., ICML 2013
47
Deep Canonical Correlation Analysis

Training procedure:
1. Pre-train the model parameters using denoising autoencoders
2. Optimize the CCA objective function using large mini-batches or
   full-batch optimization (L-BFGS)

[Diagram: deep networks with weights 𝑾𝒙, 𝑾𝒚 and projections 𝑼, 𝑽 producing the correlated views 𝑯𝒙 and 𝑯𝒚]

Andrew et al., ICML 2013
48
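A compact sketch of a DCCA-style training step (PyTorch assumed; illustrative, not the reference implementation): the negative sum of canonical correlations of the two network outputs is used as the loss, computed with the whitening-then-SVD recipe from the CCA slides, with a small ridge term added for stability:

```python
import torch
import torch.nn as nn

# DCCA-style correlation objective (illustrative sketch): two modality networks
# produce H_x, H_y; the loss is minus the sum of the canonical correlations.
def neg_total_correlation(Hx, Hy, reg=1e-3):
    n = Hx.shape[0]                                   # Hx, Hy: (batch, d)
    Hx = Hx - Hx.mean(0); Hy = Hy - Hy.mean(0)
    Sxx = Hx.t() @ Hx / n + reg * torch.eye(Hx.shape[1])
    Syy = Hy.t() @ Hy / n + reg * torch.eye(Hy.shape[1])
    Sxy = Hx.t() @ Hy / n
    def inv_sqrt(S):
        w, Q = torch.linalg.eigh(S)
        return Q @ torch.diag(w.clamp_min(1e-8).rsqrt()) @ Q.t()
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return -torch.linalg.svdvals(T).sum()             # maximize total correlation

net_x = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 10))
net_y = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(list(net_x.parameters()) + list(net_y.parameters()), lr=1e-3)

x, y = torch.randn(256, 40), torch.randn(256, 300)    # one large mini-batch
loss = neg_total_correlation(net_x(x), net_y(y))
opt.zero_grad(); loss.backward(); opt.step()
```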
Deep Canonically Correlated Autoencoders (DCCAE)

Jointly optimize the DCCA and autoencoder loss functions
➢ A trade-off between the multi-view correlation and the
  reconstruction error of the individual views

[Diagram: the coordinated representations 𝑯𝒙 and 𝑯𝒚 are trained with the CCA objective while decoders reconstruct the inputs (text 𝑿 → 𝑿′, image 𝒀 → 𝒀′)]

Wang et al., ICML 2015
49
Deep Correlational Neural Network

1. Learn a shallow CCA autoencoder (similar to a one-layer DCCAE model)
2. Use the learned weights to initialize the autoencoder layer
3. Repeat the procedure

Chandar et al., Neural Computation, 2015


Matrix Factorization
51
Data Clustering

How do we discover groups in our data?

K-means is a simple clustering algorithm based on competitive learning
• Iterative approach:
  o Assign each data point to one cluster (based on a distance metric)
  o Update the cluster centers
  o Repeat until convergence
• "Winner takes all"
52
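A minimal k-means sketch of the two alternating steps above (NumPy assumed; names are illustrative):

```python
import numpy as np

# Alternate the two steps on the slide: assign each point to its nearest
# center, then update each center as the mean of its assigned points.
def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest center under Euclidean distance ("winner takes all").
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):      # converged
            break
        centers = new_centers
    return labels, centers

# labels, centers = kmeans(np.random.randn(500, 2), k=3)
```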
Enforcing Data Clustering in Deep Networks

How can we enforce data clustering in our (multimodal) deep
learning algorithms?

[Diagram: text 𝑿 and image 𝒀 encoded by deep networks into representations on which a clustering structure could be enforced]
53
Nonnegative Matrix Factorization (NMF)

Given: a nonnegative n × m matrix 𝑿 (all entries ≥ 0)

[Diagram: 𝑿 ≈ 𝑭 𝑮]

Want: nonnegative matrices 𝑭 (n × r) and 𝑮 (r × m) such that 𝑿 ≈ 𝑭𝑮
➢ easier to interpret
➢ provides better results in information retrieval and clustering

54
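A minimal NMF sketch using Lee-and-Seung-style multiplicative updates, a standard algorithm assumed here rather than taken from the slides (NumPy; it minimizes the Frobenius reconstruction error while keeping F and G nonnegative):

```python
import numpy as np

def nmf(X, r, n_iter=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, r)) + eps
    G = rng.random((r, m)) + eps
    for _ in range(n_iter):
        G *= (F.T @ X) / (F.T @ F @ G + eps)   # update G, stays nonnegative
        F *= (X @ G.T) / (F @ G @ G.T + eps)   # update F, stays nonnegative
    return F, G

X = np.abs(np.random.randn(20, 30))
F, G = nmf(X, r=5)
print(np.linalg.norm(X - F @ G))               # reconstruction error
```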
Semi-NMF and Other Extensions

Ding et al., TPAMI 2015
55
Deep Matrix Factorization

Li and Tang, MMML 2015

56
Deep Semi-NMF Model

Trigeorgis et al., TPAMI 2015

57
Multivariate Statistics

▪ Multivariate analysis of variance (MANOVA)


▪ Principal components analysis (PCA)
▪ Factor analysis
▪ Linear discriminant analysis (LDA)
▪ Canonical correlation analysis (CCA)
▪ Correspondence analysis
▪ Canonical correspondence analysis
▪ Multidimensional scaling
▪ Multivariate regression
▪ Discriminant analysis

58
