Ensi3 PRML s6 Encoders
James L. Crowley
Generative Networks:
EigenSpace Coding, Auto-Encoders, Variational
Autoencoders and Generative Adversarial Networks
Outline:
Notation
Key Equations
AutoEncoders
The Sparsity Parameter
Kullback-Leibler Divergence
Auto-Encoders vs Principal Components Analysis
Variational Autoencoders
Background Reading
• Turk, M. and Pentland, A., "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86, 1991.
• Kingma, D.P., Mohamed, S., Rezende, D.J. and Welling, M., "Semi-supervised Learning with Deep Generative Models", Advances in Neural Information Processing Systems (NIPS), pp. 3581-3589, 2014.
• Radford, A., Metz, L. and Chintala, S., "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", arXiv:1511.06434, 2015.
Notation
W(i,j)   An R×C image window (imagette).
$\vec{W}$   A flattened 1-D vector representing the imagette.
$\{\vec{W}_m\}$   A training set of M imagettes.
$\vec{\mu} = E\{\vec{W}_m\}$   Average (mean) imagette.
$\vec{V}_m = \vec{W}_m - \vec{\mu}$   Zero-mean normalized imagette.
$V = \left(\vec{V}_1 \;\; \vec{V}_2 \;\; \cdots \;\; \vec{V}_M\right)$   Training matrix of zero-mean imagettes.
$\Sigma = VV^T$   Covariance matrix of the imagettes.
$\vartheta$   N×N rotation matrix (the N eigenvectors of Σ).
$\Lambda$   Diagonal matrix of the eigenvalues of Σ.
$x_d$   A feature: an observed or measured value.
$\vec{X}$   A vector of D features.
D   The number of dimensions of the vector.
$\{\vec{X}_m\}, \{y_m\}$   Training samples for learning.
M   The number of training samples.
$a_j^{(l)}$   The activation output of the j-th neuron of layer l.
$w_{ij}^{(l)}$   The weight from unit i of layer l–1 to unit j of layer l.
$b_j^{(l)}$   The bias term for the j-th unit of layer l.
$\rho$   The sparsity parameter.
Key Equations
Principal Components Analysis: $\vartheta^T \Sigma\, \vartheta = \Lambda$

The average activation of hidden unit j (layer 1): $\hat{\rho}_j = \frac{1}{M}\sum_{m=1}^{M} a_{j,m}^{(1)}$

The autoencoder cost function: $L_{sparse}(W, B; \vec{X}_m, y_m) = \frac{1}{2}\left\|\vec{a}_m^{(2)} - \vec{X}_m\right\|^2 + \beta \sum_{j=1}^{N^{(1)}} KL(\rho \,\|\, \hat{\rho}_j)$

The Kullback-Leibler divergence: $\sum_{j=1}^{N^{(1)}} KL(\rho \,\|\, \hat{\rho}_j) = \sum_{j=1}^{N^{(1)}} \left(\rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\right)$
Generative Networks
Deep learning was originally invented for recognition, but the same technology can be used for generation. Up to now we have looked at what are called "discriminative" techniques: techniques that attempt to discriminate a class label y from a feature vector $\vec{X}$.

$\vec{X} \;\rightarrow\; D(\vec{X}) \;\rightarrow\; \hat{y}$

The same process can be used to learn a network that generates $\vec{X}$ given a code y. This is called a "generative" process.

$y \;\rightarrow\; G(y) \;\rightarrow\; \vec{X}$

Given an observable random variable $\vec{X}$ and a target variable $\vec{Y}$, gradient descent allows us to learn a joint probability distribution $P(\vec{X}, \vec{Y})$, where $\vec{X}$ is generally composed of continuous variables and $\vec{Y}$ is generally a discrete set of classes represented by a binary vector.

A discriminative model gives a conditional probability distribution $P(\vec{Y} \mid \vec{X})$.
A generative model gives a conditional probability distribution $P(\vec{X} \mid \vec{Y})$.

We can combine a discriminative process for one data set with a generative process from another and use these to make synthetic outputs.

$\vec{X} \;\rightarrow\; D(\vec{X}) \;\rightarrow\; \hat{y} \qquad y \;\rightarrow\; G(y) \;\rightarrow\; \vec{X}$
Principal Components Analysis and Eigen-Space Coding
Principal Components Analysis (PCA) is a popular method to reduce the number of
dimensions in high dimensional feature vectors. In some cases this can provide an
important reduction in computing time with little or no impact on recognition rates. It
can also be used to determine an orthogonal basis set for highly redundant features,
such as the raw pixels in small windows extracted from images.
While PCA is primarily a data compression method for encoding, it has been
successfully used as a method for generating features for detection and recognition.
An important example occurred in 1991, with the thesis of Matthew Turk at the MIT Media Lab (Turk and Pentland, 1991). This paper won the "best paper" award at CVPR and marked the beginning of the use of appearance-based techniques in Computer Vision.
We can also train a classifier for subsets of face images, for example all images of a
particular person. Other recognition techniques are also possible.
Principal Components Analysis of Face Imagettes.
For notational convenience, it is often useful to map (or flatten) the 2-D window W(i,j) of each imagette onto a 1-D vector $\vec{W}$. This allows us to express the analysis using classical vector algebra.

The orthogonal basis is provided by the principal components of the covariance matrix of $\{\vec{W}_m\}$. First, normalize the imagettes to zero mean.

Compute the average vector: $\vec{\mu} = E\{\vec{W}_m\}$, that is, $\mu(n) = \frac{1}{M}\sum_{m=1}^{M} W_m(n)$

Normalize the training data to zero mean: $\vec{V}_m = \vec{W}_m - \vec{\mu}$

The covariance matrix is then constructed from the matrix of normalized training imagettes. Compose the matrix V as

$V = \left(\vec{V}_1 \;\; \vec{V}_2 \;\; \cdots \;\; \vec{V}_M\right)$

V has N rows and M columns. Each column is a training imagette, m. Each row is a pixel, n ← (i, j).
The outer product $VV^T$ is the covariance matrix Σ:

$\Sigma = V V^T$

with elements

$\sigma_{ij}^2 = \frac{1}{M}\sum_{m=1}^{M} V_m(i)\, V_m(j)$
For small imagettes, the covariance matrix Σ is easily diagonalized using standard algorithms for diagonalizing large matrices, such as Householder's method (see Numerical Recipes in C or any other numerical methods toolkit). This will work for imagettes of up to 32 × 32 pixels (covariance matrices of 1024 × 1024).

The diagonal terms of Λ are the eigenvalues, λn. The N eigenvalues give the average energy of each eigenvector over the imagettes in the training set. This can be used to estimate the average squared error that would result from reconstructing an image without the corresponding eigenvector.
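As a concrete illustration (not from the lecture; the array names and sizes below are assumptions), the following NumPy sketch flattens a set of imagettes, subtracts the mean, forms the covariance matrix, and diagonalizes it:

```python
import numpy as np

# Hypothetical training set: M imagettes of R x C pixels.
M, R, C = 200, 32, 32
imagettes = np.random.rand(M, R, C)        # replace with real face windows

# Flatten each R x C window into a 1-D vector of N = R*C pixels (one column per imagette).
W = imagettes.reshape(M, R * C).T          # N x M training matrix

mu = W.mean(axis=1, keepdims=True)         # average imagette (N x 1)
V = W - mu                                 # zero-mean training matrix V (N x M)

Sigma = (V @ V.T) / M                      # covariance matrix (N x N)

# Sigma is symmetric, so np.linalg.eigh diagonalizes it reliably.
eigvals, phi = np.linalg.eigh(Sigma)       # eigenvalues (ascending) and eigenvectors

# Sort by decreasing eigenvalue: the first eigenvectors carry the most energy.
order = np.argsort(eigvals)[::-1]
eigvals, phi = eigvals[order], phi[:, order]
```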
Diagonalization generally works well for matrices up to 1024 × 1024. Thus we can easily use this method for imagettes of up to 32 × 32 pixels. Other, more exotic algorithms can be used for larger matrices.
The eigenvectors provide an orthogonal basis for the training data. We can project any imagette $\vec{W}$ onto this basis with:

$\vec{X} = \varphi^T \vec{W}$

where the coefficients are $x_d = \sum_{n=1}^{N} W(n)\,\varphi_d(n)$ for d = 1, …, D, with D ≤ N.
This projection acts as a "code" for the imagette. We can reconstruct an imagette from this code as a weighted sum of the basis vectors plus the average image.

This reconstruction will only produce imagettes that resemble the training data. Patterns not in the training data will not appear in the reconstruction! This is classically used as a filter to eliminate noise. It can also be used as a pattern detector!

The residual error energy gives an indication of how similar the input imagette is to the training data. The reconstructed image shows where the differences occur. When used with a set of face imagettes, the error energy is called the "distance from Face Space".
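A minimal sketch of the encoding and reconstruction step, assuming a precomputed orthonormal basis `phi` and mean imagette `mu` (the stand-in values below are placeholders, not the lecture's data):

```python
import numpy as np

N, D = 1024, 32                               # pixels per imagette, code size (illustrative)
phi = np.linalg.qr(np.random.randn(N, N))[0]  # stand-in orthonormal basis (use the eigenvectors in practice)
mu = np.zeros(N)                              # stand-in average imagette
w = np.random.rand(N)                         # a flattened imagette to encode

v = w - mu                                    # zero-mean imagette
x = phi[:, :D].T @ v                          # code: the D projection coefficients x_d
w_hat = mu + phi[:, :D] @ x                   # reconstruction: weighted sum of bases plus the mean
residual = v - phi[:, :D] @ x
error_energy = float(residual @ residual)     # "distance from Face Space" for face imagettes
```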
Example

[Figure: plot of the eigenvalues (average energy, on the order of 10⁷ for the first components) against eigenvector index 0 to 15 for a face-imagette training set.]
Note that for this to work well for face images, the images should be normalized in
position and scale. This is generally accomplished by aligning the eyes to standard
positions.
If the images are not aligned, then the eigenvalues will remain large and a larger code is required.
[Figure: reconstruction of an input imagette and the corresponding image error (residue).]
The residue image can be used to determine whether a new face imagette, W(n), is "similar" to the eigenspace (linear subspace). In this case, the residue is called the "Distance from Face Space" (DFS).

We scan the image with windows of different sizes, texture-map each window to a standard size, then compute the residual distance from face space. If the distance is small, the window contains a face similar to the face space.
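A hedged sketch of this scanning procedure (the lecture gives no code): the window size, step, and threshold below are illustrative, and OpenCV's `cv2.resize` is assumed to be available for the texture mapping.

```python
import numpy as np
import cv2   # assumed available; used only to resize windows to the standard imagette size

def distance_from_face_space(window, phi, mu, D=32):
    """Texture-map a window to the standard imagette size, project it onto the
    first D eigenfaces, and return the residual error energy (DFS)."""
    side = int(np.sqrt(mu.size))
    w = cv2.resize(window.astype(np.float32), (side, side)).astype(np.float64).ravel()
    v = w - mu
    x = phi[:, :D].T @ v
    residual = v - phi[:, :D] @ x
    return float(residual @ residual)

def detect_faces(image, phi, mu, win=64, step=16, threshold=1e4):
    """Scan the image with square windows; keep windows whose DFS is below a threshold."""
    detections = []
    for i in range(0, image.shape[0] - win, step):
        for j in range(0, image.shape[1] - win, step):
            if distance_from_face_space(image[i:i + win, j:j + win], phi, mu) < threshold:
                detections.append((i, j, win))
    return detections
```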
If all N bases are used, then any imagette from $\{\vec{W}_m\}$ can be perfectly reconstructed (except for round-off error).

Test if an imagette is a face: IF ($\varepsilon_R^2$ < Threshold) THEN Face ELSE NOT Face
In practice, this method is less effective and more expensive than the cascade
classifiers seen last lecture.
In 1996 we were able to demonstrate video telephony in real time (video rates) over a 9600 baud serial line! We ran a video conference with MIT over a very low-bandwidth phone line. In this demo, we used 32 coefficients per image (32 × 4 = 128 bytes/image).
Eigenspace Coding for Face Recognition.
We can use the coefficients $\vec{X} = \langle \vec{W}(n), \vec{\varphi}(n) \rangle$ as a feature vector for recognition. The feature vectors for a class of faces can be modeled as a mixture of Gaussians:

$p(\vec{X}; \vec{\nu}) = \sum_{i=1}^{I} \alpha_i\, \mathcal{N}(\vec{X}; \vec{\mu}_i, \Sigma_i)$  where  $\mathcal{N}(\vec{X}; \vec{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2}\,\det(\Sigma)^{1/2}}\, e^{-\frac{1}{2}(\vec{X}-\vec{\mu})^T \Sigma^{-1} (\vec{X}-\vec{\mu})}$

Given an unknown face, W(n), compute its code: $x_d = \langle \vec{W}(n), \vec{\varphi}_d(n) \rangle = \sum_{n=1}^{N} W(n)\,\varphi_d(n)$
In practice, if there are variations in illumination, these will dominate the first
eigenvectors. In this case the corresponding eigenvectors are not useful for
recognition and can be omitted.
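A possible implementation sketch, assuming SciPy is available; it fits a single Gaussian per person (a simplification of the mixture density above), and the number of skipped illumination components is an illustrative choice, not the lecture's recipe.

```python
import numpy as np
from scipy.stats import multivariate_normal   # assumed available

def encode(w, phi, mu, skip=3, D=32):
    """Project a flattened imagette onto eigenvectors skip .. skip+D,
    omitting the first components (often dominated by illumination)."""
    return phi[:, skip:skip + D].T @ (w - mu)

def fit_person_models(codes_by_person):
    """codes_by_person maps a person id to an (M_p x D) array of codes.
    Fit one Gaussian per person (a single-component simplification of the mixture)."""
    return {p: (c.mean(axis=0), np.cov(c, rowvar=False) + 1e-6 * np.eye(c.shape[1]))
            for p, c in codes_by_person.items()}

def recognize(x, models):
    """Return the person whose Gaussian gives the unknown code x the highest density."""
    return max(models, key=lambda p: multivariate_normal.logpdf(x, models[p][0], models[p][1]))
```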
AutoEncoders
Using the notation from our two-layer network: given an input feature vector $\vec{X}_m$, the auto-encoder learns $\{w_{ij}^{(1)}, b_j^{(1)}\}$ and $\{w_{jk}^{(2)}, b_k^{(2)}\}$ such that for each training sample

$\vec{a}_m^{(2)} = \hat{X}_m \approx \vec{X}_m$

using as few hidden units as possible. Note that $N^{(2)} = D$ and that $N^{(1)} \ll N^{(2)}$.
When the number of hidden units $N^{(1)}$ is less than the number of input units, D, $\vec{a}_m^{(2)} = \hat{X}_m \approx \vec{X}_m$ is necessarily an approximation. The hidden units provide a "lossy" encoding for $\vec{X}_m$. This encoding can be used to suppress noise!

The error for back-propagation for each unit is a vector $\vec{\delta}_m^{(2)} = \vec{a}_m^{(2)} - \vec{X}_m$, with a component $\delta_{i,m}$ for each component $x_{i,m}$ of the training sample $\vec{X}_m$.

Level 1 (the code vector $\vec{Y}_m$): $a_{j,m}^{(1)} = f\!\left(\sum_{i=1}^{D} w_{ij}^{(1)} x_{i,m} + b_j^{(1)}\right)$

Level 2 (the reconstruction of the input, $\hat{X}_m$): $a_{k,m}^{(2)} = f\!\left(\sum_{j=1}^{N^{(1)}} w_{jk}^{(2)} a_{j,m}^{(1)} + b_k^{(2)}\right)$

$\vec{a}_m^{(2)} = \begin{pmatrix} a_1^{(2)} \\ \vdots \\ a_D^{(2)} \end{pmatrix} = \hat{X}_m \approx \vec{X}_m$, with error $\vec{\delta}_m^{(2)} = \vec{a}_m^{(2)} - \vec{X}_m$
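A minimal NumPy sketch of this two-layer forward pass, assuming a sigmoid activation f; the sizes D and N¹ and the random initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: D input/output units, N1 hidden units with N1 << D.
D, N1 = 64, 8
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (D, N1)), np.zeros(N1)    # encoder weights w_ij^(1), b_j^(1)
W2, b2 = rng.normal(0, 0.1, (N1, D)), np.zeros(D)     # decoder weights w_jk^(2), b_k^(2)

def forward(X):
    """X: (M x D) batch of input vectors.  Returns the code and the reconstruction."""
    A1 = sigmoid(X @ W1 + b1)      # level 1: a_j^(1), the code vector
    A2 = sigmoid(A1 @ W2 + b2)     # level 2: a_k^(2), the reconstruction of the input
    return A1, A2

X = rng.random((32, D))            # a stand-in training batch
A1, A2 = forward(X)
delta2 = A2 - X                    # output error delta_m^(2) = a_m^(2) - X_m
```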
The Sparsity Parameter
The sparsity $\hat{\rho}_j$ for each hidden unit (code component) is computed as the average activation over the M training samples:

$\hat{\rho}_j = \frac{1}{M}\sum_{m=1}^{M} a_{j,m}^{(1)}$
Standard back-propagation tries to minimize a loss based on the sum of squared errors. The loss for each sample is:

$L_m(\vec{X}_m, y_m) = \frac{1}{2}\left(\vec{a}_m^{(L)} - y_m\right)^2$

For an auto-encoder, the target output is the input vector, and the loss is the squared difference from the input vector:

$L_m(\vec{X}_m, y_m) = \frac{1}{2}\left\|\vec{a}_m^{(L)} - \vec{X}_m\right\|^2$

To encourage a sparse code, a sparsity penalty is added:

$L_m(\vec{X}_m, y_m) = \frac{1}{2}\left\|\vec{a}_m^{(L)} - \vec{X}_m\right\|^2 + \beta\sum_{j=1}^{N^{(1)}} KL(\rho \,\|\, \hat{\rho}_j)$

where $\sum_{j=1}^{N^{(1)}} KL(\rho \,\|\, \hat{\rho}_j)$ is the Kullback-Leibler divergence between the target sparsity ρ and the average activations of the hidden units.
Kullback-Leibler Divergence
The KL divergence between the desired and average activations is:

$\sum_{j=1}^{N^{(1)}} KL(\rho \,\|\, \hat{\rho}_j) = \sum_{j=1}^{N^{(1)}}\left(\rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\right)$
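A small sketch of this sparse loss in NumPy (the values of ρ and β are illustrative assumptions, not values from the lecture):

```python
import numpy as np

def sparse_autoencoder_loss(X, A1, A2, rho=0.05, beta=3.0):
    """Reconstruction loss plus KL sparsity penalty, averaged over a batch.
    X: (M x D) inputs, A2: (M x D) reconstructions, A1: (M x N1) hidden activations.
    rho is the target sparsity and beta its weight (illustrative values)."""
    reconstruction = 0.5 * np.mean(np.sum((A2 - X) ** 2, axis=1))
    rho_hat = A1.mean(axis=0)                          # average activation of each hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return reconstruction + beta * kl
```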
To incorporate the KL divergence into back-propagation, we replace

$\delta_j^{(1)} = \frac{\partial f(z_j^{(1)})}{\partial z_j^{(1)}} \sum_{k=1}^{N^{(2)}} w_{jk}^{(2)}\,\delta_k^{(2)}$

with

$\delta_j^{(1)} = \frac{\partial f(z_j^{(1)})}{\partial z_j^{(1)}} \left(\sum_{k=1}^{N^{(2)}} w_{jk}^{(2)}\,\delta_k^{(2)} + \beta\left(-\frac{\rho}{\hat{\rho}_j} + \frac{1-\rho}{1-\hat{\rho}_j}\right)\right)$

where $N^{(2)} = D$, the size of the input and output vectors. (The network output has the same number of components as the input.)
The average activation $\hat{\rho}_j$ is used to compute the correction. Thus you need to compute a forward pass on a batch of training data before computing the back-propagation, so learning is necessarily batch mode.
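A sketch of this modified hidden-layer delta for a sigmoid autoencoder, computed over a batch as required; the names and the values of ρ and β are illustrative.

```python
import numpy as np

def hidden_delta_with_sparsity(A1, delta2, W2, rho=0.05, beta=3.0):
    """Back-propagated error for the hidden (code) layer, including the sparsity term.
    A1: (M x N1) hidden activations for the batch, delta2: (M x D) output errors,
    W2: (N1 x D) decoder weights."""
    rho_hat = A1.mean(axis=0)                             # requires a full forward pass over the batch
    sparsity_term = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    # For the sigmoid activation, f'(z) = a (1 - a).
    return A1 * (1 - A1) * (delta2 @ W2.T + sparsity_term)
```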
The auto-encoder forces the hidden units to become approximately orthogonal, allowing a small correlation determined by the target sparsity, ρ. Thus the hidden units act as a form of basis space for the input vectors. The values of the hidden code layer are referred to as latent variables. The latent variables provide a compressed representation that reduces dimensionality and eliminates random noise.
Auto-Encoders vs Principal Components Analysis
What is the difference between an Auto-Encoder and Principal Components Analysis? Both techniques project a high-dimensional data set onto a lower-dimensional manifold (variété différentielle in French).

This is the manifold learning hypothesis: examples concentrate near a lower-dimensional "manifold", a region of high density where small changes are only allowed in certain directions. Minimizing the reconstruction error forces the latent representation of "similar inputs" to stay on the manifold.

[Figures: affine transformations of a bitmap image; face expressions for an individual; reconstruction error relative to the manifold. Illustrations from the NAACL 2013 lecture by R. Socher and C. Manning.]

So which one works better as a general face detector? Try it and see.
Variational Autoencoders
For a fully connected network, decoding is fairly straightforward. The network input is a binary vector $\vec{Y}$ with K binary values $y_k$, one for each target class. This is a code. The output for a training sample $\vec{Y}_m$ is an approximation of a feature vector belonging to the coded class, $\hat{X}_m$:

$\vec{a}_m^{(2)} = \hat{X}_m \approx \vec{X}_m$

and the error is the difference between the output and the actual members of the class:

$\vec{\delta}_m^{(2)} = \vec{a}_m^{(2)} - \vec{X}_m$

The average error over a training set $\{\vec{Y}_m\}, \{\vec{X}_m\}$ can be used to drive back-propagation.
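The lecture gives no implementation here; the sketch below (a plain conditional decoder in PyTorch, not a full variational autoencoder) maps a one-hot class code to a feature vector and minimizes the squared reconstruction error described above. The layer sizes, hidden layer, and optimizer are assumptions.

```python
import torch
import torch.nn as nn

K, D = 10, 64                                # number of classes, feature dimension (illustrative)
decoder = nn.Sequential(nn.Linear(K, 128), nn.ReLU(), nn.Linear(128, D))
optimizer = torch.optim.SGD(decoder.parameters(), lr=0.01)
mse = nn.MSELoss()

def train_step(Y, X):
    """Y: (M x K) one-hot class codes, X: (M x D) target feature vectors.
    Minimizes the squared difference between generated and real features."""
    X_hat = decoder(Y)
    loss = mse(X_hat, X)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# usage with stand-in data (replace with real features of the coded classes):
Y = torch.eye(K)[torch.randint(0, K, (32,))]
X = torch.rand(32, D)
train_step(Y, X)
```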
Generative Adversarial Networks.
It is possible to put a discriminative network together with a generative network and
have them train each other. This is called a Generative Adversarial Network (GAN).
The two networks compete in a zero-sum game, where each network attempts to fool
the other network. The generative network generates examples of an image and the
discriminative network attempts to recognize whether the generated image is realistic
or not. Each network provides feedback to the other, and together they train each
other. The result is a technique for unsupervised learning that can learn to create
realistic patterns. Applications include synthesis of images, video, speech or
coordinated actions for robots.
Generally, the discriminator is first trained on real data. The discriminator is then
frozen and used to train the generator. The generator is trained by using random
inputs to generate fake outputs. Feedback from the discriminator drives gradient
ascent by back propagation. When the generator is sufficiently trained, the two
networks are put in competition.
The perceptrons D() and G() play a two-player zero-sum min-max game with a value function V(D, G):

$\min_G \max_D V(D, G) = E_{\vec{X} \sim p_{data}}\left[\log D(\vec{X})\right] + E_{\vec{z} \sim p_z}\left[\log\left(1 - D(G(\vec{z}))\right)\right]$
In practice, this objective may not give the generator a sufficient gradient to learn at the start of training. This is why, as described above, the discriminator is first trained on real data and the generator is then trained with the discriminator held constant. When the generator is sufficiently trained, the two networks are put in competition, providing unsupervised learning.
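A minimal PyTorch sketch of this alternating training scheme; the network sizes, learning rates, and the non-saturating generator loss are assumptions, not the lecture's specification.

```python
import torch
import torch.nn as nn

data_dim, code_dim = 64, 16   # hypothetical sizes
G = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
bce = nn.BCELoss()

def train_step(real_batch):
    b = real_batch.size(0)
    # 1) Train the discriminator: real examples -> 1, generated examples -> 0.
    z = torch.randn(b, code_dim)
    fake = G(z).detach()
    loss_D = bce(D(real_batch), torch.ones(b, 1)) + bce(D(fake), torch.zeros(b, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    # 2) Train the generator with the discriminator held constant:
    #    maximize log D(G(z)) (the usual non-saturating form of the min-max game).
    z = torch.randn(b, code_dim)
    loss_G = bce(D(G(z)), torch.ones(b, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()

# usage: train_step(torch.randn(32, data_dim))  # replace with a batch of real training data
```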