Part 15 MD
Ji Hui
Before Transformers: RNN
The Transformer was motivated by the problem of machine translation.
For a source sequence $x = (x_1, \dots, x_T)$, where each $x_t$ denotes a word (e.g. in English), we seek to predict a translation of $x$ into a different language, a.k.a. a target sequence $y = (y_1, \dots, y_{T'})$ (e.g. in French).
Probabilistically, machine translation is about modeling the conditional distribution $p(y \mid x)$:
$$p(y \mid x) = \prod_{t=1}^{T'} p(y_t \mid y_{<t}, x)$$
Encoder-decoder structure for most Seq2Seq models
Encoder: at each step, accepts an input vector $x_t$ and the previous hidden state $h_{t-1}$, producing the next hidden state $h_t = f_{\text{enc}}(x_t, h_{t-1})$
Decoder: feed the final encoder hidden state $h_T$ as the initial hidden state, along with the first element of the sequence to decode
The decoder outputs a categorical distribution over the target vocabulary.
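A minimal sketch of such an RNN encoder-decoder in PyTorch. The GRU cell, embedding sizes, and module names are assumptions for illustration, not the lecture's exact model:

```python
import torch
import torch.nn as nn

# Minimal RNN-based Seq2Seq sketch (illustrative only).
class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)   # logits over the target vocabulary

    def forward(self, src_tokens, tgt_tokens):
        # Encode: the final hidden state h_T summarizes the whole source sequence.
        _, h_T = self.encoder(self.src_emb(src_tokens))
        # Decode: initialize the decoder with h_T and feed the target prefix.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_tokens), h_T)
        # Categorical distribution over the target vocabulary at each step.
        return self.out(dec_states).log_softmax(dim=-1)
```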
Attention is all you need
One drawback of the RNN-based encoder-decoder is that a single hidden state vector can only carry so much information:
the longer the input sequence, the noisier the information from its earlier parts becomes, which limits the model's ability to make good predictions over long sequences.
Attention: exploring the possibility of using all of the hidden state vectors generated during
encoding to decode the target sequence
Attention condenses the set of hidden states into a weighted sum of its constituent vectors
The weighting is based in part on the contents of the vectors
Attention
The decoder receives a context vector $c_t$ at each time step $t$, which is computed by attending to the inputs.
Attention
The context vector is computed as a weighted average of the encoder's hidden states:
$$c_t = \sum_i \alpha_{ti} \, h_i$$
The attention weights are computed as a softmax, whose input depends on the encoder states and the decoder's state:
$$\alpha_{ti} = \frac{\exp\big(\mathrm{score}(s_t, h_i)\big)}{\sum_j \exp\big(\mathrm{score}(s_t, h_j)\big)}$$
The score measures the similarity between two states, e.g. dot attention: $\mathrm{score}(s_t, h_i) = s_t^\top h_i$
Note that the attention here does not depend on the position in the sequence.
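A toy NumPy illustration of dot attention over a set of encoder hidden states; the shapes and array names are made up for the example:

```python
import numpy as np

H = np.random.randn(5, 8)       # 5 encoder hidden states, dimension 8
s_t = np.random.randn(8)        # current decoder state

scores = H @ s_t                            # score(s_t, h_i) = s_t . h_i
weights = np.exp(scores - scores.max())     # numerically stable softmax
alpha = weights / weights.sum()             # attention weights alpha_ti
c_t = alpha @ H                             # context vector: weighted average of h_i
```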
More details on Attention
In general, attention mappings can be described as a function of a query $q$ and a set of key-value pairs $\{(k_i, v_i)\}$.
Context vector: $c = \sum_i \alpha_i v_i$, with weights $\alpha_i = \mathrm{softmax}_i\big(\mathrm{score}(q, k_i)\big)$
Attention: $\mathrm{Attention}\big(q, \{(k_i, v_i)\}\big) = \sum_i \mathrm{softmax}_i\big(\mathrm{score}(q, k_i)\big)\, v_i$
Scaled dot-product attention
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V$$
where the queries, keys and values are stacked into matrices $Q$, $K$, $V$ and $d_k$ is the key dimension.
Multi-head attention
Scaled dot-product attention: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V$
Multi-head attention: $\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^O$, where $\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$
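A compact PyTorch sketch of these two operations; masking and dropout are omitted, and the dimension choices are assumptions:

```python
import math
import torch
import torch.nn as nn

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (batch, heads, seq_len, d_k); no masking or dropout for simplicity.
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # (batch, heads, seq, seq)
    weights = scores.softmax(dim=-1)                     # attention weights
    return weights @ V                                   # weighted sum of the values

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.d_k = num_heads, d_model // num_heads
        # Learned projections W^Q, W^K, W^V and the output projection W^O.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def _split(self, x):
        # (batch, seq, d_model) -> (batch, heads, seq, d_k)
        b, t, _ = x.shape
        return x.view(b, t, self.h, self.d_k).transpose(1, 2)

    def forward(self, query, key, value):
        heads = scaled_dot_product_attention(
            self._split(self.w_q(query)),
            self._split(self.w_k(key)),
            self._split(self.w_v(value)),
        )
        b, _, t, _ = heads.shape
        # Concatenate the heads and apply W^O.
        return self.w_o(heads.transpose(1, 2).reshape(b, t, self.h * self.d_k))
```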
Transformer (Vaswani et al. (2017))
The Transformer has an encoder-decoder architecture.
All the recurrent connections are replaced by attention modules.
The Transformer model uses stacked self-attention layers.
Positional Encoding
The attention encoder outputs do not depend on the order of the inputs.
The order of the sequence conveys important information for language modeling.
The solution: add the positional information of an input token in the sequence into the input embedding vectors.
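A sketch of the sinusoidal positional encoding used in Vaswani et al. (2017), assuming an even model dimension and that the encoding is simply added to the token embeddings:

```python
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)            # even dimensions
    angles = pos / torch.pow(10000.0, i / d_model)                  # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

# Added to the input embeddings before the first attention layer:
# x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```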
ViT (Vision Transformer): Model an image as a sequence of words
1. Split an image into patches and flatten the patches
2. Produce lower-dimensional linear embeddings from the flattened patches
3. Add positional embeddings (see the patch-embedding sketch below)
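A rough patch-embedding sketch covering steps 1-3; the image size, patch size, and embedding dimension are illustrative defaults, and learned positional embeddings are assumed:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_ch=3, d_model=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.patch_size = patch_size
        self.proj = nn.Linear(in_ch * patch_size * patch_size, d_model)     # step 2
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, d_model))  # step 3

    def forward(self, images):                               # (batch, 3, H, W)
        b, c, h, w = images.shape
        p = self.patch_size
        # Step 1: split into p x p patches and flatten each patch.
        patches = images.unfold(2, p, p).unfold(3, p, p)     # (b, c, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        # Steps 2-3: linear embedding plus positional embedding.
        return self.proj(patches) + self.pos                 # sequence of patch tokens
```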
Generative model
Given training data, generate new samples from the same distribution
Why generative models
Generate realistic samples, e.g. synthesizing realistic images for graphics and entertainment
Model the data distribution in order to tell which of several candidate outputs is more likely
Train a generative model to learn high-level features that are useful for other tasks.
Auto-encoder
Encoder: in which the model learns how to reduce the input dimensions and compress the input data into an encoded representation.
Bottleneck: the layer that contains the compressed representation of the input data. This is the lowest-dimensional representation of the input data.
Decoder: in which the model learns how to reconstruct the data from the encoded representation to be as close to the original input as possible.
Reconstruction loss: the method that measures how well the decoder is performing and how close the output is to the original input.
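A minimal fully-connected autoencoder sketch in PyTorch; the layer sizes and the MSE reconstruction loss are assumptions:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, bottleneck_dim=20):
        super().__init__()
        self.encoder = nn.Sequential(            # compress the input to the bottleneck
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, bottleneck_dim),
        )
        self.decoder = nn.Sequential(            # reconstruct the input from the code
            nn.Linear(bottleneck_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(32, 784)                          # e.g. a batch of flattened images
recon_loss = nn.functional.mse_loss(model(x), x) # reconstruction loss
```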
Illustration of auto-encoder
Principal component analysis (PCA)
PCA produces a low-dimensional approximation of a dataset $X \in \mathbb{R}^{n \times p}$ by finding linear combinations of the variables with maximal variance and mutual un-correlation.
Procedure of PCA
Stack the $n$ observations (with zero mean) into the rows of a matrix $X \in \mathbb{R}^{n \times p}$.
Construct the Singular Value Decomposition (SVD): $X = U \Sigma V^\top$,
where $U$ has orthonormal columns, $\Sigma$ is diagonal with singular values $\sigma_1 \ge \dots \ge \sigma_p \ge 0$, and $V$ is orthogonal.
The columns of $V$ are called the principal components.
Demo of PCA on digits
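A sketch of such a demo, assuming scikit-learn's 8x8 digits dataset and PCA computed via the SVD as above (the lecture's exact figure is not reproduced here):

```python
import numpy as np
from sklearn.datasets import load_digits

X = load_digits().data                       # (1797, 64) flattened 8x8 digit images
X = X - X.mean(axis=0)                       # zero-mean the observations
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
Z = X @ Vt[:k].T                             # project onto the top-k principal components
X_hat = Z @ Vt[:k]                           # rank-k reconstruction of the data
print("explained variance ratio:", (S[:k] ** 2).sum() / (S ** 2).sum())
```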
From PCA to Autoencoder
Consider the truncated SVD $X \approx X V_k V_k^\top$, where $V_k$ contains the first $k$ principal components: we have a low-dimensional approximation with a linear encoder $z = V_k^\top x$ and a linear decoder $\hat{x} = V_k z$.
Autoencoder
Auto-encoder: a generalization of PCA with non-linearity.
For linear activation functions, the network is equivalent to a composition of linear layers, which is equivalent to the encoder of PCA.
For non-linear activations, the depth of the network matters; many layers might be needed to build a good representation.
An autoencoder provides a way to represent an input vector by a nonlinear projection onto a lower-dimensional space, given by the neuron activations of the inner-most layer with few neurons.
Autoencoder in action
An autoencoder with 20 latent variables
From decoder to generative model
The decoder of an auto-encoder provides a generator network, which maps a low-dimensional code $z$ to a high-dimensional image $x$.
The decoder above is not a generative model yet, as it does not define a distribution over $x$.
Consider training a generator network for the distribution: sample $z$ from a fixed prior, e.g. $z \sim \mathcal{N}(0, I)$, and decode it with $p_\theta(x \mid z)$, which induces a distribution $p_\theta(x)$ over the data.
Training a VAE
Train the encoder distribution $q_\phi(z \mid x) = \mathcal{N}\big(\mu_\phi(x), \Sigma_\phi(x)\big)$ to return the mean and the covariance matrix of a Gaussian.
The loss function:
$$\mathcal{L}(\theta, \phi; x) = -\,\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] + \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, \mathcal{N}(0, I)\big)$$
A "reconstruction term" (on the final layer), that makes the encoding-decoding scheme a good approximation.
A "regularisation term" (on the latent layer), that makes the distributions returned by the encoder close to a standard normal distribution, via the Kullback-Leibler (KL) divergence.
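A sketch of this loss for a diagonal Gaussian encoder, together with the reparameterization trick; the MSE reconstruction term is an assumption (a Bernoulli/BCE term is equally common):

```python
import torch

def vae_loss(x, x_recon, mu, log_var):
    # Reconstruction term: how well the decoder output matches the input.
    recon = torch.nn.functional.mse_loss(x_recon, x, reduction="sum")
    # Regularisation term: KL(q(z|x) || N(0, I)) for a diagonal Gaussian encoder.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl

def reparameterize(mu, log_var):
    # Sample z = mu + sigma * eps so gradients flow through the encoder.
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps
```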
VAE vs AE
Instead of encoding an input as a single point, we encode it as a distribution over the latent
space
Generative Adversarial Network (GAN)
The idea behind the GAN: if the generator is doing a good job of modeling the data distribution, then the generated samples should be indistinguishable from the true data.
Training GAN: a two-player game
Generator network:
Try to fool the discriminator by generating real-looking images.
Discriminator network:
Try to distinguish between real and fake images.
The discriminator is trained as a logistic regression classifier, with the cost function being the cross-entropy for classifying real vs. fake:
$$\mathcal{L}_D = -\,\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] - \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]$$
Training GANs: Minimax Game
The generator and discriminator play a two-player minimax game:
$$\min_G \max_D \;\; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]$$
Training GANs
Minimax game on the logistic loss function: the discriminator maximizes the objective above, while the generator minimizes it, with the two networks updated in alternation.
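A minimal alternating training step for this game; `generator`, `discriminator`, and `latent_dim` are hypothetical stand-ins for models and settings defined elsewhere:

```python
import torch
import torch.nn as nn

def gan_step(generator, discriminator, opt_g, opt_d, real, latent_dim=100):
    bce = nn.BCEWithLogitsLoss()
    b = real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator step: classify real vs. fake (cross-entropy).
    z = torch.randn(b, latent_dim)
    fake = generator(z).detach()
    d_loss = bce(discriminator(real), ones) + bce(discriminator(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to fool the discriminator.
    z = torch.randn(b, latent_dim)
    g_loss = bce(discriminator(generator(z)), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```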
Generated samples by GAN
Boxed images are the nearest images in the training data.
Diffusion model
Diffusion models are inspired by non-equilibrium thermodynamics: they define a Markov chain of diffusion steps that slowly adds random noise to the data, and then learn to reverse the diffusion process to construct desired data samples from the noise.
Different from a VAE, diffusion models are learned with a fixed (forward) procedure, and the latent variable has high dimensionality (the same as the original data).
Forward diffusion process
Given a data point $x_0$ sampled from the data distribution $q(x)$, a forward diffusion process adds small Gaussian noise to the sample in $T$ steps, producing a sequence of noisy samples $x_1, \dots, x_T$, where
$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\, \sqrt{1 - \beta_t}\, x_{t-1},\, \beta_t I\big)$$
The step sizes are controlled by a variance schedule $\{\beta_t \in (0, 1)\}_{t=1}^{T}$.
The data sample $x_0$ gradually loses its distinguishable features as the step $t$ becomes larger.
Eventually, when $T \to \infty$, $x_T$ is equivalent to an isotropic Gaussian distribution.
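Because the Gaussian steps compose, $x_t$ can be sampled from $x_0$ in closed form: $q(x_t \mid x_0) = \mathcal{N}\big(\sqrt{\bar{\alpha}_t}\, x_0,\, (1 - \bar{\alpha}_t) I\big)$, with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$. A sketch, assuming a linear variance schedule:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # variance schedule beta_t
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # alpha_bar_t = prod_s (1 - beta_s)

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

x0 = torch.rand(1, 3, 32, 32)                    # e.g. an image with values in [0, 1]
x_noisy = q_sample(x0, t=500)                    # heavily noised sample
```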
Reverse diffusion process
If we can reverse the forward process and sample from $q(x_{t-1} \mid x_t)$, we will be able to recreate the true sample from a Gaussian noise input $x_T \sim \mathcal{N}(0, I)$.
Bad news: we cannot easily estimate $q(x_{t-1} \mid x_t)$, because it needs the whole data distribution.
Approximation: we learn a model $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t)\big)$ to approximate these conditional probabilities in order to run the reverse diffusion process.
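A sketch of one reverse step under the DDPM parameterization of Ho et al. (2020), in which a network predicts the added noise; this particular parameterization is an assumption, and `eps_model` is a hypothetical noise-prediction network:

```python
import torch

# One reverse diffusion step x_t -> x_{t-1}, using betas / alphas_bar
# as defined for the forward process above.
@torch.no_grad()
def p_sample(eps_model, x_t, t, betas, alphas_bar):
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    # Mean of p_theta(x_{t-1} | x_t) under the noise-prediction parameterization.
    mean = (x_t - beta_t / (1.0 - alphas_bar[t]).sqrt() * eps_model(x_t, t)) / alpha_t.sqrt()
    if t == 0:
        return mean                              # no noise is added at the final step
    noise = torch.randn_like(x_t)
    return mean + beta_t.sqrt() * noise          # simple choice: sigma_t^2 = beta_t
```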
Illustration