
Residual Flows for Invertible Generative Modeling

Ricky T. Q. Chen1,3, Jens Behrmann2, David Duvenaud1,3, Jörn-Henrik Jacobsen1,3


University of Toronto1, University of Bremen2, Vector Institute3

Contributions

We build a powerful flow-based model based on the invertible residual network architecture (i-ResNet) (Behrmann et al., 2019). We:
1. Use a "Russian roulette" estimator to produce unbiased estimates of the log-density. This allows principled training as a flow-based model.
2. Formulate a gradient power series for computing the partial derivatives of the log determinant term in O(1) memory.
3. Motivate and investigate desiderata for Lipschitz-constrained activation functions that avoid gradient saturation.
4. Generalize i-ResNets to induced mixed norms and learnable norm orders.
Background: Invertible (Flow-based) Generative Models

Maximum likelihood estimation. To perform maximum likelihood with stochastic gradient descent, we require

    ∇_θ E_{x∼p_data(x)}[log p_θ(x)] = E_{x∼p_data(x)}[∇_θ log p_θ(x)]    (1)

Change of Variables. With an invertible transformation f, we can build a generative model

    z ∼ p(z),    x = f^{−1}(z).    (2)

Then the log-density of x is given by

    log p(x) = log p(f(x)) + log det(df(x)/dx).    (3)

Flow-based generative models can be
1. sampled, if (2) can be computed or approximated up to some precision.
2. trained using maximum likelihood, if (3) can be unbiasedly estimated.
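To make (2) and (3) concrete, here is a minimal sketch (not from the poster) that builds a toy flow from an invertible affine map, where the Jacobian determinant is available in closed form; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2

# Toy invertible transformation f(x) = W x + b with a well-conditioned W
# (hypothetical example; a real flow composes many learned invertible blocks).
W = np.eye(d) + 0.1 * rng.standard_normal((d, d))
b = rng.standard_normal(d)

def f(x):
    return W @ x + b

def f_inv(z):
    return np.linalg.solve(W, z - b)

def log_prob_base(z):
    # Standard normal base density p(z).
    return -0.5 * (z @ z + d * np.log(2 * np.pi))

def log_prob(x):
    # Change of variables (3): log p(x) = log p(f(x)) + log |det df/dx|.
    z = f(x)
    _, logabsdet = np.linalg.slogdet(W)
    return log_prob_base(z) + logabsdet

# Sampling as in (2): draw z ~ p(z), then x = f^{-1}(z).
z = rng.standard_normal(d)
x = f_inv(z)
print(log_prob(x))
```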
Background: Invertible Residual Networks (i-ResNets)

Residual networks are composed of simple transformations

    y = f(x) = x + g(x)    (4)

Behrmann et al. (2019) proved that if g is a contractive mapping, i.e. Lipschitz with constant strictly less than one, then the residual block transformation (4) is invertible.

Sampling. The inverse f^{−1} can be efficiently computed by the fixed-point iteration

    x^{(i+1)} = y − g(x^{(i)})    (5)

which converges geometrically by the Banach fixed-point theorem.
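The fixed-point inversion (5) can be sketched in a few lines. The residual block below is a toy stand-in, a tanh layer whose weight matrix is rescaled to spectral norm 0.9 so that g is contractive; it is an illustration, not the poster's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

# Toy residual block y = x + g(x) with Lip(g) <= 0.9 < 1:
# tanh is 1-Lipschitz, and the weight matrix is rescaled by its spectral norm.
W = rng.standard_normal((d, d))
W = 0.9 * W / np.linalg.norm(W, 2)

def g(x):
    return np.tanh(W @ x)

def f(x):
    return x + g(x)

def f_inverse(y, n_iters=50):
    # Fixed-point iteration x_{i+1} = y - g(x_i); converges since Lip(g) < 1.
    x = y.copy()
    for _ in range(n_iters):
        x = y - g(x)
    return x

x = rng.standard_normal(d)
y = f(x)
x_rec = f_inverse(y)
print(np.max(np.abs(x - x_rec)))  # close to machine precision
```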
Log-density. The change of variables can be applied to invertible residual networks:

    log p(x) = log p(f(x)) + tr( Σ_{k=1}^∞ (−1)^{k+1}/k · [J_g(x)]^k )    (6)

+ The trace can be efficiently estimated using the Skilling–Hutchinson estimator.
− The infinite sum can be estimated by truncating to a fixed n. However, this introduces bias equal to the remaining terms.

This results in the biased estimator:

    log p(x) ≈ log p(f(x)) + E_{v∼N(0,I)}[ Σ_{k=1}^n (−1)^{k+1}/k · v^T [J_g(x)]^k v ]    (7)

where Behrmann et al. (2019) chose n = 5, 10.
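A minimal sketch of the biased estimator (7): the log-determinant series is truncated at n terms and each trace is estimated with the Skilling–Hutchinson identity tr(A) = E_v[v^T A v]. The Jacobian is kept as an explicit matrix only so the result can be checked against an exact log-determinant; a real model would use Jacobian-vector products from automatic differentiation instead.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

# Toy Jacobian J_g of a contractive g (spectral norm < 1), kept explicit so the
# series can be compared with the exact log-determinant.
Jg = rng.standard_normal((d, d))
Jg = 0.5 * Jg / np.linalg.norm(Jg, 2)

def biased_logdet_estimate(Jg, n=10, n_samples=1000):
    # E_v[ sum_{k=1}^n (-1)^{k+1}/k * v^T Jg^k v ], with v ~ N(0, I).
    total = 0.0
    for _ in range(n_samples):
        v = rng.standard_normal(d)
        w = v.copy()
        est = 0.0
        for k in range(1, n + 1):
            w = Jg @ w                     # w = Jg^k v via repeated matvecs
            est += (-1) ** (k + 1) / k * (v @ w)
        total += est
    return total / n_samples

exact = np.linalg.slogdet(np.eye(d) + Jg)[1]   # log det(I + J_g)
print(exact, biased_logdet_estimate(Jg))
```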
Unbiased Log-density via "Russian Roulette" Estimator

Russian roulette estimator. To illustrate the idea, let ∆_k denote the k-th term of an infinite series, and suppose we always evaluate the first term and then flip a coin b ∼ Bernoulli(q) to determine whether we stop or continue evaluating the remaining terms. By reweighting the remaining terms by 1/(1−q), we obtain

    E_b[ ∆_1 + 1_{b=0} · (Σ_{k=2}^∞ ∆_k)/(1−q) ] = ∆_1 + (1−q)·(Σ_{k=2}^∞ ∆_k)/(1−q) + q·(0) = Σ_{k=1}^∞ ∆_k.

This unbiased estimator has probability q of being evaluated in finite time. We can obtain an estimator that is evaluated in finite time with probability one by applying this process infinitely many times to the remaining terms.

Residual Flows. Unbiased estimation of the log-density leads to our model:

    log p(x) = log p(f(x)) + E_{n,v}[ Σ_{k=1}^n (−1)^{k+1}/(k · P(N ≥ k)) · v^T [J_g(x)]^k v ]    (8)

where n ∼ p(N) and v ∼ N(0, I). We use a shifted geometric distribution for p(N) with an expected compute of 4 terms.
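The estimator in (8) combines the same power series with Russian roulette reweighting. The sketch below (illustrative, not the authors' code) draws n from a geometric distribution with support 1, 2, … as a stand-in for the shifted geometric mentioned above, and divides the k-th term by P(N ≥ k); averaging many single-sample estimates recovers the exact log-determinant.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4

# Toy contractive Jacobian, explicit only so the estimate can be checked.
Jg = rng.standard_normal((d, d))
Jg = 0.5 * Jg / np.linalg.norm(Jg, 2)

def unbiased_logdet_estimate(Jg, q=0.5):
    # Single sample of the Russian roulette / randomized-truncation estimator:
    # draw n from a geometric distribution with support 1, 2, ...,
    # then reweight term k by 1 / P(N >= k) = 1 / (1 - q)^(k - 1).
    n = rng.geometric(q)                 # P(N = n) = (1 - q)^(n - 1) * q
    v = rng.standard_normal(d)
    w = v.copy()
    est = 0.0
    for k in range(1, n + 1):
        w = Jg @ w                       # Jg^k v via repeated matvecs
        p_geq_k = (1.0 - q) ** (k - 1)   # P(N >= k)
        est += (-1) ** (k + 1) / k * (v @ w) / p_geq_k
    return est

exact = np.linalg.slogdet(np.eye(d) + Jg)[1]
samples = [unbiased_logdet_estimate(Jg) for _ in range(20000)]
print(exact, np.mean(samples))           # sample mean approaches the exact value
```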
Memory-efficient Gradient Estimation

Neumann gradient series. For estimating (1), we can either
(i) estimate log p(x), then take the gradient, or
(ii) take the gradient, then estimate ∇_θ log p(x).

The first option uses a variable amount of memory depending on the sample of n:

    ∂/∂θ log det(df(x)/dx) = E_{n,v}[ Σ_{k=1}^n (−1)^{k+1}/k · ∂(v^T [J_g(x,θ)]^k v)/∂θ ]    (9)

With the second option, using a Neumann series, we obtain a constant memory cost:

    ∂/∂θ log det(df(x)/dx) = E_{n,v}[ ( Σ_{k=0}^n (−1)^k/P(N ≥ k) · v^T [J_g(x,θ)]^k ) ∂J_g(x,θ)/∂θ · v ]    (10)

Backward-in-forward. Since log det(df(x)/dx) is a scalar quantity, we can compute its gradient early and free up memory. This reduces the amount of memory by a factor equal to the number of residual blocks, with negligible cost.

                       MNIST            CIFAR-10†        CIFAR-10
                       ELU    LipSwish  ELU    LipSwish  ELU    LipSwish  Relative
Naïve Backprop         92.0   192.1     33.3   66.4      120.2  263.5     100%
Neumann Gradient       13.4   31.2      5.5    11.3      17.6   40.8      15.7%
Backward-in-Forward    8.7    19.8      3.8    7.4       11.5   26.1      10.3%
Both Combined          4.9    13.6      3.0    5.9       6.6    18.0      7.1%

Table: Memory usage (GB) per minibatch of 64 samples when computing n = 10 terms in the corresponding power series. †Uses immediate downsampling before any residual blocks.
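The Neumann form (10) rests on the identity ∂/∂θ log det(I + J_g) = tr((I + J_g)^{−1} ∂J_g/∂θ) together with (I + J_g)^{−1} = Σ_{k≥0} (−1)^k J_g^k for contractive J_g. The sketch below checks this numerically with a hypothetical one-parameter Jacobian J_g(θ) = θA; a real implementation would form these terms with automatic differentiation rather than explicit matrices.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4

# Hypothetical parameterization: J_g(theta) = theta * A with a fixed contractive A,
# so dJ_g/dtheta = A. Real models use autodiff; this is only a numerical check.
A = rng.standard_normal((d, d))
A = 0.8 * A / np.linalg.norm(A, 2)

def logdet(theta):
    return np.linalg.slogdet(np.eye(d) + theta * A)[1]

def neumann_gradient(theta, n_terms=50):
    # d/dtheta log det(I + J_g) = tr( (I + J_g)^{-1} dJ_g/dtheta )
    #                           = tr( sum_{k>=0} (-1)^k J_g^k  dJ_g/dtheta ).
    Jg, dJg = theta * A, A
    acc = np.zeros((d, d))
    Jk = np.eye(d)                      # J_g^0
    for k in range(n_terms):
        acc += (-1) ** k * Jk
        Jk = Jk @ Jg
    return np.trace(acc @ dJg)

theta, eps = 0.7, 1e-6
fd = (logdet(theta + eps) - logdet(theta - eps)) / (2 * eps)
print(fd, neumann_gradient(theta))       # the two gradients should agree closely
```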
LipSwish Activation for Enforcing Lipschitz Constraint

We motivate smooth and non-monotonic Lipschitz activation functions. This avoids "gradient saturation", which occurs if the 2nd derivative asymptotically approaches zero when the 1st derivative is close to one.

    LipSwish(x) = Swish(x)/1.1 = x · σ(βx)/1.1
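A minimal implementation sketch of LipSwish; parameterizing β through a softplus to keep it positive is an assumption here, not necessarily the poster's exact choice.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lipswish(x, beta_raw=0.5):
    # LipSwish(x) = x * sigmoid(beta * x) / 1.1, with beta = softplus(beta_raw) > 0
    # (the softplus parameterization is an assumption). Swish has maximum derivative
    # bounded by about 1.1, so dividing by 1.1 keeps the activation 1-Lipschitz.
    beta = np.log1p(np.exp(beta_raw))
    return x * sigmoid(beta * x) / 1.1

x = np.linspace(-5, 5, 11)
print(lipswish(x))
```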
Density Modeling

We are competitive with existing state-of-the-art flow-based models on density estimation when using uniform dequantization.

Model                              MNIST   CIFAR-10      ImageNet 32×32   ImageNet 64×64
Real NVP (Dinh et al., 2017)       1.06    3.49          4.28             3.98
Glow (Kingma & Dhariwal, 2018)     1.05    3.35          4.09             3.81
FFJORD (Grathwohl et al., 2019)    0.99    3.40          —                —
Flow++ (Ho et al., 2019)           —       3.29 (3.09)   — (3.86)         — (3.69)
i-ResNet (Behrmann et al., 2019)   1.05    3.45          —                —
Residual Flow (Ours)               0.97    3.29          4.02             3.78

Table: Results [bits/dim] on standard benchmark datasets for density estimation. In brackets are models that used "variational dequantization", which we don't compare against.

Ablation Experiments

[Figure: Bits/dim on CIFAR-10 vs. training epoch, comparing the i-ResNet biased train estimate and actual test value with the Residual Flow unbiased train estimate and actual test value.]

Training Setting            MNIST   CIFAR-10†   CIFAR-10
i-ResNet + ELU              1.05    3.45        3.66∼4.78
Residual Flow + ELU         1.00    3.40        3.32
Residual Flow + LipSwish    0.97    3.39        3.29

Table: Ablation results. †Uses immediate downsampling before any residual blocks.

Qualitative Samples

[Figure: Qualitative samples. Real (left) and random samples (right) from a model trained on 5-bit 64×64 CelebA. The most visually appealing samples were picked out of 5 random batches.]

Hybrid Modeling

Residual blocks are better building blocks for hybrid models than coupling blocks. Trained using a weighted maximum likelihood objective similar to (Nalisnick et al., 2019):

    E_{(x,y)∼p_data}[ λ log p(x) + log p(y|x) ]    (11)

                          MNIST                                 SVHN
                          λ=0      λ=1/D           λ=1          λ=0      λ=1/D           λ=1
Block Type                Acc↑     BPD↓    Acc↑    BPD↓  Acc↑   Acc↑     BPD↓    Acc↑    BPD↓  Acc↑
(Nalisnick et al., 2019)  99.33%   1.26    97.78%  —     —      95.74%   2.40    94.77%  —     —
Coupling                  99.50%   1.18    98.45%  1.04  95.42% 96.27%   2.73    95.15%  2.21  46.22%
+ 1×1 Conv                99.56%   1.15    98.93%  1.03  94.22% 96.72%   2.61    95.49%  2.17  46.58%
Residual                  99.53%   1.01    99.46%  0.99  98.69% 96.72%   2.29    95.79%  2.06  58.52%

Table: Comparison of residual vs. coupling blocks for the hybrid modeling task.
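A minimal sketch of the weighted objective (11), assuming a model that exposes hypothetical log_prob_x and log_prob_y_given_x hooks returning per-example values; λ = 0 gives a pure classifier, λ = 1 weights both terms equally, and λ = 1/D rescales log p(x) by the input dimensionality D, matching the three settings in the table above.

```python
import numpy as np

def hybrid_objective(model, x, y, lam):
    # Weighted maximum likelihood objective (11):
    #   E_{(x, y)} [ lam * log p(x) + log p(y | x) ]
    # `model.log_prob_x` and `model.log_prob_y_given_x` are hypothetical hooks.
    log_px = model.log_prob_x(x)               # shape: (batch,)
    log_py_x = model.log_prob_y_given_x(x, y)  # shape: (batch,)
    return np.mean(lam * log_px + log_py_x)

# Stand-in model whose densities are fabricated purely for illustration.
class DummyHybridModel:
    def log_prob_x(self, x):
        return -0.5 * np.sum(x ** 2, axis=1)   # placeholder log-density

    def log_prob_y_given_x(self, x, y):
        return np.zeros(len(y))                # placeholder log-likelihood

x = np.random.default_rng(5).standard_normal((8, 16))
y = np.zeros(8, dtype=int)
print(hybrid_objective(DummyHybridModel(), x, y, lam=1.0 / x.shape[1]))
```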
References

• Behrmann et al. "Invertible Residual Networks." (2019)
• Kahn. "Use of Different Monte Carlo Sampling Techniques." (1955)
• Beatson & Adams. "Efficient Optimization of Loops and Limits with Randomized Telescoping Sums." (2019)
• Nalisnick et al. "Hybrid Models with Deep and Invertible Features." (2019)
