
Normalizing flows

David Nabergoj
University of Ljubljana, Faculty of Computer and Information Science
October 12, 2023, Ljubljana
Finding well-fitting distributions
● Let’s take some samples from a 1D data-generating process
● What distribution would be a good fit?

D = [... -0.98 0.95 -0.15 -0.10 0.41 0.14 1.45 …]


Finding well-fitting distributions
● Let’s look at a histogram
Finding well-fitting distributions
● A normal distribution would be a decent choice
Finding well-fitting distributions
● How do we find the best mean and variance for our normal?
● We search by maximizing the probability of our data.
Finding well-fitting distributions
● Plug in the log of the normal density…
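Written out, with data D = {x_1, …, x_n} modeled as N(μ, σ²), the objective we maximize is the log-likelihood:

```latex
\log p(D \mid \mu, \sigma^2)
  = \sum_{i=1}^{n} \log \mathcal{N}(x_i \mid \mu, \sigma^2)
  = \sum_{i=1}^{n} \left[ -\tfrac{1}{2}\log\left(2\pi\sigma^2\right) - \frac{(x_i - \mu)^2}{2\sigma^2} \right]
```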
Finding well-fitting distributions
● Now we can find the best parameters by gradient descent :)
Finding well-fitting distributions
● By repeating the update steps, we arrive at a good fit :)
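A minimal sketch of this fit in PyTorch (illustrative only; the data below are placeholder samples standing in for D):

```python
import torch

# Placeholder 1D samples standing in for the dataset D.
D = torch.randn(1000) * 0.8 + 0.1

# Learnable parameters: mean and log-std (the log keeps sigma positive).
mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([mu, log_sigma], lr=0.1)

for step in range(1000):
    optimizer.zero_grad()
    sigma = log_sigma.exp()
    # Average negative log-likelihood of D under N(mu, sigma^2);
    # minimizing it maximizes the probability of the data.
    nll = -torch.distributions.Normal(mu, sigma).log_prob(D).mean()
    nll.backward()
    optimizer.step()

print(mu.item(), log_sigma.exp().item())  # should end up close to the sample mean and std
```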
Increasing the difficulty
● Finding the last distribution was easy. We could have even guessed the
parameters.
● What about the next example?
Increasing the difficulty
● We could model each mode with a different Gaussian. Every sample comes
from one of the modes.
Increasing the difficulty
● It works! But we needed to be clever. And optimization is not as stable.
● But we can sample and compute the density of new points :)
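A sketch of the two-mode version as a two-component Gaussian mixture (the initial means, scales, and learning rate below are illustrative choices):

```python
import torch
from torch import distributions as dist

# Illustrative two-mode data.
D = torch.cat([torch.randn(500) - 2.0, torch.randn(500) + 2.0])

# Unconstrained parameters: mixture weights (logits), component means and log-scales.
logits = torch.zeros(2, requires_grad=True)
means = torch.tensor([-1.0, 1.0], requires_grad=True)
log_scales = torch.zeros(2, requires_grad=True)
optimizer = torch.optim.Adam([logits, means, log_scales], lr=0.05)

for step in range(1000):
    optimizer.zero_grad()
    mixture = dist.MixtureSameFamily(
        dist.Categorical(logits=logits),
        dist.Normal(means, log_scales.exp()),
    )
    loss = -mixture.log_prob(D).mean()   # maximize the probability of the data
    loss.backward()
    optimizer.step()

# Once fitted, we can sample new points and compute their density.
new_points = mixture.sample((5,))
densities = mixture.log_prob(new_points).exp()
```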
Why do we even want to fit distributions?
● To sample from them
● To compute the density of new points

● Sampling = generative modeling


● Density computation: we can see whether a point is
likely or not (outlier detection) and answer
probabilistic questions, as with a CDF
What happens in the real world?
● Distributions can be much more complicated than 1D Gaussians
○ Images
○ Text
○ Audio
○ Point clouds
○ Bayesian posteriors

● We have many more dimensions


○ A 100x100x3 image has 30 thousand dimensions
○ A text or image embedding has O(1000) dimensions
Shortcomings of specialized models
● Diffusion models make amazing images… but they cannot evaluate the exact density of a data point.
● Highly specialized methods can detect tough outliers… but we cannot
generate new data with them.

● What if we need to do both? Or what if we need to find the exact


distribution that generated our data?
Back to basics
● There is an elementary probabilistic theorem that states (simplified):

“A transformation of a random variable is still a


random variable”.

● Remember: we can treat random variables as distributions.


Back to basics

● Let’s look at the theorem again.

“A transformation of a random variable is still a random variable”.

● The transformation can be something complex; the starting random variable can be simple.
● The resulting random variable is where our complex data came from.

● Our data are just transformed samples of a simple distribution!

Back to basics

● All generative models use such transformations.


● So why can’t they compute the density?
● Because their transformations are generally not invertible.
● With an invertible transformation, we can apply this formula:
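Concretely, if z has a simple density p_Z and x = g(z) for an invertible map g, the change-of-variables formula gives the density of x:

```latex
p_X(x) = p_Z\!\left(g^{-1}(x)\right) \, \left| \det J_{g^{-1}}(x) \right|
```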
Introducing normalizing flows

● A normalizing flow is a distribution.


● It is a transformation of a simple base distribution with an invertible map.
● The invertible map makes flows different from other generative models.
Introducing normalizing flows

● To compute the log probability density:

● To generate data: sample from the base distribution, transform the sample
with the inverse map.
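In formulas, writing f for the invertible map from data space to the base distribution p_Z (the inverse of the generative map g above):

```latex
\log p_X(x) = \log p_Z\!\left(f(x)\right) + \log\left| \det J_f(x) \right|,
\qquad
z \sim p_Z, \quad x = f^{-1}(z)
```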
What does sampling look like?
The small question
● What base distribution do we use?

● One that lets us compute the density easily


● One that we can sample from easily
● A standard normal fits the bill :)

… by the way, it needs to have the same dimensionality as our data.


The BIG question
● How to define the invertible map?

● Need something expressive


● Need to compute the Jacobian easily
● Need to invert the map easily
The BIG answer
● How to define the invertible map?

● Need something expressive … use neural networks


● Need to compute the Jacobian easily … restrict the architecture
● Need to invert the map easily … restrict the architecture
Composing invertible maps
● Idea: a composition of invertible maps is invertible.
● We will have a sequence of invertible layers.

● We define one invertible layer, repeat it, and we’re done :)


What invertible functions do we know
● Shift: adding a number to an input is invertible.
● Shift and scale: scaling and then shifting is too.
● Permutation: shuffling the elements of a vector is invertible; we just have to
remember the permutation.

These maps are easy to invert and have easy-to-compute Jacobians.
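For example, an element-wise scale-and-shift with nonzero scales a_i gives, per dimension:

```latex
y_i = a_i x_i + b_i, \qquad x_i = \frac{y_i - b_i}{a_i}, \qquad \log\left|\det J\right| = \sum_i \log\left|a_i\right|
```

A permutation only reorders dimensions, so |det J| = 1.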


Where do we plug in neural networks?
● We can make a flow by stacking shifts, scales, and permutations.
● BUT it won’t be very expressive, as they are all linear.
● We want nonlinear invertible layers.

● If we are clever, we can make shift and scale “nonlinear!” 🤯


How?
Making nonlinear invertible maps
● Let x be our layer input
● Let y be our layer output

The genius idea:

● Split x into two disjoint parts (x1, x2)


● Split y the same way
● Keep y1 = x1
● Transform y2 = s * x2 + t
Making nonlinear invertible maps
The genius idea (cont’d):

● Split x into two disjoint parts (x1, x2)


● Split y the same way
● Keep y1 = x1
● Transform y2 = s * x2 + t
● Use a neural network to predict (s, t) from x1
Making nonlinear invertible maps
How to invert this?

● Receive y as input
● Split it into y1, y2
● Keep x1 = y1
● Take x1 and use it to predict (s, t)
● x2 = (y2 - t) / s
● Concatenate x1 and x2 into x :)
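A minimal sketch of such an affine coupling layer in PyTorch (the class name AffineCoupling and the small two-layer conditioner are illustrative choices, not a reference implementation):

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling layer: y1 = x1, y2 = s * x2 + t, with (s, t) predicted from x1."""

    def __init__(self, dim):
        super().__init__()
        self.d = dim // 2
        # Conditioner: predicts log(s) and t for the second half from the first half.
        self.conditioner = nn.Sequential(
            nn.Linear(self.d, 64), nn.ReLU(),
            nn.Linear(64, 2 * (dim - self.d)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        log_s, t = self.conditioner(x1).chunk(2, dim=-1)
        y2 = torch.exp(log_s) * x2 + t       # predicting log(s) keeps s > 0, so the map stays invertible
        log_det = log_s.sum(dim=-1)          # log|det J| = sum_i log(s_i), because the Jacobian is triangular
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        log_s, t = self.conditioner(y1).chunk(2, dim=-1)
        x2 = (y2 - t) * torch.exp(-log_s)    # x2 = (y2 - t) / s
        return torch.cat([y1, x2], dim=-1)
```

Stacking several of these layers gives a coupling flow (see below).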
Coupling flows
These maps are called coupling layers.

● They split a vector into two parts.


● They keep one part the same.
● They predict the parameters for the
transformer with a conditioner neural
network.
● They transform the other part with these
parameters.

Stacking coupling layers makes a coupling flow!


Coupling flows
Here:

● transformer = affine map,


● conditioner = feed-forward neural network.

In general:

● transformer = any invertible map,


● conditioner = any function.
Can we compute the log Jacobian determinant?
Yes.

Turns out the Jacobian is triangular.

Log det = sum of the log of diagonal elements.
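For the affine coupling layer, the diagonal of the Jacobian contains ones (for y1 = x1) and the scales s_i (for y2), so:

```latex
\log\left|\det J\right| = \sum_i \log\left|s_i\right|
```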


Small detail
The input x1 will never be transformed.

But:

● permutations are invertible,


● we can place them between coupling blocks.

This shuffles the dimensions and ensures that each one is eventually transformed.
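Continuing the earlier sketch (reusing the hypothetical AffineCoupling class), one simple choice of permutation is to reverse the dimension order between layers:

```python
import torch
import torch.nn as nn

class SimpleCouplingFlow(nn.Module):
    """Illustrative stack of affine coupling layers with a dimension flip between them."""

    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([AffineCoupling(dim) for _ in range(n_layers)])

    def forward(self, x):
        total_log_det = torch.zeros(x.shape[0], device=x.device)
        for layer in self.layers:
            x, log_det = layer(x)
            total_log_det = total_log_det + log_det
            # Reversing the order is a permutation: |det J| = 1, so it adds nothing to the
            # log det, but it lets the next layer transform the previously untouched half.
            x = x.flip(dims=[-1])
        return x, total_log_det
```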
Autoregressive flows
We can generalize coupling layers to autoregressive layers.

● Autoregressive layers transform inputs so that each output dimension is affected
only by the corresponding input dimension and the input dimensions preceding it.

● The transformer maps each input dimension to its output dimension using parameters
that the conditioner computes from the preceding input dimensions.


Autoregressive flows
Coupling layers are also autoregressive layers:

● Each output dimension of y1 is affected only by itself (valid).
● Each output dimension of y2 is affected only by itself and by x1, whose
dimensions all precede it (valid).
How to implement autoregressive flows
Theoretically, we could do it with coupling layers.

● Make many coupling layers


● Each layer’s conditioner takes one more input dimension than the previous one

But this is very slow.


How to implement autoregressive flows
Better strategy: a masked autoencoder:

● takes the entire vector x as input,
● outputs the transformer parameters for every dimension in a single pass.

The detail:

● certain weights are masked out (set to zero),
● this ensures each output dimension is affected only by the preceding input dimensions.
How to implement autoregressive flows
Using a masked autoencoder
as a conditioner network
results in a masked
autoregressive layer.
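A sketch of the masking idea on a single linear layer (the degree assignment below follows the usual recipe; real masked autoencoders apply such masks to every layer, and details vary between implementations):

```python
import torch
import torch.nn as nn

def autoregressive_mask(in_degrees, out_degrees):
    """Binary mask allowing a connection only where the output degree exceeds the input degree."""
    return (out_degrees[:, None] > in_degrees[None, :]).float()

dim = 5
in_degrees = torch.arange(1, dim + 1)    # degree of each input dimension: 1..dim
out_degrees = torch.arange(1, dim + 1)   # one output (parameter set) per dimension

mask = autoregressive_mask(in_degrees, out_degrees)   # (dim, dim), strictly lower-triangular

linear = nn.Linear(dim, dim)
masked_weight = linear.weight * mask     # zero out the forbidden connections
output = torch.randn(3, dim) @ masked_weight.t() + linear.bias
# output[:, i] now depends only on inputs 1..i-1; in practice the mask is applied at
# every forward pass, and a full masked autoencoder chains such masks through the
# hidden layers so the property holds for the whole network.
```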
By the way, training autoregressive flows is easy
● We want to maximize the probability of samples.
● Parameters of the distribution = parameters of the conditioner network.
● We proceed by gradient descent (or Adam, Adagrad, …).
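As a sketch, reusing the hypothetical SimpleCouplingFlow defined earlier, the training loop is ordinary maximum likelihood:

```python
import torch

# Reusing the illustrative SimpleCouplingFlow from above.
flow = SimpleCouplingFlow(dim=2)
base = torch.distributions.MultivariateNormal(torch.zeros(2), torch.eye(2))
optimizer = torch.optim.Adam(flow.parameters(), lr=1e-3)

data = torch.randn(1024, 2)                 # placeholder for a real dataset

for step in range(1000):
    optimizer.zero_grad()
    z, log_det = flow(data)                 # map data to the base space
    log_prob = base.log_prob(z) + log_det   # change-of-variables formula
    loss = -log_prob.mean()                 # maximize probability = minimize negative log-likelihood
    loss.backward()
    optimizer.step()
```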
So what are these architectures called?
● NICE: shift transformer, coupling conditioner
● Real NVP: affine transformer, coupling conditioner
● MAF: affine transformer, masked autoregressive conditioner
● IAF: affine transformer, masked autoregressive conditioner (reverse direction)
● LRS-NSF: linear rational spline transformer, either conditioner
● RQ-NSF: rational quadratic spline transformer, either conditioner
● SINF: rational quadratic spline transformer, coupling conditioner (also
includes some orthogonal transforms in between)
● UMNN-MAF: monotonic neural network transformer, either conditioner
● … some others as well
Can we see some examples?
You can use Glow to make images. Example: project two images into the space of
the base distribution, then draw a line in this base space. Points on the line
interpolate between images in original space.
Can we see some examples?
You can use SoftFlow to generate point clouds. Train it on point clouds of planes,
chairs, armchairs; then sample new kinds of these objects.
Can we see some examples?
You can model a variety of complex distributions with all normalizing flows.
Can we see some examples?
Normalizing flows are used in sampling from complex Bayesian posteriors (e.g.
molecular dynamics, cosmology, quantum chromodynamics).

Example: we take a molecule of Alanine Dipeptide and dissolve it in a solvent. We


want to model the distribution of dihedral angles (phi and psi). We use a
simulation, so we never actually have reference angles.

Reliably obtaining the distribution of these angles has


big implications for protein folding, ligand docking, etc.
Can we see some examples?
By combining a Markov Chain Monte Carlo method with a normalizing flow, we
obtain the final distribution of these angles.
Can we see some examples?
An example from cosmology is
modeling orbital binary systems using
gravitational wave measurements.

When two massive objects (e.g. black


holes or neutron stars) rotate in a
binary system and begin merging, they
send out ripples in space-time, called
gravitational waves.

We can measure gravitational waves


using specialized observatories.
Can we see some examples?
For a system of two black holes,
we can use gravitational wave
measurements to infer black hole
parameters (e.g. mass, spin, etc.).

Normalizing flows can be used


within Markov chain Monte Carlo to
obtain the distributions of these
parameters.
How can we try this out ourselves?
Packages:

● github.com/davidnabergoj/normalizing-flows (releasing soon, general)


● github.com/VincentStimper/normalizing-flows (general)
● github.com/bayesiains/nflows (general)
● TensorFlow Probability: https://www.tensorflow.org/probability (general)
● Distrax: https://github.com/google-deepmind/distrax (general)
● pocoMC: github.com/minaskar/pocomc (flows in cosmology)
● github.com/janosh/awesome-normalizing-flows (overview)
What does the code look like?
Could not be simpler.
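For example, with the nflows package listed above, a small masked autoregressive flow can be set up roughly as follows (sketched from the library's basic usage; exact module names may differ between versions):

```python
import torch
from nflows import transforms, distributions, flows

# A small masked autoregressive flow on 2D data.
transform = transforms.CompositeTransform([
    transforms.MaskedAffineAutoregressiveTransform(features=2, hidden_features=32),
    transforms.ReversePermutation(features=2),
    transforms.MaskedAffineAutoregressiveTransform(features=2, hidden_features=32),
])
base_distribution = distributions.StandardNormal(shape=[2])
flow = flows.Flow(transform=transform, distribution=base_distribution)

optimizer = torch.optim.Adam(flow.parameters(), lr=1e-3)
x = torch.randn(1024, 2)                     # placeholder dataset

for step in range(1000):
    optimizer.zero_grad()
    loss = -flow.log_prob(inputs=x).mean()   # maximize the probability of the data
    loss.backward()
    optimizer.step()

samples = flow.sample(100)                   # generate new data
```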
Where can I read more?
● Good review: Papamakarios et al. Normalizing Flows for Probabilistic
Modeling and Inference. 2021.
● Our coupling flow: Dinh et al. Density estimation using Real NVP. 2017.
● Our masked AR flow: Papamakarios et al. Masked Autoregressive Flow for
Density Estimation. 2018.
