Notes For Generative AI
Volodymyr Kuleshov
Cornell Tech
Lecture 8
Model families:
Autoregressive Models: $p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_{<i})$
Variational Autoencoders: $p_\theta(x) = \int p_\theta(x, z) \, dz$
Autoregressive models provide tractable likelihoods but no direct
mechanism for learning features
Variational autoencoders can learn feature representations (via latent
variables z) but have intractable marginal likelihoods
Key question: Can we design a latent variable model with tractable
likelihoods? Yes! Use normalizing flows.
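To see why invertibility buys tractable likelihoods, recall the change-of-variables formula $p_X(x) = p_Z(f_\theta^{-1}(x)) \left| \det \frac{\partial f_\theta^{-1}(x)}{\partial x} \right|$. Below is a minimal sketch (not from the slides; the constants a and b are illustrative) that evaluates an exact log-likelihood through a 1-D affine flow:

```python
import numpy as np
from scipy.stats import norm

# Change of variables for an invertible 1-D affine flow x = f(z) = a*z + b:
#   log p_X(x) = log p_Z(f^{-1}(x)) + log |d f^{-1} / dx|
# The constants a, b are illustrative, not from the lecture.
a, b = 2.0, -1.0
x = np.linspace(-5.0, 5.0, 7)

z = (x - b) / a                                 # inverse mapping
log_px = norm.logpdf(z) + np.log(1.0 / abs(a))  # prior term + log-det term

# Sanity check: an affine flow of a standard normal is exactly N(b, a^2).
assert np.allclose(log_px, norm.logpdf(x, loc=b, scale=abs(a)))
print(log_px)
```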
Normalizing Flow Models: Definition
In a normalizing flow model, the mapping between Z and X, given by $f_\theta : \mathbb{R}^n \to \mathbb{R}^n$, is deterministic and invertible, such that $X = f_\theta(Z)$ and $Z = f_\theta^{-1}(X)$.
$J = \dfrac{\partial f}{\partial z} = \begin{pmatrix} \frac{\partial f_1}{\partial z_1} & \cdots & \frac{\partial f_1}{\partial z_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial z_1} & \cdots & \frac{\partial f_n}{\partial z_n} \end{pmatrix}$
For a NICE additive coupling layer, $x_{1:d} = z_{1:d}$ and $x_{d+1:n} = z_{d+1:n} + m_\theta(z_{1:d})$, where $m_\theta$ is a neural network; the Jacobian is lower triangular with ones on the diagonal, so
$\det(J) = 1$
Observe that:
We have a volume-preserving transformation, since the determinant is 1.
The inverse mapping $z_{d+1:n} = x_{d+1:n} - m_\theta(x_{1:d})$ can be computed for any $m_\theta$.
The determinant is independent of $m_\theta$, hence we can use any function!
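To make these observations concrete, here is a minimal NumPy sketch of an additive coupling layer; the particular two-layer network m_theta is an illustrative assumption, not the lecture's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch of a NICE-style additive coupling layer.
# The two-layer network m_theta is an illustrative stand-in.
d, n = 2, 4
W1, b1 = rng.normal(size=(8, d)), np.zeros(8)
W2, b2 = rng.normal(size=(n - d, 8)), np.zeros(n - d)

def m_theta(h):
    return W2 @ np.tanh(W1 @ h + b1) + b2

def forward(z):
    # x_{1:d} = z_{1:d};  x_{d+1:n} = z_{d+1:n} + m_theta(z_{1:d})
    return np.concatenate([z[:d], z[d:] + m_theta(z[:d])])

def inverse(x):
    # Invertible for ANY m_theta: subtract the same shift back.
    return np.concatenate([x[:d], x[d:] - m_theta(x[:d])])

z = rng.normal(size=n)
assert np.allclose(inverse(forward(z)), z)  # exact inverse
# log |det J| = 0: the transformation is volume preserving.
```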
NICE: Rescaling Layers
Rescaling layers in NICE are defined as follows:
Forward mapping $z \mapsto x$:
$x_i = s_i z_i$
where $s_i > 0$ is the scaling factor for the $i$-th dimension.
Inverse mapping $x \mapsto z$:
$z_i = x_i / s_i$
The Jacobian is $J = \operatorname{diag}(s)$, so
$\det(J) = \prod_{i=1}^{n} s_i$
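A rescaling layer is small enough to sketch in a few lines; the scaling factors below are arbitrary illustrative values.

```python
import numpy as np

# Minimal sketch of a NICE rescaling layer (element-wise, s_i > 0).
s = np.array([0.5, 2.0, 1.5])   # illustrative scaling factors

forward = lambda z: s * z        # x_i = s_i * z_i
inverse = lambda x: x / s        # z_i = x_i / s_i
log_det = np.sum(np.log(s))      # log det(J) = sum_i log s_i

z = np.array([1.0, -2.0, 0.3])
assert np.allclose(inverse(forward(z)), z)
assert np.isclose(np.log(np.prod(s)), log_det)
```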
In Real NVP, where the transformed partition is rescaled by $\exp(\alpha_\theta(z_{1:d}))$, the determinant is no longer 1:
$\det(J) = \prod_{i=d+1}^{n} \exp\left(\alpha_\theta(z_{1:d})_i\right) = \exp\left(\sum_{i=d+1}^{n} \alpha_\theta(z_{1:d})_i\right)$
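A minimal sketch of the corresponding affine coupling transformation, assuming illustrative stand-in networks alpha_theta and mu_theta (the tanh keeps the log-scales bounded; this is a choice for the sketch, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch of a Real NVP affine coupling layer; the networks
# alpha_theta and mu_theta are illustrative stand-ins.
d, n = 2, 4
Wa, Wm = rng.normal(size=(n - d, d)), rng.normal(size=(n - d, d))

alpha_theta = lambda h: np.tanh(Wa @ h)  # log-scales, kept bounded
mu_theta = lambda h: Wm @ h              # shifts

def forward(z):
    # x_{1:d} = z_{1:d};  x_{d+1:n} = z_{d+1:n} * exp(alpha) + mu
    a, m = alpha_theta(z[:d]), mu_theta(z[:d])
    return np.concatenate([z[:d], z[d:] * np.exp(a) + m])

def inverse(x):
    a, m = alpha_theta(x[:d]), mu_theta(x[:d])
    return np.concatenate([x[:d], (x[d:] - m) * np.exp(-a)])

def log_det(z):
    # log det(J) = sum_{i=d+1}^{n} alpha_theta(z_{1:d})_i
    return np.sum(alpha_theta(z[:d]))

z = rng.normal(size=n)
assert np.allclose(inverse(forward(z)), z)
```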
Using four validation examples $z^{(1)}, z^{(2)}, z^{(3)}, z^{(4)}$, define the interpolated $z$ as:
$z = \cos\varphi \left(\cos\varphi' \, z^{(1)} + \sin\varphi' \, z^{(2)}\right) + \sin\varphi \left(\cos\varphi' \, z^{(3)} + \sin\varphi' \, z^{(4)}\right)$
with the manifold parameterized by $\varphi$ and $\varphi'$ (as in the Real NVP paper).
Consider a Gaussian autoregressive model $p(x) = \prod_{i=1}^{n} p(x_i \mid x_{<i})$ such that $p(x_i \mid x_{<i}) = \mathcal{N}\left(\mu_i(x_1, \cdots, x_{i-1}), \exp(\alpha_i(x_1, \cdots, x_{i-1}))^2\right)$.
$\mu_i(\cdot)$ and $\alpha_i(\cdot)$ are neural networks for $i > 1$ and constants for $i = 1$.
Consider a sampler for this model:
Sample $z_i \sim \mathcal{N}(0, 1)$ for $i = 1, \cdots, n$
Let $x_1 = \exp(\alpha_1) z_1 + \mu_1$. Compute $\mu_2(x_1), \alpha_2(x_1)$
Let $x_2 = \exp(\alpha_2) z_2 + \mu_2$. Compute $\mu_3(x_1, x_2), \alpha_3(x_1, x_2)$
Let $x_3 = \exp(\alpha_3) z_3 + \mu_3$. ...
This defines an invertible transformation from z to x. Hence, this
type of autoregressive model can be interpreted as a flow!
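The sampler above can be sketched directly; mu_i and alpha_i below are illustrative affine stand-ins for the lecture's neural networks. Note how sampling is sequential (one evaluation per dimension), while recovering z from x parallelizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch of the sequential sampler for a Gaussian autoregressive
# model. mu_i and alpha_i are illustrative affine functions of x_{<i}.
n = 5
w_mu, w_alpha = rng.normal(size=(n, n)) * 0.1, rng.normal(size=(n, n)) * 0.1

def mu(i, x):    return float(w_mu[i, :i] @ x[:i])     # 0 for i = 0
def alpha(i, x): return float(w_alpha[i, :i] @ x[:i])

z = rng.normal(size=n)      # z_i ~ N(0, 1)
x = np.zeros(n)
for i in range(n):          # one network evaluation per dimension
    x[i] = np.exp(alpha(i, x)) * z[i] + mu(i, x)

# The inverse direction recovers z in a single parallel pass, since
# mu_i and alpha_i depend only on the observed x_{<i}.
z_rec = (x - np.array([mu(i, x) for i in range(n)])) * \
        np.exp(-np.array([alpha(i, x) for i in range(n)]))
assert np.allclose(z_rec, z)
```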
Figure: Inverse pass of MAF (left) vs. Forward pass of IAF (right)
Note that NICE and Real NVP are special cases of the IAF framework. However, their scale and shift statistics can be computed in a single pass, because they are a function of the partition that is not being transformed. Therefore, both sampling and posterior inference are fast.
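A toy way to see the difference is to count conditioner evaluations: a coupling layer transforms its whole partition with one call, whereas autoregressive sampling makes one call per dimension. Everything below (the conditioner, the toy shift mu_i) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
calls = {"coupling": 0, "autoregressive": 0}

d, n = 2, 4
W = rng.normal(size=(n - d, d)) * 0.1

def conditioner(h):
    # Coupling layer: shift for x_{d+1:n}, computed from z_{1:d} only.
    calls["coupling"] += 1
    return W @ h

def ar_shift(i, x):
    # Autoregressive model: toy shift mu_i as a function of x_{<i}.
    calls["autoregressive"] += 1
    return float(x[:i].sum())

# Coupling-layer sampling: ONE call transforms the whole partition.
z = rng.normal(size=n)
x = np.concatenate([z[:d], z[d:] + conditioner(z[:d])])

# Autoregressive sampling: one call per dimension, strictly sequential.
z_ar, x_ar = rng.normal(size=n), np.zeros(n)
for i in range(n):
    x_ar[i] = z_ar[i] + ar_shift(i, x_ar)

print(calls)  # {'coupling': 1, 'autoregressive': 4}
```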