8 Generative AI
Huaxia Rui
Agenda
Generative Pre-trained Transformer (GPT)
Variational Autoencoder (VAE)
Generative Adversarial Network (GAN)
Diffusion
Multimodal Learning
Generative Pre-trained Transformer (GPT)
Image Generation
The key idea of image generation is to learn a low-dimensional vector space
where any point represents a “valid” image — one resembling a real thing.
Autoencoders
By reconstructing their input, autoencoders learn a low-dimensional latent
space of "valid" images. Note that an overly powerful encoder may simply learn
to map each input to an arbitrary number, with the decoder acting as its inverse.
(PCA is an autoencoder with MSE loss and linear activations.)
from tensorflow.keras.layers import Dense, Flatten, Reshape
from tensorflow.keras.models import Sequential

# encoder: compress a 28x28 image into a 30-dimensional latent vector
encoder = Sequential([
    Flatten(input_shape=[28, 28]),
    Dense(100, activation="selu"),
    Dense(30, activation="selu")])

# decoder: reconstruct the 28x28 image from the latent vector
decoder = Sequential([
    Dense(100, activation="selu", input_shape=[30]),
    Dense(28 * 28, activation="sigmoid"),
    Reshape([28, 28])])

ae = Sequential([encoder, decoder])
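A hedged usage sketch: train the autoencoder to reconstruct its own input
under MSE loss (cf. the PCA remark above). The MNIST dataset and the optimizer
settings are assumptions for illustration, not from the slides:

from tensorflow.keras.datasets import mnist

(x_train, _), _ = mnist.load_data()          # assumed dataset
x_train = x_train.astype("float32") / 255.0  # scale pixels to [0, 1]

ae.compile(optimizer="adam", loss="mse")     # reconstruction objective
ae.fit(x_train, x_train, epochs=5, batch_size=128)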
Variational Autoencoder (VAE)
Generative Adversarial Network (GAN)
from tensorflow.keras.layers import (Conv2D, Conv2DTranspose, Dense,
                                     Dropout, Flatten, Input, LeakyReLU,
                                     Reshape)
from tensorflow.keras.models import Sequential

d_latent = 128  # latent dimension (set to 128 below)

# discriminator: downsample a 64x64 RGB image to a real/fake probability
discriminator = Sequential([
    Input(shape=(64, 64, 3)),
    Conv2D(64, kernel_size=4, strides=2, padding="same"),
    LeakyReLU(alpha=0.2),
    Conv2D(128, kernel_size=4, strides=2, padding="same"),
    LeakyReLU(alpha=0.2),
    Conv2D(128, kernel_size=4, strides=2, padding="same"),
    LeakyReLU(alpha=0.2),
    Flatten(), Dropout(0.2), Dense(1, activation="sigmoid")])

# generator: upsample a latent vector into a 64x64 RGB image (8 -> 16 -> 32 -> 64)
generator = Sequential([
    Input(shape=(d_latent,)),
    Dense(8 * 8 * 128),
    Reshape((8, 8, 128)),
    Conv2DTranspose(128, kernel_size=4, strides=2, padding="same"),
    LeakyReLU(alpha=0.2),
    Conv2DTranspose(256, kernel_size=4, strides=2, padding="same"),
    LeakyReLU(alpha=0.2),
    Conv2DTranspose(512, kernel_size=4, strides=2, padding="same"),
    LeakyReLU(alpha=0.2),
    Conv2D(3, kernel_size=5, padding="same", activation="sigmoid")])
Generated images are labeled 1 and real images 0. Let the latent dimension be
128, the batch size be 32, and the dataset of real images be real_img.
import tensorflow as tf
from tensorflow import keras

class GAN(keras.Model):
    def __init__(self, discriminator, generator, d_latent):
        super().__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.d_latent = d_latent
        self.d_loss_metric = keras.metrics.Mean(name="d_loss")
        self.g_loss_metric = keras.metrics.Mean(name="g_loss")

    def compile(self, d_optimizer, g_optimizer, loss_fn):
        super().compile()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn

    # the @property decorator implements a getter, so Keras can find
    # and reset these metrics at the start of each epoch
    @property
    def metrics(self):
        return [self.d_loss_metric, self.g_loss_metric]
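The train_step is not on the extracted slides; below is a minimal sketch in
the style of the standard Keras DCGAN example, continuing the GAN class above
(label-smoothing tricks from the full example are omitted):

    def train_step(self, real_img):
        batch_size = tf.shape(real_img)[0]
        # train the discriminator: generated images labeled 1, real ones 0
        z = tf.random.normal(shape=(batch_size, self.d_latent))
        fake_img = self.generator(z)
        images = tf.concat([fake_img, real_img], axis=0)
        labels = tf.concat([tf.ones((batch_size, 1)),
                            tf.zeros((batch_size, 1))], axis=0)
        with tf.GradientTape() as tape:
            d_loss = self.loss_fn(labels, self.discriminator(images))
        grads = tape.gradient(d_loss, self.discriminator.trainable_weights)
        self.d_optimizer.apply_gradients(
            zip(grads, self.discriminator.trainable_weights))
        # train the generator: make the discriminator output 0 ("real")
        # on freshly generated images
        z = tf.random.normal(shape=(batch_size, self.d_latent))
        with tf.GradientTape() as tape:
            preds = self.discriminator(self.generator(z))
            g_loss = self.loss_fn(tf.zeros((batch_size, 1)), preds)
        grads = tape.gradient(g_loss, self.generator.trainable_weights)
        self.g_optimizer.apply_gradients(
            zip(grads, self.generator.trainable_weights))
        self.d_loss_metric.update_state(d_loss)
        self.g_loss_metric.update_state(g_loss)
        return {m.name: m.result() for m in self.metrics}

A usage sketch with the slide's hyperparameters (the optimizer settings are
assumptions):

gan = GAN(discriminator, generator, d_latent=128)
gan.compile(d_optimizer=keras.optimizers.Adam(1e-4),
            g_optimizer=keras.optimizers.Adam(1e-4),
            loss_fn=keras.losses.BinaryCrossentropy())
gan.fit(real_img, batch_size=32, epochs=10)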
The WGAN critic outputs a score in (−∞, ∞) and tries to maximize the gap
between the scores it assigns to real images and to fake images. The
Wasserstein loss can grow arbitrarily large, which destabilizes neural network
training. To counter this, we require the critic D to be a 1-Lipschitz
continuous function:

|D(x1) − D(x2)| ≤ ‖x1 − x2‖ for any two images x1, x2.
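A hedged sketch of the Wasserstein losses, using weight clipping as in the
original WGAN paper to (crudely) enforce the Lipschitz constraint; the clip
value c = 0.01 is the paper's default, and a gradient penalty (WGAN-GP) is the
more common modern alternative:

import tensorflow as tf

# critic: maximize mean score on real minus mean score on fake,
# i.e., minimize the negated difference
def critic_loss(real_scores, fake_scores):
    return tf.reduce_mean(fake_scores) - tf.reduce_mean(real_scores)

# generator: maximize the critic's score on generated images
def generator_loss(fake_scores):
    return -tf.reduce_mean(fake_scores)

# after each critic update, clip every weight to [-c, c] so the
# critic stays (approximately) 1-Lipschitz
def clip_critic_weights(critic, c=0.01):
    for w in critic.trainable_weights:
        w.assign(tf.clip_by_value(w, -c, c))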
Diffusion
If we normalize the original image x0 to have zero mean and unit variance, xT
will approximate a standard Gaussian distribution N(0, I) for large enough T.
Each forward step mixes the previous image with fresh Gaussian noise
εt−1 ∼ N(0, I):

xt ≡ √αt · xt−1 + √(1 − αt) · εt−1

Given a diffusion schedule {ᾱt}, where ᾱt ≡ α1 α2 · · · αt, we can jump from x0
to any step of the forward diffusion process:

xt ∼ N(√ᾱt · x0, (1 − ᾱt) I) ≡ q(xt | x0)
The cosine schedule sets

ᾱt = cos²(t/T · π/2),    1 − ᾱt = sin²(t/T · π/2),

so the signal rate is √ᾱt = cos(t/T · π/2) and the noise rate is
√(1 − ᾱt) = sin(t/T · π/2):
import math
import tensorflow as tf

def cosine_diffusion_schedule(diffusion_times):
    # diffusion_times in [0, 1]; signal rate = sqrt(alpha_bar)
    signal_rates = tf.cos(diffusion_times * math.pi / 2)
    noise_rates = tf.sin(diffusion_times * math.pi / 2)
    return noise_rates, signal_rates

def linear_diffusion_schedule(diffusion_times):
    # per-step alphas fall linearly from 0.9999 to 0.98
    alphas = 0.9999 - diffusion_times * 0.0199
    alpha_bars = tf.math.cumprod(alphas)
    # same return order as the cosine schedule: (noise, signal)
    return tf.sqrt(1 - alpha_bars), tf.sqrt(alpha_bars)
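Since cos² + sin² = 1, the signal and noise rates preserve unit variance at
every diffusion time. A quick sanity check of the cosine schedule:

t = tf.linspace(0.0, 1.0, 5)
noise_rates, signal_rates = cosine_diffusion_schedule(t)
print(signal_rates ** 2 + noise_rates ** 2)  # ≈ 1 everywhere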
Unlike the VAE encoder, the forward diffusion process has no trainable
parameters; like the VAE decoder, the reverse diffusion process aims to
transform random input into meaningful output using a network.
def train_step(self, images):
    # sample noise and diffusion times, then build the noisy inputs
    # (these opening lines are reconstructed; the slide begins at the tape)
    noises = tf.random.normal(shape=tf.shape(images))
    diffusion_times = tf.random.uniform(
        shape=(tf.shape(images)[0], 1, 1, 1))
    noise_rates, signal_rates = cosine_diffusion_schedule(diffusion_times)
    noisy_images = signal_rates * images + noise_rates * noises

    with tf.GradientTape() as tape:
        # denoise() passes the noisy images through the network
        pred_noises, pred_images = self.denoise(
            noisy_images, noise_rates, signal_rates, training=True)
        # train against the true noise, not the image itself
        noise_loss = self.loss(noises, pred_noises)
    gradients = tape.gradient(noise_loss, self.network.trainable_weights)
    self.optimizer.apply_gradients(
        zip(gradients, self.network.trainable_weights))
    self.noise_loss_tracker.update_state(noise_loss)

    # the diffusion model maintains 2 networks; the exponential moving
    # average is more robust for generation than the actively trained one
    for w, ema_w in zip(self.network.weights, self.ema_network.weights):
        ema_w.assign(EMA * ema_w + (1 - EMA) * w)
    return {m.name: m.result() for m in self.metrics}
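The generation loop is not on the extracted slides; below is a hedged,
DDIM-style sketch that walks pure noise back to an image using the same
denoise() and schedule, run with training=False (in practice on the EMA
network):

import tensorflow as tf

def reverse_diffusion(model, initial_noise, diffusion_steps):
    # walk from pure noise at t = 1 back toward t = 0
    step_size = 1.0 / diffusion_steps
    num_images = initial_noise.shape[0]
    current_images = initial_noise
    for step in range(diffusion_steps):
        diffusion_times = tf.ones((num_images, 1, 1, 1)) - step * step_size
        noise_rates, signal_rates = cosine_diffusion_schedule(diffusion_times)
        # predict the noise and the implied clean image
        pred_noises, pred_images = model.denoise(
            current_images, noise_rates, signal_rates, training=False)
        # re-noise the predicted image to the next, smaller diffusion time
        next_times = diffusion_times - step_size
        next_noise, next_signal = cosine_diffusion_schedule(next_times)
        current_images = next_signal * pred_images + next_noise * pred_noises
    return pred_images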
Multimodal Learning
CLIP
CLIP, or Contrastive Language-Image Pre-training (OpenAI, 2021), is a "neural
network that efficiently learns visual concepts from natural language
supervision." It is trained on 400 million text-image pairs by maximizing the
cosine similarity between the embeddings of matching text-image pairs and
minimizing it for mismatched pairs.
Both the text encoder and the image encoder are Transformers; the Vision
Transformer (ViT) applies a Transformer encoder to a sequence of
nonoverlapping image patches with positional embeddings.
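A hedged sketch of CLIP's symmetric contrastive objective for a batch of n
matching (image, text) embedding pairs; the names img_emb and txt_emb and the
temperature value are illustrative assumptions:

import tensorflow as tf

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # cosine similarity = dot product of L2-normalized embeddings
    img_emb = tf.math.l2_normalize(img_emb, axis=1)
    txt_emb = tf.math.l2_normalize(txt_emb, axis=1)
    logits = tf.matmul(img_emb, txt_emb, transpose_b=True) / temperature
    # matching pairs sit on the diagonal of the n x n similarity matrix
    labels = tf.range(tf.shape(logits)[0])
    loss_i = tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)                 # image -> text
    loss_t = tf.keras.losses.sparse_categorical_crossentropy(
        labels, tf.transpose(logits), from_logits=True)   # text -> image
    return (tf.reduce_mean(loss_i) + tf.reduce_mean(loss_t)) / 2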
DALL·E 2
The CLIP training process learns a joint representation space for text and
images; the text-to-image generation process then runs in two stages. (In the
original figure, CLIP training is depicted above a dotted line and generation
below it.)
1. A CLIP text embedding is first fed to a diffusion prior to produce an
image embedding.
2. The image embedding is used to condition a diffusion decoder, which
produces the final image.
The CLIP model is frozen during training of the prior and the decoder.
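A hedged sketch of the two-stage pipeline; the three components are passed in
as assumed pre-trained models (the names are illustrative, not OpenAI's API):

def dalle2_generate(caption, clip_text_encoder, diffusion_prior,
                    diffusion_decoder):
    txt_emb = clip_text_encoder(caption)  # frozen CLIP text encoder
    img_emb = diffusion_prior(txt_emb)    # stage 1: text emb -> image emb
    return diffusion_decoder(img_emb)     # stage 2: image emb -> pixels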
Stable Diffusion
Stable Diffusion (Stability AI, 2022) wraps the diffusion model within an
autoencoder, so the diffusion process operates on a latent space of the image
rather than the pixel space.
Compared with U-Net models that operate in pixel space, the denoising U-Net
of Stable Diffusion is lighter.
The autoencoder does the heavy lifting of encoding image details into
latent space and decoding the latent space back to the pixel space.
The diffusion model works purely in a latent, conceptual space.
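A hedged sketch of the latent-diffusion idea (not Stability AI's actual code):
the reverse diffusion loop runs on low-dimensional latents, and the
heavyweight decoder is applied once at the end. decoder and denoise_step are
assumed pre-trained components:

import tensorflow as tf

def latent_diffusion_generate(decoder, denoise_step, latent_shape, steps=50):
    # start from pure Gaussian noise in the latent space
    z = tf.random.normal(shape=(1, *latent_shape))
    for t in reversed(range(steps)):
        z = denoise_step(z, t)  # reverse diffusion on latents (cheap)
    return decoder(z)           # one decoder pass: latents -> pixels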