Lecture # 5-2 PixelCNN

The document presents a lecture on PixelRNN and PixelCNN, focusing on their architectures and functionalities as autoregressive models for generating images. It discusses the components of PixelRNN, including LSTM layers and residual connections, and describes how PixelCNN utilizes convolutional layers with masked convolutions to maintain spatial resolution. The lecture also covers the evaluation of these models using datasets like CIFAR-10 and MNIST.

National University of Computer and Emerging Sciences

PixelRNN and PixelCNN

AI-4009 Generative AI

Dr. Akhtar Jamil


Department of Computer Science

04/23/2025 Presented by Dr. AKHTAR JAMIL 1


Goals
• Review of Previous Lecture
• Today’s Lecture
– Pixel Recurrent Neural Networks (PixelRNN)
– Pixel Convolutional Neural Networks (PixelCNN)



Review of Previous Lecture



Autoregressive models
• Popular examples of autoregressive models: PixelRNN, PixelCNN, WaveNet, etc.
• Causal convolutions allow the model to generate output samples one timestep at a time, with each sample depending only on previous samples, mimicking how sequential data is produced in the real world.
• The advantage of causal convolutions is that they can be used in real-time applications:
– No need for the entire sequence in advance.
• They can process data as it arrives, making predictions for the next time step using only currently available information.
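The causal dependency described above can be sketched as a minimal 1-D causal convolution in numpy (an illustrative sketch, not code from the lecture; the function name causal_conv1d is ours):

```python
import numpy as np

def causal_conv1d(x, w):
    """1-D causal convolution: y[t] depends only on x[0..t].

    Equivalent to left-padding the sequence with k-1 zeros, so no
    future sample can leak into the current output.
    """
    k = len(w)
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            if t - i >= 0:          # indices t-i < 0 are the zero padding
                y[t] += w[i] * x[t - i]
    return y
```

Because y[t] never reads x[t+1:], such a layer can run in real time, updating its prediction as each new sample arrives.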



Autoregressive models

(Figure: Steffen et al., 2015)



Autoregressive models

The first CausalConv1D layer is of type A; the next two layers use CausalConv1D of type B.
Dilated convolutions
• Dilated convolutions are also known as atrous convolutions.
• A variation of the standard convolution operation.
• They introduce an additional parameter, the dilation rate, which defines the spacing between the kernel's elements.
• They can capture information over larger contexts without losing resolution or significantly increasing computational cost.
• A dilation rate of 1 corresponds to standard convolution.
• As the dilation rate increases, the kernel elements are spread further apart.
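As a sketch of the idea, a causal convolution can be generalized with a dilation rate (illustrative numpy code; dilated_causal_conv1d is our own name):

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation=1):
    """Causal convolution with dilation-1 skipped positions between
    kernel taps. dilation=1 is the standard causal convolution; larger
    rates widen the receptive field to 1 + (k-1)*dilation while the
    number of weights, and hence the cost, stays the same.
    """
    k = len(w)
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            j = t - i * dilation    # tap positions spread out by the rate
            if j >= 0:
                y[t] += w[i] * x[j]
    return y
```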
PixelRNN
• PixelRNN components:
– Twelve fast two-dimensional Long Short-Term Memory (LSTM) layers.
• Two types of these layers are designed:
– Row LSTM layer, where the convolution is applied along each row.
– Diagonal BiLSTM layer, where the convolution is applied in a novel fashion along the diagonals of the image.
• The networks also incorporate residual connections around the LSTM layers:
– These help with training PixelRNNs of up to twelve layers of depth.



PixelCNN
• A Convolutional Neural Network (CNN) is used as a sequence model with a fixed dependency range.
– This requires masked convolutions.
• The PixelCNN architecture is a fully convolutional network of
fifteen layers that preserves the spatial resolution of its input
throughout the layers and outputs a conditional distribution at
each location.



Model
• The network scans the image
one row at a time and one pixel
at a time within each row.
• For each pixel it predicts the
conditional distribution over the
possible pixel values given the
scanned context.
• The joint distribution over the
image pixels is factorized into a
product of conditional
distributions.
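The row-by-row scan can be sketched as a sampling loop; here cond_logits is a hypothetical stand-in for the trained network, which must return 256 logits for position (r, c) given the partially generated image:

```python
import numpy as np

def generate_image(cond_logits, n, rng=None):
    """Raster-scan generation: pixels are sampled one at a time, each
    conditioned only on the already-generated context above and to
    the left. (Sketch; cond_logits is a placeholder for the model.)
    """
    if rng is None:
        rng = np.random.default_rng(0)
    img = np.zeros((n, n), dtype=np.int64)
    for r in range(n):
        for c in range(n):
            logits = cond_logits(img, r, c)
            p = np.exp(logits - logits.max())   # stable softmax
            p /= p.sum()
            img[r, c] = rng.choice(256, p=p)    # draw the pixel value
    return img
```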
Generating an Image Pixel by Pixel
• The goal is to assign a probability p(x) to each image x formed of n × n pixels.
• Treat the image as a one-dimensional sequence of pixels x_1, …, x_{n²}.
• Pixels are taken from the image row by row.
• To estimate the joint distribution p(x), we write it as the product of the conditional distributions over the pixels:

p(x) = ∏_{i=1..n²} p(x_i | x_1, …, x_{i−1})


Generating an Image Pixel by Pixel
• For a color image, each pixel x_i is jointly determined by three values,
– one for each of the color channels: Red, Green and Blue (RGB).
• The distribution p(x_i | x_<i) is rewritten as the following product:

p(x_i | x_<i) = p(x_{i,R} | x_<i) · p(x_{i,G} | x_<i, x_{i,R}) · p(x_{i,B} | x_<i, x_{i,R}, x_{i,G})

• Each of the color values is conditioned on the other channels as well as on all the previously generated pixels.
• During training and evaluation the distributions over the pixel values are computed in parallel, while the generation of an image is sequential.
Pixels as Discrete Variables
• Previous approaches use a continuous distribution for the values of the pixels in the image.
• PixelRNN models p(x) as a discrete distribution:
– Every conditional distribution is a multinomial modeled with a softmax layer.
• Each channel variable x_i simply takes one of 256 distinct values.
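A 256-way softmax over discrete pixel values gives a per-pixel negative log-likelihood; a minimal sketch (our own helper, not from the lecture):

```python
import numpy as np

def pixel_nll(logits, target):
    """NLL of one discrete pixel value under a 256-way softmax.

    logits has shape (256,); target is an integer in [0, 255].
    With uniform logits the NLL is exactly log(256).
    """
    z = logits - logits.max()                  # numerical stability
    log_probs = z - np.log(np.exp(z).sum())    # log-softmax
    return -log_probs[target]
```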
Today’s Lecture



Architectural Components of PixelRNN
• Two types of LSTM layers that use convolutions to compute at
once the states along one of the spatial dimensions.
• Incorporate residual connections to improve the training of a
PixelRNN with many LSTM layers.
• Softmax layer that computes the discrete joint distribution of the
colors and the masking technique that ensures the proper
conditioning scheme.



LSTM Layers
• Row LSTM
• It is a unidirectional layer that processes the image row by row, from top to bottom, computing features for a whole row at once.
• The computation is performed with a one-dimensional convolution.
• For a pixel x_i the layer captures a roughly triangular context above the pixel, as shown in the figure.



LSTM Layers
• The kernel of the one-dimensional convolution has size k × 1
where k ≥ 3;
• The larger the value of k the broader the context that is captured.

[Figure: LSTM cell, with the cell state carried across steps.]
LSTM Layers
• Forget Gate: Decides what portions of the cell state should be
erased.
• Input Gate: Selects which values in the input should update the
cell state.
• Output Gate: Determines which parts of the cell state are passed
to the output.
• Cell State: Holds the LSTM unit's long-term memory across
sequence processing steps.
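The four gates above can be written as one numpy step (a generic LSTM sketch with dense weights, not the convolutional form used inside PixelRNN):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W (4h x d) and U (4h x h) map the input and the
    previous hidden state to a stacked pre-activation; its four slices
    are the input, forget, output and content gates.
    """
    h = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0*h:1*h])        # input gate: what to write
    f = sigmoid(z[1*h:2*h])        # forget gate: what to erase
    o = sigmoid(z[2*h:3*h])        # output gate: what to expose
    g = np.tanh(z[3*h:4*h])        # content gate: candidate values
    c_new = f * c_prev + i * g     # update long-term cell state
    h_new = o * np.tanh(c_new)     # new hidden state
    return h_new, c_new
```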





LSTM Layers
• An LSTM layer has two components:
– an input-to-state component
– a recurrent state-to-state component
• Together these determine the four gates inside the LSTM core.
• To enhance parallelization in the Row LSTM, the input-to-state component is first computed for the entire two-dimensional input map.
• For the input-to-state calculation, a k × 1 convolution is used.
• The convolution is masked to include only the valid context and produces a tensor of size 4h × n × n,
– where h is the number of output feature maps.



LSTM Layers
• The state-to-state component of the LSTM layer is calculated as:

[o_i, f_i, i_i, g_i] = σ(K^ss ⊛ h_{i−1} + K^is ⊛ x_i)
c_i = f_i ⊙ c_{i−1} + i_i ⊙ g_i
h_i = o_i ⊙ tanh(c_i)

• where x_i of size h × n × 1 is row i of the input map, ⊛ denotes the convolution operation and ⊙ the element-wise multiplication.
• K^ss and K^is are the kernel weights for the state-to-state and input-to-state components.


LSTM Layers
• For the output, forget and input gates o_i, f_i and i_i, the activation σ is the logistic sigmoid function; for the content gate g_i, σ is the tanh function.
• Each step computes the new state for an entire row of the input map.
• Because the Row LSTM has a triangular receptive field, it is unable to capture the entire available context.



Diagonal BiLSTM
• The Diagonal BiLSTM is designed both to parallelize the computation and to capture the entire available context for any image size.
• Each of the two directions of the layer scans the image in a diagonal fashion:
– from a top corner to the opposite bottom corner.
• Each step in the computation computes at once the LSTM state along a diagonal in the image.
• The resulting receptive field is shown in the figure.



Diagonal BiLSTM
• Working: first, skew the input map into a space where it is easy to apply convolutions along the diagonals.
• The skewing operation offsets each row of the input map by one position with respect to the previous row.
• This results in a map of size n × (2n − 1).
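The skew and its inverse can be written directly (illustrative numpy sketch; skew/unskew are our own names):

```python
import numpy as np

def skew(x):
    """Offset row r of an n x n map by r positions, producing an
    n x (2n - 1) map in which each diagonal of the original becomes
    a column, so a column-wise convolution runs along diagonals.
    """
    n = x.shape[0]
    out = np.zeros((n, 2 * n - 1), dtype=x.dtype)
    for r in range(n):
        out[r, r:r + n] = x[r]
    return out

def unskew(x):
    """Inverse of skew: drop the offset (padding) positions."""
    n = x.shape[0]
    return np.stack([x[r, r:r + n] for r in range(n)])
```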



Diagonal BiLSTM
• For each of the two directions, the input-to-state component is simply a 1 × 1 convolution K^is that contributes to the four gates in the LSTM core.
• The operation generates a 4h × n × n tensor.
• The state-to-state recurrent component is then computed with a column-wise convolution K^ss that has a kernel of size 2 × 1.
• It takes the previous hidden and cell states, combines the contribution of the input-to-state component, and produces the next hidden and cell states.



Diagonal BiLSTM
• The output feature map is then skewed back into an n × n map by removing the offset positions.
• This computation is repeated for each of the two directions.
• Each direction uses a convolutional kernel of size 2 × 1 that processes a minimal amount of information at each step.
• The two output maps, left and right, are then added together.
• Kernel sizes larger than 2 × 1 are not particularly useful, as they do not broaden the already global receptive field of the Diagonal BiLSTM.
Diagonal BiLSTM

[Figure: 4 × 4 example grids illustrating the two diagonal scan directions of the Diagonal BiLSTM.]


Residual Connections
• Residual connections (He et al., 2015):
– Residual connections were introduced in the ResNet architecture.
– They create shortcuts that allow the signal to skip one or more layers.
– They do this by adding the input of the current layer to its output, which helps to preserve the strength of the signal across the network.
• PixelRNNs are trained up to twelve layers of depth.
• To increase convergence speed and propagate signals more directly through the network, residual connections are used from one LSTM layer to the next.



Residual Connections
• The input map to the PixelRNN LSTM
layer has 2h features.
• The input-to-state component reduces the
number of features by producing h
features per gate.
• After applying the recurrent layer, the
output map is upsampled back to 2h
features per position via a 1 × 1
convolution and the input map is added to
the output map.
• We can also use learnable skip
connections from each layer to the output.
• The addition of residual and layer-to-
output skip connections is more effective.
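The residual wiring on this slide can be sketched as follows, treating the 1 × 1 convolution as a plain matrix W_proj of shape (h, 2h) applied per position (an illustrative sketch; layer stands in for the recurrent layer that maps 2h features to h):

```python
import numpy as np

def residual_lstm_block(x, layer, W_proj):
    """x has 2h features per position; layer reduces them to h;
    the 1x1 projection W_proj brings them back up to 2h, and the
    input is added so the signal is preserved across the block.
    """
    out = layer(x)            # (..., h) features from the recurrent layer
    out = out @ W_proj        # 1x1 conv: h -> 2h features
    return x + out            # residual shortcut
```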



Masked Convolution
• The h features for each input position at every layer in the network are split into three parts,
– each corresponding to one of the RGB channels.
• When predicting the R channel for the current pixel x_i, only the generated pixels left of and above x_i can be used as context.
• When predicting the G channel, the value of the R channel can also be used as context, in addition to the previously generated pixels.
• When predicting the B channel, the values of both the R and G channels can be used.
Masked Convolution
• To restrict connections in the network to these dependencies, we apply a mask to the input-to-state convolutions and to other convolutional layers.
• Two types of masks are used, indicated as mask A and mask B.
• Mask A is applied only to the first convolutional layer in a PixelRNN and restricts the connections to neighboring pixels and to those colors in the current pixel that have already been predicted.
• Mask B is applied to all the subsequent input-to-state convolutional transitions and relaxes the restrictions of mask A by also allowing the connection from a color to itself.
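These two mask types can be constructed explicitly; a minimal sketch (our own helper, assuming channel counts divisible by 3 and R→G→B ordering of the feature groups):

```python
import numpy as np

def conv_mask(k, in_ch, out_ch, mask_type):
    """Binary mask of shape (out_ch, in_ch, k, k) for a masked conv.

    Positions below the centre row, and right of the centre on that
    row, are zeroed. At the centre pixel the R->G->B ordering is
    enforced: mask 'A' (first layer) forbids a colour from seeing
    itself, mask 'B' (later layers) allows it.
    """
    mask = np.ones((out_ch, in_ch, k, k))
    mask[:, :, k // 2, k // 2 + 1:] = 0        # right of centre
    mask[:, :, k // 2 + 1:, :] = 0             # rows below centre
    ic, oc = in_ch // 3, out_ch // 3
    for o in range(3):                          # output colour group
        for i in range(3):                      # input colour group
            allowed = i < o if mask_type == 'A' else i <= o
            if not allowed:
                mask[o*oc:(o+1)*oc, i*ic:(i+1)*ic, k//2, k//2] = 0
    return mask
```

Multiplying a convolution's weights by this mask before each forward pass enforces the dependency scheme above.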



PixelCNN
• The Row and Diagonal LSTM layers have a potentially unbounded dependency range within their receptive field.
• This comes at a high computational cost, as each state needs to be computed sequentially.
• What if we make the receptive field large, but not unbounded?
• We can use standard convolutional layers to compute features for all pixel positions at once.
• The PixelCNN uses multiple convolutional layers that preserve the spatial resolution;
– pooling layers are not used.



PixelCNN
• Masks are adopted in the convolutions to avoid seeing the future context:

      R           G           B
    1 1 1       1 1 1       1 1 1
    1 0 0       1 1* 0      1 1** 0
    0 0 0       0 0 0       0 0 0


PixelRNN and PixelCNN Specification



Evaluation
• All models are trained and evaluated using the negative log-likelihood (NLL) loss.

• Two datasets are used: CIFAR-10 and MNIST.



Samples Generated



Research Questions
• All students will research and learn about the following topics:
1. Explain the working of 1 × 1 convolutions. What is the purpose of using 1 × 1 convolutions? Give use cases.
2. In the Diagonal BiLSTM, a 2 × 1 convolution kernel is used. Would it be useful to increase its size, e.g. to 3 × 1 or 5 × 1? Why?
3. What are skip connections? Why are they useful? How can we select the number of skip connections?



References
• van den Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. (2016). Pixel Recurrent Neural Networks. ICML 2016.
• He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition.



Thank You 
