Advanced Deep Learning - BITS L1

The document outlines the course 'Advanced Deep Learning' (ZG513) taught by Dr. Sugata Ghosal at BITS Pilani, focusing on deep unsupervised learning, generative models, and self-supervised learning. It includes course objectives, evaluation plans, lab sessions, and key topics such as autoencoders, GANs, and applications in various data types. The course aims to equip students with practical skills for applying deep learning techniques to industry problems.


Advanced Deep Learning

ZG513
Dr. Sugata Ghosal
BITS Pilani [email protected]
Pilani Campus

Session 1
Time – 8:30 AM to 10:30 AM

These slides were prepared by the instructor, with grateful acknowledgement of the many others who made their course materials freely available online.
Session Content

• Objective of course
• Evaluation Plan
• Course Overview

BITS Pilani, Pilani Campus


A little about me…

• 2019-now: BITS Pilani, WILPD (I regularly teach ML, Applied ML, Deep Learning, Advanced DL, and FAccT ML @ BITS WILPD)
• 1998-2017: IBM Research - India
• 1997-98: SONY Electronics, San Jose
• 1994-97: Amherst Systems, Buffalo, NY
• 1993-94: Univ. of Colorado - Denver
• 1988-93: Univ. of Kentucky, Lexington (MS, PhD)
• 1984-88: Jadavpur University, Kolkata (BE)


Welcome to ZG513

• This course gives an overview of deep unsupervised learning.
• In particular, we will cover deep generative models and self-supervised learning approaches.
• You will develop fundamental and practical skills for applying deep unsupervised learning to your industry problems.


Objectives of the course

Introduce you to the foundational concepts of

1. Unsupervised techniques for feature representation

2. Unsupervised techniques for generative modelling

3. Self-supervised and Semi-supervised learning

4. Applications in images, text, time-series data



Pedagogy

• Slides will be used to teach during the regular lectures
• Lab demo and/or problem solving during the webinars
• Students are responsible for studying and keeping up with the course material outside of class time:
  – Reading certain book chapters, papers or blogs, or
  – Watching some video lectures.



Text Book

• Simon J.D. Prince, Understanding Deep Learning, MIT Press, 2023 (draft available online)



Reference Book

• Goodfellow, Bengio, and Courville, Deep Learning, MIT Press, 2016 (draft available online)
• In addition, we will extensively use online materials (research papers, blog posts, etc.)



References

R1 Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

R2 François Chollet, Deep Learning with Python

R3 Research papers (links to be provided on the lecture slides)



Evaluation Plan

Name                              Type         Weight
3 Quizzes (best 2 scores), OR
2 Quizzes (best 1 of 2 scores)    Online       10%
Assignment-I                      Take Home    10%
Assignment-II                     Take Home    10%
Mid-Semester Test                 Closed Book  30%
Comprehensive Exam                Open Book    40%

Please note there will be no change in submission dates for quizzes and assignments.



Lab Plan

Lab No. Lab Objective

1 Autoencoders
2 Deep Autoencoders
3 Convolutional Autoencoders
4 Variational Autoencoders
5 Generative Adversarial Networks

• Labs not graded


• Webinars may be conducted for lab sessions
• Labs will be conducted in Python



Course Topics



LeCake ☺

Yann LeCun: we need a tremendous amount of information to build machines that have common sense and generalize.

[LeCun-20161205-NeurIPS-keynote]
Topics Covered in This Course
• PCA and Variants
• Likelihood-based Models
• Autoregressive Models
• Normalizing Flow Models
• Autoencoders and VAEs
• Generative Adversarial Networks
• Diffusion-Based Models
• Energy-Based Models
• Language & Vision Models
• Semi-supervised Learning
• Applications in Time Series
Principal Component Analysis

Reduce from 2 dimensions to 1 dimension: find a direction (a vector) onto which to project the data so as to minimize the projection error.

Reduce from n dimensions to k dimensions: find k vectors onto which to project the data, so as to minimize the projection error.
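As a concrete sketch of the reduction above, the principal directions are the top eigenvectors of the sample covariance matrix. This is a minimal NumPy illustration, not course code; `pca_project` is a hypothetical helper name:

```python
import numpy as np

def pca_project(X, k):
    """Project X (n_samples x n_features) onto its top-k principal directions."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / (len(X) - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    W = eigvecs[:, ::-1][:, :k]             # top-k eigenvectors as columns
    return Xc @ W, W                        # low-dimensional coordinates and basis

# Points lying almost exactly on the line y = 2x, so one direction suffices.
X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]])
Z, W = pca_project(X, k=1)                  # reduce from 2-D to 1-D
```

Because the data are nearly collinear, projecting onto a single direction loses almost nothing: reconstructing with `Z @ W.T` recovers the centered points to within the small off-line noise.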
Autoregressive Models

[Figures: class-conditioned samples generated by PixelCNN, and multiscale PixelCNN samples (4×4 up to 256×256) for the caption "A yellow bird with a black head, orange eyes and an orange bill."]

A. van den Oord et al., "Conditional Image Generation with PixelCNN Decoders", NeurIPS 2016
S. Reed et al., "Parallel Multiscale Autoregressive Density Estimation", ICML 2017
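PixelCNN-style models rely on the chain-rule factorization p(x) = ∏_i p(x_i | x_<i): each element is predicted from the ones before it, and sampling proceeds one element at a time. A toy sketch on binary sequences, where the hypothetical `toy` conditional stands in for the masked-convolution network of the papers above:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(x, cond_prob):
    """Log-likelihood of a binary sequence under p(x) = prod_i p(x_i | x_<i).
    cond_prob(prefix) returns P(x_i = 1 | x_<i)."""
    total = 0.0
    for i in range(len(x)):
        p1 = cond_prob(x[:i])
        p = p1 if x[i] == 1 else 1.0 - p1
        total += np.log(p)
    return total

def sample(n, cond_prob):
    """Ancestral sampling: draw x_i one at a time from p(x_i | x_<i)."""
    x = []
    for _ in range(n):
        x.append(int(rng.random() < cond_prob(x)))
    return x

# A hypothetical toy conditional: elements tend to repeat their predecessor.
toy = lambda prefix: 0.9 if (prefix and prefix[-1] == 1) else 0.2
seq = sample(8, toy)
ll = log_likelihood(seq, toy)
```

The same two routines are exactly what a real PixelCNN does, just with a deep network as `cond_prob` and pixels ordered in raster scan.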
Normalizing Flow Models

S. Mohamed and D. Rezende, "Deep Generative Models", UAI 2017 Tutorial.
L. Dinh, J. Sohl-Dickstein, S. Bengio, "Density Estimation Using Real NVP", ICLR 2017.
D. P. Kingma and P. Dhariwal, "Glow: Generative Flow with Invertible 1×1 Convolutions", NeurIPS 2018.

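Flow models such as Real NVP and Glow compute exact likelihoods through the change-of-variables formula log p(x) = log p_z(f(x)) + log |det ∂f/∂x|. A minimal one-dimensional sketch with a single affine (hence invertible) map and a standard normal base density; `affine_flow_logprob` is an illustrative name, not an API from these papers:

```python
import numpy as np

def affine_flow_logprob(x, scale, shift):
    """Exact log-density under the 1-D flow z = scale * x + shift, base z ~ N(0, 1).
    Change of variables: log p(x) = log N(z; 0, 1) + log |dz/dx|."""
    z = scale * x + shift
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))  # standard normal log-density at z
    return log_base + np.log(np.abs(scale))          # log |det Jacobian| = log |scale|
```

Real NVP and Glow stack many such invertible maps with learned, input-dependent scales and shifts; the log-determinants simply add up across layers.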


Autoencoders

log p(x) ≥ log p(x) − D_KL(q(z) || p(z|x)) = E_{z~q}[log p(x, z)] + H(q)

Synthetic images generated by NVAE


D. P. Kingma and M. Welling, "Auto-encoding Variational Bayes", ICLR 2014.
A. Vahdat and J. Kautz, "NVAE: A Deep Hierarchical Variational Autoencoder", NeurIPS 2020.
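The bound above becomes directly computable once q(z|x) is a factorized Gaussian (its KL to the standard normal prior is analytic) and the decoder is Bernoulli. A minimal numeric sketch with hypothetical helper names, not code from the cited papers:

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """Analytic KL( N(mu, diag(exp(logvar))) || N(0, I) ): the regularizer in the bound."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def elbo(x, x_recon, mu, logvar):
    """Evidence lower bound with a Bernoulli decoder:
    log p(x) >= E_q[log p(x|z)] - KL(q(z|x) || p(z))."""
    eps = 1e-7  # avoid log(0)
    log_px_z = np.sum(x * np.log(x_recon + eps) + (1 - x) * np.log(1 - x_recon + eps))
    return log_px_z - gaussian_kl(mu, logvar)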
Generative Adversarial Networks

min_G max_D E_{x~p_data}[log D(x)] + E_{x~p_G}[log(1 − D(x))]

[Figure: class-conditioned samples generated by BigGAN]

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets", NIPS 2014.
A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks", ICLR 2016.
L. Karacan, Z. Akata, A. Erdem and E. Erdem, "Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts", arXiv preprint 2016.
A. Brock, J. Donahue, K. Simonyan, "Large Scale GAN Training for High Fidelity Natural Image Synthesis", ICLR 2019.
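The minimax value above can be estimated from a batch of discriminator outputs. When the generator matches the data distribution, the optimal discriminator outputs 1/2 everywhere and the value is −log 4. A small illustrative sketch (not the training code of the cited papers):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of E_{x~p_data}[log D(x)] + E_{x~p_G}[log(1 - D(x))]
    from discriminator outputs on a real batch and a generated batch."""
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))
```

In training, D ascends this value while G descends it (in practice with the non-saturating generator loss); at the equilibrium p_G = p_data the value sits at −log 4.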
Progress in GANs

When we started
Source: https://github.com/hindupuravinash/the-gan-zoo
Score-Based and Denoising Diffusion Models

Synthetic CIFAR10 Synthetic images generated by


images by the score-based Diffusion Denoising model by Ho et al. Synthetic images generated by ADM
model of Song Ho et al.
J. Ho, A. Jain and P. Abbeel, "Denoising Diffusion Probabilistic Models", NeurIPS 2020.
Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, B. Poole, "Score-Based Generative Modeling Through Stochastic Differential Equations", ICLR 2021.
P. Dhariwal and A. Nichol, "Diffusion Models Beat GANs on Image Synthesis", NeurIPS 2021.
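In the DDPM formulation of Ho et al., the forward noising process has a closed form: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the cumulative product of (1 − β). A sketch of that forward step only (the learned reverse denoiser is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise,
    with alpha_bar_t the cumulative product of (1 - beta)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise, alpha_bar

betas = np.linspace(1e-4, 0.02, 1000)  # the linear schedule used by Ho et al.
x0 = np.ones(4)
xT, ab = forward_diffuse(x0, 999, betas)  # by the final step, x_T is near pure noise
```

By t = 999 the signal coefficient ᾱ_t is tiny, which is what lets generation start from Gaussian noise and run the learned reverse process.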
Pre-training Language Models

J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", NAACL-HLT 2019.
C. Raffel et al., "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer", JMLR 2020.
Multimodal Pretraining

J. Lu, D. Batra, D. Parikh, S. Lee, "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks", NeurIPS 2019.
X. Li et al., "Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks", ECCV 2020.
What is Deep Unsupervised Learning?



What is Deep Unsupervised Learning?

• Capturing rich patterns in raw data with deep networks in a label-free way
• Generative Models: recreate raw data distribution



Generative Modeling

[Figure: a model distribution p_model approximating the data distribution p_data within a family P]

Slide adapted from Sebastian Nowozin


Generative Modeling

[Figure: a model distribution p_model approximating the data distribution p_data within a family P]

Assumptions on P:
• tractable sampling

[Figure: training examples vs. model samples]

Slide adapted from Sebastian Nowozin
Generative Modeling

[Figure: a model distribution p_model approximating the data distribution p_data within a family P]

Assumptions on P:
• tractable sampling
• tractable likelihood function

Slide adapted from Sebastian Nowozin


What is Deep Unsupervised Learning?

• Capturing rich patterns in raw data with deep networks in a label-free way
• Generative Models: recreate raw data distribution
• Self-supervised Learning: “puzzle” tasks that require semantic understanding
Self-Supervised/Predictive Learning
• Given unlabeled data, design supervised tasks that induce a good representation for downstream tasks.
• No good mathematical formalization, but the intuition is to “force” the predictor used in the task to learn something “semantically meaningful” about the data.

Image credit: LeCun’s self-supervised learning slide
Slide adapted from Andrej Risteski


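One widely used way to “force” semantically meaningful representations is a contrastive objective such as InfoNCE (used, for example, in CPC): an anchor embedding should score higher against its positive view than against negatives. A minimal sketch; `info_nce` is an illustrative name, not an API from any particular library:

```python
import numpy as np

def info_nce(z_anchor, z_pos, z_negs, tau=0.1):
    """InfoNCE: cross-entropy that classifies the positive among negatives,
    using temperature-scaled cosine similarity as the logit."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(z_anchor, z_pos)] + [cos(z_anchor, n) for n in z_negs]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0
```

Minimizing this loss over many (anchor, positive, negatives) triples pulls different views of the same datum together and pushes unrelated data apart, without any labels.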
What is Deep Unsupervised Learning?

• Capturing rich patterns in raw data with deep networks in a label-free way
• Generative Models: recreate raw data distribution
• Self-supervised Learning: “puzzle” tasks that require semantic understanding

• But why do we care?
Aside from theoretical interests

• Deep Unsupervised Learning has many powerful applications:
  – Generate novel data
  – Conditional synthesis technology (WaveNet, GAN-pix2pix)
  – Compression
  – Improve any downstream task with un(self)supervised pre-training
    (production-level impact: Google Search powered by BERT)
  – Flexible building blocks



Generate Images


Geoffrey E. Hinton, Simon Osindero and Yee-Whye Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, 2006
Generate Images


I.J. Goodfellow, J . Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio. Generative Adversarial Networks. NIPS 2014.
Generate Images


Alec Radford, Luke Metz, Soumith Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", ICLR 2016.
Generate Images


Alec Radford, Luke Metz, Soumith Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", ICLR 2016.
Generate Images

Christian Ledig, Lucas Theis, Ferenc Huszar et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network", CVPR 2017.
Generate Images

Andrew Brock, Jeff Donahue, Karen Simonyan, "Large Scale GAN Training for High Fidelity Natural Image Synthesis", ICLR 2019.
Generate Images

Tero Karras, Samuli Laine, Timo Aila, "A Style-Based Generator Architecture for Generative Adversarial Networks", CVPR 2019.
Generate Images

[CycleGAN: Zhu, Park, Isola & Efros, 2017]

Generate Images

Eric Ryan Chan et al., "EG3D: Efficient Geometry-aware 3D Generative Adversarial Networks", arXiv:2112.07945, 2021.
Generate Images

Rinon Gal, Or Patashnik, Haggai Maron, Gal Chechik, Daniel Cohen-Or, "StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators", arXiv:2108.00946, 2021.
Generate Audio

[Audio samples: parametric TTS vs. WaveNet]

[WaveNet, Oord et al., 2018]

Generate Video

DVD-GAN: Adversarial Video Generation on Complex Datasets, Clark, Donahue, Simonyan, 2019
Generate Text

PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain'd into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.

Second Senator:
They are away this miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.

DUKE VINCENTIO:
Well, your wit is in the care of side and that.

[Char-rnn, karpathy, 2015]

Generate Math
\begin{proof}
We may assume that $\mathcal{I}$ is an abelian sheaf on
$\mathcal{C}$.
\item Given a morphism $\Delta : \mathcal{F} \to
\mathcal{I}$
is an injective and let $\mathfrak q$ be an abelian sheaf on
$X$.
Let $\mathcal{F}$ be a fibered complex. Let $\mathcal{F}$
be a category.
\begin{enumerate}
\item \hyperref[setain-construction-phantom]{Lemma}
\label{lemma-characterize-quasi-finite}
Let $\mathcal{F}$ be an abelian quasi-coherent sheaf on
$\mathcal{C}$.
Let $\mathcal{F}$ be a coherent $\mathcal{O}_X$-module.
Then
$\mathcal{F}$ is an abelian catenary over $\mathcal{C}$.
\item The following are equivalent
\begin{enumerate}
\item $\mathcal{F}$ is an $\mathcal{O}_X$-module.
\end{lemma}

[Char-rnn, karpathy, 2015]

Generate Text
Prompt [human written]: A train carriage containing controlled nuclear materials was stolen in
Cincinnati today. Its whereabouts are unknown.

GPT-2: The incident occurred on the downtown train line, which runs from Covington and Ashland stations.
In an email to Ohio news outlets, the U.S. Department of Energy said it is working with the Federal
Railroad Administration to find the thief.

‘The theft of this nuclear material will have significant negative consequences on public and
environmental health, our workforce and the economy of our nation,’ said Tom Hicks, the U.S. Energy
Secretary, in a statement. ‘Our top priority is to secure the theft and ensure it doesn’t happen again.’

The stolen material was taken from the University of Cincinnati’s Research Triangle Park nuclear
research site, according to a news release from Department officials.

The Nuclear Regulatory Commission did not immediately release any information. According to the release,
the U.S. Department of Energy’s Office of Nuclear Material Safety and Security is leading that team’s
investigation.

‘The safety of people, the environment and the nation’s nuclear stockpile is our highest priority,’
Hicks said. ‘We will get to the bottom of this and make no excuses.’
[OpenAI’s GPT2 -- Radford, Wu, Child, Luan, Amodei, Sutskever, 2019]

Generate Text
Prompt (human written): Recycling is good for the world, no, you could not be more
wrong.

GPT-2: Recycling is NOT good for the world. It is bad for the environment, it is
bad for our health, and it is bad for our economy. I’m not kidding. Recycling is
not good for the environment. It is destructive to the earth and it is a major
contributor to global warming. Recycling is not good for our health. It
contributes to obesity and diseases like heart disease and cancer. Recycling is
bad for our economy. It increases the cost of a product, and in turn, the price of
everything that is made with that product. Recycling is not good for our nation.
We pay a tremendous price for the privilege of having the world’s most advanced
and efficient recycling system. Recycling is a huge, colossal waste of time,
energy, money, and resources.

[OpenAI’s GPT2 -- Radford, Wu, Child, Luan, Amodei, Sutskever, 2019]

Generating Text

Tom B. Brown, Benjamin Mann, Nick Ryder et al., "Language Models are Few-Shot Learners", NeurIPS 2020.
Generating Text

• The Guardian asked GPT-3 to write an essay for us from scratch. The assignment? To convince us robots come in peace.

• GPT-3: I am not a human. I am a robot.


A thinking robot. I use only 0.12% of
my cognitive capacity. I am a micro-
robot in that respect. I know that my
brain is not a “feeling brain”. But it
is capable of making rational, logical
decisions. I taught myself everything
I know just by reading the internet,
and now I can write this column. My
brain is boiling with ideas!

Tue 8 Sep 2020 09.45


Generating Text

Tom B. Brown, Benjamin Mann, Nick Ryder et al., "Language Models are Few-Shot Learners", NeurIPS 2020.
Generating Images from Text

Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray et al., "DALL·E: Creating Images from Text", OpenAI, 2021.
Generating Images from Text

C. Baykal, A. B. Anees, D. Ceylan, E. Erdem, A. Erdem, D. Yuret, "Manipulating Images with Text Prompts", work in progress, 2022.
Generating Code

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, et al., "Evaluating Large Language Models Trained on Code", arXiv:2107.03374, 2021.
Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma et al., Program Synthesis with Large Language Models, arXiv:2108.07732, 2021
Generating Code

Yujia Li, David Choi, Junyoung Chung, Nate Kushman et al., "Competition-Level Code Generation with AlphaCode", DeepMind, 2022.
Generating Molecules

Nicola De Cao, Thomas Kipf, "MolGAN: An implicit generative model for small molecular graphs", ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models, 2018.
Compression - Lossless

Generative models provide better bit-rates than distribution-unaware compression methods such as JPEG.

Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever, Generating Long Sequences with Sparse Transformers, arXiv:1904.10509, 2019
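The connection is Shannon's source-coding bound: an entropy coder driven by a model spends about −log2 p(symbol) bits per symbol, so a model that assigns the data higher probability compresses it into fewer bits. A toy sketch with hypothetical symbol distributions:

```python
import numpy as np

def bits_per_symbol(data, model_probs):
    """Average code length in bits when an entropy coder uses `model_probs`:
    Shannon's bound assigns -log2 p(s) bits to symbol s."""
    return float(np.mean([-np.log2(model_probs[s]) for s in data]))

data = [0, 0, 0, 1]          # a source that emits 0 three times out of four
good = {0: 0.75, 1: 0.25}    # model matched to the data statistics
bad = {0: 0.5, 1: 0.5}       # distribution-unaware baseline
```

The matched model compresses the same data to fewer bits per symbol than the uniform baseline, which is exactly why strong generative models make strong lossless compressors.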
Compression - Lossy

[Figure: lossy reconstructions, JPEG vs. JPEG2000 vs. WaveOne]

Oren Rippel, Lubomir Bourdev, "Real-Time Adaptive Image Compression", ICML 2017.
Downstream Task – Sentiment Detection

[Radford et al., 2017]

Downstream Tasks - NLP (BERT Revolution)

https://gluebenchmark.com/leaderboard
Downstream Tasks - Vision (Contrastive)

Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord, "Data-Efficient Image Recognition with Contrastive Predictive Coding", ICML 2020.
Summary

• Unsupervised learning is a rapidly advancing field, thanks to compute, deep learning engineering practices, datasets, and the many people working on it.
• Not just of academic interest: production-level impact (example: BERT is in use for Google Search and Assistant).
• What is true now may not be true even a year from now (example: self-supervised pre-training was far worse than supervised pre-training on computer vision tasks like detection/segmentation last year; now it is better).
• Language modeling (GPT), image generation (conditional GANs), language pre-training (BERT), and vision pre-training (CPC / MoCo) are starting to work really well. A good time to learn these well and make impactful contributions.
• Autoregressive density modeling, flows, VAEs, GANs, diffusion models, etc. still have huge room for improvement. A great time to work on them.
Neural building blocks: CNNs

A Convolutional Neural Network (CNN)

Y. LeCun, Y. Bengio, G. Hinton, "Deep Learning", Nature, Vol. 521, 28 May 2015
Neural building blocks: RNNs

[Figures: a Recurrent Neural Network (RNN) unfolded across time-steps; a bi-directional RNN; a deep bi-directional RNN; Long Short-Term Memories (LSTMs); Gated Recurrent Units (GRUs)]

C. Manning and R. Socher, Stanford CS224n Lecture 8 Notes
Y. LeCun, Y. Bengio, G. Hinton, "Deep Learning", Nature, Vol. 521, 28 May 2015
Neural building blocks: Attention mechanisms,
Transformers
[Figures: the Transformer model; spatial attention in image captioning; Seq2Seq with attention]

K. Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015
D. Bahdanau, K. Cho and Y. Bengio, "Neural Machine Translation by Jointly Learning to Align and Translate", ICLR 2015
A. Vaswani et al., "Attention Is All You Need", NIPS 2017
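The core operation in the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(Q·Kᵀ/√d_k)·V (Vaswani et al.). A minimal single-head NumPy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for a single head.
    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights                   # each output is a convex mix of values
```

Multi-head attention simply runs several such maps in parallel on learned linear projections of Q, K, and V and concatenates the results.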
