Lec1 Intro

The lecture introduces deep generative models, emphasizing their applications in various fields such as natural language processing, image and video generation, and protein design. It contrasts generative models with discriminative models, highlighting their ability to produce multiple plausible outputs from a single input. The course will explore the formulation of real-world problems as generative models, along with their probabilistic foundations and associated challenges.


Lecture 1

Introduction

6.S978 Deep Generative Models

Kaiming He
Fall 2024, EECS, MIT
The “GenAI” Era
• Chatbot and natural language conversation
• Text-to-image generation (example generated by Stable Diffusion 3 Medium; prompt: teddy bear teaching a course, with "generative models" written on blackboard)
• Text-to-video generation (example generated by Sora)
• AI assistant for code generation
• Protein design and generation (Watson, et al. De novo design of protein structure and function with RFdiffusion, Nature 2023)
• Weather forecasting (Skilful precipitation nowcasting using deep generative models of radar, Nature 2021)
Generative Models before the “GenAI” Era
2009, PatchMatch: Photoshop’s Content-aware Fill

PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing, SIGGRAPH 2009
Generative Models before the “GenAI” Era
1999, the Efros-Leung algorithm for texture synthesis
In today’s terms, this is an autoregressive model.

Texture Synthesis by Non-parametric Sampling, ICCV 1999
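To make the autoregressive reading concrete, here is a minimal 1-D sketch in the spirit of Efros-Leung (my own simplification, not the paper's 2-D algorithm; the function name and parameters are made up): each new value is sampled by matching its recent context against the source "texture" and copying the continuation of a randomly chosen close match.

```python
# A 1-D, non-parametric "texture synthesis" sketch: sample each new element
# conditioned on the previous `window` elements, by matching against the source.
import numpy as np

def synthesize(source, length, window=4, n_candidates=5, seed=None):
    rng = np.random.default_rng(seed)
    out = list(source[:window])                           # seed with the start of the source
    windows = np.lib.stride_tricks.sliding_window_view(source, window + 1)
    for _ in range(length - window):
        context = np.array(out[-window:])
        # distance between the current context and every context in the source
        dists = np.sum((windows[:, :window] - context) ** 2, axis=1)
        best = np.argsort(dists)[:n_candidates]           # keep the closest matches
        pick = rng.choice(best)                           # sample one match at random
        out.append(windows[pick, -1])                     # copy its next value
    return np.array(out)

# usage: grow a longer sequence from a short repeating "texture"
src = np.array([0, 1, 2, 3, 2, 1] * 5, dtype=float)
print(synthesize(src, length=40, window=3, seed=0))
```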


What are Generative Models?
What do these scenarios have in common? (Examples: chatbot, image generation, video generation, protein generation.)
• There are multiple or infinite predictions to one input.
• Some predictions are more “plausible” than others.
• Training data may contain no exact solution.
• Predictions may be more complex, more informative, and higher-dimensional than the input.
Discriminative vs. Generative models
Discriminative model:
• “sample” x ⇨ “label” y (e.g., an image x ⇨ “dog”)
• one desired output

Generative model:
• “label” y ⇨ “sample” x (e.g., “dog” ⇨ an image x)
• many possible outputs
Discriminative vs. Generative models
discriminative: p(y|x)        generative: p(x|y)

• Generative models can be discriminative, via Bayes’ rule:

  p(y|x) = p(x|y) p(y) / p(x) ∝ p(x|y) p(y)

  assuming a known prior p(y); p(x) is a constant for a given x.

• Can discriminative models be generative? By the same rule:

  p(x|y) = p(y|x) p(x) / p(y) ∝ p(y|x) p(x)

  but we still need to model the prior distribution of x; p(y) is a constant for a given y.
• The challenge is about representing and predicting distributions
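A minimal numerical sketch of using a generative model discriminatively via Bayes’ rule, assuming a toy setup with two classes and 1-D Gaussian class-conditionals (all numbers here are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

prior = {"cat": 0.7, "dog": 0.3}                              # assumed known prior p(y)
likelihood = {"cat": norm(0.0, 1.0), "dog": norm(2.0, 1.0)}   # generative model p(x|y)

def posterior(x):
    # p(y|x) ∝ p(x|y) p(y); the normalizer p(x) is constant for a given x
    unnorm = {y: likelihood[y].pdf(x) * prior[y] for y in prior}
    z = sum(unnorm.values())
    return {y: v / z for y, v in unnorm.items()}

print(posterior(1.5))   # posterior class probabilities for an observed x
```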


Probabilistic modeling
• Where does probability come from?
• Assume an underlying distribution of the data generation process.
  Example:
  • latent factors z (pose, lighting, scale, ...)
  • z has simple distributions
  • observations x are rendered by a “world model” that is a function of z
  • observations x have complex distributions

• Probability is part of the modeling.


Figure from: W. T. Freeman, J. B. Tenenbaum, “Learning Bilinear Models for Two-Factor Problems in Vision”, 1996
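A toy sketch of this “world model” view (my own illustrative example, not from the lecture): latent factors with simple distributions are rendered into observations whose distribution becomes complex, even though the renderer itself is deterministic.

```python
import numpy as np

rng = np.random.default_rng(0)

def render(pose, scale):
    # a made-up "world model": a 2-D observation computed from the latent factors
    return np.array([scale * np.cos(pose), scale * np.sin(pose) ** 2])

# simple latent distributions: uniform pose, log-normal scale
pose = rng.uniform(0.0, 2 * np.pi, size=1000)
scale = rng.lognormal(mean=0.0, sigma=0.25, size=1000)
x = np.stack([render(p, s) for p, s in zip(pose, scale)])
print(x.mean(axis=0), x.std(axis=0))   # the observations x follow a non-trivial distribution
```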
Probability is part of the modeling
• There may not be “underlying” distributions.
• Even if there are, what we can observe is a finite set of data points.
• The models extrapolate from the observations to model distributions.

• Overfitting vs. underfitting: like discriminative models


[Figure: overfit, “right” fit, and underfit curves, for discriminative models]

Figure credit: https://www.mathworks.com/discovery/overfitting.html


Probability is part of the modeling
[Figures: density estimates p over data points x, illustrating underfit and overfit estimates.]
• To the extreme, using delta functions is like sampling from the training data.
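A small sketch contrasting the extreme “delta function” model (which can only replay the training points) with a smoothed density model (a Gaussian kernel density estimate; the data and bandwidth are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=20)          # a finite set of observed data points

def sample_delta_model(n):
    # extreme overfitting: the "model" just resamples the training data
    return rng.choice(train, size=n, replace=True)

def sample_kde_model(n, bandwidth=0.3):
    # a smoothed model: pick a training point, then perturb it (a Gaussian KDE)
    centers = rng.choice(train, size=n, replace=True)
    return centers + bandwidth * rng.normal(size=n)

print(np.unique(sample_delta_model(5)))        # only values already seen in training
print(sample_kde_model(5))                     # new values near the training data
```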
Generative models w/ probabilistic modeling
• Observed data are assumed to come from an underlying distribution of data; this assumption is already part of the modeling.
• Optimize a loss function so that an estimated distribution of data approximates the true one.
• With the estimated distribution, we can:
  • sample new “data”
  • estimate the probability density of a given x (how likely is it under the estimated distribution?)
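A minimal sketch of this pipeline with the simplest possible model family (a single 1-D Gaussian; the data and the model choice are purely illustrative): estimate the distribution from data, sample new “data”, and evaluate the density at a query point.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(3.0, 0.5, size=500)             # observed data

# "optimize a loss function": for a Gaussian, maximum likelihood has a closed form
mu, sigma = data.mean(), data.std()
model = norm(mu, sigma)                           # estimated distribution of data

new_samples = model.rvs(size=5, random_state=0)   # sample new "data"
density_at_query = model.pdf(3.2)                 # estimate prob density at a query x
print(new_samples, density_at_query)
```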
Generative models w/ probabilistic modeling
Notes:
• Generative models involve statistical models which are often designed and
derived by humans.
• Probabilistic modeling is not just the work of neural nets.
• Probabilistic modeling is a popular way, but not the only way.
• "All models are wrong, but some are useful.” - George Box
What are Deep Generative Models?
Deep Generative Models
• Deep learning is representation learning
• Learning to represent data instances
  • map data to a feature: h = f(x)
  • minimize a loss w/ a target: L(h, target)

• Learning to represent probability distributions
  • map a simple distribution (Gaussian/uniform) to a complex one: x = g(z), z ~ simple
  • minimize a loss w/ the data distribution: L(p_model, p_data)

• Often perform both together
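A minimal sketch of “mapping a simple distribution to a complex one” with a neural net (untrained here; the architecture and sizes are arbitrary illustrations, not the course’s model):

```python
import torch
from torch import nn

generator = nn.Sequential(            # g_theta: maps a latent z to a sample x
    nn.Linear(16, 128), nn.ReLU(),
    nn.Linear(128, 2),                # e.g., 2-D "data" samples
)

z = torch.randn(1000, 16)             # z ~ simple distribution (Gaussian)
x = generator(z)                      # pushforward of the simple distribution
# training would then minimize a loss between the distribution of x and the data
# distribution (e.g., a likelihood-based, adversarial, or diffusion objective).
print(x.shape)
```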


Learning to represent probability distributions
• From simple to complex distributions: map a simple distribution into one that approximates the data distribution.
• Not all parts of distribution modeling are done by learning.

Case study: Autoregressive model
• The dependency graph (each element conditioned on the previous ones) is designed, not learned.
• The mapping function is learned (e.g., a Transformer).
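A minimal sketch of the autoregressive decomposition p(x) = ∏_i p(x_i | x_1, ..., x_{i-1}): the left-to-right chain is the designed part, and only the per-step predictor is learned. Here `next_token_distribution` is a placeholder standing in for a learned model such as a Transformer, not a real API.

```python
import numpy as np

VOCAB = 10
rng = np.random.default_rng(0)

def next_token_distribution(prefix):
    # placeholder for a learned network mapping a prefix x_<i to p(x_i | x_<i)
    logits = rng.normal(size=VOCAB)
    return np.exp(logits) / np.exp(logits).sum()

def sample_sequence(length):
    seq = []
    for _ in range(length):                       # the designed left-to-right dependency graph
        probs = next_token_distribution(seq)
        seq.append(int(rng.choice(VOCAB, p=probs)))
    return seq

print(sample_sequence(8))
```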
Case study: Diffusion model
• The dependency graph (the noising and denoising chain) is designed, not learned.
• The mapping function (the denoiser) is learned (e.g., a UNet).
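A minimal sketch of that split for a DDPM-style diffusion model: a fixed forward noising chain and a learned denoiser. The noise schedule and the tiny `denoiser` network below are placeholders for illustration, not the actual model from the lecture.

```python
import torch
from torch import nn

T = 100
betas = torch.linspace(1e-4, 0.02, T)            # a designed (not learned) noise schedule

def noising_step(x_prev, t):
    # forward process: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps
    return torch.sqrt(1 - betas[t]) * x_prev + torch.sqrt(betas[t]) * torch.randn_like(x_prev)

denoiser = nn.Sequential(                        # stands in for a learned UNet
    nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2)
)

x = torch.randn(4, 2)                            # toy 2-D "data"
for t in range(T):
    x = noising_step(x, t)                       # designed noising chain
x_denoised = denoiser(x)                         # learned mapping (untrained here)
print(x_denoised.shape)
```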
Deep Generative Models may involve:
• Formulation:
• formulate a problem as probabilistic modeling
• decompose complex distributions into simple and tractable ones
• Representation: deep neural networks to represent data and their
distributions
• Objective function: to measure how good the predicted distribution is
• Optimization: optimize the networks and/or the decomposition
• Inference:
• sampler: to produce new samples
• probability density estimator (optional)
Formulating Real-world Problems as Generative Models
• Generative models are about p(x|y)

What can be y?
• condition, constraint, labels, attributes
• more abstract, less informative

What can be x?
• “data”, samples, observations, measurements
• more concrete, more informative
Case study: Formulating as p(x|y)
• Natural language conversation
y: prompt

x: response of the chatbot


Case study: Formulating as p(x|y)
• Text-to-image/video generation
y: text prompt (“teddy bear teaching a course, with ‘generative models’ written on blackboard”)

x: generated visual content

Image generated by Stable Diffusion 3 Medium


Case study: Formulating as p(x|y)
• Text-to-3D structure generation

y: text prompt

x: generated 3D structures

Figure credit: Tang, et al. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation. ECCV 2024
Case study: Formulating as p(x|y)
• Protein structure generation

y: condition/constraint (e.g., symmetry)
x: generated protein structures

Watson, et al. De novo design of protein structure and function with RFdiffusion, Nature 2023
Case study: Formulating as p(x|y)
• Class-conditional image generation
y: class label (e.g., “red fox”)

x: generated image

Image generated by: Li, et al. Autoregressive Image Generation without Vector Quantization, 2024
Case study: Formulating as p(x|y)
• “Unconditional” image generation
y: an implicit condition (“images following CIFAR10 distribution”)

x: generated CIFAR10-like images

• p(x|y): images ~ CIFAR10


• p(x): all images

Images generated by: Karras, et al. Elucidating the Design Space of Diffusion-Based Generative Models, NeurIPS 2022
Case study: Formulating as p(x|y)
• Classification (a generative perspective)

y: an image as the “condition”
x: probability of classes conditioned on the image (e.g., cat, bird, horse, dog)
Case study: Formulating as p(x|y)
• Open-vocabulary recognition

y: an image as the “condition”
x: plausible descriptions conditioned on the image (e.g., “bird”, “flamingo”, “red color”, “orange color”, ...)
Case study: Formulating as p(x|y)
• Image captioning

y: an image as the “condition”
x: plausible descriptions conditioned on the image

figure credit: https://github.com/GoogleCloudPlatform/asl-ml-immersion/blob/master/notebooks/multi_modal/solutions/image_captioning.ipynb


Case study: Formulating as p(x|y)
• Chatbot with visual inputs

y: image and text prompt

x: response of the chatbot

Figure from: GPT-4 Technical Report, 2023


Case study: Formulating as p(x|y)
• Policy learning in robotics
y: visual and other sensory observations
x: policies (probability of actions)

Chi, et al. Diffusion Policy: Visuomotor Policy Learning via Action Diffusion, RSS 2023
Formulating Real-world Problems as Generative Models
• Generative models are about p(x|y)
• Many problems can be formulated as generative models
• What’s x? What’s y?
• How to represent x, y, and their dependence?
About this course
This course will cover:
• How real-world problems are formulated as generative models
• Probabilistic foundations and learning algorithms
• Challenges, opportunities, open questions
