McCormick How Stable Diffusion Works Dec 2022

Stable Diffusion is a neural network that can generate images from text descriptions by removing noise from an initial random image. It works by representing images and text as numerical matrices and tensors, and using a huge set of parameters trained via machine learning to incrementally clean up the initial random image according to the text.


Chris McCormick


How Stable Diffusion Works

21 Dec 2022

The ability of a computer to generate art from nothing but a written
description is fascinating! I know that I, for one, would be desperately
curious to see what’s actually going on “under the hood” to make this
possible, so I want to do what I can here to provide a less superficial
explanation of what’s going on, even for those who aren’t familiar with
the concepts in artificial intelligence.

Overview
In the first section, I’ll give you the high‐level explanation (that you may
already be familiar with). It’s a good start, but I know that it wouldn’t
satisfy my curiosity. I’d be asking, “Ok, great, but how does it do that?”

To address this, I’ll show you some of Stable Diffusion’s inner workings.
The insides are more complex than you might be hoping, but I at least
wanted to show you more concretely what’s going on, so that it’s not a
complete mystery anymore.

More specifically:

Stable Diffusion is a huge neural network.
Neural networks are pure math.
The truth is, we don’t fully know what it’s doing!
Ultimately, Stable Diffusion works because we trained it.

But let’s start with the bigger picture!

Stable Diffusion Removes Noise from Images
If you’ve ever tried to take a picture when it’s too dark, and the picture
came out all grainy, that graininess is an example of “noise” in an image.

We use Stable Diffusion to generate art, but what it actually does behind
the scenes is “clean up” images!

It’s much more sophisticated than the noise removal slider in your phone’s
image editor, though. It actually has an understanding of what the world
looks like, and an understanding of written language, and it leverages
these to guide the process.

For example, imagine if I gave the below image on the left to a skilled
graphic artist and told them that it’s a painting of an alien playing a guitar
in the style of H.R. Giger. I bet they could go in and painstakingly clean it
up to create something like the image on the right.

(These are actual images from Stable Diffusion!)

The artist would do it using their knowledge of Giger’s artwork as well as
knowledge of the world (such as what guitars are supposed to look like
and how you play one). Stable Diffusion is essentially doing the same
thing!

“Inference Steps”
Are you familiar with the “Inference Steps” slider in most art generation
tools? Stable Diffusion removes noise incrementally.

Here’s an example of running it for 25 steps:

The alien guitarist example makes more sense, because you can make out
what it’s supposed to be much more clearly… but in the image above, the
starting image looks completely unrecognizable!

In fact, that noisy alien example was actually taken from about halfway
through the process–it actually started out as complete noise as well!
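To make "removing noise incrementally" a little more concrete, here is a toy sketch in Python. It is not how Stable Diffusion is implemented: the `target` array stands in for whatever a trained network would infer from the prompt, and the names `toy_denoise`, `target`, and `noisy` are made up for illustration. The only point is the shape of the loop, where each step removes a fraction of the remaining noise.

```python
import numpy as np

def toy_denoise(noisy, target, steps=25):
    """Move a noisy image toward `target` in `steps` small increments.

    `target` is a stand-in for what a trained network would infer from the
    prompt. The real model never has direct access to a finished image.
    """
    image = noisy.copy()
    history = [image.copy()]
    for step in range(steps):
        remaining = steps - step
        predicted_noise = image - target             # stand-in for the network's prediction
        image = image - predicted_noise / remaining  # remove a fraction of the noise
        history.append(image.copy())
    return image, history

rng = np.random.default_rng(seed=0)
target = np.zeros((8, 8))         # hypothetical "clean" image
noisy = rng.normal(size=(8, 8))   # pure noise starting point
final, history = toy_denoise(noisy, target)

# Average distance from the clean image after each step
errors = [float(np.abs(h - target).mean()) for h in history]
```

Each entry in `history` is one intermediate image, so an image pulled from the middle of the list is still partly noisy, much like the alien guitarist example.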

How Does It Even Start?

To generate art, we give Stable Diffusion a starting image that’s actually
nothing but pure noise. But, rather cruelly, we lie and say “This is a
super‐noisy painting of an alien playing a guitar in the style of H.R. Giger–
can you clean it up for me?”

If you gave that task to a graphic artist, they’d throw up their hands–“I
can’t help you, the image is completely unrecognizable!”

So how does Stable Diffusion do it?

At the simplest level, the answer is that it’s a computer program and it has
no choice but to do its thing and produce something for us.

A deeper answer has to do with the fact that AI models (more technically,
“Machine Learning” models) like Stable Diffusion are heavily based on
statistics. They estimate probabilities for all of their options, and even if all
of the options have extremely low probability of being right, they still just
pick whichever path has the highest probability.

So, for example, it has some idea of the places where a guitar might go in
an image, and it could look for whatever part of the noise seems most like
it could be the edge of the guitar (even though there really is no “right”
choice), and start filling things in.

Since there’s no right answer, every time you give it a different image of
pure noise it’s going to come up with a different piece of artwork!
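Here is a small Python sketch of what "a different image of pure noise" means in practice. It assumes the noise comes from a seeded random number generator and uses the 512 x 512 RGB layout this post works with; `starting_noise` is a made-up helper name, not part of any Stable Diffusion library.

```python
import numpy as np

def starting_noise(seed, height=512, width=512, channels=3):
    """Return an image-shaped array of random values drawn from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(height, width, channels))

a = starting_noise(seed=1)
b = starting_noise(seed=2)
a_again = starting_noise(seed=1)

same_seed_matches = np.array_equal(a, a_again)   # same seed, same noise
different_seed_matches = np.array_equal(a, b)    # different seed, different noise
```

Reusing a seed reproduces the same starting noise, while changing the seed gives the model a different starting point.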

How Do You Program Stable Diffusion?
If I wasn’t familiar with machine learning, and I was trying to guess at how
this is actually implemented, I’d probably start to think up how you would
program it. In other words, what’s the sequence of steps it follows?

Maybe it matches keywords from the description to search a database of
images that match the description, and then compares them to the noise?
And from that guy’s explanation, it sounds like it might start by calculating
where the strongest edges are in the image?

The truth is nothing like that–it doesn’t have a database of images to
reference, it doesn’t use any image processing algorithms… It’s pure math.

And I don’t mean that in the sense of “well, sure, computers are ultimately
just big calculators, and everything they do boils down to math”. I’m
talking about the “bewildering equations on a chalkboard” kind of math,
like the ones below:

(That’s from a technical tutorial I wrote on one of the many building blocks
of Stable Diffusion called “Attention”.)
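For a sense of what that kind of math looks like, here is the standard scaled dot-product attention formula. It is given as a representative example and is not necessarily the exact set of equations from the tutorial. The symbols follow the usual convention: Q, K, and V are matrices of "queries", "keys", and "values", and d_k is the number of columns in K.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```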

The full set of equations that define each of the different building blocks
would fill a few pages, at least.

Images and Text as Numbers

In order to apply these equations, we need to represent that initial noise
image, and our text description, as big tables of numbers.

You might already be familiar with how images are represented, but let’s
look at an example. Here’s a long exposure photo I took at high tide:

And here’s how it’s represented mathematically. It’s 512 x 512 pixels, so we
represent it as a table with 512 rows and 512 columns. But we actually
need three tables to represent an image, because each pixel is made up of
a mixture of Red, Green, and Blue (RGB). Here are the actual values for the
above image.
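In code, those three tables are usually stored together as one array. The sketch below uses NumPy and fills the array with placeholder random pixel values, since the actual photo isn't available here. The variable names are just for illustration.

```python
import numpy as np

height, width, channels = 512, 512, 3
rng = np.random.default_rng(0)

# Placeholder pixel values (0-255) standing in for a real photo
image = rng.integers(0, 256, size=(height, width, channels), dtype=np.uint8)

# One 512 x 512 table per color channel
red, green, blue = image[..., 0], image[..., 1], image[..., 2]

total_values = image.size  # 512 * 512 * 3
```

Here `total_values` comes out to 786,432.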

With Stable Diffusion, we also work with text. Here’s a description I might
write for the image:
A long exposure color photograph of decaying concrete steps leading dow

And here’s how this is represented as a table of numbers. There is one row
for each of the words, and each word is represented by 768 numbers.
These are the actual numbers used in Stable Diffusion v1.5 to represent
these words:

How we choose the numbers to represent a word is a fascinating topic, but
also fairly technical. You can loosely think of those numbers as each
representing a different aspect of the meaning of a word.
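The text table can be sketched the same way. The values below are random placeholders, not the real numbers Stable Diffusion uses, and the 33-row count assumes the prompt splits into 33 tokens, the figure this post uses for the example description.

```python
import numpy as np

num_tokens, embedding_size = 33, 768
rng = np.random.default_rng(0)

# Placeholder values: one row per token, 768 numbers per row
prompt_table = rng.normal(size=(num_tokens, embedding_size)).astype(np.float32)

first_token_vector = prompt_table[0]  # the 768 numbers for the first token
total_values = prompt_table.size      # 33 * 768
```

That works out to 25,344 numbers for the whole prompt.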

In machine learning, we don’t actually refer to these as “tables”–we use the
terms “Matrix” or “Tensor”. These come from the field of linear algebra.

The most important and mind‐bending part of all of this, though, is the
concept of parameters.

A Billion Parameters
The initial noise and our text description are what we call our inputs to
Stable Diffusion, and different inputs will have different values in those
tables.

There is also a much, much larger set of numbers that we plug into those
equations, though, and they are the same every time–these are called
Stable Diffusion’s parameters.

Remember plotting lines in high school with equations like y = 3x + 2 ?

If this were Stable Diffusion, then ‘x’ is our input, ‘y’ is the final image, and
the numbers 3 and 2 are our parameters. (And, of course, the equations
are wildly more complex.)
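The same idea as a few lines of Python: the input changes on every call, while the parameters stay fixed. The function name `line` and the argument names are just for illustration.

```python
def line(x, slope=3.0, intercept=2.0):
    """y = slope * x + intercept, where slope and intercept are the parameters."""
    return slope * x + intercept

# Different inputs, same parameters
ys = [line(x) for x in (0, 1, 2)]
```

With the parameters 3 and 2, the inputs 0, 1, and 2 give 2.0, 5.0, and 8.0. Change the parameters and the same inputs produce different outputs.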

The input image was represented by about 790k values, and the 33
“tokens” in our prompt are represented by about 25k values.

But there are roughly 1 billion parameters in Stable Diffusion.

(Can you imagine doing all of that math by hand?!?)

Those 1 billion numbers are spread out across about 1,100 different
matrices of varying sizes. Each matrix is used at a different point in the
math.
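Counting parameters is just a matter of multiplying each matrix's rows by its columns and adding up the results. The shapes below are hypothetical, picked only to show the arithmetic. They are not taken from Stable Diffusion's actual list of matrices.

```python
from math import prod

# Hypothetical matrix shapes (rows, columns), for illustration only
matrix_shapes = [(768, 768), (768, 3072), (3072, 768), (320, 320)]

sizes = [prod(shape) for shape in matrix_shapes]  # numbers in each matrix
total_parameters = sum(sizes)
```

These four made-up matrices already hold about 5.4 million numbers between them.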

I’ve printed out the full list of these matrices here, if you’re curious!

Again, those parameters don’t change–they’re the same numbers every
time you generate an image.

Stable Diffusion works because we figured out the right values to use for
each of those 1 billion numbers. How absurd is that?!

Choosing 1 Billion Parameters

Obviously, the authors could not have sat down and decided what
numbers to try. Especially when you consider that they’re not “integers”
like 1, 2, 3, but rather what we computer nerds call “floating point” values–
the small, very precise fractions that you saw in the tables.

Not only did we not choose these numbers–we can’t even explain a single
one of them! This is why we can’t fully explain how Stable Diffusion works.
We have some decent intuition about what those equations are doing, but
a lot of what’s going on is hidden in the values of those numbers, and we
can’t fully make sense of it.

Insane, right?

So how do we figure them out?

We start by picking 1 billion random numbers to use. With those initial
random parameter values, the model is completely useless–it can’t do
anything of value until we figure out better parameter values to use.

So we apply a mathematical process that we refer to as training, which
gradually adjusts the values to ones that work well.

The way training works is something we do understand fully–it’s some
basic calculus (albeit applied to a very large equation) that’s essentially
guaranteed to work, and we have a clear understanding of why.

Training involves a huge dataset of training examples. A single training
example consists of an input and a desired output. (I’ll explain what a
training example looks like for Stable Diffusion in another post.)

When we run the very first training input through (with completely random
parameter values), what the model spits out is going to be nothing like the
desired output.

But, using the difference between the actual output and desired output,
we can apply some very basic calculus on those equations that will tell us,
for every one of those 1 billion numbers, a specific amount that we should
add or subtract. (Each individual parameter is tweaked by a different, small
amount!)
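Here is what one of those adjustments looks like on the tiny y = 3x + 2 model from earlier, as a sketch rather than anything resembling Stable Diffusion's real training code. It assumes a squared-error measure of "difference" and a made-up step size (`learning_rate`). The starting parameter values are arbitrary.

```python
def predict(x, w, b):
    return w * x + b

def gradients(x, desired, w, b):
    """Derivatives of the squared error (predict - desired)**2 with respect to w and b."""
    error = predict(x, w, b) - desired
    return 2 * error * x, 2 * error

w, b = 0.5, -1.0          # arbitrary starting parameters
x, desired = 2.0, 8.0     # one training example taken from y = 3x + 2
learning_rate = 0.01      # hypothetical step size

error_before = (predict(x, w, b) - desired) ** 2

# The calculus gives each parameter its own adjustment
grad_w, grad_b = gradients(x, desired, w, b)
w_new = w - learning_rate * grad_w
b_new = b - learning_rate * grad_b

error_after = (predict(x, w_new, b_new) - desired) ** 2
```

The two parameters get different adjustments, and the squared error on this example shrinks after the step.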

After we make those adjustments, and provided they’re small enough, the
model is mathematically guaranteed to produce an image that’s a tiny bit
closer to our desired output for that training example.

So we do that many times (hundreds of millions of times) with many
different training examples, and the model keeps getting better and
better. We get diminishing returns as we go, though, and we eventually
reach a point where the model’s not going to benefit from further training.
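The whole process, from random starting values to repeated small adjustments, fits in a few lines for a two-parameter model. This is a minimal sketch assuming plain gradient descent on squared error, with a made-up step size and step count. Real training for Stable Diffusion follows the same idea at a vastly larger scale, with details this post doesn't cover.

```python
import random

def train(examples, steps=5000, learning_rate=0.01, seed=0):
    """Fit y = w * x + b by repeatedly nudging w and b on one example at a time."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random starting parameters
    for _ in range(steps):
        x, desired = rng.choice(examples)          # pick one training example
        error = (w * x + b) - desired              # actual output minus desired output
        w -= learning_rate * 2 * error * x         # adjustment for w
        b -= learning_rate * 2 * error             # adjustment for b
    return w, b

# Training examples generated from the "right" parameters, 3 and 2
examples = [(x, 3 * x + 2) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = train(examples)
```

After training, `w` and `b` end up very close to 3 and 2, even though nobody typed those values into the model.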

Once the authors finished training the model, they published the
parameter values for everyone to use freely!

Training Stable Diffusion

There’s a lot about the Stable Diffusion training process that’s easy to
understand, and can be pretty interesting to learn, but I’m saving that for
another blog post!

Conclusion
I won’t be offended if you’re a little disappointed by the explanation here,
and that it’s not more understandable, but hopefully you at least feel like
the veil has been lifted, and that what you saw was mind‐bending and
inspiring!



