
ML Visuals

By dair.ai

https://github.com/dair-ai/ml-visuals
Basic ML Visuals
[Figure: basic operation blocks — Softmax, Convolve, Sharpen]
[Figure: Transformer architecture —
encoder: Input Embedding + Positional Encoding → Multi-Head Attention → Add & Norm → Feed Forward → Add & Norm;
decoder: Output Embedding of the outputs (shifted right) + Positional Encoding → Masked Multi-Head Attention → Add & Norm → Multi-Head Attention over the encoder output → Add & Norm → Feed Forward → Add & Norm → Linear → Softmax]
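The deck only shows the block diagram; the core of each Multi-Head Attention block is scaled dot-product attention. A minimal numpy sketch following the standard formulation (the toy sizes and the name d_k are assumptions, not from the deck):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # row-wise softmax
    return w @ V                                     # weighted sum of values

# Toy example: 3 tokens, d_k = 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 4)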


[Figure: tokenization — the sentence “I love coding and writing” is split into tokens]
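A minimal sketch of the step the figure shows, splitting on whitespace (real tokenizers are usually subword-based; the whitespace rule is an illustrative assumption):

sentence = "I love coding and writing"
tokens = sentence.split()  # naive whitespace tokenization
print(tokens)              # ['I', 'love', 'coding', 'and', 'writing']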


[Figure: fully connected network — Input Layer X = A[0]; Hidden Layers A[1], A[2], A[3] with activations a[l]_1 … a[l]_n; Output Layer A[4] producing Ŷ]
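The notation A[0] → A[4] is a layer-by-layer forward pass. A minimal numpy sketch under assumed layer sizes (the weights, dimensions, and the use of ReLU everywhere are illustrative assumptions):

import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(X, params):
    """Forward pass: A[0] = X, A[l] = g(W[l] A[l-1] + b[l]), Ŷ = A[L]."""
    A = X  # A[0]
    for W, b in params:
        A = relu(W @ A + b)
    return A  # A[L] = Ŷ

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 4, 1]  # input, three hidden layers, output
params = [(rng.normal(size=(m, n)), np.zeros((m, 1)))
          for n, m in zip(sizes[:-1], sizes[1:])]
X = rng.normal(size=(3, 1))  # one example with 3 features
print(forward(X, params))    # Ŷ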


[Figure: CONV operation — an N×N×3 input is convolved with M×M filters, biases b1 and b2 are added, ReLU is applied, and the two feature maps stack into an M×M×2 output (a[l-1] → a[l])]
Abstract backgrounds
Gradient Backgrounds
Community Contributions
[Figure: striding in CONV — the filter slides with stride S=1 vs. S=2]
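The stride S sets how far the filter moves per step; for an N×N input and an M×M filter the output is floor((N + 2p − M)/S) + 1 per side. A quick sketch (the example sizes are assumptions):

def conv_output_size(n, m, stride, padding=0):
    """floor((n + 2p - m) / s) + 1 for a square input and filter."""
    return (n + 2 * padding - m) // stride + 1

print(conv_output_size(7, 3, stride=1))  # 5
print(conv_output_size(7, 3, stride=2))  # 3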
[Figure: Inception module — an N×N×192 input feeds parallel branches: 1×1 same conv → N×N×64, 3×3 same conv → N×N×128, 5×5 same conv → N×N×32, and same-padded MaxPool (s=1) → N×N×32, concatenated along channels]
[Figure: network expansion between tasks t-1 and t — (a) Retraining w/o expansion, (b) No-Retraining w/ expansion, (c) Partial Retraining w/ expansion]
[Figure: how does a NN work (inspired by Coursera) — a Basic Neuron model: inputs Size, #bed, ZIP, Wealth feed intermediate units (family size, walkability, school quality) that predict PRICE ŷ]
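A minimal sketch of the basic neuron the figure depicts: a weighted sum of inputs passed through ReLU (the feature names and weight values are hypothetical):

import numpy as np

def neuron(x, w, b):
    """A basic neuron: weighted sum of inputs, then ReLU."""
    return max(0.0, np.dot(w, x) + b)

# Hypothetical housing features: [size, #bed, zip_quality, wealth]
x = np.array([120.0, 3.0, 0.8, 0.6])
w = np.array([0.5, 10.0, 20.0, 15.0])  # illustrative weights
print(neuron(x, w, b=-50.0))           # predicted price ŷ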

[Figure: linear regression vs. logistic regression — price ($) regressed on size, with ReLU(x) giving the non-negative kinked fit; logistic regression separates Ŷ = 0 from Ŷ = 1]
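A minimal sketch contrasting the two models in the figure (weights and inputs are illustrative assumptions):

import numpy as np

def linear_regression(x, w, b):
    """Unbounded real output, e.g. price ($) vs. size."""
    return w * x + b

def logistic_regression(x, w, b):
    """Sigmoid squashes the linear score into (0, 1); threshold at 0.5."""
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    return int(p >= 0.5)  # Ŷ = 0 or Ŷ = 1

print(linear_regression(100.0, w=2.0, b=50.0))     # 250.0
print(logistic_regression(100.0, w=0.05, b=-4.0))  # 1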
[Figure: training a convolutional encoder–decoder — a 128×128×1 input I passes through CONV1–CONV7 (encoder, then decoder) and is reconstructed as a 128×128×1 output I1]
[Figure: why does deep learning work? — performance vs. amount of data: large NNs keep improving where medium and small NNs and SVM, LR, etc. plateau]

[Figure: one-hidden-layer neural network — inputs X = A[0], hidden activations a[1]_1 … a[1]_4 = A[1], output a[2] = A[2] = Ŷ]
[Figure: neural network templates — input nodes x[1], x[2], x[3] with hidden units a[1]_1, a[1]_2 and output a[2]]


[Figure: Train–Dev–Test vs. model fitting — Train/Valid/Test splits alongside underfitting, good fit, and overfitting on x1–x2 plots]


[Figure: DropOut — units x[1..3] randomly dropped on the way to a[L]]
[Figure: normalization — cost contours over w1, w2 before (elongated) and after (round) normalizing inputs x1, x2]
[Figure: early stopping — error vs. iterations, stopping where the Dev curve turns up while Train keeps falling]
[Figure: deep neural networks — inputs x1, x2 flowing through weights w[1], w[2], …, w[L-2], w[L-1], w[L]]
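The DropOut panel corresponds to inverted dropout: zero out units at random during training and rescale the survivors. A minimal numpy sketch (keep_prob = 0.8 is an assumed value):

import numpy as np

rng = np.random.default_rng(0)

def dropout(a, keep_prob=0.8):
    """Inverted dropout: drop units with prob 1-keep_prob, rescale the rest."""
    mask = rng.random(a.shape) < keep_prob  # 1 = keep, 0 = drop
    return a * mask / keep_prob             # rescale so E[a] is unchanged

print(dropout(np.ones((4, 1))))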

[Figure: understanding Precision & Recall — confusion-matrix quadrants TP, FP, FN, TN]
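From the confusion-matrix quadrants: precision = TP/(TP+FP) and recall = TP/(TP+FN). A quick sketch (the counts are hypothetical):

def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts from a confusion matrix
print(precision_recall(tp=40, fp=10, fn=20))  # (0.8, 0.666...)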
[Figure: Batch vs. Mini-batch Gradient Descent and Batch Gradient Descent vs. SGD — optimization paths on w1–w2 cost contours: BGD descends smoothly, SGD oscillates]
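A minimal sketch of the spectrum the figure contrasts, on a linear least-squares model (the data, learning rate, and batch size are illustrative assumptions):

import numpy as np

def minibatch_gd(X, y, w, lr=0.1, batch_size=10, epochs=20):
    """Mini-batch gradient descent on a linear least-squares model.
    batch_size = len(X) recovers batch GD; batch_size = 1 is SGD."""
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        idx = rng.permutation(len(X))          # reshuffle each epoch
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad                     # one (noisy) descent step
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0])
print(minibatch_gd(X, y, w=np.zeros(2)))       # approaches [2, -1]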
[Figure: Softmax prediction with 2 outputs — inputs x[1], x[2], x[3] mapped to probabilities p[1], p[2]]
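Softmax maps the network's raw scores to probabilities that sum to 1. A minimal numpy sketch (the example scores are assumptions):

import numpy as np

def softmax(z):
    """Map raw scores to probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0])       # scores for the 2 outputs
print(softmax(z))              # p[1], p[2] ≈ [0.73, 0.27]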
Miscellaneous
[Figure: U-Net-style architecture — 3×3 and 1×1 convolutions, 2×2 max pooling, 2×2 up-sampling, copied blocks and skip connections concatenating encoder and decoder features (16+32, 32+64, 64+128, 128+256 channels), with dropout 0.1/0.2/0.3]
[Figure: VGG-style ConvNets — Input → Layer 1 (Conv3-32 ×2) → Max-Pool → Layer 2 (Conv3-64 ×2) → Max-Pool → Layer 3 (Conv3-128) → Max-Pool → Layer 4 (FC-512) → Output; shown alongside a compact Conv → Max-Pool → FC layer template]
[Figure: Inception module — the previous layer feeds 1×1, 3×3, and 5×5 convolutions (3×3 and 5×5 preceded by 1×1 reductions) plus 3×3 max pooling followed by a 1×1 conv, all joined by filter concatenation]

[Figure: factorized convolutions — 1×3 convs (padding 1), a 1×5 conv (padding 2), and a 1×7 conv (padding 3) stacked and joined by filter concatenation]
[Figure: GoogLeNet with auxiliary classifiers — Input → Conv → Max-Pool → Conv → Max-Pool → stacked Inception modules with Max-Pool between stages → Avg-Pool → FC → Softmax; two auxiliary classifiers (Avg-Pool → Conv → FC → FC → Softmax) branch from intermediate Inception outputs]
[Figure: Inception module variants (a) and (b) — the previous layer feeds branches of 1×1 convs, 1×1 → 3×3 convs (further factorized into 1×3 and 3×1 convs), and pool → 1×1 conv, joined by filter concatenation]
[Figure: plain vs. residual block — stacked layers compute F(x) from input x; the plain block outputs y = F(x), while the residual block adds the identity shortcut to give y = F(x) + x]
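A minimal numpy sketch of the residual block's y = F(x) + x (the two-layer form of F and the ReLU placement are standard but assumed here; sizes are illustrative):

import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(x, W1, W2):
    """y = F(x) + x: two stacked layers plus the identity shortcut."""
    F = W2 @ relu(W1 @ x)  # F(x) from the stacked layers
    return relu(F + x)     # add the identity, then activate

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
print(residual_block(x, W1, W2).shape)  # (4, 1) — same shape as x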

[Figure: DenseNet — Input → Conv → Dense Block 1 → transition layer (Conv, Avg-Pool) → Dense Block 2 → transition layer → Dense Block 3 → Avg-Pool → FC → Softmax; R1, R2, R3 mark the densely connected units within blocks]
[Figure: NAS-style cells (a) and (b) — hidden states h(i-1) and h(i) pass through add nodes over 3×3/5×5/7×7 conv, avg, max, and identity ops, with filter concatenation producing h(i+1)]
[Figure: feature-map resolutions across network stages — 224×224 → 112×112 → 56×56 → 28×28 → 14×14]
[Figure: max pooling as image representation — the 4×4 input
  1 1 2 4
  5 6 7 8
  3 2 1 0
  1 2 3 4
pooled with a 2×2 kernel and a stride of 2 gives the 2×2 output
  6 8
  3 4
e.g. Max(1, 1, 5, 6) = 6]
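A minimal numpy sketch reproducing the figure's numbers: non-overlapping 2×2 blocks, max per block.

import numpy as np

X = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

# 2x2 kernel, stride 2: reshape into non-overlapping 2x2 blocks, take the max
Y = X.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(Y)  # [[6 8]
          #  [3 4]]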
