26 Deep Learning Annotated
Jerry Cain
March 11, 2024
Lecture Discussion on Ed
Deep Learning
Innovations in deep learning
Esteva, Andre, et al. "Dermatologist-level classification of skin cancer with deep neural networks."
Nature 542.7639 (2017): 115-118.
Deep learning

LOL, yes: Lots Of Logistic regressions.

[Figure: the input $\boldsymbol{x}$ (e.g., a binary vector $[1, 0, \ldots, 1]$) feeds many logistic regressions to produce the output $\hat{y} = P(Y = 1 \mid \boldsymbol{X} = \boldsymbol{x})$; if the output is $> 0.5$, predict 1.]
Logistic Regression Model

$\hat{Y} = \arg\max_y P(Y = y \mid \boldsymbol{X})$

[Figure: the input $\boldsymbol{x}$ feeds a weighted sum and sigmoid to produce $\hat{y}$ (e.g., 0.8). Let's focus on the model up to $\hat{y}$.]
Logistic Regression Model

$\hat{Y} = \arg\max_y P(Y = y \mid \boldsymbol{X})$

[Figure: the input $\boldsymbol{x}$ feeds weighted sums and sigmoids $\sigma$; the output $\hat{y}$ (e.g., 0.8) is compared against 0.5 to make the prediction.]
Biological basis for neural networks

A neuron: inputs $x_1, x_2, x_3, x_4$ arrive with weights $\theta_1, \theta_2, \theta_3, \theta_4$ and produce one output $\hat{y}$. One neuron = one logistic regression.

Your brain: many neurons feeding into one another. Neural network = many logistic regressions.
$\boldsymbol{x}^{(i)} = [0,0,0,0, \ldots, 1,0,0,1, \ldots, 0,0,1,0]$, $y^{(i)} = 0$
$\boldsymbol{x}^{(i)} = [0,0,1,1, \ldots, 0,1,1,0, \ldots, 0,1,0,0]$, $y^{(i)} = 1$

[Figure: the input features $\boldsymbol{x}$ (pixels, on/off) feed a weighted sum and sigmoid $\sigma$ to produce the output $\hat{y} = P(Y = 1 \mid \boldsymbol{X} = \boldsymbol{x})$.]
Logistic Regression

[Figure: arrows indicate logistic regression connections from the input features $\boldsymbol{x}$ (pixels, on/off) to the output $\hat{y} = P(Y = 1 \mid \boldsymbol{X} = \boldsymbol{x})$. Output $> 0.5$? No, so predict 0. ✅]
Logistic Regression

[Figure: the same logistic regression on a different input. Output $> 0.5$? Yes, so predict 1. ✅]
Logistic Regression

[Figure: on this input the output is $> 0.5$, so we predict 1, which is wrong. ❌ A single logistic regression is not expressive enough for every dataset.]

Big idea #2: $\sigma(\theta^T \boldsymbol{x})$, a non-linear transform of multiple values into one value, using parameter $\theta$.
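To make big idea #2 concrete, here is a minimal sketch in Python/NumPy (not from the slides; the numbers are made up) of one neuron computing $\sigma(\theta^T \boldsymbol{x})$:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(theta, x):
    """Big idea #2: non-linear transform of many input values into
    one value, using the parameter vector theta: sigma(theta^T x)."""
    return sigmoid(theta @ x)

x = np.array([1.0, 0.0, 1.0])        # input features (e.g., pixels on/off)
theta = np.array([0.5, -1.2, 0.3])   # hypothetical learned parameters
p = neuron(theta, x)                 # P(Y = 1 | X = x) under the model
print(p > 0.5)                       # threshold at 0.5: predict 1 or 0
```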
Introducing: The Neural Network

[Figure: the input features $\boldsymbol{x}$ feed a hidden layer $\boldsymbol{h}$, which feeds the output $\hat{y}$. Output $> 0.5$? No, so predict 0. ✅]
Neural network

Big idea #1: model the conditional probability of the class label given the input, $\hat{y} = P(Y \mid \boldsymbol{X} = \boldsymbol{x})$.

[Figure: input features $\boldsymbol{x}$ → hidden layer $\boldsymbol{h}$ → output $\hat{y}$. Output $> 0.5$? No, so predict 0.]
Feed neurons into other neurons

[Figure: each hidden neuron computes a weighted sum followed by a sigmoid, i.e., big idea #2: $\sigma(\theta^T \boldsymbol{x})$. Another hidden neuron does the same with different parameters. The hidden layer $\boldsymbol{h}$ then feeds the output $\hat{y}$, which is thresholded at 0.5 to predict.]

• Neuron = logistic regression
• Different parameters for every connection
Feed neurons into other neurons

[Figure: there are $|\boldsymbol{h}|$ logistic regression connections from the input into the hidden layer, for $|\boldsymbol{x}| \cdot |\boldsymbol{h}|$ parameters; the output neuron is one more logistic regression from the hidden layer to the output, for $|\boldsymbol{h}|$ parameters.]

Why doesn't a linear model introduce "complexity"? (think about it by yourself first)

Neural network ($\boldsymbol{x}$: input features, $\boldsymbol{h}$: hidden layer, $\hat{y}$: output):
1. for $j = 1, \ldots, |\boldsymbol{h}|$: $h_j = \sigma(\theta_j^T \boldsymbol{x})$
2. $\hat{y} = \sigma(\theta^{(\hat{y})T} \boldsymbol{h}) = P(Y = 1 \mid \boldsymbol{X} = \boldsymbol{x})$

Linear network:
1. for $j = 1, \ldots, |\boldsymbol{h}|$: $h_j = \theta_j^T \boldsymbol{x}$
2. $\hat{y} = \sigma(\theta^{(\hat{y})T} \boldsymbol{h}) = P(Y = 1 \mid \boldsymbol{X} = \boldsymbol{x})$

The linear model is effectively a single logistic regression with $|\boldsymbol{x}|$ parameters: with no nonlinearity in the hidden layer, $\theta^{(\hat{y})T}(\Theta \boldsymbol{x}) = (\Theta^T \theta^{(\hat{y})})^T \boldsymbol{x}$, so the hidden layer collapses into one linear map.
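A quick numerical check of this claim, as a NumPy sketch (matrix and variable names are mine): with no sigmoid in the hidden layer, the two-step "linear network" computes exactly what a single logistic regression with collapsed weights $\Theta^T \theta^{(\hat{y})}$ computes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)                  # |x| = 4 input features
Theta_hidden = rng.normal(size=(3, 4))  # 3 hidden "neurons", NO sigmoid
theta_out = rng.normal(size=3)          # output neuron's parameters

# Linear network: h_j = theta_j^T x, then y_hat = sigma(theta_out^T h)
h = Theta_hidden @ x
y_linear_net = sigmoid(theta_out @ h)

# One logistic regression with collapsed weights Theta^T theta_out
theta_collapsed = Theta_hidden.T @ theta_out
y_single_lr = sigmoid(theta_collapsed @ x)

print(np.isclose(y_linear_net, y_single_lr))  # True: the same model
```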
Demonstration
https://adamharley.com/nn_vis/
Neural networks
A neural network (like logistic regression) gets its intelligence from its parameters $\theta$.

Training:
• Learn parameters $\theta$
• Find $\theta_{\text{MLE}}$ that maximizes the likelihood of the training data (MLE)
Training: Logistic Regression Review

1. Optimization problem: $\theta_{\text{MLE}} = \arg\max_\theta \prod_{i=1}^{n} f(y^{(i)} \mid \boldsymbol{x}^{(i)}, \theta) = \arg\max_\theta LL(\theta)$
2. Compute gradient
3. Optimize
1. Same output $\hat{y}$, same log conditional likelihood

$L(\theta) = \prod_{i=1}^{n} \left(\hat{y}^{(i)}\right)^{y^{(i)}} \left(1 - \hat{y}^{(i)}\right)^{1 - y^{(i)}}$

where each $\hat{y}^{(i)}$ now comes from the network ($\boldsymbol{x}$: input features, $\boldsymbol{h}$: hidden layer):
for $j = 1, \ldots, |\boldsymbol{h}|$: $h_j = \sigma(\theta_j^T \boldsymbol{x})$
$\hat{y} = \sigma(\theta^{(\hat{y})T} \boldsymbol{h}) = P(Y = 1 \mid \boldsymbol{X} = \boldsymbol{x})$
(the model is a little more complicated)

To optimize the log conditional likelihood, we now need to find $|\boldsymbol{h}| \cdot |\boldsymbol{x}| + |\boldsymbol{h}|$ parameters ($\boldsymbol{x}$: input features, $\boldsymbol{h}$: hidden layer):
for $j = 1, \ldots, |\boldsymbol{h}|$: $h_j = \sigma(\theta_j^T \boldsymbol{x})$, where each $\theta_j$ has dimension $|\boldsymbol{x}|$
$\hat{y} = \sigma(\theta^{(\hat{y})T} \boldsymbol{h}) = P(Y = 1 \mid \boldsymbol{X} = \boldsymbol{x})$, where $\theta^{(\hat{y})}$ has dimension $|\boldsymbol{h}|$
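To keep the dimensions straight, here is a hedged NumPy sketch of the forward pass (variable names and sizes are mine): `Theta` stacks the $|\boldsymbol{h}|$ hidden-layer parameter vectors as rows, and `theta_out` holds the output neuron's $|\boldsymbol{h}|$ parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(Theta, theta_out, x):
    """One-hidden-layer network from the slides.
    Theta:     (|h|, |x|) matrix; row j is theta_j (dimension |x|)
    theta_out: (|h|,) vector; the output neuron's parameters
    Returns y_hat = P(Y = 1 | X = x)."""
    h = sigmoid(Theta @ x)          # h_j = sigma(theta_j^T x), j = 1..|h|
    return sigmoid(theta_out @ h)   # y_hat = sigma(theta_out^T h)

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=784).astype(float)  # e.g., 28x28 on/off pixels
Theta = rng.normal(scale=0.01, size=(32, 784))  # |h| = 32 hidden neurons
theta_out = rng.normal(scale=0.01, size=32)
print(forward(Theta, theta_out, x))             # a probability in (0, 1)
```

Note the parameter count matches the slide: $32 \cdot 784 + 32$, i.e., $|\boldsymbol{h}| \cdot |\boldsymbol{x}| + |\boldsymbol{h}|$.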
2. Compute gradient

1. Optimization problem: $\theta_{\text{MLE}} = \arg\max_\theta \prod_{i=1}^{n} f(y^{(i)} \mid \boldsymbol{x}^{(i)}, \theta) = \arg\max_\theta LL(\theta)$, where $h_j = \sigma(\theta_j^T \boldsymbol{x})$ for $j = 1, \ldots, |\boldsymbol{h}|$ and $\hat{y} = \sigma(\theta^{(\hat{y})T} \boldsymbol{h})$.
2. Compute gradient: take the gradient with respect to all $\theta$ parameters.

Wait, did we just skip something difficult?
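The difficult part is carrying the chain rule through the network (backpropagation). Below is a minimal sketch for one training example, assuming the one-hidden-layer sigmoid network above (function and variable names are mine); the expressions follow from $\sigma'(z) = \sigma(z)(1 - \sigma(z))$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(Theta, theta_out, x, y):
    """Gradient of the log-likelihood y*log(y_hat) + (1-y)*log(1-y_hat)
    for one training example, via the chain rule (backpropagation)."""
    h = sigmoid(Theta @ x)                  # hidden activations
    y_hat = sigmoid(theta_out @ h)          # output probability
    delta_out = y - y_hat                   # dLL/d(output pre-activation)
    grad_theta_out = delta_out * h          # dLL/d(theta_out)
    delta_h = delta_out * theta_out * h * (1 - h)  # back through sigma
    grad_Theta = np.outer(delta_h, x)       # dLL/d(theta_j), stacked as rows
    return grad_Theta, grad_theta_out

# One gradient ASCENT step on LL would then be:
#   Theta += eta * grad_Theta; theta_out += eta * grad_theta_out
```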
Shared weights?

It turns out that if you want to force some of your weights to be shared across different neurons, the math isn't much harder.

Convolution is an example of such weight sharing and is used heavily for vision (Convolutional Neural Networks, CNNs).
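As an illustration of weight sharing (a sketch, not the course's notation): in a 1-D convolution, one small parameter vector is reused at every input position, so several neurons share the same few weights.

```python
import numpy as np

# One shared parameter vector ("filter") reused at every position:
# each output neuron computes theta^T x[i:i+3] with the SAME theta.
theta_shared = np.array([1.0, -2.0, 1.0])  # hypothetical 3-tap filter
x = np.array([0.0, 1.0, 1.0, 0.0, 0.0, 1.0])

outputs = np.array([theta_shared @ x[i:i + 3] for i in range(len(x) - 2)])
print(outputs)  # 4 neurons, but only 3 parameters total (shared weights)
```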
Neural networks with multiple layers

$\boldsymbol{x} \to \boldsymbol{a} \to \boldsymbol{b} \to \boldsymbol{c} \to \boldsymbol{d} \to \boldsymbol{e} \to \boldsymbol{f} \to \hat{y} \to LL$
Neurons learn features of the dataset
Neurons in later layers will respond strongly to high-level
features of your training data.
If your training data is faces, you will get lots of face neurons.
Softmax test metric: Top-5 error

Probabilities of predictions (true class label: 5):

$y$     $P(Y = y \mid \boldsymbol{X} = \boldsymbol{x})$
5       0.14
8       0.13
7       0.12
2       0.10
9       0.10
4       0.09
1       0.09
0       0.09
6       0.08
3       0.05

Top-5 classification error: what % of datapoints did not have the correct class label in the top-5 predictions?
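A minimal sketch of how top-5 error could be computed from a matrix of predicted probabilities (function and array names are mine):

```python
import numpy as np

def top5_error(probs, labels):
    """probs:  (n, num_classes) predicted P(Y = y | X = x) per datapoint
    labels: (n,) true class labels
    Returns the fraction of datapoints whose true label is NOT among
    the 5 classes with the highest predicted probability."""
    top5 = np.argsort(probs, axis=1)[:, -5:]      # 5 best guesses per row
    hit = (top5 == labels[:, None]).any(axis=1)   # true label in top 5?
    return 1.0 - hit.mean()

# The slide's example: the true label 5 has the highest probability,
# so it is in the top 5 and contributes no error.
probs = np.array([[0.09, 0.09, 0.10, 0.05, 0.09, 0.14, 0.08, 0.12, 0.13, 0.10]])
print(top5_error(probs, np.array([5])))  # 0.0
```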
ImageNet classification

22,000 categories
14,000,000 images

Hand-engineered features (SIFT, HOG, LBP), spatial pyramid, sparse coding/compression

Example fine-grained categories:
smoothhound, smoothhound shark, Mustelus mustelus
American smooth dogfish, Mustelus canis
Florida smoothhound, Mustelus norrisi
whitetip shark, reef whitetip shark, Triaenodon obesus
Atlantic spiny dogfish, Squalus acanthias
Pacific spiny dogfish, Squalus suckleyi
hammerhead, hammerhead shark
smooth hammerhead, Sphyrna zygaena
smalleye hammerhead, Sphyrna tudes
shovelhead, bonnethead, bonnet shark, Sphyrna tiburo
angel shark, angelfish, Squatina squatina, monkfish
Stingray:
electric ray, crampfish, numbfish, torpedo
smalltooth sawfish, Pristis pectinatus
guitarfish
roughtail stingray, Dasyatis centroura
butterfly ray
eagle ray
spotted eagle ray, spotted ray, Aetobatus narinari
cownose ray, cow-nosed ray, Rhinoptera bonasus
Manta ray:
manta, manta ray, devilfish
Atlantic manta, Manta birostris
devil ray, Mobula hypostoma
grey skate, gray skate, Raja batis
little skate, Raja erinacea
…

Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
ImageNet classification challenge

Random guess, with 1000 classes and 5 guesses:
$P(\text{true class label not in 5 guesses}) = \frac{1000 - 5}{1000} = \frac{995}{1000} = 99.5\%$
ImageNet challenge: Top-5 classification error (lower is better)

[Chart: top-5 error falls from 16.4% (2012), to ? for GoogLeNet (2015), to 2.25% for SENet (2017).]

Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge. IJCV 2015
Szegedy et al., Going Deeper With Convolutions. CVPR 2015
Hu et al., Squeeze-and-Excitation Networks. arXiv preprint, 2017
GoogLeNet (2015)

[Figure: the GoogLeNet architecture: multiple, multi-class outputs, and 22 layers deep!]

Szegedy et al., Going Deeper With Convolutions. CVPR 2015
Speeding up gradient descent

Gradient descent minimizes loss (a function of prediction error):

initialize theta[j] = 0 for 0 <= j <= m
repeat many times:
    gradient[j] = 0 for 0 <= j <= m
    for each training example (x, y): accumulate gradient[j] for all j
    theta[j] += eta * gradient[j] for 0 <= j <= m

Our batch gradient descent (over the entire training set) will be slow and expensive. Two speedups, sketched in code below:
1. Use stochastic gradient descent (randomly select training examples with replacement).
2. Momentum update (incorporate "acceleration" or "deceleration" of the gradient updates so far).
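A hedged sketch combining both ideas (the learning rate `eta` and momentum coefficient `beta` are illustrative; `grad_fn` is a stand-in for the per-example gradient from backpropagation):

```python
import numpy as np

def sgd_momentum(grad_fn, theta, data, eta=0.01, beta=0.9, steps=10_000):
    """Stochastic gradient ascent with momentum on the log-likelihood.
    grad_fn(theta, x, y) returns the gradient for ONE training example;
    data is a list of (x, y) training pairs."""
    rng = np.random.default_rng(0)
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        x, y = data[rng.integers(len(data))]  # random example, w/ replacement
        velocity = beta * velocity + grad_fn(theta, x, y)  # "acceleration"
        theta = theta + eta * velocity        # ascend the log-likelihood
    return theta
```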
Good ML = Generalization

Overfitting: fitting the training data too well, such that we lose the generality of the model for predicting new data.

[Figure: left, a perfect fit to the training data but a bad predictor for new data; right, a more general fit and a better predictor for new data.]

Dropout: during training, randomly leave out some neurons each training step. It will make your network more robust.
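A minimal sketch of ("inverted") dropout on one layer's activations; the keep probability 0.8 is illustrative:

```python
import numpy as np

def dropout(h, keep_prob=0.8, training=True, rng=None):
    """Randomly zero out hidden activations during training.
    Scaling the survivors by 1/keep_prob keeps the expected
    activation the same at train and test time."""
    if not training:
        return h                      # use every neuron at test time
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(h.shape) < keep_prob
    return h * mask / keep_prob

h = np.array([0.2, 0.9, 0.5, 0.7])
print(dropout(h))  # some entries zeroed, the rest scaled up
```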
Making decisions?
http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html
Deep Reinforcement Learning
http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html