Lecture 5, Fall 2024
Loss functions: Softmax, SVM
Full loss: L(W) = (1/N) Σ_i L_i(x_i, y_i, W) + λ R(W)
In practice we approximate the sum using a minibatch of examples; minibatch sizes of 32 / 64 / 128 are common.
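A minimal sketch of this minibatch approximation in NumPy, assuming a hypothetical per-example loss function loss_fn(W, x_i, y_i) and arrays X, y holding the dataset:

```python
import numpy as np

def minibatch_loss(W, X, y, loss_fn, batch_size=64, reg=1e-4):
    """Estimate the full loss from a random minibatch of examples.

    loss_fn(W, x_i, y_i) is assumed to return the per-example loss L_i.
    """
    N = X.shape[0]
    idx = np.random.choice(N, batch_size, replace=False)   # sample a minibatch
    data_loss = np.mean([loss_fn(W, X[i], y[i]) for i in idx])
    reg_loss = reg * np.sum(W * W)                          # L2 regularizer R(W)
    return data_loss + reg_loss
```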
Optimizers: SGD, SGD+Momentum, RMSProp, Adam (update-rule sketches below).
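A minimal sketch of each update rule in NumPy; the hyperparameter defaults (learning rate, rho, betas, eps) are illustrative assumptions rather than values from the slides:

```python
import numpy as np

def sgd(w, dw, lr=1e-3):
    return w - lr * dw

def sgd_momentum(w, dw, v, lr=1e-3, rho=0.9):
    v = rho * v + dw                          # accumulate a velocity term
    return w - lr * v, v

def rmsprop(w, dw, grad_sq, lr=1e-3, decay=0.99, eps=1e-8):
    grad_sq = decay * grad_sq + (1 - decay) * dw * dw   # running avg of squared grads
    return w - lr * dw / (np.sqrt(grad_sq) + eps), grad_sq

def adam(w, dw, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * dw          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * dw * dw     # second moment (RMSProp-style)
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```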
Learning rate schedules:
Cosine: α_t = (1/2) α_0 (1 + cos(t π / T))
Linear: α_t = α_0 (1 − t / T)
Inverse sqrt: α_t = α_0 / √t
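A minimal sketch of these schedules; t is the current iteration, T the total number of iterations, and alpha0 the initial learning rate:

```python
import math

def cosine_lr(t, alpha0, T):
    # Cosine decay: alpha_t = 0.5 * alpha0 * (1 + cos(pi * t / T))
    return 0.5 * alpha0 * (1 + math.cos(math.pi * t / T))

def linear_lr(t, alpha0, T):
    # Linear decay: alpha_t = alpha0 * (1 - t / T)
    return alpha0 * (1 - t / T)

def inv_sqrt_lr(t, alpha0):
    # Inverse sqrt decay: alpha_t = alpha0 / sqrt(t)
    return alpha0 / math.sqrt(max(t, 1))
```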
“Neural Network” is a very broad term; these are more accurately called “fully-connected networks” or sometimes “multi-layer perceptrons” (MLP).
(In practice we will usually add a learnable bias at each layer as well.)
[Diagram: two-layer fully-connected network: input x (3072) → W1 → hidden h (100) → W2 → scores s (10)]
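A minimal sketch of this two-layer network's forward pass, assuming a ReLU nonlinearity between the layers (ReLU appears in the activation-function list below); b1 and b2 are the learnable biases mentioned above:

```python
import numpy as np

def two_layer_forward(x, W1, b1, W2, b2):
    """Forward pass: x (3072,) -> hidden h (100,) -> scores s (10,)."""
    h = np.maximum(0, W1 @ x + b1)   # first layer + ReLU nonlinearity
    s = W2 @ h + b2                  # second layer produces class scores
    return s

# Shapes matching the slide: 3072-dim input, 100 hidden units, 10 classes
W1, b1 = np.random.randn(100, 3072) * 0.01, np.zeros(100)
W2, b2 = np.random.randn(10, 100) * 0.01, np.zeros(10)
x = np.random.randn(3072)
scores = two_layer_forward(x, W1, b1, W2, b2)
```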
[Figure: activation functions, including tanh, ReLU, Maxout, ELU]
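A minimal sketch of several of these activation functions in NumPy; the ELU alpha of 1.0 is an assumed default, and Maxout is written for the two-piece case:

```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)                            # max(0, x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def maxout(x, W1, b1, W2, b2):
    # Elementwise max of two linear functions of x
    return np.maximum(W1 @ x + b1, W2 @ x + b2)
```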
Forward pass: compute the scores and the loss.
Gradient descent: update the weights using the gradient of the loss (a combined sketch follows below).
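A minimal sketch tying the two together: a plain gradient-descent loop, assuming a hypothetical helper loss_and_grad that runs the forward pass and returns the loss together with per-parameter gradients:

```python
def train(params, X, y, loss_and_grad, lr=1e-3, num_steps=1000):
    """Vanilla gradient descent: repeat forward pass, then step against the gradient."""
    for step in range(num_steps):
        loss, grads = loss_and_grad(params, X, y)   # forward pass + gradients
        for name in params:
            params[name] -= lr * grads[name]        # gradient descent update
    return params
```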
[Figure: biological neuron, showing the cell body and axon]
Regularization
[Computational graph: x and W feed a multiply node (*) that produces the scores s; s feeds a hinge loss node; W also feeds a regularization node R; the hinge loss and R are summed (+) to give the total loss L]
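A minimal sketch of evaluating this graph in NumPy, assuming the multiclass SVM hinge loss (margin 1) and an L2 regularizer R(W) = Σ W², matching the SVM loss named earlier:

```python
import numpy as np

def svm_graph_loss(W, x, y, reg=1e-4):
    """Compute L = hinge_loss(W @ x, y) + reg * R(W), following the graph."""
    s = W @ x                                  # multiply node: scores
    margins = np.maximum(0, s - s[y] + 1)      # hinge loss node (margin 1)
    margins[y] = 0                             # correct class contributes no loss
    data_loss = np.sum(margins)
    R = np.sum(W * W)                          # regularization node R(W)
    L = data_loss + reg * R                    # sum node: total loss
    return L
```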