Neural Networks For Machine Learning: Lecture 6a — Overview of Mini-Batch Gradient Descent
[Figure: an error surface E over weights w1 and w2, and a plot of error vs. epoch.]
[Figure: shifting the inputs; color indicates the training case. Training cases (101, 101) → 2 and (101, 99) → 0 give an elongated error surface over w1 and w2; the shifted cases (1, 1) → 2 and (1, -1) → 0 give a circular error surface.]
[Figure: scaling the inputs; color indicates the weight axis. Training cases (0.1, 10) → 2 and (0.1, -10) → 0 give an elongated error surface over w1 and w2; the rescaled cases (1, 1) → 2 and (1, -1) → 0 give a circular error surface.]
The effect of momentum: the weight change equals the current velocity, which decays at a rate $\alpha$ and accumulates the scaled negative gradient:

$$\Delta w(t) = v(t) = \alpha\, v(t-1) - \varepsilon \frac{\partial E}{\partial w}(t) = \alpha\, \Delta w(t-1) - \varepsilon \frac{\partial E}{\partial w}(t)$$

If the error surface is a tilted plane, so that the gradient is constant, the velocity approaches a terminal value:

$$v(\infty) = \frac{1}{1-\alpha} \left( -\varepsilon \frac{\partial E}{\partial w} \right)$$
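The terminal-velocity formula can be checked with a minimal sketch of the momentum update, assuming a constant gradient $\partial E/\partial w = 1$ (a tilted plane); the values of $\alpha$ and $\varepsilon$ below are illustrative:

```python
alpha = 0.9    # momentum ("viscosity"), assumed value
eps = 0.1      # learning rate, assumed value
grad = 1.0     # constant dE/dw on a tilted plane

v, w = 0.0, 0.0
for t in range(200):
    v = alpha * v - eps * grad   # v(t) = alpha * v(t-1) - eps * dE/dw(t)
    w = w + v                    # delta w(t) = v(t)

# terminal velocity: v(inf) = (1 / (1 - alpha)) * (-eps * grad) = -1.0
print(round(v, 6))  # -1.0
```

After 200 steps the velocity has effectively converged to the predicted terminal value of $-\varepsilon \cdot 1 / (1-\alpha) = -1.0$.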
[Figure: brown vector = jump, red vector = correction, green vector = accumulated gradient, blue vectors = standard momentum.]
Each connection has its own local gain $g_{ij}$ on the learning rate:

$$\Delta w_{ij} = -\varepsilon\, g_{ij}\, \frac{\partial E}{\partial w_{ij}}$$

$$\text{if } \left( \frac{\partial E}{\partial w_{ij}}(t)\, \frac{\partial E}{\partial w_{ij}}(t-1) \right) > 0 \;\text{ then } g_{ij}(t) = g_{ij}(t-1) + .05 \;\text{ else } g_{ij}(t) = g_{ij}(t-1) \times .95$$
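A sketch of these per-connection adaptive gains, run on an assumed quadratic error $E(w) = \tfrac{1}{2}\sum_j c_j w_j^2$ whose per-weight curvatures differ by a factor of 100 (the curvatures, learning rate, and step count are all illustrative, not from the lecture):

```python
import numpy as np

c = np.array([10.0, 0.1])      # assumed curvature of E along each weight
w = np.array([1.0, 1.0])
gain = np.ones_like(w)         # local gain g_ij for each weight, start at 1
eps = 0.01                     # assumed global learning rate
prev_grad = np.zeros_like(w)

for t in range(500):
    grad = c * w                               # dE/dw_ij
    agree = grad * prev_grad > 0               # same sign as last gradient?
    gain = np.where(agree, gain + 0.05,        # additive increase
                           gain * 0.95)        # multiplicative decrease
    w = w - eps * gain * grad                  # delta w_ij = -eps * g_ij * dE/dw_ij
    prev_grad = grad

# the low-curvature weight accumulates a much larger gain, because its
# gradient keeps the same sign, while the high-curvature weight's gain
# is held down by sign flips
print(gain[1] > gain[0])
```

The additive-increase / multiplicative-decrease rule lets big gains decay rapidly as soon as oscillation starts, which is why the high-curvature direction settles near the edge of stability instead of diverging.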
Dividing the gradient by MeanSquare(w, t) makes the learning work much better (Tijmen Tieleman, unpublished).
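A minimal sketch of this idea (rmsprop): keep a decaying average of the squared gradient as MeanSquare(w, t) and divide the gradient by its square root. The decay rate 0.9, step size 0.01, and the toy error $E = \tfrac{1}{2}\lVert w \rVert^2$ are all assumed here for illustration:

```python
import numpy as np

def rmsprop_step(w, grad, ms, lr=0.01, decay=0.9, tiny=1e-8):
    ms = decay * ms + (1 - decay) * grad ** 2   # MeanSquare(w, t)
    w = w - lr * grad / (np.sqrt(ms) + tiny)    # divide gradient by its RMS
    return w, ms

w = np.array([5.0, -3.0])
ms = np.zeros_like(w)
for t in range(2000):
    grad = w                          # dE/dw for E = 0.5 * ||w||^2
    w, ms = rmsprop_step(w, grad, ms)

print(np.abs(w).max() < 0.1)  # True
```

Because the gradient is normalized by its own RMS, every weight moves by roughly the step size regardless of the raw gradient magnitude, which is what makes the method robust to badly scaled error surfaces.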