Lecture20 Slides
Tom Kelsey
Simple in principle:
Given weights, the NN gives a prediction ŷ
ŷ compared to y gives an error measure (RSS, say)
Changing the weights can make this bigger or smaller
Want to change weights to make this smaller
Error is a function of the weights - so numerically optimise to
reduce it.
It's a search over multiple dimensions (dictated by the number of
parameters/weights) - see the sketch below.
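A minimal sketch of this idea (not from the slides), assuming a toy one-hidden-layer network with a sigmoid hidden layer and made-up data; it only shows that the RSS error is a single number determined by the current weight values.

```python
import numpy as np

# Toy illustration (assumed setup): a one-hidden-layer network on made-up data.
# Given a set of weights, the network produces y-hat, and the RSS error is
# just a number that depends on those weight values.

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))     # 20 observations, 3 inputs
y = rng.normal(size=20)          # observed responses

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, W1, W2):
    """Forward pass: sigmoid hidden layer, linear output."""
    hidden = sigmoid(X @ W1)     # shape (20, 4)
    return hidden @ W2           # shape (20,): the y-hat values

def rss(weights, X, y):
    """Error as a function of the weights: residual sum of squares."""
    W1, W2 = weights
    return np.sum((y - predict(X, W1, W2)) ** 2)

# Two different weight settings give two different errors.
weights_a = (rng.normal(size=(3, 4)), rng.normal(size=4))
weights_b = (rng.normal(size=(3, 4)), rng.normal(size=4))
print(rss(weights_a, X, y), rss(weights_b, X, y))
```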
Simple in principle:
Set some initial weights (we can't estimate the error without a
parameterised model) - the software deals with this - typically
random uniform values.
Calculate an initial error (based on observed versus current
predicted).
For each weight, determine whether increasing or decreasing it
makes the error larger or smaller.
Move a bit in the direction that reduces the error. Recalculate the
error with the new parameters. Repeat.
Stop at some point, i.e. when further weight alterations give
little or no improvement.
This is a gradient search, iterating over multiple dimensions
(dictated by the number of parameters/weights) - sketched below.
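A rough sketch of the loop just described, using numerical (finite-difference) nudges to decide which direction reduces the error; the two-weight toy error surface, the step size, and the stopping tolerance are assumptions for illustration only.

```python
import numpy as np

def error(w):
    # Stand-in error surface with two weights; in a real NN this would be
    # the RSS between y and the network's predictions y-hat(w).
    return (w[0] - 2.0) ** 2 + (w[1] + 1.0) ** 2

def numerical_gradient(f, w, h=1e-6):
    """Nudge each weight up and down to see which direction increases the error."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        up, down = w.copy(), w.copy()
        up[i] += h
        down[i] -= h
        grad[i] = (f(up) - f(down)) / (2 * h)
    return grad

w = np.random.default_rng(1).normal(size=2)    # some initial weights
gamma = 0.1                                    # step size
previous = error(w)                            # initial error
for step in range(1000):
    w -= gamma * numerical_gradient(error, w)  # move a bit in the downhill direction
    current = error(w)                         # recalculate error with new parameters
    if abs(previous - current) < 1e-9:         # stop: little/no improvement
        break
    previous = current
print(step, w, current)
```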
Note:
the NN starts simple (boring set of parameters), gets more
complicated as we iterate.
the step size (γ) controls how rapidly we change the
parameters (the 'learning rate').
so complexity can be controlled by stopping the
optimisation process early (sketched below).
one pass through all the data, changing weights, is called an
epoch.
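A sketch of epochs, the learning rate γ, and stopping early; the simple linear model trained by full-batch gradient steps and the train/validation split are assumptions for illustration, not the slides' setup.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)
X_tr, y_tr, X_va, y_va = X[:80], y[:80], X[80:], y[80:]

w = np.zeros(5)       # boring initial parameters: the model starts simple
gamma = 0.01          # learning rate: how far each update moves the parameters
best_va = np.inf

for epoch in range(500):                       # one pass through the data = one epoch
    grad = -2.0 * X_tr.T @ (y_tr - X_tr @ w)   # gradient of the training RSS
    w -= gamma * grad / len(y_tr)
    va_rss = np.sum((y_va - X_va @ w) ** 2)
    if va_rss > best_va:                       # stopping early limits complexity
        break
    best_va = va_rss
print(epoch, best_va)
```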
sigmoid(x) = 1 / (1 + e^(−x))
tanh(x) = 2 × sigmoid(2x) − 1
ReLU(x) = max(0, x)
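A short sketch of these three activation functions, checking the tanh–sigmoid identity numerically; the implementations below are one straightforward way to write them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh_via_sigmoid(x):
    # tanh(x) = 2 * sigmoid(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-3, 3, 7)
print(np.allclose(np.tanh(x), tanh_via_sigmoid(x)))   # True: the identity holds
print(relu(x))                                        # negative inputs clipped to 0
```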
R(θ) + λJ(θ)
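A sketch of a penalised error of this form, assuming R(θ) is the RSS of a linear fit and J(θ) is the sum of squared weights (one common choice of penalty); the slides do not fix these particular forms.

```python
import numpy as np

def rss(theta, X, y):
    """R(theta): residual sum of squares of a linear fit (stand-in for the NN error)."""
    return np.sum((y - X @ theta) ** 2)

def ridge_penalty(theta):
    """J(theta): sum of squared weights, one common complexity penalty."""
    return np.sum(theta ** 2)

def penalised_error(theta, X, y, lam):
    """R(theta) + lambda * J(theta): larger lambda favours smaller weights."""
    return rss(theta, X, y) + lam * ridge_penalty(theta)

theta = np.array([0.5, -1.0])
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 0.0])
print(penalised_error(theta, X, y, lam=0.1))
```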