Lecture20 Slides
Tom Kelsey
Simple in principle:
Given weights, the NN gives a prediction ŷ
ŷ compared to y gives an error measure (RSS, say)
Changing the weights can make this bigger or smaller
Want to change weights to make this smaller
Error is a function of the weights - so numerically optimise to
reduce it.
It's a search over multiple dimensions (dictated by the number of
parameters/weights) - see the sketch below.
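A minimal sketch of this idea (not from the slides), assuming a toy one-hidden-layer network with a sigmoid hidden layer and made-up data; it only shows that the RSS error is a single number determined by the current weight values.

```python
import numpy as np

# Toy illustration (assumed setup): a one-hidden-layer network on made-up data.
# Given a set of weights, the network produces y-hat, and the RSS error is
# just a number that depends on those weight values.

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))     # 20 observations, 3 inputs
y = rng.normal(size=20)          # observed responses

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, W1, W2):
    """Forward pass: sigmoid hidden layer, linear output."""
    hidden = sigmoid(X @ W1)     # shape (20, 4)
    return hidden @ W2           # shape (20,): the y-hat values

def rss(weights, X, y):
    """Error as a function of the weights: residual sum of squares."""
    W1, W2 = weights
    return np.sum((y - predict(X, W1, W2)) ** 2)

# Two different weight settings give two different errors.
weights_a = (rng.normal(size=(3, 4)), rng.normal(size=4))
weights_b = (rng.normal(size=(3, 4)), rng.normal(size=4))
print(rss(weights_a, X, y), rss(weights_b, X, y))
```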
Simple in principle:
Set some initial weights (we can't estimate the error without a
parameterised model) - the software deals with this - typically
random uniform values.
Calculate an initial error (based on observed versus current
predicted).
For each weight, determine whether increasing or decreasing it
makes the error larger or smaller.
Move a bit in the direction that reduces the error. Recalculate the
error with the new parameters. Repeat.
Stop at some point, i.e. when further weight alterations give
little or no improvement.
This is a gradient search, iterating over multiple dimensions
(dictated by the number of parameters/weights) - sketched below.
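A rough sketch of the loop just described, using numerical (finite-difference) nudges to decide which direction reduces the error; the two-weight toy error surface, the step size, and the stopping tolerance are assumptions for illustration only.

```python
import numpy as np

def error(w):
    # Stand-in error surface with two weights; in a real NN this would be
    # the RSS between y and the network's predictions y-hat(w).
    return (w[0] - 2.0) ** 2 + (w[1] + 1.0) ** 2

def numerical_gradient(f, w, h=1e-6):
    """Nudge each weight up and down to see which direction increases the error."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        up, down = w.copy(), w.copy()
        up[i] += h
        down[i] -= h
        grad[i] = (f(up) - f(down)) / (2 * h)
    return grad

w = np.random.default_rng(1).normal(size=2)    # some initial weights
gamma = 0.1                                    # step size
previous = error(w)                            # initial error
for step in range(1000):
    w -= gamma * numerical_gradient(error, w)  # move a bit in the downhill direction
    current = error(w)                         # recalculate error with new parameters
    if abs(previous - current) < 1e-9:         # stop: little/no improvement
        break
    previous = current
print(step, w, current)
```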
Note:
the NN starts simple (boring set of parameters), gets more
complicated as we iterate.
the step size (γ) controls how rapidly we change the
parameters (the 'learning rate').
so complexity can be controlled by stopping the
optimisation process early (sketched below).
one pass through all the data, changing weights, is called an
epoch.
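A sketch of epochs, the learning rate γ, and stopping early; the simple linear model trained by full-batch gradient steps and the train/validation split are assumptions for illustration, not the slides' setup.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)
X_tr, y_tr, X_va, y_va = X[:80], y[:80], X[80:], y[80:]

w = np.zeros(5)       # boring initial parameters: the model starts simple
gamma = 0.01          # learning rate: how far each update moves the parameters
best_va = np.inf

for epoch in range(500):                       # one pass through the data = one epoch
    grad = -2.0 * X_tr.T @ (y_tr - X_tr @ w)   # gradient of the training RSS
    w -= gamma * grad / len(y_tr)
    va_rss = np.sum((y_va - X_va @ w) ** 2)
    if va_rss > best_va:                       # stopping early limits complexity
        break
    best_va = va_rss
print(epoch, best_va)
```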
sigmoid(x) = 1 / (1 + e^(−x))
tanh(x) = 2 × sigmoid(2x) − 1
ReLU(x) = max(0, x)
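A short sketch of these three activation functions, checking the tanh–sigmoid identity numerically; the implementations below are one straightforward way to write them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh_via_sigmoid(x):
    # tanh(x) = 2 * sigmoid(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-3, 3, 7)
print(np.allclose(np.tanh(x), tanh_via_sigmoid(x)))   # True: the identity holds
print(relu(x))                                        # negative inputs clipped to 0
```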
R(θ) + λJ(θ)
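A sketch of a penalised error of this form, assuming R(θ) is the RSS of a linear fit and J(θ) is the sum of squared weights (one common choice of penalty); the slides do not fix these particular forms.

```python
import numpy as np

def rss(theta, X, y):
    """R(theta): residual sum of squares of a linear fit (stand-in for the NN error)."""
    return np.sum((y - X @ theta) ** 2)

def ridge_penalty(theta):
    """J(theta): sum of squared weights, one common complexity penalty."""
    return np.sum(theta ** 2)

def penalised_error(theta, X, y, lam):
    """R(theta) + lambda * J(theta): larger lambda favours smaller weights."""
    return rss(theta, X, y) + lam * ridge_penalty(theta)

theta = np.array([0.5, -1.0])
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 0.0])
print(penalised_error(theta, X, y, lam=0.1))
```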