L02 - 03 Crash Course On NN
•[email protected]
•Office: 4.24
Intro to NN and CNN
Neural Networks and Deep Learning
• Introduction to deep neural networks and derivation of the equations for
deep learning
• Two types of networks:
• Multilayer, fully connected neural networks, whose inputs are pattern
vectors
• Convolutional neural networks which accept images as inputs
• Fundamental element:
• Linear computing elements (called artificial neurons) organised as networks
• Use these networks as tools for adaptively learning the parameters of
decision functions via successive presentations of training examples/patterns.
Introduction to neural networks
• Foundations of NN
• We start with a fundamental idea: Perceptron
• Although these computing elements are not used per se in current NN, the operations
they perform are almost identical to those of modern artificial neurons
https://www.cybercontrols.org/neuralnetworks
Perceptron
• A single perceptron unit learns a linear boundary between two
linearly separable pattern classes.
• E.g.:
Perceptron
• A linear boundary in 2D is a straight line with equation y = wx + b
• w is the coefficient (the slope of the line)
• b is the y-intercept term…
• Also known as bias, bias coefficient, or bias weight
• (yeah, bias, an overloaded term. It is not the bias in statistics!)
• For higher dimensions we need a more general notation:
• x1, …, xn : coordinates of a point
• w1, …, wn : coefficients
• b : bias
Perceptron
• The boundary separating classes in n dimensions would then be
a plane, or rather, a hyperplane:
$w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n + b = 0$
• Also expressed as: $\sum_{i=1}^{n} w_i x_i + b = 0$
• Or in vector form: $\mathbf{w}^\mathsf{T}\mathbf{x} + b = 0$
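A quick, hedged sketch (coefficients and points made up, not from the slides) of how a point is classified by checking which side of the hyperplane w·x + b = 0 it falls on:

import numpy as np

# Made-up coefficients and bias for a 3D hyperplane w.x + b = 0.
w = np.array([1.0, -2.0, 0.5])
b = -1.0

def classify(x):
    # Class 1 on the positive side of the hyperplane, class 0 otherwise.
    return 1 if np.dot(w, x) + b > 0 else 0

print(classify(np.array([3.0, 0.5, 1.0])))  # 1*3 - 2*0.5 + 0.5*1 - 1 = 1.5 > 0 -> class 1
print(classify(np.array([0.0, 1.0, 0.0])))  # -2 - 1 = -3 <= 0 -> class 0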
Learning: “A process by which the free parameters of a neural network are adapted”
Learning: the free parameters in this model are w1,1 and w1,2
Perceptron Learning
• Train a two-input/single-output network without a bias
• Randomly assign values to w
• Learn: update the free parameters
• Suppose the current weight vector 1w incorrectly classifies p1 as class 0
• Adding p1 to 1w makes 1w point more in the direction of p1, so p1 is now correctly classified as class 1…
• …but the updated 1w may now incorrectly classify p2 as class 1
Perceptron Learning
• Keep going! Repeat the update for each misclassified pattern, cycling through the training set, until every pattern is classified correctly
Perceptron Learning
• Three rules for updating:
• If the target is 1 and the output is 0: 1w_new = 1w_old + p
• If the target is 0 and the output is 1: 1w_new = 1w_old − p
• If the output is correct: 1w_new = 1w_old
Perceptron Learning
• Unified Update Rule: 1w_new = 1w_old + e·p, where the error e = t − a (target minus actual output)
Perceptron Learning
• Adjustment to weight vector w at time step n: w(n+1) = w(n) + η·e(n)·x(n), where e(n) is the error and η is the learning rate
Plots of E as a function of the weights: (a) a value of η that is too small can slow down convergence; (b) if η is too large, there may be large oscillations or divergence; (c) shape of the error function E in 2D
Perceptron Convergence Algorithm
• Iteratively update weights until convergence
• Each iteration is called an epoch
• Iterate
• https://colab.research.google.com/drive/1vz21O1cAdkYCEB9I19-5KtYiI
yTHtFni?usp=sharing
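A minimal sketch of the perceptron convergence algorithm, using the unified rule w ← w + e·p on made-up, linearly separable data (an illustration only, not the code from the linked notebook):

import numpy as np

# Toy, linearly separable data (made up): two inputs, target 0 or 1, no bias term.
P = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.0], [-2.0, 1.0]])
T = np.array([1, 1, 0, 0])

w = np.random.randn(2)                     # randomly assign values to w
for epoch in range(100):                   # each full pass over the data is one epoch
    errors = 0
    for p, t in zip(P, T):
        a = 1 if np.dot(w, p) > 0 else 0   # hard-threshold output
        e = t - a                          # unified rule: +p, -p, or no change
        w = w + e * p
        errors += abs(e)
    if errors == 0:                        # converged: every pattern classified correctly
        break
print(w)                                   # a separating weight vector for this toy data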
[Figure: patterns from Class A and Class B plotted in 2D; can a single straight line separate them?]
Solving XOR
• Adding a hidden layer gives one decision boundary per hidden unit (Neuron 1 and Neuron 2), which the output neuron then combines
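A hedged sketch of this idea in Keras: a two-neuron hidden layer learns XOR, which a single perceptron cannot. The layer sizes, activations, and training settings are assumptions for illustration, not taken from the slides:

import numpy as np
import tensorflow as tf

# XOR truth table: not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

# Two hidden neurons -> two decision boundaries, combined by the output neuron.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(2, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss="binary_crossentropy")
model.fit(X, y, epochs=500, verbose=0)
print(model.predict(X, verbose=0).round().flatten())  # usually [0, 1, 1, 0]; rerun if stuck in a poor local optimum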
Multilayer Feedforward Neural Networks
• Neural networks
• interconnected perceptron-like computing elements called artificial neurons
• Formed from layers of computing units
• The output of one unit affects the behaviour of all units following it
• In a perceptron the activation function is a hard threshold
• Small input variations can cause large output swings, which is terrible in a network!
• A neuron has a smooth activation function:
Perceptron vs neuron
• Apart from the more complicated notation and the use of a smooth activation
function, a neuron performs the same operations as a perceptron
http://neuralnetworksanddeeplearning.com/chap1.html
Error Backpropagation
• We can overcome this problem by introducing a new type of
artificial neuron called a sigmoid neuron.
• Sigmoid neurons are similar to perceptrons, but modified so that
small changes in their weights and bias
cause only a small change in their output.
http://neuralnetworksanddeeplearning.com/chap1.html
Error Backpropagation
• With a hard threshold, a small change in the weighted input can flip the output completely (0 to 1, or 1 to 0)
• With a sigmoid, the same change only nudges the output (e.g. 0.51 to 0.49, or 0.49 to 0.51)
• A hard threshold also has zero slope almost everywhere, so there are no gradients to propagate
• In TensorFlow, a smooth (or piecewise-linear) activation such as ReLU is selected with activation=tf.nn.relu
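A tiny numeric illustration (numbers made up) of the difference between a hard threshold and a sigmoid:

import numpy as np

def step(z):
    return (z > 0).astype(float)      # hard threshold: output jumps straight from 0 to 1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # smooth: output changes gradually

z = np.array([-0.04, 0.04])           # a small change in the weighted input...
print(step(z))                        # [0. 1.]      -> the output flips completely
print(sigmoid(z).round(2))            # [0.49 0.51]  -> the output barely moves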
Error Backpropagation
• Forward pass or propagation of input to output
https://github.com/rasbt/python-machine-learning-book/blob/master/faq/visual-backpropagation.md
Error Backpropagation
• Backward pass or propagation of error (loss function)
• Gradient Descent
Error Backpropagation
• Weights adjusted; present a new sample; feed forward again
Error: Loss or Cost Function
• The loss function is basically a performance metric for how
well the MLP manages to reach its goal of generating outputs
as close as possible to the desired values.
• The backward pass uses the derivative of the transfer/activation function
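As a hedged illustration (the exact loss used in the slides is not shown), one common choice is the half squared error; differentiating it with respect to the weighted input z_k shows where the derivative of the activation function φ enters the backward pass:

\[ E = \tfrac{1}{2}\sum_{k}\left(t_k - o_k\right)^2, \qquad o_k = \varphi(z_k) \]
\[ \frac{\partial E}{\partial z_k} = \frac{\partial E}{\partial o_k}\,\frac{\partial o_k}{\partial z_k} = -\left(t_k - o_k\right)\varphi'(z_k) \]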
Error Backpropagation
• Backward pass or propagation of error (loss function)
• Gradient Descent
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/more_images/LossAlps.png
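A minimal end-to-end sketch of forward pass, backward pass, and gradient descent for a tiny network (the architecture, data, and hyperparameters are all made up for illustration; sigmoid activations and half squared error are assumed):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up toy data (logical OR): 2 inputs -> 3 hidden sigmoid units -> 1 sigmoid output.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [1.]])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
eta = 0.5                              # learning rate

for epoch in range(2000):
    # Forward pass: propagate the input to the output.
    H = sigmoid(X @ W1 + b1)           # hidden activations
    O = sigmoid(H @ W2 + b2)           # network outputs

    # Backward pass: propagate the error, using the derivative of the
    # sigmoid activation, s'(z) = s(z) * (1 - s(z)).
    dO = (O - T) * O * (1 - O)         # output-layer deltas (half squared error)
    dH = (dO @ W2.T) * H * (1 - H)     # hidden-layer deltas

    # Gradient descent: adjust weights and biases against the gradient.
    W2 -= eta * (H.T @ dO)
    b2 -= eta * dO.sum(axis=0)
    W1 -= eta * (X.T @ dH)
    b1 -= eta * dH.sum(axis=0)

print(O.round(2))                      # outputs should approach the targets [0, 1, 1, 1]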
Initialization
• Initial weights decide which local optimum is reached
• Backpropagation networks should be reset / trained multiple times (keep the best); see the sketch below
[Figure: network performance as a function of weight configuration; different possible starting weights climb (hill climbing) to different local optima, and only some reach the global optimum]
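A small sketch of the reset-and-keep-the-best idea, reusing the XOR toy problem from earlier (the model, loss, and number of restarts are assumptions):

import numpy as np
import tensorflow as tf

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

best_model, best_loss = None, float("inf")
for restart in range(5):                              # reset and retrain several times
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(2, activation="tanh"),  # fresh random initial weights each restart
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="mse")
    history = model.fit(X, y, epochs=300, verbose=0)
    loss = history.history["loss"][-1]
    if loss < best_loss:                              # keep the run that found the best optimum
        best_model, best_loss = model, loss
print(best_loss)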
Momentum
• Combining the current gradient and the previous gradient: Δw(n) = α·Δw(n−1) − η·∇E(n), where α is the momentum coefficient
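A minimal sketch of the momentum update on a made-up one-dimensional error surface E(w) = w² (in Keras the same idea is available as tf.keras.optimizers.SGD(learning_rate=..., momentum=...)):

w, v = 5.0, 0.0                  # weight and velocity (the previous update)
eta, mu = 0.1, 0.9               # learning rate and momentum coefficient
for step in range(200):
    grad = 2 * w                 # dE/dw for E = w^2
    v = mu * v - eta * grad      # combine the previous update with the current gradient
    w = w + v
print(round(w, 4))               # w approaches the minimum at 0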
Learning Rate
• Learning rate η too small:
very slow progress
• Learning rate η too large:
oscillations or reductions in performance
• Adaptive learning rates: adjust η during training, e.g. reduce it as the optimum is approached
Stopping Criteria
• Stop when a maximum number of epochs has been exceeded
• Stop when the mean squared error (MSE) on the training set is
small enough
• Stop when the gradient is below a desired threshold
• Stop when overfitting is observed
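A hedged sketch of how two of these criteria are commonly expressed in Keras: a maximum number of epochs, plus early stopping when the validation loss stops improving (the data and model here are made up):

import numpy as np
import tensorflow as tf

# Made-up data, just to have something to fit.
X = np.random.rand(200, 4).astype(np.float32)
y = (X.sum(axis=1) > 2).astype(np.float32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Stop after at most 500 epochs, or earlier if the validation loss
# stops improving for 10 epochs (a sign of overfitting).
early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                         restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=500, callbacks=[early], verbose=0)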
Problem Solved?
• Found the global optimum? (lucky!)
• i.e., the network performs optimally
on the data that it was trained on
! Does not guarantee any performance on unseen data
Possible scenario...
• I trained a network to recognize people from passport photos…
• …but it fails whenever a person smiles
The training data did not include any pictures of people smiling!
Curse of Finite Sample Size
• A problem…
• Unlimited possibilities in nature
• …or at least a lot more than we can collect for training
So a classifier, in actual use, may encounter something new
• How can we guarantee that the classifier gives the best possible
response to this?
• Another problem…
• Collected sample may be noisy
• Inaccuracies in data collection
• May be different every time!
Generalization
• Input-output mapping of the network should be correct for data
never used in creating or training the network
• Generalization – the ability to produce satisfactory responses
to patterns that were not included in the training set
• Extra-sample error – the average prediction error for data that the
neural network has never seen
• In-sample error – the average prediction error for data that the neural
network has been trained on
! In-sample (training) error is a poor predictor for extra-sample (testing)
error
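A small sketch of estimating both errors by holding out a test set the network never sees during training (the data, model, and split are made up):

import numpy as np
import tensorflow as tf

# Made-up data; hold some of it out to estimate the extra-sample error.
X = np.random.rand(500, 4).astype(np.float32)
y = (X.sum(axis=1) > 2).astype(np.float32)
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=50, verbose=0)

in_sample = model.evaluate(X_train, y_train, verbose=0)    # error on the training data
extra_sample = model.evaluate(X_test, y_test, verbose=0)   # error on unseen data
print(in_sample, extra_sample)                             # the training error is usually the optimistic one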
Example
• Google Colaboratory Example
• https://colab.research.google.com/drive/1IsUmqqs-y0EAzmxaqj
VJWSVsRnrOXyjl?usp=sharing
Example
• Google Colaboratory Example
• Deep Learning Example Part01
https://colab.research.google.com/drive/1jcpFC8ZtSlRm-d1qdis
EPXqvxDHXyxMf?usp=sharing
Vector
https://cs231n.github.io/convolutional-networks/
https://en.wikipedia.org/wiki/Precision_and_recall
Metrics: Accuracy
• Accuracy:
• Number of correct predictions divided by the total number of predictions
• (TP + TN)/(TP +TN + FP + FN)
https://en.wikipedia.org/wiki/Confusion_matrix
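A tiny worked example (counts made up) of computing accuracy from the confusion-matrix entries:

# Made-up confusion-matrix counts.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)  # correct predictions / all predictions
print(accuracy)                             # 85 / 100 = 0.85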
Example