L02 - 03 Crash Course On NN

This document provides a summary of key concepts from an introductory lecture on neural networks and convolutional neural networks: 1. The lecture introduced deep neural networks and derived equations for deep learning, covering two main types of networks: multilayer fully connected networks and convolutional neural networks. 2. A fundamental element of neural networks is linear computing elements called artificial neurons organized into networks. These networks are used as tools to adaptively learn the parameters of decision functions via successive training examples. 3. The lecture began with an introduction to the foundations of neural networks, starting with the basic perceptron model and algorithm. Perceptrons learn linear decision boundaries to classify patterns via iterative weight updates.


• Lecturer: Paulo Santos
• Email: [email protected]
• Office: 4.24
Intro to NN and CNN
Neural Networks and Deep Learning
• Introduction to deep neural networks and derivation of the equations for deep learning
• Two types of networks:
• Multilayer, fully connected neural networks, whose inputs are pattern vectors
• Convolutional neural networks, which accept images as inputs
• Fundamental element:
• Linear computing elements (called artificial neurons) organised as networks
• Use these networks as tools for adaptively learning the parameters of
decision functions via successive presentations of training examples/patterns.
Introduction to neural networks
• Foundations of NN
• We start with a fundamental idea: Perceptron
• Although these computing elements are not used per se in current NNs, the operations they perform are almost identical to those of artificial neurons

https://www.cybercontrols.org/neuralnetworks
Perceptron
• A single perceptron unit learns a linear boundary between two
linearly separable pattern classes.
• E.g.:
Perceptron
• A linear boundary in 2D is a straight line with equation y = wx + b
• w is the coefficient (the slope of the line)
• b is the y-intercept term…
• Also known as bias, bias coefficient, or bias weight
• (yeah, bias, an overloaded term. It is not the bias in statistics!)
• For higher dimensions we need a more general notation:
• x1, x2, …, xn → coordinates of a point
• w1, w2, …, wn → coefficients
• b → bias
Perceptron
• The boundary separating classes in n dimensions would then be a plane, or rather, a hyperplane:
w1x1 + w2x2 + w3x3 + … + wnxn + b = 0
• Also expressed as: the sum over i of wi xi, plus b, equal to 0
• Or in vector form: wᵀx + b = 0
• Where w is known as the weight vector and b is the bias
• We say that an arbitrary point (x1, x2, … xn) is on the positive side of a line (hyperplane) if:
• wᵀx + b > 0 (therefore belonging to class c1)
• Conversely for a point on the negative side: wᵀx + b < 0 (class c2)
Perceptron
• The perceptron is an algorithm that finds this hyperplane separating two classes by
• Iteratively stepping through the patterns of each of the two classes
• Starting with an arbitrary weight vector and bias
• The task is, given a pattern vector x from a vector population, to find a set of weights and a bias such that:
• wᵀx + b > 0 if x belongs to class c1
• wᵀx + b < 0 if x belongs to class c2
Schematic diagram of a perceptron
• All this machine does is compute the sum of products of an input pattern with the weights and bias found during training.
• The output is a scalar value that is then
passed through an activation function
to produce a binary decision
• +1 : belongs to c1
• -1: belongs to c2
• For the perceptron the activation
function is a thresholding function.
https://gfycat.com/obviouscarelessblackfootedferret
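As a concrete illustration of the sum-of-products-plus-threshold just described, here is a minimal NumPy sketch (the weight, bias, and input values are made up for illustration, not taken from the lecture):

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Sum of products of the input pattern with the weights, plus the bias,
    passed through a hard threshold: +1 -> class c1, -1 -> class c2."""
    z = np.dot(w, x) + b          # sum of products + bias
    return 1 if z > 0 else -1     # thresholding activation function

# Illustrative values only
w = np.array([0.5, -0.3])         # weight vector found during training
b = 0.1                           # bias
print(perceptron_predict(np.array([1.0, 2.0]), w, b))   # prints +1 or -1
```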
Signal flow graph of the Perceptron
Perceptron Learning
• Test Problem
• Find a line that separates the classes
Perceptron Learning
• Train a two-input/single-output network without a bias

Learning: “A process by which the free parameters of a neural network are adapted”

Learning: the free parameters in this model are w1,1 and w1,2
Perceptron Learning
• Randomly assign values to w
• The initial 1w incorrectly classifies p1 as class 0
• Learn: update the free parameters
• Adding p1 to 1w would make 1w point more in the direction of p1
• The updated 1w now correctly classifies p1 as class 1, but incorrectly classifies p2 as class 1
Perceptron Learning
• Keep going!
Perceptron Learning
• Three rules for updating:
• If the target is 1 and the output is 0: add the input p to the weight vector w
• If the target is 0 and the output is 1: subtract the input p from the weight vector w
• Last rule is "No Change": when the output matches the target, leave w unchanged
Perceptron Learning
• Unified Update Rule: w_new = w_old + e·p, where the error e = (target - actual output)
Perceptron Learning
• Adjustment to weight vector w at time step n:
Δw(n) = η [d(n) - y(n)] x(n)
• η: learning rate, d(n): desired output, y(n): actual output, x(n): input vector
• Delta rule (or Widrow-Hoff rule)
• Equivalent to:
• If output is high, reduce weights on active inputs
• If output is low, increase weights on active inputs
Effect of the learning rate η

Plots of E as a function of w: (a) A value of η that is too small can slow down convergence. (b) If η is too large, there may be large oscillations or divergence. (c) Shape of the error function E in 2D
Perceptron Convergence Algorithm
• Iteratively update weights until convergence
• Each iteration is called an epoch

1. Initialize weights w to random values
2. For each training example, compute the perceptron response y
3. Update the weights (+ bias) using the learning rule (see previous slide)
4. Go back to step 2
Repeat until the perceptron response matches the desired response
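A minimal NumPy sketch of this convergence algorithm, using the delta-rule update Δw = η(d - y)x from the previous slide; the toy data, seed, and learning rate are illustrative, not the lecture's test problem:

```python
import numpy as np

# Toy, linearly separable data (illustrative): two classes labelled +1 / -1
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
d = np.array([1, 1, -1, -1])                  # desired responses

rng = np.random.default_rng(0)
w = rng.normal(size=2)                        # 1. initialize weights randomly
b = 0.0
eta = 0.1                                     # learning rate

for epoch in range(100):                      # each pass over the data is an epoch
    errors = 0
    for x, target in zip(X, d):
        y = 1 if (w @ x + b) > 0 else -1      # 2. perceptron response
        if y != target:                       # 3. update only on mistakes
            w += eta * (target - y) * x       #    delta rule
            b += eta * (target - y)
            errors += 1
    if errors == 0:                           # responses match the desired ones -> stop
        break

print("weights:", w, "bias:", b, "epochs:", epoch + 1)
```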
Colab Perceptron Examples
• Step-by-Step
https://colab.research.google.com/drive/1bzbmIPt-mUslGgjFR5nBOtPzgKtZ32PF?usp=sharing
• Iterate
https://colab.research.google.com/drive/1vz21O1cAdkYCEB9I19-5KtYiIyTHtFni?usp=sharing
• Iterate with Bias
https://colab.research.google.com/drive/17ZEu9Yf3y0Ozu32D5fqHcahcBY70-lIR?usp=sharing
Perceptron Limitations
• Decision surface is a hyperplane
• Classes must be linearly separable!
• Perceptron cannot learn XOR, or parity function in general

(Figure: XOR, with Class A and Class B points that cannot be separated by a single straight line)
Solving XOR
Adding a hidden layer... Decision boundaries

Neuron 1

Neuron 2
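To make the hidden-layer point concrete, here is a small Keras sketch (in the spirit of the Colab examples, but not the lecture's code; layer sizes and hyperparameters are illustrative) that fits XOR with two hidden neurons:

```python
import numpy as np
import tensorflow as tf

# XOR: not linearly separable, so a single perceptron cannot learn it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(2, activation="tanh"),     # hidden layer: 2 neurons
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output neuron
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss="binary_crossentropy")
model.fit(X, y, epochs=500, verbose=0)

print(model.predict(X, verbose=0).round())   # should approximate [[0],[1],[1],[0]]
```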
Multilayer Feedforward Neural Networks
• Neural networks
• interconnected perceptron-like computing elements called artificial neurons
• Formed from layers of computing units
• The output of one unit affects the behaviour of all units following it
• In a perceptron the activation function is a hard threshold
• Small variations cause large swings, which is terrible in a network!
• A neuron has a smooth activation function:
Perceptron vs neuron
• Apart from more complicated notation, and the use of a smooth activation function, a neuron performs the same operations as a perceptron
• The neuron first computes the sum of products of its inputs with the weights (plus the bias)
• The output (denoted by a) is obtained by passing this sum through the activation function
• Its output is the activation value of the unit
• The inputs to a neuron are activation values from neurons in the previous layer
Error Backpropagation
• The problem is that this isn't what happens when our network
contains perceptrons.
• A small change in the weights or bias of any single perceptron
in the network can sometimes cause the output of that
perceptron to completely flip, say from 0 to 1.
• That flip may then cause the behaviour
of the rest of the network to completely
change in some very complicated way.

http://neuralnetworksanddeeplearning.com/chap1.html
Error Backpropagation
• We can overcome this problem by introducing a new type of
artificial neuron called a sigmoid neuron.
• Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output.
• That's the crucial fact which will allow a network of sigmoid neurons to learn incrementally
http://neuralnetworksanddeeplearning.com/chap1.html
Error Backpropagation
• With a hard threshold, a small change in z can flip the output from 0 to 1 (or 1 to 0): small change in z, big effect!
• With a sigmoid, the same small change in z only moves the output slightly (e.g., from 0.51 to 0.49): small change in z, small effect!
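A tiny numerical illustration of the contrast above; the values of z are chosen for illustration only:

```python
import numpy as np

def step(z):
    return np.where(z > 0, 1.0, 0.0)      # perceptron-style hard threshold

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))        # smooth sigmoid neuron

z = np.array([0.01, -0.01])                # a small change in z around zero
print(step(z))                              # [1. 0.]  -> the output flips completely
print(sigmoid(z))                           # ~[0.5025 0.4975] -> the output barely moves
```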


MLP: Activation Functions
• A suitable activation function is a nonconstant, bounded, and monotonically increasing continuous function
Activation functions
Vanishing and Exploding Gradients
• Vanishing Gradient
• Error travels from the output layer towards the input layer.
• The gradients often get smaller and smaller and approach zero.
• Eventually leaves the weights of the initial or lower layers nearly
unchanged.
• As a result, the gradient descent never converges to the optimum
• Gradient Explosion
• Error gradients can accumulate during an update and result in very
large gradients
• result in large updates to the network weights
• in turn, an unstable network
https://www.analyticsvidhya.com/blog/2021/06/the-challenge-of-vanishing-exploding-gradients-in-deep-neural-networks/

Vanishing and Exploding Gradients
• Vanishing Gradient
• The sigmoid saturates at 0 or 1, where its derivative is (almost) zero
• All the fun happens very close to zero
• When saturated, backpropagation has no gradients to propagate!
• The fix (next slide): a non-saturating activation, e.g. activation=tf.nn.relu

Vanishing and Exploding Gradients


• Better, non-saturating activation functions
• ReLU and leaky ReLU

Rectified Linear Unit
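Minimal NumPy definitions of the two non-saturating activations named above; the leak coefficient 0.01 is a common choice used here for illustration:

```python
import numpy as np

def relu(z):
    # Rectified Linear Unit: max(0, z); its gradient is 1 for z > 0, so it does not saturate there
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but keeps a small slope alpha for z < 0 so gradients never vanish completely
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))         # [0.    0.    0.    0.5   2.  ]
print(leaky_relu(z))   # [-0.02 -0.005 0.    0.5   2.  ]
```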


Interconnecting neurons to form a fully
connected neural net
• A layer in a network is the set of nodes (neurons) in a column of the network
• All the nodes are artificial neurons
• Except the input layer, whose nodes are components of the input pattern vector x
• Each layer in the network can have a different number of nodes,
• But each node has a single output
• The output of every node is connected to the input of all nodes in the next layer to form a fully
connected network
• Values of the first layer: inputs
• Values of the last layer: outputs
• All the others: Hidden layers -- hidden neurons
• NN with a single hidden layer: shallow neural network
• NN with multiple hidden layers: deep neural network
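A minimal Keras sketch of such a fully connected (deep) network; the input dimension, layer sizes, and activations are illustrative only:

```python
import tensorflow as tf

# Input layer: components of the pattern vector x (here assumed 16-dimensional)
# Hidden layers: artificial neurons with smooth activations
# Output layer: one neuron per class
model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(32, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(32, activation="relu"),    # hidden layer 2 -> "deep" network
    tf.keras.layers.Dense(4, activation="softmax"),  # 4 output neurons = 4 classes
])
model.summary()   # every node's output feeds all nodes of the next layer
```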
Task of a neural network
• Determine the class membership of unknown input patterns
• One way to do this: assign a class label to each output neuron
• Thus, an NN with one output neuron per class can classify an unknown pattern into one of those classes
• The NN assigns an unknown pattern vector to the class whose output neuron has the largest activation value
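In code, "the output neuron with the largest activation value" is simply an argmax over the network outputs; the activation values below are made up:

```python
import numpy as np

activations = np.array([0.05, 0.80, 0.10, 0.05])   # output activations for 4 classes
predicted_class = int(np.argmax(activations))      # index of the largest activation
print(predicted_class)                              # -> 1
```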
Backpropagation
• A NN is defined completely by its weights, biases and activation
function.
• Training a NN:
• Use a dataset to estimate these parameters
• During training, we know the desired output
• But there is no way of knowing the values of the outputs of the hidden layers
• Backpropagation:
• Tool of choice for finding the value of weights and biases in a multilayer network
Training by backpropagation
1. Input the pattern vectors
2. Forward pass through the NN to
• Classify the patterns in the training set
• Determine the classification/output error
• Output error is calculated by a cost function (quadratic cost, cross-entropy, etc)
3. Backward (backpropagation) pass
• “Distributes” the output error back through the network to compute the
changes required to update the parameters
4. Updating the weights and biases in the network
• Using gradient descent
https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/
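In a framework such as Keras (as in the Colab examples), steps 2 to 4 are handled by compile() and fit(); a minimal sketch with random placeholder data, not the lecture's notebook:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])

# Cost function (cross-entropy here) and the gradient-descent optimizer
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random placeholder data standing in for the pattern vectors and class labels
x_train = tf.random.normal((100, 16))
y_train = tf.random.uniform((100,), maxval=4, dtype=tf.int32)

# fit() repeats: forward pass -> output error -> backpropagation -> weight/bias updates
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)
```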

Error Backpropagation
• Forward pass or propagation of input to output
https://github.com/rasbt/python-machine-learning-book/blob/master/faq/visual-backpropagation.md

Error Backpropagation
• Backward pass or propagation of error (loss function)
• Gradient Descent
Error Backpropagation
• Weights Adjusted, New sample, feedforward
Error: Loss or Cost Function
• The loss function is basically a performance metric on how well the MLP manages to reach its goal of generating outputs as close as possible to the desired values.
• loss = (desired output - actual output)
• loss = absolute value of (desired - actual)
https://towardsdatascience.com/common-loss-functions-in-machine-learning-46af0ffc4d23

MLP: Loss Function


• Mean Square Error (MSE)/Quadratic Loss/L2 Loss
• predictions which are far away from actual values are penalized heavily

• Cross-Entropy Loss (most common in classification)


• increases as the predicted probability diverges from the actual label
• penalizes heavily the predictions that are confident but wrong
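Minimal NumPy versions of these two losses; the label and prediction vectors are illustrative:

```python
import numpy as np

def mse(desired, actual):
    # Mean Square Error / quadratic / L2 loss: large errors are penalized heavily (squared)
    return np.mean((desired - actual) ** 2)

def cross_entropy(true_label_probs, predicted_probs, eps=1e-12):
    # Cross-entropy: grows quickly when a confident prediction is wrong
    p = np.clip(predicted_probs, eps, 1.0)
    return -np.sum(true_label_probs * np.log(p))

y_true = np.array([0.0, 1.0, 0.0])                          # one-hot label: class 1
print(mse(y_true, np.array([0.1, 0.8, 0.1])))               # small error
print(cross_entropy(y_true, np.array([0.1, 0.8, 0.1])))     # ~0.22
print(cross_entropy(y_true, np.array([0.9, 0.05, 0.05])))   # confident but wrong -> ~3.0
```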
Backprop Algorithm
1. Initialize weights w to random values
2. For each training example, perform a forward pass:
1. Compute the induced local fields (weighted sums of the actual outputs of the neurons in layer l-1)
2. Compute the neuron outputs by applying the transfer/activation function
3. Compute the error signal (desired network output - actual network output)
3. Perform a backward pass:
1. Compute the local gradients (using the derivative of the transfer/activation function)
2. Adjust the weight vectors of all neurons (learning-rate constant, optionally with momentum learning)
4. Go back to step 2
Repeat until stopping criterion is met
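A compact NumPy sketch of one hidden layer trained this way, with sigmoid activations and a quadratic cost; shapes, data, and the learning rate are illustrative, and the local gradients (delta) follow the slide's recipe only loosely:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                            # 8 training patterns, 3 inputs
D = rng.integers(0, 2, size=(8, 1)).astype(float)      # desired outputs

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)          # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)          # hidden -> output
eta = 0.5                                              # learning rate

for epoch in range(1000):
    # Forward pass: induced local fields and neuron outputs
    z1 = X @ W1 + b1;  a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2; a2 = sigmoid(z2)

    # Error signal (desired - actual) and local gradients, using h'(z) = a(1-a) for the sigmoid
    e = D - a2
    delta2 = e * a2 * (1 - a2)                         # output-layer local gradient
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)           # backpropagated to the hidden layer

    # Gradient-descent weight/bias adjustments
    W2 += eta * a1.T @ delta2;  b2 += eta * delta2.sum(axis=0)
    W1 += eta * X.T @ delta1;   b1 += eta * delta1.sum(axis=0)

print("mean squared error:",
      np.mean((D - sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)) ** 2))
```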
Learning is Searching
• Learning = searching a hypothesis space
• E.g., weight set of an ANN
Known mathematical relationship between weight values and
error signal helps us in the search
• Gradient descent/hill-climbing:
Find the optimal parameter changes following the steepest path
up/down on the performance hypersurface
• e.g., delta rule or backpropagation algorithm for perceptron weights
Gradient Learning
• For a cost/eval function ε(•) and each parameter/weight w, iteratively update:
w ← w - η ∂ε/∂w
• …for all parameters until a given stopping criterion is met
• We usually start from some random initial weights
https://github.com/rasbt/python-machine-learning-book/blob/master/faq/visual-backpropagation.md
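A tiny sketch of this update rule on a one-dimensional cost ε(w) = (w - 3)²; the function, starting point, and η are made up for illustration:

```python
# Gradient learning on eps(w) = (w - 3)^2, whose gradient is 2*(w - 3)
eta = 0.1        # learning rate
w = -5.0         # arbitrary starting point

for step in range(50):
    grad = 2.0 * (w - 3.0)    # d(eps)/dw
    w = w - eta * grad        # w <- w - eta * gradient

print(w)   # converges towards the minimum at w = 3
```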

Error Backpropagation
• Backward pass or propagation of error (loss function)
• Gradient Descent
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/more_images/LossAlps.png

Local and global Minima


https://medium.com/datathings/neural-networks-and-backpropagation-explained-in-a-simple-way-f540a3611f5e

Gradient Descent: effect of optimizers


Limitations of Local Search
For backpropagation, transfer functions must be differentiable
(for cost function to be differentiable)
• Local optima occur with nonlinear hypothesis spaces

(Figure: network performance vs weight configuration for a multi-layer perceptron (MLP), showing a global optimum, a local optimum, and the starting weights. Note: hill climbing. Must we randomize the weights until the global optimum is reachable?)
Initialization
• Initial weights decide which local optimum is reached
Backpropagation networks should be reset / trained multiple
times (keep the best)

(Figure: the same performance surface with several possible starting weights, a local optimum, and the global optimum. Note: hill climbing. Is the global optimum reached?)
Momentum
• Combining the current gradient and the previous weight change:
Δw(n) = α Δw(n-1) - η ∂ε/∂w
• where α is a momentum constant
• Smooths weight changes and suppresses oscillations
• Accelerates learning in the same direction
• Enables escape from small local optima
(Figure: momentum update on the performance surface, from the starting weights towards the optimum. Note: hill climbing.)
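A sketch of a standard momentum update on the same kind of one-dimensional cost; α and η are illustrative:

```python
# Momentum: combine the previous weight change with the current gradient step
eta, alpha = 0.05, 0.9        # learning rate and momentum constant
w, delta_w = -5.0, 0.0

for step in range(200):
    grad = 2.0 * (w - 3.0)                    # gradient of eps(w) = (w - 3)^2
    delta_w = alpha * delta_w - eta * grad    # smooths changes, accelerates in a constant direction
    w += delta_w

print(w)   # approaches the minimum at w = 3 faster than plain gradient descent with the same eta
```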
Learning Rate
• Learning rate η too small: very slow progress
• Learning rate η too large: oscillations or reductions in performance
• Adaptive learning rates
• Increase the rate in the absence of oscillations; decrease it otherwise
(Figure: performance vs weight, showing oscillations around the optimum when η is too large. Note: hill climbing.)
Stopping Criteria
• Stop when a maximum number of epochs has been exceeded
• Stop when the mean squared error (MSE) on the training set is
small enough
• Stop when the gradient is below a desired threshold
• Stop when overfitting is observed
Problem Solved?
• Found the global optimum? (lucky!)
• i.e., the network performs optimally
on the data that it was trained on
! Does not guarantee any performance on unseen data

Possible scenario...
• I trained a network to recognize people from passport photos…
• …but it fails whenever a person smiles
→ The training data did not include any pictures of people smiling!
Curse of Finite Sample Size
• A problem…
• Unlimited possibilities in nature
• …or at least a lot more than we can collect for training
So a classifier, in actual use, may encounter something new
• How can we guarantee that the classifier gives the best possible
response to this?
• Another problem…
• Collected sample may be noisy
• Inaccuracies in data collection
• May be different every time!
Generalization
• Input-output mapping of the network should be correct for data
never used in creating or training the network
• Generalization – the ability to produce satisfactory responses
to patterns that were not included in the training set
• Extra-sample error – the average prediction error for data that the
neural network has never seen
• In-sample error – is the average prediction error for data that the neural
network has been trained on
! In-sample (training) error is a poor predictor for extra-sample (testing)
error
Example
• Google Colaboratory Example
https://colab.research.google.com/drive/1IsUmqqs-y0EAzmxaqjVJWSVsRnrOXyjl?usp=sharing
Example
• Google Colaboratory Example
• Deep Learning Example Part01
https://colab.research.google.com/drive/1jcpFC8ZtSlRm-d1qdisEPXqvxDHXyxMf?usp=sharing
• Deep Learning Example Part02
https://colab.research.google.com/drive/1KWvhT8mUnI4PMtox6cd98eqQiw5xxEow?usp=sharing
• Deep Learning Example Part03
https://colab.research.google.com/drive/1_qUCLPB7MzDBMws5I2LvJHWt5rJsiUtw?usp=sharing
Deep Convolutional Neural
Networks (CNN)
Deep Convolutional Neural Networks (CNN)
• Up to this point: patterns were organised in terms of feature vectors
• The form of these features is specified by a human designer
• Extracted from the images prior to being input to the NN
• Convolutional Neural Networks:
• Accept images as inputs
• Learn the features as well as the classification
Basic CNN architecture
Basics of a CNN operation
• The type of neighbourhood processing in a CNN is spatial convolution
• Computes a sum of products between pixels and a set of kernel weights
• At every spatial location in the input image
• The result at each (x, y) is a scalar value
• This scalar value is the output of a neuron
• Adding a bias and passing the result through an activation function → we have our good old NN!
• Neighbourhoods → Receptive Fields (RF)
• The receptive field moves over the image executing the convolution
• The set of weights, arranged in the shape of a receptive field, is a kernel
• The spatial increment by which the RF moves is the stride
• To each convolution value we add a bias
• Then pass the result through an activation function to generate a single value
• This value is fed to the corresponding location in the input of the next layer
• This is repeated for all locations in the input image, resulting in a 2D set of values stored in the next layer as a 2D array called a feature map
• → the role of convolution here is to extract features, such as edges, points, and blobs (see the sketch below)
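A small NumPy sketch of this sum of products over the whole image; the kernel values and stride are illustrative:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel (a receptive field of weights) over the image, computing
    a sum of products at every location -> a 2D feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    fmap = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            fmap[i, j] = np.sum(patch * kernel)     # sum of products at this location
    return fmap

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])        # a tiny edge-like kernel
print(conv2d(image, kernel))                         # 4x4 feature map
```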
• Convolutional layer:
• Three feature maps, obtained from three distinct kernels!
• After convolution and activation:
• Subsampling (or pooling):
• Produces pooled feature maps: the Pooling Layer
• Reduction in spatial resolution:
• responsible for translational invariance
• Reduces the volume of data
• Done by subdividing the feature maps into a set of small (typically 2x2) regions:
• Pooling neighbourhoods
• Replacing all the values of that neighbourhood by a single value
• Common pooling methods (see the sketch after this list):
• Average pooling: substitute by the average
• Max-pooling: substitute by the max value
• L2 pooling: substitute by the square root of the sum of the squared values
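The pooling sketch referred to above: a NumPy version of 2x2 max-pooling (the input values are illustrative):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Replace each non-overlapping 2x2 pooling neighbourhood by its maximum value."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]       # drop odd borders if any
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]], dtype=float)
print(max_pool_2x2(fm))
# [[4. 2.]
#  [2. 8.]]
```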
• Convolution:
• Filtered images
• Pooling:
• Filtered images of lower resolution
• The pooled feature maps in the first layer become the inputs to the next layer
• But we now have multiple pooled feature maps
• As convolution is a linear operation (remember assignment 1??)
• The values can be combined into a single one by superposition
• The ultimate goal is classification:
• The final pooled feature maps are fed into a Fully Connected Neural Net
• As we've seen before → the input should be vectorized.
Example

• Think of each element of a 2D array in the top row as a


neuron
• The outputs of these neurons are pixel values, creating
feature maps
• The neurons in the feature map of the 1st layer have output values generated by convolving a kernel with the input image
• The kernel's size and shape are the same as the receptive field
• And its coefficients are learned during training
• To each convolution value we add a bias and pass the
result through an activation function to generate the
output value of the corresponding neuron in the feature
map
• The output values of neurons in the pooled feature maps
are generated by pooling the output values of neurons in
the feature maps
• The kernel weights (shown as intensity values) are
learned from sample images using backpropagation
• Therefore, the nature of the learned features is determined
by the learned kernel coefficients
Graphical illustration of the functions
performed by the components of a CNN
(Figure: input image → feature maps → pooled feature maps → feature maps → pooled feature maps → vector → fully connected neural net → output classes 0 to 9)
https://cs231n.github.io/convolutional-networks/

CNN: Layer Architecture


The architecture shown here is a tiny VGG Net http://www.robots.ox.ac.uk/~vgg/research/very_deep/

pre-trained weights: https://keras.io/api/applications/


Teaching a CNN to recognise simple images
(Figure: training image set and test image set)

CNN to recognise handwritten numerals
(MNIST dataset)
• 60,000 training images
• 10,000 test images
• Grayscale images of size 28x28 pixels
Architecture of the CNN trained to recognise
ten digits in the MNIST dataset
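A minimal Keras sketch of a CNN of this kind for 28x28 grayscale digits; the number of kernels and layer sizes are illustrative and not necessarily the exact architecture shown in the figure:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                       # 28x28 grayscale image
    tf.keras.layers.Conv2D(6, (5, 5), activation="relu"),    # convolution -> feature maps
    tf.keras.layers.MaxPooling2D((2, 2)),                    # pooling -> pooled feature maps
    tf.keras.layers.Conv2D(12, (5, 5), activation="relu"),   # second convolutional layer
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),                               # vectorize the pooled feature maps
    tf.keras.layers.Dense(10, activation="softmax"),         # fully connected net: 10 digits
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# To actually train on MNIST (downloads the dataset):
# (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# model.fit(x_train[..., None] / 255.0, y_train, epochs=3, validation_split=0.1)
```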
Kernels
Results of a forward pass
Let’s play with this
• http://www.cs.cmu.edu/~aharley/vis/
CIFAR-10 Dataset
Same architecture as before
Kernels of the first convolution layer
Kernels of the 2nd convolution layer
Graphical illustration of a forward pass
Example
• Google Colaboratory Example
• Deep Learning Example Part04: CNN
https://colab.research.google.com/drive/1FTl7TWxd05gj13hPC2A-IrKnNx7W9cHS?usp=sharing
Training and evaluating a classifier
• The dataset should be divided into:
• Training set
• Usually 50% of the dataset
• Used to create a model:
• given the class label for each pattern
• the goal is to adjust the parameters in order to assign the right class to the appropriate example (data item)
• Validation set
• 25%
• Check if the performance objective is met against unseen data, if not train the classifier again (tweaking
the classifier design)
• Test set
• 25%
• Check the behaviour of the classifier with unseen data
• Almost like the validation set, but without trying to make it better this time!

• If training/validation results are acceptable but test results are not:
• The training overfit the system parameters to the available data
• Wrong architecture, small dataset, noisy/biased dataset
HEADS UP: THE TRAINING/VALIDATION/TEST SPLIT ABOVE IS THE MOST IMPORTANT PART OF THE WHOLE COURSE!
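A sketch of the 50/25/25 split using scikit-learn; the arrays and random seed are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 16)             # placeholder pattern vectors
y = np.random.randint(0, 4, size=1000)   # placeholder class labels

# 50% training, 50% temporarily held out
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.5, random_state=0)
# Split the held-out half into 25% validation and 25% test
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 500 250 250
```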
Classification metrics
• Key methods:
• Accuracy
• Recall
• Precision
• F1-Score
• Model evaluation:
• Either its answer is correct
• Or incorrect
• After training the model, we want to evaluate it
• Input a validation example (with known label): e.g. Image of a Dog
• Get the output of the model classification: e.g. is it a Dog or a Cat?
• Let’s assume we want to classify Dog
• Outputs of the model:
• Input Dog => output Dog → True positive
• Input Cat => output Dog → False positive
• Input Dog => output Cat → False negative
• Input Cat => output Cat → True negative
• Repeat this for all the validation/test data
• Count the total number of TP, TN, FP, FN
• Calculate the metrics from these
Confusion Matrix

https://en.wikipedia.org/wiki/Precision_and_recall
Metrics: Accuracy
• Accuracy:
• Number of correct predictions divided by the total number of predictions
• (TP + TN)/(TP +TN + FP + FN)

• Useful when the target classes are well-balanced
• Not useful if the dataset is unbalanced:
• 99% images of Dogs and 1% of Cats
• A model that only predicts Dogs would have great accuracy, but it would be a terrible classification model!
• We'll have to look into Recall and Precision as well!
• We’ll have to look into Recall and Precision as well!
Metrics: Recall
• Recall
• Measures the extent to which the model finds all the relevant information
• Number of true positives divided by the number of true positives plus the number of false negatives
• TP/(TP+FN)
Metrics: Precision
• Precision:
• Measures the extent to which the model finds only the relevant information
• Number of true positives divided by the number of true positives plus the number of false positives
• TP/(TP+FP)
Precision vs Recall
• There is a trade-off here:
• Precision measures the proportion of the data points the model labels as relevant that actually are relevant
• Recall measures the ability to find all relevant instances
• Precision vs Recall curves are very useful as an analysis tool!
F1 Score
• F1 Score
• Measures the trade-off between precision and recall
• It is the Harmonic Mean of precision and recall:
F1 = 2 · (precision · recall) / (precision + recall)
• Why not the average?
• Because the average does not punish extreme values
• E.g. Precision = 1, but Recall = 0
• Average = 0.5
• F1 score = 0
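Putting the four metrics together in a short sketch; the TP/TN/FP/FN counts are made up:

```python
TP, TN, FP, FN = 100, 50, 10, 5        # illustrative counts from a validation set

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1        = 2 * precision * recall / (precision + recall)   # harmonic mean

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```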
Other metrics…

https://en.wikipedia.org/wiki/Confusion_matrix
Example

• Accuracy ((TP+TN)/total) = 0.91
• Precision (TP/(TP+FP)) = 0.91
• Recall (TP/(TP+FN)) = 0.95
• https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
Further important reading:
• Bias in Machine Learning
https://towardsdatascience.com/june-edition-bias-in-the-machine-994eadbccec2
