0% found this document useful (0 votes)

29 views88 pages

Module 3 Final

The document discusses multi-layer perceptrons (MLPs), a type of artificial neural network. It covers topics like: - MLPs add hidden layers between the input and output layers, allowing them to represent more complex relationships than single-layer perceptrons. - The backpropagation algorithm is used to calculate error gradients and update weights in MLPs. Weights are adjusted to reduce the overall error across all training examples. - Variations of MLPs include using different activation functions for outputs, like linear for regression or softmax for classification. Batch training also updates weights based on the average error rather than individual examples.

Uploaded by

Infancy Pio

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

29 views88 pages

Module 3 Final

Uploaded by

Infancy Pio

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 88

MACHINE LEARNING

DR. S. Chitrakala
Professor
DCSE/ CEG/ AU
Module 3
1. The Multi-Layer Perceptron
2. Back Propagation of Error
3. Multi-layer Perceptron in Practice
4. Deriving Back Propagation
5. Applications of MLP
The Multi-Layer Perceptron
Multi-layer Perceptron
 Majority of interesting problems are not linearly separable.
 That problems can be made linearly separable if we can work out how to transform
the features suitably.
 Consider making more complicated networks.
 Learning in the neural network happens in the weights. So, to perform more
computation it seems sensible to add more weights.
 There are two things that we can do:
 add some backwards connections, so that the output neurons connect to the inputs again, or
 add more neurons.

18
 The first approach leads into recurrent networks. but are not
that commonly used.
 We will instead consider the second approach. We can add
neurons between the input nodes and the outputs, and this
will make more complex neural networks,

19
XOR: Input (1, 0) corresponds to node A being 1 and B being 0.
 The input to neuron C is therefore −1 × 0.5 + 1 × 1 + 0 × 1 = −0.5 +
1 = 0.5. This is above the threshold of 0, and so neuron C fires,
giving output 1.
 For neuron D the input is −1 × 1 + 1 × 1 + 0 × 1 = −1 + 1 = 0, and so
it does not fire, giving output 0.
 Therefore the input to neuron E is −1×0.5+1×1+0×−1 = 0.5, so
neuron E fires.

20
MLP: GOING FORWARD:
 we start at the left by filling in the values for the inputs.
 We then use these inputs and the first level of weights to
calculate the activations of the hidden layer, and then we use
those activations and the next set of weights to calculate the
activations of the output layer.
 Now that we’ve got the outputs of the network, we can compare
them to the targets and compute the error.
 We need to include a bias input to each neuron.
 By having an extra input that is permanently set to -1, and
adjusting the weights to each neuron as part of the training.
 Thus, each neuron in the network (whether it is a hidden layer or
the output) has 1 extra input, with fixed value.
21
GOING BACKWARDS: BACK-PROPAGATION OF ERROR
 The error function that we used for the Perceptron was
where N is the number of output nodes.
 However, suppose that we make two errors.
 In the first, the target is bigger than the output, while in the second the
output is bigger than the target.
 If these two errors are the same size, then if we add them up we could
get 0, which means that the error value suggests that no error was made.

 To get around this we need to make all errors have the same
sign.
 sum-of-squares error function

22
 If we differentiate a error function, then it tells us the gradient of
that function, which is the direction along which it increases and
decreases the most.
 Since the purpose of learning is to minimize the error, following
the error function downhill (in other words, in the direction of the
negative gradient) will give us what we want.

23
THE MULTI-LAYER PERCEPTRON ALGORITHM
 There are
 L input nodes, plus the bias,
 M hidden nodes, also plus a bias, and
 N output nodes,
 So that there are (L+1)×M weights between the input and the hidden layer
and (M+1)×N between the hidden layer and the output.

 The sums that we write will start from 0 if they include the bias
nodes and 1 otherwise, and run up to L,M, or N, so that x0 = −1 is
the bias input, and a0 = −1 is the bias hidden node.
 The algorithm that is described could have any number of hidden
layers, in which case there might be several values for M, and
extra sets of weights between the hidden layers.
 We will also use i, j, k to index the nodes in each layer 24
1. An input vector is put into the input nodes
2. The inputs are fed forward through the network
 The inputs and the first-layer weights (here labelled as v) are used to
decide whether the hidden nodes fire or not. The activation function
g(·) is the sigmoid function given in Equation (4.2) above
 The outputs of these neurons and the second-layer weights (labelled
as w) are used to decide if the output neurons fire or not

3. The error is computed as the sum-of-squares difference

between the network outputs and the targets
4. This error is fed backwards through the network in order to
 first update the second-layer weights

 and then afterwards, the first-layer weights

25
26
27
28
Back Propagation Algorithm
INITIALIZING THE WEIGHTS
 Each neuron is getting input from n different places (either input
nodes if the neuron is in the hidden layer, or hidden neurons if it
is in the output layer).
 If we view the values of these inputs as having uniform variance,
then the typical input to the neuron will be
 where w is the initialization value of the weights. So a common
trick is to set the weights in the range
 where n is the number of nodes in the input layer to those
weights.
 This makes the total input to a neuron have a maximum size of
about 1.

53
 Further, if the weights are large, then the activation of a neuron is
likely to be at, or close to, 0 or 1 already, which means that the
gradients are small, and so the learning is very slow.
 There is an interplay here with the value of in the logistic
function, which means that small values of β (say β= 3.0 or less)
are more effective.
 We use random values for the initialization so that the learning
starts off from different places for each run, and we keep them all
about the same size because we want all of the weights to reach
their final values at about the same time. This is known as uniform
learning and it is important because otherwise the network will
do better on some inputs than others.

54
DIFFERENT OUTPUT ACTIVATION FUNCTIONS
 In the algorithm described above, we used sigmoid neurons in the
hidden layer and the output layer. This is fine for classification
problems, since there we can make the classes be 0 and 1.
 However, we might also want to perform regression problems,
where the output needs to be from a continuous range, not just 0
or 1.
 The sigmoid neurons at the output are not very useful in that
case.
 We can replace the output neurons with linear nodes that just
sum the inputs and give that as their activation (so g(h) = h ).
 This does not mean that we change the hidden layer neurons;
they stay exactly the same, and we only modify the output nodes.
55
 They are not models of neurons anymore, since they don’t have
the characteristic fire/don’t fire pattern.
 They enable us to solve regression problems, where we want a
real number out, not just a 0/1 decision.
 Soft-max activation function is most commonly used for
classification problems where the 1-of-N output encoding is used
 The soft-max function rescales the outputs by calculating the
exponential of the inputs to that neuron, and dividing by the total
sumof the inputs to all of the neurons, so that the activations sum
to 1 and all lie between 0 and 1.
 As an activation function it can be written as:

56
 If we change the activation function, then the derivative of the
activation function will also change, and so the learning rule will be
different.
 For the linear activation function
 Derivative of this is
 We can modify the error function as well, to have the cross-
entropy form (where ln is the natural logarithm):

57
SEQUENTIAL AND BATCH TRAINING
 All of the training examples are presented to the neural network,
the average sum-of-squares error is then computed, and this is
used to update the weights.
 Thus there is only one set of weight updates for each epoch (pass
through all the training examples).
 This means that we only update the weights once for each
iteration of the algorithm, which means that the weights are
moved in the direction that most of the inputs want them to
move, rather than being pulled around by each input individually.
 The batch method performs a more accurate estimate of the
error gradient, and will converge to the local minimum more
quickly.
58
 The algorithm that was described earlier was the sequential version,
where the errors are computed and the weights updated after each
input.
 This is not guaranteed to be as efficient in learning, but it is simpler
to program when using loops, and it is therefore much more
common.
 Since it does not converge as well, it can also sometimes avoid local
minima, thus potentially reaching better solutions.
 randomising the order of the input vectors at each iteration. This can
significantly improve the speed with which the algorithm learns.
 NumPy has a useful function that assists with this,
np.random.shuffle(), which takes a list of numbers and reorders
them.
59
LOCAL MINIMA
 There may be a much lower point over the next hill, but the ball can’t
see that, and it doesn’t have enough energy to climb over the hill and
find the global minimum
 We don’t know where the global minimum is because we don’t know
what the error landscape looks like;
 we can only compute local features of it for the place we are in at the
moment.
 Which minimum we end up in depends on where we start.
 If we begin near the global minimum, then we are very likely to end
up in it, but if we start near a local minimum we will probably end up
there.

60
 In addition, how long it will take to get to the minimum that we do
find depends upon the exact appearance of the landscape at the
current point.
 We can make it more likely that we find the global minimum by
trying out several different starting points by training several
different networks, and this is commonly done.
 However, we can also try to make it less likely that the algorithm will
get stuck in local minima.

61
PICKING UP MOMENTUM
 The reason that the ball stops rolling is because it runs out of energy
at the bottom of the dip.
 If we give the ball some weight, then it will generate momentum as it
rolls, and so it is more likely to overcome a small hill on the other
side of the local minimum, and so more likely to find the global
minimum.
 We can implement this idea in our neural network learning by adding
in some contribution from the previous weight change that we made
to the current one.
 In two dimensions it will mean that the ball rolls more directly
towards the valley bottom, since on average that will be the correct
direction, rather than being controlled by the local changes
62
 There is another benefit to momentum. It makes it possible to use a
smaller learning rate, which means that the learning is more stable.
 There is another benefit to momentum. It makes it possible to use a
smaller learning rate, which means that the learning is more stable.

 where t is used to indicate the current update and t − 1 is the

previous one. 0 < α < 1 is the momentum constant. Typically a value
of = 0.9 is used.
 Another thing that can be added is known as weight decay. This
reduces the size of the weights as the number of iterations increases.
 After each learning iteration through all of the input patterns, every
weight is multiplied by some constant 0 < ε < 1. This makes the
network simpler and can often produce improved results,
63
MINIBATCHES AND STOCHASTIC GRADIENT DESCENT
 Batch algorithm converges to a local minimum faster than the sequential algorithm,
which computes the error for each input individually and then does a weight update,
but that the latter is sometimes less likely to get stuck in local minima.
 The reason for both of these observations is that the batch algorithm makes a better
estimate of the steepest descent direction, so that the direction it chooses to go is a
good one, but this just leads to a local minimum.
 The idea of a minibatch method is by splitting the training set into random batches,
estimating the gradient based on one of the subsets of the training set, performing a
weight update, and then using the next subset to estimate a new gradient and using
that for the weight update, until all of the training set have been used.

64
 a single input vector is chosen from the training set, and the output
and hence the error for that one vector computed, and this is used
to estimate the gradient and so update the weights.
 A new random input vector (which could be the same as the
previous one) is then chosen and the process repeated.
 This is known as stochastic gradient descent, and can be used for
any gradient descent problem, not just the MLP.
 It is often used if the training set is very large, since it would be
very expensive to use the whole dataset to estimate the gradient in
that case.

65
 OTHER IMPROVEMENTS: One is to reduce the learning rate as the
algorithm progresses. The reasoning behind this is that the network
should only be making large-scale changes to the weights at the
beginning, when the weights are random; if it is still making large
weight changes later on, then something is wrong.
 Second derivatives of the error with respect to the weights

66
Multi-layer Perceptron in
Practice
1. Amount of Training Data
2. Number of Hidden Layers
3. When to Stop Learning
THE MULTI-LAYER PERCEPTRON IN PRACTICE: AMOUNT OF TRAINING DATA :
 For the MLP with one hidden layer there are (L + 1) ×M + (M + 1) × N weights,
where L,M,N are the number of nodes in the input, hidden, and output
layers, respectively.
 The extra +1s come from the bias nodes, which also have adjustable weights.
 This is a potentially huge number of adjustable parameters that we need to
set during the training phase.
 Setting the values of these weights is the job of the back-propagation
algorithm, which is driven by the errors coming from the training data.
 Clearly, the more training data there is, the better for learning, although the
time that the algorithm takes to learn increases.

68
 Unfortunately, there is no way to compute what the minimum
amount of data required is, since it depends on the problem.
 A rule of thumb that has been around for almost as long as the
MLP itself is that you should use a number of training examples
that is at least 10 times the number of weights.
 This is probably going to be a very large number of examples, so
neural network training is a fairly computationally expensive
operation, because we need to show the network all of these
inputs lots of times.
NUMBER OF HIDDEN LAYERS:
 it is possible to show mathematically that one hidden layer with
lots of hidden nodes is sufficient. This is known as the Universal
Approximation Theorem;
69
 You just have to experiment by training networks with different
numbers of hidden nodes and then choosing the one that gives the
best results
 we will never normally need more than two layers (that is, one
hidden layer and the output layer). This is because we can
approximate any smooth functional mapping using a linear
combination of localised sigmoidal functions.
 The basic idea is that by combining sigmoid functions we can
generate ridge-like functions, and by combining ridge-like functions
we can generate functions with a unique maximum.
 By combining these and transforming them using another layer of
neurons, we obtain a localised response (a ‘bump’ function), and any
functional mapping can be approximated to arbitrary accuracy using
70
71
WHEN TO STOP LEARNING

72
Applications of MLP
A Regression Problem
Classification with the MLP
A Classification Example: The Iris Dataset
Time-Series Prediction
Data Compression: The Auto-Associative Network
EXAMPLES OF USING THE MLP
A REGRESSION PROBLEM
 We will take a set of samples generated by a simple
mathematical function, and try to learn the generating function
(that describes how the data was made) so that we can find the
values of any inputs, not just the ones we have training data for.

74
75
76
CLASSIFICATION WITH THE MLP
 There are a couple of choices for the outputs. The first is to use a
single linear node for the output, y, and put some thresholds on
the activation value of that node.

 A more suitable output encoding is called 1-of-N encoding.

 A separate node is used to represent each possible class, and the
target vectors consist of zeros everywhere except for in the one
element that corresponds to the correct class, e.g., (0, 0, 0, 1, 0, 0)
means that the correct result is the 4th class out of 6.
 We are therefore using binary output values (we want each output
to be either 0 or 1).
77
 Simply choose the element yk of the output vector that is the largest
element of y (in mathematical notation, pick the yk for which
o so this statement says pick the yk that is bigger
than all other possible values yj ).
 This generates an unambiguous decision, since it is very unlikely that
two output neurons will have identical largest output values.
 This is known as the hard-max activation function (since the neuron
with the highest activation is chosen to fire and the rest are ignored).
 An alternative is the soft-max function, which has the effect of
scaling the output of each neuron according to how large it is in
comparison to the others, and making the total output sum to 1.

78
A CLASSIFICATION EXAMPLE: THE IRIS DATASET
 Classifying examples of three types of iris (flower) by the length and
width of the sepals and petals and is called iris.

79
80
81
TIME-SERIES PREDICTION
 The target data for training the neural network is simple, because it
comes from further up the time-series, and so training is easy.
Suppose that = 2 and k = 3.

82
 2855 data points. To minimise the sum-of-squares error. Since
there are no classes, the confusion matrix is not useful

83
84
DATA COMPRESSION: THE AUTO-ASSOCIATIVE NETWORK
 Suppose that we train the network to reproduce the inputs at the
output layer (called auto-associative learning; sometimes the
network is known as an autoencoder).
 The network is trained so that whatever you show it at the input is
reproduced at the output.
 This bottleneck hidden layer has to represent all of the information in
the input, so that it can be reproduced at the output.
 It therefore performs some compression of the data, representing it
using fewer dimensions than were used in the input.
 They are finding a different (often lower dimensional) representation
of the input data that extracts important components of the data,
and ignores the noise.
85
86
87
A RECIPE FOR USING THE MLP

 Select inputs and outputs for your problem

 Normalise inputs
 Split the data into training, testing, and validation sets
 Select a network architecture
 Train a network
 Test the network

Chapter3 - BP
No ratings yet
Chapter3 - BP
12 pages
8051 MCQ
No ratings yet
8051 MCQ
6 pages
Wa0006.
No ratings yet
Wa0006.
70 pages
4 Multilayer Perceptrons and Radial Basis Functions
No ratings yet
4 Multilayer Perceptrons and Radial Basis Functions
6 pages
Unit - II ML
No ratings yet
Unit - II ML
9 pages
04_NeuralNetworksII
No ratings yet
04_NeuralNetworksII
74 pages
Foundations of Machine Learning: Module 6: Neural Network
No ratings yet
Foundations of Machine Learning: Module 6: Neural Network
68 pages
ML Unit-Ii
No ratings yet
ML Unit-Ii
34 pages
Artificial Neural Networks - MLP
No ratings yet
Artificial Neural Networks - MLP
52 pages
CC511 Week 5 - 6 - NN - BP
No ratings yet
CC511 Week 5 - 6 - NN - BP
62 pages
36-Multi-Layer Perceptron and Its Properties-30-10-2024
No ratings yet
36-Multi-Layer Perceptron and Its Properties-30-10-2024
39 pages
ANN MODULE 1 Part2
No ratings yet
ANN MODULE 1 Part2
58 pages
Back Propagation
100% (1)
Back Propagation
27 pages
ML Unit-Ii
No ratings yet
ML Unit-Ii
38 pages
Foundations of Machine Learning: Module 6: Neural Network
No ratings yet
Foundations of Machine Learning: Module 6: Neural Network
14 pages
Neural Networks
No ratings yet
Neural Networks
10 pages
Artificial Neural Network
No ratings yet
Artificial Neural Network
35 pages
Pr3 ANN WriteUp
No ratings yet
Pr3 ANN WriteUp
8 pages
Pattern Classification 10. Linear Perceptron, Least Squares & Multi-Layer Nns
No ratings yet
Pattern Classification 10. Linear Perceptron, Least Squares & Multi-Layer Nns
38 pages
Artificial Neural Networks - Lect - 3
No ratings yet
Artificial Neural Networks - Lect - 3
16 pages
Artificial Neural Networks: HCMC University of Technology Sep. 2008
No ratings yet
Artificial Neural Networks: HCMC University of Technology Sep. 2008
71 pages
Neural Network
100% (1)
Neural Network
54 pages
Neural
No ratings yet
Neural
53 pages
Multi Layer Perceptron Haykin
No ratings yet
Multi Layer Perceptron Haykin
50 pages
Foundations of Machine Learning: Module 6: Neural Network
No ratings yet
Foundations of Machine Learning: Module 6: Neural Network
19 pages
ML 03
No ratings yet
ML 03
42 pages
Artificial Neural Networks: HCMC University of Technology Sep. 2008
No ratings yet
Artificial Neural Networks: HCMC University of Technology Sep. 2008
71 pages
Neural Networks Essay Feranmi Dere
No ratings yet
Neural Networks Essay Feranmi Dere
7 pages
Anthony Kuh - Neural Networks and Learning Theory
No ratings yet
Anthony Kuh - Neural Networks and Learning Theory
72 pages
Back Propagation Learning Algorithm
No ratings yet
Back Propagation Learning Algorithm
15 pages
2025 Lecture07 P2 MLP
No ratings yet
2025 Lecture07 P2 MLP
56 pages
ANN-Implemetation of Back-Prop
No ratings yet
ANN-Implemetation of Back-Prop
89 pages
Lecture 2
No ratings yet
Lecture 2
52 pages
Machine Learning
No ratings yet
Machine Learning
83 pages
ML807 Distributed and Federated Learning Slides 2
No ratings yet
ML807 Distributed and Federated Learning Slides 2
211 pages
EELU ANN ITF309 Lecture 08 Spring 2023-2024-Sensitivity-Back-Propagation
No ratings yet
EELU ANN ITF309 Lecture 08 Spring 2023-2024-Sensitivity-Back-Propagation
39 pages
Multi Layer Perceptron 1
No ratings yet
Multi Layer Perceptron 1
54 pages
Artificial Neural Networks
No ratings yet
Artificial Neural Networks
71 pages
Module 3 - Modified
No ratings yet
Module 3 - Modified
106 pages
NN Lab2
No ratings yet
NN Lab2
5 pages
Classification BP Regression KNN Other Classifiers - Final
No ratings yet
Classification BP Regression KNN Other Classifiers - Final
116 pages
Working of Multi-Layer Perceptron
No ratings yet
Working of Multi-Layer Perceptron
16 pages
Neural Net 3rdclass
No ratings yet
Neural Net 3rdclass
35 pages
Unit 3
100% (1)
Unit 3
11 pages
Module 3.docxaiml
No ratings yet
Module 3.docxaiml
20 pages
Backpropagation
No ratings yet
Backpropagation
12 pages
NN Introduction MES
No ratings yet
NN Introduction MES
39 pages
ML Unit 2 Lecture Notes
No ratings yet
ML Unit 2 Lecture Notes
20 pages
Chapter 10: Artificial Neural Networks
No ratings yet
Chapter 10: Artificial Neural Networks
17 pages
ML Chapter 2
No ratings yet
ML Chapter 2
17 pages
Lect 15 MLP Introduction Backprop
No ratings yet
Lect 15 MLP Introduction Backprop
24 pages
Fast Training of Multilayer Perceptrons
No ratings yet
Fast Training of Multilayer Perceptrons
15 pages
Lecture 10
No ratings yet
Lecture 10
155 pages
Lecture 13.3 Classification ANN
No ratings yet
Lecture 13.3 Classification ANN
64 pages
Pr2 ANN WriteUp
No ratings yet
Pr2 ANN WriteUp
11 pages
Top Numerical Methods With Matlab For Beginners!
From Everand
Top Numerical Methods With Matlab For Beginners!
Andrei Besedin
No ratings yet
Backpropagation: Fundamentals and Applications for Preparing Data for Training in Deep Learning
From Everand
Backpropagation: Fundamentals and Applications for Preparing Data for Training in Deep Learning
Fouad Sabry
No ratings yet
Ordered Weighted Averaging Aggregation Operator: Fundamentals and Applications
From Everand
Ordered Weighted Averaging Aggregation Operator: Fundamentals and Applications
Fouad Sabry
No ratings yet
Multilayer Perceptron: Fundamentals and Applications for Decoding Neural Networks
From Everand
Multilayer Perceptron: Fundamentals and Applications for Decoding Neural Networks
Fouad Sabry
No ratings yet
Exercises of Logarithms and Exponentials
From Everand
Exercises of Logarithms and Exponentials
Simone Malacrida
No ratings yet
Exercises of Numerical Analysis
From Everand
Exercises of Numerical Analysis
Simone Malacrida
No ratings yet
Introduction To Trigonometry
No ratings yet
Introduction To Trigonometry
27 pages
CBSE Notes for Class 10 Maths Chapter 5 – Arithmetic Progressions
No ratings yet
CBSE Notes for Class 10 Maths Chapter 5 – Arithmetic Progressions
2 pages
Similar Triangles
No ratings yet
Similar Triangles
34 pages
Defense Against Adversarial Attacks Using Convolutional Auto-Encoders
No ratings yet
Defense Against Adversarial Attacks Using Convolutional Auto-Encoders
9 pages
1 s2.0 S0893608024004775 Main
No ratings yet
1 s2.0 S0893608024004775 Main
14 pages
CP Session
No ratings yet
CP Session
18 pages
CD Module 8 Print
No ratings yet
CD Module 8 Print
58 pages
JJWXC Guide
No ratings yet
JJWXC Guide
45 pages
App Development-Unit 2
No ratings yet
App Development-Unit 2
15 pages
Batch Code Amount Password Time/Data Speed Expiration
No ratings yet
Batch Code Amount Password Time/Data Speed Expiration
22 pages
SMS 3251 Time Series Analysis I
No ratings yet
SMS 3251 Time Series Analysis I
2 pages
Ford EcoSport 2015 UK
No ratings yet
Ford EcoSport 2015 UK
28 pages
CN - Unit 1
No ratings yet
CN - Unit 1
360 pages
Electronic Banking and Customers Satisfaction in Yewa Ilaro Ogun State Nigeria
No ratings yet
Electronic Banking and Customers Satisfaction in Yewa Ilaro Ogun State Nigeria
5 pages
Esdat Unit 2 Activity 2
No ratings yet
Esdat Unit 2 Activity 2
4 pages
31 Non Directional Over Current Relays
No ratings yet
31 Non Directional Over Current Relays
3 pages
Dexter R. Allison
No ratings yet
Dexter R. Allison
2 pages
Onefs User Guide 6 5 5
No ratings yet
Onefs User Guide 6 5 5
271 pages
William Aspray Auth. Women and Underrepresented Minorities in Computing A Historical and Social Study
No ratings yet
William Aspray Auth. Women and Underrepresented Minorities in Computing A Historical and Social Study
274 pages
Advanced Microcontrollers and Interfacing
No ratings yet
Advanced Microcontrollers and Interfacing
36 pages
Application of PCA-CNN (Principal Component Analysis - Convolutional Neural Networks) Method On Sentinel-2 Image Classification For Land Cover Mapping
No ratings yet
Application of PCA-CNN (Principal Component Analysis - Convolutional Neural Networks) Method On Sentinel-2 Image Classification For Land Cover Mapping
5 pages
Coa Lab 3 Shahzeb
No ratings yet
Coa Lab 3 Shahzeb
12 pages
Windows KMS Activator Ultimate 2021 5.3
0% (1)
Windows KMS Activator Ultimate 2021 5.3
13 pages
UTD-SASE-2.1 Workshop Guide-20231026
No ratings yet
UTD-SASE-2.1 Workshop Guide-20231026
109 pages
Seven Stages of Procurement-2
No ratings yet
Seven Stages of Procurement-2
2 pages
Data Structures DPPs
No ratings yet
Data Structures DPPs
57 pages
Renaissance Compressor
No ratings yet
Renaissance Compressor
9 pages
CISCO ASA Firewall Quick Configuration Guide
No ratings yet
CISCO ASA Firewall Quick Configuration Guide
24 pages
Lecture 1
No ratings yet
Lecture 1
27 pages
CMU-SE 303 - Software Testing - 2020F - Lecture Slides - 5
No ratings yet
CMU-SE 303 - Software Testing - 2020F - Lecture Slides - 5
10 pages
WEST - QURNA-2 - GTPP - PHASE - 2 - LC-3 - LC-4 - Prot Coord Report - 20230922 - 024801
No ratings yet
WEST - QURNA-2 - GTPP - PHASE - 2 - LC-3 - LC-4 - Prot Coord Report - 20230922 - 024801
13 pages
Enterprise Gis PDF
No ratings yet
Enterprise Gis PDF
52 pages
Unit 6 Int 3
No ratings yet
Unit 6 Int 3
162 pages
Consolidated Assignment 23
No ratings yet
Consolidated Assignment 23
25 pages
Umpstep: Higher Quality Better Service!
No ratings yet
Umpstep: Higher Quality Better Service!
5 pages
Template - Unit Test Case
No ratings yet
Template - Unit Test Case
10 pages

Module 3 Final

Uploaded by

Module 3 Final

Uploaded by

MACHINE LEARNING

3. The error is computed as the sum-of-squares difference

 and then afterwards, the first-layer weights

 where t is used to indicate the current update and t − 1 is the

 A more suitable output encoding is called 1-of-N encoding.

 Select inputs and outputs for your problem

You might also like