
Artificial Neural Networks

Multilayer Perceptrons
Backpropagation



Berrin Yanikoglu
Nov. 2003
Capabilities of Multilayer Perceptrons
Multilayer Perceptron
In multilayer perceptrons there may be one or more hidden layers, so called because they are not observed from the outside.

Multilayer Perceptron
Each layer may have a different number of nodes and a different activation function.
Commonly, the same activation function is used within one layer.
Typically, a sigmoid activation function is used in the hidden units, and sigmoid or linear activation functions are used in the output units, depending on the problem (classification or function approximation).

In feedforward networks, activations are passed only from one layer to the next
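
As a minimal illustration (not from the slides), the following NumPy sketch computes one forward pass of a 1-hidden-layer feedforward network with sigmoid hidden units and a linear output unit; the weight values are arbitrary.

```python
import numpy as np

def sigmoid(n):
    """Logistic sigmoid, f(n) = 1 / (1 + exp(-n))."""
    return 1.0 / (1.0 + np.exp(-n))

def forward(p, W1, b1, W2, b2):
    """Forward pass: sigmoid hidden layer, linear output layer."""
    a1 = sigmoid(W1 @ p + b1)  # hidden activations
    return W2 @ a1 + b2        # linear output (e.g. function approximation)

# Arbitrary 1-2-1 network: 1 input, 2 hidden units, 1 output
W1 = np.array([[0.8], [-0.5]]); b1 = np.array([0.1, 0.3])
W2 = np.array([[1.2, -0.7]]);   b2 = np.array([0.0])
print(forward(np.array([0.5]), W1, b1, W2, b2))
```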
Backpropagation
The capabilities of multilayer NNs were known, but a practical learning algorithm was only introduced by Werbos (1974) and made famous by Rumelhart and McClelland (mid 1980s, the PDP book).
This started massive research in the area.



XOR problem
Learning Boolean functions: a 1/0 output can be seen as a 2-class classification problem.

XOR can be solved by a 1-hidden-layer network.

XOR problem
(Figure: a 2-2-1 network of threshold units; the third component of each weight vector is the bias.)
W¹₁ = [ 1  1  −0.5]
W¹₂ = [−1 −1   1.5]
W²  = [ 1  1  −1.5]
Notice how each hidden node implements a decision boundary (boundary 1 by node 1, boundary 2 by node 2) and the output node combines (ANDs) their results.
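
A quick sketch (assuming 0/1 inputs and hardlimiting, i.e. threshold, units) verifying that the weights above compute XOR:

```python
import numpy as np

hardlim = lambda n: (n >= 0).astype(float)  # threshold (hardlimiting) unit

# Weights from the slide; the last component of each vector is the bias.
W1 = np.array([[ 1.0,  1.0, -0.5],   # node 1: fires for x1 + x2 >= 0.5 (OR-like)
               [-1.0, -1.0,  1.5]])  # node 2: fires for x1 + x2 <= 1.5 (NAND-like)
W2 = np.array([ 1.0,  1.0, -1.5])    # output node: ANDs the two boundaries

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = hardlim(W1 @ np.array([*x, 1.0]))   # hidden layer (append 1 for the bias)
    y = hardlim(W2 @ np.append(h, 1.0))     # output layer
    print(x, "->", int(y))                  # prints the XOR truth table
```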
Capabilities (hardlimiting nodes)
Single layer: hyperplane boundaries
1 hidden layer: can form any, possibly unbounded, convex region
2 hidden layers: arbitrarily complex decision regions

Capabilities: Decision Regions
From Lippmann's NN tutorial: when the hardlimiting nonlinearities are replaced with sigmoidal nonlinearities, similar behavior is observed, except that the decision boundaries are smooth curves instead of straight lines.

Capabilities (hardlimiting nodes)
2 hidden layers (see Lippmann, 1987):
The first hidden layer computes regions.
The second hidden layer computes an AND operation (one node for each hypercube; worst case, one per disconnected region), using about 1/3 the number of nodes in the first hidden layer.
The output layer computes an OR operation.

No more than 2 hidden layers are ever required.


Capabilities
Every bounded continuous function can be approximated arbitrarily accurately by 2 layers of weights (1 hidden layer) and sigmoidal units (Cybenko 1989, Hornik et al. 1989).
Discontinuities can be tolerated in theory for most real-life problems; functions without compact support can also be learned under some conditions.

All other functions can be learned by 2-hidden-layer networks (Cybenko 1988), based on Kolmogorov's Theorem.

Function Approximation
(Figures, self study: a discontinuity; a continuous function with a discontinuous first derivative.)
Classification Example (self study)
Elementary Decision Boundaries
First Subnetwork
First Boundary: $a^1_1 = \text{hardlim}([-1 \;\; 0]\,p + 0.5)$
Second Boundary: $a^1_2 = \text{hardlim}([0 \;\; -1]\,p + 0.75)$

Second Subnetwork
Third Boundary: $a^1_3 = \text{hardlim}([1 \;\; 0]\,p - 1.5)$
Fourth Boundary: $a^1_4 = \text{hardlim}([0 \;\; 1]\,p - 0.25)$

Total Network
$$W^1 = \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad b^1 = \begin{bmatrix} 0.5 \\ 0.75 \\ -1.5 \\ -0.25 \end{bmatrix}, \quad W^2 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \quad b^2 = \begin{bmatrix} -1.5 \\ -1.5 \end{bmatrix}, \quad W^3 = [1 \;\; 1], \quad b^3 = -0.5$$
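
A sketch checking the total network above. The extraction lost the minus signs, so the sign pattern here is an assumption, chosen so that layer 2 ANDs each pair of boundaries and layer 3 ORs the two subnetworks:

```python
import numpy as np

hardlim = lambda n: (n >= 0).astype(float)

W1 = np.array([[-1, 0], [0, -1], [1, 0], [0, 1]], dtype=float)
b1 = np.array([0.5, 0.75, -1.5, -0.25])
W2 = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)  # AND of boundary pairs
b2 = np.array([-1.5, -1.5])
W3 = np.array([[1, 1]], dtype=float)                      # OR of the subnetworks
b3 = np.array([-0.5])

def classify(p):
    a1 = hardlim(W1 @ p + b1)     # four elementary boundaries
    a2 = hardlim(W2 @ a1 + b2)    # two convex regions (AND)
    return hardlim(W3 @ a2 + b3)  # union of the regions (OR)

print(classify(np.array([0.0, 0.0])))  # inside subnetwork 1's region -> [1.]
print(classify(np.array([2.0, 1.0])))  # inside subnetwork 2's region -> [1.]
print(classify(np.array([1.0, 0.0])))  # outside both -> [0.]
```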
Function Approximation & Network Capabilities
Function Approximation
Neural networks are intrinsically function approximators: we can train a NN to map real-valued vectors to real-valued vectors.

The function approximation capabilities of a simple network, as its parameters (weights and biases) vary, are illustrated on the next slides.

Function Approximation: Example
$$f^1(n) = \frac{1}{1 + e^{-n}}, \qquad f^2(n) = n$$
Nominal parameter values (layer number as superscript):
$$w^1_{1,1} = 10, \quad w^1_{2,1} = 10, \quad b^1_1 = -10, \quad b^1_2 = 10, \quad w^2_{1,1} = 1, \quad w^2_{1,2} = 1, \quad b^2 = 0$$
Nominal Response
(Figure: response of the 1-2-1 network with the nominal parameters, over $-2 \le p \le 2$.)
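
A short sketch (plotting omitted) that evaluates the nominal response, assuming the parameter values reconstructed above:

```python
import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

# Nominal parameters of the 1-2-1 network (sigmoid hidden, linear output)
w1 = np.array([10.0, 10.0])    # first-layer weights
b1 = np.array([-10.0, 10.0])   # first-layer biases
w2 = np.array([1.0, 1.0])      # second-layer weights
b2 = 0.0                       # output bias

p = np.linspace(-2, 2, 401)
a1 = sigmoid(np.outer(p, w1) + b1)  # hidden activations, shape (401, 2)
a2 = a1 @ w2 + b2                   # network output for each input
print(a2.min(), a2.max())           # response ranges from about 0 to about 2
```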
Parameter Variations
(Figures: network response over $-2 \le p \le 2$ as individual parameters are varied about their nominal values, e.g. $-1 \le w^2_{1,1} \le 1$, $-1 \le w^2_{1,2} \le 1$, $0 \le b^1_2 \le 20$, $-1 \le b^2 \le 1$.)
What would be the effect of varying the bias of the output neuron?
Performance Learning
Performance learning is a learning paradigm in which we adjust the network parameters (weights and biases) so as to optimize the performance of the network.

We need to define a performance index (e.g. mean square error), then search the parameter space to minimize the performance index with respect to the parameters.
Performance Index Example
Example: the performance index of a linear perceptron, defined as the mean square error over the input samples $x_p$, is
$$E(W) = \frac{1}{N} \sum_p (t_p - o_p)^2, \qquad o_p = W^T x_p$$
Performance surface
(Figure: performance surface $E(W)$ over two weights $w_1$, $w_2$.)
The idea is to find a minimum of the error function E in the space of weights.
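
As an illustrative sketch (the data and the "true" weights are made up), the performance surface can be evaluated on a grid of weight values:

```python
import numpy as np

# MSE of a linear unit o = w1*x1 + w2*x2 over a sample set,
# evaluated on a grid of (w1, w2) to visualize the performance surface.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (50, 2))
t = X @ np.array([1.5, -0.5])           # targets from a made-up linear map

w1, w2 = np.meshgrid(np.linspace(-2, 2, 81), np.linspace(-2, 2, 81))
E = np.zeros_like(w1)
for i in range(X.shape[0]):
    E += (t[i] - (w1 * X[i, 0] + w2 * X[i, 1])) ** 2
E /= X.shape[0]
print(E.min())                          # minimum near (w1, w2) = (1.5, -0.5)
```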
Performance Optimization
Iterative minimization techniques:
Define E(·) as the performance index.
Starting with an initial guess W(0), find W(n+1) at each iteration such that E(W(n+1)) < E(W(n)).
Basic Optimization Algorithm
Start with an initial guess $w_0$ and update the guess at each stage, moving along a search direction:
$$w_{k+1} = w_k + \alpha_k p_k \qquad \text{or} \qquad \Delta w_k = (w_{k+1} - w_k) = \alpha_k p_k$$
where $p_k$ is the search direction and $\alpha_k$ the learning rate.
Performance surface
The gradient of the performance surface is a vector (with the dimension of w) that:
points toward the direction of maximum change,
with a magnitude equal to the slope of the tangent of the performance surface.

A ball rolling down the hill will always roll in the direction opposite to the gradient arrow (steepest descent). The slope at the bottom is zero, so the gradient is also zero (that is the reason the ball stops there).
(Figure: performance surface with gradient arrows.)
Performance Optimization
Iterative minimization techniques: Steepest Descent
Successive adjustments to W are in the direction of steepest descent (the direction opposite to the gradient vector):
$$W(n+1) = W(n) - \eta\, g(n), \qquad \text{where } g(n) = \nabla E(W(n))$$
Steepest Descent: Matrix Form
Gradient:
$$\nabla F(x) = \begin{bmatrix} \dfrac{\partial F}{\partial x_1} \\ \vdots \\ \dfrac{\partial F}{\partial x_n} \end{bmatrix}$$
Hessian:
$$\nabla^2 F(x) = \begin{bmatrix} \dfrac{\partial^2 F}{\partial x_1^2} & \cdots & \dfrac{\partial^2 F}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 F}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 F}{\partial x_n^2} \end{bmatrix}$$

Directional Derivatives
The $i$th element of the gradient is the first derivative (slope) of $F(x)$ along the $x_i$ axis: $\partial F(x) / \partial x_i$.

The $(i,i)$ element of the Hessian is the second derivative (curvature) of $F(x)$ along the $x_i$ axis: $\partial^2 F(x) / \partial x_i^2$.

What is the derivative of a function along an arbitrary direction?

Directional Derivatives
The first derivative of $F(x)$ along a vector $p$ is the projection of the gradient onto $p$:
$$\frac{p^T \nabla F(x)}{\lVert p \rVert}$$
Which direction has the greatest slope? The one in which the inner product of the direction vector and the gradient is maximum: when the direction vector is the same as the gradient.
Two simple error surfaces (for 2 weights)
(Figures: surface and contour plots of two error surfaces over $-2 \le x_1, x_2 \le 2$.)

Directional Derivatives
$$F(x) = x_1^2 + 2x_1 x_2 + 2x_2^2$$
Example
$$F(x) = x_1^2 + 2x_1 x_2 + 2x_2^2, \qquad x^* = \begin{bmatrix} 0.5 \\ 0 \end{bmatrix}, \qquad p = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
$$\nabla F(x)\Big|_{x = x^*} = \begin{bmatrix} \partial F / \partial x_1 \\ \partial F / \partial x_2 \end{bmatrix}_{x = x^*} = \begin{bmatrix} 2x_1 + 2x_2 \\ 2x_1 + 4x_2 \end{bmatrix}_{x = x^*} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
$$\frac{p^T \nabla F(x)}{\lVert p \rVert} = \frac{\begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}}{\sqrt{2}} = \frac{0}{\sqrt{2}} = 0$$
So F has zero slope along the direction p at $x^*$.
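
A numerical check of this example:

```python
import numpy as np

# F(x) = x1^2 + 2*x1*x2 + 2*x2^2
grad = lambda x: np.array([2*x[0] + 2*x[1], 2*x[0] + 4*x[1]])

x_star = np.array([0.5, 0.0])
p = np.array([1.0, -1.0])

g = grad(x_star)                      # -> [1. 1.]
print(g, p @ g / np.linalg.norm(p))   # directional derivative -> 0.0
```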
Performance Optimization: Iterative Techniques Summary
Choose the next step so that the function decreases:
$$F(x_{k+1}) < F(x_k)$$
For small changes in x we can approximate F(x) using the Taylor series expansion:
$$F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k, \qquad \text{where } g_k \equiv \nabla F(x)\Big|_{x = x_k}$$
If we want the function to decrease, we must choose $p_k$ such that:
$$g_k^T \Delta x_k = \alpha_k\, g_k^T p_k < 0$$
We can maximize the decrease by choosing:
$$p_k = -g_k, \qquad \text{giving} \qquad x_{k+1} = x_k - \alpha_k g_k$$
Example
$$F(x) = x_1^2 + 2x_1 x_2 + 2x_2^2 + x_1, \qquad x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$
$$\nabla F(x) = \begin{bmatrix} \partial F / \partial x_1 \\ \partial F / \partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad g_0 = \nabla F(x)\Big|_{x = x_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$
With $\alpha = 0.1$:
$$x_1 = x_0 - \alpha g_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.1 \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix}$$
$$x_2 = x_1 - \alpha g_1 = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} - 0.1 \begin{bmatrix} 1.8 \\ 1.2 \end{bmatrix} = \begin{bmatrix} 0.02 \\ 0.08 \end{bmatrix}$$
(Figure: contour plot of F with the descent trajectory, over $-2 \le x_1, x_2 \le 2$.)
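
A few lines of NumPy reproduce these iterates:

```python
import numpy as np

# F(x) = x1^2 + 2*x1*x2 + 2*x2^2 + x1
grad = lambda x: np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

x = np.array([0.5, 0.5])
alpha = 0.1
for k in range(3):
    print("x_%d =" % k, x)
    x = x - alpha * grad(x)   # steepest descent step: x_{k+1} = x_k - alpha * g_k
# prints x_0 = [0.5 0.5], x_1 = [0.2 0.2], x_2 = [0.02 0.08]
```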
Steepest Descent
Show that steepest descent satisfies the condition for iterative descent: E(W(n+1)) < E(W(n)).

Using the Taylor series expansion:
$$E(W(n+1)) \approx E(W(n)) - \eta\, g^T(n)\, g(n) = E(W(n)) - \eta \lVert g(n) \rVert^2$$
The result follows since $\eta \lVert g(n) \rVert^2 > 0$ for $\eta > 0$.
Minima and Maxima
(skip and go to next section)

Strong (Local) Minimum: the point $x^*$ is a strong minimum of F(x) if a scalar $\delta > 0$ exists such that $F(x^*) < F(x^* + \Delta x)$ for all $\Delta x$ with $\delta > \lVert \Delta x \rVert > 0$.
Global Minimum: the point $x^*$ is a unique global minimum of F(x) if $F(x^*) < F(x^* + \Delta x)$ for all $\Delta x \ne 0$.
Weak Minimum: the point $x^*$ is a weak minimum of F(x) if it is not a strong minimum, and a scalar $\delta > 0$ exists such that $F(x^*) \le F(x^* + \Delta x)$ for all $\Delta x$ with $\delta > \lVert \Delta x \rVert > 0$.
Scalar Example
$$F(x) = 3x^4 - 7x^2 - \tfrac{1}{2}x + 6$$
(Figure: plot of F(x) over $-2 \le x \le 2$.)
What type of minima are these? The plot shows a strong minimum, a strong maximum, and the global minimum.
Vector Example
$$F(x) = (x_2 - x_1)^4 + 8x_1 x_2 - x_1 + x_2 + 3$$
(Figures: surface and contour plots over $-2 \le x_1, x_2 \le 2$.)
$$F(x) = (x_1^2 - 1.5x_1 x_2 + 2x_2^2)\, x_1^2$$
(Figures: surface and contour plots over $-2 \le x_1, x_2 \le 2$.)
Optimality Conditions
What conditions need to be satisfied at a minimum?
Show, using the Taylor series, that the necessary condition for a minimum point (strong or weak) is:
$$\nabla F(x)\Big|_{x = x^*} = 0$$
Gradient Descent
Delta Rule for Adaline (linear activation)
Backpropagation for MLP
(Figure: a single linear unit with output o and weights $w_0, \dots, w_n$.)
Gradient Descent: another slide explaining the same thing
Gradient: $\nabla E[w] = [\partial E / \partial w_0, \dots, \partial E / \partial w_n]$
Training rule: $\Delta w = -\eta\, \nabla E[w]$, moving from $(w_1, w_2)$ to $(w_1 + \Delta w_1,\, w_2 + \Delta w_2)$.
$$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i} \sum_p (t_p - o_p)^2 = \sum_p 2(t_p - o_p)\, \frac{\partial}{\partial w_i}(t_p - o_p) = \sum_p 2(t_p - o_p)\, \frac{\partial}{\partial w_i}\Big(t_p - \sum_j w_j x_{j,p}\Big) = -\sum_p 2(t_p - o_p)\, x_{i,p}$$
Stochastic Approximation to Steepest Descent
Instead of updating the weights only after all examples have been observed, we update on every example:
$$\Delta w_i = \eta\, (t - o)\, x_i \qquad \text{(not summing over all the patterns!)}$$
In this case we update the weights incrementally.

Remarks:
When there are multiple local minima, stochastic gradient descent may avoid the problem of getting stuck in a local minimum.
Standard gradient descent needs more computation per step but can be used with a larger step size.
Learning algorithm using the Delta Rule
1. Assign random values to the weight vector.
2. Repeat until the stopping condition is met (the error is small):
   a) Initialize each $\Delta w_i$ to zero.
   b) For each example p, accumulate the update: $\Delta w_i \mathrel{+}= (t_p - o_p)\, x_i$
   c) Update the weights: $w_i = w_i + \eta\, \Delta w_i$
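
A minimal sketch of this algorithm, assuming a linear unit and a small made-up regression task:

```python
import numpy as np

def delta_rule(X, t, eta=0.05, epochs=200):
    """Batch delta rule for a linear unit o = w . x (last input is a bias of 1)."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, X.shape[1])   # 1. random initial weights
    for _ in range(epochs):                  # 2. until the stopping condition
        dw = np.zeros_like(w)                #    a) zero the accumulated update
        for x_p, t_p in zip(X, t):           #    b) accumulate over all patterns
            dw += (t_p - w @ x_p) * x_p
        w += eta * dw                        #    c) apply the update
    return w

# Hypothetical usage: learn t = 2x - 1 from noiseless samples
X = np.column_stack([np.linspace(-1, 1, 20), np.ones(20)])   # [x, bias]
t = 2 * X[:, 0] - 1
print(delta_rule(X, t))   # approaches [2, -1]
```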
Difficulties with Gradient Descent
There are two main difficulties with the gradient descent
method:

1. Convergence to a minimum may take a long time.

2. There is no guarantee we will find the global minimum.



Backpropagation Algorithm
General Activation Function
Chain Rule
$$\frac{d f(n(w))}{dw} = \frac{d f(n)}{dn} \cdot \frac{d n(w)}{dw}$$
Example:
$$f(n) = \cos(n), \qquad n = e^{2w}, \qquad f(n(w)) = \cos(e^{2w})$$
$$\frac{d f(n(w))}{dw} = \frac{d f(n)}{dn} \cdot \frac{d n(w)}{dw} = (-\sin(n))\,(2e^{2w}) = -2e^{2w} \sin(e^{2w})$$
Application to Gradient Calculation
$$\frac{\partial \hat{F}}{\partial w^m_{i,j}} = \frac{\partial \hat{F}}{\partial n^m_i} \cdot \frac{\partial n^m_i}{\partial w^m_{i,j}}, \qquad \frac{\partial \hat{F}}{\partial b^m_i} = \frac{\partial \hat{F}}{\partial n^m_i} \cdot \frac{\partial n^m_i}{\partial b^m_i}$$
Transfer Function Derivatives
Sigmoid:
$$\frac{d}{dn}\left(\frac{1}{1 + e^{-n}}\right) = \frac{e^{-n}}{(1 + e^{-n})^2} = \left(1 - \frac{1}{1 + e^{-n}}\right) \left(\frac{1}{1 + e^{-n}}\right) = (1 - a)\, a$$
Linear:
$$\frac{d}{dn}(n) = 1$$
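
A quick numerical check of the sigmoid derivative identity $f'(n) = (1 - a)\,a$:

```python
import numpy as np

sigmoid = lambda n: 1.0 / (1.0 + np.exp(-n))

n = 0.7
a = sigmoid(n)
analytic = (1 - a) * a                                     # f'(n) = (1 - a) a
numeric = (sigmoid(n + 1e-6) - sigmoid(n - 1e-6)) / 2e-6   # central difference
print(analytic, numeric)                                   # agree to ~1e-10
```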
Backpropagation
To calculate the partial derivative of $E_p$ (the error on pattern p) w.r.t. a given weight $w_{ji}$, we have to consider whether this is the weight of an output or a hidden node.

If $w_{ji}$ is an output node weight:
$$\frac{dE_p}{dw_{ji}} = \frac{dE}{do_j}\, \frac{do_j}{dnet_j}\, \frac{dnet_j}{dw_{ji}} = -(t_j - o_j)\, f'(net_j)\, o_i$$
where $net_j = \sum_i w_{ji}\, o_i$, $o_j = f(net_j)$, and $E_p = (t_p - o_p)^2$ (the factor of 2 is absorbed into the learning rate).
Note that $o_i$ is the input to node j.
Backpropagation
If $w_{ji}$ is a hidden node weight:
$$\frac{dE_p}{dw_{ji}} = \frac{dE}{do_j}\, \frac{do_j}{dnet_j}\, \frac{dnet_j}{dw_{ji}} = \frac{dE}{do_j}\, f'(net_j)\, o_i$$
again with $net_j = \sum_i w_{ji}\, o_i$ and $o_j = f(net_j)$.

Note that as j is a hidden node, we do not know its target. Hence $dE/do_j$ can only be calculated through j's contribution, via the weights $w_{kj}$, to the derivative of E w.r.t. $net_k$ at the output nodes:
$$\frac{dE}{do_j} = \sum_k \frac{dE}{dnet_k}\, w_{kj}$$
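
Putting the two cases together, here is a minimal per-pattern backpropagation sketch for a 2-2-1 sigmoid network trained on XOR. This is an illustration, not the slides' code; as above, the factor of 2 from $E_p = (t_p - o_p)^2$ is absorbed into the learning rate.

```python
import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def forward(x, W1, W2):
    h = sigmoid(W1 @ np.append(x, 1.0))   # hidden activations (bias appended)
    o = sigmoid(W2 @ np.append(h, 1.0))   # output activation
    return h, o

rng = np.random.default_rng(0)
W1 = rng.uniform(-1, 1, (2, 3))           # hidden weights, last column = bias
W2 = rng.uniform(-1, 1, 3)                # output weights, last entry = bias
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 1., 1., 0.])
eta = 0.5

for _ in range(20000):                    # per-pattern (stochastic) updates
    for x, t in zip(X, T):
        h, o = forward(x, W1, W2)
        delta_o = (t - o) * o * (1 - o)           # output node: (t - o) f'(net)
        delta_h = delta_o * W2[:2] * h * (1 - h)  # hidden: sum_k delta_k w_kj f'(net_j)
        W2 += eta * delta_o * np.append(h, 1.0)
        W1 += eta * np.outer(delta_h, np.append(x, 1.0))

# Typically converges to outputs near [0, 1, 1, 0]; with only two hidden
# units, some random initializations can stall in a local minimum.
print([round(float(forward(x, W1, W2)[1]), 2) for x in X])
```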
Backpropagation Algorithm: Matrix Format
(skip; go to next section)
Network Complexity
Choice of Architecture
$$g(p) = 1 + \sin\!\left(\frac{i\pi}{4}\, p\right)$$
(Figures: responses of a 1-3-1 network (1 input, 3 hidden, 1 output nodes) for i = 1, 2, 4, 8, over $-2 \le p \le 2$.)
Choice of Network Architecture
$$g(p) = 1 + \sin\!\left(\frac{6\pi}{4}\, p\right)$$
(Figures: approximations by 1-2-1, 1-3-1, 1-4-1, and 1-5-1 networks, over $-2 \le p \le 2$.)
The residual error decreases as O(1/M), where M is the number of hidden units.
Convergence in Time
$$g(p) = 1 + \sin(\pi p)$$
(Figures: network response at successive stages 0-5 of training, over $-2 \le p \le 2$.)
Generalization
Training set: $\{p_1, t_1\}, \{p_2, t_2\}, \dots, \{p_Q, t_Q\}$
$$g(p) = 1 + \sin\!\left(\frac{\pi}{4}\, p\right), \qquad p = -2, -1.6, -1.2, \dots, 1.6, 2$$
(Figures: responses of 1-2-1 and 1-9-1 networks trained on these samples, over $-2 \le p \le 2$.)
Next: Issues and Variations on Backpropagation
