Lecture-2
Learning Process
Artificial Neural Network - Building Blocks
– Feedforward Network
– Feedback Network
[Figure: single-layer feedforward network, with an input layer connected directly to an output layer.]
Different Network Topologies
• Multi-layer feed-forward networks
– One or more hidden layers; a layer receives input only from previous layers
[Figure: 2-layer (1-hidden-layer) fully connected network, with an input layer, a hidden layer, and an output layer.]
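As a concrete illustration of this topology, here is a minimal numpy sketch of a 1-hidden-layer fully connected network; the layer sizes and the tanh nonlinearity are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# 1-hidden-layer fully connected network. Sizes (4 inputs, 3 hidden
# units, 2 outputs) are illustrative assumptions.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # input layer -> hidden layer weights
b1 = np.zeros(3)
W2 = rng.normal(size=(2, 3))   # hidden layer -> output layer weights
b2 = np.zeros(2)

def forward(x):
    h = np.tanh(W1 @ x + b1)   # hidden layer with a squashing nonlinearity
    return W2 @ h + b2         # output layer (linear here)

print(forward(np.ones(4)))
```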
Different Network Topologies
• Recurrent networks
– Contain at least one feedback connection, so outputs are fed back into the network
[Figure: recurrent network with an input layer and an output layer, plus feedback connections.]
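A minimal sketch of the feedback idea, assuming a single state vector that is fed back at every step; the sizes and the tanh nonlinearity are illustrative assumptions.

```python
import numpy as np

# One step of a recurrent network: the state computed at time t is
# fed back as an input at time t+1. Sizes and tanh are assumptions.
rng = np.random.default_rng(1)
W_in = rng.normal(size=(3, 2))   # input-to-state weights
W_fb = rng.normal(size=(3, 3))   # feedback (state-to-state) weights

def step(x, s_prev):
    return np.tanh(W_in @ x + W_fb @ s_prev)

s = np.zeros(3)                                   # initial state
for x in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    s = step(x, s)                                # output depends on history
print(s)
```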
Different Network Topologies
• Lattice networks
– Lattice networks consist of a 1-D, 2-D, or n-D array of neurons with a corresponding set of input nodes
– Each input is connected to all neurons in the array
– Such networks are feedforward networks whose neurons are arranged in an array
Different Network Topologies
• Lattice networks
– An example of a 1-D lattice network:
[Figure: 1-D lattice network, a row of neurons each connected to every input node.]
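Because every input feeds every neuron in the array, a 1-D lattice with n neurons and m inputs reduces to a feedforward layer with an n × m weight matrix; a minimal sketch with assumed sizes:

```python
import numpy as np

# 1-D lattice: a row of n neurons, each connected to all m inputs,
# so the whole lattice is one n x m weight matrix. Sizes are assumed.
m, n = 4, 3
rng = np.random.default_rng(2)
W = rng.normal(size=(n, m))      # one weight row per lattice neuron

def lattice_forward(x):
    return W @ x                 # every neuron sees the full input vector

print(lattice_forward(np.ones(m)))   # one output per neuron in the array
```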
2- Adjustments of Weights (Learning)
[Figure: handwritten digit recognition as a learning problem. The input image has 16 × 16 = 256 pixels, encoded as x1 … x256 with ink → 1 and no ink → 0. The output is y1 … y10, where each dimension represents the confidence of a digit.]
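A minimal sketch of the input encoding just described; the 0.5 ink threshold is an illustrative assumption:

```python
import numpy as np

# Encode a 16 x 16 image as the 256-dimensional input vector:
# ink -> 1, no ink -> 0. The 0.5 ink threshold is an assumption.
image = np.random.default_rng(3).random((16, 16))   # stand-in for a scanned digit
x = (image > 0.5).astype(float).reshape(256)        # x1 ... x256
print(x.shape, x.min(), x.max())                    # (256,) 0.0 1.0
```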
Learning Paradigms
– Hebbian rule
– Competitive learning
– Boltzmann learning
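As one concrete example, the Hebbian rule strengthens a weight in proportion to the correlated activity of a neuron's input and output; a minimal sketch, with the learning rate and vector size as illustrative assumptions:

```python
import numpy as np

# Hebbian rule: delta_w = eta * x * y, so weights grow where input
# activity and output activity are correlated. eta and the vector
# size are illustrative assumptions.
rng = np.random.default_rng(4)
eta = 0.1
w = rng.normal(scale=0.1, size=3)    # small random initial weights
x = np.array([1.0, 0.0, 1.0])        # a repeatedly presented input

for _ in range(10):
    y = w @ x                        # linear neuron output
    w += eta * x * y                 # Hebbian update (no teacher signal)
print(w)                             # weights on the active inputs dominate
```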
3- Activation Functions
• Also sometimes referred to as threshold functions or
squashing functions
• Map a PE's (processing element's) domain to a prespecified range
• Most common functions are:
– linear function
– step function
– ramp function
– sigmoid function
– Gaussian function
• All except linear introduce a nonlinearity
3- Activation Functions
• Handwritten digit recognition
– A neural network maps the set of inputs to a set of outputs, i.e. 𝑓: 𝑅^256 → 𝑅^10
[Figure: a machine takes inputs x1 … x256 and produces outputs y1 … y10 via 𝑓: 𝑅^256 → 𝑅^10; for the example image, the machine outputs "2".]
The function 𝑓 is represented by a neural network
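A minimal sketch of such a function realized as a single fully connected layer; the softmax output and the random weights are illustrative assumptions rather than the slides' construction:

```python
import numpy as np

# f: R^256 -> R^10 as a single fully connected layer. The softmax
# turns the ten outputs into confidences that sum to 1. Random
# weights here; learning would adjust them.
rng = np.random.default_rng(5)
W = rng.normal(scale=0.01, size=(10, 256))
b = np.zeros(10)

def f(x):
    z = W @ x + b
    e = np.exp(z - z.max())      # numerically stable softmax
    return e / e.sum()           # y1 ... y10, one confidence per digit

x = (rng.random(256) > 0.5).astype(float)   # a stand-in binary image
y = f(x)
print(y.argmax(), round(y.sum(), 6))        # predicted digit, sum == 1.0
```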
Linear Function
f(x) = x
Step Function
f(x) = γ,  if x ≥ 0
     = −δ, if x < 0

Binary step function:
f(x) = 1, if x ≥ 0
     = 0, otherwise

The bipolar step function replaces the 0 with −1.
Ramp Function
f(x) = γ,  if x ≥ γ
     = x,  if |x| < γ
     = −γ, if x ≤ −γ
Sigmoid Function
f(x) = 1 / (1 + e^(−αx))

As α approaches infinity, f(x) approaches a step function.
Gaussian Function
f(x) = exp(−x² / v)
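A minimal sketch of the five functions above, with γ, δ, α, and v as the parameters; the sample values are illustrative assumptions. Note how a large α makes the sigmoid behave like the step function:

```python
import numpy as np

# The five common activation functions above. Parameter values
# (gamma, delta, alpha, v) are illustrative assumptions.
def linear(x):
    return x

def step(x, gamma=1.0, delta=1.0):
    return np.where(x >= 0, gamma, -delta)

def ramp(x, gamma=1.0):
    return np.clip(x, -gamma, gamma)     # linear in the middle, saturates

def sigmoid(x, alpha=1.0):
    return 1.0 / (1.0 + np.exp(-alpha * x))

def gaussian(x, v=1.0):
    return np.exp(-x**2 / v)

x = np.linspace(-2.0, 2.0, 5)
print(sigmoid(x, alpha=50.0))   # large alpha: nearly 0/1, like the step
```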
Supervised Learning
– Perceptrons
– Multilayer Perceptrons.
Perceptron
• The perceptron is the basic operational unit of artificial neural networks.
• It employs a supervised learning rule and is able to classify data into two classes.
• The perceptron is the simplest form of a neural
network.
• It consists of a single neuron with adjustable synaptic
weights and a hard limiter.
Single-layer two-input perceptron
[Figure: inputs x1 and x2, weighted by w1 and w2, feed a linear combiner followed by a hard limiter with threshold θ, producing output Y.]
Perceptron
[Figure: (a) Two-input perceptron: the line x1·w1 + x2·w2 − θ = 0 separates class A1 from class A2 in the (x1, x2) plane. (b) Three-input perceptron: the separating plane is x1·w1 + x2·w2 + x3·w3 − θ = 0.]
How does the perceptron learn its
classification tasks?
This is done by making small adjustments in the
weights to reduce the difference between the actual
and desired outputs of the perceptron. The initial
weights are randomly assigned, usually in the range
[−0.5, 0.5], and then updated to obtain the output
consistent with the training examples.
If at iteration p the actual output is Y(p) and the desired output is Yd(p), then the error is given by:
e(p) = Yd(p) − Y(p), where p = 1, 2, 3, . . .
The weights are then updated so as to reduce this error. A standard form of the perceptron learning rule is:
wi(p + 1) = wi(p) + α · xi(p) · e(p)
where α is the learning rate, a positive constant less than unity. It dictates how quickly the network converges and is set by experimentation; a typical value is 0.1.
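A minimal sketch of this training procedure, learning the AND operation: the initial weight range [−0.5, 0.5] and α = 0.1 follow the text above, while the rest (hard limiter with a learned threshold, epoch count, the AND task) is an illustrative assumption.

```python
import numpy as np

# Perceptron training with the rule above: w(p+1) = w(p) + alpha*x*e.
# Initial weights in [-0.5, 0.5] and alpha = 0.1 follow the text;
# the task (learning AND) and the epoch count are assumptions.
rng = np.random.default_rng(6)
w = rng.uniform(-0.5, 0.5, size=2)
theta = rng.uniform(-0.5, 0.5)           # threshold, trained like a weight
alpha = 0.1

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Yd = np.array([0, 0, 0, 1], dtype=float)     # desired outputs for AND

for epoch in range(50):
    for x, yd in zip(X, Yd):
        y = 1.0 if w @ x - theta >= 0 else 0.0   # hard limiter output Y(p)
        e = yd - y                               # e(p) = Yd(p) - Y(p)
        w += alpha * x * e                       # perceptron learning rule
        theta -= alpha * e                       # threshold input is fixed at -1

print([1.0 if w @ x - theta >= 0 else 0.0 for x in X])   # [0.0, 0.0, 0.0, 1.0]
```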
[Figure: input-space plots of (a) AND, (b) OR, and (c) Exclusive-OR in the (x1, x2) plane, with inputs 0 and 1 on each axis.]
A perceptron can learn the operations AND and OR, but not
Exclusive-OR.
Example 2:
Consider the following perceptron with θ = 1.5.
Note that:
Anything below or on the line is class 0.
Anything above the line is class 1.
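The perceptron figure is not reproduced above, so the weights below are hypothetical; assuming the common textbook choice w1 = w2 = 1 with θ = 1.5, the decision line is x1 + x2 − 1.5 = 0 and the perceptron computes AND:

```python
# Hypothetical weights for the theta = 1.5 example (the slide's figure
# is not reproduced, so w1 = w2 = 1 is an assumption).
def perceptron(x1, x2, w1=1.0, w2=1.0, theta=1.5):
    # On or below the line x1*w1 + x2*w2 - theta = 0 -> class 0;
    # above it -> class 1, matching the note above.
    return 1 if x1 * w1 + x2 * w2 - theta > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron(x1, x2))   # only (1, 1) is class 1
```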
A Nice Property of Perceptrons
What would happen if the input signals were to degrade somewhat?
What would happen if the weights were slightly wrong?
Neural networks are often still able to give us the right answer, if there is a
small amount of degradation.
Compare this with classical computing approaches!
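A small demonstration of this graceful degradation, reusing the hypothetical AND perceptron above and adding Gaussian noise to its inputs; the noise level is an assumption.

```python
import numpy as np

# Slightly degrade the inputs of the AND perceptron above and check
# how often the classification survives. Noise scale 0.1 is assumed.
rng = np.random.default_rng(7)

def perceptron(x1, x2, w1=1.0, w2=1.0, theta=1.5):
    return 1 if x1 * w1 + x2 * w2 - theta > 0 else 0

cases = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
errors = 0
for _ in range(1000):
    x1, x2, target = cases[rng.integers(4)]
    n1, n2 = rng.normal(scale=0.1, size=2)        # degraded input signals
    errors += perceptron(x1 + n1, x2 + n2) != target
print(errors, "errors in 1000 noisy trials")      # typically zero or near it
```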
Exercise
No one-size-fits-all formula.
Overfitting can occur if a "good" training set is not chosen.
What constitutes a "good" training set?
Samples must represent the general population.
Samples must contain members of each class.
Samples in each class must contain a wide range of variations and noise effects.
The size of the training set is related to the number of neurons.