Supervised Learning Network

The document provides an overview of supervised learning networks, focusing on perceptron networks and their learning algorithms. It discusses the architecture, training process, and various types of neural networks including multilayer perceptrons, ADALINE, MADALINE, and backpropagation networks. Additionally, it covers gradient descent methods, activation functions, and applications of backpropagation networks in various fields.

SUPERVISED LEARNING NETWORK
DEFINITION OF SUPERVISED LEARNING NETWORKS
 Training and test data sets: the data are divided into a training set and a test set.
 Training set: both the input and the target output are specified for every pattern.
PERCEPTRON NETWORKS
Architecture
 Three units: a sensory unit, an associator unit and a response unit.
 Binary/bipolar activation function with activations +1, 0 and -1.
 Linear threshold unit (LTU): with x0 = 1 as the fixed bias input, the unit computes the net input
   net = Σ wi xi  (i = 0 to n)
 and produces the output
   o = f(net) = +1 if Σ wi xi > 0, and -1 otherwise.
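As an illustration, a minimal sketch of the LTU output computation (Python and the function name are assumptions, not from the slides; x0 = 1 is the fixed bias input as above):

def ltu_output(x, w):
    # x: inputs x1..xn; w: weights w0..wn, where w0 multiplies the fixed bias input x0 = 1.
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))   # net = sum_{i=0..n} wi * xi
    return 1 if net > 0 else -1                             # +1 if the net input exceeds 0, else -1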
PERCEPTRON LEARNING
 Weight update rule:
   wi = wi + Δwi
   Δwi = η (t - o) xi
 where
   t = c(x) is the target value,
   o is the perceptron output,
   η is a small constant (e.g., 0.1) called the learning rate.
 If the output is correct (t = o), the weights wi are not changed.
 If the output is incorrect (t ≠ o), the weights wi are changed such that the output of the perceptron for the new weights is closer to t.
 The algorithm converges to the correct classification
• if the training data are linearly separable, and
• η is sufficiently small.
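A minimal sketch of this update rule (Python assumed; the function name is illustrative):

def perceptron_update(w, x, t, o, eta=0.1):
    # delta_wi = eta * (t - o) * xi; when t == o the term is zero and the weights stay unchanged.
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]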
PERCEPTRON TRAINING (FLOWCHART SUMMARY)
 Initialize the weights and bias; set the learning rate α (0 < α ≤ 1).
 For each training pair s:t
• Activate the input units: xi = si.
• Calculate the net input yin and apply the activation function: y = f(yin).
• If y ≠ t, update wi(new) = wi(old) + α t xi and b(new) = b(old) + α t; otherwise keep w(new) = w(old) and b(new) = b(old).
 If a complete pass produces no weight changes, stop; otherwise repeat.
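A sketch of this whole training loop under the same assumptions (Python, illustrative names); it stops when an epoch produces no weight changes:

def train_perceptron(samples, targets, alpha=0.1, max_epochs=100):
    # samples: input vectors s; targets: corresponding target values t (+1 / -1).
    w = [0.0] * len(samples[0])        # weights
    b = 0.0                            # bias
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(samples, targets):
            y_in = b + sum(wi * xi for wi, xi in zip(w, x))   # net input
            y = 1 if y_in > 0 else -1                         # activation
            if y != t:                                        # update only on a wrong output
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:                # no weight changes in a full pass: stop
            break
    return w, b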
LEARNING ALGORITHM
 Epoch: one presentation of the entire training set to the neural network. In the case of the AND function, an epoch consists of four input patterns being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]).
 Error: the amount by which the value output by the network differs from the target value. For example, if we require the network to output 0 and it outputs 1, then Error = -1.
 Target value, T: when training a network we present it not only with the input but also with the value we require the network to produce. For example, if we present the network with [1,1] for the AND function, the target value is 1.
 Output, O: the output value from the neuron.
 Ij: the inputs presented to the neuron.
 Wj: the weight from input neuron Ij to the output neuron.
 LR: the learning rate, which dictates how quickly the network converges. It is set by experimentation and is typically 0.1.
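As a usage illustration of these terms (epoch, target value, learning rate), one could run the train_perceptron sketch from the flowchart section on the AND function; the bipolar targets are an assumption chosen to match the +1/-1 activation:

# The four input patterns of one AND epoch, with bipolar targets (+1 only for [1,1]).
and_inputs  = [[0, 0], [0, 1], [1, 0], [1, 1]]
and_targets = [-1, -1, -1, 1]

w, b = train_perceptron(and_inputs, and_targets, alpha=0.1)
for x in and_inputs:
    y_in = b + sum(wi * xi for wi, xi in zip(w, x))
    print(x, "->", 1 if y_in > 0 else -1)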
TRAINING ALGORITHM
 Adjust the neural network weights to map inputs to outputs.
 Use a set of sample patterns where the desired output (given the inputs presented) is known.
 The purpose is to learn to
• recognize features which are common to good and bad exemplars.
MULTILAYER PERCEPTRON
 Architecture (figure): input signals (external stimuli) enter at the input layer and pass through layers of adjustable weights; the output layer produces the output values.
LAYERS IN NEURAL NETWORK
 The input layer:
• introduces the input values into the network;
• applies no activation function or other processing.
 The hidden layer(s):
• perform classification of features;
• in principle, two hidden layers are sufficient to solve any problem;
• more complex features may mean that additional layers work better.
 The output layer:
• functionally it is just like the hidden layers;
• its outputs are passed on to the world outside the neural network.
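A minimal structural sketch of these layer roles (Python, the function names and the use of a sigmoid activation are illustrative assumptions, not from the slides): the input layer passes values straight through, while each hidden or output layer applies weights, a bias and an activation.

import math

def dense_layer(inputs, weights, biases):
    # One hidden or output layer: weighted sums plus bias, then a sigmoid activation.
    nets = [b + sum(w * x for w, x in zip(ws, inputs)) for ws, b in zip(weights, biases)]
    return [1.0 / (1.0 + math.exp(-n)) for n in nets]

def forward(x, hidden_w, hidden_b, output_w, output_b):
    # The input layer applies no processing: x enters the network unchanged.
    h = dense_layer(x, hidden_w, hidden_b)        # hidden layer: classification of features
    return dense_layer(h, output_w, output_b)     # output layer: results passed to the outside world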
ADAPTIVE LINEAR NEURON (ADALINE)
 In 1959, Bernard Widrow and Marcian Hoff of Stanford developed models they called ADALINE (Adaptive Linear Neuron) and MADALINE (Multilayer ADALINE). These models were named for their use of Multiple ADAptive LINear Elements. MADALINE was the first neural network to be applied to a real-world problem: an adaptive filter which eliminates echoes on phone lines.
ADALINE MODEL (FLOWCHART SUMMARY)
 Initialize the weights, bias and learning rate α; input the specified tolerance error Es.
 For each training pair s:t
• Activate the input units: xi = si.
• Calculate the net input: yin = b + Σ xi wi.
• Update the weights and bias: wi(new) = wi(old) + α (t - yin) xi, b(new) = b(old) + α (t - yin).
 Calculate the total error Ei = Σ (t - yin)²; if Ei has reached the tolerance Es, stop, otherwise repeat.
ADALINE LEARNING RULE
 The ADALINE network uses the delta learning rule, also called the Widrow-Hoff learning rule or the least mean square (LMS) rule. The delta rule for adjusting the weights is given as (i = 1 to n):
   wi(new) = wi(old) + α (t - yin) xi
   b(new) = b(old) + α (t - yin)
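A minimal sketch of ADALINE training with this delta rule (Python assumed; names illustrative), following the flowchart above: the net input yin is used directly in the update, and training stops when the total squared error reaches the tolerance Es.

def train_adaline(samples, targets, alpha=0.1, tol=0.05, max_epochs=1000):
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        error = 0.0
        for x, t in zip(samples, targets):
            y_in = b + sum(wi * xi for wi, xi in zip(w, x))             # net input (no threshold while training)
            w = [wi + alpha * (t - y_in) * xi for wi, xi in zip(w, x)]  # wi(new) = wi(old) + alpha*(t - yin)*xi
            b += alpha * (t - y_in)                                     # b(new) = b(old) + alpha*(t - yin)
            error += (t - y_in) ** 2                                    # Ei = sum (t - yin)^2
        if error <= tol:                                                # Ei has reached the tolerance Es
            break
    return w, b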
USING ADALINE NETWORKS
 Initialize
• Assign random weights to all links.
 Training
• Feed in known inputs in random sequence.
• Simulate the network.
• Compute the error between the target and the output (error function).
• Adjust the weights (learning function).
• Repeat until the total error < ε.
 Thinking
• Simulate the network.
• The network will respond to any input.
• A correct solution is not guaranteed, even for trained inputs.
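A hedged usage example of this initialize / train / think workflow, reusing the train_adaline sketch above (the bipolar AND data and the tolerance value are assumptions; for bipolar AND the minimum achievable total squared error of a linear unit is 1.0, so ε is set just above it):

# Training: bipolar AND data.
inputs  = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
targets = [-1, -1, -1, 1]
w, b = train_adaline(inputs, targets, alpha=0.1, tol=1.1)

# Thinking: simulate the network on any input by thresholding the net input.
def recall(x, w, b):
    y_in = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if y_in >= 0 else -1

print([recall(x, w, b) for x in inputs])   # a correct answer is not guaranteed in general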
MADALINE NETWORK
 MADALINE is a Multilayer Adaptive Linear Element. It was the first neural network to be applied to a real-world problem and is used in several adaptive filtering processes.
BACK PROPAGATION NETWORK
 A training procedure which allows multilayer feedforward neural networks to be trained.
 Can theoretically perform "any" input-output mapping.
 Can learn to solve linearly inseparable problems.
MULTILAYER FEEDFORWARD NETWORK
 Figure: a fully connected feedforward network with inputs I0-I3, hidden units h0-h2 and outputs o0-o1; signals flow from the input layer through the hidden layer to the output layer.
MULTILAYER FEEDFORWARD NETWORK: ACTIVATION AND TRAINING
 For feedforward networks:
• A continuous activation function can be differentiated, allowing gradient descent.
• Back propagation is an example of a gradient-descent technique.
• It uses a sigmoid (binary or bipolar) activation function.
• In multilayer networks, the activation function is usually more complex than a simple threshold function, e.g. 1/[1 + exp(-x)] or 2/[1 + exp(-x)] - 1, to allow for inhibition, etc.

"Principles of Soft Computing, 2nd Edition" by S.N. Sivanandam & S.N. Deepa
Copyright © 2011 Wiley India Pvt. Ltd. All rights reserved.
GRADIENT DESCENT
 Gradient-Descent(training_examples, η)
 Each training example is a pair of the form <(x1,…,xn), t>, where (x1,…,xn) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1).
 Initialize each wi to some small random value.
 Until the termination condition is met, do
• Initialize each Δwi to zero.
• For each <(x1,…,xn), t> in training_examples, do
   Input the instance (x1,…,xn) to the linear unit and compute the output o.
   For each linear unit weight wi, do
     Δwi = Δwi + η (t - o) xi
• For each linear unit weight wi, do
   wi = wi + Δwi
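A direct transcription of this batch procedure into Python (an implementation assumption; the names are illustrative):

import random

def gradient_descent(training_examples, eta=0.1, epochs=50):
    # training_examples: list of (x, t) pairs; x includes the bias input x0 = 1.
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]   # small random initial weights
    for _ in range(epochs):                               # termination condition: fixed number of epochs here
        delta_w = [0.0] * n                               # initialize each delta_wi to zero
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))      # linear unit output
            delta_w = [dw + eta * (t - o) * xi for dw, xi in zip(delta_w, x)]
        w = [wi + dw for wi, dw in zip(w, delta_w)]       # wi = wi + delta_wi
    return w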
MODES OF GRADIENT DESCENT
 Batch mode: gradient descent over the entire data set D
   w = w - η ∇ED[w], where ED[w] = ½ Σd∈D (td - od)²
 Incremental mode: gradient descent over individual training examples d
   w = w - η ∇Ed[w], where Ed[w] = ½ (td - od)²
 Incremental gradient descent can approximate batch gradient descent arbitrarily closely if η is small enough.
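For contrast with the batch sketch above, an incremental (per-example) step might look like this (an illustrative sketch, not from the slides):

def incremental_step(w, x, t, eta=0.1):
    # Update immediately after a single example d, using E_d[w] = 1/2 (t_d - o_d)^2.
    o = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]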
SIGMOID ACTIVATION FUNCTION
 A sigmoid unit computes (with x0 = 1 as the bias input)
   net = Σ wi xi (i = 0 to n) and o = σ(net) = 1/(1 + e^(-net))
 σ(x) is the sigmoid function 1/(1 + e^(-x)), with derivative
   dσ(x)/dx = σ(x) (1 - σ(x))
 Gradient-descent rules can be derived to train:
• one sigmoid unit:
   ∂E/∂wi = -Σd (td - od) od (1 - od) xi,d
• multilayer networks of sigmoid units: backpropagation.
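A small sketch of the sigmoid and its derivative as used in these rules (Python assumed):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))       # sigma(x) = 1 / (1 + e^-x)

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)                    # d sigma(x)/dx = sigma(x) * (1 - sigma(x))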
BACKPROPAGATION TRAINING ALGORITHM
 Initialize each wi,j to some small random value.
 Until the termination condition is met, do
• For each training example <(x1,…,xn), t>, do
   Input the instance (x1,…,xn) to the network and compute the network outputs ok.
   For each output unit k:
     δk = ok (1 - ok)(tk - ok)
   For each hidden unit h:
     δh = oh (1 - oh) Σk wh,k δk
   For each network weight wi,j, do
     wi,j = wi,j + Δwi,j, where Δwi,j = η δj xi,j
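A minimal sketch of one backpropagation step for a network with a single hidden layer of sigmoid units, following the delta formulas above (Python assumed; bias terms are omitted to keep the sketch small):

import math

def backprop_step(x, t, w_hidden, w_output, eta=0.25):
    # w_hidden[h]: weights from the inputs to hidden unit h
    # w_output[k]: weights from the hidden units to output unit k
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))

    # Forward pass: compute hidden and output activations.
    o_hidden = [sig(sum(w * xi for w, xi in zip(wh, x))) for wh in w_hidden]
    o_out = [sig(sum(w * oh for w, oh in zip(wk, o_hidden))) for wk in w_output]

    # Output-unit error terms: delta_k = o_k (1 - o_k)(t_k - o_k)
    delta_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o_out, t)]

    # Hidden-unit error terms: delta_h = o_h (1 - o_h) * sum_k w_{h,k} delta_k
    delta_hidden = [oh * (1 - oh) * sum(w_output[k][h] * delta_out[k] for k in range(len(o_out)))
                    for h, oh in enumerate(o_hidden)]

    # Weight updates: w_ij = w_ij + eta * delta_j * x_ij
    for k, wk in enumerate(w_output):
        for h in range(len(wk)):
            wk[h] += eta * delta_out[k] * o_hidden[h]
    for h, wh in enumerate(w_hidden):
        for i in range(len(wh)):
            wh[i] += eta * delta_hidden[h] * x[i]
    return o_out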
BACKPROPAGATION
 Gradient descent over the entire network weight vector.
 Easily generalized to arbitrary directed graphs.
 Will find a local, not necessarily global, error minimum; in practice it often works well (it can be invoked multiple times with different initial weights).
 Often includes a weight momentum term α:
   Δwi,j(t) = η δj xi,j + α Δwi,j(t - 1)
 Minimizes the error over the training examples.
 Will it generalize well to unseen instances (over-fitting)?
 Training can be slow: typically 1,000-10,000 iterations (Levenberg-Marquardt can be used instead of plain gradient descent).
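A sketch of the momentum update written out (illustrative; alpha here is the momentum constant from the formula above, not the perceptron learning rate):

def momentum_update(w, grad_step, prev_delta, alpha=0.9):
    # delta_w(t) = eta * delta_j * x_ij (already computed in grad_step) + alpha * delta_w(t-1)
    delta = [g + alpha * p for g, p in zip(grad_step, prev_delta)]
    return [wi + d for wi, d in zip(w, delta)], delta   # keep delta for the next step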
APPLICATIONS OF BACKPROPAGATION NETWORK
 Load forecasting problems in power systems.
 Image processing.
 Fault diagnosis and fault detection.
 Gesture recognition, speech recognition.
 Signature verification.
 Bioinformatics.
 Structural engineering design (civil).
RADIAL BASIS FUNCTION NETWORK
 The radial basis function (RBF) network is a classification and functional approximation neural network developed by M.J.D. Powell.
 The network uses the most common nonlinearities such as sigmoidal and Gaussian kernel functions.
 The Gaussian functions are also used in regularization networks.
 The Gaussian function is generally defined as
   φ(x) = exp(-x²)  (more generally, exp(-‖x - c‖² / (2σ²)) for a centre c and width σ).

"Principles of Soft Computing, 2nd Edition" by S.N. Sivanandam & S.N. Deepa
Copyright © 2011 Wiley India Pvt. Ltd. All rights reserved.
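A small sketch of the Gaussian kernel commonly used as the RBF nonlinearity (Python assumed; the multivariate form with a centre c and width sigma goes beyond the slide text and is an assumption):

import math

def gaussian_rbf(x, center, sigma=1.0):
    # phi(x) = exp(-||x - c||^2 / (2 * sigma^2))
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))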