
CS 363: Neural Networks

Lecture-2
Learning Process
Artificial Neural Network - Building Blocks

• Processing of an ANN depends upon the following three building blocks:
1. Network Topology
2. Adjustments of Weights or Learning
3. Activation Functions
1. Network Topology

• A network topology is the arrangement of a network along with its nodes and connecting lines.

• According to its topology, an ANN can be classified as one of the following kinds:
– Feedforward Network
– Feedback Network
Feedforward Network

• The signal can only flow in one direction, from input to output.

• It may be divided into the following two types:
– Single layer feedforward network
– Multilayer feedforward network


Different Network Topologies
• Single layer feed-forward networks
– Input layer projecting into the output layer

[Figure: single-layer network — the input layer projects directly onto the output layer]
Different Network Topologies
• Multi-layer feed-forward networks
– One or more hidden layers. Input projects only from previous layers
onto a layer.

[Figure: 2-layer (1-hidden-layer) fully connected network — input layer, hidden layer, output layer]
Different Network Topologies
• Multi-layer feed-forward networks

[Figure: multilayer feed-forward network — input layer, hidden layers, output layer]
Feedback Network

• The signal can flow in both directions using loops.
• It may be divided into the following types:
– Recurrent networks
– Lattice structures
Different Network Topologies
• Recurrent networks
– Have at least one feedback connection
– Can have hidden neurons
– Feedback connections give recurrent networks additional representational power
– Feedback also makes network analysis more complex
Different Network Topologies
• Recurrent networks
– A network with feedback, where some of its inputs are connected to
some of its outputs (discrete time).

[Figure: recurrent network — input layer and output layer, with some outputs fed back to the inputs (discrete time)]
Different Network Topologies
• Lattice networks
– Lattice networks consist of 1-D, 2-D or n-D array of
neurons with a set of input nodes
– Each input is connected to all neurons in the array
– Such networks are feedforward networks where neurons
are arranged in an array
Different Network Topologies
• Lattice networks
– An example of 1-D lattice network
2- Adjustments of Weights (Learning)

• One of the most important features of an ANN is its ability to learn from the environment
• An ANN learns through an iterative process of adapting its synaptic weights and thresholds
• After each iteration, the ANN should have more knowledge about its environment
Definition of learning
• Learning is a process in which the unknown parameters of an ANN are adapted through a continuous process of stimulation from the environment
• It may also be defined as the process of learning to separate data samples into different classes by finding common features among samples of the same class
• For example, to train an ANN we use a set of training samples with known features, and to test it we use a separate set of testing samples the network has not seen before
An Example: OCR
• For character recognition, the input can be a vector of pixel values
(e.g. from a 16×16 pixel matrix), and the output the identity of a
digit (0–9)
• Input layer can have 16x16=256 inputs
• Output layer could have 10 neurons (one for each digit)
• ANN would be trained with pairs of known input and output
vectors (learning phase)
• After learning is completed, ANN could recognize previously
unseen digits
Example: Handwritten digit recognition

• Handwritten digit recognition (MNIST dataset)


– The intensity of each pixel is considered an input element
– Output is the class of the digit
[Figure: digit-recognition network — inputs x1 … x256, one per pixel of the 16×16 image (ink → 1, no ink → 0), feed outputs y1 … y10; each output dimension represents the confidence of one digit, e.g. y2 = 0.7 (“is 2”) when the image is “2”]
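The input/output dimensions described above can be sketched in code. This is a minimal illustration of the shapes only — the weights below are random and untrained, not the lecture's learned network:

```python
import random

random.seed(0)

# Hypothetical untrained single-layer network: 256 pixel inputs -> 10 digit scores.
# One weight per (output neuron, input pixel) pair, plus one bias per output.
W = [[random.uniform(-0.5, 0.5) for _ in range(256)] for _ in range(10)]
b = [0.0] * 10

# A 16x16 binary image flattened to 256 inputs: ink -> 1, no ink -> 0
x = [random.choice([0.0, 1.0]) for _ in range(256)]

# Each output is the weighted sum of all 256 pixels (plus its bias):
scores = [sum(wij * xj for wij, xj in zip(row, x)) + bi
          for row, bi in zip(W, b)]
predicted_digit = scores.index(max(scores))  # most confident of the 10 outputs

print(len(scores), predicted_digit)
```

Each of the 10 scores plays the role of one output neuron's confidence; training (covered below) would adjust W and b so the right neuron wins.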
Learning Paradigms

• Learning paradigms determine the relation of the ANN to its environment
• The method of setting the values of the weights (training) is an important characteristic of different neural nets
• Three basic learning paradigms are:
– Supervised learning
– Unsupervised learning
– Reinforcement learning
Learning Algorithms (Rules)

• Learning algorithms determine how the weight correction is computed:
– Error-correction learning

– Hebbian rule

– Widrow-Hoff rule or the delta rule

– Competitive learning

– Boltzmann learning
3- Activation Functions
• Also sometimes referred to as threshold functions or
squashing functions
• Map a PE's (processing element's) domain to a prespecified range
• Most common functions are:
– linear function
– step function
– ramp function
– sigmoid function
– Gaussian function
• All except the linear function introduce a nonlinearity
3- Activation Functions
• Handwritten digit recognition
– Each single neuron maps a set of inputs into one output number; the whole network maps a 256-dimensional input to a 10-dimensional output, i.e. f : R^256 → R^10

[Figure: the machine f : R^256 → R^10 maps inputs x1 … x256 to outputs y1 … y10, recognizing the image as “2”]

The function f is represented by a neural network
Linear Function

f ( x) =  x
Step Function

f(x) = γ    if x ≥ θ
f(x) = −γ   if x < θ

Binary step function:
f(x) = 1    if x ≥ 0
f(x) = 0    otherwise

The bipolar step function replaces 0 with −1.
Ramp Function

f(x) = γ    if x ≥ γ
f(x) = x    if −γ < x < γ
f(x) = −γ   if x ≤ −γ
Sigmoid Function

f(x) = 1 / (1 + e^(−αx))

As α approaches infinity, f(x) approaches a step function.
Gaussian Function

f(x) = exp(−x² / v)
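The five activation functions above can be sketched in Python. The parameter defaults (α, γ, θ, v) are illustrative values, not values fixed by the lecture:

```python
import math

def linear(x, alpha=1.0):
    # f(x) = alpha * x
    return alpha * x

def step(x, gamma=1.0, theta=0.0):
    # f(x) = gamma if x >= theta, else -gamma
    return gamma if x >= theta else -gamma

def binary_step(x):
    # f(x) = 1 if x >= 0, 0 otherwise
    return 1 if x >= 0 else 0

def ramp(x, gamma=1.0):
    # Linear between -gamma and gamma, saturated outside
    if x >= gamma:
        return gamma
    if x <= -gamma:
        return -gamma
    return x

def sigmoid(x, alpha=1.0):
    # f(x) = 1 / (1 + e^(-alpha * x))
    return 1.0 / (1.0 + math.exp(-alpha * x))

def gaussian(x, v=1.0):
    # f(x) = exp(-x^2 / v)
    return math.exp(-x * x / v)

print(linear(2.0), binary_step(-0.5), ramp(3.0),
      round(sigmoid(0.0), 2), gaussian(0.0))
```

As a quick check of the limit claim above, `sigmoid(x, alpha=100)` is already nearly 0 or 1 for any x of moderate size, mimicking the step function.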
Supervised Learning

• Supervised learning is characterized by the presence of a teacher.
Supervised Learning
• The teacher has knowledge in the form of input–output pairs used for training
• The error is the difference between the desired and obtained output for a given input vector
• ANN parameters change under the influence of the input vectors and error values
• The learning process is repeated until the ANN learns to imitate the teacher
• After learning is completed, the teacher is no longer required
and ANN can work without supervision
Supervised Learning

• The error function can be the mean square error, and it depends on the free parameters (weights and biases)
• Examples of supervised learning algorithms are:
– LMS (least-mean-square) algorithm
– BP (back-propagation) algorithm

• A disadvantage of supervised learning is that learning is not possible without a teacher
• The ANN can only learn based on the provided examples
Supervised Learning

• Supervised Learning in Neural Networks:

– Perceptrons

– Multilayer Perceptrons.
Perceptron
• Perceptron is the basic operational unit of artificial
neural networks.
• It employs a supervised learning rule and is able to classify data into two classes
• The perceptron is the simplest form of a neural
network.
• It consists of a single neuron with adjustable synaptic
weights and a hard limiter.
Single-layer two-input perceptron

[Figure: single-layer two-input perceptron — inputs x1 and x2, weighted by w1 and w2, feed a linear combiner Σ followed by a hard limiter with threshold θ, producing output Y]
Perceptron

The operation of Rosenblatt’s perceptron is based on the McCulloch–Pitts neuron model. The model consists of a linear combiner followed by a hard limiter.

The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and −1 if it is negative.
Perceptron
The aim of the perceptron is to classify inputs, x1, x2, . . ., xn, into one of two classes, say A1 and A2.
In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:

∑(i = 1..n) xi wi − θ = 0
Linear separability in the perceptrons
[Figure: linear separability in the perceptron — (a) two-input perceptron: the line x1w1 + x2w2 − θ = 0 separates Class A1 from Class A2 in the (x1, x2) plane; (b) three-input perceptron: the plane x1w1 + x2w2 + x3w3 − θ = 0 divides the (x1, x2, x3) space]
How does the perceptron learn its
classification tasks?
This is done by making small adjustments in the
weights to reduce the difference between the actual
and desired outputs of the perceptron. The initial
weights are randomly assigned, usually in the range
[−0.5, 0.5], and then updated to obtain the output
consistent with the training examples.
If at iteration p, the actual output is Y(p) and the
desired output is Yd (p), then the error is given by:

e(p) = Yd(p) − Y(p),   where p = 1, 2, 3, . . .

Iteration p here refers to the pth training example presented to the perceptron.
If the error, e(p), is positive, we need to increase
perceptron output Y(p), but if it is negative, we
need to decrease Y(p).
The perceptron learning rule
wi(p + 1) = wi(p) + α · xi(p) · e(p),   where p = 1, 2, 3, . . .

α is the learning rate, a positive constant less than unity. It dictates how quickly the network converges; it is set by experimentation and is typically about 0.1.

The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.
Perceptron’s training algorithm
Step 1: Initialisation
Set the initial weights w1, w2, …, wn and threshold θ to random numbers in the range [−0.5, 0.5].
Perceptron’s training algorithm
Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1:

Y(p) = step[ ∑(i = 1..n) xi(p) wi(p) − θ ]

where n is the number of perceptron inputs, and step is a step activation function.
Perceptron’s training algorithm
Step 3: Weight training
Update the weights of the perceptron:

wi(p + 1) = wi(p) + Δwi(p)

where Δwi(p) is the weight correction at iteration p.
The weight correction is computed by the delta rule:

Δwi(p) = α · xi(p) · e(p)
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and
repeat the process until convergence.
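Steps 1–4 can be sketched as a short program. The function and variable names are our own, the update follows the delta rule above, and a `round` is added only to keep the hand-computed 0.1-grid weights exact in floating point:

```python
def step(x):
    # Binary hard limiter: 1 if x >= 0, 0 otherwise
    return 1 if x >= 0 else 0

def train_perceptron(samples, weights, theta, alpha, max_epochs=100):
    """Steps 2-4 of the training algorithm, repeated until an error-free epoch.
    samples is a list of (input vector, desired output) pairs."""
    for _ in range(max_epochs):
        converged = True
        for x, yd in samples:
            # Step 2: activation, Y(p) = step[sum_i xi(p) wi(p) - theta]
            y = step(sum(xi * wi for xi, wi in zip(x, weights)) - theta)
            e = yd - y  # e(p) = Yd(p) - Y(p)
            if e != 0:
                converged = False
            # Step 3: delta rule, wi(p+1) = wi(p) + alpha * xi(p) * e(p)
            # (rounding keeps the 0.1-grid weights exact in floating point)
            weights = [round(wi + alpha * xi * e, 10)
                       for wi, xi in zip(weights, x)]
        if converged:  # Step 4: iterate until convergence
            break
    return weights

# Example 1 data: logical AND with theta = 0.2, alpha = 0.1, initial w = (0.3, -0.1)
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(and_data, weights=[0.3, -0.1], theta=0.2, alpha=0.1)
print(w)
```

Run on the AND data with the parameters of Example 1 below, this reproduces the table's trajectory and finishes with weights (0.1, 0.1) after the fifth epoch.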
Example 1: logical operation AND

Y(p) = step[ ∑(i = 1..n) xi(p) wi(p) − θ ]
e(p) = Yd(p) − Y(p)
wi(p + 1) = wi(p) + Δwi(p)
Δwi(p) = α · xi(p) · e(p)
step: f(x) = 1 if x ≥ 0, 0 otherwise

Epoch | Inputs x1 x2 | Desired Yd | Initial weights w1 w2 | Actual Y | Error e | Final weights w1 w2
1 | 0 0 | 0 | 0.3 −0.1 | 0 |  0 | 0.3 −0.1
1 | 0 1 | 0 | 0.3 −0.1 | 0 |  0 | 0.3 −0.1
1 | 1 0 | 0 | 0.3 −0.1 | 1 | −1 | 0.2 −0.1
1 | 1 1 | 1 | 0.2 −0.1 | 0 |  1 | 0.3  0.0
2 | 0 0 | 0 | 0.3  0.0 | 0 |  0 | 0.3  0.0
2 | 0 1 | 0 | 0.3  0.0 | 0 |  0 | 0.3  0.0
2 | 1 0 | 0 | 0.3  0.0 | 1 | −1 | 0.2  0.0
2 | 1 1 | 1 | 0.2  0.0 | 1 |  0 | 0.2  0.0
3 | 0 0 | 0 | 0.2  0.0 | 0 |  0 | 0.2  0.0
3 | 0 1 | 0 | 0.2  0.0 | 0 |  0 | 0.2  0.0
3 | 1 0 | 0 | 0.2  0.0 | 1 | −1 | 0.1  0.0
3 | 1 1 | 1 | 0.1  0.0 | 0 |  1 | 0.2  0.1
4 | 0 0 | 0 | 0.2  0.1 | 0 |  0 | 0.2  0.1
4 | 0 1 | 0 | 0.2  0.1 | 0 |  0 | 0.2  0.1
4 | 1 0 | 0 | 0.2  0.1 | 1 | −1 | 0.1  0.1
4 | 1 1 | 1 | 0.1  0.1 | 1 |  0 | 0.1  0.1
5 | 0 0 | 0 | 0.1  0.1 | 0 |  0 | 0.1  0.1
5 | 0 1 | 0 | 0.1  0.1 | 0 |  0 | 0.1  0.1
5 | 1 0 | 0 | 0.1  0.1 | 0 |  0 | 0.1  0.1
5 | 1 1 | 1 | 0.1  0.1 | 1 |  0 | 0.1  0.1

Threshold: θ = 0.2; learning rate: α = 0.1
Two-dimensional plots of basic logical operations

[Figure: plots in the (x1, x2) plane of (a) AND (x1 ∧ x2), (b) OR (x1 ∨ x2), and (c) Exclusive-OR (x1 ⊕ x2); a single straight line can separate the two output classes for AND and OR, but not for Exclusive-OR]

A perceptron can learn the operations AND and OR, but not
Exclusive-OR.
Example 2:
Consider the following perceptron with θ = 1.5.

Draw a line that represents the decision boundary between classes 1 and 0.
Example 2:

• Geometrically, we try to find a line that separates the two classes.

• The learning problem is to find a weight vector w that provides correct classification.
Example 2:
Line equation:
w1 x1 + w2 x2 = θ
You need two points to draw a line.

Note that:
Anything below or on the line is class 0.
Anything above the line is class 1.
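A quick way to get the two points is to take the axis intercepts of the boundary. The figure's actual weights are not reproduced in this text, so the values below (w1 = w2 = 1 with θ = 1.5) are assumptions for illustration:

```python
# Decision boundary: w1*x1 + w2*x2 = theta.
# Assumed illustrative values (the figure's weights are not given here):
w1, w2, theta = 1.0, 1.0, 1.5

# Two convenient points for drawing the line: the axis intercepts.
p1 = (theta / w1, 0.0)  # where the boundary crosses the x1 axis
p2 = (0.0, theta / w2)  # where the boundary crosses the x2 axis
print(p1, p2)

def classify(x1, x2):
    # Below or on the line -> class 0, above the line -> class 1
    return 1 if w1 * x1 + w2 * x2 > theta else 0
```

With these values the line runs from (1.5, 0) to (0, 1.5), and any point whose weighted sum exceeds θ falls in class 1.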
A Nice Property of Perceptrons
What would happen if the input signals were to degrade somewhat?
What would happen if the weights were slightly wrong?

If we calculate the outputs:

{0.2, −0.1}: 1 × 0.2 + 1 × (−0.1) = 0.1 < 1.5 → class 0
{0.9, 1.1}: 1 × 0.9 + 1 × 1.1 = 2.0 > 1.5 → class 1

Neural networks are often still able to give us the right answer when there is a small amount of degradation.
Compare this with classical computing approaches!
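The two checks above can be reproduced directly (assuming, as the arithmetic implies, both weights equal 1 and θ = 1.5):

```python
# Weights and threshold implied by the arithmetic above: w1 = w2 = 1, theta = 1.5
w, theta = (1.0, 1.0), 1.5

def classify(x):
    total = w[0] * x[0] + w[1] * x[1]
    return 1 if total > theta else 0

# Slightly degraded inputs still land on the correct side of the threshold:
print(classify((0.2, -0.1)))  # 0.1 < 1.5, class 0
print(classify((0.9, 1.1)))   # 2.0 > 1.5, class 1
```

Because the decision depends only on which side of the threshold the sum falls, small perturbations of the inputs or weights rarely flip the answer.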
Exercise

• Consider the perceptron shown in the following figure. The node has three inputs x = (x1, x2, x3) that receive only binary signals (either 0 or 1).
Exercise

• Suppose that the activation of the unit is given by:

Y = I(0.3 x1 + 0.3 x2 + 0.3 x3 − 0.4 ≥ 0)

where I(z) = 1 if z is true, 0 otherwise.

• Calculate the output value Y of the unit for each of the following input patterns:
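The exercise's specific input patterns are not reproduced in this text, so a sketch can simply enumerate all eight binary patterns and evaluate the given activation on each:

```python
from itertools import product

def unit_output(x1, x2, x3):
    # Y = I(0.3*x1 + 0.3*x2 + 0.3*x3 - 0.4 >= 0)
    return 1 if 0.3 * x1 + 0.3 * x2 + 0.3 * x3 - 0.4 >= 0 else 0

# Enumerate every binary input pattern (0 or 1 on each of the three inputs):
for pattern in product((0, 1), repeat=3):
    print(pattern, "->", unit_output(*pattern))
```

Since each active input contributes 0.3 against the 0.4 offset, the unit fires exactly when at least two of the three inputs are 1.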
Data representation

 Usually, input/output data needs pre-processing
 Pictures:
 Pixel intensity
 Text:
 A pattern
Size of training set

 There is no one-size-fits-all formula
 Overfitting can occur if a “good” training set is not chosen
 What constitutes a “good” training set?
 Samples must represent the general population.
 Samples must contain members of each class.
 Samples in each class must contain a wide range of variations or noise effects.
 The size of the training set is related to the number of neurons in the network
