Artificial Intelligence, An Introductory Course
Batch learning takes place when the network weights are adjusted in a single training step. In this mode of learning, the complete set of input/output training data is needed to determine the weights, and feedback information produced by the network itself is not involved in developing the network. This learning technique is also called recording. Learning with feedback, either from a teacher or from the environment rather than a teacher, is more typical for neural networks, however. Such learning is called incremental and is usually performed in steps (a small sketch contrasting the two modes is given after this paragraph). The concept of feedback plays a central role in learning. The concept is elusive and somewhat paradoxical: in a broad sense, it can be understood as the introduction of a pattern of relationships into the cause-and-effect path. We will distinguish two different types of learning.
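As a rough illustration of the difference between the two modes of weight adjustment, the following Python sketch (my own example, not part of the notes; the linear unit, the least-squares "recording" solution, and the sample data are all assumptions made for illustration) contrasts a single batch step with repeated incremental updates.

import numpy as np

# Batch ("recording") learning: the weights of a linear unit are determined
# in a single step from the complete training set, here via the
# least-squares pseudo-inverse solution.
def batch_learn(X, d):
    # X: (n_samples, n_inputs), d: (n_samples,)
    return np.linalg.pinv(X) @ d

# Incremental learning: the weights are adapted step by step, one training
# pair at a time, using feedback from the current output error.
def incremental_learn(X, d, c=0.1, epochs=20):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):
            o_k = w @ x_k                   # network output for this sample
            w = w + c * (d_k - o_k) * x_k   # small correction after each sample
    return w

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
d = np.array([1.0, 2.0, 3.0])
print(batch_learn(X, d), incremental_learn(X, d))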
Fig. (11). Block diagrams of learning with and without a teacher: the network produces the output O = F(W, X) from the input X and the weight vector W; a learning algorithm generates the learning signal from X, W, and, when a teacher is present, the desired response d, and uses it to adjust the weights W.
The learning signal $r$ is, in general, a function of $w_i$, $x$, and sometimes of the teacher's signal $d_i$. We thus have, for the network shown in fig. (11),

$r = f(w_i, x, d_i)$ .......... (16)
The increment of the weight vector $w_i$ produced by the learning step at time $t$, according to the general learning rule, is

$\Delta w_i(t) = c\,r[\,w_i(t), x(t), d_i(t)\,]\,x(t)$ .......... (17)

where $c$ is a positive number called the learning constant, which determines the rate of learning. The weight vector adapted at time $t$ becomes, at the next instant or learning step,

$w_i(t+1) = w_i(t) + c\,r[\,w_i(t), x(t), d_i(t)\,]\,x(t)$ .......... (18)
The superscript convention will be used in this text to index the discrete-time training steps as in Eq. (18). For the k'th step we thus have, from (18), using this convention:

$w_i^{k+1} = w_i^k + c\,r(w_i^k, x^k, d_i^k)\,x^k$ .......... (19)

The learning in (18) and (19) assumes the form of a sequence of discrete-time weight modifications. Continuous-time learning can be expressed as:
$\dfrac{d w_i(t)}{dt} = c\,r\,x(t)$ .......... (20)
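To make the discrete-time rule (19) concrete, here is a small Python sketch (my own illustrative reading of the notes, not part of them): the learning signal $r$ is passed in as a function, and one weight update is applied per training pair. The particular signal used in the example, $r = d - w \cdot x$, is just one possible choice; the general rule leaves $r$ unspecified.

import numpy as np

def general_learning_step(w, x, d, r, c=0.1):
    # One step of the general learning rule (19):
    # w^(k+1) = w^k + c * r(w^k, x^k, d^k) * x^k
    return w + c * r(w, x, d) * x

# Example learning signal: the output error of a linear unit, r = d - w.x
# (one possible choice of r; the rule itself treats r generically).
def error_signal(w, x, d):
    return d - w @ x

w = np.zeros(3)                          # initial weight vector w^0
x = np.array([1.0, -0.5, 2.0])           # input vector x^k
d = 1.0                                  # teacher's signal d^k
w = general_learning_step(w, x, d, error_signal)
print(w)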
Fig. (3.11). Delta learning rule: the inputs $x_1, \ldots, x_n$, weighted by $w_1, \ldots, w_n$, form $net$; the neuron output is $O = f(net)$; the error $(d - O)$, multiplied by $f'(net)$, produces the learning signal $r$ used to compute $\Delta W$.
For the delta learning rule, the learning signal $r$, called delta, is defined as

$r = [\,d_i - f(w_i^t x)\,]\,f'(w_i^t x)$ .......... (21)

computed for $net = w_i^t x$. The explanation of the delta learning rule is shown in fig. (3.11). This learning rule can be readily derived from the condition of least squared error between $o_i$ and $d_i$. Calculating the gradient vector with respect to $w_i$ of the squared error, defined as

$E = \tfrac{1}{2}\,(d_i - o_i)^2$ .......... (22)

which is equivalent to
$E = \tfrac{1}{2}\,[\,d_i - f(w_i^t x)\,]^2$ .......... (23)

we obtain the error gradient vector

$\nabla E = -(d_i - o_i)\,f'(w_i^t x)\,x$ .......... (24)

whose components are

$\partial E / \partial w_{ij} = -(d_i - o_i)\,f'(w_i^t x)\,x_j, \quad j = 1, \ldots, n$ .......... (25)
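For clarity, the chain-rule step behind (24) and its component form (25), written out in full (my own expansion, using the notation of the notes):

\nabla E = \frac{\partial}{\partial w_i}\,\tfrac{1}{2}\big[d_i - f(w_i^t x)\big]^2
         = -\big[d_i - f(w_i^t x)\big]\, f'(w_i^t x)\,\frac{\partial (w_i^t x)}{\partial w_i}
         = -(d_i - o_i)\, f'(w_i^t x)\, x

since $\partial(w_i^t x)/\partial w_i = x$ and $o_i = f(w_i^t x)$.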
Since the minimization of the error requires the weight changes to be in the negative gradient direction, we take

$\Delta w_i = -c\,\nabla E$ .......... (26)

where $c$ is a positive constant. Using (24), this gives

$\Delta w_i = c\,(d_i - o_i)\,f'(net_i)\,x$ .......... (27)

or, for the single weight adjustments,

$\Delta w_{ij} = c\,(d_i - o_i)\,f'(net_i)\,x_j, \quad j = 1, \ldots, n$ .......... (28)

which coincides with the adjustment obtained from the general learning rule (17) using the learning signal (21):

$\Delta w_i = c\,(d_i - o_i)\,f'(net_i)\,x$ .......... (29)
The delta learning rule is therefore also called the continuous perceptron training rule, and it can be generalized for multilayer networks; a minimal single-neuron sketch is given below.
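As an illustration of equations (27) and (28), here is a small Python sketch of repeated delta-rule updates for a single neuron with a sigmoid activation (the choice of sigmoid, the learning constant, and the sample data are my own assumptions for the example, not taken from the notes):

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def delta_rule_step(w, x, d, c=0.5):
    # One delta-rule update, Eq. (27): dw = c * (d - o) * f'(net) * x,
    # using the sigmoid f(net) = 1/(1+exp(-net)), so f'(net) = o*(1-o).
    net = w @ x
    o = sigmoid(net)
    f_prime = o * (1.0 - o)              # derivative of the sigmoid at net
    return w + c * (d - o) * f_prime * x

w = np.array([0.1, -0.2, 0.05])          # initial weights (arbitrary)
x = np.array([1.0, 0.5, -1.0])           # input vector
d = 1.0                                  # teacher's (desired) output
for _ in range(100):                     # repeated steps reduce |d - o|
    w = delta_rule_step(w, x, d)
print(w, sigmoid(w @ x))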
The LMS (Least Mean Square) learning rule can be considered a special case of the delta learning rule. Indeed, assuming that $f(net) = net$, we obtain $f'(net) = 1$, and the weight adjustment (27) reduces to $\Delta w_i = c\,(d_i - o_i)\,x$. Weights may be initialized at any values in this method.
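A corresponding sketch of this LMS special case (again my own illustrative example with arbitrary sample data): with $f(net) = net$ the output is simply $o = w \cdot x$ and the $f'(net)$ factor disappears from the update.

import numpy as np

def lms_step(w, x, d, c=0.1):
    # LMS update: with f(net) = net and f'(net) = 1,
    # Eq. (27) reduces to dw = c * (d - o) * x, where o = w.x
    o = w @ x
    return w + c * (d - o) * x

w = np.zeros(2)                          # weights may start at any values
x = np.array([1.0, 2.0])
d = 0.5
for _ in range(50):
    w = lms_step(w, x, d)
print(w, w @ x)                          # w.x approaches the desired value d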