
Neural Networks: Single Neurons (continued)

G. Extension of the Delta Rule: smooth f(z)

1. The delta rule is easily extended to cases where a step output function is not sufficient, i.e. if you want to model a real neuron more closely with a sigmoidal f(z).

2. Recall that for a given training vector, the output is
n " % y = f ( z ) = f $ wo + ! w jT j ' # & j =1

Now, for a non-step activation function, we define the error using the true output:
$E = \tfrac{1}{2}(y - t)^2$

3. Again, the direction of steepest decrease of $E$ is given by $-\dfrac{\partial E}{\partial w_i}$, so

$\Delta w_i = -\alpha \dfrac{\partial E}{\partial w_i}$

4. Differentiating,

$\dfrac{\partial E}{\partial w_i} = (y - t)\dfrac{\partial y}{\partial w_i} = (y - t)\dfrac{\partial f(z)}{\partial w_i} = (y - t)\, f'(z)\,\dfrac{\partial z}{\partial w_i} = (y - t)\, f'(z)\, T_i$


where $f'(z)$ is the derivative of $f(z)$ with respect to $z$. Hence, the weights are modified by

! wi = "# ( y " t ) f $ (z)Ti = # (t " y ) f $( z)Ti


The main differences from the original delta rule are the presence of $y$ (rather than $z$) and the factor of $f'(z)$. The same equation can be used for updating the bias weight, but the factor of $T_i$ is replaced by 1.

5. Note that the step function is no longer a possibility for $f(z)$, since its derivative is either 0 or undefined at the step (explaining why $z$, rather than $y$, was used in the original delta rule error function). The function $f$ must now be differentiable, like the sigmoid functions described earlier. Here are some typical examples:

Binary sigmoid: $f(z) = \dfrac{1}{1 + e^{-\sigma z}}$ (asymptotes are $f(z) = 0$ and $f(z) = 1$). For this case $f'(z) = \sigma f(z)[1 - f(z)]$, so the derivative is easily calculable from $f(z)$ itself.

Bipolar sigmoid: $f(z) = \dfrac{2}{1 + e^{-\sigma z}} - 1$ (asymptotes are $f(z) = -1$ and $f(z) = 1$). For this case $f'(z) = \tfrac{\sigma}{2}[1 + f(z)][1 - f(z)]$, so the derivative is again easily calculable.

Hyperbolic tangent: $f(z) = \tanh(z)$ (asymptotes are $f(z) = -1$ and $f(z) = 1$). The derivative is $f'(z) = \operatorname{sech}^2(z) = 1 - f^2(z)$.
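To make the update concrete, here is a minimal Python sketch of one delta-rule step for a single neuron with a smooth activation function. The names (sigmoid, delta_rule_step, the learning rate alpha) are illustrative rather than taken from the notes, and the slope parameter is fixed at $\sigma = 1$.

```python
import numpy as np

def sigmoid(z):
    """Binary sigmoid f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative f'(z) = f(z) * (1 - f(z)), computed from f(z) itself."""
    f = sigmoid(z)
    return f * (1.0 - f)

def delta_rule_step(w, w0, T, t, alpha=0.1):
    """One delta-rule update for a single neuron with smooth f(z).

    w  : weight vector (length n)
    w0 : bias weight
    T  : training (input) vector (length n)
    t  : target output
    """
    z = w0 + np.dot(w, T)          # activation z = w0 + sum_j w_j T_j
    y = sigmoid(z)                 # smooth output y = f(z)
    # Delta w_i = alpha * (t - y) * f'(z) * T_i ; the bias uses T_i = 1
    w_new = w + alpha * (t - y) * sigmoid_prime(z) * T
    w0_new = w0 + alpha * (t - y) * sigmoid_prime(z)
    return w_new, w0_new

# Example: one update toward target t = 1
w, w0 = np.zeros(2), 0.0
T, t = np.array([1.0, -1.0]), 1.0
w, w0 = delta_rule_step(w, w0, T, t)
```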

2. Multiple Neuron Networks


I. Madalines (Multiple Adalines)

A. A Single Layer of Adalines

1. Let there be n inputs into m output neurons. Assume each input is connected to each output unit, so we'll have an $n \times m$ array of weights $w_{ij}$, $i = 1, \ldots, n$; $j = 1, \ldots, m$. Then the outputs are given by
n " % y j = f ( z j ) = f $ woj + ! Tk wkj ' # & k =1

Here's an example with n = 2 and m = 2:

[Figure: two inputs $x_1, x_2$ feed two output units $y_1, y_2$ through weights $w_{11}, w_{12}, w_{21}, w_{22}$; bias inputs $b_1, b_2$ supply the bias weights $w_{01}, w_{02}$.]
2. The error function to be minimized should now include all the outputs. For a step-function activation function:

$E = \tfrac{1}{2} \sum_{k=1}^{m} (z_k - t_k)^2$

The derivation of the weight changes is basically the same as for a single neuron, since

m m "E 1 " m "z " n 2 =2 $ ( zk # tk ) = $ (zk # t k ) k = $ (zk # t k ) $ Tl w lk "wij "wij k =1 "wij k =1 "wij l =1 k =1 m k =1 n l =1

= $ ( zk # t k ) $ Tl

m n " w lk = $ ( zk # t k ) $ Tl%li%kj = ( z j # t j )Ti "wij k =1 l =1

so the weights are modified by


$\Delta w_{ij} = -\alpha (z_j - t_j)\, T_i = \alpha (t_j - z_j)\, T_i$

The same equation holds for updating the bias weights, if we take $i = 0$ and $T_0 = 1$.
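A minimal Python sketch of this single-layer update, applied one training vector at a time, is given below; the names layer_update and alpha are assumptions. This is the step-function form, which compares $z_j$ with $t_j$; the smooth form in item 3 below would use $y_j$ and an extra factor of $f'(z_j)$.

```python
import numpy as np

def layer_update(T, t, W, w0, alpha=0.05):
    """One delta-rule update for a single layer of Adalines (step-function form).

    T  : input vector, shape (n,)
    t  : target vector, shape (m,)
    W  : weights, shape (n, m); w0 : bias weights, shape (m,)
    Update: Delta w_ij = alpha * (t_j - z_j) * T_i, with T_0 = 1 for the biases.
    """
    z = w0 + T @ W                        # activations z_j
    W_new = W + alpha * np.outer(T, t - z)
    w0_new = w0 + alpha * (t - z)         # bias update uses T_i = 1
    return W_new, w0_new
```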
3. For smooth activation functions, the error function to be minimized is based on the outputs:

$E = \tfrac{1}{2} \sum_{k=1}^{m} (y_k - t_k)^2$

A similar calculation to that above yields


! wi j = "# y j " t j f $(z j )Ti = # t j " y j f $(z j )Ti

Again, the same equation holds for updating the bias weights, if we take $i = 0$ and $T_0 = 1$.

B. Madaline Networks with one hidden layer and one output layer

1. Begin with the simple case of a single output neuron (m = 1). Let there be n inputs and l hidden neurons. We assume each input is connected to each hidden unit, and the outputs of the hidden units are the inputs to the output unit. Thus, we'll have an $n \times l$ array of input-hidden weights $w_{ij}$, $i = 1, \ldots, n$; $j = 1, \ldots, l$, and $l$ hidden-output weights $v_j$, plus bias weights for each neuron. Here's an example with n = l = 2:

[Figure: a Madaline with two inputs $x_1, x_2$, two hidden units $H_1, H_2$ (outputs $h_1, h_2$), input-hidden weights $w_{11}, w_{12}, w_{21}, w_{22}$ with bias weights $w_{01}, w_{02}$, and a single output unit $Y$ with hidden-output weights $v_1, v_2$ and bias $v_0$.]
The intermediate neurons, labeled with upper-case H's in the figure, are often called hidden units since they're not visible at input or output, but only play an internal processing role. Nonetheless, these are what make it possible to solve non-linearly-separable problems and get around Minsky and Papert's theorem. The hidden unit activations $z_j$ and outputs $h_j$ are given by
$z_j = w_{0j} + \sum_{q=1}^{n} w_{qj} T_q \quad$ and $\quad h_j = f(z_j)$

The output neuron satisfies equations similar to the single-neuron case we studied in chapter 1. The output unit activation $g$ and output $y$ are given by
$g = v_0 + \sum_{p=1}^{l} v_p h_p \quad$ and $\quad y = f(g)$
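Here is a brief Python sketch of this forward pass, under the same assumptions as the earlier sketches (the names madaline_forward, W, w0, v, v0 are illustrative); it uses a bipolar step activation, as in the original Madaline.

```python
import numpy as np

def madaline_forward(T, W, w0, v, v0,
                     f=lambda z: np.where(z >= 0, 1.0, -1.0)):
    """Forward pass through a Madaline with one hidden layer and one output unit.

    T : input vector, shape (n,)
    W : input-hidden weights, shape (n, l); w0 : hidden biases, shape (l,)
    v : hidden-output weights, shape (l,); v0 : output bias (scalar)
    f : activation function (bipolar step by default)
    """
    z = w0 + T @ W        # hidden activations z_j
    h = f(z)              # hidden outputs h_j = f(z_j)
    g = v0 + v @ h        # output unit activation
    y = f(g)              # overall output
    return z, h, g, y
```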

2. In the original form of Madaline, the output unit had fixed weights $v_0, v_1, v_2$ (usually implementing a "majority rules" vote, or an OR for 2 inputs). Hence, these weights did not need to be trained. In addition, the activation function f was taken to be a step function.

3. The original delta rule for weight update can be generalized to this single-hidden-layer case, using the hidden unit activities:

"w ij = # ( t $ z j )Ti

where the activities $z_j$ are now even further removed from the target output $t$. Nonetheless, this method can succeed if the parameters and algorithm are chosen carefully; getting it to work requires some experimentation, and there may be different best methods for different problems.

4. Before the backpropagation learning rule was devised, there were many attempts to improve learning using the delta rule. One variation that can be reasonably efficient is the following algorithm: instead of updating all the weights at each iteration, try to short-circuit the process like so (a code sketch follows the outline):

Epoch loop: while the stopping criterion is false, do the following:

   Training vector loop: for each training vector $(T_1, \ldots, T_n)$:

      Compute $z_j$ and $h_j$ for each hidden unit, the output activation $g$, and the overall output $y$, then update weights as follows:

      1. If $y = t$, no update is performed.

      2. If $y \ne t$ and $t = 1$, then update weights only for the hidden unit $H_c$ whose input sum is closest to zero:

         $\Delta w_{ic} = \alpha (1 - z_c)\, T_i$

      3. If $y \ne t$ and $t = -1$, then update weights for all hidden units $H_s$ whose input sums are positive:

         $\Delta w_{is} = \alpha (-1 - z_s)\, T_i$, for all $s$ such that $z_s > 0$

   End the training vector loop when all training vectors have been used.

Check the stopping condition, using the updated weights, after each epoch. If the stopping criterion is satisfied then terminate; otherwise do a new epoch.

This rather ad hoc method was no doubt the product of some experimentation as well as theory, and is typical of the efforts to make the delta rule work for complex networks in the era before backpropagation. In fact, one of the problems in the 1970's slow period of research was that there was no uniformly good method of optimally modifying the weights, especially for multi-layer Madalines. It became a bit of an art to find rules that would converge for a given problem in a reasonable amount of time. Although many other rules were suggested, we will not cover most of them, since the delta rule leads directly into the backpropagation method, which is quite general.
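As a rough illustration only, here is a Python sketch of one epoch of this update scheme, using a bipolar step activation and fixed hidden-output weights; the names (madaline_epoch, alpha) and the error count used as a stopping aid are assumptions, and as noted above, real uses of this rule involved considerable experimentation.

```python
import numpy as np

def step(z):
    """Bipolar step activation: +1 if z >= 0, else -1."""
    return np.where(z >= 0, 1.0, -1.0)

def madaline_epoch(X, targets, W, w0, v, v0, alpha=0.05):
    """One epoch of the ad hoc Madaline update described above.

    X       : training vectors, shape (P, n); targets : shape (P,), values +/- 1
    W, w0   : input-hidden weights (n, l) and hidden biases (l,)
    v, v0   : fixed hidden-output weights (l,) and bias (scalar)
    Returns updated (W, w0) and the number of misclassified vectors.
    """
    errors = 0
    for T, t in zip(X, targets):
        z = w0 + T @ W                 # hidden activations z_j
        h = step(z)                    # hidden outputs h_j
        y = step(v0 + v @ h)           # overall output
        if y == t:
            continue                   # case 1: correct output, no update
        errors += 1
        if t == 1:
            # case 2: update only the hidden unit whose input sum is closest to zero
            c = np.argmin(np.abs(z))
            W[:, c] += alpha * (1.0 - z[c]) * T
            w0[c] += alpha * (1.0 - z[c])
        else:
            # case 3: update all hidden units with positive input sums
            for s in np.where(z > 0)[0]:
                W[:, s] += alpha * (-1.0 - z[s]) * T
                w0[s] += alpha * (-1.0 - z[s])
    return W, w0, errors
```

An outer loop would repeat madaline_epoch until errors reaches 0 or a maximum number of epochs is exceeded, playing the role of the stopping criterion in the outline above.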
