NN Ch3
Associative Memory
• Two types of associations. For two patterns s and t
– hetero-association (s != t) : relating two different patterns
– auto-association (s = t): relating parts of a pattern with
other parts
• Architectures of NN associative memory
– single layer (with/without input layer)
– two layers (for bidirectional assoc.)
• Learning algorithms for AM
– Hebbian learning rule and its variations
– gradient descent
• Analysis
– storage capacity (how many patterns can be
remembered correctly in a memory)
– convergence
• AM as a model for human memory
Training Algorithms for Simple AM
• Network structure: single layer
– one output layer of non-linear units and one input layer
– similar to the simple network for classification in Ch. 2
[Diagram: single-layer network; inputs x_1, ..., x_n receive pattern s, outputs y_1, ..., y_m produce pattern t, fully connected by weights w_11, ..., w_nm]
• Goal of learning:
– to obtain a set of weights w_ij
– from a set of training pattern pairs {s:t}
– such that when s is applied to the input layer, t is computed
at the output layer
– for all training pairs s:t: $t_j = f(s^T w_j)$ for all $j$
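A minimal numpy sketch of this recall step; the choice f = np.sign is only an assumed placeholder for the unit nonlinearity:

```python
import numpy as np

def recall(s, W, f=np.sign):
    """Single-layer AM recall: t = f(s W), i.e. t_j = f(s^T w_j) for each j."""
    # f defaults to a sign function here; use whatever nonlinearity the units actually have.
    return f(s @ W)
```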
Hebbian rule
$W = \sum_{p=1}^{P} s^T(p)\, t(p)$

When a stored pattern s(k) is applied as input:

$s(k)\, W = \sum_{p=1}^{P} s(k)\, s^T(p)\, t(p) = s(k)\, s^T(k)\, t(k) + \sum_{p \ne k} s(k)\, s^T(p)\, t(p) = \|s(k)\|^2\, t(k) + \sum_{p \ne k} \big(s(k)\, s^T(p)\big)\, t(p)$

The first term is the principal term; the second is the cross-talk term.
• Principal term gives the association between s(k) and t(k).
• Cross-talk represents correlation between s(k):t(k) and other
training pairs. When cross-talk is large, s(k) will recall
something other than t(k).
• If all s(p) are orthogonal to each other, then $s(k)\, s^T(p) = 0$ for $p \ne k$, so no sample other than s(k):t(k) contributes to the result.
• There are at most n orthogonal vectors in an n-dimensional
space.
• Cross-talk increases when P increases.
• How many arbitrary training pairs can be stored in an AM?
– Can it be more than n (allowing some non-orthogonal patterns
while keeping cross-talk terms small)?
– Storage capacity (more later)
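To make the principal/cross-talk decomposition concrete, here is a small numpy sketch; the random bipolar patterns (and the sizes n, m, P) are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, P = 8, 3, 4
S = rng.choice([-1, 1], size=(P, n))   # bipolar input patterns s(p), made up for illustration
T = rng.choice([-1, 1], size=(P, m))   # associated output patterns t(p)

W = S.T @ T                            # Hebbian rule: W = sum_p s^T(p) t(p)

k = 0
principal  = (S[k] @ S[k]) * T[k]      # ||s(k)||^2 t(k)
cross_talk = sum((S[k] @ S[p]) * T[p] for p in range(P) if p != k)
print(np.array_equal(S[k] @ W, principal + cross_talk))   # True: s(k) W = principal + cross-talk
```

If the cross-talk term stays small relative to the principal term, thresholding s(k) W still recovers t(k).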
Example of hetero-associative memory
• Binary pattern pairs s:t with |s| = 4 and |t| = 2.
• Total weighted input to output units: $y\_in_j = \sum_i x_i\, w_{ij}$
• Activation function: threshold
  $y_j = 1$ if $y\_in_j > 0$; $y_j = 0$ if $y\_in_j \le 0$
• Weights are computed by the Hebbian rule (sum of outer products of all training pairs):
  $W = \sum_{p=1}^{P} s^T(p)\, t(p)$
• Training samples:
    p      s(p)          t(p)
    1      (1 0 0 0)     (1, 0)
    2      (1 1 0 0)     (1, 0)
    3      (0 0 0 1)     (0, 1)
    4      (0 0 1 1)     (0, 1)
$s^T(1)\, t(1) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$  $s^T(2)\, t(2) = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$  $s^T(3)\, t(3) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}$  $s^T(4)\, t(4) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}$

Computing the weights:

$W = \begin{bmatrix} 2 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 2 \end{bmatrix}$
• Recall:
  – x = (1 0 0 0):  $x\,W = (2, 0)$, so $y_1 = 1,\ y_2 = 0$
  – x = (0 1 0 0):  $x\,W = (1, 0)$, so $y_1 = 1,\ y_2 = 0$  (similar to s(1) and s(2))
    (1 0 0 0) and (1 1 0 0) belong to class (1, 0); (0 0 0 1) and (0 0 1 1) belong to class (0, 1)
  – x = (0 1 1 0):  $x\,W = (1, 1)$, so $y_1 = 1,\ y_2 = 1$
    (0 1 1 0) is not sufficiently similar to either class
• The delta rule would give the same or similar results.
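A minimal numpy sketch that reproduces this example; the array names S, T, W and the recall helper are our own:

```python
import numpy as np

# Training pairs from the example: |s| = 4, |t| = 2
S = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 1]])
T = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])

W = S.T @ T                      # Hebbian rule: sum of outer products s^T(p) t(p)

def recall(x, W):
    """Threshold activation: y_j = 1 if y_in_j > 0, else 0."""
    return (x @ W > 0).astype(int)

for x in [(1, 0, 0, 0), (0, 1, 0, 0), (0, 1, 1, 0)]:
    print(x, "->", recall(np.array(x), W))
# (1, 0, 0, 0) -> [1 0]
# (0, 1, 0, 0) -> [1 0]
# (0, 1, 1, 0) -> [1 1]   (not similar enough to either class)
```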
Example of auto-associative memory
• Same as hetero-associative nets, except t(p) = s(p).
• Used to recall a pattern from its noisy or incomplete version (pattern completion / pattern recovery).
• A single pattern s = (1, 1, 1, -1) is stored (weights computed
by Hebbian rule – outer product)
$W = \begin{bmatrix} 1 & 1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix}$
• With $w_{jj} = 0$, the net input to unit j when a stored (bipolar) pattern a(k) is presented:
  $\sum_{i=1}^{n} a_i(k)\, w_{ij} = \sum_{i \ne j} a_i(k) \sum_{p=1}^{P} a_i(p)\, a_j(p)$
  $= \sum_{p=1}^{P} a_j(p) \sum_{i \ne j} a_i(k)\, a_i(p)$
  $= a_j(k) \sum_{i \ne j} a_i(k)\, a_i(k) + \sum_{p \ne k} a_j(p) \sum_{i \ne j} a_i(k)\, a_i(p)$
  $= a_j(k)\,(n-1) + \sum_{p \ne k} a_j(p) \sum_{i \ne j} a_i(k)\, a_i(p)$
  (the first term is the principal term, the second the cross-talk term; $a_i(k)\, a_i(k) = 1$ for bipolar patterns)
• $w_{ii} = 0$: same as the Hebbian rule, but with zero diagonal.
• For binary patterns: $w_{ij} = \sum_{p} \big(2 s_i(p) - 1\big)\big(2 s_j(p) - 1\big)$ for $i \ne j$, and $w_{ii} = 0$.
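A minimal numpy sketch of this auto-associative example; the noisy and incomplete probe vectors are our own choices:

```python
import numpy as np

s = np.array([1, 1, 1, -1])           # stored bipolar pattern

W = np.outer(s, s)                    # Hebbian outer product
np.fill_diagonal(W, 0)                # zero the diagonal, as discussed above

def recall(x, W):
    """One synchronous pass with a sign-like threshold (bipolar output)."""
    return np.where(x @ W >= 0, 1, -1)

noisy   = np.array([1, -1, 1, -1])    # one component flipped
partial = np.array([1, 1, 1, 0])      # one component missing (set to 0)
print(recall(noisy, W))               # [ 1  1  1 -1]
print(recall(partial, W))             # [ 1  1  1 -1]
```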
• Example recall steps (Discrete Hopfield Memory, asynchronous update, one unit at a time):
  – $Y_3$ is selected: $y\_in_3 = x_3 + \sum_j y_j\, w_{j3} = 1 + 1 = 2 > 0$, so $y_3 = 1$ and $Y = (1, 0, 1, 0)$
  – $Y_2$ is selected: $y\_in_2 = x_2 + \sum_j y_j\, w_{j2} = 0 + 2 = 2 > 0$, so $y_2 = 1$ and $Y = (1, 1, 1, 0)$
  – The stored pattern is correctly recalled.
Convergence Analysis of DHM
• Two questions:
1. Will a Hopfield AM converge (stop) for any given recall input?
2. Will a Hopfield AM converge to the stored pattern that is closest to the recall input?
• Hopfield answers the first question
  – by introducing an energy function for this model;
  – there is no satisfactory answer to the second question so far.
• Energy function:
– A notion from thermodynamic physical systems: such a system has a tendency to move toward a lower energy state.
– Also known as a Lyapunov function, after the Lyapunov theorem for the stability of a system of differential equations.
• In general, the energy function $E(y(t))$, where $y(t)$ is the state of the system at step (time) $t$, must satisfy two conditions:
  1. $E(t)$ is bounded from below: $E(t) \ge c$ for all $t$.
  2. $E(t)$ is monotonically nonincreasing: $\Delta E(t+1) = E(t+1) - E(t) \le 0$ (in the continuous version: $dE/dt \le 0$).
• The energy function defined for DHM:
  $E = -0.5 \sum_{i} \sum_{j \ne i} y_i\, y_j\, w_{ij} - \sum_i x_i\, y_i + \sum_i \theta_i\, y_i$
• Show $\Delta E(t+1) \le 0$:
  At $t+1$, unit $Y_k$ is selected for update:
  $\Delta y_k(t+1) = y_k(t+1) - y_k(t)$
  Note: $\Delta y_j(t+1) = 0$ for $j \ne k$ (only one unit can update at a time).
  $\Delta E(t+1) = E(t+1) - E(t)$
  $= \Big( -0.5 \sum_{i} \sum_{j \ne i} y_i(t+1)\, y_j(t+1)\, w_{ij} - \sum_i x_i\, y_i(t+1) + \sum_i \theta_i\, y_i(t+1) \Big)$
  $\;\;\; - \Big( -0.5 \sum_{i} \sum_{j \ne i} y_i(t)\, y_j(t)\, w_{ij} - \sum_i x_i\, y_i(t) + \sum_i \theta_i\, y_i(t) \Big)$
  The terms that differ in the two parts are those involving $y_k$: $\sum_{j \ne k} y_j\, w_{jk}\, y_k$, $\sum_{i \ne k} y_i\, w_{ki}\, y_k$, $x_k\, y_k$, and $\theta_k\, y_k$. Since $w_{jk} = w_{kj}$,
  $\Delta E(t+1) = -\Big[ \sum_{j \ne k} y_j(t)\, w_{jk} + x_k - \theta_k \Big]\, \Delta y_k(t+1) = -\big[\, y\_in_k - \theta_k \,\big]\, \Delta y_k(t+1)$
  Cases:
  – if $y_k$ is turned on ($y_k(t+1) > y_k(t)$): this happens only when $y\_in_k > \theta_k$, and $\Delta y_k(t+1) > 0$, so $\Delta E(t+1) < 0$
  – if $y_k$ is turned off ($y_k(t+1) < y_k(t)$): this happens only when $y\_in_k < \theta_k$, and $\Delta y_k(t+1) < 0$, so $\Delta E(t+1) < 0$
  – otherwise $y_k(t+1) = y_k(t)$, so $\Delta y_k(t+1) = 0$ and $\Delta E(t+1) = 0$
  For every $k$, either $\Delta y_k(t+1) = 0$ or $(y\_in_k - \theta_k)\, \Delta y_k(t+1) > 0$, hence $\Delta E(t+1) \le 0$.
1. Since E is bounded from below and non-increasing, and the network has only finitely many states, the state updates must converge (the network stops changing).
2. The state to which the system converges is a stable state: the system returns to it after a small perturbation. Such a state is called an attractor (each with its own basin of attraction).
3. The error function of BP learning is another example of an energy/Lyapunov function, because
• it is bounded from below (E > 0), and
• it is monotonically non-increasing (W is updated along the gradient-descent direction of E).
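A small numpy sketch of asynchronous DHM updates that tracks the energy; the stored pattern (1, 1, 1, 0) matches the recall example above, but the probe input (0, 0, 1, 0), the zero thresholds, and the bipolar-converted Hebbian weights are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(y, W, x, theta):
    """E = -0.5 * sum_{i != j} y_i y_j w_ij - sum_i x_i y_i + sum_i theta_i y_i."""
    return -0.5 * (y @ W @ y) - x @ y + theta @ y

s = np.array([1, 1, 1, 0])                 # stored binary pattern
W = np.outer(2 * s - 1, 2 * s - 1).astype(float)
np.fill_diagonal(W, 0)                     # zero-diagonal Hebbian weights

x = np.array([0.0, 0.0, 1.0, 0.0])         # recall probe (assumed), kept as external input
theta = np.zeros(4)
y = x.copy()                               # initial state

E = [energy(y, W, x, theta)]
for _ in range(3):                         # a few asynchronous sweeps
    for k in rng.permutation(len(y)):      # update one unit at a time, in random order
        y_in = x[k] + y @ W[:, k]
        if y_in > theta[k]:
            y[k] = 1
        elif y_in < theta[k]:
            y[k] = 0
        E.append(energy(y, W, x, theta))

print(y)                                       # [1. 1. 1. 0.]  -- the stored pattern
print(all(b <= a for a, b in zip(E, E[1:])))   # True: E never increases
```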
Capacity Analysis of DHM
• P: the maximum number of random patterns of dimension n that can be stored in a DHM of n nodes.
• Hopfield's observation: $P \approx 0.15\, n$, i.e. $P/n \approx 0.15$.
• Theoretical analysis: $P \approx \dfrac{n}{2 \log_2 n}$, i.e. $\dfrac{P}{n} \approx \dfrac{1}{2 \log_2 n}$.
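For a sense of scale, a quick check of the two estimates (n = 100 is an arbitrary choice):

```python
import math

n = 100
print(0.15 * n)                  # Hopfield's empirical estimate: 15.0 patterns
print(n / (2 * math.log2(n)))    # theoretical estimate: about 7.5 patterns
```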
[Diagram: hetero-association realized by a layered network: input1, input2 -> hidden1, hidden2 -> output1, output2]
Bidirectional AM (BAM)
• Architecture:
– Two layers of non-linear units: X-layer, Y-layer
– Units: discrete threshold or continuous sigmoid (patterns can be either binary or bipolar)
• Weights:
– $W_{n \times m} = \sum_{p=1}^{P} s^T(p)\, t(p)$ (Hebbian / outer product)
– Symmetric: $w_{ij} = w_{ji}$
– Convert binary patterns to bipolar when constructing W
• Recall:
– Bidirectional: either by X (to recall a Y) or by Y (to recall an X)
– Recurrent: $y(t) = \big( f(y\_in_1(t)), \ldots, f(y\_in_m(t)) \big)$,
  where $y\_in_j(t) = \sum_{i=1}^{n} w_{ij}\, x_i(t-1)$
  $x(t+1) = \big( f(x\_in_1(t+1)), \ldots, f(x\_in_n(t+1)) \big)$,
  where $x\_in_i(t+1) = \sum_{j=1}^{m} w_{ij}\, y_j(t)$
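A minimal BAM sketch under these definitions; the two stored bipolar pairs and the noisy probe are made-up illustrations, and a zero net input simply keeps the previous activation:

```python
import numpy as np

# Hypothetical bipolar training pairs (n = 6, m = 4), chosen only for illustration
S = np.array([[ 1,  1,  1, -1, -1, -1],
              [ 1, -1,  1, -1,  1, -1]])
T = np.array([[ 1,  1, -1, -1],
              [ 1, -1,  1, -1]])

W = S.T @ T                            # Hebbian / outer-product weights, shape (n, m)

def f(a, prev):
    """Bipolar threshold; keep the previous activation when the net input is 0."""
    return np.where(a > 0, 1, np.where(a < 0, -1, prev))

# Bidirectional, recurrent recall starting from a noisy X-layer pattern
x = np.array([-1, 1, 1, -1, -1, -1])   # S[0] with its first component flipped
y = np.zeros(4, dtype=int)
for _ in range(5):                     # alternate X -> Y -> X until stable
    y = f(x @ W, y)                    # y_in_j = sum_i w_ij x_i(t-1)
    x = f(W @ y, x)                    # x_in_i = sum_j w_ij y_j(t)

print(x)   # [ 1  1  1 -1 -1 -1]  (= S[0])
print(y)   # [ 1  1 -1 -1]        (= T[0])
```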