Lecture 22: Energy-Based Models - Hopfield Network

Energy-based Models

-- Hopfield Network
Hao Dong

Peking University

1
Contents

• Discrete Hopfield Neural Networks


• Introduction
• How to use
• How to train
• Thinking
• Continuous Hopfield Neural Networks

2
• Discrete Hopfield Neural Networks
• Introduction
• How to use
• How to train
• Thinking
• Continuous Hopfield Neural Networks

3
• Before: all feedforward structures

• What about …

4
Consider this network

$$f(x) = \begin{cases} +1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$

$$y_i = f\Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big)$$
• The output of each neuron is +1/-1


• Every neuron receives input from every other neuron
• Every neuron outputs signals to every other neuron
• The weights are symmetric: $w_{ij} = w_{ji}$; assume $w_{ii} = 0$
5
Hopfield Net

$$f(x) = \begin{cases} +1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$

$$y_i = f\Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big)$$

• At each time step, each neuron receives a “field”: $\sum_{j \ne i} w_{ji} y_j + b_i$


• If the sign of the field matches its own sign, nothing happens;
• If the sign of the field opposes its own sign, it “flips” to match the sign of the field:

$$y_i \to -y_i \quad \text{if} \quad y_i \Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big) < 0$$

6
Hopfield Net

$$f(x) = \begin{cases} +1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$

$$y_i = f\Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big)$$

• If the sign of the field opposes its own sign, it “flips” to match the sign of the field.
• “Flips” of a neuron may cause other neurons to “flip”!

$$y_i \to -y_i \quad \text{if} \quad y_i \Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big) < 0$$

7
Example

• Red edges are +1, blue edges are -1


• Yellow nodes are -1, black nodes are +1

8
Example

• Red edges are +1, blue edges are -1


• Yellow nodes are -1, black nodes are +1

9
Example

• Red edges are +1, blue edges are -1


• Yellow nodes are -1, black nodes are +1

10
Example

• Red edges are +1, blue edges are -1


• Yellow nodes are -1, black nodes are +1

11
Hopfield Net

• If the sign of the field opposes its own sign, it “flips” to match the field
• Which will change the field at other nodes
• Which may then flip
• Which may cause other neurons to flip
• And so on…
• Will this continue forever?

12
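Before answering that, it may help to see the flip rule as code. Here is a minimal sketch in Python (assuming NumPy; the helper name `flip_if_needed` and the arrays `y`, `W`, `b` are illustrative, not from the lecture):

```python
import numpy as np

def flip_if_needed(y, W, b, i):
    """Apply the Hopfield flip rule to neuron i (in place).

    y : vector of +/-1 states, W : symmetric weight matrix, b : biases.
    Returns True if the neuron flipped.
    """
    # field = sum_{j != i} w_ji * y_j + b_i  (the self-term is removed explicitly)
    field = W[:, i] @ y - W[i, i] * y[i] + b[i]
    if y[i] * field < 0:        # the sign of the field opposes y_i
        y[i] = -y[i]            # flip to match the field
        return True
    return False
```

Each flip changes the field seen by every other neuron, which is why flips can cascade as described above.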
Hopfield Net

$$f(x) = \begin{cases} +1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$

$$y_i = f\Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big)$$

• Let $y_i^-$ be the output of the i-th neuron before it responds to the current field
• Let $y_i^+$ be the output of the i-th neuron after it responds to the current field

$$y_i \to -y_i \quad \text{if} \quad y_i \Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big) < 0$$
13
Hopfield Net

$$f(x) = \begin{cases} +1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$

$$y_i = f\Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big)$$

• If $y_i^- = f\big(\sum_{j \ne i} w_{ji} y_j + b_i\big)$, then $y_i^+ = y_i^-$
• No “flip” happens:

$$y_i^+ \Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big) - y_i^- \Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big) = 0$$
14
Hopfield Net

$$f(x) = \begin{cases} +1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$

$$y_i = f\Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big)$$

• If $y_i^- \ne f\big(\sum_{j \ne i} w_{ji} y_j + b_i\big)$, then $y_i^+ = -y_i^-$
• A “flip” happens:

$$y_i^+ \Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big) - y_i^- \Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big) = 2\, y_i^+ \Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big) > 0$$

• Every “flip” is guaranteed to locally increase $y_i \big(\sum_{j \ne i} w_{ji} y_j + b_i\big)$
15
Globally

• Consider the following sum across all nodes:

$$E(y_1, y_2, \dots, y_N) = -\sum_i y_i \Big(\sum_{j<i} w_{ji} y_j + b_i\Big) = -\sum_{i,\ j<i} w_{ij} y_i y_j - \sum_i b_i y_i$$

• Assume $w_{ii} = 0$

• For a neuron k that “flips”:

$$\Delta E(y_k) = E(y_1, \dots, y_k^+, \dots, y_N) - E(y_1, \dots, y_k^-, \dots, y_N) = -\big(y_k^+ - y_k^-\big)\Big(\sum_{j \ne k} w_{jk} y_j + b_k\Big)$$

• Always < 0!
• Every “flip” results in a decrease in E
16
Globally

• Consider the following sum across all nodes:


• $E(y_1, y_2, \dots, y_N) = -\sum_{i,\ j<i} w_{ij} y_i y_j - \sum_i b_i y_i$

• E is bounded from below:
  • $E \ge -\sum_{i,\ j<i} |w_{ij}| - \sum_i |b_i|$
• The minimum change of E in a “flip” is:
  • $\Delta E_{min} = \min_{i,\ \{y_j,\ j=1 \dots N\}} \ 2\,\big|\sum_{j \ne i} w_{ji} y_j + b_i\big|$

• So any sequence of flips must converge in a finite number of steps


17
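A tiny numerical check of this argument, sketched under the assumption of a random symmetric weight matrix with zero diagonal (NumPy; all names and values are illustrative): every flip dictated by the update rule leaves the energy no higher than before.

```python
import numpy as np

def energy(y, W, b):
    """E = -1/2 y^T W y - b^T y, which equals -sum_{i,j<i} w_ij y_i y_j - sum_i b_i y_i
    when W is symmetric with zero diagonal."""
    return -0.5 * y @ W @ y - b @ y

rng = np.random.default_rng(0)
N = 8
W = rng.normal(size=(N, N))
W = (W + W.T) / 2                      # symmetric weights
np.fill_diagonal(W, 0.0)               # no self-connections
b = rng.normal(size=N)
y = rng.choice([-1.0, 1.0], size=N)

for i in range(N):
    field = W[:, i] @ y + b[i]
    if y[i] * field < 0:               # flip condition
        E_before = energy(y, W, b)
        y[i] = -y[i]                   # perform the flip
        assert energy(y, W, b) <= E_before   # the flip never increases E
```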
The Energy of a Hopfield Net

• E is the energy of the network:

$$E(y_1, y_2, \dots, y_N) = -\sum_{i,\ j<i} w_{ij} y_i y_j - \sum_i b_i y_i$$

• The evolution of a Hopfield network decreases its energy

• Analogy: Spin Glass

18
Spin Glass

• Each dipole in a disordered magnetic material tries to align itself to the local field
  -- the “flip” of a neuron
• $p_i$ is the vector position of the i-th dipole; its state corresponds to the output of each neuron, $y_i$
• The contribution of a dipole to the field depends on the interaction $J$
  -- the weight $w_{ij}$
• Derived from the “Ising” model for magnetic materials (Ising and Lenz, 1924)
19
Spin Glass

• Response of the current dipole:

$$x_i = \begin{cases} x_i & \text{if } \operatorname{sign}\big(x_i f(p_i)\big) = 1 \\ -x_i & \text{otherwise} \end{cases}$$

• Total energy (Hamiltonian) of the system:

$$E = -\frac{1}{2}\sum_i x_i f(p_i) = -\sum_i \sum_{j>i} J_{ji}\, x_i x_j - \sum_i b_i x_i$$

• The system evolves to minimize the energy
• “Flips” will eventually stop
20
Spin Glass

• The system stops at one of its stable points
  • a local minimum of the energy

• Nearby states will return to the stable point after evolving
• The system “remembers” its stable state

21
Contents

• Discrete Hopfield Neural Networks


• Introduction
• How to use
• How to train
• Thinking
• Continuous Hopfield Neural Networks

22
Hopfield Network

$$f(x) = \begin{cases} +1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$

$$y_i = f\Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big)$$

• The bias is typically not utilized


• It’s similar to having a single extra neuron that is pegged to 1.0

• The network will evolve until it arrives at a local minimum in the energy contour

23
Content-addressable memory

• Each minimum is a “stored” pattern


• How to store?

• Recall memory content from partial or corrupt values

• Also called associative memory

• The path is not unique


24
Real-world Examples

• Take advantage of content-addressable memory

(Figure: Input → Process of Evolution)

25
Real-world Examples

http://staff.itee.uq.edu.au/janetw/cmc/chapters/Hopfield 26
Computation

1. Initialize the network with the initial pattern:

$$y_i = x_i, \quad 0 \le i \le N-1$$

2. Iterate until convergence:

$$y_i = f\Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big), \quad 0 \le i \le N-1$$

• Updates can be done sequentially, or all at once
• Usually all nodes are updated once per epoch
• Within one epoch, the nodes are updated in random order
• The system will converge to a local minimum
• The result is not deterministic (it depends on the update order)
27
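A minimal recall loop matching the two steps above might look like the following sketch (assuming NumPy, a symmetric weight matrix `W` with zero diagonal, and a bias vector `b`; the helper name `recall` is hypothetical):

```python
import numpy as np

def recall(x, W, b, max_epochs=100, rng=None):
    """Recall a stored pattern from a (possibly corrupted) input x.

    Asynchronous updates: in each epoch every neuron is updated once,
    in random order, until no neuron changes (a local energy minimum).
    Assumes W is symmetric with zero diagonal, so W[:, i] @ y has no self-term.
    """
    rng = rng or np.random.default_rng()
    y = np.sign(x).astype(float)
    y[y == 0] = 1.0                        # f(0) = +1
    N = len(y)
    for _ in range(max_epochs):
        changed = False
        for i in rng.permutation(N):       # random order within each epoch
            new = 1.0 if W[:, i] @ y + b[i] >= 0 else -1.0
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:                    # converged: no flips in a full epoch
            break
    return y
```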
Evolution

• The energy is a quadratic function:

$$E = -\sum_{i,\ j<i} w_{ij} y_i y_j - \sum_i b_i y_i = -\frac{1}{2} y^T W y - b^T y$$

• But why doesn't it reach the global minimum?

• For a discrete Hopfield network, the energy contour is only defined on a lattice:
  • the corners of the unit hypercube $\{-1, 1\}^N$

28
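Because the discrete energy landscape lives only on the corners of $\{-1, 1\}^N$, it can be enumerated exhaustively for a very small network; the sketch below (NumPy; `all_energies` is a hypothetical helper) makes it easy to compare the global minimum against the fixed points that asynchronous updates actually reach.

```python
import numpy as np
from itertools import product

def all_energies(W, b):
    """Energy at every corner of {-1,+1}^N (feasible only for small N).

    Illustrates that the energy is defined on a lattice: asynchronous updates
    stop at a corner that is a local minimum, not necessarily the global one."""
    N = len(b)
    return {
        c: -0.5 * np.array(c) @ W @ np.array(c) - b @ np.array(c)
        for c in product([-1.0, 1.0], repeat=N)
    }
```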
Evolution
• If we use tanh for the activation:
  • Still not the global minimum, why?
  • Local minima still exist

• An example for a 2-neuron net
• Without bias, the local minima are symmetric, why?

$$-\frac{1}{2} y^T W y = -\frac{1}{2} (-y)^T W (-y)$$

29
Contents

• Discrete Hopfield Neural Networks


• Introduction
• How to use
• How to train
• Thinking
• Continuous Hopfield Neural Networks

30
Issues to be solved

• How to store a specific pattern?

• How many patterns can we store?

• How to “retrieve” patterns better?

31
How to store a specific pattern?

• For an image with N pixels, we need:
  • N neurons
  • N(N−1)/2 weights (symmetric)

• Consider the setting without bias:

$$E = -\sum_{i,\ j<i} w_{ij} y_i y_j$$

• Goal: design W so that the energy has a local minimum at the pattern $P = \{y_i\}$

32
Method1: Hebbian Learning

• We want:

$$f\Big(\sum_{j \ne i} w_{ji} y_j\Big) = y_i \quad \forall i$$

• Hebbian learning:

$$w_{ji} = y_j y_i$$

$$f\Big(\sum_{j \ne i} w_{ji} y_j\Big) = f\Big(\sum_{j \ne i} y_j y_i y_j\Big) = f\Big(\sum_{j \ne i} y_j^2\, y_i\Big) = f\big((N-1)\, y_i\big) = y_i$$

• The pattern is stationary

$$E = -\sum_{i,\ j<i} w_{ij} y_i y_j = -\frac{1}{2} N(N-1)$$
33
Method1: Hebbian Learning

• Note:
• If we store P, we will also store –P

• For K patterns:
• $y^k = \big(y_1^k, y_2^k, \dots, y_N^k\big), \quad k = 1, \dots, K$

• $w_{ij} = \frac{1}{N} \sum_k y_i^k y_j^k$
• Each pattern is stable

34
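A sketch of the K-pattern Hebbian rule above (assuming NumPy; patterns are the rows of a ±1 matrix, and `hebbian_weights` is an illustrative name):

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian rule for K patterns: w_ij = (1/N) * sum_k y_i^k y_j^k, with w_ii = 0.

    patterns : array of shape (K, N) with entries in {-1, +1}.
    """
    K, N = patterns.shape
    W = patterns.T @ patterns / N      # (1/N) * sum_k y^k (y^k)^T
    np.fill_diagonal(W, 0.0)           # no self-connections
    return W
```

As noted above, this rule also stores −P for every stored P, since it is quadratic in the pattern.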
Method1: Hebbian Learning - How many patterns can we store?

• A network of N neurons trained by Hebbian learning can store ~0.14N patterns with a low probability of error (< 0.4%)
  • Assume P(bit = 1) = 0.5
  • Patterns are orthogonal -- maximally distant
  • The maximum Hamming distance between two N-bit patterns is N/2 (because a pattern and its complement are equivalent: storing P also stores −P)
  • Two patterns that differ in N/2 bits are orthogonal

• The proof can be found in 11-785 CMU Lec 17

35
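One way to see the ~0.14N figure empirically is to store K random patterns with the Hebbian rule and measure how many stored bits remain stable under one update; the following rough sketch does that (NumPy; the function name and parameters are illustrative assumptions):

```python
import numpy as np

def stable_fraction(N=100, K=10, trials=20, rng=None):
    """Fraction of stored bits that are stable under one update when K random
    +/-1 patterns are stored with the Hebbian rule w_ij = (1/N) sum_k y_i^k y_j^k."""
    rng = rng or np.random.default_rng(0)
    ok, total = 0, 0
    for _ in range(trials):
        P = rng.choice([-1.0, 1.0], size=(K, N))
        W = P.T @ P / N
        np.fill_diagonal(W, 0.0)
        for y in P:
            # one synchronous update of every neuron, f(x) = +1 if x >= 0 else -1
            ok += np.sum(np.where(W @ y >= 0, 1.0, -1.0) == y)
            total += N
    return ok / total
```

As K/N grows past roughly 0.14, the fraction of unstable bits should rise sharply, which is the capacity limit quoted above.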
Method1: Hebbian Learning - Example: 4-bit pattern

• Left: stored pattern. Right: energy map


• Local minima exist

36
Method1: Hebbian Learning - Parasitic Patterns

• Parasitic patterns are spurious minima that were never deliberately stored

37
Method2: Geometric approach

• Consider $W = y y^T$, i.e., $w_{ji} = y_j y_i$

• W is a positive semidefinite matrix
• $E = -\frac{1}{2} y^T W y - b^T y$ is a quadratic function (concave, since W is PSD)

• But remember that y is constrained to the corners of the unit hypercube

38
Method2: Geometric approach

• Evolution of the network:


• Rotate y and project it onto the nearest corner.

39
Method2: Geometric approach

• Goal: design W such that $f(Wy) = y$

• Simple solution: y is an eigenvector of W
  • Note: the eigenvalues of W are non-negative
  • The eigenvectors of a symmetric matrix are orthogonal

• Storing K orthogonal patterns $Y = [y^1, y^2, \dots, y^K]$:
  • $W = Y \Lambda Y^T$
  • $\Lambda$ is a positive diagonal matrix $\mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_K)$
  • The Hebbian rule corresponds to $\lambda_k = 1$:
    • all patterns are equally important
40
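A sketch of this construction (assuming NumPy; the 1/N scaling and the zero diagonal are carried over from the Hebbian convention above, and `eigen_weights` is an illustrative name):

```python
import numpy as np

def eigen_weights(Y, lams=None):
    """Build W = Y diag(lams) Y^T for K mutually orthogonal patterns.

    Y : array of shape (N, K), columns are the patterns y^1 ... y^K.
    lams : K positive weights; lams = 1 for every pattern recovers the
           Hebbian rule (up to scaling), i.e. all patterns equally important.
    """
    N, K = Y.shape
    if lams is None:
        lams = np.ones(K)
    W = (Y * lams) @ Y.T / N      # Y diag(lams) Y^T, scaled by 1/N as before
    np.fill_diagonal(W, 0.0)      # no self-connections
    return W
```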
Method3: Optimization

• $E = -\frac{1}{2} y^T W y - b^T y$
  • This must be maximally low for target patterns
  • It must also be maximally high for all other patterns

• $W = \underset{W}{\mathrm{argmin}} \sum_{y \in Y_P} E(y) - \sum_{y \notin Y_P} E(y)$

  $Y_P$: the set of target patterns

41
Method3: Optimization

• $W = \underset{W}{\mathrm{argmin}} \sum_{y \in Y_P} E(y) - \sum_{y \notin Y_P} E(y)$

  • $Y_P$: the set of target patterns

• Intuitively:

42
Method3: Optimization

• $W = \underset{W}{\mathrm{argmin}} \sum_{y \in Y_P} E(y) - \sum_{y \notin Y_P} E(y)$

• So, gradient descent:
  • $W \leftarrow W + \alpha\Big(\sum_{y \in Y_P} y y^T - \sum_{y \notin Y_P} y y^T\Big)$

• Repeating a pattern can emphasize its importance

• What about $y \notin Y_P$?

43
Method3: Optimization

• $W \leftarrow W + \alpha\Big(\sum_{y \in Y_P} y y^T - \sum_{y \notin Y_P} y y^T\Big)$

• We only need to focus on the valleys (the local minima of E)
  • How to find valleys?
  • Randomly sample a state and let it evolve: it will fall into a valley

44
Method3: Optimization

• $W \leftarrow W + \alpha\Big(\sum_{y \in Y_P} y y^T - \sum_{y \notin Y_P,\ y\ \text{a valley}} y y^T\Big)$

• Initialize W
• Repeat until convergence or an iteration limit:
  • Sample a target pattern
  • Randomly initialize the network and let it evolve
  • Update the weights

45
Method3: Optimization

• $W \leftarrow W + \alpha\Big(\sum_{y \in Y_P} y y^T - \sum_{y \notin Y_P,\ y\ \text{a valley}} y y^T\Big)$

• Initialize W
• Repeat until convergence or an iteration limit:
  • Sample a target pattern
  • Initialize the network with the target pattern and let it evolve
  • Update the weights

46
Method3: Optimization
• $W \leftarrow W + \alpha\Big(\sum_{y \in Y_P} y y^T - \sum_{y \notin Y_P,\ y\ \text{a valley}} y y^T\Big)$

• Initialize W
• Repeat until convergence or an iteration limit:
  • Sample a target pattern
  • Initialize the network with the target pattern and let it evolve a few steps
  • Update the weights

47
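Putting the last variant together, a rough training-loop sketch might read as follows (NumPy, no bias term; the step counts, learning rate, and the name `train` are illustrative assumptions rather than values from the lecture):

```python
import numpy as np

def train(patterns, epochs=200, alpha=0.01, evolve_steps=4, rng=None):
    """Sketch of the optimization-based storage rule (third variant above):
    initialize at a target pattern, let the network evolve a few steps,
    then lower the energy at the target and raise it where the network lands."""
    rng = rng or np.random.default_rng()
    K, N = patterns.shape
    W = np.zeros((N, N))
    for _ in range(epochs):
        y_target = patterns[rng.integers(K)].astype(float)   # sample a target pattern
        y = y_target.copy()
        for _ in range(evolve_steps):                         # evolve a few steps
            for i in rng.permutation(N):
                y[i] = 1.0 if W[:, i] @ y >= 0 else -1.0
        # W <- W + alpha * (y_target y_target^T - y_valley y_valley^T)
        W += alpha * (np.outer(y_target, y_target) - np.outer(y, y))
        np.fill_diagonal(W, 0.0)
    return W
```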
Contents

• Discrete Hopfield Neural Networks


• Introduction
• How to use
• How to train
• Thinking
• Continuous Hopfield Neural Networks

48
Thinking
• The capacity of a Hopfield network
  • How many patterns can be stored?
  • Orthogonal: < N; non-orthogonal: ?

• Something bad happens:
  • When the noise increases…

49
Thinking
• Something bad happens:
• The results are not perfect…

50
Thinking
• Something bad happens:
• The results are not perfect…
• Because of the local minima

51
Thinking – Stochastic Hopfield Net
• Something bad happens:
• The results are not perfect…

• We can make the Hopfield net stochastic

• Each neuron responds probabilistically
  • If the energy difference is not large, the probability of flipping approaches 0.5
  • T is a “temperature” parameter

$$z_i = \frac{1}{T}\Big(\sum_{j \ne i} w_{ji} y_j + b_i\Big), \qquad P(y_i = 1) = \sigma(z_i), \qquad P(y_i = -1) = 1 - \sigma(z_i)$$
52
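A sketch of one stochastic sweep under this rule (assuming NumPy and a zero-diagonal symmetric `W`; `stochastic_update` is an illustrative name):

```python
import numpy as np

def stochastic_update(y, W, b, T=1.0, rng=None):
    """One asynchronous sweep of the stochastic Hopfield update.

    z_i = (1/T)(sum_{j != i} w_ji y_j + b_i), P(y_i = +1) = sigma(z_i).
    Larger T makes flips more random; as T -> 0 this recovers the deterministic rule.
    """
    rng = rng or np.random.default_rng()
    N = len(y)
    for i in rng.permutation(N):
        z = (W[:, i] @ y + b[i]) / T          # assumes zero diagonal in W
        p = 1.0 / (1.0 + np.exp(-z))          # sigma(z_i)
        y[i] = 1.0 if rng.random() < p else -1.0
    return y
```

To recall a memory one would run many such sweeps and, as the next slide notes, average the states over the final few iterations.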
Thinking – Stochastic Hopfield Nets

• What’s the final state? (How do we recall a memory?)


• The average of the final few iterations

53
Contents

• Discrete Hopfield Neural Networks


• Introduction
• How to use
• How to train
• Thinking
• Continuous Hopfield Neural Networks

54
Continuous Hopfield Neural Network

• Energy function:

• The output of each neuron is a real number in [−1, +1]

• Application: optimization (e.g., the Travelling Salesman Problem, TSP)

• Issues:
  • Designing the energy function for a specific problem
  • Mapping the variables of the problem to the neurons of the CHNN

55
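The slide's energy function is not reproduced here; purely as an illustration, one common formulation of continuous Hopfield dynamics (Hopfield, 1984) evolves an internal state $u_i$ with output $y_i = \tanh(u_i)$, which a simple Euler step can sketch as follows (all names, the step size, and the time constant are assumptions):

```python
import numpy as np

def chnn_step(u, W, b, dt=0.1, tau=1.0):
    """One Euler step of continuous Hopfield dynamics:
    du_i/dt = -u_i / tau + sum_j w_ji y_j + b_i,  with y_i = tanh(u_i).
    Outputs stay in (-1, +1); for symmetric W these dynamics decrease the CHNN energy."""
    y = np.tanh(u)
    du = -u / tau + W @ y + b
    return u + dt * du
```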
Reference
• CMU 11-785 Lec17, 18
Thanks

57
