11 Neural Nets (Annotated)
G. Lakemeyer
Some Terminology of Neural Networks
How a Unit Works
[Figure: a unit i. Input links deliver the activations a_j of the predecessor units, weighted by W_j,i; the input function sums them to in_i; the activation function g produces the output a_i, which is passed on over the output links.]

a_i := g(in_i) = g( Σ_j W_j,i × a_j )
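To make the formula concrete, here is a minimal sketch (not from the slides) of a single unit computing a_i = g(Σ_j W_j,i × a_j); the names unit_output, weights and activations are illustrative, and a step function is used as g.

def step(x, t=0.0):
    # step activation: fires (returns 1.0) iff the total input reaches the threshold t
    return 1.0 if x >= t else 0.0

def unit_output(weights, activations, g=step):
    # a_i = g(sum_j W_j,i * a_j)
    in_i = sum(w * a for w, a in zip(weights, activations))
    return g(in_i)

# Example: two incoming links with weights 0.5 and -0.3, both predecessors active
print(unit_output([0.5, -0.3], [1.0, 1.0]))   # in_i = 0.2 >= 0, so the unit outputs 1.0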
Activation Functions
Some important activation functions are:

step_t(x) = 1 if x ≥ t, and 0 otherwise (step function with threshold t)
sign(x) = +1 if x ≥ 0, and −1 otherwise (sign function)
s(x) = 1 / (1 + e^(−x)) (sigmoid), with derivative s'(x) = s(x) × (1 − s(x))

[Figure: plots of the step, sign, and sigmoid functions over the input in_i.]
Note: t in step_t represents a threshold, that is, the minimum total weighted input necessary for the neuron to fire (similar to actual neurons in the brain). Mathematically, one can always use a threshold of 0 by having an additional input with activation level a_0 = −1 and weight W_0,i = t. Then

a_i = step_t( Σ_{j=1..n} W_j,i × a_j ) = step_0( Σ_{j=0..n} W_j,i × a_j )
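This threshold trick can be checked directly. The following sketch (not from the slides) compares step_t on the original inputs with step_0 on the inputs extended by a_0 = −1 and W_0,i = t, for a few random weight vectors.

import random

def step(x, t=0.0):
    return 1 if x >= t else 0

random.seed(1)
for _ in range(5):
    t = random.uniform(-1, 1)
    W = [random.uniform(-1, 1) for _ in range(3)]
    a = [random.uniform(-1, 1) for _ in range(3)]
    lhs = step(sum(w * x for w, x in zip(W, a)), t)            # step_t over j = 1..n
    rhs = step(sum(w * x for w, x in zip([t] + W, [-1] + a)))  # step_0 with a_0 = -1, W_0,i = t
    print(lhs, rhs)                                            # the two values always agree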
Representing Boolean Functions
Inputs are 0 or 1.

[Figure: three single threshold units]
AND: two inputs, each with weight W = 1, threshold t = 1.5
OR: two inputs, each with weight W = 1, threshold t = 0.5
NOT: one input with weight W = −1, threshold t = −0.5 (maps 0 to 1 and 1 to 0)
Thus neural nets can at least represent any Boolean function.
[We will see below that they can represent much more.]
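As a quick check (a sketch, not part of the slides; threshold_unit is an illustrative helper name), the three units above can be evaluated on all inputs:

def threshold_unit(weights, t):
    # returns the Boolean function computed by a single unit with the given weights and threshold
    return lambda *inputs: 1 if sum(w * x for w, x in zip(weights, inputs)) >= t else 0

AND = threshold_unit([1, 1], t=1.5)
OR = threshold_unit([1, 1], t=0.5)
NOT = threshold_unit([-1], t=-0.5)

for a in (0, 1):
    print("NOT", a, "->", NOT(a))
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))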
Network Topologies
Recurrent Network Types – Hopfield Nets
Recurrent nets with symmetric bi-directional links (W_i,j = W_j,i).
All units are both input and output units.
Nets function as associative memory.
After training, a new input is matched against the “closest” example seen during training.
Theorem
A Hopfield net can reliably store 0.138 × N examples, where N is the number of units.
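A minimal sketch of the associative-memory behaviour (assuming ±1 unit states and Hebbian weight storage; these details are not given on the slide): stored patterns become stable states, and a corrupted input is pulled back to the closest stored pattern.

import numpy as np

def train_hopfield(patterns):
    # Hebbian storage: W = sum_p p p^T with zero diagonal; note W_i,j = W_j,i
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, state, steps=10):
    # repeatedly update all units with a sign activation
    s = state.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = train_hopfield(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])   # first pattern with the last unit flipped
print(recall(W, noisy))                  # recovers [ 1 -1  1 -1  1 -1]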
Recurrent Network Types – Boltzmann Machines
(G. Hinton)
State transitions are like simulated annealing search for the configuration
that best approximates the training set.
(Formally identical to a certain kind of belief network.)
Feed-Forward (FF) Nets
[Figure: a feed-forward network with input units I1, I2, hidden units H3, H4, output unit O5, and weights W1,3, W1,4, W2,3, W2,4, W3,5, W4,5.]

Hidden units: units without direct connection to the external environment. Hidden units are organized in one or more hidden layers.

Perceptrons: FF networks without hidden units.

Note (annotation): the activation function g must be non-linear.
When the topology and activation function g are fixed, the representable
functions have a specific parametrized form (parameters = weights).
Example:
a_5 = g( W_3,5 × a_3 + W_4,5 × a_4 ) = g( W_3,5 × g(W_1,3 × a_1 + W_2,3 × a_2) + W_4,5 × g(W_1,4 × a_1 + W_2,4 × a_2) )

⇒ Learning = search for the correct parameters = nonlinear regression.
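A small sketch (not from the slides) of this parametrized form for the 2-input, 2-hidden-unit, 1-output network, using a sigmoid as g; the weight values are made up for illustration.

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))   # sigmoid activation

def forward(a1, a2, W13, W23, W14, W24, W35, W45):
    # a5 = g(W35*a3 + W45*a4) with a3 = g(W13*a1 + W23*a2) and a4 = g(W14*a1 + W24*a2)
    a3 = g(W13 * a1 + W23 * a2)
    a4 = g(W14 * a1 + W24 * a2)
    return g(W35 * a3 + W45 * a4)

# arbitrary example weights; learning = searching for good values of these parameters
print(forward(1.0, 0.0, W13=0.4, W23=-0.2, W14=0.1, W24=0.7, W35=1.5, W45=-1.0))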
How to encode the input?
"Patrons
D local encoding for
1 input unit with 3 values
none = 0
Some =. 5
full = 1
© G. Lakemeyer
2) distributed : use
3 units for pations
Phone
psome
Pful
some is represented as
Phone =
O
Psomo = 1
AI/WS-2024/25 Pfull = O
12 / 32
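A sketch of the two encodings in code (the helper names local_encoding and distributed_encoding are illustrative, not from the slides):

PATRONS = ["None", "Some", "Full"]

def local_encoding(value):
    # one input unit whose activation level encodes the value
    return {"None": 0.0, "Some": 0.5, "Full": 1.0}[value]

def distributed_encoding(value):
    # three units P_none, P_some, P_full; exactly one of them is active
    return [1.0 if value == v else 0.0 for v in PATRONS]

print(local_encoding("Some"))         # 0.5
print(distributed_encoding("Some"))   # [0.0, 1.0, 0.0]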
Optimal Network structure?
Choosing the right network is a difficult problem!
Also the optimal net may be exponentially large (relative to the input).
Net is too small: the desired function is not representable.
Net is too big: the net memorizes the examples without generalization (analogous to memorization in decision trees) ⇒ overfitting.

There is no good theory of how to choose the right network, only some heuristics.

Optimal brain damage heuristic:
Start with a network with a maximal number of connections. After the first training pass, reduce the number of connections using information theory. Iterate.

Example: in a network to recognize postal zip codes, 3/4 of the initial connections could be removed.
(There are also methods to move from a network with few nodes to a network with
more nodes.)
Perceptrons
[Figure: (left) a perceptron network with input units I_j connected to output units O_i by weights W_j,i; (right) a single-output perceptron with input units I_j, weights W_j, and output unit O.]

O = step_0( Σ_j W_j × I_j ) = step_0( W · I )
Representing the Majority Function
Inputs I_j are 0 or 1. A perceptron with weight W_j = 1 for every input and threshold t = n/2 fires exactly when more than half of the I_j are 1.
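A quick check of this construction (a sketch, not from the slides), comparing the unit against the definition for n = 3:

from itertools import product

def majority_unit(inputs):
    # W_j = 1 for every input, threshold t = n/2
    n = len(inputs)
    return 1 if sum(inputs) >= n / 2 else 0

for bits in product([0, 1], repeat=3):
    assert majority_unit(bits) == (1 if sum(bits) > 3 / 2 else 0)
print("the unit computes the majority function for n = 3")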
How Do Perceptrons Learn?
function NEURAL-NETWORK-LEARNING(examples) returns network
   network ← a network with randomly assigned weights
   repeat
      for each e in examples do            /* one pass over all examples = one epoch */
         O ← NEURAL-NETWORK-OUTPUT(network, e)
         T ← the observed output values from e
         update the weights in network based on e, O, and T
      end
   until all examples correctly predicted or stopping criterion is reached
   return network
Error = T − O, with
O = predicted (actual) output,
T = correct output,
α = learning rate.

Update rule: W_j := W_j + α × I_j × Error

Intuition: if Error > 0 (the actual output is too small) and I_j > 0, the update increases W_j, so the new output g( Σ_j (W_j + α × I_j × Error) × I_j ) is larger than g( Σ_j W_j × I_j ), i.e. it moves toward T.
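A minimal sketch of this rule in code (not the slides' own implementation; the learning rate and the training data are chosen arbitrarily), training a perceptron on the OR function:

import random

def step0(x):
    return 1 if x >= 0 else 0

def train_perceptron(examples, n_inputs, alpha=0.1, epochs=50):
    # W_j := W_j + alpha * I_j * (T - O); weight 0 belongs to a fixed input a_0 = -1 encoding the threshold
    W = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(epochs):
        for inputs, T in examples:
            I = [-1] + list(inputs)
            O = step0(sum(w * i for w, i in zip(W, I)))   # predicted output
            error = T - O
            W = [w + alpha * i * error for w, i in zip(W, I)]
    return W

OR_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
W = train_perceptron(OR_examples, n_inputs=2)
for inputs, T in OR_examples:
    O = step0(sum(w * i for w, i in zip(W, [-1] + list(inputs))))
    print(inputs, "->", O, "(target:", T, ")")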
What can Perceptrons represent?
Answer: Not much!
Examples (inputs I1, I2 ∈ {0, 1}):

[Figure: the input space (I1, I2) for (a) I1 and I2, (b) I1 or I2, (c) I1 xor I2. For AND and OR a single straight line separates the inputs mapped to 1 from those mapped to 0; for XOR no such line exists.]

A perceptron can represent a Boolean function f(I1, ..., In) ∈ {0, 1} only if it is linearly separable, i.e. only if an (n − 1)-dimensional hyperplane separates the inputs mapped to 1 from those mapped to 0.

[Figure: the function of three inputs I1, I2, I3 that is 1 iff at most one input is 1, realized with weights W = −1 on each input and threshold t = −1.5: (a) the separating plane in the input space, (b) the weights and threshold.]
Why XOR is not Representable
Theorem
There is no perceptron using the step function as activation that can represent XOR.

Proof: Assume a perceptron with inputs x, y, weights W1, W2 and threshold t represents XOR, and write g = step_t. Consider the four input combinations:

x = 0, y = 0: XOR = 0, so g(0) = 0, hence 0 < t;
x = 1, y = 0: XOR = 1, so g(W1) = 1, hence W1 ≥ t;
x = 0, y = 1: XOR = 1, so g(W2) = 1, hence W2 ≥ t;
x = 1, y = 1: XOR = 0, so g(W1 + W2) = 0, hence W1 + W2 < t.

But W1 ≥ t and W2 ≥ t together with t > 0 give W1 + W2 ≥ 2t > t, contradicting W1 + W2 < t.
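The theorem can also be illustrated empirically (a sketch, not a proof: it only searches a grid of candidate parameters): no combination of weights and threshold on the grid reproduces XOR, while candidates for AND and OR are found easily.

import itertools

def represents(target, w1, w2, t):
    # does step_t(w1*x + w2*y) equal target(x, y) on all four inputs?
    return all((1 if w1 * x + w2 * y >= t else 0) == target(x, y)
               for x in (0, 1) for y in (0, 1))

grid = [i / 4 for i in range(-8, 9)]   # candidate values -2.0, -1.75, ..., 2.0

def solutions(target):
    return [(w1, w2, t) for w1, w2, t in itertools.product(grid, repeat=3)
            if represents(target, w1, w2, t)]

print("AND solutions:", len(solutions(lambda x, y: x & y)))   # > 0
print("OR solutions: ", len(solutions(lambda x, y: x | y)))   # > 0
print("XOR solutions:", len(solutions(lambda x, y: x ^ y)))   # 0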
Learning Behavior I: Majority function
[Figure: learning curves on the majority function: % correct on the test set (0.4 to 0.9) vs. training set size (0 to 100) for a perceptron and a decision tree; the perceptron learns the majority function better than the decision tree.]
Learning Behavior II: Restaurant Example
[Figure: learning curves in the restaurant domain: % correct on the test set (0.4 to 0.9) vs. training set size (0 to 100) for a perceptron and a decision tree; here the decision tree does better than the perceptron.]
Multi-layer Feed-Forward Networks
[Figure: a two-layer feed-forward network: input units I_k, weights W_k,j, hidden units a_j, weights W_j,i, output units O_i.]
Back-Propagation
function BACK-PROP-UPDATE(network, examples, α) returns a network with modified weights
   inputs: network, a multilayer network
           examples, a set of input/output pairs
           α, the learning rate
   repeat
      for each e in examples do
         /* Compute the output for this example */
         O ← RUN-NETWORK(network, I_e)
         /* Compute the error and Δ for units in the output layer */
         Err_e ← T_e − O

Err_e,i = T_e,i − O_i is the error of the i-th output unit; Err_e is the error vector.

Δ_i = Err_i × g'(in_i)

(The propagation of Δ back to the hidden layers and the resulting weight updates are given on the following slides.)
Some Intuition behind Backpropagation
If a_j is small, then changes of W_j,i have little effect on the output; the weight update is therefore proportional to a_j:

W_j,i := W_j,i + α × a_j × Δ_i

Each hidden unit j is held responsible for part of the error of the output units it feeds into, in proportion to the weights W_j,i:

Δ_j = g'(in_j) × Σ_i W_j,i × Δ_i
The Math behind Backpropagation
Let W be the vector of all weights. For one example with correct outputs T_i, the error is

E(W) = 1/2 × Σ_i ( T_i − g( Σ_j W_j,i × g( Σ_k W_k,j × I_k ) ) )²   (for a 2-layer FF network)

Differentiating with respect to the weights gives

∂E/∂W_j,i = −a_j × Δ_i, and similarly ∂E/∂W_k,j = −I_k × Δ_j.
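Putting the update rules together, here is a minimal sketch of back-propagation for a 2-layer FF network (assumed details: sigmoid activation, a single output unit, squared error; this is not the slides' own code), trained on XOR, which a single perceptron cannot represent:

import math, random

def g(x):  return 1.0 / (1.0 + math.exp(-x))    # sigmoid
def gp(x): return g(x) * (1.0 - g(x))           # g'(x) = g(x) * (1 - g(x))

random.seed(0)
n_in, n_hid, alpha = 2, 3, 0.5
Wkj = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]  # input -> hidden (incl. bias)
Wji = [random.uniform(-1, 1) for _ in range(n_hid + 1)]                         # hidden -> output (incl. bias)

examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # XOR

for _ in range(10000):
    for (x1, x2), T in examples:
        I = [x1, x2, 1.0]                                                  # inputs plus bias
        in_j = [sum(w * i for w, i in zip(Wkj[j], I)) for j in range(n_hid)]
        a_j = [g(v) for v in in_j] + [1.0]                                 # hidden activations plus bias
        in_i = sum(w * a for w, a in zip(Wji, a_j))
        O = g(in_i)
        delta_i = (T - O) * gp(in_i)                                       # output layer: Err_i * g'(in_i)
        delta_j = [gp(in_j[j]) * Wji[j] * delta_i for j in range(n_hid)]   # hidden: g'(in_j) * sum_i W_j,i * Delta_i
        Wji = [w + alpha * a * delta_i for w, a in zip(Wji, a_j)]          # W_j,i += alpha * a_j * Delta_i
        for j in range(n_hid):
            Wkj[j] = [w + alpha * i * delta_j[j] for w, i in zip(Wkj[j], I)]  # W_k,j += alpha * I_k * Delta_j

# outputs are typically close to the targets; backprop can occasionally get stuck in a local minimum
for (x1, x2), T in examples:
    a_j = [g(sum(w * i for w, i in zip(Wkj[j], [x1, x2, 1.0]))) for j in range(n_hid)] + [1.0]
    print((x1, x2), round(g(sum(w * a for w, a in zip(Wji, a_j))), 2), "target", T)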
Back-Propagation = Gradient Descent
Gradient Descent
is like Hill Climbing except that we are looking for minima instead of maxima.
[Figure: the error Err as a surface over the weight space (W1, W2); gradient descent moves downhill on this surface toward a minimum.]
Restaurant Example
[Figure: % correct on the test set (0.4 to 0.9) vs. training set size (0 to 100) for a multilayer network on the restaurant example.]
Restaurant Example
[Figure: total error on the training set (roughly 14 down to near 0) vs. number of epochs (0 to 400); the error decreases as training proceeds.]
Summary
[Figure: a network reading the text “T H I S I S A T E X T” character by character.] It uses 29 input units per character (for the 26 letters of the alphabet, blank, and punctuation).
Neural net to recognize handwritten numbers (zip codes) [Le Cun 89]
Input: 16 ⇥ 16 pixels per digit.
3 hidden layers: 768; 192; 30 units.
Output: 10 units for digits 0–9.
Limiting the number of connections was crucial:
each unit of the 1st layer was connected with a 5 ⇥ 5 array of the input (25
connections).
First layer was organized in 12 groups with 64 units each. All units of the same
group use identical weights.
The whole network used only 9760 different weights instead of 200,000.
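The effect of weight sharing can be illustrated with a back-of-the-envelope count for the first layer only (a sketch using the numbers given above: 12 groups of 64 units, each looking at a 5 × 5 patch; biases and the later layers are ignored, so the totals differ from the 9760 above):

groups = 12            # first layer: 12 groups ...
units_per_group = 64   # ... of 64 units each (12 * 64 = 768 units)
patch = 5 * 5          # each unit is connected to a 5 x 5 array of the input

without_sharing = groups * units_per_group * patch   # every unit has its own 25 weights
with_sharing = groups * patch                        # all units in a group share one set of 25 weights

print("first-layer weights without sharing:", without_sharing)   # 19200
print("first-layer weights with sharing:   ", with_sharing)      # 300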
LeNet was the First Convolutional Neural Net
[Figure: the LeNet architecture (annotation marks the 1st layer).]
A Typical Convolutional Neural Network (CNN) Today