Artificial Neural Networks and Deep Learning
Contents
• Introduction
  Motivation, Biological Background
• Threshold Logic Units
  Definition, Geometric Interpretation, Limitations, Networks of TLUs, Training
• General Neural Networks
  Structure, Operation, Training
• Multi-layer Perceptrons
  Definition, Function Approximation, Gradient Descent, Backpropagation, Variants, Sensitivity Analysis
• Deep Learning
  Many-layered Perceptrons, Rectified Linear Units, Auto-Encoders, Feature Construction, Image Analysis
• Radial Basis Function Networks
  Definition, Function Approximation, Initialization, Training, Generalized Version
• Self-Organizing Maps
  Definition, Learning Vector Quantization, Neighborhood of Output Neurons
• Hopfield Networks and Boltzmann Machines
  Definition, Convergence, Associative Memory, Solving Optimization Problems, Probabilistic Models
• Recurrent Neural Networks
  Differential Equations, Vector Networks, Backpropagation through Time
General (Artificial) Neural Networks
General Neural Networks
Basic graph-theoretic notions
A (directed) graph is a pair G = (V, E) consisting of a (finite) set V of vertices
or nodes and a (finite) set E ⊆ V × V of edges.
We call an edge e = (u, v) ∈ E directed from vertex u to vertex v.
Let G = (V, E) be a (directed) graph and u ∈ V a vertex.
Then the vertices of the set
pred(u) = { v ∈ V | (v, u) ∈ E }
are called the predecessors of the vertex u
and the vertices of the set
succ(u) = { v ∈ V | (u, v) ∈ E }
are called the successors of the vertex u.
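These definitions translate directly into code; here is a minimal Python sketch (an illustration, not part of the original material):

```python
# Predecessors and successors of a vertex, straight from the definitions.

def pred(V, E, u):
    """pred(u) = { v in V | (v, u) in E }"""
    return {v for v in V if (v, u) in E}

def succ(V, E, u):
    """succ(u) = { v in V | (u, v) in E }"""
    return {v for v in V if (u, v) in E}

# Example graph (the recurrent network discussed later in this section):
V = {"u1", "u2", "u3"}
E = {("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u3", "u1")}

print(pred(V, E, "u3"))   # {'u1', 'u2'}
print(succ(V, E, "u1"))   # {'u2', 'u3'}
```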
General Neural Networks
General definition of a neural network
An (artificial) neural network is a (directed) graph G = (U, C),
whose vertices u ∈ U are called neurons or units and
whose edges c ∈ C are called connections.
The set U of vertices is partitioned into
• the set U_in of input neurons,
• the set U_out of output neurons, and
• the set U_hidden of hidden neurons.
It holds that

U = U_in ∪ U_out ∪ U_hidden,
U_in ≠ ∅, U_out ≠ ∅, U_hidden ∩ (U_in ∪ U_out) = ∅.
General Neural Networks
Each connection (v, u) ∈ C possesses a weight w_uv and
each neuron u ∈ U possesses three (real-valued) state variables:
• the network input net_u,
• the activation act_u, and
• the output out_u.
Each input neuron u ∈ U_in also possesses a fourth (real-valued) state variable,
• the external input ext_u.
Furthermore, each neuron u ∈ U possesses three functions:
• the network input function f_net^(u) : IR^(2|pred(u)| + κ1(u)) → IR,
• the activation function f_act^(u) : IR^(κ2(u)) → IR, and
• the output function f_out^(u) : IR → IR,
which are used to compute the values of the state variables.
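As a minimal sketch of these definitions in code (illustrative; the extra parameters σ1, …, σl and θ1, …, θk are omitted for simplicity), each neuron carries its state variables, and the three functions are evaluated in the order net → act → out:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Neuron:
    net: float = 0.0   # network input  net_u
    act: float = 0.0   # activation     act_u
    out: float = 0.0   # output         out_u
    ext: float = 0.0   # external input ext_u (input neurons only)

def recompute(u: Neuron,
              f_net: Callable[[Sequence[float], Sequence[float]], float],
              f_act: Callable[[float], float],
              f_out: Callable[[float], float],
              weights: Sequence[float], inputs: Sequence[float]) -> None:
    """Evaluate the three functions in the order net -> act -> out."""
    u.net = f_net(weights, inputs)
    u.act = f_act(u.net)
    u.out = f_out(u.act)

# tiny demo with a threshold neuron (theta = 1):
u = Neuron()
recompute(u, lambda w, x: sum(wi * xi for wi, xi in zip(w, x)),
          lambda net: 1.0 if net >= 1.0 else 0.0, lambda a: a,
          weights=[-2.0, 3.0], inputs=[0.0, 1.0])
print(u.net, u.act, u.out)   # 3.0 1.0 1.0
```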
General Neural Networks
Types of (artificial) neural networks:
• If the graph of a neural network is acyclic,
it is called a feed-forward network.
• If the graph of a neural network contains cycles (backward connections),
it is called a recurrent network.
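Whether a given network is feed-forward can be decided by testing its graph for cycles, e.g. with a depth-first search. A sketch (not from the original material):

```python
def is_feed_forward(V, E):
    """True iff the directed graph (V, E) contains no cycle."""
    succs = {u: [v for (s, v) in E if s == u] for u in V}
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on DFS stack / finished
    color = {u: WHITE for u in V}

    def visit(u):
        color[u] = GRAY
        for v in succs[u]:
            if color[v] == GRAY:          # back edge -> cycle found
                return False
            if color[v] == WHITE and not visit(v):
                return False
        color[u] = BLACK
        return True

    return all(visit(u) for u in V if color[u] == WHITE)

# The example network below is recurrent:
V = {"u1", "u2", "u3"}
E = {("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u3", "u1")}
print(is_feed_forward(V, E))   # False: u3 -> u1 closes a cycle
```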
Representation of the connection weights as a matrix: number the neurons
u_1, …, u_n and collect the weights in an n × n matrix W, in which the entry
in row i and column j is the weight of the connection from neuron u_i to
neuron u_j (0 if there is no such connection).
General Neural Networks: Example
A simple recurrent neural network
[Network diagram: input x1 feeds neuron u1 and input x2 feeds neuron u2; connections u1 → u2 with weight 1, u1 → u3 with weight −2, u2 → u3 with weight 3, and a backward connection u3 → u1 with weight 4; the output y is read from u3.]
Weight matrix of this network (row = source neuron, column = target neuron):

      u1   u2   u3
u1  (  0    1   −2 )
u2  (  0    0    3 )
u3  (  4    0    0 )
Structure of a Generalized Neuron
A generalized neuron is a simple numeric processor
[Diagram of a generalized neuron: the predecessor outputs out_v1 = in_uv1, …, out_vn = in_uvn enter with weights w_uv1, …, w_uvn into the network input function f_net^(u) (parameters σ1, …, σl), yielding net_u; the activation function f_act^(u) (parameters θ1, …, θk) computes act_u from net_u; the output function f_out^(u) computes out_u from act_u; ext_u denotes the external input.]
General Neural Networks: Example
[Diagram: the same network as above, now with the threshold θ = 1 written into each of the neurons u1, u2, u3.]
f_net^(u)(→w_u, →in_u) = Σ_{v∈pred(u)} w_uv · in_uv = Σ_{v∈pred(u)} w_uv · out_v

f_act^(u)(net_u, θ) = { 1, if net_u ≥ θ,
                        0, otherwise.

f_out^(u)(act_u) = act_u
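Written out in Python, the three functions of this example might look as follows (an illustrative sketch):

```python
# The example's three functions: weighted sum, threshold, identity.

def f_net(weights, inputs):
    """net_u: sum of w_uv * in_uv over all predecessors v."""
    return sum(w * x for w, x in zip(weights, inputs))

def f_act(net, theta=1.0):
    """Threshold activation: 1 if net_u >= theta, else 0."""
    return 1.0 if net >= theta else 0.0

def f_out(act):
    """Identity output function."""
    return act

# Neuron u3 with predecessors u1 (weight -2) and u2 (weight 3),
# given out_u1 = 1 and out_u2 = 0:
print(f_out(f_act(f_net([-2.0, 3.0], [1.0, 0.0]))))   # net = -2 < 1 -> 0.0
```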
General Neural Networks: Example
Updating the activations of the neurons
              u1  u2  u3
input phase    1   0   0
work phase     1   0   0    net_u3 = −2 < 1
               0   0   0    net_u1 =  0 < 1
               0   0   0    net_u2 =  0 < 1
               0   0   0    net_u3 =  0 < 1
               0   0   0    net_u1 =  0 < 1
• Order in which the neurons are updated:
u3, u1, u2, u3, u1, u2, u3, . . .
• Input phase: activations/outputs in the initial state.
• Work phase: the activation/output of the neuron updated in each step (the one
  whose network input is shown on the right) is computed from the outputs of the
  other neurons and the weights/threshold.
• A stable state with a unique output is reached.
General Neural Networks: Example
Updating the activations of the neurons
              u1  u2  u3
input phase    1   0   0
work phase     1   0   0    net_u3 = −2 < 1
               1   1   0    net_u2 =  1 ≥ 1
               0   1   0    net_u1 =  0 < 1
               0   1   1    net_u3 =  3 ≥ 1
               0   0   1    net_u2 =  0 < 1
               1   0   1    net_u1 =  4 ≥ 1
               1   0   0    net_u3 = −2 < 1
• Order in which the neurons are updated:
u3, u2, u1, u3, u2, u1, u3, . . .
• No stable state is reached (oscillation of output).
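Both work phases can be reproduced with a few lines of code. The following sketch (illustrative; weights, thresholds, and update orders are those of the example) prints the state sequence for either order:

```python
# Asynchronous update of the example network for a given neuron order.

weights = {                # weights[u][v] = weight w_uv of connection v -> u
    "u1": {"u3": 4.0},
    "u2": {"u1": 1.0},
    "u3": {"u1": -2.0, "u2": 3.0},
}
theta = 1.0                # every neuron uses threshold 1

def work_phase(state, order):
    state = dict(state)    # leave the initial state untouched
    for u in order:
        net = sum(w * state[v] for v, w in weights[u].items())
        state[u] = 1.0 if net >= theta else 0.0
        print(f"update {u}: net = {net:g} -> state {tuple(state.values())}")
    return state

init = {"u1": 1.0, "u2": 0.0, "u3": 0.0}                      # input phase
work_phase(init, ["u3", "u1", "u2", "u3", "u1", "u2", "u3"])  # stable state
work_phase(init, ["u3", "u2", "u1", "u3", "u2", "u1", "u3"])  # oscillation
```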
General Neural Networks: Training
Definition of learning tasks for a neural network
A fixed learning task L_fixed for a neural network with
• n input neurons U_in = {u_1, …, u_n} and
• m output neurons U_out = {v_1, …, v_m}
is a set of training patterns l = (→i^(l), →o^(l)), each consisting of
• an input vector →i^(l) = (ext_u1^(l), …, ext_un^(l)) and
• an output vector →o^(l) = (o_v1^(l), …, o_vm^(l)).
A fixed learning task is solved if, for all training patterns l ∈ L_fixed, the neural
network computes, from the external inputs contained in the input vector →i^(l) of
a training pattern l, the outputs contained in the corresponding output vector →o^(l).
General Neural Networks: Training
Solving a fixed learning task: Error definition
• Measure how well a neural network solves a given fixed learning task.
• Compute differences between desired and actual outputs.
• Do not sum differences directly in order to avoid errors canceling each other.
• Squaring the differences has favorable properties for deriving the adaptation rules.
e = Σ_{l∈L_fixed} e^(l) = Σ_{v∈U_out} e_v = Σ_{l∈L_fixed} Σ_{v∈U_out} e_v^(l),

where e_v^(l) = (o_v^(l) − out_v^(l))².
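As a small illustration (the patterns below are made-up numbers, not from the slides), the error e can be computed directly from this definition:

```python
# Sum-of-squared-errors for a fixed learning task.
# Each pattern pairs the desired outputs o_v with the actual outputs out_v.

def error(patterns):
    """e = sum over patterns l and output neurons v of (o_v - out_v)^2."""
    return sum((o - out) ** 2
               for desired, actual in patterns
               for o, out in zip(desired, actual))

# two patterns for a hypothetical network with two output neurons:
patterns = [((1.0, 0.0), (0.8, 0.1)),   # (desired, actual)
            ((0.0, 1.0), (0.3, 0.9))]
print(error(patterns))   # 0.04 + 0.01 + 0.09 + 0.01, i.e. approx. 0.15
```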
General Neural Networks: Training
Definition of learning tasks for a neural network
A free learning task L_free for a neural network with
• n input neurons U_in = {u_1, …, u_n}
is a set of training patterns l = (→i^(l)), each consisting of
• an input vector →i^(l) = (ext_u1^(l), …, ext_un^(l)).
Properties:
• There is no desired output for the training patterns.
• Outputs can be chosen freely by the training method.
• Solution idea: similar inputs should lead to similar outputs
  (clustering of the input vectors).
General Neural Networks: Preprocessing
Normalization of the input vectors
• Compute expected value and (corrected) standard deviation for each input:
μ_k = (1 / |L|) Σ_{l∈L} ext_uk^(l)   and   σ_k = √( (1 / (|L| − 1)) Σ_{l∈L} (ext_uk^(l) − μ_k)² )
• Normalize the input vectors to expected value 0 and standard deviation 1:
ext_uk^(l)(new) = ( ext_uk^(l)(old) − μ_k ) / σ_k
• Such a normalization avoids unit and scaling problems.
It is also known as z-scaling or z-score standardization.
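A minimal sketch of this standardization in Python (illustrative; note the corrected denominator |L| − 1 in the standard deviation):

```python
import math

def standardize(column):
    """Normalize one input column to mean 0 and standard deviation 1."""
    n = len(column)
    mu = sum(column) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / (n - 1))
    return [(x - mu) / sigma for x in column]

print(standardize([2.0, 4.0, 6.0]))   # [-1.0, 0.0, 1.0]
```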