Artificial Neural Network
While dendrites branch into a network around the soma, the axon stretches
out to the dendrites and somas of other neurons.
The Artificial Neurons
A neuron i combines an adder function f(·) with an activation function a(·):

    f_i(\mathbf{x}) = \sum_{j=1}^{m} w_{ij} x_j - t_i

    a(f) = \begin{cases} 1 & \text{if } f \ge 0 \\ 0 & \text{otherwise} \end{cases}
Neuron
● The neuron is the basic information processing unit of a Neural Network. It consists of:
1. A set of links, describing the neuron inputs, with weights w_1, w_2, …, w_m
2. An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers):

    f = \sum_{i=1}^{m} w_i x_i
• Gaussian function

    a(f) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{1}{2} \frac{(f-\mu)^2}{\sigma^2} \right)
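To make the neuron model above concrete, here is a minimal Python/NumPy sketch of one neuron with the two activations seen so far (hard limiter and Gaussian); the weights, threshold, and Gaussian parameters are illustrative choices, not values from the slides.

```python
import numpy as np

def weighted_sum(x, w, t):
    """Adder function: f = sum_j w_j * x_j - t."""
    return np.dot(w, x) - t

def hard_limiter(f):
    """a(f) = 1 if f >= 0, else 0."""
    return 1.0 if f >= 0 else 0.0

def gaussian(f, mu=0.0, sigma=1.0):
    """Gaussian activation: a(f) = 1/(sqrt(2*pi)*sigma) * exp(-(f-mu)^2 / (2*sigma^2))."""
    return np.exp(-0.5 * ((f - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Illustrative 3-input neuron (made-up weights and threshold)
w, t = np.array([0.5, -0.2, 0.8]), 0.3
f = weighted_sum(np.array([1.0, 0.0, 1.0]), w, t)   # f = 1.0
print(hard_limiter(f), gaussian(f))                  # 1.0 and ~0.242
```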
What Can a Neuron Do?
• A hard limiter.
• A binary threshold unit.
• Hyperspace separation.

With two inputs x_1, x_2, weights w_1, w_2 and threshold t:

    f(\mathbf{x}) = w_1 x_1 + w_2 x_2 - t

    y = \begin{cases} 1 & \text{if } f(\mathbf{x}) \ge 0 \\ 0 & \text{otherwise} \end{cases}

[Figure: a two-input neuron and the line w_1 x_1 + w_2 x_2 = t separating the (x_1, x_2) plane.]
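As one example of hyperspace separation, a single neuron of this form realizes an AND gate; the weights w_1 = w_2 = 1 and threshold t = 1.5 below are an illustrative choice, not values given on the slide.

```python
def two_input_neuron(x1, x2, w1=1.0, w2=1.0, t=1.5):
    """y = 1 if w1*x1 + w2*x2 - t >= 0, else 0 (hard limiter)."""
    return 1 if w1 * x1 + w2 * x2 - t >= 0 else 0

# The line x1 + x2 = 1.5 separates (1,1) from the other three corners: AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", two_input_neuron(x1, x2))
```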
Artificial Neural Networks (ANN)
[Figure: the cost surface C(θ) over the parameter space, with points where ∇C(θ) ≈ 0 (plateau) and ∇C(θ) = 0 (saddle point, local minimum).]
Source: Hung-yi Lee, Deep Learning Tutorial
Back Propagation Algorithm for a Single-Layer Perceptron
• Step 0 – Initialize weights (w_0, w_1, …, w_n), m = 0, learning rate η, and threshold t
• Step 1 – Set m = m + 1
• Step 2 – Select pattern X_m
• Step 3 – Calculate the output:

    f(\mathbf{w}, \mathbf{X}) = \sum_i w_i X_i - t, \qquad o = a(f)

• Step 4 – Calculate the error (delta): δ = d − o
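A minimal sketch of this procedure in Python/NumPy. The slide text stops at the error computation, so the weight-update step below uses the usual delta rule Δw = η·δ·X_m as an assumed completion, and the AND function serves as made-up training data.

```python
import numpy as np

# Illustrative training set: the AND function (assumption, not from the slide)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)    # Step 0: initialize weights ...
t = 0.5            # ... threshold t
eta = 0.1          # ... and learning rate eta

for epoch in range(20):              # repeat the presentation of patterns
    for x_m, d in zip(X, D):         # Steps 1-2: next pattern X_m
        f = np.dot(w, x_m) - t       # Step 3: output
        o = 1.0 if f >= 0 else 0.0
        delta = d - o                # Step 4: error (delta)
        w += eta * delta * x_m       # assumed update: delta rule on the weights
        t -= eta * delta             # threshold updated as a bias term

print(w, t)   # weights and threshold that realize AND
```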
[Figure: neuron i receives inputs I_1, I_2, I_3 through weights w_i1, w_i2, w_i3; the weighted sum S_i, offset by the threshold t, passes through the activation function g(S_i) to give the output O_i.]

[Figure: a layered network with input layer (x_1, x_2, …, x_m), hidden layer, and output layer.]
How an MLP Works?
Example: XOR.
XOR is not linearly separable. Is a single-layer perceptron workable?

[Figure: the four XOR patterns in the (x_1, x_2) plane; no single line separates the two classes.]
How an MLP Works?
Example: XOR.
Two hidden neurons y_1 and y_2 implement the lines L_1 and L_2 over the inputs x_1, x_2 (with a bias input x_3 = 1).

[Figure: lines L_1 and L_2 drawn through the XOR patterns in the (x_1, x_2) plane.]
How an MLP Works?
Example: XOR.
The hidden outputs (y_1, y_2) map the input patterns into a space where a third line L_3 separates the two classes.

[Figure: left, lines L_1 and L_2 in the (x_1, x_2) plane; right, line L_3 in the (y_1, y_2) plane.]
How an MLP Works?
Example: XOR.
An output neuron z applies L_3 to the hidden outputs y_1, y_2 (with a bias y_3 = 1), on top of the hidden neurons computing L_1 and L_2 from x_1, x_2 (with a bias x_3 = 1).
Is the problem linearly separable in the (y_1, y_2) space?

[Figure: the two-layer network and the line L_3 in the (y_1, y_2) plane.]
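To make the L_1, L_2, L_3 construction concrete, here is a hand-set two-layer network with hard-limiter units; the particular weights and thresholds are an illustrative choice, since the slides only draw the lines.

```python
def step(f):
    """Hard limiter: 1 if f >= 0 else 0."""
    return 1.0 if f >= 0 else 0.0

def xor_mlp(x1, x2):
    # Hidden layer: L1 fires when at least one input is on, L2 only when both are.
    y1 = step(x1 + x2 - 0.5)   # line L1: x1 + x2 = 0.5
    y2 = step(x1 + x2 - 1.5)   # line L2: x1 + x2 = 1.5
    # Output layer: L3 separates (y1, y2) = (1, 0) from (0, 0) and (1, 1).
    return step(y1 - y2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", int(xor_mlp(a, b)))   # reproduces XOR
```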
Parity Problem

x_1 x_2 x_3 | output
 0   0   0  |   0
 0   0   1  |   1
 0   1   0  |   1
 0   1   1  |   0
 1   0   0  |   1
 1   0   1  |   0
 1   1   0  |   0
 1   1   1  |   1

[Figure: the eight patterns at the corners of the unit cube in (x_1, x_2, x_3) space.]
Parity Problem
Three separating planes P_1, P_2, P_3 are placed in the input cube.

[Figure: the same truth table next to the cube, with planes P_1, P_2, P_3 slicing between the vertices 000, 001, 011, and 111.]
Parity Problem
A hidden layer of three neurons P_1, P_2, P_3 over the inputs x_1, x_2, x_3 produces the outputs y_1, y_2, y_3.

[Figure: network x_1, x_2, x_3 → P_1, P_2, P_3 → y_1, y_2, y_3, next to the cube with the three planes.]
Parity Problem
In the (y_1, y_2, y_3) space the two parity classes become separable by a single plane P_4.

[Figure: plane P_4 in the (y_1, y_2, y_3) space, next to the cube with planes P_1, P_2, P_3.]
Parity Problem
An output neuron P_4 combines y_1, y_2, y_3 into the final output z, so one hidden layer solves the 3-bit parity problem.

[Figure: the complete network x_1, x_2, x_3 → P_1, P_2, P_3 → P_4 → z.]
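The same construction written out for the parity network above, under the common assumption that P_1, P_2, P_3 are the planes x_1 + x_2 + x_3 = 0.5, 1.5, 2.5 and that P_4 alternates over their outputs; the slides draw the planes but do not list weights.

```python
def step(f):
    return 1.0 if f >= 0 else 0.0

def parity3(x1, x2, x3):
    s = x1 + x2 + x3
    # Hidden layer P1, P2, P3: how many of the three planes the input lies above.
    y1 = step(s - 0.5)   # at least one input is 1
    y2 = step(s - 1.5)   # at least two inputs are 1
    y3 = step(s - 2.5)   # all three inputs are 1
    # Output neuron P4: fires when the count is odd, i.e. (y1, y2, y3) = (1,0,0) or (1,1,1).
    return step(y1 - y2 + y3 - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            print(x1, x2, x3, "->", int(parity3(x1, x2, x3)))   # matches the truth table
```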
General Problem
Hyperspace Partition

[Figure: three lines L_1, L_2, L_3 partition the input space into regions.]
Region Encoding

[Figure: each region is labeled by the binary code produced by L_1, L_2, L_3: 000, 001, 010, 100, 101, 110, 111.]
Hyperspace Partition & Region Encoding Layer

[Figure: a first layer of neurons L_1, L_2, L_3 over the inputs x_1, x_2, x_3 outputs the code of the region containing the input.]
Region Identification Layer

[Figures: a second layer allocates one neuron per region; successive slides show the neurons that identify the regions 101, 001, 000, 110, 010, 100, and 111 from the code produced by L_1, L_2, L_3.]
Classification

[Figure: an output layer combines the region-identification neurons, assigning each of the regions 101, 001, 000, 110, 010, 100, 111 to class 0 or 1.]
Feed-Forward Neural Networks

[Figure: a feed-forward network with input layer (x_1, x_2, …, x_m), hidden layer, and output layer producing o_1, o_2, …, o_n, compared with the desired outputs d_1, d_2, …, d_n.]
Supervised Learning
Training set:

    T = \{ (\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(p)}, \mathbf{d}^{(p)}) \}

Error on pattern l:

    E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \left( d_j^{(l)} - o_j^{(l)} \right)^2

Goal: minimize E = \sum_{l=1}^{p} E^{(l)}
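A direct transcription of these definitions; the toy outputs and targets are made up purely for illustration.

```python
import numpy as np

def pattern_error(d, o):
    """E^(l) = 1/2 * sum_j (d_j - o_j)^2 for one training pattern."""
    return 0.5 * np.sum((d - o) ** 2)

# Made-up outputs o^(l) and desired outputs d^(l) for p = 2 patterns
outputs = [np.array([0.8, 0.2]), np.array([0.1, 0.9])]
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

E = sum(pattern_error(d, o) for d, o in zip(targets, outputs))   # E = sum_l E^(l)
print(E)   # 0.05
```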
Back Propagation Learning Algorithm

    E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \left( d_j^{(l)} - o_j^{(l)} \right)^2, \qquad E = \sum_{l=1}^{p} E^{(l)}

The derivation is split into two parts:
• Learning on output neurons
• Learning on hidden neurons

[Figure: the network with outputs o_1, o_2, …, o_n and desired values d_1, d_2, …, d_n; the error flows back from the output layer toward the inputs x_1, x_2, …, x_m.]
Learning on Output Neurons

    E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \left( d_j^{(l)} - o_j^{(l)} \right)^2, \qquad E = \sum_{l=1}^{p} E^{(l)}

For an output neuron j, apply the chain rule through its output o_j^{(l)}:

    \frac{\partial E^{(l)}}{\partial net_j^{(l)}} = \frac{\partial E^{(l)}}{\partial o_j^{(l)}} \, \frac{\partial o_j^{(l)}}{\partial net_j^{(l)}}

The first factor is -(d_j^{(l)} - o_j^{(l)}); the second depends on the activation function.
Activation Function — Sigmoid

    y = a(net) = \frac{1}{1 + e^{-net}}

    a'(net) = \frac{\partial y}{\partial net} = \frac{e^{-net}}{\left( 1 + e^{-net} \right)^2} = y(1 - y)

[Figure: the sigmoid curve, rising from 0 through 0.5 at net = 0 toward 1.]
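A small sketch of the sigmoid and the y(1 − y) form of its derivative:

```python
import numpy as np

def sigmoid(net):
    """y = a(net) = 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_prime(net):
    """a'(net) = y * (1 - y), the identity used on the slide."""
    y = sigmoid(net)
    return y * (1.0 - y)

print(sigmoid(0.0), sigmoid_prime(0.0))   # 0.5 and 0.25
```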
Using the sigmoid, \partial o_j^{(l)} / \partial net_j^{(l)} = o_j^{(l)} \left( 1 - o_j^{(l)} \right), so

    \frac{\partial E^{(l)}}{\partial net_j^{(l)}} = -\left( d_j^{(l)} - o_j^{(l)} \right) o_j^{(l)} \left( 1 - o_j^{(l)} \right)
Learning on Output Neurons
Define the output-layer delta:

    \delta_j^{(l)} \equiv -\frac{\partial E^{(l)}}{\partial net_j^{(l)}} = \left( d_j^{(l)} - o_j^{(l)} \right) o_j^{(l)} \left( 1 - o_j^{(l)} \right)
Learning on Output Neurons
How does the error depend on a weight w_{ji} connecting hidden neuron i to output neuron j?

    \frac{\partial E}{\partial w_{ji}} = \sum_{l=1}^{p} \frac{\partial E^{(l)}}{\partial net_j^{(l)}} \, \frac{\partial net_j^{(l)}}{\partial w_{ji}} = -\sum_{l=1}^{p} \delta_j^{(l)} o_i^{(l)}

Gradient-descent weight update:

    \Delta w_{ji} = \eta \sum_{l=1}^{p} \delta_j^{(l)} o_i^{(l)} = \eta \sum_{l=1}^{p} \left( d_j^{(l)} - o_j^{(l)} \right) o_j^{(l)} \left( 1 - o_j^{(l)} \right) o_i^{(l)}
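In batch form, the output-layer delta and weight update translate to the sketch below; the array shapes (p patterns, h hidden units, n outputs) are assumptions made for illustration.

```python
import numpy as np

def output_layer_update(W, O_hidden, O_out, D, eta):
    """Update the hidden-to-output weights W (shape n x h).

    O_hidden: (p, h) hidden outputs o_i^(l); O_out: (p, n) outputs o_j^(l);
    D: (p, n) desired outputs d_j^(l); eta: learning rate.
    """
    delta_out = (D - O_out) * O_out * (1.0 - O_out)   # delta_j^(l), shape (p, n)
    dW = eta * delta_out.T @ O_hidden                 # Delta w_ji = eta * sum_l delta_j^(l) o_i^(l)
    return W + dW, delta_out
```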
Learning on Hidden Neurons

    E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \left( d_j^{(l)} - o_j^{(l)} \right)^2, \qquad E = \sum_{l=1}^{p} E^{(l)}

For a weight w_{ik} connecting neuron k to hidden neuron i:

    \frac{\partial E}{\partial w_{ik}} = \sum_{l=1}^{p} \frac{\partial E^{(l)}}{\partial w_{ik}} = \sum_{l=1}^{p} \frac{\partial E^{(l)}}{\partial net_i^{(l)}} \, \frac{\partial net_i^{(l)}}{\partial w_{ik}}

Both factors remain to be evaluated.

[Figure: layers k → i → j with weights w_{ik} and w_{ji}.]
Learning on Hidden Neurons
The second factor is immediate:

    \frac{\partial net_i^{(l)}}{\partial w_{ik}} = o_k^{(l)}
Learning on Hidden Neurons
The first factor splits through the hidden output o_i^{(l)}:

    \frac{\partial E^{(l)}}{\partial net_i^{(l)}} = \frac{\partial E^{(l)}}{\partial o_i^{(l)}} \, \frac{\partial o_i^{(l)}}{\partial net_i^{(l)}}, \qquad \frac{\partial o_i^{(l)}}{\partial net_i^{(l)}} = o_i^{(l)} \left( 1 - o_i^{(l)} \right)

The remaining term \partial E^{(l)} / \partial o_i^{(l)} is still unknown.
Learning on Hidden Neurons
The hidden output o_i^{(l)} influences E^{(l)} through every output neuron j it feeds:

    \frac{\partial E^{(l)}}{\partial o_i^{(l)}} = \sum_{j} \frac{\partial E^{(l)}}{\partial net_j^{(l)}} \, \frac{\partial net_j^{(l)}}{\partial o_i^{(l)}} = -\sum_{j} \delta_j^{(l)} w_{ji}

Define the hidden-layer delta:

    \delta_i^{(l)} \equiv -\frac{\partial E^{(l)}}{\partial net_i^{(l)}} = o_i^{(l)} \left( 1 - o_i^{(l)} \right) \sum_{j} w_{ji} \, \delta_j^{(l)}
Learning on Hidden Neurons

    \delta_i^{(l)} = o_i^{(l)} \left( 1 - o_i^{(l)} \right) \sum_{j} w_{ji} \, \delta_j^{(l)}

Putting the pieces together:

    \frac{\partial E}{\partial w_{ik}} = -\sum_{l=1}^{p} \delta_i^{(l)} o_k^{(l)}

Gradient-descent weight update:

    \Delta w_{ik} = \eta \sum_{l=1}^{p} \delta_i^{(l)} o_k^{(l)}
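The matching hidden-layer step, reusing the output deltas; shapes are again illustrative assumptions (m inputs feeding h hidden units).

```python
import numpy as np

def hidden_layer_update(W_hid, W_out, O_in, O_hidden, delta_out, eta):
    """Update the input-to-hidden weights W_hid (shape h x m).

    W_out: (n, h) hidden-to-output weights w_ji; O_in: (p, m) inputs o_k^(l);
    O_hidden: (p, h) hidden outputs o_i^(l); delta_out: (p, n) output deltas.
    """
    # delta_i^(l) = o_i (1 - o_i) * sum_j w_ji * delta_j^(l)
    delta_hid = O_hidden * (1.0 - O_hidden) * (delta_out @ W_out)
    dW = eta * delta_hid.T @ O_in          # Delta w_ik = eta * sum_l delta_i^(l) o_k^(l)
    return W_hid + dW
```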
Back Propagation

[Figure: the full network with inputs x_1, …, x_m, hidden layers k and i, and output layer j producing o_1, …, o_j, …, o_n, compared with d_1, …, d_j, …, d_n.]
Back Propagation
At the output layer:

    \delta_j^{(l)} = -\frac{\partial E^{(l)}}{\partial net_j^{(l)}} = \left( d_j^{(l)} - o_j^{(l)} \right) o_j^{(l)} \left( 1 - o_j^{(l)} \right), \qquad \Delta w_{ji} = \eta \sum_{l=1}^{p} \delta_j^{(l)} o_i^{(l)}
Back Propagation
At the output layer:

    \delta_j^{(l)} = \left( d_j^{(l)} - o_j^{(l)} \right) o_j^{(l)} \left( 1 - o_j^{(l)} \right), \qquad \Delta w_{ji} = \eta \sum_{l=1}^{p} \delta_j^{(l)} o_i^{(l)}

At the hidden layer:

    \delta_i^{(l)} = o_i^{(l)} \left( 1 - o_i^{(l)} \right) \sum_{j} w_{ji} \, \delta_j^{(l)}, \qquad \Delta w_{ik} = \eta \sum_{l=1}^{p} \delta_i^{(l)} o_k^{(l)}

The deltas are computed layer by layer, from the output back toward the input.
Multilayer Neural Network

[Figure: neurons i → j → k with weights w_{ji} and w_{kj} and outputs o_i, o_j, o_k.]

    \delta_k = (d_k - o_k) \, o_k (1 - o_k)

    \delta_j = o_j (1 - o_j) \sum_{k} w_{kj} \, \delta_k
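Putting the two delta formulas together, a minimal end-to-end backpropagation sketch with one hidden layer, trained on XOR as illustrative data; the hidden size, learning rate, epoch count, and random seed are arbitrary choices, not values from the slides.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
D = np.array([[0], [1], [1], [0]], dtype=float)               # desired outputs

W1 = rng.normal(size=(2, 3))   # input -> hidden weights
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))   # hidden -> output weights
b2 = np.zeros(1)
eta = 0.5

for epoch in range(10000):
    # Forward pass
    O_hid = sigmoid(X @ W1 + b1)        # hidden outputs o_i
    O_out = sigmoid(O_hid @ W2 + b2)    # network outputs o_j

    # Backward pass: the two deltas from the slides
    delta_out = (D - O_out) * O_out * (1 - O_out)          # delta_j
    delta_hid = O_hid * (1 - O_hid) * (delta_out @ W2.T)   # delta_i

    # Gradient-descent updates, summed over the patterns
    W2 += eta * O_hid.T @ delta_out
    b2 += eta * delta_out.sum(axis=0)
    W1 += eta * X.T @ delta_hid
    b1 += eta * delta_hid.sum(axis=0)

print(np.round(O_out, 2))   # typically close to [[0], [1], [1], [0]]
```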