Multi-Layer Perceptron
Logistic Regression
• Logistic Regression: a linear classifier for binary classification
• Let's extend Logistic Regression to handle:
  • multiple classes
  • any type of decision boundary
Logistic Regression
• Linear Boundary
Logistic Regression
• Nonlinear transform of s: the sigmoid (logistic) function \sigma(s) = \frac{1}{1 + e^{-s}}
• Graphical Representation
Logistic Regression
Perceptron and Neuron
• The perceptron is a mathematical model of a biological neuron.
• In actual neurons, the dendrites receive electrical signals from the axons of other neurons.
• In the perceptron, these electrical signals are represented as numerical values.
https://wp.nyu.edu/shanghai-ima-documentation/electives/aiarts/ece265/the-neural-network-nn-and-the-biological-neural-network-bnn-erdembileg-chin-erdene/
Perceptron
• Perceptron
• First function: weighted summation of the inputs, followed by a hard threshold:

  y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ 0 & \text{otherwise} \end{cases}
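A minimal Python sketch of the two steps a perceptron performs (weighted summation, then a hard threshold); the function name and the convention that x[0] = 1 acts as the constant bias input are illustrative assumptions, not from the slides.

  import numpy as np

  def perceptron(x, w):
      """Weighted sum of the inputs followed by a hard threshold.
      x and w have the same length; x[0] is the constant bias input 1."""
      s = np.dot(w, x)          # s = sum_{i=0}^{n} w_i * x_i
      return 1 if s > 0 else 0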
Perceptron
• What a perceptron does
Perceptron
• What a perceptron can do
• AND operation
Perceptron
• What a perceptron can do – cont'd
• OR operation
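Continuing the sketch above, one possible choice of weights realizes AND and OR; the particular bias values (-1.5 and -0.5) are the ones reused on the XOR slide later, though any weights with the same sign pattern would work.

  # Possible weight choices (bias weight first), reusing perceptron() from above.
  w_and = [-1.5, 1.0, 1.0]   # output 1 only when both inputs are 1
  w_or  = [-0.5, 1.0, 1.0]   # output 1 when at least one input is 1

  for x1 in (0, 1):
      for x2 in (0, 1):
          x = [1, x1, x2]    # leading 1 is the bias input
          print(x1, x2, "AND:", perceptron(x, w_and), "OR:", perceptron(x, w_or))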
Perceptron
Multilayer Perceptron
Multilayer Perceptron
• Graphical representation is preferred
• Mathematical form of the output
Multilayer Perceptron
Multilayer Perceptron
• Artificial Neural Network
• AI tools inspired by biological brains
• It can learn anything!!
• What a multilayer perceptron can do
Multilayer Perceptron
• What a multilayer perceptron can do
Multilayer Perceptron
• Example: What a neural network can do
• A neural network can solve non-linearly separable problems
Multilayer Perceptron
Multilayer Perceptron
• What a neural network can do – cont'd
• XOR operation

Hidden unit 1 (w11 = 1.0, w12 = 1.0, w13 = -1.5):
  x1  x2  S     y1
  0   0   -1.5  0
  0   1   -0.5  0
  1   0   -0.5  0
  1   1    0.5  1

Hidden unit 2 (w21 = 1.0, w22 = 1.0, w23 = -0.5):
  x1  x2  S     y2
  0   0   -0.5  0
  0   1    0.5  1
  1   0    0.5  1
  1   1    1.5  1

Output unit (w31 = -1.0, w32 = 1.0, w33 = -0.5):
  y1  y2  S     y
  0   0   -0.5  0
  0   1    0.5  1
  0   1    0.5  1
  1   1   -0.5  0
Multilayer Perceptron
• What a perceptron can do – cont'd
• XOR operation – cont'd
• The output unit computes "some kind of AND" operation on the hidden outputs y1 and y2
[Figure: two-layer network with inputs x1, x2 and constant bias inputs 1; hidden weights w11, w12, w13 and w21, w22, w23; output weights w31, w32, w33; units f]
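Under the weights above (with w13, w23, w33 acting as bias weights), the whole XOR network can be written out directly; a small Python sketch with a step activation:

  def step(s):
      return 1 if s > 0 else 0

  def xor_net(x1, x2):
      """Two-layer perceptron computing XOR with the weights from the slides."""
      y1 = step(1.0 * x1 + 1.0 * x2 - 1.5)    # hidden unit 1: AND-like
      y2 = step(1.0 * x1 + 1.0 * x2 - 0.5)    # hidden unit 2: OR-like
      y  = step(-1.0 * y1 + 1.0 * y2 - 0.5)   # output: "some kind of AND" (y2 AND NOT y1)
      return y

  for x1 in (0, 1):
      for x2 in (0, 1):
          print(x1, x2, "->", xor_net(x1, x2))   # prints the XOR truth table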
Learning Algorithm
• We have a data set of n samples, each with m input values and a target:
  (x11, x12, …, x1m, y1)
  (x21, x22, …, x2m, y2)
  …
  (xn1, xn2, …, xnm, yn)
Learning Algorithm
• Step 1: Fix the network structure: m inputs x1, …, xm, k hidden units h1, …, hk, one output y, with weights wij
• Step 2: Learn the weights
  • How many weights? (m+1)*k + (k+1)
  • How?
    • Define an error function
    • Find the weights which minimize the error
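As a quick check of this count, take the XOR network used in the later example (m = 2 inputs, k = 2 hidden units, one output):

  (m + 1) \cdot k + (k + 1) = (2 + 1) \cdot 2 + (2 + 1) = 9

which matches the nine weights w1, …, w9 in that network.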
Learning Algorithm
• Error Function

  E(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{n} \left( N(\mathbf{w}, \mathbf{x}_i) - y_i \right)^2

• So, we need to evaluate \frac{\partial E_n}{\partial \mathbf{w}}
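This gradient drives the standard gradient-descent update that the following slides instantiate layer by layer, where η is the learning rate:

  \mathbf{w} \leftarrow \mathbf{w} - \eta \frac{\partial E}{\partial \mathbf{w}}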
Error Back Propagation
• Recall the error function
• Training sample: D_n = (x_{n1}, …, x_{nd}, t_{n1}, …, t_{nm}); network output: (o_{n1}, …, o_{nm})
• Network: d inputs x_{ni}, p hidden units h_{nj} (input-to-hidden weights w_{ji}), m outputs o_{nk} (hidden-to-output weights w_{kj})

  o_{nk} = \frac{1}{1 + \exp\left(-\left(w_{k0} + \sum_{j=1}^{p} w_{kj} h_{nj}\right)\right)}

  h_{nj} = \frac{1}{1 + \exp\left(-\left(w_{j0} + \sum_{i=1}^{d} w_{ji} x_{ni}\right)\right)}

  E_n(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{m} (t_{nk} - o_{nk})^2

• Substituting h_{nj} and o_{nk} into E_n gives

  E_n(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{m} \left( t_{nk} - \frac{1}{1 + \exp\left(-\left(w_{k0} + \sum_{j=1}^{p} w_{kj}\, \frac{1}{1 + \exp\left(-\left(w_{j0} + \sum_{i=1}^{d} w_{ji} x_{ni}\right)\right)}\right)\right)} \right)^2

• So, you can evaluate \frac{\partial E_n}{\partial w_{ji}} and \frac{\partial E_n}{\partial w_{kj}}
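As a minimal sketch (not the slides' own code), the forward pass and per-sample error in NumPy, assuming weight matrices whose first column holds the bias weights w_{j0} and w_{k0}:

  import numpy as np

  def sigmoid(s):
      return 1.0 / (1.0 + np.exp(-s))

  def forward(x, W_hid, W_out):
      """Forward pass for one sample x (length d).
      W_hid has shape (p, d+1): rows are hidden units, column 0 is the bias w_j0.
      W_out has shape (m, p+1): rows are output units, column 0 is the bias w_k0.
      Returns the hidden activations h_nj and the outputs o_nk."""
      h = sigmoid(W_hid @ np.concatenate(([1.0], x)))   # h_nj
      o = sigmoid(W_out @ np.concatenate(([1.0], h)))   # o_nk
      return h, o

  def sample_error(o, t):
      """E_n(w) = 1/2 * sum_k (t_nk - o_nk)^2"""
      return 0.5 * np.sum((t - o) ** 2)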
Error Back Propagation
• Recall the error function
• The fully expanded E_n(\mathbf{w}) above is too complex to differentiate directly, so we apply the chain rule layer by layer
Error Back Propagation
• Case 1: Weights w_{kj} between the output and hidden layers

  E_n(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{m} (t_{nk} - o_{nk})^2

[Figure: network with inputs x_{ni}, hidden units h_{nj}, and outputs o_{nk}; w_{kj} connects hidden unit j to output unit k]
Error Back Propagation
• Case 1: Weights between the output and hidden layers (η is the learning rate)

  \Delta w_{kj} = -\eta \frac{\partial E_n}{\partial w_{kj}}

  \frac{\partial E_n}{\partial w_{kj}} = \frac{\partial E_n}{\partial net_{nk}} \frac{\partial net_{nk}}{\partial w_{kj}} = \frac{\partial E_n}{\partial o_{nk}} \frac{\partial o_{nk}}{\partial net_{nk}} \frac{\partial net_{nk}}{\partial w_{kj}} = \frac{\partial E_n}{\partial o_{nk}} \frac{\partial o_{nk}}{\partial net_{nk}} h_{nj}

  where
  net_{nk} = h_{n1} w_{k1} + h_{n2} w_{k2} + \cdots + h_{np} w_{kp}
  o_{nk} = \mathrm{sigmoid}(net_{nk})
  E_n(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{m} (t_{nk} - o_{nk})^2
Error Back Propagation
• Case 1: Weights between the output and hidden layers – cont'd

  \frac{\partial E_n}{\partial w_{kj}} = -(t_{nk} - o_{nk})\, o_{nk} (1 - o_{nk})\, h_{nj}

  \frac{\partial E}{\partial w_{kj}} = \sum_{n=1}^{N} \frac{\partial E_n}{\partial w_{kj}} = -\sum_{n=1}^{N} (t_{nk} - o_{nk})\, o_{nk} (1 - o_{nk})\, h_{nj}

  \Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}} = \eta \sum_{n=1}^{N} (t_{nk} - o_{nk})\, o_{nk} (1 - o_{nk})\, h_{nj}
Error Back Propagation
• Case 2: Weights w_{ji} between the hidden and input layers

  \frac{\partial E_n}{\partial w_{ji}} = -x_{ni}\, h_{nj} (1 - h_{nj}) \sum_{k=1}^{m} w_{kj} (t_{nk} - o_{nk})\, o_{nk} (1 - o_{nk})

  \frac{\partial E}{\partial w_{ji}} = \sum_{n=1}^{N} \frac{\partial E_n}{\partial w_{ji}} = -\sum_{n=1}^{N} x_{ni}\, h_{nj} (1 - h_{nj}) \sum_{k=1}^{m} w_{kj} (t_{nk} - o_{nk})\, o_{nk} (1 - o_{nk})

  \Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} = \eta \sum_{n=1}^{N} x_{ni}\, h_{nj} (1 - h_{nj}) \sum_{k=1}^{m} w_{kj} (t_{nk} - o_{nk})\, o_{nk} (1 - o_{nk})
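A per-sample gradient sketch that follows the two cases above, reusing forward() from the earlier sketch; the vectorized form and variable names are assumptions for illustration.

  def gradients(x, t, W_hid, W_out):
      """Per-sample gradients dE_n/dw for both weight layers.
      Case 1 (output layer): dE_n/dw_kj = -(t_nk - o_nk) o_nk (1 - o_nk) h_nj
      Case 2 (hidden layer): dE_n/dw_ji = -x_ni h_nj (1 - h_nj) sum_k w_kj (t_nk - o_nk) o_nk (1 - o_nk)
      Bias terms are handled by a constant 1 prepended to x and h."""
      h, o = forward(x, W_hid, W_out)
      x1 = np.concatenate(([1.0], x))                     # input vector with bias entry
      h1 = np.concatenate(([1.0], h))                     # hidden vector with bias entry

      delta_k = -(t - o) * o * (1 - o)                    # dE_n / dnet_nk
      grad_out = np.outer(delta_k, h1)                    # Case 1: dE_n / dw_kj
      delta_j = h * (1 - h) * (W_out[:, 1:].T @ delta_k)  # dE_n / dnet_nj
      grad_hid = np.outer(delta_j, x1)                    # Case 2: dE_n / dw_ji
      return grad_hid, grad_out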
Error Back Propagation
• Weights between deep layers
Error Back Propagation
• Weights between deep layers
  \frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial net_{nk}} \frac{\partial net_{nk}}{\partial w_{kj}} = \delta_k h_{nj}, \qquad \delta_k = \frac{\partial E}{\partial net_{nk}}

  \frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial net_{nj}} \frac{\partial net_{nj}}{\partial w_{ji}} = \delta_j h_{ni}, \qquad \delta_j = \frac{\partial E}{\partial net_{nj}}

  \frac{\partial E}{\partial w_{ip}} = \frac{\partial E}{\partial net_{ni}} \frac{\partial net_{ni}}{\partial w_{ip}} = \delta_i h_{np}, \qquad \delta_i = \frac{\partial E}{\partial net_{ni}}
Error Back Propagation
• Weights between deep layers
  \frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial net_{nk}} \frac{\partial net_{nk}}{\partial w_{kj}} = \delta_k h_{nj}, \qquad \delta_k = \frac{\partial E}{\partial h_{nk}} \frac{\partial h_{nk}}{\partial net_{nk}}

  \frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial net_{nj}} \frac{\partial net_{nj}}{\partial w_{ji}} = \delta_j h_{ni}, \qquad \delta_j = \frac{\partial h_{nj}}{\partial net_{nj}} \sum_{k=1}^{K} \delta_k w_{kj}

  \frac{\partial E}{\partial w_{ip}} = \frac{\partial E}{\partial net_{ni}} \frac{\partial net_{ni}}{\partial w_{ip}} = \delta_i h_{np}, \qquad \delta_i = \frac{\partial h_{ni}}{\partial net_{ni}} \sum_{j=1}^{J} \delta_j w_{ji}
Error Back Propagation
• Weights between deep layers
• If h = \mathrm{sigmoid}(net), then \frac{\partial h}{\partial net} = h(1 - h), so for example \delta_j = h_{nj}(1 - h_{nj}) \sum_{k=1}^{K} \delta_k w_{kj}
Error Back Propagation
• Example : XOR
• Hidden nodes : 2
• Learning rate : 0.5
  x1  x2  y
  1   1   0
  1   0   1
  0   1   1
  0   0   0

[Figure: network with inputs x1, x2 and constant bias inputs 1, two hidden units, one output o(x), and nine weights w1, …, w9]
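A small training loop reproducing this experiment with the sketches above (forward() and gradients()); it uses per-sample (online) updates rather than the batch sums on the earlier slides, and random initialization, so the exact numbers will differ from the tables that follow.

  rng = np.random.default_rng(0)
  X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]], dtype=float)
  T = np.array([[0], [1], [1], [0]], dtype=float)

  W_hid = rng.uniform(-0.1, 0.1, size=(2, 3))   # 2 hidden units, 2 inputs + bias
  W_out = rng.uniform(-0.1, 0.1, size=(1, 3))   # 1 output, 2 hidden units + bias
  eta = 0.5                                     # learning rate from the slide

  for it in range(10000):
      for x, t in zip(X, T):
          g_hid, g_out = gradients(x, t, W_hid, W_out)
          W_hid -= eta * g_hid                  # delta w = -eta * dE_n/dw
          W_out -= eta * g_out

  for x in X:
      _, o = forward(x, W_hid, W_out)
      print(x, "->", round(float(o[0]), 2))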
Error Back Propagation
• Example : XOR
Iteration 0:
  x1  x2  y  o
  1   1   0  0.52
  1   0   1  0.50
  0   1   1  0.52
  0   0   0  0.55

Iteration 1000:
  x1  x2  y  o
  1   1   0  0.50
  1   0   1  0.48
  0   1   1  0.50
  0   0   0  0.52

[Figures: network diagrams showing the weight values at iterations 0 and 1000]
Error Back Propagation
• Example : XOR
Iteration 2000:
  x1  x2  y  o
  1   1   0  0.53
  1   0   1  0.48
  0   1   1  0.50
  0   0   0  0.48

Iteration 3000:
  x1  x2  y  o
  1   1   0  0.30
  1   0   1  0.81
  0   1   1  0.81
  0   0   0  0.11

[Figures: network diagrams showing the weight values at iterations 2000 and 3000]
Error Back Propagation
• Example : XOR
Iteration 5000:
  x1  x2  y  o
  1   1   0  0.05
  1   0   1  0.96
  0   1   1  0.96
  0   0   0  0.03

Iteration 10000:
  x1  x2  y  o
  1   1   0  0.02
  1   0   1  0.98
  0   1   1  0.98
  0   0   0  0.02

[Figures: network diagrams showing the weight values at iterations 5000 and 10000]
Error Back Propagation
• Example : XOR
• Error graph
[Figure: total error vs. iteration; error axis from 0 to 2.5, iteration axis up to about 10,000]
Error Back Propagation
• Example 2
• Hidden nodes: 4
• Iterations: 500,000
• Learning rate: 0.7
• Target function: f(x) = 4x(1 - x)

  Input  Output
  0.00   0.00
  0.10   0.36
  0.20   0.64
  0.30   0.84
  0.40   0.96
  0.50   1.00
  0.60   0.96
  0.70   0.84
  0.80   0.64
  0.90   0.36
  1.00   0.00

[Figure: plot of f(x) = 4x(1 - x) on 0 ≤ x ≤ 1]
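The same training loop as in the XOR sketch can be reused here, only with 4 hidden units (W_hid of shape (4, 2), W_out of shape (1, 5)), η = 0.7, and far more iterations; generating the 11 training pairs from the table is just the following (a sketch, not the slides' own code):

  xs = np.round(np.arange(0.0, 1.01, 0.1), 2)   # 0.00, 0.10, ..., 1.00
  ts = np.round(4 * xs * (1 - xs), 2)           # f(x) = 4x(1 - x)
  for x, t in zip(xs, ts):
      print(f"{x:.2f} -> {t:.2f}")              # reproduces the Input/Output table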
Error Back Propagation
• Example2
Error Back Propagation
• We gave only 11 points
• Did the NN learn only those 11 points, or did it generalize?
[Figures omitted]
Error Back Propagation
• Yes, NNs generalize what they have learned
Generalization, Overfitting
• Which one is better?
[Figures: several alternative fits to the same training data]
Summary
• A perceptron finds a hyperplane (a straight line in two dimensions) that minimizes the error between the target and the output
Discussion
• Two-class classification
• Use 0 and 1 for class labels
• Use one perceptron at the output layer
• Prediction
  \mathrm{class}(\mathbf{x}) = \begin{cases} 0 & \text{if } N(\mathbf{x}) < 0.5 \\ 1 & \text{otherwise} \end{cases}
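A minimal sketch of this prediction rule on top of the forward() function from the earlier sketches (the function name classify is an assumption):

  def classify(x, W_hid, W_out):
      """Two-class prediction: class 0 if N(x) < 0.5, class 1 otherwise."""
      _, o = forward(np.asarray(x, dtype=float), W_hid, W_out)
      return 0 if o[0] < 0.5 else 1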