
Neural Network:

Multi-Layer Perceptron

1
Logistic Regression
• Binary classification
• Linear classifier

Let's extend Logistic Regression
• Multiple classes
• Any type of boundary

2
Logistic Regression
• Linear Boundary

3
Logistic Regression

• Two Steps for Evaluation
  • Linear combination of inputs: $s = w_0 + \sum_{i=1}^{d} w_i x_i$
  • Nonlinear transform of s: $y = \sigma(s) = \dfrac{1}{1 + e^{-s}}$
• Graphical Representation

4
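These two steps can be sketched in a few lines of Python; the function name, weights, and input below are illustrative placeholders, not taken from the slides:

```python
import math

def logistic_forward(x, w, w0):
    """Two-step evaluation: linear combination of the inputs, then a sigmoid."""
    s = w0 + sum(wi * xi for wi, xi in zip(w, x))   # step 1: s = w0 + sum_i wi * xi
    return 1.0 / (1.0 + math.exp(-s))               # step 2: y = sigma(s) = 1 / (1 + e^-s)

# Illustrative weights and input
print(logistic_forward(x=[2.0, -1.0], w=[0.5, 0.25], w0=-0.1))  # a value in (0, 1)
```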
Logistic Regression

• Another Name: Perceptron
  • It may be called an artificial neuron
  • It mimics the function of a neuron
• Logistic Regression uses just ONE perceptron
• What if we combine many perceptrons?

5
Perceptron and Neuron
• A perceptron is a mathematical model of a biological neuron.
• In actual neurons, the dendrites receive electrical signals from the axons of other neurons
• In the perceptron, these electrical signals are represented as numerical values

6
https://wp.nyu.edu/shanghai-ima-documentation/electives/aiarts/ece265/the-neural-network-nn-and-the-biological-neural-network-bnn-erdembileg-chin-erdene/
Perceptron
• Perceptron
• First function: Weighted summation of inputs

• Second function: Non-linear function

$$y = \begin{cases} 1 & \text{if } \displaystyle\sum_{i=0}^{n} w_i x_i \ge 0 \\[4pt] 0 & \text{otherwise} \end{cases}$$
7
Perceptron
• What a perceptron does

8
Perceptron
• What a perceptron can do
• AND operation

9
Perceptron

• What a perceptron can do – con'd
• OR operation

10
Perceptron

• What a perceptron can do – con’d


• NOT operation

11
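The AND, OR, and NOT examples above can be checked with a minimal threshold perceptron. The weight values below are conventional illustrative choices (the slides give explicit weights only for the XOR network later), so treat them as one possible solution rather than the slides' own numbers:

```python
def perceptron(x, w, bias):
    """Weighted summation of the inputs followed by a hard threshold at 0."""
    s = bias + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= 0 else 0

# Illustrative weights for the three Boolean operations
AND = lambda a, b: perceptron([a, b], w=[1.0, 1.0], bias=-1.5)
OR  = lambda a, b: perceptron([a, b], w=[1.0, 1.0], bias=-0.5)
NOT = lambda a:    perceptron([a],    w=[-1.0],     bias=0.5)

for a in (0, 1):
    for b in (0, 1):
        print(f"AND({a},{b})={AND(a, b)}  OR({a},{b})={OR(a, b)}")
    print(f"NOT({a})={NOT(a)}")
```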
Multilayer Perceptron

• Let's Cascade Many Perceptrons
  • (A network of perceptrons) vs. (a network of neurons, i.e., a brain)
  • Layered structures: for simplicity of learning

12
Multilayer Perceptron
• Graphical Representation is Preferred
• Mathematical form of the output, e.g. for one hidden layer:
  $y = \sigma\!\big(w_0 + \textstyle\sum_j w_j\,\sigma(w_{j0} + \sum_i w_{ji} x_i)\big)$

13
Multilayer Perceptron

• Structure of Multilayer Perceptron – con'd
  • Input layer
    • Simply passes the input values to the next layer
    • # of nodes = # of inputs
  • Hidden layer
    • There can be several hidden layers
    • # of nodes must be specified by the designer
  • Output layer
    • # of nodes = # of outputs

14
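A minimal forward-pass sketch for this layered structure, assuming sigmoid nodes in the hidden and output layers; the layer sizes and random weights are placeholders, not values from the slides:

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(inputs, weights):
    """weights[j] = [bias, w_j1, ..., w_jd] for the j-th node of the layer."""
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
            for w in weights]

def mlp_forward(x, layers):
    """The input layer simply passes x on; each later layer transforms it."""
    h = x
    for weights in layers:
        h = layer_forward(h, weights)
    return h

# Placeholder 2-3-1 network with random weights
random.seed(0)
hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # 3 nodes, each with bias + 2 weights
output = [[random.uniform(-1, 1) for _ in range(4)]]                    # 1 node, bias + 3 weights
print(mlp_forward([0.5, -0.2], [hidden, output]))
```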
Multilayer Perceptron
• Artificial Neural Network
  • AI tools based on biological brains
  • It can learn anything!!
• A multilayer perceptron is a type of Artificial Neural Network
• Other names for the Multilayer Perceptron
  • Feed-forward Neural Network
  • Multilayer Feed-forward Neural Network

15
Multilayer Perceptron
• What a multilayer perceptron can do
  • Anything digital computers can do
  • Boolean function: 2-layer perceptron
  • Continuous function: 2-layer perceptron
  • Arbitrary function: 3-layer perceptron
• We can build a multi-layer perceptron which satisfies given input-output pairs

16
Multilayer Perceptron
• What a multilayer perceptron can do

17
Multilayer Perceptron
• Example: What a neural network can do
  • A neural network can solve non-linearly separable problems
  • Example: XOR operation

18
Multilayer Perceptron

• Example: What a neural network can do


• XOR operation

19
Multilayer Perceptron
• What a neural network can do – con'd
• XOR operation

Hidden unit 1 (w11 = 1.0, w12 = 1.0, bias w13 = -1.5):
  x1  x2    S    y1
  0   0   -1.5   0
  0   1   -0.5   0
  1   0   -0.5   0
  1   1    0.5   1

Hidden unit 2 (w21 = 1.0, w22 = 1.0, bias w23 = -0.5):
  x1  x2    S    y2
  0   0   -0.5   0
  0   1    0.5   1
  1   0    0.5   1
  1   1    1.5   1

Output unit (w31 = -1.0, w32 = 1.0, bias w33 = -0.5), taking y1 and y2 as inputs:
  y1  y2    S    y
  0   0   -0.5   0
  0   1    0.5   1
  0   1    0.5   1
  1   1   -0.5   0
Multilayer Perceptron
• What a perceptron can do – con'd
• XOR operation – con'd (the output unit acts as "some kind of AND operation" on the hidden outputs)

(Figure: two-layer network for XOR — x1, x2, and a constant 1 feed two hidden units f through w11, w12, w13 and w21, w22, w23; the hidden outputs and a constant 1 feed the output y through w31, w32, w33.)

21
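The truth tables above can be reproduced directly from the listed weights (w11 = 1.0, w12 = 1.0, w13 = -1.5; w21 = 1.0, w22 = 1.0, w23 = -0.5; w31 = -1.0, w32 = 1.0, w33 = -0.5) using hard-threshold units. A short check:

```python
def step(s):
    return 1 if s >= 0 else 0

def xor_net(x1, x2):
    y1 = step(1.0 * x1 + 1.0 * x2 - 1.5)     # hidden unit 1: w11, w12, bias w13
    y2 = step(1.0 * x1 + 1.0 * x2 - 0.5)     # hidden unit 2: w21, w22, bias w23
    return step(-1.0 * y1 + 1.0 * y2 - 0.5)  # output unit:   w31, w32, bias w33

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))  # reproduces the XOR truth table
```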
Learning Algorithm
• I have a data set (x11, x12, …, x1m, y1)
(x21, x22, …, x2m, y2)

(xn1, xn2, …, xnm, yn)

• I want to build a Neural Network which generalizes the data


• Step 1: determine the structure of neural network
• # of layers, # of nodes in each layer
• Step 2: determine the weights of links
• How???

22
Learning Algorithm
• Step 1:

(Figure: a network with inputs x1 … xm, one hidden layer h1 … hk connected by weights wij, and one output y — input layer, hidden layer, output layer.)

• Step 2:
  • How many weights? (m+1)*k + (k+1)
  • How?
    • Define an error function
    • Find weights which minimize the error

23
Learning Algorithm
• Error Function

$$E(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{n} \big(N(\mathbf{w}, \mathbf{x}_i) - y_i\big)^2$$

  • $N(\mathbf{w}, \mathbf{x}_i)$: the output of the neural network for input $\mathbf{x}_i$
  • $y_i$: target value of $\mathbf{x}_i$

  (x11, x12, …, x1d, y1)
  (x21, x22, …, x2d, y2)
  …
  (xn1, xn2, …, xnd, yn)
24
Error Back Propagation
• Basic Idea
• Given input-target pairs and output of NN
D1=( x11, x12, …, x1d, t11 , t12, …, t1m) ( o11, o12, …, o1m)
D2=( x21, x22, …, x2d, t21 , t22, …, t2m) ( o21, o22, …, o2m)
… …
DN=( xN1, xN2, …, xNd, tN1 , tN2, …, tNm) ( oN1, oN2, …, oNm)

inputs targets Outputs of NN

• Minimize the error

$$E(\mathbf{w}) = \sum_{n=1}^{N} E_n(\mathbf{w}) \quad\text{where}\quad E_n(\mathbf{w}) = \frac{1}{2}\sum_{k=1}^{m} (t_{nk} - o_{nk})^2$$
25
Error Back Propagation
• Basic Idea
  • Remember:

$$\frac{\partial E}{\partial w} = \sum_{n=1}^{N} \frac{\partial E_n}{\partial w} \quad\text{because}\quad E = \sum_{n=1}^{N} E_n$$

  • So, we need to evaluate: $\dfrac{\partial E_n}{\partial w}$
26
Error Back Propagation
• Recall the Error Function
  Dn = (xn1, xn2, …, xnd, tn1, tn2, …, tnm)   (on1, on2, …, onm)

(Figure: inputs xn1 … xnd → hidden nodes hnj through weights wji → outputs on1 … onm through weights wkj.)

$$o_{nk} = \frac{1}{1 + \exp\!\Big(-\big(w_{k0} + \sum_{j=1}^{p} w_{kj}\,h_{nj}\big)\Big)}
\qquad
h_{nj} = \frac{1}{1 + \exp\!\Big(-\big(w_{j0} + \sum_{i=1}^{d} w_{ji}\,x_{ni}\big)\Big)}$$

$$E_n(\mathbf{w}) = \frac{1}{2}\sum_{k=1}^{m}(t_{nk} - o_{nk})^2$$

So, you can evaluate $\dfrac{\partial E_n}{\partial w_{ji}}$ and $\dfrac{\partial E_n}{\partial w_{kj}}$ by substituting:

$$E_n(\mathbf{w}) = \frac{1}{2}\sum_{k=1}^{m}\left(t_{nk} - \frac{1}{1 + \exp\!\Big(-\Big(w_{k0} + \sum_{j=1}^{p} w_{kj}\,\frac{1}{1 + \exp\big(-(w_{j0} + \sum_{i=1}^{d} w_{ji} x_{ni})\big)}\Big)\Big)}\right)^2$$
27
Error Back Propagation
• Recall the Error Function
  • Too complex to differentiate directly:

$$E_n(\mathbf{w}) = \frac{1}{2}\sum_{k=1}^{m}\left(t_{nk} - \frac{1}{1 + \exp\!\Big(-\Big(w_{k0} + \sum_{j=1}^{p} w_{kj}\,\frac{1}{1 + \exp\big(-(w_{j0} + \sum_{i=1}^{d} w_{ji} x_{ni})\big)}\Big)\Big)}\right)^2$$

• Let’s use chain rule


• Case 1: when w is between output and hidden layer
• Case 2: when w is between hidden and input layer

28
Error Back Propagation
• Case 1: Weights between output and hidden layer
  • For Dn = (xn1, xn2, …, xnd, tn1, tn2, …, tnm)

$$h_{nj} = \text{output of the } j\text{-th hidden node given } \mathbf{x}_n$$
$$net_{nk} = h_{n1}w_{k1} + h_{n2}w_{k2} + \cdots + h_{np}w_{kp}$$
$$o_{nk} = \mathrm{sigmoid}(net_{nk})$$
$$E_n(\mathbf{w}) = \frac{1}{2}\sum_{k=1}^{m}(t_{nk} - o_{nk})^2$$

(Figure: inputs xn1 … xnd → hidden nodes hn1 … hnp → outputs on1 … onm through weights wkj.)

29
Error Back Propagation
• Case 1: Weights between output and hidden layer

$$\Delta w_{kj} = -\eta\,\frac{\partial E_n}{\partial w_{kj}}$$

$$\frac{\partial E_n}{\partial w_{kj}}
= \frac{\partial E_n}{\partial net_{nk}}\,\frac{\partial net_{nk}}{\partial w_{kj}}
= \frac{\partial E_n}{\partial o_{nk}}\,\frac{\partial o_{nk}}{\partial net_{nk}}\,\frac{\partial net_{nk}}{\partial w_{kj}}
= \frac{\partial E_n}{\partial o_{nk}}\,\frac{\partial o_{nk}}{\partial net_{nk}}\,h_{nj}$$

where $net_{nk} = h_{n1}w_{k1} + \cdots + h_{np}w_{kp}$, $o_{nk} = \mathrm{sigmoid}(net_{nk})$, and $E_n(\mathbf{w}) = \frac{1}{2}\sum_{k=1}^{m}(t_{nk} - o_{nk})^2$.
30
Error Back Propagation
• Case 1: Weights between output and hidden layer – con'd

$$\frac{\partial E_n}{\partial w_{kj}} = \frac{\partial E_n}{\partial o_{nk}}\,\frac{\partial o_{nk}}{\partial net_{nk}}\,h_{nj}$$

$$\frac{\partial E_n}{\partial o_{nk}}
= \frac{\partial}{\partial o_{nk}}\,\frac{1}{2}\sum_{q=1}^{m}(t_{nq} - o_{nq})^2
= \frac{1}{2}\,\frac{\partial (t_{nk} - o_{nk})^2}{\partial o_{nk}}
= -(t_{nk} - o_{nk})$$

$$\frac{\partial o_{nk}}{\partial net_{nk}} = \frac{\partial\,\mathrm{sigmoid}(net_{nk})}{\partial net_{nk}} = o_{nk}(1 - o_{nk})$$

$$\frac{\partial E_n}{\partial w_{kj}} = -(t_{nk} - o_{nk})\,o_{nk}(1 - o_{nk})\,h_{nj}$$
31
Error Back Propagation
• Case 1: Weights between output and hidden layer – con'd

$$\frac{\partial E_n}{\partial w_{kj}} = -(t_{nk} - o_{nk})\,o_{nk}(1 - o_{nk})\,h_{nj}$$

$$\frac{\partial E}{\partial w_{kj}} = \sum_{n=1}^{N}\frac{\partial E_n}{\partial w_{kj}} = -\sum_{n=1}^{N}(t_{nk} - o_{nk})\,o_{nk}(1 - o_{nk})\,h_{nj}$$

$$\Delta w_{kj} = -\eta\,\frac{\partial E}{\partial w_{kj}} = \eta\sum_{n=1}^{N}(t_{nk} - o_{nk})\,o_{nk}(1 - o_{nk})\,h_{nj}$$
32
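The batch update rule $\Delta w_{kj} = \eta \sum_n (t_{nk} - o_{nk})\,o_{nk}(1 - o_{nk})\,h_{nj}$ can be written compactly with numpy. This is a sketch under assumed array names (H, O, T are not from the slides) and with bias weights omitted for brevity:

```python
import numpy as np

def output_layer_update(H, O, T, eta):
    """Batch update for the output-layer weights.
    H: (N, p) hidden activations, O: (N, m) outputs, T: (N, m) targets."""
    delta = (T - O) * O * (1.0 - O)   # element [n, k] = (t_nk - o_nk) * o_nk * (1 - o_nk)
    return eta * delta.T @ H          # element [k, j] = eta * sum_n delta[n, k] * h_nj

# Tiny illustrative arrays
H = np.array([[0.2, 0.7], [0.9, 0.1]])
O = np.array([[0.6], [0.4]])
T = np.array([[1.0], [0.0]])
print(output_layer_update(H, O, T, eta=0.5))   # shape (1, 2): one update per w_kj
```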
Error Back Propagation
• Case 2: Weights between hidden and input layer
  • For Dn = (xn1, xn2, …, xnd, tn1, tn2, …, tnm)

$$net_{nj} = x_{n1}w_{j1} + x_{n2}w_{j2} + \cdots + x_{nd}w_{jd}$$
$$h_{nj} = \mathrm{sigmoid}(net_{nj})$$
$$net_{nk} = h_{n1}w_{k1} + h_{n2}w_{k2} + \cdots + h_{np}w_{kp}$$
$$o_{nk} = \mathrm{sigmoid}(net_{nk})$$
$$E_n(\mathbf{w}) = \frac{1}{2}\sum_{k=1}^{m}(t_{nk} - o_{nk})^2$$

(Figure: inputs xni → hidden nodes hnj through weights wji → outputs onk through weights wkj.)
33
Error Back Propagation
• Case 2: Internal weights – con'd

$$\frac{\partial E_n}{\partial w_{ji}}
= \frac{\partial}{\partial w_{ji}}\,\frac{1}{2}\sum_{k=1}^{m}(t_{nk} - o_{nk})^2
= \frac{1}{2}\sum_{k=1}^{m}\frac{\partial (t_{nk} - o_{nk})^2}{\partial w_{ji}}
= \frac{1}{2}\sum_{k=1}^{m}\frac{\partial (t_{nk} - o_{nk})^2}{\partial o_{nk}}\,\frac{\partial o_{nk}}{\partial net_{nk}}\,\frac{\partial net_{nk}}{\partial h_{nj}}\,\frac{\partial h_{nj}}{\partial net_{nj}}\,\frac{\partial net_{nj}}{\partial w_{ji}}$$

using
$$\frac{\partial (t_{nk} - o_{nk})^2}{\partial o_{nk}} = -2(t_{nk} - o_{nk}),\quad
\frac{\partial o_{nk}}{\partial net_{nk}} = o_{nk}(1 - o_{nk}),\quad
\frac{\partial net_{nk}}{\partial h_{nj}} = w_{kj},\quad
\frac{\partial h_{nj}}{\partial net_{nj}} = h_{nj}(1 - h_{nj}),\quad
\frac{\partial net_{nj}}{\partial w_{ji}} = x_{ni}$$

$$\frac{\partial E_n}{\partial w_{ji}}
= \frac{1}{2}\sum_{k=1}^{m}\big(-2(t_{nk} - o_{nk})\big)\,o_{nk}(1 - o_{nk})\,w_{kj}\,h_{nj}(1 - h_{nj})\,x_{ni}
= -h_{nj}(1 - h_{nj})\,x_{ni}\sum_{k=1}^{m} w_{kj}(t_{nk} - o_{nk})\,o_{nk}(1 - o_{nk})$$
34
Error Back Propagation
• Case 2: Internal weights – con'd

$$\frac{\partial E_n}{\partial w_{ji}} = -x_{ni}\,h_{nj}(1 - h_{nj})\sum_{k=1}^{m} w_{kj}(t_{nk} - o_{nk})\,o_{nk}(1 - o_{nk})$$

$$\frac{\partial E}{\partial w_{ji}} = \sum_{n=1}^{N}\frac{\partial E_n}{\partial w_{ji}}
= \sum_{n=1}^{N}\Big(-x_{ni}\,h_{nj}(1 - h_{nj})\sum_{k=1}^{m} w_{kj}(t_{nk} - o_{nk})\,o_{nk}(1 - o_{nk})\Big)$$

$$\Delta w_{ji} = -\eta\,\frac{\partial E}{\partial w_{ji}}
= \eta\sum_{n=1}^{N}\Big(x_{ni}\,h_{nj}(1 - h_{nj})\sum_{k=1}^{m} w_{kj}(t_{nk} - o_{nk})\,o_{nk}(1 - o_{nk})\Big)$$
35
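The corresponding hidden-layer update, $\Delta w_{ji} = \eta \sum_n x_{ni}\,h_{nj}(1 - h_{nj}) \sum_k w_{kj}(t_{nk} - o_{nk})\,o_{nk}(1 - o_{nk})$, in the same sketch style (assumed array names, biases omitted):

```python
import numpy as np

def hidden_layer_update(X, H, O, T, W_out, eta):
    """Batch update for the hidden-layer weights.
    X: (N, d) inputs, H: (N, p) hidden activations,
    O: (N, m) outputs, T: (N, m) targets, W_out: (m, p) output-layer weights."""
    delta_out = (T - O) * O * (1.0 - O)              # (N, m)
    delta_hid = (delta_out @ W_out) * H * (1.0 - H)  # (N, p): (sum_k w_kj * delta_nk) * h_nj * (1 - h_nj)
    return eta * delta_hid.T @ X                     # element [j, i] = eta * sum_n delta_hid[n, j] * x_ni

X = np.array([[0.0, 1.0], [1.0, 1.0]])
H = np.array([[0.2, 0.7], [0.9, 0.1]])
O = np.array([[0.6], [0.4]])
T = np.array([[1.0], [0.0]])
W_out = np.array([[0.3, -0.8]])
print(hidden_layer_update(X, H, O, T, W_out, eta=0.5))   # shape (2, 2): one update per w_ji
```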
Error Back Propagation
• Weights between deep layers
  • For Dn = (xn1, xn2, …, xnd, tn1, tn2, …, tnm)

(Figure: a chain of hidden layers — net_np → h_np → net_ni → h_ni → net_nj → h_nj → net_nk → h_nk → E, connected by weights w_pq, w_ip, w_ji, w_kj.)

36
Error Back Propagation
• Weights between deep layers

$$\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial net_{nk}}\,\frac{\partial net_{nk}}{\partial w_{kj}} = \delta_k\,h_{nj},
\qquad \delta_k = \frac{\partial E}{\partial net_{nk}}$$

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial net_{nj}}\,\frac{\partial net_{nj}}{\partial w_{ji}} = \delta_j\,h_{ni},
\qquad \delta_j = \frac{\partial E}{\partial net_{nj}}$$

$$\frac{\partial E}{\partial w_{ip}} = \frac{\partial E}{\partial net_{ni}}\,\frac{\partial net_{ni}}{\partial w_{ip}} = \delta_i\,h_{np},
\qquad \delta_i = \frac{\partial E}{\partial net_{ni}}$$
37
Error Back Propagation
• Weights between deep layers

$$\delta_k = \frac{\partial E}{\partial net_{nk}} = \frac{\partial E}{\partial h_{nk}}\,\frac{\partial h_{nk}}{\partial net_{nk}}$$

$$\delta_j = \frac{\partial E}{\partial net_{nj}} = \frac{\partial E}{\partial h_{nj}}\,\frac{\partial h_{nj}}{\partial net_{nj}}
= \Big(\sum_{k=1}^{K}\frac{\partial E}{\partial net_{nk}}\,\frac{\partial net_{nk}}{\partial h_{nj}}\Big)\frac{\partial h_{nj}}{\partial net_{nj}}
= \Big(\sum_{k=1}^{K}\delta_k\,w_{kj}\Big)\frac{\partial h_{nj}}{\partial net_{nj}}$$

$$\delta_i = \frac{\partial E}{\partial net_{ni}} = \frac{\partial E}{\partial h_{ni}}\,\frac{\partial h_{ni}}{\partial net_{ni}}
= \Big(\sum_{j=1}^{J}\frac{\partial E}{\partial net_{nj}}\,\frac{\partial net_{nj}}{\partial h_{ni}}\Big)\frac{\partial h_{ni}}{\partial net_{ni}}
= \Big(\sum_{j=1}^{J}\delta_j\,w_{ji}\Big)\frac{\partial h_{ni}}{\partial net_{ni}}$$
38
Error Back Propagation
• Weights between deep layers

$$\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial net_{nk}}\,\frac{\partial net_{nk}}{\partial w_{kj}} = \delta_k\,h_{nj},
\qquad \delta_k = \frac{\partial E}{\partial h_{nk}}\,\frac{\partial h_{nk}}{\partial net_{nk}}$$

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial net_{nj}}\,\frac{\partial net_{nj}}{\partial w_{ji}} = \delta_j\,h_{ni},
\qquad \delta_j = \Big(\sum_{k=1}^{K}\delta_k\,w_{kj}\Big)\frac{\partial h_{nj}}{\partial net_{nj}}$$

$$\frac{\partial E}{\partial w_{ip}} = \frac{\partial E}{\partial net_{ni}}\,\frac{\partial net_{ni}}{\partial w_{ip}} = \delta_i\,h_{np},
\qquad \delta_i = \Big(\sum_{j=1}^{J}\delta_j\,w_{ji}\Big)\frac{\partial h_{ni}}{\partial net_{ni}}$$
39
Error Back Propagation
• Weights between deep layers

If $h = \mathrm{Sigmoid}(net)$:

$$\frac{\partial E}{\partial w_{kj}} = \delta_{nk}\,h_{nj},
\qquad \delta_{nk} = \frac{\partial E}{\partial h_{nk}}\,\frac{\partial h_{nk}}{\partial net_{nk}} = -(t_{nk} - h_{nk})\,h_{nk}(1 - h_{nk})$$

$$\frac{\partial E}{\partial w_{ji}} = \delta_{nj}\,h_{ni},
\qquad \delta_{nj} = \Big(\sum_{k=1}^{K}\delta_{nk}\,w_{kj}\Big)\frac{\partial h_{nj}}{\partial net_{nj}} = \Big(\sum_{k=1}^{K}\delta_{nk}\,w_{kj}\Big)h_{nj}(1 - h_{nj})$$

$$\frac{\partial E}{\partial w_{ip}} = \delta_{ni}\,h_{np},
\qquad \delta_{ni} = \Big(\sum_{j=1}^{J}\delta_{nj}\,w_{ji}\Big)\frac{\partial h_{ni}}{\partial net_{ni}} = \Big(\sum_{j=1}^{J}\delta_{nj}\,w_{ji}\Big)h_{ni}(1 - h_{ni})$$

40
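For deeper networks, the same recursion becomes a loop over layers: compute the output-layer delta, then repeatedly propagate it backward through the weights and multiply by the sigmoid derivative. A minimal sketch; the list-of-arrays layer format is an assumption, and bias terms are again left out:

```python
import numpy as np

def backprop_deltas(activations, weights, targets):
    """activations = [inputs, h_1, ..., h_L] (each (N, size)), all sigmoid layers;
    weights = [W_1, ..., W_L] with W_l of shape (size_l, size_{l-1}).
    Returns the deltas per layer and the gradients dE/dW_l."""
    out = activations[-1]
    deltas = [-(targets - out) * out * (1.0 - out)]        # output layer: -(t - h) h (1 - h)
    for W, h in zip(reversed(weights[1:]), reversed(activations[1:-1])):
        deltas.append((deltas[-1] @ W) * h * (1.0 - h))    # delta_j = (sum_k delta_k w_kj) h_j (1 - h_j)
    deltas.reverse()
    grads = [d.T @ a for d, a in zip(deltas, activations[:-1])]  # dE/dW_l = delta_l^T h_{l-1}
    return deltas, grads
```

A gradient-descent step would then be W_l ← W_l − η · grads[l] for every layer.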
Error Back Propagation
• Example: XOR
  • Hidden nodes: 2
  • Learning rate: 0.5

  x1  x2  y
  1   1   0
  1   0   1
  0   1   1
  0   0   0

(Figure: a 2-2-1 network — x1, x2, and a constant 1 feed two hidden nodes through w1…w6; the hidden outputs and a constant 1 feed the output o(x) through w7, w8, w9.)
41
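The result tables on the following slides could be reproduced, up to random initialization, with a small online (per-pattern) training loop matching this setup: two sigmoid hidden nodes, one sigmoid output, learning rate 0.5. This is a sketch under those assumptions, not the exact program behind the slides:

```python
import math
import random

random.seed(1)
sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))

data = [((1, 1), 0), ((1, 0), 1), ((0, 1), 1), ((0, 0), 0)]   # XOR patterns
eta = 0.5

# w_hid[j] = [w_j1, w_j2, bias]; w_out = [v1, v2, bias]
w_hid = [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-0.1, 0.1) for _ in range(3)]

def forward(x1, x2):
    h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_hid]
    o = sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + w_out[2])
    return h, o

for it in range(10001):
    for (x1, x2), t in data:
        h, o = forward(x1, x2)
        d_out = (t - o) * o * (1 - o)                                     # output delta
        d_hid = [d_out * w_out[j] * h[j] * (1 - h[j]) for j in range(2)]  # hidden deltas
        for j in range(2):
            w_out[j] += eta * d_out * h[j]                                # output weights
            w_hid[j] = [w_hid[j][0] + eta * d_hid[j] * x1,
                        w_hid[j][1] + eta * d_hid[j] * x2,
                        w_hid[j][2] + eta * d_hid[j]]                     # hidden weights + bias
        w_out[2] += eta * d_out                                           # output bias
    if it % 1000 == 0:
        print(it, [round(forward(a, b)[1], 2) for (a, b), _ in data])
```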
Error Back Propagation
• Example: XOR

Iteration 0:                 Iteration 1000:
  x1  x2  y   o                x1  x2  y   o
  1   1   0   0.52             1   1   0   0.50
  1   0   1   0.50             1   0   1   0.48
  0   1   1   0.52             0   1   1   0.50
  0   0   0   0.55             0   0   0   0.52

(Figures: the network's weight values after iteration 0 and after iteration 1000.)
42
Error Back Propagation
• Example: XOR

Iteration 2000:              Iteration 3000:
  x1  x2  y   o                x1  x2  y   o
  1   1   0   0.53             1   1   0   0.30
  1   0   1   0.48             1   0   1   0.81
  0   1   1   0.50             0   1   1   0.81
  0   0   0   0.48             0   0   0   0.11

(Figures: the network's weight values after iteration 2000 and after iteration 3000.)
43
Error Back Propagation
• Example: XOR

Iteration 5000:              Iteration 10000:
  x1  x2  y   o                x1  x2  y   o
  1   1   0   0.05             1   1   0   0.02
  1   0   1   0.96             1   0   1   0.98
  0   1   1   0.96             0   1   1   0.98
  0   0   0   0.03             0   0   0   0.02

(Figures: the network's weight values after iteration 5000 and after iteration 10000.)
44
Error Back Propagation
• Example: XOR
  • Error graph

(Figure: training error vs. iteration, from iteration 1 to about 10,000; the error decreases toward 0 as training proceeds.)

45
Error Back Propagation
• Example 2:
  • Hidden nodes: 4
  • Iterations: 500,000
  • Learning rate: 0.7
  • Target function: f(x) = 4x*(1-x)

  Input   Output
  0.00    0.00
  0.10    0.36
  0.20    0.64
  0.30    0.84
  0.40    0.96
  0.50    1.00
  0.60    0.96
  0.70    0.84
  0.80    0.64
  0.90    0.36
  1.00    0.00

(Figure: plot of f(x) = 4x*(1-x) on [0, 1], peaking at 1.0 at x = 0.5.)

46
Error Back Propagation
• Example 2

47
Error Back Propagation
• We gave only 11 points
• The NN learned only those 11 points

(Figures: "Training data" — the 11 training points; "Training result" — the network's output at those points.)

• Can the NN give answers for the unlearned points?

48
Error Back Propagation
• Yes, NNs generalize what they have learned

(Figure: the trained network's output over [0, 1] passes smoothly through the 11 training points, approximating f(x) = 4x*(1-x).)

49
Generalization, Overfitting
• Which one is better?

(Figures: the training data and several fitted curves of different shapes.)

• How to Control??
Summary
• A perceptron will find a hyperplane (a straight line in the case of two inputs) that minimizes the error between the target and the output

• A two-layer perceptron will tend to find a piece-wise hyperplane that minimizes the error between the target and the output

• Neural networks smoothly interpolate the training data points.

• Very efficient at learning from data

• They do not generate human-readable knowledge

51
Discussion
• Two-class classification
  • Use 0 and 1 for class labels
  • Use one perceptron at the output layer
  • Prediction:
$$\text{class}(\mathbf{x}) = \begin{cases} 0 & N(\mathbf{x}) < 0.5 \\ 1 & \text{otherwise} \end{cases}$$

• n-class classification
  • Use 1 to n for class labels
  • Use n perceptrons at the output layer
  • Prediction:
$$\text{class}(\mathbf{x}) = \arg\max_i N_i(\mathbf{x})$$

• Why not?
  - Use 1 to n for class labels
  - Use one perceptron
  - Prediction: $\text{class}(\mathbf{x}) = i$ if $\frac{i-1}{n} \le N(\mathbf{x}) \le \frac{i}{n}$
52
Discussion
• What if you have categorical inputs?
  • Two inputs: $x_1 \in \mathbb{R}$, $x_2 \in \{\text{Yellow}, \text{Red}, \text{Blue}\}$
• Create a new input variable for each categorical value:

$$x_2 = \begin{cases} 1 & \text{if the original } x_2 \text{ is Yellow} \\ 0 & \text{otherwise} \end{cases}
\qquad
x_3 = \begin{cases} 1 & \text{if the original } x_2 \text{ is Red} \\ 0 & \text{otherwise} \end{cases}
\qquad
x_4 = \begin{cases} 1 & \text{if the original } x_2 \text{ is Blue} \\ 0 & \text{otherwise} \end{cases}$$

  (0.1, Red)    → (0.1, 0, 1, 0)
  (0.2, Blue)   → (0.2, 0, 0, 1)
  (0.3, Yellow) → (0.3, 1, 0, 0)
  (0.4, Red)    → (0.4, 0, 1, 0)
53
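The same one-hot encoding in code, with the category order (Yellow, Red, Blue) taken from the slide:

```python
CATEGORIES = ["Yellow", "Red", "Blue"]

def encode(x1, x2):
    """Keep the real-valued input, expand the categorical one into 0/1 indicator variables."""
    return [x1] + [1 if x2 == c else 0 for c in CATEGORIES]

for row in [(0.1, "Red"), (0.2, "Blue"), (0.3, "Yellow"), (0.4, "Red")]:
    print(row, "->", encode(*row))   # e.g. (0.1, 'Red') -> [0.1, 0, 1, 0]
```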
Discussion
• For Regression
  • Normalize the outputs into [0, 1]
    • Why? (a sigmoid output node can only produce values in (0, 1))
  • Or, use a linear output node

(Figure: a network whose output node is a plain weighted sum (Σ) instead of a sigmoid unit — the inputs and a constant 1 feed sigmoid hidden units f, whose outputs feed the linear output y.)
54
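A sketch of the second option — sigmoid hidden units feeding a plain linear (Σ) output node, so the network can produce values outside [0, 1] without rescaling the targets. All weights below are placeholders:

```python
import math

sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))

def regression_forward(x, w_hidden, w_out):
    """Sigmoid hidden layer, linear output node (no squashing at the output)."""
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_hidden]
    return w_out[0] + sum(wi * hi for wi, hi in zip(w_out[1:], h))   # plain weighted sum

# Placeholder weights: 2 inputs, 2 hidden nodes, 1 linear output
w_hidden = [[0.1, 0.8, -0.5], [-0.3, 0.4, 0.9]]   # [bias, w1, w2] per hidden node
w_out = [0.2, 1.5, -2.0]                           # [bias, v1, v2]
print(regression_forward([0.6, -1.2], w_hidden, w_out))  # not confined to [0, 1]
```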
