
Institute for Data Science & Artificial Intelligence

Lecture 16

Multilayer Perceptron
Course: DSAI 512 - Machine Learning
Instructor: Ercan Atam
1
List of contents for this lecture

❖ Multiple layers

❖ Universal approximation

❖ The neural network

2
Relevant readings for this lecture

➢ e-Chapter 7 of Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin, “Learning from Data”,
AMLBook, 2012.

➢ Chapter 6 (6.1-6.2) of Andreas Lindholm, Niklas Wahlström, Fredrik Lindsten, Thomas B. Schön,
“Machine Learning: A First Course for Engineers and Scientists”, Cambridge University Press, 2022.

3
The neural network - biologically inspired

4
Planes don’t flap wings to fly

Engineering success may start with biological inspiration, but then takes a totally different path...

5
XOR: a limitation of the linear model (1)

6
XOR: a limitation of the linear model (2)

[Figure: the target 𝑓(𝐱), whose “+” and “−” regions alternate across the four regions formed by two lines.]

A perceptron cannot implement this function!

Why? Because the “+” regions of 𝑓 are not linearly separable from the “−” regions: no single line can put all “+” points on one side and all “−” points on the other.

7
Decomposing XOR

We can write 𝑓 using the simpler “OR” and “AND” operations.

AND → multiplication
OR → addition
Negation → bar

In this notation, the decomposition is 𝑓 = ℎ1ℎ̄2 + ℎ̄1ℎ2, where ℎ1 and ℎ2 are the two linear separators.

Note: A procedure for obtaining this decomposition will be provided soon.

8
Perceptrons for OR and AND

[Figure: two perceptrons over inputs 𝑢1, 𝑢2 implementing OR(𝑢1, 𝑢2) and AND(𝑢1, 𝑢2).]

With the inputs encoded as ±1, one standard choice of weights is
OR(𝑢1, 𝑢2) = sign(𝑢1 + 𝑢2 + 1.5) and AND(𝑢1, 𝑢2) = sign(𝑢1 + 𝑢2 − 1.5).
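A quick way to convince yourself is to run these two perceptrons on all four input patterns. The following is a minimal Python sketch (the ±1 encoding and the weights above are assumed; the helper names are ours):

```python
def sign(s):
    # Perceptron output: +1 or -1 (we map s == 0 to +1 by convention).
    return 1 if s >= 0 else -1

def OR(u1, u2):
    # OR as a perceptron on +/-1 inputs: -1 only when both inputs are -1.
    return sign(u1 + u2 + 1.5)

def AND(u1, u2):
    # AND as a perceptron on +/-1 inputs: +1 only when both inputs are +1.
    return sign(u1 + u2 - 1.5)

# Check both truth tables over all +/-1 input pairs.
for u1 in (-1, 1):
    for u2 in (-1, 1):
        print(f"u1={u1:+d}, u2={u2:+d} -> OR={OR(u1, u2):+d}, AND={AND(u1, u2):+d}")
```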

9
How did we find that 𝑓 = ℎ1ℎ̄2 + ℎ̄1ℎ2?

We consider only the regions of 𝑓 which are “+” and use the “disjunctive normal form” (= OR of ANDs): 𝑓 is “+” either where ℎ1 is “+” and ℎ2 is “−”, or where ℎ1 is “−” and ℎ2 is “+”, which gives 𝑓 = ℎ1ℎ̄2 + ℎ̄1ℎ2.

Note: You can check that the decomposition constructed based on considering only the positive regions of 𝑓
holds as well when the negative regions of 𝑓 are also considered.
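As a check of the disjunctive normal form, here is a small sketch (reusing the same OR/AND perceptron weights as before; NOT is a simple sign flip) that enumerates every sign pattern of (ℎ1, ℎ2) and confirms that ℎ1ℎ̄2 + ℎ̄1ℎ2 reproduces XOR:

```python
def sign(s):
    return 1 if s >= 0 else -1

def OR(u1, u2):
    return sign(u1 + u2 + 1.5)

def AND(u1, u2):
    return sign(u1 + u2 - 1.5)

def NOT(u):
    return -u  # negating a +/-1 value is just a sign flip

# f = (h1 AND NOT h2) OR (NOT h1 AND h2): check that it reproduces XOR
# on every sign pattern of (h1, h2).
for h1 in (-1, 1):
    for h2 in (-1, 1):
        f = OR(AND(h1, NOT(h2)), AND(NOT(h1), h2))
        assert f == (1 if h1 != h2 else -1)
        print(f"h1={h1:+d}, h2={h2:+d} -> f={f:+d}")
```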
10
Representing 𝑓 using OR and AND (1)

Step 1 (“OR”):

11
Representing 𝑓 using OR and AND (2)

Step 1 (“OR”) → Step 2 (“ANDs”)

12
Representing 𝑓 using OR and AND (3)

Step 2 (“ANDs”) → Step 3 (“ℎ1 & ℎ2”)

13
The multilayer perceptron (MLP)

MLP:

[Figure: a multilayer perceptron implementing 𝑓, with the perceptrons ℎ1 and ℎ2 feeding the AND/OR combination above.]

o More layers allow us to implement 𝑓.

o These additional layers are called “hidden layers”.


14
A closer look at MLP

[Figure: an MLP with an input layer (layer 0), two hidden layers (layers 1 and 2), and an output layer (layer 3).]

o The two hidden layers are extra compared to the perceptron.

o The input layer is generally not counted as a layer, since it only holds the inputs.

15
Universal approximation (1)

Any target function 𝑓 that can be decomposed into linear separators can be implemented by a 3-layer
perceptron.

16
Universal approximation (2)

If 𝑓 is not strictly decomposable into perceptrons, but has a smooth decision boundary, then a 3-layer
perceptron can come arbitrarily close to implementing it.

Pictorial proof:

[Figure: a smooth target decision boundary, approximated by a combination of 8 perceptrons and, more closely, by 16 perceptrons.]
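The pictorial proof can also be imitated numerically. Below is a hedged sketch (the circular target, the tangent-line weights, and all names are our own choices, not from the lecture): a disk is approximated by ANDing 𝑚 perceptrons whose decision lines are tangent to the circle, and a Monte Carlo estimate shows the disagreement with the target shrinking as 𝑚 grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Smooth target: +1 inside the unit disk, -1 outside.
    return np.where(x[:, 0]**2 + x[:, 1]**2 <= 1.0, 1, -1)

def approx(x, m):
    # AND of m perceptrons whose decision lines are tangent to the unit circle:
    # the k-th perceptron is sign(1 - cos(a_k) x1 - sin(a_k) x2), which is +1
    # exactly on the inner side of the tangent line at angle a_k.
    angles = 2 * np.pi * np.arange(m) / m
    inside = np.ones(len(x), dtype=bool)
    for a in angles:
        inside &= (1.0 - np.cos(a) * x[:, 0] - np.sin(a) * x[:, 1]) >= 0
    return np.where(inside, 1, -1)

# Monte Carlo estimate of P[f(x) != f_approx(x)] over the square [-2, 2]^2.
x = rng.uniform(-2, 2, size=(200_000, 2))
for m in (8, 16, 64):
    err = np.mean(target(x) != approx(x, m))
    print(f"{m:3d} perceptrons: disagreement = {err:.4f}")
```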

17
Approximation versus generalization

❑ The size of the MLP controls the approximation-generalization tradeoff.

❑ More nodes per hidden layer => better approximation (↑) but worse generalization (↓).

18
Minimizing 𝐸in for MLPs

❑ Remember that 𝐸in minimization for the perceptron was already a hard combinatorial
optimization problem. For MLPs this problem is much harder. Why? Because every hidden
node adds its own discrete sign decision, multiplying the combinatorial choices.

❑ 𝐸in is not smooth (due to the “sign” function), so we cannot use gradient descent.

❑ Remedy: sign(𝑥) ≈ tanh(𝑥) → use gradient descent to minimize the 𝐸in corresponding to
this replacement (see the sketch below).
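To illustrate the remedy on the simplest possible case, here is a sketch (the toy data, step size, and iteration count are arbitrary choices of ours) that replaces sign with tanh for a single soft perceptron and minimizes the resulting smooth 𝐸in by plain gradient descent:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linearly separable data: N points in 2D plus a bias coordinate x0 = 1.
N = 100
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, 2))])
w_true = np.array([0.5, 2.0, -1.0])
y = np.sign(X @ w_true)

def Ein(w):
    # Smooth in-sample error with tanh in place of sign:
    # Ein(w) = (1/N) * sum_n (tanh(w^T x_n) - y_n)^2
    return np.mean((np.tanh(X @ w) - y) ** 2)

def grad_Ein(w):
    s = X @ w
    # d/dw of (tanh(s) - y)^2 is 2 (tanh(s) - y) (1 - tanh(s)^2) x
    return 2 * X.T @ ((np.tanh(s) - y) * (1 - np.tanh(s) ** 2)) / N

w = np.zeros(3)
eta = 0.5  # step size (an arbitrary choice for this sketch)
for t in range(2000):
    w -= eta * grad_Ein(w)

print("final Ein:", Ein(w))
print("training error of sign(w^T x):", np.mean(np.sign(X @ w) != y))
```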

19
The neural network

20
Zooming into a hidden node

𝑤𝑖𝑗^(𝑙): the weight into node 𝑗 in layer 𝑙 from node 𝑖 in the previous layer.

[Figure: signal flow through layers (𝑙−1), (𝑙), (𝑙+1): the output 𝒙^(𝑙−1) is combined by the
weights 𝑊^(𝑙) into the signal 𝒔^(𝑙) = (𝑊^(𝑙))ᵀ𝒙^(𝑙−1), which the nonlinearity 𝜃 turns into the
output 𝒙^(𝑙) = 𝜃(𝒔^(𝑙)); the same pattern repeats with 𝑊^(𝑙+1).]

Note: the constant “1” nodes have no incoming weight, but they have an outgoing weight.
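Chaining this recursion over the layers gives the forward pass. Here is a minimal sketch (random weights, tanh as 𝜃, and the shape convention 𝒔^(𝑙) = (𝑊^(𝑙))ᵀ𝒙^(𝑙−1), with a constant-1 node prepended at every layer; the function names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def forward(x, weights, theta=np.tanh):
    # x: input vector (without the bias); weights: list of matrices W^(l),
    # where W^(l) has shape (1 + d_{l-1}, d_l) so that s^(l) = W^(l).T @ x^(l-1).
    x_l = np.concatenate([[1.0], x])               # x^(0) with the constant "1" node
    for W in weights:
        s_l = W.T @ x_l                            # s^(l) = (W^(l))^T x^(l-1)
        x_l = np.concatenate([[1.0], theta(s_l)])  # x^(l) = [1; theta(s^(l))]
    return x_l[1:]                                 # drop the bias from the final output

# Example: a network with layer sizes 2 -> 3 -> 1.
sizes = [2, 3, 1]
weights = [rng.normal(size=(1 + d_in, d_out))
           for d_in, d_out in zip(sizes[:-1], sizes[1:])]
print(forward(np.array([0.5, -1.0]), weights))
```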
21
The neural network for regression and logistic regression

❑ Regression: replace 𝜃(𝑠) in the output node with the identity transformation (= no transformation).

❑ Logistic regression: replace 𝜃(𝑠) in the output node with the logistic sigmoid (both choices are sketched below).
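In code, only the output node's transformation changes between the three settings; the helper below is our own illustration, not the lecture's notation:

```python
import numpy as np

def output_node(s, task):
    # Choice of the output transformation theta(s) by task.
    if task == "classification":
        return np.sign(s)                # hard +/-1 decision
    if task == "regression":
        return s                         # identity: no transformation
    if task == "logistic":
        return 1.0 / (1.0 + np.exp(-s))  # logistic sigmoid, an estimate of P[y = +1 | x]
    raise ValueError(task)

s = 0.7  # the signal arriving at the output node
for task in ("classification", "regression", "logistic"):
    print(task, "->", output_node(s, task))
```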

22
Summary

23
References
(utilized for preparation of lecture notes or Matlab code)

▪ https://amlbook.com/eChapters/6-Oct2022-readeronly.pdf
▪ https://www.cs.rpi.edu/~magdon/courses/LFD-Slides/SlidesLect20.pdf

24
