4 Multilayer Perceptron
In nonlinear regression the output variable y is no longer a linear function of the regression parameters plus additive noise. This makes estimation of the parameters harder: it no longer reduces to minimizing a convex energy function, unlike the methods we described earlier.
The perceptron is an (over-simplified) analogy to the neural networks in the brain. It receives a set of inputs $x_1, \dots, x_d$ and computes the weighted sum $y = \sum_{j=1}^{d} \omega_j x_j + \omega_0$, see Figure (3).
It has a threshold function which can be hard or soft. The hard one is $\zeta(a) = 1$ if $a > 0$, and $\zeta(a) = 0$ otherwise. The soft one is $y = \sigma(\vec{\omega}^T \vec{x}) = 1/(1 + e^{-\vec{\omega}^T \vec{x}})$, where $\sigma(\cdot)$ is the sigmoid function.
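As a concrete illustration, here is a minimal Python sketch of the two threshold functions (the function names and the use of NumPy are illustrative assumptions, not part of the notes):

import numpy as np

def hard_threshold(a):
    # hard threshold: zeta(a) = 1 if a > 0, and 0 otherwise
    return 1.0 if a > 0 else 0.0

def sigmoid(a):
    # soft threshold: sigma(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def perceptron_output(omega, omega0, x, soft=True):
    # weighted sum a = sum_j omega_j x_j + omega_0, then threshold
    a = float(np.dot(omega, x)) + omega0
    return sigmoid(a) if soft else hard_threshold(a)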
There are a variety of algorithms to train a perceptron from labeled examples.
Example: the quadratic error
$$E(\vec{\omega}\,|\,\vec{x}^t, y^t) = \frac{1}{2}(y^t - \vec{\omega}\cdot\vec{x}^t)^2,$$
for which the update rule is
$$\Delta\omega_j^t = -\eta \frac{\partial E}{\partial \omega_j} = \eta\,(y^t - \vec{\omega}\cdot\vec{x}^t)\,x_j^t.$$
Introducing the sigmoid function, $r^t = \sigma(\vec{\omega}^T\vec{x}^t)$, we instead use the cross-entropy error
$$E(\vec{\omega}\,|\,\vec{x}^t, y^t) = -\{\, y^t \log r^t + (1 - y^t)\log(1 - r^t) \,\},$$
and the update rule is
$$\Delta\omega_j^t = -\eta\,(r^t - y^t)\,x_j^t = \eta\,(y^t - r^t)\,x_j^t,$$
where $\eta$ is the learning factor. I.e., the update rule is the learning factor $\times$ (desired output $-$ actual output) $\times$ input.
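A sketch of how these online updates might be implemented for a single sigmoid perceptron (the learning factor eta, the epoch count, and the data format are assumptions for illustration):

import numpy as np

def train_sigmoid_perceptron(X, y, eta=0.1, epochs=100):
    # X: (n, d) array of inputs, y: (n,) array of desired outputs in [0, 1].
    n, d = X.shape
    omega = np.zeros(d)
    omega0 = 0.0
    for _ in range(epochs):
        for t in range(n):
            r_t = 1.0 / (1.0 + np.exp(-(omega @ X[t] + omega0)))  # actual output
            delta = eta * (y[t] - r_t)   # learning factor * (desired - actual)
            omega += delta * X[t]        # ... * input
            omega0 += delta
    return omega, omega0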
4.1 Multilayer Perceptrons
Multilayer perceptrons were developed to address the limitations of perceptrons (introduced in subsection 2.1): a single perceptron can only perform a limited set of classification or regression problems. But you can do far more with multiple layers, where the outputs of the perceptrons at the first layer are input to perceptrons at the second layer, and so on.
Two ingredients: (I) A standard perceptron has a discrete output, $\mathrm{sign}(\vec{\omega}\cdot\vec{x}) \in \{\pm 1\}$. It is replaced by a graded, or soft, output
$$z_h = \sigma(\vec{\omega}_h\cdot\vec{x}) = 1/\{1 + e^{-(\sum_{j=1}^{d}\omega_{hj}x_j + \omega_{h0})}\},$$
with $h = 1, \dots, H$. See figure (4). This makes the output a differentiable function of the weights $\vec{\omega}$.
(II) Introduce hidden units, or equivalently, multiple layers, see figure (5).
The output is
$$y_i = \vec{\nu}_i^T\vec{z} = \sum_{h=1}^{H}\nu_{ih} z_h + \nu_{i0}.$$
Figure 5: A multi-layer perceptron with input x’s, hidden units z’s, and outputs y’s.
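A minimal sketch of this forward pass in Python (the matrix shapes, bias vectors, and names are assumptions for illustration):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W, w0, V, v0):
    # Hidden layer: z_h = sigma(sum_j W[h, j] x_j + w0[h]), h = 1..H
    z = sigmoid(W @ x + w0)
    # Output layer: y_i = sum_h V[i, h] z_h + v0[i]
    y = V @ z + v0
    return y, z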
Many layers can be specified. What do the hidden units represent? Many people have tried to explain them, but it remains unclear. The number of hidden units is related to the capacity of the perceptron: any continuous input-output function can be approximated arbitrarily well by a multilayer perceptron with enough hidden units.
4.2 Training Multilayer Perceptrons
For training a multilayer perceptron we have to estimate the weights $\omega_{hj}, \nu_{ih}$ of the perceptron. First we need an error function. It can be defined as:
$$E[\omega, \nu] = \sum_i \Big\{ y_i - \sum_h \nu_{ih}\,\sigma\Big(\sum_j \omega_{hj} x_j\Big) \Big\}^2.$$
The update terms are the derivatives of the error function with respect to the parameters:
$$\Delta\omega_{hj} = -\frac{\partial E}{\partial \omega_{hj}},$$
which is computed by the chain rule, and
$$\Delta\nu_{ih} = -\frac{\partial E}{\partial \nu_{ih}},$$
which is computed directly.
By defining $r_k = \sigma(\sum_j \omega_{kj} x_j)$, so that $E = \sum_i (y_i - \sum_k \nu_{ik} r_k)^2$, we can write
$$\frac{\partial E}{\partial \omega_{kj}} = \frac{\partial E}{\partial r_k}\cdot\frac{\partial r_k}{\partial \omega_{kj}},$$
where
$$\frac{\partial E}{\partial r_k} = -2\sum_i \Big(y_i - \sum_l \nu_{il} r_l\Big)\nu_{ik},$$
$$\frac{\partial r_k}{\partial \omega_{kj}} = x_j\,\sigma'\Big(\sum_j \omega_{kj} x_j\Big),$$
$$\sigma'(z) = \frac{d}{dz}\sigma(z) = \sigma(z)\{1 - \sigma(z)\}.$$
Hence,
$$\frac{\partial E}{\partial \omega_{kj}} = -2\sum_i \Big(y_i - \sum_l \nu_{il} r_l\Big)\nu_{ik}\, r_k(1 - r_k)\, x_j,$$
where $(y_i - \sum_l \nu_{il} r_l)$ is the error at the output layer and $\nu_{ik}$ is the weight from hidden unit $k$ of the middle layer to output unit $i$.
This is called backpropagation. The error at the output layer, $(y_i - \sum_l \nu_{il} r_l)$, is propagated back to the nodes at the middle layer through the weights $\nu_{ik}$, where it is multiplied by the activity $r_k(1 - r_k)$ at that node, and by the activity $x_j$ at the input.
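Putting the derivation together, here is a sketch of the gradient computation for a single input (it reuses the shapes and names from the forward-pass sketch above; the biases w0, v0 are an extra assumption for consistency with that sketch):

import numpy as np

def backprop_gradients(x, y_target, W, w0, V, v0):
    # Forward pass: hidden activities r_k and outputs y_i.
    r = 1.0 / (1.0 + np.exp(-(W @ x + w0)))
    y = V @ r + v0
    err = y_target - y                     # output-layer error (y_i - sum_l nu_il r_l)
    # Output-layer gradients of E = sum_i err_i^2 (computed directly).
    dE_dV = -2.0 * np.outer(err, r)
    dE_dv0 = -2.0 * err
    # Hidden-layer gradients via the chain rule:
    # dE/dW[k, j] = -2 sum_i err_i nu_ik * r_k (1 - r_k) * x_j
    dE_dr = -2.0 * (V.T @ err)             # error propagated back to each hidden node
    dE_dW = np.outer(dE_dr * r * (1.0 - r), x)
    dE_dw0 = dE_dr * r * (1.0 - r)
    return dE_dW, dE_dw0, dE_dV, dE_dv0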
4.2.1 Variants
One variant is learning in batch mode, which consists of putting all the data into a single energy function, i.e., summing the errors over all the training data. The weights are then updated according to the equations above, by summing over all the data.
Another variant is online learning. In this variant, at each time step you select an example $(x^t, y^t)$ at random from a dataset, or from some source that keeps producing examples, and perform one iteration of steepest descent using only that datapoint, i.e., in the update equations you remove the summation over $t$. Then you select another datapoint at random, do another iteration of steepest descent, and so on. This variant is suitable for problems in which we keep getting new input over time.
This is called stochastic gradient descent (or Robbins-Monro) and has some nice properties, including better convergence than the batch method described above. This is because selecting the datapoints at random introduces an element of stochasticity which helps prevent the algorithm from getting stuck in a local minimum (although the theorems for this require multiplying the update, i.e. the gradient, by a term that decreases slowly over time).
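A sketch of the online variant, reusing the hypothetical gradient routine from the previous sketch and a slowly decreasing learning factor (the schedule and parameter names are assumptions):

import numpy as np

def train_online(X, Y, W, w0, V, v0, eta=0.01, steps=10000, decay=1e-4):
    rng = np.random.default_rng(0)
    for t in range(steps):
        i = rng.integers(len(X))                # pick a datapoint at random
        gW, gw0, gV, gv0 = backprop_gradients(X[i], Y[i], W, w0, V, v0)
        step = eta / (1.0 + decay * t)          # learning factor decreases slowly over time
        W -= step * gW
        w0 -= step * gw0
        V -= step * gV
        v0 -= step * gv0
    return W, w0, V, v0

Batch mode would instead sum (or average) the gradients over every example before taking a single step.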
4.3 Critical issues
One big issue is the number of hidden units. This is the main design choice since the
number of input and output units is determined by the problem.
Too many hidden units means that the model will have too many parameters (the weights $\omega, \nu$) and so will fail to generalize if there is not enough training data. Conversely, too few hidden units restricts the class of input-output functions that the multilayer perceptron can represent, and hence prevents it from modeling the data correctly. This is the classic bias-variance dilemma (previous lecture).
A popular strategy is to have a large number of hidden units but to add a regularizer term that penalizes the strength of the weights. This can be done by adding an additional energy term:
$$\lambda\Big\{\sum_{h,j}\omega_{hj}^2 + \sum_{i,h}\nu_{ih}^2\Big\}.$$
This term encourages the weights to be small, and maybe even to be zero, unless the data says otherwise. Using an $L_1$-norm penalty term is even better for this.
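As a sketch, the penalty simply adds an extra term to each weight gradient; the value of lambda and the L1 option are illustrative assumptions:

import numpy as np

def add_weight_penalty(gW, gV, W, V, lam=1e-3, l1=False):
    # Gradient of lambda * (sum_{h,j} W^2 + sum_{i,h} V^2), or of an L1 penalty.
    if l1:
        return gW + lam * np.sign(W), gV + lam * np.sign(V)
    return gW + 2.0 * lam * W, gV + 2.0 * lam * V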
Still, the number of hidden units remains an open question, and in practice some of the most effective multilayer perceptrons are those whose structure was hand-designed (by trial and error).
4.4 Relation to Support Vector Machines
In a perceptron we get $y_i = \sum_h \nu_{ih} z_h$ at the output layer, and at the hidden layer we get $z_h = \sigma(\sum_j \omega_{hj} x_j)$ from the input layer.
Support Vector Machines (SVM) can also be represented in this way.
$$y = \mathrm{sign}\Big(\sum_\mu \alpha_\mu y_\mu\, \vec{x}_\mu\cdot\vec{x}\Big),$$
with $\vec{x}_\mu\cdot\vec{x} = z_\mu$ the hidden-unit responses, i.e., $y = \mathrm{sign}(\sum_\mu \alpha_\mu y_\mu z_\mu)$.
An advantage of SVM is that the number of hidden units is given by the number of support vectors. The coefficients $\{\alpha_\mu\}$ are specified by solving the SVM dual optimization problem, and there is a well-defined algorithm to perform this optimization.
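A sketch of this two-layer view of the SVM decision function (the support vectors, labels y_mu, coefficients alpha_mu, and bias b are assumed to be given by the SVM training procedure):

import numpy as np

def svm_as_two_layer(x, support_x, support_y, alpha, b=0.0):
    # Hidden layer: one unit per support vector, z_mu = x_mu . x
    z = support_x @ x
    # Output layer: y = sign(sum_mu alpha_mu y_mu z_mu + b)
    return np.sign(np.sum(alpha * support_y * z) + b)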