MLP and Backpropagation

The document provides an overview of multilayer perceptrons (MLPs) and the backpropagation algorithm used to train them. It explains that MLPs use multiple layers of perceptron units arranged in a feedforward network. The extra layers allow MLPs to represent more complex, nonlinear relationships compared to a single perceptron. Backpropagation is then introduced as a method for calculating gradients to update the network weights using gradient descent, proceeding backwards from the output to the input layers. Key aspects of backpropagation like calculating error derivatives and updating weights are demonstrated step-by-step with an example.


MLP and Backpropagation

[Figure: a feedforward network with inputs x1 … xn]

• We will introduce the MLP and the backpropagation algorithm which is used to train it
• MLP is used to describe any general feedforward (no recurrent connections) network
• However, we will concentrate on nets with units arranged in layers
[Figure: a layered feedforward network with inputs x1 … xn]

Different books refer to the above as either a 4-layer network (counting layers of neurons) or a 3-layer network (counting layers of adaptive weights). We will follow the latter convention.

1st question: what do the extra layers gain you? Start by looking at what a single layer can’t do.
Perceptron Learning Theorem
• Recap: A perceptron (threshold unit) can
learn anything that it can represent (i.e.
anything separable with a hyperplane)

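As a concrete aside (not from the original slides), here is a minimal sketch of the classic perceptron learning rule on the linearly separable AND function; the learning rate and epoch count are illustrative choices.

```python
import numpy as np

def step(z):
    """Threshold unit: outputs 1 if the net input is non-negative, else 0."""
    return 1 if z >= 0 else 0

# AND truth table: linearly separable, so the perceptron rule will converge
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate (illustrative choice)

# Perceptron rule: w <- w + lr * (target - output) * x, applied per example
for epoch in range(20):
    for x_i, t_i in zip(X, t):
        y = step(w @ x_i + b)
        w += lr * (t_i - y) * x_i
        b += lr * (t_i - y)

print([step(w @ x_i + b) for x_i in X])   # expected: [0, 0, 0, 1]
```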
The Exclusive OR problem
A Perceptron cannot represent Exclusive OR
since it is not linearly separable.

Minsky & Papert (1969) offered a solution to the XOR problem by combining perceptron unit responses using a second layer of units: piecewise linear classification using an MLP with threshold (perceptron) units (a sketch of this idea in code follows below).

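A minimal sketch of that idea, with hand-picked weights (my own illustrative choices, not necessarily the ones in the slide's figure): two threshold units compute OR and AND of the inputs, and a second-layer threshold unit combines them into XOR.

```python
def step(z):
    """Threshold (perceptron) unit: fires (1) when its net input is non-negative."""
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    # Hidden layer of two threshold units (weights and biases hand-picked)
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)   # fires for x1 OR x2
    h2 = step(1.0 * x1 + 1.0 * x2 - 1.5)   # fires for x1 AND x2
    # Second layer: fires when OR is true but AND is false -> XOR
    return step(1.0 * h1 - 1.0 * h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))   # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```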
Three-layer networks
[Figure: inputs x1, x2, …, xn feed forward through hidden layers to the output layer]
Properties of architecture
• No connections within a layer
• No direct connections between input and output layers
• Fully connected between layers
• Often more than 3 layers
• Number of output units need not equal number of input units
• Number of hidden units per layer can be more or less than
input or output units

Each unit is a perceptron

$y_j = f\left(\sum_{i=1}^{m} w_{ij}\, x_i + b_j\right)$
Often include bias as an extra weight
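As a hedged illustration (not from the slides), here is a sketch of this unit equation for one fully connected layer; the sigmoid choice of f and the layer sizes are assumptions.

```python
import numpy as np

def sigmoid(z):
    """An example choice for the activation f."""
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(x, W, b, f=sigmoid):
    """One layer of units: y_j = f(sum_i w_ij * x_i + b_j).

    x : shape (m,)    inputs x_1..x_m
    W : shape (m, n)  W[i, j] is the weight w_ij from input i to unit j
    b : shape (n,)    biases b_j (often folded in as an extra weight on a constant +1 input)
    """
    return f(x @ W + b)

rng = np.random.default_rng(0)
x = rng.normal(size=3)         # m = 3 inputs
W = rng.normal(size=(3, 2))    # a layer of n = 2 units
b = np.zeros(2)
print(layer_forward(x, W, b))  # two outputs, each squashed into (0, 1)
```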


What do each of the layers do?

The 1st layer draws linear boundaries, the 2nd layer combines the boundaries, and the 3rd layer can generate arbitrarily complex boundaries.
Backpropagation
note: in this example, the activation function is omitted to simplify the calculation
• Backpropagation, short for “backward propagation of errors”, is a
mechanism used to update the weights using gradient descent. It
calculates the gradient of the error function with respect to the neural
network’s weights. The calculation proceeds backwards through the
network.

• Gradient descent is an iterative optimization algorithm for finding the minimum of a function; in our case we want to minimize the error function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.
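To make both ideas concrete, here is a minimal sketch (not the slides' worked example) of backpropagation plus gradient descent on a tiny 2-2-1 sigmoid network with squared error; the XOR data, layer sizes, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # inputs
T = np.array([[0.], [1.], [1.], [0.]])                   # XOR targets

W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # input -> hidden weights
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)   # hidden -> output weights
lr = 0.5                                        # learning rate

for it in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of E = 1/2 * sum (y - t)^2 w.r.t. each layer's net input
    delta_out = (y - T) * y * (1 - y)
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    # Gradient descent: step proportional to the negative gradient
    W2 -= lr * h.T @ delta_out
    b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_hid
    b1 -= lr * delta_hid.sum(axis=0)

# Outputs should approach the XOR targets; an unlucky initialisation can
# settle in a local minimum, in which case try another seed.
print(y.round(2))
```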
Activation Function

Left: the sigmoid non-linearity squashes real numbers into the range [0, 1].
Right: the tanh non-linearity squashes real numbers into the range [-1, 1].

Left: the Rectified Linear Unit (ReLU) activation function, which is zero when x < 0 and linear with slope 1 when x > 0.
Right: a plot from the Krizhevsky et al. paper indicating the 6x improvement in convergence with the ReLU unit compared to the tanh unit.
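A small sketch of these three activation functions (the numeric example values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes real numbers into (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes real numbers into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # zero for z < 0, slope 1 for z > 0

z = np.linspace(-2.0, 2.0, 5)
print(sigmoid(z))
print(tanh(z))
print(relu(z))
```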
• http://cs231n.github.io/neural-networks-1/
• https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
• https://www.anotsorandomwalk.com/backpropagation-example-with-numbers-step-by-step/
