
Ann MPDM

1. The perceptron was an early artificial neural network, proposed by Rosenblatt in 1958, that could learn to classify data. It used a linear model and a step activation function to make predictions.
2. The perceptron learning algorithm adjusts the network's weights based on errors so as to minimize incorrect predictions on the training data. Weights are updated incrementally for each data point using a learning rate.
3. This allows the perceptron to learn linear discriminant functions that separate classes, provided the data is linearly separable. It served as an important early model but was limited to linear problems.


Master in Information Management

2022/23

Artificial Neural Networks


Ricardo Santos
I have always been convinced that the only way to get artificial
intelligence to work is to do the computation in a way similar to the
human brain…

Geoffrey Hinton
The AI Buzz
All of the algorithms used in the previous applications…
1. Have the ability to learn automatically from data
2. Perform non-linear interpolation
3. Are universal approximators: they can approximate many different functions, no matter how complex those functions may be
4. Have found success in modelling different phenomena, even those about which little is known
5. Attempt to mimic how a brain captures and stores information
Inspired by Biology

What the brain looks like to the naked eye; what some neuronal pathways look like using diffusion spectrum imaging (DSI)
Credit: http://www.humanconnectomeproject.org/
The biological neuron

Dendrites: Receive information
Soma or Cell Body: Processes the information
Axon: Carries information (electric pulses) to the axon terminal
Synapse: The spectacular junction between the axon terminal and the dendrites of other neurons
The artificial neuron

Inputs play the role of dendrites; the output plays the role of the axon.

Dendrites: Receive inputs (x1, ..., xn) at a synapse
Cell body: Processes the information:
1. Weighted sum of the inputs: Σᵢ wᵢxᵢ + b
2. Activation function f, producing the output f(Σᵢ wᵢxᵢ + b)
Axon: Exports the output of the cell body to other neurons or to the environment

Neurons are the building blocks of even the most complex ANN architectures
Inputs → ???? → Outputs
A Very Complex Neural Network Architecture, by Andrej Karpathy (source here)
Agenda
1 An historical introduction

2 The Multi-Layer Perceptron


1

An historical introduction
Modelling the brain: the main inspiration for Deep Learning
1 An historic perspective

McCulloch & Pitts (1943): networks of binary neurons can do logic
Frank Rosenblatt (1958): The Perceptron
1 The Perceptron

Linear Discriminant for Binary Classification:
a. Obtains the equation of a line that discriminates between 2 linearly separable classes
b. Makes decisions according to the step (also known as Heaviside) function:

f(X) = 0 if Σᵢ wᵢxᵢ + b < θ
f(X) = 1 if Σᵢ wᵢxᵢ + b ≥ θ
1 How the Perceptron Learns

Linear Discriminant for Binary Classification:
a. Obtains the equation of a line that discriminates between 2 linearly separable classes
b. Makes decisions according to the step (also known as Heaviside) function
c. Adjusts weights after making an incorrect prediction
d. Objective function focuses on error minimization
1 How the Perceptron Learns

Training a perceptron (numerical example):
a) 4 instances
b) 2 independent variables (x1 and x2) and one dependent variable (y)

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

The algorithm (sketched in code after this list):
Initialize the weights, threshold and learning rate
For each instance:
i. Obtain prediction – Forward Pass
ii. Assess the error in the prediction
iii. Adjust the weights of the perceptron – Backward Pass
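To make the loop concrete, here is a minimal Python sketch of the procedure described above, using the same AND data, initial weights, threshold and learning rate as the worked example that follows. The function name and structure are illustrative, not taken from any library.

# Minimal perceptron training loop (illustrative sketch)
def train_perceptron(X, y, w, theta=0.5, alpha=0.5, max_epochs=10):
    for _ in range(max_epochs):
        mistakes = 0
        for xi, target in zip(X, y):
            # Forward pass: weighted sum followed by the step (Heaviside) decision
            z = sum(w_j * x_j for w_j, x_j in zip(w, xi))
            y_hat = 1 if z >= theta else 0
            # Backward pass: adjust weights only when the prediction is wrong
            error = target - y_hat
            if error != 0:
                w = [w_j + alpha * error * x_j for w_j, x_j in zip(w, xi)]
                mistakes += 1
        if mistakes == 0:          # every instance classified correctly: stop
            break
    return w

# AND problem from the slides
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
print(train_perceptron(X, y, w=[0.9, 0.9]))   # converges to [0.4, 0.4]

Running this sketch reproduces the weight trajectory worked out step by step on the next slides.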
1 How the Perceptron Learns

Initialize weights (random): w1 = 0.9, w2 = 0.9
Initialize threshold and learning rate: θ = 0.5, α = 0.5

First instance (x1 = 0, x2 = 0, y = 0):
i. Calculate the output of the linear section: w1x1 + w2x2 = 0.9 × 0 + 0.9 × 0 = 0
ii. Compare the result with θ: assign ŷ = 0 if lower and ŷ = 1 otherwise → ŷ = 0
iii. If ŷ = y, do nothing; otherwise compute the error and adjust the weights. Here ŷ = y, so the weights stay unchanged.
1 How the Perceptron Learns

Second instance (x1 = 0, x2 = 1, y = 0), with w1 = 0.9, w2 = 0.9, θ = 0.5, α = 0.5:
i. Calculate the output of the linear section: w1x1 + w2x2 = 0.9 × 0 + 0.9 × 1 = 0.9
ii. Compare the result with θ: 0.9 ≥ θ, so ŷ = 1
iii. ŷ ≠ y, so compute the error and adjust the weights:
error ε = y − ŷ = 0 − 1 = −1
w1_new = w1 + α × ε × x1 = 0.9 + 0.5 × (−1) × 0 = 0.9
w2_new = w2 + α × ε × x2 = 0.9 + 0.5 × (−1) × 1 = 0.4
1 How the Perceptron Learns

Third instance (x1 = 1, x2 = 0, y = 0), with w1 = 0.9, w2 = 0.4, θ = 0.5, α = 0.5:
i. Calculate the output of the linear section: w1x1 + w2x2 = 0.9 × 1 + 0.4 × 0 = 0.9
ii. Compare the result with θ: 0.9 ≥ θ, so ŷ = 1
iii. ŷ ≠ y, so compute the error and adjust the weights:
error ε = y − ŷ = 0 − 1 = −1
w1_new = w1 + α × ε × x1 = 0.9 + 0.5 × (−1) × 1 = 0.4
w2_new = w2 + α × ε × x2 = 0.4 + 0.5 × (−1) × 0 = 0.4
1 How the Perceptron Learns

Fourth instance (x1 = 1, x2 = 1, y = 1), with w1 = 0.4, w2 = 0.4, θ = 0.5, α = 0.5:
i. Calculate the output of the linear section: w1x1 + w2x2 = 0.4 × 1 + 0.4 × 1 = 0.8
ii. Compare the result with θ: 0.8 ≥ θ, so ŷ = 1
iii. ŷ = y, so error ε = 0 and the weights stay unchanged. What now?
1 How the Perceptron Learns

First instance again (x1 = 0, x2 = 0, y = 0), with w1 = 0.4, w2 = 0.4, θ = 0.5, α = 0.5:
i. w1x1 + w2x2 = 0.4 × 0 + 0.4 × 0 = 0
ii. 0 < θ, so ŷ = 0
iii. ŷ = y: ε = 0, no weight update
1 How the Perceptron Learns

Second instance again (x1 = 0, x2 = 1, y = 0), with w1 = 0.4, w2 = 0.4, θ = 0.5, α = 0.5:
i. w1x1 + w2x2 = 0.4 × 0 + 0.4 × 1 = 0.4
ii. 0.4 < θ, so ŷ = 0
iii. ŷ = y: ε = 0, no weight update
1 How the Perceptron Learns

Third instance again (x1 = 1, x2 = 0, y = 0), with w1 = 0.4, w2 = 0.4, θ = 0.5, α = 0.5:
i. w1x1 + w2x2 = 0.4 × 1 + 0.4 × 0 = 0.4
ii. 0.4 < θ, so ŷ = 0
iii. ŷ = y: ε = 0, no weight update

Every instance is now classified correctly, so no further updates are needed: training has converged.
1 An historic perspective

McCulloch & Pitts (1943): networks of binary neurons can do logic
Frank Rosenblatt (1958): The Perceptron
Minsky and Papert (1969): The limitations of the Perceptron
1 How the Perceptron Learns

Problems with the XOR:
a) 4 instances
b) 2 independent variables (x1 and x2) and one dependent variable (y)
c) This problem requires a non-linear solution

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

Initialize weights (random): w1 = 0.9, w2 = 0.9
Initialize threshold and learning rate: θ = 0.5, α = 0.5
1 How the Perceptron Learns

First instance (x1 = 0, x2 = 0, y = 0), with w1 = 0.9, w2 = 0.9, θ = 0.5, α = 0.5:
i. w1x1 + w2x2 = 0.9 × 0 + 0.9 × 0 = 0
ii. 0 < θ, so ŷ = 0
iii. ŷ = y: ε = 0, no weight update
1 How the Perceptron Learns

Second instance (x1 = 0, x2 = 1, y = 1), with w1 = 0.9, w2 = 0.9, θ = 0.5, α = 0.5:
i. w1x1 + w2x2 = 0.9 × 0 + 0.9 × 1 = 0.9
ii. 0.9 ≥ θ, so ŷ = 1
iii. ŷ = y: ε = 0, no weight update
1 How the Perceptron Learns

Third instance (x1 = 1, x2 = 0, y = 1), with w1 = 0.9, w2 = 0.9, θ = 0.5, α = 0.5:
i. w1x1 + w2x2 = 0.9 × 1 + 0.9 × 0 = 0.9
ii. 0.9 ≥ θ, so ŷ = 1
iii. ŷ = y: ε = 0, no weight update
1 How the Perceptron Learns

Fourth instance (x1 = 1, x2 = 1, y = 0), with w1 = 0.9, w2 = 0.9, θ = 0.5, α = 0.5:
i. w1x1 + w2x2 = 0.9 × 1 + 0.9 × 1 = 1.8
ii. 1.8 ≥ θ, so ŷ = 1
iii. ŷ ≠ y, so compute the error and adjust the weights:
error ε = y − ŷ = 0 − 1 = −1
w1_new = w1 + α × ε × x1 = 0.9 + 0.5 × (−1) × 1 = 0.4
w2_new = w2 + α × ε × x2 = 0.9 + 0.5 × (−1) × 1 = 0.4
1 How the Perceptron Learns

First instance again (x1 = 0, x2 = 0, y = 0), with w1 = 0.4, w2 = 0.4, θ = 0.5, α = 0.5:
i. w1x1 + w2x2 = 0.4 × 0 + 0.4 × 0 = 0
ii. 0 < θ, so ŷ = 0
iii. ŷ = y: ε = 0, no weight update
1 How the Perceptron Learns

Second instance again (x1 = 0, x2 = 1, y = 1), with w1 = 0.4, w2 = 0.4, θ = 0.5, α = 0.5:
i. w1x1 + w2x2 = 0.4 × 0 + 0.4 × 1 = 0.4
ii. 0.4 < θ, so ŷ = 0
iii. ŷ ≠ y, so compute the error and adjust the weights:
error ε = y − ŷ = 1 − 0 = 1
w1_new = w1 + α × ε × x1 = 0.4 + 0.5 × 1 × 0 = 0.4
w2_new = w2 + α × ε × x2 = 0.4 + 0.5 × 1 × 1 = 0.9
Can you start to see the problem? The fourth instance will now be misclassified again, and the weights keep bouncing back and forth without ever separating the classes.
1 How the Perceptron Learns

Able to Solve the AND Problem; Incapable of Solving the XOR Problem
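The same update rule, run on the XOR data (an illustrative sketch using the same initial values as above), shows the problem directly: the number of misclassified instances never drops to zero, and the weights keep returning to the same values.

# Perceptron update rule applied to XOR: it never converges (sketch)
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]                      # XOR labels
w, theta, alpha = [0.9, 0.9], 0.5, 0.5

for epoch in range(10):
    mistakes = 0
    for xi, target in zip(X, y):
        z = sum(w_j * x_j for w_j, x_j in zip(w, xi))
        y_hat = 1 if z >= theta else 0
        error = target - y_hat
        if error != 0:
            w = [w_j + alpha * error * x_j for w_j, x_j in zip(w, xi)]
            mistakes += 1
    print(epoch, mistakes, [round(w_j, 2) for w_j in w])
# The mistake count never reaches 0: no single line separates the XOR classes.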
1 An historic perspective

McCulloch & Pitts (1943): networks of binary neurons can do logic
Frank Rosenblatt (1958): The Perceptron
Minsky and Papert (1969): The limitations of the Perceptron
Werbos (1974) & Rumelhart (1986): Backpropagation, a gradient approach for propagating the error throughout multiple layers of an ANN
1 Backpropagation

Backpropagation allowed for:
Making explicit the need for non-linear activation functions
1 Backpropagation

Backpropagation allowed for:
Making explicit the need for non-linear activation functions
The entire network not to be squashed into a single linear transformation

Decision boundary of a NN without non-linearities vs. decision boundary of a NN with non-linearities
1 Backpropagation

Backpropagation allowed for:
Making explicit the need for non-linear activation functions
The entire network not to be squashed into a single linear transformation
It uses partial derivatives and the chain rule to update the weights

Consider the following computation graph of the equations of a perceptron (x – input vector, w – weight vector, b – bias of the linear part): x and w feed a multiplication node producing wx; adding b gives z; applying the activation f gives h:

z = wx + b
h = f(z)
1 Backpropagation

To backpropagate the error:
1. Start at the output and work backwards, computing the contribution of each operation to the result
2. Use the chain rule to compute the gradients

On the perceptron graph above (z = wx + b, h = f(z)), the backward pass starts from ∂h/∂h at the output, then produces ∂h/∂z at the activation node, and finally ∂h/∂w and ∂h/∂b at the parameters.
1 Backpropagation

To backpropagate the error:
1. Start at the output and work backwards, computing the contribution of each operation to the result
2. Use the chain rule to compute the gradients:
i. Each operation node has a local gradient. Consider the multiplication node z = wx: its local gradients are ∂z/∂w = x and ∂z/∂x = w
ii. Downstream gradient = upstream gradient × local gradient. At this node the upstream gradient is ∂h/∂z, so:
∂h/∂w = (∂h/∂z) × (∂z/∂w)
∂h/∂x = (∂h/∂z) × (∂z/∂x)
(A numeric sketch of this follows below.)
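A small numeric sketch of the chain rule on this graph, assuming (as an example) that f is the sigmoid; the input values are arbitrary and only serve to show how each downstream gradient is the upstream gradient times the local gradient.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, w, b = 0.5, 0.9, 0.1       # hypothetical values for the graph inputs

# Forward pass through the graph: multiply, add, activate
z = w * x + b                 # z = wx + b
h = sigmoid(z)                # h = f(z)

# Backward pass: start with dh/dh = 1 and multiply by each local gradient
dh_dh = 1.0
dh_dz = dh_dh * h * (1 - h)   # local gradient of the sigmoid is h(1 - h)
dh_db = dh_dz * 1.0           # add node: dz/db = 1
dh_dw = dh_dz * x             # multiply node: dz/dw = x
dh_dx = dh_dz * w             # multiply node: dz/dx = w

print(round(dh_dw, 4), round(dh_dx, 4), round(dh_db, 4))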
1 Chapter 1 – Main Takeaways

1. The Perceptron was one of the precursors of modern ANNs:
i. Starts by computing the equation of a linear discriminant function for binary classification
ii. Checks whether a new observation is above or below that line
iii. Updates the weights for every wrong prediction (without a gradient)
iv. Only works correctly on linearly separable problems

2. Backpropagation:
i. Introduces the ability to use gradient-based learning
ii. Uses a different logic for weight updates
iii. Allows the use of non-linear activation functions, enabling non-linear relationships
1 Chapter 1 – Main Takeaways

Core components of modern neural networks (that also exist in larger networks):
i. The forward pass starts from the current weights and the inputs and computes the results of the operations
ii. The backward pass computes, using backpropagation, the gradients of the output with respect to the weights
iii. Backpropagation is the algorithm used to apply the chain rule along a computational graph:
Downstream gradient = Upstream gradient × Local gradient
2

The Multi-Layer Perceptron


A feedforward artificial neural network with multiple layers of perceptrons,
capable of modelling complex non-linear relationships between inputs and outputs.
2 Multi-Layer Perceptron

Each input neuron gets one input (one feature). The outputs of the input layer are sent to the hidden layers, and the outputs of the last hidden layer are sent to the output layer.

Both the input and the output layer are set by how the problem is formulated:
Input layer size: the number of features
Output layer size: depends on how the prediction is computed
2 Multi-Layer Perceptron

Input Layer:
i. Introduces the inputs to the network
ii. No processing or activation function

Hidden Layers:
i. Take, as input, the outputs of the previous layer and pass them along to the next layer
ii. Two hidden layers are considered enough to handle most problems

Output Layer:
i. Generates the prediction using the outputs of the hidden layers as its inputs
ii. Backpropagation in an MLP is done for all weights, from the input layer to the output layer
2 Multi-Layer Perceptron

ANNs as universal approximators:
i. Non-linear activation functions allow the models to go beyond the application of mere linear transformations
ii. Extra layers (facilitated by backpropagation) provide the capacity to model more complex phenomena
iii. It is the increased number of layers, combined with the non-linearities, that allows these models to approximate almost any complex function
2 Training an MLP – Numeric Example

Consider the following situation:
i. We have a dataset with data for cats and dogs
ii. We have a 5-2-1 MLP architecture

ID  Weight (X1)  Softness (X2)  Purrs/min (X3)  Barks/min (X4)  Tail Length (X5)  Label (y)
1   0.2          0.8            0.3             0.0             0.7               1
2   0.3          0.7            0.2             0.0             0.4               1
3   0.6          0.1            0.0             0.4             0.7               0
4   0.7          0.2            0.0             0.5             0.8               0
5   0.2          0.8            0.3             0.0             0.6               1
2 Training an MLP – Numeric Example

Step 1 – Initialization:
i. Weights – random initialization

Hidden layer weights (w¹ᵢⱼ connects input i to hidden unit j; index 0 denotes the bias):
w¹₁₁ = 0.3, w¹₁₂ = 0.9, w¹₂₁ = −0.2, w¹₂₂ = 0.1, w¹₃₁ = 0.3, w¹₃₂ = 0.9, w¹₄₁ = −0.2, w¹₄₂ = 0.1, w¹₅₁ = 0.3, w¹₅₂ = 0.9, w¹₀₁ = 0.5, w¹₀₂ = 0.3

Output layer weights (w²ⱼ₁ connects hidden unit j to the output; w²₀₁ is the bias):
w²₁₁ = 0.8, w²₂₁ = −0.2, w²₀₁ = −0.2
2 Training an MLP – Numeric Example

Step 1 – Initialization:
ii. Learning rate and how it evolves across iterations:
α = 0.5, decaying by 0.05 per iteration (epoch)
iii. Setting the activation function:
Output layer: sigmoid (logistic); Hidden layers: sigmoid (logistic), as used in the computations below

Quick note: in sklearn, the activation function of the output layer is already set for you
2 Training an MLP – Numeric Example

Step 2 – Forward Pass (instance ID 1: X1 = 0.2, X2 = 0.8, X3 = 0.3, X4 = 0.0, X5 = 0.7, y = 1):

z₁* = Σᵢ w¹ᵢ₁xᵢ + w¹₀₁ = 0.3 × 0.2 + (−0.2) × 0.8 + 0.3 × 0.3 + (−0.2) × 0.0 + 0.3 × 0.7 + 0.5 = 0.7
a₁₁ = 1 / (1 + e^(−z₁*)) = 1 / (1 + e^(−0.7)) = 0.668

z₂* = Σᵢ w¹ᵢ₂xᵢ + w¹₀₂ = 0.9 × 0.2 + 0.1 × 0.8 + 0.9 × 0.3 + 0.1 × 0.0 + 0.9 × 0.7 + 0.3 = 1.46
a₂₁ = 1 / (1 + e^(−1.46)) = 0.812
2 Training an MLP – Numeric Example

Step 2 – Forward Pass (continued), with a₁₁ = 0.668 and a₂₁ = 0.812:

z₃* = Σⱼ w²ⱼ₁aⱼ₁ + w²₀₁ = 0.8 × 0.668 + (−0.2) × 0.812 + (−0.2) = 0.172
a₁₂ = 1 / (1 + e^(−0.172)) = 0.543 ≠ y → we need to update the weights
2 Training an MLP – Numeric Example

Step 3 – Backward Pass:
i. Compute the error for each unit j (these expressions assume sigmoid units, so ŷⱼ(1 − ŷⱼ) is the derivative of the activation):
At the output layer: Errⱼ = ŷⱼ(1 − ŷⱼ)(yⱼ − ŷⱼ)
At the hidden layer: Errⱼ = ŷⱼ(1 − ŷⱼ) Σₖ Errₖ wⱼₖ
ii. Update the weights (a small code sketch of these rules follows below):
Δwᵢⱼ = α × Errⱼ × aᵢ
wᵢⱼ = old wᵢⱼ + Δwᵢⱼ
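As a small illustrative sketch (assuming sigmoid units, so that a(1 − a) is the activation's derivative), the three rules above can be written as plain helper functions:

# Error at an output unit j with activation a_j and target y_j
def output_error(a_j, y_j):
    return a_j * (1 - a_j) * (y_j - a_j)

# Error at a hidden unit j, given the errors of the units it feeds
# and the weights w_jk connecting it to them
def hidden_error(a_j, downstream_errors, downstream_weights):
    backprop_sum = sum(err_k * w_jk for err_k, w_jk in zip(downstream_errors, downstream_weights))
    return a_j * (1 - a_j) * backprop_sum

# Weight update: w_ij <- w_ij + alpha * Err_j * a_i
def updated_weight(w_ij, alpha, err_j, a_i):
    return w_ij + alpha * err_j * a_i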
2 Training an MLP – Numeric Example

Step 3 – Backward Pass

Compute the error at the output layer:
Err_a₁₂ = a₁₂(1 − a₁₂)(y − a₁₂) = 0.543 × (1 − 0.543) × (1 − 0.543) = 0.113

Compute the updated weights of the output layer:
w²₀₁ = w²₀₁ + α × Err_a₁₂ = −0.2 + 0.5 × 0.113 = −0.143
w²₁₁ = w²₁₁ + (α × Err_a₁₂ × a₁₁) = 0.8 + 0.5 × 0.113 × 0.668 = 0.838
w²₂₁ = w²₂₁ + (α × Err_a₁₂ × a₂₁) = −0.2 + 0.5 × 0.113 × 0.812 = −0.154
2 Training an MLP – Numeric Example

Step 3 – Backward Pass

Compute the errors at the hidden layer (using the updated output-layer weights):
Err_a₁₁ = a₁₁(1 − a₁₁) × Err_a₁₂ × w²₁₁ = 0.668 × (1 − 0.668) × 0.113 × 0.838 = 0.021
Err_a₂₁ = a₂₁(1 − a₂₁) × Err_a₁₂ × w²₂₁ = 0.812 × (1 − 0.812) × 0.113 × (−0.154) = −0.003

Update the weights of the hidden layer:
w¹₀₁ = w¹₀₁ + α × Err_a₁₁ = 0.5 + 0.5 × 0.021 = 0.511
w¹₁₁ = w¹₁₁ + (α × Err_a₁₁ × x₁) = 0.3 + 0.5 × 0.021 × 0.2 = 0.302
2 Training an MLP – Numeric Example

Step 3 – Backward Pass

After repeating the process for all the other weights, we would have (the sketch below reproduces these numbers):

Output layer weights (old → new):
w²₁₁: 0.8 → 0.838
w²₂₁: −0.2 → −0.154
w²₀₁: −0.2 → −0.143

Hidden layer weights (old → new):
w¹₁₁: 0.3 → 0.302     w¹₁₂: 0.9 → 0.900
w¹₂₁: −0.2 → −0.192   w¹₂₂: 0.1 → 0.099
w¹₃₁: 0.3 → 0.303     w¹₃₂: 0.9 → 0.900
w¹₄₁: −0.2 → −0.200   w¹₄₂: 0.1 → 0.100
w¹₅₁: 0.3 → 0.307     w¹₅₂: 0.9 → 0.899
w¹₀₁: 0.5 → 0.511     w¹₀₂: 0.3 → 0.299
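A NumPy sketch of this single forward and backward pass (illustrative, assuming sigmoid activations everywhere and following the convention above of propagating the hidden-layer error through the already-updated output weights):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# First training instance (ID 1) and its label
x = np.array([0.2, 0.8, 0.3, 0.0, 0.7])
y = 1.0

# W1[i, j] connects input i to hidden unit j; b1 holds the hidden biases
W1 = np.array([[0.3, 0.9],
               [-0.2, 0.1],
               [0.3, 0.9],
               [-0.2, 0.1],
               [0.3, 0.9]])
b1 = np.array([0.5, 0.3])
W2 = np.array([0.8, -0.2])    # hidden-to-output weights
b2 = -0.2                     # output bias
alpha = 0.5                   # learning rate

# Forward pass
z_hidden = x @ W1 + b1        # [0.7, 1.46]
a_hidden = sigmoid(z_hidden)  # [0.668, 0.812]
z_out = a_hidden @ W2 + b2    # 0.172
a_out = sigmoid(z_out)        # 0.543

# Backward pass
err_out = a_out * (1 - a_out) * (y - a_out)              # ~0.113
W2 = W2 + alpha * err_out * a_hidden                     # ~[0.838, -0.154]
b2 = b2 + alpha * err_out                                # ~-0.143
err_hidden = a_hidden * (1 - a_hidden) * (err_out * W2)  # ~[0.021, -0.003]
W1 = W1 + alpha * np.outer(x, err_hidden)
b1 = b1 + alpha * err_hidden

print(np.round(W1, 3), np.round(b1, 3))
print(np.round(W2, 3), np.round(b2, 3))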
2 Training an MLP with sklearn

from sklearn.neural_network import MLPClassifier

mlp_model = MLPClassifier()
mlp_model.fit(X_train, y_train)
y_pred = mlp_model.predict(X_test)
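A slightly fuller (illustrative) configuration, using the small cats-vs-dogs table from the worked example; the hyperparameter values below are assumptions chosen to mirror the 5-2-1 architecture and sigmoid activations, not settings taken from the slides:

import numpy as np
from sklearn.neural_network import MLPClassifier

# Cats-vs-dogs data from the worked example (features X1..X5 and label y)
X = np.array([[0.2, 0.8, 0.3, 0.0, 0.7],
              [0.3, 0.7, 0.2, 0.0, 0.4],
              [0.6, 0.1, 0.0, 0.4, 0.7],
              [0.7, 0.2, 0.0, 0.5, 0.8],
              [0.2, 0.8, 0.3, 0.0, 0.6]])
y = np.array([1, 1, 0, 0, 1])

mlp_model = MLPClassifier(hidden_layer_sizes=(2,),   # one hidden layer with 2 units (5-2-1)
                          activation='logistic',     # sigmoid activations, as in the example
                          learning_rate_init=0.5,
                          max_iter=2000,
                          random_state=0)
mlp_model.fit(X, y)
print(mlp_model.predict(X))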
2 Chapter 2 – Main Takeaways

1. MLPs are universal approximators for almost any function:
i. Non-linear activation functions allow them to model non-linear relationships
ii. Additional layers allow for increased complexity
iii. Backpropagation combined with the non-linearities allows the weights to be updated without squashing everything into a single linear equation

2. Training an MLP:
i. Forward pass: turn the inputs at the input layer into outputs at the output layer
ii. Compute the error (loss) at the end
iii. Backpropagate the error along the network, updating the weights as you go in order to minimize the error
We’ve barely scratched the surface of ANNs

1. Loss functions – Cross-Entropy Loss for classification
2. Batch training
3. Learning rate and optimization algorithms: Stochastic Gradient Descent, Adam & others
4. Activation functions: ReLU, Sigmoid, Tanh
5. Other forms of ANN (out of scope): Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Generative Adversarial Networks (GAN)
References
- https://github.com/Atcold/NYU-DLSP21
- https://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes03-neuralnets.pdf
- https://towardsdatascience.com/understanding-backpropagation-abcc509ca9d0
- http://cs231n.stanford.edu/schedule.html
- http://cs231n.stanford.edu/slides/2023/lecture_7.pdf
- https://cs230.stanford.edu/syllabus/
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge, MA: MIT Press.
Thank You!

Address (Morada): Campus de Campolide, 1070-312 Lisboa, Portugal
Tel: +351 213 828 610 | Fax: +351 213 828 611
