
Neural Networks

Lectures (4-5)

Dr. Mona Nagy ElBedwehy


Quiz (1)
The XOR function can be represented as

x1 XOR x2 ⟺ (x1 OR x2) AND NOT (x1 AND x2)

Construct a MADALINE to implement this formulation of XOR, and compare it with the previous result.

Quiz (2)
Find the weights required to perform the following classifications: vectors (1, 1, 1, 1) and (−1, 1, −1, −1) are members of the class (target 1), and vectors (1, 1, 1, −1) and (1, −1, −1, 1) are not members of the class (target −1). Use a learning rate of 0.5 and starting weights of 0. Using each of the training vectors as input, test the response of the net.

Introduction
➢ The single-layer perceptron is one of the oldest and earliest introduced neural networks.
➢ It was proposed by Frank Rosenblatt in 1958.
➢ The perceptron is a simple form of artificial neural network.
➢ It is mainly used to compute logic gates such as AND, OR, and NOR, which have binary inputs and binary outputs.
➢ The main functionality of the perceptron is to:
▪ Take inputs from the input layer.
▪ Weight them and sum them up.
▪ Pass the sum to the activation function to produce the output (a minimal sketch follows below).
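A minimal Python sketch of this forward pass (the step threshold, weights, and bias below are illustrative values chosen to implement an AND gate, not values from the lecture):

```python
# Hypothetical single perceptron wired as an AND gate (illustrative weights/bias).
def step(z, threshold=0.0):
    # Binary step activation: output 1 if the weighted sum reaches the threshold.
    return 1 if z >= threshold else 0

def perceptron(inputs, weights, bias):
    # Weight each input, sum them up, add the bias, then apply the activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", perceptron([x1, x2], weights=[1.0, 1.0], bias=-1.5))
```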
➢ Early in the history of AI, the development of neural networks, which at first seemed successful, ran into a problem and entered a period of decline.
➢ One reason for this was the XOR problem of the perceptron.
➢ Looking at the graph of the XOR gate, the + and − results cannot be separated by a single straight line (they are not linearly separable).
➢ For the OR gate and the AND gate, a single line can distinguish between + and −, but for the XOR gate this is impossible.
➢ The XOR results can, however, be separated using two straight lines, as in the following graph.
➢ The linear separability problem can be overcome by adding more layers and choosing the values of the weights and thresholds in such a way that the decision boundary converges into a closed region.
➢ Professor Marvin Minsky said, "We can solve it using multi-layer."

Multilayer Perceptron
➢ A multilayer perceptron (MLP) is a perceptron that teams up
with additional perceptrons, stacked in several layers, to solve
complex problems.
➢ Each perceptron in the first layer on the left (the input layer),
sends outputs to all the perceptrons in the second layer (the
hidden layer), and all perceptrons in the second layer send
outputs to the final layer on the right (the output layer).
➢ A three-layer MLP is called a Non-Deep or Shallow Neural
Network.
➢ An MLP with four or more layers is called a Deep Neural
Network (DNN).
➢ One difference between the classic perceptron and more general neural networks is that in the classic perceptron the decision function is a step function and the output is binary.
➢ In the neural networks that evolved from MLPs, other activation functions can be used, which result in real-valued outputs, usually between 0 and 1 or between -1 and 1.
➢ This allows for probability-based predictions or classification of items into multiple labels.
➢ A multilayer perceptron is a special case of a feedforward neural
network where every layer is a fully connected layer.

Activation Functions
➢ An activation function helps the NN to use important information while suppressing irrelevant data points.
➢ An activation function acts as a decision-making unit at the output of a neuron; for example, it decides whether a neuron should be activated or not.
➢ The role of the activation function is to derive the output from a set of input values fed to a node (or a layer).
➢ The neuron learns linear or non-linear decision boundaries based on the activation function.

➢ It has a normalizing effect on the neuron output, which prevents the outputs of neurons from becoming very large after several layers due to the cascading effect.
➢ The purpose of an activation function is to add non-linearity to the neural network.

➢ There are three types of neural network activation functions:
1. Binary Step Function.
2. Linear Activation Function.
3. Non-Linear Activation Functions.

Binary Step Function
➢ The binary step function depends on a threshold value that decides whether a neuron should be activated or not.
➢ The input fed to the activation function is compared to a certain threshold; if the input is greater than the threshold, the neuron is activated, else it is deactivated, meaning that its output is not passed on to the next hidden layer.

Here are some of the limitations of the binary step function:
👉 It cannot provide multi-value outputs, i.e. it cannot be used for multi-class classification problems.
👉 The gradient of the binary step function is zero, which causes a hindrance in the backpropagation process.

Linear Activation Function
➢ The linear activation function, also known as "no activation" or the "identity function" (multiplied by 1.0), is where the activation is proportional to the input.
➢ The function doesn't do anything to the weighted sum of the input; it simply spits out the value it was given.
➢ Mathematically it can be represented as:
f(x) = x
➢ In one sense, a linear function is better than a binary step function because it allows multiple outputs, not just yes and no.

➢ However, a linear activation function has two major problems:
1. It's not possible to use backpropagation, as the derivative of the function is a constant and has no relation to the input x.
2. All layers of the neural network will collapse into one if a linear activation function is used. No matter the number of layers in the neural network, the last layer will still be a linear function of the first layer. So, essentially, a linear activation function turns the NN into just one layer, as the check below shows.
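A small numerical check of this collapse, assuming arbitrary illustrative weight matrices and using NumPy:

```python
import numpy as np

# Two stacked "linear layers": y = W2 @ (W1 @ x + b1) + b2 (illustrative values).
W1, b1 = np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([0.5, -0.5])
W2, b2 = np.array([[2.0, 0.0], [1.0, 1.0]]), np.array([1.0, 2.0])
x = np.array([0.3, -0.7])

two_layers = W2 @ (W1 @ x + b1) + b2

# The equivalent single linear layer: W = W2 @ W1, b = W2 @ b1 + b2.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True: the stack collapses into one layer
```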

Non-Linear Activation Functions
Non-linear activation functions solve the following limitations of
linear activation functions:
1. They allow backpropagation because now the derivative
function would be related to the input, and it’s possible to
go back and understand which weights in the input
neurons can provide a better prediction.
2. They allow the stacking of multiple layers of neurons as
the output would now be a non-linear combination of input
passed through multiple layers. Any output can be
represented as a functional computation in a neural
network.
Sigmoid / Logistic Activation Function
There are 10 non-linear neural network activation functions; the most widely used ones are described below.
1. Sigmoid / Logistic Activation Function
➢ This function takes any real value as input and outputs values in the range of 0 to 1.
➢ The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to 0.0.
➢ Mathematically it can be represented as:
σ(x) = 1 / (1 + e^(−x))


Here's why the sigmoid/logistic activation function is one of the most widely used functions:
➢ It is commonly used for models where we have to predict a probability as the output. Since probabilities exist only in the range 0 to 1, sigmoid is the right choice because of its range.
➢ The function is differentiable and provides a smooth gradient, i.e., it prevents jumps in output values.

The limitations of the sigmoid function are discussed below:
➢ The derivative is f′(x) = sigmoid(x) · (1 − sigmoid(x)). As we can see from the figure, the gradient values are only significant for the range −3 to 3, and the graph gets much flatter in other regions.
➢ This implies that for values greater than 3 or less than −3, the function will have very small gradients. As the gradient value approaches zero, the network ceases to learn and suffers from the vanishing gradient problem.
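A short sketch of the derivative above, showing how the gradient fades outside roughly [−3, 3] (the sample inputs are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # f'(x) = sigmoid(x) * (1 - sigmoid(x)); its maximum is 0.25 at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10, -3, 0, 3, 10):
    print(f"x = {x:+3d}   sigmoid = {sigmoid(x):.5f}   gradient = {sigmoid_grad(x):.5f}")
# The gradient is only significant roughly in [-3, 3]; outside that range it
# approaches zero, which is the vanishing gradient problem described above.
```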

Tanh Function (Hyperbolic Tangent)
2. Tanh Function (Hyperbolic Tangent)
➢ The Tanh function is very similar to the sigmoid/logistic
activation function, and even has the same S-shape with the
difference in output range of -1 to 1.

➢ In Tanh, the larger the input (more positive), the closer the
output value will be to 1.0, whereas the smaller the input
(more negative), the closer the output will be to -1.0.
➢ Mathematically it can be represented as:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))

Have a look at the gradient of the tanh activation function to
understand its limitations:
➢ As you can see— it also faces the problem of vanishing
gradients similar to the sigmoid activation function. Plus, the
gradient of the tanh function is much steeper as compared to
the sigmoid function.
➢ 💡 Note: Although both sigmoid and tanh face vanishing
gradient issue, tanh is zero centered, and the gradients are not
restricted to move in a certain direction. Therefore, in practice,
tanh nonlinearity is always preferred to sigmoid nonlinearity.
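A quick numeric comparison of the two gradients at a few illustrative points:

```python
import math

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2; its maximum is 1.0 at x = 0 (vs 0.25 for sigmoid).
    return 1.0 - math.tanh(x) ** 2

for x in (-3, -1, 0, 1, 3):
    print(f"x = {x:+d}   sigmoid' = {sigmoid_grad(x):.4f}   tanh' = {tanh_grad(x):.4f}")
# tanh's gradient is steeper around zero and tanh is zero-centered,
# but both gradients still vanish for large |x|.
```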

ReLU (Rectified Linear Unit) Function
➢ In the context of artificial neural networks, the rectifier or ReLU
(rectified linear unit) activation function is an activation
function defined as:
f(x) = max(0, x), i.e. f(x) = x if x > 0 and f(x) = 0 if x ≤ 0,

where x is the input to a neuron.


➢ This is also known as a ramp function.

➢ This activation function was introduced by Kunihiko Fukushima in 1969 in the context of visual feature extraction in hierarchical neural networks.
Advantages
▪ Computationally efficient—allows the network to converge
very quickly.
▪ Non-linear—although it looks like a linear function.
Disadvantages
▪ When inputs approach zero or are negative, the gradient of the function becomes zero; the network cannot perform backpropagation and cannot learn (the dying ReLU problem).

Leaky ReLU Function
➢ Leaky ReLUs allow a small, positive gradient when the unit is
not active, helping to mitigate the vanishing gradient problem.
f(x) = x if x > 0, and f(x) = 0.01x if x ≤ 0

Advantages
▪ Prevents the dying ReLU problem—this variation of ReLU has a small positive slope in the negative area, so it does enable backpropagation, even for negative input values.

Disadvantages
▪ The results are not consistent—leaky ReLU does not provide consistent predictions for negative input values.
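A minimal sketch of ReLU and leaky ReLU side by side (the sample inputs are illustrative; 0.01 is the leak slope used in the formula above):

```python
def relu(x):
    # max(0, x): the gradient is zero for x <= 0 (the dying ReLU problem).
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # A small positive slope for negative inputs keeps the gradient non-zero.
    return x if x > 0 else slope * x

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x = {x:+.1f}   relu = {relu(x):+.3f}   leaky_relu = {leaky_relu(x):+.4f}")
```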

Softmax Function
➢ The softmax function, also known as normalized exponential
function, converts a vector of 𝑲 real numbers into a probability
distribution of 𝑲 possible outcomes.
➢ The softmax function is a generalization of the logistic function
to multiple dimensions, and used in multinomial logistic
regression.
➢ The softmax function is often used as the last activation function
of a neural network to normalize the output of a network to a
probability distribution over predicted output classes.

➢ For a vector z of K real numbers, the standard softmax function σ: ℝ^K → (0, 1)^K, where K ≥ 1, is defined by the formula:
σ(z)_i = e^(z_i) / Σ_{j=1}^{K} e^(z_j),   for i = 1, …, K.
Advantages
▪ Able to handle multiple classes, whereas most other activation functions handle only one class—softmax normalizes the output for each class to between 0 and 1 and divides by their sum, giving the probability of the input value belonging to a specific class.

▪ Useful for output neurons—typically Softmax is used only for
the output layer, for neural networks that need to classify
inputs into multiple categories.
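A small numeric sketch of the softmax formula above (the input values are illustrative):

```python
import math

def softmax(z):
    # Subtracting the max is a standard numerical-stability trick; it does not
    # change the result because it cancels in the ratio.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # roughly [0.659, 0.242, 0.099]
print(sum(probs))  # 1.0 -- a valid probability distribution over the K classes
```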

Multilayer Perceptron Network
➢ Let’s write out the MLP computations mathematically.
Conceptually, there’s nothing new here; we just have to pick a
notation to refer to various parts of the network. As with the
linear case, we’ll refer to the activations of the input units as
𝒙𝒋 and the activation of the output unit as 𝒚.
➢ The units in the ℓ-th hidden layer will be denoted h_i^(ℓ).
➢ The network is fully connected, so each unit receives connections from all the units in the previous layer.
➢ This means each unit has its own bias, and there's a weight for every pair of units in two consecutive layers.

Therefore, the network's computations can be written out as:

h_i^(1) = φ^(1)( Σ_j w_ij^(1) x_j + b_i^(1) )

h_i^(2) = φ^(2)( Σ_j w_ij^(2) h_j^(1) + b_i^(2) )

⋮

h_i^(n) = φ^(n)( Σ_j w_ij^(n) h_j^(n−1) + b_i^(n) )

y_i = φ^(n+1)( Σ_j w_ij^(n+1) h_j^(n) + b_i^(n+1) )

Note that we distinguish 𝝓(𝟏) , 𝝓(𝟐), …, and 𝝓(𝒏) because different
layers may have different activation functions.
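A compact NumPy sketch of these layered computations, for one hidden layer plus an output layer (the sizes, random weights, and choice of activations here are illustrative, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(h_prev, W, b, phi):
    # One fully connected layer: phi(W @ h_prev + b).
    return phi(W @ h_prev + b)

x = rng.normal(size=3)                            # input activations x_j
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)     # hidden-layer weights and biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)     # output-layer weights and biases

h1 = layer(x, W1, b1, np.tanh)                          # phi^(1) = tanh
y = layer(h1, W2, b2, lambda z: 1 / (1 + np.exp(-z)))   # phi^(2) = sigmoid

print(h1, y)
```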

Multilayer Perceptron Example
Example (1): Implement the XOR gate using a multilayer perceptron.

The network uses two hidden units, h1 = σ(20·x1 + 20·x2 − 10) and h2 = σ(−20·x1 − 20·x2 + 30), and an output unit y = σ(20·h1 + 20·h2 − 30).

Using the Binary Step Activation Function

(0, 0):  σ(20·0 + 20·0 − 10) = σ(−10) = 0
         σ(−20·0 − 20·0 + 30) = σ(30) = 1
         σ(20·0 + 20·1 − 30) = σ(−10) = 0

(0, 1):  σ(20·0 + 20·1 − 10) = σ(10) = 1
         σ(−20·0 − 20·1 + 30) = σ(10) = 1
         σ(20·1 + 20·1 − 30) = σ(10) = 1

(1, 0):  σ(20·1 + 20·0 − 10) = σ(10) = 1
         σ(−20·1 − 20·0 + 30) = σ(10) = 1
         σ(20·1 + 20·1 − 30) = σ(10) = 1

(1, 1):  σ(20·1 + 20·1 − 10) = σ(30) = 1
         σ(−20·1 − 20·1 + 30) = σ(−10) = 0
         σ(20·1 + 20·0 − 30) = σ(−10) = 0
Using the Sigmoid Activation Function

(0, 0):  σ(20·0 + 20·0 − 10) = σ(−10) = 1/(1 + e^10) = 0.000045 ≈ 0
         σ(−20·0 − 20·0 + 30) = σ(30) = 1/(1 + e^−30) ≈ 1
         σ(20·0 + 20·1 − 30) = σ(−10) = 1/(1 + e^10) = 0.000045 ≈ 0

(0, 1):  σ(20·0 + 20·1 − 10) = σ(10) = 1/(1 + e^−10) = 0.999955 ≈ 1
         σ(−20·0 − 20·1 + 30) = σ(10) = 1/(1 + e^−10) = 0.999955 ≈ 1
         σ(20·1 + 20·1 − 30) = σ(10) = 1/(1 + e^−10) = 0.999955 ≈ 1

(1, 0):  σ(20·1 + 20·0 − 10) = σ(10) = 1/(1 + e^−10) = 0.999955 ≈ 1
         σ(−20·1 − 20·0 + 30) = σ(10) = 1/(1 + e^−10) = 0.999955 ≈ 1
         σ(20·1 + 20·1 − 30) = σ(10) = 1/(1 + e^−10) = 0.999955 ≈ 1

(1, 1):  σ(20·1 + 20·1 − 10) = σ(30) = 1/(1 + e^−30) ≈ 1
         σ(−20·1 − 20·1 + 30) = σ(−10) = 1/(1 + e^10) = 0.000045 ≈ 0
         σ(20·1 + 20·0 − 30) = σ(−10) = 1/(1 + e^10) = 0.000045 ≈ 0


Multilayer Perceptron Example
Example (2): A network with two inputs x1 = x2 = 0.4, two hidden layers of two units each, and two output units. The first hidden layer uses ReLU with weights w1 = 0.4, w2 = 0.6, w3 = 0.6, w4 = 0.4; the second hidden layer uses tanh with weights w5 = 0.6, w6 = 0.7, w7 = 0.7, w8 = 0.6; the output layer uses softmax with weights w9 = 0.8, w10 = 0.7, w11 = 0.4, w12 = 0.8.
h11 = σ(w1·x1 + w3·x2) = σ(0.4·0.4 + 0.6·0.4) = σ(0.4) = max(0, 0.4) = 0.4
h12 = σ(w2·x1 + w4·x2) = σ(0.6·0.4 + 0.4·0.4) = σ(0.4) = max(0, 0.4) = 0.4
h21 = σ(w5·h11 + w7·h12) = σ(0.6·0.4 + 0.7·0.4) = σ(0.52) = tanh(0.52) = 0.48
h22 = σ(w6·h11 + w8·h12) = σ(0.7·0.4 + 0.6·0.4) = σ(0.52) = tanh(0.52) = 0.48
y1 = σ(w9·h21 + w10·h22) = σ(0.8·0.48 + 0.7·0.48) = σ(0.72) = Softmax(0.72)
y2 = σ(w11·h21 + w12·h22) = σ(0.4·0.48 + 0.8·0.48) = σ(0.576) = Softmax(0.576)
y1 = Softmax(0.72) = e^0.72 / (e^0.72 + e^0.576) = 0.54
y2 = Softmax(0.576) = e^0.576 / (e^0.72 + e^0.576) = 0.46


Multilayer Perceptron Practical (MNIST)
➢ The MNIST database (Modified National Institute of Standards
and Technology database) is a large database of handwritten
digits that is commonly used for training various image
processing systems.
➢ The MNIST database is also widely used for training and testing
in the field of machine learning.
➢ It was created by re-mixing the samples from NIST's original
datasets. The creators felt that since NIST's training dataset
was taken from American Census Bureau employees, while the
testing dataset was taken from American high school students, it
was not well-suited for machine learning experiments.

➢ So, half of the training set and half of the test set were taken from NIST's training dataset, while the other half of the training set and the other half of the test set were taken from NIST's testing dataset.
➢ The MNIST database contains 70,000 handwritten digit images
(28x28 pixels), with 7,000 examples per digit (60,000 training
images and 10,000 testing images).
➢ We will train a feedforward neural network to achieve over 90%
accuracy on the MNIST dataset using Keras and TensorFlow in
Python.

Furthermore, the black and white images from NIST were
normalized to fit into a 𝟐𝟖 × 𝟐𝟖 pixel bounding box and anti-
aliased, which introduced grayscale levels.

Step 1: Import the Required Python
Packages
➢ TensorFlow is a free and open-source software library for
machine learning and artificial intelligence. It can be used across
a range of tasks but has a particular focus on training and
inference of deep neural networks.

It was developed by the Google Brain team for Google's internal use in research and production.
➢ The LabelBinarizer will be used to one-hot encode our integer
labels as vector labels. One-hot encoding transforms categorical
labels from a single integer to a vector. Many machine learning
algorithms benefit from this type of label representation.

➢ Each data point in the MNIST dataset has an integer label in the
range [0, 9], one for each of the possible ten digits in the MNIST
dataset.
➢ A label with a value of 0 indicates that the corresponding image
contains a zero digit.
➢ Similarly, a label with a value of 8 indicates that the corresponding image contains the digit eight.
➢ However, we first need to transform these integer labels into vector labels, where the entry in the vector corresponding to the label is set to 1 and all other entries are set to 0 (this process is called one-hot encoding).
➢ For example, consider the label 3 and we wish to binarize/one-
hot encode it — the label 3 now becomes:
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
➢ The index for the digit three is set to one and all other entries in
the vector are set to zero.
➢ The one-hot encoding representations for each digit, 0−9, are shown in the listing below:

0: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
2: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
3: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
4: [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
5: [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
6: [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
7: [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
8: [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
9: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

➢ Lines 5-8 import the necessary packages to create a simple
feedforward neural network with Keras.
➢ The Sequential indicates that our network will be feedforward
and layers will be added to the class sequentially, one on top of
the other.
➢ The Flatten flattens the input provided without affecting the
batch size. For example, If inputs are shaped (batch_size,)
without a feature axis, then flattening adds an extra channel
dimension and output shape is (batch_size, 1).

➢ The Dense class is the implementation of our fully connected layers; it is used for the hidden layers and the output layer.
➢ The classification_report function will give a nicely formatted
report displaying the total accuracy of the model, along with a
breakdown on the classification accuracy for each digit.
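Since the Step 1 listing appears only as an image in the slides, a plausible reconstruction of the imports described above might look like the following (module layout assumed from the description, not copied from the original listing):

```python
# Reconstruction of the imports described in Step 1 (assumed, not the original listing).
from sklearn.preprocessing import LabelBinarizer     # one-hot encode integer labels
from sklearn.metrics import classification_report    # per-digit accuracy breakdown
from tensorflow.keras.models import Sequential       # feedforward, layer-by-layer model
from tensorflow.keras.layers import Dense, Flatten   # fully connected layers (+ Flatten)
from tensorflow.keras.datasets import mnist          # the MNIST digit dataset
import matplotlib.pyplot as plt                      # plotting the loss/accuracy curves
import numpy as np
```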

Step 2: Load the MNIST Dataset

Line 2 loads the MNIST dataset from disk. If you have never run
this function before, then the MNIST dataset will be downloaded
and stored locally to your machine. Once the dataset has been
downloaded, it is cached to your machine and will not have to be
downloaded again.
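A sketch of this loading step (variable names assumed to match the later slides):

```python
# Load MNIST; it is downloaded and cached locally on first use.
((trainX, trainY), (testX, testY)) = mnist.load_data()
print(trainX.shape, testX.shape)  # (60000, 28, 28) (10000, 28, 28)
```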

Step 3: Cast and Normalize the Data

We perform data normalization by scaling the pixel intensities to the range [0, 1].
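A sketch of the cast-and-normalize step, assuming the images are also flattened here to 784-value vectors as described in Step 5:

```python
# Flatten each 28x28 image to a 784-value vector, cast to float,
# and scale pixel intensities from [0, 255] down to [0, 1].
trainX = trainX.reshape((trainX.shape[0], 28 * 28)).astype("float32") / 255.0
testX = testX.reshape((testX.shape[0], 28 * 28)).astype("float32") / 255.0
```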

Step 4: One-hot Encoding Representation
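A sketch of the one-hot encoding step using the LabelBinarizer described in Step 1 (variable names assumed):

```python
# One-hot encode the integer labels 0-9 into 10-dimensional vectors.
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
print(trainY.shape)  # (60000, 10)
```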

Step 5: Define the Network Architecture
➢ Each image in the MNIST dataset is represented as a 28 × 28 × 1 pixel image.
➢ In order to train our neural network on the image data, we first need to flatten each 2D image into a flat list of 28 × 28 = 784 values.
➢ The network is a feedforward architecture, instantiated by the Sequential class—this architecture implies that the layers will be stacked on top of each other, with the output of the previous layer feeding into the next.
➢ The input_shape is set to 784, the dimensionality of each MNIST data point.
➢ The first hidden layer has 256 nodes and applies the sigmoid activation function.
➢ The next layer has 128 nodes and also applies the sigmoid activation function.
➢ Finally, we apply another fully connected layer, this time with only 10 nodes, corresponding to the ten (0-9) output classes.
➢ Instead of a sigmoid activation, we use a softmax activation to obtain normalized class probabilities for each prediction.
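A sketch of the 784-256-128-10 architecture described above (a plausible reconstruction, not the original listing):

```python
# 784 -> 256 -> 128 -> 10 feedforward architecture described above.
model = Sequential()
model.add(Dense(256, input_shape=(784,), activation="sigmoid"))
model.add(Dense(128, activation="sigmoid"))
model.add(Dense(10, activation="softmax"))  # normalized class probabilities
```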

Step 6: Compile and Train the Network
Compile the model by specifying the optimizer,
loss function, and evaluation metric

➢ Epochs is the number of complete forward and backward passes the model makes over the training data.
➢ Batch size represents the number of samples per gradient update; if it's unspecified, batch_size will default to 32.
➢ Validation split is a float value between 0 and 1. The model will set apart this fraction of the training data to evaluate the loss and any model metrics at the end of each epoch. (The model will not be trained on this data; see the sketch below.)
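A sketch of the compile-and-fit step; the optimizer, loss, epoch count, batch size, and validation fraction below are assumptions, since the slides show the actual values only in an image:

```python
# Compile with an optimizer, a loss function, and an evaluation metric, then train.
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
H = model.fit(trainX, trainY, validation_split=0.1, epochs=10, batch_size=32)
```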

Step 7: Test the Network
➢ Once the network has finished training, we want to evaluate it on the testing data to obtain our final classifications:
Return the class label probabilities for every data point in testX.

➢ Thus, if you were to inspect the predictions array, it would have the shape (10000, 10), as there are 10,000 data points in the testing set and ten possible class labels (the digits 0-9).
➢ Each entry in a given row is, therefore, a probability.
➢ To determine the class with the largest probability, we can simply call .argmax(axis=1), which will give us the index of the class with the largest probability, and hence the final output classification.
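A sketch of the evaluation step described above (a plausible reconstruction):

```python
# Class-probability predictions for every test sample: shape (10000, 10).
predictions = model.predict(testX, batch_size=32)

# argmax picks the most probable class per row; compare against the true labels.
print(classification_report(testY.argmax(axis=1), predictions.argmax(axis=1)))
```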

Step 8: Plot the Training Loss, Accuracy,
Validation Loss, and Accuracy
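Since the plotting code is shown only as images, a plausible sketch using the History object H returned by model.fit() in the Step 6 sketch:

```python
# Plot training/validation loss and accuracy from the History object H
# (keys assume the model was compiled with metrics=["accuracy"]).
epochs = np.arange(len(H.history["loss"]))
plt.figure()
plt.plot(epochs, H.history["loss"], label="train_loss")
plt.plot(epochs, H.history["val_loss"], label="val_loss")
plt.plot(epochs, H.history["accuracy"], label="train_acc")
plt.plot(epochs, H.history["val_accuracy"], label="val_acc")
plt.xlabel("Epoch")
plt.ylabel("Loss / Accuracy")
plt.legend()
plt.show()
```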

Step 9: Create the Confusion Matrix
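A sketch of the confusion-matrix step, reusing the predictions from the Step 7 sketch (the use of scikit-learn's confusion_matrix here is an assumption):

```python
from sklearn.metrics import confusion_matrix

# 10x10 confusion matrix: rows are true digits, columns are predicted digits.
cm = confusion_matrix(testY.argmax(axis=1), predictions.argmax(axis=1))
print(cm)
```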
