Lec 1

Outline

❑ Introduction
❑ Learning paradigms
❑ History of artificial neural networks (ANN)
❑ Modelling of ANNs
❑ Multilayer perceptron (MLP)
❑ Gradient Descent and Backpropagation
❑ ANN types, design and issues
❑ Validation techniques for efficient learning
❑ Assignment(s)
❑ Conclusion

Introduction
❑ The ever-increasing popularity of artificial intelligence (AI) and machine learning (ML)
provides a groundbreaking impetus to many aspects of our lives.
➢ Artificial intelligence (AI) is the set of human-designed tools (programs) built to do things that are
typically done by humans.
➢ Machine learning (ML) is an AI field in which a machine can learn new things through experience
without the involvement of a human.
➢ Deep learning (DL) is an ML subset in which machines adapt to and learn from vast amounts of data.

[Figure: nested circles showing Deep Learning (DL) as a subset of Machine Learning (ML), which is a subset of Artificial Intelligence (AI).]
Image source: https://pvvajradhar.medium.com/ai-applications-in-various-fields-748dde27516d
Categories of Machine Learning (Learning Paradigms)

❑ Supervised Learning
➢ Learning with a teacher
➢ Data with a known output (label) is given
➢ Classification and Regression
➢ Examples: Support Vector Machine (SVM), K Nearest Neighbours (KNN), Decision Trees, Random Forest, Feedforward Artificial Neural Network (ANN)

❑ Reinforcement Learning
➢ Interactive learning environment: learning by trial and error using feedback from the machine's own actions and experiences
➢ Examples: Q-learning, Markov Decision Process

❑ Unsupervised Learning
➢ Learning without a teacher
➢ No labels: the machine has to make sense of the data on its own
➢ Clustering
➢ Examples: Gaussian Mixtures, K-means, Fuzzy c-means
Supervised Machine Learning

❑ Data
[Figure: examples of labelled training data for a supervised learning task.]
Image source: https://www.enjoyalgorithms.com/blog/classification-of-machine-learning-models


Supervised Machine Learning (cont’d)

❑ The data set is split into Training and Testing (or validation) subsets.
[Figure: the data set divided into a training portion and a testing (or validation) portion.]
Image source: https://www.enjoyalgorithms.com/blog/classification-of-machine-learning-models
History of Neural Networks (NN)
❑ 1943: McCulloch and Pitts: first mathematical model of a neuron (a verification model)
❑ 1957: Rosenblatt: the perceptron model
❑ 1959: Widrow and Hoff developed MADALINE, the first NN to be applied to a real-world problem
➢ Progress on NN research then largely halted until 1981
❑ 1982: Hopfield: associative memory - recurrent NN (the RNNs)
❑ 1986: Rumelhart: backpropagation and the era of the multilayer perceptron (MLP)
❑ 1990s: rise of the support vector machine (SVM)
❑ 1997: Hochreiter & Schmidhuber: an RNN, the long short-term memory (LSTM), was proposed
❑ 2006: Hinton et al.: NNs returned to the public's attention through deep belief nets (DBNs)
❑ 2016: boom of NNs (deep convolutional neural networks (CNNs): AlexNet, GoogLeNet, VGG, ResNet, etc.)
Image source: https://developpaper.com/take-you-into-the-past-life-and-this-life-of-neural-network/
Human Brain and Biological Neurons
❑ The human brain contains billions of neurons (~10 billion)
❑ Each neuron is a cell that uses biochemical reactions to receive, process and transmit information
❑ Neurons are connected to each other through synapses (~10K per neuron)
[Figure: the human brain and the structure of a neuron and its synapses.]
Image sources: https://beautifulnow.is/discover/wellness/new-brain-flows-are-beautiful-now and https://www.getbodysmart.com/nervous-system/neuron-synapse-structure
Human Brain and Biological Neurons (cont’d)
❑ A neuron accepts (and combines) inputs through its dendrites from other neurons
❑ If a neuron's combined input is above a threshold, the neuron discharges a spike (electrical pulse)
that travels from the cell body, down the axon, to the next neuron(s)
❑ The strength of the signal that reaches the next neuron depends on factors such as the amount of
neurotransmitter available at the synapses
[Figure: neuron A connected to neuron B, showing the cell body, nucleus, dendrites, axon, and the synapses (neurotransmitters) between them.]
https://natureofcode.com/book/chapter-10-neural-networks/
Modeling of a Biological Neuron
❑ A mathematical model of the neuron (called the perceptron) has been introduced in an effort to
mimic our understanding of the functioning of the brain.
➢ Dendrites: receive input from many other neurons.
➢ Cell body: changes its internal state (activation) based on the current input.
➢ Axon: sends one output signal to many other neurons, possibly including its own input neurons
(recurrent network).
Artificial Neuron
❑ An artificial neuron is an imitation of a biological neuron
➢ Dendrites: Inputs
➢ Cell body: Processor
➢ Synapse: Link
➢ Axon: Output
[Figure: inputs feeding a processor that produces an output.]
Technically, artificial neurons are referred to as units or nodes.
Artificial Neuron (cont’d)
❑ A neuron has multiple inputs (x_1, x_2, ..., x_m), each of which has a different strength, i.e., a weight w_i
❑ Activation: the combined input must be above a certain threshold for the neuron to produce an output y
[Figure: inputs x_1 ... x_m with weights w_1 ... w_m feeding a summing unit followed by an activation unit that produces the output y.]
The operations done by a neuron are:
1) Multiply the inputs by the weights,
2) Add them up,
3) Check the sum against the activation and get y.
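These three operations translate directly into a few lines of code. Below is a minimal sketch (in Python with NumPy; the slides do not prescribe a language) of a single neuron with a hard-limit activation; the input values, weights and threshold are illustrative only:

```python
import numpy as np

def step(u, threshold=0.0):
    """Hard-limit activation: fire 1 if the combined input reaches the threshold."""
    return 1.0 if u >= threshold else 0.0

def neuron_output(x, w, threshold=0.0):
    """1) multiply inputs by weights, 2) add them up, 3) check the sum against the activation."""
    u = np.dot(w, x)           # combined (weighted) input
    return step(u, threshold)  # output y

# Example: two inputs with different strengths (weights)
x = np.array([0.6, 0.9])
w = np.array([0.4, 0.7])
print(neuron_output(x, w, threshold=0.5))  # 0.4*0.6 + 0.7*0.9 = 0.87 >= 0.5 -> 1.0
```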
Artificial Neuron (cont’d)
❑ The neuron first computes the combined (weighted) input

u_k = Σ_{i=1}^{m} w_i x_i

❑ The output y = f(u_k) is therefore a function of
➢ the inputs x_i
➢ the weights w_i
Artificial Neuron (cont’d)
❑ f(·): how are the combined x's and w's used to produce y?

u_k = Σ_{i=1}^{m} w_i x_i ,   y = f(u_k)

[Figure: two example activation curves, with and without a shift along the input axis.]
❑ A bias value (b) is important for full control of the activation function (i.e., of the output): it shifts
the function along the input axis, which is needed for successful learning.
Artificial Neuron (cont’d)
❑ The bias can be treated as an extra input x_0 = 1 with weight w_0 = b, so that

u_k = Σ_{i=1}^{m} w_i x_i    becomes    u_k = b + Σ_{i=1}^{m} w_i x_i
Artificial Neural Network (ANN)
Basic elements of any ANN:
➢ A set of connecting links (synapses) from the inputs x_i, each of which is characterized by a weight w_i.
➢ A summing unit (adder).
➢ An activation function (nonlinearity).
➢ A bias b.
[Figure: inputs x_1 ... x_m with weights w_1 ... w_m and a bias b feeding a summing unit Σ and an activation function f that produces the output y.]
ANN (cont’d)
❑ If the sum exceeds a certain threshold, the ANN (or the perceptron) fires an output value that is
transmitted to the next unit(s)
❑ An ANN uses a nonlinear transfer function

Why do we need nonlinearity?

y = f( b + Σ_i w_i x_i ) = f( b + WᵀX )

Without a nonlinear f, y is linear and unbounded:
➢ NOT realistic
➢ Can NOT be generalized
➢ LESS power to solve complex nonlinear problems
ANN Transfer Functions

Linear:
y_k = u_k

Hard Limit:
y_k = 1 if u_k ≥ 0 ;  0 if u_k < 0

Symmetric Hard Limit:
y_k = 1 if u_k ≥ 0 ;  −1 if u_k < 0

Saturating Linear:
y_k = 1 if u_k > 1 ;  u_k if 0 ≤ u_k ≤ 1 ;  0 if u_k < 0

Symmetric Saturating Linear:
y_k = 1 if u_k > 1 ;  u_k if −1 ≤ u_k ≤ 1 ;  −1 if u_k < −1

Log Sigmoid:
y_k = 1 / (1 + e^(−u_k))
Artificial Neuron: Transfer Functions (cont’d)

Hyperbolic Tangent Sigmoid:
y_k = (e^(u_k) − e^(−u_k)) / (e^(u_k) + e^(−u_k))

Rectified Linear Unit (ReLU):
y_k = max(0, u_k)

Leaky ReLU:
y_k = max(ε·u_k, u_k) ,   with ε ≪ 1

Exponential Linear Unit (ELU):
y_k = u_k if u_k ≥ 0 ;  α(e^(u_k) − 1) if u_k < 0
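As an illustration, the transfer functions above can be written as one-line NumPy expressions. This is a sketch, not part of the lecture; the parameter names eps and alpha are stand-ins for ε and α:

```python
import numpy as np

def linear(u):                return u
def hard_limit(u):            return np.where(u >= 0, 1.0, 0.0)
def sym_hard_limit(u):        return np.where(u >= 0, 1.0, -1.0)
def saturating_linear(u):     return np.clip(u, 0.0, 1.0)
def sym_saturating_linear(u): return np.clip(u, -1.0, 1.0)
def log_sigmoid(u):           return 1.0 / (1.0 + np.exp(-u))
def tanh_sigmoid(u):          return np.tanh(u)   # equals (e^u - e^-u) / (e^u + e^-u)
def relu(u):                  return np.maximum(0.0, u)
def leaky_relu(u, eps=0.01):  return np.maximum(eps * u, u)
def elu(u, alpha=1.0):        return np.where(u >= 0, u, alpha * (np.exp(u) - 1.0))

u = np.linspace(-3, 3, 7)
print(relu(u))
print(log_sigmoid(u))
```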
Artificial Neural Network (ANN)
❑ An artificial neural network (ANN) is a massively parallel distributed processor made up
of simple processing units (neurons).
❑ An ANN is capable of solving problems that linear computing cannot resolve.
❑ ANNs are adaptive systems, i.e.,
parameters can be changed through a
learning process (training) to suit the
underlying problem.

❑ ANNs can be used in a wide variety of classification tasks, e.g., character


recognition, speech recognition, fraud detection, medical diagnosis.
❑ “Neural networks are the second-best way of doing just about anything.” (John Denker, AT&T Bell Laboratories)
Learning Process
❑ Learning is the process by which the parameters of an ANN, i.e., the weights w, are adapted through a
process of stimulation by the environment in which the network is embedded.

Learning ≡ Training
➢ Selection of the network topology
➢ Adaptation of the weight values
➢ Learning by trial and error (experience!)

❑ Every data sample for ANN training consists of an input vector X(n) and the corresponding (desired
or target) output d
❑ A batch is a group of input samples with their desired outputs (e.g., the first few rows of the table below)

Sample number n | Features x1  x2  x3 | Target output d
 1    | 10.33  56  0.56 | 0.8
 2    |  8.97  48  0.61 | 0.1
 3    | 11.01  49  0.49 | 0.3
 4    |  9.32  53  0.89 | 0.7
 5    | 10.51  50  0.71 | 0.4
 6    | 12.10  59  0.90 | 0.8
 ...  |  ...            | ...
 1996 |  7.99  61  0.59 | 0.9
 1997 | 11.36  52  0.63 | 0.5
 1998 | 12.09  48  0.78 | 0.2
 1999 | 10.81  55  0.87 | 0.7
 2000 | 13.00  53  0.91 | 0.6
Learning Process (cont’d)
[Figure: a neuron with m = 3 inputs computing u = b + Σ_{i=1}^{3} w_i x_i and y = f(u); the output y is
compared with the desired output d(n), and the difference is the error signal e(n).]

The weights w are updated based on e(n):
Weight adjustment = function(error, input)
Learning Process (cont’d)

new input sample(s) → output → update weights

Weight adjustment = function(error, input)

General rule for neuron learning:
w_new = w_old + η · e · x ,   where η is the learning constant (the learning rate)
Learning Process: Summary
❑ Learning is an iterative operation through which the network parameters (weights) are updated so as
to reduce the difference (error) between the network output and the desired (target) output

Set initial values of the weights (e.g., randomly)
Do
    Compute the output for a given input X(n)
    Evaluate the output by comparing y(n) with d(n)
    Adjust the weights
Loop until a criterion is met

Criterion
➢ A certain number of iterations
➢ An error threshold
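The loop above can be made runnable. The sketch below assumes a single linear neuron trained with the rule w_new = w_old + η·e·x from the previous slide and uses the two stopping criteria listed here; all parameter values are illustrative:

```python
import numpy as np

def train_neuron(X, d, eta=0.1, max_epochs=100, error_threshold=1e-3):
    """Iteratively adjust the weights to reduce the error between output and target."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-1, 1, size=X.shape[1])   # set initial weights randomly
    b = 0.0
    for epoch in range(max_epochs):           # criterion 1: number of iterations
        total_error = 0.0
        for x_n, d_n in zip(X, d):
            y_n = b + np.dot(w, x_n)           # compute the output for input X(n) (linear activation)
            e_n = d_n - y_n                    # evaluate: compare y(n) with d(n)
            w  += eta * e_n * x_n              # adjust the weights
            b  += eta * e_n
            total_error += 0.5 * e_n ** 2
        if total_error < error_threshold:      # criterion 2: error threshold
            break
    return w, b

# Illustrative usage on random data
X = np.random.rand(20, 3)
d = np.random.rand(20)
print(train_neuron(X, d))
```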
Learning Process: Cost Function
❑ Our objective is to reduce the difference between the actual and target outputs (i.e., the error)
❑ This can be achieved by minimizing a function of the error (the error energy)
➢ This is called the cost function.
➢ An example is the squared error:

E(n) = (1/2) e²(n) = (1/2) (d(n) − y(n))² ,   with  e(n) = d(n) − y(n)

❑ This learning is called error-correction learning, the delta rule, or the Widrow-Hoff rule:

Δw_kj(n) = η · e_k(n) · x_j(n)
w_kj(n+1) = w_kj(n) + Δw_kj(n)

where n is the current sample, k is the index of the current neuron, and j = 1, ..., m.

❑ The adjustment made to the weight of an input connection of a neuron is proportional to the product of
the error signal and the input value of the connection in question.
Learning Process: Epoch

A training cycle in which all the training samples have been used (presented to the network) once is called an epoch.
Learning Process: Example

n |  x1   x2   x3 |  d
1 |   1    1  0.5 | 0.7
2 |  -1  0.7 -0.5 | 0.2
3 | 0.3  0.3 -0.3 | 0.3

Assume
• the initial weights are 0.5, -0.3, 0.8,
• b = 0,
• η = 0.1, and
• a linear activation function.
Learning Process Example: Solution
[Figure: a neuron with three inputs x1, x2, x3, weights w1, w2, w3, bias b = 0, a summing unit Σ and an
activation f producing the output y, applied to the three training samples of the table above.]
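As a sketch of what the in-class solution starts with, the snippet below reproduces the first weight update for sample n = 1 under the stated assumptions (w = [0.5, −0.3, 0.8], b = 0, η = 0.1, linear activation, delta rule):

```python
import numpy as np

w   = np.array([0.5, -0.3, 0.8])   # initial weights
b   = 0.0
eta = 0.1

x1, d1 = np.array([1.0, 1.0, 0.5]), 0.7   # first training sample

u = b + np.dot(w, x1)        # 0.5*1 - 0.3*1 + 0.8*0.5 = 0.6
y = u                        # linear activation: y = u
e = d1 - y                   # 0.7 - 0.6 = 0.1
w = w + eta * e * x1         # delta rule -> [0.51, -0.29, 0.805]
print(u, e, w)
```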
ANN Examples
❑ A one-layer feedforward neural network is called the perceptron
❑ It can solve linear functions, e.g., AND, OR, NOT

AND function:
x1 x2 | y
 0  0 | 0
 1  0 | 0
 0  1 | 0
 1  1 | 1

y = f( b + Σ_{i=1}^{n} w_i x_i )

With b = −1.5 and w1 = w2 = 1:
y = step( −1.5 + 1·x1 + 1·x2 )

[Figure: the four input points in the (x1, x2) plane; a straight decision boundary separates (1,1) from the other three points.]
ANN Examples (cont’d)
❑ A one-layer feedforward neural network is called the perceptron
❑ It can solve linear functions, e.g., AND, OR, NOT

OR function:
x1 x2 | y
 0  0 | 0
 1  0 | 1
 0  1 | 1
 1  1 | 1

With b = −0.5 and w1 = w2 = 1:
y = step( −0.5 + 1·x1 + 1·x2 )

[Figure: the four input points in the (x1, x2) plane; a straight decision boundary separates (0,0) from the other three points.]
ANN Examples (cont’d)

OR:  y = step( −0.5 + 1·x1 + 1·x2 )        AND:  y = step( −1.5 + 1·x1 + 1·x2 )

[Figure: the OR and AND truth tables and their linear decision boundaries in the (x1, x2) plane.]

❑ Solving linearly means the decision boundary is linear (a straight line in 2D and a plane in 3D)
❑ The bias term (b) alters the position, but not the orientation, of the decision boundary
❑ The weights (w1, w2, ..., wm) determine the gradient (orientation) of the boundary
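A small sketch can verify that the two weight/bias choices above realize AND and OR with a step activation (b = −0.5 is used for OR so that the input (0, 0) falls on the 0 side of the boundary):

```python
def step(u):
    return 1 if u >= 0 else 0

def perceptron(x1, x2, b, w1=1.0, w2=1.0):
    return step(b + w1 * x1 + w2 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        y_and = perceptron(x1, x2, b=-1.5)   # fires only for (1, 1)
        y_or  = perceptron(x1, x2, b=-0.5)   # fires whenever at least one input is 1
        print(x1, x2, "AND:", y_and, "OR:", y_or)
```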
ANN Examples: XOR function

XOR function:
x1 x2 | y
 0  0 | 0
 1  0 | 1
 0  1 | 1
 1  1 | 0

[Figure: the four input points in the (x1, x2) plane with the AND and OR decision lines; no single straight line separates the two XOR classes.]

❑ The XOR function is said to be not linearly separable
❑ If one neuron defines one line through the input space, what do we need in order to have two lines?
❑ We need two neurons working in parallel (next to each other rather than in different layers)
❑ We would need a multilayer neural network to model (i.e., to separate the two classes of) the XOR function
Multilayer Perceptron (MLP)
❑ More layers are added between the input and the output layers
❑ The layers are fully connected
❑ There can be multiple neurons at the output layer:
  y_j , j ∈ C , where C is the set of all neurons at the output layer
❑ Error backpropagation is used for learning:
  e(n) = d(n) − y(n)
❑ Weight adjustments are applied so as to minimize e(n) in a statistical sense

[Figure: an MLP with an input layer, Hidden Layer 1, Hidden Layer 2, and an output layer producing y1 and y2.]
Gradient Descent
❑ The delta rule is a gradient descent learning rule for updating the weights of the inputs to an artificial
neuron in a single-layer NN:

w_kj(n+1) = w_kj(n) + Δw_kj(n)

❑ The goal of gradient descent is to iteratively take steps towards lower regions (minima) of the loss function.

Image sources: https://datascience-enthusiast.com/figures/cost.jpg and https://medium.com/@divakar_239/stochastic-vs-batch-gradient-descent-8820568eada1
Gradient Descent (cont’d)
For a linear activation function, the weight adjustment for a neuron k is given by

Δw_kj(n) = η · e_k(n) · x_j(n) ,   j = 1, 2, ..., m

For any activation function f:

Δw_kj(n) = η · e_k(n) · f′(u(n)) · x_j(n)

where u(n) = b + Σ_{j=1}^{m} w_j x_j is the combined input of the neuron.
Gradient Descent (cont’d)
Gradient descent minimizes the cost by stepping against the gradient:

Δw_kj = −η · ∂E/∂w_j

By applying the chain rule:

∂E/∂w_j = (∂E/∂e) · (∂e/∂y) · (∂y/∂u) · (∂u/∂w_j)

E(n) = (1/2) e²(n)           ⇒  ∂E/∂e = e
e(n) = d(n) − y(n)           ⇒  ∂e/∂y = −1
y(n) = f(u(n))               ⇒  ∂y/∂u = f′(u(n))
u(n) = Σ_{j=1}^{m} w_j x_j   ⇒  ∂u/∂w_j = x_j

Therefore:

Δw_kj = −η · e · (−1) · f′(u(n)) · x_j = η · e · f′(u(n)) · x_j

Image source: https://medium.com/@divakar_239/stochastic-vs-batch-gradient-descent-8820568eada1
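A sketch of the resulting update Δw_j = η·e·f′(u)·x_j for a single neuron; a sigmoid activation is assumed here purely for illustration, so that f′(u) = f(u)(1 − f(u)):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def gradient_step(w, b, x, d, eta=0.1):
    """One gradient-descent step on E = 0.5*e^2 using the chain rule."""
    u = b + np.dot(w, x)
    y = sigmoid(u)
    e = d - y
    f_prime = y * (1.0 - y)           # dy/du for the sigmoid
    w = w + eta * e * f_prime * x     # dE/dw_j = -e * f'(u) * x_j, so step against the gradient
    b = b + eta * e * f_prime
    return w, b

w, b = np.zeros(3), 0.0
w, b = gradient_step(w, b, x=np.array([1.0, 1.0, 0.5]), d=0.7)
print(w, b)
```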
Backpropagation
❑ Backpropagation is a supervised algorithm that is a generalization of the least mean square (LMS)
algorithm
❑ It is based on a gradient search technique to minimize the cost function, i.e., the squared error
between the network output and the target output
❑ It is a recursive application of the chain rule to compute the gradients
➢ Forward pass: propagate the activations from input to output ≡ compute the outputs y_j
➢ Backward pass: propagate the error from the output back to the hidden layers ≡ adjust all the weights

Please see the following for all the details of the mathematical derivation:
https://www.jeremyjordan.me/neural-networks-training/
Backpropagation (cont’d)
❑ The weights of each output neuron can be determined directly using the delta learning rule:

Δw_ki = η · e · f′(·) · z_i ,   with the local gradient (error signal)  δ_k = e · f′(·)

where z_i is the output of the i-th hidden neuron feeding the output neuron through the weight w_ki.
Backpropagation (cont’d)
❑ For an output neuron, the delta rule above applies directly.
❑ If the neuron is a hidden node, its local gradient combines its own derivative with the gradients of the
next layer:

δ_j = f′(·) · Σ_{k=1}^{K} δ_k · w_k        [local gradient] × [upstream gradient]

where K is the set of all nodes in the next layer connected to the current neuron.

Please see the following for all the details of the mathematical derivation:
https://www.jeremyjordan.me/neural-networks-training/
Backpropagation Example
❑ Assume one input layer, one hidden layer, and one output neuron
  x_j : the j-th input
  z_i : the output of the i-th hidden neuron
  y_k : the output of the k-th output neuron
  β_ij : the weight from input node x_j to hidden node z_i
  w_ki : the weight from hidden node z_i to output neuron y_k

[Figure: inputs x_1, x_2 connected through weights β_11, β_12, ... to hidden neurons z_1, z_2, which are
connected through weights w_11, w_12 to the output neuron y_k.]

❑ The weights of the output neuron can be adjusted using the delta learning rule and the error signal:

δ_yk = e_k · f′(u_k) = (d_k − y_k) · f′(u_k) ,   with  u_k = Σ_{i=1}^{I} w_ki z_i

❑ Update the weights as follows:

w_ki(n+1) = w_ki(n) + η · δ_yk · z_i
Backpropagation (cont’d)
❑ The weights of the i-th hidden neuron can be adjusted using its own error signal:

δ_zi = f′(u_i) · Σ_{k=1}^{K} δ_yk · w_ki ,   with  u_i = Σ_{j=1}^{J} β_ij x_j

❑ Using these error signals, the weights of the i-th hidden neuron can be updated:

β_ij(n+1) = β_ij(n) + η · δ_zi · x_j

❑ For a sigmoid activation with zero bias:

f′(u_k) = y_k (1 − y_k)
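Putting the output-layer and hidden-layer rules together, the sketch below performs one forward and one backward pass for the example network (one hidden layer, one output neuron, sigmoid activations, zero biases as on the slides); the network size and sample values are illustrative:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def backprop_step(x, d, beta, w, eta=0.5):
    """One forward + backward pass: beta is the (I x J) input->hidden matrix, w the (I,) hidden->output vector."""
    # forward pass: propagate activations from input to output
    z = sigmoid(beta @ x)             # hidden outputs z_i
    y = sigmoid(np.dot(w, z))         # output y_k

    # backward pass: propagate the error back to the hidden layer
    delta_y = (d - y) * y * (1 - y)           # output error signal: e_k * f'(u_k)
    delta_z = z * (1 - z) * delta_y * w       # hidden error signals: f'(u_i) * sum_k delta_yk * w_ki

    # weight updates (delta rule with the local gradients)
    w    = w    + eta * delta_y * z           # w_ki(n+1) = w_ki(n) + eta * delta_yk * z_i
    beta = beta + eta * np.outer(delta_z, x)  # beta_ij(n+1) = beta_ij(n) + eta * delta_zi * x_j
    return beta, w, y

rng = np.random.default_rng(0)
beta = rng.uniform(-1, 1, size=(2, 2))   # 2 hidden neurons, 2 inputs
w    = rng.uniform(-1, 1, size=2)
beta, w, y = backprop_step(x=np.array([1.0, 0.0]), d=1.0, beta=beta, w=w)
print(y)
```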
Types of Neural Networks

❑ Feedforward neural network
➢ Signals travel one way only (from input to output)
➢ Typically trained with a teacher (supervised learning)

❑ Recurrent neural network (RNN)
➢ The output from the previous step is fed back as an input at the current step

❑ Learning without a teacher (unsupervised learning)
➢ Example: self-organizing maps (SOM)
ANN Design and Issues
❑ Number of neurons and hidden layers
❑ Initial weights (small random values ∈ [−1, 1])
❑ Choice of the transfer function
❑ Learning rate
❑ Weight adjustment
❑ Data representation, pre-processing, and splitting
Learning Rate
❑ The learning rate, η, is a configurable (hyper)parameter used in ANN training
❑ η controls how quickly the model is adapted to the problem
❑ Its practical value is 0 < η < 1
➢ Smaller η → smaller changes to w → more training epochs
  • Training can get stuck in a local minimum.
➢ Larger η → larger changes to w → fewer training epochs
  • May result in divergence.

Graph source: https://cs231n.github.io/neural-networks-3/
Learning Rate (cont’d)
[Figure: loss curves for too-small, good, and too-large learning rates.]
Graph sources: https://towardsdatascience.com/the-learning-rate-finder-6618dfcb2025 and
https://srdas.github.io/DLBook/GradientDescentTechniques.html

One technique that can help the network out of local minima is the use of a momentum term:

Δw_kj(n) = η · δ_k(n) · x_j(n) + α · Δw_kj(n−1)

where α is the momentum factor and Δw_kj(n−1) is the weight increment from the previous iteration.
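A sketch of the momentum-augmented update, which keeps the previous increment Δw_kj(n−1) between calls; the value α = 0.9 is a common illustrative choice, not one given in the slides:

```python
import numpy as np

def momentum_update(w, delta_k, x, prev_dw, eta=0.1, alpha=0.9):
    """dw(n) = eta * delta_k(n) * x(n) + alpha * dw(n-1)."""
    dw = eta * delta_k * x + alpha * prev_dw   # current increment plus a fraction of the previous one
    return w + dw, dw                          # updated weights and the increment to store for next time

w       = np.zeros(3)
prev_dw = np.zeros(3)
x       = np.array([1.0, 1.0, 0.5])
w, prev_dw = momentum_update(w, delta_k=0.1, x=x, prev_dw=prev_dw)
print(w, prev_dw)
```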
Learning Rate (cont’d)
[Figure: analysis of gradient descent convergence for different learning rates.]
Graph source: https://towardsai.net/p/machine-learning/analysis-of-learning-rate-in-gradient-descent-algorithm-using-python
Overfitting
[Figure: three decision boundaries in the (x1, x2) plane illustrating Underfitting, a Good Model, and Overfitting.]

❑ A good model correctly classifies test patterns it has never seen (learned) before when tested on a
real-world problem
❑ An overfitted model can NOT be generalized

Solutions
➢ Early stopping
➢ Regularization (Dropout)

Image source: https://www.pinterest.com/pin/462604192955327068/
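Since the assignments later use Keras, one concrete way to apply the first remedy is Keras's built-in EarlyStopping callback, sketched below (monitoring the validation loss is an assumption; dropout would be added as a keras.layers.Dropout layer inside the model):

```python
from tensorflow import keras

# Stop training once the validation loss has not improved for 10 consecutive epochs,
# and restore the best weights seen so far (early stopping).
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss",
                                           patience=10,
                                           restore_best_weights=True)

# Illustrative usage (model, X_train and y_train are assumed to exist):
# model.fit(X_train, y_train, validation_split=0.2, epochs=500, callbacks=[early_stop])
```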


Vanishing Gradient
❑ Deeper neural networks (i.e., with multiple hidden layers) are difficult to train (the difficulty increases
geometrically with depth).

δ_j = f′(·) · Σ_{k=1}^{K} δ_k · w_k        [local gradient] × [upstream gradient]

➢ The gradients get smaller and smaller while backpropagating the error.
➢ After a few layers of propagation, the gradient disappears (vanishes).
➢ The parameters in the deep layers then remain almost static.

❑ Solutions
➢ Modify the activation function
➢ Use batch normalization (a sort of regularization)
ANN Advantages and Disadvantages
❑ Advantages
➢ Very simple principles
➢ Highly parallel: information processing is much more like the brain than a serial
computer
➢ Adapt to unknown situations, can model complex functions
➢ Ease of use, learns by example, and very little user domain‐specific expertise needed.

❑ Disadvantages
➢ Very complex behaviors
➢ Not exact.
➢ Needs training.

ANN Terminology
❑ Neuron, unit (node)
❑ Weight and bias
❑ Transfer function (linear, sigmoid, ReLU, etc.)
❑ Loss function (mean squared error, cross entropy, etc.)
❑ Learning rate, epoch, batch
❑ Backpropagation (error propagation)
❑ Optimization (gradient descent (GD), stochastic GD, Adam, etc.)
❑ Overfitting
❑ Dropout, batch normalization
Each ANN aspect is considered a standalone research area.
Validation Techniques
Data Splitting

❑ Training/Testing
  Total # of samples → Training (70% to 75%) | Testing (25% to 30%)

❑ Training/Validation/Testing
  Total # of samples → Training (60% or 70%) | Validation (20% or 15%) | Testing (20% or 15%)
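A sketch of the two splitting schemes using scikit-learn's train_test_split; scikit-learn and the random data are assumptions for illustration, not part of the lecture:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(2000, 3)   # illustrative feature matrix (2000 samples, 3 features)
y = np.random.rand(2000)      # illustrative targets

# Training/Testing split: 75% / 25%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Training/Validation/Testing split: 60% / 20% / 20% (split the held-out 40% in half)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.40, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=0)
```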
Validation Techniques
Random Sample Selection

[Figure: from the total number of samples, a different random subset of test samples is drawn for each of
the experiments #1, #2, ..., #k.]

➢ k is the number of experiments
➢ E_i is the average error of experiment i, computed using only its testing data
Validation Techniques
Cross Validation
Divide the data into mutually exclusive and equal-sized subsets (folds); the number of folds is called K.

Example with K = 4:  Part #1 | Part #2 | Part #3 | Part #4

➢ K is the number of folds
➢ E_i is the average error for fold i
Validation Techniques
Cross Validation (cont’d): each part is used exactly once for testing

Fold 1:  Testing  | Training | Training | Training
Fold 2:  Training | Testing  | Training | Training
Fold 3:  Training | Training | Testing  | Training
Fold 4:  Training | Training | Training | Testing
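A sketch of the K = 4 rotation above using scikit-learn's KFold (an assumed library choice); each part is used exactly once for testing, and the per-fold errors E_i are averaged. A mean predictor is used here as a stand-in for a trained model:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(100, 3)    # illustrative data
y = np.random.rand(100)

fold_errors = []
for train_idx, test_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    # Stand-in "model": predict the mean of the training targets.
    y_pred = y[train_idx].mean()
    # E_i: mean squared error on the held-out part (the testing fold).
    fold_errors.append(np.mean((y[test_idx] - y_pred) ** 2))

print("average error over the K folds:", np.mean(fold_errors))
```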
Assignments
❑ Assignment 1: Design your own simple ANN (one perceptron with one input layer and one output
neuron). Use the data points listed in the table below as your training data. Assume the activation
function is a sigmoid and that there is no bias for simplicity (b = 0). Test your design using different
iteration numbers.

X1 X2 X3 | d
 0  0  1 | 0
 0  1  1 | 1
 1  0  1 | 1
 1  1  1 | 0

❑ Assignment 2: Modify the above-designed code to implement a multi-layer perceptron, MLP (an
ANN with one input layer, one hidden layer and one output layer), for the same data points above.
Assume a sigmoid activation function and no bias for simplicity (b = 0). Test your approach using
different iteration numbers and different numbers of nodes for the hidden layer (e.g., 4, 8, and 16).
Assignments (cont’d)
❑ Assignment 3: Use the Keras library (tensorflow.keras) to build different ANNs using different
numbers of hidden layers (shallow: one hidden layer plus the output layer; deeper: two hidden layers
with 12 and 8 nodes respectively; deeper still: three hidden layers with 32, 16 and 8 nodes respectively).
Use the provided diabetic data sets (here) to train and test your design. Use the ReLU activation for the
hidden layers and the sigmoid activation for the output neuron, with loss='binary_crossentropy',
optimizer='adam', metrics=['accuracy'], and epochs = 150.
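As a starting point only (not a complete solution), a sketch of the shallow variant with the settings listed above; the hidden-layer size of 12 and the input dimension of 8 are assumptions to be adapted to the provided data set:

```python
from tensorflow import keras

# Shallow variant: one hidden layer (ReLU) plus one sigmoid output neuron.
model = keras.Sequential([
    keras.layers.Dense(12, activation="relu", input_shape=(8,)),  # hidden-layer size and input
                                                                  # dimension are assumptions
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Illustrative usage once the diabetic data set has been loaded into X_train, y_train:
# history = model.fit(X_train, y_train, epochs=150)
```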

❑ Assignment 4: Redo Assignment 3 using 80% of the data for training and 20% of the data for testing.
Also, plot the training accuracy and loss curves for your designed networks.
Thank You
&
Questions
