Lecture #2
Machine Learning | Deep Learning
Applies statistical algorithms to learn the hidden patterns and relationships in the dataset. | Uses artificial neural network architectures to learn the hidden patterns and relationships in the dataset.
Takes less time to train the model. | Takes more time to train the model.
A model is created from relevant features that are manually extracted from images to detect an object in the image. | Relevant features are automatically extracted from images; it is an end-to-end learning process.
Single Layer Perceptron (SLP)
Q: What is SLP?
▪ An SLP is a supervised learning algorithm used as a binary classifier.
▪ SLP is a feed-forward network without hidden layers that uses a
threshold or activation transfer function.
▪ The input typically consists of a feature vector x multiplied by a weight vector w and then summed with a bias term b.
𝒚 = 𝒘𝒙 + 𝒃
Single Layer Perceptron (SLP) Cont.
A single-layer perceptron (SLP) is a linear classifier: it separates input data into two categories using a straight line (a linear decision boundary).
For example: the OR and AND gates.
Example #1
Q: Calculate the output of a single-layer network with the following
parameters:
Inputs: x1 = 0.5, x2 = −1; Weights: w1 = 2, w2 = −3; Bias: b = 0.1.
Activation function: Sigmoid σ(y) = 1 / (1 + e^−y)
Solution:
1. Compute the weighted sum (y):
y = w1·x1 + w2·x2 + b = (2 × 0.5) + (−3 × −1) + 0.1 = 1 + 3 + 0.1 = 4.1
2. Apply the sigmoid activation function:
σ(y) = 1 / (1 + e^−y), so σ(4.1) = 1 / (1 + e^−4.1) ≈ 0.983 ≈ 1
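The calculation in Example #1 can be reproduced with a few lines of NumPy. This is a minimal sketch added for illustration (not part of the original slides); the variable names are arbitrary:

```python
import numpy as np

# Inputs, weights and bias from Example #1
x = np.array([0.5, -1.0])
w = np.array([2.0, -3.0])
b = 0.1

def sigmoid(y):
    """Sigmoid activation: squashes y into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

y = np.dot(w, x) + b    # weighted sum: (2*0.5) + (-3*-1) + 0.1 = 4.1
print(y, sigmoid(y))    # ~0.983, i.e. class 1 with a 0.5 threshold
```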
SLP Implementation Model
An SLP is a feed-forward network without hidden layers that uses a
threshold (unit step) or activation transfer function.
Linear Model:
f(x) = wᵀx + b
ŷ = g(f(x)) = g(wᵀx + b)
SLP Implementation Model Cont.
f(x) = wᵀx + b
ŷ = g(f(x)) = g(wᵀx + b)
Perceptron Update Rule
For each training sample xᵢ:
w = w + ∆w,  b = b + ∆b
∆w = α · (yᵢ − ŷᵢ) · xᵢ
∆b = α · (yᵢ − ŷᵢ)
Learning rate α in (0, 1]
SLP Implementation Model Cont.
Update Rule Explained
y | ŷ | y − ŷ
1 | 1 | 0
1 | 0 | +1
0 | 0 | 0
0 | 1 | −1

Weights are pushed towards the positive or negative target class in case of misclassification.
Where: y is the true value (target) and ŷ is the predicted value.
Steps
Training (Learn weights):
Initialize weights
SLP Implementation Model Cont.
Step 1: Initialize weights
Step 2: For each sample:
- Calculate ŷ = g(f(x)) = g(wᵀx + b)
- Test yᵢ − ŷᵢ; if it is not zero:
- Apply the update rule: ∆w = α · (yᵢ − ŷᵢ) · xᵢ,  ∆b = α · (yᵢ − ŷᵢ)
Step 3: Prediction:
- Calculate ŷ = g(f(x)) = g(wᵀx + b)
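The training and prediction steps above can be sketched in Python as a small NumPy class. This is an illustrative implementation added here (not from the slides); the class and parameter names are arbitrary:

```python
import numpy as np

class Perceptron:
    """Single-layer perceptron with a unit-step (threshold) activation."""

    def __init__(self, learning_rate=0.1, n_epochs=10):
        self.alpha = learning_rate   # learning rate alpha in (0, 1]
        self.n_epochs = n_epochs
        self.w = None
        self.b = 0.0

    def _step(self, z):
        # Unit-step activation g(.)
        return np.where(z >= 0, 1, 0)

    def fit(self, X, y):
        # Step 1: initialize weights
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        # Step 2: for each sample, predict and apply the update rule
        for _ in range(self.n_epochs):
            for x_i, y_i in zip(X, y):
                y_hat = self._step(np.dot(self.w, x_i) + self.b)
                error = y_i - y_hat          # 0 if correct, +1 or -1 if misclassified
                self.w += self.alpha * error * x_i
                self.b += self.alpha * error
        return self

    def predict(self, X):
        # Step 3: prediction with the learned weights
        return self._step(np.dot(X, self.w) + self.b)
```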
SLP Implementation Model Cont.
Implementation of the AND function using the perceptron model,
where t is the true (target) value.
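Using the Perceptron sketch above, learning the AND function might look as follows (an illustrative example; the training data is just the AND truth table, with t as the target column):

```python
import numpy as np

# AND truth table: t is the true (target) value
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])

model = Perceptron(learning_rate=0.1, n_epochs=10).fit(X, t)
print(model.predict(X))   # expected: [0 0 0 1]
```

Because AND is linearly separable, the update rule converges to a separating line within a few epochs.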
Multi-Layer Perceptron
A multi-layer perceptron (MLP) is a nonlinear, fully connected feed-forward network with one or more hidden layers that uses a threshold or activation transfer function.
For example: an XOR function:
XOR: y = x1·x̄2 + x̄1·x2

x1 | x2 | y
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 0

XOR is not linearly separable, so a nonlinear decision boundary is required.
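As a quick check of the XOR expression above, a small Python sketch (illustrative, not from the slides) evaluates y = x1·x̄2 + x̄1·x2 over all binary inputs:

```python
# Evaluate XOR: y = x1*NOT(x2) + NOT(x1)*x2 over the full truth table
for x1 in (0, 1):
    for x2 in (0, 1):
        y = x1 * (1 - x2) + (1 - x1) * x2
        print(x1, x2, y)   # prints the XOR truth table: 0, 1, 1, 0
```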
Key Components of Multi-Layer Perceptron (MLP)
Input Layer:
▪ Each neuron (or node) in this layer corresponds to an input feature. For
instance, if you have three input features, the input layer will have three
neurons.
Hidden Layers:
▪ An MLP can have any number of hidden layers, with each layer containing
any number of nodes. These layers process the information received from the
input layer.
Output Layer:
▪ The output layer generates the final prediction or result. If there are multiple
outputs, the output layer will have a corresponding number of neurons.
Fully connected:
▪ Means that every node in one layer connects to every node in the next layer.
Working of MLP
Step 1: Forward Propagation
▪ In forward propagation, the data flows from the input layer to the output layer,
passing through any hidden layers. Each neuron in the hidden layers processes the
input as follows:
1. Weighted Sum: The neuron computes the weighted sum of the inputs:
z = Σᵢ (wᵢ xᵢ) + b
Where:
xᵢ is the input feature
wᵢ is the corresponding weight
b is the bias term
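A minimal sketch of this weighted sum for one hidden layer (illustrative NumPy code added here; the sizes and values are hypothetical):

```python
import numpy as np

# One hidden layer processing a 3-feature input
x = np.array([0.2, 0.4, 0.6])        # input features x_i (hypothetical values)
W = np.array([[0.5, -0.3, 0.8],      # each row holds the weights w_i of one hidden neuron
              [0.1,  0.7, -0.2]])
b = np.array([0.1, -0.1])            # one bias term per hidden neuron

z = W @ x + b                        # z = sum_i(w_i * x_i) + b for each neuron
print(z)
```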
Working of MLP Cont.
2: Activation Function:
▪ The weighted sum z is passed through an activation function to introduce non-
linearity. Common activation functions include:
▪ Sigmoid: σ(z) = 1 / (1 + e^−z)
▪ ReLU (Rectified Linear Unit): f(z) = max(0, z)
▪ Tanh (Hyperbolic Tangent): tanh(z) = 2 / (1 + e^−2z) − 1
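The three activation functions listed above can be written directly from their formulas (an illustrative NumPy sketch, not part of the slides):

```python
import numpy as np

def sigmoid(z):
    # Sigmoid: squashes z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # ReLU: passes positive values, zeroes out negatives
    return np.maximum(0.0, z)

def tanh(z):
    # Tanh: 2 / (1 + exp(-2z)) - 1, squashes z into (-1, 1)
    return 2.0 / (1.0 + np.exp(-2.0 * z)) - 1.0

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), relu(z), tanh(z))
```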
Step 2: Loss Function:
▪ Once the network generates an output, the next step is to calculate the loss using a
loss function. In supervised learning, this compares the predicted output to the
actual label.
Working of MLP Cont.
Step 2: Loss Function:
▪ For a classification problem, the commonly used binary cross-entropy loss function is:
L = −(1/N) Σᵢ₌₁ᴺ [ yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ) ]
Where:
yᵢ is the actual label.
ŷᵢ is the predicted label.
N is the number of samples.
For regression problems, the mean squared error (MSE) is used:
MSE = (1/N) Σᵢ₌₁ᴺ (yᵢ − ŷᵢ)²
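Both loss functions can be written compactly in NumPy (an illustrative sketch added here; the clipping constant eps is a common practical safeguard, not part of the formula on the slide):

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    # L = -(1/N) * sum( y*log(y_hat) + (1-y)*log(1-y_hat) )
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def mse(y, y_hat):
    # MSE = (1/N) * sum( (y - y_hat)^2 )
    return np.mean((y - y_hat) ** 2)

y     = np.array([1, 0, 1, 1])
y_hat = np.array([0.9, 0.2, 0.8, 0.6])
print(binary_cross_entropy(y, y_hat), mse(y, y_hat))
```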
Working of MLP Cont.
Step 3: Backpropagation
▪ The goal of training an MLP is to minimize the loss function by adjusting the network’s
weights and biases. This is achieved through backpropagation:
1- Gradient Calculation: The gradients of the loss function with respect to each weight and
bias are calculated using the chain rule of calculus.
2- Error Propagation: The error is propagated back through the network, layer by layer.
3- Gradient Descent: The network updates the weights and biases by moving in the opposite
direction of the gradient to reduce the loss:
w = w − α · ∂L/∂w
Where: w is the weight, α is the learning rate, and ∂L/∂w is the gradient of the loss function w.r.t. the weight.
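A minimal sketch of the gradient-descent update on a single parameter (illustrative only; it minimizes the simple quadratic loss L(w) = (w − 3)² rather than a real network loss, so the gradient can be written by hand):

```python
# Gradient descent: w = w - alpha * dL/dw, here for L(w) = (w - 3)^2
alpha = 0.1          # learning rate
w = 0.0              # initial weight

for step in range(50):
    grad = 2 * (w - 3)        # dL/dw for the quadratic loss
    w = w - alpha * grad      # move against the gradient to reduce the loss
print(w)                      # converges towards 3.0
```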
Working of MLP Cont.
Step 4: Optimization
▪ MLPs rely on optimization algorithms to iteratively refine the weights and biases
during training. Popular optimization methods include:
▪ Stochastic Gradient Descent (SGD):
Updates the weights based on a single sample or a small batch of data: w = w − α · ∂L/∂w
Adam Optimizer:
An extension of SGD that incorporates momentum and adaptive learning rates for
more efficient training:
mₜ = β1 · mₜ₋₁ + (1 − β1) · gₜ
vₜ = β2 · vₜ₋₁ + (1 − β2) · gₜ²
Here, gₜ represents the gradient at time t, and β1, β2 are decay rates.
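A hedged sketch of one Adam-style update step (illustrative only; it adds the standard bias-correction terms, and real libraries such as TensorFlow or PyTorch provide tuned implementations):

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # First and second moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    # Bias correction for the early steps
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Parameter update with an adaptive per-parameter step size
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.array([0.5]), np.zeros(1), np.zeros(1)
grad = np.array([0.2])                   # hypothetical gradient g_t
w, m, v = adam_step(w, grad, m, v, t=1)
print(w)
```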
Advantages of Multi-Layer Perceptron
▪ Versatility: MLPs can be applied to a variety of problems, both classification and
regression.
▪ Non-linearity: Thanks to activation functions, MLPs can model complex, non-linear
relationships in data.
▪ Parallel Computation: With the help of GPUs, MLPs can be trained quickly by
taking advantage of parallel computing.
Disadvantages of Multi-Layer Perceptron
▪ Computationally Expensive: MLPs can be slow to train, especially on large datasets
with many layers.
▪ Prone to Overfitting: Without proper regularization techniques, MLPs can overfit
the training data, leading to poor generalization.
▪ Sensitivity to Data Scaling: MLPs require properly normalized or scaled data for
optimal performance.
Assignment #1: Implementing MLP using Python
Reference:
Multi-Layer Perceptron Learning in Tensorflow – GeeksforGeeks
https://www.geeksforgeeks.org/multi-layer-perceptron-learning-in-tensorflow/
https://colab.research.google.com/
▪ Requirement:
Build your first neural network using Python with Google Colab, TensorFlow (with Keras), or
any convenient Python IDE.
Step 1: Import Required Modules and Load Dataset
First, we import necessary libraries such as TensorFlow, NumPy, and Matplotlib for
visualizing the data. We also load the MNIST dataset.
Step 2: Load and Normalize Image Data
Step 3: Visualizing Data
Step 4: Building the Neural Network Model
Step 5: Compiling the Model
Assignment #1: Implementing MLP using Python
▪ Requirement:
Step 5: Compiling the Model
Once the model is defined, we compile it by specifying:
Optimizer: Adam, for efficient weight updates.
Loss Function: Sparse categorical crossentropy, which is suitable for multi-class
classification.
Metrics: Accuracy, to evaluate model performance.
Step 6: Training the Model
We train the model on the training data using 10 epochs and a batch size of 2000. We
also use 20% of the training data for validation to monitor the model’s performance
on unseen data during training.
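A compact sketch of the assignment pipeline in TensorFlow/Keras, following the settings stated above (Adam, sparse categorical cross-entropy, accuracy, 10 epochs, batch size 2000, 20% validation split); the hidden-layer sizes are illustrative choices, not prescribed by the assignment:

```python
import tensorflow as tf
import matplotlib.pyplot as plt

# Step 1-2: load MNIST and normalize pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Step 3: visualize one sample digit
plt.imshow(x_train[0], cmap='gray')
plt.title(f"Label: {y_train[0]}")
plt.show()

# Step 4: build the MLP (flatten 28x28 images, then fully connected layers)
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Step 5: compile with Adam, sparse categorical cross-entropy, accuracy metric
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Step 6: train for 10 epochs, batch size 2000, 20% of training data for validation
history = model.fit(x_train, y_train,
                    epochs=10, batch_size=2000,
                    validation_split=0.2)

# Evaluate on the held-out test set
model.evaluate(x_test, y_test)
```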