Lecture #2
Advanced Topics in CSE, 2nd Semester 2025, Prof. Dr. Nabil A. Ismail

The document provides an introduction to Deep Learning (DL) and its distinction from Machine Learning, highlighting the use of artificial neural networks (ANNs) for pattern recognition in complex data. It discusses the evolution of neural architectures, including the perceptron and multi-layer perceptron (MLP), detailing their structures, functionalities, and the training process through forward propagation and backpropagation. Additionally, it outlines the advantages and disadvantages of MLPs, along with an assignment to implement MLP using Python.


Introduction to Deep Learning

▪ Deep Learning (DL) has revolutionized how machines comprehend, learn from, and engage with complex data.
▪ DL simulates the neural networks of the human brain, enabling computers to autonomously uncover patterns and make informed decisions from large amounts of unstructured data.
▪ DL uses artificial neural networks (ANNs) to analyze and learn from complex data.
Difference between Machine Learning and Deep Learning
▪ Machine Learning applies statistical algorithms to learn the hidden patterns and relationships in the dataset; Deep Learning uses artificial neural network architectures to learn them.
▪ Machine Learning can work on a smaller dataset; Deep Learning requires a larger volume of data.
▪ Machine Learning is better for simpler, low-label tasks; Deep Learning is better for complex tasks such as image processing and natural language processing.
▪ Machine Learning takes less time to train a model; Deep Learning takes more time.
▪ In Machine Learning, a model is created from relevant features manually extracted from images to detect an object; in Deep Learning, relevant features are extracted automatically in an end-to-end learning process.
▪ Machine Learning models are less complex and their results are easy to interpret; Deep Learning models are more complex, work like a black box, and their results are not easy to interpret.
▪ Machine Learning can run on a CPU or needs relatively little computing power; Deep Learning requires a high-performance computer with a GPU.
Evolution of Neural Architectures
▪ The development of deep learning started with the perceptron, a single-layer neural network introduced in the 1950s.
Q: What is a Perceptron?
Answer:
▪ A perceptron is a type of neural network that performs binary classification: it maps input features to an output decision, classifying data into one of two categories, such as 0 or 1.
▪ A perceptron consists of a single layer of input nodes fully connected to a layer of output nodes.
Single Layer Perceptron (SLP)
Q: What is an SLP?
▪ An SLP is a supervised learning algorithm used as a binary classifier.
▪ An SLP is a feed-forward network without hidden layers that uses a threshold or activation transfer function.
▪ The input typically consists of features x multiplied by a set of weights w and then summed with a bias term b:
$y = wx + b$

Single Layer Perceptron (SLP) Cont.
A single-layer perceptron is a linear classifier: it separates input data into two categories using a straight line (more generally, a hyperplane).
For example: the OR and AND gates.

Example #1
Q: Calculate the output of a single-layer network with the following parameters:
Inputs: $x_1 = 0.5$, $x_2 = -1$; Weights: $w_1 = 2$, $w_2 = -3$; Bias: $b = 0.1$.
Activation function: Sigmoid $\sigma(y) = \frac{1}{1 + e^{-y}}$
Solution:
1. Compute the weighted sum:
$y = w_1 x_1 + w_2 x_2 + b = (2)(0.5) + (-3)(-1) + 0.1 = 4.1$
2. Apply the sigmoid activation function:
$\sigma(4.1) = \frac{1}{1 + e^{-4.1}} \approx 0.983$, which rounds to the class label 1.
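A quick way to verify this calculation is a few lines of Python using only the standard library (a minimal sketch; the variable names are our own):

```python
import math

# Inputs, weights, and bias from Example #1
x1, x2 = 0.5, -1.0
w1, w2 = 2.0, -3.0
b = 0.1

# 1. Weighted sum: y = w1*x1 + w2*x2 + b
y = w1 * x1 + w2 * x2 + b        # (2)(0.5) + (-3)(-1) + 0.1 = 4.1

# 2. Sigmoid activation
sigma = 1.0 / (1.0 + math.exp(-y))

print(y, round(sigma, 4))        # 4.1 0.9837 -> rounds to class 1
```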
SLP Implementation Model
An SLP is a feed-forward network without hidden layers that uses a threshold (unit step) or activation transfer function.
Linear model:
$f(x) = w^T x + b$
$\hat{y} = g(f(x)) = g(w^T x + b)$
SLP Implementation Model Cont.
$f(x) = w^T x + b$
$\hat{y} = g(f(x)) = g(w^T x + b)$
Perceptron Update Rule
For each training sample $x_i$:
$w = w + \Delta w, \quad b = b + \Delta b$
$\Delta w = \alpha \, (y_i - \hat{y}_i) \, x_i$
$\Delta b = \alpha \, (y_i - \hat{y}_i)$
where the learning rate $\alpha \in [0, 1]$.
SLP Implementation Model Cont.
Update Rule Explained

y   ŷ   y − ŷ
1   1    0
1   0    1
0   0    0
0   1   −1

where $y$ is the true value (target) and $\hat{y}$ is the predicted value. In case of misclassification ($y - \hat{y} \neq 0$), the weights are pushed towards the positive or negative target class.
SLP Implementation Model Cont.
Training (learn weights) and prediction:
Step 1: Initialize the weights.
Step 2: For each training sample $x_i$:
- Calculate $\hat{y}_i = g(f(x_i)) = g(w^T x_i + b)$
- If $y_i - \hat{y}_i \neq 0$, apply the update rule: $\Delta w = \alpha (y_i - \hat{y}_i) x_i$, $\Delta b = \alpha (y_i - \hat{y}_i)$
Step 3: Prediction:
- Calculate $\hat{y} = g(f(x)) = g(w^T x + b)$
SLP Implementation Model Cont.
Implementation of the AND function using the perceptron model, where t denotes the true (target) value; a minimal sketch is shown below.
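The sketch below implements the update rule from the previous slides with NumPy and trains it on the AND truth table (the function names and hyperparameters such as alpha = 0.1 and 20 epochs are illustrative assumptions, not prescribed by the slides):

```python
import numpy as np

def unit_step(z):
    """Threshold (unit step) activation: 1 if z >= 0, else 0."""
    return np.where(z >= 0, 1, 0)

def train_perceptron(X, y, alpha=0.1, epochs=20):
    """Learn weights w and bias b with the perceptron update rule."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = unit_step(np.dot(w, xi) + b)   # y_hat = g(w^T x + b)
            error = yi - y_hat                     # zero when correctly classified
            w += alpha * error * xi                # delta_w = alpha * (y - y_hat) * x
            b += alpha * error                     # delta_b = alpha * (y - y_hat)
    return w, b

# AND gate truth table: output is 1 only when both inputs are 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
print(unit_step(X @ w + b))   # expected: [0 0 0 1]
```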

Multi-Layer Perceptron
A Multi-Layer Perceptron (MLP) is a nonlinear feed-forward network (fully connected) with hidden layers that uses a threshold or activation transfer function.
For example, the XOR function:
$XOR: \; y = x_1 \bar{x}_2 + \bar{x}_1 x_2$

x1  x2  y
0   0   0
1   0   1
0   1   1
1   1   0

XOR is nonlinear: no single straight line separates its two output classes, which is why a hidden layer is needed (see the sketch below).
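A single perceptron cannot represent XOR, but one hidden layer can. Below is a minimal sketch with hand-chosen weights (the specific weight and bias values are illustrative assumptions, not learned parameters): the two hidden units act like OR and NAND, and the output unit ANDs them together.

```python
import numpy as np

def step(z):
    """Unit step activation applied elementwise."""
    return (z >= 0).astype(int)

# Hidden layer: two units acting as OR and NAND of the inputs
W1 = np.array([[ 1.0,  1.0],    # OR-like unit
               [-1.0, -1.0]])   # NAND-like unit
b1 = np.array([-0.5, 1.5])

# Output layer: AND of the two hidden units -> XOR overall
W2 = np.array([1.0, 1.0])
b2 = -1.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
h = step(X @ W1.T + b1)   # forward pass through the hidden layer
y = step(h @ W2 + b2)     # forward pass through the output layer
print(y)                  # expected: [0 1 1 0]
```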
Key Components of Multi-Layer Perceptron (MLP)
Input Layer:
▪ Each neuron (or node) in this layer corresponds to an input feature. For
instance, if you have three input features, the input layer will have three
neurons.
Hidden Layers:
▪ An MLP can have any number of hidden layers, with each layer containing
any number of nodes. These layers process the information received from the
input layer.
Output Layer:
▪ The output layer generates the final prediction or result. If there are multiple
outputs, the output layer will have a corresponding number of neurons.
Fully connected:
▪ Means that every node in one layer connects to every node in the next layer.
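As a sketch, these components map directly onto a Keras model; the layer sizes below are illustrative assumptions, not requirements:

```python
from tensorflow import keras

# One possible MLP: 3 input features, two hidden layers, one output.
# Dense layers are fully connected: every node links to every node
# in the next layer.
model = keras.Sequential([
    keras.layers.Input(shape=(3,)),               # input layer: 3 features
    keras.layers.Dense(8, activation="relu"),     # hidden layer 1
    keras.layers.Dense(8, activation="relu"),     # hidden layer 2
    keras.layers.Dense(1, activation="sigmoid"),  # output layer: 1 prediction
])
model.summary()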
Working of MLP
Step 1: Forward Propagation
▪ In forward propagation, the data flows from the input layer to the output layer,
passing through any hidden layers. Each neuron in the hidden layers processes the
input as follows:
1. Weighted Sum: The neuron computes the weighted sum of the inputs:
$z = \sum_i w_i x_i + b$
Where:
$x_i$ is the input feature,
$w_i$ is the corresponding weight,
$b$ is the bias term.
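For instance, the weighted sum for one neuron can be computed directly (the feature and weight values here are illustrative):

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # input features x_i
w = np.array([2.0, -3.0, 0.5])   # corresponding weights w_i
b = 0.1                          # bias term

z = np.dot(w, x) + b             # z = sum_i(w_i * x_i) + b
print(z)                         # 5.1
```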
Working of MLP Cont.
2. Activation Function:
▪ The weighted sum z is passed through an activation function to introduce non-linearity. Common activation functions include:
▪ Sigmoid: $\sigma(z) = \frac{1}{1 + e^{-z}}$
▪ ReLU (Rectified Linear Unit): $f(z) = \max(0, z)$
▪ Tanh (Hyperbolic Tangent): $\tanh(z) = \frac{2}{1 + e^{-2z}} - 1$
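The three activations translate directly into NumPy (a minimal sketch; the tanh form below is the one from the formula above and is equivalent to np.tanh):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes z into (0, 1)

def relu(z):
    return np.maximum(0.0, z)            # zero for negative z, identity otherwise

def tanh(z):
    return 2.0 / (1.0 + np.exp(-2.0 * z)) - 1.0   # squashes z into (-1, 1)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # [0.119  0.5    0.881]
print(relu(z))      # [0.     0.     2.   ]
print(tanh(z))      # [-0.964 0.     0.964]
```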
Step 2: Loss Function:
▪ Once the network generates an output, the next step is to calculate the loss using a
loss function. In supervised learning, this compares the predicted output to the
actual label.
Working of MLP Cont.
Step 2: Loss Function:
▪ For a classification problem, the commonly used binary cross-entropy loss function is:
$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$
Where:
$y_i$ is the actual label,
$\hat{y}_i$ is the predicted label,
$N$ is the number of samples.
▪ For regression problems, the mean squared error (MSE) is used:
$MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
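Both losses are one-liners in NumPy; here is a minimal sketch (the toy labels and predictions are assumptions for illustration, and the clipping guard is our own addition to avoid log(0)):

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """L = -(1/N) * sum(y*log(y_hat) + (1-y)*log(1-y_hat))."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def mse(y, y_hat):
    """MSE = (1/N) * sum((y - y_hat)^2)."""
    return np.mean((y - y_hat) ** 2)

y     = np.array([1, 0, 1, 1])          # actual labels
y_hat = np.array([0.9, 0.2, 0.8, 0.6])  # predicted probabilities
print(binary_cross_entropy(y, y_hat))   # ~0.266
print(mse(y, y_hat))                    # 0.0625
```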

Working of MLP Cont.
Step 3: Backpropagation
▪ The goal of training an MLP is to minimize the loss function by adjusting the network’s
weights and biases. This is achieved through backpropagation:
1- Gradient Calculation: The gradients of the loss function with respect to each weight and
bias are calculated using the chain rule of calculus.
2- Error Propagation: The error is propagated back through the network, layer by layer.
3- Gradient Descent: The network updates the weights and biases by moving in the opposite
direction of the gradient to reduce the loss:
$w = w - \alpha \cdot \frac{\partial L}{\partial w}$
Where: $w$ is the weight, $\alpha$ is the learning rate, and $\frac{\partial L}{\partial w}$ is the gradient of the loss function with respect to the weight.
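The update can be seen on the smallest possible case: one weight, one sample, squared-error loss (a minimal sketch; the toy data and learning rate are assumptions for illustration):

```python
# Gradient descent on L(w) = (y - w*x)^2 for a single sample.
x, y = 2.0, 8.0        # toy input/target: the ideal weight is 4
w = 0.0                # initial weight
alpha = 0.05           # learning rate

for step in range(100):
    y_hat = w * x
    grad = -2.0 * (y - y_hat) * x   # dL/dw via the chain rule
    w = w - alpha * grad            # move against the gradient

print(round(w, 3))                  # approaches 4.0
```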

Working of MLP Cont.
Step 4: Optimization
▪ MLPs rely on optimization algorithms to iteratively refine the weights and biases during training. Popular optimization methods include:
▪ Stochastic Gradient Descent (SGD): updates the weights based on a single sample or a small batch of data: $w = w - \alpha \cdot \frac{\partial L}{\partial w}$
▪ Adam Optimizer: an extension of SGD that incorporates momentum and adaptive learning rates for more efficient training:
$m_t = \beta_1 m_{t-1} + (1 - \beta_1) \, g_t$
$v_t = \beta_2 v_{t-1} + (1 - \beta_2) \, g_t^2$
Here, $g_t$ represents the gradient at time t, and $\beta_1$, $\beta_2$ are decay rates; the bias-corrected moments $\hat{m}_t$ and $\hat{v}_t$ then drive the update $w = w - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$.
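A minimal sketch of one Adam step, using the standard default hyperparameters (the toy quadratic objective and the alpha used in the loop are illustrative assumptions):

```python
import numpy as np

def adam_step(w, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter w given gradient g at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * g        # first moment (momentum)
    v = beta2 * v + (1 - beta2) * g**2     # second moment (adaptive scale)
    m_hat = m / (1 - beta1**t)             # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimize L(w) = w^2, whose gradient is 2w.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2.0 * w, m, v, t, alpha=0.05)
print(round(w, 4))   # close to 0
```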

Advantages of Multi-Layer Perceptron
▪ Versatility: MLPs can be applied to a variety of problems, both classification and
regression.
▪ Non-linearity: Thanks to activation functions, MLPs can model complex, non-linear
relationships in data.
▪ Parallel Computation: With the help of GPUs, MLPs can be trained quickly by
taking advantage of parallel computing.

Disadvantages of Multi-Layer Perceptron
▪ Computationally Expensive: MLPs can be slow to train, especially on large datasets
with many layers.
▪ Prone to Overfitting: Without proper regularization techniques, MLPs can overfit
the training data, leading to poor generalization.
▪ Sensitivity to Data Scaling: MLPs require properly normalized or scaled data for
optimal performance.

Assignment #1: Implementing MLP using Python
Reference:
Multi-Layer Perceptron Learning in Tensorflow – GeeksforGeeks
https://www.geeksforgeeks.org/multi-layer-perceptron-learning-in-tensorflow/
https://colab.research.google.com/
▪ Requirement:
Build your first neural network using Python with Google Colab, TensorFlow (with Keras), or
any convenient Python IDE.
Step 1: Import Required Modules and Load Dataset
First, we import necessary libraries such as TensorFlow, NumPy, and Matplotlib for
visualizing the data. We also load the MNIST dataset.
Step 2: Load and Normalize Image Data
Step 3: Visualizing Data
Step 4: Building the Neural Network Model
Step 5: Compiling the Model

Assignment #1: Implementing MLP using Python
▪ Requirement:
Step 5: Compiling the Model
Once the model is defined, we compile it by specifying:
Optimizer: Adam, for efficient weight updates.
Loss Function: Sparse categorical crossentropy, which is suitable for multi-class
classification.
Metrics: Accuracy, to evaluate model performance.
Step 6: Training the Model
We train the model on the training data using 10 epochs and a batch size of 2000. We
also use 20% of the training data for validation to monitor the model’s performance
on unseen data during training.
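A minimal end-to-end sketch of the assignment in TensorFlow/Keras, following Steps 1-6 above (the hidden-layer sizes of 128 and 64 are illustrative choices, not part of the assignment specification):

```python
import matplotlib.pyplot as plt
from tensorflow import keras

# Step 1-2: load MNIST and normalize pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Step 3: visualize one sample digit
plt.imshow(x_train[0], cmap="gray")
plt.title(f"Label: {y_train[0]}")
plt.show()

# Step 4: build the MLP (flatten 28x28 images, then dense layers)
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # 10 digit classes
])

# Step 5: compile with Adam, sparse categorical crossentropy, accuracy
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Step 6: 10 epochs, batch size 2000, 20% of training data for validation
model.fit(x_train, y_train, epochs=10, batch_size=2000,
          validation_split=0.2)

# Final check on the held-out test set
model.evaluate(x_test, y_test)
```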
