Lec7 8+CNN 2
Presented by:
Dr. Mona Hussein Alnaggar
2024-2025
1st term
Lecture 7,8
Agenda
• Convolutional Neural Network (CNN)
Backpropagation
• Backpropagation is done by fine-tuning the weights of the connections between ANN units based on the error rate obtained.
• This process continues until the artificial neural network can correctly recognize an object in an image with the minimal possible error rate.
Backpropagation cont.
Feedforward vs Backpropagation
• The data is fed into the model and the output of each layer is computed in turn; this forward pass is called feedforward. We then calculate the error using an error (loss) function; common error functions are cross-entropy, squared error, etc. The error function measures how well the network is performing. After that, we backpropagate through the model by calculating the derivatives of the error with respect to the weights. This step is called backpropagation, and it is used to minimize the loss.
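To make the feedforward and backpropagation steps concrete, below is a minimal sketch (not from the slides) of one forward pass and one backward pass for a single sigmoid neuron trained with a squared-error loss in NumPy; the data values, loss choice, and learning rate are illustrative assumptions.

```python
# Minimal sketch: feedforward then backpropagation for one sigmoid neuron.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: one sample with 2 features, target 1.0 (illustrative values)
x = np.array([0.5, -1.0])
y = 1.0

rng = np.random.default_rng(0)
W = rng.normal(size=2)        # randomly initialized weights
b = 0.0                       # bias
lr = 0.1                      # learning rate

# feedforward: weighted sum + activation, then the error function
z = W @ x + b
y_hat = sigmoid(z)
loss = 0.5 * (y_hat - y) ** 2          # squared-error loss

# backpropagation: chain rule gives dLoss/dW and dLoss/db
dloss_dyhat = y_hat - y
dyhat_dz = y_hat * (1 - y_hat)         # derivative of the sigmoid
delta = dloss_dyhat * dyhat_dz
dW = delta * x
db = delta

# update: step opposite to the gradient to reduce the loss
W -= lr * dW
b -= lr * db
```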
Gradient Descent
• Gradient descent is used to optimize the weights and biases based on the cost function.
• The cost function evaluates the difference between the actual and predicted outputs.
• Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function.
• In other words, gradient descent is an iterative algorithm that
helps to find the optimal solution to a given problem.
Gradient Descent cont.
1. Initialize the parameters (weights and biases) with starting values.
2. Calculate the gradient of the cost function with respect to the parameters.
3. Update the parameters by taking a small step in the opposite direction of the gradient.
4. Repeat steps 2 and 3 until the algorithm reaches a local or global minimum, where the gradient is zero (see the sketch below).
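A minimal sketch (illustrative, not from the slides) of these steps, minimizing the toy cost function f(w) = (w - 3)^2:

```python
# Gradient descent on a 1-D toy cost function.
def cost(w):
    return (w - 3.0) ** 2

def gradient(w):
    return 2.0 * (w - 3.0)      # derivative of the cost w.r.t. the parameter

w = 0.0          # step 1: initialize the parameter
lr = 0.1         # learning rate (step size)

for step in range(100):
    g = gradient(w)             # step 2: compute the gradient
    w = w - lr * g              # step 3: move opposite to the gradient
    if abs(g) < 1e-6:           # step 4: stop when the gradient is ~zero
        break

print(w)  # converges close to 3.0, the minimum of the cost
```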
Gradient Descent:
In gradient descent, we consider all the data points when calculating the loss and its derivative.
Stochastic gradient descent:
In stochastic gradient descent, we use a single randomly chosen point when calculating the loss and its derivative.
Different Variants of Gradient Descent
• There are several variants of gradient descent that differ in the way the step size or learning rate is chosen and the way the updates are made. Here are
some popular variants:
1. Batch Gradient Descent: In batch gradient descent, the entire training dataset is used to compute the gradient and update the model parameters (weights and biases) at each iteration. It is effective for convex or relatively smooth error manifolds because it moves directly toward an optimal solution by taking a large step in the direction of the negative gradient of the cost function. However, because the gradient is computed over the whole training set at every iteration, it can be slow for large datasets, resulting in longer training times and higher computational costs, although it may lead to a more accurate model.
2. Stochastic Gradient Descent (SGD): In SGD, only one training example is used to compute the gradient and update the parameters at each iteration.
This can be faster than batch gradient descent but may lead to more noise in the updates.
3. Mini-batch Gradient Descent: In mini-batch gradient descent, a small batch of training examples is used to compute the gradient and update the parameters at each iteration. This can be a good compromise between batch gradient descent and stochastic gradient descent, as it can be faster than batch gradient descent and less noisy than stochastic gradient descent, as illustrated in the sketch below.
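The sketch below (illustrative assumptions, not from the slides) contrasts how the three variants choose the data used for one parameter update, on a simple linear-regression cost in NumPy:

```python
# Batch vs. stochastic vs. mini-batch updates on a toy 1-D regression.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))                     # 100 samples, 1 feature
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

w, lr = 0.0, 0.05

def grad(w, X_sub, y_sub):
    # gradient of 0.5 * mean((X w - y)^2) with respect to w
    err = X_sub[:, 0] * w - y_sub
    return np.mean(err * X_sub[:, 0])

for step in range(200):
    # 1. Batch: use the whole dataset for each update
    # w -= lr * grad(w, X, y)

    # 2. Stochastic: use one randomly chosen sample per update
    # i = rng.integers(len(X)); w -= lr * grad(w, X[i:i+1], y[i:i+1])

    # 3. Mini-batch: use a small random batch per update (active here)
    idx = rng.choice(len(X), size=16, replace=False)
    w -= lr * grad(w, X[idx], y[idx])

print(w)   # approaches the true slope of 3.0
```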
Advantages of gradient descent and its variants:
1. Widely used: Gradient descent and its variants are widely used in machine learning and
optimization problems because they are effective and easy to implement.
2. Convergence: Gradient descent and its variants can converge to a global minimum or a good local
minimum of the cost function, depending on the problem and the variant used.
3. Scalability: Many variants of gradient descent can be parallelized and are scalable to large datasets
and high-dimensional models.
4. Flexibility: Different variants of gradient descent offer a range of trade-offs between accuracy and
speed and can be adjusted to optimize the performance of a specific problem.
Disadvantages of gradient descent and its variants:
1. Choice of learning rate: The choice of learning rate is crucial for the convergence of gradient descent and its variants. Choosing a learning rate that is too large can lead to oscillations or overshooting, while choosing a learning rate that is too small can lead to slow convergence or getting stuck in local minima.
2. Sensitivity to initialization: Gradient descent and its variants can be sensitive to the initialization of the model’s parameters, which
can affect the convergence and the quality of the solution.
3. Time-consuming: Gradient descent and its variants can be time-consuming, especially when dealing with large datasets and high-
dimensional models. The convergence speed can also vary depending on the variant used and the specific problem.
4. Local optima: Gradient descent and its variants can converge to a local minimum instead of the global minimum of the cost
function, especially in non-convex problems. This can affect the quality of the solution, and techniques like random initialization
and multiple restarts may be used to mitigate this issue.
Convolutional Neural Network
CNN
Convolutional Neural Network (CNN)
• A Convolutional Neural Network (CNN) is a type of Deep Learning neural network architecture
commonly used in Computer Vision.
• Computer vision is a field of Artificial Intelligence that enables a computer to understand and
interpret the image or visual data.
• When it comes to Machine Learning, Artificial Neural Networks perform really well. Neural Networks are used on various kinds of data such as images, audio, and text.
• Different types of Neural Networks are used for different purposes: for example, for predicting a sequence of words we use Recurrent Neural Networks (more precisely, an LSTM), and similarly, for image classification we use Convolutional Neural Networks.
Convolution
Neural Network
• A Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN) that is predominantly used to extract features from grid-like (matrix) data.
• For example, visual datasets like
images or videos where data
patterns play an extensive role.
• CNN architecture consists of
multiple layers like the input layer,
Convolutional layer, Pooling layer,
and fully connected layers.
Types of layers:
• In a regular Neural Network, there are three types of layers:
1. Input Layers: It’s the layer in which we give input to our model. The number of neurons in this layer is equal
to the total number of features in our data (number of pixels in the case of an image).
2. Hidden Layer: The input from the input layer is then fed into the hidden layers. There can be many hidden layers depending upon our model and data size. Each hidden layer can have a different number of neurons, generally greater than the number of features. The output of each layer is computed by matrix multiplication of the output of the previous layer with the learnable weights of that layer, followed by the addition of learnable biases and an activation function, which makes the network nonlinear (see the sketch after this list).
3. Output Layer: The output from the hidden layer is then fed into a logistic function like sigmoid or softmax
which converts the output of each class into the probability score of each class.
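A minimal sketch (illustrative layer sizes, not from the slides) of this forward pass in NumPy: matrix multiply, add bias, apply an activation, then a softmax output layer that turns scores into class probabilities.

```python
# Forward pass through one hidden layer and a softmax output layer.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))          # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=4)                 # input layer: 4 features

W1 = rng.normal(size=(8, 4))           # hidden layer: 8 neurons
b1 = np.zeros(8)
W2 = rng.normal(size=(3, 8))           # output layer: 3 classes
b2 = np.zeros(3)

h = relu(W1 @ x + b1)                  # hidden layer output (nonlinear)
probs = softmax(W2 @ h + b2)           # probability score for each class
print(probs, probs.sum())              # probabilities sum to 1
```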
CNN Concepts
• Pooling layers are used to reduce the dimensions of the feature maps. Thus, it reduces
the number of parameters to learn, and the amount of computation performed in the
network.
• The pooling layer summarizes the features present in a region of the feature map
generated by a convolution layer. So, further operations are performed on summarized
features instead of precisely positioned features generated by the convolution layer. This
makes the model more robust to variations in the position of the features in the input
image.
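A minimal sketch (illustrative values, not from the slides) of 2x2 max pooling with stride 2 on a single-channel feature map, using NumPy:

```python
# Max pooling: summarize each 2x2 region of the feature map by its maximum.
import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    h, w = fmap.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[i, j] = window.max()   # keep only the strongest activation
    return out

fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 2],
                 [7, 2, 9, 1],
                 [3, 1, 4, 8]], dtype=float)
print(max_pool2d(fmap))               # 2x2 output summarizing the 4x4 map
```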
Pooling layer
Strides
• When the filter is applied, it is shifted over the input matrix. The number of pixels the filter moves across the input matrix at each step is known as the stride. When the stride is 1, we move the filter 1 pixel at a time; when the stride is 2, we move the filter 2 pixels at a time, and so on.
• Strides are essential because they control how the filter convolves against the input, i.e., strides regulate which features could be missed while scanning the image. They denote the number of steps we move in each convolution, as in the sketch below.
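A minimal sketch (illustrative, not from the slides) showing how the stride changes how many positions a width-3 filter visits on a 1-D input of length 7:

```python
# Stride 1 vs. stride 2 for a simple 1-D convolution.
import numpy as np

x = np.arange(7, dtype=float)          # toy 1-D input
k = np.array([1.0, 0.0, -1.0])         # toy filter of width 3

def conv1d(x, k, stride):
    positions = range(0, len(x) - len(k) + 1, stride)
    return np.array([np.dot(x[p:p + len(k)], k) for p in positions])

print(conv1d(x, k, stride=1))          # 5 outputs: filter moves 1 step at a time
print(conv1d(x, k, stride=2))          # 3 outputs: filter moves 2 steps at a time
```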
Padding
• Padding plays a vital role in building a CNN.
• After the convolution operation, the original size of the image is shrunk.
• Also, in the image classification task there are multiple convolution layers, so our original image is shrunk after every step, which we don't want.
• Secondly, when the kernel moves over the original image, it passes over the middle pixels more times than the edge pixels, so the borders are covered fewer times and contribute less to the output.
• To overcome this problem, a new concept was introduced named padding: an additional border of pixels added around the image so that the convolution preserves the size of the original picture. For example, the sketch below shows the output-size arithmetic with and without padding.
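A minimal sketch (illustrative values, not from the slides) of the standard output-size arithmetic for a convolution, out = (W - F + 2P) / S + 1, where W is the input width, F the filter width, P the padding, and S the stride:

```python
# Output size of a convolution for different padding and stride choices.
def conv_output_size(w, f, p, s):
    return (w - f + 2 * p) // s + 1

# 34x34 image, 3x3 filter
print(conv_output_size(34, 3, p=0, s=1))   # 32 -> image shrinks without padding
print(conv_output_size(34, 3, p=1, s=1))   # 34 -> padding of 1 keeps the size
print(conv_output_size(34, 3, p=0, s=2))   # 16 -> a larger stride shrinks it more
```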
Max Pooling
CNN Layers
Try to build a CNN to recognize handwritten digits from the MNIST dataset (a minimal sketch follows).
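A minimal sketch of a small CNN for MNIST digit recognition. It assumes TensorFlow/Keras is available; the layer sizes, filter counts, and number of epochs are illustrative choices, not prescribed by the lecture.

```python
# Small CNN for MNIST: convolution + pooling layers, then fully connected layers.
import tensorflow as tf
from tensorflow.keras import layers, models

# load MNIST: 28x28 grayscale images of digits 0-9
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # add channel dimension, scale to [0, 1]
x_test = x_test[..., None] / 255.0

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolution layer
    layers.MaxPooling2D((2, 2)),                    # pooling layer: downsample
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                               # flatten feature maps
    layers.Dense(64, activation="relu"),            # fully connected layer
    layers.Dense(10, activation="softmax"),         # probability per digit class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```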
• Convolution layers consist of a set of learnable filters (or kernels) having small widths and heights and the same depth as that of the input volume (e.g., 3 if the input is an RGB image).
• For example, if we have to run convolution on an image with dimensions 34x34x3, the possible size of the filters is a×a×3, where 'a' can be anything like 3, 5, or 7, but smaller than the image dimension.
• During the forward pass, we slide each filter across the whole input volume step by step, where each step is called the stride (which can have a value of 2, 3, or even 4 for high-dimensional images), and compute the dot product between the kernel weights and the patch of the input volume.
• As we slide our filters we get a 2-D output (feature map) for each filter; stacking these together, we get an output volume with a depth equal to the number of filters, as in the sketch below. The network learns all the filters.
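A minimal sketch (illustrative sizes, not from the slides) of this forward pass in NumPy: each 3x3x3 filter produces one 2-D feature map, and stacking the maps gives an output depth equal to the number of filters.

```python
# Forward pass of a convolution layer: slide each filter, take dot products.
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(34, 34, 3))       # H x W x depth (RGB)
filters = rng.normal(size=(8, 3, 3, 3))    # 8 filters of size 3x3x3
stride = 1

out_h = (34 - 3) // stride + 1             # 32
out_w = (34 - 3) // stride + 1             # 32
output = np.zeros((out_h, out_w, len(filters)))

for f in range(len(filters)):              # one feature map per filter
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + 3,
                          j * stride:j * stride + 3, :]
            output[i, j, f] = np.sum(patch * filters[f])   # dot product

print(output.shape)                        # (32, 32, 8): depth = number of filters
```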
Advantages of Convolutional Neural
Networks (CNNs):
1. Good at detecting patterns and features in images, videos, and audio signals.
2. Robust to translation, rotation, and scaling (invariance to these transformations).
3. End-to-end training; no need for manual feature extraction.
4. Can handle large amounts of data and achieve high accuracy.
Disadvantages of Convolutional Neural
Networks (CNNs):
1. Computationally expensive to train and require a lot of memory.
2. Can be prone to overfitting if not enough data or proper regularization is used.
3. Require large amounts of labeled data.
4. Interpretability is limited; it is hard to understand what the network has learned.
Q1. What are the fundamentals of deep
learning?
• A. The fundamentals of deep learning include:
1. Neural Networks: Deep learning relies on artificial neural networks, which are composed of interconnected
layers of artificial neurons.
2. Deep Layers: Deep learning models have multiple hidden layers, enabling them to learn hierarchical
representations of data.
3. Training with Backpropagation: Deep learning models are trained using backpropagation, which adjusts the
model’s weights based on the error calculated during forward and backward passes.
4. Activation Functions: Activation functions introduce non-linearity into the network, allowing it to learn
complex patterns.
5. Large Datasets: Deep learning models require large labeled datasets to effectively learn and generalize from
the data.
Q2. What are the fundamentals of neural networks?
• A. The fundamentals of neural networks include:
1. Neurons: Neural networks consist of interconnected artificial neurons that mimic the behavior of biological
neurons.
2. Weights and Biases: Neurons have associated weights and biases that determine the strength of their
connections and their activation thresholds.
3. Activation Function: Each neuron applies an activation function to its input, introducing non-linearity and
enabling complex computations.
4. Layers: Neurons are organized into layers, including input, hidden, and output layers, which process and
transform data.
5. Backpropagation: Neural networks are trained using backpropagation, adjusting weights based on error
gradients to improve performance.