Unit 3 Deep Learning
A Convolutional Neural Network (CNN) is a deep learning model mainly used for image and video recognition. Its architecture is inspired by the way the human brain processes visual data.
1. Input Layer
• Takes input in the form of images (e.g., 28x28x1 for grayscale, 224x224x3 for RGB).
2. Convolutional Layer
• Applies filters (kernels) to extract features like edges, textures, and patterns.
• A small matrix (the kernel) slides over the input image, performing element-wise multiplication and summing the results.
3. ReLU Activation Layer
• Introduces non-linearity by replacing negative values with zero.
4. Pooling Layer
• Reduces the spatial size of the feature maps to make computation faster and avoid overfitting.
• Common method: Max Pooling, which picks the maximum value in a patch.
5. Flattening
• Converts the pooled feature maps into a single 1D vector.
6. Fully Connected Layer
• Combines the extracted features to make the final prediction.
7. Output Layer
• Produces the final class scores (e.g., using Softmax); a small shape trace through these layers is sketched below.
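A minimal sketch (assuming TensorFlow/Keras is available) tracing how a 28x28x1 image shrinks through these layers; the filter count (8) is an arbitrary choice for illustration:

```python
# Trace the feature-map shapes through conv -> pool -> flatten.
import tensorflow as tf

x = tf.random.normal((1, 28, 28, 1))                       # one grayscale image
x = tf.keras.layers.Conv2D(8, 3, activation="relu")(x)     # 8 filters of size 3x3
print(x.shape)                                             # (1, 26, 26, 8)
x = tf.keras.layers.MaxPooling2D(2)(x)                     # 2x2 max pooling
print(x.shape)                                             # (1, 13, 13, 8)
x = tf.keras.layers.Flatten()(x)
print(x.shape)                                             # (1, 1352)
```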
Applications of CNN
1. Image Classification – Recognizing what an image contains (e.g., cat vs. dog).
2. Object Detection – Identifying and locating objects in images (like YOLO, Faster R-CNN).
CNNs are powerful because they automatically learn features from images without manual feature extraction. They’re the backbone of
many modern computer vision systems.
Padding is the process of adding extra pixels (usually zeros) around the border of an image before applying the convolution operation in a
CNN. It helps to control the size of the output feature map and preserve important edge features.
📚 Types of Padding
1. Valid Padding (No Padding)
• No extra pixels are added, so the output feature map is smaller than the input.
• Formula:
Output size = (n − f) / s + 1
where n = input size, f = filter size, s = stride
Example:
28x28 input → 3x3 filter → output becomes 26x26.
2. Same Padding
• Pads the input with zeros so the output size remains the same as the input size.
Example:
28x28 input → 3x3 filter → still 28x28 output after convolution.
3. Full Padding
• Adds enough padding so that the filter can slide to every possible position, even outside the original image.
• Rarely used.
Example:
With full padding, the feature map becomes larger than the input.
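The output-size rule above can be checked with a tiny helper; conv_output_size is a hypothetical function written for illustration, with p = padding added on each side:

```python
# output = (n - f + 2p) / s + 1
def conv_output_size(n, f, s=1, p=0):
    """n = input size, f = filter size, s = stride, p = padding per side."""
    return (n - f + 2 * p) // s + 1

print(conv_output_size(28, 3))        # valid padding -> 26
print(conv_output_size(28, 3, p=1))   # same padding  -> 28
print(conv_output_size(28, 3, p=2))   # full padding  -> 30 (larger than input)
```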
The Dropout Layer is a regularization technique used in Convolutional Neural Networks (CNNs) to prevent overfitting during training.
During each training step, some neurons are randomly turned off (dropped out) with a certain probability (like 0.3 or 0.5), meaning
they do not participate in forward or backward propagation.
• Makes the model more general and improves performance on unseen data.
🔧 How It Works:
• A dropout rate is set (e.g., 0.5 means 50% neurons will be dropped randomly).
• During testing, all neurons are used, but their outputs are scaled.
🧠 Benefits:
• Reduces overfitting.
• Encourages the network to learn independent features.
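A minimal sketch (assuming TensorFlow/Keras) of where a Dropout layer sits in a model; the 0.5 rate and layer sizes are only illustrative:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),        # randomly zeroes 50% of activations during training
    tf.keras.layers.Dense(10, activation="softmax"),
])
# Note: Keras uses "inverted dropout" - kept activations are scaled up during
# training, so at inference the layer passes values through unchanged.
```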
ReLU is an activation function used in Convolutional Neural Networks (CNNs) and deep learning. It introduces non-linearity into the model
and helps the network learn complex patterns.
Its formula is f(x) = max(0, x). This means:
• If x > 0, output is x
• If x ≤ 0, output is 0
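A one-line NumPy sketch of this definition, f(x) = max(0, x):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)   # negative values become 0, positive values pass through

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0. 0. 0. 1.5 3.]
```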
🔧 Advantages of ReLU:
• Very simple and fast to compute (just a comparison with zero).
• Avoids the vanishing-gradient problem for positive inputs, so deep networks train faster.
❌ Disadvantages of ReLU:
1. Dying ReLU Problem:
o Neurons that only receive negative inputs always output 0 and may stop learning.
2. Not Zero-Centered:
o ReLU only outputs positive values, which can affect the optimization.
3. Unbounded Output:
o ReLU can produce very large outputs, which may affect stability.
Stride refers to the step size with which the filter (kernel) moves across the input image during convolution.
• The default stride is usually 1, meaning the filter moves one pixel at a time.
The Pooling Layer is used to reduce the size (dimensions) of the feature maps while keeping the important features.
It helps in:
• Reducing computation and memory usage.
• Preventing overfitting.
• Making the model more robust to changes like rotation or translation in the image.
🔄 Types of Pooling:
1. Max Pooling
• Picks the maximum value from each patch of the feature map.
🧠 Example:
From [1, 3; 2, 4] → Max = 4
2. Average Pooling
• Takes the average of the values in each patch.
🧠 Example:
From [2, 4; 6, 8] → Average = (2+4+6+8)/4 = 5
3. Global Pooling
• Reduces each entire feature map to a single value (max or average); the examples above are verified in the NumPy sketch below.
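A NumPy check of the two small examples above (the patches are the same 2x2 blocks used in the examples):

```python
import numpy as np

patch1 = np.array([[1, 3],
                   [2, 4]])
patch2 = np.array([[2, 4],
                   [6, 8]])

print(patch1.max())    # Max pooling     -> 4
print(patch2.mean())   # Average pooling -> 5.0
```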
Stride is the number of pixels by which the filter moves (or slides) across the input image during the convolution operation in a CNN.
• Stride = 1: The filter moves one pixel at a time → output is almost as large as the input.
• Stride > 1: The filter moves more pixels at a time → output is smaller.
Stride controls how much the spatial dimensions (width and height) of the output feature map are reduced.
📊 Example:
Suppose a 7x7 input and a 3x3 filter:
• Stride = 1 → output is 5x5
• Stride = 2 → output is 3x3
🧠 Conclusion:
Larger strides shrink the feature map and reduce computation, but they also discard some spatial detail. A quick check of the example above is sketched below.
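A minimal sketch (assuming TensorFlow/Keras) confirming the stride example above:

```python
import tensorflow as tf

x = tf.random.normal((1, 7, 7, 1))                  # 7x7 single-channel input
y1 = tf.keras.layers.Conv2D(1, 3, strides=1)(x)
y2 = tf.keras.layers.Conv2D(1, 3, strides=2)(x)
print(y1.shape)   # (1, 5, 5, 1) -> stride 1
print(y2.shape)   # (1, 3, 3, 1) -> stride 2
```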
Local Response Normalization (LRN) is a normalization technique used in CNNs. It was introduced in the AlexNet architecture and works on the idea of “lateral inhibition”, where neurons with high activation suppress nearby neurons.
1. Improves Generalization:
o Encourages only the most strongly activated neurons to pass through, which improves feature learning.
2. Reduces Overfitting:
o Acts as a mild regularizer by limiting how much any single activation can dominate its neighbors.
3. Stabilizes Training:
o Keeps activation values within a reasonable range during training.
• For each neuron, LRN divides its activation by the sum of squared activations of neighboring neurons.
• This makes strong activations stand out more and weak ones fade.
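The rule above can be written as the AlexNet-style formula, where a_i is a neuron's activation, b_i its normalized output, and k, α, β, n are constants (symbols and values here are assumptions for illustration):

b_i = a_i / (k + α · Σ_j a_j²)^β, with the sum taken over the n neighboring channels.

A minimal TensorFlow sketch (the constants passed are illustrative, not from the notes):

```python
import tensorflow as tf

x = tf.random.normal((1, 8, 8, 16))   # batch, height, width, channels
y = tf.nn.local_response_normalization(x, depth_radius=2, bias=1.0,
                                        alpha=1e-4, beta=0.75)
print(y.shape)   # (1, 8, 8, 16) - same shape, activations rescaled by their neighbors
```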
📌 Summary:
• LRN helps in emphasizing useful features and suppressing less important ones.
🌟 Advantages of ReLU:
1. Simple and Fast:
o Only a comparison with zero is needed, so it is very cheap to compute.
2. Avoids Vanishing Gradient:
o For x > 0 the gradient is 1, so deep networks keep learning.
3. Sparse Activation:
o Only some neurons activate (non-zero output), which makes the network efficient.
4. Better Performance:
o Networks using ReLU generally train faster and converge better than those using Sigmoid or Tanh.
A Pooling Layer is used in Convolutional Neural Networks (CNNs) to reduce the spatial size (width and height) of feature maps.
It helps to:
• Reduce computation and memory usage
• Prevent overfitting
• Make the model more robust to small shifts in the image
🔄 Types of Pooling:
1. Max Pooling
• Picks the maximum value from each patch.
🧠 Example:
From [1, 3; 2, 4] → Max Pooling = 4
2. Average Pooling
• Takes the average of the values in each patch.
🧠 Example:
From [2, 4; 6, 8] → Average Pooling = 5
3. Global Pooling
• Reduces each whole feature map to a single value.
Applications of Convolution:
Convolution is a key operation in Convolutional Neural Networks (CNNs) and is widely used in image and signal processing tasks. It
helps extract important features from data like edges, textures, and patterns.
🌟 1. Image Classification
Example:
Classifying an image as a cat or dog.
🌟 2. Object Detection
Example:
Detecting cars and pedestrians in self-driving car cameras.
🌟 3. Face Recognition
• Extracts facial features like eyes, nose, and mouth using convolution layers.
Example:
Face ID in smartphones.
🌟 4. Medical Image Analysis
Example:
Detecting tumors in brain scans using CNNs.
🌟 5. Feature Extraction in NLP
Example:
Sentiment analysis of a product review using CNNs.
📌 Summary:
Application                  Example
Image Classification         Classifying an image as a cat or dog
Object Detection             Detecting cars and pedestrians for self-driving cars
Face Recognition             Face ID in smartphones
Medical Image Analysis       Detecting tumors in brain scans
Feature Extraction in NLP    Sentiment analysis of product reviews
2.) Explain Pooling Layer with its need and different types
A Pooling Layer is used in CNNs to reduce the size (dimensions) of feature maps while keeping the most important information.
1. Reduces Dimensions – Shrinks the width and height of feature maps.
2. Decreases Computation – Fewer values mean faster training and less memory use.
3. Prevents Overfitting – Fewer parameters reduce the chance of memorizing noise.
4. Provides Translation Invariance – Helps model recognize features even if they shift slightly in the image.
🔄 Types of Pooling:
1. Max Pooling
🧠 Example:
From [2, 3; 5, 1] → Max = 5
2. Average Pooling
🧠 Example:
From [2, 4; 6, 8] → Average = 5
3. Global Pooling
• Takes one value per feature map (e.g., max or average of entire map).
📋 Summary Table:
Type                Operation                          Typical Use
Max Pooling         Keeps the maximum value per patch  Most common choice after conv layers
Average Pooling     Averages the values per patch      Smoother, less aggressive reduction
Global Pooling      One value per map                  Used before final output
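A minimal sketch (assuming TensorFlow/Keras) of global average pooling, showing the “one value per map” behaviour; the input shape is arbitrary:

```python
import tensorflow as tf

x = tf.random.normal((1, 13, 13, 8))                 # 8 feature maps of 13x13
y = tf.keras.layers.GlobalAveragePooling2D()(x)
print(y.shape)   # (1, 8) -> one averaged value per feature map
```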
13.) Draw and explain CNN (Convolution Neural Network) architecture in detail.
CNN is a deep learning model used mainly for image classification, object detection, and pattern recognition. It mimics how humans
recognize visual patterns.
1. Input Layer
o Accepts the raw image (e.g., 28x28x1 for a grayscale digit).
2. Convolutional Layer
o Applies filters to extract features such as edges and textures.
3. ReLU Activation Layer
o Replaces negative values with zero to add non-linearity.
4. Pooling Layer
o Reduces the size of the feature maps while keeping important features.
5. Flatten + Fully Connected Layer
o Converts the feature maps into a vector and combines features for classification.
6. Output Layer
o Uses Softmax or Sigmoid activation for final output (e.g., class labels).
[Input Image]
      ↓
[Convolutional Layer]
      ↓
[ReLU Activation]
      ↓
[Pooling Layer]
      ↓
[Flatten Layer]
      ↓
[Output Layer]
• For handwritten digit classification (0-9), CNN can take an image and output the predicted digit.
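A minimal sketch (assuming TensorFlow/Keras) of the architecture above for 0–9 digit classification; the filter count (32) and other settings are illustrative, not prescribed by the notes:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                 # input layer
    tf.keras.layers.Conv2D(32, 3, activation="relu"),  # convolution + ReLU
    tf.keras.layers.MaxPooling2D(2),                   # pooling
    tf.keras.layers.Flatten(),                         # flatten
    tf.keras.layers.Dense(10, activation="softmax"),   # output: 10 digit classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # shows how the feature-map sizes shrink layer by layer
```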
14.) Explain ReLU Layer in detail. What are the advantages of ReLU over Sigmoid?
ReLU (Rectified Linear Unit) is an activation function used in CNNs and deep learning models to introduce non-linearity.
Its formula is f(x) = max(0, x). This means:
• If x > 0, output is x
• If x ≤ 0, output is 0
• This helps the model to focus only on the important (positive) signals.
Feature              ReLU                              Sigmoid
Output Range         0 to ∞                            0 to 1
Gradient Behavior    No vanishing gradient for x > 0   Suffers from vanishing gradient
📌 Conclusion:
• ReLU is the most commonly used activation function in CNNs due to its simplicity, efficiency, and better performance.
• It helps in building deep networks without major issues like vanishing gradients.
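A small NumPy sketch of the “Gradient Behavior” row in the comparison above: the sigmoid derivative shrinks toward zero for large inputs, while ReLU's gradient stays at 1 for positive inputs (the sample x values are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.0, 5.0, 10.0])
sig_grad = sigmoid(x) * (1 - sigmoid(x))   # sigmoid derivative
relu_grad = (x > 0).astype(float)          # ReLU derivative (1 for x > 0)
print(sig_grad)    # [0.25, ~0.0066, ~0.000045] -> vanishes as x grows
print(relu_grad)   # [0., 1., 1.]
```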
A Pooling Layer is used in Convolutional Neural Networks (CNNs) to reduce the dimensions of feature maps while keeping the most
important features.
🔹 1. Dimensionality Reduction
• Reduces the width and height of feature maps, lowering computation and memory usage.
🔹 2. Prevents Overfitting
• By reducing the number of parameters, pooling helps avoid overfitting in deep networks.
🔹 3. Retains Important Features
• Keeps key information (like edges or patterns) while removing less important data.
🔹 4. Translation Invariance
• Pooling helps the network recognize features even when they shift slightly in the image.
🔹 5. Types of Pooling
• Unlike convolution layers, pooling uses a fixed filter (like 2x2), and it does not learn weights.
📌 Example:
Applying 2x2 max pooling with stride 2 to a 4x4 feature map reduces it to 2x2, keeping only the strongest value from each patch.
Dropout Layer (Regularization)
• During training, random neurons are turned off (dropped) with a certain probability (e.g., 0.5), meaning their output is set to zero temporarily.
• During testing/inference, all neurons are used, but their outputs are scaled to match the training phase.
1. Prevents Overfitting
o Stops the network from relying too heavily on any single neuron.
2. Improves Generalization
o Acts like training multiple different neural networks and averaging their results.
📌 Typical Use:
• Dropout rate (commonly 0.2 to 0.5) defines the fraction of neurons to drop.
📊 Example:
With a dropout rate of 0.5, a layer with 100 neurons has roughly 50 of them randomly disabled at each training step.
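A tiny NumPy sketch of this idea: a random mask keeps roughly half of the activations (the array size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones(10)
mask = rng.random(10) > 0.5      # keep a neuron where the mask is True
print(activations * mask)        # roughly half of the outputs become 0
```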
1. A small matrix called a filter or kernel (e.g., 3×3) slides over the input image.
2. At each position, element-wise multiplication is performed between the filter and the image patch, and the results are summed to give one value of the feature map.
📌 Mathematical operation:
Feature map = Filter ∗ Input image
• Local Connectivity – Each neuron is connected to a local region of input, not the whole image.
• Weight Sharing – The same filter is used across the image, reducing the number of parameters.
• Translation Invariance – Can detect the same feature even if it appears in different positions.
• Multiple Filters – Each filter detects different features (e.g., edges, corners, patterns).
📌 Example:
If a 3×3 filter is applied to a 5×5 image with stride 1, it slides over the image and generates a 3×3 feature map showing where the filter pattern
is detected.
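A minimal NumPy sketch of this example: a 3×3 filter sliding over a 5×5 input with stride 1 yields a 3×3 feature map (the kernel values are an arbitrary edge-detector, not from the notes):

```python
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)   # dummy 5x5 input
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                 # simple vertical-edge filter

out = np.zeros((3, 3))
for i in range(3):                                 # slide the filter over the image
    for j in range(3):
        patch = image[i:i+3, j:j+3]
        out[i, j] = np.sum(patch * kernel)         # element-wise multiply and sum

print(out.shape)   # (3, 3) feature map, as in the example above
```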