
Unit-3 DL

1.) Explain CNN architecture with its applications.

CNN Architecture (Convolutional Neural Network)

CNN is a deep learning model mainly used for image and video recognition. Its architecture is inspired by the way the human brain
processes visual data.

1. Input Layer

• Takes input in the form of images (e.g., 28x28x1 for grayscale, 224x224x3 for RGB).

• Each image is represented as a matrix of pixel values.

2. Convolutional Layer

• Applies filters (kernels) to extract features like edges, textures, and patterns.

• A small matrix slides over the input image, performing element-wise multiplication and summing the results.

• Outputs a feature map.

• Example: A filter might detect vertical edges.

3. ReLU Layer (Activation)

• Applies the Rectified Linear Unit (ReLU) function to add non-linearity.

• Converts all negative values in the feature map to zero.

4. Pooling Layer

• Also called Subsampling or Downsampling.

• Reduces the spatial size of the feature maps to make computation faster and avoid overfitting.

• Common method: Max Pooling, which picks the maximum value in a patch.

5. Flattening

• Converts the 2D feature maps into a 1D vector.

• Prepares the data for the fully connected layer.

6. Fully Connected Layer (FC)

• Acts like a regular neural network.

• Combines all extracted features to make final decisions.

• The last layer typically uses Softmax for classification tasks.

7. Output Layer

• Gives the final output, such as class probabilities in image classification.
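
The same layer stack can be written out in code. Below is a minimal sketch using PyTorch for a 28×28 grayscale input; the framework choice, channel counts, and class count are illustrative assumptions, not part of the original notes:

```python
import torch
import torch.nn as nn

# Illustrative CNN for 28x28 grayscale images (e.g., digit classification).
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),                                   # ReLU activation
            nn.MaxPool2d(2),                             # pooling: 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # flattening to a 1D vector
            nn.Linear(32 * 7 * 7, num_classes),          # fully connected layer
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleCNN()
out = model(torch.randn(1, 1, 28, 28))   # one dummy grayscale image
print(out.shape)                         # torch.Size([1, 10]) -- class scores
```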

Applications of CNN

1. Image Classification – E.g., classifying cats vs. dogs.

2. Object Detection – Identifying and locating objects in images (like YOLO, Faster R-CNN).

3. Face Recognition – Used in security and social media.

4. Medical Imaging – Detecting tumors, diseases in X-rays, MRI, etc.

5. Autonomous Vehicles – Understanding road signs, lanes, pedestrians.

6. Gesture and Emotion Recognition – In gaming or virtual meetings.

CNNs are powerful because they automatically learn features from images without manual feature extraction. They’re the backbone of
many modern computer vision systems.

2.) What is Padding? Enlist and explain types of padding.

Padding is the process of adding extra pixels (usually zeros) around the border of an image before applying the convolution operation in a
CNN. It helps to control the size of the output feature map and preserve important edge features.

🧠 Why is Padding used?

1. To control the spatial size of the output feature map.

2. To preserve edge information that may get lost during convolution.


3. To allow the filter to fit even at the corners of the image.

📚 Types of Padding

1. Valid Padding ("No Padding")

• No extra pixels are added.

• The filter only slides over valid positions of the image.

• Output size decreases after convolution.

• Formula:
  Output size = (n − f) / s + 1
  where n = input size, f = filter size, s = stride

Example:
28x28 input → 3x3 filter → output becomes 26x26.

2. Same Padding ("Zero Padding")

• Pads the input with zeros so the output size remains the same as the input size.

• Helps in deep architectures where we don’t want shrinking size.

• Automatically calculates how much padding is needed.

Example:
28x28 input → 3x3 filter → still 28x28 output after convolution.

3. Full Padding

• Adds enough padding so that the filter can slide to every possible position, even outside the original image.

• Output size increases.

• Rarely used.

Example:
With full padding, the feature map becomes larger than the input.

Type  | Output Size | Edge Info Preserved | Common Use
------|-------------|---------------------|------------------------
Valid | Smaller     | No                  | Shrinking feature maps
Same  | Same        | Yes                 | Deep networks
Full  | Larger      | Yes                 | Rare use cases
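
To make the size effects concrete, here is a small sketch comparing the three padding modes, again assuming PyTorch (padding=2 is used to approximate full padding for a 3×3 filter):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)            # one 28x28 grayscale image

# Valid padding: no pixels added, output shrinks (28 - 3 + 1 = 26).
valid = nn.Conv2d(1, 1, kernel_size=3, padding=0)
print(valid(x).shape)                     # torch.Size([1, 1, 26, 26])

# Same padding: one pixel of zeros on each side keeps the size at 28x28.
same = nn.Conv2d(1, 1, kernel_size=3, padding=1)
print(same(x).shape)                      # torch.Size([1, 1, 28, 28])

# Full padding: pad by (kernel_size - 1) so the filter reaches every
# possible position; the output grows (28 + 2*2 - 3 + 1 = 30).
full = nn.Conv2d(1, 1, kernel_size=3, padding=2)
print(full(x).shape)                      # torch.Size([1, 1, 30, 30])
```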

3.) Explain Dropout Layer in Convolutional Neural Network.

The Dropout Layer is a regularization technique used in Convolutional Neural Networks (CNNs) to prevent overfitting during training.

During each training step, some neurons are randomly turned off (dropped out) with a certain probability (like 0.3 or 0.5), meaning
they do not participate in forward or backward propagation.

Why Use Dropout?

• It forces the network to not rely on specific neurons.

• Makes the model more general and improves performance on unseen data.

• Helps in training a more robust model.

🔧 How It Works:

• A dropout rate is set (e.g., 0.5 means 50% neurons will be dropped randomly).

• During training, the dropout is applied.

• During testing, all neurons are used, but their outputs are scaled.

🧠 Benefits:

• Reduces overfitting.
• Encourages the network to learn independent features.

• Works well with dense (fully connected) layers in CNNs.
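
A minimal sketch of dropout in practice, assuming PyTorch (note that PyTorch uses "inverted" dropout, which rescales the surviving neurons during training instead of scaling outputs at test time):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)                 # dropout rate of 0.5
x = torch.ones(8)

drop.train()                             # training mode: neurons dropped
print(drop(x))                           # roughly half the values are 0;
                                         # survivors scaled by 1/(1-p) = 2

drop.eval()                              # test mode: all neurons used
print(drop(x))                           # all ones, unchanged
```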

4) Define ReLU. Explain disadvantages of ReLU.

ReLU is an activation function used in Convolutional Neural Networks (CNNs) and deep learning. It introduces non-linearity into the model
and helps the network learn complex patterns.

The ReLU function is defined as:

ReLU(x) = max(0, x)

This means:

• If x > 0, output is x

• If x ≤ 0, output is 0

🔧 Advantages of ReLU:

• Simple and fast to compute.

• Helps in solving vanishing gradient problem (better than sigmoid or tanh).

• Increases training speed and model performance.

❌ Disadvantages of ReLU:

1. Dying ReLU Problem:

o Sometimes neurons get stuck and always output 0.

o Once a neuron becomes inactive, it may never recover.

2. Not Zero-Centered:

o ReLU only outputs positive values, which can affect the optimization.

3. Unbounded Output:

o ReLU can produce very large outputs, which may affect training stability.
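
A small NumPy sketch of the function and its gradient, which also shows where the dying ReLU problem comes from (the values are illustrative):

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))          # [0.  0.  0.  1.5 3. ]

# The gradient is 1 for x > 0 and 0 for x <= 0 -- the root of the
# "dying ReLU" problem: a neuron whose input stays negative receives
# no gradient and may never recover.
grad = (x > 0).astype(float)
print(grad)             # [0. 0. 0. 1. 1.]
```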

5.) What is Strides in CNN? Explain in brief.

Stride refers to the step size with which the filter (kernel) moves across the input image during convolution.

• It controls how much the filter shifts at each step.

• The default stride is usually 1, meaning the filter moves one pixel at a time.

• Stride = 1: Filter moves one pixel at a time → output size is larger.


• Stride = 2: Filter moves two pixels at a time → output size is smaller.
• Larger strides lead to smaller feature maps (more downsampling).

If we use a 3×3 filter on a 7×7 image:

• With stride 1, output size = 5×5


• With stride 2, output size = 3×3

Why Stride is Important:

• Helps in controlling the spatial size of the output.

• Larger stride → faster computation, less detail.

• Smaller stride → more detailed feature maps, but slower.
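
The output-size rule can be captured in a few lines; the helper below is a sketch of the standard formula, with an optional padding term added for completeness:

```python
def conv_output_size(n, f, s, p=0):
    """Output size of a convolution: floor((n - f + 2p) / s) + 1.
    n = input size, f = filter size, s = stride, p = padding."""
    return (n - f + 2 * p) // s + 1

print(conv_output_size(7, 3, 1))   # 5 -> 5x5 feature map
print(conv_output_size(7, 3, 2))   # 3 -> 3x3 feature map
```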

6.) Explain Pooling Layer with its different types

The Pooling Layer is used to reduce the size (dimensions) of the feature maps while keeping the important features.

It helps in:
• Reducing computation and memory usage.
• Preventing overfitting.
• Making the model more robust to changes like rotation or translation in the image.

🧪 How Pooling Works:

• A small window (e.g., 2x2) slides over the feature map.


• It performs a specific operation (like max or average) on each window.

🔄 Types of Pooling:

1. Max Pooling

• Takes the maximum value from the window.


• Captures the most important (strongest) feature.

🧠 Example:
From [1, 3; 2, 4] → Max = 4

2. Average Pooling

• Takes the average value of all elements in the window.


• Gives a smooth version of the feature map.

🧠 Example:
From [2, 4; 6, 8] → Average = (2+4+6+8)/4 = 5

3. Global Pooling

• Takes one value per feature map.


• Commonly used before the final output layer in classification.
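
A short sketch of all three pooling types on one feature map, assuming PyTorch (the 4×4 values are made up for illustration):

```python
import torch
import torch.nn as nn

# A single 4x4 feature map (batch and channel dims added for PyTorch).
fmap = torch.tensor([[1., 3., 0., 2.],
                     [2., 4., 1., 1.],
                     [5., 6., 2., 0.],
                     [7., 8., 3., 4.]]).reshape(1, 1, 4, 4)

print(nn.MaxPool2d(2)(fmap))          # max of each 2x2 window
print(nn.AvgPool2d(2)(fmap))          # average of each 2x2 window
print(nn.AdaptiveAvgPool2d(1)(fmap))  # global pooling: one value per map
```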

7.) Explain stride Convolution with example.

Stride is the number of pixels by which the filter moves (or slides) across the input image during the convolution operation in a CNN.

• Stride = 1: The filter moves one pixel at a time → output is larger.

• Stride > 1: The filter moves more pixels at a time → output is smaller.

Stride controls how much the spatial dimensions (width and height) of the output feature map are reduced.

📌 Formula for Output Size (No padding):

Output size = (Input size − Filter size) / Stride + 1

📊 Example:

Suppose:

• Input size = 5×5

• Filter size = 3×3

• Stride = 1

Output size = (5 − 3)/1 + 1 = 3 → output is 3×3

Now with Stride = 2:

Output size = (5 − 3)/2 + 1 = 2 → output is 2×2

🧠 Conclusion:

• Higher stride = Smaller output = Faster computation

• Lower stride = Larger output = More detail
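
The same arithmetic can be verified with a naive NumPy convolution; this sketch is for illustration only (real frameworks use much faster implementations):

```python
import numpy as np

def conv2d(image, kernel, stride):
    """Naive 2D convolution (no padding) to illustrate the stride."""
    n, f = image.shape[0], kernel.shape[0]
    out = (n - f) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            result[i, j] = np.sum(patch * kernel)  # element-wise multiply + sum
    return result

image = np.arange(25, dtype=float).reshape(5, 5)  # a 5x5 "image"
kernel = np.ones((3, 3))                          # a simple 3x3 filter

print(conv2d(image, kernel, stride=1).shape)      # (3, 3)
print(conv2d(image, kernel, stride=2).shape)      # (2, 2)
```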

8.)Explain Local response normalization and need of it.


Local Response Normalization (LRN) is a technique used in Convolutional Neural Networks (CNNs) to normalize the output of
neurons.

It was introduced in the AlexNet architecture and works on the idea of “lateral inhibition”, where neurons with high activation suppress
nearby neurons.

🧠 Why is LRN Needed? (Purpose):

1. Improves Generalization:

o Helps the model perform better on unseen data.

2. Highlights Strong Activations:

o Encourages only the most strongly activated neurons to pass through, which improves feature learning.

3. Reduces Overfitting:

o Acts as a regularization technique similar to dropout.

4. Stabilizes Training:

o Keeps activations within a reasonable range, avoiding extreme values.

🔧 How LRN Works (Simple Explanation):

• For each neuron, LRN divides its activation by the sum of squared activations of neighboring neurons.

• This makes strong activations stand out more and weak ones fade.

b^i_{x,y} = a^i_{x,y} / ( k + α · Σ_{j = max(0, i−n/2)}^{min(N−1, i+n/2)} (a^j_{x,y})² )^β

where a^i_{x,y} is the activation of the i-th kernel at position (x, y), N is the total number of kernels, and k, n, α, β are hyperparameters (AlexNet used k = 2, n = 5, α = 10⁻⁴, β = 0.75).

(You can skip the formula in exams if not needed)

📌 Summary:

• LRN helps in emphasizing useful features and suppressing less important ones.

• It’s mostly used in older models like AlexNet.

• In modern CNNs, Batch Normalization is more commonly used instead of LRN.
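
For reference, a NumPy sketch of cross-channel LRN following the formula above, using the AlexNet hyperparameter values (the input shape and values are illustrative):

```python
import numpy as np

def lrn(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Local Response Normalization across channels, as in AlexNet.
    a has shape (channels, height, width)."""
    N = a.shape[0]
    b = np.empty_like(a)
    for i in range(N):
        # Sum squared activations of the neighboring feature maps.
        lo, hi = max(0, i - n // 2), min(N - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

activations = np.random.rand(8, 4, 4)    # 8 feature maps of size 4x4
print(lrn(activations).shape)            # (8, 4, 4)
```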

9.) Explain ReLU Layer and its advantages

ReLU Layer Functionality:

• Applies ReLU function element-wise to the input.

• Helps the network learn complex patterns by introducing non-linearity.

• Placed after convolution or fully connected layers.

🌟 Advantages of ReLU:

1. Simple and Fast:

o Easy to compute and increases training speed.

2. Solves Vanishing Gradient Problem:

o Unlike sigmoid/tanh, ReLU doesn't squash gradients for positive values.

3. Sparse Activation:

o Only some neurons activate (non-zero output), which makes the network efficient.

4. Better Performance:

o Works well in deep networks and helps achieve high accuracy.
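
A quick NumPy illustration of the sparsity point: with inputs distributed symmetrically around zero, roughly half of the ReLU outputs are exactly zero:

```python
import numpy as np

pre_activations = np.random.randn(10000)   # symmetric around 0
post = np.maximum(0, pre_activations)      # ReLU
print(np.mean(post == 0))                  # ~0.5 -- about half the neurons are inactive
```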

10.) Explain Pooling layers and its types with examples.

A Pooling Layer is used in Convolutional Neural Networks (CNNs) to reduce the spatial size (width and height) of feature maps.
It helps to:

• Reduce computation and memory usage

• Prevent overfitting

• Retain important features while ignoring minor details

🔄 Types of Pooling:

1. Max Pooling

• Selects the maximum value from the pooling window.

• Keeps the most important feature.

🧠 Example:

From this 2×2 window:

[1 3]
[2 4]  ⇒ Max Pooling = 4

2. Average Pooling

• Takes the average of all values in the pooling window.

• Produces a smoother, generalized output.

🧠 Example:

From this 2×2 window:

[2 4]
[6 8]  ⇒ Average Pooling = (2 + 4 + 6 + 8) / 4 = 5

3. Global Pooling

• Applies pooling over the entire feature map.

• Converts the feature map into a single value per channel.

• Often used before the final output layer in classification.
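
A one-line sketch of global average pooling in PyTorch, a common way to get one value per feature map before the classifier (shapes are illustrative):

```python
import torch
import torch.nn as nn

fmaps = torch.randn(1, 32, 7, 7)        # 32 feature maps of size 7x7
gap = nn.AdaptiveAvgPool2d(1)           # global average pooling
print(gap(fmaps).shape)                 # torch.Size([1, 32, 1, 1]) -- one value per map
```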

11.) What are the applications of Convolution with examples?

Applications of Convolution:

Convolution is a key operation in Convolutional Neural Networks (CNNs) and is widely used in image and signal processing tasks. It
helps extract important features from data like edges, textures, and patterns.

🌟 1. Image Classification

• Convolution helps identify features like edges, colors, and shapes.

• These features are used to classify objects in an image.

Example:
Classifying an image as a cat or dog.

🌟 2. Object Detection

• Convolution is used to detect specific objects in an image.

• Helps in drawing bounding boxes around detected objects.

Example:
Detecting cars and pedestrians in self-driving car cameras.

🌟 3. Face Recognition

• Extracts facial features like eyes, nose, and mouth using convolution layers.

• Used in mobile phone unlock systems.

Example:
Face ID in smartphones.

🌟 4. Medical Image Analysis

• Detects diseases or abnormalities in scans like MRI, CT, or X-rays.

Example:
Detecting tumors in brain scans using CNNs.

🌟 5. Feature Extraction in NLP

• 1D convolution is used on text data to extract meaningful features.

Example:
Sentiment analysis of a product review using CNNs.

📌 Summary:

Application          | Example
---------------------|--------------------------------
Image Classification | Cat vs. Dog
Object Detection     | Car detection in videos
Face Recognition     | Phone Face Unlock
Medical Imaging      | Tumor detection in MRI
Text Analysis (NLP)  | Sentiment detection in reviews

12.) Explain Pooling Layer with its need and different types.

A Pooling Layer is used in CNNs to reduce the size (dimensions) of feature maps while keeping the most important information.

It performs a downsampling operation after convolution layers.

📌 Need for Pooling Layer:

1. Reduces Computation – Smaller data means faster processing.

2. Prevents Overfitting – By reducing the complexity of the model.

3. Retains Key Features – Keeps important patterns and removes noise.

4. Provides Translation Invariance – Helps model recognize features even if they shift slightly in the image.

🔄 Types of Pooling:

1. Max Pooling

• Takes the maximum value in a pooling window.

• Captures the strongest features.

🧠 Example:
From [2, 3; 5, 1] → Max = 5

2. Average Pooling

• Takes the average of all values in the window.

• Produces a smoother feature map.

🧠 Example:
From [2, 4; 6, 8] → Average = 5

3. Global Pooling

• Takes one value per feature map (e.g., max or average of entire map).

• Used before the final classification layer.

📋 Summary Table:

Type            | What it does        | Usefulness
----------------|---------------------|---------------------------
Max Pooling     | Picks highest value | Highlights strong features
Average Pooling | Takes average value | Smoothens features
Global Pooling  | One value per map   | Used before final output

13.) Draw and explain CNN (Convolutional Neural Network) architecture in detail.
CNN is a deep learning model used mainly for image classification, object detection, and pattern recognition. It mimics how humans
recognize visual patterns.

🧠 Main Layers in CNN Architecture:

1. Input Layer

o Takes input image (e.g., 28×28 pixels, RGB = 3 channels).

o Input shape: (Height × Width × Channels)

2. Convolutional Layer

o Applies filters (kernels) to extract features like edges or textures.

o Outputs feature maps.

3. ReLU Layer (Activation)

o Applies non-linearity using the ReLU function.

o Helps the network learn complex patterns.

4. Pooling Layer

o Reduces the size of the feature maps.

o Types: Max Pooling or Average Pooling.

5. Fully Connected (FC) Layer

o Flattens the output and connects to dense layers.

o Used for final decision-making (like classification).

6. Output Layer

o Uses Softmax or Sigmoid activation for final output (e.g., class labels).

🖼️ CNN Architecture Diagram:

[Input Image]
      ↓
[Convolutional Layer]
      ↓
[ReLU Activation]
      ↓
[Pooling Layer]
      ↓
[Convolutional + ReLU + Pooling] ← (repeated multiple times)
      ↓
[Flatten Layer]
      ↓
[Fully Connected Layer]
      ↓
[Output Layer]

📌 Example Use Case:

• For handwritten digit classification (0-9), CNN can take an image and output the predicted digit.
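
The diagram above maps one-to-one onto a PyTorch Sequential model; this is an illustrative sketch, with sizes assumed for a 28×28 grayscale input and 10 output classes:

```python
import torch.nn as nn

# The diagram expressed as code (sizes are illustrative assumptions).
cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # conv + ReLU + pool
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # repeated block
    nn.Flatten(),                                                 # flatten layer
    nn.Linear(32 * 7 * 7, 10),                                    # fully connected
    nn.Softmax(dim=1),  # output layer; in practice the softmax is
                        # often folded into the loss (CrossEntropyLoss)
)
```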

14.) Explain ReLU Layer in detail. What are the advantages of ReLU over Sigmoid?

ReLU (Rectified Linear Unit) is an activation function used in CNNs and deep learning models to introduce non-linearity.

The ReLU function is defined as:


ReLU(x) = max(0, x)

This means:

• If x > 0, output is x

• If x ≤ 0, output is 0

🔄 ReLU Layer in CNN:

• ReLU is applied after a convolution operation.

• It replaces all negative values in the feature map with zero.

• This helps the model to focus only on the important (positive) signals.

🌟 Advantages of ReLU over Sigmoid:

Feature             | ReLU                                | Sigmoid
--------------------|-------------------------------------|----------------------------------
Output Range        | 0 to ∞                              | 0 to 1
Computation Speed   | Very fast                           | Slower (involves exponentiation)
Gradient Behavior   | No vanishing gradient for x > 0     | Suffers from vanishing gradient
Sparsity            | Produces sparse outputs (efficient) | Dense outputs
Training Efficiency | Speeds up convergence               | Slower learning
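
The vanishing-gradient row of the table can be checked numerically; a small NumPy sketch (values illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

# Sigmoid gradient: sigma(x) * (1 - sigma(x)); at most 0.25, and nearly
# zero for large |x| -- the vanishing gradient problem.
s = sigmoid(x)
print(s * (1 - s))             # [~0.00005, 0.197, 0.25, 0.197, ~0.00005]

# ReLU gradient: exactly 1 for every positive input, so gradients do
# not shrink as they flow back through deep networks.
print((x > 0).astype(float))   # [0. 0. 0. 1. 1.]
```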

📌 Conclusion:

• ReLU is the most commonly used activation function in CNNs due to its simplicity, efficiency, and better performance.

• It helps in building deep networks without major issues like vanishing gradients.

15.) Explain all the features of pooling layer.

A Pooling Layer is used in Convolutional Neural Networks (CNNs) to reduce the dimensions of feature maps while keeping the most
important features.

It helps make the network faster, simpler, and more robust.

🌟 Main Features of Pooling Layer:

🔹 1. Dimensionality Reduction

• Reduces the height and width of feature maps.

• Makes the model lighter and faster.

🔹 2. Prevents Overfitting

• By reducing the number of parameters, pooling helps avoid overfitting in deep networks.

🔹 3. Retains Important Features

• Keeps key information (like edges or patterns) while removing less important data.

🔹 4. Translation Invariance

• Pooling helps the network recognize features even when they shift slightly in the image.

• Makes the model more robust to position changes.

🔹 5. Types of Pooling

• Max Pooling: Takes the maximum value in the window.

• Average Pooling: Takes the average of the values.

• Global Pooling: Takes one value per feature map.

🔹 6. Fixed Filter (No Learning)

• Unlike convolution layers, pooling uses a fixed filter (like 2x2), and it does not learn weights.

📌 Example:

For a 2×2 max pooling window:


From
[1 3]
[2 4]  ⇒ Max Pooling = 4

16.) Explain Dropout Layer in Convolutional Neural Network

The Dropout Layer is a regularization technique used in CNNs to prevent overfitting.

During training, random neurons are turned off (dropped) with a certain probability (e.g., 0.5), meaning their output is set to zero
temporarily.

🔧 How Dropout Works:

• At each training step, dropout randomly disables a set of neurons.

• These dropped neurons do not participate in forward or backward passes.

• During testing/inference, all neurons are used, but their outputs are scaled to match the training phase.

🌟 Benefits of Dropout Layer:

1. Prevents Overfitting

o Forces the model to not rely on specific neurons too much.

2. Improves Generalization

o Helps the model perform better on unseen data.

3. Works Like Model Averaging

o Acts like training multiple different neural networks and averaging their results.

📌 Typical Use:

• Dropout is usually applied after fully connected (dense) layers in CNNs.

• Dropout rate (commonly 0.2 to 0.5) defines the fraction of neurons to drop.

📊 Example:

If dropout rate = 0.5


→ 50% of the neurons will be randomly deactivated during training.
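
For contrast with the framework example earlier, here is a from-scratch NumPy sketch of the "inverted" dropout variant used by most modern libraries, which rescales at training time rather than at test time:

```python
import numpy as np

def dropout_forward(x, rate=0.5, training=True):
    """'Inverted' dropout: scale survivors at training time so no
    scaling is needed at test time (a common implementation choice)."""
    if not training:
        return x                          # all neurons used at inference
    mask = np.random.rand(*x.shape) >= rate
    return x * mask / (1.0 - rate)        # drop and rescale survivors

x = np.ones(10)
print(dropout_forward(x, rate=0.5))        # ~half zeros, survivors = 2.0
print(dropout_forward(x, training=False))  # unchanged at test time
```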

17.) Explain working of Convolution Layer with its features.

A Convolution Layer is the core layer of a CNN (Convolutional Neural Network).


It is used to extract features like edges, corners, and textures from an input image.

⚙️ Working of Convolution Layer:

1. A small matrix called a filter or kernel (e.g., 3×3) slides over the input image.

2. At each position, it performs element-wise multiplication with the input values.

3. The results are summed up to get one value.

4. This process creates a feature map (also called an activation map).

5. The filter moves across the image using a defined stride.

📌 Mathematical operation:

Feature map = Filter ∗ Input image

🌟 Features of Convolution Layer:

Feature                     | Description
----------------------------|-----------------------------------------------------------------------
Local Connectivity          | Each neuron is connected to a local region of the input, not the whole image.
Weight Sharing              | The same filter is used across the image, reducing the number of parameters.
Translation Invariance      | Can detect the same feature even if it appears in different positions.
Multiple Filters            | Each filter detects different features (e.g., edges, corners, patterns).
Preserves Spatial Structure | Maintains the spatial relationship between pixels.

📌 Example:
If a 3×3 filter is applied to a 5×5 image with stride 1, it slides over the image and generates a 3×3 feature map showing where the filter pattern
is detected.
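
A NumPy sketch of this exact example, using an illustrative vertical-edge filter on a 5×5 image that contains a vertical edge:

```python
import numpy as np

# A 5x5 image with a vertical edge down the middle (left dark, right bright).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# A classic vertical-edge filter (an illustrative choice, not from the notes).
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

# Slide the 3x3 filter over the 5x5 image with stride 1 -> 3x3 feature map.
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out)   # strong responses in the columns where the edge lies
```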
