Deep Learning Module-04
Convolutional Networks
1. Definition of Convolution
• Definition: A mathematical operation that slides a small filter (kernel) over the input, computing a weighted sum at each position to produce a feature map.
• Purpose: Captures important patterns and structures in the input data, crucial for tasks like
image recognition.
2. Mathematical Formulation
• For a 2D input I and a 2D kernel K, the convolution is defined as:
S(i, j) = (I * K)(i, j) = Σ_m Σ_n I(m, n) · K(i − m, j − n)
• Most deep learning libraries actually implement the closely related cross-correlation, which skips flipping the kernel:
S(i, j) = Σ_m Σ_n I(i + m, j + n) · K(m, n)
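To make the formulation concrete, here is a minimal NumPy sketch of the cross-correlation form with a "valid" output (no padding); the helper name conv2d is illustrative, not from any particular library:

    import numpy as np

    def conv2d(image, kernel):
        # Valid cross-correlation: slide the kernel over every position
        # where it fits entirely inside the image.
        ih, iw = image.shape
        kh, kw = kernel.shape
        out = np.zeros((ih - kh + 1, iw - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)
    kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # simple vertical-edge filter
    print(conv2d(image, kernel).shape)         # (3, 3) feature map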
3. Parameters of Convolution
a. Stride
• Definition: The number of pixels the filter moves over the input.
• Types:
o Stride of 1: Filter moves one pixel at a time, preserving most of the spatial resolution.
o Stride of 2: Filter moves two pixels at a time, reducing output size (downsampling).
b. Padding
• Definition: Adding extra pixels around the border of the input image.
• Types (the output-size arithmetic is sketched after this list):
o Valid Padding: No padding applied; results in a smaller output feature map.
o Same Padding: Padding applied to maintain the same output dimensions as the
input.
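The effect of stride and padding on the output size follows a single formula; a small Python sketch (function name illustrative):

    def conv_output_size(n, f, p, s):
        # Output size for input n, filter f, padding p, stride s.
        return (n + 2 * p - f) // s + 1

    print(conv_output_size(32, 5, 0, 1))  # valid padding: 28
    print(conv_output_size(32, 5, 2, 1))  # same padding (p = (f-1)//2): 32
    print(conv_output_size(32, 5, 2, 2))  # stride 2 downsamples: 16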
Pooling
1. Purpose of Pooling
• Reduces the spatial dimensions of the feature maps, lowering computation and providing a degree of translation invariance.
2. Types of Pooling
a. Max Pooling
• Definition: Selects the maximum value from each patch (sub-region) of the feature map.
• Purpose: Captures the most prominent features while reducing spatial dimensions.
b. Average Pooling
• Definition: Takes the average value from each patch of the feature map.
• Purpose: Produces a smoother, averaged summary of each patch, retaining its overall context.
3. Operation of Pooling
• A small window (e.g., 2x2) slides over the feature map with a given stride, and each window is replaced by a single summary value (the maximum or the average), as sketched below.
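A minimal NumPy sketch of 2x2 max pooling with stride 2 (function name illustrative):

    import numpy as np

    def max_pool2d(x, size=2, stride=2):
        # Each window is replaced by its maximum value.
        h, w = x.shape
        oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
        return out

    x = np.array([[1, 3, 2, 4],
                  [5, 6, 1, 2],
                  [7, 2, 9, 0],
                  [4, 8, 3, 5]], dtype=float)
    print(max_pool2d(x))
    # [[6. 4.]
    #  [8. 9.]]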
4. Significance in Neural Networks
• Feature Extraction: Reduces the size of the feature maps while retaining the most relevant features.
• Efficiency: Decreases computational load, allowing deeper networks to train faster.
• Robustness: Provides a degree of invariance to small translations in the input, making the
model more robust.
Convolution and Pooling as Infinitely Strong Priors
1. Convolution as an Infinitely Strong Prior
• Focus on Local Patterns: Emphasizes the importance of local patterns in the data (e.g., edges and textures) over global patterns.
• Weight Constraints: Acts as an infinitely strong prior that forces weights outside each unit's small receptive field to zero and requires weights to be shared across positions.
2. Pooling as an Infinitely Strong Prior
• Imposes the prior that each unit should be invariant to small translations: the exact position of a feature matters less than its presence.
3. Significance in Neural Networks
• Feature Learning: Both operations prioritize local features, enabling efficient learning of essential characteristics from input data.
• Improved Generalization: The combination of convolution and pooling enhances the
model's ability to generalize across various input variations.
Variants of the Basic Convolution Function
1. Dilated Convolutions
• Definition: Inserts gaps (set by the dilation rate) between kernel elements, expanding the receptive field without adding parameters.
• Wider Context: Allows the model to incorporate a wider context of the input data without
significantly increasing the number of parameters.
• Applications: Useful in tasks where understanding broader spatial relationships is
important, such as in semantic segmentation.
2. Depthwise Separable Convolutions
• Two-Stage Process (see the sketch after this list):
o Depthwise Convolution: Applies a separate convolution for each input channel, reducing computational complexity.
o Pointwise Convolution: Uses 1x1 convolutions to combine the outputs from the depthwise convolution.
• Parameter Efficiency: Reduces the number of parameters and computations compared to
standard convolutions while maintaining performance.
• Applications: Commonly used in lightweight models, such as MobileNets, for mobile and
edge devices.
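A short PyTorch sketch of both variants, assuming PyTorch is available (layer sizes are illustrative):

    import torch
    import torch.nn as nn

    in_ch, out_ch = 32, 64
    x = torch.randn(1, in_ch, 56, 56)

    # Dilated convolution: dilation=2 inserts gaps between kernel taps,
    # widening the receptive field with no extra parameters.
    dilated = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=2, dilation=2)

    # Depthwise separable convolution as the two-stage process above:
    depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
    pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # 1x1 channel mixing

    print(dilated(x).shape)               # torch.Size([1, 32, 56, 56])
    print(pointwise(depthwise(x)).shape)  # torch.Size([1, 64, 56, 56])

For these sizes the separable pair uses about 32·9 + 32·64 ≈ 2.3k weights versus 32·64·9 ≈ 18.4k for a standard 3x3 convolution (ignoring biases), which illustrates the parameter efficiency noted above.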
Structured Outputs
1. Definition
• Structured Outputs: Refers to tasks where the output has a specific structure or spatial arrangement, such as pixel-wise predictions in image segmentation or keypoint localization in object detection.
2. Importance of Spatial Structure
• Maintaining Spatial Structure: For tasks like semantic segmentation, it's crucial to maintain the spatial relationships between pixels in predictions so that the output accurately represents the original input image.
3. Specialized Networks
• Fully Convolutional Networks: Architectures such as Fully Convolutional Networks (FCNs) are designed to handle structured outputs by replacing fully connected layers with convolutional layers, allowing for spatially consistent predictions.
• Skip Connections: Techniques like skip connections (used in U-Net and ResNet) help preserve high-resolution features from earlier layers, improving the accuracy of the output.
4. Adjusted Loss Functions
o Pixel-wise Loss: Evaluating the loss on a per-pixel basis (e.g., Cross-Entropy Loss
for segmentation).
o Overlap-based Loss: Metrics such as Dice Loss or Intersection over Union (IoU), which consider the overlap between predicted and true regions (a minimal IoU sketch follows).
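A minimal sketch of IoU for binary masks (illustrative helper, not a specific library API):

    import numpy as np

    def iou(pred, target):
        # Intersection over Union for binary masks (arrays of 0s and 1s).
        pred, target = pred.astype(bool), target.astype(bool)
        union = np.logical_or(pred, target).sum()
        if union == 0:
            return 1.0  # both masks empty: perfect agreement by convention
        return np.logical_and(pred, target).sum() / union

    pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
    target = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
    print(iou(pred, target))  # 2 overlapping pixels / 4 in union = 0.5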
5. Applications
• Use Cases: Structured output networks are widely used in various applications, including:
o Semantic Segmentation: Assigning a class label to every pixel in an image.
o Object Detection: Predicting bounding boxes and class labels for objects in an image while maintaining spatial relations.
Data Types
1. 2D Images
• Standard Input: The most common input type for CNNs, typically used in image
classification, object detection, and segmentation tasks.
• Format: Represented as height × width × channels (e.g., RGB images have three channels).
2. 3D Data
• Definition: Includes video processing and volumetric data, such as those found in medical
imaging (e.g., MRI or CT scans).
• Applications: Useful in tasks like action recognition in videos or analyzing 3D medical
images for diagnosis.
3. 1D Data
• Definition: Sequential data such as audio signals or time series, represented as length × channels.
• Applications: Used in tasks like speech recognition, audio classification, and analyzing sensor data from IoT devices.
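The three data types map directly onto 1D, 2D, and 3D convolutions; a PyTorch shape sketch, assuming PyTorch is available (sizes illustrative):

    import torch
    import torch.nn as nn

    # 1D: (batch, channels, length) -- e.g., an audio waveform
    x1 = torch.randn(1, 1, 16000)
    print(nn.Conv1d(1, 8, kernel_size=3)(x1).shape)   # [1, 8, 15998]

    # 2D: (batch, channels, height, width) -- e.g., an RGB image
    x2 = torch.randn(1, 3, 224, 224)
    print(nn.Conv2d(3, 8, kernel_size=3)(x2).shape)   # [1, 8, 222, 222]

    # 3D: (batch, channels, depth, height, width) -- e.g., video or a CT volume
    x3 = torch.randn(1, 1, 16, 112, 112)
    print(nn.Conv3d(1, 8, kernel_size=3)(x3).shape)   # [1, 8, 14, 110, 110]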
Efficient Convolution Algorithms
1. Fast Fourier Transform (FFT)
• Definition: An efficient algorithm that computes the discrete Fourier transform (DFT) and its inverse, converting signals between the time (or spatial) domain and the frequency domain.
• Use in Convolution: By the convolution theorem, convolution in the spatial domain becomes pointwise multiplication in the frequency domain, so large convolutions can be computed faster via the FFT.
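SciPy exposes FFT-based convolution directly; a small sketch comparing it against direct convolution (assuming SciPy is installed):

    import numpy as np
    from scipy.signal import convolve2d, fftconvolve

    image = np.random.rand(256, 256)
    kernel = np.random.rand(7, 7)

    direct = convolve2d(image, kernel, mode='valid')   # direct spatial convolution
    viafft = fftconvolve(image, kernel, mode='valid')  # multiply in frequency domain

    print(np.allclose(direct, viafft))  # True: same result, faster for large kernels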
2. Winograd's Algorithms
• Efficiency Improvement:
o Winograd's algorithms work by rearranging the computation of convolution to minimize redundant calculations.
o They can reduce the complexity of convolution operations, particularly for small kernels, making them more efficient in terms of computational resources.
• Key Concepts:
o The algorithms break down the convolution operation into smaller components and reuse intermediate results, reducing the number of multiplications required.
Random or Unsupervised Features
1. Random Features
• Definition: A technique that uses random projections to map input data into a higher-dimensional space, facilitating the extraction of features without the need for labels.
• Purpose: Helps to approximate kernel methods, enabling linear models to learn complex functions.
• Advantages:
o Scalability: Suitable for large datasets as it allows for faster training times.
• Applications: Commonly used in tasks where labeled data is scarce, such as clustering and
anomaly detection.
2. Autoencoders
• Definition: A type of neural network designed to learn efficient representations of data through unsupervised learning, by encoding the input into a lower-dimensional space and then reconstructing it back.
• Structure:
o Encoder: Compresses the input into a lower-dimensional latent representation.
o Decoder: Reconstructs the input from the latent representation.
• Purpose: Learns to capture important features and structures in the data without requiring labels.
• Advantages:
o Robustness: Can learn from noisy data and still produce meaningful
representations.
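A minimal PyTorch sketch of the encoder/decoder structure, assuming flattened 28x28 inputs (all sizes illustrative):

    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, latent_dim=32):
            super().__init__()
            # Encoder: compress 784 inputs down to a small latent code.
            self.encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                                         nn.Linear(128, latent_dim))
            # Decoder: reconstruct the input from the latent code.
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                         nn.Linear(128, 784), nn.Sigmoid())

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = Autoencoder()
    x = torch.rand(16, 784)                     # a batch of flattened images
    loss = nn.functional.mse_loss(model(x), x)  # reconstruction loss, no labels
    loss.backward()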
• Role in Unsupervised Learning: Both methods enable the extraction of meaningful
features from unlabelled data, facilitating learning in scenarios where obtaining labeled
data is challenging or expensive.
Notable Architectures
1. LeNet-5
• Introduction:
o One of the first convolutional networks, designed by Yann LeCun and colleagues (1998) specifically for image recognition tasks such as handwritten digit recognition.
• Architecture Details:
o Input: 32x32 grayscale image.
o Convolutional Layer 1:
▪ 6 filters (5x5).
▪ Output size: 28x28x6.
o Pooling Layer 1:
▪ 2x2 average pooling; output size: 14x14x6.
o Convolutional Layer 2:
▪ 16 filters (5x5).
▪ Output size: 10x10x16.
o Pooling Layer 2:
▪ 2x2 average pooling; output size: 5x5x16.
o Fully Connected Layers:
▪ 120 neurons, then 84 neurons, then a 10-neuron output layer.
• Significance:
o Introduced the concept of using convolutional layers for feature extraction followed
by pooling layers for dimensionality reduction.
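A LeNet-5-style sketch in PyTorch following the sizes above (ReLU and max pooling are substituted here for the original tanh and average pooling, for brevity):

    import torch
    import torch.nn as nn

    lenet = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5),   # 32x32x1 -> 28x28x6
        nn.ReLU(),
        nn.MaxPool2d(2),                  # -> 14x14x6
        nn.Conv2d(6, 16, kernel_size=5),  # -> 10x10x16
        nn.ReLU(),
        nn.MaxPool2d(2),                  # -> 5x5x16
        nn.Flatten(),
        nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
        nn.Linear(120, 84), nn.ReLU(),
        nn.Linear(84, 10),                # 10 output classes
    )

    x = torch.randn(1, 1, 32, 32)
    print(lenet(x).shape)  # torch.Size([1, 10])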
2. AlexNet
• Introduction:
o Developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton; won the ImageNet (ILSVRC) competition in 2012 by a large margin.
• Architecture Details:
o Convolutional Layer 1:
▪ 96 filters (11x11), stride 4.
o Activation Function: ReLU, introduced to improve training speed.
o Pooling Layer 1:
▪ 3x3 max pooling, stride 2.
o Convolutional Layer 2:
▪ 256 filters (5x5).
o Pooling Layer 2:
▪ 3x3 max pooling, stride 2.
o Convolutional Layer 3:
▪ 384 filters (3x3).
o Convolutional Layer 4:
▪ 384 filters (3x3).
o Convolutional Layer 5:
▪ 256 filters (3x3).
o Pooling Layer 3:
▪ 3x3 max pooling, stride 2; output size: 6x6x256.
o Fully Connected Layers:
▪ Two layers with 4096 neurons each.
▪ Output layer with 1000 neurons (for 1000 classes).
• Key Techniques:
o Dropout: Applied in the fully connected layers (rate 0.5) to reduce overfitting.
o Data Augmentation: Random crops, horizontal flips, and color perturbations used to enlarge the training set.
o GPU Utilization: Trained in parallel on two GPUs, making training on the full ImageNet dataset practical.
• Significance:
o Its 2012 ImageNet victory sparked widespread research and development in CNN architectures.
o Highlighted the importance of large labeled datasets and robust training techniques
in achieving state-of-the-art performance.