Deep Learning Module-04

The document covers key concepts in deep learning, specifically focusing on convolutional networks, including definitions, mathematical formulations, and parameters like stride and padding. It discusses the significance of convolution and pooling in neural networks for feature extraction and efficiency, as well as variants of convolution functions and notable architectures like LeNet-5 and AlexNet. Additionally, it highlights the importance of structured outputs, efficient algorithms, and techniques for unsupervised learning.

Uploaded by

sanjana sm

21CS743 | DEEP LEARNING

Module-04

Convolutional Networks

1. Definition of Convolution

• Convolution: A mathematical operation that combines two functions (an input signal or image and a filter or kernel) to produce a third function.

• Purpose: Captures important patterns and structures in the input data, which is crucial for tasks like image recognition.

2. Mathematical Formulation
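
For reference, the standard textbook definitions can be written as follows. The symbols x, w, s (1D signal, kernel, output) and I, K, S (2D image, kernel, feature map) are conventional choices assumed here, not notation taken from these notes.

```latex
% Continuous 1D convolution
\[ s(t) = (x * w)(t) = \int x(a)\, w(t - a)\, da \]

% Discrete 2D convolution of an image I with a kernel K
\[ S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m,\; j - n) \]

% Cross-correlation (no kernel flip), which many CNN libraries implement
% under the name "convolution"
\[ S(i, j) = \sum_{m} \sum_{n} I(i + m,\; j + n)\, K(m, n) \]
```

Because the kernel weights are learned, the flipped and unflipped forms lead to equivalent models in practice.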

3. Parameters of Convolution

a. Stride

• Definition: The number of pixels by which the filter shifts between successive positions on the input.

• Types:

o Stride of 1: Filter moves one pixel at a time, resulting in a detailed output.

o Stride of 2: Filter moves two pixels at a time, reducing output size (downsampling).

b. Padding

• Definition: Adding extra pixels around the input image.

• Types:

o Valid Padding: No padding applied; results in a smaller output feature map.

o Same Padding: Padding applied so that the output has the same spatial dimensions as the input (at a stride of 1).
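
The effect of stride and padding on output size can be checked with a small sketch. It assumes a square kernel, zero padding, and the usual output-size formula floor((n + 2p - k) / s) + 1; the function names `conv_output_size` and `conv2d` are illustrative, not part of any library.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Naive 2D cross-correlation with a square kernel and zero padding."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    # Zero-pad the input on all four sides.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = conv_output_size(h, k, stride, padding)
    out_w = conv_output_size(w, k, stride, padding)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Weighted sum over the k x k window starting at (i*stride, j*stride).
            for m in range(k):
                for n in range(k):
                    total += padded[i * stride + m][j * stride + n] * kernel[m][n]
            row.append(total)
        out.append(row)
    return out


image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]  # 6x6 ramp image
kernel = [[1.0, 0.0, -1.0]] * 3                                    # 3x3 horizontal-difference filter

valid = conv2d(image, kernel)               # valid padding, stride 1
same = conv2d(image, kernel, padding=1)     # same padding, stride 1
strided = conv2d(image, kernel, stride=2)   # valid padding, stride 2
print(len(valid), len(same), len(strided))  # 4 6 2
```

With a 6x6 input and a 3x3 kernel, valid padding shrinks the output to 4x4, a padding of 1 keeps it at 6x6, and a stride of 2 reduces it to 2x2.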

4. Significance in Neural Networks

• Application: Used in convolutional layers of CNNs to extract features from images.

• Learning Hierarchical Representations: Stacked convolutional layers enable learning of complex patterns, essential for image classification and other tasks.


1. Purpose of Pooling

• Spatial Size Reduction: Decreases the dimensions of the feature maps.

• Parameter and Computation Reduction: Reduces the number of parameters and computations in the layers that follow.

• Overfitting Control: Helps to control overfitting by providing a form of translational invariance.

2. Types of Pooling
a. Max Pooling

• Definition: Selects the maximum value from each patch (sub-region) of the feature map.

• Purpose: Captures the most prominent features while reducing spatial dimensions.

b. Average Pooling

• Definition: Takes the average value from each patch of the feature map.

• Purpose: Provides a smooth representation of features, reducing sensitivity to noise.


3. Operation of Pooling
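
As an illustration of how pooling operates, the sketch below slides a window over a feature map and keeps either the maximum or the mean of each patch. The 2x2 window with stride 2 is a common choice assumed here, and `pool2d` is a hypothetical helper name.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Apply max or average pooling to a 2D feature map (list of rows)."""
    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            # Collect the values inside the current size x size patch.
            patch = [fmap[i + m][j + n] for m in range(size) for n in range(size)]
            row.append(max(patch) if mode == "max" else sum(patch) / len(patch))
        out.append(row)
    return out


fmap = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
max_pooled = pool2d(fmap, mode="max")  # [[6, 4], [8, 9]]
avg_pooled = pool2d(fmap, mode="avg")  # [[3.75, 2.25], [4.5, 4.0]]
print(max_pooled, avg_pooled)
```

The 4x4 map is reduced to 2x2 in both cases; max pooling keeps the strongest response in each patch, while average pooling smooths it.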

4. Significance in Neural Networks

• Feature Extraction: Reduces the size of the feature maps while retaining the most relevant features.

• Efficiency: Decreases computational load, allowing deeper networks to train faster.

• Robustness: Provides a degree of invariance to small translations in the input, making the model more robust.

1. Convolution as an Infinitely Strong Prior

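• Focus on Local Patterns: A convolutional layer behaves like a fully connected layer under a prior that forces each unit to depend only on a small local region and to share its weights across positions, which emphasizes local patterns in the data (e.g., edges and textures) over global patterns.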

• Effectiveness in CNNs: This locality assumption enhances the effectiveness of Convolutional Neural Networks (CNNs) for image and video analysis.

2. Pooling as an Infinitely Strong Prior

• Enhances Translational Invariance: Allows the network to recognize objects largely regardless of small shifts in their position within the image.

• Reduces Sensitivity to Position: By downsampling, pooling reduces sensitivity to the exact location of features, improving generalization.

3. Significance in Neural Networks

• Feature Learning: Both operations prioritize local features, enabling efficient learning of essential characteristics from input data.

• Improved Generalization: The combination of convolution and pooling enhances the
model's ability to generalize across various input variations.

Variants of the Basic Convolution Function

1. Dilated Convolutions

• Definition: Introduces spacing (dilation) between kernel elements.

• Wider Context: Allows the model to incorporate a wider context of the input data without
significantly increasing the number of parameters.

• Applications: Useful in tasks where understanding broader spatial relationships is
important, such as in semantic segmentation.
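
A 1D sketch makes the "wider context without more parameters" point concrete. It assumes the standard effective kernel size k + (k - 1)(d - 1) for a kernel of k taps and dilation d; the helper names are illustrative.

```python
def effective_kernel_size(k, dilation):
    """Span of input covered by a k-tap kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1D cross-correlation whose kernel taps are `dilation` samples apart."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for i in range(len(signal) - span + 1):
        out.append(sum(kernel[m] * signal[i + m * dilation] for m in range(len(kernel))))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # still only three weights in both cases

dense = dilated_conv1d(signal, kernel, dilation=1)    # each output sees 3 inputs
dilated = dilated_conv1d(signal, kernel, dilation=2)  # each output sees a span of 5 inputs
print(dense, dilated)  # [-2, -2, -2, -2, -2] [-4, -4, -4]
```

The same three weights cover a span of five samples at dilation 2, so the receptive field grows while the parameter count stays fixed.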

2. Depthwise Separable Convolutions

• Two-Stage Process:

o Depthwise Convolution: Applies a separate convolution for each input channel, reducing computational complexity.

o Pointwise Convolution: Uses 1x1 convolutions to combine the outputs from the depthwise convolution.

• Parameter Efficiency: Reduces the number of parameters and computations compared to
standard convolutions while maintaining performance.

• Applications: Commonly used in lightweight models, such as MobileNets, for mobile and
edge devices.
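
The parameter saving can be estimated with a quick count. The sketch below ignores biases and uses an example layer (3x3 kernel, 32 input channels, 64 output channels) chosen purely for illustration.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Depthwise (one k x k filter per input channel) plus pointwise (1x1) weights."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


standard = standard_conv_params(3, 32, 64)    # 18432
separable = separable_conv_params(3, 32, 64)  # 288 + 2048 = 2336
print(standard, separable, round(standard / separable, 2))
```

For this example layer the separable version needs roughly one eighth of the weights of the standard convolution.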

1. Definition of Structured Outputs

• Structured Outputs: Refers to tasks where the output has a specific structure or spatial
arrangement, such as pixel-wise predictions in image segmentation or keypoint localization
in object detection.

2. Importance in Semantic Segmentation

• Maintaining Spatial Structure: For tasks like semantic segmentation, it is crucial to maintain the spatial relationships between pixels in predictions to ensure that the output accurately represents the original input image.

3. Specialized Networks

• Network Design: Specialized neural network architectures, such as Fully Convolutional Networks (FCNs), are designed to handle structured outputs by replacing fully connected layers with convolutional layers, allowing for spatially consistent predictions.

• Skip Connections: Techniques like skip connections (used in U-Net and ResNet) help preserve high-resolution features from earlier layers, improving the accuracy of the output.

4. Adjusted Loss Functions

• Loss Function Modification: Loss functions may be adjusted to enforce structural consistency in the predictions. Common approaches include:

o Pixel-wise Loss: Evaluating the loss on a per-pixel basis (e.g., Cross-Entropy Loss
for segmentation).

o Structural Loss: Incorporating penalties for structural deviations, such as Dice Loss or Intersection over Union (IoU) metrics, which consider the overlap between predicted and true regions.
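
The overlap measures mentioned above can be sketched for binary masks as follows. The masks are flattened to lists of 0/1 values, the example data is made up, and treating two empty masks as a perfect match is an assumption of this sketch.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two flat binary masks of equal length."""
    intersection = sum(p * t for p, t in zip(pred, target))
    pred_area, target_area = sum(pred), sum(target)
    union = pred_area + target_area - intersection
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as perfect overlap (assumption)
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


pred = [1, 1, 0, 0, 1, 0]    # predicted foreground pixels
target = [1, 0, 0, 1, 1, 0]  # ground-truth foreground pixels

dice, iou = dice_and_iou(pred, target)
dice_loss = 1.0 - dice       # Dice loss is one minus the Dice coefficient
print(round(dice, 3), round(iou, 3), round(dice_loss, 3))  # 0.667 0.5 0.333
```

Unlike a per-pixel cross-entropy, both quantities depend on the overlap of whole regions, which is why they are often preferred when foreground pixels are rare.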


5. Applications

• Use Cases: Structured output networks are widely used in various applications, including:

o Semantic Segmentation: Assigning class labels to each pixel in an image.

o Instance Segmentation: Identifying and segmenting individual object instances within an image.

o Object Detection: Predicting bounding boxes and class labels for objects in an
image while maintaining spatial relations.

Data Types


1. 2D Images

• Standard Input: The most common input type for CNNs, typically used in image
classification, object detection, and segmentation tasks.

• Format: Represented as height × width × channels (e.g., RGB images have three channels).


2. 3D Data

• Definition: Includes video processing and volumetric data, such as those found in medical
imaging (e.g., MRI or CT scans).

• Format: Represented as depth × height × width × channels, where the depth axis is time (frames) for video or slices for volumetric scans, allowing the network to capture spatial and, for video, temporal information.

• Applications: Useful in tasks like action recognition in videos or analyzing 3D medical
images for diagnosis.

3. 1D Data

• Definition: Consists of sequential data, such as time-series data or audio signals.

• Format: Represented as sequences of data points, often one-dimensional.

• Applications: Used in tasks like speech recognition, audio classification, and analyzing sensor data from IoT devices.

Efficient Convolution Algorithms

1. Fast Fourier Transform (FFT)

• Definition: A mathematical algorithm that computes the discrete Fourier transform (DFT)
and its inverse, converting signals between time (or spatial) domain and frequency domain.

• Convolution in Frequency Domain:

o Convolution in the time or spatial domain can be transformed into multiplication in the frequency domain, which is often more computationally efficient for large kernels.


• Applications: Commonly used in applications requiring large kernel convolutions, such as in image processing and signal analysis.
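
The convolution theorem behind this approach can be demonstrated in a few lines. The sketch below uses a small recursive radix-2 FFT written from scratch for illustration (production code would call an optimized FFT library instead) and compares the result against a direct convolution.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Unscaled in both directions."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(a, b):
    """Full linear convolution via pointwise multiplication in the frequency domain."""
    size = len(a) + len(b) - 1
    n = 1
    while n < size:  # zero-pad to a power of two to avoid circular wrap-around
        n *= 2
    fa = fft(list(a) + [0] * (n - len(a)))
    fb = fft(list(b) + [0] * (n - len(b)))
    product = [p * q for p, q in zip(fa, fb)]
    result = fft(product, inverse=True)
    return [v.real / n for v in result[:size]]  # divide by n to finish the inverse


def direct_convolve(a, b):
    """Reference O(len(a) * len(b)) convolution."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


a = [1, 2, 3, 4]
b = [1, 0, -1]
via_fft = fft_convolve(a, b)
via_direct = direct_convolve(a, b)  # [1.0, 2.0, 2.0, 2.0, -3.0, -4.0]
print([round(v, 6) for v in via_fft])
```

For inputs this small the direct method is faster; the frequency-domain route pays off as the kernel grows, since its cost scales as n log n rather than with the product of the two lengths.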

2. Winograd's Algorithms

• Definition: A set of algorithms designed to optimize convolution operations by reducing the number of multiplications needed.

• Efficiency Improvement:

o Winograd's algorithms work by rearranging the computation of convolution to minimize redundant calculations.

o They can reduce the complexity of convolution operations, particularly for small kernels, making them more efficient in terms of computational resources.

• Key Concepts:

o The algorithms break down the convolution operation into smaller components, allowing for fewer multiplicative operations and leveraging addition and subtraction instead.

o They are particularly effective in scenarios where computational efficiency is critical, such as mobile devices or real-time applications.

• Applications: Frequently used in lightweight models and resource-constrained environments where computational power and memory usage are limited.
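
The smallest commonly cited case, usually written F(2, 3), shows the trade of multiplications for additions: two outputs of a 3-tap filter are computed with 4 multiplications instead of 6. The sketch below is a 1D illustration only; the function names are hypothetical, and in practice the filter-side sums would be precomputed once per filter.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over 4 inputs d, using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (can be precomputed once per filter).
    u1 = (g0 + g1 + g2) / 2.0
    u2 = (g0 - g1 + g2) / 2.0
    # The four multiplications.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * g2
    # Output transform uses only additions and subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference sliding-window result using 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


d = [1.0, 2.0, 3.0, 4.0]
g = [1.0, 0.0, -1.0]
fast = winograd_f23(d, g)
slow = direct_f23(d, g)
print(fast, slow)  # [-2.0, -2.0] [-2.0, -2.0]
```

Tiling this idea in two dimensions is what makes the approach attractive for the 3x3 kernels common in CNNs.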


1. Random Feature Maps

• Definition: A technique that uses random projections to map input data into a higher-dimensional space, facilitating the extraction of features without the need for labels.

• Purpose: Helps to approximate kernel methods, enabling linear models to learn complex
functions.

• Advantages:

o Efficiency: Reduces the computational burden of traditional kernel methods while retaining useful information.

o Scalability: Suitable for large datasets as it allows for faster training times.

• Applications: Commonly used in tasks where labeled data is scarce, such as clustering and
anomaly detection.
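
One well-known instance of this idea is random Fourier features, which approximate an RBF kernel with an explicit random feature map. The notes do not name a specific method, so this choice is an assumption; `make_rff`, the value of gamma, the feature count, and the test points are all illustrative.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Build a random feature map z with z(x).z(y) ~ exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # frequencies are drawn from N(0, 2 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def transform(x):
        return [
            norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return transform


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel, for comparison."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


gamma = 0.5
transform = make_rff(dim=2, n_features=5000, gamma=gamma)
x, y = [0.3, -0.2], [1.0, 0.5]

approx = sum(p * q for p, q in zip(transform(x), transform(y)))
exact = rbf_kernel(x, y, gamma)
print(round(exact, 3))  # 0.613; approx should be close to this value
```

A linear model trained on `transform(x)` then behaves approximately like a kernel machine, without ever forming the full kernel matrix.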

2. Autoencoders

• Definition: A type of neural network designed to learn efficient representations of data through unsupervised learning by encoding the input into a lower-dimensional space and then reconstructing it back.

• Structure:

o Encoder: Compresses the input data into a latent representation.

o Decoder: Reconstructs the original input from the latent representation.

• Purpose: Learns to capture important features and structures in the data without supervision, making it effective for dimensionality reduction and feature extraction.

• Advantages:

o Robustness: Can learn from noisy data and still produce meaningful
representations.


o Flexibility: Can be adapted for various tasks, including denoising, anomaly detection, and generative modeling.

• Applications: Used in scenarios such as image compression, data denoising, and generating new data samples.
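
The encoder/decoder structure can be illustrated with a deliberately tiny linear autoencoder that compresses 2D points to a single latent value and reconstructs them. Everything here is an assumption made for the sketch: the toy data (points on the line y = 2x), the absence of nonlinearities and biases, the initial weights, and the learning rate.

```python
def reconstruct(x, w, v):
    """Encode a 2D point to one latent value z, then decode it back to 2D."""
    z = sum(wi * xi for wi, xi in zip(w, x))  # encoder: 2D -> 1D
    return z, [vi * z for vi in v]            # decoder: 1D -> 2D


def mse(data, w, v):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        _, x_hat = reconstruct(x, w, v)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train(data, w, v, lr=0.02, steps=2000):
    """Full-batch gradient descent on the reconstruction error."""
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            z, x_hat = reconstruct(x, w, v)
            err = [a - b for a, b in zip(x_hat, x)]
            back = sum(e * vi for e, vi in zip(err, v))  # error propagated to z
            for i in range(2):
                grad_v[i] += 2 * err[i] * z / len(data)
                grad_w[i] += 2 * back * x[i] / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        v = [vi - lr * g for vi, g in zip(v, grad_v)]
    return w, v


# Toy data: 2D points that really only have one degree of freedom.
data = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w0, v0 = [0.1, 0.1], [0.1, 0.1]

initial_loss = mse(data, w0, v0)
w, v = train(data, w0, v0)
final_loss = mse(data, w, v)
print(initial_loss, final_loss)
```

No labels are used at any point: the input itself is the training target, and the single latent value ends up encoding the position along the line.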

3. Facilitation of Unsupervised Learning

• Role in Unsupervised Learning: Both methods enable the extraction of meaningful
features from unlabelled data, facilitating learning in scenarios where obtaining labeled
data is challenging or expensive.

• Enhancing Model Performance: By leveraging these techniques, models can improve their performance on downstream tasks, such as clustering, classification, or regression, even in the absence of labels.


Notable Architectures

1. LeNet-5

• Introduction:

o Developed by Yann LeCun and colleagues in 1998.

o One of the first convolutional networks designed specifically for image recognition
tasks.

• Architecture Details:

o Input Layer: Takes in grayscale images of size 32x32 pixels.

o Convolutional Layer 1:

▪ 6 filters (5x5) with a stride of 1.

▪ Output size: 28x28x6.

o Activation Function: Sigmoid or hyperbolic tangent (tanh).


o Pooling Layer 1:

▪ Average pooling (subsampling) with a 2x2 filter and a stride of 2.

▪ Output size: 14x14x6.

o Convolutional Layer 2:

▪ 16 filters (5x5).

▪ Output size: 10x10x16.

o Pooling Layer 2:

▪ Average pooling (2x2).

▪ Output size: 5x5x16.

o Fully Connected Layers:

▪ 120 neurons in the first layer.

▪ 84 neurons in the second layer.

▪ Output layer with 10 neurons (for digit classes 0-9).
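
A rough parameter count follows directly from the layer sizes listed above. The sketch makes simplifying assumptions that differ from the 1998 design: every filter in the second convolutional layer is connected to all 6 input maps, and the pooling layers carry no trainable parameters.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter for a kernel x kernel convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(n_in, n_out):
    """Weights plus one bias per output neuron for a fully connected layer."""
    return (n_in + 1) * n_out


layers = {
    "conv1": conv_params(5, 1, 6),         # 156
    "conv2": conv_params(5, 6, 16),        # 2416
    "fc1": dense_params(5 * 5 * 16, 120),  # 48120
    "fc2": dense_params(120, 84),          # 10164
    "output": dense_params(84, 10),        # 850
}
total = sum(layers.values())
print(layers, total)  # total is 61706
```

Under these assumptions the network has about 60 thousand parameters, most of them in the first fully connected layer.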

• Significance:

o Introduced the concept of using convolutional layers for feature extraction followed
by pooling layers for dimensionality reduction.

o Paved the way for modern CNNs, influencing later architectures.

2. AlexNet

• Introduction:

o Developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012.

o Marked a breakthrough in deep learning by winning the ImageNet (ILSVRC) 2012 competition.


• Architecture Details:

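o Input Layer: Accepts RGB images, stated as 224x224 pixels in the original paper; the 55x55 output of the first layer listed here corresponds to an effective input of 227x227 (or 224x224 with 2 pixels of padding).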

o Convolutional Layer 1:

▪ 96 filters (11x11) with a stride of 4.

▪ Output size: 55x55x96.

o Activation Function: ReLU, used to speed up training.

o Pooling Layer 1:

▪ Max pooling (3x3) with a stride of 2.

▪ Output size: 27x27x96.

o Convolutional Layer 2:

▪ 256 filters (5x5).

▪ Output size: 27x27x256.

o Pooling Layer 2:

▪ Max pooling (3x3).

▪ Output size: 13x13x256.

o Convolutional Layer 3:

▪ 384 filters (3x3).

▪ Output size: 13x13x384.

o Convolutional Layer 4:

▪ 384 filters (3x3).

▪ Output size: 13x13x384.


o Convolutional Layer 5:

▪ 256 filters (3x3).

▪ Output size: 13x13x256.

o Pooling Layer 3:

▪ Max pooling (3x3).

▪ Output size: 6x6x256.

o Fully Connected Layers:

▪ First layer with 4096 neurons.

▪ Second layer with 4096 neurons.

▪ Output layer with 1000 neurons (for 1000 classes).
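
The spatial sizes in the list above can be reproduced by applying the output-size formula floor((n + 2p - k) / s) + 1 layer by layer. The padding values in the table below are not stated in these notes; they are the commonly used settings and are assumptions of this sketch, as is the 227x227 starting size.

```python
def out_size(n, k, stride=1, padding=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1


# (name, kernel, stride, padding); padding values are assumed, not from the notes.
ALEXNET = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]


def trace(input_size):
    """Return (layer name, spatial size) after each layer."""
    sizes = []
    n = input_size
    for name, k, s, p in ALEXNET:
        n = out_size(n, k, s, p)
        sizes.append((name, n))
    return sizes


for name, n in trace(227):
    print(name, f"{n}x{n}")
# conv1 55x55, pool1 27x27, conv2 27x27, pool2 13x13,
# conv3 13x13, conv4 13x13, conv5 13x13, pool3 6x6
```

Starting from 224 without padding gives 54 rather than 55 after the first layer, which is why an effective input of 227x227 is usually assumed when checking these numbers.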

• Innovative Techniques Introduced:

o ReLU Activation:

▪ Enabled faster convergence during training compared to traditional activation functions like sigmoid or tanh.

o Dropout:

▪ Regularization method that randomly drops neurons during training to prevent overfitting, significantly improving generalization.

o Data Augmentation:

▪ Used techniques such as image translations (random crops), horizontal flips, and RGB intensity perturbations to artificially expand the training dataset and improve robustness.


o GPU Utilization:

▪ Leveraged parallel processing power of GPUs, enabling training on large datasets in a reasonable timeframe.

• Significance:

o Established deep learning as a powerful approach for image classification and sparked widespread research and development in CNN architectures.

o Highlighted the importance of large labeled datasets and robust training techniques
in achieving state-of-the-art performance.
