
Convolutional Neural Network

Dr. Thomas Abraham J V


SCOPE, VIT Chennai
1

Feature Selection in Image
• Suppose you have a dataset of images, and you want to classify whether the images contain
cats or dogs.

• In a traditional machine learning approach, you might use handcrafted features such as color
histograms, texture descriptors, and edge information as input to a classifier (e.g., Support
Vector Machine or Random Forest).

2
Shortcomings Of Traditional Feature Selection
• Loss of Spatial Information

• Issue: Traditional feature selection methods might discard important spatial information in
images. Features like color histograms or texture descriptors don't capture the spatial
relationships between pixels effectively.

• Example: The arrangement of pixels in a specific pattern that represents a cat's ear or a dog's
tail might not be adequately captured by traditional features.

• Limited Robustness to Variations

• Issue: Handcrafted features are often designed based on assumptions about the data
distribution. They may not be robust to variations in scale, rotation, or lighting conditions.

• Example: If the cat or dog images vary significantly in pose or lighting, manually selected
features may struggle to generalize across different scenarios.
3
Shortcomings Of Traditional Feature Selection
• Dependency on Expert Knowledge

• Issue: The selection of handcrafted features relies on domain expertise and may not adapt well to
diverse datasets.

• Example: Features designed for one dataset may not be as effective when applied to a different dataset
with distinct characteristics.

• Solution

• Transition to CNN (Convolutional Neural Network)

• CNNs automatically learn relevant features, leveraging the spatial relationships within images and
adapting to the complexities of the data.

• This approach is especially powerful in tasks where the structure and arrangement of features are
crucial, such as in computer vision applications.
4
Introduction to CNN

• A convolutional neural network is a class of deep, feed-forward artificial neural networks.

• CNNs, like neural networks, are made up of neurons with learnable weights and biases.

• A CNN has an input layer, a number of hidden layers, and an output layer.

• Computer vision through CNNs has several applications such as self-driving cars and
robotics.

5
Inspiration Behind CNN
• Hierarchical architecture

• Local connectivity

• Translation invariance

• Multiple feature maps

• Non-linearity

Source
Composing CNNs for complex tasks

7
Source
CNN vs RNN

8
Basic CNN Architecture

9
What are Features

10
What are Features

11
Feature Extraction
• Feature extraction refers to the process of transforming raw input data into a set of
meaningful features that are more representative and informative for solving a particular task
or problem. In the context of machine learning, including Convolutional Neural Networks
(CNNs), feature extraction involves identifying and selecting relevant features from the input
data that can be used for analysis, classification, or other purposes.

• Feature extraction is essential because raw input data, such as images, can contain vast
amounts of information, much of which may be redundant or irrelevant for a specific task.
By extracting relevant features, the model focuses on the most informative aspects of the
data, making learning more efficient and effective. The extracted features serve as inputs to
subsequent layers or classifiers for tasks like image classification, object detection,
segmentation, etc.

12
Receptive Field
There are two types of receptive fields:

• Local Receptive Field: the spatial area in the input data that a single neuron in a specific layer is sensitive to. In a convolutional layer, for instance, each neuron is associated with a small local receptive field defined by the size of the filter/kernel applied to the input data. This local receptive field represents the region of the input image that the neuron "sees" or is influenced by.

• Global Receptive Field: the entire spatial area of the input data that influences the output of a particular neuron in the network. It is the combined effect of all the local receptive fields from preceding layers that contribute to the activation of a specific neuron in deeper layers.
13
Components of CNN
• Convolutional layer

• Pooling or downsampling layer

• Flattening layer

• Fully connected layer

14
Architecture of the CNNs applied to digit recognition

15

Source
Components of CNN (contd)
• Convolutional Layers: These layers perform feature extraction by applying convolution
operations using learnable filters (also known as kernels) to the input data. Filters slide across
the input image, extracting features such as edges, textures, or shapes, preserving spatial
relationships.

• Pooling Layers: Pooling layers reduce the spatial dimensions of the feature maps generated
by convolutional layers. Common pooling operations include max pooling and average
pooling, which downsample the data, extracting the most relevant information while
reducing computational complexity.

• Fully Connected Layers: These layers integrate the features learned by the previous layers
and perform classification or regression tasks based on the extracted features. The output of
these layers represents the final prediction or decision made by the network.

16
Filters
• Weight matrix applied to extract local-region features from an image

• Many filters can be used to extract more features
• Typical image filter

17
Image checking

Given an image of X, how do we check that it is an X?

By extracting features.

18
Features of X in the image

19
Using Filters
filter1

20
filter2

21
filter3

22
Convolution operation

23
Edge Detection Algorithm

24
Image filtering

25
Filters in CNN
• In convolutional neural networks we don't design the filters by hand; we only specify the number of kernel filters in each convolutional layer.

• The values of the kernel filters are learned automatically by the network during training, so the kernels that produce the most useful features for the particular classification or detection task are learned rather than chosen (see the sketch below).

• The values of the kernel filters are the weights of the CNN, and those values are learned rather than decided.

• For a classification model, a CNN doesn't need to look at every pixel of the image; it looks abstractly at different parts of the object in the image. In a segmentation problem, however, the model needs to look at each and every pixel.

26
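To make this concrete, here is a minimal sketch (TensorFlow/Keras assumed, matching the code used later in this deck): we specify only the number and size of the kernels, and the framework creates their values as trainable weights.

import tensorflow as tf

# We choose only the filter count and size; the values themselves are learned.
conv = tf.keras.layers.Conv2D(filters=8, kernel_size=3)
conv.build(input_shape=(None, 32, 32, 3))   # allocate the weight tensors

kernels, biases = conv.get_weights()
print(kernels.shape)   # (3, 3, 3, 8): height x width x in_channels x n_filters
print(biases.shape)    # (8,)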
How Convolution Works

• The convolution of a 5x5 image and a 3x3 filter:

• Slide the 3x3 filter over the input image, element-wise multiply, and add the outputs (see the sketch below).
27
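A minimal NumPy sketch of this sliding-window operation (strictly speaking, the cross-correlation that CNN layers actually compute), assuming a 5x5 image and a 3x3 filter:

import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over every valid position, multiply element-wise, sum.
    h, w = kernel.shape
    out_h = image.shape[0] - h + 1
    out_w = image.shape[1] - w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.]] * 3)    # a simple vertical-edge filter
print(convolve2d(image, kernel))          # 3x3 output: 5 - 3 + 1 = 3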
Convolution Operation

28
(Slides 29-36: the convolution operation illustrated step by step; figures only.)
Sliding filter to extract local feature

Image Courtesy: https://towardsdatascience.com/convolutional-neural-network-in-natural-language-processing-96d67f91275c
37
Stride
• During convolution, the filter slides from left to right and from top to bottom until it has passed over the entire input image.

• The stride is the step size of the filter. So, when we want to downsample the input image and end up with a smaller output, we set the stride S > 1 (for example, S = 2).

38
Padding

39
(Slides 40-49: padding illustrated step by step; figures only.)
Padding (contd)
• We have seen that convolving an input of 6 X 6 dimension with a 3 X 3 filter results in a 4 X 4 output. We can generalize: if the input is n X n and the filter size is f X f, then the output size will be (n-f+1) X (n-f+1).

• There are primarily two disadvantages here:

• Every time we apply a convolutional operation, the size of the image shrinks.

• Pixels in the corners of the image are used far fewer times during convolution than the central pixels, so the corners contribute less to the output, which can lead to information loss.

• To overcome these issues, we can pad the image with an additional border.

50
Count of each pixel usage

51
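The count shown on this slide can be reproduced with a small NumPy sketch (illustrative only): a pixel's usage count is the number of valid 3x3 filter positions that cover it.

import numpy as np

n, f = 6, 3                           # 6x6 image, 3x3 filter, no padding
usage = np.zeros((n, n), dtype=int)
for i in range(n - f + 1):            # every position the filter can take
    for j in range(n - f + 1):
        usage[i:i + f, j:j + f] += 1
print(usage)                          # corners are used once, the centre 9 times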
Zero Padding

52
(Slides 53-55: zero padding examples; figures only.)
• Valid padding: no padding is used; convolution normally reduces the spatial output.

• Full padding: full padding increases the spatial output (see the sketch below).

56
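A quick way to compare these modes (SciPy assumed; zero "same" padding included for contrast):

import numpy as np
from scipy.signal import convolve2d

image = np.ones((6, 6))
kernel = np.ones((3, 3))

print(convolve2d(image, kernel, mode='valid').shape)   # (4, 4): n - f + 1
print(convolve2d(image, kernel, mode='same').shape)    # (6, 6): zero-padded
print(convolve2d(image, kernel, mode='full').shape)    # (8, 8): n + f - 1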
Channels

57
Convolution for Multiple Channels

58
Convolution for Multiple Channels

59
60
Multiple Filter Edges
• Generalized dimensions can be given as:

Input: n X n X nc

Filter: f X f X nc

Padding: p

Stride: s

Output:

[(n+2p-f)/s+1] X [(n+2p-f)/s+1] X nf

• Here, nc is the number of channels in the input and filter, while nf is the number of filters (a helper computing these dimensions follows below).
61
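A small Python helper implementing this formula for square inputs (the function name is illustrative):

def conv_output_size(n, f, p=0, s=1):
    """Output side length for an n x n input, f x f filter, padding p, stride s."""
    size = (n + 2 * p - f) / s + 1
    assert size == int(size), "non-integer output: stride/padding set incorrectly"
    return int(size)

print(conv_output_size(6, 3))              # 4  (valid convolution)
print(conv_output_size(6, 3, p=1))         # 6  ('same' padding)
print(conv_output_size(7, 3, p=1, s=2))    # 4  (strided downsampling)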
How to Choose Kernel Size in CNN?
• Understand the Task and Data

• Consider Input Size and Complexity

• Balance Between Local and Global Information

• Avoid Information Loss

• Experiment and Validate

• Consider Computational Resources

62
Pooling Layer
• The pooling layer is responsible for reducing the spatial size of the convolved feature. This decreases the computational power required to process the data, through dimensionality reduction.

• It is useful for extracting dominant features that are rotationally and positionally invariant, thus maintaining the process of effectively training the model.

• There are two types of pooling: Max Pooling and Average Pooling (a sketch of max pooling follows below).

• Max Pooling also acts as a noise suppressant, whereas Average Pooling simply performs dimensionality reduction as a noise-suppressing mechanism.
63
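A minimal NumPy sketch of 2x2 max pooling with stride 2 (the most common setting): each 2x2 block is replaced by its maximum, halving both spatial dimensions.

import numpy as np

def max_pool_2x2(feature_map):
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]    # drop odd edges if any
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 5, 6]], dtype=float)
print(max_pool_2x2(fmap))   # [[6. 4.]
                            #  [7. 9.]]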
Activation / Feature Map Dimension
• If the input image dimension is W x H, the filter dimension is K x K, the stride is S, and the padding is P, the output activation map will have the following dimensions:

• Wout = (W - K + 2P)/S + 1

• Hout = (H - K + 2P)/S + 1

• If the output dimensions are not integers, it means that we haven't set the stride correctly.

• We have two exceptional cases:

• When there is no padding at all, the output dimensions are

• ((W - K)/S + 1, (H - K)/S + 1)
64
Example

• Suppose we have an input image of size 128 x 128, a filter of size 5 x 5, padding P = 2 and stride S = 2. Then Wout = (128 - 5 + 4)/2 + 1 = 64.5, which is not an integer, so this stride is set incorrectly for this input (illustrating the rule above).

• With stride S = 1 instead, Wout = (128 - 5 + 4)/1 + 1 = 128, so the output activation map has dimensions 128 x 128 x nf.

65
Fully Connected Layer

66
Fully Connected Layer
• These layers form the final stage of the convolutional neural network, and their inputs correspond to the flattened one-dimensional matrix generated by the last pooling layer. ReLU activation functions are applied to them for non-linearity.

• Finally, a softmax prediction layer is used to generate probability values for each of the
possible output labels, and the final label predicted is the one with the highest probability
score.

67
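A minimal Keras sketch of such a classification head (the 4x4x64 input shape is illustrative, matching the last feature map of the model shown later in this deck):

import tensorflow as tf

head = tf.keras.Sequential([
    tf.keras.Input(shape=(4, 4, 64)),                  # last pooled feature maps
    tf.keras.layers.Flatten(),                         # to a 1-D vector (1024)
    tf.keras.layers.Dense(64, activation='relu'),      # non-linear dense layer
    tf.keras.layers.Dense(10, activation='softmax'),   # one probability per class
])
head.summary()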
Overfitting and Regularization in CNNs
• Deep learning models, especially Convolutional Neural Networks (CNNs), are particularly susceptible to
overfitting due to their capacity for high complexity and their ability to learn detailed patterns in large-scale data.

• Several regularization techniques can be applied to mitigate overfitting in CNNs, and some are illustrated below:

68

Source
# Build the CNN model (TensorFlow/Keras; the import below is implied by the
# layer names used in the slide)
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),  # 32x32 RGB input
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10)                 # logits for 10 classes
])
69
Layer (type)                   Output Shape         Param #
conv2d (Conv2D)                (None, 30, 30, 32)       896
max_pooling2d (MaxPooling2D)   (None, 15, 15, 32)         0
conv2d_1 (Conv2D)              (None, 13, 13, 64)    18,496
max_pooling2d_1 (MaxPooling2D) (None, 6, 6, 64)           0
conv2d_2 (Conv2D)              (None, 4, 4, 64)      36,928
flatten (Flatten)              (None, 1024)               0
dense (Dense)                  (None, 64)            65,600
70
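As a hedged sketch of how regularization might be added to the model above (the slides do not prescribe particular settings), one common combination is an augmentation layer at the input and Dropout before the output layer:

from tensorflow.keras import layers, models

regularized = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.RandomFlip('horizontal'),        # data augmentation (training only)
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),                    # zero half the units at random in training
    layers.Dense(10),
])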
Pre-trained Models
A pre-trained model is a saved network that was previously trained on a large dataset, typically
on a large-scale image classification task. You either use the pretrained model as is or use
transfer learning to customize this model to a given task.

• LeNet

• AlexNet

• ZF-Net

• GoogLeNet

• VGG16/VGG19

• ResNet
71
LeNet – First CNN Architecture
• LeNet was developed in 1998 by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner for handwritten digit recognition problems. The LeNet architecture consists of three convolutional layers and two subsampling layers, followed by two fully connected layers.

Layer Structure:

• Input Layer: Takes in 32x32 grayscale images (MNIST digits were originally 28x28, but they were
padded to 32x32).

• Convolutional Layers: Three convolutional layers, with average pooling (then called subsampling) following each of the first two.

• First Convolutional Layer (C1): 6 filters of size 5x5, followed by subsampling.

• Second Convolutional Layer (C3): 16 filters of size 5x5, followed by subsampling.

• Third Convolutional Layer (C5): 120 filters of size 5x5.
72


LeNet - Key Features
• Fully Connected Layers: After the convolutional layers, the output is fed into two fully connected
layers.

• The first fully connected layer (F6) has 84 neurons.

• The output layer has 10 neurons (one for each digit class 0-9).

• Average pooling layers are used for subsampling, and tanh is used as the activation function.

73
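A rough Keras sketch of this LeNet-5-style layer structure (an approximation for illustration, not a faithful reimplementation of the 1998 model):

from tensorflow.keras import layers, models

lenet = models.Sequential([
    layers.Conv2D(6, (5, 5), activation='tanh', input_shape=(32, 32, 1)),   # C1
    layers.AveragePooling2D((2, 2)),                                        # S2
    layers.Conv2D(16, (5, 5), activation='tanh'),                           # C3
    layers.AveragePooling2D((2, 2)),                                        # S4
    layers.Conv2D(120, (5, 5), activation='tanh'),                          # C5
    layers.Flatten(),
    layers.Dense(84, activation='tanh'),                                    # F6
    layers.Dense(10, activation='softmax'),                                 # output
])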
AlexNet – DL Architecture that popularized CNN
• AlexNet was developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. The AlexNet network had an architecture very similar to LeNet, but was deeper, bigger, and featured convolutional layers stacked on top of each other.

• The AlexNet architecture was designed for large-scale image datasets, and it achieved state-of-the-art results at the time of its publication.

• AlexNet is composed of 5 convolutional layers with a combination of max-pooling layers, 3 fully connected layers, and 2 dropout layers.

• The activation function used in all layers is ReLU. The activation function used in the output layer is Softmax.

• The input size is usually quoted as 224x224x3, but due to the padding involved it effectively works out to 227x227x3. The total number of parameters in this architecture is around 60 million.
74
AlexNet - Key Features
Key Features:

• First to use ReLU (Rectified Linear Unit) activation function, which helps in faster training.

• Introduced dropout layers to reduce overfitting.

• Utilized overlapping max-pooling layers to downsample feature maps.

• Batch size of 128

• SGD Momentum is used as a learning algorithm

• Data augmentation is carried out, such as flipping, jittering, cropping, colour normalization, etc.

• Made use of GPU computation for training, enabling deeper networks.


75
AlexNet Architecture

76
AlexNet Architecture

77
ZF Net
• ZFNet is a CNN architecture that uses a combination of convolutional and fully connected layers. ZF Net was developed by Matthew Zeiler and Rob Fergus.

• It was an improvement on AlexNet, achieved by tweaking the architecture hyperparameters, in particular by expanding the size of the middle convolutional layers and making the stride and filter size of the first layer smaller.

• The ZF Net CNN architecture consists of a total of seven layers: a convolutional layer, a max-pooling layer (downscaling), a concatenation layer, a convolutional layer with linear activation function and stride one, and dropout for regularization purposes applied before the fully connected output.

• The network has relatively fewer parameters than AlexNet and is computationally more efficient, introducing an approximate inference stage through deconvolutional layers in the middle of the CNN.
78
Difference between AlexNet and ZFNet
• Architecture

• AlexNet consists of eight layers: five convolutional layers followed by three fully connected layers. ZFNet retained the basic architecture of AlexNet but made some adjustments, particularly in the first few layers.

• Filters

• AlexNet used 11x11, 5x5 and 3x3 filter sizes, while ZFNet used a 7x7 filter in the first layer and 3x3 filters in the later layers.

• Strides

• AlexNet uses a stride of 4 in the first layer, while ZFNet uses a stride of 2.

• Normalization

• AlexNet used Local Response Normalization, while ZFNet used Local Contrast Normalization.
79
GoogLeNet – CNN Architecture used by Google
• GoogLeNet is the CNN architecture used by Google to win the ILSVRC 2014 classification task. It was developed by Christian Szegedy and colleagues at Google.

• It achieves a deeper architecture by employing a number of distinct techniques, including 1x1 convolutions and global average pooling. The GoogLeNet CNN architecture is computationally expensive.

• The architecture consists of a 22-layer deep CNN but reduces the number of parameters from 60 million (AlexNet) to about 4 million. The key features of GoogLeNet: the Inception module, the 1x1 convolution, global average pooling, and auxiliary classifiers for training.

• To reduce the parameters that must be learned, it relies heavily on 1x1 convolutions before the larger filters and replaces the fully connected layers at the end with global average pooling.

• Real-world applications/examples of the GoogLeNet CNN architecture include the Street View House Numbers (SVHN) digit recognition task.
80
GoogLeNet - Key Features
• Introduced the Inception module, which allows the network to extract features at multiple
scales by using convolutional filters of different sizes (1x1, 3x3, 5x5) in parallel.

• Reduced the number of parameters significantly by using 1x1 convolutions, leading to a


deeper network without a massive increase in computational cost.

• Employed global average pooling instead of fully connected layers, reducing the number of
parameters further.

81
GoogLeNet Architecture

82

Source
VGG Net
• The convolutional neural network model called the VGG model, or VGGNet, in its 16-layer form is known as VGG16. It was developed by K. Simonyan and A. Zisserman of the University of Oxford.

• VGGNet accepts 224x224-pixel images as input.

• VGG's convolutional layers use the smallest feasible receptive field, 3x3, to capture left-to-right and up-to-down patterns. Additionally, 1x1 convolution filters are used to transform the input linearly, each followed by a ReLU activation layer.

• VGGNet contains three fully connected layers. The first two have 4096 channels each, while the third has 1000 channels, one for each class.

• It is very slow to train (the original VGG model was trained on Nvidia Titan GPUs for 2-3 weeks), and it takes quite a lot of disk space and bandwidth, which makes it inefficient.

• Its 138 million parameters lead to an exploding gradients problem.
83


VGG - Key Features
• Simplified architecture using only 3x3 convolutional layers stacked on top of each other,
with depth increasing progressively.

• Utilized a consistent design pattern (same filter size, consistent max-pooling) making it
easier to understand and implement.

• The architecture was deeper than previous models (VGG16 with 16 layers and VGG19 with
19 layers), which improved performance.

84
VGG Architecture

85

Source
ResNet
• ResNet is the CNN architecture that was developed by Kaiming He et al. to win the ILSVRC 2015 classification task, with a top-five error of only 3.57%. The network has 152 layers and around 60 million parameters.
86
ResNet
• The skip connection bypasses some layers in between, linking one layer's activations to subsequent layers. This creates a residual block; these residual blocks are stacked to build ResNets.

• The following ResNet implementations are part of Keras Applications and offer ResNet V1 and ResNet V2 with 50, 101, or 152 layers:

• ResNet50, ResNet101, ResNet152

• ResNet50V2, ResNet101V2, ResNet152V2

• ResNet V2 and the original ResNet (V1) differ primarily in that V2 applies batch normalization before each weight layer.

87
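A minimal sketch of one such residual block in the Keras functional API (the filter count and input shape are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                   # the skip connection
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                # add the input back: F(x) + x
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs, 64)
block = tf.keras.Model(inputs, outputs)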
ResNet-34 Architecture

88
Transfer Learning
• Transfer learning is an approach to machine learning where a model trained on one task is
used as the starting point for a model on a new task. This is done by transferring the
knowledge that the first model has learned about the features of the data to the second model.

• In deep learning, transfer learning is often used to solve problems with limited data. This is
because deep learning models typically require a large amount of data to train, which can be
difficult or expensive to obtain.

89
Transfer Learning

90
Need of Transfer Learning in Deep Learning
• Limited Data: When the dataset is not large enough to train a deep neural network from
scratch.

• Time and Resource Efficiency: Training deep networks from scratch is computationally
expensive and time-consuming.

• Improved Performance: Pre-trained models often lead to better performance as they start
with learned features rather than random initialization.

91
TL Types
• Feature Extraction: Use the representations learned by a previous network to extract meaningful features from new samples. You simply add a new classifier, which will be trained from scratch, on top of the pretrained model so that you can repurpose the feature maps learned previously for the dataset.

• Fine-Tuning: Unfreeze a few of the top layers of a frozen model base and jointly train both
the newly-added classifier layers and the last layers of the base model. This allows us to
"fine-tune" the higher-order feature representations in the base model in order to make them
more relevant for the specific task.

92
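A hedged Keras sketch of both approaches, assuming ResNet50 from Keras Applications as the pre-trained base and a hypothetical 5-class target task:

import tensorflow as tf

base = tf.keras.applications.ResNet50(include_top=False, weights='imagenet',
                                      input_shape=(224, 224, 3), pooling='avg')

# 1) Feature extraction: freeze the base, train only the new classifier.
base.trainable = False
model = tf.keras.Sequential([base,
                             tf.keras.layers.Dense(5, activation='softmax')])

# 2) Fine-tuning: later, unfreeze only the top of the base and continue
#    training jointly with a low learning rate.
base.trainable = True
for layer in base.layers[:-10]:      # keep all but the last few layers frozen
    layer.trainable = False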
Transfer learning scenarios
• The target dataset is small and similar to the base training dataset.

• Freeze all the layers except the last, remove the last layer, add the new FC layer with
randomized weights

• The target dataset is large and similar to the base training dataset.

• Unfreeze all the layers of the model and continue the training process with the target dataset or
Initialize the model with pre-trained weights from the base model, then train on the target dataset
as if it were a new task.

• The target dataset is small and different from the base training dataset.

• Freeze most of the layers in the pre-trained model and only train the final few layers on the target
dataset.

• The target dataset is large and different from the base training dataset.

• Unfreeze all layers of the pre-trained model and fine-tune the entire model on the target dataset.
93
