• Decision Making: Use the accumulated features in the fully connected layers to
classify the object as a car or another category.
This process enables a CNN to mimic human vision by starting with basic patterns
and building up to recognize complex objects effectively.
The image in Figure 2 shows how a Convolutional Neural Network (CNN) processes
an image step-by-step to classify it as a specific object, like a car. Here’s what happens:
Figure 2: The activations of an example ConvNet architecture. The initial volume stores
the raw image pixels (left) and the last volume stores the class scores (right). Each
volume of activations along the processing path is shown as a column. Since it’s difficult
to visualize 3D volumes, we lay out each volume’s slices in rows. The last layer volume
holds the scores for each class, but here we only visualize the sorted top 5 scores, and
print the labels of each one.
1. At the start, we have a picture of a car (on the left). This image is processed by a
series of operations called convolution, activation (ReLU), and pooling. These steps
help the network focus on specific details in the image.
2. In the first convolutional (Conv) layer, the network uses 10 filters to look for simple
patterns like edges or textures. For example, it may detect the round shape of the
wheels or the straight lines of the car’s body.
3. As we go deeper into the layers, these features are combined to form more complex
patterns. For example, in the later layers, the network might combine the wheel
and the car’s body to identify it as a complete car.
4. The feature maps grow richer as we move deeper, meaning the network gathers a combination of patterns that together represent the object.
5. Finally, the network accumulates all the features from the earlier layers and makes
a decision (in this case, classifying the object as a car).
2. Feature Extraction: As you move through the layers, the network keeps extracting features and combining them to understand the object better.
Figure 3: For instance, if the image was of the number 7, the initial layers might focus
on the curve or its straight line. As we go deeper, these individual features get combined
to form the entire number 7.
3. Combination of Features: Later layers combine simpler features (like wheels and
edges) into more complex representations (like the full shape of a car).
4. Decision Making: The accumulated features are used to classify the object (e.g.,
as a car or a truck) in the final layer.
This layered approach in CNNs mimics how humans recognize objects, starting with
simple patterns and combining them into a complete picture. It is highly effective for tasks
like object detection and classification, making it a vital tool in fields like autonomous
vehicles, medical imaging, and more.
2 Activation Functions
2.1 Why Are Activation Functions Necessary?
After extracting features from raw image data using convolutional layers, the network
combines these features into a linear representation. However, many real-world problems,
including object recognition, are inherently non-linear. To enable the network to capture
these complex patterns and make meaningful decisions, activation functions introduce
non-linearity into the network. Without non-linearity, the network would be limited to
learning only linear mappings, regardless of its depth.
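To see this concretely, consider the minimal NumPy sketch below (the weight shapes are arbitrary illustrations): two stacked linear layers are exactly equivalent to a single linear layer with a combined weight matrix, while inserting a ReLU between them breaks that equivalence.

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # weights of layer 1 (illustrative shapes)
W2 = rng.standard_normal((2, 4))   # weights of layer 2
x = rng.standard_normal(3)         # an input vector

deep = W2 @ (W1 @ x)               # two linear layers applied in sequence
shallow = (W2 @ W1) @ x            # one linear layer with combined weights
print(np.allclose(deep, shallow))  # True: composing linear maps stays linear

relu = lambda z: np.maximum(z, 0.0)
nonlinear = W2 @ relu(W1 @ x)      # a ReLU in between breaks the collapse
print(np.allclose(nonlinear, shallow))  # generally False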
4. Leaky ReLU
The Leaky ReLU addresses the dying ReLU problem by allowing a small gradient when
the input is negative:
f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \le 0 \end{cases}
where α is a small positive constant (e.g., 0.01).
• Range: (-∞, ∞)
• Pros: Prevents neurons from becoming inactive.
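A minimal NumPy sketch of Leaky ReLU, using the α = 0.01 mentioned above:

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Keep positive inputs; scale negative ones by the small slope alpha.
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))  # [-0.02  0.    3.  ]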
5. Softmax
The softmax function is primarily used in the output layer for multi-class classification.
It converts raw scores (logits) into probabilities:
\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}
• Range: (0, 1); the outputs are positive and sum to 1, so they can be read as class probabilities.
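A minimal NumPy sketch of the softmax formula above; subtracting the maximum logit before exponentiating is a standard trick to avoid overflow and does not change the result:

import numpy as np

def softmax(x):
    # Shift by the max logit for numerical stability (result is unchanged).
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.sum())  # values in (0, 1) that sum to 1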
6. Swish
The swish function, proposed by Google, is defined as:
f(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}
• Range: approximately [−0.278, ∞); Swish dips slightly below zero for negative inputs but is unbounded above.
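A short NumPy sketch of Swish, following the definition above:

import numpy as np

def swish(x):
    # x times the sigmoid of x.
    return x / (1.0 + np.exp(-x))

print(swish(np.array([-5.0, -1.0, 0.0, 1.0, 5.0])))
# small dip below zero for negative inputs, approximately x for large positive x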
• ReLU and its variants (like Leaky ReLU) are the most commonly used due to their
simplicity and efficiency in avoiding the vanishing gradient problem.
• Sigmoid and Tanh are useful in specific scenarios but are prone to the vanishing
gradient problem in deeper networks.
• Softmax is primarily used in the output layer for multi-class classification tasks,
providing a probabilistic interpretation.
• Swish and other newer functions may offer better performance for certain tasks due
to their smooth, non-monotonic nature.
• The choice of activation function depends on the problem type, network architecture,
and the nature of the data.
Figure 6: Summary
3 Pooling
3.1 Why Pooling?
Pooling is an essential operation in Convolutional Neural Networks (CNNs) that reduces
the spatial dimensions of feature maps. This serves two main purposes:
• Reduced Computation: Smaller feature maps mean fewer activations to process, lowering the computational cost of the layers that follow.
• Focus on Key Features: Pooling helps retain the dominant information, making it easier for the network to recognize patterns crucial for classification.
Figure 7: Pooling layer downsamples the volume spatially, independently in each depth
slice of the input volume. Left: In this example, the input volume of size [224x224x64]
is pooled with filter size 2, stride 2 into an output volume of size [112x112x64]. Notice that
the volume depth is preserved. Right: The most common downsampling operation is
max, giving rise to max pooling, here shown with a stride of 2. That is, each max is taken
over 4 numbers (little 2x2 square).
• Max Pooling: Selects the maximum value from the pooling window. It captures
the most prominent features, ensuring the dominant information is retained.
• Average Pooling: Computes the average value of the pooling window. It is used when smoother or more generalized feature extraction is required (both operations are sketched in code after this list).
• Global Pooling: Reduces the feature map to a single value by applying pooling
over the entire map. It is commonly used in tasks like object detection.
• Stride: The step size by which the kernel moves across the feature map. A stride
greater than 1 helps in reducing the dimensions of the output.
• It ensures that only the most important information in each region is retained.
• This is sufficient for identifying an image with a label, as the main features are
enough to differentiate between classes.
• The final layers of the network, often a Multi-Layer Perceptron (MLP), act as
the classification head, mapping the extracted features to specific labels.
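As a concrete illustration, here is a minimal NumPy sketch of 2×2 max and average pooling with stride 2, matching the downsampling described in Figure 7 (the input values are made up for the example, and the height and width are assumed even):

import numpy as np

def pool2x2(fmap, mode="max"):
    # Group the map into non-overlapping 2x2 windows, then reduce
    # each window to a single value.
    h, w = fmap.shape
    windows = fmap.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return windows.max(axis=(1, 3))   # keep the dominant value
    return windows.mean(axis=(1, 3))      # smooth over the window

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]], dtype=float)
print(pool2x2(fmap, "max"))   # [[4. 2.] [2. 8.]]
print(pool2x2(fmap, "mean"))  # [[2.5  1.  ] [1.25 6.5 ]]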
3.6 Drawbacks
Max Pooling:
• It may lose fine details or subtler patterns in the input feature map.
• In the example shown, max pooling retains only the brightest pixel values but loses the gradient or intensity variation of the diagonal line in the feature map.
Average Pooling:
• It tends to blur the features by averaging out high and low values.
• In the example, average pooling smooths out the intensity values, causing a loss of sharpness in the diagonal line.
3. Task-Specific Choices: The choice between max pooling and average pooling
depends on the task; max pooling is generally preferred for tasks requiring feature
emphasis, while average pooling is better for smoother, more general representations.
4 CNN Architecture
The architecture of a Convolutional Neural Network (CNN) consists of several key components:
• Convolutional Layers: These layers apply learnable filters to the input, producing feature maps that capture local patterns such as edges and textures.
• Pooling Layers: Following the convolutional layers, pooling layers help reduce the
dimensionality of the feature maps. The decision to include pooling depends on the
design of the model, as indicated in the diagram.
• Multilayer Perceptron (MLP): After convolution and pooling layers, MLP layers
are added for further feature processing, ultimately performing classification tasks.
Figure 9: Workflow
It is important to note that including too many design choices or parameters in the model
may lead to overfitting and unnecessary complexity.
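As a sketch of this workflow, the PyTorch snippet below stacks convolution, ReLU, and pooling layers and finishes with an MLP classification head. All sizes here (channel counts, the 32×32 RGB input, 10 classes) are illustrative assumptions, not a prescribed architecture.

import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # feature extraction
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(                 # the MLP head
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
logits = model(torch.randn(1, 3, 32, 32))  # one fake RGB image
print(logits.shape)                        # torch.Size([1, 10])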
Story Time!
Imagine you’re at a grocery store, overwhelmed by endless cookie options: chocolate chip,
oatmeal raisin, gluten-free, and more. The sheer variety makes it frustrating to decide,
leaving you second-guessing your choice. Now, imagine the store offered a few curated options, like “classic” or “healthy.” With fewer, thoughtfully selected choices, you could
decide quickly and confidently.
This mirrors a common challenge in deep learning. With countless design choices for
layers, filters, kernel sizes, and activations, deciding on the best configuration can feel
overwhelming. Too many options can lead to unnecessary complexity, overfitting, or poor
performance. Structured benchmarks act like the curated options in the store, helping
simplify these decisions and guide the development of efficient, high-performing models.
ImageNet
One of the most significant breakthroughs in deep learning came with the introduction of
the ImageNet dataset. ImageNet is a vast visual database containing millions of labeled
images across various categories, specifically designed for visual object recognition. It has
played a critical role in the development of deep learning models by enabling the training
of complex CNNs with large amounts of diverse data.
The success of CNN architectures, especially after the introduction of ImageNet, revolutionized the field of computer vision. ImageNet provided a standardized benchmark for
evaluating and comparing deep learning models, enabling rapid advancements in visual
recognition tasks, from image classification to object detection.
By leveraging the ImageNet dataset, researchers and developers were able to train
models capable of achieving high accuracy on real-world image recognition tasks. The
dataset’s scale and complexity have been instrumental in pushing the boundaries of deep
learning in the field of computer vision.
4.1 ImageNet 1K
ImageNet-1K is a widely used dataset in computer vision and deep learning, playing
a pivotal role in advancing image classification and object recognition tasks. It is a
subset of the larger ImageNet dataset and contains 1,000 categories (or classes) with
approximately 1.28 million training images, 50,000 validation images, and 100,000
test images. Each image is labeled with one of the 1,000 categories, which include diverse
objects, animals, and scenes.
• High-Quality Labels: The labels are derived from the WordNet hierarchy, ensuring semantic relationships among categories.
• Large-Scale: The dataset’s size and diversity make it ideal for training and evaluating large-scale deep learning models.
• Catalyst for Research: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC), based on ImageNet-1K, spurred innovation in CNN architectures, optimization techniques, and hardware development.
4.1.3 Limitations
• Bias and Imbalance: Despite its diversity, ImageNet-1K reflects cultural and
geographical biases present in its source data.
• Data Accessibility: While widely used in research, its licensing restricts direct
commercial use.
4.1.4 Legacy
ImageNet-1K has been a cornerstone in computer vision research, shaping how neural networks are designed and evaluated. Although newer datasets and challenges have emerged, it remains a foundational tool for developing and benchmarking image classification models.
5 MNIST Dataset
The MNIST dataset (Modified National Institute of Standards and Technology) is one
of the most popular datasets in machine learning and computer vision. It is primarily
used for training and testing image classification models, especially for handwritten digit
recognition.
• Ease of Use: Its simplicity allows beginners to experiment with neural networks
without complex preprocessing.
• Feature Extraction: Manually crafting features for images was tedious and prone
to errors.
• High Dimensionality: Images have a large number of pixels, making them difficult
to process without efficient algorithms.
6 LeNet
3. Subsampling Layer 1 (S2): Averages the values in 2×2 regions, reducing feature maps to size 14 × 14.
6. Fully Connected Layer (F5): Connects the flattened feature maps to a 120-neuron layer (a full LeNet-style sketch in code follows this list).
• Translation Invariance: Pooling layers ensure that the model is robust to small
shifts in the input image.
• Efficiency: LeNet was computationally efficient for its time, enabling practical use
in digit recognition systems.
• Commercial Applications: It was used in bank check processing and other real-world systems.
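Putting the layers described above together, here is a LeNet-5-style sketch in PyTorch. Where the notes do not spell out a detail (e.g., the 84-unit layer, the tanh activations), the code follows the classic LeNet-5 design; treat it as an assumption-laden sketch rather than the exact original.

import torch
import torch.nn as nn

lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2),  # C1: 28x28 -> 6x28x28
    nn.Tanh(),
    nn.AvgPool2d(2),                            # S2: average pool -> 6x14x14
    nn.Conv2d(6, 16, kernel_size=5),            # C3: -> 16x10x10
    nn.Tanh(),
    nn.AvgPool2d(2),                            # S4: -> 16x5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),                 # F5: the 120-neuron layer
    nn.Tanh(),
    nn.Linear(120, 84),                         # classic LeNet-5 detail
    nn.Tanh(),
    nn.Linear(84, 10),                          # one score per digit
)

digits = torch.randn(1, 1, 28, 28)  # one fake MNIST-sized image
print(lenet(digits).shape)          # torch.Size([1, 10])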
7 AlexNet
In 2012, a groundbreaking moment in the field of computer vision occurred, largely attributed to a deep convolutional neural network (CNN) called AlexNet. Developed
by Alex Krizhevsky, along with his advisor Geoffrey Hinton and colleague Ilya
Sutskever, AlexNet played a pivotal role in revolutionizing the way machines interpret
images and kick-started the deep learning revolution in artificial intelligence.
Before AlexNet, computer vision tasks such as image classification were traditionally
handled by shallow machine learning algorithms or manually designed feature extraction
techniques. These methods had some success, but their performance was limited, particularly when dealing with complex tasks like recognizing objects in large, high-resolution
images.
7.3 Legacy
AlexNet’s success also paved the way for even deeper and more sophisticated architectures,
including VGGNet, GoogLeNet, ResNet, and other models, each building upon the
principles established by AlexNet. Today, convolutional neural networks (CNNs) are the
standard for image classification, object detection, and various other computer vision
tasks.
AlexNet’s legacy extends beyond computer vision — it was a key factor in the
widespread adoption of deep learning for a variety of AI applications, including natural
language processing, speech recognition, and reinforcement learning.
5. Impact on Deep Learning: AlexNet’s success marked the beginning of the deep
learning revolution, influencing a wide range of applications beyond computer vision,
including natural language processing and reinforcement learning.
8 LeNet vs AlexNet
LeNet was originally developed for handwritten digit recognition, specifically designed
to classify images in the MNIST dataset. Its simpler architecture, using small input
images (28x28x1), was effective for this task. On the other hand, AlexNet, with its much
deeper and more complex architecture, was designed to tackle the much larger and more
varied ImageNet dataset, consisting of high-resolution images (224x224x3) from 1,000
different categories. AlexNet’s success in classifying complex images marked a significant
breakthrough in deep learning, particularly in computer vision.
9 VGGNet
VGGNet, developed by the Visual Geometry Group at the University of Oxford, is a
deep convolutional neural network architecture. It became famous for its simplicity and
effectiveness in handling large-scale image classification tasks. VGGNet is known for its
deep architecture with very small 3x3 convolutional filters, which allowed it to capture
detailed hierarchical features in images. The model made significant contributions to the
field of computer vision, especially for image classification challenges like ImageNet.
• ReLU Activation: ReLU activation functions are used after each convolutional
layer, helping in faster training and preventing vanishing gradients.
• Fully Connected Layers: At the end of the convolutional layers, VGGNet uses
three fully connected layers, which help in making the final classification decision.
4. Transfer Learning: VGGNet’s pre-trained weights have been widely used in transfer learning applications, where the model is adapted for various tasks in computer vision.
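As an illustration of the stacked 3×3 design, here is a PyTorch sketch of a single VGG-style block; the channel counts are illustrative assumptions. VGGNet stacks several such blocks with increasing width before its three fully connected layers.

import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch):
    # Two stacked 3x3 convolutions, each followed by ReLU (as noted
    # above), then 2x2 max pooling to halve the spatial resolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

block = vgg_block(3, 64)
x = torch.randn(1, 3, 224, 224)  # an ImageNet-sized input
print(block(x).shape)            # torch.Size([1, 64, 112, 112])

Two stacked 3×3 convolutions cover the same receptive field as one 5×5 convolution while using fewer parameters and adding an extra non-linearity, which is the key idea behind VGGNet’s depth.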