
CONVOLUTIONAL NEURAL NETWORK (CNN)
LeNet-5 Architecture
LeNet-5 was originally designed for handwritten digit classification.

The classic version of LeNet-5 consists of the following layers:


● Convolutional Layers (Cx): These layers apply convolution operations using multiple filters. Each filter is responsible for capturing
specific features, such as edges, textures, or patterns. The sliding window approach ensures that features are extracted in a spatially
structured manner, enabling the network to capture hierarchical information (low-level features like edges in early layers, more
complex patterns like shapes in deeper layers).

● Subsampling Layers (Sx): LeNet-5 uses average pooling in its subsampling layers, which reduces the spatial resolution of the
feature maps. This pooling operation also helps the network to become more robust to variations in the position of features, as
smaller shifts in input images won’t significantly alter the output. This process, along with reduced dimensionality, helps combat
overfitting and makes the network less computationally expensive.

● Fully Connected Layers (Fx): The fully connected layers are tasked with making the final decisions or predictions based on the
features extracted by the convolutional and subsampling layers. By connecting all neurons in these layers, the network is able to
combine and weigh features learned across all previous layers, effectively integrating global information from the image to produce
the final output (in LeNet's case, predicting the digit).
Architectural Efficiency

LeNet-5 was designed to be both computationally efficient and effective, which was critical at a time when computational resources
were limited. The alternating pattern of convolution and pooling ensures that relevant features are captured while progressively
reducing the spatial complexity of the data. The structured receptive fields ensure that the network captures local patterns in a way that
supports the hierarchical learning of features.

This layered approach of progressively extracting more abstract and complex representations in later layers, followed by fully
connected layers to make the final decision, became a hallmark of modern CNNs.
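The layer pattern described above can be written down concisely. Below is a minimal PyTorch sketch of the classic LeNet-5 stack; the filter counts (6 and 16), 5x5 kernels, and the 120-84-10 fully connected sizes follow the commonly cited configuration, while tanh activations and plain average pooling are simplifying assumptions (the original subsampling layers also had trainable coefficients).

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Classic conv -> avg-pool -> conv -> avg-pool -> fully connected pattern."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # C1: 6 filters, 5x5, 32x32 input -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),       # S2: subsampling -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),   # C3: 16 filters, 5x5 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),       # S4: subsampling -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),        # C5
            nn.Tanh(),
            nn.Linear(120, 84),                # F6
            nn.Tanh(),
            nn.Linear(84, num_classes),        # output: one score per digit
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Expects a 32x32 grayscale image, as in the original digit-classification setup.
logits = LeNet5()(torch.randn(1, 1, 32, 32))
print(logits.shape)  # torch.Size([1, 10])
```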
VGG-16 Architecture:
The name "VGG-16" reflects the fact that it has 16 layers with learnable weights
(13 convolutional layers and 3 fully connected layers). The architecture follows a
straightforward design philosophy: stacking small convolutional filters (3x3) with
stride 1, padding 1, and using max pooling to reduce the spatial dimensions.
Key Features of VGG-16

1. Small Convolutional Filters:


○ VGG-16 uses 3x3 filters throughout the network. This is a deliberate choice because stacking multiple 3x3 filters in successive
layers allows the network to capture complex patterns with fewer parameters, while still having a large receptive field.
○ For example, two consecutive 3x3 convolutional layers have an effective receptive field of 5x5, and three consecutive 3x3 layers
have an effective receptive field of 7x7 (see the short calculation after this list).
2. Deep Architecture:
○ VGG-16 is deeper than many previous CNN architectures, with 13 convolutional layers. This depth allows it to learn more complex
features and patterns from data, which improves its ability to classify and generalize.
3. Consistent Structure:
○ The architecture is very consistent in its design: it repeats similar building blocks (conv-conv-maxpool) across the network, making
it simpler and more predictable in terms of layer organization (a sketch of this block pattern appears at the end of this section).
4. Large Fully Connected Layers:
○ VGG-16 has two large fully connected layers (4096 neurons each) before the final softmax layer, which helps the network to
combine features and make decisions.
5. Max Pooling:
○ Max pooling with 2x2 filters and stride 2 is used after each block of convolutional layers to progressively reduce the spatial
dimensions of the feature maps. This helps reduce computational complexity while retaining the most important features.
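The receptive-field claim in point 1 can be checked with a short calculation: each stacked 3x3 convolution with stride 1 extends the effective receptive field by 2 pixels, giving 1 + 2n for n layers.

```python
def effective_receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

print(effective_receptive_field(2))  # 5 -> two 3x3 layers cover a 5x5 region
print(effective_receptive_field(3))  # 7 -> three 3x3 layers cover a 7x7 region
```

Three stacked 3x3 layers therefore match the 7x7 receptive field of a single 7x7 convolution while using fewer weights per channel pair (3 x 3^2 vs. 7^2) and adding two extra non-linearities.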
Strengths and Weaknesses

● Strengths:
○ Performance: VGG-16 performs exceptionally well on image classification tasks, especially on large-scale datasets like
ImageNet.
○ Generalization: The architecture is highly transferable, meaning it works well on other tasks via fine-tuning (e.g., object
detection and segmentation).
○ Simplicity: The uniform use of 3x3 filters and 2x2 max pooling makes the network design simple and effective.
● Weaknesses:
○ Computationally Expensive: VGG-16 has a very large number of parameters (about 138 million), making it
computationally expensive to train and requiring significant memory and processing power.
○ Not the Most Efficient: While deep, the architecture is not the most computationally efficient compared to more modern
architectures (e.g., ResNet, EfficientNet) that use more innovative strategies like residual connections or depthwise
separable convolutions.
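The repeated block structure can be sketched in a few lines. The snippet below is an illustration, not a full reimplementation: it uses the 3x3/stride-1/padding-1 convolutions and 2x2/stride-2 max pooling described above, with the channel widths (64 to 512) of the commonly published VGG-16 configuration and ReLU activations assumed between convolutions.

```python
import torch.nn as nn

def vgg_block(in_channels, out_channels, num_convs):
    """One VGG building block: repeated 3x3 convs (stride 1, padding 1) + 2x2 max pool."""
    layers = []
    for _ in range(num_convs):
        layers += [
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
        ]
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves spatial dimensions
    return nn.Sequential(*layers)

# VGG-16's 13 convolutional layers, grouped into five blocks (2 + 2 + 3 + 3 + 3 convs).
features = nn.Sequential(
    vgg_block(3, 64, 2),
    vgg_block(64, 128, 2),
    vgg_block(128, 256, 3),
    vgg_block(256, 512, 3),
    vgg_block(512, 512, 3),
)
```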
VGG-19 Architecture:
VGG-19 contains 19 layers with learnable parameters: 16 convolutional layers and
3 fully connected layers.
Key Features of VGG-19

1. Small Convolutional Filters:


○ Like VGG-16, VGG-19 uses 3x3 convolutional filters, which are stacked to capture increasingly complex patterns in the
input. Multiple 3x3 filters allow the model to have a larger effective receptive field while keeping the number of
parameters manageable.
2. Deeper Architecture:
○ VGG-19 has 16 convolutional layers, making it slightly deeper than VGG-16. The extra layers improve its ability to
capture fine-grained features, but also increase the number of parameters and computational complexity.
3. Max Pooling:
○ Max pooling with 2x2 filters and a stride of 2 is used after every block of convolutional layers to reduce spatial
dimensions and control computational cost.
4. Fully Connected Layers:
○ VGG-19, like VGG-16, has two large fully connected layers (4096 neurons each) before the final classification layer.
These dense layers allow the network to integrate information from all the learned features.
5. Consistency:
○ The architecture is highly regular, with repeated blocks of convolution followed by max pooling, making it simple in
design but effective in performance.
Strengths and Weaknesses:

● Strengths:
○ Performance: VGG-19 performs very well in image classification tasks and has strong
generalization ability.
○ Modularity: The repeated blocks make the network design modular and easy to implement.
○ Transfer Learning: VGG-19, like VGG-16, is popular for transfer learning, meaning it can be used
as a pre-trained model for other tasks (see the sketch after this list).
● Weaknesses:
○ High Computational Cost: VGG-19 has about 143 million parameters, making it very expensive
in terms of memory and computation, especially compared to more recent architectures like ResNet.
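To illustrate the transfer-learning point above, the sketch below loads a pre-trained VGG-19 from torchvision, freezes its convolutional feature extractor, and replaces the final classification layer for a new task. The `weights` argument name and the 10-class example are assumptions for illustration; the exact argument may differ between torchvision versions.

```python
import torch.nn as nn
from torchvision import models

# Load VGG-19 with ImageNet weights (older torchvision versions used pretrained=True instead).
model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)

# Freeze the convolutional feature extractor so only the new head is trained.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for a hypothetical 10-class task.
num_classes = 10
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, num_classes)
```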
AlexNet Architecture:
Key Innovations in AlexNet

1. ReLU Activation:
○ AlexNet popularized the use of the ReLU activation function, which introduced non-linearity into the model. ReLU is
computationally efficient and helps mitigate the vanishing gradient problem that plagued earlier models using sigmoid or
tanh activations.
2. GPU Training:
○ AlexNet was one of the first deep learning models to make extensive use of GPUs to accelerate training. In fact, AlexNet
was trained on two Nvidia GTX 580 GPUs, splitting the model across GPUs and processing mini-batches in parallel.
3. Dropout:
○ Dropout, popularized by AlexNet, is a regularization technique that randomly sets a fraction of neurons to zero during
training, which helps prevent overfitting.
4. Data Augmentation:
○ AlexNet used data augmentation techniques like random cropping, horizontal flipping, and image translations to artificially
increase the size of the training set and improve generalization.
5. Local Response Normalization (LRN):
○ LRN was used to normalize activations by applying competition across neurons in a local neighborhood. It is rarely used in
modern architectures, which typically rely on Batch Normalization instead (a short sketch of several of these techniques follows).
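The snippet below shows how several of the techniques listed above look in PyTorch: AlexNet-style data augmentation transforms, a ReLU activation, Local Response Normalization, and dropout. The parameter values (crop size, filter counts, dropout rate, LRN settings) are illustrative assumptions rather than an exact reproduction of the original model.

```python
import torch.nn as nn
from torchvision import transforms

# Data augmentation in the spirit of AlexNet: random crops and horizontal flips
# artificially enlarge the training set (the 224 crop size is illustrative).
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# One convolution block followed by ReLU and Local Response Normalization,
# plus a dropout layer as used in the fully connected part of the network.
block = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),          # large first-layer filters
    nn.ReLU(inplace=True),                                # cheap non-linearity
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75),  # competition across nearby channels
    nn.MaxPool2d(kernel_size=3, stride=2),
)
dropout = nn.Dropout(p=0.5)  # randomly zeroes half of the activations during training
```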
