Convolutional Neural Network (CNN)
LeNet-5 Architecture
LeNet-5 was originally developed for handwritten digit classification.
● Subsampling Layers (Sx): LeNet-5 uses average pooling in its subsampling layers, which reduces the spatial resolution of the
feature maps. Pooling also makes the network more robust to variations in the position of features, since small shifts in the input
image do not significantly alter the output. Together with the reduced dimensionality, this helps combat overfitting and keeps the
network computationally inexpensive (a pooling sketch follows this list).
● Fully Connected Layers (Fx): The fully connected layers are tasked with making the final decisions or predictions based on the
features extracted by the convolutional and subsampling layers. By connecting all neurons in these layers, the network is able to
combine and weigh features learned across all previous layers, effectively integrating global information from the image to produce
the final output (in LeNet's case, predicting the digit).
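The effect of the subsampling step shows up directly in the tensor shapes. A minimal sketch, assuming PyTorch (the notes do not name a framework): a 2x2 average pool with stride 2 halves each spatial dimension of the feature maps, mirroring LeNet-5's S2 step (the original subsampling layers also had a learned coefficient and bias, omitted here).

import torch
import torch.nn as nn

# 2x2 average pooling with stride 2, as in LeNet-5's subsampling layers
pool = nn.AvgPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 6, 28, 28)   # 6 feature maps of size 28x28 (the size of C1's output)
y = pool(x)
print(y.shape)                  # torch.Size([1, 6, 14, 14]) -- spatial resolution halved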
Architectural Efficiency
LeNet-5 was designed to be both computationally efficient and effective, which was critical at a time when computational resources
were limited. The alternating pattern of convolution and pooling ensures that relevant features are captured while progressively
reducing the spatial complexity of the data. The structured receptive fields ensure that the network captures local patterns in a way that
supports the hierarchical learning of features.
This layered approach of progressively extracting more abstract and complex representations in later layers, followed by fully
connected layers to make the final decision, became a hallmark of modern CNNs.
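The alternating pattern can be written down compactly. A minimal sketch of a LeNet-5-style stack, again assuming PyTorch; it simplifies the original (tanh activations throughout, no trainable subsampling coefficients, no partial C3 connectivity, and a plain linear output instead of the RBF layer).

import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # C1: 1x32x32 -> 6x28x28
    nn.Tanh(),
    nn.AvgPool2d(2, stride=2),        # S2: -> 6x14x14
    nn.Conv2d(6, 16, kernel_size=5),  # C3: -> 16x10x10
    nn.Tanh(),
    nn.AvgPool2d(2, stride=2),        # S4: -> 16x5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),       # C5, acting here as a fully connected layer
    nn.Tanh(),
    nn.Linear(120, 84),               # F6
    nn.Tanh(),
    nn.Linear(84, 10),                # output: 10 digit classes
)

logits = lenet5(torch.randn(1, 1, 32, 32))
print(logits.shape)                   # torch.Size([1, 10])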
VGG-16 Architecture:
The name "VGG-16" reflects the fact that it has 16 layers with learnable weights
(13 convolutional layers and 3 fully connected layers). The architecture follows a
straightforward design philosophy: stacking small convolutional filters (3x3) with
stride 1, padding 1, and using max pooling to reduce the spatial dimensions.
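One such block, sketched in PyTorch (an assumed framework): 3x3 convolutions with stride 1 and padding 1 leave the spatial size unchanged, and the 2x2 max pool then halves it, which is the pattern VGG-16 repeats throughout.

import torch
import torch.nn as nn

# A VGG-style block (channel counts here match VGG-16's second block)
block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

x = torch.randn(1, 64, 112, 112)
print(block(x).shape)   # torch.Size([1, 128, 56, 56])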
Key Features of VGG-16
● Strengths:
○ Performance: VGG-16 performs exceptionally well on image classification tasks, especially on large-scale datasets like
ImageNet.
○ Generalization: The architecture is highly transferable, meaning it works well on other tasks via fine-tuning (e.g., object
detection and segmentation); see the fine-tuning sketch after this list.
○ Simplicity: The uniform use of 3x3 filters and 2x2 max pooling makes the network design simple and effective.
● Weaknesses:
○ Computationally Expensive: VGG-16 has a very large number of parameters (about 138 million), making it
computationally expensive to train and requiring significant memory and processing power.
○ Not the Most Efficient: While deep, the architecture is not the most computationally efficient compared to more modern
architectures (e.g., ResNet, EfficientNet) that use more innovative strategies like residual connections or depthwise
separable convolutions.
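A minimal fine-tuning sketch, assuming torchvision's VGG-16 implementation and a hypothetical 10-class target task: freeze the convolutional feature extractor and replace only the final classifier layer.

import torch.nn as nn
from torchvision import models

model = models.vgg16()   # load ImageNet weights here for real transfer learning

# Freeze the convolutional feature extractor
for p in model.features.parameters():
    p.requires_grad = False

# Replace the last fully connected layer (4096 -> 1000) with a new 10-class head
model.classifier[6] = nn.Linear(4096, 10)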
VGG-19 Architecture:
VGG-19 contains 19 layers with learnable parameters: 16 convolutional layers and
3 fully connected layers.
Key Features of VGG-19
● Strengths:
○ Performance: VGG-19 performs very well in image classification tasks and has strong
generalization ability.
○ Modularity: The repeated blocks make the network design modular and easy to implement.
○ Transfer Learning: VGG-19, like VGG-16, is popular for transfer learning, meaning it can be used
as a pre-trained model for other tasks.
● Weaknesses:
○ High Computational Cost: VGG-19 has about 143 million parameters, making it very expensive
in terms of memory and computation, especially compared to more recent architectures like ResNet (see the parameter count check after this list).
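A quick parameter count check, assuming torchvision is available (the notes name no library); in both models the total is dominated by the three fully connected layers.

from torchvision import models

vgg16 = models.vgg16()
vgg19 = models.vgg19()
print(sum(p.numel() for p in vgg16.parameters()))   # ~138 million
print(sum(p.numel() for p in vgg19.parameters()))   # ~143 million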
AlexNet Architecture:
Key Innovations in AlexNet
1. ReLU Activation:
○ AlexNet popularized the use of the ReLU activation function, which introduced non-linearity into the model. ReLU is
computationally efficient and helps mitigate the vanishing gradient problem that plagued earlier models using sigmoid or
tanh activations.
2. GPU Training:
○ AlexNet was one of the first deep learning models to make extensive use of GPUs to accelerate training. In fact, AlexNet
was trained on two Nvidia GTX 580 GPUs, splitting the model across GPUs and processing mini-batches in parallel.
3. Dropout:
○ Dropout, which AlexNet helped popularize, is a regularization technique that randomly sets a fraction of neurons to zero during
training, which helps prevent overfitting (the sketch after this list shows it alongside the other pieces).
4. Data Augmentation:
○ AlexNet used data augmentation techniques like random cropping, horizontal flipping, and image translations to artificially
increase the size of the training set and improve generalization.
5. Local Response Normalization (LRN):
○ LRN was used to normalize activations by introducing competition across neurons in a local neighborhood. It is closely
associated with AlexNet and is rarely used in modern architectures, which typically rely on Batch Normalization instead.
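A minimal sketch of these pieces in PyTorch/torchvision (assumed here, not specified in the notes): an augmentation pipeline of the kind AlexNet used, the ReLU non-linearity, LRN with AlexNet's published hyperparameters, and dropout with p = 0.5 as used in AlexNet's fully connected layers.

import torch
import torch.nn as nn
from torchvision import transforms

# Data augmentation in the spirit of AlexNet: random 224x224 crops and horizontal flips
augment = transforms.Compose([
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

relu = nn.ReLU(inplace=True)
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)  # AlexNet's LRN settings
dropout = nn.Dropout(p=0.5)                                       # used in AlexNet's FC layers

x = torch.randn(1, 96, 55, 55)   # the shape of AlexNet's first conv output
y = dropout(lrn(relu(x)))        # chained here only to show that shapes are preserved
print(y.shape)                   # torch.Size([1, 96, 55, 55])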