Assignment_8
Ans 1 and 2
Using pre-trained architectures to develop an image classification model leverages the knowledge those
models have already learned from large, diverse datasets, so high accuracy can be reached with less task-specific data.
a. Merits:
VGG16: Known for its simplicity, VGG16 has become a standard CNN baseline thanks to its uniform design of
stacked 3x3 convolutional layers.
ResNet: ResNet architectures, such as ResNet-50, introduce residual connections that enable training of very
deep networks by addressing the vanishing gradient problem. This results in improved performance on
complex tasks.
Inception: The Inception model incorporates multi-scale processing by using different kernel sizes within the
same layer, allowing the model to capture features at various scales.
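The residual connection mentioned for ResNet can be illustrated with a minimal NumPy sketch. It is a simplified, fully connected stand-in for a real convolutional block; the function name, shapes, and random weights are assumptions made for illustration and are not ResNet-50's actual layers. The block returns x + F(x), so when F contributes nothing the block reduces to the identity and the input still passes through unchanged.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Return x + F(x), where F is a small two-layer transform with ReLU."""
    hidden = np.maximum(0.0, x @ w1)  # first layer followed by ReLU
    return x + hidden @ w2            # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))           # batch of 2 hypothetical feature vectors
w1 = rng.normal(size=(4, 4))          # illustrative weights, not trained
w2_zero = np.zeros((4, 4))            # makes F(x) == 0 for every input

# With F switched off, the block is exactly the identity mapping.
identity_out = residual_block(x, w1, w2_zero)
print(np.allclose(identity_out, x))
```

Because the identity path is always available, stacking many such blocks does not force the signal (or its gradient) through every weight layer, which is the intuition behind training very deep networks this way.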
b. Demerits:
VGG16: The simplicity of VGG16 comes at a high computational and memory cost: the network has roughly
138 million parameters, most of them in its three fully-connected layers.
ResNet: While ResNet models are powerful, residual blocks add implementation complexity, and the deeper
variants (e.g., ResNet-101, ResNet-152) require more computation and memory than shallower networks.
Inception: The complexity of the Inception architecture can make it difficult to modify and fine-tune for
specific tasks.
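The parameter cost of VGG16 can be checked with simple arithmetic. The sketch below assumes the standard VGG16 configuration (13 convolutional layers with 3x3 kernels, a 224x224 input that is reduced to a 7x7x512 feature map, and a 1000-class ImageNet head); the variable and function names are chosen here for illustration.

```python
# (in_channels, out_channels) for the 13 convolutional layers of VGG16
CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (in_features, out_features) for the 3 fully-connected layers
FC_LAYERS = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """Weights plus biases of one fully-connected layer."""
    return n_in * n_out + n_out

conv_total = sum(conv_params(i, o) for i, o in CONV_LAYERS)
fc_total = sum(fc_params(i, o) for i, o in FC_LAYERS)
total = conv_total + fc_total
fc_share = fc_total / total

print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {total:,}")
print(f"fully-connected share: {fc_share:.1%}")
```

Under these assumptions the fully-connected layers account for close to nine tenths of all parameters, which is why replacing the original head with a smaller one is a common way to shrink a VGG16-based model.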
a. Merits:
Fine-Tuning: Fine-tuning can lead to higher accuracy since the model is adjusted to the specifics of the new
task.
Feature Extraction: This approach is faster and requires less computational power, as only a small part of the
model is being trained.
b. Demerits:
Fine-Tuning: It is computationally expensive and may lead to overfitting if the new dataset is small.
Feature Extraction: While efficient, this might result in lower accuracy as the pre-trained features may not be
optimal for the new task.
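The difference between the two approaches can be sketched with a toy NumPy model. Everything here is an assumption for illustration: a single ReLU layer stands in for the pre-trained backbone, a linear layer stands in for the new classification head, and the data is random. With train_backbone=False only the head is updated (feature extraction); with train_backbone=True both parts are updated (fine-tuning).

```python
import numpy as np

def train(x, y, w_backbone, w_head, train_backbone, lr=0.01, steps=200):
    """Gradient descent on mean squared error; the backbone is optionally frozen."""
    w_backbone, w_head = w_backbone.copy(), w_head.copy()
    n = len(x)
    for _ in range(steps):
        pre = x @ w_backbone
        feats = np.maximum(0.0, pre)           # backbone features (ReLU)
        err = feats @ w_head - y               # prediction error, shape (n, 1)
        grad_head = feats.T @ err * (2.0 / n)
        if train_backbone:                     # fine-tuning: backbone also learns
            grad_feats = (err @ w_head.T) * (pre > 0)
            w_backbone -= lr * (x.T @ grad_feats) * (2.0 / n)
        w_head -= lr * grad_head               # the head is trained in both modes
    return w_backbone, w_head

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))                   # hypothetical inputs
y = rng.normal(size=(64, 1))                   # hypothetical targets
pretrained = rng.normal(scale=0.5, size=(4, 8))  # stands in for pre-trained weights
new_head = np.zeros((8, 1))                    # freshly initialised head

fe_backbone, fe_head = train(x, y, pretrained, new_head, train_backbone=False)
ft_backbone, ft_head = train(x, y, pretrained, new_head, train_backbone=True)

print("feature extraction trains", new_head.size, "parameters")
print("fine-tuning trains", pretrained.size + new_head.size, "parameters")
```

In feature extraction the backbone weights come back unchanged, which is what makes the approach cheap; fine-tuning moves them toward the new task at the cost of more computation and a higher risk of overfitting on small datasets.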
a. Merits:
Multi-Modal Models: They offer robust and aligned semantic representations, which can be beneficial for
tasks involving both visual and textual data.
Zero-Shot Learning: These models can be used without further training, making them highly versatile and
efficient.
Date of Submission: Friday 6th Sep 2024
b. Demerits:
Multi-Modal Models: They may not outperform specialized image classification models on tasks that are
purely visual.
Zero-Shot Learning: The performance can be unpredictable on specific datasets or niche tasks.
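Zero-shot classification with a multi-modal model can be sketched as a similarity search between one image embedding and one text embedding per candidate label. The embeddings below are made-up three-dimensional vectors and the scale factor is an assumed value; in practice both embeddings would come from the model's image and text encoders, which are not shown here.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, scale=100.0):
    """Pick the label whose text embedding is most similar to the image embedding."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = scale * (text_embs @ image_emb)   # scaled cosine similarities
    logits = logits - logits.max()             # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(np.argmax(probs)), probs

labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
# Hypothetical embeddings, one row per label prompt.
text_embs = np.array([[1.0, 0.1, 0.0],
                      [0.1, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
image_emb = np.array([0.2, 0.9, 0.1])          # hypothetical image embedding

best, probs = zero_shot_classify(image_emb, text_embs)
print(labels[best])
```

No weights are updated at any point: changing the task only means changing the list of label prompts, which is what makes the approach versatile and also why its accuracy depends on how well the prompts match the model's training distribution.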
Ans 3
Fine-tuning the deeper layers is generally the better choice for image classification, because they capture
the more abstract, task-specific features. Shallower layers tend to learn general features (e.g., edges, textures)
that transfer well across tasks, so they can usually stay frozen, while the deeper layers are adapted to the
particular dataset or problem.
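As an illustration of this choice, the sketch below marks only the deepest convolutional block of a VGG16-style network as trainable and counts how many parameters that leaves to update. VGG16 is used purely as an example backbone, and the block names and helper functions are hypothetical.

```python
# Output channels of the 3x3 conv layers in each VGG16 block, shallow to deep.
VGG16_BLOCKS = {
    "block1": [64, 64],
    "block2": [128, 128],
    "block3": [256, 256, 256],
    "block4": [512, 512, 512],
    "block5": [512, 512, 512],
}

def block_param_counts(blocks, in_channels=3, k=3):
    """Count weights plus biases per block, chaining channels between layers."""
    counts = {}
    for name, widths in blocks.items():
        total = 0
        for out_channels in widths:
            total += k * k * in_channels * out_channels + out_channels
            in_channels = out_channels
        counts[name] = total
    return counts

def trainable_plan(counts, n_unfrozen):
    """Mark only the deepest n_unfrozen blocks as trainable."""
    names = list(counts)
    cutoff = len(names) - n_unfrozen
    return {name: i >= cutoff for i, name in enumerate(names)}

counts = block_param_counts(VGG16_BLOCKS)
plan = trainable_plan(counts, n_unfrozen=1)
trainable = sum(counts[name] for name, flag in plan.items() if flag)

for name, flag in plan.items():
    print(f"{name}: {'trainable' if flag else 'frozen'} ({counts[name]:,} params)")
print(f"trainable conv parameters: {trainable:,} of {sum(counts.values()):,}")
```

In a deep learning framework the same plan would be applied by disabling gradient updates for the frozen blocks and training the remaining block together with the new classification head.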