AdvAI Unit4
TRANSFER LEARNING
SEM 8, AI&DS
Dr. Himani Deshpande(TSEC)
UNIT 4: TRANSFER LEARNING
• Introduction to transfer learning, meta learning.
Example: you learn to balance on a bicycle, then use the same skill to learn to ride a scooter.
TRANSFER LEARNING
• In transfer learning, you leverage knowledge (features, weights, etc.) from
previously trained models to train newer models. This also helps when the
newer task has little data.
With transfer learning, we basically try to use what we’ve learned in one task to better
understand the concepts in another.
The weights that a network has learned on “task A” are transferred to a
network performing a new “task B.”
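The weight transfer described above can be sketched in plain Python. This is only an illustration: the layer names, the tiny layer sizes, and the `init_network` and `transfer_weights` helpers are hypothetical, and the "task A" network is randomly initialised as a stand-in for a trained one.

```python
import copy
import random

def init_network(layer_sizes, seed):
    """Randomly initialise one weight matrix per layer (illustrative only)."""
    rng = random.Random(seed)
    return {
        f"layer{i}": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        for i, (n_in, n_out) in enumerate(zip(layer_sizes, layer_sizes[1:]))
    }

def transfer_weights(source, target, skip=()):
    """Return a copy of target whose layers are taken from source, except those in skip."""
    transferred = copy.deepcopy(target)
    for name, weights in source.items():
        if name not in skip:
            transferred[name] = copy.deepcopy(weights)
    return transferred

# Stand-in for a network already trained on task A.
task_a_weights = init_network([4, 3, 2], seed=0)
# Fresh network for task B with the same architecture.
task_b_fresh = init_network([4, 3, 2], seed=1)

# Reuse the hidden layer from task A; keep task B's own output layer.
task_b_init = transfer_weights(task_a_weights, task_b_fresh, skip=("layer1",))
```

Skipping the last layer reflects the usual situation where task B has its own outputs, so only the earlier layers are reused.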
CONSERVATIVE TRAINING
[Figure: conservative training. A second network, trained on the target data, is initialized from the first; its outputs and parameters are kept close to those of the first network.]
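One way to read the "parameter close" idea on this slide is as a penalty on the distance between the fine-tuned parameters and the original ones. The sketch below assumes a squared-distance penalty with a hypothetical strength `lam` on a one-parameter model y = w * x; the slide itself does not specify the form of the penalty.

```python
def fit(xs, ys, w_init, w_source=None, lam=0.0, lr=0.05, steps=500):
    """Fit y = w * x by gradient descent, optionally penalising distance from w_source."""
    w = w_init
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error on the target data.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        if w_source is not None:
            # Gradient of the assumed penalty lam * (w - w_source) ** 2.
            grad += 2 * lam * (w - w_source)
        w -= lr * grad
    return w

w_source = 2.0            # stand-in for a parameter learned on the source data
target_x = [1.0, 2.0]     # very little target data
target_y = [3.0, 6.0]     # on its own, the target data prefers w = 3

w_plain = fit(target_x, target_y, w_init=w_source)
w_conservative = fit(target_x, target_y, w_init=w_source, w_source=w_source, lam=1.0)
print(w_plain, w_conservative)
```

With the penalty switched on, the fitted parameter settles between the source value and the value the small target set would choose alone, which is the intended guard against overfitting to a few target examples.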
TRANSFER LEARNING
Andrew Ng
• MNIST is primarily used for the task of handwritten digit recognition, where the goal
is to train a model to correctly classify images of handwritten digits (0 through 9) into
their respective numerical values.
Image Format: The dataset consists of 28x28 pixel grayscale images of handwritten
digits.
Number of Classes: There are 10 classes, each corresponding to a digit from 0 to 9.
Training and Testing Sets: MNIST is commonly divided into a training set of 60,000
examples and a testing set of 10,000 examples.
CIFAR
• CIFAR stands for the Canadian Institute for Advanced Research, and the CIFAR
datasets refer to a collection of labelled datasets widely used for training and
evaluating machine learning models, particularly in the field of computer vision.
There are several CIFAR datasets, with CIFAR-10 and CIFAR-100 being the most
popular:
• CIFAR-10 consists of 60,000 32x32 colour images in 10 different classes, with 6,000
images per class.
The classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and
truck.
• CIFAR-100 is an extension of CIFAR-10, containing 60,000 32x32 colour images.
However, in CIFAR-100, the images are divided into 100 different classes, each
containing 600 images. The classes in CIFAR-100 are more fine-grained, covering a
broader range of object categories.
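The class and image counts above can be checked with a little arithmetic. The `DatasetSpec` class is a hypothetical helper written for this sketch, and the three colour channels (RGB) are an assumption not stated on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetSpec:
    name: str
    height: int
    width: int
    channels: int          # 3 assumes RGB colour images
    num_classes: int
    images_per_class: int

    @property
    def total_images(self):
        return self.num_classes * self.images_per_class

    @property
    def values_per_image(self):
        return self.height * self.width * self.channels

cifar10 = DatasetSpec("CIFAR-10", 32, 32, 3, 10, 6000)
cifar100 = DatasetSpec("CIFAR-100", 32, 32, 3, 100, 600)

for spec in (cifar10, cifar100):
    print(spec.name, spec.total_images, spec.values_per_image)
```

Both datasets have the same total size and input shape; CIFAR-100 simply spreads the images over ten times as many classes, leaving far fewer examples per class.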
COCO
• COCO addresses three main tasks: object detection, object segmentation, and image captioning. It
provides annotations for these tasks, making it a comprehensive dataset for evaluating models across
multiple visual understanding challenges.
• The dataset includes a large number of images (currently over 200,000 images) collected from a
wide range of everyday scenes. The images are diverse and cover a variety of contexts, making it
more challenging for models to generalize well.
• COCO contains images with 80 common object categories, such as people, animals, vehicles,
household items, and more.
• The availability of rich annotations, diverse images, and multiple tasks make COCO a valuable
resource for training and evaluating models in various aspects of computer vision.
• Researchers often use models pre-trained on COCO for downstream tasks, allowing their models to
benefit from the large and diverse set of images and annotations present in the dataset.
DATASETS
1. Image Classification:
- ImageNet: A large-scale image dataset with millions of labeled images across thousands of categories.
Commonly used for training models like VGG, ResNet, and Inception.
2. Object Detection:
- COCO (Common Objects in Context): A dataset for object detection, segmentation, and captioning.
It includes images with complex scenes and multiple objects.
3. Natural Language Processing (NLP):
- Wikipedia Dump: Large text corpora from Wikipedia articles can be used for language model
pretraining.
- BookCorpus: A dataset containing books for training language models.
- OpenWebText: A large collection of text from the web, commonly used for training models like GPT.
4. Speech Recognition:
- LibriSpeech: A dataset for training Automatic Speech Recognition (ASR) models, containing
audiobooks with transcriptions.
- CommonVoice: A multilingual dataset of voices to train and benchmark speech recognition systems.
5. Facial Recognition:
- Labeled Faces in the Wild (LFW): A dataset for face verification and recognition,
containing images of celebrities collected from the web.
- CelebA: A dataset with celebrity images annotated with various attributes, commonly
used for facial recognition tasks.
6. Medical Imaging:
- ChestX-ray8: A dataset for chest X-ray image classification, labelled with eight
thoracic diseases, including pneumonia.
- ISIC Skin Cancer Dataset: A dataset for skin cancer classification, including various
types of skin lesions.
7. Scene Understanding:
- ADE20K: A dataset for semantic segmentation of scenes, with pixel-level annotations
for diverse indoor and outdoor scenes.
8. Recommendation Systems:
- MovieLens: A dataset for collaborative filtering and recommendation systems,
containing movie ratings from users.
PRE-TRAINED MODELS
VGG-16
ResNet50
Inception
BERT (Bidirectional Encoder Representations from Transformers)
YOLO
etc.
• Convolutional Layers = 13
• Pooling Layers = 5
• Dense Layers = 3
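These layer counts can be reproduced from the usual VGG-16 layout. The list names below are hypothetical; the channel widths, the 224x224 input size, and the 1000-class output follow the standard ImageNet configuration of VGG-16.

```python
# Numbers are convolution output channels; "M" marks a 2x2 max-pooling layer.
VGG16_FEATURES = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                  512, 512, 512, "M", 512, 512, 512, "M"]
# Two hidden dense layers plus the 1000-class output layer.
VGG16_DENSE = [4096, 4096, 1000]

num_conv = sum(1 for layer in VGG16_FEATURES if layer != "M")
num_pool = VGG16_FEATURES.count("M")
num_dense = len(VGG16_DENSE)

# Each pooling layer halves the spatial size of a 224x224 input.
final_size = 224 // (2 ** num_pool)

print(num_conv, num_pool, num_dense, num_conv + num_dense, final_size)
```

The "16" in the name counts only the layers with weights: 13 convolutional plus 3 dense. Pooling layers have no weights and are not included.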
• Freezing a layer means fixing its weights and biases so that training does not
update them, because the knowledge a network has learned is stored in its weights and biases.
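A minimal sketch of freezing, in plain Python rather than a deep learning framework. The `Param` class, the parameter names, and the dummy gradients are all hypothetical; frameworks expose the same idea through a per-parameter or per-layer "trainable" flag.

```python
from dataclasses import dataclass

@dataclass
class Param:
    value: float
    trainable: bool = True

def sgd_step(params, grads, lr=0.1):
    """Apply one gradient step, skipping any parameter that is frozen."""
    for name, p in params.items():
        if p.trainable:
            p.value -= lr * grads[name]

params = {
    "base.weight": Param(0.8),   # stand-in for pre-trained weights
    "base.bias": Param(0.1),
    "head.weight": Param(0.5),   # new layers added for the new task
    "head.bias": Param(0.0),
}

# Freeze the pre-trained base; only the new head stays trainable.
for name, p in params.items():
    if name.startswith("base."):
        p.trainable = False

grads = {name: 1.0 for name in params}   # dummy gradients for illustration
sgd_step(params, grads)
```

After the step, the base parameters are unchanged while the head parameters have moved, so the knowledge stored in the base is preserved.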
VGG16
• While VGG-16 secured 2nd place in ILSVRC 2014, 1st place went to Google's
model GoogLeNet, now better known as Inception.
• The “Inception” micro-architecture was first introduced by Szegedy et al. in their 2014
paper, Going Deeper with Convolutions.
• As a result, the pre-trained BERT model can be fine-tuned with just one
additional output layer to create state-of-the-art models for a wide range of
tasks, such as question answering and language inference.
1. DeepSpeech: Open-source automatic speech recognition (ASR) model by Mozilla.
2. Wav2Vec: Model for unsupervised pretraining of speech representations.
1. OpenFace: A model for facial landmark detection, recognition, and clustering.
2. VGGFace: Pretrained for face recognition tasks.
• Feature extraction
• Fine-tuning
• Thus the pre-trained model acts as a starting point for the new model,
leading to faster convergence than random initialization.
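The convergence claim can be illustrated on a toy problem. The sketch below assumes a one-parameter model y = w * x, with a hypothetical "pre-trained" value learned on a related task (y = 2x) and an arbitrary value standing in for random initialization; real networks behave less tidily, so treat this only as intuition.

```python
def steps_to_converge(xs, ys, w_init, lr=0.05, tol=1e-6, max_steps=10_000):
    """Count gradient-descent steps until the mean squared error drops below tol."""
    w = w_init
    n = len(xs)
    for step in range(max_steps):
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        if loss < tol:
            return step
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return max_steps

xs = [1.0, 2.0, 3.0]
ys = [2.2 * x for x in xs]   # new task: y = 2.2 x

pretrained_w = 2.0           # stand-in for a weight learned on the related task y = 2 x
random_w = -3.0              # stand-in for a random initialization

warm = steps_to_converge(xs, ys, pretrained_w)
cold = steps_to_converge(xs, ys, random_w)
print(warm, cold)
```

Because the pre-trained value already lies close to the solution of the new task, the warm start needs fewer steps than the start from the arbitrary value.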
Auto suggestion
• In self-supervised learning, the model learns one part of the input from
another part of the input.
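Next-word suggestion is one example of this idea: the earlier words of a sentence are the input, and the following word is the label, so no manual annotation is needed. The `next_word_pairs` helper and the sample sentence below are hypothetical and only show how such training pairs might be built from raw text.

```python
def next_word_pairs(text, context_size=2):
    """Turn raw text into (context, target) pairs with no manual labels."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        pairs.append((context, words[i]))
    return pairs

pairs = next_word_pairs("transfer learning reuses knowledge from another task")
for context, target in pairs:
    print(context, "->", target)
```

Each pair comes entirely from the input itself: one part (the context) is used to predict another part (the next word).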
• Effective non-contrastive self-supervised learning (NCSSL) requires an extra predictor
on the online side and no back-propagation (stop-gradient) through the target side.