06 Transfer Learning With Tensorflow Part 3 Scaling Up

This document discusses transfer learning and how it can be used to scale up models with less data. It explains that transfer learning works by leveraging existing neural network architectures and patterns learned from similar data to solve new problems. The document outlines training a transfer learning feature extraction model on a subset of food image classes from the Food101 dataset, then fine-tuning the model to beat the original paper's results using only 10% of the data. It also describes evaluating predictions on test data and making predictions on custom images. Callbacks that can help during transfer learning training are also mentioned.


Transfer Learning with TensorFlow

Part 3: Scaling up
Where can you get help?
“If in doubt, run the code”

• Follow along with the code


• Try it for yourself
• Press SHIFT + CMD + SPACE to read the docstring
• Search for it
• Try again
• Ask (don’t forget the Discord chat!) (yes, including the “dumb” questions)
“What is transfer learning?”
Surely someone has spent the time crafting the right model for the job…
Example transfer learning use cases
Computer vision

Natural language processing

[Example emails (To: [email protected]):
“Hey Daniel, This deep learning course is incredible! I can’t wait to use what I’ve learned!” → Not spam
“Hay daniel… C0ongratu1ations! U win $1139239230” → Spam]

Model learns patterns/weights from a similar problem space → patterns get used/tuned to the specific problem.
“Why use transfer learning?”
Why use transfer learning?
• Can leverage an existing neural network architecture proven to work on problems similar to our own

• Can leverage a working network architecture which has already learned patterns on data similar to our own (often achieving great results with less data)

[Diagram: Learn patterns on a wide variety of images (using ImageNet) → EfficientNet architecture (already works really well on computer vision tasks) → Tune patterns/weights to our own problem (Food Vision) → Model performs better than training from scratch]
What we’re going to cover
(broadly)
• Downloading & preparing 10% of all Food101 classes (7500+ training images)

• Training a transfer learning feature extraction model

• Fine-tuning our feature extraction model (🍔 👁Food Vision mini) to beat the
original Food101 paper with only 10% of the data

• Evaluating Food Vision mini’s predictions

• Finding the most wrong predictions (on the test dataset)

• Making predictions with Food Vision mini on our own custom images

👩‍🍳 👩‍🔬
(we’ll be cooking up lots of code!)

How: Serial experimentation

Start small, then go all in with everything you got:
• Small dataset, small model
• Larger dataset, small model
• Larger dataset, larger model
• Xtra-large dataset, xtra-large model
🍔👁 Food Vision: Dataset(s) we’re using
Note: For randomly selected data, the Food101 dataset was downloaded and modified using the Image Data Modification Notebook

Dataset Name | Source | Classes | Training data | Testing data
pizza_steak | Food101 | Pizza, steak (2) | 750 images of pizza and steak (same as original Food101 dataset) | 250 images of pizza and steak (same as original Food101 dataset)
10_food_classes_1_percent | Same as above | Chicken curry, chicken wings, fried rice, grilled salmon, hamburger, ice cream, pizza, ramen, steak, sushi (10) | 7 randomly selected images of each class (1% of original training data) | 250 images of each class (same as original Food101 dataset)
10_food_classes_10_percent | Same as above | Same as above | 75 randomly selected images of each class (10% of original training data) | Same as above
10_food_classes_100_percent | Same as above | Same as above | 750 images of each class (100% of original training data) | Same as above
101_food_classes_10_percent | Same as above | All classes from Food101 (101) | 75 images of each class (10% of the original Food101 training dataset) | 250 images of each class (same as original Food101 dataset)

The dataset we’re using in Transfer Learning with TensorFlow Part 3: Scaling up is 101_food_classes_10_percent.
Let’s code!
What are callbacks?
• Callbacks are a tool which can add helpful functionality to your models during training,
evaluation or inference

• Some popular callbacks include:

Callback name | Use case | Code
TensorBoard | Log the performance of multiple models and then view and compare these models in a visual way on TensorBoard (a dashboard for inspecting neural network parameters). Helpful to compare the results of different models on your data. | tf.keras.callbacks.TensorBoard()
Model checkpointing | Save your model as it trains so you can stop training if needed and come back to continue off where you left off. Helpful if training takes a long time and can't be done in one sitting. | tf.keras.callbacks.ModelCheckpoint()
Early stopping | Leave your model training for an arbitrary amount of time and have it stop training automatically when it ceases to improve. Helpful when you've got a large dataset and don't know how long training will take. | tf.keras.callbacks.EarlyStopping()
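Below is a minimal sketch of how these three callbacks might be created and passed to fit(). The log directory, checkpoint path, monitored metrics and patience values are illustrative choices, not prescribed by the slides.

```python
import tensorflow as tf

# Illustrative paths (not from the slides)
log_dir = "tensorboard_logs/efficientnetb0_feature_extract"
checkpoint_path = "checkpoints/feature_extract.ckpt"

# TensorBoard: log training metrics so different runs can be compared visually
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

# ModelCheckpoint: save the model's weights during training so a long run can be resumed
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                   save_weights_only=True,
                                                   save_best_only=True,
                                                   monitor="val_accuracy")

# EarlyStopping: stop training automatically once validation loss stops improving
early_stopping_cb = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                     patience=3)

# The callbacks are passed to fit(), for example:
# model.fit(train_data, epochs=5, validation_data=test_data,
#           callbacks=[tensorboard_cb, checkpoint_cb, early_stopping_cb])
```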
Original Model vs. Feature Extraction Transfer Learning Model

• Original model: a working architecture (e.g. EfficientNet) trained on a large dataset (e.g. ImageNet), with an output layer of shape = 1000.
• Feature extraction transfer learning model: the same architecture applied to a different dataset (e.g. 10 classes of food). The original model's layers stay the same (frozen, they don't update during training); the input and output layer(s) change, and the new output layer (shape = 10) gets trained on the new data.
Feature extraction vs. Fine-tuning

• Feature extraction: the working architecture (e.g. EfficientNet, pre-trained on ImageNet) stays the same (frozen); only a custom final layer (shape = 10) gets trained on the custom data (e.g. 10 classes of food).
• Fine-tuning: the top layers get unfrozen and fine-tuned on the custom data (changes), while the bottom layers (may) stay frozen (stays the same).
• Fine-tuning usually requires more data than feature extraction.
Kinds of Transfer Learning

Transfer Learning Type | Description | What happens | When to use
Original model ("As is") | Take a pretrained model as it is and apply it to your task without any changes. | The original model remains unchanged. | Helpful if you have the exact same kind of data the original model was trained on.
Feature extraction | Take the underlying patterns (also called weights) a pretrained model has learned and adjust its outputs to be more suited to your problem. | Most of the layers in the original model remain frozen during training (only the top 1-3 layers get updated). | Helpful if you have a small amount of custom data (similar to what the original model was trained on) and want to utilise a pretrained model to get better results on your specific problem.
Fine-tuning | Take the weights of a pretrained model and adjust (fine-tune) them to your own problem. | Some (1-3+), many or all of the layers in the pretrained model are updated during training. | Helpful if you have a large amount of custom data and want to utilise a pretrained model and improve its underlying patterns to your specific problem.
What is a feature vector?

• A feature vector is a learned representation of the input data (a compressed form of the input data based on how the model sees it)
• The model producing it may be pre-trained (e.g. on ImageNet) or trained from scratch

Input data → Model (learns a feature representation of the input data, e.g. EfficientNetB0) → Output (feature vector), e.g. [0.940, 0.242, 0.849…]

EfficientNetB0 architecture. Source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
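As a rough illustration of the idea (not code from the slides), the sketch below passes a single random 224x224 "image" through a pre-trained EfficientNetB0 with global average pooling and prints the shape of the resulting feature vector.

```python
import tensorflow as tf

# Pre-trained EfficientNetB0 without its ImageNet classifier head;
# pooling="avg" collapses the final feature maps into one feature vector per image
base_model = tf.keras.applications.EfficientNetB0(include_top=False,
                                                  weights="imagenet",
                                                  pooling="avg")

# A random tensor stands in for a real image here (values in the 0-255 range)
fake_image = tf.random.uniform(shape=(1, 224, 224, 3), maxval=255.0)

feature_vector = base_model(fake_image)
print(feature_vector.shape)  # (1, 1280): one 1280-dimensional feature vector
```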
Model we’ve created

Input data → Data augmentation → EfficientNetB0 (feature extractor) → GlobalAvgPool2D → Fully-connected (dense) classifier layer (10 outputs) → Output

EfficientNetB0 architecture. Source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
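A hedged sketch of the kind of model the slide describes, built with the Keras Functional API: data augmentation, a frozen EfficientNetB0 backbone, global average pooling and a 10-unit dense classifier. The augmentation layers, input size, layer names and loss/label format are illustrative assumptions (they depend on how the data was loaded).

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative data augmentation block (these layer names need TF >= 2.6)
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2),
], name="data_augmentation")

# Frozen EfficientNetB0 backbone (feature extraction)
base_model = tf.keras.applications.EfficientNetB0(include_top=False,
                                                  weights="imagenet")
base_model.trainable = False  # pre-trained layers don't update during training

inputs = layers.Input(shape=(224, 224, 3), name="input_layer")
x = data_augmentation(inputs)        # augmentation only runs during training
x = base_model(x, training=False)    # keep BatchNorm statistics frozen
x = layers.GlobalAveragePooling2D(name="global_average_pooling")(x)
outputs = layers.Dense(10, activation="softmax", name="output_layer")(x)
model = tf.keras.Model(inputs, outputs)

# Assumes one-hot labels; use "sparse_categorical_crossentropy" for integer labels
model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])
```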
EfficientNet feature extractor

Input data (10 classes of Food101) → EfficientNetB0 backbone stays the same (frozen, pre-trained on ImageNet) → fully-connected (dense) classifier layer changes (10 outputs, same shape as the number of classes)

EfficientNetB0 architecture. Source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
EfficientNet fine-tuning

Input data (10 classes of Food101) → bottom layers stay the same (frozen, pre-trained on ImageNet), top layers change (unfrozen) → fully-connected (dense) classifier layer (10 outputs)

Layers closer to the output layer get unfrozen/fine-tuned first; bottom layers tend to stay frozen (or are the last to get unfrozen).

EfficientNetB0 architecture. Source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
The same slide then repeats with a progressively lower learning rate (0.0001 → 0.000025 → 0.00001 → 0.0000025 → 0.000001 → 0.00000025 → 0.0000001 → 0.000000025), illustrating that when fine-tuning, a lower learning rate is typically used so the unfrozen pre-trained weights only change slightly.
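A hedged sketch of the fine-tuning step these slides illustrate: unfreeze the top layers of the already-trained feature extraction model and recompile with a lower learning rate. The number of unfrozen layers, the learning rate and the ReduceLROnPlateau settings are illustrative, and base_model/model refer to the feature extraction sketch shown earlier.

```python
import tensorflow as tf

# base_model and model come from the feature extraction sketch earlier

# Unfreeze the backbone, then re-freeze everything except the last 10 layers
base_model.trainable = True
for layer in base_model.layers[:-10]:
    layer.trainable = False

# Recompile with a lower learning rate (here 10x lower than Adam's default 0.001)
# so the unfrozen pre-trained weights only change slightly
model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              metrics=["accuracy"])

# Optionally keep lowering the learning rate when validation loss plateaus
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                 factor=0.2,
                                                 patience=2,
                                                 min_lr=1e-7)

# model.fit(train_data, epochs=10, initial_epoch=5,
#           validation_data=test_data, callbacks=[reduce_lr])
```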
(some common)

Classification evaluation methods

Key: tp = True Positive, tn = True Negative, fp = False Positive, fn = False Negative

Metric Name | Metric Formula | Code | When to use
Accuracy | Accuracy = (tp + tn) / (tp + tn + fp + fn) | tf.keras.metrics.Accuracy() or sklearn.metrics.accuracy_score() | Default metric for classification problems. Not the best for imbalanced classes.
Precision | Precision = tp / (tp + fp) | tf.keras.metrics.Precision() or sklearn.metrics.precision_score() | Higher precision leads to fewer false positives.
Recall | Recall = tp / (tp + fn) | tf.keras.metrics.Recall() or sklearn.metrics.recall_score() | Higher recall leads to fewer false negatives.
F1-score | F1-score = 2 * (precision * recall) / (precision + recall) | sklearn.metrics.f1_score() | Combination of precision and recall, usually a good overall metric for a classification model.
Confusion matrix | NA | Custom function or sklearn.metrics.confusion_matrix() | When comparing predictions to truth labels to see where the model gets confused. Can be hard to use with large numbers of classes.
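A minimal sketch of computing these metrics with scikit-learn on toy labels. For multi-class problems (like Food Vision), precision, recall and F1 need an averaging strategy, e.g. average="weighted".

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Toy labels for illustration only
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

print("Accuracy:", accuracy_score(y_true, y_pred))
# "weighted" averages per-class scores by class support (multi-class case)
print("Precision:", precision_score(y_true, y_pred, average="weighted"))
print("Recall:", recall_score(y_true, y_pred, average="weighted"))
print("F1-score:", f1_score(y_true, y_pred, average="weighted"))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```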
Finding the most wrong predictions
• A good way to inspect your model’s performance is to view the wrong predictions with the
highest prediction probability (or highest loss)
• Can reveal insights such as:
• Data issues (wrong labels, e.g. model is right, label is wrong)
• Confusing classes (get better/more diverse data)

[Example images: one prediction where the label may be wrong (“Wrong label?”) and one where the classes are easily confused (“Confusing class”)]
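One way (a sketch, using toy stand-in data) to find the most wrong predictions: take every prediction the model got wrong and sort by prediction probability, so the most confidently wrong examples come first.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for real model outputs (illustrative only):
# pred_probs would normally come from model.predict(test_data)
pred_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
y_labels = np.array([0, 1, 1, 1])                       # true labels
filepaths = ["img_0.jpg", "img_1.jpg", "img_2.jpg", "img_3.jpg"]

pred_classes = pred_probs.argmax(axis=1)     # predicted class per image
pred_confidence = pred_probs.max(axis=1)     # probability of that prediction

results = pd.DataFrame({"img_path": filepaths,
                        "y_true": y_labels,
                        "y_pred": pred_classes,
                        "pred_conf": pred_confidence})
results["pred_correct"] = results["y_true"] == results["y_pred"]

# "Most wrong" = wrong predictions with the highest prediction probability
most_wrong = results[~results["pred_correct"]].sort_values("pred_conf",
                                                           ascending=False)
print(most_wrong)
```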
Anatomy of a confusion matrix

• True positive = model predicts 1 when truth is 1
• True negative = model predicts 0 when truth is 0
• False positive = model predicts 1 when truth is 0
• False negative = model predicts 0 when truth is 1

[Example confusion matrix: true label on the rows, predicted label on the columns. Row 0: 99 (98.0%), 2 (2.0%); Row 1: 0 (0.0%), 99 (100.0%). Correct predictions (true positives, true negatives) sit on the diagonal; false positives are in the top-right cell and false negatives in the bottom-left cell.]
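A small sketch tying these definitions to sklearn.metrics.confusion_matrix for a binary problem (toy labels only). With classes 0 and 1, rows are true labels and columns are predicted labels, so the layout is [[tn, fp], [fn, tp]].

```python
from sklearn.metrics import confusion_matrix

# Toy binary labels for illustration
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

# Rows = true label, columns = predicted label:
# [[tn, fp],
#  [fn, tp]]
cm = confusion_matrix(y_true, y_pred)
print(cm)

tn, fp, fn, tp = cm.ravel()
print(f"tp={tp}, tn={tn}, fp={fp}, fn={fn}")
```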
