06 Transfer Learning With Tensorflow Part 3 Scaling Up
Part 3: Scaling up
Where can you get help?
“If in doubt, run the code”
Model learns patterns/weights from a similar problem space. Patterns get used/tuned to a specific problem.
Why use transfer learning?
• Can leverage an existing neural network architecture proven to work on problems similar to our own
• Can leverage a working network architecture which has already learned patterns on similar data to our own (often results in great results with less data)
What we're going to cover:
• Fine-tuning our feature extraction model (🍔👁 Food Vision mini) to beat the original Food101 paper with only 10% of the data
• Making predictions with Food Vision mini on our own custom images
👩‍🍳👩‍🔬 (we'll be cooking up lots of code!)
How: serial experimentation. Start with a small dataset and a small model, then scale up to an extra-large dataset and an extra-large model (everything you've got).

The dataset we're using in Transfer Learning with TensorFlow Part 3: Scaling up builds on Food101. For example, the pizza_steak subset covers pizza and steak (2 classes), with 750 training images and 250 test images of pizza and steak per class (the same splits as the original Food101 dataset).
Let’s code!
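To make the dataset concrete, here's a minimal sketch of loading it with tf.keras.preprocessing.image_dataset_from_directory (the directory paths below are hypothetical placeholders for wherever the downloaded data lives):

```python
import tensorflow as tf

IMG_SIZE = (224, 224)

# Training data (hypothetical path)
train_data = tf.keras.preprocessing.image_dataset_from_directory(
    "10_food_classes_10_percent/train",
    label_mode="categorical",  # one-hot labels for multi-class classification
    image_size=IMG_SIZE,
    batch_size=32)

# Test data (hypothetical path); shuffle=False keeps evaluation order stable
test_data = tf.keras.preprocessing.image_dataset_from_directory(
    "10_food_classes_10_percent/test",
    label_mode="categorical",
    image_size=IMG_SIZE,
    batch_size=32,
    shuffle=False)
```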
What are callbacks?
• Callbacks are a tool which can add helpful functionality to your models during training, evaluation or inference
• TensorBoard (tf.keras.callbacks.TensorBoard()): log the performance of multiple models, then view and compare these models in a visual way on TensorBoard (a dashboard for inspecting neural network parameters). Helpful to compare the results of different models on your data.
• Model checkpointing (tf.keras.callbacks.ModelCheckpoint()): save your model as it trains so you can stop training if needed and come back to continue off where you left. Helpful if training takes a long time and can't be done in one sitting.
• Early stopping (tf.keras.callbacks.EarlyStopping()): leave your model training for an arbitrary amount of time and have it stop training automatically when it ceases to improve. Helpful when you've got a large dataset and don't know how long training will take.
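As a rough sketch, the three callbacks above can be created like so (the log directory, checkpoint path and patience value below are example choices, not requirements):

```python
import tensorflow as tf

# TensorBoard: write training logs that can be visualised in TensorBoard
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="logs/feature_extraction")

# Model checkpointing: save the model's weights during training
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath="checkpoints/model_checkpoint.ckpt",
    save_weights_only=True,  # only save the weights (smaller and faster)
    save_best_only=False,    # save at every epoch, not only the best one
    verbose=1)

# Early stopping: stop training when validation loss stops improving
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=3)  # wait 3 epochs without improvement before stopping

# Callbacks get passed to fit(), for example:
# model.fit(train_data, epochs=5, validation_data=test_data,
#           callbacks=[tensorboard_cb, checkpoint_cb, early_stopping_cb])
```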
Original Model vs. Feature Extraction
• Original model: input layer, Layer 1 through Layer 235, output layer (shape = 1000); a working architecture (e.g. EfficientNet) pre-trained on ImageNet.
• Feature extraction: the output layer(s) change (e.g. to 10 outputs) and get trained on new data, while the original model's layers (e.g. EfficientNet) don't update during training.
• Fine-tuning: the top layers (e.g. Layer 234 and Layer 235) are unfrozen and fine-tuned on custom data (e.g. 10 classes of food), while the bottom layers (may) stay frozen. Fine-tuning usually requires more data than feature extraction.
Feature extraction: take the underlying patterns (also called weights) a pretrained model has learned and adjust its outputs to be more suited to your problem. Helpful if you have a small amount of custom data (similar to what the original model was trained on) and want to utilise a pretrained model to get better results on your specific problem. Most of the layers in the original model remain frozen during training (only the top 1-3 layers get updated).

Input data → Model → Output (feature vector): the model learns a feature representation of the input data.
Model we've created: Input data → Data augmentation → EfficientNetB0 architecture → GlobalAveragePooling2D → Fully-connected (dense) classifier layer (10 outputs).
EfficientNetB0 architecture source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
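A minimal sketch of building a model like the one diagrammed above with the Keras Functional API, assuming a recent TensorFlow 2 release where the augmentation layers live in tf.keras.layers (the augmentation settings and layer names are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Data augmentation block (runs as part of the model)
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2),
], name="data_augmentation")

# Pre-trained EfficientNetB0 base used as a frozen feature extractor
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False  # freeze all of the base model's layers

# Functional API: input -> augmentation -> base model -> pooling -> classifier
inputs = layers.Input(shape=(224, 224, 3), name="input_layer")
x = data_augmentation(inputs)
x = base_model(x, training=False)  # keep batch norm layers in inference mode
x = layers.GlobalAveragePooling2D(name="global_average_pooling")(x)
outputs = layers.Dense(10, activation="softmax", name="output_layer")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])
```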
EfficientNet feature extractor: input data (10 classes of Food101) flows through the EfficientNetB0 architecture, which stays the same (frozen, pre-trained on ImageNet), into a fully-connected (dense) classifier layer, which changes (same shape as the number of classes, 10).
EfficientNetB0 architecture source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
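One way to check which parts of such a model are frozen is to inspect each layer's trainable attribute (variable names follow the sketch above):

```python
# Top-level layers of the functional model
for layer in model.layers:
    print(layer.name, "trainable:", layer.trainable)

# Layers inside the (frozen) base model should all report trainable=False
for i, layer in enumerate(base_model.layers[:5]):
    print(i, layer.name, layer.trainable)
```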
EfficientNet fine-tuning: input data (10 classes of Food101). Layers closer to the output layer get unfrozen/fine-tuned first: the unfrozen layers and the fully-connected (dense) classifier layer (10 outputs) change, while the frozen layers stay the same (pre-trained on ImageNet). Bottom layers tend to stay frozen (or are the last to get unfrozen).
EfficientNetB0 architecture source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
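A sketch of unfreezing only the top layers of the base model for fine-tuning (unfreezing the last 10 layers is an example choice, not a rule):

```python
# Unfreeze the whole base model first...
base_model.trainable = True

# ...then re-freeze everything except the last 10 layers,
# so only the layers closest to the output get fine-tuned
for layer in base_model.layers[:-10]:
    layer.trainable = False
```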
The fine-tuning diagram above repeats with progressively lower learning rates (0.0001, 0.000025, 0.00001, 0.0000025, 0.000001, 0.00000025, 0.0000001, 0.000000025), the idea being that fine-tuning uses a lower learning rate than feature extraction so the pre-trained weights only get adjusted slightly.
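After unfreezing, the model is usually recompiled with a lower learning rate before training continues; a sketch with example values (a learning rate about 10x lower than the Adam default is a common starting point):

```python
# Recompile so the newly unfrozen layers train with a lower learning rate
model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),  # 10x lower than default 0.001
              metrics=["accuracy"])

# Continue training from where the feature extraction phase finished
# (epoch numbers here are examples)
fine_tune_history = model.fit(train_data,
                              epochs=10,         # total epochs including earlier ones
                              initial_epoch=5,   # start counting from epoch 5
                              validation_data=test_data)
```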
Anatomy of a confusion matrix: correct predictions (true positives, true negatives) sit on the diagonal, while false positives and false negatives sit off the diagonal; the predicted label (0, 1) runs along one axis and the true label along the other. Some common things to look for when inspecting it: a wrong label? A confusing class?
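A minimal sketch of building a confusion matrix from a trained model's predictions using scikit-learn and matplotlib (the model and test_data names follow the earlier sketches and assume test_data was loaded with shuffle=False so predictions line up with labels):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Predicted class indices (highest probability per sample)
pred_probs = model.predict(test_data)
y_pred = pred_probs.argmax(axis=1)

# True class indices (labels are one-hot because label_mode="categorical")
y_true = np.concatenate([labels.numpy().argmax(axis=1)
                         for _, labels in test_data])

# Rows = true labels, columns = predicted labels
cm = confusion_matrix(y_true, y_pred)
disp = ConfusionMatrixDisplay(cm, display_labels=test_data.class_names)
disp.plot(xticks_rotation="vertical")
plt.show()
```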