04 Evaluation
Vincenzo Lomonaco
University of Pisa & ContinualAI
[email protected]
TABLE OF CONTENTS
01 Evaluation Protocol
02 Continual Learning Metrics
03 Avalanche Metrics & Loggers
Evaluation
Protocols
Classic ML Evaluation
Variations allowed
● K-fold Cross-Validation
● Leave-one-out
● ...
Different Data
A Simple Extension to CL
● Training phase: train the model on training sets of each experience, sequentially
● Test phase: evaluate the model on the sets of the experiences (order does not matter)
● Cross-validation and hyper-parameter selection can be based on the final aggregate metric
computed at the end of training.
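The two phases above can be sketched in a few lines of Python. This is a minimal illustration, not a real learner: MemorizingModel, train_then_test and the toy streams are hypothetical names introduced here, and the "model" simply remembers the pairs it has seen.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[int, int]            # (input, label): toy stand-in for real data
Experience = Sequence[Example]


class MemorizingModel:
    """Hypothetical toy model: remembers every training pair it has seen."""

    def __init__(self) -> None:
        self.table: Dict[int, int] = {}

    def fit(self, data: Experience) -> None:
        for x, y in data:
            self.table[x] = y

    def accuracy(self, data: Experience) -> float:
        correct = sum(1 for x, y in data if self.table.get(x) == y)
        return correct / len(data)


def train_then_test(train_stream: List[Experience],
                    test_stream: List[Experience]) -> Tuple[List[float], float]:
    model = MemorizingModel()
    # Training phase: one experience at a time, in stream order.
    for experience in train_stream:
        model.fit(experience)
    # Test phase: every test experience, in any order.
    per_experience = [model.accuracy(exp) for exp in test_stream]
    aggregate = sum(per_experience) / len(per_experience)
    return per_experience, aggregate


train_stream = [[(0, 0), (1, 0)], [(2, 1), (3, 1)]]
test_stream = [[(0, 0), (1, 0)], [(2, 1), (9, 1)]]   # input 9 was never seen
per_experience, aggregate = train_then_test(train_stream, test_stream)
print(per_experience, aggregate)
```

The aggregate value returned here is the "final aggregate metric" that model selection would be based on.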
When and What to Test On
When to test?
On what to test?
● Current experience
● Future experiences
● Past experiences
● All experiences
● …
Continuous Learning in Single-Incremental-Task Scenarios. Maltoni & Lomonaco, Neural Networks Journal 2019.
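One way to keep every "when" and "what" option available is to test on all experiences after each training experience and store the results in a matrix R, where R[i][j] is the accuracy on test experience j after training on experience i. The sketch below assumes a toy dictionary "model"; accuracy_matrix and views are hypothetical helper names.

```python
from typing import Dict, List


def accuracy_matrix(train_stream: List[Dict[int, str]],
                    test_stream: List[Dict[int, str]]) -> List[List[float]]:
    """R[i][j]: accuracy on test experience j after training on experiences 0..i."""
    memory: Dict[int, str] = {}          # toy model: a lookup table
    R: List[List[float]] = []
    for train_exp in train_stream:
        memory.update(train_exp)         # "train" on the current experience
        row = []
        for test_exp in test_stream:     # test on every experience
            hits = sum(1 for x, y in test_exp.items() if memory.get(x) == y)
            row.append(hits / len(test_exp))
        R.append(row)
    return R


def views(R: List[List[float]], i: int) -> dict:
    """Slice row i of R into the test sets listed on the slide."""
    return {
        "current": R[i][i],
        "past": R[i][:i],
        "future": R[i][i + 1:],
        "all": R[i],
    }


stream = [{0: "a", 1: "a"}, {2: "b", 3: "b"}, {4: "c", 5: "c"}]
R = accuracy_matrix(stream, stream)      # toy setup: test data equals train data
after_second = views(R, 1)
print(after_second)
```

Testing after every experience costs more than testing once at the end, so the choice depends on which of these views the experiment needs.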
Is it Enough for CL?
● Split by experiences: model selection on a first set of experiences
[Figure: train split and test split]
Efficient Lifelong Learning with A-GEM. Chaudhry et al., ICLR 2019.
Hyper-parameters Selection for CL
● As mentioned, hyper-parameter selection can be based on the final aggregate metric computed at the
end of training
● But this may be seen as a form of cheating: we select the hyper-parameters that maximize
performance on a specific sequence of training experiences
● We can partially address this with several runs, each using a random order of the training experiences
● This may still be unfair: we should calibrate hyper-parameters on a limited set of experiences
Class-incremental learning: survey and performance evaluation on image classification. Masana et al., 2020.
A continual learning survey: Defying forgetting in classification tasks. De Lange et al., 2019.
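The last two points can be combined in a short sketch: hyper-parameters are chosen on a limited first set of experiences, and the chosen value is then scored on the remaining experiences over several random orders. Everything here is an assumption made for illustration: select_then_evaluate, toy_run, the candidate values and the scoring rule are hypothetical, and toy_run stands in for a full training and evaluation run.

```python
import random
from typing import Callable, List, Sequence, Tuple


def select_then_evaluate(
    experiences: Sequence[str],
    candidates: Sequence[float],
    run: Callable[[Sequence[str], float], float],
    n_selection: int,
    n_orders: int = 5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Pick a hyper-parameter on the first experiences, score it on the rest."""
    selection = experiences[:n_selection]
    evaluation = experiences[n_selection:]

    # Model selection sees only the limited set of experiences.
    best = max(candidates, key=lambda h: run(selection, h))

    # Final score: average over several random orders of the remaining stream.
    rng = random.Random(seed)
    scores: List[float] = []
    for _ in range(n_orders):
        order = list(evaluation)
        rng.shuffle(order)
        scores.append(run(order, best))
    return best, sum(scores) / len(scores)


def toy_run(stream: Sequence[str], learning_rate: float) -> float:
    # Hypothetical stand-in for "train on the stream, return the final metric".
    return 1.0 - abs(learning_rate - 0.1)


experiences = [f"exp{i}" for i in range(10)]
best_lr, mean_score = select_then_evaluate(
    experiences, candidates=[0.01, 0.1, 1.0], run=toy_run, n_selection=3
)
print(best_lr, mean_score)
```

The key design point is that the experiences used for the final score never influence the choice of hyper-parameters.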
A More Articulated Protocol: An Example
Efficient Lifelong Learning with A-GEM. Chaudhry et al., ICLR 2019.
Continual Learning
Metrics
What to Monitor?
Gradient Episodic Memory for Continual Learning. Lopez-Paz & Ranzato, NeurIPS 2017.
Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges, Lesort et al. Information Fusion, 2020.
Accuracy
Q: How accurate is my model?
ACC Metric
A Metric
Don't forget, there is more than forgetting: new metrics for Continual Learning, Díaz-Rodríguez et al. CL Workshop @ NeurIPS, 2018.
Gradient Episodic Memory for Continual Learning. Lopez-Paz & Ranzato, NeurIPS 2017.
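Both accuracy metrics can be computed from a matrix R in which R[i][j] is the test accuracy on experience j after training on experience i. The sketch below assumes the definitions in the two cited papers: ACC averages the last row of R, while A averages every entry on or below the diagonal. The function names and the numbers in R are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def acc_metric(R: Matrix) -> float:
    """ACC: mean accuracy over all experiences after training on the last one."""
    T = len(R)
    return sum(R[T - 1]) / T


def a_metric(R: Matrix) -> float:
    """A: mean of R[i][j] for i >= j, i.e. accuracy on everything seen so far."""
    N = len(R)
    total = sum(R[i][j] for i in range(N) for j in range(i + 1))
    return total / (N * (N + 1) / 2)


# Hypothetical accuracy matrix for a stream of three experiences.
R = [
    [0.9, 0.2, 0.1],
    [0.7, 0.8, 0.3],
    [0.6, 0.7, 0.9],
]
acc = acc_metric(R)
a = a_metric(R)
print(round(acc, 4), round(a, 4))
```

ACC only looks at the final model, while A also rewards good accuracy at every intermediate step of the stream.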
Forward Transfer
FWT Metric
Gradient Episodic Memory for Continual Learning. Lopez-Paz & Ranzato, NeurIPS 2017.
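A sketch of forward transfer, assuming the definition in the cited paper: for each experience after the first, compare the accuracy obtained just before training on it with the accuracy b of a randomly initialised model, then average. R[i][j] is the accuracy on experience j after training on experience i; the values of R and b are hypothetical.

```python
from typing import List


def forward_transfer(R: List[List[float]], b: List[float]) -> float:
    """FWT: mean of R[i-1][i] - b[i] over experiences i = 1 .. T-1 (0-indexed)."""
    T = len(R)
    gains = [R[i - 1][i] - b[i] for i in range(1, T)]
    return sum(gains) / (T - 1)


# Hypothetical accuracy matrix and random-initialisation baseline.
R = [
    [0.9, 0.2, 0.1],
    [0.7, 0.8, 0.3],
    [0.6, 0.7, 0.9],
]
b = [0.1, 0.1, 0.1]
fwt = forward_transfer(R, b)
print(round(fwt, 4))
```

A positive value means that learning earlier experiences helped on experiences the model had not yet been trained on.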
Backward Transfer
BWT Metric
FORGETTING = - BWT
Gradient Episodic Memory for Continual Learning. Lopez-Paz & Ranzato, NeurIPS 2017.
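A sketch of backward transfer, assuming the definition in the cited paper: for each experience except the last, compare the final accuracy on it with the accuracy measured right after it was learned. Forgetting is taken as the negative of BWT, as stated above. R[i][j] is the accuracy on experience j after training on experience i; the numbers are hypothetical.

```python
from typing import List


def backward_transfer(R: List[List[float]]) -> float:
    """BWT: mean of R[T-1][i] - R[i][i] over experiences i = 0 .. T-2."""
    T = len(R)
    deltas = [R[T - 1][i] - R[i][i] for i in range(T - 1)]
    return sum(deltas) / (T - 1)


# Hypothetical accuracy matrix for a stream of three experiences.
R = [
    [0.9, 0.2, 0.1],
    [0.7, 0.8, 0.3],
    [0.6, 0.7, 0.9],
]
bwt = backward_transfer(R)
forgetting = -bwt
print(round(bwt, 4), round(forgetting, 4))
```

A negative BWT indicates that later training reduced accuracy on earlier experiences.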
Memory
Don't forget, there is more than forgetting: new metrics for Continual Learning, Díaz-Rodríguez et al. CL Workshop @ NeurIPS, 2018.
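One possible memory metric is a model-size efficiency score that penalises growth of the model over the stream. The sketch below assumes the score is the mean ratio between the model size after the first experience and its size after each experience, capped at 1. The exact formula should be checked against the cited paper; the function name and the sizes are hypothetical.

```python
from typing import Sequence


def model_size_efficiency(mem_sizes: Sequence[float]) -> float:
    """Assumed form: min(1, mean(mem_sizes[0] / mem_sizes[i]))."""
    first = mem_sizes[0]
    ratios = [first / size for size in mem_sizes]
    return min(1.0, sum(ratios) / len(ratios))


# Hypothetical model sizes (e.g. in MB) after each of three experiences.
sizes = [100.0, 100.0, 200.0]
ms = model_size_efficiency(sizes)
print(round(ms, 4))
```

A model that never grows scores 1, and the score decreases as parameters are added for new experiences.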
Computation
Don't forget, there is more than forgetting: new metrics for Continual Learning, Díaz-Rodríguez et al. CL Workshop @ NeurIPS, 2018.
Don't Forget: There is More than Forgetting!
Don't forget, there is more than forgetting: new metrics for Continual Learning, Díaz-Rodríguez et al. CL Workshop @ NeurIPS, 2018.
Summing Up
● Choose the metrics you monitor wisely (what are you interested in?)
Q: Which metrics would you monitor to evaluate a continual learner that is deployed and trained on the
edge for image classification tasks?
CLEVA-Compass: A Continual Learning EValuation Assessment Compass
Recall lecture 1: there are various machine learning formulations that have continuous components.
Depending on where inspiration has been drawn from, continual learning setups and evaluation can vary dramatically.
CLEVA-Compass: A Continual Learning EValuation Assessment Compass to Promote Research Transparency and Comparability, arXiv, 2021.
Avalanche Metrics
& Loggers
How to Monitor Experiments?
● Metrics (accuracy, forgetting, CPU Usage…) - you can create your own!
● Loggers to report results in different ways - you can create your own!
● Automatic integration in the training and evaluation loop through the Evaluation Plugin
● A dictionary with all recorded metrics always available for custom use
V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
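The division of labour between metrics, loggers and an evaluation component can be illustrated with a small pure-Python sketch. This is not Avalanche's API: ToyEvaluator, PrintLogger, ListLogger and the key format are hypothetical names, used only to show how metric values are computed once, sent to every logger, and kept in a dictionary for custom use.

```python
from typing import Callable, Dict, List, Sequence, Tuple

MetricFn = Callable[[Sequence[int], Sequence[int]], float]


def accuracy(predictions: Sequence[int], targets: Sequence[int]) -> float:
    hits = sum(1 for p, t in zip(predictions, targets) if p == t)
    return hits / len(targets)


class PrintLogger:
    """Reports each value on standard output."""

    def log(self, name: str, value: float) -> None:
        print(f"{name} = {value:.4f}")


class ListLogger:
    """Stores each value in memory, e.g. for later export."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, float]] = []

    def log(self, name: str, value: float) -> None:
        self.records.append((name, value))


class ToyEvaluator:
    """Hypothetical stand-in for an evaluation plugin."""

    def __init__(self, metrics: Dict[str, MetricFn], loggers: list) -> None:
        self.metrics = metrics
        self.loggers = loggers
        self.all_metrics: Dict[str, List[float]] = {}   # always available

    def after_eval(self, exp_id: int, predictions, targets) -> None:
        for name, fn in self.metrics.items():
            key = f"{name}/exp{exp_id}"
            value = fn(predictions, targets)
            self.all_metrics.setdefault(key, []).append(value)
            for logger in self.loggers:
                logger.log(key, value)


collector = ListLogger()
evaluator = ToyEvaluator({"accuracy": accuracy}, [PrintLogger(), collector])
evaluator.after_eval(0, predictions=[1, 0, 1, 1], targets=[1, 0, 0, 1])
```

Adding a new metric or a new logger only requires one more entry in the dictionary or the list; the training and evaluation loop stays unchanged.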
Let’s Track our Experiments
V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
Interactive Logger Output
V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
Tensorboard Logger in Action
V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
Standalone Metrics
V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
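A standalone metric is one that can be used on its own, outside any training or evaluation loop. The sketch below assumes an update / result / reset life cycle; RunningAccuracy is a hypothetical class written for illustration and is not one of Avalanche's metric classes.

```python
from typing import Sequence


class RunningAccuracy:
    """Hypothetical standalone metric: accuracy accumulated over mini-batches."""

    def __init__(self) -> None:
        self.correct = 0
        self.seen = 0

    def update(self, predictions: Sequence[int], targets: Sequence[int]) -> None:
        self.correct += sum(1 for p, t in zip(predictions, targets) if p == t)
        self.seen += len(targets)

    def result(self) -> float:
        # Defined as 0.0 before any update, to avoid a division by zero.
        return self.correct / self.seen if self.seen else 0.0

    def reset(self) -> None:
        self.correct = 0
        self.seen = 0


metric = RunningAccuracy()
metric.update([1, 0], [1, 1])     # 1 of 2 correct
metric.update([1, 1], [1, 1])     # 2 of 2 correct
running = metric.result()
metric.reset()
after_reset = metric.result()
print(running, after_reset)
```

Keeping the state inside the metric object lets the same instance be reused across experiences by calling reset between them.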
What’s Next?
● A shared evaluation protocol is achievable only with the help of the community
V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
Avalanche Evaluation Module Demo Session
Avalanche, From Zero To Hero tutorial: Evaluation
Avalanche, From Zero To Hero tutorial: Loggers
Next:
Methodologies [Part 1]
Do you have any questions?
[email protected]
vincenzolomonaco.com
University of Pisa
THANKS