
Continual Learning: On Machines that can Learn Continually

Official Open-Access Course @ University of Pisa, ContinualAI, AIDA

Lecture 4: Evaluation & Metrics

Vincenzo Lomonaco
University of Pisa & ContinualAI
[email protected]
TABLE OF CONTENTS

01 Evaluation Protocols
02 Continual Learning Metrics
03 Avalanche Metrics & Loggers

Evaluation Protocols
Classic ML Evaluation

Train - Validation - Test split

● Model selection: train on the training set, evaluate on the validation set

● Model assessment: train on the training (+ validation) set, evaluate on the test set

Variations allowed

● K-fold Cross-Validation
● Leave-one-out
● ...

Test, training and validation sets (brainstobytes.com)

Evaluating Machine Learning Models, by Alice Zheng, 2015.


Basic CL Evaluation Protocol

Different Data

● Classic Machine Learning -> static dataset

● Continual Learning -> stream of datasets (experiences)

(Figure: each experience provides its own Train Split and Test Split)

A Simple Extension to CL

● Split by patterns: one train-(validation)-test split per experience (or parallel streams of experiences)

● This is the simplest and most common evaluation protocol
Split by Patterns

● Training phase: train the model on the training set of each experience, sequentially

● Test phase: evaluate the model on the test sets of the experiences (order does not matter)

● Examples in the training and test sets are disjoint!

● We may have a single test set or one for each experience

● Multiple evaluation streams are possible (Valid, Test, Out-of-Distribution, etc.)

● Cross-Validation & hyper-parameter selection can be performed based on the final aggregate metric at the end of training (a minimal sketch of this protocol follows below).
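A minimal, framework-agnostic sketch of the "split by patterns" protocol. The run_protocol helper and the train_on / evaluate callables are hypothetical placeholders, not part of any specific library.

# Minimal sketch of the "split by patterns" protocol.
# train_on and evaluate are placeholders for your own training/evaluation routines.

def run_protocol(model, train_stream, test_stream, train_on, evaluate):
    """Train sequentially on each experience, then evaluate on every test experience."""
    results = []  # results[i][j] = metric on test experience j after training on experience i
    for train_exp in train_stream:
        train_on(model, train_exp)                  # sequential training, one experience at a time
        results.append([evaluate(model, test_exp)   # evaluation order does not matter
                        for test_exp in test_stream])
    return results

# Toy usage with dummy callables, just to show the control flow:
accs = run_protocol(model=None,
                    train_stream=["e1", "e2", "e3"],
                    test_stream=["t1", "t2", "t3"],
                    train_on=lambda m, e: None,
                    evaluate=lambda m, t: 0.0)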
When and What to Test On

When to test?

● At the end of each experience, usually.


● A finer granularity is always possible (epochs, iterations, etc.)

On what to test?

● Current experience
● Future experiences
● Past experiences
● All experiences
● …

...depending also on the metrics you want to use!


Growing vs Fixed Test Set

Growing Test Set

● We consider only the test set of the current and previously encountered experiences

● Compute the performance metrics averaged over those

Fixed test set

● Common for some benchmarks

● Clear view on overall system performance

● Recover experience-wise performance, if needed

Continuous Learning in Single-Incremental-Task Scenarios. Maltoni & Lomonaco, Neural Networks Journal 2019.
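A small sketch contrasting the two policies, assuming an accuracy matrix acc[i][j] (accuracy on test experience j after training on experience i) has already been collected, e.g. with the protocol sketched earlier; the numbers below are made up for illustration.

# Sketch: averaging the accuracy matrix under the two test-set policies.

def growing_test_set_curve(acc):
    # After training on experience i, average only over experiences seen so far (j <= i).
    return [sum(row[: i + 1]) / (i + 1) for i, row in enumerate(acc)]

def fixed_test_set_curve(acc):
    # After each training experience, average over the full, fixed set of test experiences.
    return [sum(row) / len(row) for row in acc]

acc = [[0.9, 0.1, 0.1],
       [0.7, 0.9, 0.1],
       [0.6, 0.8, 0.9]]
print(growing_test_set_curve(acc))  # [0.9, 0.8, 0.766...]
print(fixed_test_set_curve(acc))    # [0.366..., 0.566..., 0.766...]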
Is it Enough for CL?

● Split by patterns: one train-validation-test split per experience (or parallel streams of experiences)

● But is it enough for Continual Learning? -> we would like a way to evaluate whether we are actually able to learn continually!

● Split by experiences: model selection on a first set of experiences, model assessment on a second set of experiences

● Model assessment should also involve training.

Efficient Lifelong Learning with A-GEM. Chaudhry et al., ICLR 2019.
Hyper-parameters Selection for CL

● We mentioned that hyper-parameter selection can be performed based on the final aggregate metric at the end of training

● But this may be seen as a form of cheating: we select the hyper-parameters that maximize the performance on one specific sequence of training experiences

● We may partially address this with several runs over random orderings of the training experiences (sketch below)

● This may still be unfair: we should calibrate hyper-parameters on a limited set of experiences

Class-incremental learning: survey and performance evaluation on image classification. Masana et al., 2020.
A continual learning survey: Defying forgetting in classification tasks. De Lange et al., 2019.
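A sketch of this idea, assuming a hypothetical build_and_run callable that trains a fresh model on the given experience order and returns the final aggregate metric.

# Sketch: repeat the run over randomly shuffled experience orders, so hyper-parameters
# are not tuned to one specific sequence. build_and_run is a placeholder callable.
import random

def evaluate_config(config, experiences, build_and_run, n_runs=5, seed=0):
    rng = random.Random(seed)
    finals = []
    for _ in range(n_runs):
        order = list(experiences)
        rng.shuffle(order)                      # random order of the training experiences
        finals.append(build_and_run(config, order))
    return sum(finals) / len(finals)            # average final metric across runs

# Toy usage with a dummy runner:
print(evaluate_config({"lr": 0.01}, ["e1", "e2", "e3"],
                      build_and_run=lambda cfg, order: 0.5))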
A more Articulated Protocol: An Example

● Model selection: train the model on a first split of experiences, select the best hyper-parameters with a cross-validation scheme.

● Model assessment: train & evaluate the CL algorithm on a second split of experiences.

Efficient Lifelong Learning with A-GEM. Chaudhry et al., ICLR 2019.
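A compact sketch of this protocol under the same assumptions as the previous snippets; evaluate_config and assess are hypothetical callables standing in for model selection and assessment runs, not the A-GEM authors' code.

# Sketch of the A-GEM-style protocol: a first split of experiences (D_CV) for model
# selection, a disjoint second split (D_EV) for model assessment.

def articulated_protocol(experiences, configs, evaluate_config, assess, n_cv=3):
    d_cv, d_ev = experiences[:n_cv], experiences[n_cv:]
    # Model selection: pick the config with the best aggregate metric on D_CV.
    best = max(configs, key=lambda cfg: evaluate_config(cfg, d_cv))
    # Model assessment: train & evaluate the CL algorithm from scratch on D_EV.
    return best, assess(best, d_ev)

# Toy usage with dummy callables:
best, score = articulated_protocol(
    experiences=["e1", "e2", "e3", "e4", "e5", "e6"],
    configs=[{"lr": 0.1}, {"lr": 0.01}],
    evaluate_config=lambda cfg, exps: cfg["lr"],
    assess=lambda cfg, exps: 0.0)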
Continual Learning Metrics
What to Monitor?

● Performance on current experience


● Performance on past experiences
● Performance on future experiences
● Resource consumption
(Memory / CPU / GPU / Disk usage)
● Model size growth
(with respect to the first model)
● Execution time
● Data efficiency
● ...

Gradient Episodic Memory for Continual Learning. Lopez-Paz & Ranzato, NeurIPS 2017.
Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges, Lesort et al. Information Fusion, 2020.
Accuracy
Q: How accurate is my model?

In many different flavors

● Accuracy on the current experience


● Accuracy on previous experiences (plus the current one)
● Accuracy on future experiences (plus the current one)

ACC Metric

● After training on all experiences, average accuracy over all the test experiences (see ACC in the sketch below)

A Metric

● Average of the accuracy on all experiences seen so far, at every point in time (see A in the sketch below)

Don't forget, there is more than forgetting: new metrics for Continual Learning. Díaz-Rodríguez et al., CL Workshop @ NeurIPS, 2018.
Gradient Episodic Memory for Continual Learning. Lopez-Paz & Ranzato, NeurIPS 2017.
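A sketch of both metrics computed from a toy accuracy matrix acc[i][j] (accuracy on test experience j after training on experience i); the lower-triangular average used for A follows one common reading of Díaz-Rodríguez et al.

# Sketch: ACC and A from the accuracy matrix.

def ACC(acc):
    # Average accuracy over all test experiences after training on the last one.
    return sum(acc[-1]) / len(acc[-1])

def A(acc):
    # Average over every entry with j <= i: all experiences seen so far, at every point in time.
    vals = [acc[i][j] for i in range(len(acc)) for j in range(i + 1)]
    return sum(vals) / len(vals)

acc = [[0.9, 0.1, 0.1],
       [0.7, 0.9, 0.1],
       [0.6, 0.8, 0.9]]
print(ACC(acc))  # (0.6 + 0.8 + 0.9) / 3 = 0.766...
print(A(acc))    # (0.9 + 0.7 + 0.9 + 0.6 + 0.8 + 0.9) / 6 = 0.8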
Forward Transfer

Q: How much does learning the current experience improve my performance on future experiences?

FWT Metric

● Accuracy on experience i after training up to experience i-1 (i.e., before ever training on experience i)

Minus

● Accuracy on experience i before training on the first experience (model init)

● Averaged over i = 2, ..., T (sketch below)

Gradient Episodic Memory for Continual Learning. Lopez-Paz & Ranzato, NeurIPS 2017.
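A sketch of FWT following the GEM definition, assuming the same accuracy matrix acc[i][j] plus a vector b of accuracies obtained with the randomly initialized model; the toy numbers are made up.

# Sketch of FWT: acc[i][j] as above, b[j] = accuracy on test experience j
# with the untrained (randomly initialized) model.

def FWT(acc, b):
    T = len(acc)
    # (accuracy on experience i after training up to experience i-1) minus the
    # random-init baseline, averaged over experiences i = 2, ..., T.
    return sum(acc[i - 1][i] - b[i] for i in range(1, T)) / (T - 1)

acc = [[0.9, 0.3, 0.2],
       [0.7, 0.9, 0.4],
       [0.6, 0.8, 0.9]]
b = [0.33, 0.33, 0.33]
print(FWT(acc, b))  # ((0.3 - 0.33) + (0.4 - 0.33)) / 2 ≈ 0.02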
Backward Transfer

Q: How much does learning the current experience improve my performance on previous experiences?

BWT Metric

● Accuracy on experience i after training on experience T (the last one)

Minus

● Accuracy on experience i right after training on experience i

● Averaged over i = 1, ..., T-1 (sketch below)

FORGETTING = - BWT

Gradient Episodic Memory for Continual Learning. Lopez-Paz & Ranzato, NeurIPS 2017.
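The corresponding sketch for BWT (and forgetting) over the same kind of toy accuracy matrix.

# Sketch of BWT and forgetting from the accuracy matrix.

def BWT(acc):
    T = len(acc)
    # (accuracy on experience i after training on the last experience) minus
    # (accuracy on experience i right after training on it), averaged over i = 1, ..., T-1.
    return sum(acc[T - 1][i] - acc[i][i] for i in range(T - 1)) / (T - 1)

acc = [[0.9, 0.3, 0.2],
       [0.7, 0.9, 0.4],
       [0.6, 0.8, 0.9]]
print(BWT(acc))   # ((0.6 - 0.9) + (0.8 - 0.9)) / 2 = -0.2
print(-BWT(acc))  # forgetting = -BWT = 0.2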
Memory

Not only performance

● How much space does your model occupy? (MB, # of params, etc.)

● What is the increment in space required for each new experience?

● How much space do you require for additional information (replay buffer, past models, ...)?

Don't forget, there is more than forgetting: new metrics for Continual Learning. Díaz-Rodríguez et al., CL Workshop @ NeurIPS, 2018.
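A minimal PyTorch sketch of how these quantities can be tracked; the model and the (empty) replay buffer below are placeholders for your own.

# Sketch: tracking model size and replay-buffer size with PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

n_params = sum(p.numel() for p in model.parameters())
size_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1024 ** 2
print(f"{n_params} parameters, ~{size_mb:.2f} MB of weights")

replay_buffer = []  # hypothetical: tensors stored for rehearsal
buffer_mb = sum(x.numel() * x.element_size() for x in replay_buffer) / 1024 ** 2
print(f"replay buffer: {len(replay_buffer)} samples, ~{buffer_mb:.2f} MB")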
Computation

Not only performance

● What is the computational overhead during training? (# MACs, running time, GPU/CPU time, ...)

● What about its increment over time?

● What is the computational overhead during inference?

Don't forget, there is more than forgetting: new metrics for Continual Learning. Díaz-Rodríguez et al., CL Workshop @ NeurIPS, 2018.
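A tiny sketch of one simple proxy, wall-clock time per step, using only the standard library; the dummy step stands in for an actual training call.

# Sketch: measuring per-experience running time as a proxy for computational cost.
import time

def timed(step, *args):
    """Run a training/evaluation step and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    out = step(*args)
    return out, time.perf_counter() - start

# Toy usage: time a dummy "training step" and track how the cost evolves over the stream.
_, seconds = timed(lambda: sum(i * i for i in range(10_000)))
print(f"step took {seconds:.4f}s")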
Don't Forget: There is More than Forgetting!

A plethora of other possible metrics

● Accuracy vs offline baseline


● Model Robustness
● Model Plasticity & Capacity
● …

More complex Score Functions

● Additional, more informative derived metrics can be devised as well.

● They can be tuned depending on the specific application goals.

Don't forget, there is more than forgetting: new metrics for Continual Learning. Díaz-Rodríguez et al., CL Workshop @ NeurIPS, 2018.
Summing Up

● Choose an evaluation protocol and declare it (no standard, yet)

● Choose the metrics you monitor wisely (what are you interested in?)

● Do not focus exclusively on performance metrics, if possible

Q: You can achieve low/zero forgetting by occupying a lot of space. How?

Q: Which metrics would you monitor to evaluate a continual learner deployed and trained on the edge for image classification tasks?
CLEVA-Compass: A Continual Learning EValuation Assessment Compass

Recall lecture 1: there are various machine learning formulations that have continuous components.

Depending on where inspiration has been drawn from, continual learning setups and evaluation can vary dramatically.

(A small snapshot of the overall paradigm relationships)

CLEVA-Compass: A Continual Learning EValuation Assessment Compass to Promote Research Transparency and Comparability, arXiv, 2021.
CLEVA-Compass: A Continual Learning EValuation Assessment Compass

● The existence of various scenarios is not a problem; it is actually meaningful, because different applications can desire different things!

● But reproducibility & comparability can be problematic, which is a constant subject in the scientific literature.

● Recently, the CLEVA-Compass has been introduced to promote transparency & comparability.

CLEVA-Compass: A Continual Learning EValuation Assessment Compass to Promote Research Transparency and Comparability, arXiv, 2021.
CLEVA-Compass: A Continual Learning EValuation Assessment Compass

Inner compass level (star plot):
indicates related paradigm inspiration & continual setting configuration (assumptions)

Inner compass level of supervision:
"rings" on the star plot indicate the presence of supervision. Importantly: supervision is individual to each dimension!

Outer compass level:
contains a comprehensive set of practically reported measures

CLEVA-Compass: A Continual Learning EValuation Assessment Compass to Promote Research Transparency and Comparability, arXiv, 2021.
Avalanche Metrics & Loggers
How to Monitor Experiments?

Evaluation module provides

● Metrics (accuracy, forgetting, CPU Usage…) - you can create your own!

● Loggers to report results in different ways - you can create your own!

● Automatic integration in the training and evaluation loop through the Evaluation Plugin

● A dictionary with all recorded metrics always available for custom use

V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
Let’s Track our Experiments

V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
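A minimal example along the lines of the Avalanche "From Zero To Hero" evaluation tutorial; the API names match Avalanche ~0.1.x (the version in use at the time of this lecture) and may differ slightly in newer releases.

# Sketch: metrics + loggers wired into a strategy through the EvaluationPlugin.
import torch
from avalanche.benchmarks.classic import SplitMNIST
from avalanche.models import SimpleMLP
from avalanche.evaluation.metrics import accuracy_metrics, loss_metrics, forgetting_metrics
from avalanche.logging import InteractiveLogger, TensorboardLogger
from avalanche.training.plugins import EvaluationPlugin
from avalanche.training.strategies import Naive

benchmark = SplitMNIST(n_experiences=5)
model = SimpleMLP(num_classes=10)

eval_plugin = EvaluationPlugin(
    accuracy_metrics(minibatch=True, epoch=True, experience=True, stream=True),
    loss_metrics(epoch=True, experience=True, stream=True),
    forgetting_metrics(experience=True, stream=True),
    loggers=[InteractiveLogger(), TensorboardLogger()],
)

strategy = Naive(
    model, torch.optim.SGD(model.parameters(), lr=0.001),
    torch.nn.CrossEntropyLoss(), train_epochs=1, evaluator=eval_plugin,
)

for experience in benchmark.train_stream:           # train sequentially on each experience...
    strategy.train(experience)
    results = strategy.eval(benchmark.test_stream)  # ...and evaluate on the whole test stream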
Interactive Logger Output

-- >> Start of training phase << --


-- Starting training on experience 0 (Task 0) from train stream --
Epoch 0 ended.
Loss_Epoch/train_phase/train_stream/Task000 = 1.1099
Top1_Acc_Epoch/train_phase/train_stream/Task000 = 0.8926
...
-- >> End of training phase << --
-- >> Start of eval phase << --
-- Starting eval on experience 0 (Task 0) from test stream --
> Eval on experience 0 (Task 0) from test stream ended.
Loss_Exp/eval_phase/test_stream/Task000/Exp000 = 0.0208
Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp000 = 0.9981
...
-- >> End of eval phase << --
Loss_Stream/eval_phase/test_stream = 4.4492

V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
Tensorboard Logger in Action

V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
Standalone Metrics

V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
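A sketch of using a metric standalone, outside the training loop; the update signature shown here (predictions, targets, task labels) follows the 0.1.x tutorial and may vary across Avalanche versions.

# Sketch: a standalone Accuracy metric updated manually.
import torch
from avalanche.evaluation.metrics import Accuracy

acc_metric = Accuracy()

predictions = torch.tensor([0, 1, 1, 2])
targets = torch.tensor([0, 1, 2, 2])
acc_metric.update(predictions, targets, task_labels=0)

print(acc_metric.result())   # running accuracy, reported per task label
acc_metric.reset()           # clear the internal state before reusing the metric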
What’s Next?

● Evaluation of a CL algorithm is not only about metrics and loggers.

● More support for the definition of training and evaluation protocols

○ How to perform cross validation in CL?

○ How to evaluate multiple runs?

● The objective of a shared protocol can be achieved only with the help of the community

V. Lomonaco et al. Avalanche: an End-to-End Library for Continual Learning. CLVision Workshop at CVPR 2021.
Avalanche Evaluation Module Demo Session

Avalanche "From Zero To Hero" tutorials: Evaluation, Loggers
Next:
Methodologies [Part 1]
Do you have any questions?

[email protected]
vincenzolomonaco.com
University of Pisa

THANKS
CREDITS: This presentation template was created by Slidesgo,
including icons by Flaticon, and infographics & images by Freepik
