TECHNICAL SPECIFICATION
ISO/IEC TS 4213
First edition
2022-10
Reference number
ISO/IEC TS 4213:2022(E)
© ISO/IEC 2022
Contents Page
Foreword...........................................................................................................................................................................................................................................v
Introduction............................................................................................................................................................................................................................... vi
1 Scope.................................................................................................................................................................................................................................. 1
2 Normative references...................................................................................................................................................................................... 1
3 Terms and definitions..................................................................................................................................................................................... 1
3.1 Classification and related terms.............................................................................................................................................. 1
3.2 Metrics and related terms............................................................................................................................................................. 1
4 Abbreviated terms.............................................................................................................................................................................................. 3
5 General principles............................................................................................................................................................................................... 4
5.1 Generalized process for machine learning classification performance assessment................ 4
5.2 Purpose of machine learning classification performance assessment................................................. 4
5.3 Control criteria in machine learning classification performance assessment............................... 5
5.3.1 General......................................................................................................................................................................................... 5
5.3.2 Data representativeness and bias........................................................................................................................ 5
5.3.3 Preprocessing........................................................................................................................................................................ 5
5.3.4 Training data........................................................................................................................................................................... 5
5.3.5 Test and validation data................................................................................................................................................ 6
5.3.6 Cross-validation................................................................................................................................................................... 6
5.3.7 Limiting information leakage.................................................................................................................................. 6
5.3.8 Limiting channel effects............................................................................................................................................... 6
5.3.9 Ground truth........................................................................................................................................................................... 7
5.3.10 Machine learning algorithms, hyperparameters and parameters......................................... 7
5.3.11 Evaluation environment............................................................................................................................................... 8
5.3.12 Acceleration............................................................................................................................................................................. 8
5.3.13 Appropriate baselines.................................................................................................................... 8
5.3.14 Machine learning classification performance context...................................................................... 8
6 Statistical measures of performance.............................................................................................................................................. 8
6.1 General............................................................................................................................................................................................................ 8
6.2 Base elements for metric computation.............................................................................................................................. 9
6.2.1 General......................................................................................................................................................................................... 9
6.2.2 Confusion matrix................................................................................................................................................................ 9
6.2.3 Accuracy..................................................................................................................................................................................... 9
6.2.4 Precision, recall and specificity............................................................................................................................. 9
6.2.5 F1 score......................................................................................................................................................................................... 9
6.2.6 Fβ........................................................................................................................................................................................................ 9
6.2.7 Kullback-Leibler divergence................................................................................................................................... 10
6.3 Binary classification........................................................................................................................................................................ 10
6.3.1 General...................................................................................................................................................................................... 10
6.3.2 Confusion matrix for binary classification............................................................................................... 11
6.3.3 Accuracy for binary classification.................................................................................................................... 11
6.3.4 Precision, recall, specificity, F1 score and Fβ for binary classification............................. 11
6.3.5 Kullback-Leibler divergence for binary classification.................................................................... 11
6.3.6 Receiver operating characteristic curve and area under the receiver
operating characteristic curve............................................................................................................................ 11
6.3.7 Precision recall curve and area under the precision recall curve....................................... 11
6.3.8 Cumulative response curve.................................................................................................................................... 12
6.3.9 Lift curve................................................................................................................................................................................. 12
6.4 Multi-class classification............................................................................................................................................................. 12
6.4.1 General...................................................................................................................................................................................... 12
6.4.2 Accuracy for multi-class classification......................................................................................................... 12
6.4.3 Macro-average, weighted-average and micro-average.................................................................. 12
6.4.4 Distribution difference or distance metrics............................................................................................ 13
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance
are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria
needed for the different types of document should be noted. This document was drafted in
accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or
www.iec.ch/members_experts/refdocs).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC
list of patent declarations received (see https://patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 42, Artificial intelligence.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
As academic, commercial and governmental researchers continue to improve machine learning models,
consistent approaches and methods should be applied to machine learning classification performance
assessment.
Advances in machine learning are often reported in terms of improved performance relative to the
state of the art or a reasonable baseline. The choice of an appropriate metric to assess machine learning
model classification performance depends on the use case and domain constraints. Further, the
chosen metric can differ from the metric used during training. Machine learning model classification
performance can be represented through the following examples:
— A new model achieves 97,8 % classification accuracy on a dataset where the state-of-the-art model
achieves just 96,2 % accuracy.
— A new model achieves classification accuracy equivalent to the state of the art but requires much
less training data than state-of-the-art approaches.
— A new model generates inferences 100 times faster than state-of-the-art models while maintaining
equivalent accuracy.
To determine whether these assertions are meaningful, aspects of machine learning classification
performance assessment including model implementation, dataset composition and results calculation
are taken into consideration. This document describes approaches and methods to ensure the relevance,
legitimacy and extensibility of machine learning classification performance assertions.
Various AI stakeholder roles as defined in ISO/IEC 22989:2022, 5.17 can take advantage of the
approaches and methods described in this document. For example, AI developers can use the approaches
and methods when evaluating ML models.
Methodological controls are put in place when assessing machine learning performance to ensure that
results are fair and representative. Examples of these controls include establishing computational
environments, selecting and preparing datasets, and limiting leakage that potentially leads to
misleading classification results. Clause 5 addresses this topic.
Merely reporting performance in terms of accuracy can be inappropriate depending on the
characteristics of training data and input data. If a classifier is susceptible to majority class classification,
grossly unbalanced training data can overstate accuracy by representing the prior probabilities of
the majority class. Additional measurements that reflect more subtle aspects of machine learning
classification performance, such as macro-averaged metrics, are at times more appropriate. Further,
different types of machine learning classification, such as binary, multi-class and multi-label, are
associated with specific performance metrics. In addition to these metrics, aspects of classification
performance such as computational complexity, latency, throughput and efficiency can be relevant.
Clause 6 addresses these topics.
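As an illustration of the point above, the following minimal Python sketch (with hypothetical class counts) shows how a majority-class classifier on heavily unbalanced data reports high accuracy while macro-averaged recall exposes the failure on the minority class:

```python
# Hypothetical illustration: with 95 negatives and 5 positives, a classifier
# that always predicts the majority class scores high accuracy but has zero
# recall on the minority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # majority-class classifier

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def class_recall(c):
    """Correct predictions of class c divided by all samples of class c."""
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == c]
    return sum(t == p for t, p in relevant) / len(relevant)

# Macro-average weights each class equally, regardless of class frequency.
macro_recall = (class_recall(0) + class_recall(1)) / 2

print(accuracy)      # 0.95
print(macro_recall)  # 0.5 — reveals the failure on the minority class
```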
Complications can arise as a result of the distribution of training data. Statistical tests of significance
are undertaken to establish the conditions under which machine learning classification performance
differs meaningfully. Specific training, validation and test methodologies are used in machine learning
model development to address the range of potential scenarios. Clause 7 addresses these topics.
Annex A illustrates calculation of multi-class classification performance, using examples of positive and
negative classifications. Annex B illustrates a receiver operating characteristic (ROC) curve derived
from example data in Annex A.
Annex C summarizes results from machine learning classification benchmark tests.
Annex D discusses a chance-corrected cause-specific mortality fraction, a machine learning
classification use case. Apart from these, this document does not address any issues related to
benchmarking, applications or use cases.
1 Scope
This document specifies methodologies for measuring classification performance of machine learning
models, systems and algorithms.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 22989:2022, Information technology — Artificial intelligence — Artificial intelligence concepts
and terminology
ISO/IEC 23053:2022, Framework for Artificial Intelligence (AI) Systems Using Machine Learning (ML)
3 Terms and definitions
For the purposes of this document, the terms and definitions in ISO/IEC 22989:2022, ISO/IEC 23053:2022,
and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.2.2
false negative
miss
type II error
FN
sample wrongly classified as negative
3.2.3
false positive
false alarm
type I error
FP
sample wrongly classified as positive
3.2.4
true positive
TP
sample correctly classified as positive
3.2.5
true negative
TN
sample correctly classified as negative
3.2.6
accuracy
number of correctly classified samples divided by all classified samples
3.2.7
confusion matrix
matrix used to record the number of correct and incorrect classifications (3.1.1) of samples
3.2.8
F1 score
F-score
F-measure
F1-measure
harmonic mean of precision (3.2.9) and recall (3.2.10)
3.2.9
precision
positive predictive value
number of samples correctly classified as positive divided by all samples classified as positive
3.2.10
recall
true positive rate
sensitivity
hit rate
number of samples correctly classified as positive divided by all positive samples
3.2.11
specificity
selectivity
true negative rate
number of samples correctly classified as negative divided by all negative samples
3.2.12
false positive rate
fall-out
number of samples incorrectly classified as positive divided by all negative samples
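The metrics defined in 3.2.6 to 3.2.12 can be derived directly from the four confusion-matrix counts. The following minimal Python sketch, using hypothetical counts, illustrates the computations:

```python
# Hypothetical binary confusion-matrix counts (3.2.2 to 3.2.5).
tp, fp, fn, tn = 40, 10, 5, 45

accuracy    = (tp + tn) / (tp + fp + fn + tn)            # 3.2.6
precision   = tp / (tp + fp)                             # 3.2.9
recall      = tp / (tp + fn)                             # 3.2.10 (true positive rate)
specificity = tn / (tn + fp)                             # 3.2.11 (true negative rate)
fpr         = fp / (fp + tn)                             # 3.2.12 (fall-out)
f1 = 2 * precision * recall / (precision + recall)       # 3.2.8 (harmonic mean)
```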
3.2.13
cumulative response curve
gain chart
graphical method of displaying true positive rate (3.2.10) and the percentage of positive predictions in
the total data across multiple thresholds
3.2.14
lift curve
graphical method of displaying on the y-axis the ratio of true positive rate (3.2.10) between the model
and a random classifier, and on the x-axis the percentage of positive predictions in the total data across
multiple thresholds
3.2.15
precision recall curve
PRC
graphical method for displaying recall (3.2.10) and precision (3.2.9) across multiple thresholds
Note 1 to entry: A PRC is more suitable than a ROC (receiver operating characteristic) curve for showing
performance with imbalanced data.
3.2.16
receiver operating characteristic curve
ROC curve
graphical method for displaying true positive rate (3.2.10) and false positive rate (3.2.12) across multiple
thresholds
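As an illustration, the ROC curve defined above can be traced by sweeping a decision threshold over classifier scores and computing the true positive rate (3.2.10) and false positive rate (3.2.12) at each threshold. The scores and labels in this Python sketch are hypothetical:

```python
# Hypothetical classifier scores and ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   1,   0,   0]

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per decision threshold, highest first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A perfect classifier hugs the top-left corner (TPR 1 at FPR 0); a random classifier traces the diagonal.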
3.2.17
cross-validation
method to estimate the performance of a machine learning method using a single dataset
Note 1 to entry: Cross-validation is typically used for validating design choices before training the final model.
3.2.18
majority class
class with the most samples in a dataset
4 Abbreviated terms
AI artificial intelligence
FC fully connected
5 General principles
5.3.1 General
When assessing machine learning classification performance, consistent approaches and methods
should be applied to demonstrate relevance, legitimacy and extensibility. Special care should be taken
in comparative assessments of multiple machine learning classification models, algorithms or systems
to ensure that no approach is favoured over another.
5.3.2 Data representativeness and bias
Except when done for specific goal-relevant reasons, the training and test data should be as free of
sampling bias as possible. That is, the distribution of features and classes in the training data should be
matched to their distribution in the real world to the extent possible. The training data does not need
to match the eventual use case exactly. For example, in the case of self-driving cars, it can be acceptable
to assess the classification performance of machine learning models trained on closed-circuit tracks
rather than on open roads for prototype systems. The data used to test a machine learning model
should be representative of the intended use of the system.
Data can be skewed, incomplete, outdated, disproportionate or have embedded historical biases.
Such unwanted biases present in the training data can propagate into the trained model and are
detrimental to model training. If the machine learning operating environment is complex and nuanced,
limited training data
will not necessarily reflect the full range of input data. Moreover, training data for a particular task
is not necessarily extensible to different tasks. Extra care should be taken when splitting unbalanced
data into training and test to ensure that similar distributions are maintained between training data,
validation data and test data.
Data capture bias can be based on both the collection device and the collector’s preferences. Label
biases can occur if categories are poorly defined (e.g. similar images can be annotated with different
labels while, due to in-class variability, the same labels can be assigned to visually different images).
For more information on bias in AI systems, see ISO/IEC TR 24027[1].
5.3.3 Preprocessing
Special care should be taken in preprocessing and its impact on performance assessment, especially
in the case of comparative assessment. Depending on the purpose of the evaluation, inconsistent
preprocessing can lead to biased interpretation of the results. In particular, when preprocessing favours
one model over another, their performance gap should not be attributed to the downstream algorithms.
Examples of preprocessing include removal of outliers, resolving incomplete data or filtering out noise.
5.3.4 Training data
Special care should be taken in the choice of training and validation data and how the choice impacts
performance assessment, especially in the case of comparative assessment. Depending on the purpose
of the evaluation, the use of different training data can lead to a biased interpretation of the results. In
particular, in such cases any performance gap should be attributed to the combination of the algorithm
and training data, rather than to just the algorithm.
In the context of model comparison, the training data used to build the respective models can differ.
One can take two models, trained on different training data, and evaluate them against each other on
the same test data.
5.3.5 Test and validation data
The data used to test a machine learning model shall be the same for all machine learning models being
compared. The test and validation data shall contain no samples that overlap with training data.
5.3.6 Cross-validation
Cross-validation is a method to estimate the performance of a machine learning method using a single
dataset.
The dataset is divided into k segments, where one segment is used for test while the rest is used for
training. This process is repeated k times, each time using another segment as the test set. When k is
equal to N, the size of the dataset, this is called leave-one-out cross-validation. When k is smaller than
N, this is called k-fold cross-validation.
It can be of interest to compare the performance of different cross-validation techniques when all
other variables are controlled. However, models whose performance is being compared should not
use different cross-validation techniques (e.g. it is not appropriate to compare Model A k-fold cross-
validation results against the mean of Model B single train-test split results).
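The k-fold procedure described above can be sketched as follows. The index-splitting helper and dataset size are illustrative only; any training and scoring routine can be substituted:

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous test segments, each paired
    with the remaining indices as the training set."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds

# Each of the k segments serves once as the test set; scores from the
# k rounds are then averaged.
for train_idx, test_idx in k_fold_indices(10, 5):
    pass  # train on train_idx, evaluate on test_idx

# Leave-one-out cross-validation is the special case k == n.
loo = k_fold_indices(10, 10)
```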
5.3.7 Limiting information leakage
Information leakage occurs when a machine learning algorithm uses information not in the training
data to create a machine learning model.
Information leakage is often caused when training data includes information not available during
production. In an evaluation, information leakage can result in a machine learning model’s classification
accuracy being overstated. A model trained under these conditions will typically not generalize well.
Evaluations should be designed to prevent information leakage between training and test data.
EXAMPLE A machine learning model can be designed to classify between native and non-native Spanish
speakers, using multiple audio samples from each subject. Some observation features, such as vowel enunciation,
are potentially useful for this type of speaker classification. However, such features can also be used to identify
the specific speaker. The model can use identity-based information to accurately classify test data, even though
this information would not be available in production systems. The solution would be to not include the same
subject in both training and test data, even if the training and test samples differ.
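The solution described in the example, keeping all samples from one subject on the same side of the split, can be sketched as a subject-level (group-aware) split. The subject identifiers and split fraction below are hypothetical:

```python
import random

# 20 hypothetical audio samples from 5 subjects (4 samples each).
samples = [(f"subj_{i % 5}", f"audio_{i}") for i in range(20)]

# Split by subject, not by sample, so identity-based features cannot leak.
subjects = sorted({subj for subj, _ in samples})
random.seed(0)
random.shuffle(subjects)
cut = int(len(subjects) * 0.6)  # 60 % of subjects go to training
train_subjects = set(subjects[:cut])

train = [s for s in samples if s[0] in train_subjects]
test = [s for s in samples if s[0] not in train_subjects]

# No subject appears on both sides of the split.
assert not ({s for s, _ in train} & {s for s, _ in test})
```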
5.3.8 Limiting channel effects
A channel effect is a characteristic of data that reflects how data were collected as opposed to what
data were collected. Channel effects can cause machine learning classification algorithms to learn
irrelevant characteristics from training data as opposed to relevant content, which in turn can lead to
poor machine learning classification performance.
Channel effects can be caused by the mechanism used to acquire data, preprocessing applied to data,
the identity of the individual obtaining data, and environmental conditions under which data were
acquired, among other factors.
The data should be as free of channel effects as possible. Controlling channel effects in training data
contributes to better performance. Controlling channel effects in test data enables higher-quality
assessments.
NOTE One method of reducing channel effects is to balance channel distributions for each class in the data.
Reporting should describe known channel effects introduced to the training data. Channel effects
should be accounted for during statistical significance testing (see Clause 7).
EXAMPLE A vision-based system can be designed to distinguish between images of cats and dogs. However,
if all “cat” images are high-resolution, and all “dog” images are low-resolution, a machine learning classifier can
learn to classify images based on resolution as opposed to content.
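The balancing method in the NOTE above can be checked with a simple per-class tally of a channel attribute. In this hypothetical Python sketch, image resolution plays the role of the channel:

```python
from collections import Counter

# Hypothetical dataset: label plus a channel attribute (resolution).
dataset = [
    {"label": "cat", "resolution": "high"},
    {"label": "cat", "resolution": "high"},
    {"label": "cat", "resolution": "low"},
    {"label": "dog", "resolution": "low"},
    {"label": "dog", "resolution": "low"},
    {"label": "dog", "resolution": "high"},
]

# Tally the channel distribution separately for each class.
by_class = {}
for item in dataset:
    by_class.setdefault(item["label"], Counter())[item["resolution"]] += 1

for label, counts in sorted(by_class.items()):
    total = sum(counts.values())
    dist = {ch: n / total for ch, n in sorted(counts.items())}
    print(label, dist)

# A strong label/channel correlation (e.g. all "cat" images high-resolution)
# would signal a channel effect that a classifier can exploit.
```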
5.3.9 Ground truth
Ground truth is the value of the target variable for a particular item of labelled input data. Cleanliness
in ground truth can affect classification performance measurement. When assessing classification
performance, a strong generalizable ground truth should be established.
General agreement on an aggregated ground truth can be quantified using measurements of agreement
such as Cohen's kappa coefficient.
In some domains (e.g. medical), inter-annotator variation can be significant, especially in tasks where
team-based consensus is involved.
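Cohen's kappa coefficient, mentioned above, corrects observed annotator agreement for the agreement expected by chance. The following Python sketch computes it for two hypothetical annotators:

```python
from collections import Counter

# Hypothetical labels assigned by two annotators to the same 8 samples.
annotator_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
annotator_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

n = len(annotator_a)

# Observed agreement: fraction of samples labelled identically.
p_o = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n

# Chance agreement: from each annotator's marginal label frequencies.
freq_a = Counter(annotator_a)
freq_b = Counter(annotator_b)
p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b))

# kappa = 1 means perfect agreement; 0 means agreement no better than chance.
kappa = (p_o - p_e) / (1 - p_e)
```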
5.3.10 Machine learning algorithms, hyperparameters and parameters