.. automodule:: Orange.evaluation.scoring
Scoring plays an integral role in the evaluation of any prediction model. Orange implements various scores for the evaluation of classification, regression and multi-label models. Most of the methods need to be called with an instance of :obj:`~Orange.evaluation.testing.ExperimentResults`.
.. literalinclude:: code/scoring-example.py
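For orientation, here is a minimal sketch of the usual pattern (the ``voting`` data set ships with Orange; the learner choices are only illustrative): cross-validation builds the :obj:`~Orange.evaluation.testing.ExperimentResults`, which are then passed to a scoring function::

    import Orange

    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner(),
                Orange.classification.tree.TreeLearner()]
    # cross_validation returns an ExperimentResults instance
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=10)
    # most scoring functions return one score per learner
    print("CA: %s" % Orange.evaluation.scoring.CA(res))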
Many scores for the evaluation of classification models measure whether the model assigns the correct class value to the test instances. Many of these scores can be computed solely from the confusion matrix, constructed manually with the :obj:`confusion_matrices` function. If the class variable has more than two values, the index of the value for which the confusion matrix is computed should be passed as well.
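A hedged sketch of this, again with illustrative choices of data and learner; ``iris`` has a three-valued class, so the index of the target value is passed along with the results::

    import Orange

    data = Orange.data.Table("iris")
    res = Orange.evaluation.testing.cross_validation(
        [Orange.classification.bayes.NaiveLearner()], data, folds=5)
    # one matrix per learner; the second argument is the index of the
    # class value ("Iris-versicolor") treated as the positive class
    cm = Orange.evaluation.scoring.confusion_matrices(res, 1)[0]
    print("TP=%s FP=%s FN=%s TN=%s" % (cm.TP, cm.FP, cm.FN, cm.TN))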
.. autofunction:: CA
.. autofunction:: Sensitivity
.. autofunction:: Specificity
.. autofunction:: PPV
.. autofunction:: NPV
.. autofunction:: Precision
.. autofunction:: Recall
.. autofunction:: F1
.. autofunction:: Falpha
.. autofunction:: MCC
.. autofunction:: AP
.. autofunction:: IS
.. autofunction:: confusion_chi_square
Scores that measure how well the prediction model separates instances of different classes are called discriminatory scores.
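For example, a minimal sketch computing two such scores from the same results (data set and learners are again illustrative)::

    import Orange

    data = Orange.data.Table("voting")
    learners = [Orange.classification.bayes.NaiveLearner(),
                Orange.classification.majority.MajorityLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=5)
    print("AUC:   %s" % Orange.evaluation.scoring.AUC(res))
    print("Brier: %s" % Orange.evaluation.scoring.Brier_score(res))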
.. autofunction:: Brier_score
.. autofunction:: AUC
.. autofunction:: AUC_for_single_class
.. autofunction:: AUC_matrix
.. autofunction:: AUCWilcoxon
.. autofunction:: compute_ROC
.. autofunction:: confusion_matrices
.. autoclass:: ConfusionMatrix
.. autofunction:: McNemar
.. autofunction:: McNemar_of_two
Several alternative measures, listed below, can be used to evaluate the success of numeric prediction:
.. autofunction:: MSE
.. autofunction:: RMSE
.. autofunction:: MAE
.. autofunction:: RSE
.. autofunction:: RRSE
.. autofunction:: RAE
.. autofunction:: R2
The following script (:download:`statExamples.py <code/statExamples.py>`) uses most of the above measures to score several regression methods. It produces the following output::

    Learner   MSE     RMSE    MAE     RSE     RRSE    RAE     R2
    maj      84.585   9.197   6.653   1.002   1.001   1.001  -0.002
    rt       40.015   6.326   4.592   0.474   0.688   0.691   0.526
    knn      21.248   4.610   2.870   0.252   0.502   0.432   0.748
    lr       24.092   4.908   3.425   0.285   0.534   0.515   0.715
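A condensed sketch of the same pattern, with illustrative learner choices (the exact numbers will differ from the table above)::

    import Orange

    data = Orange.data.Table("housing")
    # MajorityLearner predicts the average value of a continuous class
    learners = [Orange.classification.majority.MajorityLearner(),
                Orange.classification.knn.kNNLearner()]
    res = Orange.evaluation.testing.cross_validation(learners, data, folds=10)
    for name, score in [("MSE", Orange.evaluation.scoring.MSE),
                        ("RMSE", Orange.evaluation.scoring.RMSE),
                        ("MAE", Orange.evaluation.scoring.MAE),
                        ("R2", Orange.evaluation.scoring.R2)]:
        print("%-5s %s" % (name, score(res)))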
.. autofunction:: graph_ranks
The following script (:download:`statExamplesGraphRanks.py <code/statExamplesGraphRanks.py>`) shows how to plot such a graph:
.. literalinclude:: code/statExamplesGraphRanks.py
The code produces the following graph:
.. autofunction:: compute_CD
.. autofunction:: compute_friedman
.. autofunction:: split_by_iterations
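A sketch of one way to use it (the setup mirrors the illustrative examples above): split cross-validation results by iteration and score each fold separately::

    import Orange

    data = Orange.data.Table("voting")
    res = Orange.evaluation.testing.cross_validation(
        [Orange.classification.bayes.NaiveLearner()], data, folds=5)
    # one ExperimentResults instance per cross-validation iteration
    for i, fold in enumerate(Orange.evaluation.scoring.split_by_iterations(res)):
        print("fold %i CA: %s" % (i, Orange.evaluation.scoring.CA(fold)))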
Multi-label classification requires different metrics than those used in traditional single-label classification. This module presents the various metrics that have been proposed in the literature. Let :math:`D` be a multi-label evaluation data set, consisting of :math:`|D|` multi-label examples :math:`(x_i, Y_i)`, :math:`i=1..|D|`, :math:`Y_i \subseteq L`. Let :math:`H` be a multi-label classifier and :math:`Z_i = H(x_i)` be the set of labels predicted by :math:`H` for example :math:`x_i`.
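As a guide to the four measures below, these are the standard formulations from the literature cited at the end of this section (:math:`\Delta` denotes the symmetric difference of two sets); see the individual function documentation for the exact variants implemented:

.. math::

    \mathit{HammingLoss}(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \,\Delta\, Z_i|}{|L|}, \qquad
    \mathit{Accuracy}(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \cap Z_i|}{|Y_i \cup Z_i|}

.. math::

    \mathit{Precision}(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \cap Z_i|}{|Z_i|}, \qquad
    \mathit{Recall}(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|}
        \frac{|Y_i \cap Z_i|}{|Y_i|}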
.. autofunction:: mlc_hamming_loss
.. autofunction:: mlc_accuracy
.. autofunction:: mlc_precision
.. autofunction:: mlc_recall
The following script demonstrates the use of those evaluation measures:
.. literalinclude:: code/mlc-evaluate.py
The output should look like this::

    loss= [0.9375]
    accuracy= [0.875]
    precision= [1.0]
    recall= [0.875]
Boutell, M. R., Luo, J., Shen, X. & Brown, C. M. (2004), 'Learning multi-label scene classification', Pattern Recognition, vol. 37, no. 9, pp. 1757-1771.
Godbole, S. & Sarawagi, S. (2004), 'Discriminative methods for multi-labeled classification', in Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2004).
Schapire, R. E. & Singer, Y. (2000), 'BoosTexter: a boosting-based system for text categorization', Machine Learning, vol. 39, no. 2/3, pp. 135-168.