The Prevalence of Errors in Machine Learning Experiments

Shepperd, Martin; Guo, Yuchen; Li, Ning; Arzoky, Mahir; Capiluppi, Andrea; Counsell, Steve; Destefanis, Giuseppe; Swift, Stephen; Tucker, Allan; Yousefi, Leila

arXiv:1909.04436 (cs)

[Submitted on 10 Sep 2019]

Abstract: Context: Conducting experiments is central to machine learning research to benchmark, evaluate and compare learning algorithms. Consequently, it is important that we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction. Our focus is simple arithmetical and statistical errors. Method: We analyse 49 papers describing 2456 individual experimental results from a previously undertaken systematic review comparing supervised and unsupervised defect prediction classifiers. We extract the confusion matrices and test for relevant constraints, e.g., the marginal probabilities must sum to one. We also check for multiple statistical significance testing errors. Results: We find that a total of 22 out of 49 papers contain demonstrable errors. Of these, 7 were statistical and 16 related to confusion matrix inconsistency (one paper contained both classes of error). Conclusions: Whilst some errors may be of a relatively trivial nature, e.g., transcription errors, their presence does not engender confidence. We strongly urge researchers to follow open science principles so that errors can more easily be detected and corrected, and thus, as a community, reduce this worryingly high error rate in our computational experiments.
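
The confusion-matrix consistency check described in the Method can be illustrated with a minimal sketch. It assumes that a paper reports raw cell counts (TP, FP, FN, TN), the dataset size N, and some derived metrics rounded to a fixed number of decimal places. The function names, the supported metric names and the tolerance rule (half a unit in the last reported digit) are hypothetical, illustrative choices, not the authors' actual procedure.

```python
import math


def _ratio(num: float, den: float) -> float:
    # A metric with a zero denominator is undefined; NaN marks that case.
    return num / den if den else math.nan


def recompute_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Recompute common defect-prediction metrics from confusion-matrix counts."""
    return {
        "recall": _ratio(tp, tp + fn),
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "fpr": _ratio(fp, fp + tn),
    }


def check_confusion_matrix(tp: int, fp: int, fn: int, tn: int,
                           n_reported: int, reported: dict[str, float],
                           decimals: int = 2) -> list[str]:
    """Return the violated constraints; an empty list means no error was detected.

    Hypothetical helper: `reported` maps metric names to values as printed in
    a paper, assumed to be rounded to `decimals` places.
    """
    problems = []
    # Cells are counts, so none may be negative.
    for name, count in {"TP": tp, "FP": fp, "FN": fn, "TN": tn}.items():
        if count < 0:
            problems.append(f"{name} is negative ({count})")
    # Cells must add up to the dataset size, which is equivalent to
    # requiring the cell proportions to sum to one.
    n = tp + fp + fn + tn
    if n != n_reported:
        problems.append(f"cells sum to {n}, but the paper reports N={n_reported}")
    # Allow for rounding of the reported values (plus a tiny float margin).
    tol = 0.5 * 10 ** (-decimals) + 1e-9
    recomputed = recompute_metrics(tp, fp, fn, tn)
    for name, value in reported.items():
        actual = recomputed.get(name, math.nan)
        if math.isnan(actual):
            problems.append(f"{name}: cannot be recomputed from the cells")
        elif abs(actual - value) > tol:
            problems.append(f"{name}: reported {value}, recomputed {actual:.4f}")
    return problems


if __name__ == "__main__":
    # Consistent: recall 40/60 = 0.667 rounds to 0.67.
    print(check_confusion_matrix(40, 10, 20, 130, 200,
                                 {"recall": 0.67, "precision": 0.80, "accuracy": 0.85}))
    # Inconsistent: wrong N and an accuracy that the cells cannot produce.
    print(check_confusion_matrix(40, 10, 20, 130, 210, {"accuracy": 0.90}))
```

Comparing recomputed metrics against reported ones, instead of only checking each number in isolation, is what exposes transcription errors: a single mistyped value usually breaks the arithmetic relationships among the rest.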

Comments:	20th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL), 14-16 November 2019
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Applications (stat.AP); Machine Learning (stat.ML)
Cite as:	arXiv:1909.04436 [cs.LG]
	(or arXiv:1909.04436v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1909.04436

Submission history

From: Martin Shepperd
[v1] Tue, 10 Sep 2019 12:32:00 UTC (26 KB)
