
The role of biomedical dataset in classification

AK Tanwani, M Farooq - Conference on Artificial Intelligence in Medicine …, 2009 - Springer

Conference on Artificial Intelligence in Medicine in Europe, 2009•Springer

Abstract

In this paper, we investigate the role of a biomedical dataset in determining the classification accuracy of an algorithm. We quantify the complexity of a biomedical dataset using five complexity measures: correlation-based feature selection subset merit, noise, imbalance ratio, missing values and information gain. The effect of these complexity measures on classification accuracy is evaluated using five diverse machine learning algorithms: J48 (decision tree), SMO (support vector machines), Naive Bayes (probabilistic), IBk (instance-based learner) and JRIP (rule-based induction). The results of our experiments show that noise and correlation-based feature selection subset merit, not a particular choice of algorithm, play a major role in determining the classification accuracy. Finally, we provide researchers with a meta-model and an empirical equation to estimate the classification potential of a dataset on the basis of its complexity. This will help researchers efficiently pre-process datasets for automatic knowledge extraction.
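The abstract names several dataset complexity measures without defining them. The Python sketch below illustrates four of them under assumed textbook definitions, which are not taken from the paper: imbalance ratio as majority-class count divided by minority-class count, missing values as the fraction of empty cells, information gain as H(C) - H(C | F) over discrete values, and CFS subset merit as Hall's heuristic k * r_cf / sqrt(k + k(k-1) * r_ff) using symmetric uncertainty as the correlation. All function names are hypothetical. Noise is left out because the abstract does not say how it is measured.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Hashable, List, Optional, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(y: Sequence[Hashable], x: Sequence[Hashable]) -> float:
    """H(Y | X) for two aligned sequences of discrete values."""
    n = len(y)
    groups: dict = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Assumed definition: IG = H(C) - H(C | F)."""
    return entropy(labels) - conditional_entropy(labels, feature)


def symmetric_uncertainty(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Normalized correlation in [0, 1]: 2 * I(A; B) / (H(A) + H(B))."""
    denom = entropy(a) + entropy(b)
    return 0.0 if denom == 0 else 2.0 * information_gain(a, b) / denom


def cfs_merit(features: List[Sequence[Hashable]], labels: Sequence[Hashable]) -> float:
    """Hall-style CFS merit of a feature subset (assumed, not the paper's code)."""
    k = len(features)
    # Mean feature-class correlation.
    r_cf = sum(symmetric_uncertainty(f, labels) for f in features) / k
    # Mean feature-feature intercorrelation (0 when the subset has one feature).
    pairs = list(combinations(features, 2))
    r_ff = sum(symmetric_uncertainty(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def missing_fraction(rows: List[List[Optional[object]]]) -> float:
    """Fraction of cells that are missing (represented here as None)."""
    cells = [v for row in rows for v in row]
    return sum(v is None for v in cells) / len(cells)
```

With labels `["sick", "sick", "healthy", "healthy", "healthy", "healthy"]`, `imbalance_ratio` returns 2.0. A feature that perfectly separates the two classes receives an information gain equal to the class entropy and a single-feature CFS merit of 1.0. A feature that carries no class information receives a gain of 0.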


Cited by 31 (11 versions)
