The role of biomedical dataset in classification
AK Tanwani, M Farooq - Conference on Artificial Intelligence in Medicine …, 2009 - Springer
Conference on Artificial Intelligence in Medicine in Europe, 2009 • Springer
Abstract
In this paper, we investigate the role of a biomedical dataset in the classification accuracy of an algorithm. We quantify the complexity of a biomedical dataset using five complexity measures: correlation-based feature selection subset merit, noise, imbalance ratio, missing values and information gain. The effect of these complexity measures on classification accuracy is evaluated using five diverse machine learning algorithms: J48 (decision tree), SMO (support vector machines), Naive Bayes (probabilistic), IBk (instance-based learner) and JRIP (rule-based induction). The results of our experiments show that noise and correlation-based feature selection subset merit, rather than the particular choice of algorithm, play a major role in determining the classification accuracy. Finally, we provide researchers with a meta-model and an empirical equation to estimate the classification potential of a dataset on the basis of its complexity. This will help researchers to efficiently pre-process the dataset for automatic knowledge extraction.
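As an illustration, three of the complexity measures named in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give formulas, so the code uses commonly used definitions (imbalance ratio as majority-class count over minority-class count, missing values as the fraction of empty cells, and information gain as the entropy reduction from splitting on one nominal feature). The function names are hypothetical, and noise and correlation-based feature selection subset merit are not covered.

```python
import math
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def missing_fraction(rows):
    """Fraction of cells that are None across all rows (assumed definition)."""
    cells = [value for row in rows for value in row]
    return sum(value is None for value in cells) / len(cells)


def entropy(labels):
    """Shannon entropy of the class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction obtained by splitting the labels on a nominal feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

Under these definitions, a feature that separates the classes perfectly has an information gain equal to the class entropy, and a feature that is independent of the class has a gain of zero.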