A Comparison of Mining Incomplete and Inconsistent Data
DOI: https://doi.org/10.5755/j01.itc.46.2.17330
Keywords: Incomplete data, lost values, "do not care" conditions, inconsistent data, rough set theory, probabilistic approximations, MLEM2 rule induction algorithm.
Abstract
We present experimental results comparing incompleteness and inconsistency in data. We used two interpretations of missing attribute values: lost values and "do not care" conditions. Our experiments were conducted on 204 data sets, comprising 71 data sets with lost values, 71 data sets with "do not care" conditions, and 62 inconsistent data sets, all created from eight original numerical data sets. We used the Modified Learning from Examples Module, version 2 (MLEM2) rule induction algorithm for data mining, combined with three types of probabilistic approximations: lower, middle, and upper. As the quality criterion we used the error rate, computed by ten-fold cross-validation. There is experimental evidence that incompleteness is worse than inconsistency for data mining, i.e., it yields higher error rates (two-tailed test, 5% level of significance). Additionally, lost values are better than "do not care" conditions, again with regard to the error rate, and there is little difference in error rate among the three types of probabilistic approximations.
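To make the two interpretations of missing values and the three probabilistic approximations concrete, the sketch below computes them on a toy table. It assumes the rough-set definitions commonly used with MLEM2: a case with a lost value ("?") is excluded from every attribute-value block of that attribute, a case with a "do not care" condition ("*") is included in every block of that attribute, the characteristic set K_A(x) is the intersection of the blocks matching x's specified values, and the concept probabilistic approximation with parameter alpha is the union of K_A(x) over cases x in the concept with Pr(concept | K_A(x)) >= alpha (alpha = 1 for lower, 0.5 for middle, a small positive number for upper). The function names, the markers, and the toy table are hypothetical illustrations, not taken from the paper, and the sketch does not implement MLEM2 rule induction itself.

```python
from fractions import Fraction

LOST = "?"         # assumed marker: lost value, case excluded from all blocks of the attribute
DO_NOT_CARE = "*"  # assumed marker: "do not care", case included in all blocks of the attribute


def attribute_value_blocks(table, attr):
    """Hypothetical helper: map each specified value v of attr to the block [(attr, v)]."""
    values = {row[attr] for row in table if row[attr] not in (LOST, DO_NOT_CARE)}
    return {
        v: {i for i, row in enumerate(table) if row[attr] in (v, DO_NOT_CARE)}
        for v in values
    }


def characteristic_set(table, attrs, x):
    """K_A(x): intersection of K(x, a) over attributes a; K(x, a) = U if a(x) is ? or *."""
    result = set(range(len(table)))
    for a in attrs:
        v = table[x][a]
        if v in (LOST, DO_NOT_CARE):
            continue  # K(x, a) is the whole universe, so the intersection is unchanged
        result &= attribute_value_blocks(table, a)[v]
    return result


def probabilistic_approximation(table, attrs, concept, alpha):
    """Union of K_A(x) for x in the concept with Pr(concept | K_A(x)) >= alpha."""
    approx = set()
    for x in concept:
        k = characteristic_set(table, attrs, x)  # never empty: x itself is in K_A(x)
        if Fraction(len(k & concept), len(k)) >= alpha:
            approx |= k
    return approx


# Hypothetical toy data: case 1 has a lost Headache value, case 2 a "do not care" Temp value.
TABLE = [
    {"Temp": "high", "Headache": "yes"},
    {"Temp": "high", "Headache": LOST},
    {"Temp": DO_NOT_CARE, "Headache": "no"},
    {"Temp": "normal", "Headache": "no"},
    {"Temp": "high", "Headache": "no"},
]
ATTRS = ["Temp", "Headache"]
CONCEPT = {0, 1, 2}  # cases sharing one decision value

for name, alpha in [("lower", 1.0), ("middle", 0.5), ("upper", 0.001)]:
    print(name, sorted(probabilistic_approximation(TABLE, ATTRS, CONCEPT, alpha)))
# lower [0]
# middle [0, 1, 2, 4]
# upper [0, 1, 2, 3, 4]
```

For a complete but inconsistent data set, no value is "?" or "*", so K_A(x) reduces to the ordinary indiscernibility class of x, and the same function covers that case as well.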
License
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.