A Semantics Aware Random Forest for Text Classification.


MZ Islam, J Liu, J Li, L Liu, W Kang - Proceedings of the 28th ACM …, 2019 - dl.acm.org

Proceedings of the 28th ACM international conference on information and …, 2019•dl.acm.org

Random Forest (RF) classifiers are well suited to the high-dimensional, noisy data found in text classification. An RF model comprises a set of decision trees, each of which is trained using random subsets of features. Given an instance, the RF prediction is obtained by majority voting over the predictions of all the trees in the forest. However, different test instances have different values for the features used in the trees, so the trees should contribute differently to each prediction. Traditional RFs do not account for this diversity in the trees' contributions. Many approaches have been proposed to model it by selecting a subset of trees for each instance, and this paper is among them. It proposes a Semantics Aware Random Forest (SARF) classifier. SARF extracts the features used by the trees to generate their predictions and selects the subset of predictions for which those features are relevant to the predicted classes. We evaluated SARF's classification performance on real-world text datasets and assessed its competitiveness with state-of-the-art ensemble selection methods. The results demonstrate the superior performance of the proposed approach in textual information retrieval and open a new direction of research that utilises the interpretability of classifiers.
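The contrast between plain majority voting and per-instance selection of tree predictions can be illustrated with a small sketch. This is not the authors' SARF implementation: the abstract does not say how relevance between features and classes is measured, so the `relevant_terms` lookup, the `select_relevant` helper, and the toy per-tree outputs below are all hypothetical and chosen only to show the general idea.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among (label, features_used) pairs."""
    labels = [label for label, _ in predictions]
    return Counter(labels).most_common(1)[0][0]


def select_relevant(predictions, relevant_terms):
    """Keep predictions whose decision-path features overlap the terms
    assumed (hypothetically) to be relevant to the predicted class."""
    kept = [
        (label, used)
        for label, used in predictions
        if set(used) & relevant_terms.get(label, set())
    ]
    # Assumption: fall back to the full forest if no prediction passes.
    return kept or predictions


# Hypothetical per-tree outputs for one document: the predicted label
# and the features tested on that tree's decision path.
predictions = [
    ("sports", ["price", "market"]),
    ("finance", ["market", "stock"]),
    ("sports", ["said", "year"]),
    ("finance", ["stock", "bank"]),
    ("sports", ["week", "new"]),
]

# Hypothetical class-to-terms relevance lookup (an assumption of this sketch).
relevant_terms = {
    "sports": {"match", "team", "goal"},
    "finance": {"stock", "bank", "market"},
}

plain = majority_vote(predictions)
selective = majority_vote(select_relevant(predictions, relevant_terms))
print(plain, selective)
```

In this toy case, three trees vote for one class using features unrelated to it, so the plain vote and the filtered vote disagree. The fallback to the full forest when nothing passes the filter is a design choice of the sketch, not something stated in the abstract.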


