Authors:
Kotaro Ii (1); Hiroto Saigo (1) and Yasuo Tabei (2)
Affiliations:
(1) School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
(2) Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
Keyword(s):
Text Classification, Text Retrieval, Suffix Tree, Branch and Bound, SEQL, fastText.
Abstract:
Text classification and retrieval are crucial tasks in natural language processing. In this paper, we present novel techniques for these tasks that exploit the fact that the evaluation results are invariant to the order of features. Assuming that text retrieval or classification models have already been constructed from the training documents, we propose efficient approaches that restrict the search space spanned by the test documents. Our approach makes two key contributions: an efficient method for traversing a search tree, and novel pruning conditions. In computational experiments on real-world datasets, the proposed approach is consistently faster than the baseline method across various scenarios.