Authors: Binh Dang 1; Tran-Thai Dang 2 and Le-Minh Nguyen 1
Affiliations: 1 Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan; 2 Vingroup Big Data Institute, Hanoi, Vietnam
Keyword(s):
Sub-word, Topic Modeling, Sentence Transformer, Bi-encoder, Semantic Similarity Detection.
Abstract:
Topic information has proven useful for semantic similarity detection. In this paper, we present a novel and efficient method for incorporating topic information into Transformer-based models, called the Sub-word Latent Topic and Sentence Transformer (SubTST). The proposed model inherits the advantages of the SBERT (Reimers and Gurevych, 2019) architecture and learns latent topics at the sub-word level rather than at the document or word level as in previous work. The experimental results illustrate the effectiveness of the proposed method, which significantly outperforms SBERT and tBERT (Peinelt et al., 2020), two state-of-the-art methods for semantic similarity detection, on most of the benchmark datasets.
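The abstract only outlines the idea of combining an SBERT-style bi-encoder with sub-word-level latent topics, so the following is a minimal illustrative sketch rather than the paper's actual method. It assumes one trainable topic distribution per sub-word vocabulary entry, a simple concatenation of topic vectors with the Transformer's token representations, and SBERT-style mean pooling; the class name SubwordTopicEncoder, the num_topics size, and the fusion scheme are all hypothetical.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SubwordTopicEncoder(nn.Module):
    """Bi-encoder sketch: contextual sub-word states fused with
    per-sub-word latent topic distributions (illustrative only)."""

    def __init__(self, model_name="bert-base-uncased", num_topics=80):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.bert = AutoModel.from_pretrained(model_name)
        # Hypothetical: one trainable topic distribution per sub-word id,
        # learned jointly with the rest of the network.
        self.topic_table = nn.Embedding(self.tokenizer.vocab_size, num_topics)

    def forward(self, sentences):
        batch = self.tokenizer(sentences, padding=True, truncation=True,
                               return_tensors="pt")
        hidden = self.bert(**batch).last_hidden_state              # (B, T, H)
        topics = self.topic_table(batch["input_ids"]).softmax(-1)  # (B, T, K)
        fused = torch.cat([hidden, topics], dim=-1)                # (B, T, H+K)
        mask = batch["attention_mask"].unsqueeze(-1)
        # Mean pooling over non-padding tokens, as in SBERT.
        return (fused * mask).sum(1) / mask.sum(1)

encoder = SubwordTopicEncoder()
emb = encoder(["A man is playing a guitar.", "Someone plays guitar."])
similarity = torch.cosine_similarity(emb[0], emb[1], dim=0)

Because each sentence is encoded independently (a bi-encoder, as in SBERT), the resulting embeddings can be cached and compared with cosine similarity, which is what makes this family of models efficient for similarity detection.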