3 AISIA Research Lab, Ho Chi Minh City, Vietnam
1 Introduction
Recommendation systems have become increasingly popular in almost all fields, and they are used in industries such as retail, media, news, streaming services, and e-commerce. Recognizing the benefits of recommendation systems for economic development, many companies have built recommendation systems that use customers' historical data to give them some
* Corresponding Author: Binh T. Nguyen ([email protected])
2 Duc et al.
2 Related work
The idea of a paper recommendation system for computer science publications was developed by Wang and his coworkers [19]. They used the Chi-square statistic and term frequency-inverse document frequency (TF-IDF) for feature selection from the abstract of each paper submission, and applied linear logistic regression to classify relevant journals or conferences, reaching an accuracy of 61.37% for the Top 3 recommended results. Son and colleagues later proposed a new approach that improves the performance of the paper submission recommendation algorithm by using additional features. The method in [17] uses TF-IDF, the Chi-square statistic, and one-hot encoding to extract features from the available information of each paper. They applied two machine learning models, namely Logistic Linear Regression (LLR) and Multi-layer Perceptrons (MLP), to different combinations of features from the paper submission. Their methods achieved high Top 3 accuracy, notably 89.07% for the LLR model and 88.60% for the MLP model, when using the group of features comprising the title, abstract, and keywords.
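For readers unfamiliar with TF-IDF, the weighting used in [19] and [17] can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the cited implementations: it uses term frequency normalized by document length and the unsmoothed idf(t) = log(N / df(t)), it omits the Chi-square feature selection step, and the function name `tf_idf` and the toy abstracts are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where N is the number of documents and
               df(t) is the number of documents containing t (no smoothing).
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy "abstracts", already lowercased and tokenized.
abstracts = [
    "contrastive learning for sentence embeddings".split(),
    "paper submission recommendation with sentence embeddings".split(),
    "journal recommendation for paper submission".split(),
]
weights = tf_idf(abstracts)
print({term: round(score, 3) for term, score in weights[0].items()})
```

Terms that occur in only one abstract (such as "contrastive") receive the largest weights, while a term that occurs in every document would receive a weight of zero.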
Addressing the same problem, Dac et al. recently proposed a new approach [12] that combines two embedding methods, GloVe [14] and FastText [2], with a Convolutional Neural Network (CNN) [9] and LSTM [6] for feature extraction. They considered seven feature groups for training: title, abstract, keywords, title + keyword, title + abstract, keyword + abstract, and title + keyword + abstract. The experimental results show that the combination of S2RSCS [17] and CNN + FastText, namely the proposed S2CFT model [12], performs best: with the attribute mixture title + keyword + abstract, its Top 1 accuracy is 68.11%, and its accuracies at Top 3, 5, and 10 are 90.8%, 96.25%, and 99.21%, respectively.
Son and his coworkers subsequently proposed another approach to paper submission recommendation. In their study [7], besides the papers' own attributes, they used additional information from the "Aims and Scopes" of journals as input data. They collected a dataset of 414512 articles together with the aims and scopes of the journals corresponding to these papers. Their architecture still uses FastText [2] to embed the available information of each paper submission, and it adds a new feature that measures the similarity between the paper submission and the journals' aims & scopes. Their method is a practical approach to the paper submission recommendation problem.
In recent years, transformer architectures have succeeded in various fields of natural language processing (NLP) and computer vision (CV). One of the most significant works behind that success is the Vanilla Transformer architecture [18]. It works by adopting the attention mechanism, which weights the significance of each part of the input data differently. Transformer models are widely used and efficient, and they paved the way for many other models, the most famous being BERT (Bidirectional Encoder Representations from Transformers) [4]. BERT is pre-trained with large
3 Methodology
In this section, we present our approach for the paper submission recommenda-
tion problem and the evaluation metrics.
3.2 Modeling
(TAK). The fine-tuned LM serves as the encoder, mapping each batch of inputs to 768-dimensional embeddings, which are then propagated through an additional linear layer with ReLU activation and dropout to reduce overfitting. The last linear layer with Softmax activation acts as a classifier that outputs N probabilities, one for each journal, that the given paper belongs to that journal. To identify the K most relevant journals, i.e., those that maximize the probability of acceptance, we select the K largest probabilities. Figure 2 illustrates this model.
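The classification head just described can be sketched as a plain-Python forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the 768-dimensional vector stands in for the fine-tuned LM's sentence embedding, while the hidden width (32), dropout rate (0.1), number of journals (5), random weights, and the helper names (`classify`, `linear`, and so on) are hypothetical illustration choices. No training is performed, and dropout is the usual inverted variant, which is the identity at inference time.

```python
import math
import random


def linear(x, weight, bias):
    """y = W x + b for one vector x; weight is a list of rows, shape [out][in]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, p, training, rng):
    """Inverted dropout: zero each unit with probability p and rescale the
    survivors by 1 / (1 - p); at inference (training=False) return x unchanged."""
    if not training or p == 0.0:
        return list(x)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, params, k, p_drop=0.1, training=False, rng=None):
    """Hypothetical head: Linear + ReLU + Dropout, then Linear + Softmax, then top-k."""
    rng = rng or random.Random(0)
    hidden = dropout(relu(linear(embedding, params["W1"], params["b1"])),
                     p_drop, training, rng)
    probs = softmax(linear(hidden, params["W2"], params["b2"]))
    # Indices of the k journals with the highest predicted probability.
    top_k = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)[:k]
    return probs, top_k


rng = random.Random(42)
d_emb, d_hidden, n_journals = 768, 32, 5  # d_hidden and n_journals are hypothetical
params = {
    "W1": [[rng.gauss(0, 0.02) for _ in range(d_emb)] for _ in range(d_hidden)],
    "b1": [0.0] * d_hidden,
    "W2": [[rng.gauss(0, 0.02) for _ in range(d_hidden)] for _ in range(n_journals)],
    "b2": [0.0] * n_journals,
}
cls_embedding = [rng.gauss(0, 1) for _ in range(d_emb)]  # stand-in for the LM output
probs, top3 = classify(cls_embedding, params, k=3)
```

In a real system the weights would be learned jointly with the encoder; the sketch only shows how the shapes flow from the 768-dimensional embedding to N probabilities and a Top K list.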
Models using paper information and Journals' Aims & Scope Besides the paper's own attributes, we use the journals' Aims & Scopes as potential external features. This yields seven new feature combinations for the input data: Title + Aims and Scopes (TS), Abstract + Aims and Scopes (AS), Keywords + Aims and Scopes (KS), Title + Abstract + Aims and Scopes (TAS), Title + Keywords + Aims and Scopes (TKS), Abstract + Keywords + Aims and Scopes (AKS), and Title + Abstract + Keywords + Aims and Scopes (TAKS).
Conceptually, this architecture has two sub-branches and a main branch. In the first sub-branch, we reuse the fine-tuned LM to encode the external feature into embeddings and pass them through a linear layer with ReLU activation for dimensionality reduction. The second sub-branch is the same as the model using paper information described previously. The paper feature output by the second sub-branch is then used in two steps. First, we compute the cosine similarity between it and the external features produced by the first sub-branch, and concatenate the paper feature with these cosine features. Then, in the main branch, we feed the concatenated representation to a linear layer with softmax activation to compute the probability that the paper submission belongs to each journal, and sort these probabilities in descending order to return the top list of recommended journals. Figure 3 illustrates the architecture.
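The fusion and ranking steps can be sketched as follows. The sketch assumes one Aims & Scope embedding per journal, already projected to the same dimension as the paper feature so that cosine similarity is defined; the function names, the toy 3-dimensional vectors, and the hand-picked weight matrix are all hypothetical. That weight matrix simply reads off the three cosine scores, so the resulting ranking follows similarity; trained weights would also use the paper-feature dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fuse(paper_feat, scope_feats):
    """Concatenate the paper feature with one cosine score per journal scope feature."""
    return paper_feat + [cosine(paper_feat, s) for s in scope_feats]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rank_journals(fused, weight, bias):
    """Main branch: linear layer + softmax, journals sorted by descending probability."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weight, bias)]
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
    return order, probs


# Toy features for one paper and three journals (made-up values).
paper = [1.0, 0.0, 1.0]
scopes = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
fused = fuse(paper, scopes)  # 3 paper dimensions followed by 3 cosine scores

# Hand-picked weights that copy each journal's cosine score into its logit.
weight = [[0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
order, probs = rank_journals(fused, weight, [0.0, 0.0, 0.0])
```

Here the cosine scores are roughly 1.0, 0.0, and 0.5, so the journals are ranked 0, 2, 1.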
Simple Contrastive Learning for Paper Submission Recommendation System 7
Fig. 3. The architecture of the fine-tuned LM using paper information and journal’s
aims.
4 Experiments
This section will present how we conducted the experiments for the proposed
methods and compare them with the previous approaches for the main problem.
4.2 Datasets
Our experiments use the same dataset and preprocessing techniques as Son et al.'s work [7]: 414512 papers, of which 331464 are used for training and 83048 for testing. The data consist of each paper submission's information (the title, abstract, and list of keywords) and its label. In addition, the 351 aims and scopes of the journals serve as external features.
Data preprocessing Besides the size of the dataset, the quality of the preprocessed data strongly affects the model's performance, so data preprocessing is a crucial step in almost all machine learning tasks, and it is even more critical in NLP problems. The preprocessing pipeline includes the following steps: (1) lowercasing text, (2) removing punctuation, (3) removing URLs, (4) removing stop words, using the list from the Natural Language Toolkit (NLTK) together with some we define ourselves, (5) removing unnecessary whitespace, and (6) removing non-alphabetic text. We apply these preprocessing techniques to two kinds of data: general paper information and the aims and scopes of the journals.
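The six preprocessing steps can be sketched with the standard library alone. Several details are assumptions rather than the authors' exact choices: `STOP_WORDS` is a small hypothetical stand-in for the NLTK list plus the custom words, punctuation is replaced by spaces (so "state-of-the-art" becomes separate words instead of one glued token), "non-alphabetic text" is read as tokens that are not purely alphabetic, and URLs are removed before punctuation, because stripping ':' and '/' first would leave fragments that no longer match a URL pattern.

```python
import re
import string

# Hypothetical stand-in for NLTK's English stop words plus custom additions.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "with", "we", "this", "paper"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, stop_words=STOP_WORDS):
    text = text.lower()                                  # (1) lowercase
    text = URL_RE.sub(" ", text)                         # (3) URLs, before punctuation
    text = text.translate(PUNCT_TO_SPACE)                # (2) punctuation -> spaces
    tokens = text.split()                                # (5) split() drops extra whitespace
    tokens = [t for t in tokens if t not in stop_words]  # (4) stop words
    tokens = [t for t in tokens if t.isalpha()]          # (6) non-alphabetic tokens
    return " ".join(tokens)


print(preprocess("We propose SimCPSR, a model for 351 journals! See https://example.org/x."))
```

The same function would be applied to both kinds of data, the paper fields and the journals' aims and scopes.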
of the title, abstract, and keywords with the corresponding aims & scope of the journal, taking the [CLS] representation as the sentence embedding. Lastly, we fine-tune the model's parameters with the contrastive objective (Eq. 1) to make it more robust at capturing the semantic relationships among sentences.
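Eq. 1 is not reproduced in this section, so the following is only a sketch of a typical contrastive objective of this kind, assuming a SimCSE-style [5] supervised setup: each paper's text embedding h_i and its journal's aims & scope embedding h_i+ form a positive pair, and the other aims & scopes in the mini-batch act as negatives. The per-example loss is then -log( exp(sim(h_i, h_i+)/τ) / Σ_j exp(sim(h_i, h_j+)/τ) ) with cosine similarity sim and temperature τ (0.05 is the SimCSE default, assumed here). The function name `info_nce_loss` is hypothetical, and the code only evaluates the loss; fine-tuning would compute it on [CLS] vectors with automatic differentiation.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, tau=0.05):
    """Mean in-batch contrastive loss.

    anchors[i]   : embedding of paper i's text (assumed pairing)
    positives[i] : embedding of paper i's journal aims & scope
    Every positives[j] with j != i serves as a negative for anchors[i].
    """
    losses = []
    for i, h in enumerate(anchors):
        logits = [cosine(h, p) / tau for p in positives]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denominator - logits[i])  # -log softmax at the true pair
    return sum(losses) / len(losses)


aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when each paper is most similar to its own journal's scope and large when the pairs are mismatched. One caveat of in-batch negatives: if two papers in a batch share a journal, the "negative" is the same text as the positive.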
During our experiments, we found that the AdamW optimizer [11], a variant of Adam [8] that uses decoupled weight decay to mitigate overfitting, is an effective choice for deep networks such as our architecture.
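To make "decoupled weight decay" concrete, a single AdamW update on a flat list of parameters can be sketched following Loshchilov and Hutter [11]: the decay shrinks the weights directly instead of being added to the gradient, as L2 regularization would be in plain Adam. The default hyperparameters below (lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01) match PyTorch's `torch.optim.AdamW` defaults and are not the paper's settings, which this section does not state; the function name `adamw_step` is hypothetical.

```python
import math


def adamw_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
               weight_decay=0.01):
    """One AdamW update on flat lists of floats.

    `state` keeps the step count and first/second moment estimates between calls.
    """
    beta1, beta2 = betas
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    m = state.setdefault("m", [0.0] * len(params))
    v = state.setdefault("v", [0.0] * len(params))
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g        # first moment
        v[i] = beta2 * v[i] + (1 - beta2) * g * g    # second moment
        m_hat = m[i] / (1 - beta1 ** t)              # bias correction
        v_hat = v[i] / (1 - beta2 ** t)
        p = p - lr * weight_decay * p                # decoupled weight decay
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)  # Adam step on the gradient only
        new_params.append(p)
    return new_params


state = {}
params = adamw_step([1.0], [0.0], state)  # zero gradient: only the decay acts
```

With a zero gradient the parameter shrinks by the factor (1 - lr * weight_decay), which shows that the decay does not pass through Adam's adaptive scaling. Reusing the same `state` dict across calls carries the moment estimates forward.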
As described in Section 3.2, we treat classification as the downstream task: we train the fine-tuned Distil-RoBERTa model on different combinations of features and use AdamW to optimize the cross-entropy loss. Finally, we apply the Softmax layer to obtain the Top K values representing the probabilities that the given inputs belong to the corresponding labels, and compute the accuracy at each K value as described in Section 3.3.
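Section 3.3 is not included in this excerpt, so the sketch below assumes the standard definition of Accuracy@K: a test paper counts as a hit if its true journal is among the K journals with the highest predicted probability, and Accuracy@K is the fraction of hits. The function name `accuracy_at_k` and the toy probabilities are hypothetical.

```python
def accuracy_at_k(prob_rows, labels, k):
    """Fraction of samples whose true label is among the k most probable classes."""
    hits = 0
    for probs, label in zip(prob_rows, labels):
        top_k = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy softmax outputs for four papers over three journals, with true labels.
prob_rows = [[0.6, 0.3, 0.1],
             [0.2, 0.5, 0.3],
             [0.1, 0.2, 0.7],
             [0.4, 0.35, 0.25]]
labels = [0, 2, 0, 1]
scores = {k: accuracy_at_k(prob_rows, labels, k) for k in (1, 2, 3)}
```

Accuracy@K can only grow as K increases, and it reaches 1.0 once K equals the number of journals, which is why the reported Top 10 values are much higher than the Top 1 values.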
4.4 Results
Our experiments are conducted on different combinations of attributes using the two models defined previously: Models using paper information and Models using paper information and Journals' Aims & Scope. We compare the performance of our proposed model with the approach in [7] (hereafter Approach A); the experimental results are reported in Table 1 and Figure 4.
Firstly, for models using only paper information, the results show that the proposed method performs better than Approach A for the attribute combinations A, TK, TA, AK, and TAK. In particular, our approach achieves its best outcome with the TAK feature group, with Accuracy@K of 0.5173, 0.8097, 0.8862, and 0.9496 for K = 1, 3, 5, and 10, whereas Approach A with the same input achieves 0.4852, 0.7856, 0.8624, and 0.9333, respectively. However, for the title (T) and keywords (K) inputs, the proposed model's Top 3 and Top 5 accuracy is slightly lower than that of Approach A. On the other hand, for these two inputs the proposed model surpasses the previous method at Top 1: its Accuracy@1 is 0.3721 for T and 0.4022 for K, versus 0.3542 and 0.3933, respectively, for Approach A. In addition, compared with Approach A at Top 10, our approach performs better for the title (T) feature and slightly worse for the keywords (K) attribute.
Secondly, for models using paper information and Journals' Aims & Scope, the proposed method surpasses Approach A for the inputs AS, TAS, AKS, and TAKS. For example, the proposed approach performs best when using all features (TAKS), with Accuracy@K (K = 1, 3, 5, 10) of 0.5194, 0.8112, 0.8866, and 0.9496, whereas Approach A achieves lower values of 0.5002, 0.7889, 0.8627, and 0.9323, respectively. For the remaining input groups (TS, KS, and TKS), the proposed model's Accuracy@K (K = 1, 3, 5, 10) is lower than that of Approach A, with one exception: with TKS as input, its Accuracy@10 is slightly higher than that of Approach A.
Finally, one can see that both the proposed method and Approach A can improve performance when using the additional "Aims & Scopes" information of the journals. In our proposed method, results with Aims & Scopes are better than without them, except for the title (T), keywords (K), and title + keywords (TK) inputs, where the additional "Aims & Scopes" attribute does not increase performance. For example, the best performance of the proposed method in Accuracy@K (K = 1, 3, 5, 10) is 0.5194, 0.8112, 0.8866, and 0.9496 with the TAKS combination, compared with 0.5173, 0.8097, 0.8862, and 0.9496, respectively, without the journals' aims and scopes.
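The comparisons in this paragraph and the two before it can be made explicit by subtracting the reported Accuracy@K values (K = 1, 3, 5, 10). The short script below uses only numbers quoted in the text and reports differences in percentage points; for example, it gives a Top 1 gain of 3.21 points for SimCPSR over Approach A on TAK, and gains of at most 0.21 points from adding Aims & Scopes to TAK. The dictionary keys and the helper name `gain_pp` are illustrative labels.

```python
# Accuracy@K for K = 1, 3, 5, 10, copied from the results discussed above.
reported = {
    "SimCPSR, TAK": [0.5173, 0.8097, 0.8862, 0.9496],
    "Approach A, TAK": [0.4852, 0.7856, 0.8624, 0.9333],
    "SimCPSR, TAKS": [0.5194, 0.8112, 0.8866, 0.9496],
    "Approach A, TAKS": [0.5002, 0.7889, 0.8627, 0.9323],
}


def gain_pp(a, b):
    """Element-wise difference a - b in percentage points, rounded to 2 decimals."""
    return [round(100 * (x - y), 2) for x, y in zip(a, b)]


comparisons = [
    ("SimCPSR, TAK", "Approach A, TAK"),
    ("SimCPSR, TAKS", "Approach A, TAKS"),
    ("SimCPSR, TAKS", "SimCPSR, TAK"),  # effect of adding Aims & Scopes
]
for a, b in comparisons:
    print(f"{a} vs {b}: {gain_pp(reported[a], reported[b])}")
```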
In summary, our proposed approach, SimCPSR, outperforms the previous method for most feature combinations and can be considered a new approach to the paper submission recommendation problem. Notably, using the additional "Aims & Scopes" information can help improve model performance. We can also see the importance of the abstract feature: all models whose input contains the abstract perform better than models whose input lacks it. Therefore, the abstract is an essential feature for the paper submission recommendation problem.
Fig. 4. Performance comparison between the proposed SimCPSR approach and Approach A for different feature combinations.
Table 1. The performance of the proposed models compared to Son et al.’s results
in [7]
Acknowledgement
Son Huynh Thanh was funded by Vingroup JSC and supported by the Master,
Ph.D. Scholarship Programme of Vingroup Innovation Foundation (VINIF), In-
stitute of Big Data, code VINIF.2021.ThS.18.
References
1. Bai, X., Wang, M., Lee, I., Yang, Z., Kong, X., Xia, F.: Scientific paper recommendation: A survey. IEEE Access 7, 9324–9339 (2019). https://doi.org/10.1109/ACCESS.2018.2890388
2. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (2017). https://doi.org/10.1162/tacl_a_00051, https://aclanthology.org/Q17-1010
3. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.E.: A simple framework for contrastive learning of visual representations. CoRR abs/2002.05709 (2020), https://arxiv.org/abs/2002.05709
4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (Jun 2019). https://doi.org/10.18653/v1/N19-1423, https://aclanthology.org/N19-1423
5. Gao, T., Yao, X., Chen, D.: SimCSE: Simple contrastive learning of sentence embeddings (2021). https://doi.org/10.48550/ARXIV.2104.08821, https://arxiv.org/abs/2104.08821
6. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
7. Huynh, S.T., Dang, N., Huynh, P.T., Nguyen, D.H., Nguyen, B.T.: A fusion approach for paper submission recommendation system. In: Fujita, H., Selamat, A., Lin, J.C.W., Ali, M. (eds.) Advances and Trends in Artificial Intelligence. From Theory to Practice. pp. 72–83. Springer International Publishing, Cham (2021)
8. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization (2014), http://arxiv.org/abs/1412.6980. Published as a conference paper at the 3rd International Conference for Learning Representations, San Diego, 2015
9. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521, 436–444 (2015). https://doi.org/10.1038/nature14539
10. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019), http://arxiv.org/abs/1907.11692
11. Loshchilov, I., Hutter, F.: Fixing weight decay regularization in Adam. CoRR abs/1711.05101 (2017), http://arxiv.org/abs/1711.05101
12. Nguyen, D.H., Huynh, S., Huynh, P., Cuong, D.V., Nguyen, B.T.: S2CFT: A new approach for paper submission recommendation. In: SOFSEM (2021)
13. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: PyTorch: An imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc. (2019), http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
14. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543 (2014), http://www.aclweb.org/anthology/D14-1162
15. Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks. CoRR abs/1908.10084 (2019), http://arxiv.org/abs/1908.10084
16. Samin, H., Azim, T.: Knowledge based recommender system for academia using machine learning: A case study on higher education landscape of Pakistan. IEEE Access 7, 67081–67093 (04 2019). https://doi.org/10.1109/ACCESS.2019.2912012
17. Son, H.T., Phong, H.T., Dac, N.H.: An efficient approach for paper submission recommendation. 2020 IEEE REGION 10 CONFERENCE (TENCON) pp. 726–731 (2020)
18. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need (2017). https://doi.org/10.48550/ARXIV.1706.03762, https://arxiv.org/abs/1706.03762
19. Wang, D., Liang, Y., Xu, D., Feng, X., Guan, R.: A content-based recommender system for computer science publications. Knowledge-Based Systems 157, 1–9 (2018). https://doi.org/10.1016/j.knosys.2018.05.001, https://www.sciencedirect.com/science/article/pii/S0950705118302107
20. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Scao, T.L., Gugger, S., Drame, M., Lhoest, Q., Rush, A.M.: Transformers: State-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. pp. 38–45. Association for Computational Linguistics, Online (Oct 2020), https://www.aclweb.org/anthology/2020.emnlp-demos.6