SimCPSR: Simple Contrastive Learning for Paper Submission Recommendation System

Duc H. Le (1,2), Tram T. Doan (1,2), Son T. Huynh (1,2,3), and Binh T. Nguyen (1,2,3,*)

(1) University of Science, Ho Chi Minh City, Vietnam
(2) Vietnam National University, Ho Chi Minh City, Vietnam
(3) AISIA Research Lab, Ho Chi Minh City, Vietnam

arXiv:2205.05940v1 [cs.IR] 12 May 2022

Abstract. Recommendation systems play a vital role in many areas, including
academia, where they support researchers in choosing a conference or journal
and thereby increase the chance that their work is accepted. This study
proposes a transformer-based model using transfer learning as an efficient
approach for the paper submission recommendation system. By combining
essential paper information (such as the title, the abstract, and the list of
keywords) with the aims & scopes of journals, the model can recommend the
Top K journals that maximize the acceptance of the paper. Our model was
developed in two stages: (i) Fine-tuning the pre-trained language model (LM)
with a simple contrastive learning framework. We utilized a simple supervised
contrastive objective to fine-tune all parameters, encouraging the LM to learn
the document representation effectively. (ii) Training the fine-tuned LM on
different combinations of the features for the downstream task. The proposed
method improves on previous approaches, achieving 0.5173, 0.8097, 0.8862, and
0.9496 for Top 1, 3, 5, and 10 accuracy on the test set when the title,
abstract, and keywords are combined as input features. Incorporating the
journals’ aims and scopes, our model reaches 0.5194, 0.8112, 0.8866, and
0.9496 for Top 1, 3, 5, and 10, respectively. We provide the implementation
and datasets for further reference at https://github.com/hduc-le/SimCPSR.

Keywords: paper submission recommendation · contrastive learning · sentence embedding · recommendation system.

* Corresponding Author: Binh T. Nguyen ([email protected])

1 Introduction

Recommendation systems have become increasingly popular in almost all
fields, and they are used in industries such as retail, media, news, streaming
services, and e-commerce. Seeing the economic benefit of recommendation
systems, many companies have built systems that use customers’ historical data
to offer relevant suggestions, improving both customer satisfaction and the
company’s products. Well-known examples include the recommendation systems of
Spotify and Netflix in streaming services, Amazon in e-commerce, and Google,
Facebook, and YouTube in media. Recommendation systems in academia have also
gained importance in recent years, including paper submission recommendation
[7, 12, 17, 19], knowledge-based recommendation [16], and paper suggestion [1].
Selecting a suitable journal for a new work is not easy for most researchers,
whether young or experienced. To help researchers choose a relevant journal
and increase the chance that their work is accepted, Wang and his coworkers
proposed an early recommendation system for computer science publications [19],
which many other researchers later developed further.
In this work, we investigate the paper submission recommendation problem
using general paper information and the aims and scopes of the journals as
attributes. Our goal is to capture, as well as possible, the semantic
relationship between a paper submission and a journal through those available
features. We expect the representations of papers and journals to be well
encoded, meaning that semantically related ones lie close together in the
embedding space and unrelated ones lie far apart. We tackle the problem by
applying the transformer architecture [18] as an encoder to extract the input
features effectively, and we use the contrastive learning framework [3, 5] to
enhance the model’s robustness in the downstream task. The experimental
results show that the proposed approach outperforms the previous work in most
settings. For example, when using the title, abstract, and keywords combined
with the journals’ aims and scopes for training, our model gets 0.5194 for
Top 1 accuracy and 0.9496 for Top 10 accuracy.
The contributions of our work are as follows:

(a) We propose a new framework for the paper submission recommendation
problem based on the transformer architecture. The experimental results in
Section 4 show that our approach has competitive performance compared to the
previous works.
(b) Leveraging contrastive learning, a powerful method for sentence embedding,
we enhance the framework’s robustness in learning semantic relationships
among documents or sentences.
(c) Our method provides a basic framework that can be extended further by
applying other transformer models to achieve better performance in the
paper submission recommendation problem.

The paper is organized as follows. Section 2 gives a brief overview of
previous approaches to this problem and the tools and algorithms they used.
Section 3 describes our main methods, and Section 4 presents the experiments,
including the training configurations, the dataset, and the results. Finally,
the paper ends with the conclusion and a discussion of future work.

2 Related work
The idea of a paper recommendation system for computer science publications
was developed by Wang and his coworkers [19]. Wang used the Chi-square
statistic and term frequency-inverse document frequency (TF-IDF) for feature
selection from the abstract of each paper submission and used linear logistic
regression to classify relevant journals or conferences. The accuracy in that
study is 61.37% for the Top 3 recommended results. Son and colleagues later
proposed a new approach that improves the performance of paper submission
recommendation using additional features. The method in [17] uses TF-IDF, the
Chi-square statistic, and one-hot encoding to extract features from the
information available in each paper. They applied two machine learning models,
namely Logistic Linear Regression (LLR) and Multi-layer Perceptrons (MLP), to
different combinations of features from the paper submission. Their methods
achieved strong results for Top 3 accuracy, namely 89.07% for the LLR model
and 88.60% for the MLP model when using the title, abstract, and keywords
together.
For the same problem, Dac et al. recently proposed a new approach [12] that
applies two embedding methods, GloVe [14] and FastText [2], combined with a
Convolutional Neural Network (CNN) [9] and LSTM [6] for feature extraction.
They considered seven feature groups for training: title, abstract, keywords,
title + keyword, title + abstract, keyword + abstract, and title + keyword +
abstract. The experimental results show that the combination of S2RSCS [17]
and CNN + FastText, namely the proposed S2CFT [12] model, performs best: with
title + keyword + abstract as input, its Top 1 accuracy is 68.11%, and its
accuracy at Top 3, 5, and 10 is 90.8%, 96.25%, and 99.21%, respectively.
Son and his coworkers then proposed another approach to paper submission
recommendation. In their study [7], besides the papers’ own attributes, they
used additional information from the “Aims and Scopes” of journals as input
data. They collected a dataset containing 414512 articles and the aims and
scopes of the journals in which these papers appeared. Their architecture
still uses FastText [2] for embedding and takes the available paper submission
information as input, and it adds a new feature that measures the similarity
between the paper submission and the journals’ aims & scopes. Their method is
a practical approach to the paper submission recommendation problem.
In recent years, transformer architectures have succeeded in various fields of
natural language processing (NLP) and computer vision (CV). One of the most
significant works behind that success is the vanilla Transformer architecture
[18]. It adopts the attention mechanism, which weights the significance of
each part of the input data differently. The popularity and efficiency of
transformer models are well established, and they have paved the way for many
other models, the most famous being BERT (Bidirectional Encoder
Representations from Transformers) [4]. BERT is pre-trained on large amounts
of textual data and then fine-tuned, achieving state-of-the-art results in
many NLP tasks such as semantic textual similarity (STS) and sentence
classification. In addition, BERT has been considered a powerful embedding
method for documents or sentences: the text is passed through the BERT layers,
and either the average of the output layer or the output of the first token
(the [CLS] token) is taken as a fixed-size embedding.
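The two pooling strategies mentioned above can be illustrated with a minimal sketch. It works on a toy matrix of token vectors rather than on real BERT outputs; the function names, the toy values, and the attention-mask convention (1 for real tokens, 0 for padding) are assumptions made for illustration only.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_embeddings: List[Vector]) -> Vector:
    """Use the output vector of the first token ([CLS]) as the embedding."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the output vectors of the non-padding tokens."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for d in range(dim):
                total[d] += vec[d]
            count += 1
    return [value / count for value in total]


# Toy output layer: 3 real tokens plus 1 padding token, hidden size 2.
toy_output = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
toy_mask = [1, 1, 1, 0]
cls_vec = cls_pooling(toy_output)
mean_vec = mean_pooling(toy_output, toy_mask)
```

Either way, the result has the same size as one token vector regardless of the input length, which is what makes it usable as a fixed-size embedding.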
Although BERT performs impressively on many NLP tasks, the standard
approaches to sentence embedding with BERT have some limitations. Recently,
the contrastive learning framework [3] has become state-of-the-art in sentence
embedding. Conceptually, it is a technique that uses a contrastive objective
to pull similar samples together and push dissimilar ones far apart in the
embedding space. Contrastive learning can be applied to both unlabeled and
labeled data. As a result, fine-tuning transformer models with a contrastive
objective [3, 5, 15] has become an efficient way to obtain better input
representations, not only for textual data but also for image data [3].

3 Methodology
In this section, we present our approach for the paper submission recommenda-
tion problem and the evaluation metrics.

3.1 Contrastive Learning

Contrastive learning has recently become a popular framework that can be
applied to both supervised and unsupervised machine learning tasks. The
technique learns representations from a set of semantically related sample
pairs $(x_i, x_i^+)$ by pulling similar sample pairs together and pushing
dissimilar ones apart in the embedding space. Its main promise is that a
pre-trained transformer language model, once fine-tuned with the contrastive
objective, encodes the input into a good representation that can boost the
performance of many downstream tasks.
We first use the supervised training dataset (described in Section 4.2) to
construct a set of paired samples, $D = \{(x_i, x_i^+)\}_{i=1}^{m}$, for the
contrastive fine-tuning process, where $x_i$ denotes the $i$-th sample,
consisting of the title, abstract, and keywords, and $x_i^+$ denotes the
semantically corresponding aims and scope. We follow the contrastive learning
framework described in [5]. For a mini-batch of $N$ pairs, let $h_i$ and
$h_i^+$ denote the embeddings (latent representations) of $x_i$ and $x_i^+$;
the contrastive objective is defined as:

$$\ell_i = -\log \frac{e^{\mathrm{sim}(h_i, h_i^+)/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(h_i, h_j^+)/\tau}}, \tag{1}$$

where $\tau$ is a temperature hyper-parameter and $\mathrm{sim}(h_1, h_2)$ is
the cosine similarity:

$$\mathrm{sim}(h_1, h_2) = \frac{h_1^{\top} h_2}{\|h_1\| \cdot \|h_2\|}.$$
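As a rough illustration of the objective in Eq. (1), the sketch below computes the per-sample loss for a toy mini-batch, summing in the denominator over the positives $h_j^+$ of all pairs in the batch as in [5]. It is plain Python rather than the authors' PyTorch implementation; the function names, the toy vectors, and the temperature value 0.05 are assumptions, since the text does not state the value of $\tau$.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(h1: Vector, h2: Vector) -> float:
    """sim(h1, h2) = h1^T h2 / (||h1|| * ||h2||)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm1 = math.sqrt(sum(a * a for a in h1))
    norm2 = math.sqrt(sum(b * b for b in h2))
    return dot / (norm1 * norm2)


def contrastive_loss(h: List[Vector], h_pos: List[Vector], tau: float = 0.05) -> List[float]:
    """Per-sample loss: the positives of the other pairs act as negatives."""
    losses = []
    for i, h_i in enumerate(h):
        logits = [cosine_similarity(h_i, h_j) / tau for h_j in h_pos]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denominator - logits[i])
    return losses


# Toy batch: two paper embeddings and two aims-and-scope embeddings.
papers = [[1.0, 0.0], [0.0, 1.0]]
aims_aligned = [[0.9, 0.1], [0.1, 0.9]]  # each paper is close to its own positive
aims_swapped = [[0.1, 0.9], [0.9, 0.1]]  # each paper is close to the wrong positive
loss_aligned = sum(contrastive_loss(papers, aims_aligned)) / len(papers)
loss_swapped = sum(contrastive_loss(papers, aims_swapped)) / len(papers)
```

The aligned batch gives a much smaller average loss than the swapped one, which is the behaviour the objective is meant to reward: each paper should be more similar to its own journal description than to the others in the batch.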

3.2 Modeling

In this study, we build a two-stage model consisting of two consecutive
procedures. First, we fine-tune the pre-trained LM with a simple contrastive
learning framework so that it serves as an encoder that maps each document or
sentence to a sentence embedding efficiently. Then, the fine-tuned LM is
applied to different combinations of the features for the downstream task, a
classification evaluated by Top K accuracy on each group of attributes.
Fine-tuning: As mentioned in Section 3.1, we consider the aims & scope
of the journal as the positive sample for the paper’s title, abstract, and
keywords. We then fine-tune all parameters of the pre-trained Distil-RoBERTa
(a distilled version of the pre-trained RoBERTa [10]) via the contrastive
objective (Eq. 1) on the set of paired samples, as depicted in Figure 1.

Fig. 1. Fine-tuning of the pre-trained LM with a simple contrastive learning
framework, where $x_i$ and $x_i^+$ are two semantically related samples and
$h_i$, $h_i^+$ are their corresponding representations.

Downstream task: We use the fine-tuned LM as the backbone for the
classification task. We train it on different combinations of the features,
either using only the available attributes of each paper submission or
combining the paper information with the journals’ aims. In this section, we
describe the two use cases in our experiments.
Models using paper information. For this case, we extract helpful
information from each paper submission, including Title (T), Abstract (A), and
Keywords (K). This information yields seven combinations of attributes:
Title (T), Abstract (A), Keywords (K), Title + Abstract (TA), Title +
Keywords (TK), Abstract + Keywords (AK), and Title + Abstract + Keywords
(TAK). The fine-tuned LM acts as the encoder that maps the batch of inputs to
768-dimensional embeddings, which are then passed through an additional linear
layer with ReLU activation and dropout to avoid overfitting. The last linear
layer with Softmax activation acts as a classifier that outputs N
probabilities, one for each journal the given paper could belong to. To
identify the K relevant journals that maximize the probability of acceptance,
we choose the K journals with the highest probabilities. This case is
illustrated in Figure 2.

Fig. 2. The architecture of the fine-tuned LM using paper information.
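The final step of this model, turning the classifier scores into a Top K recommendation list, can be sketched as follows. This is a plain-Python illustration, not the authors' code; the function names and the toy scores for five hypothetical journals are assumptions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities: List[float], k: int) -> List[int]:
    """Indices of the k most probable journals, most probable first."""
    ranked = sorted(range(len(probabilities)), key=lambda j: probabilities[j], reverse=True)
    return ranked[:k]


# Hypothetical scores produced by the last linear layer for 5 journals.
toy_logits = [0.2, 2.5, -1.0, 1.7, 0.9]
toy_probs = softmax(toy_logits)
recommended = top_k(toy_probs, k=3)
```

Because softmax preserves the ordering of the scores, ranking by probability and ranking by raw score return the same journals; the probabilities matter mainly for the cross-entropy training loss.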

Models using paper information and Journals’ Aims & Scope. Besides the
attributes available in the paper, we use the journals’ Aims & Scopes as
external features. This gives seven new combinations of features for the
input data: Title + Aims and Scopes (TS), Abstract + Aims and Scopes (AS),
Keywords + Aims and Scopes (KS), Title + Abstract + Aims and Scopes (TAS),
Title + Keywords + Aims and Scopes (TKS), Abstract + Keywords + Aims and
Scopes (AKS), and Title + Abstract + Keywords + Aims and Scopes (TAKS).
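The fourteen input configurations (seven with paper information only, seven with the aims and scopes added) can be enumerated mechanically. The short sketch below does so; the variable names are illustrative assumptions, and the abbreviations follow the ones used in the text.

```python
from itertools import combinations

PAPER_FEATURES = ["T", "A", "K"]  # Title, Abstract, Keywords

# All non-empty subsets of the paper features, in order of increasing size.
paper_only = [
    "".join(combo)
    for size in range(1, len(PAPER_FEATURES) + 1)
    for combo in combinations(PAPER_FEATURES, size)
]

# The same subsets with the journals' Aims and Scopes (S) appended.
with_aims = [name + "S" for name in paper_only]
```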
Conceptually, this architecture has two sub-branches and a main branch. In
the first sub-branch, we reuse the fine-tuned LM to encode the external
feature into embeddings and pass them through a linear layer with ReLU
activation for dimensionality reduction. The second sub-branch is the same as
the model using paper information described previously. The paper feature
output by the second sub-branch is used in two ways. First, we compute its
cosine similarity with the external features produced by the first
sub-branch, and we then concatenate the paper feature with these cosine
features. Then, in the main branch, we feed the joined information to a
linear layer with softmax activation to compute the probability that the
paper submission belongs to each journal, and we sort the probabilities in
descending order to return the top list of recommended items. The
architecture is illustrated in Figure 3.

Fig. 3. The architecture of the fine-tuned LM using paper information and journal’s
aims.
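The fusion step of this architecture, computing cosine similarities between the paper feature and each journal's aims feature and concatenating them with the paper feature, can be sketched as below. This is an illustration under assumptions: the function names, the 2-dimensional toy vectors, and the choice of one aims vector per journal are not taken from the authors' code.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def fuse(paper_feature: Vector, aims_features: List[Vector]) -> Vector:
    """Concatenate the paper feature with its similarity to every journal's aims."""
    similarities = [cosine_similarity(paper_feature, aims) for aims in aims_features]
    return paper_feature + similarities


# Toy example: a 2-dimensional paper feature and aims features of 3 journals.
toy_paper = [1.0, 0.0]
toy_aims = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
fused = fuse(toy_paper, toy_aims)  # length 2 + 3, the input of the main branch
```

With a d-dimensional paper feature and N journals, the fused vector has d + N entries, so the final linear layer sees both the content of the paper and how well it matches each journal's stated scope.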

3.3 Evaluation metrics

Following the previous approach of Son et al. [7], we use Accuracy@K to
evaluate the performance of the proposed model, where K = 1, 3, 5, 10.
Accuracy@K is the ratio between the number of correct items at each K and the
number of recommended items, as in the following formula:

$$P_{\text{Top-}K} = \frac{\text{The number of relevant items}}{\text{The number of viewed items}} \tag{2}$$
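Eq. (2) does not spell out how relevant and viewed items are counted. The sketch below shows one common reading, which is an assumption here: each test paper has a single true journal, a paper counts as a hit when that journal appears among its K highest-ranked recommendations, and the accuracy is the fraction of test papers that are hits. The function name and the toy rankings are hypothetical.

```python
from typing import Sequence


def accuracy_at_k(ranked_predictions: Sequence[Sequence[int]],
                  true_labels: Sequence[int],
                  k: int) -> float:
    """Fraction of papers whose true journal is within the top k predictions."""
    hits = sum(
        1
        for ranked, label in zip(ranked_predictions, true_labels)
        if label in ranked[:k]
    )
    return hits / len(true_labels)


# Toy example: 4 papers, 3 journals, each ranking sorted by predicted probability.
toy_rankings = [[2, 0, 1], [1, 2, 0], [0, 1, 2], [2, 1, 0]]
toy_labels = [2, 0, 1, 0]
acc_at_1 = accuracy_at_k(toy_rankings, toy_labels, k=1)
acc_at_2 = accuracy_at_k(toy_rankings, toy_labels, k=2)
acc_at_3 = accuracy_at_k(toy_rankings, toy_labels, k=3)
```

Under this reading the metric can only grow with K, which matches the pattern in the reported results, where Top 10 accuracy is always the highest.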

4 Experiments
This section will present how we conducted the experiments for the proposed
methods and compare them with the previous approaches for the main problem.

4.1 Experimental settings

Our experiments are run on Google Colab Pro, mostly with a Tesla P100-PCIE
GPU accelerator (16 GB VRAM), and implemented in the PyTorch framework [13].
In addition, we use the HuggingFace library [20], one of the most popular
NLP frameworks, which provides APIs to download and use pre-trained
transformer models easily.

4.2 Datasets
Our experiments use the same dataset and preprocessing techniques as Son et
al.’s work [7]. The dataset includes 414512 papers, of which 331464 are used
for training and 83048 for testing. Each entry consists of the paper
submission’s information (the title, abstract, and list of keywords) and its
label. In addition, 351 aims and scopes belonging to the journals serve as
external features.
Data preprocessing. Besides the size of the dataset, the quality of the
preprocessed data strongly affects the model’s performance, so data
preprocessing is a crucial step in most machine learning tasks, and it is even
more critical in NLP problems. The preprocessing includes the following
steps: (1) lowercasing the text, (2) removing punctuation, (3) removing URLs,
(4) removing stop words, both those downloaded from the Natural Language
Toolkit (NLTK) and some others we define, (5) removing unnecessary spacing,
and (6) removing non-alphabetic text. We apply these techniques to both kinds
of data: general paper information and the aims and scopes of the journal.
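The six preprocessing steps can be sketched with the Python standard library as follows. This is not the authors' pipeline: the stop-word set is a tiny placeholder standing in for the NLTK list plus the custom additions, the regular expressions are assumptions, and URLs are removed before punctuation (a reordering of steps 2 and 3) so that they are still recognizable when the pattern is applied.

```python
import re
import string

# Placeholder stop-word list; the paper uses the NLTK list plus custom words.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "we", "with"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_PATTERN = re.compile(r"[^a-z\s]")
PUNCTUATION_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> str:
    text = text.lower()                        # (1) lowercase the text
    text = URL_PATTERN.sub(" ", text)          # (3) remove URLs
    text = text.translate(PUNCTUATION_TABLE)   # (2) replace punctuation with spaces
    text = NON_ALPHA_PATTERN.sub(" ", text)    # (6) remove non-alphabetic characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # (4) drop stop words
    return " ".join(tokens)                    # (5) collapse unnecessary spacing


cleaned = preprocess("We propose SimCPSR, a model for the Top-10 task: see https://example.org/x !")
```

The same function would be applied both to the paper fields and to the journals' aims and scopes, so that the two kinds of text reach the encoder in the same normalized form.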

4.3 Training details

We start from a pre-trained Distil-RoBERTa model and fine-tune it on the set
of paired samples, where each instance pairs the combination of the title,
abstract, and keywords with the corresponding aims & scope of the journal; we
take the [CLS] representation as the sentence embedding. We fine-tune the
model’s parameters using the contrastive objective (Eq. 1) to make it more
robust in extracting the semantic relationship among sentences. During the
experiments, we found that the AdamW [11] optimizer, a variant of the Adam
optimizer [8] that uses weight decay to avoid overfitting, is an efficient
choice for deep networks like our architecture.
As described in Section 3.2, the downstream task is a classification problem
in which we train the fine-tuned Distil-RoBERTa model on different
combinations of features and use AdamW to optimize the Cross-Entropy loss.
Finally, we apply the Softmax layer to obtain the Top K values, which
represent the probabilities that the given inputs belong to the corresponding
labels, and compute the accuracy at each K value as described in Section 3.3.

4.4 Results
Our experiments cover different combinations of attributes using the two
models defined previously: Models using paper information and Models
using paper information and Journals’ Aims & Scope. We compare the
performance of our proposed model with the approach in [7] (referred to as
Approach A); the experimental results are given in Table 1 and Figure 4.
First, for models using only paper information, the proposed method performs
better than Approach A for the attribute combinations A, TK, TA, AK, and TAK.
In particular, our approach has its best outcome for the TAK feature group,
with Accuracy@K of 0.5173, 0.8097, 0.8862, and 0.9496 for K = 1, 3, 5, and 10,
whereas Approach A with the same input reaches 0.4852, 0.7856, 0.8624, and
0.9333, respectively. However, the proposed model is slightly less accurate
than Approach A for the title (T) and keywords (K) inputs at Top 3 and Top 5.
At Top 1, the proposed model surpasses the previous method for both the title
(T) and keywords (K) inputs: its Accuracy@1 is 0.3721 and 0.4022, while that
of Approach A is 0.3542 and 0.3933, respectively. At Top 10, our approach
performs slightly better than Approach A for the title (T) feature and
slightly worse for the keywords (K) attribute.
Second, for models using paper information and Journals’ Aims & Scope, the
proposed method surpasses Approach A for the input types AS, TAS, AKS, and
TAKS. For example, the best performance of the proposed approach in
Accuracy@K (K = 1, 3, 5, 10) is 0.5194, 0.8112, 0.8866, and 0.9496 when using
all features (TAKS), while Approach A is lower, at 0.5002, 0.7889, 0.8627, and
0.9323, respectively. For the remaining input groups (TS, KS, and TKS), the
Accuracy@K (K = 1, 3, 5, 10) of the proposed model is lower than that of
Approach A, with one exception: for Accuracy@10 with TKS as input, the
proposed approach is slightly more accurate than Approach A.
Finally, both the proposed method and Approach A can benefit from the
additional “Aims & Scopes” information of the journals. In our proposed
method, using Aims & Scopes gives better results than not using it, except
for the feature types title (T), keywords (K), and title + keywords (TK),
where the additional “Aims & Scopes” attribute does not consistently increase
performance. For example, the best performance of the proposed method in
Accuracy@K (K = 1, 3, 5, 10) is 0.5194, 0.8112, 0.8866, and 0.9496 with the
TAKS combination, whereas without the journals’ aims (TAK) the results are
0.5173, 0.8097, 0.8862, and 0.9496, respectively.
In summary, our proposed approach, SimCPSR, outperforms the previous method
in most settings and can be considered a new approach for the paper
submission recommendation system. Using the additional “Aims & Scopes”
information can help improve the models’ efficiency. We can also see the
importance of the abstract feature: all models whose input contains the
abstract perform better than the models whose input lacks it. Therefore, the
abstract is essential for the paper submission recommendation problem.

Fig. 4. Performance of the proposed SimCPSR approach and Approach A for
different features.

Table 1. The performance of the proposed models compared to Son et al.’s results in [7]

Method      Feature Usage  Top 1   Top 3   Top 5   Top 10
Approach A  T              0.3542  0.6634  0.7561  0.8532
Approach A  TS             0.4015  0.6991  0.7971  0.8951
Approach A  K              0.3933  0.7008  0.7919  0.8852
Approach A  KS             0.4284  0.7256  0.8189  0.9075
Approach A  A              0.4691  0.7661  0.8482  0.9253
Approach A  AS             0.477   0.7662  0.8488  0.9258
Approach A  TK             0.4157  0.7315  0.8232  0.9084
Approach A  TKS            0.4475  0.7490  0.8302  0.9127
Approach A  TA             0.4644  0.7613  0.8448  0.9233
Approach A  TAS            0.4828  0.7754  0.8536  0.9276
Approach A  AK             0.4791  0.7730  0.8530  0.9273
Approach A  AKS            0.4951  0.7830  0.8602  0.9304
Approach A  TAK            0.4852  0.7856  0.8624  0.9333
Approach A  TAKS           0.5002  0.7889  0.8627  0.9323
SimCPSR     T              0.3721  0.6555  0.7526  0.8533
SimCPSR     TS             0.3737  0.6553  0.7513  0.8523
SimCPSR     K              0.4022  0.6892  0.7822  0.8792
SimCPSR     KS             0.4015  0.6921  0.7839  0.8784
SimCPSR     A              0.4875  0.7842  0.8639  0.9351
SimCPSR     AS             0.4886  0.7849  0.8642  0.9353
SimCPSR     TK             0.4372  0.7382  0.8280  0.9145
SimCPSR     TKS            0.4367  0.7354  0.8268  0.9128
SimCPSR     TA             0.4935  0.7892  0.8689  0.9385
SimCPSR     TAS            0.5014  0.7920  0.8729  0.9428
SimCPSR     AK             0.4853  0.7782  0.8622  0.9350
SimCPSR     AKS            0.5030  0.7964  0.8765  0.9435
SimCPSR     TAK            0.5173  0.8097  0.8862  0.9496
SimCPSR     TAKS           0.5194  0.8112  0.8866  0.9496

5 Conclusion and Further Works

We presented a transformer-based approach to the paper submission
recommendation system with simple contrastive learning. The proposed method
uses a simple supervised contrastive objective to fine-tune all parameters of
the pre-trained LM for embedding the input data, and then trains the
fine-tuned LM on different combinations of features, using the available
paper information and the journal’s aims, for the downstream task. The
experimental results show that the proposed approach has competitive
performance and improves the efficiency of the paper submission
recommendation system. In the future, we will continue improving the proposed
algorithm’s performance and apply it to datasets from various other areas.

Acknowledgement
Son Huynh Thanh was funded by Vingroup JSC and supported by the Master,
Ph.D. Scholarship Programme of Vingroup Innovation Foundation (VINIF), In-
stitute of Big Data, code VINIF.2021.ThS.18.

References
1. Bai, X., Wang, M., Lee, I., Yang, Z., Kong, X., Xia, F.: Scientific paper recommendation: A survey. IEEE Access 7, 9324–9339 (2019). https://doi.org/10.1109/ACCESS.2018.2890388
2. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (2017). https://doi.org/10.1162/tacl_a_00051, https://aclanthology.org/Q17-1010
3. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.E.: A simple framework for contrastive learning of visual representations. CoRR abs/2002.05709 (2020), https://arxiv.org/abs/2002.05709
4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (Jun 2019). https://doi.org/10.18653/v1/N19-1423, https://aclanthology.org/N19-1423
5. Gao, T., Yao, X., Chen, D.: SimCSE: Simple contrastive learning of sentence embeddings (2021). https://doi.org/10.48550/ARXIV.2104.08821, https://arxiv.org/abs/2104.08821
6. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)
7. Huynh, S.T., Dang, N., Huynh, P.T., Nguyen, D.H., Nguyen, B.T.: A fusion approach for paper submission recommendation system. In: Fujita, H., Selamat, A., Lin, J.C.W., Ali, M. (eds.) Advances and Trends in Artificial Intelligence. From Theory to Practice. pp. 72–83. Springer International Publishing, Cham (2021)
8. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization (2014), http://arxiv.org/abs/1412.6980, published as a conference paper at the 3rd International Conference for Learning Representations, San Diego, 2015
9. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521, 436–444 (2015). https://doi.org/10.1038/nature14539
10. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019), http://arxiv.org/abs/1907.11692
11. Loshchilov, I., Hutter, F.: Fixing weight decay regularization in Adam. CoRR abs/1711.05101 (2017), http://arxiv.org/abs/1711.05101
12. Nguyen, D.H., Huynh, S., Huynh, P., Cuong, D.V., Nguyen, B.T.: S2CFT: A new approach for paper submission recommendation. In: SOFSEM (2021)
13. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: PyTorch: An imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc. (2019), http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
14. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543 (2014), http://www.aclweb.org/anthology/D14-1162
15. Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks. CoRR abs/1908.10084 (2019), http://arxiv.org/abs/1908.10084
16. Samin, H., Azim, T.: Knowledge based recommender system for academia using machine learning: A case study on higher education landscape of Pakistan. IEEE Access 7, 67081–67093 (2019). https://doi.org/10.1109/ACCESS.2019.2912012
17. Son, H.T., Phong, H.T., Dac, N.H.: An efficient approach for paper submission recommendation. 2020 IEEE Region 10 Conference (TENCON) pp. 726–731 (2020)
18. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need (2017). https://doi.org/10.48550/ARXIV.1706.03762, https://arxiv.org/abs/1706.03762
19. Wang, D., Liang, Y., Xu, D., Feng, X., Guan, R.: A content-based recommender system for computer science publications. Knowledge-Based Systems 157, 1–9 (2018). https://doi.org/10.1016/j.knosys.2018.05.001, https://www.sciencedirect.com/science/article/pii/S0950705118302107
20. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Scao, T.L., Gugger, S., Drame, M., Lhoest, Q., Rush, A.M.: Transformers: State-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. pp. 38–45. Association for Computational Linguistics, Online (Oct 2020), https://www.aclweb.org/anthology/2020.emnlp-demos.6