A Cognitive Study On Semantic Similarity Analysis

Abstract—Semantic similarity analysis and modeling is a fundamental task in many pioneering applications of natural language processing today. Owing to their strength in sequential pattern recognition, neural networks such as RNNs and LSTMs have achieved satisfactory results in semantic similarity modeling. However, these solutions are considered inefficient due to their inability to process information in a non-sequential manner, which leads to improper extraction of context. Transformers function as the state-of-the-art architecture due to advantages such as non-sequential data processing and self-attention. In this paper, we perform semantic similarity analysis and modeling on the U.S. Patent Phrase to Phrase Matching dataset using both traditional and transformer-based techniques. We experiment with four variants of the Decoding-enhanced BERT (DeBERTa) and improve their performance through K-Fold cross-validation. The experimental results demonstrate our methodology's enhanced performance compared to traditional techniques, with an average Pearson correlation score of 0.79.

Index Terms—Semantic Similarity, K-Fold Cross Validation, Pearson Correlation, Transformers

I. INTRODUCTION

Semantic similarity is defined as the association between two blocks of text, including sentences, words, and documents. It plays a central role in most NLP tasks performed by researchers worldwide today. The dynamic and versatile nature of human language makes it difficult to standardize the process of semantic similarity [1], and the exponential rise in textual data generation makes finding new semantic analysis techniques essential. The conceptual overview of semantic similarity analysis is depicted in Fig. 1.

Fig. 1: Conceptual Overview of Semantic Similarity Analysis (blocks of text → preprocessing → neural network for similarity estimation → similarity metrics → similarity database)

As mentioned above, semantic similarity analysis is pivotal in various applications like information retrieval, text summarization, speech enhancement, and automatic dialogue generation. In the initial methodologies proposed by researchers worldwide, semantic similarity was calculated based on the number of similar words in two blocks of text. However, this yielded inaccurate results, as there were instances where two blocks of text shared many words but conveyed different meanings. For example, the sentences "Tom and Harry played Badminton and Cricket" and "Tom played Badminton and Harry played Cricket" are lexically similar, but the contexts of these two sentences are not the same. Conversely, the sentences "Jenny knows many languages" and "Jenny is a polyglot", though lexically dissimilar, convey the same meaning. Similarity analysis based on lexical methods is easy to implement, but it fails on sentence pairs that are lexically dissimilar yet semantically similar.

Today's pioneering ML algorithms and their applications use vectorization for feature extraction. The concept of vectorization in NLP [2] is that each word or phrase in a dataset is represented as a vector, i.e., an array of numbers, making feature extraction easy given the computational efficiency of today's computers. Techniques that use this vectorization concept include Bag of Words [3] and TF-IDF [4]. However, the major limitation of these solutions is that they do not consider the broader context of the text block when computing semantic similarity; the broader context shared by two blocks of text is inversely proportional to their semantic distance. (A short sketch at the end of this section makes this limitation concrete.)

Recurrent Neural Networks and Long Short-Term Memory networks, abbreviated as RNNs and LSTMs, respectively, have been considered effective techniques for learning dependencies between blocks of text. While RNNs depend on the most recent previous blocks, LSTMs capture the broader context of the text. However, both are considered inferior to transformers due to their limitation of sequential data processing. The significant advantages of transformers, such as training on large corpora, non-sequential data processing, self-attention, and positional embeddings that replace recurrence, have popularized them for modern-day NLP tasks. This paper provides insight into how modern transformers can be used for the task of semantic similarity analysis. The major contributions of our paper are listed below:

• We present a comprehensive study on the different methodologies used for the process of semantic similarity analysis.
• We perform state-of-the-art preprocessing techniques and exploratory data analysis on the U.S. Patent Phrase-to-Phrase Matching dataset.
• We perform extensive experimentation and analysis of one traditional and four transformer-based techniques to extract context and perform semantic similarity analysis.
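To make the limitation of purely lexical methods concrete, the following minimal sketch (our own illustrative code, not part of any cited work) scores the two sentence pairs above with TF-IDF cosine similarity. The lexically similar pair receives a high score despite conveying different meanings, while the paraphrase pair receives a low one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pairs = [
    # Lexically similar, semantically different
    ("Tom and Harry played Badminton and Cricket",
     "Tom played Badminton and Harry played Cricket"),
    # Lexically dissimilar, semantically similar
    ("Jenny knows many languages",
     "Jenny is a polyglot"),
]

for a, b in pairs:
    # Fit a TF-IDF vocabulary on the pair and compare the two vectors
    tfidf = TfidfVectorizer().fit_transform([a, b])
    score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
    print(f"{score:.2f}  {a!r} vs. {b!r}")
```

A contextual method should invert this ranking, which is precisely what the transformer-based models evaluated later aim to do.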
II. RELATED RESEARCH OVERVIEW

Research on semantic analysis has been a topic of interest since the 20th century, and many solutions have been proposed and implemented on benchmark datasets. This section presents a comprehensive survey of the datasets and methodologies used for semantic similarity analysis. Table I gives an overview of the widely used datasets for semantic similarity.

TABLE I: Popular datasets for semantic similarity

Author                   Dataset                  Word/Sentence pairs   Similarity score range
Rubenstein et al. [5]    R&G                      65                    0-4
Miller et al. [6]        M&C                      30                    0-4
Finkelstein et al. [7]   WS353                    353                   0-10
Agirre et al. [8]        STS2015                  3000                  0-5
Marelli et al. [9]       SICK                     10000                 1-5
USPTO                    Patent Phrase Matching   33000                 0-1

Benajiba et al. [10] proposed a solution involving a Siamese LSTM regression model used to predict the similarity of the SQL templates of two questions. The authors defined a metric called the SQL structure distance to estimate the similarity under the proposed methodology. To reduce the computational cost of the solution, the authors clustered the training samples using a one-hot lexical representation of the questions. Li et al. [11] proposed another solution for semantic similarity in biomedical sentences using a Siamese neural network (SNN) approach. The methodology integrated an interactive self-attention (ISA) mechanism with an SNN, and the proposed solution was validated on three standard biomedical datasets with an average Pearson score of 0.65. Pontes et al. [12] proposed a Siamese CNN + LSTM model in which the CNN extracts the local context while the LSTM extracts the global context. The proposed methodology was evaluated on the SICK dataset with different combinations of local and global context.

Quan et al. [13] proposed a framework combining the capability of word embeddings and an attention weight mechanism by integrating them into a unified network known as the Attention Constituency Vector Tree (ACVT). The proposed solution was validated on 19 benchmark datasets, including STS'12-STS'15, with a Pearson score of 0.75. Shancheng et al. [14] proposed a double sequential network consisting of identical LSTM layers that simultaneously train two sequences of sentences. The outputs of both layers were passed through a dense layer and compressed to obtain the semantic similarity. The proposed solution addressed the particular characteristics of Chinese and achieved higher accuracy than the Baidu Semantic Text Similarity model it was compared against. Yang et al. [15] proposed a methodology involving an extensive semantic network known as Probase; using the current weights and parameters of Probase, semantic similarity was computed on the M&C, WS353-Sim, and R&G datasets.

In recent years, Generative Adversarial Networks (GANs) have gained tremendous popularity in artificial data generation for various tasks, including image sample generation with limited data [16] and text generation. In this vein, Liang et al. [17] addressed the generation and identification of similar sentences using a GAN-based approach. The authors proposed a syntactic and semantic long short-term memory (SSLSTM) algorithm for evaluating semantic similarity, and three variations of a sentence similarity generative adversarial network (SSGAN) were proposed for generating sentences.

The state-of-the-art solutions for natural language processing tasks involve transformers; precisely, transformers in NLP are used to solve tasks involving dependencies over long sequences. In this context, Li et al. [18] introduced a hybrid Cross2self-attention Bi-RNN + BERT model to compute semantic similarity in biomedical data. The methodology was validated on the OHNLP2018 baselines with an increase of 0.6% in the Pearson coefficient. Another approach, using BERT for the semantic similarity of Outlook emails, was proposed by Sanjeev et al. [19]. Some of the standard approaches in NLP for semantic analysis include Word2Vec, proposed by Google in 2013, and the GloVe model. The related research overview is summarized in Table II.

TABLE II: Overview of the existing solutions

Author                  Methodology                                    Dataset used
Benajiba et al. [10]    Siamese LSTM Regression                        WikiSQL
Li et al. [11]          ISA + Siamese NNs                              DBMI, CDD-ful, CDD-ref
Pontes et al. [12]      Siamese CNN + LSTM                             SICK
Quan et al. [13]        Attention Constituency Vector Tree (ACVT)      STS'12-STS'15
Shancheng et al. [14]   Double Seq. NN + LSTM                          Chinese semantic similarity dataset
Yang et al. [15]        Probase                                        M&C, WS353-Sim, and R&G
Liang et al. [17]       SSLSTM + SSGAN                                 SemEval and Quora
Li et al. [18]          Cross2self, BERT                               Biomedical data
Sanjeev et al. [19]     BERT                                           Outlook emails
Our work                DeBERTa + Stratified K-Fold Cross-Validation   U.S. Patent Phrase to Phrase Matching dataset

III. METHODOLOGY

This section deals with the different techniques used to perform the task of semantic similarity analysis. In this work, we compare and analyze the performance of five different techniques: Levenshtein metric similarity and four variants of the DeBERTa model. This section covers the architecture of each model and how it can be fine-tuned on our dataset to perform the required task.

A. Levenshtein Metric

In natural language processing, the Levenshtein distance between two words is defined as the minimum number of single-character edits required to convert one word into the other [20]. It is a string metric used to quantify the disparity between two sequences; in this context, an edit is an insertion, a substitution, or a deletion. Applications of the Levenshtein distance include DNA analysis and plagiarism checking. In this task of semantic similarity analysis, we experiment with the Levenshtein distance approach on our dataset.
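For reference, a minimal dynamic-programming sketch of the metric is shown below (our own illustrative code; normalizing the distance by the longer string's length is one simple way to obtain a similarity score in [0, 1], not necessarily the exact normalization used in our experiments).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from a[:0] to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize the distance into a [0, 1] similarity score."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("kitten", "sitting"))                       # 3
print(round(levenshtein_similarity("kitten", "sitting"), 2))  # 0.57
```

The example sentence pairs from the introduction already show why this purely character-level metric cannot capture semantics: edits count spelling changes, not meaning.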
B. DeBERTa

As mentioned in Section II, there has been a remarkable rise in the usage of transformers in many NLP tasks like semantic analysis and dialogue generation. BERT (Bidirectional Encoder Representations from Transformers) is a prominent advancement built on the transformer architecture introduced by researchers at Google [21]. The concept behind BERT is that when sequential data is trained in a bidirectional manner, better and deeper inference can be obtained from the data for specific tasks like language understanding. In machine vision, transfer learning is a technique widely used by researchers across the globe to perform various tasks rather than training a model from scratch: existing deep learning models are transformed into objective-specific models by fine-tuning. This approach has gained significance among NLP researchers worldwide, and transfer learning can now be applied to many NLP tasks. BERT employs a transformer, an attention-based mechanism that comprehends the contextual inference between two words in a corpus. In its simplest form, this architecture has an encoder and a decoder: the purpose of the encoder is to comprehend the input text, while the decoder's purpose is to deliver a prediction.

Since 2018, there has been a rapid rise in the design and development of pre-trained language models like GPT, T5, RoBERTa, StructBERT, and DeBERTa [22]. In this work, we emphasize the different versions of DeBERTa and their performance on the U.S. Patent Phrase to Phrase Matching dataset. DeBERTa is a decoding-enhanced BERT with disentangled attention, which introduces two novel techniques: disentangled attention and enhanced masked decoding. The concept of disentangled attention is that each word or token in the input layer is represented by two vectors corresponding to its content and its position in the corpus; this reflects the fact that a word's position also carries significant importance for context extraction. However, though disentangled attention conveys the relative positions of words, the exact positions of words in a corpus must also be determined to avoid semantic disparity. To achieve this, DeBERTa integrates absolute positional word embeddings prior to its softmax layer. Owing to this architecture, DeBERTa is considered significantly superior to counterparts like RoBERTa [23].

The input to this pipeline is the anchor phrase, followed by a separator token and the context phrase. DeBERTa uses a metric known as the cross-attention score to infer the semantic similarity between two blocks of text. Mathematically, the cross-attention score of a block m with respect to another block n can be represented as shown in Eq. 1, where C_m represents the content of word m and Pos_{m|n} represents the position of word m with respect to n. The cross-attention score between blocks m and n thus decomposes into four components: content-to-content, content-to-position, position-to-content, and position-to-position, as shown in Eq. 1:

S_{m,n} = [C_m, Pos_{m|n}] × [C_n, Pos_{n|m}]^T
        = C_m C_n^T + C_m Pos_{n|m}^T + Pos_{m|n} C_n^T + Pos_{m|n} Pos_{n|m}^T    (1)

However, there have been recent improvements in the composition of DeBERTa owing to ELECTRA-style pre-training [24]. This version, known as DeBERTa-V3, has many variants, including DeBERTa-V3-Base, DeBERTa-V3-Small, DeBERTa-V3-XSmall, and mDeBERTa-V3-Base. The initial version of DeBERTa used a masked language modeling (MLM) objective, which is now replaced by replaced token detection (RTD), considered a more sample-efficient pre-training task. The variants of DeBERTa differ in their backbone parameters, vocabulary, hidden size, and number of layers; the architectural specifications of all the variants are given in Table III. Once the data is fed into the model, we perform stratified K-fold cross-validation.

TABLE III: Specifications of the different versions of DeBERTa

Model                 Vocabulary (K)   Backbone Parameters (M)   Hidden Size   Layers
DeBERTa-V3-Base       128              86                        768           12
DeBERTa-V3-Small      128              44                        768           6
DeBERTa-V3-XSmall     128              22                        384           12
mDeBERTa-V3-Base      250              86                        768           12

C. Stratified K-Fold Cross Validation

It is essential to evaluate the model once it has been trained on the input data. A methodological error is introduced if the model simply retains the parameters fitted to the training samples and is then evaluated on the same data: the prediction scores would be perfect on the known labels, yet the model's performance on unseen data would remain unsatisfactory. This condition is known as overfitting. To prevent overfitting, it is essential to hold out a chunk of the data as a test/validation set. However, there is still a risk of overfitting the test/validation set by tweaking parameters until the estimator performs well on it. To address this, we perform stratified K-fold cross-validation [25]. The data is split into K folds; training is performed on K−1 folds while testing is performed on the remaining fold, yielding a more reliable estimate of the model's performance. Mathematically, the cross-validation estimate CV can be represented as in Eq. 2:

CV = (1/N) Σ_{i=1}^{N} L(y_i, f^{K_i}(x_i))    (2)

where y_i denotes the actual score, f^{K_i}(x_i) denotes the prediction on the K_i-th fold, and L is the loss function. Subsequently, we evaluate the model using the Pearson correlation coefficient.
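As a quick numerical check of Eq. 1, the sketch below (illustrative only; toy dimensions of our choosing, with random vectors standing in for DeBERTa's learned content and relative-position embeddings) reads the bracketed product as attention over the combined content + position representations and verifies that it expands into exactly the four components listed above.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy embedding dimension, for illustration only

# Stand-ins for content vectors and relative-position vectors
C_m, C_n = rng.normal(size=d), rng.normal(size=d)
Pos_m_n, Pos_n_m = rng.normal(size=d), rng.normal(size=d)

# Combined score: attention between [content + position] representations
combined = (C_m + Pos_m_n) @ (C_n + Pos_n_m)

# Disentangled expansion: the four components of Eq. 1
expanded = (C_m @ C_n             # content-to-content
            + C_m @ Pos_n_m       # content-to-position
            + Pos_m_n @ C_n       # position-to-content
            + Pos_m_n @ Pos_n_m)  # position-to-position

assert np.isclose(combined, expanded)
print(f"combined = {combined:.4f}, expanded = {expanded:.4f}")
```

In DeBERTa itself the components are computed with separate projection matrices rather than this naive sum, but the sketch shows why the four-way decomposition in Eq. 1 holds.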
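A minimal sketch of the stratified K-fold loop with per-fold Pearson evaluation follows. It uses scikit-learn and SciPy; the ridge regressor over hypothetical precomputed phrase-pair features is a stand-in for our fine-tuned DeBERTa models, and binning the continuous scores (the dataset's labels fall in {0, 0.25, 0.5, 0.75, 1}) is our assumption for how stratification is applied to a regression target.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(42)
X = rng.normal(size=(400, 16))  # hypothetical phrase-pair features
y = np.clip(0.3 * X[:, 0] + 0.5 + rng.normal(scale=0.1, size=400), 0.0, 1.0)

# Stratify on binned similarity scores so each fold sees every score level
bins = np.digitize(y, [0.125, 0.375, 0.625, 0.875])

fold_scores = []
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, bins), start=1):
    model = Ridge().fit(X[train_idx], y[train_idx])  # train on K-1 folds
    preds = model.predict(X[val_idx])                # predict on held-out fold
    r, _ = pearsonr(y[val_idx], preds)               # per-fold Pearson correlation
    fold_scores.append(r)
    print(f"Fold {fold}: Pearson r = {r:.3f}")

print(f"Average Pearson r = {np.mean(fold_scores):.3f}")
```

The same loop structure carries over directly when the stand-in regressor is replaced by fine-tuning a DeBERTa variant on each training split.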
Fig. 2: Distribution of terms in (a) anchor phrases, (b) target phrases, (c) context tags; (d) distribution of scores.
Fig. 3: (a) Distribution of anchors with respect to targets, (b) distribution of anchor count with respect to context, (c) distribution of target count with respect to context.
Fig. 5: Training and validation losses of DeBERTa-Small in (a) Fold 1, (b) Fold 2, and (c) Fold 3.
Fig. 6: Variation of Pearson scores of DeBERTa-Small in (a) Fold 1, (b) Fold 2, and (c) Fold 3.
We define the validation loss as the model's performance on the validation/test set. The final metric, the Pearson correlation coefficient, gives us a linear measure of the strength of association between two variables. As depicted in Table IV, DeBERTa-base achieved an average training loss of 0.03, a validation loss of 0.026, and a Pearson correlation score of 0.74. In the subsequent sections, we analyze the performance of DeBERTa-V3-Small, DeBERTa-V3-XSmall, and mDeBERTa-V3-Base.

D. Performance of mDeBERTa

In this section, we examine the performance of multilingual DeBERTa on the training and validation sets. The number of epochs is 5, with a batch size of 128. Despite stratified K-fold cross-validation with the number of folds set to 4, the model showed inferior similarity prediction with a very low Pearson coefficient, as reported in Table V.

TABLE V: Performance metrics of mDeBERTa-V3-Base

Fold   Training Loss   Validation Loss   Pearson Correlation
1      0.273200        0.276361          0.116614
2      0.148500        0.141006          0.154153
3      0.147500        0.140739          0.193211
4      0.150100        0.136254          0.175404

E. Performance of DeBERTa-Small

DeBERTa-Small is an abridged version of DeBERTa-base that retains the critical parameters required for prediction. It is trained on the same 160 GB of data as its predecessor, with 44M backbone parameters, a hidden size of 768, and 6 layers. The model achieved a cumulative Pearson score of 0.78 after three cross-validation folds. The performance metrics of DeBERTa-Small are depicted in Table VI.
From Table VI, we can infer that the best performance is illustrated by DeBERTa-Small. We also illustrate the training and validation losses of each fold in Fig. 5 and the variation of the Pearson score in Fig. 6.

TABLE VI: Performance metrics of DeBERTa-V3-Small

Fold   Training Loss   Validation Loss   Pearson Correlation
1      0.003400        0.026166          0.799629
2      0.003300        0.027797          0.782329
3      0.003500        0.025930          0.803020

F. Performance of DeBERTa-XSmall

DeBERTa-XSmall is a simplified version of DeBERTa-Small with only 22M backbone parameters, half the number of its counterpart. This model achieved a cumulative Pearson score of 0.765 after four cross-validation folds. Its fewer backbone parameters and smaller hidden size explain its lower performance relative to DeBERTa-Small. The performance metrics of DeBERTa-XSmall are depicted in Table VII.

TABLE VII: Performance metrics of DeBERTa-V3-XSmall

Fold   Training Loss   Validation Loss   Pearson Correlation
1      0.039200        0.030078          0.774637
2      0.039200        0.031391          0.765988
3      0.038800        0.029105          0.780142
4      0.038700        0.031934          0.755139

V. CONCLUSION

This paper experimented with traditional and transformer-based approaches for semantic similarity modeling on large corpora. We compared our methodology with existing techniques, and the results demonstrated the improved performance of the model. The proposed methodology also illustrated context extraction and showed its importance in similarity modeling. In future work, the execution time and memory footprint could be optimized, leading to more efficient training, and the architecture of the existing model could be improved, leading to enhanced performance.

REFERENCES

[1] D. Chandrasekaran and V. Mago, "Evolution of semantic similarity—a survey," ACM Computing Surveys (CSUR), vol. 54, no. 2, pp. 1–37, 2021.
[2] A. K. Singh and M. Shashi, "Vectorization of text documents for identifying unifiable news articles," Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 7, 2019.
[3] Y. Zhang, R. Jin, and Z.-H. Zhou, "Understanding bag-of-words model: a statistical framework," International Journal of Machine Learning and Cybernetics, vol. 1, no. 1, pp. 43–52, 2010.
[4] S. Qaiser and R. Ali, "Text mining: use of tf-idf to examine the relevance of words to documents," International Journal of Computer Applications, vol. 181, no. 1, pp. 25–29, 2018.
[5] H. Rubenstein and J. B. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.
[6] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.
[7] L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin, "Placing search in context: The concept revisited," in Proceedings of the 10th International Conference on World Wide Web, 2001, pp. 406–414.
[8] E. Agirre, C. Banea, C. Cardie, D. Cer, M. Diab, A. Gonzalez-Agirre, W. Guo, I. Lopez-Gazpio, M. Maritxalar, R. Mihalcea et al., "SemEval-2015 task 2: Semantic textual similarity, English, Spanish and pilot on interpretability," in Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), 2015, pp. 252–263.
[9] M. Marelli, S. Menini, M. Baroni, L. Bentivogli, R. Bernardi, and R. Zamparelli, "A SICK cure for the evaluation of compositional distributional semantic models," in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 2014, pp. 216–223.
[10] Y. Benajiba, J. Sun, Y. Zhang, L. Jiang, Z. Weng, and O. Biran, "Siamese networks for semantic pattern similarity," in 2019 IEEE 13th International Conference on Semantic Computing (ICSC), IEEE, 2019, pp. 191–194.
[11] Z. Li, H. Lin, W. Zheng, M. M. Tadesse, Z. Yang, and J. Wang, "Interactive self-attentive siamese network for biomedical sentence similarity," IEEE Access, vol. 8, pp. 84093–84104, 2020.
[12] E. L. Pontes, S. Huet, A. C. Linhares, and J. Torres-Moreno, "Predicting the semantic textual similarity with siamese CNN and LSTM," CoRR, vol. abs/1810.10641, 2018.
[13] Z. Quan, Z.-J. Wang, Y. Le, B. Yao, K. Li, and J. Yin, "An efficient framework for sentence similarity modeling," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 4, pp. 853–865, 2019.
[14] T. Shancheng, B. Yunyue, and M. Fuyu, "A semantic text similarity model for double short chinese sequences," in 2018 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), 2018, pp. 736–739.
[15] T. Yang, S. Wu, J. Feng, N. Fu, and M. Tian, "Semantic network based approach to compute term semantic similarity," in 2019 3rd International Conference on Electronic Information Technology and Computer Engineering (EITCE), 2019, pp. 654–658.
[16] P. R. Medi, P. Nemani, V. R. Pitta, V. Udutalapally, D. Das, and S. P. Mohanty, "SkinAid: A GAN-based automatic skin lesion monitoring method for IoMT frameworks," in 2021 19th OITS International Conference on Information Technology (OCIT), 2021, pp. 200–205.
[17] Z. Liang and S. Zhang, "Generating and measuring similar sentences using long short-term memory and generative adversarial networks," IEEE Access, vol. 9, pp. 112637–112654, 2021.
[18] Z. Li, H. Lin, C. Shen, W. Zheng, Z. Yang, and J. Wang, "Cross2self-attentive bidirectional recurrent neural network with BERT for biomedical semantic text similarity," in 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2020, pp. 1051–1054.
[19] M. M. Sanjeev, B. Ramalingam, and S. Kumar T.K., "Realtime semantic similarity analysis of bulk outlook emails using BERT," in 2020 International Conference on Advances in Computing, Communication & Materials (ICACCM), 2020, pp. 89–94.
[20] R. Haldar and D. Mukhopadhyay, "Levenshtein distance technique in dictionary lookup methods: An improved approach," arXiv preprint arXiv:1101.1232, 2011.
[21] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[22] P. He, X. Liu, J. Gao, and W. Chen, "DeBERTa: Decoding-enhanced BERT with disentangled attention," in International Conference on Learning Representations, 2021.
[23] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
[24] P. He, J. Gao, and W. Chen, "DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing," arXiv preprint arXiv:2111.09543, 2021.
[25] T.-T. Wong and N.-Y. Yang, "Dependency analysis of accuracy estimates in k-fold cross validation," IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2417–2427, 2017.