
Progress in Artificial Intelligence
ISSN 2192-6352
https://doi.org/10.1007/s13748-019-00180-4

REGULAR PAPER

Semantic textual similarity between sentences using bilingual word semantics

Md. Shajalal¹ · Masaki Aono²

¹ Department of Computer Science and Mathematics, Bangladesh Agricultural University, 2202 Mymensingh, Bangladesh
² Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan
Correspondence: Md. Shajalal, [email protected]; Masaki Aono, [email protected]

Received: 4 December 2018 / Accepted: 2 March 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract
Semantic textual similarity between sentences is indispensable for many information retrieval tasks. Traditional lexical similarity measures cannot compute the similarity beyond a trivial level. Moreover, they can only capture textual similarity, not semantic similarity. In this paper, we propose a method for semantic textual similarity that leverages bilingual word-level semantics to compute the semantic similarity between sentences. To capture word-level semantics, we employ distributed representations of words in two different languages. A similarity function based on the concept-to-concept relationship corresponding to the words is also utilized for the same purpose. Multiple new semantic similarity measures are introduced based on word-embedding models trained on two different corpora in two different languages. Apart from these, another new semantic similarity measure is introduced using word sense comparison. The similarity score between the sentences is then computed by applying a linear ranking approach to all proposed measures, with their importance scores estimated by a supervised feature selection technique. We conducted experiments on the SemEval Semantic Textual Similarity (STS-2017) test collections. The experimental results demonstrate that our method is effective for measuring semantic textual similarity and outperforms some known related methods.

Keywords Semantic similarity · Word semantics · Word-embedding · Textual similarity · Bilingual semantics

1 Introduction

Semantic textual similarity between sentences is beneficial and mandatory for many information retrieval (IR) tasks. The vector space model in IR is the earliest application of textual similarity. The model retrieves the most relevant documents for a given query from a certain collection using the text similarity between the query and the documents. Textual similarity is used in some other applications such as web search, subtopic mining, word sense disambiguation (WSD), relevance feedback, text classification and so on [20,23,26]. There are also some natural language processing (NLP) applications where text similarity is employed widely, including text summarization, machine translation, paraphrase detection, sentiment analysis, etc. [3].

The most typical technique for computing the textual similarity between two text segments is lexical matching, which produces the similarity score by considering the number of lexical items (words/phrases) that exist in both input segments [17]. There are some other techniques to improve the similarity measure by canonicalizing the text using stemming, stopword removal, part-of-speech (POS) tagging, longest common subsequence matching, etc. [24]. But these types of similarity measures are not able to capture the similarity beyond a trivial level. Furthermore, these similarity measures can only capture textual similarity and cannot always identify semantic similarity. For example, consider two sentences "I own a cat" and "I have a pet." There is an obvious semantic similarity between these two sentences, but the conventional textual similarity measures are not able to capture any kind of semantic connection between the given sentences. On the contrary, consider two sentences "How are you?" and "How old are you?": although there is no semantic similarity between these two sentences, lexical matching will decide that they are 75% similar in terms of word overlap.

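For illustration, this word-overlap figure can be reproduced with a minimal sketch; the tokenization and the exact overlap formula (shared words over the size of the longer sentence) are our assumptions, not the baseline implementation evaluated later.

```python
# Illustrative word-overlap similarity; tokenization and the overlap formula
# (shared words divided by the size of the longer sentence) are assumptions.
def word_overlap(s1: str, s2: str) -> float:
    t1 = set(s1.lower().rstrip("?.!").split())
    t2 = set(s2.lower().rstrip("?.!").split())
    return len(t1 & t2) / max(len(t1), len(t2))

print(word_overlap("How are you?", "How old are you?"))  # 0.75
```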

Some example sentence pairs from STS-2017 [9] with their corresponding semantic similarity scores ranging over [0,5] are given in Table 1. The larger the score, the more similar the sentences are.

Table 1 The similarity scores for different sentence pairs ranging [0,5] [9]

The two sentences are completely equivalent, as they mean the same thing (score 5.00)
  Sentence 1: The bird is bathing in the sink
  Sentence 2: Birdie is washing itself in the water basin
The two sentences are mostly equivalent, but some unimportant details differ (score 4.00)
  Sentence 1: In May 2010, the troops attempted to invade Kabul
  Sentence 2: The US army invaded Kabul on May 7, 2010
The two sentences are roughly equivalent, but some important information differs/is missing (score 3.00)
  Sentence 1: John said he is considered a witness but not a suspect
  Sentence 2: "He is not a suspect anymore," John said
The two sentences are not equivalent, but share some details (score 2.00)
  Sentence 1: They flew out of the nest in groups
  Sentence 2: They flew into the nest together
The two sentences are not equivalent, but are on the same topic (score 1.00)
  Sentence 1: The woman is playing the violin
  Sentence 2: The young lady enjoys listening to the guitar
The two sentences are completely dissimilar (score 0.00)
  Sentence 1: John went horse back riding at dawn with a whole group of friends
  Sentence 2: Sunrise at dawn is a magnificent view to take in if you wake up early enough for it

Recently, semantic textual similarity (STS) has gained much attention in the research community [1,2,9,24,25,28]. SemEval (Semantic Evaluation, https://en.wikipedia.org/wiki/SemEval) has organized several multilingual and cross-language tasks on STS in the recent past [1,2,9]. Researchers have proposed different methods to compute the similarity based on different techniques and resources [1,2,9,24,25,28].

This paper presents an effective method for measuring semantic textual similarity which uses bilingual word-level semantics to estimate sentence-level semantic similarity. To capture word-level semantics, we utilize vector representations of words (word-embedding) and WordNet. Multiple new semantic similarity measures are proposed based on the word-embedding models. We leverage multiple pretrained embedding models which are trained on three different corpora in two different languages. In these similarity measures, we exploit the word-level similarity between words from the two corresponding sentences only when the words belong to the same class (POS tag). Apart from these, we also introduce a new similarity measure by utilizing the word sense from WordNet. To estimate the importance of each measure, a supervised linear regression model is used. Finally, a linear weighted ranking is applied to all measures with their importance scores. The experimental results on the SemEval STS 2017 test collections demonstrate that our proposed method is effective and outperforms some known related works.

The rest of the paper is structured as follows: Section 2 summarizes the related work on semantic textual similarity. In Sect. 3, we present our proposed method to estimate semantic similarity by addressing the challenges. We present the experiments and evaluation results to show the effectiveness of our proposed method in Sect. 4. Some concluding remarks and future directions are described in Sect. 5.

2 Related work

SemEval has organized several multilingual and cross-language tasks on STS in the recent past [1,2,9]. The participating methods in the SemEval STS tasks estimate the similarity by introducing numerous features employing multiple resources [4,25]. They identified and applied some rules to handle challenges like currency values, negation, compounds, number overlap and literal matching [1,2,9]. To extract features, they leveraged multiple knowledge bases such as WordNet and Wikipedia. They also used some tools to amplify the performance, including a named entity recognizer (NER), dependency parser, stemmer, lemmatizer, part-of-speech (POS) tagger, stopword list, etc. Mihalcea et al. [24] suggested different types of corpus-based and knowledge-based similarity measures. Their method utilized the word-level similarity from the corresponding texts. Hassanzadeh et al. [15] introduced multiple syntactic, semantic and structural features based on the content information of the text segments and external resources to capture the similarity. Kozareva et al. [19] introduced an answer validation system by adopting a machine-learning-based textual entailment or similarity system using multilingual semantics; the covered languages include English, Dutch, French, German, Spanish, Italian and Portuguese.

Fig. 1 Semantic textual similarity estimation framework. [Figure: pipeline from an input sentence pair through (i) preprocessing, (ii) similarity measures (WordNet-based, word-embedding-based and traditional similarity) and (iii) similarity score estimation (ElasticNet regularization and weighted linear ranking), using Wikipedia (English & Bengali) and the Google News Corpus as resources.]

The semantic information from some external structured knowledge bases such as Wikipedia and WordNet has been employed to estimate the similarity. In some prior works [11,12,22], the proposed methods identified the semantic meaning of words from WordNet and applied that semantic information to compute the similarity between texts. Researchers have also proposed corpus-based methods combined with WordNet-based measures [21,24]. In [24], they introduced an approach where the individual weight of each word, estimated using a large corpus, is exploited together with the similarity score derived from WordNet. They applied the similarities between words of the same class in the two sentences. The average of the maximum similarities is then used as the final similarity score. Li et al. [21] combined the word order score with a WordNet-based measure to calculate the sentence similarity. Recently, researchers have tried word-embedding-based techniques, which are also used for semantic similarity [14,18].

3 Our approach

This section presents our proposed method to compute the semantic textual similarity between two sentences. The high-level building blocks of our method are depicted in Fig. 1; it can be divided into three phases, namely (1) preprocessing, (2) similarity measures and (3) similarity score estimation.

In the preprocessing phase, each sentence is canonicalized into a set of words after filtering the stopwords out. Lemmatization is also applied to each word to convert it into its base form. Then, we introduce multiple new semantic similarity measures and discuss how they estimate the similarity of a sentence pair in the similarity measure phase. Wikipedia (both Bengali and English) and the Google News Corpus are used as the resources in this phase. We also investigate the performance of classical traditional similarity measures to compare their performance with the newly introduced ones. Finally, the similarity score is computed by leveraging all extracted features with their importance scores estimated using a supervised feature selection technique.

3.1 Preprocessing

Given a sentence pair, at first, the punctuation marks are removed from each sentence. Then, the stopwords (words that have little or no semantic meaning associated) are filtered out from the sentence by using Indri's stopword collection (http://www.lemurproject.org/stopwords/stoplist.dft). We also apply the NLTK WordNet lemmatizer to the words, which converts them into their base forms. For example, consider an example sentence "A girl is on a sandy beach". The outcome of the preprocessing phase is the set of words S = {girl, sandy, beach}.

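A minimal sketch of this preprocessing step, assuming NLTK is available; the small stopword set below is only a stand-in for the Indri stoplist file referenced above.

```python
import string
from nltk.stem import WordNetLemmatizer  # requires the NLTK 'wordnet' corpus

# Small stand-in for the Indri stoplist (the real list is read from stoplist.dft).
STOPWORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "and", "to"}

def preprocess(sentence: str) -> set:
    """Remove punctuation, filter stopwords and lemmatize the remaining words."""
    lemmatizer = WordNetLemmatizer()
    cleaned = sentence.translate(str.maketrans("", "", string.punctuation)).lower()
    return {lemmatizer.lemmatize(tok) for tok in cleaned.split() if tok not in STOPWORDS}

print(preprocess("A girl is on a sandy beach."))  # {'girl', 'sandy', 'beach'}
```
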
3.2 Similarity measures

Let S1 and S2 be the two sets of words for two corresponding sentences. We propose multiple new similarity measures, with the help of WordNet and word-embedding, that exploit the words' semantics to estimate the sentence-level similarity. In this regard, we use word-level semantics from WordNet senses. We also employ multiple pretrained word-embedding models in two different languages to estimate the similarity of a sentence pair using the vector representation of words. Here, we try to capture word-level semantics from the distributed representation of words in multiple text corpora. The details of each similarity measure are described below.

3.2.1 Similarity based on WordNet

We utilize word-level semantic information from WordNet to compute sentence-level semantic similarity. The similarities in WordNet are defined between concepts, rather than words. We leverage those concepts to estimate word-to-word similarity for two words corresponding to two sentences.

Similarity based on WordNet (WN_sim): Our proposed similarity measure based on WordNet is defined as follows:

$$WN_{sim}(S_1, S_2) = \frac{\sum_{w \in S_1} \max_{v \in S_2} \bigl(sim_{lch}(w, v)\bigr) \cdot weight(w)}{\sum_{w \in S_1} weight(w)} \qquad (1)$$

Table 2 Basic notation used in Algorithm 1

  Symbol                    Description
  S_terms[]                 List of words after processing sentence S
  AVS                       N-dimensional average feature vector for sentence S
  tc_S                      Total number of words of S contained in the vocabulary of w2v_model
  w2v_model                 Trained word-embedding model
  vocab(w2v_model)          Vocabulary of w2v_model
  add(t, AVS, w2v_model)    Adds the N-dimensional vector of term t to AVS
  divide(AVS, tc_S)         Divides each value of AVS by tc_S
  μ                         Mean of the elements of the average vector AVS
  σ                         Standard deviation of the elements of vector AVS

where $\max_{v \in S_2}(sim_{lch}(w, v))$ returns the maximum similarity score of word w in sentence S1 with any of the words v in sentence S2. The function weight(w) denotes the IDF of w. The similarity function sim_lch(w, v) is the Leacock & Chodorow similarity, defined as follows:

$$sim_{lch}(w, v) = -\log \frac{length(w, v)}{2 \cdot D}$$

where length(w, v) is the shortest distance from concept w to concept v, and the maximum depth of the taxonomy is denoted by D.

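A sketch of WN_sim on top of NLTK's WordNet interface; taking only the first noun synset of each word and falling back to a default IDF weight are simplifying assumptions on our part.

```python
from nltk.corpus import wordnet as wn  # requires the NLTK 'wordnet' corpus

def lch(w, v, pos=wn.NOUN):
    """Leacock-Chodorow similarity between the first synsets of w and v (same POS)."""
    sw, sv = wn.synsets(w, pos=pos), wn.synsets(v, pos=pos)
    return sw[0].lch_similarity(sv[0]) if sw and sv else 0.0

def wn_sim(s1_words, s2_words, idf):
    """Eq. (1): IDF-weighted average of each word's best LCH match in the other sentence."""
    num = sum(max(lch(w, v) for v in s2_words) * idf.get(w, 1.0) for w in s1_words)
    den = sum(idf.get(w, 1.0) for w in s1_words)
    return num / den if den else 0.0

idf = {"cat": 2.1, "pet": 1.8}  # hypothetical IDF weights (the paper estimates IDF from ClueWeb09)
print(wn_sim({"cat"}, {"pet"}, idf))
```
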
3.2.2 Similarity based on word-embedding

Word-embedding represents each word as a vector in a high-dimensional space. The embedding space can be used to extract the semantic information of words. Therefore, pretrained word-embedding models are used in this research to introduce two new similarity measures that capture word-level semantics.

Average Pairwise Similarity (APS_sim): In our first word-embedding-based similarity measure, the similarity is estimated between two words within the same class. The words are classified considering their part-of-speech (POS) tags. If two words have the same POS tag, they are considered to be in the same class. In other words, the similarity is estimated if and only if two words from the two sentences have the same POS tag. The average pairwise similarity APS_sim is defined as follows:

$$APS_{sim}(S_1, S_2) = \frac{\sum_{w \in S_1} \max_{v \in S_2} \bigl(sim(w, tag_w, v, wv_{model})\bigr)}{|S_1|} \qquad (2)$$

where $\max_{v \in S_2}(sim(w, tag_w, v, wv_{model}))$ returns the maximum similarity, as described for Eq. 1. The function sim(w, tag_w, v, wv_model) denotes the similarity based on the word-embedding model wv_model between w and v, which must share the same POS tag; tag_w denotes the POS of w, which must be the same as the POS of v.

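A sketch of APS_sim (Eq. 2) using gensim and the NLTK POS tagger (the tools listed in Sect. 4); tagging words in isolation and the model path are our simplifying assumptions.

```python
import nltk  # requires the 'averaged_perceptron_tagger' resource
from gensim.models import KeyedVectors

# Hypothetical path; any pretrained word2vec model in this format works.
wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def aps_sim(s1_words, s2_words):
    """Eq. (2): average of each word's best same-POS cosine similarity in the other sentence."""
    tags1, tags2 = dict(nltk.pos_tag(list(s1_words))), dict(nltk.pos_tag(list(s2_words)))
    scores = []
    for w in s1_words:
        candidates = [wv.similarity(w, v) for v in s2_words
                      if tags1[w] == tags2[v] and w in wv and v in wv]
        scores.append(max(candidates) if candidates else 0.0)
    return sum(scores) / len(s1_words) if s1_words else 0.0

print(aps_sim({"girl", "beach"}, {"boy", "shore"}))
```
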
tor for each term t belongs to S_ter ms[ ]. The vectors are
where Max(simv∈S2 (w, tagw , v, wvmodel )) returns the max- computed only for words those belong to the vocabulary of
imum similarity as described in Eq. 1. The function sim the Word2Vec model, vocab(w2v_model). The feature vec-
v∈S2
(w, tagw , v, wvmodel ) denotes the similarity based on word- tors for each word belongs to a particular sentence are then
embedding model wvmodel between w and v which share the added.

123
Author's personal copy
Progress in Artificial Intelligence

This addition is done in the loop that starts in step 4 and ends in step 8. The vector after the addition is stored in AVS for the corresponding sentence. Each value in AVS is then divided by the total number of words tc_S for the respective sentence S; the division is done in step 10. Averaging the vectors may smooth over differences between them, which may or may not be important. Therefore, we normalize each of the elements of vector AVS to a common scale; the normalization is done in step 11. The following Z-score normalization technique is applied for this purpose:

$$x' = \frac{x - \mu}{\sigma}$$

where the normalized value and the original value are indicated by x' and x, respectively, and μ and σ denote the mean and standard deviation of the vector's elements, respectively. Then, we apply the cosine similarity to compute the similarity score between S1 and S2.

Let AVS1 and AVS2 be the two normalized average feature vectors calculated by Algorithm 1 for the two corresponding sentences S1 and S2. The average feature vector-based similarity (AFS_sim) measure is defined as follows:

$$AFS_{sim}(S1, S2) = \frac{AVS1 \cdot AVS2}{\lVert AVS1 \rVert \cdot \lVert AVS2 \rVert} = \frac{\sum_{i=0}^{N} AVS1_i \cdot AVS2_i}{\sqrt{\sum_{i=0}^{N} AVS1_i^2}\,\sqrt{\sum_{i=0}^{N} AVS2_i^2}} \qquad (3)$$

where AVS1_i and AVS2_i denote the i-th feature value of vectors AVS1 and AVS2, respectively, and the dimension of the vector AVS is denoted by N. To investigate the performance of traditional textual similarity functions, we also employ some similarities including edit-distance-based lexical similarity, Jaccard similarity, etc.

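A sketch of Algorithm 1 and Eq. (3) with NumPy and gensim; the per-vector μ and σ follow step 11, while the model path is a placeholder assumption.

```python
import numpy as np
from gensim.models import KeyedVectors

# Hypothetical model path, as in the earlier sketch.
wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def avsc(terms):
    """Algorithm 1: average the in-vocabulary word vectors, then z-normalize."""
    vecs = [wv[t] for t in terms if t in wv]
    if not vecs:
        return np.zeros(wv.vector_size)
    avs = np.mean(vecs, axis=0)            # steps 4-10: sum and divide by tc_S
    return (avs - avs.mean()) / avs.std()  # step 11: x' = (x - mu) / sigma

def afs_sim(terms1, terms2):
    """Eq. (3): cosine similarity of the two normalized average vectors."""
    a, b = avsc(terms1), avsc(terms2)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(afs_sim({"girl", "sandy", "beach"}, {"boy", "rocky", "shore"}))
```
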
3.3 Similarity score estimation

The final similarity score is computed using our proposed semantic similarity measures (described in the previous section) and some common traditional similarity measures. The similarity scores from these measures vary widely. Moreover, the different individual measures contribute at different levels when computing the sentence similarity. That is why we estimate the measures' importance using a supervised feature selection technique. Here, we treat the different similarity measures as the feature set. We divide this step into two phases: (i) importance estimation and (ii) linear ranking. The remainder of this section presents the details of these two phases.

3.3.1 Importance estimation

The ElasticNet regularization model [29] is a supervised feature selection and importance estimation technique applied to compute the importance of every similarity measure. The extracted features may contain some noisy and redundant features. Those features may not contribute to the accuracy of the predictive model or may even decrease the accuracy of the model. We employ a supervised feature selection technique, ElasticNet, to estimate the importance of each measure.

The ElasticNet is a regularized regression method that linearly combines the l1 and l2 penalties of the lasso and ridge methods. The elastic net [13,29] can be defined as the combination of the Lasso [27] and ridge regression [16] as follows:

$$\min_{\beta_0, \beta \in \mathbb{R}^{p+1}} \left[ \frac{1}{2n} \sum_{i=1}^{n} (y_i - \beta_0 - x_i^{\top}\beta)^2 + \lambda P_\alpha(\beta) \right] \qquad (4)$$

where

$$P_\alpha(\beta) = (1-\alpha)\,\frac{1}{2}\lVert \beta \rVert_{\ell_2}^2 + \alpha \lVert \beta \rVert_{\ell_1} = \sum_{j=1}^{p} \left[ \frac{1}{2}(1-\alpha)\beta_j^2 + \alpha \lvert \beta_j \rvert \right]$$

where y_i denotes the response of observation i and x_i represents the data. n, p and β are the sample size, the dimension of the feature space and the parameters of the linear regression, respectively. For α = 0 and α = 1, the elastic net reduces to ridge regression and the Lasso, respectively. Due to the smoothness of the l2 norm, ridge regression always keeps all the explanatory variables in the model. On the other hand, the Lasso provides a compact representation of the feature space because of the sharp edges of the l1 constraint [13].

However, the Lasso has some limitations compared to the elastic net. One of them is that the number of features selected by the Lasso may not exceed the sample size when p > n. Another limitation is that when a group of features is highly correlated, the Lasso does not make a good selection. We, therefore, make use of the elastic net to alleviate these limitations.

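The paper fits this model with the R package glmnet (Sect. 4.3); an equivalent scikit-learn sketch is given below, with random data standing in for the real training features. The feature layout and parameter grid are assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# X: one row per training sentence pair, one column per similarity measure
# (WN_sim, APS/AFS variants, traditional measures); y: gold similarity scores.
# Random data stands in for the real STS training features.
rng = np.random.default_rng(0)
X, y = rng.random((200, 7)), rng.random(200) * 5

# Fivefold CV over the l1/l2 mixing parameter, mirroring the glmnet setup
# (glmnet's alpha corresponds to l1_ratio here).
model = ElasticNetCV(l1_ratio=[0.1, 0.3, 0.5, 0.7, 0.9], cv=5).fit(X, y)
print(model.coef_)  # per-measure importance scores w_i used in Eq. (5)
```
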
3.3.2 Linear ranking

Finally, we employ a linear ranking approach using all similarity measures as well as their weights to estimate the final similarity score between S1 and S2 as follows:

$$sim(S1, S2) = \frac{\sum_{i=1}^{T} w_i \cdot SM_i(S1, S2)}{\sum_{i=1}^{T} w_i} \qquad (5)$$

where SM_i(S1, S2) denotes the i-th similarity measure and w_i is the importance score of the i-th corresponding measure estimated by Eq. 4.

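A small sketch of Eq. (5); the example scores and weights are hypothetical.

```python
def linear_ranking(measure_scores, weights):
    """Eq. (5): importance-weighted average of the individual similarity scores."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, measure_scores)) / total if total else 0.0

scores  = [0.82, 0.74, 0.69]   # e.g. WN_sim, APS_sim, AFS_sim (hypothetical values)
weights = [0.19, 0.12, 0.10]   # importance scores from the ElasticNet fit
print(linear_ranking(scores, weights))
```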

Table 3 Overview of different word-embedding models

  Model        Dimension   Vocabulary size   Corpus               Language
  W2V_GN       300         3,000,000         Google News Corpus   English
  W2V_Wiki_E   200         71,291            Wikipedia            English
  W2V_Wiki_B   200         77,076            Wikipedia            Bengali

4 Experiments and evaluation

We carried out experiments with a wide range of experimental settings on the STS-2017 dataset and validated the performance of our method with a standard evaluation metric used in the SemEval STS task. The remainder of this section presents the details of the dataset collection, the evaluation metric, the experimental setup and a comparative discussion of the performance of our method against some known related works.

4.1 Dataset collection

To test the performance of our proposed method, we carried out experiments on a benchmark dataset for semantic textual similarity. The SemEval Semantic Textual Similarity 2017 [9] (STS2017, http://alt.qcri.org/semeval2017/task1/) task provided a dataset of 250 pairs of sentences. The STS2017 organizers provided the similarity score per sentence pair, calculated from human assessors' judgments. We employed their provided gold-standard judgment as the ground truth in this research. The human assessors assigned the similarity scores using the following similarity labels ranging over [0, 5]:

• Label 0: On different topics
• Label 1: Not similar but share few common details
• Label 2: Not similar but share some common details
• Label 3: Roughly similar
• Label 4: Similar
• Label 5: Completely similar

The distribution of the similarity labels after the annotation is reported in [9]. The human assessors were instructed to assign the labels as follows [9]:

1. Assign labels as precisely as possible according to the underlying meaning of the two sentences rather than their superficial similarities or differences.
2. Be careful of wording differences that have an impact on what is being said or described.
3. Ignore grammatical errors and awkward wording as long as they do not obscure what is being conveyed.
4. Avoid overlabeling pairs with middle range scores.
5. Be careful of overreliance on an extreme score like 0 or 5.

In total, we exploited three pretrained word-embedding models; a summary of them is presented in Table 3. Two of them are trained on English and Bengali Wikipedia, and the other one is trained on the Google News Corpus. We used a Python Google Translator package to translate the sentences in order to capture semantics using the Bengali word-embedding model. The Bengali word-embedding model and the translated sentence pairs are used by Eqs. 2 and 3. The NLTK POS tagger and the NLTK WordNet lemmatizer have also been used to identify the POS of each word and to stem the word, respectively. The IDF score in Eq. 1 for each word is estimated using the ClueWeb09 [8] document corpus, which comprises 50 million web documents.

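A sketch of how these resources might be loaded with gensim; the file names are placeholders, and the commented translation call shows one possible choice of Google Translator package (the exact library is not specified here).

```python
from gensim.models import KeyedVectors

# Hypothetical file names for the three pretrained models in Table 3.
w2v_gn     = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
w2v_wiki_e = KeyedVectors.load_word2vec_format("wiki_en_200d.txt")
w2v_wiki_b = KeyedVectors.load_word2vec_format("wiki_bn_200d.txt")

# Translation to Bengali before applying Eqs. (2) and (3) with w2v_wiki_b.
# One option (an assumption, not necessarily the package used in the paper):
# from googletrans import Translator
# bn_sentence = Translator().translate("A girl is on a sandy beach.", dest="bn").text
```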


4.2 Evaluation metric

The performance of our method has been tested based on the Pearson Correlation Coefficient (https://en.wikipedia.org/wiki/Pearson_correlation_coefficient). This evaluation metric has also been used as the official metric to test the performance of a method in SemEval STS2017 [9].

Let X = {x1, x2, x3, ..., xn} and Y = {y1, y2, y3, ..., yn} be the two sets of scores for n pairs of sentences generated by the system and by the human assessors' judgment, respectively. Each element x_i or y_i in set X and Y, respectively, represents the semantic textual similarity of the i-th sentence pair. The Pearson Correlation Coefficient r is defined as follows:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad (6)$$

where n is the number of sentence pairs and x_i and y_i are the similarity scores given by the participant and by the human assessors, respectively, indexed by i. The arithmetic mean of the elements of X is defined by $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and analogously for ȳ.

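Eq. (6) in a few lines of NumPy (a sketch; scipy.stats.pearsonr gives the same value):

```python
import numpy as np

def pearson_r(x, y):
    """Eq. (6): Pearson correlation between system scores x and gold scores y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

system = [4.2, 1.1, 3.4, 0.3]   # hypothetical system scores
gold   = [5.0, 1.0, 3.0, 0.0]   # corresponding assessor judgments
print(pearson_r(system, gold))
```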

4.3 Measures' importance

We applied the ElasticNet [29] regularization technique to compute the importance of each similarity measure. In this regard, an R package named glmnet has been used to compute the importance. We applied fivefold cross-validation with the parameter α ranging from 0.1 to 0.9. Figure 2 reflects the estimated importance of the similarity measures. The figure shows that our introduced similarity measure based on WordNet (Eq. 1) ranked in the first position. Therefore, we can conclude that semantics from WordNet can capture semantic similarity. Among the other measures with all variants, the two measures (Eqs. 2, 3) based on word-embedding trained on the Google News Corpus ranked second and third, which reflects the importance of our proposed measures.

Fig. 2 Estimated importance of the similarity measures. The importance score is represented by the Y-axis and the similarity measures by the X-axis. [Figure: bar chart of the importance (coefficient) of the measures WN, APS_GN, AFS_GN, APS_Wiki_E, AFS_Wiki_E, APS_Wiki_B and AFS_Wiki_B.]

4.4 Experimental setup

To validate the performance of our method, we conducted experiments in multiple experimental settings. At first, we applied our proposed semantic similarity measures based on WordNet and word-embedding separately to observe their performance. Then, we applied the combination of all three proposed measures, in which different embedding models (illustrated in Table 3) were used. Finally, we applied the linear ranking discussed in Sect. 3.3 (Eq. 5) to all measures with their variants in terms of the embedding models. Lexical matching in terms of word overlap is used as the baseline. The summary of all experimental settings is illustrated in Table 4.

4.5 Experimental results

The performance of our proposed method with all experimental settings on the SemEval STS 2017 test collection [9] in terms of Pearson's (r × 100) is summarized in Table 5.

Table 4 Summary of all experimental settings

  SN  Run             Description
  1   WN_sim          Similarity score computed by WN_sim(S1, S2) (Eq. 1)
  2   WE_GN           Word-embedding-based similarity measures APS_sim(S1, S2) and AFS_sim(S1, S2) (Eqs. 2 and 3) with the 300-dimensional (N = 300) word2vec model trained on the Google News Corpus
  3   WE_Wiki_E       APS_sim(S1, S2) and AFS_sim(S1, S2) (Eqs. 2 and 3) with the 200-dimensional word2vec model trained on English Wikipedia
  4   WE_Wiki_B       APS_sim(S1, S2) and AFS_sim(S1, S2) (Eqs. 2 and 3) with the 200-dimensional word2vec model trained on Bengali Wikipedia
  5   WN+WE_GN        Linear ranking of experimental setups 1 and 2
  6   WN+WE_Wiki_E    Linear ranking of experimental setups 1 and 3
  7   WN+WE_Wiki_B    Linear ranking of experimental setups 1 and 4
  8   TF + WN+WE_AV   Linear ranking with all introduced measures, including their variants, and the traditional measures
  9   LM              Lexical matching based on term overlap


Table 5 Performance of our proposed method in terms of Pearson's (r × 100) on the SemEval 2017 semantic textual similarity dataset (STS 2017)

  Method       Run              Pearson's (r × 100)
  Our method   WN_sim           66.63
               WE_GN            65.09
               WE_Wiki_E        63.15
               WE_Wiki_B        57.29
               WN+WE_GN         68.81
               WN+WE_Wiki_E     67.42
               WN+WE_Wiki_B     59.12
               TF + WN+WE_AV    77.13
  Baseline     LM               31.59

  The best result is in bold

The table illustrates that the method applying supervised feature selection to the proposed semantic similarity measures with their variants, TF + WN+WE_AV, achieved the best performance over all other experimental settings. Among all introduced measures including their variants, WN_sim performs better than the other measures based on word-embedding. The performances of the proposed measures using the different word-embedding models (WE_GN, WE_Wiki_E and WE_Wiki_B, corresponding to settings 2, 3 and 4) are also quite competitive compared to WN_sim. Therefore, we can conclude that our proposed measures are able to capture better semantics to estimate the semantic similarity between texts. Moreover, our proposed measures applied to multiple word-embedding models trained on different corpora as well as different languages achieved competitive and consistent performance. These findings and results demonstrate the effectiveness of the proposed measures in computing the semantic similarity of texts. Combining the measures based on WordNet and word-embedding (settings WN+WE_GN, WN+WE_Wiki_E and WN+WE_Wiki_B) further improves the performance and also surpasses the individual contributions. Finally, the results of the linear ranking applied to all measures with their variants, TF + WN+WE_AV, proved the effectiveness of importance estimation using the ElasticNet regularization technique. This setting achieved a new state-of-the-art performance in estimating the semantic similarity between texts.

The sentence-pairwise performance comparison among experimental settings 1, 2, 3, 4 and 8 (w.r.t. Table 4) for 40 randomly selected sentence pairs is depicted in Fig. 3. The figure indicates that the evaluation results of the individual settings vary widely. But in most cases, we can see that our proposed method using ElasticNet importance estimation, TF + WN+WE_AV, achieved better performance compared to the others.

4.6 Comparison with related work

To validate the performance of our proposed method, we compared its performance with some known related methods and the baseline. Table 6 reflects the comparison between our method and some known related methods in terms of Pearson's correlation coefficient r × 100. The table illustrates that our proposed method outperformed the other methods as well as the baseline.

Table 6 Performance comparison among our proposed method and some known related methods in terms of Pearson's (r × 100) on the SemEval 2017 semantic textual similarity dataset (STS 2017)

  Method            Run                      Pearson's (r × 100)
  Our method        TF + WN+WE_AV            77.13
  Related methods   Bonet & Cedeño [10]      72.69
                    Bjerva and Östling [7]   69.06
                    MatrusriIndia [9]        65.79
                    NLPProxem [9]            62.56
                    Borrow and Peskov [5]    61.74
                    Biçici [6]               54.68
  Baseline          LM                       31.59

  The best result is in bold among all

Fig. 3 Pairwise performance of different experimental settings. [Figure: similarity scores (0 to 5) for 40 randomly selected sentence pairs under the settings WN_sim, WE_GN, WE_E, WE_B and TF+WN_WE_AV.]

Bonet and Cedeño [10] proposed a method based on different features, including lexical features, explicit semantic analysis, context vector-based features and embedding-based features. A multilingual word representation is employed by the group of Bonet and Cedeño [10] to capture semantic similarity. On the other side, Borrow and Peskov [5] applied an end-to-end shared weight deep LSTM model for semantic textual similarity. Though the related methods are based on similarity measures using numerous techniques and resources, our proposed method uses only word-level semantics to capture sentence-level similarity. The results also indicate that word-level semantics is effective for capturing sentence-level similarity. Overall, the experimental results demonstrate the effectiveness of our method in computing semantic similarity.

5 Conclusion and future directions

This paper introduced a method for measuring the semantic textual similarity between sentences. We estimated the similarity between sentences using word-level semantics. In this regard, we investigated bilingual word semantics, which has been utilized to capture the semantic similarity between sentences. We proposed three new semantic similarity measures exploiting word-embedding and WordNet. The performance of each individual measure using different resources (Wikipedia and the Google News corpus) on the STS-2017 dataset was observed. The observation concluded that the proposed measures are effective for computing similarity. Moreover, the performance of the measures using bilingual semantics (both Bengali and English) was competitive, which indicates the consistency of our proposed measures in capturing similarity. The combination of measures further improved the performance. Finally, the linear ranking applied to all measures, with their importance scores computed by a linear regression technique, surpassed the performance of all individual measures and achieved a new state-of-the-art performance. The performance comparison with some known related methods also demonstrated the effectiveness of our method.

In the near future, we would like to apply our proposed semantic measures in some other fields such as query suggestion generation, web search diversification, query completion and subtopic mining. It would be interesting to apply multilingual word semantics to estimate cross-language sentence similarity. We also have a plan to apply long short-term memory (LSTM) to introduce a new similarity measure for semantic similarity.

References

1. Agirre, E., Banea, C., Cardie, C., Cer, D., Diab, M., Gonzalez-Agirre, A., Guo, W., Lopez-Gazpio, I., Maritxalar, M., Mihalcea, R.: SemEval-2015 task 2: Semantic textual similarity, English, Spanish and pilot on interpretability. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 252–263 (2015)
2. Agirre, E., Banea, C., Cer, D., Diab, M., Gonzalez-Agirre, A., Mihalcea, R., Rigau, G., Wiebe, J.: SemEval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 497–511 (2016)
3. Aliguliyev, R.M.: A new sentence similarity measure and sentence based extractive technique for automatic text summarization. Expert Syst. Appl. 36(4), 7764–7772 (2009)
4. Bär, D., Biemann, C., Gurevych, I., Zesch, T.: UKP: computing semantic textual similarity by combining multiple content similarity measures. In: Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, Association for Computational Linguistics, pp. 435–440 (2012)
5. Barrow, J., Peskov, D.: UMDeep at SemEval-2017 task 1: end-to-end shared weight LSTM model for semantic textual similarity. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pp. 180–184 (2017)
6. Biçici, E.: RTM at SemEval-2017 task 1: referential translation machines for predicting semantic similarity. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pp. 203–207 (2017)
7. Bjerva, J., Östling, R.: ResSim at SemEval-2017 task 1: multilingual word representations for semantic textual similarity. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pp. 154–158 (2017)
8. Callan, J., Hoy, M., Yoo, C., Zhao, L.: ClueWeb09 data set (2009)
9. Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., Specia, L.: SemEval-2017 task 1: semantic textual similarity - multilingual and cross-lingual focused evaluation. arXiv preprint arXiv:1708.00055 (2017)
10. España-Bonet, C., Barrón-Cedeño, A.: Lump at SemEval-2017 task 1: towards an interlingua semantic similarity. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pp. 144–149 (2017)
11. Fernando, S., Stevenson, M.: A semantic similarity approach to paraphrase detection. In: Proceedings of the 11th Annual Research Colloquium of the UK Special Interest Group for Computational Linguistics, pp. 45–52 (2008)
12. Ferreira, R., Lins, R.D., Freitas, F., Simske, S.J., Riss, M.: A new sentence similarity assessment measure based on a three-layer sentence representation. In: Proceedings of the 2014 ACM Symposium on Document Engineering, ACM, pp. 25–34 (2014)
13. Fewzee, P., Karray, F.: Elastic net for paralinguistic speech recognition. In: Proceedings of the 14th ACM International Conference on Multimodal Interaction, ACM, pp. 509–516 (2012)
14. Han, L., Kashyap, A.L., Finin, T., Mayfield, J., Weese, J.: UMBC_ebiquity-core: semantic textual similarity systems. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, vol. 1, pp. 44–52 (2013)
15. Hassanzadeh, H., Groza, H., Nguyen, A., Hunter, J.: UQeResearch: semantic textual similarity quantification. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 123–127 (2015)
16. Hoerl, A., Kennard, R.: Ridge Regression. In: Encyclopedia of Statistical Sciences, vol. 8, pp. 129–136. Wiley, New York (1988)
17. Jijkoun, V., de Rijke, M.: Recognizing textual entailment using lexical similarity. In: Proceedings of the PASCAL Challenges Workshop on Recognising Textual Entailment, Citeseer, pp. 73–76 (2005)


18. Kenter, T., De Rijke, M.: Short text similarity with word embeddings. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, ACM, pp. 1411–1420 (2015)
19. Kozareva, Z., Vazquez, S., Montoyo, A.: Adaptation of a machine-learning textual entailment system to a multilingual answer validation exercise. In: CLEF (Working Notes) (2006)
20. Li, H., Xu, J.: Semantic matching in search. Found. Trends Inf. Retr. 7(5), 343–469 (2014)
21. Li, Y., McLean, D., Bandar, Z.A., Crockett, K.: Sentence similarity based on semantic nets and corpus statistics. IEEE Trans. Knowl. Data Eng. 8, 1138–1150 (2006)
22. Lintean, M.C., Rus, V.: Measuring semantic similarity in short texts through greedy pairing and word semantics. In: FLAIRS Conference (2012)
23. Metzler, D., Dumais, S., Meek, C.: Similarity measures for short segments of text. In: European Conference on Information Retrieval, pp. 16–27. Springer, Berlin (2007)
24. Mihalcea, R., Corley, C., Strapparava, C.: Corpus-based and knowledge-based measures of text semantic similarity. AAAI 6, 775–780 (2006)
25. Šarić, F., Glavaš, G., Karan, M., Šnajder, J., Bašić, B.D.: TakeLab: systems for measuring semantic text similarity. In: Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, Association for Computational Linguistics, pp. 441–448 (2012)
26. Shajalal, Md., Ullah, M.Z., Chy, A.N., Aono, M.: Query subtopic diversification based on cluster ranking and semantic features. In: Advanced Informatics: Concepts, Theory and Application (ICAICTA), 2016 International Conference on, IEEE, pp. 1–6 (2016)
27. Tibshirani, R.: Regression shrinkage and selection via the lasso: a retrospective. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73(3), 273–282 (2011)
28. Zhang, Z., Saligrama, V.: Zero-shot learning via semantic similarity embedding. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4166–4174 (2015)
29. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67(2), 301–320 (2005)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

