0% found this document useful (0 votes)

36 views11 pages

Shimaa IsmailSemanticSimilarity

Uploaded by

Getnete degemu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

36 views11 pages

Shimaa IsmailSemanticSimilarity

Uploaded by

Getnete degemu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/362786644

Arabic Semantic-Based Textual Similarity

Article in Benha Journal of Applied Sciences · April 2022

DOI: 10.21608/bjas.2022.254708

CITATION READS

1 29

3 authors:

Shimaa Ismail Abdelwahab el sammak

Benha University Benha University
4 PUBLICATIONS 34 CITATIONS 21 PUBLICATIONS 152 CITATIONS

SEE PROFILE SEE PROFILE

Tarek Elshishtawy
Benha University
9 PUBLICATIONS 41 CITATIONS

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

Cloud Computing View project

Steganography View project

All content following this page was uploaded by Shimaa Ismail on 19 August 2022.

The user has requested enhancement of the downloaded file.

Benha Journal of Applied Sciences (BJAS) print: ISSN 2356–9751
Vol. (7) Issue (4) (2022), (133-142) online: ISSN 2356–976x
http://bjas.journals.ekb.eg

Arabic Semantic-Based Textual Similarity

Shimaa Ismail 1, AbdelWahab Alsammak2 and Tarek Elshishtawy1
1
Faculty of Computers and Artificial Intelligence, Benha Univ., Benha, Egypt
2
Faculty of Engineering Shoubra, Benha Univ., Benha, Egypt
E-mail: [email protected]
Abstract
Textual similarity is one of the most important aspects of information retrieval. This paper proposes several techniques of
semantic textual similarity as well as the factors that influence them. Two-hybrid approaches for measuring the degree of
similarity between two Arabic snipped texts are presented. The first proposed approach combined the word-based and vector-
based similarity methods to construct semantic word spaces for each word of the input text. These words are represented in
their lemma forms to capture all semantically related words. In this approach, the semantic word spaces are used to find the
best matching between the input text words, and hence, the degree of similarity between the two snipped texts is computed.
The second proposed approach combined semantic and syntactic based approaches. The basic Levenshtein concept
represents the main structure for this approach. It has been modified to measure the edit cost at the token level not at the
character level. In addition, the semantic word spaces are added to this approach to include the semantic features to the
syntactic features. Some techniques are embedded to overcome the syntactic approach problems such as the word sequence.
Pearson correlation coefficient is used to measure the degree of correctness of the two proposed approaches as compared to
two benchmark datasets. The experiments achieved 0.7212 and 0.7589 for the two proposed approaches on two different
datasets.

Keywords: Arabic Text Similarity, Semantic Similarity, Lexical Similarity, Word Embedding, Permutation Feature,
Negation Effect.

1. Introduction and determines how similar these characters are. The

STS (Semantic Textual Similarity) metrics have been semantic similarity, on the other hand, is determined by
a major topic in a variety of studies and applications. the meaning of the phrases. It is necessary to determine
Information retrieval, machine translation, text the degree of relatedness between sentences to quantify
summarization, sentiment analysis, question generating semantic similarity [4].
and answering, automatic essay scoring, automatic short The Levenshtein approach is one of the approaches
answer grading, and other activities all rely on them [1]. that measure the similarity of texts lexically. The
There are several barriers in determining the Levenshtein distance is a statistic for calculating how
semantic similarity or relationship between two snipped many edit operations are necessary to convert one string
Arabic texts, including morphological inflections and to another [5]. The distance between two words is
orthographic confusion related to optional diacritization. defined as the number of single-character changes (i.e.,
As a result, there are more homographs than in English, insertions, deletions, or replacements) necessary to
which adds to the confusion [2]. The Arabic word "‫"ذهب‬ change one word into the other in its original form. The
for example, maybe the verb "Went" or the noun "Gold" edit distance is expanded in this study to detect distances
with various diacritics. It is possible to discern between between two sentences given in words rather than
the two meanings based on the part of speech tagging characters. In addition, semantic features are embedded
and the context of the word. As a result, it is important to to change the edit cost between the input words.
do this work with a conventional morphological tool. In Semantic word spaces are generated from one of the
this research, the StanfordNLP library is used as a word embedding models that represent the words in a
morphological tool to determine the Part of Speech vector space. This vector space defined the meaning of
(POS) tagging for words. In addition, the lemmatization words and the relation between them. There are two
approach was employed in our study to avoid the algorithms used to extract these semantic word spaces in
diacritization difficulty and its impact by using the this research. In the two proposed approaches, the
lemma form of the word. The inflected word form is semantic word spaces are used in two different manners.
transformed into its dictionary lemma look-up form In this paper, two hybrid approaches are proposed to
through lemmatization [3]. This lemma form is the enhance the overall calculations of the similarities
shortest that captures all the word's semantic between two texts. An interlaced approach that combines
characteristics. word-based and vector-based approaches was proposed.
There are two different types of sentence similarity: A modified strategy that combines both syntax and
lexical similarity and semantic similarity. Lexical semantic techniques also presented.
similarity looks at a sentence as a series of characters
134 Arabic Semantic-Based Textual Similarity

The rest of the paper is organized as follows. Section i.e., words used in comparable contexts are mapped to a
2 represents the related work of the semantic similarity proximal vector space. Different techniques represent the
approaches. Section 3 explains the approach words in vector spaces such as Skipgram skip-G,
methodology. In section 4, the evaluations of the Continuous Bag of Words CBOW, and the co-
experimental results are described. Finally, the occurrence frequency. The terms "flower" and "tree," for
conclusion of our approach is presented in section 5. example, are semantically related since they both refer to
plants and are used in the same context [11]. FastText is
2. Related Work one of these word embedding models that presented
Arabic snipped texts can be classified according to word representation for the Arabic language. FastText is
the adopted methodology as follows: an unsupervised learning technique that generates vector
One of the most used strategies for evaluating representations for words in 294 languages [12]. It
semantic similarity is deep learning with feature- supports both CBOW and skip-G models.
engineered models. Tian et al. [6] used characteristics Nagoudi et al. [13] used a word embedding model
such as n-gram overlap, edit distance, and longest presented by Zahran et al. [14] based on CBOW, Skip-
common prefix/ suffix/ substring to train deep learning G, and GloVe approaches to determine the semantic
algorithms. They were able to reach a PCC of 0.7440. similarity of Arabic sentences. Additionally, they
On the aspects of alignment similarity and string included two weighing functions: IDF weighting and
similarity measurements, Henderson et al. [7] employed POS tagging. They achieved a PCC of 0.7463 ranked the
the same method with various algorithms, such as first for applying an approach with the native language
Recurrent Neural Networks (RNN) and Recurrent and the second among all participants.
Convolutional Neural Networks (RCNN). For the same Alian et al., [15] proposed a method that combined
dataset, their PCC is 0.7304. lexical, semantic and syntactic-semantic features with
For the same dataset utilized, the semantic machine learning techniques like linear regression and
information space (SIS) is a technique that produced a support vector machine regression. They applied the
high Pearson correlation coefficient. The non- Levenshtein method and one of the word embedding
overlapping Information Content (IC) computation is models to represent words in a vector space. They
obtained using this method, which is based on the evaluated their approach on three different datasets. For
semantic hierarchical taxonomy in WordNet. This the STS-MSRvid of the SemEval competition dataset,
method was employed by Wu et al [8] in three studies. they achieved a PCC of 0.743.
As they used the IC which is based on the word The word-based similarity category treats the phrase
frequencies from WordNet and the British National as a collection of words; hence it is based on the
Corpus. They also collaborate the IDF weighting system similarity between terms. There are several approaches
using the IC with cosine similarity. The highest Pearson for determining phrase similarity in this area, including
correlation coefficient is presented in this competition by maximum similarity, similarity matrix, employing
0.7543. similar and dissimilar components, and word meaning
BableNet is a huge, multilingual semantic network disambiguation [4]. In addition, several of these
with broad coverage that gathered its data from Wordnet strategies are integrated. First, the maximum similarity
and Wikipedia [9]. The multilingual word-sense aligner of each word in the first sentence and each word in the
proposed by Hassan et al., [10] relies heavily on the second sentence is determined using the Max similarity
BableNet network. According to the Bable synsets, they approach. The average similarity is then determined
built an aligner that aligns each word in one phrase to [16]. The similarity matrix method generates a matrix
another word in the other. In many languages, these containing the results of calculating the similarity
synsets reflect the word's meaning, named entity, and between each word in each sentence and the words in the
synonyms. For the used Arabic dataset, PCC is 0.7158. other sentence [17]. Wang et al., [18] describe the use of
Word embedding is a common method for various similar and dissimilar parts to represent words using
text applications and NLP activities. The distributed word embedding. They then used cosine similarity to
representation of words in a vector space is referred to as create a similarity matrix. Furthermore, a semantic
word embedding. Traditional NLP techniques miss matching function was used to create a semantic
syntactic (structure) and semantic (meaning) links across matching vector for each word. They break down the
collections of words. As a result, using word vectors to resulting match vectors to discover which portions of
represent words has its advantages. The word vectors are each vector are similar and which are distinct. Finally,
multidimensional continuous floating-point values in the similarity is calculated using these vectors. In [19],
which semantically comparable words are transferred to they utilized Wordnet synonyms to extend the words of
geometrically close regions. Each point in the word the original phrases, then generated a vector
vector represents a dimension of the word's meaning, representation for these words in addition to the vectors

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

Shimaa Ismail, AbdelWahab Alsammak and Tarek Elshishtawy 135

of the set of terms in each sentence. Finally, the cosine lemmatization in both the fastText vector model and the
similarity of the two vectors is used to calculate the input text in our technique. In the following part, we'll
similarity. go through how to do that. Second, utilizing the vector
word-space of sentences, a word-matching matrix is
3. Methodology created. Finally, the degree of similarity between
In this research, we presented two-hybrid approaches sentences is calculated.
for two different semantic techniques. Fig. (1) illustrates the suggested technique, which is
3.1 The First Proposed Approach divided into three stages:
In the first approach, we present a hybrid 1) Vector-Based Similarity,
methodology that combines two semantic similarity 2) Word-Based Similarity,
approaches: word-based and vector-based methods, to 3) Similarity Measures.
quantify the semantic similarity between two snipped 3.1.1 Vector-Based Similarity
Arabic texts. The vector space of each word is first FastText Models are available and readable for the
retrieved in lemma form using the fastText vector model. Arabic language. FastText models trained using CBOW
The StanfordNLP library was used to generate their with position-weights, in dimension 300, with character
lemma form. To analyze natural language, the n-grams of length 5, a window of size 5, and 10
StanfordNLP package is employed. It turns a string of negatives [21]. Arabic fastText corpus contains more
human language text into lists of sentences and words, than 356 thousand words. These words are totally
generates basic forms of those words, parts of speech, different represented in their surface form. A screenshot
and morphological aspects, and provides a syntactic from this dataset is shown in fig. (2).
structural dependency parse that is parallel across over
70 languages [20]. It is utilized for tokenization and

Fig. (1) The Proposed approach Overview.

Fig. (2) A screenshot from the fastText Arabic Model dataset

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

136 Arabic Semantic-Based Textual Similarity

Many studies employed the vector space model's surface To extract the closest words for each word in the
form to derive the semantic similarity between words sentences, vectors are employed to extract semantic
that did not include additional semantically related similarity using a word embedding or word
terms. But in this research, a lemmatization technique is representation. For related or near-synonymous words,
used to improve the search word space for the fastText these vectors shared the same semantic properties. The
model words. The lemmatization is applied for the input suggested method extracts comparable words from the
text and the fastText vector model and stated as a mapped fastText model using the preprocessed and
mapped fastText model. Some other preprocessing tools lemma form of input words. There are numerous indices
are applied for the text such as noise removal, word in the mapped model for each word in the input text that
normalization, eliminating stopwords. contains the same word but distinct vectors. Fig. (3)
There are two techniques are used to extract the showed the process of extracting the semantic word
semantic word spaces for each word of the input words: space for a specific word.
using the closest words algorithm or a ready-made The mapped vector model is used to extract the
function built in the fastText module in python language. closest words using Algorithm 1. Where np is the
3.1.1.1 Using the closest words algorithm NumPy library stands for Numerical Python.

Fig. (3) The process of extracting the semantic word space for a word using the closest words algorithm

Algorithm 1: Finding the closest words for a specific word

Input: The Word Vector, N
Output: Similar Words
1. difference = vector of all words – vector of a word
2. delta = np.sum(difference * difference)
3. i = np.argmin(delta)
4. similar word = word of index i
5. drop the index of this similar word
6. iterate the previous steps by N times
7. Return similar words

Fig. (4) The process of extracting the semantic word space for a word using the fastText library

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

Shimaa Ismail, AbdelWahab Alsammak and Tarek Elshishtawy 137

Iterating over the whole vectors of the mapped sentence matched to some of the second sentence words
fastText model to obtain the index of the most with the number of common words (NCW).
comparable word can be used to find the closest word. 3.1.3 Similarity Measures
By repeating the loop N times with the same method, N For two sentences s1 and s2 with a length of n and m
numbers of related terms are discovered. The word space respectively, the similarity score is measured by
number for each word can be assigned, however, Equation 1. Where p is the length of the matching
according to the mapped fastText model, each word has matrix.
a distinct number of indexes. The main purpose is to ∑ ⁄
(1)
increase the word search space by collecting more
relevant comparable terms for a particular word. In this The result value is a ratio from 0 to 1 scale. So, we
approach, we extract 10 closest words for each index. multiply the output by 5 to make the output ratio is from
3.1.1.2 Using the fastText module 0 to 5 scale. The zero value represents that the two
The language model of the fastText organization was sentences are quite different, and the five value indicates
released as a Python module [22]. Similar words can be that the two sentences are typical.
retrieved using a built-in function named "get nearest 3.2 The Second Proposed Approach
neighbors" in this package or module. Like the word In the second approach, a modified approach is
space for each word, this function returns the 10 closest presented to measure the similarity of two snipped
neighbors of the searched word in its surface form. The Arabic texts lexically and semantically based on the edit
process of extracting the semantic word space using the distance approach. This proposed approach is hybrid in
fastText module is shown in Figure 4. the sense that both syntax and semantic features are used
3.1.2 Word-based Similarity to measure the similarity. Different knowledge resources
In this stage, we tried to find the relatedness or the are employed such the semantic word spaces. Also, the
association between words. From the semantic words approach presents a solution to miss ordering of words
spaces, a common word matrix is built. This matrix is between given two sentences. The modified edit distance
constructed by the common words of each pair of words approach is based on different weights (edit cost)
by their semantic word spaces. From this matrix, a according to the state of the two words.
matching matrix is generated by selecting the most The proposed workflow for measuring the edit cost
common words between each pair of words. The between two words is shown in Fig (5).
matching matrix consists of the words of the first

Fig. (5) The workflow diagram of finding the edit cost of two words

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

138 Arabic Semantic-Based Textual Similarity

Algorithm 2: Finding the edit cost between two words

Input: Pair of words S1[i], S2[j]
Output: Edit cost
Let cost be the edit cost,
IF the word S1[i] = the word S2[j],
then, cost=0;
ELSE IF S1[i] is one of the semantic space words of S2[j] according to the fastText module,
then, cost= 1- similarity score (SC);
ELSE IF S2[j] is one of the semantic space words of S1[i] according to the fastText module,
then, cost = (1- SC);
ELSE IF there are common words bw the two semantic spaces obtained by the closest words algorithm,
then, cost = 1- similarity ratio (SR);
SR = NCW/ Max (|WS1|,|WS2|),
Where NCW is the number of common words between WS1[i] and WS2[j],
Where WS1 is the length of the word space of word i in S1 that differ according to each word,
Where WS2 is the length of the word space of word j in S2 that differ according to each word,
ELSE, cost=1;
The frame algorithm for calculating the edit cost of terms are considered as NCW. The SR is then
given two words in two sentences is presented in determined by dividing the NCW by the maximum
Algorithm 2. length of the two semantic spaces, which is differing
In this proposed approach, new features are according to each word. The cost will be (1-SR).
combined to provide accurate measures of the similarity Otherwise, the cost equals one.
between two snipped Arabic texts. The following For two sentences S1, S2 with length n, m words
summarizes the features and proposed modules of the respectively, a corresponding matrix CM is constructed
approach. depending on the edit cost between each pair of words.
3.2.1 Token Lemma Level The value of CM[i+1][j+1] is as shown in Equation 2:
The Levenshtein approach is one of the lexical
similarity strategies based on the edit distance metrics. [ ][ ]
The cost of changing one string into another is calculated { [ ][ ] (2)
using its methodology, which assigns a unity cost to all [ ][ ]
edit operations. This cost is utilized to create a character Finally, the edit distance between the two sentences
matrix between the two words, which then yields the edit is represented by the final value of CM[-1][-1]. The
distance. The suggested method extends the Levenshtein similarity between the two sentences is measured by
algorithm by computing 'edit distances' at the token level Equation 3.
rather than the character level as in the traditional
algorithm. The lemma versions of the tokens are
represented. In addition, the input text is pre-processed (3)
using the mentioned pre-processing tools in the first 3.2.3 Applying Word Permutation
proposed approach. The Levenshtein method is based on word order and
3.2.2 Embedding Semantic Knowledge word syntactic dependencies. For example, the two
The cost between each pair of words is calculated phrases “Every morning the sun shines in the sky” and
using the semantic word spaces produced from the two "The sun shines in the sky every morning” in Arabic
algorithms outlined in the first proposed approach. In "‫ " تسطع الشمس في السماء كل صباح‬and " ‫كل صباح تسطع الشمس في‬
algorithm 2, the edit cost between each pair of words has ‫ "السماء‬have the same meaning and their similarity should
a different value according to their state. First, the be 100 percent, however, when using the Levenshtein
semantic word space obtained from the fastText module technique, the similarity would be zero owing to
is used. The ten nearest neighbors’ words obtained from incorrect word order. As a result, a permutation
the fastText module have similarity scores (SC) for each approach is employed to determine the optimal word
term. This score shows how closely the retrieved alignment between the two texts. A permutation is a
comparable word is semantically connected to the mathematical approach for determining the number of
searched word. The edit cost between two words is alternative arrangements in a set when the order of the
determined by (1-SC) if the first word is one of the arrangements is needed. One of the two phrases is
semantic space words of the second word and vice versa. rearranged n! times, where n is the sentence's word
Otherwise, the semantic word spaces generated from length. In our approach, we utilized this technique for
Algorithm 1 are used. The two semantic spaces' common five words only to reduce the complexity of permutation

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

Shimaa Ismail, AbdelWahab Alsammak and Tarek Elshishtawy 139

operations. For each candidate sequence, the edit the closest words of the semantic word space of each
distance is calculated. The candidate sequence with the word.
shortest edit distance is chosen as the one that most 4.2.1.1 Using the closest words algorithm
accurately matches the alignment of the words in the two In this experiment, two tests are accomplished. The
sentences. first test uses the mapped vector model with the input
text words in their lemma form. The second uses the
4. Experiments input text and the fastText vector model in their surface
4.1 Dataset Description form. First, the input text is preprocessed with the
We utilized the dataset from the Semantic Evaluation preprocessing tools except the stopwords are eliminated
“SemEval” yearly competition to assess the performance once and keep another. The results of the Pearson
of our approach. This event is used for a variety of correlation coefficient are shown in Table (1).
languages and track issues. The semantic similarity Results of Table (1) proved that applying the
between texts is one of these tracks (word phrases, lemmatization technique for the input text with the
sentences, paragraphs, or full documents). Furthermore, mapped vector model has a better effect than using the
the text might be in monolingual or multilingual formats. fastText model and the input text in their surface form.
2017 was the final year of the competition that used the In addition, removing the stopwords from the input text
semantic similarity track. improved the results slightly.
Development Sets and Evaluation Sets are the two
datasets included in the released dataset. One of the Table (1) Experimental Results using the closest words
Development Sets for monolingual Arabic snipped texts algorithm
is the STS-MSRvid dataset1, which contains 368 pairs of
sentences. The Evaluation Sets2 are a collection of 250 Dataset With Pre- With Surface With Lemma
sentence pairs with human judgment ratings that were Processing Form Form
published as the Evaluation Gold Standard3. In the With StopWords 0.6708 0.6886
output of these datasets, the Pearson Correlation Without StopWords 0.6887 0.7000
Coefficient (PCC) between each pair of sentences is
supplied in a one-column table. This coefficient runs Table (2) Experimental Results using the fastText
from -1 to 1, with a value of (-1) indicating that the module.
values of the two columns are completely different and a
value of (1) indicating that they are identical. This The dataset in Built-In Closest Words
coefficient is expressed in Equation 4. Surface Form Function Algorithm
∑ ̅ ̅
(4) With StopWords 0.6513 0.6708
√∑ ̅ √∑ ̅ Without StopWords 0.6679 0.6887
Where ̅ is the mean of x which is defined by
Equation 5. Table (3) Experimental Results for studying the
negation effect on the proposed Approach.
̅ ∑ ⁄ (5)
4.2 Experiments Evaluations Before After
Dataset With
The two proposed approaches are evaluated with two Studying Studying
Pre-Processing
different datasets in two tests as follows: Negation Negation
4.2.1 The First Approach (Evaluation Gold With StopWords 0.6924 0.7052
Standard) Without
0.7039 0.7212
The first approach is evaluated using the evaluation StopWords
gold standard dataset. The experimental results are
classified according to the applied algorithm of finding

http://alt.qcri.org/semeval2017/task1/data/uploads/ar_sts_data_
updated.zip
2

http://alt.qcri.org/semeval2017/task1/data/uploads/sts2017.eval
.v1.1.zip
3

http://alt.qcri.org/semeval2017/task1/data/uploads/sts2017.gs.z
ip

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

140 Arabic Semantic-Based Textual Similarity

4.2.1.2 Using the fastText module score. The Pearson Correlation Coefficient of the entire
In this experiment, the proposed approach is applied dataset was modified by these scores, as seen in Table 3.
with input text in their surface form to be compatible In the last experiment done, the Pearson Correlation
with the results of the built-In function Coefficient becomes close to the human judgment
“get_nearest_neighbors”. The results shown in Table (2) scores. The score of 0.7212 is the highest value obtained
are obtained with stopwords and without stopwords. in applying the proposed approach.
Table (2) proved that the proposed algorithm for 4.2.1.4 Comparison with other approaches
finding the closest words achieved good results than the Table (4) compares our proposed approach to other
built-in function that does the same task. works in the AR-AR track of the SemEval competition
4.2.1.3 Studying Negation Effect 2017 that had the highest PCC for the participants.
Negation is a significant factor that influences the In Table (4), some researchers used the google machine
sentence's orientation (Ismail et al., 2016). Negation translator to increase the training dataset as a
terms in Arabic include (,‫ لم‬,‫ عدم‬,‫ ال‬,‫ لن‬,‫ ليست‬,‫)ليس‬. The requirement for the deep learning approach. Therefore,
meaning of the sentence is reversed with these negative their results are much better than the results of the
words. These negation terms are scanned in each traditional approaches that use the native language. The
sentence of the two input sentences in the proposed proposed approach is considered the second after [13].
method. If a negation word is found in a sentence, the 4.2.2 The Second Approach (STS-MSRvid)
overall score will be reduced by (-1.5), as long as the The second approach is evaluated using the STS-
other sentence does not include negation terms. The MSRvid dataset. The experimental results were carried
negation presence was represented by a score of (-0.5), out for each methodology and after applying the
which substituted the (+1) score from the common word permutation feature as shown in Table (5).

Table (4) Results for the SemEval participants for Evaluation Sets

Language Researchers PCC

[8] 0.7543
Google Machine Translator [6] 0.7440
[7] 0.7304
[13] 0.7463
Native Language First Proposed Approach 0.7212
[10] 0.7158

Table (5) the proposed approach correlation results for the Development Sets

Input With Lemma Form PCC PCC + Permutation

With StopWords 0.6402 0.7010
Modified Edit Distance (MED)
Without StopWords 0.6544 0.7314
With StopWords 0.6792 0.7349
MED + Semantic Word Space
Without StopWords 0.6835 0.7589

Table (6) the proposed approach correlation with similar works for STS-MSRvid Dataset

Used Technique Research PCC

Lexical & Semantic Similarity Second Proposed Approach 0.759
Similarity features + Machine Learning
[15] 0.743
Mean of IDF weighted vectors [13] 0.691

The proposed approach is compared to other works that used the same dataset as shown in Table (6). It demonstrates that
our suggested approach has a higher PCC than other research, implying that it has a benefit when compared to other
methodologies.

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

Shimaa Ismail, AbdelWahab Alsammak and Tarek Elshishtawy 141

5. Conclusion information space to evaluate semantic textual

In this paper, we proposed two-hybrid approaches similarity,” In Proceedings of the 11th
that combine different semantic similarity techniques for International Workshop on Semantic Evaluation
measuring the similarity between two snipped Arabic (SemEval-2017), August., pp.77-84, 2017.
texts. The input text is preprocessed with different [9] R.Navigli and S.P.Ponzetto, “BabelNet: Building
preprocessing tools such as normalization and removing a very large multilingual semantic network,” In
noise diacritics which improved its efficiency. Proceedings of the 48th annual meeting of the
Moreover, A lemmatization tool is applied for the input association for computational linguistics, July.,
text and the used word embedding model to enrich the pp.216-225, 2010.
search word space. The semantic word spaces extracted [10] B.Hassan, S.AbdelRahman, R.Bahgat and
by applying the closest words algorithm or the fastText I.Farag, “FCICU at SemEval-2017 Task 1: Sense-
module are also lemmatized. Applying the based language independent semantic textual
lemmatization technique proved that it has a great effect similarity approach,” In Proceedings of the 11th
on results. In addition, the permutation tool is applied to International Workshop on Semantic Evaluation
overcome the word order problem that affects the (SemEval-2017), August., pp.125-129, 2017.
similarity significantly. Finally, the experimental results [11] Dzone “https://dzone.com/articles/introduction-
for the two proposed approaches over two different to-word-vectors”, [Date accessed 25/03/2022],
datasets are satisfying and close to the most values for 2018.
the researchers who used the same datasets. [12] P.Bojanowski, E.Grave, A.Joulin and T.Mikolov,
“Enriching word vectors with subword
References information,” Transactions of the Association for
[1] W.H.Gomaa and A.A.Fahmy, “A Survey of Text Computational Linguistics, vol.5, pp.135-146,
Similarity Approaches,” International Journal of 2017.
Computer Applications, vol.68, pp.13-18, 2013. [13] E.Nagoudi, J.Ferrero and D.Schwab, “LIM-LIG
[2] N.Y.Habash, “Introduction to Arabic natural at SemEval-2017 Task1: Enhancing the semantic
language processing”. Synthesis lectures on similarity for arabic sentences with vectors
human language technologies, vol.3(1), pp.1-187, weighting,” In Proceedings of the 11th
2010. International Workshop on Semantic Evaluation
[3] S.Ismail, A.Alsammak and T.Elshishtawy, “A (SemEval-2017), pp.134-138, 2017.
generic approach for extracting aspects and [14] M.A.Zahran, A.Magooda, A.Y.Mahgoub,
opinions of Arabic reviews,” In Proceedings of H.Raafat, M.Rashwan & A.Atyia. “Word
the 10th international conference on informatics representations in vector space and their
and systems, May., pp.173-179, 2016. applications for arabic.” In International
[4] M.Farouk, “Measuring sentences similarity: a Conference on Intelligent Text Processing and
survey”. arXiv preprint arXiv:1910.03940, 2019. Computational Linguistics, pp.430-443, Springer,
[5] V.Levenshtein, “Binary codes capable of Cham, 2015.
correcting deletions, insertions and reversals,” [15] M.Alian, A.Awajan, A.Al-Hasan and R.Akuzhia,
Soviet Physics Doklady, vol.10(8), pp.707-710, “Building Arabic Paraphrasing Benchmark based
1966. on Transformation Rules,” Transactions on Asian
[6] J.Tian, Z.Zhou, M.Lan and Y.Wu, “Ecnu at and Low-Resource Language Information
semeval-2017 task 1: Leverage kernel-based Processing, vol.20(4), pp.1-17, 2021.
traditional nlp features and neural networks to [16] R.Mihalcea, C.Corley, & C.Strapparava. Corpus-
build a universal model for multilingual and based and knowledge-based measures of text
cross-lingual semantic textual similarity,” semantic similarity. In Aaai ,Vol.6(2006),
In Proceedings of the 11th international pp.775-780, 2006.
workshop on semantic evaluation (SemEval- [17] S.Fernando, M.Stevenson, “A semantic similarity
2017), August., pp.191-197, 2017. approach to paraphrase detection”.
[7] J.Henderson, E.Merkhofer, L.Strickhart and In Proceedings of the 11th annual research
G.Zarrella, “MITRE at SemEval-2017 Task 1: colloquium of the UK special interest group for
Simple semantic similarity,” In Proceedings of computational linguistics, pp.45-52, 2008.
the 11th International Workshop on Semantic [18] Z.Wang, H.Mi, A.Ittycheriah, “Sentence
Evaluation (SemEval-2017), August., pp.185- similarity learning by lexical decomposition and
190, 2017. composition”. arXiv preprint arXiv:1602.07019,
[8] H.Wu, H.Y.Huang, P.Jian, Y.Guo and C.Su, 2016.
“BIT at SemEval-2017 Task 1: Using semantic

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

142 Arabic Semantic-Based Textual Similarity

[19] K.Abdalgader, A.Skabar, “Short-text similarity [21] E.Grave, P.Bojanowski, P.Gupta, A.Joulin, and
measurement using word sense disambiguation T.Mikolov, “Learning word vectors for 157
and synonym expansion”. In Australasian joint languages,” arXiv preprint arXiv:1802.06893,
conference on artificial intelligence pp.435-444, 2018.
Springer, Berlin, Heidelberg, 2010. [22] FastText Model, “Word vectors for 157
[20] StanfordNLP Package, “StanfordNLP 0.2.0 - languages”, “https://fasttext.cc/docs/en/crawl-
Python NLP Library for Many Human vectors.html”, [Date accessed 25/03/2022].
Languages”,
“https://stanfordnlp.github.io/stanfordnlp/index.ht
ml”, [Date accessed 25/03/2022]

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

View publication stats

Cit353 Summary With Past Question 2
No ratings yet
Cit353 Summary With Past Question 2
26 pages
Mineral Composition of Mature Carob PDF
No ratings yet
Mineral Composition of Mature Carob PDF
14 pages
Grade 3 Reading Comprehension Workbook
13% (8)
Grade 3 Reading Comprehension Workbook
3 pages
MESIntelligence Reports User Guide
No ratings yet
MESIntelligence Reports User Guide
93 pages
Synergistic Impact of Geopolymer Binder and Recycled Coarse Aggregates On The Performance of Concrete Masonry Units
No ratings yet
Synergistic Impact of Geopolymer Binder and Recycled Coarse Aggregates On The Performance of Concrete Masonry Units
15 pages
PhytochemicalcharacterizationandimmunomodulatoryeffectsofaqueousandethanolicextractsandessentialoilofMoroccanLaurusnobilisL Lauraceaeonhumanneutrophils
No ratings yet
PhytochemicalcharacterizationandimmunomodulatoryeffectsofaqueousandethanolicextractsandessentialoilofMoroccanLaurusnobilisL Lauraceaeonhumanneutrophils
9 pages
Applications of Petri Net Based Models in Manufacturing Systems: A Review
No ratings yet
Applications of Petri Net Based Models in Manufacturing Systems: A Review
14 pages
Mechanical Behavior of Welded and Un-Welded Polyethylene Pipe Materials
No ratings yet
Mechanical Behavior of Welded and Un-Welded Polyethylene Pipe Materials
11 pages
A Review of Definition and Classification of Heritage Buildings and Framework For Their
No ratings yet
A Review of Definition and Classification of Heritage Buildings and Framework For Their
6 pages
Seismicity and Seismotectonics of Jeddah-Makkah Region, West-Central Saudi Arabia
No ratings yet
Seismicity and Seismotectonics of Jeddah-Makkah Region, West-Central Saudi Arabia
10 pages
Artic ICST12 Ccfinalv 22
No ratings yet
Artic ICST12 Ccfinalv 22
8 pages
Acs Omega 2023
No ratings yet
Acs Omega 2023
14 pages
Pdfiroud
No ratings yet
Pdfiroud
2 pages
Metabolites 14 00193 v2
No ratings yet
Metabolites 14 00193 v2
16 pages
Nehad Paper-1
No ratings yet
Nehad Paper-1
16 pages
Elnaem2020impact of Pharmacist Led Interventions On Medication Adherence
No ratings yet
Elnaem2020impact of Pharmacist Led Interventions On Medication Adherence
12 pages
BrJHaematol 2022 Eposters1 40
No ratings yet
BrJHaematol 2022 Eposters1 40
2 pages
Optimum Design of Wide Span Cable-Stayed Roof Structures
No ratings yet
Optimum Design of Wide Span Cable-Stayed Roof Structures
13 pages
New Correlations Calculate Volatile Oil, Gas Condensate PVT Properties
No ratings yet
New Correlations Calculate Volatile Oil, Gas Condensate PVT Properties
7 pages
Face Recognition: A Literature Review: January 2005
No ratings yet
Face Recognition: A Literature Review: January 2005
17 pages
Design Symmetricand Asymmetric Single Circular Split Ring Microwave Sensor Using COMSOLMultiphysics
No ratings yet
Design Symmetricand Asymmetric Single Circular Split Ring Microwave Sensor Using COMSOLMultiphysics
7 pages
QCMRIACRphantom
No ratings yet
QCMRIACRphantom
9 pages
QCMRIACRphantom
No ratings yet
QCMRIACRphantom
9 pages
015 Final
No ratings yet
015 Final
9 pages
Multi-Level Architecture Modeling of An Intelligent Energy Management Strategy For Battery/Supercapacitor Electric Vehicle
No ratings yet
Multi-Level Architecture Modeling of An Intelligent Energy Management Strategy For Battery/Supercapacitor Electric Vehicle
8 pages
570 ArticleText 1797 2 10 20210506
No ratings yet
570 ArticleText 1797 2 10 20210506
7 pages
Preparation and Characterization of A New Family of Biointerpenetrating Network Hydrogel Based On A Green Methodegyptian Journal of Chemistry1
No ratings yet
Preparation and Characterization of A New Family of Biointerpenetrating Network Hydrogel Based On A Green Methodegyptian Journal of Chemistry1
15 pages
Design and Implementation of An Unmanned Ground Vehicle For Security Applications
No ratings yet
Design and Implementation of An Unmanned Ground Vehicle For Security Applications
7 pages
Pga 167169 PDF
No ratings yet
Pga 167169 PDF
26 pages
MEPCON 2019 Paper 188
No ratings yet
MEPCON 2019 Paper 188
7 pages
Activity Based Cost Estimation Model For Al Extr Profiles
No ratings yet
Activity Based Cost Estimation Model For Al Extr Profiles
11 pages
Dynamic Model in Estimating The Impact of Competition On Banking Efficiency: Evidence Form MENA Countries
No ratings yet
Dynamic Model in Estimating The Impact of Competition On Banking Efficiency: Evidence Form MENA Countries
11 pages
1 s2.0 S1110982321000582 Main1
No ratings yet
1 s2.0 S1110982321000582 Main1
12 pages
Current Pharmaceutical Situation - Services - inYemenandFutureChallenges1
No ratings yet
Current Pharmaceutical Situation - Services - inYemenandFutureChallenges1
8 pages
Jnep 16 3 03012
No ratings yet
Jnep 16 3 03012
5 pages
Optimal Coordinationof Directional Overcurrent Relaysin Interconnected Networksutilizing UDCand FCL
No ratings yet
Optimal Coordinationof Directional Overcurrent Relaysin Interconnected Networksutilizing UDCand FCL
6 pages
Using Deep Learning For Automatically Determining Correct Application of Basic Quranic Recitation Rules
No ratings yet
Using Deep Learning For Automatically Determining Correct Application of Basic Quranic Recitation Rules
7 pages
Crude Oil
No ratings yet
Crude Oil
8 pages
A Brief Analysis of Amazon Online Reviews: October 2019
No ratings yet
A Brief Analysis of Amazon Online Reviews: October 2019
7 pages
Big Data Quality Eval RG
No ratings yet
Big Data Quality Eval RG
8 pages
Accepted Proof
No ratings yet
Accepted Proof
8 pages
Computational Study and Preclinical Evaluation of A Novel Radiosynthesized 47sc-Dipeptide Derivative As A Promising Cancer Theranostic Agent
No ratings yet
Computational Study and Preclinical Evaluation of A Novel Radiosynthesized 47sc-Dipeptide Derivative As A Promising Cancer Theranostic Agent
2 pages
Syrianrefugeesthe Nederlandand IDPs
No ratings yet
Syrianrefugeesthe Nederlandand IDPs
14 pages
New Analytical Expressions of Photovoltaic Solar Module Physical Parameters Effects of Module Temperature and Incident Solar Irradiance
No ratings yet
New Analytical Expressions of Photovoltaic Solar Module Physical Parameters Effects of Module Temperature and Incident Solar Irradiance
14 pages
Callforbookchapters
No ratings yet
Callforbookchapters
5 pages
Icmiee22 042
No ratings yet
Icmiee22 042
9 pages
Online Marketing Campaigns Aesthetics Measuring The Direct Effect On Customers Decision Making
No ratings yet
Online Marketing Campaigns Aesthetics Measuring The Direct Effect On Customers Decision Making
15 pages
Investigation of Energy Recovery From MSW
No ratings yet
Investigation of Energy Recovery From MSW
12 pages
Solar Parabolic Dish Stirling Engine System Design, Simulation, and Thermal Analysis
No ratings yet
Solar Parabolic Dish Stirling Engine System Design, Simulation, and Thermal Analysis
17 pages
Article Zerrouki
No ratings yet
Article Zerrouki
21 pages
Experimental Study On Using The Aluminum Sacrificial Anode A
No ratings yet
Experimental Study On Using The Aluminum Sacrificial Anode A
7 pages
Molecules KARA
No ratings yet
Molecules KARA
14 pages
Effectivenessofthetranspalatalarchincontrollingorthodonticanchorageinmaxillarypremolarextractioncases Asystematicreviewandmeta Analysis
No ratings yet
Effectivenessofthetranspalatalarchincontrollingorthodonticanchorageinmaxillarypremolarextractioncases Asystematicreviewandmeta Analysis
13 pages
Optimum Sun Exposure Timesfor Vitamin DStatus Correctionin Saudi Arabia
No ratings yet
Optimum Sun Exposure Timesfor Vitamin DStatus Correctionin Saudi Arabia
9 pages
2021 Zellagui Etal IETES
No ratings yet
2021 Zellagui Etal IETES
36 pages
Performance of The Audio Signals Transmission Over
No ratings yet
Performance of The Audio Signals Transmission Over
15 pages
Cement Stabilizationof Reclaimed Asphalt Pavement Aggbasesand Subbases
No ratings yet
Cement Stabilizationof Reclaimed Asphalt Pavement Aggbasesand Subbases
8 pages
Resilienceand Vulnerabilityin Supply Chain Literaturereview 2016
No ratings yet
Resilienceand Vulnerabilityin Supply Chain Literaturereview 2016
7 pages
Full-Scale Experimental Tests On Three Lattice Steel Transmission Towers
No ratings yet
Full-Scale Experimental Tests On Three Lattice Steel Transmission Towers
21 pages
2022 F Sustainability CRM
No ratings yet
2022 F Sustainability CRM
24 pages
Energy-Spectral Efficiency Trade-Off in IRS-Assisted NOMA Systems A Weighted Product Method
No ratings yet
Energy-Spectral Efficiency Trade-Off in IRS-Assisted NOMA Systems A Weighted Product Method
11 pages
Analysis of a Medical Research Corpus: A Prelude for Learners, Teachers, Readers and Beyond
From Everand
Analysis of a Medical Research Corpus: A Prelude for Learners, Teachers, Readers and Beyond
Georgette Nicolas Jabbour
No ratings yet
Disambiguation of Particles: Hindi-To-English
From Everand
Disambiguation of Particles: Hindi-To-English
Anil Thakur
No ratings yet
2017 EC Academic Calendar
No ratings yet
2017 EC Academic Calendar
3 pages
Comparable Evaluation of Contemporary Corpus-Based and Knowledge-Based Semantic Similarity Measures of Short Texts
No ratings yet
Comparable Evaluation of Contemporary Corpus-Based and Knowledge-Based Semantic Similarity Measures of Short Texts
7 pages
ImprovingPart of SpeechTagginginAmharicLanguageUsingDNN
No ratings yet
ImprovingPart of SpeechTagginginAmharicLanguageUsingDNN
16 pages
Dereje Mesfin: Sno College Department Time Lab # of Stud. Invigilator Supervisor
No ratings yet
Dereje Mesfin: Sno College Department Time Lab # of Stud. Invigilator Supervisor
1 page
Utilizing Semantic Textual Similarity For Clinical Survey Data Feature Selection
No ratings yet
Utilizing Semantic Textual Similarity For Clinical Survey Data Feature Selection
9 pages
Grade 8-Career and Technical Education Cte - Fetena - Net - 7a2b
No ratings yet
Grade 8-Career and Technical Education Cte - Fetena - Net - 7a2b
162 pages
2017 2nd and 4th Class Schedule-Final
No ratings yet
2017 2nd and 4th Class Schedule-Final
2 pages
2nd Yr Maths Summer Class Sechedule
No ratings yet
2nd Yr Maths Summer Class Sechedule
1 page
Mitiku Tamirat Profile
No ratings yet
Mitiku Tamirat Profile
1 page
HDP Work Book Final
100% (2)
HDP Work Book Final
98 pages
Grade 8-Performing and Visual Arts Pva - Fetena - Net - 9aeb
100% (1)
Grade 8-Performing and Visual Arts Pva - Fetena - Net - 9aeb
115 pages
Grade 8-Information Technology IT Fetena Net Af43
100% (1)
Grade 8-Information Technology IT Fetena Net Af43
115 pages
Text Encoders Lack Knowledge: Leveraging Generative Llms For Domain-Specific Semantic Textual Similarity
No ratings yet
Text Encoders Lack Knowledge: Leveraging Generative Llms For Domain-Specific Semantic Textual Similarity
12 pages
Applsci 12 09691 v2
No ratings yet
Applsci 12 09691 v2
35 pages
Paraphrasing Textual Entailment and Semantic Simil
No ratings yet
Paraphrasing Textual Entailment and Semantic Simil
239 pages
Collective Human Opinions in Semantic Textual Simi
No ratings yet
Collective Human Opinions in Semantic Textual Simi
17 pages
Kaiwartya 2016
No ratings yet
Kaiwartya 2016
17 pages
Published Paper
No ratings yet
Published Paper
12 pages
Grade 8-Social Studies Fetena Net 1dc2
100% (5)
Grade 8-Social Studies Fetena Net 1dc2
213 pages
PVA Grade 10 Student Textbook Final Version V20220802 - Compressed
100% (1)
PVA Grade 10 Student Textbook Final Version V20220802 - Compressed
144 pages
Boosting The Performance of Transformer Architectu
No ratings yet
Boosting The Performance of Transformer Architectu
6 pages
6-Lecture Six (Chapter Four-Semantic Analysis)
No ratings yet
6-Lecture Six (Chapter Four-Semantic Analysis)
25 pages
The Final Main Thesis-Compressed
No ratings yet
The Final Main Thesis-Compressed
85 pages
Handout Cloud, Iot, Ip
No ratings yet
Handout Cloud, Iot, Ip
141 pages
4-Lecture Four - (Part of Speech Tagging and Sequence Labeling)
No ratings yet
4-Lecture Four - (Part of Speech Tagging and Sequence Labeling)
36 pages
7-Information Extraction (IE) and Machine Translation (MT)
No ratings yet
7-Information Extraction (IE) and Machine Translation (MT)
46 pages
Let2 W
No ratings yet
Let2 W
46 pages
8-Deep Learning For NLP
No ratings yet
8-Deep Learning For NLP
49 pages
9 Speech Recognition
No ratings yet
9 Speech Recognition
26 pages
2-Lecture Two - (Back Ground of NLP)
No ratings yet
2-Lecture Two - (Back Ground of NLP)
65 pages
Fiction Becomes Fact: Sustainable Information and Communications Technology in 2020
No ratings yet
Fiction Becomes Fact: Sustainable Information and Communications Technology in 2020
38 pages
Mms Exam Form
No ratings yet
Mms Exam Form
1 page
Data Center Design Guide
No ratings yet
Data Center Design Guide
11 pages
Piezas de La Máquina ATM Del Dispensador NCR S2
No ratings yet
Piezas de La Máquina ATM Del Dispensador NCR S2
7 pages
JS Essentials 1 Overview (2021!10!29)
No ratings yet
JS Essentials 1 Overview (2021!10!29)
24 pages
Exploded View and Parts List
No ratings yet
Exploded View and Parts List
2 pages
App Service Plans
No ratings yet
App Service Plans
6 pages
Bridge Course
No ratings yet
Bridge Course
49 pages
Disha
No ratings yet
Disha
17 pages
Data Change Manual
No ratings yet
Data Change Manual
3 pages
C++ Operator Overloading 2
No ratings yet
C++ Operator Overloading 2
38 pages
Introduction To Internet of Things
No ratings yet
Introduction To Internet of Things
54 pages
Factoring Polynomials: Be Sure Your Answers Will Not Factor Further!
No ratings yet
Factoring Polynomials: Be Sure Your Answers Will Not Factor Further!
5 pages
Improving Throughput and Availability of Cellular Digital Packet Data (CDPD)
No ratings yet
Improving Throughput and Availability of Cellular Digital Packet Data (CDPD)
12 pages
Research Paper Topics On Computer Engineering
100% (1)
Research Paper Topics On Computer Engineering
7 pages
Mil - Q2 - Module 5
No ratings yet
Mil - Q2 - Module 5
9 pages
Security System PDF
No ratings yet
Security System PDF
4 pages
Dog Shaped Coin Bank and Soap Dispenser 4a6ce93c c903 4af5 PDF
No ratings yet
Dog Shaped Coin Bank and Soap Dispenser 4a6ce93c c903 4af5 PDF
3 pages
326 - 7 - 052020 - Manuale Sweet Sauna Pro ENG
No ratings yet
326 - 7 - 052020 - Manuale Sweet Sauna Pro ENG
29 pages
HL Paper1
No ratings yet
HL Paper1
15 pages
5G Network Emulation Solutions Catalog
No ratings yet
5G Network Emulation Solutions Catalog
23 pages
Normal
No ratings yet
Normal
41 pages
Web Based Event Management System EMS
No ratings yet
Web Based Event Management System EMS
4 pages
Weekly Lesson Plan (Grade 10)
No ratings yet
Weekly Lesson Plan (Grade 10)
8 pages
Face Mask Detection
No ratings yet
Face Mask Detection
44 pages
Programming For Problem Solving B.tech 1ST SEM ALL STUDENT
No ratings yet
Programming For Problem Solving B.tech 1ST SEM ALL STUDENT
6 pages
Q. Explain Scope of Variable With An Example - Local, Global, Instance
No ratings yet
Q. Explain Scope of Variable With An Example - Local, Global, Instance
5 pages

Shimaa IsmailSemanticSimilarity

Uploaded by

Shimaa IsmailSemanticSimilarity

Uploaded by

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Arabic Semantic-Based Textual Similarity

Article in Benha Journal of Applied Sciences · April 2022

Shimaa Ismail Abdelwahab el sammak

SEE PROFILE SEE PROFILE

Cloud Computing View project

Steganography View project

The user has requested enhancement of the downloaded file.

Arabic Semantic-Based Textual Similarity

1. Introduction and determines how similar these characters are. The

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

Fig. (1) The Proposed approach Overview.

Fig. (2) A screenshot from the fastText Arabic Model dataset

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

Algorithm 1: Finding the closest words for a specific word

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

Algorithm 2: Finding the edit cost between two words

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

Language Researchers PCC

Input With Lemma Form PCC PCC + Permutation

Used Technique Research PCC

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

5. Conclusion information space to evaluate semantic textual

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

Benha Journal Of Applied Sciences, Vol. (7) Issue (4) (2022(

View publication stats

You might also like