
Multimedia Tools and Applications

https://doi.org/10.1007/s11042-021-10613-9

Hybrid method for text summarization based on statistical and semantic treatment

Nabil Alami · Mostafa El Mallahi · Hicham Amakdouf · Hassan Qjidaa

Received: 25 June 2020 / Revised: 16 October 2020 / Accepted: 25 January 2021

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021

Abstract
Text summarization presents several challenges, such as capturing semantic relationships among words and dealing with redundancy and information diversity. Seeking to overcome these problems, we propose in this paper a new graph-based Arabic summarization system that combines statistical and semantic analysis. The proposed approach utilizes the ontology's hierarchical structure and relations to provide a more accurate similarity measurement between terms in order to improve the quality of the summary. The proposed method is based on a two-dimensional graph model that makes use of statistical and semantic similarities. The statistical similarity is based on the content overlap between two sentences, while the semantic similarity is computed from the semantic information extracted from a lexical database, whose use enables our system to reason by measuring the semantic distance between real human concepts. The weighted ranking algorithm PageRank is run on the graph to produce a significance score for every sentence of the document, and the score of each sentence is then refined by adding other statistical features. In addition, we address redundancy and information diversity by using an adapted version of the Maximal Marginal Relevance method. Experimental results on EASC and our own datasets showed the effectiveness of our proposed approach over existing summarization systems.

Keywords Text summarization · NLP · WordNet · AWN · Graph-based ranking · Maximal marginal relevance

* Nabil Alami
[email protected]

Mostafa El Mallahi
[email protected]

Hicham Amakdouf
[email protected]
Hassan Qjidaa
[email protected]

Extended author information available on the last page of the article



1 Introduction

Text summarization (TS) is a challenging task in the area of natural language processing (NLP). It seeks to facilitate the reading and searching of information in large documents by generating reduced ones with no loss of meaning. Hovy [34] defined a summary as “a text that is produced from one or more texts which contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s)”. According to Mani and Maybury [42], TS is “the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks)”. Beyond text documents, automatic summarization can be applied to all kinds of media, such as speech, multimedia documents, hypertext or even video. Text summarization is necessary to address the information overload caused by the rapid growth of the Internet, the vast amount of online information it produces, and the large array of mass storage media. Since humans cannot handle large text volumes manually, they seek to save time and reduce cost with the help of automatic analysis methods. Such methods should find the most relevant information quickly, avoiding the need to read a whole document in order to decide whether it is of interest. Automatic summarization can also support other natural language processing tasks such as clustering, classification, indexing and keyword extraction. Luhn [40] was the first to tackle the automatic summarization of texts, which remains, to date, a dynamic field with many challenging issues.
Overall, text summarization techniques are divided into two major categories: extractive and abstractive. The former methods consist of extracting key sentences from the input document and presenting them, in their original order, as the summary. Deciding on the relevance of sentences rests on the weight of each one, as accounted for by statistical and linguistic features. Abstraction, however, involves rephrasing the most relevant parts of the source document, that is to say, digesting the major concepts of the initial text and presenting them in a shorter document; linguistic and statistical methods, in addition to human knowledge, are prerequisites in this respect. Whereas abstractive summarization needs heavy machinery for language generation and is not easy to implement or extend to larger domains, simple extraction of sentences has yielded positive results in large-scale applications, notably in multi-document summarization.
Research in Arabic text summarization is still in its infancy, and the literature addressing Arabic is fairly small and recent compared to that on English, Romance and Asian languages. Moreover, summarization systems for Arabic have not reached the same level of maturity and reliability as those for English, for instance. It is worth mentioning that Arabic is the mother tongue of all Arab countries, with a very fast growth pace on the web: the number of Arabic-speaking Internet users increased by 5296.6% between 2000 and 2013. Arabic is one of the five most spoken languages in the world and ranks fourth among the languages used on the Internet, after English, Chinese and Spanish. However, research on Arabic NLP is still embryonic because Arabic is not given the same attention as other languages. Therefore, the need to develop systems for the processing and summarization of electronic Arabic texts is growing significantly.
Generally speaking, when the summarization process is done by humans, they read the text first and understand it using some basic level of background knowledge. In this work, an attempt is made to enable the proposed system to understand the relationships between the different textual components of an Arabic document by providing human-constructed knowledge repositories and integrating concepts from a wide range of areas. Traditional Arabic summarization methods do not take into account the semantic relationships among words, so they cannot represent the meaning of documents accurately. To solve this issue, for several languages such as English, semantic information from ontologies such as WordNet (WN) has been introduced to improve the quality of text summarization. Furthermore, information redundancy and diversity is a typical problem for automatic text summarization (ATS), and especially for Arabic documents. To deal with this, the Maximal Marginal Relevance (MMR) method [11] is used during summary generation to reduce redundant information. Recently, graph-based ranking algorithms, motivated by the PageRank algorithm [10], have shown their effectiveness in text summarization [7, 21, 23, 46]. To construct a graph, a node is added for each sentence in the text, and the edges between nodes are established through sentence inter-connections defined by their relationships.
In this paper, we introduce a new graph-based Arabic summarization system that combines statistical and semantic analysis to achieve the summarization task and to avoid its usual problems: information redundancy and lack of diversity. In the first step, the proposed system performs document preprocessing. Second, it computes the similarity between each pair of sentences. For this, two types of similarity are adopted: (i) a statistical similarity based on the overlap of content between two sentences, and (ii) a semantic similarity measure based on the semantic information extracted from Arabic WordNet (AWN). If an Arabic word does not exist in the AWN ontology, the proposed algorithm uses a machine translation from Arabic to English and interrogates the WordNet ontology so as to calculate the semantic similarity. The proposed algorithm thus converts the text into a graph model with a two-fold relation between sentences: statistical similarity and semantic similarity. Subsequently, the weighted ranking algorithm PageRank [10] is executed on the graph to produce a relevance score for each sentence in the input text. Then, each individual sentence score is improved by adding other statistical features. The top-ranking sentences are identified to form the summary of the input document, and an adapted version of the MMR algorithm [11] is applied to eliminate unneeded information and enhance the quality of the final summary. The main contributions of this research include:

• Proposing a domain-independent method that does not need any domain-specific knowledge or features.
• Proposing a hybrid framework that combines different extractive approaches: graph-based, statistical-based and semantic-based.
• Dealing with the lack of semantic resources dedicated to Arabic by using machine translation between Arabic and English, to benefit from the richness and opportunities offered by the English language in this field.
• Proposing a new semantic similarity measure between two Arabic sentences.
• Proposing a sentence selection strategy that addresses the redundancy elimination problem and increases the information diversity of the output summary.
• Adopting a new graph model based on both statistical and semantic similarities to make a two-dimensional graph representation of the input text as a set of sentences linked by meaningful relations.

The rest of this paper is organized as follows. Section 2 briefly overviews the text summarization literature, with a focus on Arabic, and discusses the limitations of current approaches. Section 3 describes the proposed Arabic text summarization system in detail. System evaluation and the experimental findings are dealt with in Section 4. Finally, Section 5 concludes with pointers to future work.

2 Related work

Automatic summarization of textual data can be classified into several approaches, among which we mention: statistical-based, graph-based, semantic-based, topic-based, machine-learning-based and deep-learning-based. Compared to research on English, works on ATS for Arabic are recent. In this section, we briefly review some works related to these kinds of approaches, with a focus on Arabic. Then, we report the characteristics of Arabic and discuss some limitations of the current approaches.

2.1 Automatic text summarization approaches

2.1.1 Statistical-based approaches

These methods use statistical analysis of a set of features to extract important sentences and words from the input document. They rely essentially on the calculation of a score associated with each sentence in order to estimate its importance in the text; the final summary keeps only the sentences with the highest scores. These traditional approaches were among the first applied to the automatic summarization task [15, 40]. They are language-independent and do not require any extra linguistic knowledge or complex linguistic processing [22]. However, similar sentences with high scores may all be included in the summary, while other important ideas of the document may not be selected. In Afsharizadeh et al. [1], a query-oriented text summarization method is proposed. The authors extracted the most informative sentences by combining the weights of the 11 best features: document feature, sentence position, normalized sentence length, numerical data, proper noun, topic frequency, topic token frequency, headline frequency, start cluster frequency, skip bi-gram topic frequency, and cluster frequency. Heu et al. [33] proposed a multi-document summarization system (FoDoSu) based on folksonomy, which employs tag clusters generated by the well-known Flickr application in order to detect important sentences in a document set. After the pre-processing step, the system constructs a Word Frequency Table (WFT) and uses the HITS algorithm to discover the semantic relationships between words with the help of tag clusters from Flickr. Each sentence is then scored according to the importance of its words and their semantic relatedness to words in other sentences. Experiments on the TAC 2008 and 2009 datasets show that the proposed system outperforms existing state-of-the-art systems.
Regarding Arabic, Douzidia and Lapalme [14] presented what is, to our knowledge, the first such system designed for Arabic text. It uses a linear combination of several sentence-scoring features: term frequency, sentence position, cue words and title words. In [20], the authors proposed two Arabic text summarization systems: AQBTSS, a query-based Arabic single-document summarizer, and ACBTSS, a concept-based summarizer. The first system takes an Arabic document and an Arabic user query in order to generate a summary of the document in agreement with the given query. The second is based on a bag-of-words representation of some concepts proposed by the authors. In both systems, all sentences are scored according to the vector space model (VSM), and the relevance of the summary is tied to the input query or concept. The authors of [47] proposed a statistical approach that deals with both single- and multi-document summarization for Arabic. The system extracts the summary sentences by ranking the terms of each sentence. The term scoring process is based on both a clustering technique and an adapted discriminant analysis method, mRMR (minimum redundancy and maximum relevance) [51]. The experimental results on EASC (Essex Arabic Summaries Corpus) for single-document summarization and the TAC 2011 MultiLing datasets for multi-document summarization showed that the suggested approach is competitive with standard systems and outperformed the lead baseline.

2.1.2 Graph-based approaches

Graph theory has been successfully applied to many NLP tasks in order to represent the semantic relationships between the different units of a document, and such representations are commonly used for extractive summarization. In graph-based summarization, textual units and the relations between them are represented as an undirected graph: each unit is represented by a node, and the relation between two units is modeled by an edge between them. Many kinds of relations can be considered, such as the number of common words or the cosine similarity. These kinds of approaches are domain- and language-independent and can enhance coherence by detecting redundant information. Their main disadvantage is that they represent sentences as bags of words and compute the similarity measure without considering the importance of words, which may fail to detect semantically similar sentences; the chosen sentences thus depend on the accuracy of the similarity computation. LexRank and TextRank are the most important graph-based automatic summarization methods. TextRank [43] is a graph-based ranking model used for both automatic text summarization and keyword extraction. In the summarization task, each sentence is represented by a node in the graph, and the edge between two nodes represents their similarity relation, measured as the content overlap between the two sentences. The weight of each edge indicates the importance of the relationship. Sentences are ranked based on their scores, and those with the highest scores are chosen. LexRank [23] is another automatic summarization system, similar to TextRank; both use a graph-based approach for TS. The difference between them is that the similarity measure used by TextRank is based on the number of words shared by two sentences, while LexRank uses the cosine similarity of TF-IDF vectors. The sentences are ranked according to the PageRank [10] algorithm, which is generally used to rank graph elements. Rinaldi and Russo [53] used a multimedia semantic graph for web document visualization and summarization. The authors used both statistical and semantic analysis of textual and visual contents in order to build a Visual Semantic Tag Cloud from the most important terms in a document. The semantic information is derived from a knowledge base where concepts are represented through several multimedia items. The authors demonstrated that, with the help of semantic analysis, the proposed approach improves user knowledge acquisition by means of a synthesized visualization. The robustness of the approach was shown by outperforming state-of-the-art topic detection algorithms combining both visual and semantic information. El-Kassas et al. [21] proposed a novel extractive graph-based system called “EdgeSumm” that relies on four proposed algorithms. After the preprocessing step, the system builds a new graph representation of the input text using NLP techniques such as lemmatization and POS tagging. Then, the weight of each node in the graph is calculated based on several features (title, keywords, bi-grams, etc.). After that, the system searches the constructed text graph for candidate sentences to be included in the summary. Finally, the system reorders the sentences and concatenates them to generate the final summary. EdgeSumm is designed for any domain addressed by the document and, being an unsupervised technique, does not require any training data. The evaluation results showed that the proposed system outperforms state-of-the-art ATS systems in terms of ROUGE-1 and ROUGE-L.
Regarding Arabic, Elbarougy et al. [16] presented an extractive text summarization system designed for Arabic. The authors proposed a graph-based document representation model in which the vertices are the sentences. A modified PageRank algorithm is applied, with the initial score of each node being the number of nouns in its sentence. The weight of each edge between two sentences is calculated based on the cosine similarity between the sentences. The proposed method performs text summarization in three major steps: a preprocessing step; a feature extraction and graph construction step; and finally the application of the new modified PageRank algorithm and summary generation. Experimental results showed that the proposed approach gives better results when compared with state-of-the-art Arabic text summarization methods.

2.1.3 Semantic-based approaches

The idea behind semantic-based approaches is to represent the documents to be summarized by a semantic representation and feed this representation into a summarization technique to produce a final summary. Latent Semantic Analysis (LSA) is a well-known language-independent technique that generates a semantic representation based on the observed co-occurrence of words [22]. The main drawback of LSA-based summarization is that the generated summary depends on the quality of the semantic representation of the input document. Semantic Role Labelling (SRL) is also used for semantic-based text summarization. In [45], the authors used Explicit Semantic Analysis (ESA) to represent words as vectors of weighted Wikipedia concepts, then performed sentence semantic parsing using SRL to construct a semantic representation of the input documents. SRL provides good semantic relationships between phrases or sentences, but the quality of the final summary depends on the quality of the semantic representation of the documents to be summarized. Other semantic-based techniques using WordNet have been introduced for ATS. Baruah et al. [8] presented a new text summarization approach using the Assamese WordNet. The method improves the word frequency calculation by using synsets extracted from the Assamese WordNet. After identifying and weighting the salient sentences, a graph representation of the input text is built based on the vector centrality concept and the cosine similarity measure. Experimental results demonstrated the effectiveness of their proposed method.
Recently, many works related to semantic representation have been carried out. Kang and Nguyen [36] proposed a random forest technique that learns the weights, shapes, and sparsities of feature representations for real-time semantic segmentation. Gao et al. [30] presented an elaborately designed deep neural network architecture for a new co-saliency detection approach for surveillance scene analysis, in the context of a trustful Internet-of-Things designed for smart cities. The authors used multi-stage context perception to extract semantic features from images provided by numerous IoT devices. In Gao et al. [31], the authors introduced a distributed deep-learning-based framework for salient object detection combining cloud and edge computing. The media data are allocated to different servers for training the detection model. The proposed framework enables a hierarchical information allocation strategy in the cloud and proposes a novel pyramidal deep learning model in order to effectively capture the global contextual features of the salient object. Thus, the balance between learning within-semantic knowledge and cross-semantic knowledge was improved.

2.1.4 Topic-based approaches

Fang et al. [25] developed Topic Aspect-Oriented Summarization (TAOS), using various features (topic factors) that describe different topics. These topics can have different aspects represented by various preferences of features. First, the system extracts various groups of features, and then selects common groups of features according to a selected group norm penalty and latent variables. This approach is used for text as well as image summarization. For document summarization, the authors extracted three features: sentence length, sentence position and word frequency. For image summarization, they used the Histogram of Oriented Gradients (HOG), bag-of-visual-words and color histograms. In order to generate the summary, a greedy algorithm is implemented, considering coverage and diversity. Experimental results show that the proposed method outperforms both document and image summarization methods. In another work, a topic-model-based approach for novel summarization is proposed by Wu et al. [58]. The summarization task is performed in three main steps. The first step consists of topic modeling, where candidate sentences associated with topic words are extracted based on the probability distribution of each topic word in the input novel. The second step adopts an importance evaluation function in order to choose the most relevant sentences from the candidates and thus produce an initial summary of the novel. In the final step, the proposed method refines the initial summary to improve its readability by dealing with the semantic confusion caused by synonymous or ambiguous words. Experimental results on real datasets showed that the proposed approach produces not only a higher compression ratio but also better summarization quality compared to other candidate methods. Rani and Lobiyal [52] proposed an extractive text summarization approach for the Hindi language. They used topic modeling with LDA to tag lexical and important information from Hindi novels and stories. Since Hindi suffers from the unavailability of corpora and the inadequacy of processing tools, the authors prepared their own corpus and tools. Experiments showed optimal results compared with the baselines and the traditional topic modeling approach.

2.1.5 Machine learning approaches

The idea behind these approaches is to use a training dataset to train the summarization system, which is modeled as a supervised classification problem. After the training phase, the system can classify sentences into two groups: summary and non-summary sentences. The probability of selecting a sentence for inclusion in the summary is estimated from the training documents and their manually generated summaries. The main drawback of machine-learning-based approaches is that they require a large training set with manually extracted summaries, such that each sentence in the training dataset is labeled as summary or non-summary, in order to improve sentence selection for the final summary. Yang et al. [59] presented an extractive ATS method that proceeds as follows. First, it uses a new text representation model called the “Enhanced Sentence Embedding model”, based on word embedding and three types of semantic features: POS tags, n-grams and sentence position. Second, it uses a cascade forest as a classification algorithm to decide whether a sentence should be included in the summary or not.
Due to its complexity, few works based on machine learning have been done for Arabic text summarization. Several models for ATS using machine learning were investigated in [27]. The authors extracted ten features, such as sentence position, keywords, sentence resemblance to the title and to other sentences, and named entities. They used their own corpus, composed of 100 Arabic documents and 50 English documents, to train several models: genetic algorithm, mathematical regression, feed-forward neural network, probabilistic neural network, and Gaussian mixture model. Sentences are then ranked according to the trained models. The experiments showed promising results, especially when using the Gaussian mixture model. It should be noted that the authors used a limited corpus to train their models (100 Arabic documents and 50 English documents), which negatively impacts the performance of machine-learning-based models, since they require a large amount of training data. El-Fishawy et al. [18] introduced a machine-learning-based method for summarizing Arabic micro-blogging posts. The summarization task is formulated as a regression problem: instead of classifying sentences as summary or non-summary, the authors give each tweet a score that determines whether it may be included in the summary. The proposed method combines a conventional decision tree with linear regression functions. Experimental results indicate that the proposed method achieves better performance compared to manual summarization and other competitors.

2.1.6 Deep learning approaches

Other kinds of approaches, based on deep learning and neural networks, have been proposed in recent years. In [60], the authors proposed a text summarization system based on a deep neural network. They used an unsupervised approach based on a deep auto-encoder (AE) to compute a feature space from the input representation of each sentence of a document. Various input representations are explored as input to the AE: tf-idf using a global vocabulary with different lengths, and term frequency using a local vocabulary (Ltf). After training the AE, each sentence is mapped into its latent space in order to calculate the semantic similarity between sentences. The sentences are ranked according to their semantic similarity with the query. Experimental results show that the approach can be further improved by using an Ensemble Noisy Auto-Encoder (ENAE), which consists of adding noise to the input text and selecting the top-ranked sentences from several runs. The authors of [55] presented a deep learning method for abstractive text summarization. After the text pre-processing step, the proposed approach uses a new phrase extraction method to decompose the sentences of the original document into a sequence of phrases by constructing a binary semantic tree. Because the extraction method cannot guarantee that the extracted phrases are syntactically and semantically correct, the authors refine phrases that are redundant in semantics or syntactic structure before training the model. Then a new deep learning model composed of a CNN and an LSTM is adopted in order to learn a representation of phrases rather than words; the use of phrases enables the proposed model to produce natural sentences. Experimental results on the CNN and DailyMail datasets showed that the proposed approach outperforms state-of-the-art models in terms of both syntactic and semantic structure.

Few works based on deep learning have been done for Arabic text summarization. Alami et al. [3] presented a novel Arabic summarization method based on a variational auto-encoder (VAE) model that learns a feature space from high-dimensional input data. The authors explored several input representations, such as term frequency (tf) and tf-idf, with both local and global vocabularies. All sentences are ranked according to their semantic similarity in the concept space produced by the VAE. Experiments on two kinds of datasets showed that the VAE using the tf-idf representation of global vocabularies provides a more discriminative feature space and improves the recall of other models. The obtained results confirm that the proposed method outperforms state-of-the-art approaches for both graph-based and query-based summarization techniques. In another work, Alami et al. [4] adopted several unsupervised deep neural network models with word embeddings and ensemble learning to improve the performance of Arabic document summarization. They used two types of document representation: a sentence representation built with the BOW approach and a Sentence2vec representation built with the word2vec approach. The authors used an auto-encoder, a variational auto-encoder and an extreme learning machine auto-encoder to learn the latent semantic representation of documents. They also proposed three ensemble techniques: the first combines BOW and Sentence2vec using majority voting; the second combines the information from BOW and the unsupervised neural networks; the third aggregates the information provided by Sentence2vec and the adopted unsupervised neural network models. Experimental results on Arabic and English datasets showed that: (i) feature learning using unsupervised neural networks improves the summarization task; (ii) unsupervised neural network models trained on the Sentence2vec representation give better results than those trained on the BOW representation; (iii) ensemble methods improve the performance of automatic text summarization. In particular, the ensemble based on the Sentence2vec model significantly outperforms the other summarizers on both Arabic and English datasets. Although this approach is robust and gives good results, it is time consuming and needs a large amount of data in the learning stage to build a relevant semantic concept space. Unlike the present approach, it includes neither statistical features nor a human-developed knowledge base for explicit semantic analysis. The main drawbacks of deep-learning-based approaches are: (i) they require huge training datasets, and thus remarkable human effort to manually build and annotate the training data; (ii) training neural networks is slow and very time consuming; (iii) it is hard to explain how the network reaches a decision.

2.1.7 Hybrid approaches

Hybrid approaches combine two or more techniques in order to take advantage of each. Patel et al. [49] explored both statistical features and a fuzzy logic system for sentence scoring in multi-document summarization. The method uses a redundancy removal technique based on cosine similarity to enrich the summarization output. The experimental results showed that the proposed framework achieves a significant performance improvement over the other summarizers. In the work proposed by [26], several features are taken into account: word frequency in the whole document, similarity with the title, similarity of words among sentences and paragraphs, sentence position, existence of cue phrases and the occurrence of non-essential information. The author investigated the effect of combining these features in several summarization models, such as the naive Bayes classifier, maximum entropy and the SVM model. The summary is then generated by combining the three models. Performance evaluation on the DUC 2002 dataset shows promising results compared with some existing techniques.
Many hybrid ATS systems have been proposed for Arabic. Ibrahim and Elghazaly [35] followed a hybrid approach that uses two summarization techniques: the RST technique for rhetorical representation and the Vector Space Model (VSM) technique for vector representation. The former builds a Rhetorical Structure Tree (RS-Tree) of the input text and constructs the summary from the most significant paragraphs. The latter builds a text representation using the VSM and the cosine similarity measure. The experimental results showed that the rhetorical representation method performs better, first in terms of precision and second in terms of the quality of the produced summaries, which are more readable than those of the vector representation; however, the latter performed better on long articles. Al-Radaideh and Bataineh [6] presented a single-document summarization system based on a hybrid approach. The authors extract important sentences by combining domain knowledge, statistical features, and genetic algorithms. The experimental results showed that using domain knowledge improves the performance of summarizing Arabic political documents: combining domain knowledge (a set of Arabic political keywords) with statistical features achieved better performance than the results obtained without incorporating domain knowledge. Further experiments compared the proposed system against existing Arabic summarization methods, demonstrating two principal points. First, the combination of the three approaches (semantic similarity, statistical features, and genetic algorithm) outperformed some existing Arabic summarization methods. Second, Arabic summarization based on the genetic algorithm outperformed graph-based summarization. The main advantage of hybrid approaches is that they combine the advantages of two or more approaches, which become complementary to each other and increase summarization performance.

2.2 Characteristics of Arabic

2.2.1 Absence of diacritics

Letters in Arabic are accompanied by signs placed below or above them to distinguish a word from another homonym in terms of meaning and pronunciation. Diacritical marks are needed for morphology, semantic analysis and other linguistic and phonetic features [12]. Indeed, a word without diacritics can take several forms once these marks are added. Table 1 illustrates the morphological analysis of the word “kataba” (ktb) using the Alkhalil analyzer [9]; it shows some of the 17 results returned by the analyzer.

Table 1 Morphological analysis of the word kataba / “ ”



Usually, Arabic texts do not incorporate diacritical marks. Habash [32] mentioned that diacritics are present in only 2% of Arabic texts; they are written only in certain contexts, such as the Quran, the Hadith and some books used to teach Arabic. The absence of diacritical marks is one of the biggest problems making Arabic NLP more complicated.

2.2.2 Agglutination

Arabic is a highly derivational and inflectional language, which makes NLP tasks such as lemmatization and stemming more difficult. The number of roots in Arabic is approximately 10,000, and by adding affixes to these roots, approximately 120 patterns are obtained [54]. Unlike English and French words, Arabic words are composed of several morphemes representing the lexical elements [54]. Indeed, articles, prepositions and pronouns are attached to the adjectives, nouns, verbs and particles to which they relate, constituting a new lexical unit that conveys several pieces of morpho-syntactic information. A single Arabic word can therefore represent a complete English or French statement; for example, the word “ ” means “Do you remember us?” in English. This increases the complexity of Arabic NLP tasks such as segmentation and stemming, and can increase morphological ambiguity: it is sometimes difficult to distinguish a proclitic or enclitic from an original character of the word. A word in Arabic is composed of its basic form (a stem derived from the root), to which various prefixes, suffixes, proclitics and enclitics are attached. For example, the character “ ” is part of the word “ ” (he arrived), while in the word “ ” (and he opened), it is a proclitic.

2.2.3 Irregularity of the word order in the sentence

In Arabic, the words of a sentence can be arranged in different ways while keeping the same meaning. Indeed, word order is relatively free within a sentence. This creates syntactic ambiguity, since the grammar must account for all possible word-order combinations. Table 2 shows how the word order of a sentence can be changed to obtain three sentences with the same meaning.

2.2.4 Irregular punctuation

Arabic text is also characterized by the irregular use of punctuation marks, which were introduced only recently into the Arabic writing system. One can find an entire Arabic paragraph containing no punctuation except for a dot at its end. Thus, the presence of punctuation marks cannot guide sentence segmentation as it does in Latin-script languages such as French or English. The segmentation of Arabic texts must rely not only on punctuation marks and typographic markers, but also on particles and certain words such as subordination and coordination conjunctions. These particles are attached to the inflected forms of words (agglutination of morphemes) and require rigorous morphological analysis to be identified.

Table 2 Irregularity of the word order in the sentence

2.3 Synthesis and limitations of current approaches

It is clear from Table 3 that most studies in Arabic text summarization rely on the bag-of-words (BOW) approach. The BOW approach is based on the words present in the document; one of its obvious disadvantages is that it overlooks the semantic relationships among words, so its representation of document meaning is not accurate. The system is always limited to the words explicitly mentioned in the input document. For instance, if the system cannot find the relationship between terms like « » (petroleum) and « » (oil), it will handle them as two different, unrelated terms, which may negatively affect their importance in the input document. Similarly, in supervised approaches like [18, 27], training is a decisive step for improving the precision of the system. In these kinds of approaches, all words that appear in the testing documents but not in the training documents are ignored, and no information beyond what is already available in the test documents is considered. On the other hand, several semantic-based approaches have been introduced for automatic summarization. WordNet [44] is one of the most widely used thesauri for English. Thanks to its semantic relations between terms, it has been heavily used to improve the quality of several NLP applications, such as automatic text summarization [8, 24, 28, 48, 50], text clustering [56], word sense disambiguation [13] and other NLP tasks [29]. Thus, a semantic approach based on concepts extracted from Arabic WordNet deserves to be investigated for the automatic summarization of Arabic documents. In addition, as mentioned in Section 1, traditional Arabic summarization systems do not address redundancy and information diversity. When producing the summary of an input text, similar sentences are likely to appear together in the output because they obtain similar scores; other ideas of the document may then not be selected, and relevant information may be overlooked and excluded from a summary that is supposed to encapsulate the maximum amount of information from the input text. Hence, redundancy elimination has become crucial in text summarization, especially for Arabic, with the aim of diversifying the information included in the summary.
To overcome these drawbacks, we introduce a new Arabic summarization system that considers both statistical and semantic relationships between words. For this purpose, we build a two-dimensional graph representation of the input document, where the graph captures two kinds of similarity between sentences: statistical similarity and semantic similarity. All sentences are ranked according to their importance in the graph and to other statistical features. The final summary is built by including the highest-scoring sentences, using a technique based on the MMR algorithm to deal with redundancy and information diversity.

3 A new Arabic text summarization system

The proposed hybrid framework combines different extractive methods: graph-based, statistical-based and semantic-based. It aims to increase the information diversity of the output

Table 3 Comparison between different Arabic summarization systems

Reference | Summarization approach | Summarization technique | Corpus | Evaluation technique
Douzidia and Lapalme [14] | Statistical-based | Sentence position, term frequency, title words and cue words | DUC 2004 corpus | ROUGE
El-Haj et al. [20] | Statistical-based | Query-based and concept-based | Authors' corpus | Manual
Oufaida et al. [47] | Statistical-based | Minimal-redundancy maximal-relevance (mRMR) | EASC corpus and TAC 2011 MultiLing Pilot corpus | ROUGE-1, ROUGE-2
Elbarougy et al. [16] | Graph-based | Modified PageRank algorithm | EASC corpus | Precision, Recall, F-measure
Ibrahim and Elghazaly [35] | Hybrid | RST, VSM | Authors' corpus | Precision
Al-Radaideh and Bataineh [6] | Hybrid | Domain knowledge, statistical features, genetic algorithms | EASC corpus, KALIMAT corpus | ROUGE-1, ROUGE-2
Fattah and Ren [27] | Machine learning | Probabilistic neural networks, feed-forward neural networks, Gaussian mixture, mathematical regression, genetic programming | Authors' corpus and DUC 2001 corpus | Precision, Recall, ROUGE-1
El-Fishawy et al. [18] | Machine learning | Similarity between tweets; decision tree with linear regression | Authors' corpus | F-measure, Normalized Discounted Cumulative Gain
Alami et al. [3] | Deep learning-based | Unsupervised feature learning, variational auto-encoder | EASC corpus, authors' corpus | ROUGE-1
Alami et al. [4] | Deep learning-based | Unsupervised feature learning, word2vec, auto-encoder, variational auto-encoder, extreme learning machine, ensemble learning | EASC corpus, English corpus | ROUGE-1, ROUGE-2

summaries, addressing redundancy and considering the semantic relationships among words.
Our proposed system has many advantages:

• It needs no training data or annotated corpus.
• It is domain-independent: no domain-specific knowledge needs to be taken into account.
• It is knowledge-rich: unlike existing methods, the semantic relationships among words are considered, and semantic information from a human-developed knowledge base is introduced so as to represent the meaning of documents accurately and improve the quality of Arabic text summarization.
• It deals with the lack of semantic resources dedicated to Arabic by using machine translation between Arabic and English, to benefit from the richness and opportunities offered by the English language in this field.
• It uses a graph model based on both statistical and semantic similarities to make a two-dimensional graph representation of the input text as a set of sentences linked by meaningful relations.
• Statistical features are used in combination with the ranking algorithm on the graph model to perform summarization.
• It uses the MMR method to address the redundancy and information diversity problems.

3.1 System architecture

Figure 1 illustrates the general architecture of the proposed method. A large Arabic document constitutes the system input, while the output is a small Arabic document containing the major ideas of the original text. The following sections explain each stage of the proposed system in detail. The method consists of the three main steps below:

i. Pre-processing: the aim of this step is to prepare the original text for the later steps. It consists of tokenizing the input document, removing stop words so as to reduce the size of the input document, and finally extracting the root of each word.
ii. Analysis: in this stage, a number of statistical features are extracted, the semantic and statistical similarity between the sentences of the original text is computed, and a two-dimensional graph representing the input document is built. The ranking algorithm is then run on the resulting graph in order to rank each sentence against the others. The final score of each sentence is computed by adding the weights of the extracted statistical features.
iii. Post-processing: the goal of this stage is to generate the final summary according to the score of each sentence while avoiding information redundancy (a sketch of MMR-style sentence selection follows this list).
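To make step iii concrete, here is a minimal Python sketch of MMR-style selection. The paper uses an adapted version of MMR [11]; the trade-off parameter `lam` and the similarity function used here are generic assumptions, not the authors' exact settings.

```python
def mmr_select(scores, sim, k, lam=0.7):
    """Greedy MMR: balance relevance against redundancy with already-picked sentences.

    scores: dict mapping sentence index -> relevance score
    sim(i, j): similarity between sentences i and j
    k: number of sentences to keep
    """
    selected, candidates = [], set(scores)
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda i: lam * scores[i]
            - (1 - lam) * max((sim(i, j) for j in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # restore document order for the final summary
```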

3.2 Preprocessing phase

The preprocessing phase consists of cleaning the source documents, as well as splitting and tokenizing the sentences. In our system, the sentence is the extraction unit and the term is the scoring unit. We implement this phase in three steps:

[Fig. 1 shows the system pipeline: a preprocessing module (tokenization, stop-word removal, root extraction), an analysis module (feature extraction, statistical and semantic similarity measurement against AWN, with a fallback to machine translation and WordNet when a word does not exist in AWN, graph construction and sentence ranking), and a post-processing module (redundancy elimination and summary generation).]

Fig. 1 The main steps of the proposed Arabic summarization system



3.2.1 Tokenization

The tokenization process consists of dividing the text into tokens. The input text is normalized in two steps: first, all punctuation marks, non-letter characters and diacritics are removed; second, some characters are replaced by their normalized forms ( ). In our system, and depending on the datasets used, we consider the character “.” as a sentence separator and the space character as a word separator. This makes the splitting process easy: the document is segmented into sentences and each sentence into words.
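As an illustration, a minimal Python sketch of this step follows. The exact set of normalized characters is not visible in the extracted text, so the substitutions below (the diacritic range and alef variants) are common choices and only assumptions.

```python
import re

DIACRITICS = re.compile(r'[\u064B-\u0652]')  # fathatan ... sukun

def tokenize(text):
    text = DIACRITICS.sub('', text)                         # remove diacritics
    text = re.sub('[\u0623\u0625\u0622]', '\u0627', text)   # normalize alef variants
    sentences = [s.strip() for s in text.split('.') if s.strip()]  # "." splits sentences
    return [s.split() for s in sentences]   # whitespace splits each sentence into words
```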

3.2.2 Stop words removal

Stop words are very common words with a mainly structural function; they recur in texts, carry little meaning, and their function is purely syntactic. They do not indicate the subject matter and add no value to the content of a document. In Arabic, words like ( ) are frequent in sentences but contribute little to the meaning of a document. These words can be deleted from the text to help identify the most meaningful words during summarization. There is no standard list of Arabic stop words; in this work, we simply use the list of 168 words proposed by [38].

3.2.3 Root extraction

Words in Arabic are generally derived from a root, which serves as a base for diverse words with related meanings. A set of derivations covering the same semantic area can be constructed by adding affixes to the root. Identifying the root of an Arabic word (stemming) helps map its grammatical variations to instances of the same term. As shown in Table 4, the three Arabic words “ ” are all related to the same root “ ”. This means that the multiple derivations of Arabic word structures make a semantic representation of the text possible, and the quality and performance of a text summarization task can be positively impacted by an adequate representation of Arabic text. Moreover, since words sharing the same root are semantically related, using the root in feature selection can improve the accuracy of similarity measurement and frequency analysis, because a word can occur several times in an Arabic text in different forms.
It should be noted that determining the root of an Arabic word is a difficult task, as it requires detailed morphological, syntactic and semantic analysis; word stemming is considered one of the most difficult problems in Arabic, and a wide body of research has been carried out in this field. We used the Khoja stemmer presented by [37], a root-based stemmer that extracts roots by pattern matching and affix removal.

Table 4 Different derivations of the root “ ”
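The Khoja stemmer itself is not bundled with common Python toolkits; as a hedged illustration of the step, NLTK's ISRI stemmer is a comparable root-based alternative:

```python
# Root extraction sketch. NLTK's ISRI stemmer is used here as a stand-in for
# the Khoja stemmer [37]; both are root-based and strip affixes.
from nltk.stem.isri import ISRIStemmer

stemmer = ISRIStemmer()

def extract_roots(tokens):
    # Derivations of the same root (e.g. the k-t-b family) should collapse
    # to a single form, improving frequency and similarity analysis.
    return [stemmer.stem(t) for t in tokens]
```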

3.3 Analysis phase

After preprocessing the input Arabic document, the analysis phase scores the sentences based on a computed set of features. Each sentence is given two kinds of score: a statistical score and a semantic score. This section describes the chosen statistical features and how the semantic and statistical similarities between two sentences are computed. The main steps of this phase are:

• Extract the statistical features and give an initial score to each sentence. The statistical score is a linear combination of the weights given to each extracted feature.
• Calculate the statistical similarity between each pair of sentences.
• Calculate the semantic similarity between each pair of sentences.
• Build a two-dimensional graph based on both statistical and semantic similarities.
• Apply a random walk on the graph using the PageRank algorithm, so that all sentences are ranked according to the semantic and statistical relationships between them (a sketch of this step follows the list).
• Compute the final score of each sentence.
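The following sketch shows how the ranking step can be realized with networkx. Merging the two similarities into a single edge weight through the coefficient `alpha` is an assumption made for illustration; the paper's exact combination scheme may differ.

```python
import networkx as nx

def rank_sentences(stat_sim, sem_sim, alpha=0.5):
    """stat_sim, sem_sim: n x n matrices of pairwise sentence similarities."""
    n = len(stat_sim)
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            w = alpha * stat_sim[i][j] + (1 - alpha) * sem_sim[i][j]
            if w > 0:
                g.add_edge(i, j, weight=w)
    # Weighted PageRank over the two-dimensional similarity graph.
    return nx.pagerank(g, weight='weight')  # dict: sentence index -> score
```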

3.3.1 Statistical features

Each sentence has a statistical score according to two features: TF.ISF and sentence position.

TF.ISF (term frequency / inverse sentence frequency) TF-IDF is a statistical measure often used in text mining and information retrieval. It evaluates how important a term is to a document in a collection or corpus, and is obtained by multiplying the term frequency (TF) by the inverse document frequency (IDF). TF is the number of times the term appears in the document; the assumption is that a word becomes more important as its number of occurrences in a document grows. IDF represents the importance of a term within a document collection or corpus: it counterbalances local frequencies, which would otherwise inflate the importance of a term simply because it is frequent in a single document. IDF is obtained by comparing the number of documents containing the term with the total number of documents in the corpus. In this paper, we use the Inverse Sentence Frequency (ISF), which is defined like IDF with a set of sentences substituted for the set of documents: ISF measures the importance of a term within the sentence collection. The formula below computes the ISF of each word in a sentence [5]:

\[ \mathrm{ISF}_{w_i} = \log_2\left(\frac{N}{df_{w_i}}\right) \tag{1} \]

where N is the number of sentences in the document and df_{w_i} is the number of sentences in which the word w_i appears.
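A minimal sketch of Eq. (1) in Python, assuming sentences are given as lists of stemmed words:

```python
import math

def isf(sentences):
    """ISF(w) = log2(N / df_w), Eq. (1); df_w counts the sentences containing w."""
    n = len(sentences)
    df = {}
    for sent in sentences:
        for w in set(sent):      # each word counted once per sentence
            df[w] = df.get(w, 0) + 1
    return {w: math.log2(n / f) for w, f in df.items()}
```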

Sentence position Sentences that are closely related to the topic of the document can often be identified by their position in the document. Accordingly, sentence position is considered when computing the score of each sentence. We consider the first and last sentences to be the most related to the topic, so their weight is high.

3.3.2 Statistical similarity

The statistical similarity represents the number of words shared by two sentences. Formally, the statistical similarity between two sentences S_i and S_j, as described in [43], is:

\[ \mathrm{Similarity}(S_i, S_j) = \frac{\left|\{\, w_k \mid w_k \in S_i \cap S_j \,\}\right|}{\log(|S_i|) + \log(|S_j|)} \tag{2} \]

where |S_i| is the length of the sentence S_i.


Other sentence similarity measures are also possible and could be interesting, such as the Euclidean distance, the Jaccard coefficient or the cosine similarity. The measure of Eq. (2) was chosen as a simple and fast alternative to these similarity measures.
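Eq. (2) translates directly into code; the guard for degenerate cases (a one-word sentence gives log 1 = 0) is a practical assumption added here:

```python
import math

def statistical_similarity(si, sj):
    """Content-overlap similarity of Eq. (2); si and sj are lists of words."""
    overlap = len(set(si) & set(sj))
    denom = math.log(len(si)) + math.log(len(sj))
    return overlap / denom if denom > 0 else 0.0
```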

3.3.3 Semantic similarity

Semantic similarity is becoming more and more popular and plays a significant role in different NLP tasks such as information retrieval, information extraction, text summarization and text clustering. That two sentences (or two documents) have no common terms does not necessarily mean that they are not semantically related. Semantic similarity involves measuring the relationship between lexicographically dissimilar concepts: two terms can be semantically similar (e.g., synonyms or terms with similar meanings) despite their lexicographic difference. Summarization using only classical methods will therefore fail to recover sentences (or documents) with semantically similar terms; this is one of the problems this work addresses. Several measures have been proposed to quantify semantic similarity based on an ontology hierarchy; some utilize the taxonomies within WordNet and the relations defined between its units. WordNet (WN) is a hierarchically structured repository created by linguistic experts, whose richness stems from its explicitly defined lexical relations; WN-based measures have been used on a wide scale in NLP applications. Arabic WordNet (AWN) builds on the Princeton WordNet [44] to provide a lexical resource for Modern Standard Arabic. In this work, we have integrated the two lexical databases, WN and AWN, to take the semantic relationships between terms into consideration and provide a more accurate similarity measurement between sentences.

Words semantic similarity In this work, we adopt a concept-based representation model to calculate the semantic similarity between terms: each term is replaced by its associated concepts in AWN. Two stages are required for such a representation:

i) The first is the projection of terms into concepts, where each term is substituted by the corresponding vector of concepts (words that do not appear in AWN are excluded);

ii) The second is the application of a disambiguation strategy, in order to assign a single concept to each term and avoid the loss of information caused by replacing a term with a list of concepts. In our approach, we adopted the “first concept” strategy as a simple disambiguation method: AWN provides, for every word, a list of concepts ordered from the most to the least appropriate, and this strategy keeps only the first concept of the list [17] (see the sketch after this list). Tables 5 and 6 show the projection of two terms (“ ”) and their associated concepts in AWN.
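As an illustrative sketch, the projection and the “first concept” strategy can be expressed through NLTK's multilingual WordNet interface. Querying AWN through the 'arb' language code of the Open Multilingual WordNet is an assumption here, since the paper accesses AWN directly.

```python
from nltk.corpus import wordnet as wn  # requires the 'wordnet' and 'omw-1.4' data

def first_concept(word):
    """Project a word into its AWN concept list and keep only the first one."""
    synsets = wn.synsets(word, lang='arb')   # ordered candidate concepts
    return synsets[0] if synsets else None   # "first concept" disambiguation
```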

In this work, we focus on the Wu and Palmer measure [57] for computing the semantic similarity between two concepts. This measure is simple to calculate and performs well, while remaining as competitive and expressive as other similarity measures; this is why we adopted it as the basis of our work. The Wu and Palmer measure [57] calculates the similarity between two concepts by examining the depths of the two terms in the ontology, together with the depth of the least common subsumer (LCS) node that connects their senses. It is based on path lengths (in number of nodes), common parent concepts, and the distance from the hierarchy root [56].
An example is given in Fig. 2, which represents an ontology constituted by a number of nodes and a root node (Root). Concept_Xi and Concept_Yj correspond to two ontology elements whose similarity is to be calculated. This measure considers the distances (N1 and N2) that separate the nodes Concept_Xi and Concept_Yj from the hierarchy root, and the distance (N) that separates their most specific common concept (the common parent connected to the two concepts by the minimum number of IS-A links) from the node Root. In the given example, the LCS of Concept_Xi and Concept_Yj is the node Concept_LCS, the lowest common node between the paths of these two senses from the root of the WordNet hierarchy. Once the LCS has been found, the similarity between the two senses is defined by the following formula:
sim(X, Y) = \frac{2N}{N_{1} + N_{2}} \quad (3)
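To make the measure concrete, the English-WordNet side of this computation can be sketched with NLTK, whose wup_similarity method implements the Wu and Palmer score; the helper below mimics our “First concept” strategy by keeping only the first synset of each word (the word pairs in the usage example are illustrative):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def wup_first_sense(word_i, word_j):
    """Wu and Palmer similarity between the first senses of two words."""
    synsets_i, synsets_j = wn.synsets(word_i), wn.synsets(word_j)
    if not synsets_i or not synsets_j:
        return 0.0  # one of the words is missing from WordNet
    score = synsets_i[0].wup_similarity(synsets_j[0])
    return score if score is not None else 0.0  # None when no common path

print(wup_first_sense("car", "automobile"))  # 1.0: identical first synsets
print(wup_first_sense("car", "forest"))      # a much lower score
```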

We should point out that, in the case where one of the two terms does not appear in Arabic WordNet, we use machine translation and query the English WordNet ontology to compute their semantic similarity. Figure 3 illustrates the flowchart of the semantic similarity measurement process between two Arabic words wi and wj.

Table 5 Mapping of term “ ” in Arabic WordNet



Table 6 Mapping of term “ ” in Arabic WordNet

Sentences semantic similarity The semantic similarity between each pair of sentences is computed using a measure proposed in [41]. This measure sums the maximum word-to-word similarity scores and divides by the sum of the sentence lengths. First, each sentence is represented as a word vector; then the semantic similarity for each pair of words of the given sentences is computed based on Eq. (3). Equation (4) defines the semantic similarity between two sentences:

Fig. 2 Example of a concept hierarchy



Sim(S_i, S_j) = \frac{\sum_{w \in S_i} maxSim_w(w, S_j) + \sum_{w \in S_j} maxSim_w(w, S_i)}{|S_i| + |S_j|} \quad (4)

In this equation, Si and Sj are the given sentences; maxSimw(w, Sj) represents the maximum similarity score between the word w and all the words in Sj; and |Si| represents the length of the sentence Si.
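A minimal sketch of Eq. (4), reusing wup_first_sense from the previous sketch and assuming each sentence is given as a list of tokens:

```python
def sentence_similarity(sent_i, sent_j):
    """Semantic similarity of Eq. (4) between two token lists."""
    def max_sim(word, sentence):
        # maxSim_w(w, S): best word-to-word score against the sentence
        return max((wup_first_sense(word, w) for w in sentence), default=0.0)
    total = (sum(max_sim(w, sent_j) for w in sent_i)
             + sum(max_sim(w, sent_i) for w in sent_j))
    return total / (len(sent_i) + len(sent_j))
```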

3.3.4 The need for machine translation

One of the particularities of our system is that it deals with the lack of semantic resources dedicated to the Arabic language. We use machine translation between Arabic and English to benefit from the richness and opportunities offered by the English language in terms of automatic linguistic resources. The semantic similarity between two Arabic words is calculated using Arabic WordNet. Arabic WordNet is a very important lexical resource for Arabic; it is still under construction and is not as mature as its English counterpart. That is why, in this paper, we propose integrating the English WordNet ontology and using it to compute the semantic similarity of two words when one of them does not exist in Arabic WordNet. Both words are translated into English and the similarity is then calculated according to English WordNet.

3.3.5 Graph construction

Here, we convert the input Arabic document into a graph. To build the graph, we need to find the textual units that best describe the task of automatic summarization and consider them as the nodes of the graph. Then, we need to identify the relations that connect those units.

Fig. 3 Semantic similarity measure between two Arabic words wi and wj: project wi and wj onto their concept lists in AWN; if both words exist, return the Wu and Palmer similarity of their first concepts; otherwise, translate both words into English, project them onto English WordNet, and return the Wu and Palmer similarity of their first English concepts, or 0 if either word is still not found
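The decision flow of Fig. 3 can be sketched as follows, assuming a hypothetical AWN wrapper that exposes an NLTK-like synsets() method and a translate() helper backed by any Arabic-to-English MT service; both names are illustrative, not real APIs:

```python
def arabic_word_similarity(w_i, w_j, awn, translate):
    """Fig. 3 fallback logic for two Arabic words (sketch)."""
    ci, cj = awn.synsets(w_i), awn.synsets(w_j)
    if ci and cj:                              # both words found in AWN
        score = ci[0].wup_similarity(cj[0])    # "First concept" strategy
        return score if score is not None else 0.0
    # Fallback: translate to English and query Princeton WordNet instead
    return wup_first_sense(translate(w_i), translate(w_j))
```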



In this work, we consider the sentences of the input Arabic document as the text units and the similarity between those sentences as the relation between them. The system we have put forward relies on a two-dimensional graph model. An undirected weighted graph G = (N, E) is built in which sentences are represented by a set of nodes N and the relation between each pair of sentences is represented by the edge that connects the two corresponding vertices. Two types of edges are used, carrying statistical and semantic similarity:

Statistical similarity (section 3.3.2) An edge between a pair of sentences is created if this measure exceeds a predefined threshold. The weight of the edge represents the number of common tokens between the two sentences divided by the lengths of the two sentences.

Semantic similarity (section 3.3.3) Similar sentences have an edge between them. While the graph edges represent the semantic similarity between the sentences, the edge weight represents the degree of this similarity. The two-dimensional undirected weighted graph built in this step is the input to the process in the next section, which computes a score for each sentence.
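A sketch of this construction with networkx, assuming sentences is a list of token lists and statistical_similarity() implements the content-overlap measure of section 3.3.2 (the threshold value is illustrative):

```python
import networkx as nx

def build_graphs(sentences, threshold=0.1):
    """Build the statistical and semantic layers of the 2-D graph."""
    g_stat, g_sem = nx.Graph(), nx.Graph()
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            w_stat = statistical_similarity(sentences[i], sentences[j])
            if w_stat > threshold:             # statistical edge
                g_stat.add_edge(i, j, weight=w_stat)
            w_sem = sentence_similarity(sentences[i], sentences[j])
            if w_sem > 0:                      # semantic edge
                g_sem.add_edge(i, j, weight=w_sem)
    return g_stat, g_sem
```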

3.3.6 Sentence ranker

The input of this process is the undirected weighted graph resulting from the previous step. The PageRank algorithm [10] is used to calculate a salience score for each vertex of the graph. PageRank is a very popular link analysis algorithm that was developed as a method for Web link analysis. It determines the importance of a vertex within a directed graph on the basis of information elicited from the graph structure. In our case, the key intuition is that a sentence should be highly ranked if it is recommended by many other highly ranked sentences. PageRank can also be used on an undirected graph, in which case the out-degree and the in-degree of a node are equal; here, In(Ni) equals Out(Ni) since the graph is undirected. Equation (5) gives the score of a node Ni, where adj(Ni) is the set of vertices adjacent to Ni, wij is the weight of the edge between nodes Ni and Nj, and d is a damping factor that can be set between 0 and 1. The factor d incorporates into the model the probability of jumping randomly from a given node to another in the graph.

PR(N_i) = (1 - d) + d \sum_{N_j \in adj(N_i)} \frac{w_{ij} \, PR(N_j)}{\sum_{N_k \in adj(N_j)} w_{jk}} \quad (5)

Following [10], the factor d is typically set to 0.85, and all the nodes are assigned an initial score of 1. According to [43], convergence is usually achieved after fewer iterations when the number of edges is large. In our case, the number of vertices equals the number of sentences in the text. The iteration stops when the difference in scores between two successive iterations falls below a threshold of 0.01 for all vertices. We apply (5) iteratively on the weighted graph G to compute PR. The salience scores of the sentences are given by the weights of the vertices: sentences corresponding to vertices with higher scores are more important and salient to the document, and have strong ties with other sentences. Note that (5) is applied to both the statistical and the semantic edges, yielding two scores for each node, PRstatic(Ni) and PRsemantic(Ni). The following formula gives the salience score of each node in the graph:

PR(N_i) = PR_{static}(N_i) + PR_{semantic}(N_i) \quad (6)
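Eqs. (5) and (6) can be sketched with networkx's weighted PageRank applied to the two graph layers built above; note that networkx normalizes the scores so that they sum to 1, a minor deviation from the raw form of Eq. (5):

```python
def rank_sentences(g_stat, g_sem, d=0.85, tol=0.01):
    """Combined salience score PR = PR_static + PR_semantic (Eq. 6)."""
    pr_stat = nx.pagerank(g_stat, alpha=d, tol=tol, weight="weight")
    pr_sem = nx.pagerank(g_sem, alpha=d, tol=tol, weight="weight")
    nodes = set(g_stat) | set(g_sem)
    return {n: pr_stat.get(n, 0.0) + pr_sem.get(n, 0.0) for n in nodes}
```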



In the final step of the ranking process, PR(Ni) is enriched with other statistical features, such as the TF.ISF of the root and the position of the sentence. The final score of each sentence is given by the following formula:

score(S_i) = PR(S_i) + \frac{\sum_{w_j \in S_i} TF.ISF(w_j)}{rootCount(S_i)} + Position(S_i) \quad (7)

Where PR(Si) represents the rank of the sentence Si given by (6); TF.ISF represents the term frequency/inverse sentence frequency of the root; and Position(Si) = 1 for the sentence in the first position, 0.5 for the sentence in the last position, and 0 otherwise.
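A direct transcription of Eq. (7), assuming roots is the non-empty list of word roots of the sentence and tf_isf maps each root to its TF.ISF value:

```python
def final_score(pr, roots, tf_isf, position):
    """score(S_i) of Eq. (7): PageRank + mean TF.ISF + position bonus."""
    return pr + sum(tf_isf[r] for r in roots) / len(roots) + position
```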

3.4 Post-treatment phase: Redundancy elimination and summary generation

Post-treatment is the final step of our system. It consists of eliminating redundancy among the sentences best scored by formula (7), which ensures that the final summary covers most of the information contained in the original input document with sufficient diversity. At this point, after the ranking process, each sentence has its salience score Score(Si). As in other graph-based summarization systems, we could simply include in the final summary (depending on the summary size) the sentences with the highest scores. However, this would create redundancy in the summary, since several similar sentences that express the same meaning in the document obtain similar scores and can therefore be included together. Moreover, the remaining ideas of the document might not be identified, and relevant information might be overlooked and absent from the final summary. That is why an adapted version of MMR [11] is used to re-rank and select the appropriate sentences to include in the summary without redundancy. MMR is an iterative method for content selection. In the automatic summarization task, it iteratively chooses the best sentence to insert in the summary according to two criteria:

- Relevance: the sentence must be highly relevant to the content of the text, so the sentence with the higher ranking score is preferred.
- Novelty: the sentence must be minimally redundant with the summary, so its similarity to the sentences previously selected for the summary needs to be low.

Algorithm 1. Ranking and generating summary via maximizing marginal relevance

As shown in Algorithm 1, a sentence is incorporated only if it is highly ranked and its similarity to every sentence already in the summary is not too high. First, the sentence

with the highest rank is added to the summary S and removed from the ranked list R. The next
sentence with the highest re-ranked score from (8) is selected from the ranked list. It is then
deleted from the ranked list and added to the summary. The same process is repeated until the
summary attains the predefined length. The MMR method works according to the following
equation:
 
MMR = \arg\max_{s_i \in R \setminus S} \left[ \lambda \cdot score(s_i) - (1 - \lambda) \cdot \max_{s_j \in S} sim(s_i, s_j) \right] \quad (8)

Where R is the set of candidate sentences; S is the set of summary sentences; λ is a tuning factor that trades off the importance of a sentence against its redundancy with the formerly chosen sentences; score(si) is the ranking score calculated in the preceding section; and sim(si, sj) is the semantic similarity between si and sj.
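A sketch of Algorithm 1 together with Eq. (8), assuming scores maps sentence indices to their Eq. (7) scores, sim is the sentence similarity of Eq. (4), and λ = 0.7 is an illustrative setting:

```python
def mmr_select(scores, sentences, sim, summary_len, lam=0.7):
    """Greedy MMR selection of summary sentences (Algorithm 1)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    summary = [ranked.pop(0)]                  # seed with the top sentence
    while ranked and len(summary) < summary_len:
        best = max(ranked, key=lambda i: lam * scores[i]
                   - (1 - lam) * max(sim(sentences[i], sentences[j])
                                     for j in summary))
        ranked.remove(best)
        summary.append(best)
    return sorted(summary)                     # restore document order
```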

4 Experimental results

4.1 Datasets

The performance of a summarization method is usually evaluated by comparing the results with a summary that was extracted manually. For the Arabic language, several studies have aimed at overcoming the scarcity of Arabic corpora. In [19], the authors drew on Amazon's Mechanical Turk to build the Essex Arabic Summaries Corpus (EASC). The dataset consists of 153 Arabic articles taken from two Arabic newspapers and the Arabic version of Wikipedia. It covers 10 main topics: science and technology, finance, health, environment, art and music, education, politics, religion, sports and tourism. For each document, five model extractive summaries are available. These model summaries were created by native Arabic speakers using Mechanical Turk. Nonetheless, no standard dataset currently exists for the evaluation of Arabic text summarization. Hence, to test the accuracy of our summarization system, we have developed a corpus of Arabic articles. We collected a sample of 42 news articles and blog posts from three popular Arabic newspapers: Al Jazeera (www.aljazeera.net), Al Arabiya (www.alarabiya.net) and Hespress (www.hespress.com). These websites were selected because of (i) their large circulation: as electronic papers, they are popular and widely read; and (ii) their content is written in genuine Arabic by native speakers active in different sectors. The sample documents cover various topics (health, religion, business and politics) and come in different sizes. Manual summaries were generated for all the sample documents by a human expert in the Arabic language. In this work, we used the two datasets described above to assess our method. Figure 4 shows a sample Arabic text used in the evaluation process.

Fig. 4 Example of the evaluation corpus

4.2 Evaluation metrics

We have used three standard measures to assess our system's performance: precision, recall and F-measure. Precision (P) measures the amount of correct information returned by the system: it is the number of correct sentences in the system summary divided by the total number of sentences in the system summary. Recall (R) measures the system's coverage and reflects the ratio of relevant sentences extracted by the system: it is the number of correct sentences in the system summary divided by the total number of sentences in the human-generated summary. These two measures are antagonistic: a system striving for coverage will obtain lower precision, while a system striving for precision will obtain lower recall. F-measure (F) strikes a balance between the two using a parameter β, and is calculated by the formula that follows:
 
F = \frac{(\beta^{2} + 1) \cdot P \cdot R}{\beta^{2} \cdot P + R} \quad (9)

The (F-measure/summary size) ratio is significant when comparing systems. The F1 score is obtained by setting β to one. More formally, the three measures are given by the following formulas:
P = \frac{|S_{manual} \cap S_{auto}|}{|S_{auto}|} \quad (10)

R = \frac{|S_{manual} \cap S_{auto}|}{|S_{manual}|} \quad (11)

F = \frac{2 \cdot P \cdot R}{P + R} \quad (12)
Where Smanual is the set of sentences in the manually generated summary and Sauto is the set of sentences in the summary generated by the system.
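Treating the two summaries as sets of selected sentence indices, Eqs. (10)-(12) reduce to a few lines:

```python
def precision_recall_f1(manual, auto):
    """Eqs. (10)-(12) over sets of selected sentence indices."""
    overlap = len(set(manual) & set(auto))
    p = overlap / len(auto)
    r = overlap / len(manual)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```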
In addition, we evaluated our method with the well-known automatic evaluation method ROUGE (Recall-Oriented Understudy for Gisting Evaluation) proposed by [39]. ROUGE is an extensively used set of automatic evaluation metrics that allows an intrinsic evaluation of automatic text summaries against human-made abstracts, and it is of great importance in judging the quality of any summary. ROUGE has been adopted by DUC since DUC 2004. The main idea behind ROUGE is to count the number of overlapping text units between the candidate summary and a set of human-generated summaries. The package includes several metrics, such as N-gram Co-Occurrence Statistics (ROUGE-N), Longest Common Subsequence (ROUGE-L), Weighted Longest Common Subsequence (ROUGE-W), and Skip-Bigram Co-Occurrence Statistics (ROUGE-S). Formally, ROUGE-N (N = 1 in our experiments) is an n-gram recall measure between a system-generated summary (i.e., candidate summary) and a set of human-generated summaries (i.e., reference summaries). It evaluates the summary by computing the n-gram recall between the candidate summary and the set of reference summaries. ROUGE-N is given by the following formula:
ROUGE\text{-}N = \frac{\sum_{S \in \{RS\}} \sum_{gram_n \in S} C_m(gram_n)}{\sum_{S \in \{RS\}} \sum_{gram_n \in S} C(gram_n)} \quad (13)

Where RS represents the set of reference summaries, n stands for the length of the n-gram, gramn denotes an n-gram, Cm(gramn) is the maximum number of n-grams co-occurring in the candidate summary and the set of reference summaries, and C(gramn) is the number of n-grams in the reference summaries. Before applying ROUGE, several language-dependent preprocessing steps are required. In this work, we applied stop-word removal before calculating the ROUGE score.
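A minimal sketch of ROUGE-N (Eq. 13) for a single reference summary, applied to token lists after stop-word removal (tokens are unigrams for ROUGE-1):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Clipped n-gram recall of the candidate against one reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / max(sum(ref.values()), 1)
```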

4.3 Experiment setup

Our system was compared with a set of baseline approaches (i.e., a purely statistical summarizer, graph-based summarizers without redundancy elimination, and a graph-based summarizer with redundancy elimination) to show the effectiveness of our method. The first system is a simple Arabic text summarizer based on the TF.ISF feature. The summary is generated from the highest scored sentences, where the score of each sentence is computed as follows:

score(S_i) = \frac{\sum_{w_j \in S_i} TF.ISF(w_j)}{RC(S_i)} \quad (14)

Where TF.ISF(wj) is the term frequency/inverse sentence frequency of the root wj, and RC(Si) is the number of roots in the sentence Si.
The second system is TextRank [43]. TextRank is a graph-based ranking model used for both automatic text summarization and keyword extraction. It is based on the PageRank algorithm and ranks the graph elements that best describe the text. In the summarization task, each sentence is represented by a node in the graph, and the edge between two nodes represents a similarity relation measured as the content overlap between the given sentences. The weight of each edge indicates the importance of the relationship. Sentences are ranked according to their scores, and those with the highest scores are chosen. The third system is LexRank [23], which is very similar to TextRank: both use a graph-based approach for text summarization, and the only difference is that the similarity measure used by TextRank is based on the number of words shared by the two sentences, while LexRank uses the cosine similarity of TF-IDF vectors. For a further comparison with our approach, we have implemented another graph-based summarizer with redundancy elimination, proposed in [2]. This Arabic text summarizer based on graph theory (ATSG) uses a cosine similarity measure to calculate the similarity between sentences. It builds a graph representation of the input Arabic document and applies the PageRank algorithm to rank each sentence in the graph. The summary is then improved by removing redundancy from the final summary.

Table 7 Evaluation results of the proposed system on Dataset-1 and Dataset-2

Summary size    Dataset-1                            Dataset-2
                Precision   Recall   F1-measure      Precision   Recall   F1-measure
20%             60.64       47.20    53.08           75.70       47.52    58.39
25%             58.23       53.33    55.67           71.68       54.48    61.91
30%             56.89       58.93    57.89           68.49       59.03    63.41
35%             53.13       65.30    58.59           62.65       61.32    61.98
40%             51.15       71.03    59.47           57.74       64.18    60.79

We implemented all of these systems in Java. As mentioned above, we used two datasets to test and evaluate the performance of our system: the first dataset (Dataset-1) is the EASC corpus, while the second (Dataset-2) is our own corpus. We then ran our algorithm to produce summaries of the sample texts at five different sizes: 20%, 25%, 30%, 35% and 40%.

5 Results and discussion

To assess the quality of the automatically generated summaries of the different systems, we calculated Recall, Precision, F-score and ROUGE-1. Table 7 summarizes the results of running our algorithm on the EASC corpus and on our own corpus for different summary sizes. As Table 7 illustrates, recall increases with the summary size because the overlap between the candidate summary and the gold summary grows, while precision decreases.
Table 8 compares the average recall, precision and F-measure of our system with those of the baseline systems. The summary size considered in this comparison is 30% of the original document. Our system achieves the highest average F-score on both datasets. With a 30% summary size, the best F-measure among the other systems is obtained by ATSG, with 46.76% on Dataset-1 and 47.43% on Dataset-2, whereas our system reaches 57.89% on Dataset-1 and 63.41% on Dataset-2. This shows that our algorithm enhances the performance of graph-based summarization. To confirm these results, we additionally applied the ROUGE metric. Table 9 shows the Rouge-1 scores of our algorithm on both datasets; the average Rouge-1 score of the proposed system increases with the summary size.

Table 8 Comparison against other systems with 30% of summary size

System             Dataset-1                            Dataset-2
                   Precision   Recall   F1-measure      Precision   Recall   F1-measure
Proposed method    56.89       58.93    57.89           68.49       59.03    63.41
ATSG [2]           46.22       47.31    46.76           51.58       43.90    47.43
TextRank [43]      44.26       36.24    39.85           60.23       39.76    47.90
LexRank [23]       31.03       25.71    28.12           42.22       27.95    33.63
TF.ISF             39.46       33.71    36.37           42.81       27.30    33.34

Table 9 Rouge-1 scores of the proposed system on Dataset-1 and Dataset-2

Summary size    Rouge-1 for Dataset-1    Rouge-1 for Dataset-2
20%             51.74                    56.08
25%             58.55                    63.08
30%             63.19                    68.40
35%             68.72                    71.06
40%             73.37                    74.24

Table 10 compares the Rouge-1 score of our system with those of the other systems. Our system has the highest Rouge-1 score and outperforms all the other systems on both datasets. With a 30% summary size, the best Rouge-1 result among the other state-of-the-art methods is obtained by ATSG, with 51% on Dataset-1 and 52.53% on Dataset-2, whereas our system obtains 63.19% on Dataset-1 and 68.40% on Dataset-2.
Table 11 displays an example of a summary generated by our system compared with the summaries extracted by the competitors; the input text is given in Fig. 4. The comparison in Table 11 makes it clear that the summary generated by our approach is closer to the human-generated summary than those of the other models. Our method has three sentences in common with the human-generated summary, while the best competitor result is obtained by ATSG [2] and TextRank, which share two sentences with the human-generated summary. The worst results are obtained by LexRank and TF.ISF. We observe from these results that our method outperforms all the other methods because it can spot the relationships between similar words across sentences using the lexical databases WordNet and Arabic WordNet. These relationships cannot be identified by the reference systems used in the experiments. In addition, none of the reference systems has a redundancy-removal component, except ATSG, which produces a reasonable result compared to the other systems. This confirms that removing redundancy is an important part of Arabic text summarization. However, the computational cost of our method is higher than that of the competitors. On a laptop with a 7th-generation Intel Core i5-7200U dual-core CPU (2.50 GHz and 2.71 GHz) and 16 GB of memory, our method takes 9 s to execute, while ATSG takes 6 s and TextRank needs about 4 s to generate a summary.
This work also confirms that several factors make it difficult to compare the proposed approach with other existing systems. First, unlike English, there is no approved benchmark for the Arabic language against which to assess Arabic text summarization approaches. Hence, comparing the performance of proposed approaches is intricate, given that each work uses a different dataset and different evaluation measures. By contrast, benchmarking in English can rely on the DUC human-generated summaries.

Table 10 Rouge-1 comparison against other systems with 30% of summary size

System             Rouge-1 for Dataset-1    Rouge-1 for Dataset-2
Proposed method    63.19                    68.40
ATSG [2]           51.00                    52.53
TextRank [43]      46.10                    49.98
LexRank [23]       32.70                    34.27
TF.ISF             41.90                    35.66

Table 11 Comparison between reference summary, proposed method and competitors

Moreover, the community working on Arabic text summarization is still quite small. Added to this, lexical, syntactic and semantic ambiguity are higher in Arabic because of the complexity of the language in terms of spelling, vocabulary and morphology.

6 Conclusion and future work

In this paper, we have introduced a novel automatic summarization system for the Arabic language with statistical and semantic treatment of the input document. The proposed system combines the advantages of a graph-based system with sentence scoring by the PageRank algorithm, performed on a two-dimensional graph that represents the Arabic document with both the semantic and statistical relationships existing between the document sentences. Our system improves the score of each sentence with other statistical features extracted from the original text, such as sentence position and the TF.ISF of the root. The proposed system addresses a well-known problem in Arabic text summarization (redundancy and information diversity) by using an adapted version of the MMR technique to remove redundancy from the final summary. The system is knowledge-rich because it integrates an external knowledge database developed by humans. In addition, it deals with the lack of semantic resources dedicated to Arabic by using machine translation between Arabic and English to benefit from the richness and opportunities offered by the English language in this field.
The comparison of the performance measures clearly shows that the advantages of our system outweigh those of other summarization systems. Benchmarking the proposed algorithm on two different datasets showed that it outperforms all the other systems. In addition, the system needs no training data and uses no structural or domain-dependent features, and was therefore successfully used to summarize Arabic texts. We have presented the results of the automatic evaluation of the system and compared our summaries with human-made summaries using the ROUGE method and the F-measure, and we conclude that our approach outperforms other existing systems in terms of Rouge-1 and F1-measure. The proposed method incorporates the semantic relationships between textual units provided by the lexical database AWN. Although this method produced positive results, AWN suffers from two main problems. First, it is incomplete, and several terms and concepts are missing. Second, the process of extracting information from this database is time-consuming and affects the performance of the proposed method. Accordingly, a more powerful solution is needed that can efficiently detect the semantic relationships between different textual units (words, sentences, etc.) in less time.
In future work, we will address the question of how to improve performance on Arabic text summarization by developing this work in several directions. One possibility is to use other knowledge resources, such as Wikipedia and other large corpora, in addition to WN and AWN, which are not a complete solution for computing the semantic similarity between two words because of their limited concept coverage. We also plan to introduce more features, such as part-of-speech tagging and co-reference resolution. In addition, we plan to incorporate into our system the implicit semantic relationships provided by unsupervised deep learning models.

References

1. Afsharizadeh M, Ebrahimpour-Komleh H, Bagheri A (2018) Query-oriented text summarization using sentence extraction technique. 2018 4th International Conference on Web Research (ICWR). https://doi.org/10.1109/icwr.2018.8387248
2. Alami N, Meknassi M, Alaoui Ouatik S, Ennahnahi N (2015) Arabic text summarization based on graph
theory. In: 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications
(AICCSA), Marrakech, pp 1–8. https://doi.org/10.1109/aiccsa.2015.7507254
3. Alami N, En-nahnahi N, Ouatik SA, Meknassi M (2018) Using unsupervised deep learning for automatic
summarization of Arabic documents. Arab J Sci Eng 43(12):7803–7815
4. Alami N, Meknassi M, En-nahnahi N (2019) Enhancing unsupervised neural networks based text summa-
rization with word embedding and ensemble learning. Expert Syst Appl 123:195–211
5. Alguliyev RM, Aliguliyev RM, Isazade NR (2015) An unsupervised approach to generating generic
summaries of documents. Appl Soft Comput 34:236–250
6. Al-Radaideh QA, Bataineh DQ (2018) A hybrid approach for Arabic text summarization using domain
knowledge and genetic algorithms. Cogn Comput 10(4):651–669

7. Baralis E, Cagliero L, Mahoto N, Fiori A (2013) GRAPHSUM : discovering correlations among multiple
terms for graph-based summarization. Inf Sci 249:96–109
8. Baruah N, Sarma SK, Borkotokey S (2019) A novel approach of text summarization using Assamese
WordNet. 2019 4th international conference on information systems and computer networks (ISCON).
https://doi.org/10.1109/iscon47742.2019.9036285
9. Boudchiche M, Mazroui A, Ould Abdallahi Ould Bebah M, Lakhouaja A, Boudlal A (2017) AlKhalil
Morpho sys 2: a robust Arabic morpho-syntactic analyzer. Journal of King Saud University - Computer and
Information Sciences 29(2):141–146
10. Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Computer Networks
and ISDN Systems 30(1):107–117
11. Carbonell J, Goldstein J (1998) The use of MMR, diversity-based re-ranking for reordering documents and
producing summaries. In: Proceedings of SIGIR 1998. Melbourne, Australia, pp 335–336
12. Chennoufi A, Mazroui A (2017) Morphological, syntactic and diacritics rules for automatic diacritization of
Arabic sentences. Journal of King Saud University - Computer and Information Sciences 29(2):156–163
13. Dhungana UR, Shakya S, Baral K, Sharma B (2015) Word sense disambiguation using WSD specific
WordNet of polysemy words. In: Proceedings of the 2015 IEEE 9th international conference on semantic
computing (IEEE ICSC 2015). Anaheim, CA, pp 148–152. https://doi.org/10.1109/ICOSC.2015.7050794
14. Douzidia FS, Lapalme G (2004) Lakhas, an Arabic summarization system. In: Proc. of 2004 Doc.
Understanding Conf. (DUC2004), Boston, MA
15. Edmundson HP (1969) New methods in automatic extracting. J ACM 16(2):264–285
16. Elbarougy R, Behery G, El Khatib A (2020) Extractive Arabic text summarization using modified
PageRank algorithm. Egyptian Informatics Journal 21(2):73–81
17. Elberrichi Z, Abidi K (2012) Arabic text categorization: a comparative study of different representation
modes. The International Arab Journal of Information Technology 9:465–470
18. El-Fishawy N, Hamouda A, Attiya GM, Atef M (2014) Arabic summarization in twitter social network. Ain
Shams Engineering Journal 5(2):411–420
19. El-Haj M, Kruschwitz U, Fox C (2010) Using mechanical turk to create a corpus of arabic summaries. In:
proceedings of the 7th international conference on language resources and evaluation (LREC), Valletta,
Malta, pp 36–39, in the language resources (LRs) and human language technologies (HLT) for Semitic
languages workshop.
20. El-Haj M, Kruschwitz U, Fox C (2011) Experimenting with automatic text summarisation for Arabic. In:
Vetulani Z (ed) Human language technology. Challenges for Computer Science and Linguistics, Springer,
Berlin Heidelberg, pp 490–499
21. El-Kassas WS, Salama CR, Rafea AA, Mohamed HK (2020) EdgeSumm: graph-based framework for
automatic text summarization. Inf Process Manag 57(6):102264
22. El-Kassas WS, Salama CR, Rafea AA, Mohamed HK (2021) Automatic text summarization: a compre-
hensive survey. Expert Syst Appl 165:113679
23. Erkan G, Radev DR (2004) LexRank: graph-based lexical centrality as salience in text summarization. J
Artif Intell Res 22:457–479
24. Estiri A, Kahani M, Ghaemi H, Abasi M (2014) Improvement of an abstractive summarization evaluation
tool using lexical-semantic relations and weighted syntax tags in Farsi language. In: 2014 Iranian
Conference on Intelligent Systems (ICIS). Bam 2014:1–6. https://doi.org/10.1109/iraniancis.2014.6802594
25. Fang H, Lu W, Wu F, Zhang Y, Shang X, Shao J, Zhuang Y (2015) Topic aspect-oriented summarization
via group selection. Neurocomputing 149:1613–1619
26. Fattah MA (2014) A hybrid machine learning model for multi-document summarization. Appl Intell 40(4):
592–600
27. Fattah MA, Ren F (2009) GA, MR, FFNN, PNN and GMM based models for automatic text summariza-
tion. Comput Speech Lang 23(1):126–144
28. Ferreira R, de Souza CL, Freitas F, Lins RD, de Frana SG, Simske SJ, Favaro L (2014) A multi-document
summarization system based on statistics and linguistic treatment. Expert Syst Appl 41(13):5780–5787
29. Gao JB, Zhang BW, Chen XH (2015) A WordNet-based semantic similarity measurement combining edge-
counting and information content theory. Eng Appl Artif Intell 39:80–88
30. Gao Z, Xu C, Zhang H, Li S, de Albuquerque VHC (2020) Trustful internet of surveillance things based on
deeply represented visual co-saliency detection. IEEE Internet Things J 7(5):4092–4100
31. Gao Z, Zhang H, Dong S, Sun S, Wang X, Yang G, Wu W, Li S, de Albuquerque VHC (2020) Salient
object detection in the distributed cloud-edge intelligent network. IEEE Netw 34(2):216–224
32. Habash NY (2010) Introduction to Arabic natural language processing. Synthesis Lectures on Human
Language Technologies 3:1–187
33. Heu JU, Qasim I, Lee DH (2015) FoDoSu: multi-document summarization exploiting semantic analysis
based on social folksonomy. Inf Process Manag 51(1):212–225

34. Hovy EH (2005) Automated text summarization. In: Mitkov R (ed) The Oxford handbook of computational
linguistics. Oxford Univ, Press, pp 583–598
35. Ibrahim A, Elghazaly T (2013) Rhetorical representation and vector representation in summarizing arabic
text. Natural language processing and information systems, lecture notes in computer science, vol 7934 pp
421–424. Springer, Berlin
36. Kang B, Nguyen TQ (2019) Random Forest with learned representations for semantic segmentation. IEEE
Trans Image Process 28(7):3542–3555
37. Khoja S (1999) Stemming Arabic Text. http://zeus.cs.pacificu.edu/shereen/research.htm
38. Khoja S (2001) APT: Arabic part-of-speech tagger. In: Proceedings of the student workshop at the second
meeting of the north American chapter of the Association for Computational Linguistics (NAACL2001).
Carnegie Mellon University, Pittsburgh, Pennsylvania, pp 20–25
39. Lin CY (2004) ROUGE: a package for automatic evaluation of summaries. Proceedings of workshop on
text summarization branches out, post-conference workshop of ACL, In, pp 74–81
40. Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165
41. Malik R, Subramaniam V, Kaushik S (2007) Automatically selecting answer templates to respond to
customer emails. In: Proceedings of the 20th international joint conference on Artifical intelligence.
Hyderabad, India, pp 1659–1664
42. Mani I, Maybury MT (1999) Advances in automatic summarization. MIT Press, Cambridege, MA
43. Mihalcea R, Tarau P (2004) TextRank: bringing order into texts. In: Proceedings of the conference on
empirical methods in natural language processing 2004. Barcelona, Spain, pp 404–411
44. Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
45. Mohamed M, Oussalah M (2019) SRL-ESA-TextSum: a text summarization approach based on semantic
role labeling and explicit semantic analysis. Inf Process Manag 56(4):1356–1372
46. Nguyen-Hoang TA, Nguyen K, Tran QV (2012) TSGVi: a graph-based summarization system for
Vietnamese documents. J Ambient Intell Human Comput 3:305–313
47. Oufaida H, Nouali O, Blache P (2014) Minimum redundancy and maximum relevance for single and
multidocument arabic text summarization. Journal of King Saud University - Computer and Information
Sciences 26(4):450–461
48. Pal AR, Saha D (2014) An approach to automatic text summarization using WordNet. In: 2014 IEEE
International Advance Computing Conference (IACC), Gurgaon, pp 1169–1173. https://doi.org/10.1109/iadcc.2014.6779492
49. Patel D, Shah S, Chhinkaniwala H (2019) Fuzzy logic based multi document summarization with improved
sentence scoring and redundancy removal technique. Expert Syst Appl 134:167–177
50. Patil AP, Dalmia S, Abu Ayub Ansari S, Aul T, Bhatnagar V (2014) Automatic text summarizer. In: 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), New Delhi, pp 1530–1534. https://doi.org/10.1109/ICACCI.2014.6968629
51. Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency,
max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
52. Rani R, Lobiyal DK (2020) An extractive text summarization approach using tagged-LDA based topic
modeling. Multimed Tools Appl 80:3275–3305. https://doi.org/10.1007/s11042-020-09549-3
53. Rinaldi AM, Russo C (2020) Using a multimedia semantic graph for web document visualization and
summarization. Multimed Tools Appl 80:3885–3925. https://doi.org/10.1007/s11042-020-09761-1
54. Shaheen M, Ezzeldin AM (2014) Arabic question answering: systems, resources, tools, and future trends.
Arab J Sci Eng 39(6):4541–4564
55. Song S, Huang H, Ruan T (2018) Abstractive text summarization using LSTM-CNN based deep learning.
Multimed Tools Appl 78(1):857–875
56. Wei TT, Lu YH, Chang HY, Zhou Q, Bao XY (2015) A semantic approach for text clustering using
WordNet and lexical chains. Expert Syst Appl 42(4):2264–2275
57. Wu Z, Palmer M (1994) Verbs semantics and lexical selection. In: Proceedings of the 32nd Annual Meeting
on Association for Computational Linguistics. https://doi.org/10.3115/981732.981751
58. Wu Z, Lei L, Li G, Huang H, Zheng C, Chen E, Xu G (2017) A topic modeling based approach to novel
document automatic summarization. Expert Syst Appl 84:12–23
59. Yang K, He H, Al-Sabahi K, Zhang Z (2019) EcForest: extractive document summarization through
enhanced sentence embedding and cascade forest. Concurrency and Computation: Practice and
Experience 31:e5206. https://doi.org/10.1002/cpe.5206
60. Yousefi-Azar M, Hamey L (2017) Text summarization using unsupervised deep learning. Expert Syst Appl
68:93–105

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Nabil Alami received the B.Sc. and M.Sc. degrees in computer sciences from the Faculty of Sciences, Sidi Mohammed Ben Abdellah University, Morocco, in 2005 and 2007, respectively. He is presently a Ph.D. candidate in computer science at the Faculty of Sciences, Sidi Mohammed Ben Abdellah University, Morocco. His research interests include image processing, computer graphics, artificial intelligence and natural language processing.

Mostafa El Mallahi received the B.Sc., M.Sc. and Ph.D. degrees in computer science from the Faculty of Sciences, Sidi Mohammed Ben Abdellah University, Morocco, in 2000, 2007 and 2017, respectively. He is a full professor of computer sciences at the High Normal School, Mathematics and Computer Sciences Department, Sidi Mohammed Ben Abdellah University, Fez, Morocco. His research interests include image processing, pattern classification, orthogonal systems, neural networks, big data, data mining, data science, deep learning, genetic algorithms and special functions.

Hicham Amakdouf received the B.Sc. and M.Sc. degrees in computer sciences from the Faculty of Sciences, Sidi Mohammed Ben Abdellah University, Morocco, in 2003 and 2007, respectively. He is presently a Ph.D. candidate in computer science at the Faculty of Sciences Dhar El Mahraz, Fez, Morocco. His research interests include image processing, computer graphics, artificial intelligence and geographic information systems.

Hassan Qjidaa received his M.Sc. and Ph.D. in applied physics from Claude Bernard University of Lyon, France, in 1983 and 1987, respectively. He received the Pr. degree in Electrical Engineering from Sidi Mohammed Ben Abdellah University, Fez, Morocco, in 1999. He is a full professor of electrical engineering at the Faculty of Sciences, Sidi Mohammed Ben Abdellah University, Fez, Morocco. His main research interests include image manuscript recognition, cognitive science, image processing, computer graphics, pattern recognition, neural networks, human-machine interfaces, artificial intelligence and robotics.

Affiliations

Nabil Alami 1 & Mostafa El Mallahi 2 & Hicham Amakdouf 1 & Hassan Qjidaa 1

1 LISAC Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, PO Box 1796, Fez 30003, Morocco

2 High Normal School, Mathematics and Computer Sciences Department, Laboratory of Computer Science and Interdisciplinary Physics, Sidi Mohamed Ben Abdellah University, Fez, Morocco
