


Text Summarization Approaches Using Machine Learning & LSTM

Neeraj Kumar Sirohi1; Dr. Mamta Bansal2; Dr. S.N. Rajan3


1 Research Scholar, Shobhit Institute of Engineering & Technology, Meerut, India.
2 Shobhit Institute of Engineering & Technology, Meerut, India.
3 IMS Engineering College, Ghaziabad, India.
1 [email protected]
2 [email protected]
3 [email protected]

Abstract
A massive amount of online textual data is generated by a diversity of social media, web, and other information-centric applications. Selecting the vital information from a large text requires studying the full article and generating a summary without losing the critical information of the text document; this process is called summarization. Text summarization is done either by humans, which requires expertise in the subject area and is very tedious and time-consuming, or by a system, known as automatic text summarization, which generates the summary automatically. There are two main categories of automatic text summarization: abstractive and extractive. An extractive summary is produced by picking important, highly ranked sentences and words from the text document; on the other hand, the sentences and words present in a summary generated through an abstractive method may not be present in the original text.
This article mainly focuses on the different ATS (Automatic Text Summarization) techniques that have been investigated to date. The paper begins with a concise introduction to automatic text summarization, then closely discusses the innovative developments in extractive and abstractive text summarization methods, moves on to a literature survey, and finally sums up with a proposed technique for abstractive text summarization using an LSTM encoder-decoder, along with some future work directions.

Key-words: ATS, Text Summarization, Abstractive, Extractive, Neural Network, LSTM, Encoder, Decoder.

1. Introduction

Extracting valuable information from gigantic texts is a challenging task nowadays because we have lots of unstructured information available on the net in the form of articles, blogs, and reports.

ATS (Automatic Text Summarization) methods provide an effective solution for extracting valuable information from big documents. Text summarization is a process that summarizes a document while retaining the primary gist and significant fragments of the original document. Text summarization systems help users fetch the important ideas of the original document without reading the complete document. Before discussing text summarization in depth, we first consider the meaning of a summary. In 1995, Maybury [1] described a summary as: "An effective summary contains the most significant information from a document (or documents) to produce a reduced form of the original data for specific task(s) and user(s)". After that, in 2002, Radev [2] defined a summary as "a text which is produced from one or more documents, keeping the vital information of the original document(s), and which is generally not longer than half of the original document(s) and frequently significantly smaller than that". Then, in 2005, the summary was redefined by Hovy [3]; according to him, it is "text which is produced from single or many documents, that holds an important segment of the information in the actual document(s), and which is generally not more than half the primary document(s)".
However, summarization of big documents is still an underdeveloped subject. There are mainly two approaches to text summarization: extractive and abstractive [4]. Extractive summarization (ES) is the process of picking sentences and words from the text as the summary; most summarization research techniques are based on the extractive approach. On the other hand, abstractive approaches produce a summary by rephrasing the text while retaining the original sense of the text in the summary [5]. A relatively new direction, abstractive text summarization has attracted attention among investigators because of its ability to generate novel terms using language generation techniques.
The overall structure of a text summarization system, shown in Fig. 1 below, involves the following steps (a concrete pre-processing sketch follows the list):
1. Pre-Processing: [6] applying several linguistic methods, including segmentation, word tokenization, sentence selection, stop-word removal, stemming, part-of-speech tagging, etc., to produce a refined text from the original document.
2. Processing: the text obtained from the pre-processing step is processed using one or more text summarization techniques, transforming the input document(s) into the summary.
3. Post-Processing: in the generated summary we sometimes need to rearrange the sentences and words into sequence (for an extractive summary) or replace some words via word embeddings (for an abstractive summary) to produce a good summary.
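As an illustration of step 1, here is a minimal pre-processing sketch using the NLTK library; the library choice, function name, and exact sequence of steps are our assumptions, not a prescribed implementation.

```python
# Minimal pre-processing sketch (assumed toolkit: NLTK).
# Requires: pip install nltk, then nltk.download('punkt') and nltk.download('stopwords').
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

def preprocess(document: str):
    """Segment, tokenize, lower-case, remove stop-words, and stem a document."""
    stemmer = PorterStemmer()
    stop_words = set(stopwords.words('english'))
    refined = []
    for sentence in sent_tokenize(document):                 # sentence segmentation
        tokens = word_tokenize(sentence.lower())             # tokenization + lowering
        tokens = [t for t in tokens if t.isalnum()]          # drop punctuation tokens
        tokens = [t for t in tokens if t not in stop_words]  # stop-word removal
        refined.append([stemmer.stem(t) for t in tokens])    # stemming
    return refined
```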

Figure 1 - Automatic Text Summarization of Single or Multiple Documents

1.1 Automatic Text Summarization Classifications

There are various groupings for TS classification, as demonstrated in Fig. 2. TS schemes can be categorised based on any of the standards described below.

Figure 2 - Classification of Automatic Text Summarization

Summary based on the input size: Depending on the documents given as input, a summary can be generated from a single text document or from numerous documents [7].
The input size refers to the total number of input documents whose summary is to be generated as the target summary. As described in Fig. 1, in single-source-document summarization (SSDS) the user supplies a single document and produces a summary (a shortened form of the source document) while preserving the critical information [8].
Summary based on the nature of the output: It is categorised as query-based or generic. The summary generated by the generic method is based on extracting the critical information from one or more texts and gives a general idea about their contents [9]. Whereas query-based summarization deals with multiple documents, where homogeneous documents are found in a large corpus of documents based on a particular query [10]. A query-based summary considers the data that is most suitable for the query.

In contrast, a summary created through the generic method presents comprehensive knowledge about the article [11]. A query-based summary is also referred to as a topic-based or user-based summary [9].
Summary based on the extractive or abstractive approach: The extractive summarization technique is based on selecting the most important words and sentences from the input document and including them as part of the summary [15]. Whereas in the abstractive approach the summarization is done in two steps: in the first step, an intermediate representation of the main document is created using NLP techniques, and in the second step, the summary is generated from this intermediate representation. In an abstractive summary, the summary sentences differ from the input sentences [16].
Summary based on the content: It is classified as informative or indicative. An indicative summary comprises overall knowledge about the input document (Bhat, Mohd, & Hashmy, 2018 [12]). Thus, an indicative summary determines the theme of the input document (i.e., it addresses the subject area of the input document). The main intention of an indicative method is to notify users about the field of the input document, which helps a user decide whether reading the input document is required or not. The normal length of this summary is about 8 to 10% of the original document [13]. On the other hand, an informative summary covers the vital information and concepts of the main document, such as all the themes of the text. The summary created by an informative method is about 20 to 30% of the main document in length [14].
Summary based on the summarization domain: There are two categories of summarization based on domain: general and domain-specific. Domain-independent, or general, summarization summarizes documents of different domains, while domain-specific summarization summarizes documents of a definite area (e.g., legal documents or medical documents).
Summary based on the language: There are three types of summarization based on language: mono-, multi-, or cross-lingual. A method where the source and target documents are in the same language is called monolingual. Whereas if the summary is produced in many languages (e.g., Arabic, English, or French) and the input texts are also in different languages, it is considered multi-lingual summarization. And in cross-lingual summarization, the input document is in a single (certain) language (e.g., English, Chinese, French, Arabic) and the summary is produced in a different one (e.g., Chinese to French, Arabic to English) [9].

2. Text Summarization Approaches

Basically, there are two main approaches to automatic text summarization (ATS), extractive and abstractive, and each approach can be implemented by any one of several techniques. This section provides an overview of the techniques used for each approach in the literature.

2.1 Extractive Automatic Text Summarization Approach

The architecture of extractive summarization, shown in Fig. 3, has the following components:
1) The input document is first pre-processed (i.e., tokenization, lowering, normalization, etc.).
2) Post-processing, such as restructuring the mined sentences, substituting pronouns with their antecedents, replacing relative chronological expressions with actual dates, etc. [15]. The processing steps between them are as follows:
1. Generate an intermediate representation: producing an appropriate representation of the input document in a simplified form (e.g., graphs, bi-grams, bag-of-words, etc.) [8].
2. Sentence scoring: assigning a score and a rank to every sentence based on the input document [17].
3. Selection of the highest-scored sentences: picking the most significant sentences from the input text(s), then combining them to produce the summary [17], [18]. The length of the final summary depends on the selection of a threshold value or a cut-off limit on the maximum length of the summary, maintaining the same ordering of the selected sentences as in the input document [19].

Figure 3 - The Architecture of an Extractive Text Summarization System

2.1.1 Statistical-Based Methods

These approaches extract the most significant words and sentences from the input text based on an arithmetical analysis of feature sets. "Most favourably located" and "most recurrent" are the common criteria for defining any sentences or words as the "most vital" sentences or words of the document [15]. Sentence scoring in this approach involves the following steps [9]:
1) The weight of every sentence is calculated and assigned by applying some linguistic and mathematical features [15].
2) A concluding weight is assigned to each and every sentence in the text [9], calculated via a feature-score equation [15] (i.e., summing up all nominated features' scores to determine the final score of each sentence), as sketched below.
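The sketch below illustrates this feature-score idea with two common features, sentence position ("most favourably located") and term frequency ("most recurrent"); the feature weights are purely illustrative assumptions.

```python
from collections import Counter

def score_sentences(sentences, w_pos=0.4, w_freq=0.6):
    """Score tokenized sentences by a weighted sum of a position feature
    and a term-frequency feature (the feature-score equation); the
    weights w_pos and w_freq are illustrative, not prescribed."""
    freq = Counter(w for s in sentences for w in s)
    max_freq = max(freq.values())
    n = len(sentences)
    scores = []
    for i, s in enumerate(sentences):
        position = (n - i) / n                 # earlier sentences score higher
        frequency = (sum(freq[w] for w in s) / (len(s) * max_freq)) if s else 0.0
        scores.append(w_pos * position + w_freq * frequency)
    return scores
```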

2.1.2 Topic-Based Methods

These approaches depend on recognizing the topic of the document, i.e., its prime theme (what the text is all about). TF and TF-IDF (Term Frequency and Term Frequency-Inverse Document Frequency), topic-word selection, and lexical chains are the most common techniques for topic identification; the identified topic words and their corresponding weights are kept in a simple table [17]. The basic steps involved in this process include [17]:
1) An intermediate representation of the input text is generated that holds the key topic of the document.
2) According to this representation, a weight is assigned to each sentence, as sketched below.
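As an illustration (with scikit-learn as an assumed library choice), topic-word weights can be collected into such a table with a TF-IDF vectorizer and then used to weight sentences:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def topic_word_table(sentences, top_k=10):
    """Build a simple table of topic words and their aggregated TF-IDF
    weights, treating each sentence as a document (an illustrative choice)."""
    vectorizer = TfidfVectorizer(stop_words='english')
    matrix = vectorizer.fit_transform(sentences)   # sentences x terms
    weights = matrix.sum(axis=0).A1                # aggregate weight per term
    table = dict(zip(vectorizer.get_feature_names_out(), weights))
    return dict(sorted(table.items(), key=lambda kv: -kv[1])[:top_k])

def sentence_topic_score(sentence, table):
    """Weight a sentence by the topic words it contains."""
    words = sentence.lower().split()
    return sum(weight for word, weight in table.items() if word in words)
```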

2.1.3 Sentence Significance or Clustering-Based Methods

This method is used when we have multiple documents to summarize; it collects all the key sentences that describe the main theme (key subject) of the documents and generates clusters of these sentences. Sentence selection in this approach is based on sentence centrality, which is calculated from word centrality using the TF-IDF approach; all sentences whose TF-IDF is greater than or equal to a defined threshold are then selected [20]. Sentence scoring is performed through the following steps:
1) a centroid is constructed by calculating the TF-IDF of each sentence in the text (Mehta & Majumder, 2018 [21]), and 2) sentences that have more words closer to a particular cluster

centroid are considered for the summary [20]. A sentence that is closer to the cluster's key idea has a greater chance of becoming a summary sentence [21].
Sentence selection and document summarization through clustering-based summarization take care of both importance and the elimination of redundancy in the produced summary. In clustering algorithms, selection and summarization are completed in the following steps [22]:
1) Clusters are generated from the input document using any clustering algorithm.
2) The clusters are then ordered, which is accomplished by ranking them; the rank depends on the number of keywords a cluster has (a cluster with more keywords gets a higher rank), and 3) finally, the highest-ranked sentences are picked from these clusters as summary sentences. A sketch of the centroid idea follows.
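The sketch below shows the centroid idea with K-means over TF-IDF vectors; the library choices and the one-sentence-per-cluster selection policy are our assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cluster_summary(sentences, n_clusters=3):
    """Cluster sentences and pick the one closest to each centroid,
    covering the key ideas while limiting redundancy."""
    vectors = TfidfVectorizer().fit_transform(sentences)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(vectors)
    summary = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        sims = cosine_similarity(vectors[members],
                                 km.cluster_centers_[c].reshape(1, -1)).ravel()
        summary.append(sentences[members[sims.argmax()]])
    return summary
```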

2.1.4 Semantic-Based Methods

Semantic-based summarization generally uses LSA (Latent Semantic Analysis), an unsupervised ML (Machine Learning) approach based on empirical observation of word co-occurrences [17] that characterises the semantics of a document. Sentence scoring in the LSA approach is completed in the following steps (a minimal sketch follows):
1) Initially, a matrix (a term-to-sentence matrix) is created from the input document [23].
2) Singular Value Decomposition (SVD) is applied to the input matrix to recognise the associations between sentences and terms.
An alternative semantic-based summarization technique was proposed and implemented by [24]; it is based on SRL (Semantic Role Labelling) and ESA (Explicit Semantic Analysis).
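A minimal LSA sketch, with scikit-learn's TruncatedSVD as an assumed implementation choice and the scoring rule (sentence weight across the leading latent topics) as our illustration:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

def lsa_scores(sentences, n_topics=2):
    """Step 1: build the term-to-sentence matrix; step 2: apply SVD to
    expose latent topic/term/sentence associations. Each sentence is
    scored by its weight across the leading topics."""
    matrix = CountVectorizer(stop_words='english').fit_transform(sentences)
    svd = TruncatedSVD(n_components=n_topics, random_state=0)
    sentence_topics = svd.fit_transform(matrix)     # sentences x topics
    return np.linalg.norm(sentence_topics, axis=1)  # one score per sentence
```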

2.1.5 Methods based on Machine-Learning

These approaches convert the summarization task from an unsupervised problem into a supervised classification task that operates on sentences. The algorithm learns from examples: given a training set (i.e., a set of documents and their human-generated summaries), each sentence of the input document is classified as either an "instance" (summary) or a "non-instance" (non-summary) sentence. Summarization methods based on machine learning focus on scoring the sentences, which is accomplished through the following steps, as described by [25]:
1) Sentence features are extracted from the pre-processed text (i.e., based on several features of the words and sentences).

2) The extracted features are used as input to a neural network that generates a single output score, as sketched below.
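One way to realize such a scorer (a sketch under our own assumptions, not the cited authors' exact model) is a small feed-forward network in Keras that maps a sentence's feature vector to a single summary-worthiness score:

```python
import tensorflow as tf

def build_sentence_scorer(n_features: int) -> tf.keras.Model:
    """A small feed-forward network mapping extracted sentence features
    (e.g., position, length, term weights) to one score in [0, 1]."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),  # single output score
    ])
    # Trained on sentences labelled as summary (1) / non-summary (0) instances.
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model
```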

2.1.6 Methods Using Deep Learning for Summarization

Kobayashi, Noguchi, and Yatsuka (2015) [26] proposed a method where text-level similarity depends on embeddings (i.e., distributed representations of terms). A text is treated as a set of sentences, and a sentence as a collection of terms (words). The task is formulated as maximizing a submodular function defined by the negative sum of the nearest neighbours' distances on embedding distributions (i.e., the collection of word embeddings in a text) [26]. Computing similarity at the sentence level smooths out less of the meaning than the coarser text-level similarity. [27] propose a summarization method for single documents that applies an RNN (Recurrent Neural Network) and a reinforcement-learning-based algorithm with a hierarchical encoder-selector network design. The significant features are chosen by a sentence-level selective encoding method, and then the sentences that form the summary are identified and picked out from the document.

2.1.7 Methods based on Fuzzy-Logic

A fuzzy-logic system for summarization is an efficient way to approximate the human intellectual classification of documents and provides a well-organized technique for representing the values of the sentence features of the document [28]. Sentence scoring is done through the following steps [22]:
1) Features such as term weight, sentence length, etc. are extracted from every sentence.
2) By applying the fuzzy-logic method (i.e., after introducing the essential rules into the knowledge base of the system), a score indicating the importance of each sentence is assigned; based on the rules defined in the knowledge base and the sentence features, a value between 0 and 1 is assigned to each sentence, as in the toy sketch below.
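A toy sketch of this idea in plain Python, with triangular membership functions and a single hand-written rule; all ranges and numbers are illustrative assumptions, not taken from the cited systems.

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_sentence_score(term_weight, sentence_length):
    """Degree in [0, 1] to which a sentence is 'important', using one rule:
    IF term_weight is high AND sentence_length is medium THEN important."""
    high_weight = triangular(term_weight, 0.4, 1.0, 1.6)    # illustrative range
    medium_length = triangular(sentence_length, 5, 15, 30)  # illustrative range
    return min(high_weight, medium_length)  # fuzzy AND = minimum
```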
In conclusion, a better summary is produced when different approaches are used together, because the combination exploits the advantages of the different approaches and removes their inadequacies. Many summarization systems combine various approaches to benefit from the merits of the different techniques [29], [30], [25]. [31] recommend an extractive summarization method that uses Fuzzy C-Means, TextRank, and aggregate sentence-scoring approaches to summarize Bengali text. [30] suggest a summarizer that generates an extractive summary and uses a distributional semantic model to capture the key ideas of the document; to cluster sentences with similar meanings the K-means

clustering algorithm is used, and a ranking algorithm is used to assign sentences to a particular cluster.
Combining approaches also expands the precision of the final summaries. [29] suggest an extractive summarization scheme based on a joint model that merges two methods, Sentence2Vec and Bag-of-Words (where each sentence of the input document is represented as a vector). Alami et al. conclude that the collaborative system generally produces more accurate results in comparison to a solitary method because the information carried by the respective vectors is complementary.
Every strategy has its own benefits and limitations, and the same holds for extractive summarization, as follows:

Advantages

The extractive techniques are simple, quicker, and easier to implement in comparison to abstractive techniques. This method provides higher precision because in extractive summarization the sentences are chosen directly from the original text, and the user gets a summary in the same vocabulary as the original document [32].

Disadvantages

This technique is far from the technique that human experts use to create summaries (Hou, Hu, & Bei, 2017) [51]. The main disadvantages of extractive summarization are as follows:
1. Some sentences in the summary are redundant [33].
2. Summary sentences may be lengthier than normal sentences [15].
3. Because in the multi-document setting the summary can be generated from various documents, conflicts between variable terminologies can arise [15].
4. Because vital information is spread among the sentences, some of it might not be covered [15]. If the source document contains several topics, the generated summary might be partial [25]. To tide over this problem, the user needs to cover all the topics, which makes the summary longer than expected.

2.2 Summarization through Abstractive Approaches

Critical investigation of the document is needed in the abstractive approach [34]. In this approach, after reading and understanding the meaning of the input text, the document is converted into a well-defined intermediate representation that keeps the key ideas of the text, using NLP methods [35], [50]. Abstractive summarization is not just a replication process in which sentences of the input document are replicated in the summary [36]; instead, it needs the skill to produce alternative sentences. Fig. 4 below describes the building blocks of the summarization process, which includes pre-processing and then processing, which involves:
1) Constructing an alternate, equivalent representation [37].
2) Then a summary that is closer to a human-generated summary is created using natural language processing (NLP) techniques [37]; finally, post-processing produces the final, refined summary.

Figure 4 - Abstractive Summarization Process

The abstractive summarization techniques are classified into the following categories [38]:
1) Structure-based: these methods use predefined frameworks (e.g., trees, ontologies, templates, and graphs). A method of this kind recognizes the most significant information in the source document, then uses one of the frameworks mentioned above to generate the summary [38].
2) Semantic-based: these methods use a semantic representation of the text and natural language generation (NLG) schemes (e.g., based on predicate-argument structures, data items, and semantic graphs). A method of this kind creates the semantic representation of the source document through data items, semantic graphs, or predicate arguments, then uses an NLG scheme to generate the abstractive summary [38]. And finally,

3) Deep-learning-based methods. Lin and Ng (2019) [39] proposed one more class for abstractive summarization, neural-based, which generally refers to any technique that is built on neural networks.
The methods of each category are briefly described in this section.

2.2.1. Graph-Based Approaches

Ganesan et al. (2010) recommend a model known as "Opinosis", a graph-based model in which words are represented by nodes and topical information is attached to the nodes. Sentence structure is represented through the directed edges. The following steps are involved in processing the data under the graph-based approach, as described by Ganesan et al. (2010):
1) Graph creation: a word-based graph is generated to describe the source document, and 2) Summary creation: the process of creating the final abstractive summary, in which numerous candidate paths of the graph are discovered and ranked as follows:
1. A score is assigned to every path, and the paths are sorted by score in descending order; the scores of unvisited paths are also included in the sorting process.
2. By applying a similarity measure (e.g., Jaccard), repeated (or very similar) paths are removed.
3. After step two, the top paths are selected from the remaining ones to produce the summary; the length of the summary depends on the number of paths selected, which is controlled by a constraint (see the sketch below).
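Steps 1-3 amount to score-based sorting, Jaccard de-duplication, and top-k selection, as in this sketch; the path scoring itself is left abstract, and the threshold value is an illustrative assumption.

```python
def jaccard(path_a, path_b):
    """Jaccard similarity between two paths, treated as sets of words."""
    a, b = set(path_a), set(path_b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def select_paths(scored_paths, k=3, threshold=0.5):
    """scored_paths: list of (score, path) pairs. Sort by score descending,
    drop paths too similar to an already-kept path, and keep the top k."""
    kept = []
    for score, path in sorted(scored_paths, key=lambda sp: -sp[0]):
        if all(jaccard(path, p) < threshold for _, p in kept):
            kept.append((score, path))
        if len(kept) == k:
            break
    return [path for _, path in kept]
```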

2.2.2. Tree-Based Methods

These approaches recognize comparable sentences that share information, then gather these sentences and produce the summary [38]. Similar sentences are denoted by a tree-like structure. The dependency tree is constructed through parsing; the tree-based approach is commonly used to describe a text document in the form of a tree. To produce the summary, some tasks are performed, such as tree pruning and linearization (i.e., translating trees to strings) [38]. An abstractive summarizer for multiple documents was proposed by Kurisinkel, Zhang, and Varma [41]; the highlights of this technique are as follows:
1) The input documents of the corpus are parsed to find the set of all syntactic dependency trees.
2) From all the dependency trees extracted in step 1, a set of partial dependency trees (with flexible sizes) is selected.
3) The picked partial dependency trees are clustered to assure topical coverage.

4) Each clustered tree is used to create a single sentence that reflects how significant the cluster is in the summary-generation process. Step 1 is sketched below.
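Step 1, parsing each sentence into a dependency tree, can be sketched with spaCy (an assumed library choice, requiring the en_core_web_sm model to be installed); the pruning, clustering, and linearization steps are method-specific and omitted.

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def dependency_trees(text):
    """Parse a document and return one dependency tree per sentence,
    encoded as (child, relation, head) triples."""
    doc = nlp(text)
    return [[(tok.text, tok.dep_, tok.head.text) for tok in sent]
            for sent in doc.sents]
```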

2.2.3. Rule-Based Methods

This approach is based on defining rules and classes to determine the vital ideas in the input document; the summary is then generated using these ideas. The following steps are involved in this approach [38]:
1) The input document is classified based on the relationships and ideas present in it.
2) A query is formulated according to the subject area of the input.
3) The queries are answered by discovering the relationships and ideas of the document, and then, finally,
4) These responses are passed into predefined rules to create the abstractive summary.
Genest and Lapalme (2012) [42] suggest an approach based on abstraction schemes. Each abstraction scheme is designed to answer a small group of themes; content-selection heuristics, rules for IE (Information Extraction), and simple generation patterns are constructed for every scheme. All these rules are generated manually. An abstraction scheme seeks answers for one or more aspects that can be linked to the same theme. The Information Extraction rules may detect numerous candidates for every aspect, and the content-selection component chooses the best one, which goes directly into the summary-creation unit.

2.2.4. Semantic-Based Methods

These approaches use a semantic representation (e.g., predicate-argument structures, data objects, or semantic graphs) of the input text(s) and pass this information to an NLG (natural language generation) scheme, where noun and verb phrases are used to produce the concluding abstractive summary [38]. A multi-document abstractive summarizer is suggested by Khan et al. [43], who recommend the following:
1) Using SRL, the input text is represented as predicate-argument structures.
2) Using a semantic similarity measure, clusters are generated from semantically similar predicate-argument structures across the documents.
3) The predicate-argument structures are ranked based on weighted features, with the weights optimized using a Genetic Algorithm.

4) From the selected predicate-argument structures, the final summary sentences are created using language generation.

2.2.5. Methods based on Deep-Learning

Summaries generated by sequence-to-sequence (seq2seq) models show that excellent summaries can also be achieved with abstractive methods [44]. Seq2seq has accomplished great success across several NLP tasks, including speech recognition, machine translation, and dialogue systems [45]. A scheme built on a Recurrent Neural Network with an attention-based encoder-decoder attains promising results for short documents; however, the methods based on deep learning still face some problems:
1) Repetitive sentences and words are produced.
2) Inability to deal with the OOV (out-of-vocabulary) problem (i.e., infrequent words outside a limited vocabulary) [44].
Despite the great success of deep learning, the technique suffers from the problems described above. To overcome them, we describe a new approach using the following steps:
1) Transforming the documents into simple transcripts through pre-processing (i.e., lemmatization, stop-word removal, lowering, tokenization, etc.) and saving the original documents along with their summaries separately.
2) Vectorizing words with a pre-trained model (such as the GloVe toolkit) [46], which converts each word into a vector that is then used in the proposed model.
3) Then using a bidirectional LSTM model built with TensorFlow [47] to encode the text and a unidirectional LSTM for decoding, along with cross-entropy to compute the loss and the Adam optimizer to minimize it (see the sketch below).
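A minimal Keras sketch of this encoder-decoder, under our own assumptions: a bidirectional LSTM encoder, a unidirectional LSTM decoder, cross-entropy loss, and the Adam optimizer. The sizes are placeholders, and the attention mechanism and GloVe-initialised embedding matrix are omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, EMB, UNITS = 20000, 100, 256   # placeholder sizes

# Encoder: embedded source tokens -> bidirectional LSTM states.
enc_in = layers.Input(shape=(None,), name="source_tokens")
enc_emb = layers.Embedding(VOCAB, EMB)(enc_in)   # weights could be GloVe vectors
_, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(UNITS, return_sequences=True, return_state=True))(enc_emb)
state_h = layers.Concatenate()([fh, bh])
state_c = layers.Concatenate()([fc, bc])

# Decoder: unidirectional LSTM initialised with the encoder's final states.
dec_in = layers.Input(shape=(None,), name="summary_tokens")
dec_emb = layers.Embedding(VOCAB, EMB)(dec_in)
dec_out, _, _ = layers.LSTM(2 * UNITS, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = layers.Dense(VOCAB, activation="softmax")(dec_out)

model = tf.keras.Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```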

3. Conclusion

Summarization is an exciting research area nowadays, and almost all current summarization methods that produce abstractive summaries focus mainly on deep learning, particularly for short documents [48]. It is suggested that combining different approaches and methods takes advantage of each for producing improved abstractive summaries. Since different summarization techniques generate different summaries from the same text, it is encouraged to
combine the different ATS approaches so as to generate a summary better than those produced by the individual methods [49]. After studying the literature, we observe that a good extractive summary is created using structure-based techniques, and that deep-learning and semantic-based techniques generate promising abstractive summaries [38]. These algorithms use the pre-processing phase to extract the vital phrases and remove the stop-words from the input text, and then apply a method to produce the abstractive summary [38]. Kouris et al. (2019) [48] suggest an ATS that is capable of producing good abstractive summaries by combining a deep-learning encoder-decoder technique with semantic-based data-transformation methods. We propose a model in which pre-processing (lowering, tokenization, noise removal, and normalization) is performed using NLP, and then an LSTM (Long Short-Term Memory) encoder-decoder architecture, an RNN design well suited to text, is used to produce a promising abstractive summary.

References

Maybury, M.T. (1995). Generating summaries from event data. Information Processing
& Management, 31(5), 735–751. https://doi.org/10.1016/0306-4573(95)00025-C
Dragomir R Radev, Eduard Hovy, and Kathleen McKeown. 2002. “Introduction to the special issue on
summarization”. Computational linguistics 28, 4 (2002), 399–408.
Hovy, E., 2005. Text Summarization. In: The Oxford Handbook of Computational Linguistics, Mitkov,
R. (Ed.), OUP Oxford, Oxford, ISBN-10: 019927634X, pp: 583-598.
Chen, J.; Zhuge, H. Abstractive Text-Image Summarization Using Multi-Modal Attentional
Hierarchical RNN. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language
Processing; Association for Computational Linguistics: Brussels, Belgium, 2018; pp. 4046–4056.
Li, P.; Lam,W.; Bing, L.; Wang, Z. Deep Recurrent Generative Decoder for Abstractive Text
Summarization. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language
Processing; Association for Computational Linguistics: Copenhagen, Denmark, 2017; pp. 2091–2100.
Gupta, V. K. & Siddiqui, T. J. (2012). Multi-document summarization using sentence clustering. Paper
presented at the 2012 4th international conference on intelligent human computer interaction (IHCI).
Kumar, Y. J., Goh, O. S., Basiron, H., Choon, N. H. & Suppiah, P. C. (2016). A Review on Automatic
Text Summarization Approaches. Journal of Computer Science, vol. 4, no. 12, pp. 178-190.
Joshi, M., Wang, H. & McClean, S. (2018). Dense semantic graph and its application in single
document summarisation. In C. Lai, A. Giuliani & G. Semeraro (Eds.), Emerging ideas on information
filtering and retrieval: DART 2013: Revised and invited papers (pp. 55–67). Springer International
Publishing.
Gambhir, M., & Gupta, V. (2017). Recent automatic text summarization techniques: A survey.
Artificial Intelligence Review, 47(1), 1–66. https://doi.org/10.1007/s10462-016-9475-9.

Sahoo, D., Balabantaray, R., Phukon, M. & Saikia, S. (2016). Aspect based multidocument
summarization. Paper presented at the 2016 International Conference on Computing, Communication
and Automation (ICCCA).
Mohan, M. J., Sunitha, C., Ganesh, A., & Jaya, A. (2016). A study on ontology based abstractive
summarization. Procedia Computer Science, 87, 32–37. https://doi.org/10.1016/j.procs.2016.05.122.
Bhat, I. K., Mohd, M. & Hashmy, R. (2018). SumItUp: A hybrid single-document text summarizer. In
M. Pant, K. Ray, T. K. Sharma, S. Rawat & A. Bandyopadhyay (Eds.), Soft computing: Theories and
applications: Proceedings of SoCTA 2016, Volume 1 (pp. 619–634). Singapore: Springer Singapore.
Saggion, H. & Lapalme, G. (2002). Generating Indicative-Informative Summaries with SumUM.
Computation of linguistic, vol. 28, no. 4, pp. 497-526.
Fan, W., Wallace, L. & Zhongj, S. R. (2005). Tapping into the Power of Text Mining. Communications
of ACM, Vol. 49, No. 9, pp. 76-82.
Gupta, V. & Lehal, S. L. (2010). A Survey of Text Summarization Extractive Techniques. Journal of
emerging technologies in web intelligence, vol. 2, no.3, pp. 258-268.
Cheung, J. (2008). Computing abstractive and extractive summarization of evaluative text: controversiality and content selection. B.Sc. (Hons.) thesis, University of British Columbia, pp. 1-38.
Nenkova, A. & McKeown, K. (2012). A survey of text summarization techniques. In C. C. Aggarwal
& C. Zhai (Eds.), Mining text data (pp. 43–76). Boston, MA: Springer US.
Zhu, J., Zhou, L., Li, H., Zhang, J., Zhou, Y. & Zong, C. (2017). Augmenting neural sentence
summarization through extractive summarization. Paper presented at the Natural Language
Processing and Chinese Computing, Dalian, China.
Wang, S., Zhao, X., Li, B., Ge, B. & Tang, D. (2017). Integrating extractive and abstractive models for
long text summarization. Paper presented at the 2017 IEEE International Congress on Big Data (BigData Congress).
Erkan, G., & Radev, D. R. (2004). LexRank: Graph-based lexical centrality as salience in text
summarization. Journal of Artificial Intelligence Research, 22, 457–479.
Mehta, P., & Majumder, P. (2018). Effective aggregation of various summarization techniques.
Information Processing & Management, 54(2), 145–158. https://doi.org/10.1016/j.ipm.2017.11.002.
Nazari, N., & Mahdavi, M. A. (2019). A survey on automatic text summarization. Journal of AI and
Data Mining, 7(1), 121–135. https://doi.org/10.22044/jadm.2018.6139.1726.
Al-Sabahi, K., Zhang, Z., Long, J., & Alwesabi, K. (2018). An enhanced latent semantic analysis
approach for Arabic document summarization. Arabian Journal for Science and Engineering.
https://doi.org/10.1007/s13369-018-3286-z.
Mohamed, M., & Oussalah, M. (2019). SRL-ESA-TextSum: A text summarization approach based on
semantic role labeling and explicit semantic analysis. Information Processing & Management, 56(4),
1356–1372. https://doi.org/10.1016/j.ipm.2019.04.003.
Moratanch, N. & Chitrakala, S. (2017). A Survey on Extractive Text Summarization. Paper presented
at the 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP),
Chennai.
Kobayashi, H., Noguchi, M. & Yatsuka, T. (2015). Summarization based on embedding distributions.
Paper presented at the Conference on Empirical Methods in Natural Language Processing, Lisbon,
Portugal.

Chen, L. & Nguyen, M. L. (2019). Sentence selective neural extractive summarization with
reinforcement learning. Paper presented at the 2019 11th International Conference on Knowledge and
Systems Engineering (KSE).
Kumar, A., & Sharma, A. (2019). Systematic literature review of fuzzy logic based text summarization.
Iranian Journal of Fuzzy Systems, 16(5), 45–59. https://doi.org/10.22111/ijfs.2019.4906.
Alami, N., Meknassi, M., & En-nahnahi, N. (2019). Enhancing unsupervised neural networks based
text summarization with word embedding and ensemble learning. Expert Systems with Applications,
123, 195–211. https://doi.org/10.1016/j.eswa.2019.01.037.
Mohd, M., Jan, R., & Shah, M. (2020). Text document summarization using word embedding. Expert
Systems with Applications, 143. https://doi.org/10.1016/j.eswa.2019.112958.
Rahman, A., Rafiq, F. M., Saha, R., Rafian, R. & Arif, H. (2019). Bengali text summarization using
TextRank, fuzzy C-Means and aggregate scoring methods. Paper presented at the 2019 IEEE Region
10 Symposium (TENSYMP).
Tandel, A., Modi, B., Gupta, P., Wagle, S. & Khedkar, S. (2016). Multi-document text summarization
– a survey. Paper presented at the 2016 International Conference on Data Mining and Advanced
Computing (SAPIENCE).
Hou, L., Hu, P. & Bei, C. (2017). Abstractive document summarization via neural model with joint
attention. Paper presented at the Natural Language Processing and Chinese Computing, Dalian,
China.
Mohan, M. J., Sunitha, C., Ganesh, A., & Jaya, A. (2016). A study on ontology based abstractive
summarization. Procedia Computer Science, 87, 32–37. https://doi.org/10.1016/j.procs.2016.05.122.
Al-Abdallah, R.Z., & Al-Taani, A.T. (2017). Arabic single-document text summarization using particle
swarm optimization algorithm. Procedia Computer Science, 117, 30–37.
https://doi.org/10.1016/j.procs.2017.10.091.
Bhat, I. K., Mohd, M. & Hashmy, R. (2018). SumItUp: A hybrid single-document text summarizer. In
M. Pant, K. Ray, T. K. Sharma, S. Rawat & A. Bandyopadhyay (Eds.), Soft computing: Theories and
applications: Proceedings of SoCTA 2016, Volume 1 (pp. 619–634). Singapore: Springer.
Chitrakala, S., Moratanch, N., Ramya, B., Revanth Raaj, C. G. & Divya, B. (2018). Concept-based
extractive text summarization using graph modelling and weighted iterative ranking. In N.R. Shetty,
L.M. Patnaik, N.H. Prasad & N. Nalini (Eds.), Emerging research in computing, information,
communication and applications: ERCICA 2016 (pp.149–160). Singapore: Springer Singapore.
Gupta, S., & Gupta, S. K. (2019). Abstractive summarization: An overview of the state of the art.
Expert Systems with Applications, 121, 49–65. https://doi.org/10.1016/j.eswa.2018.12.011.
Lin, H. & Ng, V. (2019). Abstractive summarization: A survey of the state of the art. Paper presented
at the The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI- 19).
Ganesan, K., Zhai, C., & Han, J. (2010). Opinosis: A graph-based approach to abstractive
summarization of highly redundant opinions. Paper presented at the Proceedings of the 23rd
International Conference on Computational Linguistics, Beijing, China.
J Kurisinkel, L., Zhang, Y. & Varma, V. (2017). Abstractive multi-document summarization by partial
tree extraction, recombination and linearization. Paper presented at the Proceedings of the Eighth
International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Taipei,
Taiwan.

Genest, P.-E. & Lapalme, G. (2012). Fully abstractive approach to guided summarization. Paper
presented at the Proceedings of the 50th Annual Meeting of the Association for Computational
Linguistics: Short Papers – Volume 2, Jeju Island, Korea.
Khan, A., Salim, N., & Jaya Kumar, Y. (2015). A framework for multi-document abstractive
summarization based on semantic role labelling. Applied Soft Computing, 30, 737–747.
https://doi.org/10.1016/j.asoc.2015.01.070.
Hou, L., Hu, P. & Bei, C. (2017). Abstractive document summarization via neural model with joint
attention. Paper presented at the Natural Language Processing and Chinese Computing, Dalian,
China.
Wang, S., Zhao, X., Li, B., Ge, B. & Tang, D. (2017). Integrating extractive and abstractive models for
long text summarization. Paper presented at the 2017 IEEE International Congress on Big Data
(BigData Congress).
Rehurek, R. & Sojka, P. (2010). Software framework for topic modelling with large corpora. Paper presented at the LREC 2010 Workshop on New Challenges for NLP Frameworks. https://radimrehurek.com/gensim/index.html.
Abadi, M., Barham, P., Chen, J., Chen, Z., Zheng, X. (2016). TensorFlow: A system for large-scale machine learning. Paper presented at the Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, Savannah, GA, USA.
Kouris, P., Alexandridis, G. & Stafylopatis, A. (2019). Abstractive text summarization based on deep
learning and semantic content generalization. Paper presented at the Proceedings of the 57th Annual
Meeting of the Association for Computational Linguistics, Florence, Italy.
Dutta, S., Chandra, V., Mehra, K., Ghatak, S., Das, A. K. & Ghosh, S. (2019). Summarizing microblogs
during emergency events: A comparison of extractive summarization algorithms. Paper presented at
the International Conference on Emerging Technologies in Data Mining and Information Security
(IEMIS 2018), Kolkata, India.
Krishnakumari, K. & Sivasankar, E. (2018). Scalable aspect-based summarization in the Hadoop environment. In V. B. Aggarwal, V. Bhatnagar & D. K. Mishra (Eds.), Big data analytics: Proceedings of CSI 2015 (pp. 439–449). Singapore: Springer Singapore.
Hou, L., Hu, P. & Bei, C. (2017). Abstractive document summarization via neural model with joint
attention. Paper presented at the Natural Language Processing and Chinese Computing, Dalian,
China.

