
www.ijcrt.org © 2022 IJCRT | Volume 10, Issue 4 April 2022 | ISSN: 2320-2882

A Review of State-of-the-art Automatic Text Summarisation

Kartik Rathi, Dept. of Computer Science, SET, Sharda University, Greater Noida, India
Saumy Raj, Dept. of Computer Science, SET, Sharda University, Greater Noida, India
Yash Vardan Singh, Dept. of Computer Science, SET, Sharda University, Greater Noida, India
Assoc. Prof. Sudhir Mohan, Dept. of Computer Science, SET, Sharda University, Greater Noida, India

Abstract

Text summarisation falls under the domain of Natural Language Processing (NLP) and entails replacing a long text with a shorter, precise and concise one. Manual text summarising takes a lot of time, effort and money, and it is even unfeasible when there is a lot of text. Much research has been conducted since the 1950s, and researchers are still developing Automatic Text Summarisation (ATS) systems. In the past few years, many text-summarisation algorithms and approaches have been created. In most cases, summarisation algorithms simply turn the input text into a collection of vectors or tokens. The basic objective of this research is to review the different strategies used for text summarising. There are three types of ATS approaches: the extractive, the abstractive and the hybrid text summarisation approach. The first method chooses the relevant statements out of the given input text or document and combines those statements to create the final output summary. The second method converts the input document into an intermediate representation before generating a summary containing phrases that differ from the originals. The hybrid method uses both the extractive and abstractive processes. Despite all of the methodologies presented, the produced summaries still lag behind human-authored summaries. By addressing the various components of ATS approaches, methodologies, techniques, datasets, assessment methods and future research goals, this study provides a thorough review for researchers and novices in the field of NLP.

Keywords: Text summarisation, Abstractive, Extractive, Hybrid, Dataset.

1. Introduction

Summarisation is the process of condensing a long piece of text into a shorter one, reducing the volume of the original text whilst maintaining the important information and content significance. Because human text summarisation is a slow and fundamentally tiresome process, automating it is becoming increasingly popular.

Text summarisation may help with a plethora of NLP tasks, such as text classification, information retrieval, legal text summarisation, mainstream media summarisation and headline creation. Furthermore, the production of summaries might be embedded into these systems as a stage in the process, decreasing the size of the document.

In this era of big data, the abundance of textual data obtainable from diverse sources has increased. To be beneficial, this huge volume of data, which holds a plenitude of knowledge, should be appropriately summed up. The increasing availability of documents necessitates extensive study in the domain of NLP on automatic text summarisation: the process of constructing a concise and vivid summary without the involvement of a human whilst maintaining the original text's meaning.

It is indeed challenging because, in an effort to create a summary of a piece of writing, we usually read it in its entirety to get a clear grasp of it and then compose a summary emphasising its key themes. Automated text summarisation is a difficult and slow process since computers lack human language understanding and cognition.

Automatic text summarisation is a technique for compressing vast amounts of data while holding on to an ingenious elucidation of the entered data. Additionally, the data is structured in such a way that the reader gains a thorough understanding of the huge text. People are turning to the web to obtain the information they need since the use of electronic information is growing every day, and the internet maintains a significant quantity of data nowadays. Because it is impossible for the user to read all of the data, text summarisation is used to condense the data, which is then shown to the user so that it may be easily understood.

Single-document and multi-document summarising systems are the two types of automatic text summarisation systems. The first creates a summary from a single document, while the second does it from a collection of documents. Both are created using either an abstractive, extractive or hybrid method to summarise the text. The extractive technique generates the summary by selecting the most essential sentences from the input material. The abstractive technique transforms the input text into an intermediate form before presenting a summary that includes phrases and words that vary from the original text sentences, whereas the hybrid approach combines both approaches, extractive and abstractive. Section 2 elucidates the various classifications for ATS. Figure 1 shows the architecture of an ATS system, which includes the tasks shown below.

Figure 1: Automatic Text Summarisation System

1. Pre-Processing: Constructing an organised simulacrum of the original text by employing a variety of linguistic approaches such as stop word removal, stemming, sentence segmentation, part-of-speech tagging, tokenisation and so on [1].

2. Processing: Converting the input document or text to the summary using one of the text summarisation approaches, applying one or more techniques. The different types of approaches in Automatic Text Summarisation are delineated in Section 3.

3. Post-Processing: Before creating the final summary, various issues must be resolved in the generated summary sentences, such as reordering the selected sentences and anaphora resolution.

2. Classification

Figure 2: Automatic Text Summarisers Classification

2.1 On the basis of Input Size: The input size refers to the number of source documents used to create the target summary, and it is subdivided into two parts: 1) single-document summarisation and 2) multi-document summarisation. As shown in Figure 1, Single-Document Summarisation (SDS) takes a single text document as input and generates a summary from it, with the goal of shortening the input material while maintaining the key information. The purpose of Multi-Document Summarisation (MDS) is to decrease repeated information in the input documents by generating a summary based on a group of documents taken as input. SDS is less challenging than MDS; MDS has issues including repetition, temporal relatedness, coverage, shrink ratio and so on [2, 3].

2.2 On the basis of Approach of Text Summarisation: Abstractive, extractive and hybrid are the three major categories into which text summarisation is divided. The extractive text summarisation method chooses the crucial statements from the input document provided by the user and then concatenates them into the output summary. In the abstractive text summarisation technique, the document provided by the user is represented in an intermediary representation, and the output is constructed from this; abstractive summaries are made up of statements that are not the same as the source document sentences. The hybrid text summarisation methodology combines the extractive and abstractive processes. Section 3 goes through these techniques in further depth.
2.3 On the basis of Summary Language: There are numerous sorts of text summarising techniques for different languages, which are collectively classified into three major categories: 1) Monolingual - when the source and destination documents are written in the same language, the summarising system is monolingual. 2) Multilingual - when the source information is written in many languages, like English, French and Arabic, and the summary is also produced in these languages, the summarising system is multilingual. 3) Cross-Lingual - when the source content is written in one language, like English, and the summary is written in another, like Arabic or French, the summarising system is cross-lingual [4].

3. Approaches

There are three approaches to automatic text summarisation in general: abstractive, extractive and hybrid.

Figure 3: Techniques for automatic text summarising.
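Before examining each family in detail, the following minimal sketch shows the generic three-stage pipeline of Figure 1 in Python. The frequency-based scorer, the tiny stop word list and all function names are illustrative assumptions standing in for the scoring techniques surveyed below.

```python
# A minimal, self-contained sketch of the generic ATS pipeline from Figure 1.
# The frequency-based scorer is a deliberately simple stand-in for the
# scoring techniques surveyed in Section 3.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on"}

def preprocess(text):
    # Pre-processing: sentence segmentation, tokenisation, stop word removal.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokens = [[w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS]
              for s in sentences]
    return sentences, tokens

def summarise(text, n=2):
    sentences, tokens = preprocess(text)
    freq = Counter(w for words in tokens for w in words)
    # Processing: score each sentence by the frequency of its content words.
    scores = [sum(freq[w] for w in words) for words in tokens]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:n]
    # Post-processing: keep the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))

doc = ("Markets fell sharply today. Investors sold shares as markets fell. "
       "Analysts expect markets to recover. The weather was pleasant.")
print(summarise(doc))
```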

3.1 Abstractive

3.1.1 Proposal - This approach creates summaries containing new statements that were not present in the given input document (the original copy). Abstractive text summarisation algorithms are complex and complicated because they have to understand the input text, find the most relevant passages and generate syntactically correct sentences as the summary. With hand-written rules, such a process is practically impossible. However, recent advancements and research in AI/ML, particularly neural networks, have enabled abstractive summarisation to some extent. Indeed, neural networks are the state of the art in abstractive summarisation today [5].

Figure 4: An abstractive text summarisation architecture

An abstractive text summariser's design is shown in Figure 4. It comprises pre-processing, processing and post-processing tasks, such as 1) creating an internal semantic representation and 2) creating a summary that resembles human-generated summaries by applying natural language generation techniques [6].

Advantages: Based on paraphrasing, compression or fusion, it can employ more flexible expressions and provides better summaries using distinct terms that do not belong to the original text. The produced summary resembles a manual summary more closely. Compared to extractive procedures, abstractive methods can reduce the text even further [7-9].

Disadvantages: It is quite tough to produce a good abstractive summary in practice. Abstractive summarisers that work well are difficult to create since they necessitate natural language generation technology, which is still a growing domain. In order to produce new phrases, the abstractive technique requires a complete comprehension of the given text. The majority of abstractive summarisers generate repeated terms and are unable to handle out-of-vocabulary words adequately. The variety of abstractive summarisers' representations limits their power: systems can't summarise what their representations can't capture [7, 9].

3.1.2 Techniques and Methods

1. Template-Based Methods: Human summaries contain shared phrase forms that may be specified as templates in particular fields (e.g. meeting summaries). The abstractive summary may be generated on the basis of the input text genre by using extracted information to fill the slots of the appropriate predefined template. The text snippets that fill the template slots are determined using extraction rules and linguistic patterns [10, 11].

2. NER Summarisation - NER is an acronym for Named Entity Recognition, a method for recognising and classifying atomic items in text into specific categories, such as people's names, organisation names, places, concepts and so on. NER has been used to date in text summarisation, question answering, text classification and machine translation systems, and in a number of languages. A lot of work, advancement and research has been done in the field of NER for English, where capitalisation provides a crucial indication for rules; however, Indian languages lack such qualities, which makes summarising in Indian languages more difficult [12].
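As a brief illustration of the NER step described above, the sketch below uses the spaCy library, assuming its small English model (en_core_web_sm) is installed; the extracted entities could then be used to favour sentences that mention key people, organisations or places.

```python
# A sketch of named entity extraction with spaCy; the sentence-weighting
# step that a NER-based summariser would add is only hinted at here.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Reuters reported on Monday that Sharda University "
          "opened a new campus in Greater Noida.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. Reuters ORG, Monday DATE, Greater Noida GPE
```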
3. Sequence to sequence RNN - This concept enables sequences from one domain to be transformed into sequences from another domain. The authors began by describing a basic encoder-decoder RNN, which serves as a baseline, before presenting a variety of novel summarisation models. This baseline model resembles a neural machine translation model: the encoder uses a bidirectional GRU-RNN, whilst the decoder uses a unidirectional GRU-RNN with the same hidden-state size as the encoder; words are produced using an attention mechanism over the source hidden states and a softmax layer over the target vocabulary [13, 14].

The large vocabulary 'trick' (LVT) was also adapted and added to this core model for the summarisation problem. This method's major goal is to reduce the size of the decoder's softmax layer, which is the main computational bottleneck. Furthermore, by limiting the modelling effort to only those words that are crucial for a specific example, this strategy speeds up convergence. Because a major part of the words in a summary originate from the original material, this approach is well suited to summarising [15].
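A minimal sketch of such an encoder-decoder, assuming PyTorch, is shown below: a bidirectional GRU encoder, a unidirectional GRU decoder with the same hidden size, and a soft attention over the source states feeding a softmax output layer. The dimensions, names and attention form are illustrative, not those of Nallapati et al. [13].

```python
# A minimal sketch of the bidirectional-encoder / unidirectional-decoder GRU
# architecture described above. Hyperparameters are illustrative.
import torch
import torch.nn as nn

class Seq2SeqSummariser(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional encoder over the source document.
        self.encoder = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        # Unidirectional decoder with the same hidden-state size as the encoder.
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.attn = nn.Linear(hid_dim * 2 + hid_dim, 1)          # attention scorer
        self.out = nn.Linear(hid_dim * 2 + hid_dim, vocab_size)  # softmax layer

    def forward(self, src_ids, tgt_ids):
        enc_states, enc_h = self.encoder(self.embed(src_ids))   # (B, S, 2H)
        # Merge the two encoder directions to initialise the decoder state.
        dec_h = (enc_h[0] + enc_h[1]).unsqueeze(0)              # (1, B, H)
        dec_out, _ = self.decoder(self.embed(tgt_ids), dec_h)   # (B, T, H)
        # Attention over source hidden states for every decoder step.
        B, T, H = dec_out.shape
        S = enc_states.size(1)
        dec_exp = dec_out.unsqueeze(2).expand(B, T, S, H)
        enc_exp = enc_states.unsqueeze(1).expand(B, T, S, enc_states.size(2))
        scores = self.attn(torch.cat([enc_exp, dec_exp], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)                 # (B, T, S)
        context = torch.einsum('bts,bsh->bth', weights, enc_states)
        return self.out(torch.cat([context, dec_out], dim=-1))  # logits over vocab

model = Seq2SeqSummariser(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 40)), torch.randint(0, 5000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 5000])
```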
4. Semantic-Based Methods - These ATS methods use a semantic representation of the input document(s) (like semantic graphs, predicate-argument structures or information items) to build an abstractive summary, which is then fed into a natural language generation engine. One such multi-document abstractive summariser 1) uses SRL to represent the input documents with predicate-argument structures, 2) uses a semantic similarity measure to cluster semantically similar predicate-argument structures across the text, 3) ranks the predicate-argument structures according to attributes that have been weighted and optimised using a Genetic Algorithm and 4) uses these predicate-argument structures to produce phrases via language generation [16, 17].

Table 1: Abstractive Text Summarisation Techniques - Advantages & Disadvantages.

Template-Based Methods [16, 17]
  Advantages: Generates cohesive and explanatory summaries; the template slots may be filled with snippets gathered with the help of information extraction algorithms.
  Disadvantages: Template slots need human creation of extraction rules and linguistic patterns; there is a lack of variation because of the predefined templates.

NER Summarisation [12]
  Advantages: The spaCy library is the fastest and a great fit for practical applications; the Flair library outperforms it and is well suited for experimentation.
  Disadvantages: Because of the uncertainty in language, both the quality and consistency of the annotation are key challenges; challenging on informal text.

Sequence to sequence RNN Summarisation [9]; [18, 19]
  Advantages: Suitable for short sentences.
  Disadvantages: Needs a large volume of structured data for training; RNN-based Seq2Seq models take a long time to train and cannot capture distant dependency links in lengthy sequences.

Semantic-Based Methods [20]
  Advantages: Semantic Role Labelling (SRL) aids in determining the semantic links between sentences' words.
  Disadvantages: The characteristics of the semantic representation of the input text determine the quality of the constructed summary.

3.2 Extractive

3.2.1 Proposal - This algorithm takes bits and snippets of the input text, generally sentences, and combines them to create the summary content. Most extractive summarisers follow the same two phases at a high level: first, give each sentence a score; then choose the N sentences that get the greatest score. The way sentences are scored is the key distinction between individual extraction approaches [21].

Figure 5: An extractive text summarisation architecture
An extractive text summariser's design is shown in Figure 5. It comprises the following tasks:

1) Pre-processing tasks. 2) Post-processing tasks, such as replacing relative temporal expressions with real dates, reordering the obtained paragraphs, substituting pronouns with their antecedents and so on [1]. 3) Processing tasks, which include:

- Creating a proper representation of the incoming text to make analysis easier (for example bag of words (BOW), graphs, N-grams, etc.) [2].
- Sentence scoring/ranking: sentences are ranked depending on their representation in the input document or text [22].
- Withdrawing the top-scored sentences: choosing and conjoining the essential statements from the input document to construct the summary [22, 23]. The length of the created summary is determined by the preferred compression rate, enforced by a length cut-off or threshold, and the selected statements are kept in the same sequence as in the original text [9].

Advantages: The fundamental advantage of extractive approaches is that, no matter how basic the method is, it always provides syntactically accurate statements, even if they do not form a helpful or grammatically perfect summary.

Disadvantages: The extractive methodology is diametrically opposed to the way human specialists compose summaries. The produced extraction summary has the following flaws:

1. Some summary sentences have redundant information [7].
2. Sentences that have been extracted may be lengthier than usual [1].
3. As extractive summaries are picked from numerous input documents, temporal expressions create conflicts in a multi-document setting [1].
4. The retrieved summaries are limited to what the sentences of the actual text can express; as a result, more detailed explanations may be out of their grasp.

3.2.2 Techniques and Methods

1. Bayesian Learning - SUMMARIST, SWESUM and other automated text summary systems have been developed for the English language, but not for single-syllable languages such as Vietnamese, Chinese, Japanese, Mongolian, Thai and other "native" languages of East and Southeast Asia. Many people speak single-syllable languages, which account for more than 60% of all languages spoken on the planet, so processing one-syllable languages is critical. However, it is quite difficult to detect a word or phrase solely by white space, and existing word segmentation techniques do not achieve 100% accuracy. The authors of this research report suggested a text summary approach based on the Naive Bayes algorithm and a subject word set [24, 25]. Naive Bayes categorisation is used in two stages for single-syllable text, training and summarising: in the training phase, the system is trained using data and with the help of people to create a collection of extracted sentences.

2. Fuzzy logic - A typical fuzzy-logic model for Automatic Text Summarisation takes eight features as input for each sentence (length of the sentence, data in numerical form, location of the sentence, title words, thematic words, sentence-to-sentence similarity, proper nouns and term weight) for its basic importance calculation. After these eight feature values are extracted, they go into a Fuzzy Inference System (FIS). According to the investigation, a summary length of roughly 10% of the real text length is appropriate, and the resulting summary consists of phrases extracted in the original sequence [26].

3. Latent semantic analysis - This is a statistical-algebraic approach for detecting hidden semantic patterns in words and sentences. It is an unsupervised method that does not need any prior training or knowledge. LSA gathers information from the context of the input material, such as whether words are used together and whether similar ideas appear in many phrases; the presence of multiple similar phrases in the sentences indicates that they are semantically connected. Words' meanings are determined by the sentences wherein they occur, and sentence meanings are generally determined by word meanings. The mathematical approach of Singular Value Decomposition (SVD) is used to uncover the interconnections among phrases; SVD improves accuracy by modelling word-to-word correlations and minimising noise [27, 28].

Step 1: Forming an input matrix: the input document is formatted as a matrix in which the sentences are represented as columns and the words/characters as rows, so that the computer can easily comprehend and conduct computations on it.

Step 2: Singular Value Decomposition: an algebraic method for modelling word and sentence relationships.

Step 3: Sentence selection: the key sentences are chosen using the SVD findings as well as various methods.

The following sentence selection approaches have been used (a small worked sketch follows this list):

- LSA [29]
- SVD [28]
- Murray et al. (2005) [30]
- Cross method [31]
- Topic method [32]
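A compact sketch of Steps 1-3, assuming only numpy and following the topic-per-singular-vector selection of Gong and Liu [29] in spirit (function names and tokenisation are our own simplifications):

```python
# A minimal sketch of LSA-based sentence selection (Steps 1-3 above).
import numpy as np

def lsa_summarise(sentences, n_pick=2):
    # Step 1: input matrix - rows are words, columns are sentences.
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    A = np.zeros((len(vocab), len(sentences)))
    for j, sent in enumerate(sentences):
        for w in sent.lower().split():
            A[vocab.index(w), j] += 1.0
    # Step 2: SVD models the latent word/sentence relationships.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    # Step 3: for each of the top singular vectors (latent topics), pick the
    # sentence with the largest weight in V^T.
    picked = []
    for k in range(min(n_pick, len(S))):
        j = int(np.argmax(np.abs(Vt[k])))
        if j not in picked:
            picked.append(j)
    return [sentences[j] for j in sorted(picked)]

docs = ["The cat sat on the mat.",
        "Dogs chase cats in the yard.",
        "Stock markets fell sharply today.",
        "Investors sold shares as markets fell."]
print(lsa_summarise(docs))
```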
4. MS Pointer Network - A later analysis from the ML approach is Qian Guo's Multi-Source Pointer network technique. This technique primarily focuses on scoring abstractive summaries using deep learning by predicting word-level inaccuracy in the text as well as semantic inaccuracy; larger weights are assigned to words that are semantically related. For this method's assessment, ROUGE is computed on the Gigaword and Cable News Network (CNN) datasets. Compared with other ML techniques, such as the sequence-to-sequence plus attention baseline as well as Nallapati's abstractive model, the results performed quite well. On the Gigaword dataset the model was superior, scoring 40.21 ROUGE-1, 19.37 ROUGE-2 and 38.29 ROUGE-L. Another test conducted on the CNN dataset yielded 39.93 ROUGE-1, 18.21 ROUGE-2 and 37.39 ROUGE-L, contrasted with systems like the Lead-3 baseline given by Nallapati. The major disadvantage of this type of model is the recurrence of the same statements in the document, so it is mainly prone to the redundancy issue of repeated sentences. Qian Guo suggests adding TF-IDF or RBM to achieve suitable or correct summaries as a direction for future study [13, 33].

5. Rule-Based - In the last ten years, this strategy has become much less prominent in the field of text summarisation. The approach's key benefit is that it can be applied to a basic domain, making rule-based validation relatively straightforward. However, when utilised for a domain with a very high level of complexity, rule-based validation becomes quite difficult: if the system is not able to identify the rules, it cannot produce results. Aside from that, if there are more rules than necessary, the system finds it challenging to sustain the output's performance [34].

6. Maximal marginal importance (MMI) - Current ML studies include the Maximal Marginal Importance (MMI) approach, PSO and a combination of other strategies such as fuzzy logic. The input is a document and the output is an extractive summary. MMI produces summaries that sum up differently by determining the most unique sentences: key sentences are chosen by detecting the repetitive sentences in the input and removing such statements from the text source. Techniques like PSO are used to select the least and most essential features, and fuzzy logic helps to determine the values of factors such as risk and ambiguity, whose endurance rate can easily fluctuate. The output was then tested and verified on the Document Understanding Conference 2002 database and compared with different types of summaries like Sys-19, Sys-30 and MS Word summaries. The results performed better than expected, with recall scoring 0.40 and F-measure scoring 0.422; MMI + PSO + fuzzy is superior to summaries like Sys-30 by an accuracy margin of 0.063. The main disadvantage of this method is the issue of semantic problems. This approach may be extended by labelling the semantic roles in the lexical dataset, and also to multi-document summarisers [35, 36].

7. TF-IDF Technique - The TF-IDF approach is used in text summarising research such as [37-40]. It is one of the algorithms that checks the link between a text and the entire collection of available documents; the major goal is to compute the TF and IDF values [41]. For a single input document, every phrase is treated as a separate document. The frequency of recurrence of a word (T) within a single statement is used to determine how essential that word is to the input (TF). IDF, on the other hand, is a numerical figure that represents how rarely the term (T) appears across sentences. The weightage of a word is much higher if it occurs many times in the document and rarely in the other documents; one finds this by simply multiplying the TF value by the IDF value.
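A rough sketch of TF-IDF sentence scoring as described above, treating each sentence as its own document; the exact weighting (logarithmic IDF, length-normalised TF) is one common variant and is assumed here, not taken from [37-41]:

```python
# TF-IDF sentence scoring: each sentence is treated as its own "document".
import math
import re

def tfidf_sentence_scores(sentences):
    tokenised = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(sentences)
    # Document frequency: in how many sentences does each word occur?
    df = {}
    for words in tokenised:
        for w in set(words):
            df[w] = df.get(w, 0) + 1
    idf = {w: math.log(n / d) for w, d in df.items()}  # rarer -> higher weight
    scores = []
    for words in tokenised:
        # TF: frequency of the term within this sentence.
        tf = {w: words.count(w) / max(len(words), 1) for w in set(words)}
        scores.append(sum(tf[w] * idf[w] for w in tf))
    return scores

sents = ["The economy grew this quarter.",
         "Growth was driven by exports.",
         "The cat slept all day."]
for s, sc in zip(sents, tfidf_sentence_scores(sents)):
    print(f"{sc:.3f}  {s}")
```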
Table 2: Extractive Text Summarisation Techniques - Advantages & Disadvantages

Bayesian Learning [24, 25]
  Advantages: Works efficiently for single-syllable languages.
  Disadvantages: Naive Bayes assumes all predictors to be independent, which is very rare in the real world; this greatly restricts the algorithm's usability in real-life scenarios.

Fuzzy logic [26]; [42, 43]
  Advantages: Tackles the uncertainties in the input.
  Disadvantages: Duplication of the chosen statements in the summary is a negative element which might arise and negatively impact the quality of the summary.

Latent semantic analysis [4]
  Advantages: Creates linguistically linked phrases.
  Disadvantages: SVD takes a long time to compute.

MS Pointer Network [33]
  Advantages: Gives more weight to words with semantic composites.
  Disadvantages: Duplication of sentences in the summary is a flaw in this strategy.

Rule Based [34]
  Advantages: Rules are simple to test and validate.
  Disadvantages: If the system cannot identify the rules, no output is achieved; with too many rules, the system's performance becomes harder to maintain.

Maximal marginal importance (MMI) [35, 36]
  Advantages: Creates summaries with a lot of variety by focusing on the most significant lines.
  Disadvantages: The semantic difficulty is the system's Achilles' heel.

TF-IDF Technique [37, 40]
  Advantages: Aids in extracting the most descriptive phrases from a document and quickly calculating the similarity of two papers.
  Disadvantages: Does not account for text location, semantics, co-occurrences across texts and so on.
3.3 Hybrid

3.3.1 Proposal - This method combines the abstractive and extractive approaches. Figure 6 depicts the hybrid text summariser's typical architecture. It usually includes the phases listed below:

1) Pre-processing phase. 2) Phrase extraction phase (extractive automatic text summarisation), which takes the key sentences out of the input document/text. 3) Creating the final abstractive summary by applying an abstractive approach to the phrases collected in the previous phase. 4) Post-processing: to ensure that the constructed sentences are legitimate, certain basic rules must be applied, such as: a sentence must be at least 3 words long, according to the sentence structure (subject + verb + object); a verb must appear in every sentence; an article (like "a", "an" and "the"), a conjunction (like "and"), a preposition (like "of") or an interrogative word (like "who") should not occur at the end of a sentence [8]; [44, 45].

Figure 6: A hybrid text summarisation architecture

Advantages: It brings together the combined benefits of both the extractive and abstractive techniques; the two approaches work together in a hybrid, increasing the performance of summarisation on a broad level [9].

Disadvantages: The produced summary is a relatively low-grade abstractive summary in comparison to the pure abstractive approach, since it is based on extracts (pieces of text) rather than the original text. The abstractive technique is challenging and needs extensive use of NLP, so researchers are engaging more with the extractive automatic text summarising strategy, which employs a variety of approaches and tactics to provide more reasoned and relevant summaries [4].

3.3.2 Techniques and methods

1. Extractive to Abstractive Methods: This technique first extracts sentences with one of the extractive automatic text summarisation methods and then applies one of the abstractive text summarisation techniques to the recovered statements. Wang et al. suggested the "EA-LTS" hybrid system for the challenge of summarising large texts. The system is divided into two stages: 1) the extraction phase, which applies a graph model to pull out key phrases, and 2) the abstraction phase, which applies pointer and attention mechanisms to create an RNN-based encoder-decoder and produce summaries [9].

2. Pretrained encoders: BERT, an acronym for Bidirectional Encoder Representations from Transformers, is a pre-trained language model framework that has given a boost to a broad range of NLP techniques and methodologies. For both types of summarisation (extractive and abstractive), BERT can provide a full-fledged framework and architecture. It is a unique language representation model that trains through masked language modelling [46].

3. Extractive - After a neural encoder creates sentence representations, a classifier determines which statements should be used in the summary, rearranges them and adds the appropriate grammar. Different models have been used for extractive summarisation [11, 23]; [46, 47]: REFRESH (a reinforcement-learning-based system that is trained by globally maximising the ROUGE measure), LATENT (a latent model that maximises the likelihood of human summaries given a set of phrases), SUMO (which uses structured attention to induce a multi-root dependency-tree representation of the material while predicting the desired summary) and NEUSUM (the most sophisticated extractive summarisation technique, which scores and chooses sentences jointly).

Abstractive - Here the work is regarded as a difficult sequence-to-sequence task. Different models have been employed [10]; [48, 49]: PTGEN (the pointer-generator network, which has a word-copying feature that allows it to copy information from the original input, as well as a coverage feature that keeps track of terms that have already been summarised), DCA (Deep Communicating Agents, models trained using reinforcement learning) and DRM (the deep reinforced model for abstractive summarisation, which tackles the coverage problem by adopting an intra-attention strategy in which the decoder pays attention to previously produced words).
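A conceptual sketch of the extract-then-abstract idea follows: a crude frequency-based extractor stands in for EA-LTS's graph model, and the Hugging Face transformers summarisation pipeline (assumed installed, with its default pretrained model) stands in for the pointer/attention decoder of [9].

```python
# Extract-then-abstract sketch: prune the document extractively, then let an
# off-the-shelf abstractive model rewrite the extract.
from transformers import pipeline

def extract_top_sentences(text, k=3):
    # Crude extractive phase: score sentences by the corpus frequency of
    # their words (a placeholder for EA-LTS's graph-based extractor).
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    words = text.lower().split()
    freq = {w: words.count(w) for w in set(words)}
    top = set(sorted(sentences,
                     key=lambda s: -sum(freq.get(w, 0) for w in s.lower().split()))[:k])
    return '. '.join(s for s in sentences if s in top) + '.'

article = ("Markets fell sharply today after weak export data. "
           "Investors sold shares across every major index. "
           "Analysts expect markets to stay volatile this week. "
           "Meanwhile, the weather in the capital was pleasant.")

abstractor = pipeline("summarization")   # abstraction phase (downloads a default model)
extract = extract_top_sentences(article)
print(abstractor(extract, max_length=40, min_length=10)[0]["summary_text"])
```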

Table 3: Hybrid Text Summarisation Techniques - Advantages & Disadvantages.

Extractive to Abstractive Methods
  Advantages: Both approaches are used to increase the quality of the précis generated [50].
  Disadvantages: The extraction process used in the first step has considerable influence on the final outcome [51].

Pretrained encoders
  Advantages: The most effective summariser; outperforms RNNs.
  Disadvantages: This strategy may not perform well when the input material is rather long and the compression ratio is quite low, since it may result in summaries that are devoid of context.

4. Text summarisation datasets

Datasets in standard form - The authors of [52] have presented a conspectus of several corpora that have been utilised in the summarisation task.

1. DUC: DUC is an acronym for Document Understanding Conference. These datasets are the most frequent and generally utilised in text summarising analysis. There are three types of summaries in each dataset: first, summaries which are manually created; second, baseline summaries which are automatically generated; and lastly, summaries supplied by challenge participants' systems, which are also automatically generated. Although these datasets are frequently used to evaluate Automatic Text Summarisation, they are insufficient for training neural network models [53].

2. SummBank Dataset: It includes 40 news clusters, human-authored non-extractive summaries, three hundred and sixty multi-document extracts and more than 2 million multi-document and single-document extracts made using machine and manual methodologies [54].

3. Computer-Aided Summarisation Tool (CAST) Corpus: It includes a selection of newswire texts from the Reuters Corpus plus various science texts from the British National Corpus. After a deal was signed with Reuters, the textual data of the news section of the corpus became obtainable, but the rest of it is not. Three different sorts of annotations are given in the corpus: sentence significance, sentence linkages and text fragments that may be extracted from the marked statements. If a statement is not annotated, it is considered insignificant.

4. CNN-corpus: It can be used for information retrieval from a single document. The source texts, word highlights and gold-standard summaries are all included. Not long ago this corpus was utilised in the "DocEng'19" competition [55].

5. Gigaword 5: A well-known dataset for abstractive summarisation studies. It has roughly 10 million English news articles, making it perfect for neural network training and testing. Gigaword has been criticised for summaries that simply reproduce the headlines [13, 53].

6. CNN/Daily Mail Corpus: An English-language dataset with a little over 300,000 distinct news stories written by CNN and Daily Mail journalists. It was first used for a passage-based question-and-answer task and afterwards was widely used to test Automatic Text Summarisation.

To summarise, given that the bulk of the available data focuses on the news domain, more datasets are required that 1) cover non-English languages and 2) include diverse data domains for all language families.

The following characteristics are defined for each dataset in Table 4: 1) name of the dataset, 2) language of the dataset, 3) domain of the dataset, 4) whether it allows single-document summarisation and 5) whether it allows multi-document summarisation.


Table 4: Datasets Used for Text Summarisation.

Name of the Dataset | Language of the input document | Domain of the dataset | Allows Single-Document summarisation | Allows Multi-Document summarisation
DUC 2002 | English | News | ✔ | ✔
Turkish Dataset | Turkish | News | ✔ | ✖
XSum | English | News | ✔ | ✖
CNN/Daily Mail news highlights dataset | English | News | ✔ | ✖
Gigaword 5 | English | News | ✔ | ✖
SummBank | Chinese, English | News | ✔ | ✔
CAST | English | News | ✔ | ✖
CNN-corpus | English | News | ✔ | ✖
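For experiments, several of the corpora in Table 4 are available in packaged form; the sketch below assumes the Hugging Face datasets package, where the CNN/Daily Mail corpus is published under the name and version shown.

```python
# Loading the CNN/Daily Mail corpus from Table 4 via Hugging Face datasets.
from datasets import load_dataset

cnn_dm = load_dataset("cnn_dailymail", "3.0.0", split="validation")
sample = cnn_dm[0]
print(sample["article"][:300])   # source news story
print(sample["highlights"])      # gold-standard summary
```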

5. Evaluation Metrics

Many attempts have been made during the last two decades to resolve the concerns related to summary evaluation. According to Huang et al., the following objectives need to be addressed while creating a shortened and understandable summary:

1. Coverage of Information: the generated summary must include all of the key points from the given input material(s).

2. Importance of Information: the summary should include the most essential topics of the input material(s), which can either be the major or central topics or user-preferred topics.

3. Redundancy of Information: the summary should reduce the amount of information that is frequently redundant or duplicated.

4. Text Cohesion: the summary is not merely a collection of essential but disparate sentences or words; it should be written in a way that is both legible and clear.

The resulting output summaries are evaluated using two different methods. The first is intrinsic methods, where human judgment is used to assess summary quality; the intrinsic assessment analyses a summary's consistency and content coverage, as well as its informativeness. The second is extrinsic methods, which use a task-based performance measure to assess the summary quality; the extrinsic evaluation mainly determines how useful the summaries are in a certain application setting [1, 56].

There are two ways to evaluate text summarisation: manual and automatic. In the context of text summarising research, evaluation is a very difficult problem to solve. To examine the quality of the Automatic Text Summarisers that created them, the automatically generated summaries must be assessed. The performance of an Automatic Text Summariser is frequently compared to other benchmark systems, such as the leading sentences of the input material or standard text summarisers like LexRank, TextRank and more [56-58].

5.1 Manual Summary Evaluation - Human judges may be asked to assess computer-generated summaries using the quality points listed below [56, 59]:

Readability: Evaluate the language quality of the generated summary by looking for extra spaces in its verbal structure or dangling anaphora.

Grammaticality: The generated summary should not contain any improper statements or capitalisation errors that conflict with grammar norms.
Referential Clarity: The reader should be able to identify the referent of a noun phrase as soon as it appears in the generated summary.

Coverage of content: The generated summary must encompass all of the subjects mentioned in the input material(s).

Structure and Coherence: The generated summary should be arranged properly and well-constructed, made up of a series of related and cohesive statements.

Non-redundancy: The generated summary should not contain any repetitions.

5.2 Automatic Summary Evaluation - Here we look at some of the most commonly used evaluation metrics in the literature [21].

1. Precision Score Metric: This is calculated by taking the number of sentences appearing in both the reference and candidate summaries and dividing it by the total number of sentences in the candidate summary, as shown in Eq. 1.

Precision = |Sref ∩ Scand| / |Scand|   (1)

2. Recall Score Metric: As shown in Equation 2, it is calculated by taking the number of sentences appearing in both the reference and candidate summaries and dividing it by the total number of sentences in the reference summary.

Recall = |Sref ∩ Scand| / |Sref|   (2)

3. F-Measure Score Metric: As shown in Equation 3, the F-measure is the harmonic mean of recall and precision, a measure that combines both metrics.

F-Measure = 2 × (Precision × Recall) / (Precision + Recall)   (3)

4. ROUGE Metric: This is the most trusted and widely used instrument in NLP for the unmanned evaluation of automatically generated summaries. It counts the number of overlapping units between the candidate summaries and the reference summaries. It has been shown to be useful in testing the accuracy of a model and assessing the quality of summaries, and has a good correlation with human judgments [3, 9].

ROUGE evaluation has been regarded as a standard for assessing generated summaries and testing the accuracy of a summarising model since its inception; however, it has the major drawback of just matching strings between the summaries without taking into account the meaning of single words or series of words (n-grams).

The problem with human judgment is that it is subjective, with a broad range of what constitutes an "excellent" summary. This discrepancy suggests that developing an automated review and analysis system is complex and time-consuming. To decrease the expense of review, summaries created by an Automatic Text Summariser are examined using automated metrics. The automated assessment measures, on the other hand, still require human effort since they rely on comparing the system-generated summaries with one or more human-created model summaries [21, 56].
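The sketch below implements Eqs. 1-3 over summaries treated as sets of sentences, together with a toy n-gram overlap in the spirit of ROUGE-N; it is an illustration, not the official ROUGE toolkit.

```python
# Sentence-set precision/recall/F-measure (Eqs. 1-3) and a toy ROUGE-N recall.
def precision_recall_f(reference, candidate):
    ref, cand = set(reference), set(candidate)
    overlap = len(ref & cand)
    precision = overlap / len(cand)                      # Eq. 1
    recall = overlap / len(ref)                          # Eq. 2
    f = 2 * precision * recall / (precision + recall) if overlap else 0.0  # Eq. 3
    return precision, recall, f

def rouge_n_recall(ref_tokens, cand_tokens, n=2):
    # Count overlapping n-grams, normalised by the reference n-gram count.
    ngrams = lambda t: {tuple(t[i:i + n]) for i in range(len(t) - n + 1)}
    ref_ngrams = ngrams(ref_tokens)
    return len(ref_ngrams & ngrams(cand_tokens)) / max(len(ref_ngrams), 1)

ref = ["markets fell sharply", "investors sold shares"]
cand = ["markets fell sharply", "the cat slept"]
print(precision_recall_f(ref, cand))   # (0.5, 0.5, 0.5)
print(rouge_n_recall("markets fell sharply today".split(),
                     "markets fell today".split()))
```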
6. Applications

6.1 CV or Resume Summarisation: CV summarisation plays a major role in condensing a CV document to only the required information, like qualifications, marks, skills, experience, projects done and other useful information about the candidate.

6.2 News Summarisation: Newsblaster is a text summariser that assists readers in locating the most relevant news. This system gathers, clusters, categorises and summarises news automatically from many different sites on a daily basis.

6.3 Summarisation of Scientific Papers: These publications are well organised, with a template-like structure and predictable positions of typical components in the content. Mining the pattern of citations is one way to get citation information, and relationships between citations, as well as summarisation approaches that recognise the material's content in both the citing and cited publications, can be employed.

6.4 Legal Documents Summarisation: To save legal professionals' time, Kavila et al. presented an automated legal document search system. The summarising task highlights the rhetorical functions of the phrases presenting legal judgments; the search task discovers relevant historical cases based on the legal question. The hybrid system employs a variety of techniques, including keyword or key-phrase matching procedures as well as a case-based strategy [60].

7. Challenges

7.1 Related to Text Summarisation Applications: Most of the older systems are focused on specific applications such as online reviews, text news and so on. Now is the time to concentrate on the most difficult applications, such as extended text, novel and book summaries.

7.2 Related to Multi-Document Summarisation: Redundancy, rearranging the sentences and co-reference are among the challenges that multi-document summarisation faces; it can result in improper references [4].

7.3 Related to the Input Document's Length or Size: The majority of Automatic Text Summarisers are designed to handle short text documents. Existing ATS approaches may perform well when summarising small texts, but they perform poorly when summarising large texts.

7.4 Related to the Languages Supported: The majority of Automatic Text Summarisers focus mainly on English-language material. The quality of current Automatic Text Summarisers for many more languages needs to be enhanced [61].

7.5 Related to Text Summarisation Using Deep Learning: RNNs in sequence-to-sequence systems require large-scale, well-organised training data during the summary generation phase. In actual NLP applications, the requisite training data is not always accessible. Building an Automatic Text Summariser with a very small quantity of training data by combining it with classic NLP approaches such as syntactic, grammatical and semantic analysis is an interesting research issue.

8. Conclusion and future work

The main purpose of this paper is to present the latest study, progress and research made in this field to date. We have discussed mainly the abstractive, extractive and hybrid summarisation techniques and their related advantages and disadvantages. Extractive summaries still top the current popular trends in this research, even though they are far simpler than the most complicated abstractive summaries. This is because more study is required and many questions remain unanswered in the abstractive summarisation process, a hurdle that researchers must overcome. It can also be seen that semantics, similarity, sentence position, sentence length, frequency, keywords and necessity of presence are the most essential variables in making a good or clean summary. The different datasets used for summarisation and the evaluation of the generated summary are also important parts of this study.

Future work in this field of textual summary research could include: i) solving feature-related problems, such as picking the features to employ in data summarisation, discovering more appropriate features, uncovering new features, curating the most often utilised features, using a variety of semantic features, finding the best factors to produce coherent sentences and adding system elements; ii) pre-processing the database problem with the right title; otherwise, POS tagging is necessary to prevent word deletion and create tokens, and this is done to distinguish word categories such as nouns, adjectives, verbs and so on; iii) summing up the mathematical methodologies - ML and fuzzy-based methods are the most difficult to try in extractive summaries; iv) enhancing the current methodologies, such as NATSUM in some circumstances, or increasing NATSUM performance by boosting compliance, by using abstractive summaries; v) unusual datasets, such as legal papers, tourism attraction summaries and inspection document summaries [62].

REFERENCES 10619, X. Huang, J. Jiang, D. Zhao, Y. Feng and Y. Hong,


Eds. Cham: Springer International Publishing, 2018, pp. 329–
[1] V. Gupta and G. S. Lehal, “A Survey of Text 338. doi: 10.1007/978-3-319-73618-1_28.
Summarisation Extractive Techniques,” J. Emerg. Technol.
Web Intell., vol. 2, no. 3, pp. 258–268, Aug. 2010, [8] R. Sun, Z. Wang, Y. Ren and D. Ji, “Query-Biased Multi-
doi:10.4304/jetwi.2.3.258-268. document Abstractive Summarisation via Submodular
Maximization Using Event Guidance,” in Web-Age
[2] M. Joshi, H. Wang and S. McClean, “Dense Semantic Information Management, vol. 9658, B. Cui, N. Zhang, J. Xu,
Graph and Its Application in Single Document X. Lian and D. Liu, Eds. Cham: Springer International
Summarisation,” in Emerging Ideas on Information Filtering Publishing, 2016, pp. 310–322. doi: 10.1007/978-3-
and Retrieval, vol. 746, C. Lai, A. Giuliani and G. Semeraro, 319399379_24.
Eds. Cham: Springer International Publishing, 2018, pp. 55–
67. doi: 10.1007/978-3-319-68392-8_4. [9] S. Wang, X. Zhao, B. Li, B. Ge and D. Tang, “Integrating
Extractive and Abstractive Models for Long Text
[3] V. K. Gupta and T. J. Siddiqui, “Multi-document Summarisation,” in 2017 IEEE International Congress on Big
summarisation using sentence clustering,” in 2012 4th Data (Big Data Congress),Honolulu, HI, USA, Jun.2017,
International Conference on Intelligent Human Computer pp.305–312. doi: 10.1109/BigDataCongress.2017.46.
Interaction (IHCI), Kharagpur, India, Dec. 2012, pp. 1–5. doi:
10.1109/IHCI.2012.6481826. [10] R. Paulus, C. Xiong and R. Socher, “A Deep Reinforced
Model for Abstractive Summarisation,” ArXiv170504304
[4] M. Gambhir and V. Gupta, “Recent automatic text Cs, Nov. 2017, Accessed: Mar. 05, 2022. [Online]. Available:
summarisation techniques: a survey,” Artif. Intell. Rev., vol. http://arxiv.org/abs/1705.04304
47, no. 1, pp. 1–66, Jan. 2017, doi: 10.1007/s10462-
01694759. [11] S. Narayan, S. B. Cohen and M. Lapata, “Don’t Give Me
the Details, Just the Summary! Topic-Aware Convolutional
[5] N. Moratanch and S. Chitrakala, “A survey on abstractive Neural Networks for Extreme Summarisation,” in
text summarisation,” in 2016 International Conference on Proceedings of the 2018 Conference on Empirical Methods
Circuit, Power and Computing Technologies (ICCPCT), in Natural Language Processing, Brussels, Belgium, 2018,
Nagercoil, India, Mar. 2016, pp. 1–7. doi: pp. 1797–1807. doi: 10.18653/v1/D18-1206.
10.1109/ICCPCT.2016.7530193.
[12] P. Marek, Š. Müller, J. Konrád, P. Lorenc, J. Pichl and J.
[6] S. Chitrakala, N. Moratanch, B. Ramya, C. G. Revanth Šedivý, “Text Summarisation of Czech News Articles Using
Raaj and B. Divya, “ConceptBased Extractive Text Named Entities,” 2021, doi: 10.48550/ARXIV.2104.10454.
Summarisation Using Graph Modelling and Weighted
Iterative Ranking,” in Emerging Research in Computing, [13] R. Nallapati, B. Zhou, C. dos Santos, C. Gulcehre and B.
Information, Communication and Applications, N. R. Shetty, Xiang, “Abstractive Text Summarisation using Sequence-to-
L. M. Patnaik, N. H. Prasad and N. Nalini, Eds. Singapore: sequence RNNs and Beyond,” in Proceedings of The 20th
Springer Singapore, 2018, pp. 149–160. doi: 10.1007/978- SIGNLL Conference on Computational Natural Language
981-10-4741-1_14. Learning, Berlin, Germany, 2016, pp. 280–290. doi:
10.18653/v1/K16-1028.
[7] L. Hou, P. Hu and C. Bei, “Abstractive Document
Summarisation via Neural Model with Joint Attention,” in [14] J. Chung, C. Gulcehre, K. Cho and Y. Bengio,
Natural Language Processing and Chinese Computing, vol. “Empirical Evaluation of Gated Recurrent Neural Networks

IJCRT2204522 International
Electronic Journal of Creative
copy available Research Thoughts (IJCRT) www.ijcrt.org
at: https://ssrn.com/abstract=4107774 e538
www.ijcrt.org © 2022 IJCRT | Volume 10, Issue 4 April 2022 | ISSN: 2320-2882
on Sequence Modeling,” ArXiv14123555 Cs, Dec. 2014, [24] P. K. P. Mok, “Language-specific realizations of syllable
Accessed: Mar. 08, 2022. [Online]. Available: structure and vowel-to-vowel coarticulation,” J. Acoust. Soc.
http://arxiv.org/abs/1412.3555 Am., vol. 128, no. 3, p. 1346, 2010, doi: 10.1121/1.3466859.

[15] I. Gibadullin and A. Valeev, "Experiments with LVT and FRE for Transformer model," arXiv:2004.12495 [cs], Apr. 2020, Accessed: Mar. 05, 2022. [Online]. Available: http://arxiv.org/abs/2004.12495

[16] V. Gupta, N. Bansal and A. Sharma, "Text Summarisation for Big Data: A Comprehensive Survey," in International Conference on Innovative Computing and Communications, vol. 56, S. Bhattacharyya, A. E. Hassanien, D. Gupta, A. Khanna and I. Pan, Eds. Singapore: Springer Singapore, 2019, pp. 503–516. doi: 10.1007/978-981-13-2354-6_51.

[17] A. Khan, N. Salim and H. Farman, "Clustered genetic semantic graph approach for multi-document abstractive summarisation," in 2016 International Conference on Intelligent Systems Engineering (ICISE), Islamabad, Pakistan, Jan. 2016, pp. 63–70. doi: 10.1109/INTELSE.2016.7475163.

[18] T. Cai, M. Shen, H. Peng, L. Jiang and Q. Dai, "Improving Transformer with Sequential Context Representations for Abstractive Text Summarisation," in Natural Language Processing and Chinese Computing, vol. 11838, J. Tang, M.-Y. Kan, D. Zhao, S. Li and H. Zan, Eds. Cham: Springer International Publishing, 2019, pp. 512–524. doi: 10.1007/978-3-030-32233-5_40.

[19] W. Miao, G. Zhang, Y. Bai and D. Cai, "Improving Accuracy of Key Information Acquisition for Social Media Text Summarisation," in 2019 IEEE International Conferences on Ubiquitous Computing & Communications (IUCC) and Data Science and Computational Intelligence (DSCI) and Smart Computing, Networking and Services (SmartCNS), Shenyang, China, Oct. 2019, pp. 408–415. doi: 10.1109/IUCC/DSCI/SmartCNS.2019.00094.

[20] N. S. Ranjitha and J. S. Kallimani, "Abstractive multi-document summarisation," in 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, Sep. 2017, pp. 1690–1694. doi: 10.1109/ICACCI.2017.8126086.

[21] N. Moratanch and S. Chitrakala, "A survey on extractive text summarisation," in 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), Chennai, India, Jan. 2017, pp. 1–6. doi: 10.1109/ICCCSP.2017.7944061.

[22] A. Nenkova and K. McKeown, "A Survey of Text Summarisation Techniques," in Mining Text Data, C. C. Aggarwal and C. Zhai, Eds. Boston, MA: Springer US, 2012, pp. 43–76. doi: 10.1007/978-1-4614-3223-4_3.

[23] J. Zhu, L. Zhou, H. Li, J. Zhang, Y. Zhou and C. Zong, "Augmenting Neural Sentence Summarisation Through Extractive Summarisation," in Natural Language Processing and Chinese Computing, vol. 10619, X. Huang, J. Jiang, D. Zhao, Y. Feng and Y. Hong, Eds. Cham: Springer International Publishing, 2018, pp. 16–28. doi: 10.1007/978-3-319-73618-1_2.

[25] T. Nomoto, "Bayesian learning in text summarisation," in Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing - HLT '05, Vancouver, British Columbia, Canada, 2005, pp. 249–256. doi: 10.3115/1220575.1220607.

[26] A. Kumar and A. Sharma, "Systematic literature review of fuzzy logic based text summarisation," Iran. J. Fuzzy Syst., vol. 16, no. 5, Oct. 2019, doi: 10.22111/ijfs.2019.4906.

[27] M. G. Ozsoy, F. N. Alpaslan and I. Cicekli, "Text summarisation using Latent Semantic Analysis," J. Inf. Sci., vol. 37, no. 4, pp. 405–417, Aug. 2011, doi: 10.1177/0165551511408848.

[28] J. Steinberger and K. Ježek, "Text Summarisation and Singular Value Decomposition," in Advances in Information Systems, vol. 3261, T. Yakhno, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 245–254. doi: 10.1007/978-3-540-30198-1_25.

[29] Y. Gong and X. Liu, "Generic text summarisation using relevance measure and latent semantic analysis," in Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '01, New Orleans, Louisiana, United States, 2001, pp. 19–25. doi: 10.1145/383952.383955.

[30] G. Murray, S. Renals and J. Carletta, "Extractive summarisation of meeting recordings," in Interspeech 2005, Sep. 2005, pp. 593–596. doi: 10.21437/Interspeech.2005-59.

[31] E. Linhares Pontes, S. Huet, J.-M. Torres-Moreno and A. C. Linhares, "Cross-Language Text Summarisation Using Sentence and Multi-Sentence Compression," in Natural Language Processing and Information Systems, vol. 10859, M. Silberztein, F. Atigui, E. Kornyshova, E. Métais and F. Meziane, Eds. Cham: Springer International Publishing, 2018, pp. 467–479. doi: 10.1007/978-3-319-91947-8_48.

[32] N. K. Nagwani, "Summarising large text collection using topic modeling and clustering based on MapReduce framework," J. Big Data, vol. 2, no. 1, p. 6, Dec. 2015, doi: 10.1186/s40537-015-0020-5.

[33] Q. Guo, J. Huang, N. Xiong and P. Wang, "MS-Pointer Network: Abstractive Text Summary Based on Multi-Head Self-Attention," IEEE Access, vol. 7, pp. 138603–138613, 2019, doi: 10.1109/ACCESS.2019.2941964.

[34] M. A. P. Subali and C. Fatichah, "Kombinasi Metode Rule-Based dan N-Gram Stemming untuk Mengenali Stemmer Bahasa Bali [Combination of Rule-Based and N-Gram Stemming Methods for a Balinese Stemmer]," J. Teknol. Inf. Dan Ilmu Komput., vol. 6, no. 2, p. 219, Feb. 2019, doi: 10.25126/jtiik.2019621105.

[35] F. B. Goularte, S. M. Nassar, R. Fileto and H. Saggion, "A text summarisation method based on fuzzy rules and applicable to automated assessment," Expert Syst. Appl., vol. 115, pp. 264–275, Jan. 2019, doi: 10.1016/j.eswa.2018.07.047.
[36] O.-M. Foong and A. Oxley, "A hybrid PSO model in Extractive Text Summariser," in 2011 IEEE Symposium on Computers & Informatics, Kuala Lumpur, Malaysia, Mar. 2011, pp. 130–134. doi: 10.1109/ISCI.2011.5958897.

[37] R. Khan, Y. Qian and S. Naeem, "Extractive based Text Summarisation Using KMeans and TF-IDF," Int. J. Inf. Eng. Electron. Bus., vol. 11, no. 3, pp. 33–44, May 2019, doi: 10.5815/ijieeb.2019.03.05.

[38] A. M. Azmi and N. I. Altmami, "An abstractive Arabic text summariser with user controlled granularity," Inf. Process. Manag., vol. 54, no. 6, pp. 903–921, Nov. 2018, doi: 10.1016/j.ipm.2018.06.002.

[39] M. Yousefi-Azar and L. Hamey, "Text summarisation using unsupervised deep learning," Expert Syst. Appl., vol. 68, pp. 93–105, Feb. 2017, doi: 10.1016/j.eswa.2016.10.017.

[40] A. Alzuhair and M. Al-Dhelaan, "An Approach for Combining Multiple Weighting Schemes and Ranking Methods in Graph-Based Multi-Document Summarisation," IEEE Access, vol. 7, pp. 120375–120386, 2019, doi: 10.1109/ACCESS.2019.2936832.

[41] S. Robertson, "Understanding inverse document frequency: on theoretical arguments for IDF," J. Doc., vol. 60, no. 5, pp. 503–520, Oct. 2004, doi: 10.1108/00220410410560582.

[42] N. Nazari and M. A. Mahdavi, "A survey on Automatic Text Summarisation," J. AI Data Min., no. Online First, May 2018, doi: 10.22044/jadm.2018.6139.1726.

[43] D. Patel, S. Shah and H. Chhinkaniwala, "Fuzzy logic based multi document summarisation with improved sentence scoring and redundancy removal technique," Expert Syst. Appl., vol. 134, pp. 167–177, Nov. 2019, doi: 10.1016/j.eswa.2019.05.045.

[44] I. K. Bhat, M. Mohd and R. Hashmy, "SumItUp: A Hybrid Single-Document Text Summariser," in Soft Computing: Theories and Applications, vol. 583, M. Pant, K. Ray, T. K. Sharma, S. Rawat and A. Bandyopadhyay, Eds. Singapore: Springer Singapore, 2018, pp. 619–634. doi: 10.1007/978-981-10-5687-1_56.

[45] E. Lloret, M. T. Romá-Ferri and M. Palomar, "COMPENDIUM: A text summarisation system for generating abstracts of research papers," Data Knowl. Eng., vol. 88, pp. 164–175, Nov. 2013, doi: 10.1016/j.datak.2013.08.005.

[46] Y. Liu, I. Titov and M. Lapata, "Single Document Summarisation as Tree Induction," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, 2019, pp. 1745–1755. doi: 10.18653/v1/N19-1173.

[47] X. Zhang, M. Lapata, F. Wei and M. Zhou, "Neural Latent Extractive Document Summarisation," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2018, pp. 779–784. doi: 10.18653/v1/D18-1088.

[48] A. See, P. J. Liu and C. D. Manning, "Get To The Point: Summarisation with Pointer-Generator Networks," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, 2017, pp. 1073–1083. doi: 10.18653/v1/P17-1099.

[49] A. Celikyilmaz, A. Bosselut, X. He and Y. Choi, "Deep Communicating Agents for Abstractive Summarisation," in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, 2018, pp. 1662–1675. doi: 10.18653/v1/N18-1150.

[50] R. Sahba, N. Ebadi, M. Jamshidi and P. Rad, "Automatic Text Summarisation Using Customizable Fuzzy Features and Attention on the Context and Vocabulary," in 2018 World Automation Congress (WAC), Stevenson, WA, Jun. 2018, pp. 1–5. doi: 10.23919/WAC.2018.8430483.

[51] H. Li, J. Zhu, C. Ma, J. Zhang and C. Zong, "Read, Watch, Listen and Summarize: Multi-Modal Summarisation for Asynchronous Text, Image, Audio and Video," IEEE Trans. Knowl. Data Eng., vol. 31, no. 5, pp. 996–1009, May 2019, doi: 10.1109/TKDE.2018.2848260.

[52] F. F. dos Santos, M. A. Domingues, C. V. Sundermann, V. O. de Carvalho, M. F. Moura and S. O. Rezende, "Latent association rule cluster based model to extract topics for classification and recommendation applications," Expert Syst. Appl., vol. 112, pp. 34–60, Dec. 2018, doi: 10.1016/j.eswa.2018.06.021.

[53] H. Lin and V. Ng, "Abstractive Summarisation: A Survey of the State of the Art," Proc. AAAI Conf. Artif. Intell., vol. 33, pp. 9815–9822, Jul. 2019, doi: 10.1609/aaai.v33i01.33019815.

[54] D. R. Radev et al., "Evaluation challenges in large-scale document summarisation," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - ACL '03, Sapporo, Japan, 2003, vol. 1, pp. 375–382. doi: 10.3115/1075096.1075144.

[55] R. D. Lins et al., "The CNN-Corpus: A Large Textual Corpus for Single-Document Extractive Summarisation," in Proceedings of the ACM Symposium on Document Engineering 2019, Berlin, Germany, Sep. 2019, pp. 1–10. doi: 10.1145/3342558.3345388.

[56] E. Lloret, L. Plaza and A. Aker, "The challenging task of summary evaluation: an overview," Lang. Resour. Eval., vol. 52, no. 1, pp. 101–148, Mar. 2018, doi: 10.1007/s10579-017-9399-2.

[57] D. R. Radev, H. Jing, M. Styś and D. Tam, "Centroid-based summarisation of multiple documents," Inf. Process. Manag., vol. 40, no. 6, pp. 919–938, Nov. 2004, doi: 10.1016/j.ipm.2003.10.006.

[58] R. Mihalcea, "Graph-based ranking algorithms for sentence extraction, applied to text summarisation," in Proceedings of the ACL 2004 Interactive Poster and Demonstration Sessions, Barcelona, Spain, 2004, pp. 20-es. doi: 10.3115/1219044.1219064.

[59] I. Mani, Automatic Summarisation, vol. 3. Amsterdam: John Benjamins Publishing Company, 2001. doi: 10.1075/nlp.3.
[60] S. D. Kavila, V. Puli, G. S. V. Prasada Raju and R. Bandaru, "An Automatic Legal Document Summarisation and Search Using Hybrid System," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199, S. C. Satapathy, S. K. Udgata and B. N. Biswal, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 229–236. doi: 10.1007/978-3-642-35314-7_27.

[61] R. Belkebir and A. Guessoum, "Concept generalization and fusion for abstractive sentence generation," Expert Syst. Appl., vol. 53, pp. 43–56, Jul. 2016, doi: 10.1016/j.eswa.2016.01.007.

[62] C. Barros, E. Lloret, E. Saquete and B. Navarro-Colorado, "NATSUM: Narrative abstractive summarisation through cross-document timeline generation," Inf. Process. Manag., vol. 56, no. 5, pp. 1775–1793, Sep. 2019, doi: 10.1016/j.ipm.2019.02.010.