
Proceedings of the Sixth International Conference on Inventive Computation Technologies [ICICT 2021]

IEEE Xplore Part Number: CFP21F70-ART; ISBN: 978-1-7281-8501-9

Natural Language Processing (NLP) based Text Summarization - A Survey

Ishitva Awasthi
Information Technology, SVKM's NMIMS, MPSTME, Shirpur, India
[email protected]

Kuntal Gupta
Information Technology, SVKM's NMIMS, MPSTME, Shirpur, India
[email protected]

Prabjot Singh Bhogal
Information Technology, SVKM's NMIMS, MPSTME, Shirpur, India
[email protected]

Sahejpreet Singh Anand
Information Technology, SVKM's NMIMS, MPSTME, Shirpur, India
[email protected]

Prof. Piyush Kumar Soni
Department of Information Technology, SVKM's NMIMS, MPSTME, Shirpur, India
[email protected]

2021 6th International Conference on Inventive Computation Technologies (ICICT) | 978-1-7281-8501-9/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICICT50816.2021.9358703

Abstract— The size of data on the Internet has risen in an exponential manner over the past decade. Thus, the need emerges for a solution that transforms this vast raw information into useful information which a human brain can understand. One common technique in research that helps in dealing with enormous data is text summarization. Automatic summarization is a renowned approach used to reduce a document to its main ideas. It operates by preserving substantial information while creating a shortened version of the text. Text summarization is categorized into Extractive and Abstractive methods. Extractive methods of summarization minimize the burden of summarization by choosing from the actual text a subset of sentences that are relevant. Although there are a ton of methods, researchers specializing in Natural Language Processing (NLP) are particularly drawn to extractive methods. Based on linguistic and statistical characteristics, the significance of sentences is calculated. A study of extractive and abstractive methods for summarizing texts has been made in this paper. This paper also analyses the above mentioned methods, which yields a less repetitive and more concentrated summary.

Keywords—Text summarization, extractive, abstractive, reinforcement learning, supervised, unsupervised.

I. INTRODUCTION

[1] The summarization of large texts remains an open problem in natural language processing. Automatic Text Summarization is used to summarize large documents: it is the process of shortening a text document with software in order to create a summary or abstract that carries the major points of the original document. Summarization is done to highlight the important parts of the text.

A text summarizer can be classified based on input type: Single Document, where the input is small in textual context and basic summarization models are built for such cases; and Multi Document, where the input can be comparatively long and the complexity increases, as more text leads to more semantic links being generated.

Based on the aim, a summarizer can be classified as Generic, where the model treats the input without any bias or prior knowledge; Domain-specific, where the model uses domain information to form a more accurate summary based on known facts; and Query-based, where the summary only contains answers to natural language questions about the input text.

Based on output type, a summarizer can be classified as Extractive, where important sentences are selected from the input text to form a summary, or Abstractive, where the model forms its own phrases and sentences to offer a more coherent summary, like what a human would generate. In general, creating abstractive summaries is a more complex task than extractive methods; abstractive systems are therefore still far from reaching the human level, except for recent advances in the use of neural networks promoted by progress in neural machine translation and sequence models.

Applications of text summarizers include media monitoring, search marketing, internal document workflow, financial research, social media marketing, helping disabled people and more.

II. METHODOLOGIES

In this segment, we review numerous outstanding works that have been accomplished on Text Summarization, as shown in Fig. 1. We essentially represent their approach and workflow.

A. Extractive

1) Unsupervised

The Extractive Unsupervised summarization technique means creating the summary from the given document without using any previously labelled group or classification. There are three ways to do so: firstly graph based, secondly latent variable and lastly term frequency. These are easy to implement and give satisfactory results. Some of the research done is mentioned below.


Hernández et al. [17] presented a solution using K-Means Clustering for choosing sentences in extractive text summarization. The first step is to eliminate stop words, hyphens and redundant white spaces; this is called pre-processing of the input text. The next step is to select features using n-grams and to compute the weights using Boolean Weighting (BOOL), Term Frequency (TF), Inverse Document Frequency (IDF) or TF-IDF. The next step is to apply K-Means for sentence clustering. K-Means is an iterative process in which values are assigned to the nearest centroid (the mean of all values) and new centroids are then calculated. In the proposed method, the first sentence is considered as a baseline and the similarity between sentences is computed using Euclidean distance. After the clustering is done using K clusters, the sentences nearest to the centroids (also called the most representative sentences) are selected. The proposed method obtains more favourable results than other state-of-the-art methods.
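As a rough illustration of this kind of pipeline (not the authors' implementation), a minimal sketch with scikit-learn might look as follows; the tokenizer, the English stop-word list and the number of clusters are assumptions made for the example.

```python
# Minimal sketch of TF-IDF + K-Means extractive summarization (illustrative only).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def kmeans_summary(sentences, k=2):
    """Pick the sentence closest to each cluster centroid as the summary."""
    vec = TfidfVectorizer(stop_words="english")      # pre-processing: stop-word removal
    X = vec.fit_transform(sentences).toarray()       # sentence-term TF-IDF matrix
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    chosen = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        # Euclidean distance of each member sentence to its centroid
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        chosen.append(members[np.argmin(dists)])
    return [sentences[i] for i in sorted(chosen)]     # keep original sentence order

doc = ["Data on the internet grows every year.",
       "Summarization condenses a document to its main ideas.",
       "Extractive methods select salient sentences.",
       "Abstractive methods generate new sentences.",
       "K-Means groups similar sentences into clusters."]
print(kmeans_summary(doc, k=2))
```

Picking the sentence closest to each centroid mirrors the "most representative sentence" step described above.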
Fig. 1. Taxonomy for Text Summarization

Joshi et al. [18] suggested an unsupervised framework for extractive text summarization of a single document called SummCoder. In SummCoder, after the pre-processing, the sentences are converted to fixed-length vectors using the skip-thought model. For generating the summary, sentence selection is done considering three scores: the Sentence Content Relevance Metric (scoreContR), the Sentence Novelty Metric (scoreNov) and the Sentence Position Relevance Metric (scorePosR). After calculating all the scores, a final score and a relative score are calculated. Finally, the summary can be generated firstly according to the descending order of relative rank and secondly according to the sentences' occurrence in the input text.

El-Kassas et al. [19] presented a single-document, graph-based extractive system called EdgeSumm. In the proposed method, firstly pre-processing and lemmatization are done. After that, for text representation, a graph is created with the nouns as nodes and non-noun words as the edges. There are "S#" and "E#" nodes to indicate the start and end of each sentence. For each node a weight is calculated by counting the frequency of its occurrence. For selecting sentences, there is an assumption that all nouns represent different topics. Firstly, the method searches for the most frequent words or phrases and creates a list of the selected nodes and edges. For a source or destination node to be selected, its score must be greater than the average score of all the nodes, and to select an edge both the source and destination node must be selected. If the candidate summary (the summary generated using the algorithm) exceeds the user limit, the sentences in the candidate summary are scored and ranked in ascending order. After that, K-Means clustering is applied to group similar sentences, and the sentences with higher rank from each cluster are selected to generate the final summary.

Zheng & Lapata [20] proposed Position-Augmented Centrality based Summarization (PacSumm). It uses graph-based ranking algorithms where sentences are the nodes and the edges show the relationship between the nodes. For mapping the sentences, Bidirectional Encoder Representations from Transformers (BERT) is used. There are two tasks for pre-training BERT: the first is masked language modelling, in which a masked token is predicted in view of its left and right context, and the second is sentence prediction, in which the relationship between two sentences is predicted. For fine-tuning BERT, five negative samples per positive sample are given. After finding the representations of all sentences, a pairwise dot product is taken to create an unnormalized similarity matrix. Using this matrix, the sentences are selected.
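To make the centrality step concrete, a small hedged sketch is given below: sentence vectors (from BERT or any other encoder) are compared with pairwise dot products, and each sentence is scored from its similarities to the sentences before and after it. The forward/backward weights are illustrative assumptions, not the tuned values from [20].

```python
# Illustrative sketch of centrality-based sentence ranking (PacSumm-like, simplified).
import numpy as np

def centrality_rank(sent_vecs, lambda_fwd=1.0, lambda_bwd=0.3):
    """Rank sentences by directed degree centrality over pairwise dot products."""
    S = np.asarray(sent_vecs, dtype=float)
    sim = S @ S.T                          # unnormalized pairwise similarity matrix
    np.fill_diagonal(sim, 0.0)
    n = len(S)
    scores = np.zeros(n)
    for i in range(n):
        fwd = sim[i, i + 1:].sum()         # similarity to later sentences
        bwd = sim[i, :i].sum()             # similarity to earlier sentences
        scores[i] = lambda_fwd * fwd - lambda_bwd * bwd
    return np.argsort(-scores)             # indices from most to least central

# Usage: the vectors could come from BERT or any other sentence encoder.
rng = np.random.default_rng(0)
ranking = centrality_rank(rng.normal(size=(5, 8)))
print(ranking[:3])                         # indices of the three most central sentences
```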

Vanetik et al. [21] suggested a Weighted Compression Model for extracting important information from the text. In the proposed model, this is done by shortening sentences through iteratively removing Elementary Discourse Units (EDUs). Firstly, every term is given a non-negative weight. The weights are assigned using the extractive models of Gillick and Favre [34] and McDonald [35]. The next step is selecting and removing EDUs. A list of EDUs is created using constituency-based syntax trees. From the list, the EDUs that would make a sentence grammatically incorrect if removed are omitted; all the others are removed, and the weights of the "important" EDUs are calculated and sorted. For generating the summary, the EDUs are selected such that the weight-to-cost ratio is maximum and the summary length is not exceeded.


Ozsoy & Alpaslan [22] presented Latent Semantic Analysis (LSA) for text summarization. It is an algebraic-statistical method for finding hidden logical patterns between words and sentences. For text representation, an input matrix is created in which the rows represent the words and the columns represent the sentences; the cells contain the TF-IDF values of the words. To model the relationship between words and sentences, Singular Value Decomposition (SVD) is used. The output of the SVD helps in selecting sentences using the cross method. The sentences with the longest vectors are selected.
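A hedged, simplified illustration of this idea (plain SVD plus the "longest vector" heuristic, not the exact cross method from [22]) could look like the following:

```python
# Simplified LSA-based sentence scoring (illustrative, not the exact cross method).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def lsa_rank(sentences, topics=2):
    vec = TfidfVectorizer(stop_words="english")
    A = vec.fit_transform(sentences).toarray().T     # term x sentence matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False) # A ~= U @ diag(s) @ Vt
    topics = min(topics, len(s))
    V = (np.diag(s[:topics]) @ Vt[:topics]).T        # each row: a sentence in topic space
    lengths = np.linalg.norm(V, axis=1)              # "longest vector" heuristic
    return np.argsort(-lengths)                      # best sentences first

sents = ["The cat sat on the mat.",
         "Dogs and cats are common pets.",
         "Stock markets fell sharply today.",
         "Investors worried about interest rates."]
print(lsa_rank(sents, topics=2))
```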
Song et al. [23] suggested Fuzzy Evolutionary Optimization Modeling (FEOM) for clustering sentences. Consider 'n' objects that will be assigned to clusters according to their distances. The next step is to apply three evolutionary operators - selection, crossover and mutation - until the termination condition is reached (Nmax = 200). There are three control parameters which regulate the crossover probability pc and the mutation probability pm: the distribution coefficient Var, the relative distance G and the mean evaluation effect Em. The best-fitted sentences are then selected.

2) Supervised

Extractive supervised summarization strategies reduce the burden of summarization by choosing subsets of sentences. Analysts working with NLP are particularly drawn to extractive summarization. The fundamental focus of the current work is to recognise salient features which would help in making a decision about the significance of a sentence in an article. A supervised learning approach requires a lot of labelled data. Extractive procedures select the top N sentences that best represent the central issues of the article, and are set up as binary classification problems where the objective is to recognise the article sentences that belong in the summary. A supervised methodology requires the presence of a set of reference, or gold, summaries. Accordingly, a supervised model utilising a minimal robust set of features is the common thread of the strategies below.
Collins et al. [25] consider a Recurrent Neural Network (RNN) framework. Since the sentences are randomly ordered in the dataset, there is no immediately available meaning for each sentence from the surrounding sentences. A set of features is therefore used for each phrase to provide local and global context, as described below.

1) AbstractROUGE
It is used as a feature for summarization. It uses the abstract, a pre-existing description, to exploit the known structure of a paper. AbstractROUGE's premise is that sentences which summarise the abstract well are often likely to summarise the highlights well.
2) Numeric Count
This is calculated by counting the number of numeric occurrences in a sentence, as sentences containing numbers/math do not contribute to a healthy summary.
3) Title Score
Non-stop words in the text which match those in the title are given more importance in the summary.
4) Keyphrase Score
Keywords used or predefined by the author are given more importance in the summary when used in the text.
5) Sentence Length
The length of the phrase is added as an attribute, in an effort to capture the intuition that short phrases are quite unlikely to be successful summaries because they do not communicate as much data as longer phrases.

Charitha et al. [26] suggest that for automated text summarization a Convolutional Neural Network (CNN) can be adapted to rate sentences. It learns features from the sentences and then allocates ranks to them without needing any manual work by humans. As input it takes word embeddings, and ranks for these phrases are produced as part of the output; the 'word2vec' module is used for this. The pre-trained word vectors are created using word2vec, and a sentence matrix - the final input - is made by joining the vectors of the words of the sentence. The CNN model is equipped to learn the characteristics of the sentences in order to rate them; there are multiple feature maps for each sentence, and each feature map is created by applying filters to the sentence matrix. Integer Linear Programming, mostly known as ILP, is then used for sentence selection based on the ranks allocated previously by the model. ILP attempts to solve problems where, keeping certain constraints, an objective function should be minimised or maximised, with the limitation that the variables used must be integers.
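For the ILP selection step just described, a small hedged sketch is shown below. It uses the PuLP library as the solver; the scores, lengths and budget are made-up placeholders, and the formulation is a generic score maximisation under a length constraint, not necessarily the exact one used in [26].

```python
# Hedged sketch: ILP sentence selection under a length budget (uses the PuLP library).
import pulp

def ilp_select(scores, lengths, budget):
    """Choose sentences maximizing total score subject to a total-length constraint."""
    n = len(scores)
    prob = pulp.LpProblem("summary_selection", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(n)]    # 1 = keep sentence i
    prob += pulp.lpSum(scores[i] * x[i] for i in range(n))             # objective
    prob += pulp.lpSum(lengths[i] * x[i] for i in range(n)) <= budget  # length constraint
    prob.solve(pulp.PULP_CBC_CMD(msg=0))
    return [i for i in range(n) if x[i].value() and x[i].value() > 0.5]

# Scores could come from any sentence-ranking model, e.g. the CNN ranker described above.
print(ilp_select(scores=[0.9, 0.2, 0.7, 0.4], lengths=[12, 8, 15, 9], budget=25))
```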

For extractive summarization, Wong et al. [27] examined combined sentence features. They used a supervised learning system to calculate the weights of various characteristics to determine how likely a sentence is to be meaningful. A supervised learning classifier is applied after the feature vectors of the sentences are computed. In particular, candidate sentences are re-ranked, considering that the length of the final summary is fixed. Lastly, the top sentences are extracted to assemble the final summary.
Sentence Features (SF) is a collective term given by the authors to indicate that certain words or phrases are considered much more weighted and important than other words/phrases in a sentence or a document based on their frequency, location and quotation, if any. Some of the other similar features considered for extractive summarisation are:
1) Content Features
Based on content-bearing words, three well-known sentence features are combined, i.e., centroid words, signature terms, and high-frequency words, including both unigram and bigram representations.
2) Event Features
An event consists of a term for an event and related event elements.
3) Relevance Features
To capture inter-sentence relationships, relevance features are integrated. A basic SVM requires solving an optimization problem over a set of training examples: the SVM classifier finds a hyperplane that separates the training examples into positive and negative. The objective of Probabilistic SVMs is to estimate the posterior probability of a class rather than only a decision boundary.

Pera & Ng [28] developed a model using the Naive Bayes Classifier (NBC) to verify that classifying text documents/phrases using their summaries, instead of going through the entire documents, is really cost-effective. The Naive Bayes classifier assumes that the features are independent. It learns the prior probability and the conditional probability of each feature, and the highest posterior probability predicts the class label.
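A hedged sketch of such a Naive Bayes "does this sentence belong in the summary" classifier, using scikit-learn; the toy sentences and labels are invented placeholders, not data from [28].

```python
# Hedged sketch: Naive Bayes as a binary "in-summary" sentence classifier (scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: label 1 if the sentence appeared in a gold summary, else 0.
train_sents = ["The study proposes a new summarization model.",
               "The weather was pleasant during the conference.",
               "Results show a large improvement in ROUGE scores.",
               "The authors thank the anonymous reviewers."]
labels = [1, 0, 1, 0]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_sents, labels)

# The posterior probability of the "summary" class can be used to rank unseen sentences.
test = ["The model improves summary quality.", "Lunch was served at noon."]
print(clf.predict_proba(test)[:, 1])
```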
B. Abstractive

1) Unsupervised

In Dohare [2], the Semantic Abstractive Summarization (SAS) pipeline is developed. SAS first produces an Abstract Meaning Representation (AMR) graph of the input story, then pulls out an abridged graph and finally forms abridged sentences from this abridged graph. The authors developed a comprehensive approach to generating an AMR story graph using coreference resolution and Meta Nodes, and used a novel unsupervised algorithm, based on how people summarize a piece of text, to extract the summary graph. The pipeline has three important steps: the first step is to convert the document to AMR; the second step is to extract a summary AMR graph from the document AMR created in the previous step; and the third step is to create text from the extracted sub-graph. It surpasses previous SAS methods by 1.7% and 3.7% against basic human baselines.

Chu & Liu [3] state that abstractive summarizers are usually trained on large datasets of paired documents and summaries. However, such datasets are rare, and models trained on them transfer poorly to other domains. Recently, progress has been made with sequence-to-sequence models trained on such pairs. The authors consider only settings where there are texts (product or business reviews) without given summaries, and suggest an end-to-end construction that builds a neural model to create unsupervised summaries. The MeanSum model has two key components:
1) An auto-encoder module that learns the representation of each review and constrains the generated summary to remain in the language domain.
2) A summarization module that learns to generate summaries similar to each of the input documents.
The most common neural abstractive methods use supervised learning over many document-summary pairs, which are expensive to obtain at scale. The proposed model has limitations: it ignores attention and ordering, and it does not offer a workable solution (as there are few guidelines for shortening) to the single-document summarization problem.

Padmakumar & Saran [4] suggest grouping sentences: identify groups of similar sentences and select representatives for these groups to form a summary. They perform sentence embedding using an RNN with Long Short Term Memory (LSTM), i.e., a recurrent neural network with memory cells, to obtain the sentence embeddings, and a decoder is trained to map embeddings back into sentences. When representing sentences at the vector level, the goal is usually to embed sentences, directly or indirectly, in such a way that sentences closest in meaning are embedded next to each other in the vector space. Since sentences that form a group in the vector space are close to each other, it is sufficient to keep one representative from each such group to make a summary.

Schumann [5] introduces an unsupervised method of summarizing sentences using the Variational AutoEncoder (VAE), which is known for its flexible latent-variable modelling of high-dimensional input. VAEs are trained to reconstruct the input from latent variables. Providing explicit information about the output length during training encourages the VAE not to encode this information in the latent variable, so the length can instead be controlled during inference. Instructing the decoder to produce a short output sequence leads to the input sentence being reproduced in a few words. The VAE system processes text using RNNs as encoder and decoder. The vectors µ and σ are formed from the last hidden encoder state, and the first decoder cell state is initialized as z; a forward and backward (bidirectional) encoder is used. They show on different summarization data sets that these short outputs are not a trivial baseline but produce higher ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores than trying to reconstruct a whole sentence. The idea that asking the decoder for shorter outputs forces more information to be expressed in a few words is confirmed in the summarization experiments; linear regression tests show that the length of the input sentence is encoded in the latent variable.
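A hedged PyTorch sketch of the latent step just described (mapping the encoder's last hidden state to µ and σ, sampling z with the reparameterization trick, and using it to initialize the decoder); sizes and layer choices are placeholders, not those of [5].

```python
# Hedged sketch of the VAE latent bridge between encoder and decoder (PyTorch).
import torch
import torch.nn as nn

class LatentBridge(nn.Module):
    """Map the encoder's last hidden state to mu/sigma and sample z for the decoder."""
    def __init__(self, hidden_size, latent_size):
        super().__init__()
        self.to_mu = nn.Linear(hidden_size, latent_size)
        self.to_logvar = nn.Linear(hidden_size, latent_size)
        self.to_decoder_init = nn.Linear(latent_size, hidden_size)

    def forward(self, enc_last_hidden):
        mu = self.to_mu(enc_last_hidden)
        logvar = self.to_logvar(enc_last_hidden)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return self.to_decoder_init(z), kl    # decoder initial state + KL term for the loss

bridge = LatentBridge(hidden_size=256, latent_size=32)
dec_init, kl = bridge(torch.randn(4, 256))    # batch of 4 encoder states
print(dec_init.shape, kl.shape)
```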
Zhang et al. [6] state that methods for summarizing dialogue should take into account the context of multiple speakers, where speakers have different roles, purposes, and language styles. In a tete-a-tete, such as a customer-agent conversation, SuTaT aims to summarize each speaker by modeling customer utterances and agent utterances separately while maintaining their interaction. SuTaT consists of a conditional generation module and two unsupervised summarization modules; the purpose is to generate a customer summary and an agent summary. The design mirrors how a tete-a-tete works: agent responses and customer requests depend on each other. Supervised models that can be fine-tuned generally perform better than unsupervised models; even so, compared to other unsupervised abstractive baselines built with LSTM encoders and decoders, SuTaT-LSTM shows significant performance improvements.


Zheng et al. [7] state that existing summarization work relies primarily on well-structured texts such as CNN and DailyMail news. Unlike news, podcasts tend to be longer, more conversational and chatty, and more vocal about commercial and sponsored content, which makes podcast summarization a major challenge. They designed two simple baselines for model comparison: (1) Baseline 1: choose the first tokens of the text as the summary; (2) Baseline 2: select the final tokens of the text as the summary. The idea behind these two baselines is that the end of a podcast can contain very important content details. Based on the baseline analysis in the paper, they discuss several directions for future research: (1) summarization with long narrative structure, since simple structural heuristics do not carry over to long narratives; (2) conversation summarization, since podcasts are conversational, interactive and informal, and how to use existing research to help summarize podcasts is still an open question; and (3) multi-modal podcast analysis, which they believe is important for understanding a podcast and should therefore play an important role in podcast summarization and recommendation.

Yang et al. [8] propose TED, an unsupervised abstractive model with a denoising scheme that uses a transformer-based encoder-decoder structure and pre-training on massive unlabelled corpora. The paper has three elements: (1) key sentences are used as the summary, and the model is trained to predict them during pre-training; (2) the model is trained with a theme modeling loss and a denoising autoencoder, and TED uses a multi-layer transformer decoder; (3) instead of classical word tokenization, SentencePiece tokenization is used. The model follows the default configuration of the transformer network.

Baziotis et al. [36] suggest a Sequence-to-Sequence-to-Sequence Autoencoder (SEQ3) consisting of two pairs of encoder-decoders, in which the words of the compressed sentence are treated as a sequence of latent variables. The first and last sequences are the input and reconstructed sentences, respectively, while the middle sequence is the compressed sentence. The embedding layer projects the source sequence, which is encoded by a bidirectional RNN; to produce the summary, an attentive RNN decoder with input feeding is used. The compressor works as an encoder over the embeddings of the compressed (abbreviated) words.

2) Supervised

Abstractive supervised summarization is a technique that creates summaries with words and sentences that are not necessarily present in the input text. It trains a supervised learning model with a dataset containing articles and their summaries. There are a few different supervised abstractive techniques that we have included in this paper.

Raphal et al. [10] give a brief overview of the various RNN variants used for abstractive text summarization. The basic RNN lacks the ability to capture long-term dependencies. This is rectified using the Long Short Term Memory (LSTM) RNN model. An LSTM consists of an input gate, an output gate and a forget gate. It is used to capture long-term dependencies and also mitigates the vanishing gradient problem.
tokens in the text as a summary. The idea of these two basic
elements is that the end of a podcast can contain very Khatri, Singh, and Parikh [11] implements the
important content details. Based on the basic analysis in this Abstractive Contextual RNN (A C-RNN) where a document
paper, we discuss many guidelines for future research: (1) context vector is passed as input at the first step to the
Summary based on long narrative construction: Simp le encoder. The logic behind this approach is that if a person
structure of heuristics is not the same as long narrat ive (2) knows the context of a text, he/she can make the summary
Conversation summary : podcasts are conversational, more easily as it prov ides a better understanding. This
interactive, and general. How to use existing research to solves a major drawback by generating more document
help summarize podcasts is still in short supply. (3) Mu lti- focused summaries rather than generic summaries.
module podcast analysis: We believe that mult idisciplinary
analysis is important in understanding the podcast and Liu and Liu [12] presents a Supervised Abstractive Model
should therefore play an important ro le in summarizing the using Conditional Random Fields (CRF) where the
podcast and recommendations. utterance compression is done as a sequence labeling task
and is based on the Maximu m Marginal Relevance (MMR).
Yang et al., [8] is an uncontrolled abstract model with a It uses BIO labeling scheme for sequence labeling. MMR
denoising system that uses a transformer-decoder-based score along with term weight of wo rd is determined. The
encoder-decoder structure and uses pre-training for massive model selects the summary sentences, iteratively, until the
unregistered power. The paper has three elements - (1) To given length limit is reached.
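A hedged sketch of the MMR selection idea follows (generic MMR over TF-IDF vectors, not the exact utterance-level formulation in [12]); the trade-off parameter lam and the toy sentences are assumptions.

```python
# Hedged sketch of Maximum Marginal Relevance (MMR) sentence selection.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_select(sentences, n_select=2, lam=0.7):
    """Iteratively pick sentences that are relevant to the document but not redundant."""
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    doc_vec = np.asarray(X.mean(axis=0))            # crude document representation
    relevance = cosine_similarity(X, doc_vec).ravel()
    sim = cosine_similarity(X)
    selected, remaining = [], list(range(len(sentences)))
    while remaining and len(selected) < n_select:
        def mmr(i):
            redundancy = max((sim[i, j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return [sentences[i] for i in sorted(selected)]

sents = ["The meeting discussed the quarterly budget.",
         "Budget figures were reviewed in the meeting.",
         "A new hiring plan was also approved.",
         "Attendees enjoyed coffee during the break."]
print(mmr_select(sents))
```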
C. Reinforcement Learning

Reinforcement learning is used in text summarization to improve the efficiency of existing techniques. This is done by training an agent with a reward or punishment for every decision it makes, and obtaining an optimal policy that is then used to generate summaries. In this paper we discuss a few reinforcement learning techniques that are used for automatic text summarization.

Lee & Lee [13] present a reinforcement learning model with embedding features. They use a Deep Q-Networks (DQN)-based model. The sentences are represented as sentence embedding vectors, and the Q-values are computed using a deep neural network model with a regression function. The Q-value is used to select the sentences: the role of the agent is to select sentences using the sentence selector and generate a summary.
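A heavily simplified, hedged sketch of this idea is shown below: a small network regresses Q-values over sentence embeddings, a greedy selection step builds the summary, and one crude update nudges the Q-values toward an observed reward. None of this mirrors the exact architecture or training procedure of [13]; the embeddings and the reward value are placeholders.

```python
# Hedged sketch: Q-value regression over sentence embeddings for extractive selection.
import torch
import torch.nn as nn

class QScorer(nn.Module):
    """Predict a Q-value for 'add sentence i to the current summary'."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, sent_emb, summary_emb):
        return self.net(torch.cat([sent_emb, summary_emb], dim=-1)).squeeze(-1)

torch.manual_seed(0)
dim, n_sents, budget = 16, 5, 2
sents = torch.randn(n_sents, dim)                 # stand-ins for sentence embeddings
scorer = QScorer(dim)
optim = torch.optim.Adam(scorer.parameters(), lr=1e-3)

# One greedy episode: the agent adds the highest-Q sentence until the budget is met.
with torch.no_grad():
    summary_emb, picked = torch.zeros(dim), []
    for _ in range(budget):
        q = scorer(sents, summary_emb.expand(n_sents, dim))
        q[picked] = float("-inf")                 # do not pick the same sentence twice
        picked.append(int(torch.argmax(q)))       # epsilon-greedy exploration in practice
        summary_emb = sents[picked].mean(dim=0)

reward = torch.tensor(0.7)                        # e.g. ROUGE of the produced summary vs. gold
q_picked = scorer(sents[picked], summary_emb.expand(len(picked), dim))
loss = (q_picked - reward).pow(2).mean()          # regress Q-values toward the reward
optim.zero_grad(); loss.backward(); optim.step()
print(picked, round(float(loss), 4))
```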
Prakash & Shukla [14] present the Human Aided Text Summarizer "SAAR" for a single-document setup. The input is passed through preprocessing, where the text is tokenized and isolated and a structured representation is created. The weights are calculated using ISF and IDF. RL is used to calculate the sentence scores and a term-sentence matrix is created. This is used to calculate the similarity using Euclidean distance, and the sentence with the maximum distance is selected for the summary. The user checks the summary, and if it is not adequate the user gives feedback keywords, according to which a new summary is generated.


Mohsen et al. [15] present a Hierarchical Self-Attentive Neural Extractive Summarizer via Reinforcement Learning (HSASRL) model. The first component is the attentive sentence encoder, which uses a bidirectional LSTM (Bi-LSTM) to encode sentences into sequential representation vectors. The attentive document encoder then composes a document representation, and the sentence extractor labels the sentences as 1 or 0 according to their relevance to the summary. A learning agent is trained to rank the sentences and directly optimizes the ROUGE scores. The agent is initialized randomly and learns as it reads the documents; it receives a reward for every match between its summary and the gold summary.

D. Hybrid

Hybrid approaches to text summarization are a combination of different techniques intended to counter their individual drawbacks. We have covered a few hybrid techniques in this paper.

Bhagchandani et al. [16] did research to build a hybrid model that consists of three components - clustering, word graphs, and neural networks. This model was developed for an abstractive multi-document setup. The model takes multiple documents and sends them to the preprocessing module, where normalization of passages is done and files are tokenized to sentences and then to words. A single list of pre-processed strings is sent as input to the summarization module, where the sentences are clustered and condensed. These are then ranked using TextRank, which is an unsupervised extractive summarization technique. A Seq2Seq encoder-decoder model then performs sentence compression and the final summary is generated.
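For the TextRank ranking step mentioned above, a minimal hedged sketch could look as follows; it uses networkx PageRank over a cosine-similarity sentence graph, which is a common TextRank variant rather than the exact configuration of [16].

```python
# Hedged sketch of TextRank-style sentence ranking (networkx PageRank over a similarity graph).
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def textrank(sentences, top_k=2):
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    sim = cosine_similarity(X)                 # sentence-sentence similarity graph
    graph = nx.from_numpy_array(sim)           # weighted, undirected graph
    scores = nx.pagerank(graph, weight="weight")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]   # preserve original order

docs_sents = ["Multiple documents are merged after preprocessing.",
              "Sentences are clustered before ranking.",
              "TextRank scores sentences with PageRank over a similarity graph.",
              "A Seq2Seq model compresses the selected sentences."]
print(textrank(docs_sents))
```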
Wong et al. [27] have implemented a hybrid model for extractive text summarization through a Probabilistic SVM (PSVM) and NBC. Supervised approaches to learning typically achieve good output but require data that is manually labelled. The amount of labelled data needed is decreased by co-training techniques: a co-training approach is developed to train different classifiers based on the same feature space. The combination of surface, content, and relevance features is fed to the PSVM and NBC. Co-training was applied to combine labelled and unlabelled data to decrease labelling costs. Experiments show that the semi-supervised learning method saves half of the labelling cost and retains comparable effectiveness (0.366 vs. 0.396) compared with supervised learning. The ROUGE outcomes of the same summarization process are enhanced.

Pera & Ng [28] implemented a new hybrid method which consists of two components: CorSum and a Naive Bayes classifier (NBC). The precomputed word-correlation factors are used by the CorSum architecture to identify representative phrases in a text and produce the summary. In order to enhance the quality of CorSum-produced summaries, CorSum-SF (CSSF) also relies on word similarity. The NBC is then used on large collections to classify CSSF-created summaries of documents available on the web. Rank values for CSSF are identified on the basis of 1) the word-correlation (WC) factors that were previously computed and 2) degrees of sentence similarity. Regarding the WC factor and sentence similarity, the method uses the correlation factors of non-stop, stemmed terms, which form a 54,625 x 54,625 square symmetric matrix. The correlation factor of each pair of terms wi and wj, which shows how closely wi and wj are related semantically, is determined on the basis of 1) their co-occurrence frequency and 2) the relative distance between wi and wj.

III. DATASET

TABLE 1. DATASETS FOR SUMMARIZATION

CNN/Daily Mail - It consists of both articles and summaries of long news articles.
Gigaword - It consists of nearly ten million documents, articles, and their headlines (over four billion words) from the original English Gigaword Fifth Edition.
NYT - The New York Times dataset contains the full text and metadata of NYT articles from 1987 to 2007.
DUC - The Document Understanding Conference (DUC) archives documents and synopses assessed by the National Institute of Standards and Technology (NIST) since 2001.
20NG - It consists of 19,997 papers compiled in 20 separate categories from the Usenet newsgroup archive. 80 percent of the documents in 20NG were used for MNB training and the remaining 20 percent for classification assessment.
TIDSUMM - TIDSUMM contains Darknet utilization information with 6831 documents of 26 distinct classifications crawled over the onion web or Tor network.
TTNews - A Chinese news summarization corpus, created for the shared summarization task at NLPCC 2017.
SummMac - SummMac contains records about computer science gathered from ACL sponsored conferences.
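As a hedged aside (not part of the original paper), one of the corpora in Table 1 can typically be pulled with the HuggingFace `datasets` library; the dataset id, config and field names below are the commonly published ones and should be verified against the library's documentation.

```python
# Hedged sketch: loading the CNN/DailyMail corpus with the HuggingFace `datasets` library.
from datasets import load_dataset

cnn_dm = load_dataset("cnn_dailymail", "3.0.0", split="train[:100]")  # small slice for a quick look
example = cnn_dm[0]
print(example["article"][:200])     # source news article
print(example["highlights"])        # reference summary ("highlights")
```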


IV. EVALUATION METRICS

The crucial step after generating a summary is to evaluate it. The summary can be evaluated by two methods: automatically and by humans. Automatic evaluation is a more feasible option than human evaluation because it is simple, fast and scalable. Text summarization evaluation methods are: [24]

A. Extrinsic Evaluation
In this, the summary is checked on the basis of how well it helps to accomplish other tasks, like information classification, answering questions, etc. For example, in reading comprehension a summary about a given topic helps to answer multiple questions. Therefore, a summary is good if it helps in the completion of other tasks.

B. Intrinsic Evaluation
In this, the automatically generated summary is analyzed against a human-made summary. Intrinsic evaluation is done based on text quality, co-selection and content-based measures.
The most prominent and frequently used intrinsic method of summary evaluation is the ROUGE score, which comes under content-based evaluation. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is an automatic summary evaluation method which calculates a score based on the similarity between the machine summary and a human-made summary. The ROUGE score can be calculated in five ways: [37]
1) ROUGE-N: This measures the recall score based on matching sequences of words in both summaries, called n-grams, where n is the length of the n-gram.
2) ROUGE-L: This gives the ratio between the size of the Longest Common Subsequence (LCS) of the two summaries and the size of the reference summary; an F-score is then computed from the LCS-based precision and recall.
3) ROUGE-W: This is the same as ROUGE-L except that common consecutive words (weighted LCS) are given more weight.
4) ROUGE-S: It calculates the number of skip-bigrams common between the two summaries.
5) ROUGE-SU: This is an improvement over ROUGE-S that also counts unigram matches (skip-bigram plus unigram).
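As a hedged illustration (a direct implementation of the ROUGE-N recall definition above, not the official ROUGE toolkit of [37]):

```python
# Hedged sketch: ROUGE-N recall computed directly from its definition (illustrative).
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """Overlapping n-grams between candidate and reference, divided by reference n-grams."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())          # clipped n-gram overlap
    total = sum(ref.values())
    return overlap / total if total else 0.0

system = "the cat was found under the bed"
gold = "the cat was under the bed"
print(rouge_n_recall(system, gold, n=1), rouge_n_recall(system, gold, n=2))
```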
V. CONCLUSION AND FUTURE SCOPE

In this paper, we have reviewed various research papers on abstractive, extractive and hybrid techniques along with learning methods - supervised, unsupervised and reinforcement. These papers have different algorithms and workings, but all have promising results. Each of these techniques has its own set of challenges, which can be solved using a certain variation of a particular technique. But the most significant and common challenges that are yet to be resolved are:

● Evaluation of the summaries: judging the quality of a summary subjectively, both for the manually written summaries and for the summaries generated by the model.
● Labelled Data: obtaining more manually written summaries that can be fed to the model as training and evaluation data in supervised learning techniques.
● Anaphora Problem: understanding which pronoun used in the article is a substitution for which of the previously introduced terms.
● Cataphora Problem: understanding which ambiguous words or explanations are used to refer to a particular term before the term itself is even introduced.

The future scope of Automatic Text Summarization is to resolve these challenges (and some others) and make this technology easier and more feasible to implement. Research on Automatic Text Summarization is still going on to find the perfect model that can generate a summary like a real human.

REFERENCES

[1] Gonçalves, Luís. 2020. "Automatic Text Summarization with Machine Learning — An overview." Medium.com. https://medium.com/luisfredgs/automatic-text-summarization-with-machine-learning-an-overview-68ded5717a25.
[2] Dohare, S., Gupta, V., & Karnick, H. (2018, July). Unsupervised semantic abstractive summarization. In Proceedings of ACL 2018, Student Research Workshop (pp. 74-83).
[3] Chu, E., & Liu, P. (2019, May). MeanSum: a neural model for unsupervised multi-document abstractive summarization. In International Conference on Machine Learning (pp. 1223-1232).
[4] Padmakumar, A., & Saran, A. (2016). Unsupervised Text Summarization Using Sentence Embeddings (pp. 1-9). Technical Report, University of Texas at Austin.
[5] Schumann, R. (2018). Unsupervised abstractive sentence summarization using length controlled variational autoencoder. arXiv preprint arXiv:1809.05233.
[6] Zhang, X., Zhang, R., Zaheer, M., & Ahmed, A. (2020). Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes. arXiv preprint arXiv:2009.06851.
[7] Zheng, C., Wang, H. J., Zhang, K., & Fan, L. (2020). A Baseline Analysis for Podcast Abstractive Summarization. arXiv preprint arXiv:2008.10648.
[8] Yang, Z., Zhu, C., Gmyr, R., Zeng, M., Huang, X., & Darve, E. (2020). TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising. arXiv preprint arXiv:2001.00725.
[9] Wang, Y. S., & Lee, H. Y. (2018). Learning to encode text as human-readable summaries using generative adversarial networks. arXiv preprint arXiv:1810.02851.
[10] Raphal, N., Duwarah, H., & Daniel, P. (2018). Survey on Abstractive Text Summarization. In International Conference on Communication and Signal Processing, April 3-5, 2018, India.
[11] Khatri, C., Singh, G., & Parikh, N. (2018). Abstractive and extractive text summarization using document context vector and recurrent neural networks. arXiv preprint arXiv:1807.08000.
[12] Liu, F., & Liu, Y. (2013). Towards abstractive speech summarization: Exploring unsupervised and supervised approaches for spoken utterance compression. IEEE Transactions on Audio, Speech, and Language Processing, 21(7), 1469-1480.
[13] Lee, G. H., & Lee, K. J. (2017, November). Automatic text summarization using reinforcement learning with embedding features. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (pp. 193-197).
[14] Prakash, C., & Shukla, A. (2014, September). Human Aided Text Summarizer "SAAR" Using Reinforcement Learning. In 2014 International Conference on Soft Computing and Machine Intelligence (pp. 83-87). IEEE.
[15] Mohsen, F., Wang, J., & Al-Sabahi, K. (2020). A hierarchical self-attentive neural extractive summarizer via reinforcement learning (HSASRL). Applied Intelligence, 1-14.
[16] Bhagchandani, G., Bodra, D., Gangan, A., & Mulla, N. (2019, May). A Hybrid Solution To Abstractive Multi-Document Summarization Using Supervised and Unsupervised Learning. In 2019 International Conference on Intelligent Computing and Control Systems (ICCS) (pp. 566-570). IEEE.
[17] García-Hernández, R. A., Montiel, R., Ledeneva, Y., Rendón, E., Gelbukh, A., & Cruz, R. (2008, October). Text summarization by sentence extraction using unsupervised learning. In Mexican International Conference on Artificial Intelligence (pp. 133-143). Springer, Berlin, Heidelberg.
[18] Joshi, A., Fidalgo, E., Alegre, E., & Fernández-Robles, L. (2019). SummCoder: An unsupervised framework for extractive text summarization based on deep auto-encoders. Expert Systems with Applications, 129, 200-215.
[19] El-Kassas, W. S., Salama, C. R., Rafea, A. A., & Mohamed, H. K. (2020). EdgeSumm: Graph-based framework for automatic text summarization. Information Processing & Management, 57(6), 102264.
[20] Zheng, H., & Lapata, M. (2019). Sentence centrality revisited for unsupervised summarization. arXiv preprint arXiv:1906.03508.
[21] Vanetik, N., Litvak, M., Churkin, E., & Last, M. (2020). An unsupervised constrained optimization approach to compressive summarization. Information Sciences, 509, 22-35.
[22] Ozsoy, M. G., Alpaslan, F. N., & Cicekli, I. (2011). Text summarization using latent semantic analysis. Journal of Information Science, 37(4), 405-417.
[23] Song, W., Choi, L. C., Park, S. C., & Ding, X. F. (2011). Fuzzy evolutionary optimization modeling and its applications to unsupervised categorization and extractive summarization. Expert Systems with Applications, 38(8), 9112-9121.
[24] Steinberger, J., & Ježek, K. (2012). Evaluation measures for text summarization. Computing and Informatics, 28(2), 251-275.
[25] Collins, E., Augenstein, I., & Riedel, S. (2017). A supervised approach to extractive summarisation of scientific papers. arXiv preprint arXiv:1706.03946.
[26] Charitha, S., Chittaragi, N. B., & Koolagudi, S. G. (2018, August). Extractive document summarization using a supervised learning approach. In 2018 IEEE Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER) (pp. 1-6). IEEE.
[27] Wong, K. F., Wu, M., & Li, W. (2008, August). Extractive summarization using supervised and semi-supervised learning. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008) (pp. 985-992).
[28] Pera, M. S., & Ng, Y. K. (2010). A Naive Bayes classifier for web document summaries created by using word similarity and significant factors. International Journal on Artificial Intelligence Tools, 19(04), 465-486.
[29] Bui, D. D. A., Del Fiol, G., Hurdle, J. F., & Jonnalagadda, S. (2016). Extractive text summarization system to aid data extraction from full text in systematic review development. Journal of Biomedical Informatics, 64, 265-272.
[30] Moratanch, N., & Chitrakala, S. (2017, January). A survey on extractive text summarization. In 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP) (pp. 1-6). IEEE.
[31] Amini, M. R., & Gallinari, P. (2001, September). Automatic text summarization using unsupervised and semi-supervised learning. In European Conference on Principles of Data Mining and Knowledge Discovery (pp. 16-28). Springer, Berlin, Heidelberg.
[32] Krishnan, D., Bharathy, P., & Venugopalan, A. M. (2019, May). A Supervised Approach For Extractive Text Summarization Using Minimal Robust Features. In 2019 International Conference on Intelligent Computing and Control Systems (ICCS) (pp. 521-527). IEEE.
[33] Shah, C., & Jivani, A. (2019). An Automatic Text Summarization on Naive Bayes Classifier Using Latent Semantic Analysis. In Data, Engineering and Applications (pp. 171-180). Springer, Singapore.
[34] Gillick, D., & Favre, B. (2009, June). A scalable global model for summarization. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing (pp. 10-18).
[35] McDonald, R. (2007, April). A study of global inference algorithms in multi-document summarization. In European Conference on Information Retrieval (pp. 557-564). Springer, Berlin, Heidelberg.
[36] Baziotis, C., et al. (2019). SEQ^3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression. arXiv preprint arXiv:1904.03651.
[37] Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out (pp. 74-81).
