Seminar: Text Summarization
Chapter 1
Introduction
Text summarization approaches fall into two broad categories:
1. Abstractive Summarization
2. Extractive Summarization
Conventional methods for text summarization directly extract words from the textual content to form the summary, using steps such as removing stop words, identifying noun groups, and lemmatization. Their major disadvantage is that the generated summary may contain redundant words: because no record is kept of the words already selected, words can repeat in the summary just as they do in the main text. In addition, the relation between the generated summary and the source document is weak, making it difficult for users to gain a clear understanding of the document from the summarized content. To overcome these drawbacks, deep learning based methods are employed.
Fig 1.3: Three-layered neural network with one input layer, one output layer and one hidden layer
The report is organized as follows: Chapter 2 reviews related work on deep learning models for text summarization, Chapter 3 describes the sequence-to-sequence model and its implementation, Chapter 4 presents the results, and Chapter 5 concludes.
Chapter 2
Literature Review
Anish Jadhav, Rajat Jain, Steve Fernandes, Sana Shaik (2019) - Text Summarization using Neural Networks. An encoder-decoder model with an attention layer performs better than a plain encoder-decoder model, but more data is required to improve the accuracy of the model. The difficulty of digesting huge amounts of data has provided the stimulus for automatic summarization, where the main focus is on generating a summary from a given document. The focus is on sentence extraction, which creates the summary by identifying salient text units. The extractive approach uses features such as words appearing in the title, nouns, word frequency, action nouns, etc. Various methods have been used to generate summaries, such as graph-based algorithms, sentence ranking with binary classifiers, and integer linear programming. This paper uses a data-driven approach based on neural networks and sentence features. Machine translation has become a very important related task; it uses a recurrent neural network encoder-decoder, where the encoder reads the source sequence into an internal representation and the decoder generates the target sequence. A framework is developed for single-document summarization that extracts sentences. The model includes a neural network-based hierarchical document reader (encoder) and an attention-based content extractor; the reader derives the meaning of a paragraph from its sentences.
News headlines are compressed representations of news topics that are most often used to encourage reading. Automatic text summarization is a difficult and non-trivial task. Luhn et al. presented a technique to select salient sentences from the text using features such as word frequency.
Chapter 3
Sequence-to-Sequence Modeling
Machine learning includes the neural network method referred to as "sequence-to-sequence learning," which is primarily used in language-processing models. The objective is to predict the next state sequence from the previous sequence using two RNNs that work together with a special token:
1) Encoder
2) Decoder
Encoder-Decoder: The encoder records the information of the input sequence and represents it in a hidden state. The decoder predicts the output sequence using the encoder's final hidden state. The two main methods used to increase the encoder's efficiency are reversing the input text (reverse encoder) and bidirectional encoding. The reverse encoder receives the input sentence in reversed form; the alternative is a bidirectional RNN, which takes both past and future context into account.
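As a rough illustration of these two variants, the sketch below builds a reverse encoder and a bidirectional encoder in TensorFlow/Keras; the vocabulary size, embedding size, and use of GRUs here are illustrative assumptions, not values taken from this project.

import tensorflow as tf

# Illustrative sizes only; not the values used in this project.
vocab_size, embedding_dim, latent_dim = 10000, 128, 128

inputs = tf.keras.Input(shape=(None,), dtype="int32")        # token ids of the source text
embedded = tf.keras.layers.Embedding(vocab_size, embedding_dim)(inputs)

# Variant 1: reverse encoder -- feed the sequence backwards through a single GRU.
reversed_emb = tf.keras.layers.Lambda(lambda x: tf.reverse(x, axis=[1]))(embedded)
reverse_state = tf.keras.layers.GRU(latent_dim)(reversed_emb)

# Variant 2: bidirectional encoder -- read the sequence in both directions at once,
# so each position has access to both past and future context.
bi_state = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(latent_dim))(embedded)

encoders = tf.keras.Model(inputs, [reverse_state, bi_state])
encoders.summary()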
Attention Mechanism
Attention is a complex cognitive ability of humans. An important feature of perception is that people usually do not process all available information at once; instead, they selectively focus on a certain piece of information when and where it is needed, while ignoring other, equally noteworthy information. The attention mechanism significantly increases the efficiency and accuracy of cognitive information processing.
The encoder-decoder model for machine translation has been improved with the addition of an attention mechanism. The idea behind the attention mechanism is to give the decoder flexible access to the most important parts of the input sequence through a weighted combination of all the encoded input vectors, where the best-fitting vectors receive the highest weights. In this project we use global attention, which is one kind of attention mechanism.
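The weighting idea can be made concrete with a small NumPy example using made-up numbers: each encoder vector is scored against the current decoder state, the scores are normalized with a softmax, and the resulting weights form the context vector. This dot-product scoring is only one simple form of global attention and is shown purely as an illustration.

import numpy as np

# Made-up encoder states (one row per input position) and a decoder state.
encoder_states = np.array([[0.1, 0.3],
                           [0.7, 0.2],
                           [0.4, 0.9]])
decoder_state = np.array([0.5, 0.8])

scores = encoder_states @ decoder_state             # one score per input position
weights = np.exp(scores) / np.exp(scores).sum()     # softmax: weights sum to 1
context = weights @ encoder_states                  # weighted combination of encoder states

print("attention weights:", weights)
print("context vector:", context)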
Dataset
The dataset consists of 98,000 news articles and their summaries; the articles vary in length.
Train validate split
A 90%-10% split is used: 90% of the data is used for training the model and the remaining 10% for validation. The model accuracy is measured on this 10%.
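A minimal sketch of this split, assuming the articles and summaries are held in two parallel Python lists (the variable names and stand-in data are illustrative):

from sklearn.model_selection import train_test_split

# Stand-in data; in practice these are the 98,000 articles and their summaries.
articles = ["article %d text ..." % i for i in range(10)]
summaries = ["summary %d ..." % i for i in range(10)]

x_train, x_val, y_train, y_val = train_test_split(
    articles, summaries, test_size=0.1, random_state=42)
print(len(x_train), "training samples,", len(x_val), "validation samples")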
Data preprocessing
The dataset has null values and articles of variable size, so it requires preprocessing before it can be used for model training.
1. Removed records containing null values.
2. Prepended a <start> tag and appended an <end> tag to each article and summary so that the model can detect where a sequence begins and ends.
3. Padded each article and summary to create constant-size inputs and outputs (see the sketch below).
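A minimal sketch of steps 2 and 3 using the Keras tokenizer and padding utilities; the sample texts and maximum lengths are illustrative assumptions.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

articles = ["the stock market rose sharply today"]
summaries = ["stocks rise"]

# Step 2: mark the boundaries of each target sequence.
summaries = ["<start> " + s + " <end>" for s in summaries]

tokenizer = Tokenizer(filters='')        # empty filter keeps the <start>/<end> tokens intact
tokenizer.fit_on_texts(articles + summaries)

# Step 3: pad to constant lengths so the model sees fixed-size inputs and outputs.
x = pad_sequences(tokenizer.texts_to_sequences(articles), maxlen=50, padding='post')
y = pad_sequences(tokenizer.texts_to_sequences(summaries), maxlen=10, padding='post')
print(x.shape, y.shape)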
Model architecture
The model architecture defines the logical connections between the various functions used in model creation and training. We used a batch size of 100 to train the model because there is not enough memory to hold more than a hundred samples at a time. The model architecture mainly consists of an encoder network, a decoder network, and an attention layer.
Encoder - the network consists of 128 gated recurrent units and accepts the vectorized input. The output of the encoder is passed to the attention layer.
Attention layer - the attention layer is responsible for eliminating repeated words and for producing proper grammar in the abstractive summary. It has 64 units. The output of the attention layer is provided to the decoder.
Decoder - the network consists of 128 gated recurrent units; it generates numerical output, which is mapped back to words using the tokenized vocabulary built with Python at the start of model training.
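A rough TensorFlow/Keras sketch of this architecture is given below. The 128-unit GRU encoder and decoder and the batch size of 100 follow the description above; the vocabulary size, embedding size, and the exact wiring of the attention layer (Keras' AdditiveAttention has no separate unit count) are assumptions made for illustration.

import tensorflow as tf

vocab_size, embed_dim, units = 10000, 128, 128   # vocabulary/embedding sizes are assumptions

# Encoder: 128 gated recurrent units over the vectorized article.
enc_in = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(enc_in)
enc_out, enc_state = tf.keras.layers.GRU(units, return_sequences=True,
                                         return_state=True)(enc_emb)

# Decoder: 128 gated recurrent units initialized with the encoder's final state.
dec_in = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(dec_in)
dec_out = tf.keras.layers.GRU(units, return_sequences=True)(dec_emb, initial_state=enc_state)

# Attention between the decoder outputs and the encoder outputs.
context = tf.keras.layers.AdditiveAttention()([dec_out, enc_out])
merged = tf.keras.layers.Concatenate()([dec_out, context])
probs = tf.keras.layers.Dense(vocab_size, activation="softmax")(merged)

model = tf.keras.Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit([x_articles, y_shifted_in], y_out, batch_size=100, epochs=...)  # batch size 100 as above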
Fig 3.2: Layers in the network
The above diagram shows the layers that are used in our project.
System Implementation
We implement the abstractive method using the deep learning technique called Long Short-Term Memory (LSTM), which is a type of recurrent neural network. The data used for this project is the CNN_dailymail dataset.
Data
The data used is the CNN_dailymail dataset. It has two features: article and highlights. The article field contains the news article, i.e., the document that is to be summarized. The highlights are the headlines of the corresponding news article, which are used as summaries.
Method
The approach used is abstractive summarization, which is implemented using deep learning techniques.
Algorithm
The algorithm used is LSTM (Long Short-Term Memory), which is a type of recurrent neural network.
Model
The model used is a sequence-to-sequence model. Sequence-to-sequence learning is a training approach that converts sequences from one input domain into sequences of another output domain. It is generally used when the input and output of a model can be of variable lengths.
Data Preprocessing
Performing basic preprocessing steps is very important before we get to the model-building part. Using messy and uncleaned text data is a potentially disastrous move, so we drop all the unwanted symbols, characters, etc. from the text that do not affect the objective of our problem.
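A minimal cleaning function along these lines is sketched below; the exact set of rules (lower-casing, stripping tags, removing punctuation) is an illustrative assumption.

import re

def clean_text(text):
    """Drop symbols and characters that do not affect the summarization objective."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)         # strip HTML-like remnants
    text = re.sub(r"[^a-z0-9\s]", " ", text)     # drop unwanted symbols and punctuation
    text = re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace
    return text

print(clean_text("Breaking: Stocks <b>rise</b> 3.5% today!!"))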
Algorithm
Nowadays, we are trying to create algorithms that help us replicate the human brain and achieve its functionality; this has been approached with neural networks. Neural networks are a set of algorithms that can recognize patterns in data. They loosely resemble the human brain and can be used to build models that work in a brain-like way. Recurrent Neural Networks (RNNs) are a type of neural network: a generalization of feedforward networks that has an internal memory. In a traditional neural network, the inputs and outputs are independent of each other. But in order to predict a sequence or a sentence, we need to know the previous words to predict the next word; hence we need internal memory. An RNN stores this memory in hidden states, which remember information about previous elements of the sequence.
The RNN is named so because it recurrently performs the same function on every input and on the hidden state. This removes the need to store separate parameters for each layer of the network, saving memory. The output for the current input also depends on past outputs: after an output is produced, the hidden state is fed back into the same network so that it can be used when processing the next element of the sequence. To generate an output, the RNN therefore considers the current input together with the state stored from the previous input.
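This recurrence can be illustrated with a few lines of NumPy using random made-up weights: the same two weight matrices are reused at every time step, and the hidden state carries information from earlier inputs forward.

import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))        # input-to-hidden weights, shared across time steps
W_hh = rng.normal(size=(4, 4))        # hidden-to-hidden weights, shared across time steps

h = np.zeros(4)                       # initial hidden state (the network's "memory")
for x_t in rng.normal(size=(5, 3)):   # five time steps of 3-dimensional inputs
    h = np.tanh(x_t @ W_xh + h @ W_hh)    # new state depends on the input and the old state
print("final hidden state:", h)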
RNNs work well for short contexts. But to create a summary of a complete article, we need to capture the context of the entire input sequence, not just the output of the previous step. Hence, we need a network that can capture the complete context, like a human brain. Unfortunately, a simple RNN fails to capture the long-term relations in the data: it cannot remember or recall inputs that occurred long before and hence cannot make effective predictions; it can remember context only over a short span. This limitation is caused by the vanishing gradient problem. The issue can be resolved by a slightly different version of the RNN: the Long Short-Term Memory network.
Long Short-Term Memory (LSTM) networks are an improved version of RNNs. They can remember past data more easily because they mitigate the vanishing gradient problem. LSTMs are trained with backpropagation and are well suited for predicting and classifying data sequences of unknown duration. They are also used in language translation and text summarization.
Model
The model used is a sequence-to-sequence model, which converts sequences from one input domain into sequences of another output domain. It is generally used when the input and output of a model can be of variable lengths. It is an encoder-decoder method of machine translation that maps an input sequence to an output sequence using special tags and attention values. The idea is to use two LSTMs that work together with a special token and try to predict the next state sequence from the previous sequence.
Encoder-Decoder architecture:
Encoder
An encoder is an LSTM network that reads the entire input sequence. At each time step, one word from the input sequence is read by the encoder. It processes the input at each time step and captures the context and the key information of the input sequence. It takes each word of the input (x) and generates the hidden state output (h) and the cell state, which is an internal state (c). The hidden state (hi) and cell state (ci) of the last time step form the internal representation of the complete input sequence, which is used to initialize the decoder.
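A minimal Keras sketch of such an encoder is shown below; the vocabulary size, embedding size, and number of LSTM units are illustrative assumptions.

import tensorflow as tf

vocab_size, embed_dim, latent_dim = 10000, 128, 256    # illustrative sizes

encoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(encoder_inputs)
encoder_outputs, state_h, state_c = tf.keras.layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(enc_emb)

encoder_states = [state_h, state_c]   # final (h, c): the internal representation of the input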
Decoder
The decoder is also an LSTM network. It is initialized with the internal representation generated by the encoder and reads the target sequence one word per time step. It then predicts the same sequence offset by one time step: the decoder is trained to predict the next word in the output sequence given the previous word, based on the contextual memory stored by the LSTM architecture. Two special tokens, <start> and <end>, are added at the beginning and at the end of the target sequence before it is fed to the decoder. We start predicting the target sequence by passing one word at a time: the first input word to the decoder is always the <start> token, and the end of the output sequence is marked by the <end> token.
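Continuing the encoder sketch above (and reusing its vocab_size, embed_dim, latent_dim, and encoder_states, all of which are illustrative), the decoder can be sketched as follows; during training the target summary, wrapped in <start>/<end> tokens, is fed in shifted by one step.

decoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")   # target sequence starting with <start>
dec_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(decoder_inputs)
decoder_outputs, _, _ = tf.keras.layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(
        dec_emb, initial_state=encoder_states)                  # start from the encoder's (h, c)
decoder_probs = tf.keras.layers.Dense(vocab_size, activation="softmax")(decoder_outputs)

model = tf.keras.Model([encoder_inputs, decoder_inputs], decoder_probs)
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")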
The above architecture is built using the TensorFlow library, which is used to build the layers of the neural network. The final architecture of the model is shown below.
Attention layer
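As a hedged sketch of how an attention layer can be inserted into the encoder-decoder model above (the exact wiring used in this project is not reproduced here), Keras' AdditiveAttention can combine the decoder outputs with a context computed over the encoder outputs before the final softmax. The sketch continues the encoder/decoder listings above, and all reused names are assumptions from those listings.

context = tf.keras.layers.AdditiveAttention()([decoder_outputs, encoder_outputs])
combined = tf.keras.layers.Concatenate()([decoder_outputs, context])
attn_probs = tf.keras.layers.Dense(vocab_size, activation="softmax")(combined)

model_with_attention = tf.keras.Model([encoder_inputs, decoder_inputs], attn_probs)
model_with_attention.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")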
Chapter 4
Results
We have compared 3 models to generate an abstractive summary. The Encoder-
decoder model with the attention layer performs best.
Model                                      Accuracy
RNN                                        10%
Encoder-decoder                            40%
Encoder-decoder with attention layer       60% to 70%
Training loss of Encoder-decoder model with attention layer for 200 epochs.
Chapter 5
Conclusion
In conclusion, the exploration of Text Summarization using Deep Learning
underscores the potential of advanced computational techniques in distilling extensive textual
information into concise and coherent summaries. This methodology, characterized by
systematic data preprocessing, model selection, and rigorous evaluation, showcases the
power of models like Sequence-to-Sequence with Attention and transformer-based
architectures. By embracing fine-tuning, hyperparameter tuning, and error analysis, this
approach not only advances the effectiveness of text summarization but also paves the way
for future innovations in automating information condensation. Ultimately, the fusion of deep
learning and summarization holds great promise in enhancing our capacity to navigate and
comprehend the burgeoning expanse of digital content.
References
[1] U. Hahn, I. Mani, "The challenges of automatic summarization", IEEE Computer, vol. 33, no. 11, pp. 29-36, November 2000.
[2] E. Lloret, M. Palomar, "Text summarization in progress: a literature review", Springer, pp. 1-41, 2012.
[3] K. Spärck Jones, "Automatic summarizing: The state of the art", Information Processing & Management, vol. 43, pp. 1449-1481, November 2007.
[4] A. Khan, N. Salim, "A review on abstractive summarization methods", Journal of
Theoretical and Applied Information Technology, vol. 59, no. 1, pp. 64-
[5] Opidi, A., 2019. A Gentle Introduction to Text Summarization in Machine Learning. Blog, FloydHub, April 15.
[6] Lloret, E., 2008. Text summarization: an overview. Paper supported by the Spanish Government under the project TEXT-MESS (TIN2006-15265-C06-01).
[7] Kovačević, A. and Kečo, D., 2021, June. Bidirectional LSTM Networks for Abstractive
Text Summarization. In International Symposium on Innovative and Interdisciplinary
Applications of Advanced Technologies (pp. 281-293). Springer, Cham.
[8] Yang, L., 2016. Abstractive summarization for Amazon reviews.
[9] A novel approach to workload prediction using attention-based LSTM encoder-decoder network in the cloud environment. EURASIP Journal on Wireless Communications and Networking, 2019(1), pp. 1-18.
[10] Fabbri, A.R., Kryściński, W., McCann, B., Xiong, C., Socher, R. and Radev, D., 2021. SummEval: Re-evaluating summarization evaluation. Transactions of the Association for Computational Linguistics, 9, pp. 391-409.
[11] Bhati, V. and Kher, J., 2019. Survey for Amazon fine food reviews. Int. Res. J. Eng.
Technol.(IRJET), 6(4).
[12] Syed, A.A., Gaol, F.L. and Matsuo, T., 2021. A survey of the state-of-the-art models in neural abstractive text summarization. IEEE Access, 9, pp. 13248-13265.
[13] Raphal, N., Duwarah, H. and Daniel, P., 2018, April. Survey on abstractive text summarization. In 2018 International Conference on Communication and Signal Processing (ICCSP) (pp. 0513-0517). IEEE.
[14] Sherstinsky, A., 2020. Basics of recurrent neural network (RNN) and long short-term
memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, p.132306.
[15] Shewalkar, A., 2019. Performance evaluation of deep neural networks applied to speech recognition: RNN, LSTM, and GRU. Journal of Artificial Intelligence and Soft Computing Research, 9(4), pp. 235-245.
[16] He, W., Wu, Y. and Li, X., 2021, October. Attention Mechanism for Neural Machine Translation: A survey. In 2021 IEEE 5th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) (Vol. 5, pp. 1485-1489). IEEE.
[17] Song, S., Huang, H. and Ruan, T., 2019. Abstractive text summarization using LSTM-
CNN-based deep learning. Multimedia Tools and Applications, 78(1), pp.857-875.
[18] Costa-jussà, M.R., Nuez, Á. and Segura, C., 2018. Experimental research on encoder-decoder architectures with attention for chatbots. Computación y Sistemas, 22(4), pp. 1233-1239.
[19] Niu, Z., Zhong, G. and Yu, H., 2021. A review on the attention mechanism of deep learning. Neurocomputing, 452, pp. 48-62.
[20] Xi, W.D., Huang, L., Wang, C.D., Zheng, Y.Y. and Lai, J., 2019, August. BPAM: Recommendation Based on BP Neural Network with Attention Mechanism. In IJCAI (pp. 3905-3911).