
Comprehensive Guide to Text Summarization using Deep Learning in Python

A complete guide to text summarization in NLP. Learn about text summarization using deep learning and how to build a text summarization model in Python.

By Aravindpai Pai

15 min. read

Introduction
“I don’t want a full report, just give me a summary of the results.” I have often found myself in this situation, both in college and in my professional life. We prepare a comprehensive report and the teacher/supervisor only has time to read the summary.

Sounds familiar? Well, I decided to do something about it. Manually converting a report into a summarized version is too time-consuming, right? Could I lean on Natural Language Processing (NLP) techniques to help me out?


This is where the awesome concept of Text Summarization using Deep Learning really helped me out. It solves the one issue which kept bothering me before: now our model can understand the context of the entire text. It’s a dream come true for all of us who need to come up with a quick summary of a document!

And the results we achieve using text summarization with deep learning? Remarkable. So in this article, we will walk through a step-by-step process for building a Text Summarizer using Deep Learning, covering all the concepts required to build it. And then we will implement our first text summarization model in Python!

Note: This article requires a basic understanding of a few deep learning concepts. I recommend going through the below articles.

A Must-Read Introduction to Sequence Modelling (with use cases)
Must-Read Tutorial to Learn Sequence Modeling (deeplearning.ai Course #5)


Essentials of Deep Learning: Introduction to Long Short Term Memory

I’ve kept the ‘how does the attention mechanism work?’ section at the bottom of this article. It’s a math-heavy section and is not mandatory to understand how the Python code works. However, I encourage you to go through it because it will give you a solid idea of this awesome NLP concept.

What is Text Summarization in NLP?

Let’s first understand what text summarization is before we look at how it works. Here is a succinct definition to get us started:

“Automatic text summarization is the task of producing a concise and fluent summary while preserving key information content and overall meaning”

– Text Summarization Techniques: A Brief Survey, 2017

Text summarization refers to the technique of condensing a lengthy text document into a succinct, well-written summary that captures the essential information and main ideas of the original text by highlighting its significant points.

There are broadly two different approaches that are used for text summarization:

Extractive Summarization
Abstractive Summarization

Let’s look at these two types in a bit more detail.

Extractive Summarization

The name gives away what this approach does. We identify the important sentences or phrases from the original text and extract only those from the text. Those extracted sentences would be our summary. The below diagram illustrates extractive summarization:

I recommend going through the below article for building an extractive text summarizer using the TextRank algorithm:

An Introduction to Text Summarization using the TextRank Algorithm (with Python implementation)

Abstractive Summarization

This is a very interesting approach. Here, we generate new sentences from the original text. This is in contrast to the extractive approach we saw earlier, where we used only the sentences that were present. The sentences generated through abstractive summarization might not be present in the original text:

You might have guessed it: we are going to build an Abstractive Text Summarizer using Deep Learning in this article! Let’s first understand the concepts necessary for building a Text Summarizer model before diving into the implementation part.

Exciting times ahead!

Introduction to Sequence-to-Sequence
(Seq2Seq) Modeling
We can build a Seq2Seq model on any problem
which involves sequential information. This
includes Sentiment classification, Neural Machine
Translation, and Named Entity Recognition – some
very common applications of sequential
information.


In the case of Neural Machine Translation, the input is text in one language and the output is text in another language:

In Named Entity Recognition, the input is a sequence of words and the output is a sequence of tags, one for every word in the input sequence:

Our objective is to build a text summarizer where the input is a long sequence of words (in a text body), and the output is a short summary (which is a sequence as well). So, we can model this as a Many-to-Many Seq2Seq problem. Below is a typical Seq2Seq model architecture:

There are two major components of a Seq2Seq model:

Encoder
Decoder

Let’s understand these two in detail. These are essential to understand how text summarization works underneath the code. You can also check out this tutorial to understand sequence-to-sequence modeling in more detail.

Understanding the Encoder-Decoder Architecture

The Encoder-Decoder architecture is mainly used to solve sequence-to-sequence (Seq2Seq) problems where the input and output sequences are of different lengths.

Let’s understand this from the perspective of text summarization. The input is a long sequence of words and the output will be a short version of the input sequence.

Generally, variants of Recurrent Neural Networks (RNNs), i.e. the Gated Recurrent Unit (GRU) or Long Short Term Memory (LSTM), are preferred as the encoder and decoder components. This is because they are capable of capturing long-term dependencies by overcoming the problem of vanishing gradients.

We can set up the Encoder-Decoder in 2 phases:

Training phase
Inference phase

Let’s understand these concepts through the lens of an LSTM model.

Training phase

In the training phase, we will first set up the encoder and decoder. We will then train the model to predict the target sequence offset by one timestep. Let’s see in detail how to set up the encoder and decoder.

Encoder

An Encoder Long Short Term Memory (LSTM) model reads the entire input sequence, wherein at each timestep one word is fed into the encoder. It then processes the information at every timestep and captures the contextual information present in the input sequence.

I’ve put together the below diagram which illustrates this process:

The hidden state (hi) and cell state (ci) of the last timestep are used to initialize the decoder. Remember, this is because the encoder and decoder are two separate LSTM networks with their own weights.

Decoder

The decoder is also an LSTM network which reads the entire target sequence word-by-word and predicts the same sequence offset by one timestep. The decoder is trained to predict the next word in the sequence given the previous word.

<start> and <end> are special tokens added to the target sequence before feeding it into the decoder. The target sequence is unknown while decoding the test sequence, so we start predicting the target sequence by passing the first word into the decoder, which is always the <start> token. The <end> token signals the end of the sentence.

Pretty intuitive so far.

Inference Phase


After training, the model is tested on new source sequences for which the target sequence is unknown. So, we need to set up the inference architecture to decode a test sequence:

How does the inference process work?

Here are the steps to decode the test sequence:

1. Encode the entire input sequence and initialize the decoder with the internal states of the encoder
2. Pass the <start> token as an input to the decoder
3. Run the decoder for one timestep with the internal states
4. The output will be the probability for the next word. The word with the maximum probability will be selected
5. Pass the sampled word as an input to the decoder in the next timestep and update the internal states with the current timestep
6. Repeat steps 3-5 until we generate the <end> token or hit the maximum length of the target sequence

Let’s take an example where the test sequence is given by [x1, x2, x3, x4]. How will the inference process work for this test sequence? I want you to think about it before you look at my thoughts below.

1. Encode the test sequence into internal state vectors
2. Observe how the decoder predicts the target sequence at each timestep:

Timestep: t=1

Timestep: t=2

And, Timestep: t=3

Limitations of the Encoder-Decoder Architecture

As useful as this encoder-decoder architecture is, there are certain limitations that come with it.

The encoder converts the entire input sequence into a fixed-length vector and then the decoder predicts the output sequence. This works only for short sequences, since the decoder is looking at the entire input sequence for the prediction
Here comes the problem with long sequences. It is difficult for the encoder to memorize long sequences into a fixed-length vector

“A potential issue with this encoder-decoder approach is that a neural network needs to be able to compress all the necessary information of a source sentence into a fixed-length vector. This may make it difficult for the neural network to cope with long sentences. The performance of a basic encoder-decoder deteriorates rapidly as the length of an input sentence increases.”

– Neural Machine Translation by Jointly Learning to Align and Translate

So how do we overcome this problem of long sequences? This is where the concept of the attention mechanism comes into the picture. It aims to predict a word by looking at a few specific parts of the sequence only, rather than the entire sequence. It really is as awesome as it sounds!

The Intuition behind the Attention Mechanism

How much attention do we need to pay to every word in the input sequence for generating a word at timestep t? That’s the key intuition behind the attention mechanism concept.


Let’s consider a simple example to understand how the Attention Mechanism works:

Source sequence: “Which sport do you like the most?”
Target sequence: “I love cricket”

The first word ‘I’ in the target sequence is connected to the fourth word ‘you’ in the source sequence, right? Similarly, the second word ‘love’ in the target sequence is associated with the fifth word ‘like’ in the source sequence.

So, instead of looking at all the words in the source sequence, we can increase the importance of the specific parts of the source sequence that result in the target sequence. This is the basic idea behind the attention mechanism.

There are 2 different classes of attention mechanisms, depending on the way the attended context vector is derived:

Global Attention
Local Attention

Let’s briefly touch on these classes.

Global Attention

Here, the attention is placed on all the source positions. In other words, all the hidden states of the encoder are considered for deriving the attended context vector:


Source: Effective Approaches to Attention-based Neural Machine Translation, 2015

Local Attention

Here, the attention is placed on only a few source positions. Only a few hidden states of the encoder are considered for deriving the attended context vector:


Source: Effective Approaches to Attention-based Neural Machine Translation, 2015

We will be using the Global Attention mechanism in this article.

Understanding the Problem Statement

Customer reviews can often be long and descriptive. Analyzing these reviews manually, as you can imagine, is really time-consuming. This is where the brilliance of Natural Language Processing can be applied to generate a summary for long reviews.

We will be working on a really cool dataset. Our objective here is to generate a summary for the Amazon Fine Food reviews using the abstraction-based approach we learned about above.

You can download the dataset from here.


Implementing Text Summarization in Python using Keras

It’s time to fire up our Jupyter notebooks! Let’s dive into the implementation details right away.

Custom Attention Layer

Keras does not officially support an attention layer. So, we can either implement our own attention layer or use a third-party implementation. We will go with the latter option for this article. You can download the attention layer from here and copy it into a file called attention.py.

Let’s import it into our environment:
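A minimal sketch of the import, assuming the downloaded file exposes a class named AttentionLayer (the usual name in popular third-party Bahdanau-attention implementations; check your attention.py):

from attention import AttentionLayer  # class name assumed; verify against your attention.py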

Import the Libraries
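The original snippet isn't reproduced in this capture, so here is a sketch of the imports the rest of this walkthrough relies on (TensorFlow's bundled Keras is assumed):

import numpy as np
import pandas as pd
import re
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import (Input, LSTM, Embedding, Dense,
                                     Concatenate, TimeDistributed)
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping
import warnings
warnings.filterwarnings("ignore")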

Read the dataset

This dataset consists of reviews of fine foods from Amazon. The data spans a period of more than 10 years, including all ~500,000 reviews up to October 2012. These reviews include product and user information, ratings, plain text review, and summary. It also includes reviews from all other Amazon categories.

We’ll take a sample of 100,000 reviews to reduce the training time of our model. Feel free to use the entire dataset for training your model if your machine has that kind of computational power.
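A sketch of the loading step, assuming the Kaggle CSV is named Reviews.csv:

data = pd.read_csv("Reviews.csv", nrows=100000)  # keep a 100k-row sample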

Drop Duplicates and NA values
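Something along these lines, assuming duplicates are identified by the review text:

data.drop_duplicates(subset=['Text'], inplace=True)  # drop duplicate reviews
data.dropna(axis=0, inplace=True)                    # drop rows with missing values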


Preprocessing

Performing basic preprocessing steps is very important before we get to the model building part. Using messy and uncleaned text data is a potentially disastrous move. So in this step, we will drop all the unwanted symbols, characters, etc. from the text that do not affect the objective of our problem.

Here is the dictionary that we will use for expanding the contractions:
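An abbreviated sketch of such a dictionary; the full mapping in the original covers far more contractions:

contraction_mapping = {"ain't": "is not", "aren't": "are not", "can't": "cannot",
                       "didn't": "did not", "don't": "do not", "it's": "it is",
                       "won't": "will not", "you're": "you are", "i've": "i have"}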

We need to define two different functions for preprocessing the reviews and generating the summary, since the preprocessing steps involved in text and summary differ slightly.

a) Text Cleaning

Let’s look at the first 10 reviews in our dataset to get an idea of the text preprocessing steps:

data['Text'][:10]

Output:

We will perform the below preprocessing tasks for our data:

Convert everything to lowercase
Remove HTML tags
Contraction mapping
Remove (’s)
Remove any text inside parentheses ( )
Eliminate punctuation and special characters
Remove stopwords
Remove short words

Let’s define the function:
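A sketch of a cleaning function implementing the steps above (nltk's stopword list must be downloaded once with nltk.download('stopwords'); helper and variable names are mine):

stop_words = set(stopwords.words('english'))

def text_cleaner(text):
    new_string = text.lower()                                    # lowercase
    new_string = BeautifulSoup(new_string, "html.parser").text   # remove HTML tags
    new_string = re.sub(r'\([^)]*\)', '', new_string)            # drop text inside parentheses
    new_string = re.sub('"', '', new_string)
    new_string = ' '.join(contraction_mapping.get(t, t)          # contraction mapping
                          for t in new_string.split())
    new_string = re.sub(r"'s\b", '', new_string)                 # remove 's
    new_string = re.sub('[^a-zA-Z]', ' ', new_string)            # punctuation/special chars
    tokens = [w for w in new_string.split() if w not in stop_words]  # remove stopwords
    long_words = [w for w in tokens if len(w) > 1]               # remove short words
    return ' '.join(long_words).strip()

cleaned_text = [text_cleaner(t) for t in data['Text']]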

b) Summary Cleaning

And now we’ll look at the first 10 rows of the reviews to get an idea of the preprocessing steps for the summary column:

Output:

Define the function for this task:
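A matching sketch for the summaries. Stopwords are deliberately kept here since summaries are very short (an assumption about the original function's behavior):

def summary_cleaner(text):
    new_string = re.sub('"', '', text.lower())
    new_string = ' '.join(contraction_mapping.get(t, t) for t in new_string.split())
    new_string = re.sub(r"'s\b", '', new_string)
    new_string = re.sub('[^a-zA-Z]', ' ', new_string)
    tokens = [w for w in new_string.split() if len(w) > 1]
    return ' '.join(tokens).strip()

cleaned_summary = [summary_cleaner(s) for s in data['Summary']]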

Remember to add the START and END special tokens at the beginning and end of the summary:
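Assuming the cleaned columns are written back to the dataframe, the tokens can be added like this (the _START_/_END_ spellings are an assumption; note that the Keras tokenizer will later strip the underscores, reducing them to "start" and "end"):

data['cleaned_text'] = cleaned_text
data['cleaned_summary'] = cleaned_summary
data.replace('', np.nan, inplace=True)   # empty strings -> NaN
data.dropna(axis=0, inplace=True)        # drop rows left empty by cleaning

data['cleaned_summary'] = data['cleaned_summary'].apply(
    lambda x: '_START_ ' + x + ' _END_')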

Now, let’s take a look at the top 5 reviews and their summaries:


Output:

Understanding the distribution of the sequences

Here, we will analyze the lengths of the reviews and the summaries to get an overall idea about the distribution of text lengths. This will help us fix the maximum length of the sequences:

Output:

Interesting. We can fix the maximum length of the reviews to 80, since that seems to be the majority review length. Similarly, we can set the maximum summary length to 10:
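A sketch of the length analysis and the resulting constants (matplotlib assumed; variable names are mine):

import matplotlib.pyplot as plt

length_df = pd.DataFrame({
    'text': [len(t.split()) for t in data['cleaned_text']],
    'summary': [len(s.split()) for s in data['cleaned_summary']]})
length_df.hist(bins=30)   # one histogram per column
plt.show()

max_len_text = 80
max_len_summary = 10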


We are getting closer to the model building part. Before that, we need to split our dataset into a training and validation set. We’ll use 90% of the dataset as the training data and evaluate the performance on the remaining 10% (holdout set):
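A sketch using scikit-learn's splitter (the random_state value is an arbitrary choice):

from sklearn.model_selection import train_test_split

x_tr, x_val, y_tr, y_val = train_test_split(
    data['cleaned_text'], data['cleaned_summary'],
    test_size=0.1, random_state=0, shuffle=True)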

Preparing the Tokenizer

A tokenizer builds the vocabulary and converts a word sequence to an integer sequence. Go ahead and build tokenizers for text and summary:

a) Text Tokenizer

b) Summary Tokenizer
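A sketch of both tokenizers plus the padding step they feed into (variable names are mine):

# a) Text tokenizer
x_tokenizer = Tokenizer()
x_tokenizer.fit_on_texts(list(x_tr))
x_tr_pad = pad_sequences(x_tokenizer.texts_to_sequences(x_tr),
                         maxlen=max_len_text, padding='post')
x_val_pad = pad_sequences(x_tokenizer.texts_to_sequences(x_val),
                          maxlen=max_len_text, padding='post')
x_voc_size = len(x_tokenizer.word_index) + 1   # +1 for padding index 0

# b) Summary tokenizer
y_tokenizer = Tokenizer()
y_tokenizer.fit_on_texts(list(y_tr))
y_tr_pad = pad_sequences(y_tokenizer.texts_to_sequences(y_tr),
                         maxlen=max_len_summary, padding='post')
y_val_pad = pad_sequences(y_tokenizer.texts_to_sequences(y_val),
                          maxlen=max_len_summary, padding='post')
y_voc_size = len(y_tokenizer.word_index) + 1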

Model building

We are finally at the model building part. But before we do that, we need to familiarize ourselves with a few terms which are required prior to building the model.

Return Sequences = True: When the return_sequences parameter is set to True, the LSTM produces the hidden state for every timestep
Return State = True: When return_state = True, the LSTM produces the hidden state and cell state of the last timestep only
Initial State: This is used to initialize the internal states of the LSTM for the first timestep
Stacked LSTM: A stacked LSTM has multiple layers of LSTM stacked on top of each other. This leads to a better representation of the sequence. I encourage you to experiment with multiple layers of the LSTM stacked on top of each other (it’s a great way to learn this)

Here, we are building a 3-stacked LSTM for the encoder:
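A sketch of the full training graph as described in this article: a 3-stacked LSTM encoder, a single-LSTM decoder initialized with the encoder's final states, the third-party attention layer, and a TimeDistributed softmax over the summary vocabulary. The latent_dim value and the AttentionLayer call signature are assumptions:

latent_dim = 500  # hidden/embedding size (assumed hyperparameter)

# ----- Encoder: 3 stacked LSTMs -----
encoder_inputs = Input(shape=(max_len_text,))
enc_emb = Embedding(x_voc_size, latent_dim, trainable=True)(encoder_inputs)

encoder_lstm1 = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_output1, _, _ = encoder_lstm1(enc_emb)

encoder_lstm2 = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_output2, _, _ = encoder_lstm2(encoder_output1)

encoder_lstm3 = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm3(encoder_output2)

# ----- Decoder: initialized with the encoder's final hidden and cell states -----
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(y_voc_size, latent_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)

decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=[state_h, state_c])

# ----- Attention over all encoder hidden states (Global Attention) -----
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_outputs, decoder_outputs])
decoder_concat = Concatenate(axis=-1)([decoder_outputs, attn_out])

# ----- Softmax over the summary vocabulary at every timestep -----
decoder_dense = TimeDistributed(Dense(y_voc_size, activation='softmax'))
decoder_outputs_final = decoder_dense(decoder_concat)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs_final)
model.summary()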

Output:

I am using sparse categorical cross-entropy as the loss function since it converts the integer sequence to a one-hot vector on the fly. This overcomes any memory issues.
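The compile step might look like this (the optimizer choice is an assumption):

model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')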

Remember the concept of early stopping? It is used to stop training the neural network at the right time by monitoring a user-specified metric. Here, I am monitoring the validation loss (val_loss). Our model will stop training once the validation loss increases:
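A sketch of the callback:

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1)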


We’ll train the model on a batch size of 512 and validate it on the holdout set (which is 10% of our dataset):
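A sketch of the training call. The decoder input is the summary minus its last token, and the target is the summary shifted one step left (reshaped to 3-D for the sparse loss); the epoch count is arbitrary here since early stopping decides when to halt:

history = model.fit(
    [x_tr_pad, y_tr_pad[:, :-1]],
    y_tr_pad.reshape(y_tr_pad.shape[0], y_tr_pad.shape[1], 1)[:, 1:],
    epochs=50,
    batch_size=512,
    callbacks=[es],
    validation_data=([x_val_pad, y_val_pad[:, :-1]],
                     y_val_pad.reshape(y_val_pad.shape[0],
                                       y_val_pad.shape[1], 1)[:, 1:]))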

Understanding the Diagnostic plot

Now, we will plot a few diagnostic plots to understand the behavior of the model over time:
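A sketch of the loss curves:

plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.legend()
plt.show()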

Output:

We can infer that there is a slight increase in the validation loss after epoch 10. So, we will stop training the model after this epoch.

Next, let’s build the dictionaries to convert indices to words for the target and source vocabularies:
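Likely something along these lines, using the tokenizers' built-in lookup tables:

reverse_target_word_index = y_tokenizer.index_word   # index -> word (summary)
reverse_source_word_index = x_tokenizer.index_word   # index -> word (review)
target_word_index = y_tokenizer.word_index           # word -> index (summary)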

Inference

Set up the inference for the encoder and decoder:
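A sketch of the inference graphs, reusing the trained layers from the model sketch above. The encoder model emits the full sequence of hidden states (needed by attention) plus the final states; the decoder model runs one step at a time:

# Encoder inference model
encoder_model = Model(inputs=encoder_inputs,
                      outputs=[encoder_outputs, state_h, state_c])

# Decoder inference model: previous states come in as inputs
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_hidden_state_input = Input(shape=(max_len_text, latent_dim))

dec_emb2 = dec_emb_layer(decoder_inputs)
decoder_outputs2, state_h2, state_c2 = decoder_lstm(
    dec_emb2, initial_state=[decoder_state_input_h, decoder_state_input_c])

attn_out_inf, attn_states_inf = attn_layer(
    [decoder_hidden_state_input, decoder_outputs2])
decoder_inf_concat = Concatenate(axis=-1)([decoder_outputs2, attn_out_inf])
decoder_final = decoder_dense(decoder_inf_concat)

decoder_model = Model(
    [decoder_inputs, decoder_hidden_state_input,
     decoder_state_input_h, decoder_state_input_c],
    [decoder_final, state_h2, state_c2])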

Below, we define the function that implements the inference process (which we covered in the section above):
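A sketch of the greedy decoding loop, following steps 1-6 from the inference section. It assumes the tokenizer reduced the _START_/_END_ markers to 'start'/'end' (underscores are stripped by the default Keras filters):

def decode_sequence(input_seq):
    # 1-2. Encode the input and seed the decoder with the start token
    e_out, e_h, e_c = encoder_model.predict(input_seq)
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = target_word_index['start']

    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        # 3-4. Run the decoder for one timestep; greedily pick the most probable word
        output_tokens, h, c = decoder_model.predict(
            [target_seq, e_out, e_h, e_c])
        sampled_token_index = int(np.argmax(output_tokens[0, -1, :]))
        sampled_token = reverse_target_word_index.get(sampled_token_index, '')

        if sampled_token != 'end':
            decoded_sentence += ' ' + sampled_token

        # 6. Stop on the end token or at the maximum summary length
        if (sampled_token == 'end' or
                len(decoded_sentence.split()) >= (max_len_summary - 1)):
            stop_condition = True

        # 5. Feed the sampled word and updated states back into the decoder
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index
        e_h, e_c = h, c

    return decoded_sentence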


Let us define the functions to convert an integer sequence to a word sequence for the summaries as well as the reviews:
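A sketch of those helpers, plus a loop that prints a few predictions (the 'start'/'end' token names follow the same assumption as above):

def seq2summary(input_seq):
    return ' '.join(reverse_target_word_index[int(i)] for i in input_seq
                    if int(i) != 0 and
                    reverse_target_word_index[int(i)] not in ('start', 'end'))

def seq2text(input_seq):
    return ' '.join(reverse_source_word_index[int(i)]
                    for i in input_seq if int(i) != 0)

for i in range(5):
    print("Review:", seq2text(x_val_pad[i]))
    print("Original summary:", seq2summary(y_val_pad[i]))
    print("Predicted summary:",
          decode_sequence(x_val_pad[i].reshape(1, max_len_text)))
    print()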

Here are a few summaries generated by the model:

This is really cool stuff. Even though the actual summary and the summary generated by our model do not match word-for-word, both convey the same meaning. Our model is able to generate a legible summary based on the context present in the text.

This is how we can perform text summarization using deep learning concepts in Python.

How can we Improve the Model’s Performance Even Further?

Your learning doesn’t stop here! There’s a lot more you can do to play around and experiment with the model:

I recommend increasing the training dataset size and building the model. The generalization capability of a deep learning model improves with an increase in the training dataset size
Try implementing a Bi-Directional LSTM, which is capable of capturing the context from both directions and results in a better context vector
Use the beam search strategy for decoding the test sequence instead of the greedy approach (argmax)
Evaluate the performance of your model based on the BLEU score
Implement pointer-generator networks and coverage mechanisms

How does the Attention Mechanism Work?

Now, let’s talk about the inner workings of the attention mechanism. As I mentioned at the start of the article, this is a math-heavy section, so consider it optional learning. I still highly recommend reading through it to truly grasp how the attention mechanism works.

The encoder outputs the hidden state (hj) for every timestep j in the source sequence
Similarly, the decoder outputs the hidden state (si) for every timestep i in the target sequence
We compute the alignment score (eij), which measures how well the source word is aligned with the target word. The alignment score is computed from the source hidden state hj and the target hidden state si using a score function. This is given by:

eij = score(si, hj)

where eij denotes the alignment score for target timestep i and source timestep j.

There are different types of attention mechanisms depending on the type of score function used. I’ve mentioned a few popular attention mechanisms below:

We normalize the alignment scores using the softmax function to retrieve the attention weights (aij):

aij = exp(eij) / Σk exp(eik)

We compute the linear sum of products of the attention weights aij and the hidden states of the encoder hj to produce the attended context vector (Ci):

Ci = Σj aij * hj

The attended context vector and the target hidden state of the decoder at timestep i are concatenated to produce an attended hidden vector Si:

Si = concatenate([si; Ci])

The attended hidden vector Si is then fed into the dense layer to produce yi:

yi = dense(Si)

Let’s understand the above attention mechanism steps with the help of an example. Consider the source sequence to be [x1, x2, x3, x4] and the target sequence to be [y1, y2].

The encoder reads the entire source sequence and outputs the hidden state for every timestep, say h1, h2, h3, h4

The decoder reads the entire target sequence offset by one timestep and outputs the hidden state for every timestep, say s1, s2, s3

Target timestep i=1


Alignment scores e1j are computed from the source hidden states hj and the target hidden state s1 using the score function:
e11 = score(s1, h1)
e12 = score(s1, h2)
e13 = score(s1, h3)
e14 = score(s1, h4)

Normalizing the alignment scores e1j using softmax produces the attention weights a1j:
a11 = exp(e11)/(exp(e11)+exp(e12)+exp(e13)+exp(e14))
a12 = exp(e12)/(exp(e11)+exp(e12)+exp(e13)+exp(e14))
a13 = exp(e13)/(exp(e11)+exp(e12)+exp(e13)+exp(e14))
a14 = exp(e14)/(exp(e11)+exp(e12)+exp(e13)+exp(e14))

The attended context vector C1 is derived as the linear sum of products of the encoder hidden states hj and the attention weights a1j:
C1 = h1 * a11 + h2 * a12 + h3 * a13 + h4 * a14

The attended context vector C1 and the target hidden state s1 are concatenated to produce the attended hidden vector S1:
S1 = concatenate([s1; C1])

The attended hidden vector S1 is then fed into the dense layer to produce y1:
y1 = dense(S1)


Target timestep i=2

Alignment scores e2j are computed from the source hidden states hj and the target hidden state s2 using the score function:
e21 = score(s2, h1)
e22 = score(s2, h2)
e23 = score(s2, h3)
e24 = score(s2, h4)

Normalizing the alignment scores e2j using softmax produces the attention weights a2j:
a21 = exp(e21)/(exp(e21)+exp(e22)+exp(e23)+exp(e24))
a22 = exp(e22)/(exp(e21)+exp(e22)+exp(e23)+exp(e24))
a23 = exp(e23)/(exp(e21)+exp(e22)+exp(e23)+exp(e24))
a24 = exp(e24)/(exp(e21)+exp(e22)+exp(e23)+exp(e24))

The attended context vector C2 is derived as the linear sum of products of the encoder hidden states hj and the attention weights a2j:
C2 = h1 * a21 + h2 * a22 + h3 * a23 + h4 * a24

The attended context vector C2 and the target hidden state s2 are concatenated to produce the attended hidden vector S2:
S2 = concatenate([s2; C2])

The attended hidden vector S2 is then fed into the dense layer to produce y2:
y2 = dense(S2)


We can perform similar steps for target timestep i=3 to produce y3.
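To make the arithmetic concrete, here is a toy check of the i=1 computations with made-up scalar hidden states and a dot-product score function (both choices are mine, purely for illustration):

import numpy as np

h = np.array([0.2, 0.5, 0.1, 0.9])   # encoder hidden states h1..h4 (toy scalars)
s1 = 0.7                             # decoder hidden state s1

e1 = s1 * h                          # alignment scores e1j = score(s1, hj), dot score
a1 = np.exp(e1) / np.exp(e1).sum()   # softmax -> attention weights a1j (sum to 1)
C1 = (a1 * h).sum()                  # context vector C1 = sum_j a1j * hj
S1 = np.array([s1, C1])              # attended hidden vector: concatenate([s1; C1])
print("weights:", a1, "context:", C1)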

I know this was a heavy dose of math and theory, but understanding it will help you grasp the underlying idea behind the attention mechanism. It has spawned so many recent developments in NLP, and now you are ready to make your own mark!

Code

Find the entire notebook here.

Frequently Asked Questions

Q1. What is text summarization in NLP?

A. Text summarization in NLP (Natural Language Processing) involves using computational algorithms to automatically condense a large text document into a shorter summary that captures its essential information.

Q2. What are the 4 types of summarization?

A. The four types of summarization are extraction, abstraction, indicative, and query-based. Extraction summarizes the most important information directly from the text. Abstraction summarizes information in a new way, while indicative focuses on the main message. Query-based uses specific questions to generate a summary.

End Notes
Take a deep breath – we’ve covered a lot of
ground in this article. And congratulations on
building your first text summarization model using
deep learning! We have seen how to build our own
text summarizer using Seq2Seq modeling in
Python.

If you have any feedback on this article or any doubts/queries, kindly share them in the comments section below and I will get back to you. And make sure you experiment with the model we built here and share your results with the community!

You can also take the below courses to learn or brush up on your NLP skills:

Natural Language Processing (NLP) using Python
Introduction to Natural Language Processing (NLP)
