Konuralp
Get To The Point: Summarization
with Pointer-Generator Networks
Improving Abstractive Summarization through Hybrid Copying and Coverage
Problems:
➢ Repetition
➢ Factual inaccuracy
➢ Out-of-vocabulary (OOV) words
Abstractive: generating words from scratch
➢ + Creating more human-like summaries
➢ - Possibility of inaccurate factual details and repetitive text
➢ - Harder to train and fine-tune

Extractive: copying words from the input text
➢ + Easier to implement than abstractive
➢ - Less human-like summaries
What This Paper Proposes
➢ Generating human-like, factually accurate summaries by seamlessly integrating new word generation with direct word copying from the input text
➢ Handling out-of-vocabulary (OOV) words
➢ Reducing repetitive text generation
Basic Definitions
Tokenization:
➢ Tokens are the smallest units of text processed in natural language processing. They can be words, subwords, characters, or sentences
Word Embedding:
➢ Word embeddings are numerical vector representations of words, sentences, or documents that capture their semantic meaning. Words with related context or meaning have embeddings that cluster together
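A minimal Python sketch of these two ideas; the toy vocabulary, whitespace tokenizer, and 4-dimensional random embeddings below are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Toy vocabulary: token -> integer ID (real systems build this from a training corpus)
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}

# Embedding matrix: one row per vocabulary entry (random here; learned in practice)
embedding_dim = 4
embeddings = np.random.randn(len(vocab), embedding_dim)

def tokenize(text):
    """Simple whitespace tokenization into word-level tokens."""
    return text.lower().split()

def embed(tokens):
    """Map each token to its vector, falling back to <unk> for unknown words."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
    return embeddings[ids]            # shape: (num_tokens, embedding_dim)

print(embed(tokenize("The cat sat")).shape)   # -> (3, 4)
```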
Seq2Seq with Attention
➢ An encoder-decoder architecture with a bidirectional LSTM encoder and an LSTM decoder
➢ The encoder processes the input tokens into hidden states
➢ The decoder generates the summary token by token
Role of attention:
➢ At each decoder time step t, attention weights are computed over the input, helping the decoder focus on the relevant parts of the source (see the sketch below)
Limitations:
➢ Out-of-vocabulary (OOV) words cannot be produced and appear as UNK
➢ Prone to repeating itself
➢ Liable to reproduce factual details inaccurately
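A minimal NumPy sketch of the attention step described above, following the paper's formulation e_i^t = v^T tanh(W_h h_i + W_s s_t + b); the array shapes, weight names, and random toy inputs are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(h, s_t, W_h, W_s, v, b):
    """Compute the attention distribution and context vector for one decoder step.

    h:   encoder hidden states, shape (src_len, hid_dim)
    s_t: decoder state at step t, shape (dec_dim,)
    """
    # e_i = v^T tanh(W_h h_i + W_s s_t + b) for every source position i
    scores = np.tanh(h @ W_h.T + W_s @ s_t + b) @ v
    a_t = softmax(scores)        # attention distribution over source tokens
    context = a_t @ h            # context vector h*_t: weighted sum of encoder states
    return a_t, context

# Tiny usage example with random weights (attn_dim is an arbitrary choice)
src_len, hid_dim, dec_dim, attn_dim = 6, 8, 8, 10
rng = np.random.default_rng(0)
h = rng.normal(size=(src_len, hid_dim))
s_t = rng.normal(size=dec_dim)
W_h = rng.normal(size=(attn_dim, hid_dim))
W_s = rng.normal(size=(attn_dim, dec_dim))
v = rng.normal(size=attn_dim)
b = rng.normal(size=attn_dim)
a_t, context = attention(h, s_t, W_h, W_s, v, b)
print(a_t.sum(), context.shape)   # attention weights sum to 1; context has shape (hid_dim,)
```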
Encoders and Decoders
Encoders
➢ Process the input tokens one by one
➢ Convert each word into a vector representation using a bidirectional LSTM
➢ These vectors, called hidden states, encode each word's meaning in context
Context Vector
➢ A weighted combination of the encoder hidden states
➢ Created by applying attention over the encoder outputs
➢ Tells the decoder which parts of the input to focus on
Attention distribution
➢ A set of scores showing how much focus to put on each word in the input
Vocabulary distribution
➢ Gives the probability of every word in the vocabulary being the next word (see the sketch after this slide)
➢ If the model decides to generate rather than copy, the word with the highest probability is selected
Decoders
➢ Generate the summary one word at a time
➢ Use the previous word, the context vector, and the previous decoder hidden state to predict the next word
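A minimal NumPy sketch of the decoder's vocabulary distribution, P_vocab = softmax(V'(V[s_t; h*_t] + b) + b'); the two-layer form follows the paper, while the weight names and toy sizes below are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def vocab_distribution(s_t, context, V1, b1, V2, b2):
    """P_vocab = softmax(V2 (V1 [s_t; h*_t] + b1) + b2)."""
    features = np.concatenate([s_t, context])    # [s_t; h*_t]
    hidden = V1 @ features + b1
    return softmax(V2 @ hidden + b2)             # shape: (vocab_size,)

# Toy usage: decoder state and context vector of size 8, fixed vocabulary of 20 words
rng = np.random.default_rng(1)
dec_dim, hid_dim, vocab_size = 8, 8, 20
s_t = rng.normal(size=dec_dim)
context = rng.normal(size=hid_dim)
V1 = rng.normal(size=(hid_dim, dec_dim + hid_dim))
b1 = rng.normal(size=hid_dim)
V2 = rng.normal(size=(vocab_size, hid_dim))
b2 = rng.normal(size=vocab_size)
p_vocab = vocab_distribution(s_t, context, V1, b1, V2, b2)
print(p_vocab.argmax(), p_vocab.sum())   # most probable next-word ID; probabilities sum to 1
```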
Seq2Seq
Pointer-Generator Network
A Hybrid Approach:
➢ At each decoder step, a generation probability p_gen decides whether to generate a new word from the vocabulary or to copy a word directly from the input text (see the sketch below)
This solves:
➢ Out-of-vocabulary (OOV) words, since any source word can be copied
➢ Factual inaccuracy, since names and numbers can be reproduced verbatim from the source
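A minimal sketch of the mixing step, following the paper's final distribution P(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ_{i: w_i = w} a_i^t; the helper names and the extended-vocabulary bookkeeping below are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generation_probability(context, s_t, x_t, w_h, w_s, w_x, b_ptr):
    """How p_gen is computed: p_gen = sigmoid(w_h . h*_t + w_s . s_t + w_x . x_t + b_ptr)."""
    return sigmoid(w_h @ context + w_s @ s_t + w_x @ x_t + b_ptr)

def final_distribution(p_vocab, a_t, src_ids, p_gen, extended_vocab_size):
    """Mix the vocabulary distribution with the copy (attention) distribution.

    src_ids: position i -> ID of the i-th source token in the *extended* vocabulary,
             where in-article OOV words get temporary IDs beyond the fixed vocabulary.
    """
    p_final = np.zeros(extended_vocab_size)
    p_final[: len(p_vocab)] = p_gen * p_vocab            # generation part
    for i, token_id in enumerate(src_ids):
        p_final[token_id] += (1.0 - p_gen) * a_t[i]      # copy part
    return p_final

# Toy usage: 4 fixed-vocab words plus 1 in-article OOV word (ID 4), 3 source tokens
p_vocab = np.array([0.1, 0.4, 0.3, 0.2])
a_t = np.array([0.6, 0.3, 0.1])
src_ids = [2, 4, 1]                     # the second source token is the OOV word
p_final = final_distribution(p_vocab, a_t, src_ids, p_gen=0.7, extended_vocab_size=5)
print(p_final.round(3), p_final.sum())  # OOV word (ID 4) now has non-zero probability
```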
Pointer-Generator Network
Coverage Mechanism
Role of coverage:
➢ A coverage vector, the sum of the attention distributions over all previous decoder steps, tracks how much attention each source word has received so far
➢ The coverage vector is fed back into the attention mechanism and penalized by a coverage loss, so the decoder does not keep attending to the same words again and again (see the sketch below)
This solves:
➢ Reduced repetition in the generated summaries
➢ Better tracking of which parts of the source have already been covered
➢ Improved ROUGE & METEOR scores
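A minimal NumPy sketch of the coverage bookkeeping, following c^t = Σ_{t'&lt;t} a^{t'} and covloss_t = Σ_i min(a_i^t, c_i^t); in the paper the coverage vector also enters the attention scores, but here only the accumulation and loss are shown, and the uniform toy attention distributions are illustrative assumptions.

```python
import numpy as np

src_len = 5
coverage = np.zeros(src_len)        # c^0 = 0: no source word has been attended to yet
total_cov_loss = 0.0

# Toy attention distributions for three decoder steps (uniform here just for illustration)
attention_steps = [np.full(src_len, 1.0 / src_len) for _ in range(3)]

for a_t in attention_steps:
    # Coverage loss penalizes re-attending: covloss_t = sum_i min(a_i^t, c_i^t)
    total_cov_loss += np.minimum(a_t, coverage).sum()
    # Coverage vector accumulates past attention: c^{t+1} = c^t + a^t
    coverage = coverage + a_t

print(coverage, total_cov_loss)
```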
Results
➢ This graph shows how the coverage mechanism affects repetition. Without it, the model keeps repeating words and phrases; with coverage, repetition decreases and the summaries become much more fluent
Results
➢ This graph shows how much of the generated summaries is novel (new) rather than copied from the article. Compared to the reference summaries, the final pointer-generator model copies more and produces fewer novel phrases and sentences
Results