BERT Summarization MP IA1
Under Guidance of
Dr. Hiren Thakkar
Introduction
● Text summarization has become a prominent research topic, with many seeking
ways to enhance its effectiveness. We aim to contribute to this field by harnessing
the power of BERT, a transformer model, to create concise, high-quality text
summaries.
● Our task involves generating extractive summaries, which retain the most important
sentences from the original text. These summaries serve as valuable tools for readers
and researchers, enabling them to grasp key ideas quickly without needing to read
entire long documents.
● By highlighting essential keywords and ideas, our approach simplifies information
retrieval, saving time and effort for those seeking to extract crucial information from
extensive texts.
● Through our work, we want to advance the accessibility and efficiency of text
summarization, so that important insights are easier to find and use.
Summarization
● Summarization in NLP (Natural Language Processing) condenses different types of information
(text, audio, video) into shorter versions while retaining the main ideas.
● Text summarization focuses on shortening written content; audio summarization condenses
spoken content, helping to capture essential information from recordings or conversations; and
video summarization processes video content, extracting important scenes or segments to provide
a quick overview.
● Summarization helps in saving time and accessing crucial information efficiently, benefiting
various tasks such as decision-making, research, and communication.
● However, summarization has some drawbacks: the quality of summaries varies with how
complicated the original material is and how well the summarization method works; different
methods may suit different types of content, making it hard to find one method that works for
everything; and some summaries may introduce errors or miss context, drifting off topic or
including unnecessary information.
Text Summarization
● Text summarization refers to the process of computationally shortening a set of data or
written content to create a summary that represents the most important or relevant
information from the original content.
● This project focuses on text summarization, which can be further classified into
extractive and abstractive summarization.
● In extractive text summarization, we select sentences from the original text and include
them in the generated summary, while abstractive text summarization is a more
advanced technique that generates a concise summary conveying the core information
without necessarily reusing sentences from the original text.
● Financial research, social media marketing, search engines, email filtering, and e-commerce
product reviews are some of the domains in which text summarization is most widely used.
● Some drawbacks of text summarization are: loss of context (with some algorithms), difficulty
with ambiguity, loss of details, biased summaries, and difficulty with long documents.
Extractive Summarization
❏ The extractive approach to text summarization identifies and extracts key phrases and
sentences from a document.
❏ These elements are then combined to create a concise summary that faithfully
presents the main points of the original text.
❏ All words and phrases in the extractive summary come directly from the source
material.
Techniques for Extractive Text Summarization
❏ Lex-Rank
A graph-based approach that represents the text as a graph with sentences as nodes and edges
weighted by inter-sentence similarity (e.g., overlapping words). Nodes are ranked with PageRank-style
centrality, highlighting the most significant sentences.
Drawbacks: sensitive to parameter tuning of the PageRank algorithm, and may struggle with
polysemy (multiple meanings of words) and long-range dependencies.
❏ Frequency-based algorithm
Calculates the Term Frequency (TF) of words and ranks sentences by the sum or average TF of the
words they contain (a minimal sketch follows this list).
Drawbacks: a naïve approach that neglects semantic relationships and sentence structure; prone to
redundancy and surface-level summaries that miss the essence of the text.
❏ Luhn's algorithm
Combines TF-IDF-style weighting with a sentence-position bias, assigning higher scores to sentences
containing high-weight keywords that appear earlier in the text.
Drawbacks: inherits the limitations of TF-IDF, over-weights keywords and early sentences, is sensitive
to term selection and proximity weights, and may miss important information later in the text.
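To make the frequency-based technique above concrete, here is a minimal, self-contained sketch (not the project's code) that scores each sentence by the average term frequency of its words and keeps the top-ranked sentences in their original order:

```python
import re
from collections import Counter

def frequency_summarize(text, num_sentences=3):
    """Naive frequency-based extractive summarizer (illustrative sketch only)."""
    # Rough sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # Term frequencies over the whole document (lower-cased words).
    tf = Counter(re.findall(r'[a-z]+', text.lower()))

    def score(sentence):
        tokens = re.findall(r'[a-z]+', sentence.lower())
        # Average term frequency of the sentence's words.
        return sum(tf[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Keep the top-scoring sentences, preserving their original order.
    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    return ' '.join(s for s in sentences if s in top)
```

Calling `frequency_summarize(long_text, num_sentences=3)` on any long passage returns the three highest-scoring sentences, which also illustrates the redundancy problem noted above, since frequent words dominate the ranking.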
BERT (Bidirectional Encoder Representations from Transformers)
❏ BERT, a powerful transformer-based language model, is now used in Google Search for
natural language understanding.
❏ BERT is a bidirectional model that reads text both left-to-right and right-to-left in order to
understand the meaning and context of the given input.
❏ Stacking Transformer encoders gives BERT, while stacking decoders gives GPT. BERT was
pre-trained on very large corpora such as Wikipedia and BookCorpus, which makes it so powerful.
❏ BERT works in two stages: pre-training, which involves the Masked Language Model and Next
Sentence Prediction objectives, and fine-tuning, in which the pre-trained model is adapted to solve specific NLP tasks.
❏ In the Masked Language Model objective, BERT masks about 15% of the words in a given sentence
with the [MASK] token and then uses the surrounding context to learn relationships between words
and predict the original words behind the [MASK] tokens.
❏ In Next Sentence Prediction, the model learns relationships between sentences by predicting
whether one sentence follows another, i.e., whether sentence B actually comes after sentence A
in the original text.
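As a quick illustration of the Masked Language Model idea (not part of the original slides), the Hugging Face transformers fill-mask pipeline lets a pre-trained BERT predict the word hidden behind a [MASK] token:

```python
from transformers import pipeline

# Load a pre-trained BERT checkpoint together with its masked-language-model head.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT uses context on both sides of [MASK] to predict the hidden word.
for prediction in unmasker("Text summarization saves [MASK] by shortening long documents."):
    print(prediction["token_str"], round(prediction["score"], 3))
```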
Continued . . .
● BERT is available in two sizes: BERT-Base
with 12 encoder layers and BERT-Large
with 24 encoder layers.
● Fine-tuning adapts the pre-trained model
to solve NLP tasks such as text
summarization. For summarization,
BERTSUM adds a summarization layer
on top of BERT that uses the contextual
word vectors (BERT's output) to build
sentence representations and decide
which sentences go into the final summary.
Summarization with BERT
1. Preprocessing:
Tokenization: The input text is segmented into individual words and special tokens, akin to dissecting text
into its fundamental building blocks.
Embedding: Each token is mapped to a high-dimensional vector, capturing its semantic meaning and
relationship to other tokens.
2. Sentence Encoding
Sentence Representation: Each sentence is transformed into a dense vector encapsulating its essential
information. Think of it as creating a succinct synopsis for each sentence.
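A hedged sketch of steps 1 and 2 using the Hugging Face transformers library (assumed here; the slides do not name a specific library). One common recipe, used below, is to take the [CLS] token's hidden state as the sentence vector:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "Summarization condenses long documents.",
    "Extractive methods keep original sentences.",
]

# Step 1: tokenization into WordPiece sub-words plus the special [CLS]/[SEP] tokens.
print(tokenizer.tokenize(sentences[0]))

# Step 2: encode each sentence into a dense vector.
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

# Use the [CLS] token's hidden state as a compact representation of each sentence.
sentence_vectors = outputs.last_hidden_state[:, 0, :]   # shape: (num_sentences, 768)
print(sentence_vectors.shape)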
Continue….
3. Sentence Scoring:
Attention Mechanism: BERT assigns attention weights to different parts of each sentence,
emphasizing crucial information analogous to highlighting key passages while reading.
Score Calculation: A score reflecting the sentence's significance for the summary is computed based
on its encoded representation and attention weights. Higher scores are awarded to sentences rich in
relevant information.
4. Summary Generation:
Ranking and Selection: Sentences are ranked based on their calculated scores. Top-ranked sentences,
akin to the most relevant articles in a vast library, are chosen for inclusion in the summary.
Conciseness and Coherence: BERT ensures the chosen sentences are diverse, non-redundant, and
form a cohesive narrative resembling a well-structured abstract.
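To illustrate steps 3 and 4, here is a simplified sketch that scores each sentence by its cosine similarity to the mean document vector and keeps the top-ranked sentences in their original order. This is a stand-in (an assumption, not the slides' exact attention-based scoring) that reuses the sentence_vectors from the previous sketch:

```python
import torch
import torch.nn.functional as F

def select_sentences(sentence_vectors, sentences, top_k=3):
    """Pick the sentences closest to the document centroid and keep original order."""
    doc_vector = sentence_vectors.mean(dim=0, keepdim=True)        # crude document representation
    scores = F.cosine_similarity(sentence_vectors, doc_vector)     # one relevance score per sentence
    top = torch.topk(scores, k=min(top_k, len(sentences))).indices.tolist()
    return " ".join(sentences[i] for i in sorted(top))             # sorted() restores reading order
```

Re-ordering the selected sentences by their original position is a simple way to keep the summary coherent, mirroring the "Conciseness and Coherence" point above.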
Basic Implementation
This code snippet condenses text using a pre-trained language model (BERT). It creates a
shorter, informative summary (around 150 words) capturing key points. This helps users
quickly grasp the essence of the text, boosting efficiency and access to information.
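The snippet itself did not survive this text export. A plausible minimal version, assuming the bert-extractive-summarizer package (an assumption, not necessarily the exact code used in the project), could look like this:

```python
# pip install bert-extractive-summarizer
from summarizer import Summarizer

model = Summarizer()                  # loads a pre-trained BERT model under the hood

text = open("article.txt").read()     # hypothetical input document

# Keep roughly 20% of the sentences as an extractive summary.
print(model(text, ratio=0.2))
```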
Future Scope
● Increase summarization accuracy.
● Extend text summarization to languages other than English.
● Implement our own BERT-based model for text summarization.
● Work on the research paper for our major project.
Conclusion
● To conclude, we have gained a solid understanding of our problem statement and
the basic skills required to implement our major project.
● We focus on improving text summarization using BERT, a powerful model. We aim
to create short, high-quality summaries, making text summarization
more accessible and efficient and ensuring that valuable insights are easier to discover
and use.
● Through teamwork and continued close collaboration with our mentor, we will be
able to achieve the goals of this major project.
References
1. Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep
Bidirectional Transformers for Language Understanding. arXiv:1810.04805.
2. Automatic Text Summarization Using Term Frequency, Luhn's Heuristic, and Cosine
Similarity Approaches. IEEE Conference Publication, IEEE Xplore.
https://ieeexplore.ieee.org/document/10188527
Thank You !