Text Summarization: An Overview
Samrat Babar, Sanjeevan Engineering and Technology Institute, Panhala
October 2013
1.Abstract:
In this new era, where tremendous information is available on the internet, it is important to provide improved mechanisms for extracting information quickly and efficiently. It is very difficult for human beings to manually extract the summary of a large document of text. With the vast amount of textual material available on the internet, two problems arise: searching for relevant documents among the large number available, and absorbing the relevant information from them. Automatic text summarization is needed to solve both problems. Text summarization is the process of identifying the most important, meaningful information in a document or set of related documents and compressing it into a shorter version that preserves its overall meaning.
2.Introduction:
Before turning to text summarization, we first have to know what a summary is. A summary is a text that is produced from one or more texts, conveys the important information of the original text, and is shorter than it. The goal of automatic text summarization is to present the source text as a shorter version that preserves its semantics. The most important advantage of a summary is that it reduces reading time.
Text summarization methods can be classified into extractive and abstractive summarization. An extractive summarization method consists of selecting important sentences, paragraphs, etc. from the original document and concatenating them into a shorter form. Abstractive summarization consists of understanding the main concepts in a document and then expressing those concepts in clear natural language.
There are two different groups of text summaries: indicative and informative. An indicative summary only represents the main idea of the text to the user; the typical length of this type of summary is 5 to 10 percent of the main text. Informative summarization systems, on the other hand, give concise information about the main text; the length of an informative summary is 20 to 30 percent of the main text.
3. Abstractive text summarization : This process typically involves three steps:
3.1. Topic Identification : The most prominent information in the text is identified. Different techniques are used for topic identification, such as position, cue phrases, and word frequency (a word-frequency sketch follows this list). Methods based on the position of phrases are the most useful for topic identification.
3.2. Interpretation : Abstractive summaries need to go through an interpretation step, in which different subjects are fused in order to form a general content.
3.3. Summary Generation : In this step, the system uses a text generation method to produce the summary.
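As a minimal sketch of word-frequency topic identification (step 3.1): the function below ranks content words by frequency as a rough topic signature. The stop-word list and all names here are illustrative assumptions, not from the paper.

    from collections import Counter
    import re

    # Tiny illustrative stop-word list; a real system would use a fuller one.
    STOPWORDS = {"the", "a", "an", "is", "of", "in", "to", "and", "it", "that"}

    def top_topic_words(text, k=5):
        # Rank non-stop-words by frequency as a rough topic signature.
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(w for w in words if w not in STOPWORDS)
        return [w for w, _ in counts.most_common(k)]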
4. Extractive text summarization : This process can be divided into two steps: a pre-processing step and a processing step. Pre-processing produces a structured representation of the original text. It usually includes: a) Sentence boundary identification: in English, a sentence boundary is identified by the presence of a dot at the end of the sentence. b) Stop-word elimination: common words with no semantics are removed. c) Stemming: the purpose of stemming is to obtain the stem or radix of each word, which emphasizes its semantics.
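A minimal sketch of these three pre-processing steps using the NLTK library; it assumes NLTK and its "punkt" and "stopwords" data are installed, and the function name preprocess is an illustrative choice, not from the paper.

    from nltk import sent_tokenize, word_tokenize
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    def preprocess(text):
        stop = set(stopwords.words("english"))
        stemmer = PorterStemmer()
        sentences = sent_tokenize(text)  # a) sentence boundary identification
        processed = []
        for s in sentences:
            tokens = [t.lower() for t in word_tokenize(s) if t.isalpha()]
            tokens = [t for t in tokens if t not in stop]        # b) stop-word elimination
            processed.append([stemmer.stem(t) for t in tokens])  # c) stemming
        return sentences, processed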
In the processing step, the features influencing the relevance of sentences are decided and calculated, and weights are assigned to these features using a weight-learning method. The final score of each sentence is determined using a feature-weight equation, and the top-ranked sentences are selected for the final summary.
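A minimal sketch of such a feature-weight equation, score(s) = sum over i of w_i * f_i(s); the feature names and weight values below are assumptions for illustration only:

    def sentence_score(features, weights):
        # Weighted sum of feature values for one sentence.
        return sum(weights[name] * value for name, value in features.items())

    weights = {"position": 0.3, "title_words": 0.4, "length": 0.3}   # assumed weights
    features = {"position": 1.0, "title_words": 0.5, "length": 0.7}  # f_i values in [0, 1]
    print(sentence_score(features, weights))  # ~0.71

The sentences are then sorted by this score, and the highest-scoring ones form the summary.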
7.1 TF-IDF Method : TF-IDF is a numerical statistic which reflects how important a word is in a given document. The TF-IDF value of a word increases proportionally with the number of times the word appears in the document. This method works in the weighted term-frequency and inverse sentence frequency paradigm, where sentence frequency is the number of sentences in the document that contain a given term. The resulting sentence vectors are scored by their similarity to the query, and the highest-scoring sentences are picked to be part of the summary; the summarization is therefore query-specific.
The hypothesis assumed by this approach is that if there are more "specific words" in a given sentence, then the sentence is relatively more important. The target words are usually nouns. The method compares the term frequency (tf) in a document (in this case each sentence is treated as a document) with the document frequency (df), which is the number of documents in which the word occurs. The TF-IDF score of a word w is calculated as follows (the standard formulation):

    tfidf(w) = tf(w) × log(N / df(w))

where N is the total number of documents (here, sentences).
Example:
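A minimal tf-isf scoring sketch (an illustrative reconstruction, not the paper's own code): each sentence is scored by the summed tf × log(N / sf) weights of its words, with sf the sentence frequency defined above.

    import math
    from collections import Counter

    def tf_isf_scores(sentences):
        # sentences: list of tokenized sentences (lists of words).
        n = len(sentences)
        sf = Counter()  # sentence frequency of each word
        for sent in sentences:
            sf.update(set(sent))
        scores = []
        for sent in sentences:
            tf = Counter(sent)
            scores.append(sum(tf[w] * math.log(n / sf[w]) for w in tf))
        return scores

    print(tf_isf_scores([["cat", "sat"], ["cat", "ran"], ["dog", "barked"]]))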
7.3 Graph Theoretic Approach : In this technique, there is a node for every sentence. Two sentences are connected with an edge if they share some common words; in other words, if their similarity is above some threshold. This representation yields two results. First, the partitions contained in the graph (those sub-graphs that are unconnected to the other sub-graphs) form the distinct topics covered in the documents. The second result of the graph-theoretic method is the identification of the important sentences in the document: nodes with high cardinality (the number of edges connected to a node) are the important sentences of their partition, and hence are given higher preference for inclusion in the summary.
The figure shows an example graph for a document. It can be seen that there are about 3-4 topics in the document; the encircled nodes can be seen to be informative sentences, since they share information with many other sentences in the document. The graph-theoretic method may also be adapted easily for visualization of inter-document and intra-document similarity.
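A minimal sketch of this technique (all names and the word-overlap similarity are illustrative assumptions): build the sentence graph, then rank nodes by degree.

    def build_graph(sentences, threshold=0.2):
        # sentences: list of sets of words; returns an adjacency map by index.
        graph = {i: set() for i in range(len(sentences))}
        for i in range(len(sentences)):
            for j in range(i + 1, len(sentences)):
                overlap = len(sentences[i] & sentences[j])
                sim = overlap / max(1, min(len(sentences[i]), len(sentences[j])))
                if sim > threshold:  # edge if similarity is above the threshold
                    graph[i].add(j)
                    graph[j].add(i)
        return graph

    def rank_by_degree(graph):
        # Nodes with high cardinality are the candidate summary sentences.
        return sorted(graph, key=lambda i: len(graph[i]), reverse=True)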
7.4 Machine Learning approach :
In this method, a training dataset is used for reference, and the summarization process is modeled as a classification problem: sentences are classified as summary sentences or non-summary sentences based on the features that they possess. The classification probabilities are learnt statistically from the training data using Bayes' rule:

    P(s ∈ S | F1, F2, ..., FN) = P(F1, F2, ..., FN | s ∈ S) · P(s ∈ S) / P(F1, F2, ..., FN)

where s is a sentence from the document collection, F1, F2, ..., FN are the features used in classification, S is the summary to be generated, and P(s ∈ S | F1, F2, ..., FN) is the probability that sentence s will be chosen to form the summary given that it possesses features F1, F2, ..., FN.
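A minimal sketch of this classifier under a naive independence assumption across features (binary features; all probability values below are assumed, standing in for statistics learnt from training data):

    def posterior_summary(features, stats):
        # Bayes' rule with naive feature independence, normalized over both classes.
        score = {}
        for label in ("summary", "non_summary"):
            p = stats["prior"][label]
            for i, f in enumerate(features):
                likelihood = stats["likelihood"][label][i]  # P(Fi = 1 | label)
                p *= likelihood if f else (1.0 - likelihood)
            score[label] = p
        return score["summary"] / (score["summary"] + score["non_summary"])

    stats = {
        "prior": {"summary": 0.2, "non_summary": 0.8},  # assumed values
        "likelihood": {"summary": [0.7, 0.6], "non_summary": [0.2, 0.3]},
    }
    print(posterior_summary([1, 1], stats))  # ~0.64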
7.6 Neural Network based Approach : In this approach, each sentence is represented by a vector of features, such as f5 (sentence length).
The first phase of the process involves training the neural networks to learn the types of sentences
that should be included in the summary. Once the network has learned the features that must exist in
summary sentences, we need to discover the trends and relationships among the features that are inherent in
the majority of sentences. This is accomplished by the feature fusion phase, which consists of two steps: 1)
eliminating uncommon features; and 2) collapsing the effects of common features.
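A rough sketch of the two feature-fusion steps (everything here, including the thresholds and binning, is an assumption for illustration, not the paper's exact procedure):

    def eliminate_uncommon(feature_columns, min_nonzero_fraction=0.1):
        # 1) Drop features that occur in too few training sentences.
        kept = {}
        for name, values in feature_columns.items():
            nonzero = sum(1 for v in values if v != 0)
            if nonzero / len(values) >= min_nonzero_fraction:
                kept[name] = values
        return kept

    def collapse_effects(values, step=0.25):
        # 2) Collapse nearby feature values into common levels (coarse binning).
        return [round(v / step) * step for v in values]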
8. Summary Evaluation : Summary evaluation is a very important aspect of text summarization. Generally, summaries can be evaluated using intrinsic or extrinsic measures: intrinsic methods attempt to measure summary quality using human evaluation, while extrinsic methods measure it through a task-based performance measure, such as an information-retrieval task.
Evaluation methods are useful for assessing the usefulness and trustworthiness of a summary. Evaluating qualities like comprehensibility, coherence, and readability is genuinely difficult. System evaluation may be performed manually by experts against a gold standard. Qualitative evaluation is done by counting the number of sentences selected by the system that match the human gold standard. For quantitative assessment of the summary, the ROUGE evaluation tool is used, which reports precision, recall, and F-measure.
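A minimal ROUGE-1 sketch (unigram overlap; an illustration, not the official ROUGE toolkit), computing precision, recall, and F-measure of a system summary against one human reference:

    from collections import Counter

    def rouge_1(system_tokens, reference_tokens):
        sys_counts = Counter(system_tokens)
        ref_counts = Counter(reference_tokens)
        overlap = sum(min(sys_counts[w], ref_counts[w]) for w in sys_counts)
        precision = overlap / max(1, len(system_tokens))
        recall = overlap / max(1, len(reference_tokens))
        f = 2 * precision * recall / (precision + recall) if overlap else 0.0
        return precision, recall, f

    print(rouge_1("the cat sat".split(), "the cat sat down".split()))
    # (1.0, 0.75, 0.857...)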
9.Conclusion:
Automatic text summarization is an old challenge, but current research is moving towards emerging areas such as biomedicine, product reviews, education, emails, and blogs, because of the information overload in these areas, especially on the World Wide Web. Automated summarization is an important area of NLP (Natural Language Processing) research; it consists of automatically creating a summary of one or more texts. The purpose of extractive document summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document. Text summarization approaches based on neural networks, graph theory, fuzzy logic, and clustering have, to an extent, succeeded in producing effective summaries of documents. Both extractive and abstractive methods have been researched; most summarization techniques are based on extractive methods. Abstractive methods produce summaries closer to those made by humans, but abstractive summarization currently requires heavy machinery for language generation and is difficult to adapt to domain-specific areas.