
2015 Fifth International Conference on Advanced Computing & Communication Technologies

Optimal Features Set For Extractive Automatic Text Summarization

Yogesh Kumar Meena, Peeyush Deolia, Dinesh Gopalani
Department of Computer Science & Engineering
Malaviya National Institute of Technology
Jaipur, India
[email protected], [email protected], [email protected]

Abstract— The goal of text summarization is to reduce the size of a text while preserving its important information and overall meaning. With the availability of the internet, data is growing by leaps and bounds, and it is practically impossible to summarize all of it manually. Automatic summarization can be classified as extractive or abstractive. Abstractive summarization requires understanding the meaning of the text and then creating a shorter version that best expresses that meaning, while extractive summarization selects the sentences of the given text that contain the maximum information and fuses them into a summary. In this paper we test all possible combinations of seven features and report the best combination for each document. We analyze the results for 10 documents taken from the DUC 2002 dataset using ROUGE evaluation metrics.

Keywords—Extractive; Abstractive; Text Summarization; Term Frequency; Feature.

I. INTRODUCTION

As the WWW has become popular, the amount of data has been increasing beyond limits, and it is not possible for humans to summarize this huge amount of data, even though we often need to obtain information without wasting much time. Hence an automatic process, i.e. Automatic Text Summarization, is needed to solve this problem. The purpose of Automatic Text Summarization is to generate a summary from a single document or a collection of documents using a machine. It should express the whole content in a minimum number of words without losing information. A good summarization system should keep the summary size to a minimum whilst keeping all the relevant information of the document. Such tools can alternatively be used for topic identification or keyword extraction as well. An example is Google Chrome's summarizer, which finds long comments on Reddit and summarizes them online.

There are broadly two categories of text summarization: abstractive and extractive. Abstractive text summarization means understanding the meaning of the text and then generating a summary from it. It mainly requires natural language generation, which is itself a growing field, so programs that can create an abstractive summary are harder to develop than extractive ones. However, if a tool that can create abstractive summaries is built, it will perform much better than extractive tools. In extractive summarization, on the other hand, the goal is to select the words, sentences, or paragraphs that carry the maximum information about the topics discussed in the document. For this we need to assign each sentence (in case we are selecting sentences) a weight, and for that we need features on whose basis we can score the sentences. Plenty of features have been proposed to date; we discuss the important ones below.

Extractive text summarization has three main steps: preprocessing, sentence ranking, and summary generation. In the preprocessing step the document is represented in a structured way; for example, we can break the document into sentences if we want to work on it with sentence-scoring features. Next comes stop-word removal, in which we remove all the insignificant words from the sentences to be scored. The final preprocessing step is stemming, in which we stem the words of the sentences to find their root forms. In the next step, sentence ranking, we calculate the weight of each sentence according to the features used; the features can be applied individually or in combination. Finally, from these scored sentences we select those with the maximum scores to include in the summary.

In extractive summarization, sentences are extracted as they are, so long sentences can be a hurdle in creating short summaries, while abstractive summarization can take care of sentence length. Sometimes no individual sentence carries good enough information, and in this case abstractive summarization is much better than extractive summarization. The sentences of extractive summaries are also not well linked to each other, so the summary is less readable. The major problem with abstractive techniques is that they need a technology which today's machines are not capable of: even now we do not have the technology to generate natural language, and hence it is nearly impossible to create a pure abstractive summarizer at present.

The rest of the paper is organized as follows. In Section II we discuss related work in the area of extractive text summarization. In Section III we discuss various features proposed by researchers. In Section IV we discuss the proposed feature combinations. Section V discusses results and performance evaluation. In Section VI we conclude the paper.
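To make the three-step flow described above concrete, the following is a minimal sketch of such a pipeline in Python. It assumes NLTK is available for sentence splitting, stop words, and stemming; the equal per-feature weighting mirrors the setup used later in the paper, but the function names and structure are our own illustration, not the authors' code.

```python
# Minimal extractive-summarization pipeline sketch (illustrative only).
# Assumes: pip install nltk, plus nltk.download('punkt') and nltk.download('stopwords').
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

STEMMER = PorterStemmer()
STOP = set(stopwords.words('english'))

def preprocess(document):
    """Step 1: split into sentences, drop stop words, stem the rest."""
    sentences = nltk.sent_tokenize(document)
    tokens = [
        [STEMMER.stem(w.lower()) for w in nltk.word_tokenize(s)
         if w.isalnum() and w.lower() not in STOP]
        for s in sentences
    ]
    return sentences, tokens

def rank(sentences, tokens, features):
    """Step 2: score each sentence as the (equally weighted) sum of features."""
    return [sum(f(i, tokens) for f in features) for i in range(len(sentences))]

def summarize(document, features, n=3):
    """Step 3: keep the n top-scoring sentences, in original order."""
    sentences, tokens = preprocess(document)
    scores = rank(sentences, tokens, features)
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:n])
    return ' '.join(sentences[i] for i in top)
```

Each feature here is simply a callable mapping a sentence index (plus the tokenized document) to a score, so feature sets can be swapped in and out, which is exactly what Section IV exploits.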
II. RELATED WORK

In this section we review the work done so far in the area of extractive automatic text summarization. The use of computers for the summarization of documents was first suggested in the 1950s: Luhn [1] proposed an extractive method using a keyword frequency feature, and his paper is the basis of all the related work done to date. In 1969 Edmundson [4] suggested some new methods, namely cue words, title words and sentence location. Kupiec et al [5] in 1995 suggested further features, namely sentence length cut-off, fixed phrase, the paragraph feature, the thematic word feature and the uppercase word feature, and another paper in the same year suggested using a signature word feature [3]. Rush et al [17] in 1971 used a sentence rejection methodology based on different sets of rules, which was later used in 1975 for the generation of chemical abstracts [24]. In 1997 Lin and Hovy [18] showed experimentally that the position method cannot produce efficient summaries for all domains. In 2001 MEAD [19] was developed, which used features such as TF/IDF, sentence location, cue words and longest common subsequences for sentence scoring. Nobata et al 2001 [23] used sentence location, sentence length, TF/IDF values of words, headline similarity and query terms to score significant sentences, and compared their method with TF-based and lead-based methods. Varma et al 2005 [21] used features such as sentence position, the presence of a verb in the sentence, referencing pronouns, sentence length, term frequency, word length, part-of-speech tags, word familiarity, named entity tags, occurrence as a heading or subheading, and font style to score the sentences. Fattah and Ren 2009 [6] proposed trainable models (genetic algorithms, mathematical regression, feed-forward neural networks, probabilistic neural networks and Gaussian mixture models) for text summarization, trained on features such as sentence position, positive keywords, negative keywords, sentence centrality, sentence resemblance to the title, named entities in the sentence, numerical data, relative sentence length, busy paths of sentences and aggregate similarity. Prasad and Kulkarni 2010 [25] used word similarity among paragraphs, word similarity among sentences, iterative query score, format-based score, numerical data, cue words, term frequency, thematic features and title similarity as features to score the sentences, and also applied some evolutionary approaches for summary production. Abuobieda et al 2012 [20] used the title feature, sentence length, sentence position, numerical data and thematic words as features for scoring sentences, and used genetic algorithms for the final design of the feature space. Ferreira et al [26] deeply analyzed all the sentence scoring features using ROUGE [29] evaluation metrics. Mendoza et al 2014 [22] used sentence position, title similarity, sentence length, cohesion and coverage as the features of their objective function, with optimization techniques and evolutionary algorithms for the final summary generation. All the above researchers scored sentences on the basis of some features, and most of them gave equal weight to all the features included in sentence scoring. Ferreira et al [27] in 2014 tried combinations of different word-, sentence- and graph-level algorithms for sentence scoring. In 2014 Meena and Gopalani [28] reviewed 22 features from available sources and analyzed the results of different feature combinations. We discuss these features in detail in the next section.

III. FEATURES PROPOSED BY RESEARCHERS

In this section we discuss the various features proposed by researchers to score sentences; word-level, sentence-level and paragraph-level features have all been proposed.

A. Term Level Features

1. Term Frequency [1]: According to this feature, the most frequent words in the document, after stemming and stop-word removal, are the most important words. Sentences are scored according to the presence of these keywords in them. This feature has been used widely in many approaches; in TF-IDF [2] term frequency is simply an alternative name for word frequency, and term frequency is also used to calculate signature words in [3].

2. Cue Words [4]: The cue method is based on the hypothesis that sentences containing pragmatic words such as "significant", "impossible" or "hardly" are more likely to be included in the summary. This method uses a pre-stored dictionary of cue words discovered from manual summaries. The list also contains negative keywords which have negative relevance to the summary; sentences containing those words are given negative weights.

3. Title Similarity [4]: The title method suggests that the sentences containing words from the title are the most important. It is based on the hypothesis that an author conceives the title as circumscribing the subject matter of the document.

4. Uppercase Word Feature [5]: Uppercase words are often more important than other words; they are typically acronyms or proper nouns and hence convey important information, so the sentences containing uppercase words are given a higher score.

5. Positive Keyword [6]: Keywords that frequently occur in summaries should be given higher weight, similar to cue words.

6. Negative Keyword [6]: Keywords that rarely or never occur in summaries should be given negative weight, and sentences containing them should be excluded from the final summary.

7. Residual IDF [7]: Instead of simple IDF, RIDF can be a better criterion for selecting sentences. RIDF is the residual IDF of a term in a given document, i.e. the difference between its observed IDF and the IDF expected under the assumption that the term follows an independence-based distribution such as the Poisson model.

8. Gain [8]: Luhn stated in his first paper that words of medium frequency are the most important, and Gain is a feature that uses this hypothesis. Under simple IDF, words that occur very scarcely in the corpus get very high scores even though their importance is low. Gain overcomes this weakness of IDF by introducing a formula that treats the optimal gain associated with the feature as the word's importance.

9. Term Co-occurrence [9]: This method is somewhat similar to the keyword frequency feature, but instead of assigning weights to particular terms we locate clusters of important words within sentences and assign weights accordingly. According to the paper, in a document of 25-30 sentences a reasonable co-occurrence count for establishing a term as important is 7 sentences.

10. Query Score [9]: It is believed that the more query terms are present in a sentence, the more important the sentence is. Scores are computed such that sentences containing more terms from the query get higher scores.

11. Word Co-occurrence [10]: The same word appearing in different text units. There are two variations of this feature, one without stemming and one after applying stemming. In documents where it is less probable that a single word is used with different meanings, it is better to use stemming.

12. Matching Noun Phrases [10]: Tools can be used to identify simplex noun phrases and match the ones with the same head.

13. WordNet Synonyms [10]: Synonyms are matched with the help of the synsets provided by WordNet. The use of synonyms increases the similarity between sentences that contain the same words expressed as synonyms.

14. Word Significance Score [11]: This feature calculates the relative significance of words in a given document. It is somewhat similar to the TF-IDF feature.

15. Title Similarity Revised [11]: In this method the sentence score is calculated as the square of the number of terms common to the sentence and the title of the document.
B. Sentence Level Features

1. Sentence Location [4]: This hypothesis suggests that the most important sentences of a document occur very early or very late in the text. Weights are assigned according to the position of the sentence in the text.

2. Semantic Structure [12]: This feature suggests that the sentence related to the maximum number of other sentences is the most important one. The sentences are considered as nodes of a graph, and related sentences (those that have common terms) have edges between them. The sentence with the maximum number of edges is the most important sentence, and so on.

3. Sentence Length Cut-off [5]: Too short and too long sentences are not included in the summary, because too short sentences are not significant and too long sentences increase the length of the summary unnecessarily. A threshold can be fixed (e.g. not fewer than 5 words and not more than 40 words).

4. Fixed Phrase [5]: Sentences that contain certain fixed phrases are given priority. These phrases are also known as indicator phrases and are usually two words long (e.g. "in conclusion", "my opinion").

5. Concept Signature [2]: A concept signature uses the word co-occurrence feature to extract topic words together with a list of associated (keyword, weight) pairs. It is based on the hypothesis that when a concept is important in a document, a set of words will co-occur fairly predictably. This feature is also used in IR systems for query expansion.

6. Concept Count [13]: This feature suggests counting occurrences of concepts rather than of individual verbs and nouns: verbs and nouns that carry the same meaning are folded into a single concept, and this concept count is used along with the raw verb and noun counts. WordNet synonyms, hypernyms and hyponyms can be used for this purpose.

7. Maximum Marginal Relevance (MMR) [14]: This feature is used in query-based summarizers. A summarized document should contain minimum redundancy, and MMR helps attain this goal. In query-based summarizers the sentences are ranked by maximum similarity to the query; MMR instead scores by a linear combination of relevance and novelty, e.g. score(s) = λ·Sim1(s, query) − (1 − λ)·max over already selected s′ of Sim2(s, s′). A sentence is therefore important if it is both similar to the query and dissimilar to the sentences already selected.

8. PageRank [15]: This is based on one of the most popular ranking algorithms used for web link analysis. The edge weights of the sentence graph are calculated using the cosine similarity between sentences, and the PageRank algorithm is applied to this graph, which in turn yields a ranking of the most important sentences. The algorithm starts by assigning arbitrary values to each node and iterates until convergence below a given threshold is achieved.
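The following is a minimal sketch of this graph-based ranking idea in the spirit of [15], using raw token-overlap cosine similarity and plain power iteration. The damping factor 0.85 and the convergence threshold are conventional choices, not values given in the paper.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token lists via term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def pagerank_scores(tokens, d=0.85, tol=1e-6):
    """Rank sentences by PageRank over a cosine-similarity graph."""
    n = len(tokens)
    w = [[cosine(tokens[i], tokens[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) or 1.0 for row in w]   # total outgoing edge weight per node
    r = [1.0 / n] * n                      # arbitrary starting values
    while True:
        new = [(1 - d) / n +
               d * sum(r[j] * w[j][i] / out[j] for j in range(n))
               for i in range(n)]
        if max(abs(x - y) for x, y in zip(new, r)) < tol:
            return new                     # converged below threshold
        r = new
```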
C. Paragraph Level Features

1. Paragraph Feature [5]: This feature records information for the initial ten paragraphs and the last five paragraphs. Sentences in these paragraphs are scored according to their position in the paragraph: paragraph-initial, paragraph-final or paragraph-medial.

2. Optimal Position Policy [2]: This feature is somewhat similar to the sentence location feature, but more systematic. After evaluating 13,000 newspaper articles, the authors found a sequence of the most important sentence positions; the title is the most important sentence and is most likely to bear the main topic.

3. Paragraph Position of Sentences [16]: Each sentence is given a weight according to its position in the paragraph: an initial sentence is given value 1, an intermediate sentence value 2, and a final sentence value 3.

D. Corpus Based Features

1. Signature Word [3]: Signature words are special words that are more important across the corpus. A large corpus with many documents from different publications is used to identify them: the frequency of occurrence of each individual word, averaged across the different publications, is calculated, and this score is used to determine the signature words. These words are helpful because, at sentence-scoring time, we can give more weight to the sentences that contain signature words.

2. Baseline Term Probability [16]: With the help of baseline documents we try to find how important a term is for the summary. If the term is more frequent in the baseline document set, it gets a higher probability. Given frequencies b_j equal to the number of times term j occurs in a collection B of "baseline" documents, we compute the baseline term probability for each sentence i of a document.

3. Document Term Probability [16]: This feature calculates the likelihood of a term within the document.

4. Lead Words [13]: In multi-document summarization we can identify the main topics that appear in the lead sentences; these topics must cover all the important information in the documents. In the paper the authors developed a list of 4600-4900 lead words using two large corpora. These lead words are used as binary values, and the sentences richest in lead words get the highest scores.

5. Verb Specificity [13]: The association of subject nouns with verbs is computed on the basis of a large corpus study; a collection of one year of newswire documents was used as the corpus. The verb specificity score reflects how often the mutual information between a particular verb and one noun or another exceeds a threshold value.

All these features are used to score the sentences, and the top-ranked sentences are selected as the summary sentences.
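As a rough illustration of the corpus-based idea behind signature words [3] (item D.1 above), the sketch below averages relative word frequencies across publications. The shape of the corpus input and the top_k cut-off are our own assumptions, not details from the original paper.

```python
from collections import Counter

def signature_words(corpus_by_publication, top_k=100):
    """Average each word's relative frequency across publications and
    keep the top_k words as corpus-level signature words [3]."""
    totals = Counter()
    for docs in corpus_by_publication:           # one list of token lists per publication
        counts = Counter(w for doc in docs for w in doc)
        n = sum(counts.values()) or 1
        for w, c in counts.items():
            totals[w] += c / n                   # relative frequency in this publication
    k = len(corpus_by_publication)
    avg = {w: v / k for w, v in totals.items()}  # average across publications
    return {w for w, _ in sorted(avg.items(), key=lambda x: -x[1])[:top_k]}
```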
IV. PROPOSED WORK

In our proposed work we have tried all combinations of the features available to us. We used six important features for the initial work: TF-IDF (F1), Word Co-occurrence (F2), Sentence Centrality (F3), Sentence Location (F4), Named Entity (F5) and Proper Noun (F6). These features have already been described in the previous section. Together they generate 63 combinations, as given in Table I.

TABLE I. NUMBER OF COMBINATIONS

S. No.   Combination Size   Total
1.       1                  6
2.       2                  15
3.       3                  20
4.       4                  15
5.       5                  6
6.       6                  1

All the combinations of the proposed six features are given in Table II. These combinations are then used in scoring the sentences; each feature, if included in the scoring, is given a weight value of 1.
V. RESULTS AND ANALYSIS

We used 10 documents from the DUC 2002 dataset to evaluate our algorithm. For the assessment of results we used the ROUGE-1 metric: ROUGE-1 checks, for each individual word of the system-generated summary, whether it is present in the gold summary. Initially we ran our experiments with 10 features, but we found that only six of them gave the top results; these six were then selected to give the 63 combinations. Table III shows the precision value of the top feature set for each document. For document 1, word co-occurrence has the highest precision value. Word co-occurrence and named entity give the best result for documents 2 and 6. TF-IDF gives the best precision value for document 3. For document 4, sentence centrality gives the highest precision. The named entity feature gives the highest value for document 5. For document 7, word co-occurrence and sentence location give the best results. The combination of sentence centrality and proper noun gives the best result for document 8. Word co-occurrence, sentence centrality, sentence location and named entity give the highest value for document 9. For document 10, named entity and sentence location give the best precision value. It is clear that a single feature is not sufficient to give the best result for all documents; different combinations are required to obtain optimal results.

Similarly, Table IV shows the recall value of each document with its best feature set. For documents 1 and 9, sentence location has the best recall value. For documents 2 and 5, TF-IDF gives the best result. For documents 3 and 7, named entity gives the highest recall. Sentence centrality gives the highest value for document 4. Word co-occurrence and named entity give the best results for document 6. Word co-occurrence and sentence location are best for document 8. For document 10, word co-occurrence, sentence location and named entity give the best results.

The results for F-measure are given in Table V; in this case too, different combinations give the best results. For documents 1 and 9, sentence location gives the best result. TF-IDF gives the best result for document 2. Word co-occurrence and sentence location give the best result for document 3. Sentence centrality and named entity give the best result for document 4. TF-IDF and sentence centrality are best for document 5. For document 6, word co-occurrence and named entity are best. Named entity has the highest value for document 7. For document 8, sentence centrality and proper noun give the best results. Word co-occurrence, sentence location and named entity give the best result for document 10.

TABLE II. INDIVIDUAL COMBINATIONS

S. No.   F1  F2  F3  F4  F5  F6
 1.       0   0   0   0   0   1
 2.       0   0   0   0   1   0
 3.       0   0   0   0   1   1
 4.       0   0   0   1   0   0
 5.       0   0   0   1   0   1
 6.       0   0   0   1   1   0
 7.       0   0   0   1   1   1
 8.       0   0   1   0   0   0
 9.       0   0   1   0   0   1
10.       0   0   1   0   1   0
11.       0   0   1   0   1   1
12.       0   0   1   1   0   0
13.       0   0   1   1   0   1
14.       0   0   1   1   1   0
15.       0   0   1   1   1   1
16.       0   1   0   0   0   0
17.       0   1   0   0   0   1
18.       0   1   0   0   1   0
19.       0   1   0   0   1   1
20.       0   1   0   1   0   0
21.       0   1   0   1   0   1
22.       0   1   0   1   1   0
23.       0   1   0   1   1   1
24.       0   1   1   0   0   0
25.       0   1   1   0   0   1
26.       0   1   1   0   1   0
27.       0   1   1   0   1   1
28.       0   1   1   1   0   0
29.       0   1   1   1   0   1
30.       0   1   1   1   1   0
31.       0   1   1   1   1   1
32.       1   0   0   0   0   0
33.       1   0   0   0   0   1
34.       1   0   0   0   1   0
35.       1   0   0   0   1   1
36.       1   0   0   1   0   0
37.       1   0   0   1   0   1
38.       1   0   0   1   1   0
39.       1   0   0   1   1   1
40.       1   0   1   0   0   0
41.       1   0   1   0   0   1
42.       1   0   1   0   1   0
43.       1   0   1   0   1   1
44.       1   0   1   1   0   0
45.       1   0   1   1   0   1
46.       1   0   1   1   1   0
47.       1   0   1   1   1   1
48.       1   1   0   0   0   0
49.       1   1   0   0   0   1
50.       1   1   0   0   1   0
51.       1   1   0   0   1   1
52.       1   1   0   1   0   0
53.       1   1   0   1   0   1
54.       1   1   0   1   1   0
55.       1   1   0   1   1   1
56.       1   1   1   0   0   0
57.       1   1   1   0   0   1
58.       1   1   1   0   1   0
59.       1   1   1   0   1   1
60.       1   1   1   1   0   0
61.       1   1   1   1   0   1
62.       1   1   1   1   1   0
63.       1   1   1   1   1   1

TABLE III. RESULTS WITH BEST SET FOR PRECISION

Document   Feature Set     Precision Value
1.         F2              0.51
2.         F2+F5           0.40
3.         F1              0.42
4.         F3              0.42
5.         F5              0.49
6.         F2+F5           0.65
7.         F2+F4           0.40
8.         F3+F6           0.47
9.         F2+F3+F4+F5     0.30
10.        F4+F5           0.64

TABLE IV. RESULTS WITH BEST SET FOR RECALL

Document   Feature Set     Recall Value
1.         F4              0.56
2.         F1              0.42
3.         F5              0.41
4.         F3              0.43
5.         F1              0.46
6.         F2+F5           0.71
7.         F5              0.51
8.         F2+F4           0.50
9.         F4              0.32
10.        F2+F4+F5        0.72

TABLE V. RESULTS WITH BEST SET FOR F-MEASURE

Document   Feature Set     F-Measure Value
1.         F4              0.53
2.         F1              0.41
3.         F2+F4           0.34
4.         F3+F5           0.42
5.         F1+F3           0.45
6.         F2+F5           0.68
7.         F5              0.43
8.         F3+F6           0.46
9.         F4              0.30
10.        F2+F4+F5        0.67
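For reference, here is a bare-bones sketch of the ROUGE-1 computation as described above, counting clipped unigram overlap between a system summary and a gold summary. The actual evaluation used the official ROUGE package [29], so this is only illustrative.

```python
from collections import Counter

def rouge1(system_tokens, gold_tokens):
    """Unigram-overlap ROUGE-1: returns (precision, recall, F-measure)."""
    overlap = sum((Counter(system_tokens) & Counter(gold_tokens)).values())
    p = overlap / len(system_tokens) if system_tokens else 0.0
    r = overlap / len(gold_tokens) if gold_tokens else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```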

VI. CONCLUSION

This paper suggests that it is not easy to produce an efficient extractive summary of a text using a single fixed feature combination. We tried all combinations of several features and found that a specific combination can give higher efficiency, but only on one or a few documents. On news documents in particular our algorithm performs consistently across all documents, as it takes advantage of the sentence location feature. TF/ISF, named entity and proper noun are good indicators for including sentences in the summary. The proposed feature set may be extended with semantic features and more levels of filtering.

REFERENCES

[1] H. P. Luhn, "The automatic creation of literature abstracts," IBM Journal of Research and Development, vol. 2, no. 2, pp. 159-165, Apr. 1958.
[2] E. Hovy and C.-Y. Lin, "Automated text summarization and the SUMMARIST system," in Proc. TIPSTER '98 Workshop, Baltimore, MD, USA, 1998, pp. 197-214.
[3] R. Brandow, K. Mitze, and L. F. Rau, "Automatic condensation of electronic publications by sentence selection," Information Processing & Management, vol. 31, no. 5, pp. 675-685, Sep. 1995.
[4] H. P. Edmundson, "New methods in automatic extracting," Journal of the ACM, vol. 16, no. 2, pp. 264-285, Apr. 1969.
[5] J. Kupiec, J. Pedersen, and F. Chen, "A trainable document summarizer," in Proc. SIGIR '95, New York, NY, USA, 1995, pp. 68-73.
[6] M. A. Fattah and F. Ren, "GA, MR, FFNN, PNN and GMM based models for automatic text summarization," Computer Speech and Language, vol. 23, no. 1, pp. 126-144, 2009.
[7] K. Church and W. A. Gale, "Inverse document frequency (IDF): A measure of deviations from Poisson," in Proc. Third Workshop on Very Large Corpora, 1995, pp. 121-130.
[8] K. Papineni, "Why inverse document frequency?" in Proc. NAACL '01, Stroudsburg, PA, USA, 2001, pp. 1-8.
[9] A. Tombros and M. Sanderson, "Advantages of query biased summaries in information retrieval," in Proc. SIGIR '98, New York, NY, USA, 1998, pp. 2-10.
[10] K. R. McKeown, J. L. Klavans, V. Hatzivassiloglou, R. Barzilay, and E. Eskin, "Towards multidocument summarization by reformulation: Progress and prospects," in Proc. AAAI '99/IAAI '99, Menlo Park, CA, USA, 1999, pp. 453-460.
[11] C. Hori, S. Furui, R. Malkin, H. Yu, and A. Waibel, "Automatic summarization of English broadcast news speech," in Proc. HLT '02, San Francisco, CA, USA, 2002, pp. 241-246.
[12] E. F. Skorochod'ko, "Adaptive method of automatic abstracting and indexing," in Proc. IFIP Congress (2) 1971, pp. 1179-1182.
[13] B. Schiffman, A. Nenkova, and K. McKeown, "Experiments in multidocument summarization," in Proc. HLT '02, San Francisco, CA, USA, 2002, pp. 52-58.
[14] J. Carbonell and J. Goldstein, "The use of MMR, diversity-based reranking for reordering documents and producing summaries," in Proc. SIGIR '98, New York, NY, USA, 1998, pp. 335-336.
[15] R. Mihalcea and P. Tarau, "A language independent algorithm for single and multiple document summarization," in Proc. IJCNLP, 2005.
[16] J. M. Conroy and D. P. O'Leary, "Text summarization via hidden Markov models," in Proc. SIGIR '01, New York, NY, USA, 2001, pp. 406-407.
[17] J. E. Rush, R. Salvador, and A. Zamora, "Automatic abstracting and indexing. II. Production of indicative abstracts by application of contextual inference and syntactic coherence criteria," Journal of the American Society for Information Science, vol. 22, no. 4, pp. 260-274, 1971.
[18] C.-Y. Lin and E. Hovy, "Identifying topics by position," in Proc. Fifth Conference on Applied Natural Language Processing (ANLC '97), Stroudsburg, PA, USA, 1997, pp. 283-290.
[19] D. Radev, S. Blair-Goldensohn, and Z. Zhang, "Experiments in single and multi-document summarization using MEAD," in Proc. First Document Understanding Conference, New Orleans, LA, USA, 2001.
[20] A. Abuobieda, N. Salim, A. Albaham, A. Osman, and Y. Kumar, "Text summarization features selection method using pseudo genetic-based model," in Proc. International Conference on Information Retrieval & Knowledge Management (CAMP 2012), Mar. 2012, pp. 193-197.
[21] J. Jagadeesh, P. Pingali, and V. Varma, "Sentence extraction based single document summarization," in Workshop on Document Summarization, 2005.
[22] M. Mendoza, S. Bonilla, C. Noguera, C. Cobos, and E. Leon, "Extractive single-document summarization based on genetic operators and guided local search," Expert Systems with Applications, vol. 41, no. 9, pp. 4158-4169, Jul. 2014.
[23] C. Nobata, S. Sekine, M. Murata, K. Uchimoto, M. Utiyama, and H. Isahara, "Sentence extraction system assembling multiple evidence," in Proc. Second NTCIR Workshop Meeting, 2001, pp. 5-213.
[24] J. J. Pollock and A. Zamora, "Automatic abstracting research at Chemical Abstracts Service," Journal of Chemical Information and Computer Sciences, vol. 15, no. 4, pp. 226-232, Nov. 1975.
[25] P. R. Shardanand and U. Kulkarni, "Implementation and evaluation of evolutionary connectionist approaches to automated text summarization," Journal of Computer Science, pp. 1366-1376, Feb. 2010.
[26] R. Ferreira, L. de Souza Cabral, R. D. Lins, G. Pereira e Silva, F. Freitas, G. D. C. Cavalcanti, R. Lima, S. J. Simske, and L. Favaro, "Assessing sentence scoring techniques for extractive text summarization," Expert Systems with Applications, vol. 40, no. 14, pp. 5755-5764, Oct. 2013.
[27] R. Ferreira, F. Freitas, L. de Souza Cabral, R. Dueire Lins, R. Lima, G. Franca, S. J. Simske, and L. Favaro, "A context based text summarization system," in Proc. 11th IAPR International Workshop on Document Analysis Systems (DAS 2014), Apr. 2014, pp. 66-70.
[28] Y. K. Meena and D. Gopalani, "Analysis of sentence scoring methods for extractive automatic text summarization," in Proc. International Conference on Information and Communication Technology for Competitive Strategies (ICTCS 2014), Udaipur, India, Nov. 2014.
[29] C.-Y. Lin, "ROUGE: A package for automatic evaluation of summaries," in Proc. Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain, Jul. 2004.
