Literature Study On Multi-Document Text Summarization Techniques
All content following this page was uploaded by Chintan Shah on 21 November 2016.
1 Introduction
People widely use Internet search engines such as Google, Yahoo, and Bing to retrieve
information. Because the amount of material on the Internet is growing rapidly, it is
not easy for users to find relevant and appropriate information. When a user submits a
query to a search engine, the response is usually thousands of documents, and the user
faces the tedious task of finding the appropriate information in this sea of results.
This problem is called “data overloading” [1]. Automatic text summarization produces a
shorter version of a source text that retains the main features of the content and helps
the user quickly understand a large volume of information. A number of authors have
proposed techniques for automatic text summarization, which can be broadly classified
as extractive or abstractive. Extractive summarization selects the sentences with the
highest weight in the retrieved document and puts them together to generate a summary
without altering the original text, whereas abstractive summarization converts the
original text into another semantic form with the help of linguistic methods to obtain
a shorter summary of the original document [2].
The primary goal of multi-document summarization is to build a summary that has
maximum coverage, minimal redundancy, and maximum cohesiveness between sentences [2].
In other words, the main sentences are extracted from each document and then
rearranged to form a multi-document summary. The multi-document summarization flow is
shown in Fig. 1.
This survey paper covers the following aspects:
1. Several Graph, Cluster, Term-Frequency, and Latent Semantic Analysis
approaches to multi-document summarization
2. Issues and problems identified by different researchers that leave room for
improvement in this area
3. Evaluation criteria for comparing automatic and human summaries
Section II of this paper describes related work on multi-document text summarization
using Graph, Cluster, Term-Frequency, and Latent Semantic Analysis methods. Section III
presents an analysis and comparison of all methods along with the scope for
improvement, Section IV contains the evaluation criteria, and Section V contains the
conclusion.
2 Related work
Salton [13] (1989) proposed the term frequency-inverse document frequency (TF-IDF)
model, in which the score of a term in a document is the ratio of the number of
occurrences of the term in that document to the number of documents that contain the
term. The importance of a term i is given by the product $TF_i \times IDF_i$, where
$TF_i$ is the frequency of term i in the document and $IDF_i$ is the inverse document
frequency of term i. Sentences can therefore be scored, for example, by computing the
relevance of the terms they contain.
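The TF-IDF weighting and sentence scoring described above can be sketched as follows. This is a minimal illustration, not Salton's exact formulation: the logarithmic IDF is the common textbook variant and is an assumption here (the description above states a plain ratio), and the names `tokenize`, `tf_idf`, and `score_sentence` are hypothetical helpers introduced only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: TF * IDF} dict per document.

    TF is the raw count of the term in the document. IDF is log(N / df),
    where N is the number of documents and df is the number of documents
    containing the term (assumed variant, see the lead-in).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return weights


def score_sentence(sentence, term_weights):
    """Average TF-IDF weight of the sentence's tokens (0.0 for an empty sentence)."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(term_weights.get(t, 0.0) for t in tokens) / len(tokens)
```

Averaging rather than summing the term weights is a design choice that keeps long sentences from being favored purely for their length; a summed score would be an equally valid reading of the text.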
Jun'ichi Fukumoto [14] (2004) proposed a technique for multi-document summarization
that uses a simple strategy to build abstracts by TF-IDF based extraction. Summaries of
the individual documents are generated, and these summaries are then used to generate
the multi-document summary. The proposed system automatically categorizes a document
into one of three categories using information about high-frequency nouns and named
entities: single-topic, multi-topic, and others. To summarize, sentences are first
extracted from each document based on TF-IDF, sentence position, and sentence weight.
In the next step, unnecessary parts of sentences are discarded. The extracted sentences
are then sorted in their original order in the document to generate a summary of each
single document. In the next stage, all extracted sentences are grouped into clusters
and repeated clauses are removed. The remaining clauses are sorted to generate the
final summary.
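The two-stage flow described above (per-document extraction, then merging with removal of repeats) can be sketched roughly as below. This is a simplified illustration under stated assumptions, not Fukumoto's system: the document-type classification and the trimming of sentence parts are omitted, the scoring formula and the default values of `k`, `position_bonus`, and `threshold` are invented for the example, and a Jaccard word-overlap test stands in for the clustering step. All function names are hypothetical.

```python
import re


def words(sentence):
    """Set of lowercase word tokens in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def summarize_document(sentences, term_weights, k=2, position_bonus=0.5):
    """Pick the k best sentences of one document, returned in original order.

    Score = sum of the weights of the distinct terms in the sentence plus a
    bonus that decays with sentence position (illustrative assumption).
    """
    scored = []
    for index, sentence in enumerate(sentences):
        term_score = sum(term_weights.get(w, 0.0) for w in words(sentence))
        scored.append((term_score + position_bonus / (index + 1), index))
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    return [sentences[i] for i in sorted(index for _, index in top)]


def jaccard(a, b):
    """Word-set overlap between two sentences, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_summaries(summaries, threshold=0.6):
    """Pool per-document summaries, dropping near-duplicate sentences.

    A sentence is kept only if its overlap with every sentence kept so far
    is below the threshold (a stand-in for cluster-based deduplication).
    """
    kept = []
    for summary in summaries:
        for sentence in summary:
            if all(jaccard(words(sentence), words(s)) < threshold for s in kept):
                kept.append(sentence)
    return kept
```

In use, `term_weights` would come from a TF-IDF computation over the document set, `summarize_document` would be called once per document, and the resulting lists would be passed together to `merge_summaries`.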
4 Evaluation Measures
5 Conclusion
This literature survey covers various methods for multi-document text summarization.
Several techniques have been explored, including Graph-based, Cluster-based,
Term-Frequency-based, and Latent Semantic Analysis (LSA) based approaches. Researchers
can focus on specific approaches among the existing techniques and improve them to
produce new or hybrid approaches that build better summaries with less effort. In this
paper we have compared Graph, Cluster, Term-Frequency, and LSA methods. New or hybrid
approaches can be developed with the help of natural language processing and linguistic
approaches, which can help generate better multi-document summaries.
6 References
1. M.-Y. Kan and J. L. Klavans, "Using librarian techniques in automatic text summarization
for information retrieval," in Proceedings of the 2nd ACM/IEEE-CS Joint Conference on
Digital Libraries, pp. 36-45, ACM, 2002.
2. Y. K. Meena, A. Jain and D. Gopalani, "Survey on Graph and Cluster Based approaches in
Multi-document Text Summarization," Recent Advances and Innovations in Engineering
(ICRAIE), 2014, Jaipur, 2014, pp. 1-5. doi: 10.1109/ICRAIE.2014.6909126
3. M. Haque, S. Pervin, Z. Begum, et al., "Literature review of automatic multiple documents
text summarization," International Journal of Innovation and Applied Studies, vol. 3, no.
1, pp. 121-129, 2013.
4. R. Mihalcea and P. Tarau, "TextRank: Bringing order into texts," in Proceedings of
EMNLP, vol. 4, Barcelona, Spain, 2004.
5. J. Zhang, L. Sun, and Q. Zhou, "A cue-based hub-authority approach for multi-document
text summarization," in Natural Language Processing and Knowledge Engineering, 2005.
IEEE NLP-KE'05. Proceedings of 2005 IEEE International Conference on, pp. 642-645,
IEEE, 2005.
6. S. Hariharan and R. Srinivasan, "Studies on graph based approaches for single and
multi-document summarizations," Int. J. Comput. Theory Eng., vol. 1, pp. 1793-8201, 2009.
7. K. S. Thakkar, R. V. Dharaskar, and M. Chandak, "Graph-based algorithms for text
summarization," in Emerging Trends in Engineering and Technology (ICETET), 2010 3rd
International Conference on, pp. 516-519, IEEE, 2010.
8. S. S. Ge, Z. Zhang, and H. He, "Weighted graph model based sentence clustering and
ranking for document summarization," in Interaction Sciences (ICIS), 2011 4th
International Conference on, pp. 90-95, IEEE, 2011.
9. T.-A. Nguyen-Hoang, K. Nguyen, and Q.-V. Tran, "Tsgvi: a graph-based summarization
system for Vietnamese documents," Journal of Ambient Intelligence and Humanized
Computing, vol. 3, no. 4, pp. 305-313, 2012.
10. J. D. Schlesinger, D. P. O'Leary, and J. M. Conroy, "Arabic/English multi-document
summarization with CLASSY: the past and the future," in Computational Linguistics and
Intelligent Text Processing, pp. 568-581, Springer, 2008.
11. X.-C. Ma, G.-B. Yu, and L. Ma, "Multi-document summarization using clustering
algorithm," in Intelligent Systems and Applications, 2009. ISA 2009. International Workshop
on, pp. 1-4, IEEE, 2009.
12. V. K. Gupta and T. J. Siddiqui, "Multi-document summarization using sentence clustering,"
in Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on,
pp. 1-5, IEEE, 2012.
13. G. Salton, "Automatic Text Processing: the transformation, analysis, and retrieval of
information by computer," Addison-Wesley Publishing Company, USA, 1989.
14. Jun'ichi Fukumoto, "Multi-Document Summarization Using Document Set Type
Classification," Proceedings of NTCIR-4, Tokyo, pp. 412-416, 2004.
15. S. Xiong and Y. Luo, "A New Approach for Multi-document Summarization Based on La-
tent Semantic Analysis," Computational Intelligence and Design (ISCID), 2014 Seventh
International Symposium on, Hangzhou, 2014, pp. 177-180.
16. J. Steinberger and K. Jezek, “Using latent semantic analysis in text summarization and
summary evaluation,” in Proc. ISIM ’04, 2004, pp. 93–100.
17. E. Lloret and M. Palomar, "Text summarization in progress: a literature review," Artificial
Intelligence Review, vol. 37, no. 1, pp. 1-41, 2012.
18. D. Das and A. F. Martins, "A survey on automatic text summarization," Literature Survey
for the Language and Statistics II course at CMU, vol. 4, pp. 192-195, 2007.