An Extractive Approach For English Text
www.ijsar.in
IJSAR, 6(5), 2019; 20-30
Designing a fuzzy logic system mainly involves selecting the fuzzy rules and the membership functions. The choice of fuzzy rules and membership functions directly affects the performance of the fuzzy logic system.
The fuzzy logic system consists of four components: fuzzifier, inference engine, defuzzifier, and the fuzzy knowledge base. In the fuzzifier, crisp inputs are translated into linguistic values using the membership functions of the input linguistic variables. After fuzzification, the inference engine refers to the rule base containing fuzzy IF-THEN rules to derive the output linguistic values. In the last step, the output linguistic variables from the inference are converted to final crisp values by the defuzzifier using the output membership function, which represents the final sentence score.
In order to implement text summarization based on fuzzy logic, features such as sentence length, term weight, sentence position, sentence-to-sentence similarity, title words, etc. are first used as input to the fuzzifier. Triangular membership functions are used, and fuzzy logic is applied to summarize the document.
The input membership function for each feature is divided into five fuzzy sets: unimportant values (very low (VL) and low (L)), a median value (M), and important values (high (H) and very high (VH)).
In the inference engine, the most important part of the procedure is the definition of the fuzzy IF-THEN rules. The important sentences are extracted using these rules according to the feature criteria. A sample IF-THEN rule reads as follows:
IF (NoWordInTitle is VH) and (SentenceLength is H) and (TermFreq is VH) and (SentencePosition is H) and (SentenceSimilarity is VH) and (NoProperNoun is H) and (NoThematicWord is VH) and (NumericalData is H) THEN (Sentence is important)
Likewise, the last step in the fuzzy logic system is defuzzification. The output membership function, which is divided into three membership functions (Unimportant, Average, and Important), is used to convert the fuzzy results from the inference engine into a crisp output, the final score of each sentence.
In the fuzzy logic method, each sentence of the document is thus represented by a sentence score. All document sentences are then ranked in descending order of their scores, and a set of the highest scoring sentences is extracted as the document summary according to the compression rate. It has been shown that extracting 20 percent of the sentences of a source document can be as informative as the full text of the document. Finally, the summary sentences are arranged in their original order.
Figure 3. Fuzzy Inference Engine
Clustering Techniques [6,10]
Different approaches to clustering data can be described with the help of the hierarchy shown in Figure 4 (other taxonometric representations of clustering methodology are possible; ours is based on the discussion in Jain and Dubes [1988]). At the top level, there is a distinction between hierarchical and partitional approaches: hierarchical methods produce a nested series of partitions, while partitional methods produce only one.
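To make the hierarchical/partitional distinction concrete, the following minimal Python sketch (the data points and stopping criterion are invented for illustration) implements single-linkage agglomerative clustering and records the nested series of partitions it produces; cutting that series at any level yields a single flat partition, which is what a partitional method would return directly.

```python
# A minimal single-linkage agglomerative clustering on 1-D values.
# It records the full nested series of partitions (hierarchical view);
# the last entry alone corresponds to a single flat partition.

def single_linkage(points, target_clusters):
    # Start with each point in its own (singleton) cluster.
    clusters = [[p] for p in points]
    history = [[list(c) for c in clusters]]  # nested series of partitions
    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest single-linkage
        # distance (closest pair of points, one from each cluster).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
        history.append([list(c) for c in clusters])
    return history

partitions = single_linkage([1.0, 1.2, 5.0, 5.1, 9.0], target_clusters=2)
print(partitions[-1])  # final flat partition into two clusters
```

A divisive method would traverse the same hierarchy in the opposite direction, starting from one all-inclusive cluster and splitting until the stopping criterion is met.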
The taxonomy shown in Figure 4 must be supplemented by a discussion of cross-cutting issues that may (in principle) affect all of the different approaches regardless of their placement in the taxonomy.
Agglomerative vs. divisive: This aspect relates to algorithmic structure and operation. An agglomerative approach begins with each pattern in a distinct (singleton) cluster and successively merges clusters together until a stopping criterion is satisfied. A divisive method begins with all patterns in a single cluster and performs splitting until a stopping criterion is met.
Monothetic vs. polythetic: This aspect relates to the sequential or simultaneous use of features in the clustering process. Most algorithms are polythetic; that is, all features enter into the computation of distances between patterns, and decisions are based on those distances. A simple monothetic algorithm reported in Anderberg [1973] considers features sequentially to divide the given collection of patterns.
Figure 4. Clustering Techniques
Hard vs. fuzzy: A hard clustering algorithm allocates each pattern to a single cluster during its operation and in its output. A fuzzy clustering method assigns degrees of membership in several clusters to each input pattern. A fuzzy clustering can be converted to a hard clustering by assigning each pattern to the cluster with the largest measure of membership.
Deterministic vs. stochastic: This issue is most relevant to partitional approaches designed to optimize a squared-error function. This optimization can be accomplished using traditional techniques or through a random search of the state space consisting of all possible labellings.
Incremental vs. non-incremental: This issue arises when the pattern set to be clustered is large, and constraints on execution time or memory space affect the architecture of the algorithm. The early history of clustering methodology does not contain many examples of clustering algorithms designed to work with large data sets, but the advent of data mining has fostered the development of clustering algorithms that minimize the number of scans through the pattern set, reduce the number of patterns examined during execution, or reduce the size of the data structures used in the algorithm's operations.
Related works in Indian languages
Research on extractive summarization in Indian languages is not as advanced as for other languages such as English, German, and Spanish. This is mainly due to the diversity of the Indian languages and the lack of resources such as raw data and NLP tools. This section describes extractive summarization work in Indian languages such as Malayalam, Hindi, and Bengali.
Malayalam Text Summarization [3]
Krishnaprasad P, Sooryanarayanan A and Ajeesh Ramanujan use an extractive approach to summarize text in the Malayalam language. They generate the summary of a given document by recombining the important sentences extracted from the text. To identify the important sentences, they follow the content word method: content words are identified from the frequency distribution of the words in the document, excluding stop words. The proposed system comprises two components, the text analyzing component and the summary generation component.
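The content word method described above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the toy stop list and whitespace tokenizer are assumptions, and a real Malayalam system would need its own tokenizer and stop word resources.

```python
# Hedged sketch of content-word sentence scoring: words that survive the
# stop list are "content words", and a sentence's score is the summed
# document frequency of its content words.
from collections import Counter

STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}  # toy stop list

def content_word_frequencies(sentences):
    # Frequency distribution over all non-stop words in the document.
    words = [w for s in sentences for w in s.lower().split()
             if w not in STOP_WORDS]
    return Counter(words)

def score_sentence(sentence, freq):
    # A sentence's score is the summed frequency of its content words.
    return sum(freq[w] for w in sentence.lower().split()
               if w not in STOP_WORDS)

sentences = [
    "summarization shortens a document",
    "extractive summarization selects sentences from the document",
    "the weather is pleasant",
]
freq = content_word_frequencies(sentences)
scores = [score_sentence(s, freq) for s in sentences]
print(scores)  # the off-topic third sentence scores lowest
```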
The text analyzing component is used to identify the features associated with the sentences and, based on those features, to assign a score to each sentence. The main tasks involved are sentence marking, feature extraction, and sentence ranking. The summary generation component uses the sentence scores to generate the summary and involves two main tasks, sentence selection and summary generation.
After sentence ranking, the next task is sentence selection. In this phase, the top N scored sentences may be used to generate the summary, but this can harm the coherence of the summary. Therefore, after the sentences are selected, they are recombined in the chronological order in which they appear in the original input text to obtain a readable summary.
The proposed system for Malayalam provides a fast method to generate the summary. For each news article, four summaries were generated at condensation rates of 10, 15, 20, and 25 percent, and the generated summaries were evaluated against reference summaries using the standard ROUGE metric. The performance of the system may be improved by adding a stemming process, improving the sentence splitting criteria, and adding more features.
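The selection step above, keeping the top-scoring fraction of sentences at a given condensation rate and then restoring chronological order for readability, can be sketched as follows (the sentences and scores are invented examples, not data from the paper):

```python
# Hedged sketch of sentence selection at a given condensation rate:
# rank sentences by score, keep the top fraction, and re-emit the kept
# sentences in their original (chronological) order.
import math

def summarize(sentences, scores, condensation_rate):
    n_keep = max(1, math.ceil(len(sentences) * condensation_rate))
    # Rank sentence indices by score, descending.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i],
                    reverse=True)
    keep = sorted(ranked[:n_keep])  # restore original document order
    return [sentences[i] for i in keep]

sentences = ["s1", "s2", "s3", "s4", "s5"]
scores = [0.2, 0.9, 0.1, 0.8, 0.5]
print(summarize(sentences, scores, condensation_rate=0.4))
```

Emitting the kept sentences in document order rather than score order is what preserves coherence in the final summary.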
Figure 6. Proposed Architecture
Phase 2:
In this phase, a frequency is assigned to each word. The frequency depends on how many times that particular term occurs in the document, and a standard formula is used for this purpose.
Phase 3:
In this phase, the sentences are ranked according to the frequencies of their words; that is, each sentence is assigned a frequency-based score, and the sentences are then ranked in descending order of score. This ranking becomes the input to the next phase.
Phase 4:
To select sentences from the ranking obtained in Phase 3 for summary generation, a threshold value is set. Depending on this threshold value, the high-scoring sentences are used to generate the summary.
Conclusion
The main aim of this research work is to combine the two approaches of query-dependent summarization and clustering of documents. The proposed work will be mainly focused on summarization of text files (i.e., .txt). The proposed work will be limited to clustering of text files; standard files related to topics popular amongst researchers will be used.
Standard performance evaluation metrics will be used to validate performance.
Acknowledgment
We would like to thank Mr. Amit Kolhe, Managing Trustee of Sanjivani College of Engineering, Kopargaon, India, and the Principal of Sanjivani College of Engineering, Kopargaon, India, for providing the resources needed to carry out the proposed work.
References
1. Elena Lloret, “Text Summarization: An Overview”.
2. Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut, “Text Summarization Techniques: A Brief Survey”, arXiv, USA, July 2017.
3. Krishnaprasad P, Sooryanarayanan A, Ajeesh Ramanujan, “Malayalam Text Summarization: An Extractive Approach”, IEEE International Conference on Next Generation Intelligent Systems (ICNGIS), 2016.
4. Dragomir R Radev, Eduard Hovy, and Kathleen McKeown, “Introduction to the Special Issue on Summarization”, Computational Linguistics, 28(4), 2002, 399–408.
5. Sunitha C., A. Jaya, Amal Ganesh, “A Study on Abstractive Text Summarization Techniques in Indian Languages”, Elsevier, Fourth International Conference on Recent Trends in Computer Science and Engineering, 2016.
6. Yogesh Kumar Meena, Dinesh Gopalani, “Feature Priority Based Sentence Filtering Method for Extractive Automatic Text Summarization”, Elsevier, International Conference on Intelligent Computing, Communication & Convergence (ICCC), 2015.
7. Sabina Yeasmin, Priyanka Basak Tumpa, Adiba Mahjabin Nitu, Md. Palash Uddin, Emran Ali, Masud Ibn Afjal, “Study of Abstractive Text Summarization Techniques”, American Journal of Engineering Research, 2017, Volume 6, Issue 8, pp. 253–260.
8. Nikita Desai, Prachi Shah, “Automatic Text Summarization Using Supervised Machine Learning Technique for Hindi Language”, International Journal of Research in Engineering and Technology, 2016, Vol. 5, Issue 6.
9. Vishal Gupta, Gurpreet Singh Lehal, “A Survey of Text Summarization Extractive Techniques”, Journal of Emerging Technologies in Web Intelligence, August 2010, Vol. 2, No. 3.
10. Sheetal Shimpikar, Sharvari Govilkar, “A Survey of Text Summarization Techniques for Indian Regional Languages”, International Journal of Computer Applications, Volume 165, No. 11, May 2017.
11. Yogesh Kumar Meena, Dinesh Gopalani, “Domain Independent Framework for Automatic Text Summarization”, Elsevier, International Conference on Intelligent Computing, Communication & Convergence (ICCC), 2015.
12. Jimmy Lin, “Summarization”, Encyclopedia of Database Systems, Heidelberg, Germany: Springer-Verlag, 2009.
13. Jackie CK Cheung, “Comparing Abstractive and Extractive Summarization of Evaluative Text: Controversiality and Content Selection”, B.Sc. (Hons.) Thesis, Department of Computer Science, Faculty of Science, University of British Columbia, 2008.
14. Soumye Singhal, Arnab Bhattacharya, “Abstractive Text Summarization”.
15. Rene Arnulfo Garcia-Hernandez and Yulia Ledeneva, “Word Sequence Models for Single Text Summarization”, IEEE, 44–48, 2009.
16. Hans Christian, Mikhael Pramodana Agus, Derwin Suhartono, “Single Document Automatic Text Summarization Using
Term Frequency-Inverse Document Frequency (tf-idf)”, ComTech, Vol. 7, No. 4, December 2016, 285–294.
17. Akash Ajampura Natesh, Somaiah Thimmaiah Balekuttira, Annapurna P Patil, “Graph Based Approach for Automatic Text Summarization”, International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5, Special Issue 2, October 2016.
18. Khushboo S. Thakkar, R. V. Dharaskar, M. B. Chandak, “Graph-Based Algorithms for Text Summarization”, IEEE, Third International Conference on Emerging Trends in Engineering and Technology, 2010.
19. Rasim Alguliev, Ramiz Aliguliyev, “Evolutionary Algorithm for Extractive Text Summarization”, Scientific Research, Intelligent Information Management, November 2009, 1, 128–138.
20. S. A. Babar, S. A. Thorat, “Improving Text Summarization using Fuzzy Logic & Latent Semantic Analysis”, International Journal of Innovative Research in Advanced Engineering (IJIRAE), Volume 1, Issue 4, May 2014.
21. L. Suanmali, N. Salim and M. S. Binwahlan, “Fuzzy Logic Based Method for Improving Text Summarization”, International Journal of Computer Science and Information Security, 2009, Vol. 2, No. 1, pp. 4–10.
22. S. A. Babar, Pallavi D. Patil, “Improving Performance of Text Summarization”, Elsevier, International Conference on Information and Communication Technologies (ICICT), 2014.