TOPSIS With Multiple Linear Regression For Multi-Document Text Summarization

See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/320518186
TOPSIS with Multiple Linear Regression for Multi-Document Text

Summarization
Research in Iraqi Journal of Science · July 2017

DOI: 10.24996/ijs.2017.58.3A.14
CITATIONS READS
0 145
2 authors, including:
Zuhair Ali
Al-Mustansiriya University
8 PUBLICATIONS 3 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
multi document summarization View project
classification View project
All content following this page was uploaded by Zuhair Ali on 20 October 2017.
The user has requested enhancement of the downloaded file.

Malallah and Ali Iraqi Journal of Science, 2017, Vol. 58, No.3A, pp: 1298-1307
DOI: 10.24996/ijs.2017.58.3A.14
ISSN: 0067-2904
TOPSIS with Multiple Linear Regression for Multi-Document Text

Summarization
Suhad Malallah1, Zuhair Hussein Ali*2
1
Computer Science Department, University of Technology, Baghdad, Iraq.
2
Computer Science Department, College of Education, Al- Mustansiriya University, Baghdad, Iraq.
Abstract
The huge amount of information in the internet makes rapid need of text
summarization. Text summarization is the process of selecting important sentences
from documents with keeping the main idea of the original documents. This paper
proposes a method depends on Technique for Order of Preference by Similarity to
Ideal Solution (TOPSIS). The first step in our model is based on extracting seven
features for each sentence in the documents set. Multiple Linear Regression (MLR)
is then used to assign a weight for the selected features. Then TOPSIS method
applied to rank the sentences. The sentences with high scores will be selected to be
included in the generated summary. The proposed model is evaluated using dataset
supplied by the Text Analysis Conference (TAC-2011) for English documents. The
performance of the proposed model is evaluated using Recall-Oriented Understudy
for Gisting Evaluation (ROUGE) metric. The obtained results support the
effectiveness of the proposed model.
Keywords: weight feature, Muliple Linear Regression, TOPSIS,PIS,NIS.
‫ مع االنحدار الخطي المتعدد لتلخيص النصوص المتعدده‬Topsis
‫ زىير حسين علي‬، ‫سياد مال اهلل‬

.‫ العراق‬،‫ بغداد‬،‫ الجامعو التكنموجيو‬،‫قسم عموم الحاسبات‬
.‫ العراق‬،‫ بغداد‬،‫ الجامعو المستنصريو‬،‫ كمية التربيو‬،‫قسم عموم الحاسبات‬
‫الخالصو‬
‫بالنظر لمكميات الكبيرة الموجودة من المعمومات في االنترنت ادى الى الحاجة الضرورية لتمخيص‬
‫ أن عممية تمخيص المعمومات تتضمن أستخراج الجمل الميمو من النصوص مع المحافظة عمى‬.‫المعمومات‬
‫ ىذا البحث يقترح طريقة تعتمد عمى تقنية ترتيب االفضمية عن طريق‬.‫االفكار الرئيسية لمنصوص الممخصو‬
‫أستخراج سبعة‬ ‫ الخطوة االولى في موديمنا المقترح تعتمد عمى‬.(TOPSIS(‫التشابو الى الحل المثالي‬
‫ بعدىا تم طريقة أستخدام االنحدار الخطي المتعدد‬.‫خصائص لكل جممة من جمل النصوص المراد تمخيصيا‬
.‫لغرض تعيين أوزان لمخصائص المختارة‬
‫ يتم أختيار الجمل ذات الدرجة االعمى لغرض تضمينيا‬.‫ لغرض ترتيب الجمل‬TOPSIS ‫ثم تطبق طريقة‬
‫ أختبرت النتائج باستخدام‬.‫ ) لمغة االنكميزية‬TAC-2011 ( ‫تم أستخدام قاعدة بيانات‬.‫ضمن الممخص المتكون‬
.‫أثبتت النتائج كفاءه النظام المقترح‬ ROUGE ‫برنامج‬
_____________________________
*Email: [email protected]
1298
DOI: 10.24996/ijs.2017.58.3A.14
1. Introduction
According to the fast development of information-communication technologies, enormous quantity
of documents have been created and put together in the World Wide Web. The huge amount of
documents makes it difficult for the user to get useful information [1]. To deal with such problem of
information overload, Automatic Text Summarization (ATS) has been used as a solution. ATS is the
process of generating a single document summary from a set of documents or from a single document
without losing its main ideas [2]. This process helps users to the general review of all related
documents and interested issues with understanding the main content of the summarized documents;
this process also helps to reduce the time needed to get these briefs. Rely on the amount of document
to be summarized ATS can be classified as a Single Document summarization (SDS) or Multi
Document summarization (MDS). In SDS only one document can be summarized into shorter one,
whereas in MDS a set of related documents with same topic is summarized into one shorter summary
[3]. Summarization methods, also, can be classified as abstractive summarization and extractive
summarization. Aabstractive summarization depends on Natural Language Processing (NLP)
strategies, which request deep understanding of NLP techniques to analyze the documents sentences
and paragraphs, since some changes have to be done to the selected sentences. Whereas in the
extractive summarization, no change is applied to the sentences which are selected to be included in
the final summary[4]. Thus abstractive summarization seems to be more difficult and time-consuming
than extractive summarization [5]. Also summarization can be categorized as query summarization
and generic summarization. In the query based summarization a summary was generated according to
the user query, where the documents searched to match with the user query [6]. While generic
summarization creates a summary which include the main content of the documents. One of the most
challenges for the generic summarization is that no topic or query available for the summarization
process [7].
2. Related Works
ATS reduces a large number of text documents to a smaller set of sentences which explain the main
ideas of these documents. Specialists in NLP are more interested to discover new methods for
summarizing and exploring a variety of models to come up with perfect summarization. In this section
we investigate some of these methods [8].
In [9] the authors suggested a method for calculating the weights of the selected features. Five
different features were used, the first two are structural features in which consist of more than simple
features, while the remaining three features are simple features. These five selected features are used
as input parameters to the particle swarm optimization (PSO) used to train these features and assign a
weight to each one of them. Their results showed that structural features got average weight higher
than simple features. In [10] the authors suggested a method based on selecting five features. These
features are: sentence position, sentence length, numerical information, thematic words and title
feature. The pseudo genetic algorithm was used to train the dataset and assign a weight to each feature.
Their results showed that the importance of these features are in the following order title feature,
sentence position, thematic words, sentence length and numerical information. In [11], a set of features
were extracted for each setence ; this set was used as input to a model consist of three functions:
Cellular Learning Automata (CLA), PSO, and fuzzy logic. The CLA was used to calculate the
similarity between sentences to reduce the redundancy. While the PSO was used to set a weight for
each feature, then the fuzzy logic was used to give scores to the sentences, these scored sentences were
arranged in descending order, and the sentence with higher score was selected to be included in the
created summary. In [12], the authors proposed a method based on formulating the problem of MDS
as a multi-objective optimization (MOO). Two main objective functions were formulated these are
redundancy reduction and content coverage. The redundancy reduction was computed using cosine
similarity between each sentence in the dataset, whereas the content coverage was computed using the
cosine similarity between each sentence with the mean of document collection. Evolutionary
Algorithm was used to combine these two objective functions with the aim to minimize the first
objective and maximize the second objective function. Good results are obtained from their method.
The fundamental objective of document summarization is the extraction of suitable and pertinent
sentences from the input document(s). A technique to acquire the significant sentences is through
assigning a weight for each sentence which indicates the salience of a sentence for selection to the
summary and then selecting the top ones [13]. In this paper a method for extracting generic MDS for
9911
DOI: 10.24996/ijs.2017.58.3A.14
English text is proposed which depends on extracting seven features for each sentence in the
documents, then a mathematical model is used for assigning a weight for each feature. The
mathematical model is based on Multiple Linear Regression (MLR). The weights of the selected
features are used as input to the TOPSIS algorithm. The TOPSIS uses both: the selected features and
their calculated weights to rank the sentences. We have utilized Text Analysis Conference (TAC-
2011) dataset to assess the summarized results.
3. Problem Statement and Formulation
To produce a good summary for any MDS system two issues must be considered. These issues are
1-Relevancey: can be defined as the goodness of information included in the created summary. A
summary considered as relevant if it includes many information relevant to the main topic of the
documents.
2-Redundancy: The generated summary should include less redundant information to cover most of the
relevant topics.
Formally, given a corpus which consists of many clusters, each cluster contains a set of documents
called D with the same topic. The set D can be defined as D= {d1, d2,…, dn} where n is the number of
distinct document in D. Each D can be represented by a set of sentences called Si, i.e D= {Si |
1<=i<=M}where M represents the total number of sentences in the set D.
Our goal is to find a subset of set D called A i.e. A ⊂ D that satisfies both objectives: relevancy
maximization and redundancy reduction.
4. Basic Concepts
There are two main stages: Preprocessing and feature extraction.
4.1 Preprocessing
There are four steps in this stage.
A- Sentence segmentation: which can be done by splitting sentences according to the dot between
them.
B- Tokenization: Is the process of splitting sentence into words
C- Stop Words Removal: Words which don't give the necessary information for identifying
significant meaning of the document content and appear frequently are removed. There are a
variety of methods used for specifying such stop words list. . Presently, a number of English stop
word list is usually used to help text summarization process
D- Stemming: is the process of producing root of the word, in This paper word stemming is
performed using Porter’s stemming algorithm [14].
4.2 Features Extraction
An essential part of ATS is computing features score for every sentence. The features include:
sentence position, sentence length, numerical data, thematic word, title word, proper noun and centroid
value [15].
A- Sentence Position (SP): higher score will be given to the first sentence; the score decreases
according to the sentence position in the document. This feature can be computed according to Eq.
(1).
N  i 1
F1( si )  (1)
N
Where i is the position of the sentence ( s) in a document of N sentences
B- Sentence length (SL): This feature is computed by dividing the sentence length by the length of the
longest sentence in the document as in Eq. (2).
L( s i )
F 2( si )  (2)
Lmax
Where L(si) is the length of sentence si and Lmax is the length of longest sentence in the document.
C- Numerical data (ND): has important information to be included in the summary. This feature is
calculated by dividing the number of numerical data in the sentence by the sentence length as in Eq.
(3).
Num( si )
F 3( si )  (3)
L( s i )
9011
DOI: 10.24996/ijs.2017.58.3A.14
Where Num(si) is the number of numerical data in the sentence (si)

D- Thematic Words (TW): terms that appear most frequently in the document. This feature can be
calculated by computing the repetition of all terms in the document, then only (K) terms with the
highest repetition is selected, in this work, This feature is calculated by dividing the number of thematic
words in the sentence by the maximum number of thematic words in the document as expressed in Eq.
(4).
TW ( si )
F 4( si )  (4)
TWmax
TW is the number of thematic words in the sentence.
TWmax maximum thematic words in the sentences.
E-Title Feature (TF): This feature is important when summarizing the document. The score is
calculated as in Eq. (5).
( ) (5)
( )
Where TF is the words that exists in both: Si and Title
F-Proper Noun (PN): The sentence is important if it includes the maximum number of proper nouns
[16]. This feature is calculated as in Eq. (6)
( )
( ) ( )
( )
Where PN is the number of proper nouns in a sentence si.
G-Centroid value (CV): Is a feature used to specify salient sentences in the multiple documents [17].
This feature can be calculated as follows
L ( Si )
F 7( S i )  C
i 1
w (7)
Cw=TF *IDF (8)

[ ] ( )
Where n total number of documents.

nw number of documents containing the given word.
F7 is the centroid value of the sentence Si
Cw is the centroid value of the word.
TF is the term frequency which represents the frequency of a given term in the document.
IDF is the inverse term frequency computed by division of the total number of documents over the
number of documents including the given term.
The division excluded from all the seven features to produce non normalized features that used
directly in the TOPSIS algorithms.
5. The proposed method
There are three main stages in the proposed method. The first stage includes how to compute the
weights for the extracted features. In the second stage the computed weights of stage one are used as
input to TOPSIS algorithm. The TOPSIS algorithm is used to rank all sentences, then higher score
sentences will be chosen to be included in the final summary. Third stage includes removing
redundancy. Figure-1 shows the block diagram of the proposed method.
9019
DOI: 10.24996/ijs.2017.58.3A.14
Figure1- Block Diagram for Proposed MDS.
5.1 Multiple Linear Regression

MLR is a statistical method for formulating the relationship between the independent variables
and a dependent variable, where there are two or more independent variables, but only one dependent
variable [18]. MLR can be formulated as in Eq (10)
Y= W0+W1 X1+W2 X2+---+Xm Wm (10)
Where
[Y] is the output vector (dependent variable).
[W] feature weight vector.
[X] The extracted features (independent variables).
The regression model can be represented in a matrix form as follows.
 
Y1  W1 
Y   X X X X X X X W2 
  
 2   1,1 1, 2 1,3 1, 4 1,5 1,6 1, 7  W3 
Y3   X 2,1 X 2, 2 X 2,3 X 2, 4 X 2,5 X 2, 6 X 2,7   
   W
  4
  .. 
.   X X X X X X X  W5 
   p ,1 p,2 p ,3 p,4 p ,5 p ,6 p ,7 
Y p  W6 
W7 
 
9019
DOI: 10.24996/ijs.2017.58.3A.14
Where p is the number of sentences from the collected document data set. To estimate the weights
for the extracted features we must train our model. There are 70 documents from the TAC-2011
dataset used for the training mode. The seven extracted features (X1, X2,…, X7) that were described in
section 5 are used as input to the model. The desired output Y can be computed using cosine
similarity between all sentences from selected trained documents and each sentence from the manually
summarized documents. As in Eq (11)
AB i i
Similarity ( A, B)  i 1
(11)
z z
(A )
i 1
i
2
*  (B )
i 1
i
2
Where A sentence from trained documents

B sentence from manually summarized documents.
Each sentence (A ) will be compared with all sentences (B) and higher score value assign to Yi. The
score of the equation (11) range from (0) to (1), zero score means there is no matching between A and
B whereas one means A identical to B. Thus we have values of (Xi) and (Yi) for equation (10).
Our goal is to estimate the values of (Wi) which represent the weights of the selected features
[19]. W will be calculated using Eq.(12) .
W=(X.Xt)-1.XtY (12)
5.2 TOPSIS Method

TOPSIS is a decision making method that developed by Hwang and Yoon in 1981[20], in 1993. It
was more developed by Hwang, Lai and Liu[21]. TOPSIS is an efficient method for Multi-Attribute
Decision Making, that is used for ranking, evaluating and selecting the most suitable alternative
among various alternatives. TOPSIS depends on choosing an alternative that has nearest distance to
the Positive Ideal Solution (PIS) and farthest distance from Negative Ideal Solution [22]. It can help
in choosing features that assist to decide which alternatives are the most appropriate for the problems.
The weights and scores are the two essential evaluation parameters used to make a decision [23].
In our proposed method the scores and weights of the features are computed mathematically (as
described in section 5 and section 6.1) to be used in ranking the sentences. This makes TOPSIS
suitable for MDS where there are many criteria (features) and we have to choose the most suitable
decision.
5.3 TOPSIS as Summary Generation
In this paper TOPSIS will be employed in MDS to propose a mathematical model for ranking
sentences and choose the most suitable ones. To create a decision matrix for TOPSIS seven features
and M sentences are used as shown in Table- 1.
Table 1 - Decision Matrix for TOPSIS method
F1 F2 F3 F4 F5 F6 F7
S1 X1,1 X1,2 X1,3 . . . X1,7
S2 X2,1 X2,2 X2,3 . . X2,6 X2,7
. . . . . . . .
SM XM,1 XM,2 XM,3 . . . XM,7
Xi,j is the feature value where, i=1,..,M and j=1,..,7.

Our goal is to get the best sentences that are nearest to the PIS and far from NIS. Each attribute in the
decision matrix is arranged either in increasing order or decreasing order. In the proposed method, the
attributes are arranged only in decreasing order from the highest to lowest one to get sentences with
the highest score [24]. Algorithm (1) describes the main steps of TOPSIS technique
9010
DOI: 10.24996/ijs.2017.58.3A.14
TOPSIS Algorithm
Step1 :input decision matrix {section 5.3}
Output sentences in descending order
Step2: Normalized a decision matrix by
Step3: Use MLR to construct the feature weights vector Wj

Step 4: Construct the Weighted Normalized Decision Matrix by multiplying each
column by Wj to get Vi,j
Step5: Determine the ideal solution for each column A*{The highest value in the
column Vj+}
Step6: Determine the Negative solution for each column A-{The lowest value in the
column Vj-}
Step7: Determined the PIS
Step8:Determine the NIS
Step9 :calculate closness to ideal solution

Step10:rank all sentences according to the results of step9
By algorithm 1- all the sentences arranged in descending order depending on their score.
5.4 Remove Redundancy

This stage is very necessary for MDS. There are many documents with the same topic, some
sentences may be repeated in more than one document. A technique is required to remove the
redundant sentences from the generated summary, which allows the final summary to include the most
important ideas for the summarized documents [15]. The cosine similarity as explained in Eq. (11) is
used to compute the similarity between two sentences and exclude the sentence from a final summary
when the similarity between them is more than a specified threshold. The following algorithm (2)
illustrates reducing redundancy and generating summary in the proposed MDS model.
9011
DOI: 10.24996/ijs.2017.58.3A.14
Summary generating algorithm
input 1- set of ranked sentences in descending order from topsis algorithm called
scored_sent
2- Max summary size called Max_size
output generated summary called summary

Step1: let summary =[]
Size=0
No_of_sen=0
Step2 : from Scored_sent select Si with highest score
Step3 :Flag=false
for j from 1 to No_of_sen
compare Si with Sj{Sj sentence from summary} according to Eq.(11)
If (Similarity(Si,Sj) >threshold) then flag=true
Step4:if (flag) delete Si go to step2
Else Put Si in the summary
Size=size+count_words (Si)
NO_of_sen=No_of_sen+1
Step5: if size<max_size goto step2
Else end
6. Dataset and Evaluation Metrics

The dataset used in our experiment is TAC-2011 which consists of a document set written in seven
languages (English, Arabic, Greek, Czech, French, Hindi, Hebrew). For each language (10) topics are
used each of (10) documents. Summarization of (10) pre evaluated documents were also provided by
the authors of TAC-2011 [25]. This summarization will be used in comparison with our results. Our
proposed method deals with English language only.
ROUGE [26] will be used to evaluate the performance of the proposed system. ROUGE package
produces three numbers representing: precision (P), Recall (R) and F−score. They are formulated as
follows.
∑
( )
∑
Where the Si number of sentences occurring in both system and ideal summaries
Sj the number of sentences in the system summary.
∑
( )
∑
Where the Si number of sentences occurring in both system and ideal summaries
Sk the number of sentences occurring in ideal summary.
( )
( )
Where
7. Experimental Results
Table 2- shows the results of our proposed MDS method and system summary that included in the
TAC-2011 dataset [25] using ROUGE-1.
9012
DOI: 10.24996/ijs.2017.58.3A.14
Table 2- Proposed MDS results

Proposed MDS Results System Results
ID
Precision Recall F-Score Precision Recall F-Score
Number
ID1 0.45312 0.44421 0.44853 0.41253 0.40524 0.40776
ID2 0.52301 0.51232 0.51749 0.45655 0.46481 0.46062
ID3 0.49313 0.49131 0.49221 0.47909 0.43169 0.45404
ID4 0.48253 0.51324 0.49646 0.44966 0.44423 0.44691
ID5 0.52413 0.51023 0.51689 0.43513 0.41092 0.42243
ID6 0.51452 0.41321 0.44777 0.45122 0.35471 0.39617
ID7 0.44213 0.43214 0.43696 0.3953 0.39586 0.39547
ID8 0.43452 0.42341 0.42874 0.39265 0.38714 0.38985
ID9 0.39923 0.40212 0.40065 0.37726 0.38105 0.3791
ID10 0.5810 0.57203 0.57641 0.51806 0.52488 0.52141
As it's clear the results of the proposed method are better than the results of the peer summaries and
that because of two reasons; the first reason the effect of the selected features which improves the
performance of a TOPSIS method. The second reason most of the ATS methods may be affected by
one feature that makes the sentence score high, while TOPSIS computes the effect of all features in the
selected sentences.
8. Conclusions
The need for MDS increases with the rapid growth of information in the Internet. In this paper a
method for MDS has been proposed which depends on TOPSIS. There are two important points in
TOPSIS over other sentence ranking techniques. The first point TOPSIS depends on ranking the
sentences according to the effect of all feature whereas in other method the effect of one feature may
exceed the effect of other features which allows the sentence to take a high score. The second point is
the weights of the features that are calculated mathematically using MLR to overcome the problem of
assigning weight manually.
References
1. Alguliev , R. M., Aliguliyev, R. M., and Isazade, N. R. 2013. Expert Systems with Applications
Multiple documents summarization based on evolutionary optimization algorithm. Expert Syst.
Appl., 40(5): 1675–1689.
2. Kumar ,R. and Chandrakal, D. 2016. A survey on text summarization using optimization
algorithm. ELK Asia Pacific Journals, 2(1).
3. Huang, L., He, Y., Wei, F., and Li, W. 2010. Modeling Document Summarization as Multi-
objective Optimization. In Intelligent Information Technology and Security Informatics (IITSI),
Third International Symposium on, pp:382-386. IEEE.
9013
DOI: 10.24996/ijs.2017.58.3A.14
4. Song ,W., Cheon, L., Cheol, S., and Feng, X. 2011. Expert Systems with Applications Fuzzy
evolutionary optimization modeling and its applications to unsupervised categorization and
extractive summarization. Expert Syst. Appl., 38(8): 9112–9121
5. Babar, S. A. and Patil, P. D. 2015. Improving Performance of Text Summarization. Procedia
Comput. Sci., Icict 2014, pp: 354–363.
6. Song, W., Cheon, L., Cheol, S., and Feng, X. 2011. Expert Systems with Applications Fuzzy
evolutionary optimization modeling and its applications to unsupervised categorization and
extractive summarization. Expert Syst. Appl., 38(8): 9112–9121.
7. Ferreira, R., De Souza, L., Dueire ,R., De Frana, G., Freitas, F., Cavalcant, G. D. C. i, Lima, R.,
Simske, S. J., and Favaro, L. 2013. Assessing sentence scoring techniques for extractive text
summarization. Expert Syst. Appl., no1. May.
8. Megala, S. S. and Kavitha, A. 2014. Feature Extraction Based Legal Document Summarization.
ijarcsms, 2(12): 346–352.
9. Binwahlan, M. S., Salim, N., and Suanmali, L. .Swarm Based Features Selection for Text
Summarization. IJCSN , 9(1): 175–179.
10. Abuobieda, A., Salim, N., Albaham, A. T., Osman, A. H., and Kumar Y. J. 2012. Text
summarization features selection method using pseudo genetic-based model. Proc. - 2012 Int.
Conf. Inf. Retr. Knowl. Manag. CAMP, pp. 193–197.
11. Ghalehtaki, R., Khotanlou,H. and Esmaeilpour ,M. 2014. A combinational method of fuzzy ,
particle swarm optimization and cellular learning automata for text summarization. IEEE
conference, 15(1).
12. Saleh, H. H., Kadhim, N. J. 2016. Extractive Multi-Document Text Summarization Using Multi-
Objective Evolutionary Algorithm Based Model. Iraqi Journal of Science, 57(1C): 728-741.
13. Luo, W., Zhuang, F., He, Q., and Shi, Z. 2013. Exploiting relevance , coverage , and novelty for
query-focused multi-document summarization. Knowledge-Based Syst., 46: 33–42.
14. Porter stemming algorithm: http://www.tartarus.org/martin/PorterStemmer
15. JOHN, A. 2016. Multi-Document Summarization System: Using Fuzzy Logic and Genetic
Algorithm. Int. J. Adv. Res. Eng. Technol., 7(1): 30 – 40 .
16. Satoshi, C. N. , Murata, M., Uchimoto, K., Utiyama, M. and Isahara, H. 2001. Sentence Extraction
System Assembling Multiple Evidence. Proc. 2nd NTCIR Work., pp. 319–324.
17. John, A. and Wilscy, D. M. 2013. Random Forest Classifier Based Multi-Document Summarization
System. IEEE Recent Adv. Intell. Comput. Syst. , pp. 31–36.
18. Chaterjee, S. and Hadi, A. 2006. Regression analysis by example. A John Wiley & Sons, Inc.,
Publication Fourth Edition.
19. Timothy, Z. 2015. Multiple Regression and Beyond: An Introduction to Multiple Regression and
Structural Equation Modeling. By Routledge, Second edition published.
20. Hwang, C.L.; Yoon, K. 1981. Multiple Attribute Decision Making: Methods and Applications. New
York: Springer-Verlag.
21. Hwang, C. L., Lai, Y.J., Liu, T.Y. 1993. A new approach for multiple objective decision
making. Computers and Operational Research, 20(8): 889-899.
22. Zavadskas, E., Zakarevicius, A. and Antucheviciene, J. 2006. Evaluation of Ranking Accuracy in
Multi-Criteria Decisions. Informatica, 17(4): 601-618.
23. García-cascales, M. S. and Lamata, M. T. 2012. On rank reversal and TOPSIS method. Math.
Comput. Model., 56(5): 123–132.
24. Thanh, P. and van, P. 2016. Project Success Evaluation Using TOPSIS Algorithm. Journal of
Engineering and Applied Sciences, 8: 1876-1879.
25. Giannakopoulos, G. and El-Haj, M. and Favre, B. and Litvak, M. and Steinberger, J. and Varma,
V. 2011. TAC 2011 MultiLing pilot overview . In: Text Analysis Conference (TAC) 2011,
MultiLing Summarisation Pilot : TAC, Maryland, USA.
26. Lin, C.-Y. 2004. ROUGE: a package for automatic evaluation summaries, in: Proceedings of the
Workshop on Text Summarization Branches Out, Barcelona, Spain, July 25–26, pp. 74–81
9014
View publication stats

TOPSIS With Multiple Linear Regression For Multi-Document Text Summarization

Uploaded by

Document Informationclick to expand document information

Copyright:

Available Formats

TOPSIS With Multiple Linear Regression For Multi-Document Text Summarization

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

TOPSIS With Multiple Linear Regression For Multi-Document Text Summarization

Uploaded by

Copyright:

Available Formats

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

TOPSIS with Multiple Linear Regression for Multi-Document Text

Research in Iraqi Journal of Science · July 2017

multi document summarization View project

classification View project

The user has requested enhancement of the downloaded file.

TOPSIS with Multiple Linear Regression for Multi-Document Text

Keywords: weight feature, Muliple Linear Regression, TOPSIS,PIS,NIS.

‫ مع االنحدار الخطي المتعدد لتلخيص النصوص المتعدده‬Topsis

‫ زىير حسين علي‬، ‫سياد مال اهلل‬

Where Num(si) is the number of numerical data in the sentence (si)

Cw=TF *IDF (8)

Where n total number of documents.

Figure1- Block Diagram for Proposed MDS.

5.1 Multiple Linear Regression

Where A sentence from trained documents

5.2 TOPSIS Method

Xi,j is the feature value where, i=1,..,M and j=1,..,7.

Step2: Normalized a decision matrix by

Step3: Use MLR to construct the feature weights vector Wj

Step8:Determine the NIS

Step9 :calculate closness to ideal solution

5.4 Remove Redundancy

Summary generating algorithm

2- Max summary size called Max_size

output generated summary called summary

6. Dataset and Evaluation Metrics

Table 2- Proposed MDS results

ID1 0.45312 0.44421 0.44853 0.41253 0.40524 0.40776

ID2 0.52301 0.51232 0.51749 0.45655 0.46481 0.46062

ID3 0.49313 0.49131 0.49221 0.47909 0.43169 0.45404

ID4 0.48253 0.51324 0.49646 0.44966 0.44423 0.44691

ID5 0.52413 0.51023 0.51689 0.43513 0.41092 0.42243

ID6 0.51452 0.41321 0.44777 0.45122 0.35471 0.39617

ID7 0.44213 0.43214 0.43696 0.3953 0.39586 0.39547

ID8 0.43452 0.42341 0.42874 0.39265 0.38714 0.38985

ID9 0.39923 0.40212 0.40065 0.37726 0.38105 0.3791

ID10 0.5810 0.57203 0.57641 0.51806 0.52488 0.52141

View publication stats

You might also like