Improved particle swarm optimization algorithm and its application in text feature selection

Article history: Received 8 September 2014; Received in revised form 6 June 2015; Accepted 6 July 2015; Available online 15 July 2015.

Keywords: Text classification; Text feature selection; Particle swarm optimization algorithm; Constriction factor

Abstract: Text feature selection is an important step in text classification and directly affects classification performance. Classic feature selection methods mainly include document frequency (DF), information gain (IG), mutual information (MI), and the chi-square test (CHI). Theoretically, these methods are difficult to improve further because of deficiencies in their mathematical models. To further improve the effect of feature selection, many studies have tried to add intelligent optimization algorithms, such as improved ant colony algorithms and genetic algorithms, to feature selection methods. Compared with the ant colony algorithm and genetic algorithms, the particle swarm optimization (PSO) algorithm is simpler to implement and can find the optimal point quickly. This paper therefore attempts to improve the effect of text feature selection through PSO. After analyzing current achievements in improved PSO and the characteristics of classic feature selection methods, we carried out several explorations. First of all, we selected the common PSO model and two improved PSO models, based respectively on a functional inertia weight and a constant constriction factor, to optimize feature selection methods. Then, starting from the constant constriction factor, we constructed a new functional constriction factor and added it to the traditional PSO model. Finally, we proposed two improved PSO models based on both the functional constriction factor and the functional inertia weight: the synchronously improved PSO model and the asynchronously improved PSO model. In our experiments, CHI was selected as the basic feature selection method and was improved using the six PSO models mentioned above. The experimental results and significance tests show that the asynchronously improved PSO model is the best of all the models, both in text classification effect and in stability across feature dimensions.

© 2015 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2015.07.005
weight calculation [17]. The results showed that the chaotic inertia weight was the best in terms of effect, and the random inertia weight was the most efficient.

Classification is the process in which ideas and objects are recognized, differentiated, and understood [18]. Classification implies that objects are grouped into required categories for some specific purpose. There are many kinds of classification, including power quality classification, non-stationary power signal time series data classification [19], and text classification. Recently, biologically inspired evolutionary algorithms have been increasingly used in classification to improve accuracy. For example, B. Biswal et al. used the Bacterial Foraging Optimization Algorithm to classify power quality data [20]. Luo Xin et al. used the Ant Colony Optimization Algorithm to construct a text classification model [21]. Biswal et al. used the Particle Swarm Optimization Algorithm to improve the Fuzzy C-Means Algorithm; the improved algorithm was applied to time-frequency analysis, non-stationary signal classification, and power quality disturbance classification [22,23]. Lu Yonghe and Liang Minghui used a Genetic Algorithm to optimize the text feature selection method in text classification [24].

In this paper, we focus on using the Particle Swarm Optimization Algorithm to improve the performance of text classification. Under a given classification system, text classification is defined as the process of automatically identifying a text's category according to its content [25]. Traditionally, scholars have put forward many improved methods for text classification by optimizing the mathematical model. For example, Lu Yonghe and Li Yanfeng optimized the text feature weighting method based on the TF-IDF algorithm [26]. Lu Yonghe and He Xinyu added a similarity matrix and a dimension index table to KNN to improve the KNN classification algorithm [27,28]. Wei Tingting et al. added WordNet and lexical chains to optimize a text clustering model [29]. Currently, in text classification research, improved PSO is applied in three ways. The first is to optimize a classic text classifier such as KNN or SVM [30]: Li Huan et al. proposed a simplified PSO-KNN classification algorithm [31]; Tang Zhaoxia used PSO to find the k nearest neighbors in order to improve the efficiency of web text classification [32]; and Tuo Shouheng improved the inertia weight in PSO and applied the method to SVM classification [33]. The second is to build a text classifier with PSO: Luo Xin constructed a text classification model based on PSO [34]; Tong Yala et al. proposed an extraction method for classification rules based on chaos PSO [35]; similarly, Tan Dekun proposed a text classification method based on chaos PSO [36]. The last is to use PSO for text feature selection [30]: Chih-Chin Lai et al. studied the application of PSO to text feature selection for spam classification [37], a specific application of PSO-optimized text feature selection; Yaohong Jin et al. used improved PSO for Chinese text feature selection [38]; likewise, Zahran et al. proposed an improved PSO-based feature selection method to address Arabic text feature selection [39]; and H.K. Chantar et al. analyzed the characteristics of Arabic text classification and used binary PSO for text feature selection to improve classification accuracy [40].

In this paper, we first improve particle swarm optimization algorithms and then apply them to text feature selection. Finally, we analyze the application effect of the various improved particle swarm programs using a KNN classifier.

2.1. Traditional particle swarm optimization algorithm

PSO uses a number of particles, which constitute a swarm moving around in the search space, to look for the best solution. Each particle is treated as a point in a D-dimensional space which adjusts its "flying" according to its own flying experience as well as the flying experience of other particles. The particles fly with a certain velocity in the D-dimensional space to find the optimal solution. The velocity of particle i is expressed as V_i = (v_{i1}, v_{i2}, \ldots, v_{iD}), its location as X_i = (x_{i1}, x_{i2}, \ldots, x_{iD}), and its personal optimal location as P_i = (p_{i1}, p_{i2}, \ldots, p_{iD}), also called pbest. The global optimum position over all particles is expressed as P_g = (p_{g1}, p_{g2}, \ldots, p_{gD}), also called gbest. Each particle in the swarm has a fitness function to calculate its fitness value. In standard PSO, the velocity and position updates for dimension d are given by formulas (1) and (2):

v_{id} = w \times v_{id} + c_1 \times \mathrm{rand}() \times (p_{id} - x_{id}) + c_2 \times \mathrm{Rand}() \times (p_{gd} - x_{id})   (1)

x_{id} = x_{id} + v_{id}   (2)

The PSO parameters include Q (population quantity), w (inertia weight), c_1 and c_2 (acceleration constants), v_{max} (the maximum velocity), and G_{max} (the maximum number of iterations); rand() and Rand() are random functions with values in [0, 1]. c_1 and c_2 usually take the constant value 2 [3].
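For concreteness, the standard update (1)-(2) can be sketched in a few lines of Python. This is our own illustrative code, not from the paper; the array shapes and the velocity clamping against v_max are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, pbest, gbest, w=0.9, c1=2.0, c2=2.0, v_max=1.0):
    """One standard PSO iteration per formulas (1) and (2).

    x, v, pbest: (Q, D) arrays of positions, velocities, personal bests;
    gbest: (D,) array holding the global best position.
    """
    r1 = rng.random(x.shape)   # rand(), drawn per particle and dimension
    r2 = rng.random(x.shape)   # Rand()
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # formula (1)
    v = np.clip(v, -v_max, v_max)   # enforce the v_max parameter
    x = x + v                       # formula (2)
    return x, v
```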
2.2. Improve inertia weight

The inertia weight is an important parameter of standard PSO and determines how the algorithm behaves. A fixed inertia weight gives the particles the same exploration ability throughout the flight. Formula (3) is the velocity formula with a fixed inertia weight in traditional PSO [3]:

v_{id}(t+1) = w \times v_{id}(t) + c_1 \times \mathrm{rand}() \times (p_{id} - x_{id}(t)) + c_2 \times \mathrm{Rand}() \times (p_{gd} - x_{id}(t))   (3)

According to experience, w is generally taken between 0 and 1 [3]; in this paper w is 0.9, so formula (3) becomes formula (4) (here we call it Program 1):

v_{id} = 0.9 \times v_{id} + 2 \times \mathrm{rand}() \times (p_{id} - x_{id}) + 2 \times \mathrm{Rand}() \times (p_{gd} - x_{id})   (4)

Currently, the conventional strategy for improving the inertia weight is LDIW (Linear Decreasing Inertia Weight) [41]. The change of w is given in formula (5) (here we call it Program 2):

v_{id} = w \times v_{id} + 2 \times \mathrm{rand}() \times (p_{id} - x_{id}) + 2 \times \mathrm{Rand}() \times (p_{gd} - x_{id})

w = \begin{cases} w_{end} + (w_{start} - w_{end})\left(1 - \frac{T}{G_{max}}\right) & \text{if } p_{gd} \neq x_{id} \\ w_{end} & \text{if } p_{gd} = x_{id} \end{cases}   (5)

where T is the iteration number, T \in [0, G_{max}), p_{gd} is the global best position, w_{start} is the initial inertia weight, and w_{end} is its value at the maximum iteration. w_{start} and w_{end} are calculated in subsequent sections.
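As a sketch, the LDIW schedule of formula (5) can be written as follows (our own code; the function name and the at_gbest flag are illustrative):

```python
def ldiw(T, G_max, w_start, w_end, at_gbest=False):
    """Linear decreasing inertia weight, formula (5).

    w falls linearly from w_start (at T = 0) toward w_end (at T = G_max);
    a particle already sitting on the global best keeps w = w_end.
    """
    if at_gbest:   # the p_gd == x_id branch of formula (5)
        return w_end
    return w_end + (w_start - w_end) * (1.0 - T / G_max)
```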
2.3. Improve constriction factor

Because the dimensions of a text feature vector in text categorization are usually very high, the particles in PSO tend to gather at a point before the global optimum has been found [42]. Thus, Clerc introduced the constriction factor K into PSO to ensure convergence.
Formula (6) presents the velocity formula (here we call it Program 3):

v_{id} = K[v_{id} + c_1 \times \mathrm{rand}() \times (p_{id} - x_{id}) + c_2 \times \mathrm{Rand}() \times (p_{gd} - x_{id})]   (6)

Program 3 uses the formula proposed by Clerc to calculate the constriction factor K. Both c_1 and c_2 take the value 2.05, the same as in Clerc's experiments, and we keep four decimal places of K. Formula (7) is the specific velocity formula:

v_{id} = 0.7298 \times [v_{id} + 2.05 \times \mathrm{rand}() \times (p_{id} - x_{id}) + 2.05 \times \mathrm{Rand}() \times (p_{gd} - x_{id})]   (7)
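For reference, the constant 0.7298 is the standard value of Clerc's constriction coefficient for \varphi = c_1 + c_2 = 4.1; the paper only states the rounded value, so the worked computation below is our addition:

```latex
K = \frac{2}{\left|\,2-\varphi-\sqrt{\varphi^{2}-4\varphi}\,\right|},
\qquad \varphi = c_1 + c_2 = 4.1,
\qquad K = \frac{2}{2.1+\sqrt{0.41}} \approx \frac{2}{2.7403} \approx 0.7298 .
```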
In the early iterations, a particle in PSO needs to search over a wide range to determine the likely location of the optimal solution; in later iterations, it needs to exploit locally within a small range to determine the optimal point. Thus K should take a larger value in the early stage and a smaller value later, and it should decrease slowly to its minimum over a long late stage [43]. This pattern of change is consistent with a concave function. In text classification applications, and especially in text feature selection, the primary objective of PSO is to find the optimal solution in the search space so as to get the best text classification result. To avoid premature convergence, the constriction factor should follow a convex function in the early iterations, so that the particles can look for the optimal solution over a wide range; in the late period, it should follow a concave function, so that the constriction factor changes slowly toward its minimum and the particles can exploit locally. This ensures convergence of the algorithm. According to this principle, the functional constriction factor constructed from the cosine function is shown in formula (8):

K = \frac{\cos\left(\frac{\pi}{G_{max}} \times T\right) + 2.5}{4}   (8)

where T is the iteration number. With G_{max} = 40, the changing curve of K is shown in Fig. 1: it is convex at first and turns into a concave function at the end. Substituting this K into formula (1) turns it into formula (9) (here we call it Program 4):

v_{id} = \frac{\cos(\pi \times T / G_{max}) + 2.5}{4} \times [v_{id} + 2 \times \mathrm{rand}() \times (p_{id} - x_{id}) + 2 \times \mathrm{Rand}() \times (p_{gd} - x_{id})]   (9)
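A one-line implementation of the functional constriction factor (8) may make its shape easier to inspect; this sketch is ours:

```python
import math

def functional_K(T, G_max=40):
    """Functional constriction factor of formula (8).

    Follows a cosine decay from (1 + 2.5)/4 = 0.875 at T = 0 down to
    (-1 + 2.5)/4 = 0.375 at T = G_max, changing slowly at both ends of
    the run and fastest in the middle.
    """
    return (math.cos(math.pi * T / G_max) + 2.5) / 4.0
```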
2.4. Improve synchronously inertia weight and constriction factor

The velocity formula that uses only the constriction factor does not use an inertia weight. The analysis by Lin Jie showed that the algorithm with a constriction factor and the algorithm with an inertia weight are equivalent when the constriction factor K and the inertia weight w are equal [44]; the constriction-factor algorithm can be regarded as a special case of the inertia-weight algorithm. However, the constriction factor and the inertia weight have completely different meanings. Therefore, this section discusses using the inertia weight and the functional constriction factor together, synchronously. The constriction factor K directly uses the formula proposed in Section 2.3. The way the inertia weight should change can be sought from the process of birds foraging. According to the research results of Eberhart [45], foraging birds slow down to search a specific location when they are closer to food, and fly rapidly to find the direction of the food as soon as possible. When a particle is at a good position in the iterative process, the inertia weight should take a small value, retaining only a small part of the original velocity for local exploration; when the fitness of the particle's position is poor, the inertia weight should take a larger value, preserving most of the original velocity for better global optimization. The velocity formula integrating the constriction factor and the inertia weight becomes formula (10) (here we call it Program 5):

v_{id} = K[w \times v_{id} + c_1 \times \mathrm{rand}() \times (p_{id} - x_{id}) + c_2 \times \mathrm{Rand}() \times (p_{gd} - x_{id})]   (10)

Considering the research results of Huang Chongpeng [46], the specific change of the inertia weight is shown in formula (11):

w = \begin{cases} w_{end} + (w_{start} - w_{end})\left(1 - \frac{T}{G_{max}}\right) & \text{if } p_{gd} \neq x_{id} \\ w_{end} & \text{if } p_{gd} = x_{id} \end{cases}   (11)

where T is the iteration number, T \in [0, G_{max}), and p_{gd} is the global optimum position.

Here we need to calculate the values w_{start} and w_{end}. To simplify the calculation, the constriction factor K and the inertia weight w are assumed to be constant. Because the dimensions of PSO do not interfere with each other, we only discuss the one-dimensional case. Let p_b be the personal best position and g_b the global best position. Formula (12) is obtained from formula (10):

v(t+1) = K \times w \times v(t) + K \times c_1 \times \mathrm{rand}() \times (p_b - x(t)) + K \times c_2 \times \mathrm{Rand}() \times (g_b - x(t))   (12)

rand() and Rand() generate random numbers between 0 and 1. To facilitate the calculation, we replace c_1 \times \mathrm{rand}() and c_2 \times \mathrm{Rand}() with c_1 and c_2 respectively, which gives formula (13):

v(t+1) = K \times w \times v(t) + K \times c_1 \times (p_b - x(t)) + K \times c_2 \times (g_b - x(t))   (13)

We obtain v(t+2) similarly:

v(t+2) = K \times w \times v(t+1) + K \times c_1 \times (p_b - x(t+1)) + K \times c_2 \times (g_b - x(t+1))   (14)

Using v(t+1) = x(t+1) - x(t), the position then satisfies:

x(t+2) = x(t+1) + v(t+2)
       = x(t+1) + Kw(x(t+1) - x(t)) + Kc_1 p_b - Kc_1 x(t+1) + Kc_2 g_b - Kc_2 x(t+1)
       = (1 + Kw - Kc_1 - Kc_2)x(t+1) - Kw\,x(t) + Kc_1 p_b + Kc_2 g_b   (15)

Let

A = \begin{bmatrix} 1 + Kw - Kc_1 - Kc_2 & -Kw & Kc_1 p_b + Kc_2 g_b \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}   (16)
Formula (17) is the homogeneous (matrix) form of formula (15):

\begin{bmatrix} x(t+2) \\ x(t+1) \\ 1 \end{bmatrix} = A \begin{bmatrix} x(t+1) \\ x(t) \\ 1 \end{bmatrix}   (17)

Formula (18) is the characteristic equation of the coefficient matrix of formula (15):

(\lambda - 1)\left(\lambda^2 - (1 + Kw - Kc_1 - Kc_2)\lambda + Kw\right) = 0   (18)

The characteristic equation (18) has three roots, shown in formula (19):

\lambda = 1, \qquad
\alpha = \frac{(1 + Kw - Kc_1 - Kc_2) + \sqrt{(1 + Kw - Kc_1 - Kc_2)^2 - 4Kw}}{2}, \qquad
\beta = \frac{(1 + Kw - Kc_1 - Kc_2) - \sqrt{(1 + Kw - Kc_1 - Kc_2)^2 - 4Kw}}{2}   (19)

If (1 + Kw - Kc_1 - Kc_2)^2 \geq 4Kw, then \alpha and \beta are real roots; otherwise they are complex conjugates. For real roots, |\alpha| and |\beta| denote absolute values; for complex roots, they denote moduli. We want the PSO algorithm to converge to the global optimal solution as soon as possible, so \max(|\alpha|, |\beta|) \leq 1 must hold [47], which gives the inequalities (20):

\left| \frac{(1 + Kw - Kc_1 - Kc_2) \pm \sqrt{(1 + Kw - Kc_1 - Kc_2)^2 - 4Kw}}{2} \right| \leq 1   (20)

Solving the inequalities (20) yields inequality (21):

w \geq \frac{c_1 + c_2}{2} - \frac{1}{K}   (21)

Because the right-hand side of inequality (21) is not constant but changes with the iteration number, w should be greater than or equal to its maximum value. From formula (8) we know that K is largest when T = 0, so -1/K attains its maximum value -1/\left[(\cos(\pi/G_{max} \times 0) + 2.5)/4\right]. According to inequality (21), w_{end} = (c_1 + c_2)/2 - 1/\left[(\cos(0) + 2.5)/4\right]; with c_1 = c_2 = 2, w_{end} = 2 - 1/(3.5/4) = 0.857143.
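The convergence condition can be spot-checked numerically. In the sketch below (our own), p_b and g_b are dropped because they only shift the fixed point; the roots \alpha, \beta of formula (19) are exactly the eigenvalues of the dynamic 2x2 block of matrix A in (16):

```python
import numpy as np

def max_root_modulus(K, w, c1=2.0, c2=2.0):
    """Largest modulus among the roots alpha and beta of formula (19)."""
    B = np.array([[1 + K * w - K * c1 - K * c2, -K * w],
                  [1.0, 0.0]])
    return max(abs(np.linalg.eigvals(B)))

K0 = (np.cos(0.0) + 2.5) / 4           # largest K, reached at T = 0 (formula (8))
w_min = (2.0 + 2.0) / 2 - 1.0 / K0     # binding case of inequality (21)
print(w_min)                           # 0.8571428... = the paper's w_end
print(max_root_modulus(K0, w_min))     # 1.0: exactly on the convergence boundary
print(max_root_modulus(K0, 0.9))       # < 1: w above the bound converges
print(max_root_modulus(K0, 0.8))       # > 1: w below the bound violates (21)
```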
In addition, the value of w lies in the range [0, 1]; according to experience, w_{start} is taken as 1 in this paper. The velocity formula of Program 2 becomes formula (22) when w_{end} and w_{start} are put into formula (5):

v_{id} = w \times v_{id} + 2 \times \mathrm{rand}() \times (p_{id} - x_{id}) + 2 \times \mathrm{Rand}() \times (p_{gd} - x_{id})

w = \begin{cases} 0.857143 + (1 - 0.857143)\left(1 - \frac{T}{G_{max}}\right) & \text{if } p_{gd} \neq x_{id} \\ 0.857143 & \text{if } p_{gd} = x_{id} \end{cases}   (22)

When the obtained parameters are put into formula (10) (Program 5), it becomes formula (23):

v_{id} = \frac{\cos(\pi \times T / G_{max}) + 2.5}{4} \times [w \times v_{id} + 2 \times \mathrm{rand}() \times (p_{id} - x_{id}) + 2 \times \mathrm{Rand}() \times (p_{gd} - x_{id})]

w = \begin{cases} 0.857143 + (1 - 0.857143)\left(1 - \frac{T}{G_{max}}\right) & \text{if } p_{gd} \neq x_{id} \\ 0.857143 & \text{if } p_{gd} = x_{id} \end{cases}   (23)
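A sketch of the Program 5 update (23), with the schedule values computed above (our own code; the p_gd = x_id branch, which pins w at w_end, is omitted for brevity):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def program5_velocity(v, x, pbest, gbest, T, G_max=40,
                      w_start=1.0, w_end=0.857143):
    """Synchronous Program 5, formula (23): the functional constriction
    factor of formula (8) and the linearly decreasing inertia weight
    are applied together in every iteration."""
    r1, r2 = rng.random(np.shape(x)), rng.random(np.shape(x))
    w = w_end + (w_start - w_end) * (1.0 - T / G_max)
    K = (math.cos(math.pi * T / G_max) + 2.5) / 4.0
    return K * (w * v + 2 * r1 * (pbest - x) + 2 * r2 * (gbest - x))
```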
2.5. Improve asynchronously inertia weight and constriction factor

The constriction factor affects the convergence of PSO, while the inertia weight affects the degree to which the original velocity is maintained. Based on these different characteristics, we use them in different periods of PSO. In the early iterations, the inertia weight keeps the original velocity under a certain tactic in order to balance global search and local exploration. In the later period, the functional constriction factor ensures that PSO can converge to the optimal point: the first half-cycle of the functional constriction factor is a convex function, so it supports a global search, and the second half-cycle is a concave function, so the particle velocity decreases slowly to the minimum, which ensures both local exploitation and convergence. In the iterative process, the inertia weight should take a small value when the particle's position is close to the optimal solution, so that the particle retains only a small part of its original velocity for local exploitation; it should take a larger value when the particle's fitness is poor, so that the particle preserves most of its original velocity to find a better global optimum [45]. Thus the change of the inertia weight is given by formula (24) [46]:

w = \begin{cases} 0.857143 + (1 - 0.857143)\left(1 - \frac{T}{G_{max}}\right) & \text{if } p_{gd} \neq x_{id} \\ 0.857143 & \text{if } p_{gd} = x_{id} \end{cases}   (24)

Eq. (25) is the adjusted constriction factor:

K = \frac{\cos\left(\frac{2\pi}{G_{max}} \times \left(T - \frac{G_{max}}{2}\right)\right) + 2.428571}{4}   (25)

The velocity change is given by formula (26) (here we call it Program 6):

v_{id} = \begin{cases} w \times v_{id} + 2 \times \mathrm{rand}() \times (p_{id} - x_{id}) + 2 \times \mathrm{Rand}() \times (p_{gd} - x_{id}) & \text{if } T < G_{max}/2 \\ K[\alpha \times v_{id} + 2 \times \mathrm{rand}() \times (p_{id} - x_{id}) + 2 \times \mathrm{Rand}() \times (p_{gd} - x_{id})] & \text{if } T \geq G_{max}/2 \end{cases}   (26)

The \alpha in formula (26) measures the degree to which the particle's original velocity is reserved; where the particle velocity needs to be reserved, Program 6 uses the constriction factor for the convergence of the velocity. The initial value of \alpha was w_{end}, but we found by experiment that w_{end} retained too much of the original velocity and destabilized the final convergence. We therefore set \alpha in the interval (0, w_{end}] and found that \alpha = 0.7 gives the best performance. The velocity change of Program 6 is therefore given by formula (27):

v_{id} = \begin{cases} w \times v_{id} + 2 \times \mathrm{rand}() \times (p_{id} - x_{id}) + 2 \times \mathrm{Rand}() \times (p_{gd} - x_{id}) & \text{if } T < G_{max}/2 \\ K[0.7 \times v_{id} + 2 \times \mathrm{rand}() \times (p_{id} - x_{id}) + 2 \times \mathrm{Rand}() \times (p_{gd} - x_{id})] & \text{if } T \geq G_{max}/2 \end{cases}   (27)
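The asynchronous scheme of formula (27) switches rules halfway through the run; below is a sketch under the same conventions as the earlier snippets (our own code; the p_gd = x_id branch of (24) is again omitted for brevity):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def program6_velocity(v, x, pbest, gbest, T, G_max=40, alpha=0.7,
                      w_start=1.0, w_end=0.857143):
    """Asynchronous Program 6, formula (27).

    First half of the run: inertia-weight update with the LDIW schedule
    of formula (24). Second half: constriction-factor update with the
    adjusted K of formula (25), which starts at 0.857143 (matching w_end
    at the handover) and decays to about 0.357.
    """
    r1, r2 = rng.random(np.shape(x)), rng.random(np.shape(x))
    pull = 2 * r1 * (pbest - x) + 2 * r2 * (gbest - x)
    if T < G_max / 2:
        w = w_end + (w_start - w_end) * (1.0 - T / G_max)
        return w * v + pull
    K = (math.cos(2 * math.pi / G_max * (T - G_max / 2)) + 2.428571) / 4.0
    return K * (alpha * v + pull)
```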
3. Experimental data and analysis

3.1. Experimental environment

The experiment's main goal is to apply the improved PSO to text feature selection. In the experiment, we used the improved PSO to search the text feature set randomly, and then constructed the text feature vector based on this set [30]. KNN was selected as the classifier because of its simplicity. We evaluate the strengths and weaknesses of the different improved PSO programs by comparing classification accuracy and MacF1. The core of the experiment is the comparison of the programs' velocity-update rules. Some parameters of the experiment need to be stated: the k in KNN was 10; Q (population quantity) was 10; and G_{max} (maximum number of iterations) was 40. Text preprocessing used the StandardAnalyzer of Lucene 3.0, and the corpus was Reuters-21578.
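The paper does not print its implementation, but a plausible fitness evaluation for PSO-driven feature selection looks like the sketch below; the binary threshold, the cross-validation scoring, and the scikit-learn API are our assumptions, not the authors' code:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(position, X, y, threshold=0.5):
    """Score one particle: threshold its position into a binary mask over
    candidate features, then rate the masked matrix with KNN (k = 10, as
    in the experimental setup) accuracy."""
    mask = np.asarray(position) > threshold
    if not mask.any():            # guard against an empty feature subset
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=10)
    return cross_val_score(knn, X[:, mask], y, cv=3,
                           scoring="accuracy").mean()
```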
3.2. Comparison and analysis of the experimental results

Comparative experiments were run with the six programs described in Section 2; Table 1 summarizes them.

Table 1
Programs description.

No.         Program description
Program 1   Fixed inertia weight
Program 2   Improved inertia weight
Program 3   Fixed constriction factor
Program 4   Improved constriction factor
Program 5   Improved synchronously inertia weight and constriction factor
Program 6   Improved asynchronously inertia weight and constriction factor

3.2.1. Convergence of all programs

To evaluate the convergence of all programs, we recorded each program's convergence process at feature dimensions 300 and 450. At 300 dimensions, the convergence curves are shown in Fig. 2: Program 6 has the best final fitness of all programs, and Program 1 the worst. Program 6 converges fast, needing fewer than 20 iterations to reach its optimum. Program 3 converges fastest, but its final fitness is poor. Program 5 converges slowest, but its final fitness is the best except for Program 6. The convergence curves of Programs 2 and 4 are similar. At 450 dimensions, the convergence curves are shown in Fig. 3: Program 6 again has the best fitness of all programs, and its convergence is very rapid. The fitness of Program 1 is second only to Program 6, but its convergence is the slowest: it needs more than 25 iterations to reach its optimum, whereas Program 6 needs only 15. Program 3 converges fastest but has the worst fitness. Program 5 reaches the same optimum as Program 2 but converges faster. Program 4 is average in both optimum and convergence speed. In summary, Program 6 has the best convergence of all programs.

3.2.2. Classification results

Tables 2 and 3 give all the experimental results.

Table 2
Classification accuracy at different feature dimensions for the six programs.

Feature dimensions   Program 1  Program 2  Program 3  Program 4  Program 5  Program 6
300                  0.7428     0.7572     0.7514     0.7948     0.7601     0.7803
450                  0.7775     0.7659     0.7457     0.7746     0.7803     0.8064
600                  0.7775     0.7457     0.7948     0.7919     0.7919     0.7977
750                  0.7514     0.7717     0.7688     0.7717     0.7659     0.7948
900                  0.7688     0.7717     0.7861     0.7543     0.7514     0.7832
1050                 0.7486     0.7890     0.7861     0.7659     0.7717     0.7803
1200                 0.7659     0.7832     0.7919     0.7861     0.7890     0.7919
Mean                 0.7618     0.7692     0.7750     0.7799     0.7729     0.7927
Standard deviation   0.0142     0.0148     0.0199     0.0147     0.0142     0.0099

Table 3
MacF1 at different feature dimensions for the six programs.

Feature dimensions   Program 1  Program 2  Program 3  Program 4  Program 5  Program 6
300                  0.7613     0.7660     0.6885     0.7803     0.7247     0.7842
450                  0.7394     0.7509     0.7239     0.7450     0.7843     0.7699
600                  0.7367     0.7472     0.7708     0.7664     0.7697     0.8030
750                  0.7287     0.7401     0.7661     0.7280     0.6830     0.7835
900                  0.7022     0.7534     0.7407     0.7055     0.7218     0.7803
1050                 0.7106     0.7576     0.7911     0.7616     0.7521     0.7750
1200                 0.7230     0.7502     0.7584     0.7459     0.7725     0.7807
Mean                 0.7288     0.7571     0.7485     0.7536     0.7440     0.7824
Standard deviation   0.0196     0.0081     0.0341     0.0251     0.0359     0.0104

Several conclusions can be drawn from these results. First of all, Program 6 has the highest mean in both accuracy and MacF1, and
shows the best overall results. Meanwhile, all programs have low standard deviations for both accuracy and MacF1, which means their accuracy and MacF1 are stable and balanced across feature dimensions. In particular, Program 6 has the lowest standard deviation for accuracy and the second lowest for MacF1, so it is both the most stable program and the best-performing one.

3.2.3. Significance tests

The analysis above shows that Program 6 has the highest mean of the six programs for both accuracy and MacF1. To verify that the means of Program 6 are significantly higher than those of the other programs, paired-sample T-tests were used; the results are shown in Tables 4 and 5.

Table 4
Comparison of classification accuracy between Program 6 and the other programs by paired T-test.

Paired programs   t        p
1 vs 6            -7.727   0.000
2 vs 6            -2.797   0.031
3 vs 6            -1.712   0.138
4 vs 6            -2.229   0.067
5 vs 6            -3.960   0.007
Table 5
Comparison of MacF1 between Program 6 and the other programs by paired T-test.

Paired programs   t        p
1 vs 6            -7.122   0.000
2 vs 6            -5.478   0.002
3 vs 6            -2.641   0.038
4 vs 6            -3.794   0.009
5 vs 6            -2.669   0.037

Table 4 shows that the T-tests for 1 vs 6, 2 vs 6 and 5 vs 6 reject the null hypothesis at a significance level of 0.05, while those for 3 vs 6 and 4 vs 6 do not. Therefore, in accuracy, Program 6 differs significantly from the other programs except Programs 3 and 4. Table 5 shows that all T-tests reject the null hypothesis at the 0.05 level, so Program 6 differs significantly from all the other programs in MacF1.

We can therefore conclude that Program 6 is superior to the other programs both in text classification effect and in stability across feature dimensions.
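The paired tests are easy to reproduce from Table 2. For example, the "1 vs 6" accuracy row can be checked with SciPy (our own snippet, not the authors' script):

```python
from scipy.stats import ttest_rel

# Accuracy of Programs 1 and 6 over the seven feature dimensions (Table 2)
acc_p1 = [0.7428, 0.7775, 0.7775, 0.7514, 0.7688, 0.7486, 0.7659]
acc_p6 = [0.7803, 0.8064, 0.7977, 0.7948, 0.7832, 0.7803, 0.7919]

t, p = ttest_rel(acc_p1, acc_p6)
print(round(t, 3), round(p, 4))  # t = -7.727, p < 0.001: the "1 vs 6" row of Table 4
```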
To sum up, Program 6 achieves the best effect and stability in text classification. In addition, the constriction factor and the inertia weight are not equivalent, because they adjust the velocity from different aspects. The constriction factor takes effect after the velocity has been computed, so that the improved PSO is guaranteed to converge; the inertia weight measures the degree to which a particle keeps its original velocity and does not act on the information-sharing and social-cognition parts of the velocity formula. Thus the constriction factor and the inertia weight adjust the particle velocity with different targets, and it is very worthwhile to integrate the two adjustments.

4. Conclusion

In this paper, some improved PSO programs for text feature selection are proposed, including the functional constriction factor and its integration with the improved inertia weight. Contrastive experiments show that PSO using the functional constriction factor performs better in text classification than common PSO. What is more, the PSO integrating the inertia weight and the constriction factor shows a significant increase in classification performance; the improvement from integrating the two is more obvious. Last of all, because of the convergence characteristics of PSO, combining the inertia weight and the constriction factor asynchronously is better than combining them synchronously. In summary, the proposed methods are effective in improving the performance of text classification.

However, a perfect method does not exist, and the proposed methods have some deficiencies. Firstly, the experiments only used KNN to verify the reasonableness of the proposed PSO programs; their behavior with other classification methods has not yet been verified. Secondly, the experiments only verified their effectiveness in feature selection, so we still do not know whether the proposed methods are effective for weight calculation. Finally, the data set used in the experiments is fixed and some of its texts are outdated, which affects the experimental results; the effectiveness on more data remains to be tested.

Because of these shortcomings, our future research will focus on a more comprehensive verification of the proposed improved PSO programs. For example, we will use them in other steps of the text classification model and will gather more test samples for verification.

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (NSFC Grant No. 71373291). This work was also supported by the National High Technology Research and Development Program of China (863 Program) under Grant 2012AA101701.

References

[1] J. Kennedy, R. Eberhart, Particle swarm optimization, in: Proc. IEEE International Conference on Neural Networks, Perth, Australia, 1995, pp. 1942–1948.
[2] Xiaofeng Xie, Zhang Wenjun, Yang Zhilian, Overview of particle swarm optimization, Control Decis. 18 (2) (2003) 129–134.
[3] Dingwei Wang, Wang Junwei, Wang Hongfeng, Zhang Ruiyou, Guo Zhe, Intelligent Optimization Methods, Higher Education Press, Beijing, 2007, pp. 221–222, 226.
[4] P.J. Angeline, Using selection to improve particle swarm optimization, in: IEEE International Conference on Evolutionary Computation, Anchorage, AK, USA, 1998, pp. 84–89.
[5] M. Lovbjerg, T.K. Rasmussen, T. Krink, Hybrid particle swarm optimizer with breeding and subpopulations, in: Third Genetic and Evolutionary Computation Conference, 2001, pp. 469–476.
[6] N. Higashi, H. Iba, Particle swarm optimization with Gaussian mutation, in: The 2003 Congress on Evolutionary Computation, Canberra, Australia, 2003, pp. 72–79.
[7] S. Baskar, P.N. Suganthan, A novel concurrent particle swarm optimization, in: The 2004 Congress on Evolutionary Computation, Portland, OR, USA, 2004, pp. 792–796.
[8] L. dos Santos Coelho, A quantum particle swarm optimizer with chaotic mutation operator, Chaos Solitons Fractals 37 (5) (2002) 1409–1418.
[9] R. Brits, A.P. Engelbrecht, F. van den Bergh, A niching particle swarm optimizer, in: The 4th Asia-Pacific Conference on Simulated Evolution and Learning, Singapore, 2002, pp. 692–696.
[10] J.S. Vesterstrøm, J. Riget, T. Krink, Division of labor in particle swarm optimisation, in: The 4th Congress on Evolutionary Computation, 2002, pp. 1570–1575.
[11] S. Janson, M. Middendorf, A hierarchical particle swarm optimizer, in: The 2003 Congress on Evolutionary Computation, Canberra, Australia, 2003, pp. 770–776.
[12] M. Clerc, The swarm and the queen: towards a deterministic and adaptive particle swarm optimization, in: Proceedings of the Congress on Evolutionary Computation, Washington, DC, USA, 1999, pp. 1951–1957.
[13] F. van den Bergh, An Analysis of Particle Swarm Optimizers, Department of Computer Science, University of Pretoria, South Africa, 2002.
[14] F. van den Bergh, A.P. Engelbrecht, A new locally convergent particle swarm optimizer, in: Proc. of the IEEE International Conference on Systems, Man and Cybernetics, Hammamet, Tunisia, 2002, pp. 96–101.
[15] A. Chatterjee, P. Siarry, Nonlinear inertia weight variation for dynamic adaptation in particle swarm optimization, Comput. Oper. Res. 33 (3) (2006) 859–871.
[16] N. Ahmad, M. Mehdi Ebadzadeh, R. Safabakhsh, A novel particle swarm optimization algorithm with adaptive inertia weight, Appl. Soft Comput. 11 (4) (2011) 3658–3670.
[17] J.C. Bansal, P.K. Singh, M. Saraswat, A. Verma, S.S. Jadon, A. Abraham, Inertia weight strategies in particle swarm optimization, in: 2011 Third World Congress on Nature and Biologically Inspired Computing, Salamanca, Spain, 2011, pp. 633–640.
[18] H. Cohen, C. Lefebvre (Eds.), Handbook of Categorization in Cognitive Science, Elsevier, 2005.
[19] B. Biswal, M. Biswal, S. Hassan, Non-stationary power signal time series data classification using LVQ classifier, Appl. Soft Comput. 18 (2014) 158–166.
[20] B. Biswal, H.S. Behera, R. Bisoi, P.K. Dash, Classification of power quality data using decision tree and chemotactic differential evolution based fuzzy clustering, Swarm Evol. Comput. 4 (2012) 12–24.
[21] Xin Luo, Wang Zhaoli, Lu Yonghe, A study of text classification based on ant colony optimization, Libr. Inf. Serv. 55 (2) (2011) 103–106.
[22] B. Biswal, P.K. Dash, B.K. Panigrahi, Time frequency analysis and non-stationary signal classification using PSO based fuzzy C-means algorithm, IETE J. Res. 53 (5) (2007) 441–450.
[23] B. Biswal, P.K. Dash, K.B. Panigrahi, Power quality disturbance classification using fuzzy C-means algorithm and adaptive particle swarm optimization, IEEE Trans. Ind. Electron. 56 (1) (2009) 212–220.
[24] Lu Yonghe, Liang Minghui, Improvement of text feature extraction with genetic algorithm, N. Technol. Libr. Inf. Serv. 4 (2014) 48–57.
[25] Yingfan Gao, Wang Huilin, A controllable feature selection algorithm based on Poisson estimates, J. Chin. Soc. Sci. Tech. Inf. 29 (3) (2010) 408–413.
[26] Lu Yonghe, Li Yanfeng, Improvement of text feature weighting method based on TF-IDF algorithm, Libr. Inf. Serv. 57 (3) (2013) 90–95.
[27] Lu Yonghe, He Xinyu, Improved KNN classification algorithm based on dimension index table, Inf. Stud. Theory Appl. 37 (5) (2014) 102–106.
[28] Lu Yonghe, He Xinyu, The document similarity matrix in the application of the KNN classification efficiency, Inf. Stud. Theory Appl. 37 (1) (2014) 141–144.
[29] T. Wei, Y. Lu, H. Chang, Q. Zhou, X. Bao, A semantic approach for text clustering using WordNet and lexical chains, Expert Syst. Appl. 42 (4) (2015) 2264–2275.
[30] Lu Yonghe, Cao Lichao, Text feature selection method based on particle swarm optimization, N. Technol. Libr. Inf. Serv. 29 (3) (2011) 408–413.
[31] Huan Li, Jiao Jianmin, Improved simplified PSO KNN classification algorithm, Comput. Eng. Appl. 44 (32) (2008) 57–59.
[32] Zhaoxia Tang, Web intelligent classification algorithm based PSO and KNN, J. Taiyuan Norm. Univ. (Nat. Sci. Ed.) 9 (4) (2010) 55–58.
[33] Shouheng Tuo, Research on text categorization based on support vector machine optimized by particle swarm optimization algorithm, Comput. Dev. Appl. 23 (10) (2010) 3–5.
[34] Xin Luo, Text Classification Based on Swarm Intelligence, Sun Yat-sen University, Guangzhou, 2009.
[35] Yala Tong, Chen Yi, A web document categorization rule extraction based on chaos particle swarm optimization, Microelectron. Comput. 26 (2) (2009) 193–196.
[36] Dekun Tan, Research of Chinese text categorization based on chaotic particle swarm optimization, Appl. Res. Comput. 27 (12) (2010) 4464–4466.
[37] Chih-Chin Lai, Chih-Hung Wu, Particle swarm optimization-aided feature selection for spam email classification, Innov. Comput. Inf. Control, Kumamoto, Japan (2007) 165.
[38] Yaohong Jin, Xiong Wen, Wang Cong, Feature selection for Chinese text categorization based on improved particle swarm optimization, in: Natural Language Processing and Knowledge Engineering (NLP-KE), Beijing, China, 2010, pp. 1–6.
[39] B.M. Zahran, G. Kanaan, Text feature selection using particle swarm optimization algorithm, World Appl. Sci. J. 7 (2009) 69–74.
[40] K.C. Hamouda, W.C. David, Feature subset selection for Arabic document categorization using BPSO-KNN, in: 2011 Third World Congress on Nature and Biologically Inspired Computing (NaBIC), Salamanca, Spain, 2011, pp. 546–551.
[41] Y. Shi, R.C. Eberhart, Empirical study of particle swarm optimization, in: Proceedings of the 1999 Congress on Evolutionary Computation (CEC 99), vol. 3, 1999.
[42] Dexiang Luo, The Improvement Method Research of Particle Swarm Optimization, GuangXi University for Nationalities, Nanning, 2009.
[43] Jianxiang Wei, Sun Yuehong, Su Xinning, A document clustering algorithm using particle swarm optimization, J. Chin. Soc. Sci. Tech. Inf. 29 (3) (2010) 428–432.
[44] Jie Lin, Weighted Naive Bayesian Classifier Based on Particle Swarm Optimization, Yunnan University of Finance and Economics, Kunming, 2011.
[45] R.C. Eberhart, A new optimizer using particle swarm theory, in: Proceedings of the 6th International Symposium on Micro Machine and Human Science, Nagoya, Japan, 1995, pp. 39–43.
[46] Chongpeng Huang, Xiong Weili, Xu Baoguo, Influence of inertia weight on astringency of particle swarm algorithm and its improvement, Comput. Eng. 34 (12) (2008) 31–33.
[47] Hongbo Liu, Wang Xiukun, Tan Guozhen, Convergence analysis of particle swarm optimization and its improved algorithm based on chaos, Control Decis. 21 (6) (2006) 636–640.