
Journal of Computational Science 25 (2018) 456–466

Contents lists available at ScienceDirect

Journal of Computational Science


journal homepage: www.elsevier.com/locate/jocs

A new feature selection method to improve the document clustering using particle swarm optimization algorithm
Laith Mohammad Abualigah a,∗, Ahamad Tajudin Khader a, Essam Said Hanandeh b
a School of Computer Sciences, Universiti Sains Malaysia, 11800, Pulau Pinang, Malaysia
b Department of Computer Information System, Zarqa University, P.O. Box 13132, Zarqa, Jordan

Article info

Article history:
Received 18 December 2016
Received in revised form 23 June 2017
Accepted 30 July 2017
Available online 6 September 2017

MSC:
00-01
99-00

Keywords:
Unsupervised feature selection
Informative features
Particle swarm optimization algorithm
K-mean text clustering algorithm

Abstract

The large amount of text information on the Internet and in modern applications makes dealing with this volume of information complicated. The text clustering technique is an appropriate tool to deal with an enormous amount of text documents by grouping these documents into coherent groups. A large document size decreases the effectiveness of the text clustering technique. Moreover, text documents contain sparse and uninformative features (i.e., noisy, irrelevant, and unnecessary features), which affect the effectiveness of the text clustering technique. The feature selection technique is a primary unsupervised learning method employed to select the informative text features and create a new subset of a document's features. This method is used to increase the effectiveness of the underlying clustering algorithm. Recently, several complex optimization problems have been successfully solved using metaheuristic algorithms. This paper proposes a novel feature selection method, namely, the feature selection method using the particle swarm optimization (PSO) algorithm (FSPSOTC), to solve the feature selection problem by creating a new subset of informative text features. This new subset of features can improve the performance of the text clustering technique and reduce the computational time. Experiments were conducted using six standard text datasets with several characteristics. These datasets are commonly used in the domain of text clustering. The results revealed that the proposed method (FSPSOTC) enhanced the effectiveness of the text clustering technique by dealing with a new subset of informative features. The proposed method is compared with other well-known algorithms, i.e., the feature selection method using a genetic algorithm to improve the text clustering (FSGATC) and the feature selection method using the harmony search algorithm to improve the text clustering (FSHSTC), in the text feature selection.
© 2017 Elsevier B.V. All rights reserved.

1. Introduction

In recent years, there has been a significant increase in digital text documents on the Internet and in modern applications, which affects the text analysis process (i.e., text feature selection, text document clustering, text categorization, etc.). Text clustering is a suitable technique used to group (cluster) an enormous amount of text documents into a predetermined number of groups [1,2]. This technique has been used in many domains within text mining such as text retrieval, text categorization, and image segmentation [3]. The vector space model (VSM) is a standard traditional model used in text mining to represent the document features as a vector (row) of weights. In this model, each term weight is described as one dimension of the space. Thus, the effectiveness of the clustering technique is affected by the size of the dimension space and the volume of the uninformative features [4].

Text documents hold informative and uninformative features, where the uninformative features are noisy, irrelevant, redundant features, etc. [5,6]. Unsupervised feature selection is a primary task used to find a new optimal subset of informative features for each document [7,8]. This technique is used to enhance the clustering technique without foreknowledge of the document's class label. The feature selection method gives accurate results when defined as an optimization problem. It is based on two objectives: (1) how to improve the effectiveness of the text clustering algorithm, and (2) how to obtain a low number of uninformative features [9,10]. Several domains in text mining benefit from the feature selection technique, such as text clustering, anomaly detection in earth dams, text categorisation based on clustering feature selection, load and price forecast of electrical power systems, and text retrieval.

∗ Corresponding author.
E-mail addresses: lmqa15 [email protected] (L.M. Abualigah), [email protected] (A.T. Khader), [email protected] (E.S. Hanandeh).

https://doi.org/10.1016/j.jocs.2017.07.018
1877-7503/© 2017 Elsevier B.V. All rights reserved.

Text clustering is a useful unsupervised learning method used to partition a set of digital text documents into a subset of groups to make the access tidy and easier for the users. Text clustering algorithms seek to find an optimal solution to distribute a set of text documents [11]. The algorithm acts based on evaluation criteria such as the objective function and the fitness function. The Internet has become a primary source of information, and there is an unorganised, enormous amount of text documents used in universities, learning centres, hospitals, and digital library datasets [12]. Subsequently, the clustering technique is useful to help users review their needs.

A new feature selection method is proposed using the particle swarm optimization algorithm, namely, the feature selection method using the particle swarm optimization algorithm to improve the text clustering technique (FSPSOTC). It is used for selecting an optimal subset of informative features to improve the effectiveness of the text clustering technique. The first goal of the paper is to propose a new feature selection method for enhancing the performance of the text feature selection technique by eliminating the uninformative features. Experiments were conducted on six text datasets to investigate the proposed method. The proposed method is tested using the K-mean text clustering algorithm. This algorithm is a simple and fast clustering algorithm used in the text clustering domain to evaluate feature selection and dimension reduction methods [13,2,4]. The results showed that the proposed method (FSPSOTC) outperformed the original K-mean clustering technique without any feature selection operation and obtained better results in comparison with the other well-known unsupervised feature selection algorithms in terms of precision, recall, F-measure and accuracy measures.

This paper is arranged as follows: Section 2 shows the text pre-processing steps for the text analysis. Section 3 illustrates the proposed feature selection method to improve text clustering. Experimental results of the proposed method are produced in Section 4. Finally, the conclusion is given in Section 5.

2. Literature review

Several different models have been used to enhance the effectiveness of the text feature selection problem. Metaheuristic algorithms are the most successfully used approach to enhance feature selection techniques [14,4]. Unsupervised feature selection techniques are utilised in text mining to improve the performance of text clustering algorithms by obtaining the informative text features [15,10]. Several search strategies have been used to determine an informative subset of text features. These are categorised into filter and wrapper approaches.

Filter selection methods are used for statistical analysis of the document's features to assign a relevance value (score). These methods choose a new subset of informative features without any foreknowledge of the documents' class labels [16]. There are many approaches in this category, including document frequency (DF) [14], mutual information [17], information gain [18], term variance [2], term strength [4], mean absolute difference [13], mean-median [2], Gini index [19] and absolute cosine [4].

Wrapper selection methods are used as a search strategy to find a new subset of informative text features and evaluate the obtained subset by using learning mechanisms (taking into consideration the documents' class labels). However, the computational time of these methods is higher than that of the other methods [16]. A new category that combines the advantages of both wrapper and filter approaches is the hybrid selection method for finding an accurate subset of informative features [16,5,6]. This method has received considerably high attention for solving the text feature selection problem because of its characteristics in dealing with text features.

A new method combining document frequency (DF) and term frequency (TF) is proposed to enhance the text feature selection technique (DT-FS) [20]. It improved the performance of text clustering by keeping the essential information. The threshold value is determined beforehand to apply the document's features. Experiments were conducted on seven documents. The results showed that the proposed method overwhelms the other comparative methods. A new unsupervised text feature selection method is used for the text clustering technique, namely clustering guided sparse structural learning (CG-SSL) [21].

Metaheuristic algorithms have been successfully used to solve many hard optimization problems [22,7,23,24]. Kennedy and Eberhart in (1995) introduced the particle swarm optimization (PSO) algorithm [25]. It is inspired by swarm intelligence and is based on metaheuristic optimization search. PSO mimics the social behaviour of the swarm, which refers to the set of candidate solutions where each solution is a particle. This algorithm uses the global best solution concept to obtain the optimal solution. In each iteration, the global best solution is recorded and updated [4,26,27].

The feature selection technique is a type of optimization problem used to achieve a new subset of informative features. PSO and the genetic algorithm (GA) are two algorithms commonly used to solve complex optimization problems. A new technique used the PSO algorithm with an opposition-based mechanism applied for the text feature selection method [4]. The authors added two strategies: (i) starting with promising solutions to achieve an optimal solution, and (ii) a new dynamic inertia weight to improve the fitness function. The proposed method was investigated on three text subset datasets. The experimental results showed that the effectiveness of the selected features increased in terms of the clustering accuracy and the computation time was reduced.

A new feature selection method based on the PSO algorithm is proposed to improve the effectiveness of text categorization [28]. The effectiveness of the proposed method is compared with the effectiveness of other similar methods using the Reuters-21578 dataset. The experimental results demonstrated that the proposed selection method obtained better results in comparison with other well-known methods to improve the text categorisation. In another study, three models for the feature selection technique are proposed [14]. These models include: (1) the first model used the original PSO algorithm, (2) the second model used the improved PSO algorithm, and (3) the third model added a new function to the original PSO algorithm. The improved PSO algorithm introduces the inertia weight strategy to optimise the feature selection model. Experimental results showed that the second PSO model is the best model for improving the effectiveness of text feature selection.

Text mining deals with a lot of text features and data, and thus text clustering and text classification can result in high computational time and low reliability. A term-document matrix, namely TDM, is obtained through text mining representation as a sparse matrix. A related study focused on finding a set of optimised text features from the corpus using the genetic algorithm (GA) [29]. GA is utilised to extract text features as desired according to the term score. Term frequency-inverse document frequency (TF-IDF) is a common weight scheme to display the relationships of features through the repetitive process of the feature extraction. Experiments were conducted on a set of spam mail (text documents) to test and develop the performance of the text feature selection. The results showed that the proposed feature selection method with GA recorded better performance regarding text clustering and text classification instead of using all the document's features.

A novel GA is introduced to choose an optimal subset of text features for developing the text clustering method [30]. This method used the TF-IDF weight scheme to reduce the terms' relationships. Experiments were conducted using spam email records to validate the effectiveness of the feature selection. The authors proved that

the proposed GA for solving the feature selection improved the efficiency of the text clustering algorithm. In another study, a selection technique is proposed using the common term weight scheme (TF-IDF) to reduce the execution time and enhance the effectiveness of the clustering algorithm [31]. The results explained that the proposed method improved the efficiency of document clustering and classification methods.

Feature selection methods have also been proposed based on the genetic algorithm for improving the text clustering (FSGATC). In this method, GA attempts to solve the text feature selection problem [5]. At the end, FSGATC is prepared to enhance the text clustering by creating a new subset of informative text features. Experiments were carried out on four text benchmark datasets and compared with another well-known algorithm in the same domain, namely the feature selection technique based on the harmony search (HS) algorithm for text clustering (FSHSTC) [6]. The results proved that the proposed method (FSGATC) recorded better results according to the performance of the text clustering algorithm (i.e., the K-mean algorithm). These results were compared with the original K-mean clustering algorithm without any feature selection algorithm and the K-mean clustering algorithm with the HS algorithm for feature selection (FSHSTC) in terms of F-measure and accuracy measures.

The cat swarm optimization (CSO) algorithm was proposed to solve benchmark optimization problems. CSO is modified to improve the feature selection technique in text classification [32]. The experiment was carried out using extensive data. The results showed that the proposed modified CSO outperforms the traditional version. Moreover, TF-IDF combined with CSO achieved better results in the text feature selection technique compared with TF-IDF alone.

3. Proposed method

The proposed feature selection method based on the particle swarm optimization algorithm for improving the text clustering (FSPSOTC) is applied to improve the text clustering algorithm by obtaining a new subset of document features. This method is used to improve the performance of the text clustering algorithm by finding a new optimal subset of text features. FSPSOTC is divided into three stages.

• In the first stage, the pre-processing steps are applied to represent the text documents in a numerical style as shown in Fig. 1. This stage includes six steps, which are reading the dataset, tokenization, removal of the stop words, stemming, computing the terms weight, and document representation using the vector space model (VSM).
• In the second stage, the PSO algorithm is proposed to solve the feature selection problem by eliminating uninformative features at the level of the document. Fig. 2 shows the second stage of the proposed method. It includes three steps, the first of which is modelling the feature selection as an optimization problem to adapt the PSO algorithm. The PSO algorithm is then applied to find a new subset of informative text features. This algorithm works at the level of each document, which means that the PSO works document by document. It runs for each document until the maximum number of iterations is reached. After finishing all the documents given in the dataset, the PSO will combine all the produced subsets of documents to generate a new dataset with the informative features.
• In the third stage, the k-mean text clustering algorithm is used to partition a set of documents into clusters. We choose the k-mean text clustering algorithm as a powerful and efficient text clustering algorithm to evaluate the effectiveness of the proposed feature selection method. Fig. 3 shows the third stage of the text clustering technique. It includes three steps, which are modelling the text clustering technique, applying the k-mean clustering algorithm, and producing optimal clusters. In this stage, the k-mean clustering algorithm is applied to find optimal clusters of documents. The following subsections show the stages of the proposed methodology.

3.1. Text pre-processing

The pre-processing steps are used to change the text document contents to numerical form. A review of these pre-processing steps is presented in the following subsections [33,11].

3.1.1. Tokenization
Tokenization is the manner of splitting a stream of text documents into words or terms, and removing the empty sequences. Each word or symbol is taken from the first character to the last character, where each word is called a token [4].

3.1.2. Removal of stop words
A list of common popular words, such as: an, that, be, and other common words that have small weighting, high-frequency words, and short functional words in the text document clustering are known as stop words. These words must be removed from the documents because they have high frequency and high weighting and thus decrease the performance of the text clustering technique. The list of stop words is available at http://www.unine.ch/Info/clef/, which consists of 571 words [18].

3.1.3. Stemming
Stemming transforms appropriate inflectional forms of some words to the same root by removing the prefixes and suffixes of each word. For example, intersect, dissect, and section all have the common origin or source Sect, called a feature. In this paper, we use the Porter stemmer, which is the most common stemming method employed in the area of text mining [2].

3.1.4. Terms weighting
The term weighting is assigned for each term or feature according to its term frequency in each recorded document. If the term frequency is high and the same feature appears in a few documents, we conclude that this feature is useful to distinguish between the contents of the documents [16]. The term weighting for feature j in document i is calculated using Eq. (1).

w_{i,j} = tf(i, j) × idf(i, j) = tf(i, j) × log(n / df(j)),   (1)

where w_{i,j} represents the weight of term j in document number i, tf(i, j) is the number of occurrences of term j in document number i, and idf(i, j) is the inverse document frequency, which is used to enhance the terms that appear in a few documents. n is the number of all documents in the dataset, and df(j) is the number of documents which contain feature j. The following expression represents the documents in a common standard format using the vector space model [33]:

VSM =
| w_{1,1}      ...  w_{1,(t-1)}   w_{1,t}     |
| ...          ...  ...           ...         |
| w_{(n-1),1}  ...  ...           w_{(n-1),t} |
| w_{n,1}      ...  w_{n,(t-1)}   w_{n,t}     |     (2)
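To make the first stage concrete, the short sketch below (a minimal illustration in Python, not the authors' Matlab code) walks through the pre-processing chain of Section 3.1 on a toy corpus: tokenization, stop-word removal, a crude suffix-stripping stand-in for the Porter stemmer, and TF-IDF weighting following Eq. (1). The tiny stop-word list and the strip_suffixes helper are simplified assumptions for the example, not part of the original method.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "is", "and", "of", "to", "in"}  # tiny illustrative list

def strip_suffixes(token):
    # Very rough stand-in for Porter stemming: drop a few common suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())          # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return [strip_suffixes(t) for t in tokens]            # stemming

def tfidf_vsm(documents):
    """Build the VSM of Eq. (2): one row of w_{i,j} = tf(i,j) * log(n / df(j)) per document."""
    docs = [preprocess(d) for d in documents]
    vocabulary = sorted({t for doc in docs for t in doc})
    n = len(docs)
    df = {term: sum(1 for doc in docs if term in doc) for term in vocabulary}
    vsm = []
    for doc in docs:
        tf = Counter(doc)
        vsm.append([tf[term] * math.log(n / df[term]) for term in vocabulary])
    return vocabulary, vsm

corpus = ["The clustering of text documents",
          "Clustering documents is grouping texts",
          "Feature selection improves clustering"]
terms, weights = tfidf_vsm(corpus)
print(terms)
print(weights)  # each row is one document vector d_i = (w_{i,1}, ..., w_{i,t})
```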

Fig. 1. The first stage of the proposed method.

Fig. 2. The second stage of the proposed method.

3.2. Feature selection using particle swarm optimization algorithm

This section explains the proposed feature selection method based on the particle swarm optimization algorithm.

3.2.1. Mathematical model of the feature selection problem
The feature selection problem is expressed as an optimization problem to find an optimal subset of informative features. Given F a set of text features, it is represented as a vector F_i = f_{i,1}, f_{i,2}, . . ., f_{i,j}, . . ., f_{i,t}, where t is the number of all unique text features and i is the document number. Let FS be a new subset of informative features SF_i = s_{i,1}, s_{i,2}, . . ., s_{i,j}, . . ., s_{i,m}, which is generated by the selection algorithm with a new length m, s_{i,j} ∈ {0, 1}, j = 1, 2, . . ., m. If s_{i,j} = 1, it means that the jth feature is selected as an informative feature in document number i; if s_{i,j} = 0, it means that the jth text feature is hidden or an uninformative feature in document number i [2,5,6].
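As a small illustration of this encoding (a hypothetical toy case, not taken from the paper), the snippet below represents one document's candidate solution as a 0/1/-1 vector in the style of Table 1 (Section 3.2.2) and extracts the indices of the selected features.

```python
# One candidate solution S_i over t = 10 unique features:
#  1 -> feature selected as informative, 0 -> feature hidden/uninformative,
# -1 -> feature does not occur in this document at all.
solution = [0, 1, 1, -1, -1, 1, 0, -1, 1, -1]

selected = [j for j, s in enumerate(solution) if s == 1]   # informative features
hidden = [j for j, s in enumerate(solution) if s == 0]     # present but not selected
absent = [j for j, s in enumerate(solution) if s == -1]    # not in the original document

print("selected feature positions:", selected)  # e.g. [1, 2, 5, 8]
print("hidden feature positions:  ", hidden)
print("absent feature positions:  ", absent)
```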

Fig. 3. The third stage of the proposed method.

Table 1
Solution representation of the feature selection technique.

X:  0  1  1  -1  -1  1  0  -1  1  -1

3.2.2. Solution representation
In the PSO algorithm for the selection problem, each solution (particle) denotes a subset of document features, as shown in the solution presented in Table 1. The swarm of the PSO includes a collection of particles (positions), which are represented as binary vectors (rows); each particle includes some positions (features). Each position represents one feature in the document. The jth position in the particle gives the situation of the jth feature [5].

We applied the feature selection method based on the PSO algorithm, which begins with random solutions and improves the population to reach the optimal global solution [6,4]. The optimal solution designs a new subset of the document features. Each unique feature in the given dataset is considered as one search space. Table 1 presents the solution representation of the feature selection technique.

If position j is equal to 1, the jth feature is chosen as an informative feature; if position j is equal to 0, the jth feature is not chosen as an informative feature. If position j is equal to -1, it means that the jth feature is not included in the original document.

3.2.3. Fitness function
The fitness function (FF) is an evaluation measure used to evaluate each candidate solution given by the feature selection algorithm as a candidate solution to tackle the feature selection problem. Each generation calculates the fitness function for all candidate solutions. If the quality of a new solution is higher, it replaces the current solution, or vice versa.

The solution which has a high fitness function value is considered the optimal solution given by the PSO algorithm to solve the feature selection of the current document [18]. In this paper, we used the mean absolute difference (MAD) in the PSO algorithm as a fitness function for the feature selection problem based on the standard weighting scheme (TF-IDF). This weighting scheme is used as the objective function for evaluating the solution features or positions [16]. MAD is a measure used in the feature selection domain to assign a relevance score (weightiness) for each text feature by computing the difference from the mean value, i.e., the variation between the values of x_{i,j} and their average, as follows:

MAD(X_i) = (1 / a_i) \sum_{j=1}^{t} |x_{i,j} - \bar{x}_i|,   (3)

where

\bar{x}_i = (1 / a_i) \sum_{j=1}^{t} x_{i,j},   (4)

MAD(X_i) represents the fitness function of solution i, and x_{i,j} is the value of the jth feature in the ith document, which comes from the TF-IDF. a_i is the number of the selected text features in document i, t is the number of all text features, and \bar{x}_i is the mean value of vector i.
code of the PSO is indicated in Algorithm 1.
3.2.3. Fitness function
The fitness function (FF) is an evaluation measure used to evalu- Algorithm 1. Particle swarm optimization
ate each candidate solution given by the feature selection algorithm
1: Input: Generate the initial particles randomly.
as a candidate solution to tackle the feature selection problem. Each 2: Output: Optimal particle and its fitness value.
generation calculates the fitness function for all candidate solu- 3: Algorithm
tions. If the quality of the solution is increased, this solution will be 4: Initialize swarm and parameters of the particle swarm
optimization c1 , c2 and etc.
replaced with the current solution or vice versa.
5: Evaluate all particles using the fitness function by Eq. (3).
The solution, which has a high fitness function value is consid- 6: while Termination criteria do
ered the optimal solution is given by the PSO algorithm to solve 7: Update the velocity
the feature selection of the current document [18]. In this paper, 8 Update each position
we used mean absolute difference (MAD) in the PSO algorithm as 9 Evaluate the fitness function.
10: Replaces the worst particle with best particle.
a fitness function for feature selection problem based on the stan-
11: Update LB and GB.
dard weighting scheme (TF-IDF). This weighing scheme is used as 12: end while
the objective function for evaluating the solution features or posi- 13: Return a new subset of informative features D1 .
tions [16]. MAD is a measure used in the feature selection domain The PSO algorithm generates particles with random positions.
to assign a relevance score (weightiness) for each text feature by Each candidate solution called particle is evaluated by the fitness

function as formulated in Eq. (3). In PSO, the solutions contain some single entities (features). PSO is placed in the search space of a feature selection problem and evaluates the fitness function at its current location. Each solution determines its movement by combining aspects of the historical information according to its current and best fitness. The next iteration selects locations after all solutions are moved. Lastly, the solutions, which are similar to a flock of birds collectively searching for food, will likely be close to an optimal fitness function [14]. The PSO algorithm includes a store of solutions (the particle swarm optimization memory, PSOM), which is filled by generating S random solutions as in matrix (5).

PSOM =
| x_1^1      ...  x_t^1      f(X_1)     |
| x_1^2      ...  x_t^2      f(X_2)     |
| ...        ...  ...        ...        |
| x_1^{S-1}  ...  x_t^{S-1}  f(X_{S-1}) |
| x_1^S      ...  x_t^S      f(X_S)     |     (5)

PSO works based on two main factors to update each particle, namely, the particle position, as shown in Eq. (6), and the velocity, as shown in Eq. (7). The velocity of each particle is updated according to the particle movement effect, and each particle attempts to move to the optimal position [13].

x_{i,j} = x_{i,j} + v_{i,j},   (6)

where

v_{i,j} = w × v_{i,j} + c1 × rand1 × (LB_I - x_{i,j}) + c2 × rand2 × (GB_I - x_{i,j}),   (7)

The value of the inertia weight w often changes based on the iteration in the range of [0, 1]. LB_I is the current best local solution at iteration number I, and GB_I is the current best global solution at iteration number I. rand1 and rand2 are random numbers in the range of [0, 1]; c1 and c2 are usually two constants. The inertia weight is determined by Eq. (8):

w = (w_max - w_min) × (I_max - I) / I_max + w_min,   (8)

where w_max and w_min are the largest and smallest inertia weights, respectively. The values of these weights are constants in the range of (0.5-0.9), as proposed by [8,4].

The proposed algorithm deals with binary optimization problems [13]. Hence, the algorithm is modified to update the solutions' positions by a discrete value for each dimension. Eq. (9) represents the Sigmoid function used to determine the probability of the ith position, and Eq. (10) is used to update the new position. The Sigmoid function values of the updating process are presented in Fig. 4.

s_{i,j} = 1 / (1 + exp(-v_{i,j})),   (9)

x_{i,j} = 1 if rand < s_{i,j}, and 0 otherwise,   (10)

where rand is a random number between [0, 1], x_{i,j} represents the value of position j, and v_{i,j} denotes the velocity of particle i at position j, j = 1, 2, . . ., t.
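The following is a compact sketch of one velocity/position update in this binary PSO (illustrative Python built on the equations above; parameter values such as c1 = c2 = 2 and the inertia-weight bounds are assumptions for the example, not values reported by the paper):

```python
import math
import random

def inertia_weight(iteration, max_iter, w_min=0.5, w_max=0.9):
    # Eq. (8): linearly decreasing inertia weight.
    return (w_max - w_min) * (max_iter - iteration) / max_iter + w_min

def update_particle(position, velocity, local_best, global_best,
                    iteration, max_iter, c1=2.0, c2=2.0):
    """One binary PSO step: Eq. (7) velocity update, Eq. (9) sigmoid, Eq. (10) position."""
    w = inertia_weight(iteration, max_iter)
    new_velocity, new_position = [], []
    for j in range(len(position)):
        v = (w * velocity[j]
             + c1 * random.random() * (local_best[j] - position[j])
             + c2 * random.random() * (global_best[j] - position[j]))   # Eq. (7)
        s = 1.0 / (1.0 + math.exp(-v))                                  # Eq. (9)
        new_position.append(1 if random.random() < s else 0)            # Eq. (10)
        new_velocity.append(v)
    return new_position, new_velocity

# Toy usage: one particle over 6 features.
pos, vel = [0, 1, 0, 1, 1, 0], [0.0] * 6
lb, gb = [1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 1]
print(update_particle(pos, vel, lb, gb, iteration=10, max_iter=500))
```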
Table 2
Terms frequency of the original features of the documents (D).

Documents \ Features   1  2  3  4  5  6  7  8  9  10
1                      0  1  0  5  1  0  2  0  0  0
2                      1  0  2  1  5  0  1  0  0  0
3                      3  1  0  2  1  4  1  0  1  1
4                      2  5  4  1  0  0  0  1  0  0
5                      0  1  0  1  0  2  0  3  0  1

Table 3
The removed uninformative features in D.

Documents \ Features   1  2  3  4  5  6  7  8  9  10
1                      0  0  0  5  1  0  0  0  0  0
2                      1  0  2  1  0  0  1  0  0  0
3                      3  1  0  2  1  0  1  0  0  1
4                      2  5  0  1  0  0  0  1  0  0
5                      0  1  0  1  0  2  0  0  0  1

Table 4
The new subset of informative features (D1).

Documents \ Features   1  2  3  4  5  6  7  8  9
1                      0  0  0  5  1  0  0  0  0
2                      1  0  2  1  0  0  1  0  0
3                      3  1  0  2  1  0  1  0  1
4                      2  5  0  1  0  0  0  1  0
5                      0  1  0  1  0  2  0  0  1

An example of the feature selection technique is shown in Tables 2-4. Table 2 shows the terms frequency of the original features of the documents (D); the features selected as uninformative are the entries that are removed in Table 3. After the feature selection technique is applied, the uninformative features are removed, as shown in Table 3. Finally, the dimension size can be reduced if a feature does not appear in any document (e.g., feature number 9 does not appear in any document) and thus it is removed, as shown in Table 4. Thus, a new subset of informative features (D1) is obtained to improve the performance of the text clustering technique, as shown in Table 4.

3.4. Text clustering technique

This section presents the steps of the k-mean text document clustering system after getting a new subset of essential and informative features using the PSO algorithm.

3.4.1. Mathematical model of the text document clustering problem
The text clustering method is defined as follows: given D a huge set of text documents D = d1, d2, . . ., dj, . . ., dn, where n represents the number of documents in the given document collection and d1 represents the record (document) number 1, Cos(di, cj) is an objective function to maximise the cosine similarity measure between the document number i and the cluster centroid number j [12,11]. Cosine similarity is the common evaluation measure used in text mining, particularly in the text clustering domain, to evaluate the effectiveness of the text clustering technique.

3.4.2. Compute clusters centroids
To cluster a huge set of text documents into a subset of coherent clusters, each cluster is represented by one centroid, which requires updating in each iteration using Eq. (11). Each document is assigned to the most similar cluster based on its similarity with the clusters' centroids. Ck is the centroid of cluster k, which is represented as a vector Ck = (c_{k1}, c_{k2}, . . ., c_{kj}, . . ., c_{kt}); c_{kj} is the centroid of cluster j and t is the

Fig. 4. Sigmoid function used in PSO algorithm.

length of the cluster centroid [12]. The following equation is used to calculate the cluster centroid:

c_{kj} = ( \sum_{i=1}^{n} (a_{ki}) d_i ) / ( \sum_{j=1}^{r_i} a_{kj} ),   (11)

where d_i denotes the document number i that belongs to cluster centroid number i, a_{kj} is the total number of text documents that belong to cluster j, and r_i is the number of text documents in cluster i [12].

3.4.3. Similarity measure
Cosine is the standard measure used in the document clustering technique to compute the similarity score between two vectors (i.e., a document and a cluster centroid), where d1 is document number 1 and d2 is the cluster centroid. The following equation is used to calculate the similarity value:

Cos(d1, d2) = ( \sum_{j=1}^{t} w(t_j, d1) × w(t_j, d2) ) / ( \sqrt{ \sum_{j=1}^{t} w(t_j, d1)^2 } \sqrt{ \sum_{j=1}^{t} w(t_j, d2)^2 } ),   (12)

where w(t_j, d1) is the weight of term j in document number 1, \sum_{j=1}^{t} w(t_j, d1)^2 is the summation of the squared term scores for document number 1 from {j = 1 to t}, and \sum_{j=1}^{t} w(t_j, d2)^2 is the summation of the squared term scores for document number 2 from {j = 1 to t}, where d2 denotes the cluster centroid [33,11].

3.4.4. K-mean algorithm
The k-mean algorithm was introduced in 1967 as a local search technique [11]. It is a suitable approach used in the domain of text mining to group a set of documents into groups [6]. The K-mean clustering algorithm is considered a proper algorithm for choosing the initial clusters' centroids.

The k-mean clustering algorithm is used to partition D, a set of text documents D = (d1, d2, d3, . . ., dn), into K, a set of coherent clusters. It uses the maximum similarity score for assigning each text document to the most similar cluster centroid by Eq. (12). This algorithm uses X as a data matrix (n × K), where n is the number of all documents and K is the number of clusters. Each text document is shown as a vector of weights d_i = (w_{i1}, w_{i2}, . . ., w_{ij}, . . ., w_{it}), where t is the number of all text features in D; the k-mean clustering algorithm searches for the optimal (n × K) assignment [12,11]. This procedure is presented in Algorithm 2.

Table 5
Description of the datasets.

Dataset  Source          # of documents  # of features  # of clusters
DS1      Reuters-21578   200             2935           4
DS2      20Newsgroups    100             3263           5
DS3      Reuters-21578   100             2063           8
DS4      20Newsgroups    200             5773           10
DS5      20Newsgroups    1000            16,471         10
DS6      20Newsgroups    10,000          19,480         20

Algorithm 2. K-mean clustering algorithm [11]
1: Input: A collection of text documents D, and K is the number of all clusters.
2: Output: Assign D to K.
3: Termination criteria
4: Randomly choose K documents as clusters centroids C = (c1, c2, . . ., cK).
5: Initialize matrix X as zeros.
6: for all d in D do
7:   let j = argmax_{k in {1 to K}}, based on Cos(di, ck).
8:   Assign di to the cluster j, A[i][j] = 1.
9: end for
10: Update the clusters' centroids using Eq. (11).
11: End

4. Experimental results

We implemented the PSO algorithm for the text feature selection to find a new subset of more informative text features. We then implemented the k-mean clustering algorithm for text clustering using Matlab (version 7.10.0) with different capability and RAM. This section provides the details of the given datasets, the evaluation criteria, the experiments' results and the discussion.

4.1. Benchmark datasets

Table 5 shows six standard text datasets used to test the effectiveness of the proposed method (i.e., feature selection using the particle swarm optimization algorithm to improve the text clustering (FSPSOTC)) and to compare the proposed method (FSPSOTC) with similar algorithms in the domain of feature selection, such as: (i) feature selection using the genetic algorithm to improve the text clustering (FSGATC) [6], (ii) feature selection using the harmony search algorithm to improve the text clustering (FSHSTC) [5], and (iii) K-mean text clustering without any feature selection algorithm [11].
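To make the clustering stage of Section 3.4 concrete before turning to the results, here is a minimal sketch (illustrative Python, not the authors' Matlab implementation) of one k-mean pass in the spirit of Algorithm 2: cosine similarity as in Eq. (12), assignment to the most similar centroid, and a simple mean-based centroid update standing in for Eq. (11).

```python
import math
import random

def cosine(a, b):
    # Eq. (12): cosine similarity between a document vector and a centroid.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def kmean_text(docs, k, iterations=100):
    """Assign TF-IDF document vectors to k clusters by maximum cosine similarity."""
    centroids = [list(d) for d in random.sample(docs, k)]  # step 4: random initial centroids
    assignment = [0] * len(docs)
    for _ in range(iterations):
        # Steps 6-9: assign every document to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs]
        # Step 10: recompute each centroid as the mean of its member documents.
        for c in range(k):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return assignment, centroids

# Toy usage with 4 tiny document vectors and k = 2 clusters.
vectors = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.0], [0.0, 0.1, 1.0], [0.1, 0.0, 0.8]]
labels, _ = kmean_text(vectors, k=2, iterations=10)
print(labels)
```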

Text clustering benchmark standard datasets are available at the Laboratory of Computational Intelligence (LABIC)1 in numerical form after extracting the terms. The first dataset (DS1), called Reuters-21578, contains 200 random documents and 2935 text features that relate to four groups. The second dataset (DS2), called 20Newsgroups, provides 100 random documents and 3263 text features that relate to five groups. The third dataset (DS3), called Reuters-21578, contains 100 documents and 2063 text features that relate to eight groups. The fourth dataset (DS4), called 20Newsgroups, provides 200 random documents and 5773 text features that relate to ten groups. The fifth dataset (DS5), called 20Newsgroups, provides 1000 random documents and 16,471 text features that relate to ten groups. The sixth dataset (DS6), called 20Newsgroups, provides 10,000 random documents and 19,480 text features that relate to twenty groups.

1 http://sites.labic.icmc.usp.br/text collections/.

4.2. Evaluation criteria

The comparative evaluations were conducted using one internal assessment measure, the similarity measure, and four external evaluation measures: Accuracy (Ac), Precision (P), Recall (R) and F-measure (F). These measurements are standard evaluation criteria used in the domain of text clustering to evaluate the clusters' accuracy [16,11].

4.2.1. Precision, Recall and F-measure
The F-measure (F) is a standard measurement utilised in the domain of text document clustering. It works to measure the percentage of the matched clusters and depends on two measurements: Precision (P) and Recall (R). Precision and recall are standard measures used in the area of text mining. They are combined to calculate the F-measure value for cluster j and class i [5,6].

P(i, j) = n_{i,j} / n_j,   (13)

R(i, j) = n_{i,j} / n_i,   (14)

where n_{i,j} is the number of members of class i in cluster j, n_j is the number of members of cluster j, and n_i is the number of members of class i.

F(i, j) = (2 × P(i, j) × R(i, j)) / (P(i, j) + R(i, j)),   (15)

where P(i, j) is the precision of members of class i in cluster j, and R(i, j) is the recall of members of class i in cluster j. The F-measure value for all clusters is calculated by the following equation:

F = \sum_{j} (n_j / n) max_i {F(i, j)},   (16)

4.2.2. Accuracy
Accuracy (AC) is an external measurement commonly used to compute the percentage of correctly assigned documents to each cluster according to the following equation [5,6]:

AC = (1 / n) \sum_{i=1}^{K} P(i, j),   (17)

where P(i, j) is the precision value for class i in cluster j, n is the number of all documents in each cluster, and K is the number of all clusters.
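A brief sketch of these external measures (illustrative Python; the contingency counts below are hypothetical, and Eq. (16) is implemented using the per-cluster F(i, j) values as described above):

```python
def precision(n_ij, n_j):          # Eq. (13)
    return n_ij / n_j

def recall(n_ij, n_i):             # Eq. (14)
    return n_ij / n_i

def f_measure(p, r):               # Eq. (15)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def overall_f(counts):
    """Eq. (16): counts[i][j] = number of documents of class i placed in cluster j."""
    n = sum(sum(row) for row in counts)
    cluster_sizes = [sum(counts[i][j] for i in range(len(counts))) for j in range(len(counts[0]))]
    class_sizes = [sum(row) for row in counts]
    total = 0.0
    for j, n_j in enumerate(cluster_sizes):
        best = max(f_measure(precision(counts[i][j], n_j), recall(counts[i][j], class_sizes[i]))
                   for i in range(len(counts)))
        total += (n_j / n) * best
    return total

# Hypothetical 2-class / 2-cluster contingency table.
counts = [[8, 2],   # class 0: 8 documents in cluster 0, 2 in cluster 1
          [3, 7]]   # class 1
print(round(overall_f(counts), 4))
```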
Table 6
Algorithm effectiveness based on clusters quality using the K-mean algorithm.

Dataset  Measure     K-mean [11]  FSHSTC [6]  FSGATC [5]  FSPSOTC
DS1      Accuracy    0.5565       0.5365      0.5955      0.5845
         Precision   0.5201       0.5274      0.5690      0.5754
         Recall      0.5077       0.5046      0.5681      0.5518
         F-measure   0.5244       0.5011      0.5679      0.5690
         Rank        3            4           2           1
DS2      Accuracy    0.3520       0.3595      0.4070      0.4040
         Precision   0.2852       0.3166      0.3346      0.3551
         Recall      0.2718       0.3159      0.3446      0.3595
         F-measure   0.3057       0.3150      0.3386      0.3559
         Rank        4            3           2           1
DS3      Accuracy    0.5070       0.5025      0.4705      0.5170
         Precision   0.4721       0.4611      0.4262      0.4768
         Recall      0.4709       0.4644      0.4261      0.4758
         F-measure   0.4751       0.4610      0.4262      0.4844
         Rank        3            2           4           1
DS4      Accuracy    0.2707       0.2692      0.2762      0.2862
         Precision   0.2422       0.2502      0.2578      0.2581
         Recall      0.2514       0.2499      0.2479      0.2626
         F-measure   0.2349       0.2491      0.2526      0.2707
         Rank        4            3           2           1
DS5      Accuracy    0.4247       0.4290      0.4347      0.4495
         Precision   0.4015       0.4197      0.4094      0.4247
         Recall      0.3974       0.4091      0.4259      0.4263
         F-measure   0.4011       0.4117      0.4210      0.4298
         Rank        4            3           2           1
DS6      Accuracy    0.3548       0.3551      0.3611      0.3678
         Precision   0.3254       0.3290      0.3510      0.3689
         Recall      0.3625       0.3564      0.3325      0.3452
         F-measure   0.3421       0.3486      0.3489      0.3565
         Rank        4            3           2           1
Mean rank            3.66         3.00        2.33        1.00
Final rank           4            3           2           1

4.3. Results and discussion

We applied the k-mean clustering algorithm to examine the influence of the feature selection algorithms on text clustering. The feature selection technique is used to improve the effectiveness of the text clustering algorithm by using a new subset which contains the informative features as input to the k-mean clustering algorithm. The proposed FSPSOTC is compared with other published feature selection algorithms in the domain of the feature selection problem.

For fair comparisons, the tests were replicated over ten runs. This number is selected based on previous studies in the text clustering domain and is sufficient to evaluate the proposed method [2]. The PSO is a global search algorithm that runs 500 iterations in each run [4]. Five hundred iterations are enough for the convergence of the global search algorithm. The k-mean is a local search algorithm that runs 100 iterations in each run [11]. We experimentally noted that 100 iterations are enough for the convergence of the local search clustering algorithm [5,6].

Table 6 shows the effectiveness of the k-mean text document clustering based on the PSO feature selection algorithm. The proposed feature selection method using the PSO algorithm developed an effective text clustering in almost all given datasets (i.e., DS1, DS2, DS3, DS4, DS5, and DS6) based on the evaluation criteria of the document clustering method. According to the accuracy measure, the proposed FSPSOTC obtained the best results in four out of six datasets (i.e., DS3, DS4, DS5, and DS6), followed by FSGATC with the best results in two out of six datasets (i.e., DS1 and DS2). According to the precision measure, the proposed FSPSOTC obtained the best results in all datasets (i.e., DS1, DS2, DS3, DS4, DS5, and DS6). Moreover, according to the recall measure, the proposed FSPSOTC

obtained the best results in five out of six datasets (i.e., DS2, DS3, DS4, DS5, and DS6), followed by the FSGATC with the best results in one out of six datasets (i.e., DS1). Finally, according to the F-measure, which is the most important measure used in the domain of the text clustering, the proposed FSPSOTC obtained the best results in all datasets (i.e., DS1, DS2, DS3, DS4, DS5, and DS6).

The statistical analysis is performed based on the F-measure evaluation criteria. The average rankings of the feature selection algorithms are reported in Table 6. The proposed feature selection method that uses the particle swarm optimisation algorithm to improve the text clustering (FSPSOTC) is ranked the highest, followed by the feature selection method that uses the genetic algorithm to improve the text clustering (FSGATC), the feature selection method that uses the harmony search algorithm to improve the text clustering (FSHSTC), and the k-mean text clustering without any feature selection algorithm.

This section shows the experimental results to investigate the effectiveness of the proposed feature selection algorithm (FSPSOTC) and to compare it with the performance of the similar algorithms in the domain of text clustering. Table 6 shows that the FSPSOTC obtained the best performance according to all evaluation measures compared with the similar algorithms and the k-mean clustering algorithm. The FSPSOTC worked best on all six datasets. The PSO algorithm finds a good balance between exploitation and exploration search, which improves its performance.

The proposed feature selection method is used to select an optimal subset of informative text features for improving document clustering by obtaining coherent clusters. The proposed FSPSOTC performs very well in enhancing the text clustering technique, and it reduced the number of uninformative features. The proposed FSPSOTC overcomes the shortcomings of similar methods when dealing with a vast collection of text documents with multiple feature spaces and sparse features. Lastly, the FSPSOTC was the best overall method and ranked 1 according to the average F-measure. The FSGATC was the second best method with rank 2. The FSHSTC was ranked 3. The pure k-mean clustering without any feature selection algorithm was the worst method with rank 4.

Table 6 shows the effectiveness of the k-mean text document clustering based on using the feature selection algorithms, which is evaluated based on the clusters' quality using four evaluation criteria and six standard text benchmark datasets. It is evident that the proposed FSPSOTC performed very well and overwhelmed the other comparative methods (i.e., FSGATC and FSHSTC). The proposed FSPSOTC recorded the better effectiveness based on accuracy, precision, recall and F-measure as external measurements over almost all datasets.

Table 7
Feature reduction ratio for each algorithm along with the given datasets.

Dataset                   Original size  FSHSTC [6]  FSGATC [5]  FSPSOTC
DS1   Dimension           2935           738         805         790
      Reduction ratio                    74%         72%         73%
DS2   Dimension           3263           328         382         410
      Reduction ratio                    89%         88%         87%
DS3   Dimension           2063           469         547         416
      Reduction ratio                    77%         73%         79%
DS4   Dimension           5773           779         869         1016
      Reduction ratio                    86%         84%         82%
DS5   Dimension           16,471         3474        3401        3314
      Reduction ratio                    78%         79%         79%
DS6   Dimension           19,480         4415        4615        4299
      Reduction ratio                    77%         76%         77%

Table 7 shows the reduction ratio obtained by each feature selection algorithm. The proposed FSPSOTC achieved good results, where the dimension was reduced from 2935 to 790 in DS1, which indicated a reduction of 73%; from 3263 to 410 in DS2, with a reduction of 87%; from 2063 to 416 in DS3, with a reduction of 79%; from 5773 to 1016 in DS4, with a reduction of 82%; from 16,471 to 3314 in DS5, which indicated a reduction of 79%; and, finally, from 19,480 to 4299 in DS6, which showed a reduction of 77%. Hence, the proposed FSPSOTC recorded the better results regarding the effectiveness of text clustering and the dimension space (compared to FSGATC and FSHSTC).

We compared the convergence properties of the proposed PSO algorithm with the other comparative algorithms (i.e., GA and HS) to solve the feature selection problem. The convergence behaviour is shown in Fig. 5. The convergence values of the proposed algorithm (i.e., PSO for feature selection) are the optimal values obtained through ten runs to solve the feature selection problem, with each run consisting of 500 iterations. It is evident that the convergence behaviour of the proposed PSO is fast and it obtains the optimal results for almost all six text datasets. Moreover, FSHSTC converges slowly in comparison with the comparative methods. It is evident that the FSPSOTC overcomes the other comparative methods between iteration 150 and iteration 400. It shows a much smoother convergence curve in comparison with the other comparative methods.

Table 8
Computational time of the feature selection algorithms.

Dataset  Algorithm  Time (in seconds)  Ranking
DS1      FSGATC     3850.248           2
         FSHSTC     3921.649           3
         FSPSOTC    3704.461           1
DS2      FSGATC     1950.658           2
         FSHSTC     1994.414           3
         FSPSOTC    1725.211           1
DS3      FSGATC     2201.685           1
         FSHSTC     2314.546           2
         FSPSOTC    2353.242           3
DS4      FSGATC     4020.231           2
         FSHSTC     3920.001           1
         FSPSOTC    4150.321           3
DS5      FSGATC     8456.392           2
         FSHSTC     9853.361           3
         FSPSOTC    8260.211           1
DS6      FSGATC     16,450.461         3
         FSHSTC     15,430.240         2
         FSPSOTC    14,563.114         1

Algorithm  Mean rank  Final ranking
FSGATC     2.0        2
FSHSTC     2.3        3
FSPSOTC    1.6        1

Finally, Table 8 shows the computational time recorded with the various metaheuristic optimisation algorithms. Notably, the lowest ranked feature selection algorithm is considered the best based on the computational time (execution time). The proposed FSPSOTC recorded the lowest overall computational time in comparison with the other optimisation algorithms. Nevertheless, FSPSOTC is also useful in enhancing the performance of the text clustering, as shown in Table 6. A careful validation of the execution time analysis shows that the proposed feature selection algorithm using PSO (FSPSOTC) records a competitive time, whereas the variants based on the other optimisation algorithms (i.e., GA and HS) are comparatively high in execution time. Lastly, the FSPSOTC recorded the best computational time based on the final ranking with rank 1. The FSGATC recorded the second best computational time based on

Fig. 5. Convergence behaviour of feature selection algorithms using six datasets.

the final ranking with rank 2. The FSHSTC recorded the third best computational time and was ranked third.

5. Conclusion

This paper proposed a new method to solve the text feature selection problem using an unsupervised learning algorithm. Particle swarm optimization (PSO) is used as a feature selection algorithm. This algorithm uses the term frequency-inverse document frequency (TF-IDF) as an objective function to evaluate each text feature at the level of the document. The proposed method (FSPSOTC) takes the original dataset and obtains a new optimal subset of informative features. The new subset of informative features is entered into the k-mean text clustering algorithm to investigate the feature selection method according to the cluster accuracy.

For the evaluation process, experiments were carried out on six text document datasets and compared with well-known published methods in the domain of the feature selection. The proposed method (FSPSOTC) overwhelms all other comparative algorithms (i.e., harmony search (FSHSTC) and genetic algorithm (FSGATC)) as measured by precision, recall, F-measure, and accuracy. Regarding the effectiveness, the proposed selection method enhanced the text document clustering results by assisting the k-mean text clustering to make more similar groups. Regarding the feature size, FSPSOTC reduced the feature size of the original text datasets.

For future work, the hybrid particle swarm optimization algorithm with another component can be applied to improve the local exploitation and global exploration search abilities of the algorithm to obtain more informative features [34]. Furthermore, a new metaheuristic algorithm (i.e., the krill herd algorithm) can be applied instead

of the k-mean technique to solve the text document clustering problem.

Acknowledgement

The authors would like to thank the editors and reviewers for their helpful comments. This paper is an extended version of a paper presented at COMPSE 2016, Golden Sands Resort, on 11-12 November 2016.

References

[1] S. Fouchal, M. Ahat, S.B. Amor, I. Lavallée, M. Bui, Competitive clustering algorithms based on ultrametric properties, J. Comput. Sci. 4 (4) (2013) 219–231.
[2] K.K. Bharti, P.K. Singh, Hybrid dimension reduction by integrating feature selection with feature extraction method for text clustering, Expert Syst. Appl. 42 (6) (2015) 3105–3114.
[3] L.M. Abualigah, A.T. Khader, M.A. Al-Betar, O.A. Alomari, Text feature selection with a robust weight scheme and dynamic dimension reduction to text document clustering, Expert Syst. Appl. (2017).
[4] K.K. Bharti, P.K. Singh, Opposition chaotic fitness mutation based adaptive inertia weight BPSO for feature selection in text clustering, Appl. Soft Comput. 43 (2016) 20–34.
[5] L.M. Abualigah, A.T. Khader, M.A. Al-Betar, Unsupervised feature selection technique based on harmony search algorithm for improving the text clustering, in: 2016 7th International Conference on Computer Science and Information Technology (CSIT), 2016, pp. 1–6, http://dx.doi.org/10.1109/CSIT.2016.7549456.
[6] L.M. Abualigah, A.T. Khader, M.A. Al-Betar, Unsupervised feature selection technique based on genetic algorithm for improving the text clustering, in: 2016 7th International Conference on Computer Science and Information Technology (CSIT), 2016, pp. 1–6, http://dx.doi.org/10.1109/CSIT.2016.7549453.
[7] Y. Jafer, S. Matwin, M. Sokolova, Privacy-aware filter-based feature selection, in: 2014 IEEE International Conference on Big Data (Big Data), IEEE, 2014, pp. 1–5.
[8] L.M. Abualigah, A.T. Khader, M.A. AlBetar, E.S. Hanandeh, Unsupervised text feature selection technique based on particle swarm optimization algorithm for improving the text clustering, 2017.
[9] H. Shi, Y. Li, Y. Han, Q. Hu, Cluster structure preserving unsupervised feature selection for multi-view tasks, Neurocomputing 175 (2016) 686–697.
[10] M. Luo, F. Nie, X. Chang, Y. Yang, A.G. Hauptmann, Q. Zheng, Adaptive unsupervised feature selection with structure regularization, IEEE Trans. Neural Netw. Learn. Syst. (2017).
[11] L.M. Abualigah, A.T. Khader, M.A. Al-Betar, Multi-objectives-based text clustering technique using k-mean algorithm, in: 2016 7th International Conference on Computer Science and Information Technology (CSIT), IEEE, 2016, pp. 1–6.
[12] R. Forsati, M. Mahdavi, M. Shamsfard, M.R. Meybodi, Efficient stochastic algorithms for document clustering, Inf. Sci. 220 (2013) 269–291.
[13] L.M. Abualigah, A.T. Khader, Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering, J. Supercomput. (2017) 1–23.
[14] Y. Lu, M. Liang, Z. Ye, L. Cao, Improved particle swarm optimization algorithm and its application in text feature selection, Appl. Soft Comput. 35 (2015) 629–636.
[15] S. Alelyani, J. Tang, H. Liu, Feature selection for clustering: a review, Data Clust. Algorithms Appl. 29 (2013) 110–121.
[16] K.K. Bharti, P.K. Singh, A three-stage unsupervised dimension reduction method for text clustering, J. Comput. Sci. 5 (2) (2014) 156–169.
[17] J.R. Vergara, P.A. Estévez, A review of feature selection methods based on mutual information, Neural Comput. Appl. 24 (1) (2014) 175–186.
[18] H. Uğuz, A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm, Knowl. Based Syst. 24 (7) (2011) 1024–1032.
[19] W. Shang, H. Huang, H. Zhu, Y. Lin, Y. Qu, Z. Wang, A novel feature selection algorithm for text categorization, Expert Syst. Appl. 33 (1) (2007) 1–5.
[20] Y. Wang, Y. Liu, L. Feng, X. Zhu, Novel feature selection method based on harmony search for email classification, Knowl. Based Syst. 73 (2015) 311–323.
[21] Z. Li, J. Liu, Y. Yang, X. Zhou, H. Lu, Clustering-guided sparse structural learning for unsupervised feature selection, IEEE Trans. Knowl. Data Eng. 26 (9) (2014) 2138–2150.
[22] L.M. Abualigah, A.T. Khader, M.A. Al-Betar, M.A. Awadallah, A krill herd algorithm for efficient text documents clustering, in: 2016 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), IEEE, 2016, pp. 67–72.
[23] S. Tabakhi, P. Moradi, F. Akhlaghian, An unsupervised feature selection algorithm based on ant colony optimization, Eng. Appl. Artif. Intell. 32 (2014) 112–123.
[24] A.L. Bolaji, M.A. Al-Betar, M.A. Awadallah, A.T. Khader, L.M. Abualigah, A comprehensive review: Krill Herd algorithm (KH) and its applications, Appl. Soft Comput. 49 (2016) 437–446.
[25] R. Eberhart, J. Kennedy, A new optimizer using particle swarm theory, in: Proceedings of the Sixth International Symposium on Micro Machine and Human Science, 1995. MHS'95, IEEE, 1995, pp. 39–43.
[26] S.-W. Lin, K.-C. Ying, S.-C. Chen, Z.-J. Lee, Particle swarm optimization for parameter determination and feature selection of support vector machines, Expert Syst. Appl. 35 (4) (2008) 1817–1824.
[27] Y. Liu, G. Wang, H. Chen, H. Dong, X. Zhu, S. Wang, An improved particle swarm optimization for feature selection, J. Bionic Eng. 8 (2) (2011) 191–200.
[28] M.H. Aghdam, S. Heidari, Feature selection using particle swarm optimization in text categorization, J. Artif. Intell. Soft Comput. Res. 5 (4) (2015) 231–238.
[29] S.-S. Hong, W. Lee, M.-M. Han, The feature selection method based on genetic algorithm for efficient of text clustering and text classification, Int. J. Adv. Soft Comput. Appl. 7 (1) (2015).
[30] P. Shamsinejadbabki, M. Saraee, A new unsupervised feature selection method for text clustering based on genetic algorithms, J. Intell. Inf. Syst. 38 (3) (2012) 669–684.
[31] Y. Lu, M. Liang, Z. Ye, L. Cao, Improved particle swarm optimization algorithm and its application in text feature selection, Appl. Soft Comput. 35 (2015) 629–636.
[32] K.-C. Lin, K.-Y. Zhang, Y.-H. Huang, J.C. Hung, N. Yen, Feature selection based on an improved cat swarm optimization algorithm for big data classification, J. Supercomput. 72 (8) (2016) 3210–3221.
[33] L.M.Q. Abualigah, E.S. Hanandeh, Applying genetic algorithms to information retrieval using vector space model, Int. J. Comput. Sci. Eng. Appl. 5 (1) (2015) 19.
[34] L.M. Abualigah, A.T. Khader, M.A. AlBetar, E.S. Hanandeh, A new hybridization strategy for krill herd algorithm and harmony search algorithm applied to improve the data clustering, 2017.

Laith Mohammad Abualigah is currently pursuing his Ph.D. in Computer Science at the School of Computer Sciences, Universiti Sains Malaysia. He received a bachelor degree in Computer Information Systems and a master degree in Computer Science from Al-Albayt University in 2010 and 2014, respectively. His research interests include information retrieval, feature selection, data mining, and text mining.

Ahamad Tajudin Khader obtained his B.Sc. and M.Sc. degrees in Mathematics from the University of Ohio, USA in 1982 and 1983, respectively. He received his PhD in Computer Science from the University of Strathclyde, UK in 1993. He is currently working as a professor and Dean in the School of Computer Sciences, Universiti Sains Malaysia. He has authored and co-authored hundreds of high quality papers in international journals, conferences, and books. He is a member of the Institute of Electrical and Electronics Engineers (IEEE) as an international professor, the Association of Computing Machinery (ACM), the Special Interest Group for Genetic and Evolutionary Computation (SIGEVO), and the Computational Intelligence Society. His research interests mainly focus on optimization and scheduling.

Essam Said Hanandeh is an assistant professor in the Department of Computer Science at Zarqa University, Zarqa, Jordan. He received his B.Sc. degree in 1990, earned a master degree (M.Sc.) in IT, and received a Ph.D. in CIS in 2008; he joined Zarqa University in Jordan in 2008. He worked for 15 years as a Programmer & System Analyst. He has published more than six research papers in international journals and conferences.