Feed-Forward Neural Network Optimized by Hybridization of PSO and ABC For Abnormal Brain Detection
ABSTRACT: Automated and accurate classification of MR brain images is of crucial importance for medical analysis and interpretation. We proposed a novel automatic classification system based on particle swarm optimization (PSO) and artificial bee colony (ABC), with the aim of distinguishing abnormal brains from normal brains in MRI scanning. The proposed method used stationary wavelet transform (SWT) to extract features from MR brain images. SWT is translation-invariant and performed well even when the image suffered from slight translation. Next, principal component analysis (PCA) was harnessed to reduce the SWT coefficients. Based on three different hybridization methods of PSO and ABC, we proposed three new variants of feed-forward neural network (FNN): IABAP-FNN, ABC-SPSO-FNN, and HPA-FNN. The 10 runs of K-fold cross validation showed the proposed HPA-FNN was superior not only to the other two proposed classifiers but also to existing state-of-the-art methods in terms of classification accuracy. In addition, the method achieved perfect classification on Dataset-66 and Dataset-160. For Dataset-255, the 10 repetitions achieved average sensitivity of 99.37%, average specificity of 100.00%, average precision of 100.00%, and average accuracy of 99.45%. The offline learning cost 219.077 s for Dataset-255, and merely 0.016 s for online prediction. Thus, the proposed SWT + PCA + HPA-FNN method excelled existing methods and can be applied in practice. © 2015 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 25, 153–164, 2015; Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/ima.22132

Key words: particle swarm optimization; artificial bee colony; hybridization; magnetic resonance imaging; feed-forward neural network; stationary wavelet transform; principal component analysis; pattern recognition; classification

I. INTRODUCTION
Correspondence to: Yudong Zhang and Sidan Du; e-mail: zhangyudong@njnu.edu.cn
Grant sponsors: NSFC (610011024, 61273243, 51407095); Program of Natural Science Research of Jiangsu Higher Education Institutions (13KJB460011, 14KJB520021); Jiangsu Key Laboratory of 3D Printing Equipment and Manufacturing (BM2013006); Key Supporting Science and Technology Program (Industry) of Jiangsu Province (BE2012201, BE2014009-3, BE2013012-2); Special Funds for Scientific and Technological Achievement Transformation Project in Jiangsu Province (BA2013058); Nanjing Normal University Research Foundation for Talented Scholars (2013119XGQ0061, 2014119XGQ0080).

Magnetic resonance imaging (MRI) is a low-risk, fast, noninvasive imaging technique that produces high-quality images of the anatomical structures of the human body, especially the brain, and provides rich information for clinical diagnosis and biomedical research (Goh et al., 2014). Soft tissue structures are clearer and more detailed with MRI than with other imaging modalities (Zhang et al., 2013). Numerous studies have been carried out, trying not only to improve magnetic resonance (MR) image quality (Dong et al., 2014), but also to seek novel methods for easier and quicker pre-clinical diagnosis from MR
Where

w(t | f_s, f_t) = \frac{1}{\sqrt{f_s}} w\left(\frac{t - f_t}{f_s}\right)   (2)

Here, the wavelet w(t | f_s, f_t) is calculated from the mother wavelet w(t) by translation and dilation: f_s is the scale factor, f_t the translation factor (both real positive numbers), and C the coefficients of the WT.
There are several different kinds of wavelets which have gained pop-
ularity throughout the development of wavelet analysis.
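As a quick illustration of the dilation-and-translation rule in Eq. (2), the following pure-Python sketch evaluates a daughter wavelet from its mother wavelet; the Haar mother wavelet is used here purely as an assumed example, not as the wavelet chosen in this paper.

```python
import math

def haar(t):
    """Haar mother wavelet w(t): +1 on [0, 0.5), -1 on [0.5, 1), else 0."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def daughter(t, fs, ft, mother=haar):
    """Eq. (2): w(t | fs, ft) = (1 / sqrt(fs)) * w((t - ft) / fs)."""
    return mother((t - ft) / fs) / math.sqrt(fs)

# With fs = 2, ft = 1 the mother wavelet is stretched by a factor of 2,
# shifted right by 1, and scaled in amplitude by 1/sqrt(2).
print(daughter(1.5, 2.0, 1.0))  # ≈ 0.7071, inside the positive lobe
```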
Equation (1) can be discretized by restraining f_s and f_t to a discrete lattice (f_s = 2^{f_t} and f_s > 0) to give the DWT, which can be expressed as follows.
L(n | f_s, f_t) = DS\left[\sum_n x(n) \, l_{f_s}(n - 2^{f_s} f_t)\right]
H(n | f_s, f_t) = DS\left[\sum_n x(n) \, h_{f_s}(n - 2^{f_s} f_t)\right]   (3)

Figure 1. A graphical illustration of the ε-decimated DWT (ε = 10110, controlling the DS operator to reserve odd or even indices). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Here the coefficients L and H refer to the approximation components and the detail components, respectively. The functions l(n) and h(n) denote the low-pass filter and high-pass filter, respectively. The DS operator denotes downsampling.

The above decomposition process can be iterated, with successive approximations being decomposed, so that one signal is broken down into various levels of resolution. The whole process is called a wavelet decomposition tree.

In applying this technique to MR images, the DWT is applied separately to each dimension. As a result, there are four subband (LL, LH, HH, and HL) images at each scale. The subband LL is then used for the next decomposition. As the level of decomposition increases, we obtain more compact yet coarser approximation components. Thus, wavelets provide a simple hierarchical framework for interpreting the image information.

B. ε-decimated DWT and Stationary WT. The DWT is translation-variant, meaning that the DWT of a translated version of a signal x is not the translated version of the DWT of x. Suppose I denotes a given MR image and T the translation operator; then

DWT(T(I)) ≠ T(DWT(I))   (4)

Formula (4) suggests that the features obtained by DWT may change remarkably when the brain MR image is only slightly shifted because of dithering of the subject. In the worst cases, DWT-based classification may even recognize two images from one subject as two from different subjects, when the centers of the images are located at slightly different positions.

How can the translation-invariance property lost by classical DWT be preserved? The ε-decimated DWT was proposed to solve this problem. The DS in classical DWT [Eq. (3)] retains even-indexed elements, which is where the time/spatial-variance problem lies. To address the problem, the DS in ε-decimated DWT chooses predefined indexed elements (odd or even) instead of purely even-indexed elements.

The choice concerns every step of the decomposition process. If we perform all the different possible decompositions of the original signal for a given maximum level J, then we will have 2^J different decompositions (Cherif et al., 2010).

Suppose ε_j = 1 or 0 denotes the choice of odd or even indexed elements at step j. Then, every decomposition is labeled by a sequence of 0s and 1s, namely, ε = ε_1 ε_2 … ε_J. This transform is called the ε-decimated DWT. A graphical example of ε = 10010 is shown in Figure 1.

The SWT calculates all the ε-decimated DWTs for a given signal at one time. More precisely, for level 1, the SWT can be obtained by convolving the signal with the appropriate filters as in the DWT, but without downsampling. The approximation and detail coefficients at level 1 then have the same length as the signal.

The general step j convolves the approximation coefficients at level j−1 with the appropriate filters, but without downsampling, to produce the approximation and detail coefficients at level j. The schematic diagram is shown in Figure 2a. The algorithm of 1D-SWT can be easily extended to the 2D case; Figure 2b shows the schematic diagram of 2D-SWT.

Figure 2. Schematic diagram of SWT. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

C. Feature Reduction. Excessive features increase computation time and storage memory. Furthermore, they sometimes make classification more complicated, which is called the curse of dimensionality. It is therefore required to reduce the number of features. PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components (PC) (Zhang et al., 2014b,c,d). PCA is efficient for reducing the dimension of a data set while retaining most of the variation. It has three effects: it orthogonalizes the components of the input vectors so that they are uncorrelated with each other; it orders the resulting orthogonal components so that those with the largest variation come first; and it eliminates those components contributing the least to the variation in the data set.
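The PCA reduction described in this section can be sketched in pure Python. This is a minimal illustration, not the authors' implementation: only the first principal component is extracted, by power iteration on the sample covariance matrix, and the observations are projected onto it.

```python
import random

def pca_first_component(data, iters=200):
    """Return the first principal component (unit vector) of a list of
    d-dimensional observations, plus the centered data."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by the covariance converges
    # to the eigenvector with the largest eigenvalue, i.e. the direction
    # of largest variation (PCA's ordering property).
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v, centered

random.seed(0)
# Toy data whose variance lies almost entirely along the x-axis.
data = [[float(x), 0.1 * random.random()] for x in range(10)]
pc, centered = pca_first_component(data)
scores = [sum(p * c for p, c in zip(pc, row)) for row in centered]  # 1D features
print(pc)  # nearly all variance lies along x, so pc ≈ (±1, ~0)
```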
the population. Here, N denotes the number of solutions, and f denotes the fitness value.
Step 2 Repeat
Step 6 Normalize the P_i values into [0, 1]
Step 7 Produce the new solutions (new positions) t_i for the onlookers from the solutions x_i, selected depending on P_i, and evaluate them
Step 8 Apply the greedy selection process for the onlookers between x_i and t_i
Step 9 Determine the abandoned solution (source), if it exists, and replace it with a new randomly produced solution x_i for the scout using the equation:

x_{ij} = min_j + u_{ij} (max_j - min_j)   (16)
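The scout re-initialization of Eq. (16) simply redraws every component of an abandoned solution uniformly within its per-dimension bounds, with u_ij drawn from U(0, 1). A minimal pure-Python sketch (the function and variable names are illustrative, not taken from the paper's code):

```python
import random

def scout_reinit(lower, upper, rng=random):
    """Eq. (16): x_ij = min_j + u_ij * (max_j - min_j), u_ij ~ U(0, 1).

    lower/upper are per-dimension bounds; returns a fresh random solution
    that replaces an abandoned food source."""
    return [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]

random.seed(42)
x = scout_reinit([-5.0, -5.0, 0.0], [5.0, 5.0, 1.0])
print(x)  # each component lies within its own [min_j, max_j]
```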
D. Hybridization II—ABC-SPSO. El-Abd (2011) proposed the artificial bee colony–standard particle swarm optimization (ABC-SPSO) algorithm. They observed that the update equation of ABC only updates a single problem variable at a time, after which the new solution is re-evaluated. This component is added to PSO after the main loop: for every particle i in the swarm, the ABC update equation is applied to the personal best (pbest) solution of the particle, after randomly selecting another particle k and a random problem variable j. Table IV shows the pseudocode of ABC-SPSO.

E. Hybridization III—HPA. Kiran and Gunduz (2013) proposed a recombination-based hybridization of PSO and ABC (HPA). The best solutions of the populations obtained at each iteration of PSO and ABC are recombined, and the solution obtained from recombination (referred to as "TheBest") is given to the PSO and to the onlooker bees of ABC as the global best (gbest) and as a neighbor, respectively. Therefore, TheBest provides HPA with better global search and exploitation abilities. Table V lists the pseudocode of HPA.

F. Proposed Methods. Based on FNN and the three different hybridization optimization methods, we proposed three novel FNN variants: IABAP-FNN, ABC-SPSO-FNN, and HPA-FNN.

The system is coherent with existing classification systems. The implementation of the proposed system is two-fold: offline learning, with the aim of training the classifier, and online prediction, with the aim of predicting normal/abnormal labels for subjects.

Figure 4. Flowchart of the proposed system. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

V. MATERIALS AND ASSESSMENT
A. Three Benchmark Datasets. Three different benchmark MR image datasets, namely Dataset-66, Dataset-160, and Dataset-255, were used for testing in this study. All datasets consist of T2-weighted MR brain images in the axial plane with 256 × 256 in-plane resolution, downloaded from the website of Harvard Medical School (http://med.harvard.edu/AANLIB/). Dataset-66 and Dataset-160 have already been widely used in brain MR image classification. They consist of abnormal images from seven types of diseases along with normal images. The abnormal brain MR images of these two datasets cover the following diseases: glioma, meningioma, Alzheimer's disease, Alzheimer's disease plus visual agnosia, Pick's disease, sarcoma, and Huntington's disease. Das et al. (2013) proposed the third dataset, "Dataset-255," which contains 11 types of diseases: seven are the same as in Dataset-66 and Dataset-160, and four new types (chronic subdural hematoma, cerebral toxoplasmosis, herpes encephalitis, and multiple sclerosis) were included. Figure 5 shows samples of brain MR images.
B. CV Setting. Following common convention and for ease of stratified cross validation, 10 × 6-fold stratified cross validation (CV) was used for Dataset-66, and 10 × 5-fold stratified CV was used for the other two datasets. Table VII shows the statistical characteristics and CV settings of the three datasets.

VI. EXPERIMENT RESULTS
The experiments were carried out on an IBM machine with a 3 GHz Core i3 processor and 8 GB RAM, running the Windows 7 operating system. The algorithm was developed in-house via Matlab 2014a (The MathWorks©).

images were aligned to form a two-dimensional matrix. Results in Figure 7 show that only seven PCs can preserve at least 80% of the total energy (the red line represents the energy threshold). We did not set the energy threshold at 95% or higher, as is common, since that would yield too many features, which would levy a heavy computation burden on the following classifiers.

C. Classification Comparison. We compared the proposed three methods (SWT + PCA + IABAP-FNN, SWT + PCA + ABC-SPSO-FNN, and SWT + PCA + HPA-FNN) with state-of-the-art methods (DWT + SOM (Chaplot et al., 2006), DWT + SVM (Chaplot et al., 2006), DWT + SVM + POLY (Chaplot et al., 2006), DWT + SVM + RBF (Chaplot et al., 2006), DWT + PCA + FP-ANN (El-Dahshan et al., 2010), DWT + PCA + KNN (El-Dahshan et al., 2010), DWT + PCA + SVM (Zhang and Wu, 2012), DWT + PCA + SVM + HPOL (Zhang and Wu, 2012), DWT + PCA + SVM + IPOL (Zhang and Wu, 2012), DWT + PCA + SVM + GRB (Zhang and Wu, 2012), WE + SWP + PNN (Saritha et al., 2013), and RT + PCA + LS-SVM (Das et al., 2013)), on the basis of averaging the results of 10 repetitions of either 5-fold or 6-fold stratified CV.

For the purpose of fair comparison, the population sizes of the algorithms were taken as 100, that is, consisting of 50 particles, 25 employed bees, and 25 onlooker bees. c1 and c2 of PSO were assigned the value of 2, and ω was assigned the value of 0.75. The algorithms terminated when the maximum iteration number of 5000 was reached. The comparison result is shown in Table VIII, which also shows the feature vector dimension of each scheme.

It was clear from Table VIII that the proposed SWT + PCA + IABAP-FNN obtained accuracies of 100.00, 99.44, and 99.18% over Dataset-66, Dataset-160, and Dataset-255, respectively. The proposed SWT + PCA + ABC-SPSO-FNN method obtained 100.00, 99.75, and 99.02% over the three datasets, respectively. Finally,
Figure 6. The comparison between DWT and SWT. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
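The translation-variance contrast that Figure 6 illustrates can be reproduced with a small pure-Python experiment. As assumed simplifications, a Haar-like averaging low-pass filter and circular convolution stand in for the paper's actual wavelet filters.

```python
def conv_circ(x, h):
    """Circular convolution: y[n] = sum_k h[k] * x[(n - k) mod N]."""
    N = len(x)
    return [sum(h[k] * x[(n - k) % N] for k in range(len(h))) for n in range(N)]

low = [0.5, 0.5]  # Haar-like averaging low-pass filter (assumed example)

def dwt_level1(x):
    return conv_circ(x, low)[::2]  # filter, then downsample (the DS operator)

def swt_level1(x):
    return conv_circ(x, low)       # filter only: no downsampling

x = [4, 1, 7, 3, 2, 8, 5, 6]
shifted = x[-1:] + x[:-1]          # x translated by one sample

# DWT is translation-variant: coefficients of the shifted signal are not
# a shifted copy of the original coefficients, as in Formula (4).
print(dwt_level1(shifted) == dwt_level1(x))    # False

# SWT is translation-equivariant: shifting the input just shifts the output.
a = swt_level1(x)
print(swt_level1(shifted) == a[-1:] + a[:-1])  # True
```

Dropping the downsampling step is exactly what restores the shift property, at the cost of level-1 coefficient vectors as long as the signal itself.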
Table VIII. Accuracy comparison; some data extracted from Das et al. (2013).

Existing approaches                                Feature No.   Dataset-66   Dataset-160   Dataset-255
DWT + SOM (Chaplot et al., 2006)                   4761          94.00        93.17         91.65
DWT + SVM (Chaplot et al., 2006)                   4761          96.15        95.38         94.05
DWT + SVM + POLY (Chaplot et al., 2006)            4761          98.00        97.15         96.37
DWT + SVM + RBF (Chaplot et al., 2006)             4761          98.00        97.33         96.18
DWT + PCA + FP-ANN (El-Dahshan et al., 2010)       7             97.00        96.98         95.29
DWT + PCA + KNN (El-Dahshan et al., 2010)          7             98.00        97.54         96.79
DWT + PCA + SCABC-FNN (Zhang et al., 2011)         19            100.00       99.27         98.82
DWT + PCA + SVM (Zhang and Wu, 2012)               19            96.01        95.00         94.29
DWT + PCA + SVM + HPOL (Zhang and Wu, 2012)        19            98.34        96.88         95.61
DWT + PCA + SVM + IPOL (Zhang and Wu, 2012)        19            100.00       98.12         97.73
DWT + PCA + SVM + GRB (Zhang and Wu, 2012)         19            100.00       99.38         98.82
WE + SWP + PNN (Saritha et al., 2013)              3             100.00       99.94         98.86
RT + PCA + LS-SVM (Das et al., 2013)               9             100.00       100.00        99.39

Proposed approaches                                Feature No.   Dataset-66   Dataset-160   Dataset-255
SWT + PCA + IABAP-FNN                              7             100.00       99.44         99.18
SWT + PCA + ABC-SPSO-FNN                           7             100.00       99.75         99.02
SWT + PCA + HPA-FNN                                7             100.00       100.00        99.45