Deepak Ranjan Nayak, Ratnakar Dash, Banshidhar Majhi
Neurocomputing 282 (2018) 232–247
journal homepage: www.elsevier.com/locate/neucom
Article history: Received 25 April 2017; Revised 20 September 2017; Accepted 5 December 2017; Available online 19 December 2017. Communicated by Dr. Xiaoming Liu.

Keywords: Computer-aided diagnosis (CAD); Magnetic resonance imaging (MRI); Discrete ripplet-II transform (DR2T); Extreme learning machine (ELM); Modified PSO (MPSO)

Abstract

Recently there have been remarkable advances in the development of computer-aided diagnosis (CAD) systems for detection of the pathological brain through MR images. Feature extractors like the wavelet and its variants, and classifiers like the feed-forward neural network (FNN) and support vector machine (SVM), are very often used in these systems despite the fact that they suffer from many limitations. This paper presents an efficient and improved pathological brain detection system (PBDS) that overcomes the problems faced by other PBDSs in the recent literature. First, we support the use of contrast limited adaptive histogram equalization (CLAHE) to enhance the quality of the input MR images. Second, we use the discrete ripplet-II transform (DR2T) with degree 2 as the feature extractor. Third, in order to reduce the huge number of coefficients obtained from DR2T, we employ a PCA+LDA approach. Finally, an improved hybrid learning algorithm called MPSO-ELM has been proposed that combines modified particle swarm optimization (MPSO) and the extreme learning machine (ELM) for segregation of MR images as pathological or healthy. In MPSO-ELM, MPSO is utilized to optimize the hidden node parameters (input weights and hidden biases) of single-hidden-layer feedforward neural networks (SLFNs) and the output weights are determined analytically. The proposed method is contrasted with current state-of-the-art methods on three benchmark datasets. Experimental results indicate that our proposed scheme brings potential improvements in terms of classification accuracy and number of features. Additionally, it is observed that the proposed MPSO-ELM algorithm achieves higher accuracy and obtains a more compact network architecture than the conventional ELM and BPNN classifiers.

© 2017 Elsevier B.V. All rights reserved.
1. Introduction

Brain disease is one of the leading causes of death in people of different age groups across the globe. Many types of brain diseases exist, such as neoplastic diseases (brain tumor), cerebrovascular diseases (stroke), degenerative diseases, and infectious diseases. Some of these diseases cause minor problems in the human brain and some lead to death. Therefore, the development of an automated pathological brain detection system (PBDS), which is a type of computer-aided diagnosis (CAD) system specifically designed for the diagnosis of the human brain, is of great importance. A PBDS plays an important role in arriving at correct and quick clinical decisions. An advanced medical imaging modality known as magnetic resonance imaging (MRI) has been commonly used as input to PBDSs because of its advantage of providing rich information about the soft tissues of the human brain [1–3]. In addition, MRI is a non-invasive and radiation-free imaging modality as compared to other modalities like X-ray and CT scan. However, due to the enormous data volume, it is hard to interpret MR images manually. In particular, manual interpretation is a costly, troublesome and time-consuming task [4,5]. To overcome such issues, automated PBDSs need to be developed using dedicated computer systems which can assist radiologists in taking accurate and faster decisions. A PBDS utilizes various image processing and pattern recognition algorithms in its different stages.

Many attempts have been made toward the development of various PBDSs in the past decade [6]. However, the development of the ideal PBDS is still in its infancy because of the difficulty in selecting proper algorithms for feature extraction, feature selection and classification that will work together in all cases regardless of the type of image modality and the dataset size. Hence, PBDS remains an open and challenging problem for researchers. Our objective in this study is to enhance the performance of the PBDS for abnormality detection in the human brain.
It has been observed that the discrete wavelet transform (DWT) is the most used feature extractor in PBDSs, as it analyzes images at several scales and handles one-dimensional (1D) singularities effectively. However, it has limited capability of representing two-dimensional (2D) singularities (edges of an image). That is, DWT is not able to capture curve-like features effectively from images. Therefore, to handle this issue, advanced transforms are in great demand. Further, classifiers like the feed-forward neural network (FNN) and support vector machine (SVM) are often used in earlier PBDSs because of their capability in separating nonlinear input patterns and approximating continuous functions. To train an FNN, conventional gradient-based learning algorithms such as back-propagation (BP) and Levenberg-Marquardt (LM) are used, which have many limitations such as trapping at local minima, slower learning speed, and many learning epochs. The traditional SVM classifier incurs higher computational complexity and performs poorly on large datasets. Furthermore, the number of features required in some PBDSs is also high, which makes the classifier's task more difficult and costly.

This paper aims at developing an efficient PBDS which can overcome the issues faced by existing PBDSs. The proposed system markedly improves the recent results with the help of the ripplet-II transform and a new variant of the extreme learning machine (ELM). In more detail, the major contributions are

(a) Use of the discrete ripplet-II transform (DR2T) for feature extraction, as it is effective in capturing 2D singularities along a group of curves in MR images.
(b) Use of a recently proposed learning algorithm known as ELM in order to overcome the problems of traditional learning algorithms. In particular, ELM provides faster learning speed and better generalization performance than other conventional learning algorithms.
(c) Combining modified particle swarm optimization (MPSO) and ELM (MPSO-ELM) to avoid the difficulties faced by basic ELM, such as the local minima issue, slow response speed on testing data, high requirement of hidden neurons and the ill-conditioning problem.
(d) Comparison with other competent methods in terms of classification accuracy and number of features on three well-known datasets.

The remaining part of the article is organized as follows. Related works are summarized in Section 2. Section 3 describes the materials used in the experiments. The proposed methodology is discussed in Section 4. The statistical setting and pseudocode of the proposed scheme are presented in Section 5. In Section 6, the experimental results are analyzed. Conclusions based on this study and future research directions are highlighted in Section 7.

2. Related work

During the past decade, a number of pathological brain detection systems (PBDSs) have been reported in the literature for detecting various brain diseases. In almost all PBDSs, MRI has been used as the imaging modality. The PBDSs can be categorized into two classes, i.e., direct feature based PBDS and indirect feature based PBDS, depending on the type of features used. In the first category, the coefficients of several image transforms are directly utilized as the key features, while in the latter one, statistical descriptors like energy, entropy, mean and standard deviation, evaluated from the direct coefficients, serve as the features. The systems falling under the first category mostly require feature transformation or selection techniques in order to reduce the high dimensional feature space. However, these techniques are optional in the case of the second category.

Chaplot et al. [1] were the forebears who proposed a PBDS with the help of 2D DWT features and two classifiers, namely the self-organizing map (SOM) and SVM. The authors in [7] have proposed a PBDS where the Slantlet transform (ST) is employed for feature extraction and a back-propagation neural network (BPNN) is used for classification. Later, El-Dahshan et al. [8] introduced a hybrid approach with the assistance of 2D DWT and two classifiers, namely the k-nearest neighbor (k-NN) and the feed forward back-propagation artificial neural network (FP-ANN). Moreover, to reduce the feature dimensionality, they applied principal component analysis (PCA) on the feature vectors generated from 2D DWT. Further, with the same features, the authors in [3,9–11] proposed new PBDSs. In these cases, gradient-based schemes and population based optimization schemes such as scaled conjugate gradient (SCG), PSO, adaptive chaotic PSO (ACPSO), and scaled chaotic artificial bee colony (SCABC) are used to optimize the parameters of FNN, BPNN and kernel SVM (KSVM). Zhang and Wu [12] have suggested a PBDS where DWT plus PCA based features are given to a KSVM classifier. In [4], the authors proposed a PBDS where the features are derived from the ripplet transform (RT) and then subjected to PCA for dimensionality reduction. Thereafter, they applied least squares SVM (LS-SVM) in order to get significant results. Later on, in [6], the authors have offered a scheme where DWT is used for feature extraction after the employment of a feedback pulse coupled neural network (FPCNN). Finally, they have employed an FP-ANN classifier. Afterward, Wang et al. [13] offered a new PBDS based on the stationary wavelet transform (SWT), which is translation invariant in nature [14]. In this, PCA is used to derive a low dimensional feature vector. Additionally, in order to optimize the parameters of the FNN classifier, they made use of two evolutionary schemes, namely, artificial bee colony (ABC) and PSO. These schemes were coined IABAP-FNN, ABC-SPSO-FNN, and HPA-FNN. In another work, Zhang et al. [15] have deployed the weighted-type fractional Fourier transform (WFRFT) and PCA for feature extraction and reduction, respectively. For classification, they have used two variants of SVM, namely, the generalized eigenvalue proximal SVM (GEPSVM) and twin SVM (TSVM). Later, Nayak et al. [5] have proposed a PBDS with the support of 2D DWT and probabilistic PCA (PPCA). In this, an AdaBoost with random forests (ADBRF) scheme was employed for classification. In [16], the authors have offered a PBDS which uses SWT, PCA, and GEPSVM for feature extraction, reduction, and classification. The authors in [17] have proposed a PBDS where the features are extracted from the HL3 sub-band of 2D DWT. A PCA+LDA technique was applied instead of PCA in order to get more relevant features. Chen et al. [18] offered a new PBDS in which Minkowski-Bouligand dimension (MBD) features are computed after detection of edges in brain images using the Canny edge detector. Thereafter, they proposed an improved PSO based on three-segment particle representation, time-varying acceleration coefficients, and chaos theory (PSO-TTC) to train the single-hidden layer feedforward neural network. Dash et al. [19] proposed a PBDS based on curvelet features. In this, LS-SVM was employed for detection of the pathological brain.

Many recent articles on PBDS have used various feature descriptors like energy, entropy, mean and standard deviation in the feature extraction stage. For example, Saritha et al. [20] have offered a PBDS which uses wavelet entropy (WE) values as the features. The spider web plot and t-test scheme are subsequently used to select the significant features. Finally, a probabilistic neural network (PNN) is employed for classification. Later, Yang et al. [21] have proposed a PBDS where the energy values of a level-3 DWT are used as features. For classification, they applied the biogeography-based optimization (BBO) method with SVM in order to achieve better generalization performance. In [22], a discrete wavelet packet transform (DWPT) based PBDS is proposed.
Two different types of entropies, namely, Shannon entropy (SE) [23] and Tsallis entropy (TE), were calculated from the sub-bands. GEPSVM is utilized to assign the class label as healthy or pathological. Furthermore, in [24], a PBDS is proposed in which an FNN is trained with a hybrid BBO and PSO based method known as HBP. In this work, wavelet entropy has been used as the feature. In [25], a WE and Naive Bayes classifier (NBC) based PBDS is proposed, while in [26], a wavelet energy and SVM based PBDS is introduced. Zhang et al. [27] have used Tsallis entropy values of DWPT as the features and a fuzzy support vector machine (FSVM) as the classifier. In [28], seven WE features and seven Hu moment invariant (HMI) features are used, followed by a GEPSVM+RBF classifier. Wang et al. [29] have proposed a PBDS based on a novel feature called fractional Fourier entropy (FRFE), which is the combination of FRFT and SE. Welch's t-test (WTT) and Mahalanobis distance (MD) were applied separately to select the relevant features and subsequently a twin SVM (TSVM) classifier is employed for classification. Later, in [30], a PBDS based on FRFE features and a multilayer perceptron (MLP) is proposed. Three pruning methods, namely, Bayesian detection boundaries (BDB), dynamic pruning (DP), and Kappa coefficient (KC), are utilized to get the optimal number of hidden neurons in the MLP. Subsequently, an adaptive real coded BBO (ARCBBO) approach is employed to update the weights of the MLP. In [31], the authors have employed three varieties of binary PSO (BPSO) to select significant features from the 25 entropy values (primary features) of an 8-level DWT. A PNN was deployed for classification. In [32], the variance and entropy (VE) values relative to the sub-bands of a dual-tree complex wavelet transform (DTCWT) are used as features. In this, GEPSVM and TSVM are employed as the classifiers. Later, for feature extraction Nayak et al. [33] have computed energy and entropy values from the sub-bands of 2D-SWT. They have employed a symmetric uncertainty ranking (SUR) filter for feature selection. Finally, AdaBoost with support vector machine (ADBSVM) is used for classification.

The literature study shows that in most PBDSs the wavelet and its variants (like SWT, DWPT, DTCWT, etc.) are frequently used as the feature extractor. However, the traditional DWT suffers from many drawbacks such as limited directional selectivity and translation variance. SWT can resolve the translation variance issue; however, it leads to redundancy and is not able to capture higher dimensional singularities. Further, DTCWT is efficient and less redundant, and offers more directional selectivity (i.e., six directions) as compared to SWT and DWT. Here, it can be concluded that all these transforms are less capable of handling 2D singularities. Therefore, further improvements in directional selectivity need to be explored to capture curve-like features from MR images. Furthermore, it has been observed that FNN and SVM are commonly used in many PBDSs even though they require more parameters to tune and are time-consuming. Additionally, most of the schemes have been validated on small datasets and have shown higher accuracies; however, they perform poorly when evaluated on large datasets. Thus, there exists scope to mitigate the shortcomings of the existing schemes in terms of the number of features needed and improvement in accuracy on large datasets.

Keeping this in mind, we have proposed an efficient PBDS to classify an MR image as healthy or pathological. The proposed PBDS uses the discrete ripplet-II transform (DR2T) for feature extraction due to its ability in capturing directional features (edges and curves). Thereafter, a PCA+LDA approach is employed in order to decide the most significant feature set. Eventually, an improved learning algorithm called MPSO-ELM for SLFNs is introduced, which offers advantages such as local minima avoidance, better generalization capability, faster learning rate, and well-conditionedness compared to standard classifiers like FNN, SVM, LS-SVM, ELM, etc. These improvements make the proposed PBDS a more robust and accurate system than other existing schemes.

3. Datasets used

The performance of the proposed PBDS is tested on three benchmark datasets, namely, DS-66, DS-160, and DS-255, accommodating 66, 160 and 255 images, respectively. These datasets hold T2-weighted brain MR images of size 256 × 256 in the axial view plane and are available on the Medical School of Harvard University website [34]. Along with the healthy brain samples, the datasets DS-66 and DS-160 have samples from seven classes of diseases, namely, sarcoma, AD, AD plus visual agnosia (VA), glioma, meningioma, Huntington's disease (HD), and Pick's disease (PD). However, DS-255 contains four more diseases, viz., cerebral toxoplasmosis (CTP), multiple sclerosis (MS), herpes encephalitis (HE), and chronic subdural hematoma (CSH). Samples of all kinds of MR images are shown in Fig. 1. Out of the 11 types of diseases, glioma, meningioma, and sarcoma are of brain tumor type, while CTP, MS, and HE are of infectious type. The diseases AD, AD plus VA, PD, and HD are called degenerative diseases, whereas CSH is a cerebrovascular disease.

The proposed work is a two-class classification problem (healthy or pathological), where the pathological class includes images from all kinds of diseases. It is worth mentioning here that predicting pathological samples as healthy (a cost sensitivity problem) is more severe than the converse; this inaccurate prediction may cause a patient's death. To alleviate this issue, a greater number of pathological samples is deliberately taken into account as compared to healthy samples, with the aim of making the system more biased towards the pathological samples.

4. Proposed methodology

This section describes the methods involved in the proposed PBDS. The overall architecture of the proposed PBDS is shown in Fig. 2. As shown in the figure, the proposed PBDS has four major steps, namely, preprocessing, feature extraction, feature reduction, and classification. The input of the system is a brain MR image and the output is the class label (healthy or pathological). In the preprocessing step, we use contrast limited adaptive histogram equalization (CLAHE). In the feature extraction step, we employ the ripplet-II transform to derive features, and in the feature reduction step, we harness the PCA+LDA approach. Subsequently, for classification, an improved hybrid learning method MPSO-ELM is utilized, where MPSO is used to optimize the initial weights and biases of the SLFN. The proposed PBDS works in two parts, namely, offline learning and online prediction. The first part includes the training and evaluation process with the reduced feature sets, whereas the second part predicts a class label for a query MR image. In the following, all the steps are delineated in detail.

4.1. Preprocessing based on CLAHE

Image preprocessing is one of the most vital and rudimentary steps in image analysis, as it leads to improvement in the quality of the images. It has been observed that some images in the selected datasets are low-contrast in nature. Therefore, to enhance the images, a well-known methodology named contrast limited adaptive histogram equalization (CLAHE) has been employed. CLAHE is a variation of adaptive histogram equalization (AHE) which at first computes a histogram of gray values in a contextual region centered around each pixel and thereafter assigns a value to each pixel intensity within the display range according to the pixel intensity rank in its local histogram [35]. Dissimilar to AHE, CLAHE has the benefits of preventing over-enhancement of noise and diminishing the edge-shadowing effect [36]. It uses a fixed value dubbed the clip limit which helps in clipping the histogram prior to the computation of the cumulative distribution function (CDF).
However, CLAHE redistributes those parts of the histogram that surpass the clip limit equally among all histogram bins.
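For illustration, the following is a minimal sketch of this preprocessing step, assuming scikit-image's equalize_adapthist; the library choice and the function name preprocess_clahe are assumptions of this sketch (the paper's own implementation is not listed as code), while the contextual-region, bin and clip-limit settings follow those reported later in Section 6.1.

```python
# A minimal CLAHE sketch, assuming scikit-image; not the authors' code.
import numpy as np
from skimage import exposure

def preprocess_clahe(img: np.ndarray) -> np.ndarray:
    """CLAHE with 8 x 8 contextual regions (64 tiles on a 256 x 256
    image), 256 bins and a clip limit of 0.01, per Section 6.1."""
    tile = max(img.shape[0] // 8, 1)   # side length of one contextual region
    return exposure.equalize_adapthist(img, kernel_size=tile,
                                       clip_limit=0.01, nbins=256)
```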
4.2. Feature extraction based on DR2T

The Fourier transform has been found to be less suitable for feature extraction from images, since it loses the time information and cannot handle 1D singularities. Hence, it fails to provide an efficient representation of images that contain edges; it only works well for smooth images. In contrast, the wavelet transform performs better in representing 1D singularities (i.e., point singularities), but the conventional wavelet transform is not capable of representing 2D singularities along arbitrarily shaped curves. In order to resolve this problem, another transform called the ridgelet transform was introduced, which is based on the Radon transform [37,38]. The ridgelet holds great potential in representing line singularities (i.e., it is capable of extracting lines of arbitrary orientation), but it cannot handle 2D singularities effectively. Thereafter, the first generation curvelet transform, based on multiscale ridgelets, was proposed by Candes and Donoho to resolve 2D singularities along smooth curves [39]. Later on, they proposed the second generation curvelet transform, called the fast discrete curvelet transform (FDCT), which is simple, fast, and less redundant than the former [40]. Because of capabilities like multiresolution, higher directional selectivity, anisotropy and localization, it has drawn attention over the last decades. The anisotropy property guarantees resolving 2D singularities along C² curves, and to accomplish this, the curvelet utilizes a parabolic scaling law [41]. However, the reason behind the selection of parabolic scaling is not clear. In order to combat this issue, a new transform called the ripplet-I transform was proposed, which generalizes the scaling law [42,43]. In general, the ripplet-I transform generalizes the curvelet transform by adding two parameters, namely support c and degree d. When […]

[…] degree and orientation parameters, respectively. By tuning these parameters, the ripplet-II transform can capture structural information along arbitrary curves. Using (1) and (2), we have

$RT2_g(s,t,d,\theta) = \left\langle \varphi_{s,t}(r),\, GR_d[g] \right\rangle$    (3)

where $GR_d[g]$ is the GRT of the function g, defined as

$GR_d(r,\theta) = \iint g(\rho,\alpha)\,\delta\!\left(r - \rho\cos^{d}((\alpha-\theta)/d)\right)\rho\,d\rho\,d\alpha$    (4)

The GRT can also be evaluated using the Fourier transform [44]. Eq. (3) indicates that the ripplet-II transform is the inner product of the GRT and a 1D wavelet. It can also be represented as

$g(\rho,\alpha) \xrightarrow{\ \mathrm{GRT}\ } GR_d[g](r,\theta) \xrightarrow{\ \mathrm{1D\ WT}\ } RT2_g(s,t,d,\theta)$    (5)

which means that the ripplet-II transform works in two steps: first compute the GRT of g and then compute the 1D WT of the GRT of g.

The discrete version of the ripplet-II transform (DR2T) can be defined as

$g(\rho,\alpha) \xrightarrow{\ \mathrm{DGRT}\ } GR_d[g](r,\theta) \xrightarrow{\ \mathrm{1D\ DWT}\ } RT2_g(s,t,d,\theta)$    (6)

in which the discrete GRT (DGRT) of g is computed first and subsequently the 1D discrete WT (DWT) of the DGRT of g is computed. The computing procedure for the discrete ripplet-II transform becomes simpler when d = 2. In this case, the GRT is dubbed the 'parabolic Radon transform' and is defined as follows [44]:

$GR_2(r,\theta) = 2\sqrt{r}\,R\!\left[g(\rho^2, 2\alpha)\right]\!\left(\sqrt{r},\,\theta/2\right)$    (7)

where $R[g(\rho,\alpha)](r,\theta)$ is the classical Radon transform (CRT) in polar coordinates. In general, the GRT of the function g for d > 0 takes the following form in the Fourier domain:

$GRF_d(r,\theta) = 2\sum_{n=-\infty}^{+\infty}\int_{r}^{\infty}\left[\int g(\rho,\alpha)\,e^{-in\alpha}\,d\alpha\right]\left[1-(r/\rho)^{2/d}\right]^{-1/2}\cdots$    (8)
Algorithm 1 Feature extraction using discrete ripplet-II transform.

Input: N input images: g[x, y]; 0 ≤ x < m, 0 ≤ y < n
Output: Feature matrix FM of size N × D
1: for each image g[x, y] ∈ N do
2:   Transform g(x, y) into polar coordinates g(ρ, α) and substitute (ρ, α) with (ρ², 2α)
3:   Transform the polar coordinates (ρ, α) back to Cartesian coordinates (x, y) and obtain another image g′(x, y) by 2D bilinear interpolation
4:   Compute the 1D FFT of g′(x, y), i.e., G′(u, v), along θ (columns)
5:   Compute GR_d(r, θ) in the Fourier domain, i.e., GRF_d(r, θ), for G′(u, v) and d = 2 using (8)
6:   Compute the inverse 1D FFT g_inv of GRF_d(r, θ) along θ (columns)
7:   Apply the 1D DWT on g_inv along r and obtain the coefficients
8:   Arrange the coefficients in a vector of size 1 × D, where D is the total number of features, and store it in a matrix
9: end for
10: Obtain a feature matrix FM containing all vectors
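The sketch below mirrors the spirit of Algorithm 1 for d = 2 under stated assumptions: instead of evaluating the GRT in the Fourier domain via Eq. (8), it exploits the equivalent parabolic Radon route of Eq. (7), i.e., the substitution (ρ, α) → (ρ², 2α) followed by a classical Radon transform (here skimage's radon), and then applies a 1D DWT along the radial axis with PyWavelets. The radius normalization and interpolation details are illustrative choices, not the paper's exact procedure.

```python
# A loose NumPy/PyWavelets sketch of Algorithm 1 for d = 2, assuming
# SciPy and scikit-image; illustrative, not the authors' implementation.
import numpy as np
import pywt
from scipy.ndimage import map_coordinates
from skimage.transform import radon

def dr2t_features(img, wavelet="haar", levels=2, n_theta=180):
    n = img.shape[0]
    c = (n - 1) / 2.0
    # Step 2: sample g at (rho^2, 2*alpha); rho^2 is rescaled by c so the
    # substituted radius stays (mostly) inside the image support.
    yy, xx = np.mgrid[0:n, 0:n]
    rho = np.hypot(xx - c, yy - c)
    alpha = np.arctan2(yy - c, xx - c)
    xs = c + (rho ** 2 / c) * np.cos(2 * alpha)
    ys = c + (rho ** 2 / c) * np.sin(2 * alpha)
    # Step 3: back to Cartesian coordinates by 2D bilinear interpolation
    g_sub = map_coordinates(img.astype(float), [ys, xs],
                            order=1, mode="constant")
    # Steps 4-6 collapse into a classical Radon transform (Eq. (7))
    sino = radon(g_sub,
                 theta=np.linspace(0.0, 180.0, n_theta, endpoint=False),
                 circle=False)
    # Step 7: 1D DWT along the radial axis of every projection
    feats = [np.concatenate(pywt.wavedec(sino[:, t], wavelet, level=levels))
             for t in range(sino.shape[1])]
    # Step 8: one 1 x D feature vector per image
    return np.concatenate(feats)
```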
learning speed due to improper learning steps, get trapped at local
minima, require large number of iterations to obtain better learn-
high storage space requirement. Therefore, the application of dif- ing performance, and overfitting [49]. A recently developed learn-
ferent dimensionality reduction techniques is of great importance ing algorithm known as extreme learning machine (ELM) avoids
to obtain the most relevant candidate features. Principal compo- the limitations faced by gradient based learning schemes. ELM has
nent analysis (PCA) has been found to be effective in reducing fea- also the potential for high performance classification and solving
ture dimension. PCA transforms high-dimensional input data to a regression tasks [50,51]. Different from other conventional learn-
lower-dimensional space termed as principal subspace while keep- ing algorithms such as BP, SVM and LS-SVM, ELM learns faster
ing maximum variations of the data. That is, PCA always seeks a with better generalization performance. In ELM, the hidden node
direction that best represents the data by excluding the class la- parameters (the input weights and hidden biases) are randomly
bels and hence unsupervised in nature. assigned, while the output weights of SLFNs are analytically de-
In contrast to PCA, linear discriminant analysis (LDA), a super- termined by simple inverse operation of the hidden layer output
vised approach, attempts to find a feature subspace that best dis- matrix. ELM has been discussed below mathematically.
criminates between the classes and therefore has drawn the atten- Given N distinct training samples (xi ,ti ), where xi =
T
tion of researchers in the past decades. More formally, LDA always [xi1 , xi2 , . . . , xil ] ∈ Rl and ti = [ti1 , ti2 , . . . , tiC ]T ∈ RC , the SLFNs
searches those vectors over which the samples of dissimilar classes having nh hidden nodes and activation function φ (.) can be
are far from each other, whereas the samples of similar classes represented as
are close to each other. However, traditional LDA leads to degra-
nh
nh
dation in performance while dealing with high dimensional and woi φ (x j ) = woi φ whi · x j + bi = o j , j = 1, 2, . . . , N (10)
small sample size problem as in these cases the within-scatter ma- i=1 i=1
trix (Sw ) becomes singular [47]. Further, to make sure that Sw does T
not become singular, at least D + C (where D=dimension of the Here, whi = whi1 , whi2 , . . . , whil represents the weight vector that
feature vector and C=number of classes) number of samples are links between ith hidden neuron and the input neurons, woi =
needed which in general is practically not possible [48]. To tackle
T
woi1 , woi2 , . . . , woiC indicates the weight vector that connects the
this problem, a popular approach called PCA+LDA has been applied
ith hidden neuron and the output neurons, and bi is the bias of
in our proposed work, where a D-dimensional data is first reduced
the ith hidden neuron. The SLFNs can approximate these N sam-
using PCA to an M-dimensional data and thereafter reduced to a
ples with zero error, i.e., ∃ whi , woi , and bi such that
l-dimensional data using LDA, l < < M < D.
The process of finding an optimal or relevant feature set is de-
nh
scribed as follows. Initially, the eigenvalues of different features are woi φ whi · x j + bi = t j , j = 1, 2, . . . , N (11)
arranged in decreasing order. Subsequently, a measure called the i=1
normalized cumulative sum of variances (NCSV) corresponding to Now, Eq. (11) can be represented in matrix form as
each feature is calculated and the NCSV value for jth feature is de-
fined as Hwo = T (12)
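A compact sketch of this two-stage reduction is given below, assuming scikit-learn; the PCA components are first truncated by the NCSV rule of Eq. (9) and LDA then projects the retained components. Note that scikit-learn's standard LDA keeps at most C − 1 directions, so this sketch only approximates the paper's variant.

```python
# Sketch of PCA+LDA reduction with NCSV-based selection (Eq. (9)),
# assuming scikit-learn; the paper's exact variant may differ.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def pca_lda_reduce(X_train, y_train, X_test, threshold=0.95):
    pca = PCA().fit(X_train)
    # NCSV(j): normalized cumulative sum of eigenvalues (Eq. (9))
    ncsv = np.cumsum(pca.explained_variance_) / pca.explained_variance_.sum()
    M = int(np.searchsorted(ncsv, threshold)) + 1  # smallest M with NCSV >= threshold
    Z_train = pca.transform(X_train)[:, :M]
    Z_test = pca.transform(X_test)[:, :M]
    lda = LinearDiscriminantAnalysis().fit(Z_train, y_train)
    return lda.transform(Z_train), lda.transform(Z_test)
```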
4.4. Classification based on MPSO-ELM

In this section, we first discuss the preliminaries of the extreme learning machine (ELM) and particle swarm optimization (PSO), and thereafter we describe the proposed MPSO-ELM algorithm in detail.

4.4.1. Extreme learning machine (ELM)

In the past decades, single-hidden layer feedforward neural networks (SLFNs) have been successfully applied in many applications, as they can approximate any continuous function and classify any disjoint region. To train SLFNs, gradient-based learning algorithms such as the Levenberg-Marquardt (LM) and backpropagation (BP) algorithms have been widely used. However, despite their popularity, these learning algorithms face various issues: poor learning speed due to improper learning steps, getting trapped at local minima, requiring a large number of iterations to obtain better learning performance, and overfitting [49]. A recently developed learning algorithm known as the extreme learning machine (ELM) avoids the limitations faced by gradient based learning schemes. ELM also has the potential for high performance classification and for solving regression tasks [50,51]. Different from other conventional learning algorithms such as BP, SVM and LS-SVM, ELM learns faster with better generalization performance. In ELM, the hidden node parameters (the input weights and hidden biases) are randomly assigned, while the output weights of the SLFN are analytically determined by a simple inverse operation on the hidden layer output matrix. ELM is described mathematically below.

Given N distinct training samples $(x_i, t_i)$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{il}]^T \in \mathbb{R}^l$ and $t_i = [t_{i1}, t_{i2}, \ldots, t_{iC}]^T \in \mathbb{R}^C$, an SLFN having $n_h$ hidden nodes and activation function $\varphi(\cdot)$ can be represented as

$\sum_{i=1}^{n_h} w_{oi}\,\varphi(x_j) = \sum_{i=1}^{n_h} w_{oi}\,\varphi(w_{hi} \cdot x_j + b_i) = o_j, \quad j = 1, 2, \ldots, N$    (10)

Here, $w_{hi} = [w_{hi1}, w_{hi2}, \ldots, w_{hil}]^T$ represents the weight vector linking the ith hidden neuron and the input neurons, $w_{oi} = [w_{oi1}, w_{oi2}, \ldots, w_{oiC}]^T$ indicates the weight vector connecting the ith hidden neuron and the output neurons, and $b_i$ is the bias of the ith hidden neuron. The SLFN can approximate these N samples with zero error, i.e., there exist $w_{hi}$, $w_{oi}$, and $b_i$ such that

$\sum_{i=1}^{n_h} w_{oi}\,\varphi(w_{hi} \cdot x_j + b_i) = t_j, \quad j = 1, 2, \ldots, N$    (11)

Now, Eq. (11) can be represented in matrix form as

$H w_o = T$    (12)

where

$H(w_{h1}, \ldots, w_{hn_h}, b_1, \ldots, b_{n_h}, x_1, \ldots, x_N) = \begin{bmatrix} \varphi(w_{h1} \cdot x_1 + b_1) & \cdots & \varphi(w_{hn_h} \cdot x_1 + b_{n_h}) \\ \vdots & \cdots & \vdots \\ \varphi(w_{h1} \cdot x_N + b_1) & \cdots & \varphi(w_{hn_h} \cdot x_N + b_{n_h}) \end{bmatrix}_{N \times n_h}$

$w_o = \begin{bmatrix} w_{o1}^T \\ \vdots \\ w_{on_h}^T \end{bmatrix}_{n_h \times C} \quad \text{and} \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times C}$

Here, H denotes the hidden layer output matrix. Now, the output weights $w_o$ can be analytically determined by finding the
smallest norm least square (LS) solution of the above linear system (Eq. (12)) as

$\hat{w}_o = H^{\dagger} T$    (13)

where $H^{\dagger}$ indicates the Moore-Penrose (MP) generalized inverse of matrix H; this method helps ELM to have better generalization performance [52]. The smallest norm LS solution is unique and has the minimum norm among all the LS solutions. As the solution of ELM is obtained using an analytical method without iteratively tuning parameters, it converges faster than other traditional learning algorithms.
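For concreteness, a minimal NumPy sketch of basic ELM as in Eqs. (10)–(13) follows; the sigmoid activation, one-hot targets and function names are illustrative assumptions, not the authors' code.

```python
# Minimal basic-ELM sketch (Eqs. (10)-(13)); sigmoid activation and
# one-hot targets are illustrative assumptions.
import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    Wh = rng.uniform(-1.0, 1.0, (n_hidden, X.shape[1]))  # input weights w_hi
    b = rng.uniform(-1.0, 1.0, n_hidden)                 # hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ Wh.T + b)))            # hidden layer output matrix
    Wo = np.linalg.pinv(H) @ T                           # Eq. (13): w_o = H^+ T
    return Wh, b, Wo

def elm_predict(X, Wh, b, Wo):
    H = 1.0 / (1.0 + np.exp(-(X @ Wh.T + b)))
    return np.argmax(H @ Wo, axis=1)                     # predicted class labels
```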
4.4.2. Particle swarm optimization (PSO)

Particle swarm optimization (PSO), a population-based evolutionary algorithm introduced by Eberhart and Kennedy [53,54], has shown great potential in a wide variety of search and optimization problems. This algorithm is inspired by the social behavior of bird flocking and fish schooling. Unlike other evolutionary algorithms such as genetic algorithms (GA) and differential evolution (DE), PSO does not require any evolution parameters (crossover and mutation) and is easy to implement. In PSO, each bird of a flock represents a solution in the search space, also called a particle. Each particle is characterized by its own position and velocity. At first, PSO randomly initializes a set of particles (or solutions) known as the swarm, and thereafter looks for the best solution (optimum) by updating generations. In each iteration, the velocity of each particle is updated using two best values, namely, pbest and gbest, where pbest denotes the position of the best solution found so far by a particle and gbest indicates the best position found so far by all the particles in the population. Then, the position of the particle is adjusted using the updated velocity.
ticle is adjusted using the updated velocity. factor and the crossover factor. Because of the potency of PSO to
For a D-dimensional search space, the position and velocity search global optimum, it can be expected that hybridization of
of jth particle can be expressed as S j = (s j1 , s j2 , . . . , s jD ) ∈ D and PSO and ELM should be promising for training SLFNs. Xu and Shu
V j = (v j1 , v j2 , . . . , v jD ) ∈ D , respectively. The pbest and gbest of the [56] introduced another E-ELM based on PSO to select the hidden
jth particle is denoted as Pj pbest = ( p j1 , p j2 , . . . , p jD ) and Pgbest = node parameters which require only one parameter to tune man-
( pgbest1 , pgbest2 , . . . , pgbestD ) respectively. Then the traditional PSO is ually. They have added boundary conditions into conventional PSO
stated as [53,54] to enhance the performance of ELM. Later, in [58], an improved
PSO based ELM (IPSO-ELM) is proposed to find optimal SLFNs.
v jd (k + 1 ) = v jd (k ) + c1 ∗ rand1() ∗ p jd (k ) − s jd (k ) During searching, IPSO considers both the root mean squared er-
ror (RMSE) and the norm of output weights of the validation set
+c2 ∗ rand2() ∗ pgbestd (k ) − s jd (k ) (14)
in order to obtain better convergence performance. Suresh et al.
[59] have proposed a hybrid learning algorithm using real-coded
genetic algorithm and ELM (RCGA-ELM) for no-reference image
s jd (k + 1 ) = s jd (k ) + v jd (k + 1 ), 1 ≤ j ≤ N p , 1 ≤ d ≤ D (15)
quality assessment. But, RCGA requires two genetic parameters
such as crossover and mutation. Zhao et al. [57] offered an input
where, Np indicates the total number of particles, c1 and c2 are
weight selection technique for improving the conditioning of ELM
the acceleration coefficients, and rand1() and rand2() denote two
with the help of linear hidden neurons. With this technique, they
separate uniform distributed random numbers in the range [0,1].
have achieved numerical stability without degrading accuracy.
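A bare-bones sketch of the APSO update of Eqs. (15)–(17), minimizing a generic fitness f, is given below; all parameter defaults and names are illustrative.

```python
# Bare-bones APSO sketch implementing Eqs. (15)-(17); illustrative only.
import numpy as np

def pso_minimize(f, dim, n_particles=20, max_iter=30,
                 w_i=0.4, w_f=0.9, c1=2.0, c2=2.0, seed=0):
    rng = np.random.default_rng(seed)
    S = rng.uniform(-1, 1, (n_particles, dim))       # positions
    V = np.zeros((n_particles, dim))                 # velocities
    pbest = S.copy()
    pbest_f = np.array([f(s) for s in S])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for k in range(max_iter):
        w = w_f - k * (w_f - w_i) / max_iter         # Eq. (17)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        V = w * V + c1 * r1 * (pbest - S) + c2 * r2 * (gbest - S)  # Eq. (16)
        S = S + V                                    # Eq. (15)
        fS = np.array([f(s) for s in S])
        better = fS < pbest_f
        pbest[better], pbest_f[better] = S[better], fS[better]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest
```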
4.4.3. Proposed MPSO-ELM method

Since ELM randomly chooses the input weights and hidden biases, it suffers from two crucial problems [52,56,57]: (i) it needs more hidden neurons than conventional gradient based methods, which makes ELM respond slowly to unknown testing data, and (ii) it causes an ill-conditioned hidden layer output matrix H in the presence of more hidden neurons, for which ELM induces poor generalization performance. The condition number has been found to be a good qualitative measure of the conditioning of a matrix [57]. It indicates how close a system is to being ill-conditioned. It may be noted that an ill-conditioned system has a large condition number, while a well-conditioned system has a small condition number. The 2-norm condition number of the matrix H can be calculated as

$K_2(H) = \frac{\lambda_{max}(H^T H)}{\lambda_{min}(H^T H)}$    (18)

where $\lambda_{max}(H^T H)$ and $\lambda_{min}(H^T H)$ denote the largest and smallest eigenvalues of the matrix $H^T H$.

In order to tackle these issues, a few efforts have been made in the last decade using evolutionary algorithms (EAs) and swarm intelligence based algorithms, since these algorithms have the benefit of global searching for optimization problems. Zhu et al. [52] suggested a hybrid algorithm called evolutionary ELM (E-ELM), where a modified differential evolution (DE) algorithm is utilized to optimize the hidden node parameters and the MP generalized inverse is utilized to find the solution. They have shown that E-ELM provides faster learning speed and better generalization performance than other traditional algorithms. Additionally, it obtains a much more compact network than basic ELM. However, E-ELM demands two additional parameters to tune, namely, the mutation factor and the crossover factor. Because of the potency of PSO in searching for the global optimum, it can be expected that a hybridization of PSO and ELM should be promising for training SLFNs. Xu and Shu [56] introduced another E-ELM based on PSO to select the hidden node parameters, which requires only one parameter to tune manually. They added boundary conditions to conventional PSO to enhance the performance of ELM. Later, in [58], an improved PSO based ELM (IPSO-ELM) was proposed to find optimal SLFNs. During searching, IPSO considers both the root mean squared error (RMSE) and the norm of the output weights on the validation set in order to obtain better convergence performance. Suresh et al. [59] have proposed a hybrid learning algorithm using a real-coded genetic algorithm and ELM (RCGA-ELM) for no-reference image quality assessment. But RCGA requires two genetic parameters, namely crossover and mutation. Zhao et al. [57] offered an input weight selection technique for improving the conditioning of ELM with the help of linear hidden neurons. With this technique, they have achieved numerical stability without degrading accuracy.

In this paper, a new approach combining modified PSO (MPSO) and ELM is suggested which avoids the issues faced by existing methods in the recent literature. In this method, we use MPSO to optimize the hidden node parameters and the MP generalized inverse to analytically find the solution. Since the two acceleration components ($c_1$: cognitive component and $c_2$: social component) in PSO strongly influence the convergence to the globally optimal solution, proper selection of these components is very important. In order to select these components efficiently, we incorporate a strategy called time-varying acceleration coefficients (TVAC) [60] in addition to the time-varying inertia weight factor of classical PSO, and hence we term it MPSO. TVAC improves the global search ability at the beginning stage of the optimization and encourages the particles to converge toward the global optimum at the end of the search. Here, we select a large value of $c_1$ and a small value of $c_2$ at the beginning of optimization, whereas we select a small value of $c_1$ and a large value of $c_2$ at the final stage. Mathematically, $c_1$
and $c_2$ can be defined as

$c_1(k) = c_{1i} + k \ast (c_{1f} - c_{1i})/maxIter$    (19)

$c_2(k) = c_{2i} + k \ast (c_{2f} - c_{2i})/maxIter$    (20)

where $c_{1i}$ and $c_{1f}$ indicate the initial and final values of $c_1$, respectively. Similarly, $c_{2i}$ and $c_{2f}$ represent the initial and final values of $c_2$, respectively. k and maxIter are the current iteration number and the maximum number of allowable iterations.

The main goal of MPSO is to minimize the norm of the output weights and to bound the hidden node parameters within a specific range, which in turn enhances the convergence performance of ELM. This idea was conceived by Bartlett [61], where it is stated that neural networks tend to have better generalization performance with weights of smaller norm. Therefore, we consider both the RMSE and the norm of the output weights of the SLFN to search for the global optimum in MPSO. The steps of the proposed MPSO-ELM are delineated as follows:
(a) At first, randomly initialize all the particles in the swarm such that each particle consists of a set of input weights and hidden biases:

$P_j = [w_{h11}, w_{h12}, \ldots, w_{h1l}, w_{h21}, w_{h22}, \ldots, w_{h2l}, \ldots, w_{hn_h1}, w_{hn_h2}, \ldots, w_{hn_hl}, b_1, b_2, \ldots, b_{n_h}]$    (21)

It may be noted that all the input weights and hidden biases are randomly initialized within the range [-1,1].

(b) For each particle, evaluate the output weights and the fitness. Here, we set the root-mean squared error (RMSE) on the validation set as the fitness. The validation set is considered rather than the whole training set in order to avoid overfitting. The fitness can be defined as

$f(\cdot) = \sqrt{\dfrac{\sum_{j=1}^{N_v} \left\| \sum_{i=1}^{n_h} w_{oi}\,\varphi(w_{hi} \cdot x_j + b_i) - t_j \right\|_2^2}{N_v}}$    (22)

where $N_v$ indicates the number of validation samples.

(c) Update the $P_{jpbest}$ of all the particles and the $P_{gbest}$ of the swarm using the fitness value and the norm of the output weights as follows:

$P_{jbest} = \begin{cases} P_j & \text{if } f(P_j) < f(P_{jbest}) \text{ and } \|w_{oP_j}\| < \|w_{oP_{jbest}}\| \\ P_{jbest} & \text{otherwise} \end{cases}$    (23)

$P_{gbest} = \begin{cases} P_j & \text{if } f(P_j) < f(P_{gbest}) \text{ and } \|w_{oP_j}\| < \|w_{oP_{gbest}}\| \\ P_{gbest} & \text{otherwise} \end{cases}$    (24)

where $f(P_j)$, $f(P_{jbest})$, and $f(P_{gbest})$ denote the fitness values of the current particle j, the best position of particle j so far, and the global best position in the swarm, respectively. $w_{oP_j}$, $w_{oP_{jbest}}$, and $w_{oP_{gbest}}$ represent the output weights generated by the MP generalized inverse for the current particle j, the best position of particle j so far, and the global best position in the swarm, respectively.

(d) Calculate the values of the time-varying parameters $\omega$, $c_1$ and $c_2$ using Eqs. (17), (19) and (20), respectively. Subsequently, update the velocity and position of all the particles in the swarm based on Eqs. (16) and (15) and generate the new population.

(e) Based on the literature [49,50], all the input weights and biases should lie in the range [-1,1]. Thus, check the following condition to deal with the position out-of-bound issue:

$s_{jd}(k+1) = \begin{cases} -1 & \text{if } s_{jd}(k+1) < -1 \\ 1 & \text{if } s_{jd}(k+1) > 1 \end{cases}, \quad 1 \le j \le N_p, \; 1 \le d \le D$    (25)

(f) Repeat (b)-(e) until the maximum number of iterations is reached. Finally, the optimal input weights and hidden biases are obtained, which are applied to the testing data to find the performance of the system.

As the proposed scheme uses Eqs. (23) and (24) to find the optimal input weights and hidden biases, it tends to provide a smaller norm of the output weights of the SLFN. In turn, the smaller norm of the output weights leads to a smaller condition value of the hidden layer output matrix. In general, the proposed MPSO-ELM has the following advantages: it has no evolution parameters, it improves the conditioning, and it provides better generalization performance with a much more compact network. Compared to other gradient based methods and classical ELM, this approach does not need the activation function to be continuously differentiable. The pseudocode of the MPSO-ELM algorithm is given in Algorithm 2. Since the proposed scheme involves the major techniques DR2T, PCA+LDA and MPSO-ELM, the scheme is referred to as "DR2T + PCA+LDA + MPSO-ELM". The overall steps followed in the proposed system are articulated in Algorithm 3.

Algorithm 2 Pseudocode of the proposed MPSO-ELM scheme.

1: Initializing MPSO: Initialize all particles in the swarm, where each particle corresponds to input weights and biases. Initialize $\omega_i$, $\omega_f$, $c_{1i}$, $c_{1f}$, $c_{2i}$ and $c_{2f}$
2: Evaluate the output weights for each particle according to Eq. (13)
3: Calculate the validation error for each particle according to Eq. (22) as the fitness
4: Initialize $P_{pbest}$ and $P_{gbest}$ using the fitness values
5: while k < maximum number of iterations do
6:   Calculate the time-varying parameters $\omega$, $c_1$ and $c_2$ according to Eqs. (17), (19) and (20)
7:   for each particle do
8:     Update the velocity and position according to Eqs. (16) and (15)
9:     If any particle goes beyond the search space, amend it according to Eq. (25)
10:    Update $P_{jpbest}$ and $P_{gbest}$ according to Eq. (23) and Eq. (24), respectively
11:  end for
12:  k = k + 1
13: end while
14: Output the global best position
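To make Algorithm 2 concrete, the sketch below shows how one particle is decoded (Eq. (21)) and scored: the validation RMSE (Eq. (22)) and the output-weight norm are the pair compared in the pbest/gbest tests of Eqs. (23)–(24), and the TVAC schedules follow Eqs. (19)–(20). Function names and the sigmoid activation are illustrative assumptions.

```python
# Sketch of the per-particle computations in Algorithm 2; names and the
# sigmoid activation are assumptions, not the authors' code.
import numpy as np

def decode_particle(p, n_hidden, l):
    """Split a particle laid out as in Eq. (21) into W_h and b."""
    return p[: n_hidden * l].reshape(n_hidden, l), p[n_hidden * l:]

def fitness_and_norm(p, Xtr, Ttr, Xva, Tva, n_hidden):
    """Validation RMSE (Eq. (22)) and ||w_o|| for Eqs. (23)-(24)."""
    Wh, b = decode_particle(p, n_hidden, Xtr.shape[1])
    Htr = 1.0 / (1.0 + np.exp(-(Xtr @ Wh.T + b)))
    Wo = np.linalg.pinv(Htr) @ Ttr          # analytic solution, Eq. (13)
    Hva = 1.0 / (1.0 + np.exp(-(Xva @ Wh.T + b)))
    rmse = np.sqrt(np.mean(np.sum((Hva @ Wo - Tva) ** 2, axis=1)))
    return rmse, np.linalg.norm(Wo)

def tvac(k, max_iter, c1i=2.5, c1f=0.5, c2i=0.5, c2f=2.5):
    """Time-varying acceleration coefficients, Eqs. (19)-(20)."""
    return (c1i + k * (c1f - c1i) / max_iter,
            c2i + k * (c2f - c2i) / max_iter)

def accept(rmse, norm, best_rmse, best_norm):
    """pbest/gbest test of Eqs. (23)-(24): both criteria must improve."""
    return rmse < best_rmse and norm < best_norm
```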
5. Experimental design and evaluation

In this section, we discuss the experimental design and the performance measures used. In order to validate the proposed scheme "DR2T + PCA+LDA + MPSO-ELM", simulation has been carried out on three different datasets, namely, DS-66, DS-160, and DS-255. For statistical analysis, cross-validation (CV) has been employed, which avoids the over-fitting problem. Also, it makes the classifier generalize to independent datasets. In this work, we incorporate stratification into CV, which splits the folds in such a manner that each fold has a similar class distribution. Fig. 3 depicts the setting of a 5-fold CV for a single run. In each trial, one fold is used for testing, one for validation, and the rest are used for training. The validation set is used to find the parameters of MPSO-ELM, i.e., it helps us to know when to stop training. The test set is used to evaluate the performance over a run of five trials. For DS-66, we employ 6-fold stratified cross validation (SCV), while for the other two datasets we select 5-fold SCV. Additionally, we run the SCV procedure 10 times on the three datasets to avoid randomness.
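One run of this stratified split can be sketched with scikit-learn's StratifiedKFold as below; the exact rotation of the validation fold is an assumption of this sketch (the paper specifies it only through Fig. 3).

```python
# Sketch of one stratified CV run with a held-out validation fold,
# assuming scikit-learn; the fold rotation is an assumption.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def scv_splits(y, k=5, seed=0):
    """Yield (train, validation, test) index triples for one k-fold run."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    folds = [test for _, test in skf.split(np.zeros(len(y)), y)]
    for t in range(k):
        v = (t + 1) % k                   # next fold serves as validation
        train = np.concatenate([folds[i] for i in range(k) if i not in (t, v)])
        yield train, folds[v], folds[t]
```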
Algorithm 3 Implementation steps of the proposed PBDS.

Offline learning:
1: for each ground truth image do
2:   Enhance the contrast using CLAHE
3:   Apply DR2T with degree 2 on the enhanced image
4:   Obtain the DR2T coefficients and form a feature vector of dimension D
5: end for
6: Apply the PCA+LDA approach to reduce the dimension of the feature vectors from D to l, where l is calculated from the NCSV measure. Retain the corresponding l basis vector (BV) coefficients
7: Perform K-fold stratified cross validation on the whole dataset and generate the training, validation and testing data
8: Train the ELM algorithm using MPSO and find the optimized input weights and hidden biases. Calculate the RMSE on the validation set as the fitness
9: Compute the output weights using the optimized input weights and hidden biases
10: Evaluate the classification performance on the testing set

Online prediction:
1: Load the unknown MR image as input to the system
2: Preprocess the query image with CLAHE
3: Employ DR2T with degree 2 over the enhanced image
4: Obtain the DR2T features and store them in a feature vector
5: Find the reduced feature set by multiplying the feature vector with the retained BV coefficients
6: Feed the reduced feature set to the SLFN classifier trained by MPSO-ELM and predict the output label as healthy or pathological

Table 1
Statistical setting of K-fold SCV for the three benchmark datasets [4,5,30]. H = healthy, P = pathological.

Dataset | K-fold SCV | Total samples (H, P) | Training (H, P) | Validation (H, P) | Testing (H, P)
DS-66   | 6          | 18, 48               | 12, 32          | 3, 8              | 3, 8
DS-160  | 5          | 20, 140              | 12, 84          | 4, 28             | 4, 28
DS-255  | 5          | 35, 220              | 21, 132         | 7, 44             | 7, 44

It is worth mentioning here that the statistical setting for all three datasets is kept similar to the literature [4,5,30], as shown in Table 1.

Four different measures, namely, sensitivity (Se), specificity (Sp), precision (Pr) and accuracy (ACC), are used to test the effectiveness of the proposed scheme. Se is the fraction of pathological MR samples correctly predicted by the model, while Sp is the fraction of healthy MR samples correctly predicted by the model. ACC determines the fraction of correctly predicted samples (both pathological and healthy) in the total number of testing samples. Moreover, to compare the proposed MPSO-ELM method with other methods such as DE-ELM, PSO-ELM, ELM, and BPNN, two additional measures (the condition number and the norm of the output weights) are used.
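These four measures follow directly from the confusion counts, as the short sketch below shows; treating the pathological class as positive is a convention implied by the definitions above.

```python
# Se, Sp, Pr and ACC from confusion counts; pathological is treated as
# the positive class (label 1), healthy as label 0.
import numpy as np

def pbds_metrics(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))   # pathological, flagged
    tn = np.sum((y_pred == 0) & (y_true == 0))   # healthy, cleared
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    pr = tp / (tp + fp)
    acc = (tp + tn) / y_true.size
    return se, sp, pr, acc
```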
6. Experimental results and analysis

The proposed system was implemented using MATLAB on a machine with a 3.7 GHz processor, 8 GB RAM, and Windows 8 OS. The parameters used and the statistical setup were kept similar to other competent schemes to derive relative comparisons.

6.1. Preprocessing and feature extraction results
To enhance the contrast of the original MR images, CLAHE is utilized, which relies on the proper setting of its parameters. In the present case, the original MR image is divided into 64 contextual regions. The number of bins and the clip limit (β) are selected as 256 and 0.01, respectively. It may be noted that a uniform distribution scheme is selected for each region to obtain a flat histogram shape. The representative enhanced images corresponding to four original MR images are shown in Fig. 4. From the figure, it is evident that the affected regions in the enhanced images are clearer as compared to the original images.

Fig. 4. Preprocessing using CLAHE (β = 0.01 and region size = 8 × 8). Row 1 lists the original MR samples. Row 2 lists the corresponding contrast enhancement using CLAHE.

Next, we apply DR2T to each of the preprocessed images and extract the features as the transform coefficients of DR2T. In DR2T, we set the number of decomposition levels for the 1D DWT to 2 and use the Haar wavelet as the basis because of its simplicity. As the images are of size 256 × 256, the total number of features extracted by DR2T from a single image is 256 × 256 = 65536, which is very large. Further, 2D DWT and the ridgelet
transform are applied in place of DR2T and their coefficients are stored. The magnitudes of the coefficients of each transform are computed and the coefficients of each transform are then normalized by its largest coefficient. Finally, the normalized magnitudes of the coefficients are sorted in decreasing order to check the rate of decay of the coefficients (Fig. 5). It may be observed that DR2T provides the fastest decay in coefficients as compared to 2D DWT and the ridgelet. Therefore, DR2T generates sparse feature vectors, which strongly influences the classification performance.

Fig. 5. Normalized coefficient magnitudes of DR2T, ridgelet and 2D DWT, sorted in decreasing order.

6.2. Feature reduction results

We employ PCA+LDA to reduce the high dimensional feature space (65,536 features) derived from DR2T. The significant feature set is obtained according to the NCSV values of the features. The NCSV values for various numbers of features obtained by PCA+LDA and standard PCA are shown in Fig. 6. It can be observed that PCA+LDA preserves the maximum information with only three features, whereas standard PCA requires more features. For a chosen threshold value of 0.95, standard PCA and PCA+LDA result in 13 and three features, respectively. Therefore, the PCA+LDA approach is found to be more suitable for finding the significant features.

Fig. 6. NCSV values with respect to different numbers of features on the combination of three datasets.
6.3. Classification results

For the classification of MR images as healthy or pathological, we employ an improved learning algorithm called MPSO-ELM for the SLFN classifier. In this section, we first compare the performance of the proposed MPSO-ELM with other learning algorithms, namely, differential evolution with ELM (DE-ELM), PSO-ELM, ELM and BPNN. It may be noted that the population size and the maximum number of iterations for the MPSO-ELM, PSO-ELM and DE-ELM algorithms are kept the same, i.e., 20 and 30, respectively. The parameters used in the MPSO-ELM algorithm are obtained from experience and are listed in Table 2. For PSO-ELM, the values of the acceleration coefficients $c_1$ and $c_2$ are set to 2, while for DE-ELM, the crossover rate and scaling factor (F) are set to 0.9 and 0.8, respectively.

Table 2
Parameters list in MPSO-ELM.

Parameter | Value  | Description
$\omega_i$ | 0.4    | Initial inertia weight
$\omega_f$ | 0.9    | Final inertia weight
$c_{1i}$   | 2.5    | Initial value of cognitive component $c_1$
$c_{1f}$   | 0.5    | Final value of cognitive component $c_1$
$c_{2i}$   | 0.5    | Initial value of social component $c_2$
$c_{2f}$   | 2.5    | Final value of social component $c_2$
$P_j$      | [-1,1] | Particle initialization range

The performance of all the algorithms, namely MPSO-ELM, PSO-ELM, DE-ELM, ELM, and BPNN, is tested on the three benchmark datasets and reported in Tables 3–5. From the tables, it can be observed that DE-ELM, PSO-ELM, and MPSO-ELM obtain higher accuracy on all three datasets with comparatively fewer hidden neurons than basic ELM and BPNN. DE-ELM and PSO-ELM achieve accuracy similar to MPSO-ELM on DS-66 and DS-160, while they earn smaller accuracy compared to MPSO-ELM on DS-255. It can also be seen that classical ELM demands more hidden neurons than BPNN.

Moreover, it is observed that the condition value of the matrix H obtained by the DE-ELM, PSO-ELM and MPSO-ELM algorithms is smaller as compared to conventional ELM over all the datasets. The norm values of MPSO-ELM, PSO-ELM and DE-ELM are also found to be less than that of ELM, and therefore they can have better generalization performance than basic ELM. Further, it is seen that a smaller norm value of $w_o$ results in a smaller condition value of the matrix H. Among DE-ELM, PSO-ELM and MPSO-ELM, MPSO-ELM obtains the smallest condition value, the smallest norm value and the highest accuracy. Therefore, it can be concluded that the proposed MPSO-ELM can achieve better generalization performance with compact networks. The results reported in the tables are average values over 50 trials, and the parameters of all the schemes are determined through experimental evaluation.
Table 3
Performance comparison of different classifiers on DS-66.

Classifier | ACC (%) | Hidden neurons ($n_h$) | Norm    | Condition number ($K_2$)
BPNN       | 100.00  | 5                      | –       | –
ELM        | 100.00  | 7                      | 57.3199 | 6.0980e+03
DE-ELM     | 100.00  | 4                      | 21.4401 | 119.9567
PSO-ELM    | 100.00  | 4                      | 26.5115 | 135.9442
MPSO-ELM   | 100.00  | 4                      | 19.1374 | 106.3816

Table 4
Performance comparison of different classifiers on DS-160.

Classifier | ACC (%) | Hidden neurons ($n_h$) | Norm     | Condition number ($K_2$)
BPNN       | 100.00  | 5                      | –        | –
ELM        | 99.94   | 7                      | 167.9688 | 1.4007e+04
DE-ELM     | 100.00  | 3                      | 15.6922  | 57.1863
PSO-ELM    | 100.00  | 3                      | 16.2196  | 71.1975
MPSO-ELM   | 100.00  | 3                      | 12.8221  | 51.9365

Table 5
Performance comparison of different classifiers on DS-255.

Classifier | ACC (%) | Hidden neurons ($n_h$) | Norm    | Condition number ($K_2$)
BPNN       | 99.22   | 7                      | –       | –
ELM        | 99.18   | 10                     | 56.2456 | 3.6126e+03
DE-ELM     | 99.57   | 5                      | 11.1868 | 104.6756
PSO-ELM    | 99.49   | 5                      | 13.0537 | 392.9668
MPSO-ELM   | 99.69   | 5                      | 10.5511 | 96.4639

Table 6
10 × 5-fold SCV results of the DR2T + PCA+LDA + MPSO-ELM scheme over DS-255. Each entry gives the number of correctly classified samples per fold (accuracy in %).

Run | Fold 1      | Fold 2      | Fold 3      | Fold 4      | Fold 5      | Total
1   | 51 (100.00) | 51 (100.00) | 51 (100.00) | 50 (98.04)  | 51 (100.00) | 254 (99.61)
2   | 51 (100.00) | 50 (98.04)  | 51 (100.00) | 51 (100.00) | 50 (98.04)  | 253 (99.22)
3   | 51 (100.00) | 51 (100.00) | 51 (100.00) | 51 (100.00) | 51 (100.00) | 255 (100.00)
4   | 51 (100.00) | 51 (100.00) | 51 (100.00) | 51 (100.00) | 51 (100.00) | 255 (100.00)
5   | 51 (100.00) | 51 (100.00) | 51 (100.00) | 51 (100.00) | 51 (100.00) | 255 (100.00)
6   | 51 (100.00) | 50 (98.04)  | 51 (100.00) | 51 (100.00) | 51 (100.00) | 254 (99.61)
7   | 51 (100.00) | 51 (100.00) | 50 (98.04)  | 51 (100.00) | 50 (98.04)  | 253 (99.22)
8   | 51 (100.00) | 51 (100.00) | 51 (100.00) | 51 (100.00) | 51 (100.00) | 255 (100.00)
9   | 51 (100.00) | 51 (100.00) | 51 (100.00) | 51 (100.00) | 51 (100.00) | 255 (100.00)
10  | 51 (100.00) | 49 (96.08)  | 51 (100.00) | 51 (100.00) | 51 (100.00) | 253 (99.22)
Sum: 2542
Average: 254.2 (99.69)
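The condition values in Tables 3–5 follow Eq. (18); a short NumPy check of that quantity is given below (an illustrative helper, not the paper's code).

```python
# K_2(H) as printed in Eq. (18), via the eigenvalues of H^T H.
import numpy as np

def condition_number(H):
    eig = np.linalg.eigvalsh(H.T @ H)   # eigenvalues in ascending order
    return eig[-1] / eig[0]             # lambda_max / lambda_min
```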
To demonstrate the efficiency of the proposed MPSO-ELM algorithm with three features, an accuracy comparison has been made with the k-NN, random forest (RF), and SVM classifiers along with BPNN, ELM, and PSO-ELM on all three datasets, and the results are shown in Fig. 7. For DS-66, the accuracies earned by k-NN, BPNN, RF, SVM, ELM, and PSO-ELM are 99.55%, 100.00%, 99.39%, 99.70%, 100.00%, and 100.00%, respectively. The accuracies obtained by k-NN, BPNN, RF, SVM, ELM, and PSO-ELM are 99.50%, 100.00%, 99.38%, 100.00%, 99.94%, and 100.00%, respectively, on DS-160, while the accuracies are 99.02%, 99.22%, 99.14%, 99.37%, 99.18%, and 99.49%, respectively, on DS-255. MPSO-ELM earns ideal classification on DS-66 and DS-160, and obtains an accuracy of 99.69% on DS-255, which is superior to all the other classifiers. Therefore, the proposed learning algorithm is found to be the most suitable among all the learning algorithms considered.

Table 6 lists the correctly classified samples and the corresponding accuracies obtained by "DR2T + PCA+LDA + MPSO-ELM" on DS-255 during each trial of a 10 × 5-fold SCV process. The results in the table indicate that the proposed "DR2T + PCA+LDA + MPSO-ELM" scheme can correctly classify 2542 samples out of 2550 (2200 pathological and 350 healthy samples). Further, among the 2200 pathological samples, 2196 are correctly classified by our scheme and the remaining four samples are misclassified as healthy; among the 350 healthy samples, 346 are correctly classified and the remaining four are misclassified as pathological. Considering these results, the sensitivity (Se), specificity (Sp), and precision values of the "DR2T + PCA+LDA + MPSO-ELM" scheme are computed as 99.82%, 98.86%, and 99.82%, respectively, and are listed in Table 7.

To compare the efficacy of PCA+LDA over PCA, we employ both of them separately in the proposed system, yielding "DR2T + PCA + MPSO-ELM" and "DR2T + PCA+LDA + MPSO-ELM". The performances of both schemes over the three datasets are shown in Table 7. It may be noticed that the proposed "DR2T + PCA+LDA + MPSO-ELM" scheme earns better performance than "DR2T + PCA + MPSO-ELM" on all the datasets with fewer features. For DS-255, "DR2T + PCA + MPSO-ELM" obtains slightly higher specificity and precision values than "DR2T + PCA+LDA + MPSO-ELM".
244 D.R. Nayak et al. / Neurocomputing 282 (2018) 232–247
100
99.5
99
Classification accuracy (%)
98.5
98
97.5
97 k−NN
BPNN
96.5 RF
SVM
ELM
96
PSO−ELM
MPSO−ELM
95.5
95
DS−66 DS−160 DS−255
Datasets
Fig. 7. Classification accuracy of different classifiers over three standard datasets.
Table 7
Classification performance (%) of the proposed scheme with PCA and PCA+LDA over
three datasets.
Table 8
Classification accuracy (%) comparison with wavelet- and curvelet-based methods over three datasets.
However, the higher the sensitivity of a CAD system, the better its performance. Therefore, the proposed "DR2T + PCA+LDA + MPSO-ELM" scheme holds greater potential for making correct clinical decisions.
Further, to support the effectiveness of DR2T features over DWT and curvelet (FDCT) features, we have conducted an experiment in which DWT and FDCT features are each used separately in the proposed system; the results are reported in Table 8. It may be seen that the proposed scheme with DR2T features achieves better performance than with DWT and FDCT features on all the datasets. Here, the DWT features are derived from all the subbands of a 2-level decomposition. In addition, the DWT features used in the literature [3,5,6] are also tested, which results in smaller accuracy compared to the proposed scheme. It may be noted that the FDCT features are similar to those in the literature [62]. It can be concluded that, using DR2T features, the proposed scheme brings potential improvements in performance.
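For context on the Table 8 baseline, the DWT feature vector described above (all subbands of a 2-level decomposition) can be sketched with PyWavelets; the 'haar' wavelet and the random input array are illustrative assumptions, not the paper's exact settings.

import numpy as np
import pywt  # PyWavelets

def dwt_features(image, wavelet="haar", level=2):
    """Flatten all subbands of a 2-level 2D DWT into one feature vector."""
    coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)
    parts = [coeffs[0].ravel()]            # coarsest approximation subband
    for (cH, cV, cD) in coeffs[1:]:        # horizontal/vertical/diagonal details per level
        parts.extend([cH.ravel(), cV.ravel(), cD.ravel()])
    return np.concatenate(parts)

image = np.random.rand(64, 64)  # stand-in for a preprocessed MR slice
print(dwt_features(image).shape)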
Furthermore, to test the effectiveness of CLAHE in the proposed system, we study the results of the system with and without CLAHE. The results demonstrate that the system without CLAHE attains a slightly smaller accuracy (i.e., 99.61%).
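For the CLAHE ablation just described, the enhancement step can be reproduced with OpenCV; the clip limit and tile grid below are illustrative values, not the paper's reported settings.

import cv2
import numpy as np

def enhance(gray):
    """Apply CLAHE to an 8-bit grayscale MR slice (illustrative parameters)."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)

# Stand-in image; in practice this would be a loaded MR slice, e.g.
# gray = cv2.imread("slice.png", cv2.IMREAD_GRAYSCALE)
gray = (np.random.rand(256, 256) * 255).astype(np.uint8)
enhanced = enhance(gray)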
6.4. Comparison with other PBDSs

An extensive comparison with 21 existing competent PBDSs has been made on the three datasets in terms of feature size, run size, and classification accuracy, as given in Table 9. It can be seen that a large number of the PBDSs yield perfect classification on DS-66, but merely two schemes, "RT + PCA + LS-SVM" [4] and "DWPT + TE + GEPSVM" [22], along with our proposed "DR2T + PCA+LDA + MPSO-ELM" scheme, offer ideal classification on DS-160. Further, it is noticed that no existing PBDS earns perfect classification on DS-255, whereas the suggested system earns a higher classification accuracy, i.e., 99.69%, with fewer features than the others. Though the improvement in accuracy is marginal and comparable with some of the existing schemes, the result is obtained over a number of runs of a k-fold SCV procedure, which reflects that the improvement of the proposed scheme is robust and reliable. The use of MPSO-ELM in the proposed scheme leads to better generalization performance and a faster response on unknown testing data.

Table 9
Comparative analysis with other competent PBDSs on three standard datasets.
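The repeated stratified cross-validation protocol invoked above (e.g., 10 runs of 5-fold SCV on DS-255) can be sketched with scikit-learn; the SVC is a stand-in for the paper's MPSO-ELM, which is not a library component, and the synthetic data merely mirrors the DS-255 class sizes (220 pathological, 35 healthy).

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC  # stand-in classifier; the paper trains MPSO-ELM here

def repeated_scv(X, y, n_runs=10, n_folds=5):
    """Average accuracy over n_runs repetitions of stratified k-fold CV."""
    accs = []
    for run in range(n_runs):
        skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=run)
        for tr, te in skf.split(X, y):
            clf = SVC().fit(X[tr], y[tr])
            accs.append(clf.score(X[te], y[te]))
    return np.mean(accs)

X = np.random.rand(255, 7)                          # placeholder feature matrix
y = np.r_[np.ones(220), np.zeros(35)].astype(int)   # 220 pathological, 35 healthy
print(f"mean accuracy over runs: {repeated_scv(X, y):.4f}")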
6.5. Strengths and weaknesses

From the experiments, it has been shown that the proposed system markedly improves upon recent results. The suggested PBDS makes use of the discrete ripplet-II transform (DR2T) for feature extraction. Unlike other transforms such as the Fourier transform, DWT, DTCWT, and ridgelet, DR2T has the capability of representing 2D singularities along the arbitrarily shaped curves that are inherent in MR images. In addition, it provides rotation-invariant and sparse features, which is essential for the classification task. Further, MPSO-ELM is proposed to classify the MR images, as it possesses interesting properties that enable our system to provide a faster response to probe MR images. Additionally, it improves the conditioning and produces better generalization performance with a much more compact network. This enables the system to achieve better results than its counterparts.

However, the proposed system has the following limitations. It has been validated on three available datasets, which accommodate images from patients in the late and middle stages of disease; a larger dataset with images from all stages of disease should be tested in order to achieve better generalization. The current work solves a two-class classification problem, whereas solving a multi-class brain disease classification problem is highly in demand. Further, MPSO requires more parameters to tune, so an optimization scheme with fewer parameters can be investigated in the future.

7. Conclusions and future work

In this paper, an attempt has been made to develop an improved pathological brain detection system. The proposed scheme initially uses DR2T to extract relevant features from the enhanced brain MR images. A PCA+LDA approach has been employed to reduce the feature dimension. Finally, a new learning algorithm called MPSO-ELM is proposed to train the SLFN. The proposed scheme inherits the advantages of DR2T and ELM for detection of the pathological brain from MR images. The experimental results on three standard datasets demonstrate that the proposed scheme yields higher accuracy than other competent schemes with a minimum number of features. Moreover, it has been shown that the proposed MPSO-ELM method offers several advantages over other methods such as BPNN, SVM, and conventional ELM.

This work opens up many research directions. The proposed PBDS has been validated on various accessible datasets that are small in size; a bigger dataset collected online would further prove its potency. To improve the generalization behavior of the proposed scheme, images from various imaging modalities such as CT, MRSI, and PET can be considered. Hybridizing an optimization algorithm with fewer parameters with ELM is another possible future work. Deep learning algorithms could be investigated as potential alternatives to the proposed MPSO-ELM. In addition, other popular transforms such as the nonsubsampled contourlet transform (NSCT) and the contourlet can be tested as feature extractors.
References

[1] S. Chaplot, L.M. Patnaik, N.R. Jagannathan, Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network, Biomed. Signal Process. Control 1 (1) (2006) 86–92.
[2] C. Westbrook, Handbook of MRI Technique, John Wiley & Sons, Oxford, 2014.
[3] Y. Zhang, Z. Dong, L. Wu, S. Wang, A hybrid method for MRI brain image classification, Expert Syst. Appl. 38 (8) (2011) 10049–10053.
[4] S. Das, M. Chowdhury, K. Kundu, Brain MR image classification using multiscale geometric analysis of ripplet, Prog. Electromagn. Res. 137 (2013) 1–17.
[5] D.R. Nayak, R. Dash, B. Majhi, Brain MR image classification using two-dimensional discrete wavelet transform and AdaBoost with random forests, Neurocomputing 177 (2016) 188–197.
[6] E.A. El-Dahshan, H.M. Mohsen, K. Revett, A.B.M. Salem, Computer-aided diagnosis of human brain tumor through MRI: a survey and a new algorithm, Expert Syst. Appl. 41 (11) (2014) 5526–5545.
[7] M. Maitra, A. Chatterjee, A Slantlet transform based intelligent system for magnetic resonance brain image classification, Biomed. Signal Process. Control 1 (4) (2006) 299–306.
[8] E.S.A. El-Dahshan, T. Hosny, A.B.M. Salem, Hybrid intelligent techniques for MRI brain images classification, Digit. Signal Process. 20 (2) (2010) 433–441.
[9] Y. Zhang, S. Wang, L. Wu, A novel method for magnetic resonance brain image classification based on adaptive chaotic PSO, Prog. Electromagn. Res. 109 (2010) 325–343.
[10] Y. Zhang, L. Wu, S. Wang, Magnetic resonance brain image classification by an improved artificial bee colony algorithm, Prog. Electromagn. Res. 116 (2011) 65–79.
[11] Y. Zhang, S. Wang, G. Ji, Z. Dong, An MR brain images classifier system via particle swarm optimization and kernel support vector machine, Sci. World J. 2013 (2013) 1–9.
[12] Y. Zhang, L. Wu, An MR brain images classifier via principal component analysis and kernel support vector machine, Prog. Electromagn. Res. 130 (2012) 369–388.
[13] S. Wang, Y. Zhang, Z. Dong, S. Du, G. Ji, J. Yan, J. Yang, Q. Wang, C. Feng, P. Phillips, Feed-forward neural network optimized by hybridization of PSO and ABC for abnormal brain detection, Int. J. Imaging Syst. Technol. 25 (2) (2015) 153–164.
[14] Y. Chen, L. Shi, Q. Feng, J. Yang, H. Shu, L. Luo, J.-L. Coatrieux, W. Chen, Artifact suppressed dictionary learning for low-dose CT image processing, IEEE Trans. Med. Imaging 33 (12) (2014) 2271–2292.
[15] Y.-D. Zhang, S. Chen, S.-H. Wang, J.-F. Yang, P. Phillips, Magnetic resonance brain image classification based on weighted-type fractional Fourier transform and nonparallel support vector machine, Int. J. Imaging Syst. Technol. 25 (4) (2015) 317–327.
[16] Y. Zhang, Z. Dong, A. Liu, S. Wang, G. Ji, Z. Zhang, J. Yang, Magnetic resonance brain image classification via stationary wavelet transform and generalized eigenvalue proximal support vector machine, J. Med. Imaging Health Inf. 5 (7) (2015) 1395–1403.
[17] D.R. Nayak, R. Dash, B. Majhi, Classification of brain MR images using discrete wavelet transform and random forests, in: Proceedings of the Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), IEEE, 2015, pp. 1–4.
[18] Y.-D. Zhang, X.-Q. Chen, T.-M. Zhan, Z.-Q. Jiao, Y. Sun, Z.-M. Chen, Y. Yao, L.-T. Fang, Y.-D. Lv, S.-H. Wang, Fractal dimension estimation for developing pathological brain detection system based on Minkowski–Bouligand method, IEEE Access 4 (2016) 5937–5947.
[19] D.R. Nayak, R. Dash, B. Majhi, Pathological brain detection using curvelet features and least squares SVM, Multimed. Tools Appl. (2016) 1–24.
[20] M. Saritha, K.P. Joseph, A.T. Mathew, Classification of MRI brain images using combined wavelet entropy based spider web plots and probabilistic neural network, Pattern Recognit. Lett. 34 (16) (2013) 2151–2156.
[21] G. Yang, Y. Zhang, J. Yang, G. Ji, Z. Dong, S. Wang, C. Feng, Q. Wang, Automated classification of brain images using wavelet-energy and biogeography-based optimization, Multimed. Tools Appl. 75 (2015) 1–17.
[22] Y. Zhang, Z. Dong, S. Wang, G. Ji, J. Yang, Preclinical diagnosis of magnetic resonance (MR) brain images via discrete wavelet packet transform with Tsallis entropy and generalized eigenvalue proximal support vector machine (GEPSVM), Entropy 17 (4) (2015) 1795–1813.
[23] D.R. Nayak, R. Dash, B. Majhi, J. Mohammed, Non-linear cellular automata based edge detector for optical character images, Simulation (2016) 1–11.
[24] Y. Zhang, S. Wang, Z. Dong, P. Phillips, G. Ji, J. Yang, Pathological brain detection in magnetic resonance imaging scanning by wavelet entropy and hybridization of biogeography-based optimization and particle swarm optimization, Prog. Electromagn. Res. 152 (2015) 41–58.
[25] X. Zhou, S. Wang, W. Xu, G. Ji, P. Phillips, P. Sun, Y. Zhang, Detection of pathological brain in MRI scanning based on wavelet-entropy and naive Bayes classifier, in: Proceedings of Bioinformatics and Biomedical Engineering, 2015, pp. 201–209.
[26] G. Zhang, Q. Wang, C. Feng, E. Lee, G. Ji, S. Wang, Y. Zhang, J. Yan, Automated classification of brain MR images using wavelet-energy and support vector machines, in: Proceedings of the International Conference on Mechatronics, Electronic, Industrial and Control Engineering (MEIC-15), 2015, pp. 683–686.
[27] Y.-D. Zhang, S.-H. Wang, X.-J. Yang, Z.-C. Dong, G. Liu, P. Phillips, T.-F. Yuan, Pathological brain detection in MRI scanning by wavelet packet Tsallis entropy and fuzzy support vector machine, SpringerPlus 4 (1) (2015) 1–16.
[28] Y. Zhang, S. Wang, P. Sun, P. Phillips, Pathological brain detection based on wavelet entropy and Hu moment invariants, Bio-Med. Mater. Eng. 26 (s1) (2015) S1283–S1290.
[29] S. Wang, Y. Zhang, X. Yang, P. Sun, Z. Dong, A. Liu, T.-F. Yuan, Pathological brain detection by a novel image feature fractional Fourier entropy, Entropy 17 (12) (2015) 8278–8296.
[30] Y. Zhang, Y. Sun, P. Phillips, G. Liu, X. Zhou, S. Wang, A multilayer perceptron based smart pathological brain detection system by fractional Fourier entropy, J. Med. Syst. 40 (7) (2016) 1–11.
[31] S. Wang, P. Phillips, J. Yang, P. Sun, Y. Zhang, Magnetic resonance brain classification by a novel binary particle swarm optimization with mutation and time-varying acceleration coefficients, Biomed. Eng./Biomedizinische Technik (2016) 1–10.
[32] S. Wang, S. Lu, Z. Dong, J. Yang, M. Yang, Y. Zhang, Dual-tree complex wavelet transform and twin support vector machine for pathological brain detection, Appl. Sci. 6 (6) (2016) 169.
[33] D.R. Nayak, R. Dash, B. Majhi, Stationary wavelet transform and AdaBoost with SVM based pathological brain detection in MRI scanning, CNS Neurol. Disord. Drug Targets 16 (2) (2017) 137–149.
[34] K.A. Johnson, J.A. Becker, The Whole Brain Atlas, 1999. http://www.med.harvard.edu/AANLIB/.
[35] S.M. Pizer, R.E. Johnston, J.P. Ericksen, B.C. Yankaskas, K.E. Muller, Contrast-limited adaptive histogram equalization: speed and effectiveness, in: Proceedings of the First Conference on Visualization in Biomedical Computing, IEEE, 1990, pp. 337–345.
[36] E.D. Pisano, S. Zong, B.M. Hemminger, M. DeLuca, R.E. Johnston, K. Muller, M.P. Braeuning, S.M. Pizer, Contrast limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms, J. Digit. Imaging 11 (4) (1998) 193–200.
[37] M.N. Do, M. Vetterli, The finite ridgelet transform for image representation, IEEE Trans. Image Process. 12 (1) (2003) 16–28.
[38] E.J. Candès, D.L. Donoho, Ridgelets: a key to higher-dimensional intermittency? Philos. Trans. R. Soc. Lond. Math. Phys. Eng. Sci. 357 (1760) (1999) 2495–2509.
[39] E.J. Candès, D.L. Donoho, Curvelets – A Surprisingly Effective Nonadaptive Representation for Objects with Edges, Vanderbilt University Press, Nashville, TN, 2000, pp. 105–120.
[40] E. Candès, L. Demanet, D. Donoho, L. Ying, Fast discrete curvelet transforms, Multiscale Model. Simul. 5 (3) (2006) 861–899.
[41] E.J. Candès, D.L. Donoho, New tight frames of curvelets and optimal representations of objects with piecewise C2 singularities, Commun. Pure Appl. Math. 57 (2) (2004) 219–266.
[42] J. Xu, L. Yang, D. Wu, Ripplet: a new transform for image processing, J. Vis. Commun. Image Represent. 21 (7) (2010) 627–639.
[43] M. Ghahremani, H. Ghassemian, Remote sensing image fusion using ripplet transform and compressed sensing, IEEE Geosci. Remote Sens. Lett. 12 (3) (2015) 502–506.
[44] J. Xu, D. Wu, Ripplet transform type II transform for feature extraction, IET Image Process. 6 (4) (2012) 374–385.
[45] A. Cormack, The Radon transform on a family of curves in the plane, Proc. Am. Math. Soc. 83 (2) (1981) 325–330.
[46] A. Cormack, The Radon transform on a family of curves in the plane. II, Proc. Am. Math. Soc. 86 (2) (1982) 293–298.
[47] J. Yang, J.-Y. Yang, Why can LDA be performed in PCA transformed space? Pattern Recognit. 36 (2) (2003) 563–566.
[48] A.M. Martínez, A.C. Kak, PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell. 23 (2) (2001) 228–233.
[49] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (1) (2006) 489–501.
[50] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in: Proceedings of the IEEE International Joint Conference on Neural Networks, Vol. 2, IEEE, 2004, pp. 985–990.
[51] G.-B. Huang, D.H. Wang, Y. Lan, Extreme learning machines: a survey, Int. J. Mach. Learn. Cybern. 2 (2) (2011) 107–122.
[52] Q.-Y. Zhu, A.K. Qin, P.N. Suganthan, G.-B. Huang, Evolutionary extreme learning machine, Pattern Recognit. 38 (10) (2005) 1759–1763.
[53] R.C. Eberhart, J. Kennedy, Particle swarm optimization, in: Proceedings of the IEEE Conference on Neural Networks, IEEE, 1995, pp. 1942–1948.
[54] R.C. Eberhart, J. Kennedy, A new optimizer using particle swarm theory, in: Proceedings of the Sixth International Symposium on Micro Machine and Human Science, IEEE, 1995, pp. 39–43.
[55] Y. Shi, R. Eberhart, A modified particle swarm optimizer, in: Proceedings of the IEEE International Conference on Evolutionary Computation (IEEE World Congress on Computational Intelligence), IEEE, 1998, pp. 69–73.
[56] Y. Xu, Y. Shu, Evolutionary extreme learning machine based on particle swarm optimization, in: Proceedings of the International Symposium on Neural Networks, Springer, 2006, pp. 644–652.
[57] G. Zhao, Z. Shen, C. Miao, Z. Man, On improving the conditioning of extreme learning machine: a linear case, in: Proceedings of the Seventh International Conference on Information, Communications and Signal Processing (ICICS), IEEE, 2009, pp. 1–5.
[58] F. Han, H.-F. Yao, Q.-H. Ling, An improved evolutionary extreme learning machine based on particle swarm optimization, Neurocomputing 116 (2013) 87–93.
[59] S. Suresh, R.V. Babu, H. Kim, No-reference image quality assessment using modified extreme learning machine classifier, Appl. Soft Comput. 9 (2) (2009) 541–552.
[60] A. Ratnaweera, S.K. Halgamuge, H.C. Watson, Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients, IEEE Trans. Evolut. Comput. 8 (3) (2004) 240–255.
[61] P.L. Bartlett, The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network, IEEE Trans. Inf. Theory 44 (2) (1998) 525–536.
[62] D.R. Nayak, R. Dash, B. Majhi, V. Prasad, Automated pathological brain detection system: a fast discrete curvelet transform and probabilistic neural network based approach, Expert Syst. Appl. 88 (2017) 152–164.

Deepak Ranjan Nayak is currently pursuing a Ph.D. in Computer Science and Engineering at National Institute of Technology, Rourkela, India. His current research interests include medical image analysis, pattern recognition, and cellular automata. He serves as a reviewer for many SCI-indexed journals such as IET Image Processing, Multimedia Tools and Applications, Computer Vision and Image Understanding, Computers and Electrical Engineering, Fractals, Journal of Medical Imaging and Health Informatics, and IEEE Access. He is a student member of the IEEE.

Ratnakar Dash received his Ph.D. degree from National Institute of Technology, Rourkela, India, in 2013. He is currently working as an Assistant Professor in the Department of Computer Science and Engineering at National Institute of Technology, Rourkela, India. His fields of interest include signal processing, image processing, intrusion detection systems, and steganography. He is a professional member of the IEEE, IE, and CSI. He has published forty research papers in journals and conferences of international repute.

Banshidhar Majhi received his Ph.D. degree from Sambalpur University, Odisha, India, in 2001. He is currently working as a Professor in the Department of Computer Science and Engineering at National Institute of Technology, Rourkela, India. His fields of interest include image processing, data compression, cryptography and security, parallel computing, soft computing, and biometrics. He is a professional member of MIEEE, FIETE, LMCSI, IUPRAI, and FIE. He has served as a reviewer for many international journals and conferences. He is the author and co-author of over 70 journal papers of international repute. Besides, he has 100 conference papers and holds 2 patents in his name. He received the Samanta Chandra Sekhar Award for the year 2016 from Odisha Bigyan Academy for his outstanding contributions to Engineering and Technology.