Expert Systems With Applications: Rahimeh Rouhi, Mehdi Jafari, Shohreh Kasaei, Peiman Keshavarzian

ESWA 9558, No. of Pages 13, Model 5G
26 September 2014

Expert Systems with Applications xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

Benign and malignant breast tumors classification based on region growing and CNN segmentation

Rahimeh Rouhi a,⇑, Mehdi Jafari b, Shohreh Kasaei c, Peiman Keshavarzian a

a Department of Computer Engineering, Kerman Branch, Islamic Azad University, Kerman, Iran
b Department of Electrical Engineering, Kerman Branch, Islamic Azad University, Kerman, Iran
c Department of Computer Engineering, Sharif University of Technology, Tehran, Iran

Article info

Article history:
Available online xxxx

Keywords:
Breast cancer
Segmentation
Cellular neural network
Region growing
Genetic algorithm
Artificial neural network

Abstract

Breast cancer is regarded as one of the most frequent causes of mortality among women. As early detection of breast cancer increases the chance of survival, a system that diagnoses suspicious masses in mammograms is important. In this paper, two automated methods are presented to classify masses in mammograms as benign or malignant. In the first proposed method, segmentation is done using an automated region growing whose threshold is obtained by a trained artificial neural network (ANN). In the second proposed method, segmentation is performed by a cellular neural network (CNN) whose parameters are determined by a genetic algorithm (GA). Intensity, textural, and shape features are extracted from the segmented tumors. GA is used to select appropriate features from the set of extracted features. In the next stage, ANNs are used to classify the mammograms as benign or malignant. To evaluate the performance of the proposed methods, different classifiers (such as random forest, naïve Bayes, SVM, and KNN) are used. Results of the proposed techniques on the MIAS and DDSM databases are promising. The obtained sensitivity, specificity, and accuracy rates are 96.87%, 95.94%, and 96.47%, respectively.

© 2014 Published by Elsevier Ltd.
⇑ Corresponding author. Tel.: +98 9135011345.
E-mail addresses: [email protected], [email protected] (R. Rouhi), mjafari@iauk.ac.ir (M. Jafari), [email protected] (S. Kasaei), [email protected] (P. Keshavarzian).

http://dx.doi.org/10.1016/j.eswa.2014.09.020
0957-4174/© 2014 Published by Elsevier Ltd.

Please cite this article in press as: Rouhi, R., et al. Benign and malignant breast tumors classification based on region growing and CNN segmentation. Expert Systems with Applications (2014), http://dx.doi.org/10.1016/j.eswa.2014.09.020

1. Introduction

According to the American Cancer Society (ACS), about 40,030 breast cancer deaths were predicted in the USA in 2013 (39,620 women and 410 men). Breast cancer refers to an abnormal multiplication of cells in the breast tissue. It is the second most common cause of death in women after lung cancer. Since 1989, breast cancer mortality rates have declined sharply in young women under 50 years of age. Such reduced mortality rates can be associated with early detection and effective treatments (American Cancer Society, 2013). Generally, there are several types of abnormalities in mammograms, such as masses and micro-calcifications (Kopans, 1998). A mass is any lesion, lump, or protuberance in the breast, and micro-calcifications are calcium deposits due to secretion of milk glands in the breast, appearing in clusters or individually. The size of individual micro-calcifications ranges from twenty to hundreds of microns in diameter. Since micro-calcifications have higher contrast than masses in mammograms, detection and diagnosis of masses is a more challenging problem (Bator & Nieniewski, 2012; Vyborny & Giger, 1994).

Recent developments in digital mammography imaging systems have aimed at better diagnosis of abnormalities in the breast (Winsberg, Elkin, Macy, Bordaz, & Weymouth, 1967) and have increased the chance of survival (Akay, 2009). Computer aided diagnosis (CAD) systems are sets of automatic or semi-automatic tools using computer technology to help radiologists with detection and classification of breast abnormalities (Freer & Ulissey, 2001). As a result, CAD systems seem appealing to radiologists. Generally, a CAD system consists of segmentation, feature extraction, and classification stages (Tahmasbi, Saki, & Shokouhi, 2011). Efficient segmentation of mammograms, in which the main characteristics of tumors (especially boundaries) are preserved, can positively influence the subsequent stages. The main contribution of this paper is the proposal of two segmentation approaches, based on improvements of region growing and CNN, that aim to preserve tumor boundary information in order to diagnose benign and malignant masses in mammograms.

A mammogram contains two regions: breast and non-breast. To diagnose masses, the region of interest (ROI) is cropped from the breast area. Pixels inside masses usually have the highest intensity in an ROI. They usually present visually continuous variation. These characteristics suggest a region growing procedure to
segment masses in ROIs (Wei, Chen, & Liu, 2012). Region growing is a region-based segmentation method in which masses are segmented by grouping similar neighboring pixels of seed points. For example, if a similarity measure of two adjacent pixels is greater than a threshold, these pixels are considered similar and thus are grouped together. The grouping of neighboring pixels continues until no similar pixels remain.

Compared to the detection of seed points for an image, selection of an appropriate value for the threshold is a more significant problem, because the threshold affects how well the boundary information of the masses is preserved and, in turn, the benign and malignant diagnosis (Cao, Hao, Zhu, & Xia, 2010). In spite of these problems, the region growing process (as a region-based method in boundary extraction) takes the intensity, spatial, and connectivity information into account, contrary to pixel- and edge-based methods (Rabottino, Mencattini, Salmeri, Caselli, & Lojacono, 2011). There are different approaches to improve traditional region growing segmentation; some recent studies are as follows.

Wei et al. proposed a boundary segmentation technique based on region growing as a part of their content-based mammogram retrieval system. The brightest pixels in an ROI were considered as seed points. The threshold was determined experimentally: 100 images were tested using different thresholds and then the threshold was set to 1.07 (Wei et al., 2012). Choosing a constant threshold for all images caused smaller or larger mass segments than the reference mass segment, which are called under- and over-segmentation, respectively. As such, a method is needed to determine an adaptive threshold for each image. Berber et al. adjusted region growing to set the threshold adaptively by using a mass size estimate produced from the intensity histogram in OTSU segmentation (Berber, Alpkocak, Balci, & Dicle, 2013). In Lixin, Yanan, Bin, and Yuhong (2013), to reach the best growth criteria, the trend of the gradient of different mass images was considered in order to segment masses in mammograms. Furthermore, the high frequency pixels with intensity above 200 were determined as seed points. Görgel et al. proposed a local seed region growing (LSRG) method in which the selection of seed points and the similarity criterion were determined according to local and global information (i.e., the mean and standard deviation of 3 × 3 neighboring pixels as local, and of the entire image as global). Also, to set the threshold, different definitions were introduced and tested (Görgel, Sertbas, & Ucan, 2013). In Melouah (2013), a training dataset containing the features extracted from images and their corresponding thresholds was built in order to produce an adaptive threshold in region growing. Then, by using a K-nearest neighbor (KNN) classifier and the training dataset, the threshold of an image was estimated adaptively. In the first proposed method of this study, the region growing-based segmentation method is improved by using the intensity features extracted from ROIs and applying an ANN to generate an adaptive threshold.

Because of the high computational cost of the region growing method, it is unsuitable for segmenting high resolution images such as mammograms (Rabottino et al., 2011). Therefore, to overcome this shortcoming, a segmentation method using CNN with hardware accessibility and parallel computing is applied in the second proposed method of this study.

A CNN is a model introduced by Chua and Yang based on ANN and cellular automata. Due to the computational power of its architecture, it is suited for real-time and high-speed parallel signal processing (Chua & Yang, 1988). Cells in the network map the input signal into the output with a dynamic behavior that is determined by 19 parameters named the cloning template. Setting the CNN template is the most important issue in the image processing application of CNN (Matsumoto, Chua, & Suzuki, 1990). In different studies, the segmentation of a variety of images (such as mammograms, MR, and infrared images) has been done using CNN. In Sampaio, Diniz, Silva, and de Paiva (2009) and Borges Sampaio et al. (2011), mammogram segmentation was done using the CNN templates Textudil and Blur that were introduced in Zarándy et al. (1994). With respect to the results obtained by Borges et al., it was pointed out that CNN parameters should be optimized to reach better segmentation results. Template learning techniques have been used widely in studies to design CNN templates. In these techniques, the correlation between the input and the desired output images is found and the template is obtained. Some recent relevant studies on designing CNN parameters are as follows.

A method to detect edges in noisy images using CNN was introduced in Li, Liao, Li, Huang, and Li (2011), in which the CNN template was designed with a linear matrix inequality (LMI)-based method. Cerasa et al. introduced a method to train CNN by using a genetic algorithm in automatic segmentation of brain MR images (Cerasa et al., 2012). Wang et al. proposed CNN learning algorithms based on genetic algorithm and particle swarm optimization (PSO) for infrared images. Based on their results, the genetic algorithm showed fast convergence compared to PSO, especially for large images (Wang, Yang, Xie, & An, 2014). In the second proposed technique, the CNN parameters are set using a genetic algorithm with an attempt to preserve the tumor boundary in mammogram segmentation, especially for the images in the DDSM and MIAS databases.

In order to evaluate the two proposed methods, the following stages are performed. After segmentation of the tumors based on improved region growing and CNN, as pointed out above, different features such as intensity, shape, and texture are extracted from the segmented tumors. Then, to select appropriate features and reduce the computational cost, the feature selection process is performed by a genetic algorithm. The performance of the proposed segmentation methods is practically revealed in the benign and malignant classification stage. The proposed methods are evaluated by different classifiers such as MLP, random forest, naïve Bayes, SVM, and KNN. It should be noted that both of the proposed methods are new approaches compared to the existing methods to diagnose benign and malignant breast tumors in mammograms. The obtained results demonstrate that segmentation using the improved CNN, with its capability of parallel computing, combined with MLP classification is promising compared with the existing methods in terms of sensitivity, specificity, accuracy, and AUC.

To provide a better understanding of the proposed algorithms, the article is organized in the following order. In Section 2, the proposed method is described in subsections on image acquisition, pre-processing, segmentation, post-processing, feature extraction, feature selection, and classification. In Section 3, experimental results of segmentation and classification are explained, and finally in Section 4 the conclusion is drawn.

2. Proposed method

The stages applied in breast cancer detection and diagnosis systems are similar to those of artificial intelligence-based systems and generally consist of pre-processing, segmentation, feature extraction, feature selection, and classification (Ganesan et al., 2013). The general flow of our proposed automatic techniques is introduced in Figs. 1 and 2. The difference between the two proposed techniques lies in the way segmentation is applied. As shown in Fig. 1, the first method uses a region growing algorithm whose required threshold is obtained by a trained neural network. As shown in Fig. 2, the segmentation of the second method is performed by a cellular neural network whose parameters are determined by a genetic algorithm for partitioning the image into meaningful regions. In order to elaborate more on the proposed methods, this section is divided into several subsections as follows.
[Fig. 1: flowchart. Image acquisition → pre-processing → segmentation (region growing) → feature extraction → feature selection (GA) → tumor classification (ANN) → benign / malignant. A side branch (intensity feature extraction → ANN → produced threshold) feeds the segmentation step.]

Fig. 1. The first proposed automated technique based on region growing segmentation.

[Fig. 2: flowchart. Image acquisition → pre-processing → segmentation (CNN) → feature extraction → feature selection (GA) → tumor classification (ANN) → benign / malignant. A side branch (manual segmented images → genetic algorithm → produced template) feeds the segmentation step.]

Fig. 2. The second proposed automated technique based on cellular neural network segmentation.

2.1. Image acquisition

The proposed techniques are applied on a portion of the images taken from two common databases: the mammography imaging analysis society (MIAS) database and the digital database for screening mammography (DDSM). To examine the performance of the proposed techniques, we used 93 and 170 images containing malignant and benign masses, respectively, from both MIAS and DDSM databases. In the following, the characteristics of the two databases are described.

2.1.1. Mammography imaging analysis society

The database is available online on the website http://peipa.essex.ac.uk/info/mias.html for academic research purposes. The database contains 161 pairs of mediolateral oblique (MLO) view images with 1024 × 1024 resolution. The images are taken from film-screen mammographic imaging of the United Kingdom national breast screening program. After digitizing the images, they have been annotated by expert radiologists based on background tissue (fatty, fatty-glandular, and dense-glandular), class of abnormality (calcification, well-defined, spiculated, architectural distortion, asymmetry, ill-defined, and normal), and severity of abnormality (benign or malignant). Also, the center coordinates of each abnormality and its approximate radius are given.

2.1.2. Digital database for screening mammography

It is another collection of mammograms, consisting of 2620 cases and 43 volumes. It is freely available on http://marathon.csee.usf.edu/Mammography/Database.html. A “case” is a collection of mammograms and information corresponding to one mammography exam of one patient. A “volume” is a collection of cases collected together for ease of distribution. Images in this database were taken in MLO and craniocaudal (CC) views. Each image has been annotated by at least two expert radiologists and the location of abnormalities is given as a chain code. Also, this database contains metadata of each abnormality using the breast imaging reporting and data system (BI-RADS) lexicon. Severity of abnormalities is divided into benign and malignant, and classes of abnormalities are categorized into well-defined, ill-defined, and spiculated masses, architectural distortion, asymmetry, calcification, and normal. Also, characteristics of the background tissue of abnormalities are divided into fatty, fatty-glandular, and dense-glandular.

2.2. Pre-processing

In the pre-processing stage, the images are first cropped to obtain the ROIs containing abnormal tissues and masses. This is done by using the existing coordinates corresponding to the center and the approximate radius of each abnormality for MIAS images and the chain code for DDSM images. Since mammograms are taken under different conditions, they are affected by noise and some artifacts. Moreover, they usually do not have the contrast required for accurate analysis by the two proposed techniques. As such, local area histogram equalization is used (Bick & Diekmann, 2010) and then median filtering (Mohanty, Senapati, & Lenka, 2013) is applied to suppress noise. In the histogram equalization stage, the intensity of image pixels is stretched to extend the contrast. Median filtering is a nonlinear operation (Forsyth & Ponce, 2003) often used in image processing to reduce
“salt and pepper” and speckle noise. Fig. 3 shows the successive pre-processing steps applied on an image taken from the DDSM database.

2.3. Segmentation

Here, the segmentation process separates the tumor areas from the background tissue in mammograms. There are two major approaches in segmentation: (i) region-based methods (such as region growing and split/merge using quad-tree decomposition), in which similarities are detected, and (ii) boundary-based methods (such as thresholding and gradient edge detection), in which discontinuities are detected and linked to form region boundaries (Oliver et al., 2010). The segmentation of nontrivial images is one of the most difficult tasks in image processing. In the proposed diagnosis techniques two region-based methods, region growing and cellular neural network, are proposed for segmenting tumors. These are explained in the following.

2.3.1. Region growing method

The segmentation method applied in the first proposed technique is an improvement of the region growing segmentation method. Region growing is a region-based method starting with seed points in the image. Seeds propagate until the specified stopping criterion is satisfied (Zucker, 1976). The steps of region growing are as follows:

(i) Input image = ROI; (x, y) = position of the maximum intensity in the ROI; t = threshold produced by the trained ANN; mean of region = I(x, y).
(ii) Grow the region until the distance between the region intensity mean and the intensity of the new pixels becomes higher than the threshold t.
(iii) Add new 4-neighbor pixels.
(iv) Add a neighbor if it is inside the image and not already part of the segmented area.
(v) Add the pixel with intensity nearest to the mean of the region to the region.
(vi) Calculate the new mean of the region.
(vii) Save the x and y coordinates of the pixel (for the neighbor add process).
(viii) Return to step (ii).

Since mammograms do not all have the same intensity contrast, using a constant threshold to segment the images by region growing leads to inaccurate segmentation results. Consequently, an automatic method is needed to determine an appropriate threshold. To this end, a trained artificial neural network (ANN) is proposed.

Artificial neural networks are a major part of machine learning algorithms. Basically, a neural network is constructed from processing elements, namely neurons, which are connected together by synapses. Each neuron calculates the weighted sum of its input signals and then an activation function is applied to limit the output of the neuron to a pre-specified interval. In order to map input vectors to output vectors, the weights of the neural network should be tuned. This process is known as training or learning (Haykin, 1994).

A multi-layer neural network (MLP) is composed of one or several hidden layers. An MLP is trained using the back propagation (BP) algorithm. In this algorithm, the aim is minimizing the error E between the network output and target vectors. The steps in BP are as follows (Chauhan, Goel, & Dhingra, 2012):

(i) Initialize the weights in the neural network randomly.
(ii) Repeat.
(iii) For each training sample x:
(iv) Forward propagate x through the network.
(v) Backward propagate the error E through the network.
(vi) End.
(vii) Until terminating condition (minimum error E).
(viii) End.

Forward propagation: calculate the output corresponding to each training sample x, by passing x through the neurons in the network.

Backward propagation: the errors of each neuron in the output and hidden layers are calculated according to

\[ \delta_k = O_k (1 - O_k)(t_k - O_k) \tag{1} \]

\[ \delta_h = O_h (1 - O_h) \sum_k w_{kh}\, \delta_k \tag{2} \]

The value of the weight related to each synapse is updated by

\[ w_{ji} = w_{ji} + \Delta w_{ji} \tag{3} \]

where \(\Delta w_{ji}\) is defined as

\[ \Delta w_{ji} = \eta\, \delta_j X_{ij} \tag{4} \]

and \(\eta\) is the learning rate.

By applying a 2-layer neural network, an appropriate threshold is obtained for each image. The input and target matrices for training the ANN consist of the values of the intensity histogram features extracted from the ROIs (see Table 1) and the threshold obtained for each image for region growing. After training, the neural network is capable of generating a relevant threshold for segmentation. As such, each image is segmented more accurately by applying its own generated threshold.
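The region growing steps of Section 2.3.1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `region_grow` is hypothetical, the per-image threshold `threshold` is passed in as a plain number (the trained ANN that would produce it is outside the sketch), and the "distance" in step (ii) is taken to be the absolute difference between a candidate pixel's intensity and the current region mean.

```python
import numpy as np

def region_grow(roi, threshold):
    """Grow a region from the brightest pixel of a 2-D ROI.

    `threshold` stands in for the per-image value that the paper obtains
    from a trained ANN (assumption: here it is simply a number).
    Returns a boolean mask of the segmented region.
    """
    roi = np.asarray(roi, dtype=float)
    rows, cols = roi.shape
    # Step (i): the seed is the pixel of maximum intensity.
    r0, c0 = np.unravel_index(np.argmax(roi), roi.shape)
    seed = (int(r0), int(c0))
    mask = np.zeros(roi.shape, dtype=bool)
    mask[seed] = True
    region_sum, region_size = roi[seed], 1
    candidates = set()  # pixels that border the current region

    def add_neighbours(r, c):
        # Steps (iii)-(iv): 4-neighbours inside the image, not yet segmented.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr, nc]:
                candidates.add((nr, nc))

    add_neighbours(*seed)
    while candidates:
        mean = region_sum / region_size
        # Step (v): pick the candidate whose intensity is nearest the mean.
        best = min(candidates, key=lambda p: abs(roi[p] - mean))
        # Step (ii): stop once the nearest candidate is too far from the mean.
        if abs(roi[best] - mean) > threshold:
            break
        candidates.remove(best)
        mask[best] = True
        # Step (vi): update the running mean of the region.
        region_sum += roi[best]
        region_size += 1
        # Step (vii): remember the new pixel's neighbours for later growth.
        add_neighbours(*best)
    return mask
```

Calling `region_grow(roi, t)` with a small `t` tends to under-segment and a large `t` tends to over-segment, which is the behaviour that motivates a per-image threshold. The linear search for the best candidate keeps the sketch short; a priority queue would be the usual choice for full-resolution images.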

Fig. 3. Result of pre-processing step. (a) Original image, (b) Contrast enhanced image, (c) Median filtered image.
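The two pre-processing operations shown in Fig. 3 can be illustrated with a short NumPy sketch. It makes two simplifying assumptions that are not in the paper: global histogram equalization is used in place of the local area variant, and the median filter uses a fixed 3 × 3 window with edge replication. The function names are hypothetical.

```python
import numpy as np

def equalize_histogram(img):
    """Global histogram equalization of an 8-bit grayscale image.

    Assumption: a global transform stands in for the local area
    equalization used in the paper.
    """
    img = np.asarray(img, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest present level
    total = img.size
    if total == cdf_min:               # constant image: nothing to stretch
        return img.copy()
    # Map each gray level so that the output CDF is approximately linear.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]

def median_filter3(img):
    """3 x 3 median filter with edge replication (window size is an assumption)."""
    img = np.asarray(img)
    padded = np.pad(img, 1, mode="edge")
    rows, cols = img.shape
    # Stack the nine shifted views and take the per-pixel median.
    windows = [padded[r:r + rows, c:c + cols]
               for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0).astype(img.dtype)

def preprocess(roi):
    """Contrast enhancement followed by noise suppression, as in Section 2.2."""
    return median_filter3(equalize_histogram(roi))
```

Equalizing first and filtering second follows the order given in Section 2.2; the median step then removes isolated outliers without blurring edges as strongly as a mean filter would.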


2.3.2. Cellular neural network method

The segmentation in the second proposed technique is based on a CNN whose parameters are determined by using a genetic algorithm. A CNN is an array of nonlinear programmable analog processors named cells. Each cell interacts with its neighboring cells, similar to what happens in cellular automata. CNNs are capable of parallel processing by mapping inputs to outputs, similar to what happens in an ANN. Fig. 4 shows a 2-dimensional CNN in which each cell interacts with other cells in a 4 × 4 neighborhood.

Fig. 4. Two-dimensional CNN with 4 × 4 cells.

Corresponding to an image with M × N pixels, a CNN with M × N cells is considered. Each cell in the CNN is a dynamic system whose state changes over time based on a mathematical model, and its output is a nonlinear function of the cell state. For a cell \(C_{ij}\), a neighborhood \(S_{ij}(r)\) with radius \(r \ge 0\) is defined according to

\[ S_{ij}(r) = \{ C_{kl} : \max(|k - i|, |l - j|) \le r,\; 1 \le k \le M,\; 1 \le l \le N \} \tag{5} \]

In this work, the CNN model in Chua & Yang (1988) is applied. Each cell in this model is a simple nonlinear analog circuit as shown in Fig. 5. It consists of a linear capacitor, an independent current source, an independent voltage source, two linear resistors, and at most 2m linear voltage-controlled current sources (m being the number of neighbor cells of the considered unit). The voltage \(v_{xij}(t)\) across the capacitor is the state of the cell \(C_{ij}\), while \(v_{uij}\) and \(v_{yij}(t)\) represent the input and the output, respectively, and \(I_{xy}(i, j; k, l; t)\) and \(I_{xu}(i, j; k, l)\) are defined as

\[ I_{xy}(i, j; k, l; t) = A(i, j; k, l)\, v_{ykl}(t) \tag{6} \]

\[ I_{xu}(i, j; k, l) = B(i, j; k, l)\, v_{ukl} \tag{7} \]

Fig. 5. Structure of cell in CNN.

Parameters A(i, j; k, l) and B(i, j; k, l) specify the interaction between CNN cells. The output \(v_{yij}(t)\) is determined using the nonlinear voltage-controlled current source \(I_{yx}\), which is the only nonlinear element of the cell:

\[ I_{yx} = \frac{1}{R_y} f(v_{xij}(t)) \tag{8} \]

where f is the characteristic function of the nonlinear controlled current source, defined as

\[ f(v_{xij}(t)) = \frac{1}{2}\left( |v_{xij}(t) + 1| - |v_{xij}(t) - 1| \right) \tag{9} \]

By applying Kirchhoff's laws, the state of a cell in the CNN is defined according to the nonlinear differential equation

\[ C \frac{dv_{xij}(t)}{dt} = -\frac{1}{R_x} v_{xij}(t) + \sum_{C_{kl} \in S_{ij}(r)} A(i, j; k, l)\, f(v_{xkl}(t)) + \sum_{C_{kl} \in S_{ij}(r)} B(i, j; k, l)\, v_{ukl} + z \tag{10} \]

The most important problem in using a CNN is setting the parameters A(i, j; k, l), B(i, j; k, l), and the scalar z; i.e., training the CNN. The template [A(i, j; k, l), B(i, j; k, l), z] can be determined by an evolutionary process such as a genetic algorithm, which is a stochastic optimization algorithm originating from a reproduction procedure. Each chromosome in GA is characterized as

\[ CH_1 = [z, b_{33}, b_{32}, b_{31}, b_{23}, b_{22}, b_{21}, b_{13}, b_{12}, b_{11}, a_{33}, a_{32}, a_{31}, a_{23}, a_{22}, a_{21}, a_{13}, a_{12}, a_{11}] \]

In order to decrease the computational cost, matrices A and B are considered symmetric. Thus, the structure of each chromosome (template) is modeled as

\[ CH_2 = [a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, b_{11}, b_{12}, b_{13}, b_{21}, b_{22}, z] \]

Table 1
Intensity histogram features for training ANN to produce an appropriate threshold.

Feature No.   Feature
1             Mean
2             Variance
3             Skewness
4             Kurtosis
5             Entropy
6             Energy
7             Contrast

The training phase of the CNN using the genetic algorithm is done using two typical images and their corresponding images manually segmented by expert radiologists. In Figs. 6 and 7, two typical images taken from the DDSM and MIAS databases are shown. The steps in training the CNN are as follows:

(i) Generate the initial population by selecting numbers in the range [−6, 6] randomly.
(ii) Repeat.
(iii) Calculate the fitness function corresponding to each element in the population (chromosome).
(iv) Select pairs of the best ranking chromosomes as parents.
(v) Apply the crossover operator.
(vi) Apply the mutation operator.
(vii) Select one of the training images (manually segmented images) randomly, see Figs. 6 and 7.
(viii) Set the parameters in the chromosome produced in step (vi) in the template.
(ix) Segment the image using the produced template and the CNN algorithm.

Please cite this article in press as: Rouhi, R., et al. Benign and malignant breast tumors classification based on region growing and CNN segmentation. Expert
Systems with Applications (2014), http://dx.doi.org/10.1016/j.eswa.2014.09.020
ESWA 9558 No. of Pages 13, Model 5G
26 September 2014

6 R. Rouhi et al. / Expert Systems with Applications xxx (2014) xxx–xxx

Population size: 30 455


Crossover percentage: 0.8 456
Mutation percentage: 0.3 457
Mutation rate: 0.02 458
Chromosome selection: Roulette Wheel method. 459
Terminating condition: maximum iteration 100. 460
461
Fitness function in GA is defined as Eq. (11). 462
463
X
M X
N
diff ðGÞ ¼ IGij  T ij ð11Þ
i¼1 j¼1 465

where symbol  denotes the xor operator and is operated between 466
two pixels in location (i, j) of two binary images; i.e., manually seg- 467
mented image (target image) and the segmented image by CNN. 468
Generally, Eq. (11) produces the number of unequal pixels in the 469
target and segmented images. By subtracting the produced value 470
of Eq. (11) from the size of image, the number of equal pixels in 471
both binary images is generated. Hence, fitness function is evalu- 472
ated by 473
474
G
fitnessðCNN Þ ¼ M  N  diff ðGÞ ð12Þ 476

and the aim in GA is maximizing the related value (the number of 477
equal pixels) (Cerasa et al., 2012). 478
After applying all the mentioned procedures in this section, the 479
CNN template is generated as 480
2 3 481
0:4650 0:5773 0:6385
Fig. 6. Two selected images taken from DDSM database to train CNN: (a) original 6 7
images, (b) manually segmented images. A ¼ 4 0:3090 2:8782 0:3090 5
0:6385 0:5773 0:4650
2 3
2:8451 3:8326 1:0681
6 7
B ¼ 4 1:4856 0:9895 1:4856 5 z ¼ 4:9082
1:0681 3:8326 2:8451 483

Due to the fact that images are taken under varying conditions 484
in different databases, segmentation cannot be done precisely and 485
accurately for MIAS images using the above obtained template. As 486
a result, the following template is produced for MIAS database as 487
2 3 488
0:3912 4:6618 4:9898
6 7
A ¼ 4 2:8552 7:1702 2:8552 5
4:9898 4:6618 0:3912
2 3
2:5636 4:9664 2:9539
6 7
B ¼ 4 2:5140 3:2211 2:5140 5 z ¼ 6:3426
2:9539 4:9664 2:5636 490

2.4. Post processing

Undesirable objects inevitably appear in the resulting images. To remove such imperfections, the area of the objects is taken into consideration. In addition, because of the importance of the tumor margin in the clinical recognition process, a sequence of morphological operations (dilation and erosion) is used to preserve the marginal information of the tumor. Fig. 8 shows the results of the post-processing stage.
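The post-processing idea can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the paper does not give the area threshold, the connectivity, the structuring element, or the exact order of the morphological operations, so the 4-connectivity, the 3 × 3 square element, the dilation-then-erosion order, and the names `remove_small_objects`, `dilate`, `erode`, and `post_process` are all choices made here.

```python
from collections import deque
import numpy as np

def remove_small_objects(mask, min_area):
    """Keep only 4-connected foreground objects with at least min_area pixels."""
    mask = np.asarray(mask, dtype=bool)
    keep = np.zeros_like(mask)
    seen = np.zeros_like(mask)
    rows, cols = mask.shape
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first search collects the pixels of one object.
            queue, pixels = deque([(r0, c0)]), []
            seen[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and mask[rr, cc] and not seen[rr, cc]):
                        seen[rr, cc] = True
                        queue.append((rr, cc))
            if len(pixels) >= min_area:
                for r, c in pixels:
                    keep[r, c] = True
    return keep

def dilate(mask):
    """Binary dilation with a 3 x 3 square structuring element."""
    padded = np.pad(mask, 1, constant_values=False)
    out = np.zeros_like(mask)
    for dr in range(3):
        for dc in range(3):
            out |= padded[dr:dr + mask.shape[0], dc:dc + mask.shape[1]]
    return out

def erode(mask):
    """Binary erosion with a 3 x 3 square structuring element."""
    padded = np.pad(mask, 1, constant_values=False)
    out = np.ones_like(mask)
    for dr in range(3):
        for dc in range(3):
            out &= padded[dr:dr + mask.shape[0], dc:dc + mask.shape[1]]
    return out

def post_process(mask, min_area):
    """Drop small objects, then apply dilation followed by erosion."""
    cleaned = remove_small_objects(mask, min_area)
    return erode(dilate(cleaned))

# Toy example: a 3 x 3 ring with a one-pixel hole plus an isolated pixel.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
mask[3, 3] = False
mask[0, 6] = True
result = post_process(mask, min_area=4)
```

In this toy case the isolated pixel is discarded by the area criterion and the hole inside the ring is filled, while the outer boundary of the object is left where it was.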

Fig. 7. Two selected images taken from MIAS database to train CNN: (a) original images, (b) manually segmented images.

(x) Compute fitness function.
(xi) Until terminating condition.
(xii) End.

The parameters related to perform GA are selected as the following:

2.5. Feature extraction

In this work, 51 features related to the intensity histogram, shape, and texture are extracted from the segmented images to characterize the segmented masses appropriately. Compared to other regions in mammograms, tumors appear with a higher intensity. Shape features affect the strength of benign and malignant classification, because tumors belonging to the same class are of similar shape. As malignant tumors often have erratic


Fig. 8. Post-processing step: (a) original image, (b) segmented image, and (c) segmented image after post-processing.

texture compared to benign tumors, textural features are extracted from the gray-level co-occurrence matrix (GLCM) (Jalaja, Bhagvati, Deekshatulu, & Pujari, 2005), which contains the second-order statistical information of neighboring pixels of an image. Most of the extracted features are listed in Tables 2–4.

Table 3
GLCM features.

Feature No.   Feature
1             Energy
2             Homogeneity
3             Correlation
4             Contrast

Table 4
Region features/shape measurements.

Feature No.   Feature
1             Area
2             Major Axis length
3             Minor Axis length
4             Convex Area
5             Eccentricity
6             Diameter
7             Orientation
8             Solidity
9             Perimeter
10            Extent
11            Skeleton
12            Spiculated

Among the extracted features, Zernike moments are good descriptors of object shape. They do not require precise border information of each object. As such, even if the objects are not segmented very well, they can achieve good results (Wei, Li, Chau, & Li, 2009).

Zernike moments map an image onto a set of complex Zernike polynomials. Since Zernike polynomials are orthogonal to each other, they represent image features without overlap or redundant information (Liu, Babbs, & Delp, 2001). The process of calculating the Zernike moments of an image is as follows (Tahmasbi et al., 2011):

(i) Calculate radius polynomials.
(ii) Calculate Zernike basic functions.
(iii) Map image matrix on Zernike basic functions to obtain Zernike moments.

The discrete form of the Zernike moments for an image with N × N pixels is

$$Z_{n,m} = \frac{n+1}{\lambda_N} \sum_{c=0}^{N-1} \sum_{r=0}^{N-1} f(x,y)\, V^{*}_{n,m}(x,y) = \frac{n+1}{\lambda_N} \sum_{c=0}^{N-1} \sum_{r=0}^{N-1} f(x,y)\, R_{n,m}(\rho_{xy})\, e^{-jm\theta_{cr}} \qquad (13)$$

where $0 \le \rho_{xy} \le 1$ and $\lambda_N$ is a normalization factor. Also, n is a non-negative integer representing the order of the radius polynomial, and the angle repetition m is an integer satisfying

$$n - |m| = \text{an even number} \quad \text{and} \quad |m| \le n \qquad (14)$$

where $R_{n,m}$ and $V_{n,m}$ are the radius polynomials and the Zernike two-dimensional basic function, respectively. Although Zernike moments of higher orders hold more information, they increase the computational cost. Therefore, the Zernike moments and the angle of each moment are computed up to the 4th order (30 features out of the 51 extracted features). The produced Zernike moments and their physical meaning are shown in Table 5 (Amroabadi, Ahmadzadeh, & Hekmatnia, 2011).

In addition to the extracted features, the spiculated feature is also important in classification. It is obtained according to Minavathi, Murali, and Dinesh (2011) by calculating the curvature angle for each pixel on the edge of the tumor. As shown in Fig. 9, to calculate the curvature angle, two appropriate pixels on both sides of the point A are considered. After plotting the lines passing from point A through these pixels, the angle between the two lines is evaluated. The same process is applied to each pixel on the edge of the tumor. A curvature angle of less than 70° at an edge pixel indicates the spiculated feature.

Finally, for each tumor, the number of such curvature angles is counted. The spiculated feature can affect tumor classification: malignant tumors have more spiculated pixels than benign tumors.
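The curvature-angle computation can be sketched in Python as below. The sketch assumes the tumor edge is available as an ordered, closed list of (x, y) points and that the two "appropriate pixels" are the contour points a fixed number of positions (`step`) before and after point A; the paper does not state how these pixels are chosen, so `step` and the function names are illustrative assumptions. Only the 70° threshold comes from the text.

```python
import math

def curvature_angles(contour, step=2):
    """Angle in degrees at each contour point A between the two lines that
    join A to the points `step` positions before and after it."""
    n = len(contour)
    angles = []
    for i, (ax, ay) in enumerate(contour):
        bx, by = contour[(i - step) % n]
        cx, cy = contour[(i + step) % n]
        v1 = (bx - ax, by - ay)
        v2 = (cx - ax, cy - ay)
        norm = math.hypot(*v1) * math.hypot(*v2)
        if norm == 0:
            # Degenerate case (coincident points): treat as a flat edge.
            angles.append(180.0)
            continue
        cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
        cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding error
        angles.append(math.degrees(math.acos(cos_a)))
    return angles

def spiculated_count(contour, step=2, threshold_deg=70.0):
    """Number of contour points whose curvature angle is below the threshold."""
    return sum(a < threshold_deg for a in curvature_angles(contour, step))

# A square outline has only 90 and 180 degree angles: no spiculated points.
square = [(0, 0), (2, 0), (4, 0), (4, 2), (4, 4), (2, 4), (0, 4), (0, 2)]
# A thin diamond has two sharp tips (about 22.6 degrees each).
diamond = [(0, 0), (1, 5), (2, 0), (1, -5)]
print(spiculated_count(square, step=1))   # 0
print(spiculated_count(diamond, step=1))  # 2
```

A smooth, round outline therefore yields a low count, whereas an outline with many sharp protrusions yields a high one, which matches the intended use of the feature.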
Table 2
Intensity histogram features.

Feature No.   Feature
1             Mean
2             Variance
3             Skewness
4             Kurtosis
5             Entropy

2.6. Feature selection

Feature selection is done in order to select appropriate features from extracted features. It improves the prediction accuracy and decreases the computational cost (Lai, Li, & Biscof, 1989). Feature selection is a search problem in a large space of solutions (different mixtures of features). For selecting features, the genetic algorithms


Table 5
Zernike moments and their physical meaning.

Index  n  m  Zernike polynomial        Physical meaning
0      0  0  1                         Piston: constant term
1      1  1  2ρ sin θ                  Distortion: tilt in x direction
2      1  1  2ρ cos θ                  Distortion: tilt in y direction
3      2  2  √6 ρ² sin 2θ              Astigmatism with axis at ±45°
4      2  0  √3 (2ρ² − 1)              Spherical defocus: field curvature
5      2  2  √6 ρ² cos 2θ              Astigmatism with axis at 0° or 90°
6      3  3  √8 ρ³ sin 3θ              Triangular astigmatism, based on x-axis (Trefoil)
7      3  1  √8 (3ρ³ − 2ρ) sin θ       Primary coma along x-axis
8      3  1  √8 (3ρ³ − 2ρ) cos θ       Primary coma along y-axis
9      3  3  √8 ρ³ cos 3θ              Triangular astigmatism, based on y-axis (Trefoil)
10     4  4  √10 ρ⁴ sin 4θ             Quatrefoil
11     4  2  √10 (4ρ⁴ − 3ρ²) sin 2θ    5th order astigmatism
12     4  0  √5 (6ρ⁴ − 6ρ² + 1)        Spherical
13     4  2  √10 (4ρ⁴ − 3ρ²) cos 2θ    5th order astigmatism
14     4  4  √10 ρ⁴ cos 4θ             Quatrefoil
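A discrete Zernike moment in the sense of Eq. (13) can be sketched in Python as follows. Several details are assumptions made for this sketch rather than facts from the paper: pixel centres are mapped linearly onto the square [−1, 1] × [−1, 1], only pixels inside the unit disc contribute, and the normalization factor is taken as the number of such pixels. The radial polynomial uses the standard closed-form expression. The function names are illustrative.

```python
import math
import numpy as np

def radial_polynomial(rho, n, m):
    """Radial polynomial R_{n,m}(rho) for n - |m| even and |m| <= n."""
    m = abs(m)
    result = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + m) // 2 - s)
                    * math.factorial((n - m) // 2 - s)))
        result += coeff * rho ** (n - 2 * s)
    return result

def zernike_moment(image, n, m):
    """Return (magnitude, phase in degrees) of Z_{n,m} for a square image."""
    if abs(m) > n or (n - abs(m)) % 2 != 0:
        raise ValueError("need |m| <= n and n - |m| even (Eq. 14)")
    image = np.asarray(image, dtype=float)
    size = image.shape[0]
    if image.shape != (size, size):
        raise ValueError("image must be square")
    # Assumption: pixel centres are mapped linearly onto [-1, 1] x [-1, 1].
    coords = (2 * np.arange(size) + 1 - size) / size
    x, y = np.meshgrid(coords, coords)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0          # only pixels inside the unit disc are used
    lam = inside.sum()           # assumed normalization factor
    # Complex conjugate of the basis function V_{n,m} = R_{n,m} * exp(j*m*theta).
    conj_basis = radial_polynomial(rho, n, m) * np.exp(-1j * m * theta)
    z = (n + 1) / lam * np.sum(image[inside] * conj_basis[inside])
    return abs(z), math.degrees(math.atan2(z.imag, z.real))

# Magnitude and angle for every valid (n, m) pair up to the 4th order.
rng = np.random.default_rng(0)
roi = rng.random((16, 16))
features = [zernike_moment(roi, n, m)
            for n in range(5) for m in range(-n, n + 1, 2)]
print(len(features))  # 15
```

The 15 valid (n, m) pairs up to the 4th order, each contributing a magnitude and an angle, account for the 30 Zernike features mentioned in the text. The magnitude is unchanged when the image is rotated by 90°, which is the property that makes these moments useful as shape descriptors.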

Table 6
Features which are frequently selected using GA in two proposed techniques.

Feature names
Mean                 Minor Axis Length
Variance             Convex Area
Skewness             Eccentricity
Kurtosis             Diameter
Entropy              Orientation
Contrast             Spiculated
Homogeneity          Solidity
Correlation          Perimeter
Area                 Extent
Major Axis Length    Zernike
Skeleton             Zernike angle

Fig. 9. Procedure of calculating curvature angle.

are used with two different forms of chromosome creation and fitness function. In the first form, each chromosome is a binary string in which each gene indicates the presence or absence of a feature (encoded as 0 or 1). In the second form, each chromosome has 20 genes, and each gene takes a value in the range [1, 51] in order to select one of the 51 features. The general steps of the genetic algorithm are as follows:

(i) Create the initial population of chromosomes.
(ii) Repeat.
(iii) Calculate fitness function corresponding to each element in population (individual).
(iv) Select pairs of the best ranking chromosomes as parents.
(v) Apply crossover operator.
(vi) Apply mutation operator.
(vii) Until terminating condition.
(viii) End.

The parameters related to GA in the first form of chromosome creation and fitness function are:

Population size: 80
Crossover percentage: 0.8
Mutation percentage: 0.5
Mutation rate: 0.02
Chromosome selection: Roulette Wheel method.
Terminating condition: maximum iteration 100 and difference between the fitness values in 2 iterations of GA > 0.002.

Fitness function (Peng, Long, & Ding, 2005):

$$MI(x, y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \qquad (15)$$

where x and y are two feature vectors, p(x, y) is the joint probability density function of the x and y vectors, and p(x) and p(y) are the marginal probability density functions of the feature vectors.

The parameters related to GA in the second form of chromosome creation and fitness function are:

Population size: 50
Crossover percentage: 0.8
Mutation percentage: 0.5
Mutation rate: 0.02
Chromosome selection: Roulette Wheel method.
Terminating condition: maximum iteration 100.

Fitness function (Minavathi et al., 2011):

$$\mathrm{Merit} = \frac{k R_{cf}}{\sqrt{k + k(k-1) R_{ff}}} \qquad (16)$$

where k is the number of selected features, $R_{cf}$ is the mean correlation coefficient between the selected features and the target vector, and $R_{ff}$ is the mean correlation coefficient between pairs of selected features.
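The two fitness functions in Eqs. (15) and (16) can be sketched in Python as below. The sketch assumes that the feature vectors have already been discretized, so that probabilities can be estimated from relative frequencies; the paper does not say how the probability densities are estimated. The function names are illustrative.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Eq. (15) for two discrete sequences of equal length (natural log)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        total += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return total

def merit(k, r_cf, r_ff):
    """Eq. (16): merit of a subset of k features.

    r_cf: mean feature-target correlation; r_ff: mean feature-feature correlation.
    """
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Identical vectors share all their information; independent ones share none.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # log(2), about 0.693
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
# Four uncorrelated features, each moderately correlated with the target.
print(merit(4, 0.5, 0.0))  # 1.0
```

Eq. (16) rewards subsets whose features correlate with the target but not with each other: for fixed k and R_cf, the merit decreases as R_ff grows.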


Fig. 10. Results of segmentation based on region growing (first proposed method) for two images: (a) original images, (b) segmented images, and (c) target images.

Fig. 11. Results of segmentation based on CNN (second proposed method) for two images: (a) original images, (b) segmented images, and (c) target images.
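The agreement between segmented and target images such as those in Figs. 10 and 11 is quantified in Section 3.1 with the DICE and Jaccard criteria (Eqs. (17) and (18)). A minimal Python sketch of both measures is given here; it assumes binary masks of equal shape, and returning 1.0 when both masks are empty is a convention chosen for this sketch, not something stated in the paper.

```python
import numpy as np

def dice(a, b):
    """Eq. (17): 2|A and B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as identical
    return float(2.0 * np.logical_and(a, b).sum() / total)

def jaccard(a, b):
    """Eq. (18): |A and B| / |A or B| for two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as identical
    return float(np.logical_and(a, b).sum() / union)

# Toy masks: 3 pixels each, 2 in common, 4 in the union.
manual = np.array([[1, 1, 0],
                   [0, 1, 0]])
output = np.array([[1, 0, 0],
                   [0, 1, 1]])
print(round(dice(manual, output), 3))  # 0.667
print(jaccard(manual, output))         # 0.5
```

Both measures equal 1 for identical masks and 0 for disjoint masks; for any other pair, DICE is larger than Jaccard.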

After performing GA algorithms with different chromosome structures and fitness functions, some features that are in bold in Table 6 are selected frequently among the extracted features in the implementation of the two automatic proposed methods on the DDSM and MIAS databases.

2.7. Classification

Since MLPs are appropriate tools in pattern recognition (Chauhan et al., 2012; Cheng et al., 2006; Kuo, Hsiao, Huang, & Chen, 2008; Pawar & Patil, 2013; Tahmasbi et al., 2011), they are used in this work. The values of the features selected by the GA in the feature selection step are given to the neural networks as input vectors. Therefore, 20 neurons are placed in the input layer. To perform a 2-class categorization that distinguishes the benign tumors from the malignant ones, one neuron is placed in the output layer of the neural network to produce the outputs 0 or 1 for benign and malignant tumors, respectively.

3. Experimental results

To evaluate the performance of the methods, 93 mammography ROIs containing 54 benign and 39 malignant tumors from MIAS database, and 170 mammography ROIs containing 74 benign and



96 malignant tumors from DDSM database were taken. The proposed algorithms were run in MATLAB® (version R2012a) on a PC with an Intel Pentium 4 (2.93 GHz) processor, 4 GB of RAM, and the Windows 7 operating system. The obtained results are presented in two sub-sections, segmentation performance and classification performance, for the two proposed methods.

3.1. Segmentation performance

Segmentation accuracy determines the eventual success or failure of segmentation procedures. To measure the segmentation performance of the proposed methods, two frequently used criteria, DICE and Jaccard, are applied. DICE indicates the degree of overlap between two binary images (Dice, 1945), and Jaccard measures their degree of similarity (Cheetham & Hazel, 1969). These are defined by

$$\mathrm{DICE}(A, B) = \frac{2|A \cap B|}{|A| + |B|} \qquad (17)$$

$$\mathrm{Jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (18)$$

where A and B are the manually segmented image and the output image of the segmentation method, respectively.

The segmentation methods in the proposed systems are applied to the images in Figs. 10 and 11. Their corresponding results are listed in Tables 7 and 8. As shown, the second proposed technique achieves better DICE and Jaccard values.

Table 7
Segmentation results related to the first proposed technique.

Image in Fig. 10   DICE   Jaccard
1                  0.92   0.87
2                  0.91   0.89

Table 8
Segmentation results related to the second proposed technique.

Image in Fig. 11   DICE   Jaccard
1                  0.93   0.88
2                  0.95   0.90

3.2. Classification performance

Errors of the MLP neural networks used are calculated by the mean squared error (MSE) according to

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( O(i) - F(i) \right)^{2} \qquad (19)$$

where O and F are the target and output matrices, respectively. Other related metrics are also calculated as (Metz, 1978):

TP: true positive, the classification result is positive in presence of malignancy.
TN: true negative, the classification result is negative in being benign.
FP: false positive, the classification result is positive in being benign.
FN: false negative, the classification result is negative in presence of malignancy.

According to the above definitions, specificity (accuracy on the negative class), sensitivity (accuracy on the positive class), and accuracy (recognition of both negative and positive classes) are defined as

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (20)$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (21)$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (22)$$

The receiver operating characteristic (ROC) is one of the standard criteria for evaluating the classification performance of the proposed methods. The ROC curve is plotted in a two-dimensional space where the x-axis and y-axis represent 1 − specificity (the false positive rate) and sensitivity (the true positive rate), respectively. It shows the tradeoff between the hit rates and false alarm rates of classifiers. To summarize the ROC curve, the area under the ROC curve (AUC) was measured. Sensitivity is a significant factor in diagnosing tumors, since an increase in the value of FN might lead to patients' death.

To determine a good ANN structure, various ANN structures with different numbers of layers and nodes are applied. Two-layer neural networks often obtained better results. The structure of the ANNs applied in the proposed techniques is listed in Tables 9 and 10 according to the stage in which they are applied and their specific purpose (segmentation or classification). The activation function of the output layer and the learning rule for the applied ANNs are the linear function and back propagation (BP), respectively. The second column in Table 9 gives the structure of the neural network applied in segmentation to determine an appropriate threshold in the first proposed method. To classify tumor types in the MIAS database, another ANN is used, as shown in the third column. Similarly, for the images taken from the DDSM database, other ANNs are applied, as given in the fourth and fifth columns. In

Table 9
Architecture of applied ANNs in first proposed technique.

Purpose                                 Segmentation    Classification  Segmentation  Classification
Database                                MIAS            MIAS            DDSM          DDSM
Type of ANN                             MLP             MLP             MLP           MLP
Number of layers                        3               2               2             2
Neurons in input layer                  7               20              7             20
Neurons in hidden layer(s)              6, 3            20              10            3
Neurons in output layer                 1               1               1             1
Learning rule                           BP              BP              BP            BP
Training function                       trainlm         trainlm         trainlm       trainbr
Activation function of hidden layer(s)  logsig, tansig  logsig          tansig        logsig
Error                                   MSE             MSE             MSE           MSE
Type of problem                         Regression      Classification  Regression    Classification


Table 10
Architecture of applied ANNs in second proposed technique.

Purpose                              Classification  Classification
Database                             MIAS            DDSM
Type of ANN                          MLP             MLP
Number of layers                     2               2
Neurons in input layer               20              20
Neurons in hidden layer              20              3
Neurons in output layer              1               1
Learning rule                        BP              BP
Training function                    trainlm         trainbr
Activation function of hidden layer  logsig          logsig
Error                                MSE             MSE
Type of problem                      Classification  Classification

the second proposed technique, other ANNs are used to classify mammograms in the MIAS and DDSM databases (see Table 10).

The results of AUC, specificity, sensitivity, and accuracy over 10 runs of the proposed algorithms, in the best and average cases, are presented in Table 11.

According to Table 11, the second method applied to the DDSM database has, in both the best and average cases, better sensitivity than the other methods. It is desirable to have both high sensitivity and high specificity, but this is frequently not possible. Higher sensitivity generally comes at the cost of lower specificity, and vice versa (Lalkhen & McCluskey, 2008). In tumor diagnosis, higher sensitivity is more important, as it supports early detection of malignancy. In the first proposed method, the higher specificity shows strength in diagnosing benign tumors, whereas in the second proposed method sensitivity is higher. For better illustration, the ROC curves of the proposed methods on the two databases in the best case are plotted in Fig. 12.

Fig. 12. ROC curves for proposed techniques on MIAS and DDSM databases.

Furthermore, to assess the reliability of the two proposed methods in tumor classification on images taken from the MIAS and DDSM databases, the 10-fold cross validation method is used (Refaeilzadeh, Tang, & Liu, 2009). In 10-fold cross validation, the database is divided into 10 folds. The ANN is trained with 9 folds, and the remaining fold is used to test the neural network. The mean of the obtained accuracies is used to evaluate the neural network in classification. Fig. 13 compares the 10-fold cross validation results of the MLP with those of other classifiers such as random forest, naïve Bayes, KNN, and SVM. As shown, MLP is the best of these classifiers in both proposed methods for recognizing benign and malignant tumors.

Fig. 13. Results of different classifiers using 10-fold cross validation.

Several existing methods with different approaches to diagnosing benign or malignant tumors, and their results, are listed in Table 12. In Verma, McLeod, and Klevansky (2010), a new SCBDL classifier showed better results than the standard MLP and improved the overall accuracy in tumor diagnosis. Therefore, the fifth row of Table 12 gives the results of soft clustered-based direct learning (SCBDL) on images segmented by CNN. After clustering the benign and malignant tumors into 6 subclasses, the weight matrix in the cluster layer is determined by using the Modified Gram–Schmidt orthogonalization. After determining the weights, the test data is given to SCBDL.

For performance comparison purposes, some typical results of the two proposed methods on the DDSM and MIAS databases are shown. It should be noted that, unlike the methods in Mu, Nandi, and Rangayyan (2008), Rojas-Domínguez and Nandi (2009), Saki, Tahmasbi, Soltanian-Zadeh, and Shokouhi (2013), Tahmasbi, Saki, and Shokouhi (2010), Tahmasbi et al. (2011), Verma, McLeod, and Klevansky (2009), and Verma et al. (2010), in which the images are manually segmented, and those in Buciu and Gacsadi (2011) and Masotti (2006), which have no segmentation stage, in our proposed methods all of the stages in image analysis, including pre-processing, segmentation, post-processing, feature extraction, feature selection, and classification, are done automatically.

Table 11
Classification specificity, sensitivity, and accuracy rate in proposed techniques.

Proposed methods                        Database  Sensitivity (%)  Specificity (%)  Accuracy (%)  AUC (%)
First proposed method in best case      MIAS      85.41            91.89            88.65         92.45
First proposed method in average case   MIAS      80.76            82.40            81.58         88.18
Second proposed method in best case     MIAS      92.70            90.54            90.16         95.58
Second proposed method in average case  MIAS      87.91            85.40            86.66         88.15
First proposed method in best case      DDSM      95.83            95.94            95.57         95.86
First proposed method in average case   DDSM      94.68            95.94            94.67         95.31
Second proposed method in best case     DDSM      96.87            95.94            96.47         95.1
Second proposed method in average case  DDSM      96.25            93.78            95.01         94.99
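The sensitivity, specificity, and accuracy columns of Table 11 follow Eqs. (20)–(22). A minimal Python sketch of these definitions is given below, assuming labels are encoded as 1 for malignant (positive) and 0 for benign (negative), as in Section 2.7; the function names and the toy labels are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with 1 = malignant (positive), 0 = benign."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def specificity(tp, tn, fp, fn):
    return tn / (tn + fp)                   # Eq. (20)

def sensitivity(tp, tn, fp, fn):
    return tp / (tp + fn)                   # Eq. (21)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)  # Eq. (22)

# Toy example: 4 malignant and 4 benign cases, one error in each class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)                # (3, 3, 1, 1)
print(sensitivity(*counts))  # 0.75
print(specificity(*counts))  # 0.75
print(accuracy(*counts))     # 0.75
```

A missed malignant case (FN) lowers sensitivity, while a benign case flagged as malignant (FP) lowers specificity, which is why the two measures are reported separately.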


Table 12
Comparison of existing methods.

Methods                                            Database    Classifier     Sensitivity (%)  Specificity (%)  Accuracy (%)  AUC (%)
Second technique (best)                            DDSM        MLP            96.87            95.94            96.47         95.10
Second technique (average)                         DDSM        MLP            96.25            93.78            95.01         94.99
Second technique (best)                            MIAS        MLP            92.70            90.54            90.16         95.58
Second technique (average)                         MIAS        MLP            87.91            85.40            86.66         88.15
Second technique (best)                            DDSM        SCBDL          80.70            79.00            80.00         –
Wang and Yang et al., 2014; Wang, Li, & Gao, 2014  DDSM        SVM            –                –                92.74         96.50
Liu and Tang, 2013                                 DDSM        SVM            92.00            93.00            93.00         94.39
Saki et al., 2013                                  MIAS        OWBPE          90.10            88.06            89.28         92.80
Zhang, Tomuro, Furst, & Raicu, 2012                DDSM        SVM            –                –                72.00         –
Tahmasbi et al., 2011                              MIAS        MLP            100              94.50            96.43         97.60
Buciu and Gacsadi, 2011                            MIAS        PSVM           84.61            80               82.30         78.00
Tahmasbi et al., 2010                              MIAS        MLP            90.10            94.44            92.80         98.00
Verma et al., 2010                                 DDSM        MLP            85.00            92.50            88.75         –
Verma et al., 2010                                 DDSM        SCBDL          97.50            97.50            97.50         –
Verma et al., 2009                                 DDSM        SCNN           97.83            90.74            94.28         –
Rojas-Domínguez and Nandi, 2009                    DDSM, MIAS  Bayesian, FLD  –                –                81.00         –
Mu et al., 2008                                    MIAS        S2SP           –                –                –             95.00
Masotti, 2006                                      DDSM        SVM            90.00            95.50            92.75         97.80

4. Conclusions

Breast tumor segmentation is one of the most important and crucial stages in medical image processing and pattern recognition. In this work, two automated methods were presented based on the improvement of region growing and CNN segmentations to produce an adaptive threshold and appropriate templates, respectively, in order to preserve tumor boundary information for diagnosing benign and malignant tumors in mammograms. In the first method, intensity features and an ANN were applied to generate an adaptive threshold in the region growing stage of the segmentation process. In the second method, a learning algorithm based on a genetic algorithm was proposed to determine CNN templates for the segmentation of images (in the DDSM and MIAS databases). The proposed methods have improved the CNN and region growing methods in breast tumor segmentation. To evaluate the performance of the proposed segmentation methods, after extracting the intensity, shape, and texture features that characterize the segmented tumors, a genetic algorithm was applied to select the appropriate features. Finally, the tumor classification was performed using different classifiers such as MLP, random forest, naïve Bayes, SVM, and KNN. As a result, MLP produced the best diagnosis performance in both proposed methods.

The obtained results indicated that the improvement of the region growing and CNN segmentations and the utilization of intensity, textural, and shape features of the segmented tumors provide satisfactory systems to diagnose benign and malignant tumors. Considering the parallel computation and hardware accessibility of CNN (and also its demonstrated high ability compared with region growing in diagnosing tumors), this paper sheds more light on breast tumor classification and helps radiologists in their diagnosis task. Moreover, the higher sensitivity versus the lower specificity of the second proposed method shows that it is stronger at diagnosing malignant tumors than benign ones. This advantage increases the survival chance of the patient through early detection of cancer and offers more effective treatment options. Also, the classification performance of the second proposed method (in terms of sensitivity, specificity, accuracy, and AUC) is comparable with that of other existing methods for recognizing benign and malignant breast tumors.

In spite of the stated strengths of the proposed techniques, the variability of the results on the DDSM and MIAS databases is considered a weakness. As a result, further study can be conducted with more emphasis on different preprocessing methods. Also, other learning models (such as self-organizing map (SOM), SVM, and radial basis function (RBF)) and different evolutionary algorithms (such as particle swarm, harmony, ant colony, and tabu search) can be applied to determine the adaptive threshold and the CNN templates in the first and second methods. These can be considered as future work.

Acknowledgment

We would like to thank Ms. Mahdieh Rouhi for her language help and also the computer engineering group for its support.

References

American Cancer Society (2013). Breast cancer: Facts and figures 2013. Atlanta: ACS.
Akay, M. F. (2009). Support vector machines combined with feature selection for breast cancer diagnosis. Expert Systems with Applications, 36(2), 3240–3247.
Amroabadi, S. H., Ahmadzadeh, M. R., & Hekmatnia, A. (2011). Mass detection in mammograms using GA based PCA and Haralick features selection. Electrical Engineering (ICEE), 2011 19th Iranian Conference (pp. 1–4). IEEE.
Bator, M., & Nieniewski, M. (2012). Detection of cancerous masses in mammograms by template matching: Optimization of template brightness distribution by means of evolutionary algorithm. Journal of Digital Imaging, 25(1), 162–172.
Berber, T., Alpkocak, A., Balci, P., & Dicle, O. (2013). Breast mass contour segmentation algorithm in digital mammograms. Computer Methods and Programs in Biomedicine, 110(2), 150–159.
Bick, U., & Diekmann, F. (2010). Digital mammography (1st ed.). Berlin: Springer.
Borges Sampaio, W., Moraes Diniz, E., Corrêa Silva, A., Cardoso de Paiva, A., & Gattass, M. (2011). Detection of masses in mammogram images using CNN, geostatistic functions and SVM. Computers in Biology and Medicine, 41(8), 653–664.
Buciu, I., & Gacsadi, A. (2011). Directional features for automatic tumor classification of mammogram images. Biomedical Signal Processing and Control, 6(4), 370–378.
Cao, Y., Hao, X., Zhu, X., & Xia, S. (2010). An adaptive region growing algorithm for breast masses in mammograms. Frontiers of Electrical and Electronic Engineering in China, 5(2), 128–136.
Cerasa, A., Bilotta, E., Augimeri, A., Cherubini, A., Pantano, P., Zito, G., et al. (2012). A cellular neural network methodology for the automated segmentation of multiple sclerosis lesions. Journal of Neuroscience Methods, 203(1), 193–199.
Chauhan, S., Goel, V., & Dhingra, S. (2012). Pattern recognition system using MLP neural networks. Pattern Recognition, 4(9), 43–46.
Cheetham, A. H., & Hazel, J. E. (1969). Binary (presence–absence) similarity coefficients. Journal of Paleontology, 43(5), 1130–1136.
Cheng, H. D., Shi, X. J., Min, R., Hu, L. M., Cai, X. P., & Du, H. N. (2006). Approaches for automated detection and classification of masses in mammograms. Pattern Recognition, 39(4), 646–668.
Chua, L. O., & Yang, L. (1988). Cellular neural networks: Applications. IEEE Transactions on Circuits and Systems, 35(10), 1273–1290.
Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3), 297–302.
Forsyth, D. A., & Ponce, J. (2003). Computer vision: A modern approach. Upper Saddle River, NY: Prentice Hall.
Freer, T. W., & Ulissey, M. J. (2001). Screening mammography with computer-aided detection: Prospective study of 12,860 patients in a community breast center. Radiology, 220(3), 781–786.


866 Ganesan, K., Acharya, U., Chua, C. K., Min, L. C., Abraham, K., & Ng, K. (2013). Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information 921
867 Computer-aided breast cancer detection using mammograms: A review. IEEE criteria of max-dependency, max-relevance, and min-redundancy. IEEE 922
868 Reviews in Biomedical Engineering, 6, 77–98. Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226–1238. 923
Görgel, P., Sertbas, A., & Ucan, O. N. (2013). Mammographical mass detection and classification using Local Seed Region Growing-Spherical Wavelet Transform (LSRG–SWT) hybrid scheme. Computers in Biology and Medicine, 43(6), 765–774.
Haykin, S. (1994). Neural networks: A comprehensive foundation. PTR: Prentice Hall.
Jalaja, K., Bhagvati, C., Deekshatulu, B. L., & Pujari, A. K. (2005). Texture element feature characterizations for CBIR. Geoscience and Remote Sensing Symposium, 2005. IGARSS’05. Proceedings. 2005 IEEE International, Vol. 2 (pp. 733–736).
Kopans, D. (1998). Breast imaging. Philadelphia: Lippincott-Raven.
Kuo, S. J., Hsiao, Y. H., Huang, Y. L., & Chen, D. R. (2008). Classification of benign and malignant breast tumors using neural networks and three-dimensional power Doppler ultrasound. Ultrasound in Obstetrics & Gynecology, 32(1), 97–102.
Lai, S. M., Li, X., & Bischof, W. F. (1989). On techniques for detecting circumscribed masses in mammograms. IEEE Transactions on Medical Imaging, 8(4), 377–386.
Lalkhen, A. G., & McCluskey, A. (2008). Clinical tests: Sensitivity and specificity. Continuing Education in Anaesthesia, Critical Care & Pain, 8(6), 221–223.
Li, H., Liao, X., Li, C., Huang, H., & Li, C. (2011). Edge detection of noisy images based on cellular neural networks. Communications in Nonlinear Science and Numerical Simulation, 16(9), 3746–3759.
Liu, S., Babbs, C. F., & Delp, E. J. (2001). Multiresolution detection of spiculated lesions in digital mammograms. IEEE Transactions on Image Processing, 10(6), 874–884.
Liu, X., & Tang, J. (2013). Mass classification in mammograms using selected geometry and texture features, and a new SVM-based feature selection method. IEEE Systems Journal, 8(3), 910–920.
Lixin, S., Yanan, L., Bin, Y., & Yuhong, W. (2013). Segmentation of breast masses using adaptive region growing. 8th International Forum on Strategic Technology (IFOST), 2013, Vol. 2 (pp. 77–81). IEEE.
Masotti, M. (2006). A ranklet-based image representation for mass classification in digital mammograms. Medical Physics, 33, 3951.
Matsumoto, T., Chua, L. O., & Suzuki, H. (1990). CNN cloning template: Connected component detector. IEEE Transactions on Circuits and Systems, 37(5), 633–635.
Melouah, A. (2013). A novel region growing segmentation algorithm for mass extraction in mammograms. Modeling Approaches and Algorithms for Advanced Computer Applications, Vol. 488 (pp. 95–104). SCI: Springer.
Metz, C. E. (1978). Basic principles of ROC analysis. Seminars in Nuclear Medicine, 8(4), 283–298.
Minavathi Murali, S., & Dinesh, M. S. (2011). Curvature and shape analysis for the detection of spiculated masses in breast ultrasound images. International Journal of Machine Intelligence, 3(4), 333–339.
Mohanty, A. K., Senapati, M. R., & Lenka, S. K. (2013). A novel image mining technique for classification of mammograms using hybrid feature selection. Neural Computing and Applications, 22(6), 1151–1161.
Mu, T., Nandi, A. K., & Rangayyan, R. M. (2008). Classification of breast masses using selected shape, edge-sharpness, and texture features with linear and kernel-based classifiers. Journal of Digital Imaging, 21(2), 153–169.
Oliver, A., Freixenet, J., Marti, J., Pérez, E., Pont, J., Denton, E. R., et al. (2010). A review of automatic mass detection and segmentation in mammographic images. Medical Image Analysis, 14(2), 87–110.
Pawar, P. S., & Patil, D. R. (2013). Breast cancer detection using neural network models. In CSNT '13 Proceedings of the 2013 international conference on communication systems and network technologies (pp. 568–572).
Rabottino, G., Mencattini, A., Salmeri, M., Caselli, F., & Lojacono, R. (2011). Performance evaluation of a region growing procedure for mammographic breast lesion identification. Computer Standards & Interfaces, 33(2), 128–135.
Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross-validation. Encyclopedia of Database Systems (pp. 532–538). US: Springer.
Rojas-Domínguez, A., & Nandi, A. K. (2009). Development of tolerant features for characterization of masses in mammograms. Computers in Biology and Medicine, 39(8), 678–688.
Saki, F., Tahmasbi, A., Soltanian-Zadeh, H., & Shokouhi, S. B. (2013). Fast opposite weight learning rules with application in breast cancer diagnosis. Computers in Biology and Medicine, 43(1), 32–41.
Sampaio, W. B., Diniz, E. M., Silva, A. C., & de Paiva, A. C. (2009). Detection of masses in mammograms using cellular neural networks, hidden Markov models and Ripley’s K function. Systems, Signals and Image Processing, IWSSIP, 16th International Conference (pp. 1–3). IEEE.
Tahmasbi, A., Saki, F., & Shokouhi, S. B. (2011). Classification of benign and malignant masses based on Zernike moments. Computers in Biology and Medicine, 41(8), 726–735.
Tahmasbi, A., Saki, F., & Shokouhi, S. B. (2010). Mass diagnosis in mammography images using novel FTRD features. Biomedical Engineering (ICBME), 2010 17th Iranian Conference (pp. 1–5). IEEE.
Verma, B., McLeod, P., & Klevansky, A. (2010). Classification of benign and malignant patterns in digital mammograms for the diagnosis of breast cancer. Expert Systems with Applications, 37(4), 3344–3351.
Verma, B., McLeod, P., & Klevansky, A. (2009). A novel soft cluster neural network for the classification of suspicious areas in digital mammograms. Pattern Recognition, 42(9), 1845–1852.
Vyborny, C. J., & Giger, M. L. (1994). Computer vision and artificial intelligence in mammography. American Journal of Roentgenology, 162(3), 699–708.
Wang, W., Yang, L. J., Xie, Y. T., & An, Y. W. (2014). Edge detection of infrared image with CNN_DGA algorithm. Optik – International Journal for Light and Electron Optics, 125(1), 464–467.
Wang, Y., Li, J., & Gao, X. (2014). Latent feature mining of spatial and marginal characteristics for mammographic mass classification. Neurocomputing, 144, 107–118.
Wei, C. H., Chen, S. Y., & Liu, X. (2012). Mammogram retrieval on similar mass lesions. Computer Methods and Programs in Biomedicine, 106(3), 234–248.
Wei, C. H., Li, Y., Chau, W. Y., & Li, C. T. (2009). Trademark image retrieval using synthetic features for describing global shape and interior structure. Pattern Recognition, 42(3), 386–394.
Winsberg, F., Elkin, M., Macy, J., Bordaz, V., & Weymouth, W. (1967). Detection of radiographic abnormalities in mammograms by means of optical scanning and computer analysis. Radiology, 89(2), 211–215.
Zarándy, Á., Roska, T., Liszka, G., Hegyesi, J., Kék, L., & Rekeczky, C. (1994). Design of analogic CNN algorithms for mammogram analysis. In Proceedings of the third IEEE international workshop on cellular neural networks and their applications, 1994. CNNA-94 (pp. 255–260). IEEE.
Zhang, Y., Tomuro, N., Furst, J., & Raicu, D. S. (2012). Building an ensemble system for diagnosing masses in mammograms. International Journal of Computer Assisted Radiology and Surgery, 7(2), 323–329.
Zucker, S. W. (1976). Region growing: Childhood and adolescence. Computer Graphics and Image Processing, 5(3), 382–399.

Please cite this article in press as: Rouhi, R., et al. Benign and malignant breast tumors classification based on region growing and CNN segmentation. Expert Systems with Applications (2014), http://dx.doi.org/10.1016/j.eswa.2014.09.020
