Lung Cancer Detection by Using Artificial Neural Network and Fuzzy Clustering Methods


2011 IEEE GCC Conference and Exhibition (GCC), February 19-22, 2011, Dubai, United Arab Emirates

LUNG CANCER DETECTION BY USING ARTIFICIAL NEURAL NETWORK AND FUZZY CLUSTERING METHODS
Fatma Taher, Department of Computer Engineering, Khalifa University, Sharjah, [email protected]
Rachid Sammouda, Department of Computer Science, University of Sharjah, Sharjah, [email protected]

978-1-61284-119-9/11/$26.00 ©2011 IEEE

ABSTRACT

The early detection of lung cancer is a challenging problem, due to the structure of the cancer cells. This paper presents two segmentation methods, a Hopfield Neural Network (HNN) and a Fuzzy C-Mean (FCM) clustering algorithm, for segmenting sputum color images to detect lung cancer in its early stages. The manual analysis of sputum samples is time consuming, inaccurate, and requires an intensively trained person to avoid diagnostic errors. The segmentation results will be used as a base for a Computer Aided Diagnosis (CAD) system for early detection of lung cancer, which will improve the chances of survival for the patient. The two methods are designed to classify the image of N pixels among M classes. In this study, we used 1000 sputum color images to test both methods, and HNN has shown a better classification result than FCM: the HNN succeeded in extracting the nuclei and cytoplasm regions.

Index Terms: Lung Cancer Detection, Image Segmentation, Sputum Cells, Hopfield Neural Network, Fuzzy C-Mean Clustering.

1. INTRODUCTION

Lung cancer is considered to be the main cause of cancer death worldwide, and it is difficult to detect in its early stages because symptoms appear only at advanced stages, causing its mortality rate to be the highest among all types of cancer. More people die because of lung cancer than of any other type of cancer, such as breast, colon, and prostate cancers. There is significant evidence indicating that the early detection of lung cancer will decrease the mortality rate. The most recent estimates, according to the latest statistics provided by the World Health Organization, indicate around 7.6 million deaths worldwide each year because of this type of cancer. Furthermore, mortality from cancer is expected to continue rising, to become around 17 million worldwide in 2030 [1]. There are many techniques to diagnose lung cancer, such as chest radiography (x-ray), Computed Tomography (CT), Magnetic Resonance Imaging (MRI scan) and sputum cytology [2]. However, most of these techniques are expensive and time consuming. In other words, most of these techniques detect lung cancer in its advanced stages, where the patient's chance of survival is very low. Therefore, there is a great need for a new technology to diagnose lung cancer in its early stages. Image processing techniques provide a good quality tool for improving the manual analysis. A number of medical researchers have utilized the analysis of sputum cells for early detection of lung cancer [3].

For this reason we attempt to use an automatic diagnostic system for detecting lung cancer in its early stages based on the analysis of sputum color images [4]. In order to formulate a rule, we have developed a technique for unsupervised segmentation of the sputum color image that divides the image into several meaningful sub-regions. Image segmentation has been used as the first step in image classification and clustering. Many algorithms have been proposed in other articles for medical image segmentation, such as histogram analysis, region growing, edge detection and adaptive thresholding [5]. A review of image segmentation techniques can be found in [6]. Other authors have considered the use of color information as the key discriminating factor for cell segmentation for lung cancer diagnosis [7]. The analysis of sputum images has been used in [8] for detecting tuberculosis; it consists of analyzing sputum images for detecting bacilli. In this paper, two basic techniques have been applied, the Hopfield Neural Network (HNN) and the Fuzzy C-Mean Clustering Algorithm (FCM), to segment sputum color images prepared by the Papanicolaou standard staining method into red-dye and blue-dye images [9]. We present the results of the segmentation of some images for both methods.

The remainder of this paper is organized as follows. In Section 2, the Hopfield Neural Network segmentation algorithm is described. In Section 3, the fuzzy clustering algorithm is presented. In Section 4, the analysis phase is discussed. Finally, in Section 5, the conclusion is drawn and several issues for future work are presented.

2. HOPFIELD NEURAL NETWORK

The Hopfield Neural Network (HNN) is one of the artificial neural networks that have been proposed for segmenting both gray-level and color images. In [10], the
authors present the segmentation problem for gray-level images as the minimization of a suitable energy function with HNN; they derive the network architecture from the energy function and classify the sputum cells into nuclei, cytoplasm and background classes, where the input is the RGB components of the images used. In our work we used the HNN algorithm as our segmentation method. The HNN is very sensitive to intensity variation and it can detect the overlapping cytoplasm classes. HNN is considered an unsupervised learning method. Therefore, the network classifies the feature space without a teacher, based on the compactness of each cluster calculated using the Euclidean distance measure between the kth pixel and the centroid of class l. The neural network structure consists of a grid of N x M neurons, with each column representing a cluster and each row representing a pixel. The network is designed to classify the image of N pixels of P features among M classes, such that the assignment of the pixels minimizes the criterion function:

$$ E = \frac{1}{2} \sum_{k=1}^{N} \sum_{l=1}^{M} R_{kl}^{2} V_{kl}^{2} \tag{1} $$

where $R_{kl}$ is the Euclidean distance between the kth pixel and the centroid of class l, and $V_{kl}$ is the output of the (k, l)th neuron. The minimization is achieved using HNN and by solving the motion equations satisfying:

$$ \frac{\partial U_{i}}{\partial t} = -\mu(t) \frac{\partial E}{\partial V_{i}} \tag{2} $$

where the index i runs over the neurons (k, l) and $\mu(t)$ is, as defined in [10], a scalar positive function of time used to increase the convergence speed of the HNN. By applying relation (2) to equation (1), we get a set of neural dynamics given by:

$$ \frac{dU_{kl}}{dt} = -\mu(t) \left[ R_{kl}^{2} V_{kl} \right] \tag{3} $$

where $U_{kl}$ and $V_{kl}$ are the input and output of the (k, l)th neuron, respectively. To assign a label m to the kth pixel we use the input-output function given by:

$$ V_{km}(t+1) = 1 \ \text{if} \ U_{km} = \max\left[ U_{kl}(t), \ \forall l \right], \qquad V_{kl}(t+1) = 0 \ \text{otherwise}. \tag{4} $$

The HNN segmentation algorithm can be summarized in the following steps:
1. Initialize the input of the neurons to random values.
2. Apply the input-output relation given in (4) to obtain the new output value for each neuron, establishing the assignment of pixels to classes.
3. Compute the centroid for each class as follows:

$$ \bar{x}_{l} = \frac{\sum_{k=1}^{N} x_{k} V_{kl}}{n_{l}} \tag{5} $$

where $n_{l}$ is the number of pixels in class l.
4. Solve the set of differential equations in (3) to update the input of each neuron:

$$ U_{kl}(t+1) = U_{kl}(t) + \frac{dU_{kl}}{dt} \tag{6} $$

5. Repeat from step 2 until convergence, then terminate.

We applied the HNN with the specification mentioned above to one thousand sputum color images and kept the results for further processing in the following steps. Our algorithm could segment 97% of the images successfully into nuclei, cytoplasm regions and clear background. Furthermore, HNN took a short time to achieve the desired results. By experiment, HNN needed fewer than 120 iterations to reach the desired segmentation result in 36 seconds.

3. FUZZY CLUSTERING

Clustering is the process of dividing the data into homogeneous regions based on the similarity of objects; in database systems, for example, information that is logically similar is physically stored together in order to increase efficiency and to minimize the number of disk accesses. Clustering assigns the Q feature vectors to K clusters, where $c^{(k)}$ is the center of the kth cluster. Fuzzy clustering has been used in many fields, such as pattern recognition and fuzzy identification. A variety of fuzzy clustering methods have been proposed, and most of them are based upon distance criteria. The most widely used algorithm is the Fuzzy C-Mean algorithm (FCM), which uses reciprocal distance to compute fuzzy weights. This algorithm takes as input a predefined number of clusters, which is the C in its name (denoted K here). "Mean" stands for the average location of all the members of a particular cluster, and the output is a partitioning of the set of objects into K clusters. The objective of FCM clustering is to minimize the total weighted mean square error:

$$ J\left(W_{qk}, c^{(k)}\right) = \sum_{q=1}^{Q} \sum_{k=1}^{K} \left(W_{qk}\right)^{p} \left\lVert x^{(q)} - c^{(k)} \right\rVert^{2} \tag{7} $$

The FCM allows each feature vector to belong to multiple clusters with various fuzzy membership values [11]. The final classification is then made according to the maximum weight of the feature vector over all clusters. The detailed algorithm is as follows.

Input: vectors of objects, each with s dimensions, v = {v1, v2, …, vn}; in our case these are the image pixels, each with three dimensions (RGB); and K, the number of clusters.
Output: a set of K clusters that minimize the sum of distance errors.
Algorithm steps:
1. Initialize a random weight for each pixel, using fuzzy weighting with positive weights $\{W_{qk}\}$ in [0, 1].
2. Standardize the initial weights for each qth feature vector over all K clusters via

$$ W_{qk} = \frac{W_{qk}}{\sum_{r=1}^{K} W_{qr}} \tag{8} $$

3. Standardize the weights over q = 1, …, Q for each k to obtain $W_{qk}$, via

$$ W_{qk} = \frac{W_{qk}}{\sum_{r=1}^{Q} W_{rk}}, \quad q = 1, \dots, Q \tag{9} $$

4. Compute new centroids $c^{(k)}$, k = 1, …, K, via

$$ c^{(k)} = \sum_{q=1}^{Q} W_{qk} \, x^{(q)}, \quad k = 1, \dots, K \tag{10} $$

5. Update the weights $\{W_{qk}\}$ via

$$ W_{qk} = \frac{\left( 1 / \lVert x^{(q)} - c^{(k)} \rVert^{2} \right)^{1/(p-1)}}{\sum_{r=1}^{K} \left( 1 / \lVert x^{(q)} - c^{(r)} \rVert^{2} \right)^{1/(p-1)}}, \quad k = 1, \dots, K, \ q = 1, \dots, Q \tag{11} $$

6. If there is a change in the input, repeat from step 3; otherwise terminate.
7. Assign each pixel to a cluster based on the maximum weight.

We applied the FCM clustering algorithm with the specification mentioned above to one thousand sputum color images and kept the results for further processing in the following steps. Our algorithm segments the images into nuclei, cytoplasm regions and clear background. However, the FCM is not sensitive to intensity variation; therefore, the cytoplasm regions are detected as one cluster when we fix the number of clusters to three, four, five or six. Moreover, FCM failed in detecting the nuclei; it detected only part of them. By experiment, the FCM algorithm takes fewer than 50 iterations to reach the desired results, in 10 seconds on average.

4. ANALYSIS PHASE

In this section, we present the results obtained with two sample images. The first sample contains red cells surrounded by a lot of debris nuclei and a background with a large amount of intensity variation in its pixel values, as shown in Figure 1 (a); the second sample is composed of blue stained cells, shown in Figure 3 (a). In Figure 1, (b) and (c) show the segmentation results using HNN and FCM, respectively, with the RGB components of the raw image (a). As seen in (b) and (c), the nuclei of the cells were not detected by HNN in (b) and were not accurately represented by FCM in (c). For this reason we developed a filter to extract our regions of interest, described in [10]; the result is shown in (d). Panels (e) and (f) show the segmentation results using HNN and FCM with the RGB components of (d). With the number of clusters fixed to three, we found that in the case of HNN the nuclei were detected, but not precisely. In the case of FCM, only part of the nuclei was detected. We increased the number of clusters to four in an attempt to solve the nuclei detection problem. The results are shown in (g) and (h) for HNN and FCM, respectively.

Comparing the HNN segmentation result in (g) to the filtered image (d), we can say that the nuclei regions were detected perfectly, and also their corresponding cytoplasm regions. However, due to the intensity variation in the image (d) and also due to the sensitivity of HNN, the cytoplasm regions were represented by two clusters. These cytoplasm clusters will be merged later if the difference in their mean values is not large. Comparing the FCM segmentation result in (h) to the filtered image (d), the nuclei regions are detected, but they overlap slightly, so that two different nuclei may be seen or considered as one nucleus, and this can affect the diagnosis results. The cytoplasm regions are smoother than in the case of HNN, reflecting that the FCM is less sensitive to intensity variation than HNN. The learning error waveforms for this comparison are shown in Figure 2, where it can be seen that the segmentation error at convergence is smaller with HNN than with FCM. However, the FCM converges fifty iterations earlier than HNN.

Figure 3 (a) shows a sample sputum color image stained with blue dyes; (b) and (c) show the segmentation results using HNN and FCM, respectively, with the RGB components of the raw image (a). As seen in (b) and (c), the nuclei have not been detected and the background presents a lot of intensity variation. A filter was needed to minimize the effect of the intensity variation in the raw image, as described in [10]. The result of this filter is shown in (d). Panels (e) and (f) are the segmentation results obtained using HNN and FCM with the RGB components of (d) with three clusters, and (g) and (h) are the segmentation results with four clusters. Here the nuclei have been detected; however, a color cluster is missing in the result of FCM (h). The same applies to the previous case of the red cells. HNN is more sensitive to intensity variation between nuclei-nuclei or nuclei-cytoplasm regions. This is clear in Figure 4, which shows quantitatively the learning error waveforms of HNN and FCM during the segmentation process of the blue sample.

5. CONCLUSION

In this study, two segmentation processes have been used: the first was the Hopfield Neural Network (HNN), and the second was the Fuzzy C-Mean (FCM) clustering algorithm. It was found that the HNN segmentation results are more accurate and reliable than those of FCM clustering in all cases. The HNN succeeded in extracting the nuclei and cytoplasm regions. However, FCM failed in detecting the nuclei; it detected only part of them. In addition, the FCM is not sensitive to intensity variations, as reflected by the larger segmentation error at convergence with FCM than with HNN.

The HNN will be used as a basis for a Computer Aided Diagnosis (CAD) system for early detection of lung cancer. In the future, we plan to consider Bayesian decision theory for the detection of the lung cancer cells, followed by developing a model based on the idea of the watershed algorithm, which combines edge detection with a region-based approach, to extract the homogeneous tissues represented in the image, as soon as a more extended dataset is available.
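As an illustration of the FCM steps listed in Section 3, the following Python sketch follows equations (8) to (11) on a list of RGB triples. It is a minimal sketch under stated assumptions, not the implementation used in the experiments: the names `fcm_segment` and `sq_dist`, the exponent p = 2, the stopping tolerance, the small constant `eps` that guards against division by zero, and the random seed are all hypothetical choices not given in the paper. The centroid is computed as written in (10), as a sum weighted by the weights standardized in (9); textbook FCM formulations usually raise the weights to the power p at this step.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fcm_segment(pixels, n_clusters, p=2.0, max_iter=50, tol=1e-6,
                eps=1e-12, seed=0):
    """Cluster RGB pixels following steps 1 to 7 of Section 3.

    p, max_iter, tol, eps and seed are assumed values for illustration.
    Returns (labels, centers, weights).
    """
    rng = random.Random(seed)
    Q, K = len(pixels), n_clusters
    dims = len(pixels[0])

    # Steps 1 and 2: random positive weights, standardized over the
    # K clusters for each pixel q (eq. 8).
    W = []
    for _ in range(Q):
        row = [rng.random() + eps for _ in range(K)]
        total = sum(row)
        W.append([w / total for w in row])

    centers = []
    for _ in range(max_iter):
        # Step 3: standardize each cluster's weights over all pixels (eq. 9).
        col_sums = [sum(W[q][k] for q in range(Q)) for k in range(K)]

        # Step 4: centroids as weighted sums of the pixels (eq. 10).
        centers = [
            tuple(
                sum(W[q][k] / col_sums[k] * pixels[q][d] for q in range(Q))
                for d in range(dims)
            )
            for k in range(K)
        ]

        # Step 5: reciprocal-distance weight update (eq. 11).
        change = 0.0
        for q in range(Q):
            inv = [
                (1.0 / (sq_dist(pixels[q], centers[k]) + eps)) ** (1.0 / (p - 1.0))
                for k in range(K)
            ]
            total = sum(inv)
            new_row = [v / total for v in inv]
            change = max(change, max(abs(a - b) for a, b in zip(new_row, W[q])))
            W[q] = new_row

        # Step 6: stop once the weights no longer change noticeably.
        if change < tol:
            break

    # Step 7: assign each pixel to the cluster with the maximum weight.
    labels = [max(range(K), key=W[q].__getitem__) for q in range(Q)]
    return labels, centers, W


if __name__ == "__main__":
    # Toy data: three dark and three bright pixels (made-up values).
    sample = [(10, 12, 11), (12, 10, 9), (9, 11, 13),
              (200, 198, 205), (203, 201, 199), (198, 204, 202)]
    labels, centers, _ = fcm_segment(sample, n_clusters=2)
    print(labels)
    print(centers)
```

For a real image, the pixel list would be the flattened RGB values of the filtered sputum image, and the returned labels would be reshaped back to the image dimensions to obtain the segmentation map.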
6. REFERENCES

[1] Dignam JJ, Huang L, Ries L, Reichman M, Mariotto A, Feuer E, "Estimating cancer statistic and other-cause mortality in clinical trial and population-based cancer registry cohorts," Cancer 10, Aug 2009.
[2] T. C. Kennedy, Y. Miller and S. Prindiville, "Screening for Lung Cancer Revisited and the Role of Sputum Cytology and Fluorescence Bronchoscopy in a High-Risk Group," Chest Journal, vol. 10, pp. 72-79, 2005.
[3] Z. Daniele, H. Andrew, J. Nickerson, "Nuclear Structure in Cancer Cells," Nature Reviews Cancer, Medical School, vol. 4, no. 9, pp. 677-87, USA, Sep. 2004.
[4] A. Sheila and T. Ried, "Interphase Cytogenetics of Sputum Cells for the Early Detection of Lung Carcinogenesis," Coordinating Center for Clinical Trials, National Cancer Institute, 6120 Executive Boulevard, Bethesda, MD 20852-4910, 2010.
[5] K. McCrae, D. Ruck, S. Rogers and M. Oxley, "Color Image Segmentation," Proceeding of the SPIE - The International Society for Optical Engineering, Application of Artificial Neural Networks, Orlando, USA, pp. 306-315, April 1994.
[6] L. Lucchese and S. K. Mitra, "Color Image Segmentation: A State of the Art Survey," Proceeding of the Indian National Science Academy (INSA-A), New Delhi, India, vol. 67, no. 2, pp. 207-221, 2001.
[7] S. Shah, "Automatic Cell Images segmentation using a Shape-Classification Model," Proceedings of IAPR Conference on Machine vision Applications, pp. 428-432, Tokyo, Japan, 2007.
[8] M. G. Forero, F. Sroubek and G. Cristobal, "Identification of Tuberculosis Based on Shape and Color," Journal of Real time imaging, vol. 10, pp. 251-262, 2004.
[9] Y. Hiroo, "Usefulness of Papanicolaou stain by rehydration of airdried smears," Journal of the Japanese Society of Clinical Cytology, vol. 34, pp. 107-110, Japan, 2003.
[10] R. Sammouda, N. Niki, H. Nishitani, S. Nakamura, and S. Mori, "Segmentation of Sputum Color Image for Lung Cancer Diagnosis based on Neural Network," IEICE Transactions on Information and Systems, vol. 8, pp. 862-870, August 1998.
[11] H. Sun, S. Wang and Q. Jiang, "Fuzzy C-Mean based Model Selection Algorithms for Determining the Number of Clusters," Pattern Recognition, vol. 37, pp. 2027-2037, 2004.

Figure 1. (a) Original raw image stained with red dyes; (b) and (c) the segmentation results for the image in (a) using HNN and FCM, respectively; (d) the filtered image; (e) and (f) the segmentation results for the filtered image in (d) using HNN and FCM, respectively, with the number of clusters fixed to three; (g) and (h) the corresponding results with the number of clusters fixed to four. [Image panels (a) to (h).]

Figure 2. The learning error waveforms of HNN and FCM during the segmentation process, for the red cells image in Figure 1 (a). [Plot: error values (0 to 8.00E+20) against iteration number (1 to 109); curves Fuzzy_k=4 and NN_k=4.]

Figure 3. (a) Original raw image stained with blue dyes; (b) and (c) the segmentation results for the image in (a) using HNN and FCM, respectively; (d) the filtered image; (e) and (f) the segmentation results for the filtered image in (d) using HNN and FCM, respectively, with the number of clusters fixed to three; (g) and (h) the corresponding results with the number of clusters fixed to four. [Image panels (a) to (h).]

Figure 4. The learning error waveforms of HNN and FCM during the segmentation process, for the blue cells image in Figure 3 (a). [Plot: error (0 to 7.00E+16) against iteration number (1 to 109); curves Fuzzy_Mk3 and NN_k3.]
