Lung Cancer
Abstract: This paper describes the development of a lung cancer detection system based on image processing techniques. The system accepts any one of three types of medical image as input: CT, MRI or ultrasound. The proposed model is developed using Fuzzy C-Means (FCM) and a Convolutional Neural Network (CNN), which is used for feature selection. The paper extends earlier work on image processing for lung cancer detection and reports the results of feature extraction and feature selection after segmentation. After preprocessing, a Wiener filter is used to remove noise and unwanted regions. The present work proposes a method to detect cancerous cells effectively from CT, MRI and ultrasound images. Pixel segmentation is performed with FCM, and the filter is used to de-noise the medical images. Simulation results for the cancer detection system are obtained using MATLAB, and normal and abnormal lung images are compared.

Keywords: Lung cancer, lung segmentation, Fuzzy C-Means, CNN, feature extraction.

I. INTRODUCTION
Cancer is one of the major causes of non-accidental death, and lung cancer is the leading cause of cancer death in men and women worldwide. The death rate can be reduced if people seek early diagnosis, so that clinicians can administer suitable treatment in time. Cancer occurs when a group of cells grows uncontrollably and forms malignant tumors that invade surrounding tissues. Lung cancer is classified as non-small cell lung cancer (NSCLC) or small cell lung cancer (SCLC). In this paper we confine ourselves to NSCLC, as it is more prevalent than SCLC. The diagnosis and treatment of non-small cell and small cell lung cancer differ. Among the various ways to detect lung cancer are image processing and the techniques of fuzzy c-means and convolutional neural networks, which are used to develop computer-aided diagnosis. In this paper, CT, MRI and ultrasound images are used. A CT scan, or Computerized Axial Tomography (CAT) scan, is the most sensitive and specific detection modality; it produces cross-sectional images of specific areas of the scanned object by computer-processed combination of many X-ray images taken from different angles [1]. Magnetic Resonance Imaging (MRI), also known as Nuclear Magnetic Resonance Imaging (NMRI), uses radio waves and a magnetic field to form images of the body. The aim of this paper is to design a system that can take any one of the three image types as input and produce the desired output. The system is evaluated using sensitivity, specificity and accuracy. The proposed model consists of the following steps: collection of the lung image data set, preprocessing, Wiener filtering, and FCM segmentation of CT and MRI images. Each step is described in the following sections.

We apply extensive preprocessing techniques to isolate the nodules accurately and so improve the accuracy of lung cancer detection. Moreover, we train the CNN end-to-end from scratch in order to realize the full potential of the neural network, i.e. to learn discriminative features. Extensive experimental evaluations are performed on a dataset comprising lung nodules from more than 1390 low-dose CT scans [2].

Figure 1: CT scan slice containing a small early stage lung cancer nodule.

II. REVIEW OF THE LITERATURE

Recently, deep artificial neural networks have been applied in many pattern recognition and machine learning applications, especially convolutional neural networks (CNNs), which are one class of such models [3]. An ensemble of CNNs applied to ImageNet classification in 2012 outperformed the best results then known in the computer vision community [4]. Deep learning has also been applied to medical imaging in recent research, with promising results.

H. Suk et al. [5] suggested a new latent and shared feature representation of neuro-imaging data of the brain using a Deep Boltzmann Machine (DBM) for AD/MCI diagnosis. G. Wu et al. [6] developed deep feature learning for deformable registration of brain MR images, improving image registration by using deep features. Y. Xu et al. [7] presented the effectiveness of using deep neural networks (DNNs) for feature extraction in medical image analysis as a supervised approach. Kumar et al. [8] proposed a CAD system that uses deep features extracted from an autoencoder to classify lung nodules as either malignant or benign on the LIDC database. In [9], Y. Bar et al. presented a system for chest pathology detection in x-rays that uses convolutional neural networks learned from a non-medical archive. That work showed that a combination of deep learning (Decaf) and PiCodes features achieves the best performance. The proposed combination demonstrated the feasibility of detecting pathology in chest x-rays using deep learning approaches based on non-medical learning. The database used was composed of 93 images. They obtained an area under the curve (AUC) of 0.93 for right pleural effusion detection, 0.89 for enlarged heart detection and 0.79 for classification between healthy and abnormal chest x-rays.

In [10], Suna W. et al. implemented three different deep learning algorithms, a Convolutional Neural Network (CNN), Deep Belief Networks (DBNs) and a Stacked Denoising Autoencoder (SDAE), and compared them with a traditional image-feature-based CAD system. The CNN architecture contains eight layers, alternating between convolutional and pooling layers. For the traditional algorithm used for comparison, about 35 texture and morphological features were extracted. These features were fed to a kernel-based support vector machine (SVM) for training and classification. The accuracy of the CNN approach reached 0.7976, slightly higher than the 0.7940 of the traditional SVM. They used the Lung Image Database Consortium and Image Database Resource Initiative (LIDC/IDRI) public databases, with about 1018 lung cases.

In [11], J. Tan et al. designed a framework that detected lung nodules and then reduced the false positives among the detected nodules, based on a deep neural network and a convolutional neural network. The CNN has four convolutional layers and four pooling layers. The filters had depth 32 and size 3,5. The dataset was acquired from LIDC-IDRI for about 85 patients. The resulting sensitivity was 0.82. The false positive reduction obtained with the DNN was 0.329.

In [12], R. Golan proposed a framework that trains the weights of the CNN by back-propagation to detect lung nodules in CT image sub-volumes. This system achieved a sensitivity of 78.9% at 20 false positives (FPs) per scan and 71.2% at 10 FPs per scan, on lung nodules that had been annotated by all four radiologists. Convolutional neural networks have performed better than Deep Belief Networks in current studies on benchmark computer vision datasets. CNNs have attracted considerable interest in machine learning in recent years, since they have a strong ability to learn useful feature representations from input data.

The fuzzy k-c-means clustering algorithm for medical image segmentation was introduced by Ajala, 2012 [13]. Fuzzy c-means is a clustering method that allows one piece of data to belong to two or more clusters, while k-means is a simple clustering method with lower computational complexity than fuzzy c-means. The two clustering methods were combined to produce a more time-efficient segmentation algorithm, called the fuzzy k-c-means clustering algorithm. The authors describe thresholding as the most elementary technique for medical image segmentation, in which pixels are divided into different classes depending on their gray level. Thresholding segments scalar images by forming a binary partition of the image intensities and determining an intensity value, termed the threshold, that separates the desired classes. Classifier techniques, which are used for pattern recognition, partition a feature space derived from the image using data with known labels. A feature space is an N*M matrix, where N is the number of observations and M is the number of attributes. Classifiers are known as supervised methods, since they require training data that are manually segmented and then used to segment new data automatically.

In [14], Fatma, 2012, two more segmentation methods were used: the Hopfield Neural Network (HNN) and the Fuzzy C-Mean (FCM) clustering algorithm. They found that the HNN provides enhanced, accurate and reliable
B. Preprocessing.
The images are subjected to pre-processing steps to remove noise and unwanted regions. First, the input image is acquired. Because the images come in different sizes, each image is resized to a common size acceptable to the processing system. The resized image is then converted to a gray-scale image so that only one color channel is used; gray-scale comparison involves simple algebraic scalar operators, and a gray-scale image is enough to distinguish peaks of intensity. Finally, the gray-scale image is converted into a binary image, a digital image in which each pixel has one of two possible values.

The noisy input image with unwanted regions is shown in Figure 4. To remove the noise and unwanted regions, the input image is passed through a filter, as shown in Figure 5. The binarized image is shown in Figure 6 and the estimated bias field in Figure 7.

Figure 7: Estimated Bias Field.
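The preprocessing chain described above (gray-scale conversion, resizing, Wiener filtering, binarization) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' MATLAB code. The function names, the nearest-neighbour resize, the 3 x 3 local-statistics form of the Wiener filter and the mean-intensity threshold are all assumptions made for the example, since the paper does not specify them.

```python
import numpy as np

def to_gray(rgb):
    """Collapse an H x W x 3 image to one channel using standard luma weights."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize so that every image has the same dimensions."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows[:, None], cols]

def wiener_local(img, k=3):
    """Adaptive Wiener filter built from the local mean and variance in a k x k window."""
    img = img.astype(float)
    padded = np.pad(img, k // 2, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    mean = windows.mean(axis=(-1, -2))
    var = windows.var(axis=(-1, -2))
    noise = var.mean()  # assumption: noise power estimated as the average local variance
    gain = np.maximum(var - noise, 0.0) / np.maximum(var, noise + 1e-12)
    return mean + gain * (img - mean)

def binarize(img, threshold=None):
    """Map each pixel to 0 or 1; the mean intensity is an assumed default threshold."""
    t = img.mean() if threshold is None else threshold
    return (img > t).astype(np.uint8)

# Synthetic stand-in for an input scan (hypothetical data, not from the paper).
rgb = np.random.default_rng(0).random((40, 60, 3))
gray = resize_nearest(to_gray(rgb), 32, 32)
filtered = wiener_local(gray)
binary = binarize(filtered)
```

In flat regions the local variance falls below the noise estimate, so the filter returns the local mean; near edges the gain approaches one and the original detail is kept.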
Figure 4: Noisy image. Figure 5: Wiener Filter Image.

C. Segmentation
This step uses an automatic graph cut-based segmentation framework with a distance-constrained energy function that produces topologically restricted solutions. The distance term ensures that labels are assigned only to lung pixels, even in the presence of other anatomical regions with similar lung-like patterns. The Euclidean distance is specified to make clear that the distance referred to in this work is the distance between two points, not a measure of the difference between two regions. Any metric can therefore be used to measure the distance between points or regions. The contribution of this work is an automatic method of lung segmentation using graph cut that produces topologically restricted solutions to accurately identify the lungs in a CT image.
Figure 10: Opening – closing by reconstruction.

D. Training.
The back-propagation algorithm is used to train the CNN to detect lung tumors in CT images of size 512 × 512 pixels. The network consists of two phases. In the first phase, a CNN consisting of multiple volumetric convolution, rectified linear unit (ReLU) and max pooling layers is used to extract valuable volumetric features from the input data. The second phase is the classifier. It has multiple fully connected (FC) and threshold layers, followed by a SoftMax layer, to perform the high-level reasoning of the neural network. No scaling was applied to the CT images of the dataset, in order to preserve the original values of the DICOM images as much as possible. During training, random sub-volumes are extracted from the CT images of the training set and normalized according to an estimate of the normal distribution of the voxel values in the dataset.
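The sub-volume extraction and normalization step can be illustrated with a short sketch. It is written in Python with NumPy as a stand-in for the MATLAB implementation. The cube size of 16, the synthetic volume and the helper names `random_subvolume` and `normalize` are assumptions for illustration only.

```python
import numpy as np

def random_subvolume(volume, size, rng):
    """Return a random size x size x size cube from a (slices, rows, cols) volume."""
    z, y, x = (int(rng.integers(0, dim - size + 1)) for dim in volume.shape)
    return volume[z:z + size, y:y + size, x:x + size]

def normalize(sub, mean, std):
    """Standardize voxel values with dataset-level mean and standard deviation."""
    return (sub - mean) / std

rng = np.random.default_rng(0)
# Synthetic stand-in for a CT volume; real voxels would be read from DICOM slices.
volume = rng.normal(loc=-600.0, scale=300.0, size=(40, 64, 64))
# In practice the statistics would be estimated once over the whole training set.
mean, std = volume.mean(), volume.std()
patch = normalize(random_subvolume(volume, 16, rng), mean, std)
```

Because the statistics are dataset-level rather than per-patch, the relative intensity differences between patches are preserved after normalization.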
Figure 14: CNN Performance Validation. (Plot over 148 epochs, logarithmic vertical axis from 10^0 down to 10^-4; best validation performance is 0.0086636 at epoch 142.)

(Training-state plot over 148 epochs: gradient = 0.00076703 at epoch 148; validation checks (val fail) = 6 at epoch 148.)

Figure 16: Regression. (Four panels plotted against Target: Training R = 0.98677, Validation R = 0.87533, Test R = 0.86456, All R = 0.95237.)

The level of cancer involvement is determined from the training, validation and test regressions shown in Figure 16.

E. Validate.
The neural network, based on convolution and segmentation, has been implemented in MATLAB, and the system is trained with sample data sets so that the model learns the characteristics of lung cancer. A sample image is then fed as input to the trained model, which at this stage is able to indicate the presence of cancer and locate the cancer spot in the sample lung image. The process involves feeding in the input image, preprocessing, feature extraction, identifying the cancer spot and reporting the results to the user. In case of the
Figure 18: Result.

The cancer-affected part is shown in Figure 17, and the result in Figure 18.

IV. CLUSTERING ALGORITHMS
A. Fuzzy C-Means Clustering Algorithm.
Clustering is the method of separating data into homogeneous units by considering the relationships between objects. The clustering method allocates the feature vectors to N clusters; every nth cluster has C_n as its center. Fuzzy clustering is employed in numerous areas such as pattern recognition and fuzzy detection. Among the various kinds of fuzzy clustering methods, Fuzzy C-Means (FCM) clustering is the most extensively used. FCM uses reciprocal distance to determine fuzzy weights. The input of the process is a pre-known number of clusters, N. The mean position of all the members of a cluster is identified. The output is a partition of a class of objects into N clusters. The goal of FCM clustering is to reduce the total weighted mean square error (MSE). FCM allows each feature vector to belong to several clusters with different fuzzy membership values. The final segmentation is based on the maximum weight of the feature vector over all clusters. The steps involved in the FCM algorithm are given below.

Input: feature vectors (image voxels) X = {X_1, X_2, ..., X_V}; N = number of clusters.
Output: a group of clusters that minimizes the sum of distance errors.
Steps:
1. Set a random weight for every pixel using fuzzy weighting, with positive weights {W_vn} ranging from 0 to 1.
2. Normalize the starting weights for each voxel v over all N clusters using the equation below.

$$ W_{vn} = \frac{W_{vn}}{\sum_{i=1}^{N} W_{vi}} \tag{1} $$

3. Normalize the weights over v = 1, ..., V for each cluster n to get W_vn as given below.

$$ W_{vn} = \frac{W_{vn}}{\sum_{i=1}^{V} W_{in}}, \quad v = 1, 2, \dots, V \tag{2} $$

4. Estimate the new centroids C_n, n = 1, ..., N, from

$$ C_n = \sum_{v=1}^{V} W_{vn} X_v, \quad n = 1, 2, \dots, N \tag{3} $$

5. Update the weights from the new centroids:

$$ W_{vn} = \frac{\left( 1 / \lVert X_v - C_n \rVert^2 \right)^{1/(p-1)}}{\sum_{i=1}^{N} \left( 1 / \lVert X_v - C_i \rVert^2 \right)^{1/(p-1)}}, \quad n = 1, 2, \dots, N, \; v = 1, 2, \dots, V \tag{4} $$

6. If the weights have changed, repeat from step 3; otherwise stop the process.
7. Assign each pixel to a cluster according to the maximum weight.

B. Convolution Neural Network.
The architecture with one hidden layer is depicted in Figure 19. It is examined for its ability to classify the nodules. The network consists of three layers: one input layer, one hidden layer and one output layer. The input layer has P x P neurons that represent the P x P pixels of the image obtained from the segmentation process. The hidden layer contains groups of N x N neurons, each organized as an independent N x N feature map (where N = P - r + 1), and the r x r area is referred to as the interested area (the receptive field). Each hidden neuron takes its input from an r x r neighbourhood of the input image. If two neurons in the same feature map are one neuron apart, then their interested areas in the input layer are one pixel apart. Every neuron of the same feature map is constrained to use the identical group of R weights and to perform the same operation on the corresponding part of the input image.

Figure 19: Architecture of One Hidden Layer CNN.

The advantage of constraining the weights in this way is that it permits the network to achieve shift-invariant pattern recognition. Hence, the total action is represented as an r x r convolution kernel. The feature map is the output obtained from the convolution of the input with the r x r convolution kernel. Each hidden neuron y_j creates its output by means of an activation function. The minimum and maximum values of the activation function are zero and one, respectively.
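A single feature map of the kind described above can be sketched in a few lines. The example is in Python with NumPy rather than the MATLAB used in the paper. The logistic sigmoid is an assumption, chosen only because its output lies between zero and one as stated; the activation formula itself is not given in the text. The sizes P = 8 and r = 3 and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, bounded between 0 and 1 (assumed form)."""
    return 1.0 / (1.0 + np.exp(-z))

def feature_map(image, kernel, bias=0.0):
    """Apply one shared r x r kernel at every position of a P x P image.

    Stride 1 and no padding, so the output is N x N with N = P - r + 1.
    The kernel is not flipped (cross-correlation), as is usual in CNN code.
    """
    r = kernel.shape[0]
    windows = np.lib.stride_tricks.sliding_window_view(image, (r, r))  # (N, N, r, r)
    return sigmoid((windows * kernel).sum(axis=(-1, -2)) + bias)

rng = np.random.default_rng(0)
image = rng.random((8, 8))         # P = 8
kernel = rng.normal(size=(3, 3))   # r = 3, shared by every neuron in the map
fmap = feature_map(image, kernel)  # N = 8 - 3 + 1 = 6
```

Because every hidden neuron uses the same kernel, shifting the input by one pixel shifts the feature map by one neuron, which is the shift-invariance property the text attributes to weight sharing.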
The network weights as well as the bias weights are adjusted by the Back Propagation (BP) algorithm. The BP algorithm iteratively adjusts the weights with the intention of reducing the total error between the actual output vector and the target vector. The error function to be reduced is the Sum-of-Squared Error (SSE). During training, the interested areas (receptive fields) within one hidden group are constrained to use the same set of weights. The weights between the hidden and output layers, and the weights of every interested area, are updated in stochastic mode. In this method, the weight change for each training sample is obtained from each back-propagated error and is applied immediately for every neuron.

V. DATA SECTION

Our primary dataset is the patient lung CT scan dataset from Kaggle's Data Science Bowl (DSB) 2017 [20]. The dataset contains labeled data for 1387 patients, which we divide into a training set of size 968 and a test set of size 419. For each patient, the data consists of CT scan data and a label (0 for no cancer, 1 for cancer). Note that the Kaggle dataset does not have labeled nodules. For each patient, the CT scan data consists of a variable number of images (typically around 100-400, each image being an axial slice) of 512 × 512 pixels. The slices are provided in DICOM format. Around 75% of the provided labels in the Kaggle dataset are 0, so we used a weighted loss function in our malignancy classifier to address this imbalance.

VI. EXPERIMENTAL RESULTS.

The performance of the proposed study is evaluated using benchmark metrics: sensitivity, specificity and accuracy. These metrics are computed from the confusion matrix, which contains the true and false positives and the true and false negatives. True positives and true negatives are cases predicted as diseased and non-diseased, respectively, that are in fact diseased and non-diseased. False positives and false negatives are the opposite: cases whose prediction contradicts their actual state.

$$ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{5} $$

Sensitivity is the number of true positive predictions divided by the total number of actual positives.

$$ \text{Specificity} = \frac{TN}{TN + FP} \tag{6} $$

Specificity is the number of true negative predictions divided by the total number of actual negatives.

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7} $$

where TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative.
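Equations (5) to (7) translate directly into code. The sketch below is in Python; the function name `confusion_metrics` is hypothetical. The counts in the usage line are an assumed reading of the Testing/Validation row reported with the results (178 cases read as TP = 148, TN = 4, FP = 5, FN = 21). The column headers of that row are not recoverable, so this mapping is a guess, although under it the formulas reproduce the reported 88%, 44% and 85%.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # equation (5)
    specificity = tn / (tn + fp)                # equation (6)
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # equation (7)
    return sensitivity, specificity, accuracy

# Assumed reading of the Testing/Validation row (column mapping is a guess).
sens, spec, acc = confusion_metrics(tp=148, tn=4, fp=5, fn=21)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # prints "87.6% 44.4% 85.4%"
```

With only 9 actual negatives in this reading, a specificity of 44% rests on very few cases, which is worth keeping in mind when comparing it with the sensitivity.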
Average 89%
88%
Testing/ Validation 178 148 4 5 21 44% 85%
Average 85%
VII. CONCLUSION

In this paper we developed a convolutional neural network (CNN) architecture to detect nodules in lung cancer patients. This step is a preprocessing step for the CNN. The system performs well considering that it uses less labeled data than most state-of-the-art CAD systems. As an interesting observation, the first layer is a preprocessing layer for segmentation using different techniques. Thresholding, FCM and the CNN are used to identify the nodules of patients.

The network can be trained end-to-end from image patches. Its main requirement is the availability of a training database; otherwise, no assumptions are made about the objects of interest or the underlying image modality.

In the future, it may be possible to extend the current model not only to determine whether or not the patient has cancer, but also to determine the exact location of the cancerous nodules. The most immediate future work is to use FCM segmentation as the initial lung segmentation. Also, we saved our model at its best accuracy, but we could perhaps have saved it according to other metrics. Other future work includes extending our models to images of other cancers. The advantage of not requiring much labeled data specific to lung cancer is that the approach could generalize to other cancers.

VIII. REFERENCES

[1] W.J. Choi and T.S. Choi, “Automated pulmonary nodule detection system in computed tomography images: A hierarchical block classification approach,” Entropy, vol. 15, no. 2, pp. 507–523, 2013.
[2] A. Chon, N. Balachandar, and P. Lu, “Deep convolutional neural networks for lung cancer detection,” tech. rep., Stanford University, 2017.
[3] Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks and applications in vision,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), pp. 253–256, IEEE, 2010.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25 (NIPS 2012) (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds.), pp. 1097–1105, 2012.
[5] H. Suk, S. Lee, and D. Shen, “Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis,” NeuroImage, vol. 101, pp. 569–582, 2014.
[6] G. Wu, M. Kim, Q. Wang, Y. Gao, S. Liao, and D. Shen, “Unsupervised deep feature learning for deformable registration of MR brain images,” Medical Image Computing and Computer-Assisted Intervention, vol. 16, no. Pt 2, pp. 649–656, 2013.
[7] Y. Xu, T. Mo, Q. Feng, P. Zhong, M. Lai, and E. I. Chang, “Deep learning of feature representation with multiple instance learning for medical image analysis,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, pp. 1626–1630, 2014.
[8] D. Kumar, A. Wong, and D. A. Clausi, “Lung nodule classification using deep features in CT images,” in 2015 12th Conference on Computer and Robot Vision, pp. 133–138, June 2015.
[9] Y. Bar, I. Diamant, L. Wolf, S. Lieberman, E. Konen, and H. Greenspan, “Chest pathology detection using deep learning with non-medical training,” Proceedings - International Symposium on Biomedical Imaging, vol. 2015-July, pp. 294–297, 2015.