Lung Cancer Classification and Detection Using CNN
Keywords
Deep learning; CNN; Lung Cancer; Tumor Detection.
ICEMIS'20, September 14–16, 2020, Almaty, Kazakhstan
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7736-2/20/06…$15.00
https://doi.org/10.1145/3410352.3410822

1. INTRODUCTION
Cancer is one of the most common diseases in humans. According to the World Health Organization, cancer led to 9.6 million deaths in 2018, and lung cancer tops the list of cancer types with 1.76 million deaths [1]. Figure 1 shows the patients diagnosed with lung cancer worldwide by region. Furthermore, lung cancer is also considered one of the most dreadful illnesses in the developing countries, with a death rate of 19.4% [1]. Lung cancer has many causes; it arises from gene mutations that lead to the growth of a cancerous mass of cells, which develops into a tumor.

Figure 1. Lung cancer cases among males and females worldwide [1].

broad range of applications because of their remarkable properties. Computers are becoming more powerful for developing state-of-the-art devices for Human-Computer Interaction (HCI), in conjunction with the latest improvements and accomplishments in the fields of computer science and electronics. One application area of HCI devices is sign language recognition, as proposed by [2] and [3] for three different languages, and gesture recognition, as proposed by [4]. Both of these applications involve pattern recognition, computer vision, and linguistics [5].

Moreover, great improvements in technology have helped many researchers find ways to detect cancer early. Many techniques are used for the early detection of lung cancer, such as Computed Tomography (CT) scans, sputum cytology, chest X-ray, and Magnetic Resonance Imaging (MRI). Detecting a tumor
means classifying it into two categories: (1) benign (non-cancerous) and (2) malignant (cancerous) [6]. The chance of surviving the disease in its advanced stages is much lower than when treatment begins in the earlier stages of the illness. Such analysis and diagnostic methods can be significantly enhanced by image processing techniques, and many researchers have used these techniques for the early detection of lung cancer. However, the early detection of cancer has still not improved markedly. Therefore, with the great progress of machine learning techniques, a great number of scientists and researchers are pursuing the early detection and diagnosis of cancer. Neural networks have played an important role in recognizing cancer cells among normal ones, which offers an effective means of creating a useful Artificial Intelligence (AI) based cancer detection approach. Treatment of cancer can be more effective once the tumor cells are correctly separated from the normal ones, so the classification of tumor cells can be done by training a neural network, which in turn forms the foundation of machine learning based cancer diagnosis [7]. Additionally, one of the techniques for this purpose is deep learning, which is an improved version of the artificial neural network. For computer vision in particular, the convolutional neural network is generally used [8, 25-29].

The aim of this work is to design and implement a deep learning convolutional neural network to detect the different types of lung cancer, which are small cell lung cancer, adenocarcinoma, squamous cell cancer, large cell carcinoma, and undifferentiated non-small cell lung cancer, and to attempt to prepare a system for early detection using a MATLAB GUI.

2. RELATED WORKS
As mentioned before, the goal of many researchers in the medical field is to develop systems that offer an advantage against cancer by detecting it in earlier stages, when the chances of recovery are higher. This section reviews some of the literature on similar and previous approaches. The reviewed works are based on different machine learning algorithms and different types of cancer.

The work in [9] used deep learning and convolutional neural networks for lung cancer detection. The work is based on computed tomography (CT) scans, on which a double convolutional deep neural network is trained. The developed system is used to detect the Tx cancer stage (where the tumor cannot be evaluated), which can help to determine the possibility of lung cancer. The system was divided into two main parts: the first makes the convolutional deep neural network more intensive by pre-classifying the input CT scan images from the dataset using the K-means algorithm, and the second is built on a double convolutional deep neural network. The test results showed an improvement in accuracy over a normal convolutional deep neural network: the double convolutional deep neural network reached 0.9962 and the normal one 0.876.

In [10], breast cancer was detected and classified using analysis and a Gene-Back Proportional Neural Network algorithm. The researchers used two algorithms, K-nearest neighbor (KNN) and Naïve Bayes, to determine breast cancer. The dataset was collected from the University of California, Irvine. The main aim of the work was to predict the features of the dataset using machine learning. The proposed system started with a pre-processing stage to find the missing attributes in the dataset, after which the system found the feature vectors. The detection test was done using a Gene-back propagation neural network approach and kernel principal component analysis. The results showed a lower error rate and higher accuracy.

A similar work [11] tried to show that a convolutional neural network can work in conjunction with encoding schemes. The proposed work was based on a convolutional neural network for cell detection, and the system encoded the output pixel space. The researchers used random projections to encode the images to a vector of fixed dimension, and a CNN then regresses this vector from the input pixels. In this work, the researchers used 7 typical datasets with different target cells, along with other approaches, for evaluation and comparison.

In [12], an implementation of an ANN classifier using MATLAB for detecting skin cancer was proposed. In this study, computer-aided classification and dermoscopy images of skin cancer were used, and several pre-processing and segmentation steps were applied to separate the cancerous skin region from normal skin. Some features were extracted, and the ANN classifier was used to classify the data into cancerous and non-cancerous.

3. CHARACTERISTICS OF LUNG CANCER
Lung cancer is the second most common type of cancer. As its name indicates, the cancer starts in the lungs and spreads over many years. The most common symptoms of lung cancer are hemoptysis, dyspnea, cough, weight loss, and anorexia. It is mainly divided into two groups, non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC) [13], where NSCLC accounts for 80% of lung cancers. Lung cancer can be detected in three steps: tissue diagnosis, staging, and functional evaluation, respectively. To understand lung cancer better, an understanding of the structure of the lung is useful. Figure 2 [14] shows the right and left lungs, which are sponge-like organs located in the chest. Air passes from the trachea into the bronchi and then to the smaller branches, called bronchioles. At the ends of these bronchioles are tiny air sacs known as alveoli. The alveoli are in contact with the blood; therefore, the exchange of carbon dioxide and oxygen occurs in the alveoli [13].

The most common causes of lung cancer are direct smoking and secondhand smoke; a small percentage of lung cancer cases are due to other reasons.

Figure 2. Human lung [14].

According to the World Health Organization, lung cancer is further divided into the types listed below [1][15]:
• Adenocarcinoma
  o Lepidic adenocarcinoma
  o Acinar adenocarcinoma
  o Papillary adenocarcinoma
  o Micropapillary adenocarcinoma
  o Solid adenocarcinoma
  o Invasive mucinous adenocarcinoma
  o Colloid adenocarcinoma
  o Fetal adenocarcinoma
  o Enteric adenocarcinoma
  o Minimally invasive adenocarcinoma
• Squamous cell carcinoma
• Neuroendocrine tumors
  o Carcinoid tumors
  o Typical carcinoid
  o Atypical carcinoid
  o Small cell carcinoma
  o Large cell neuroendocrine carcinoma
• Large cell carcinoma
• Adenosquamous carcinoma
• Pleomorphic carcinoma
• Spindle cell carcinoma
• Giant cell carcinoma
• Carcinosarcoma
• Pulmonary blastoma
• Other and unclassified carcinomas

4. CNN
The main difference between a general neural network and a convolutional neural network is that the convolutional neural network uses convolution instead of general matrix multiplication in at least one of its layers. As mentioned before, CNN is a type of deep learning that is commonly used for image classification and object recognition in images [16]. The neurons of a CNN self-optimize through learning, and its architecture has three types of layers: the convolutional layer, the pooling layer, and the fully-connected layer, as seen in the figure.

In short, the CNN technique has two main procedures: convolution and sampling. In the convolutional layer, a trainable filter is convolved with the input image. The result becomes the feature image of that layer, called a feature map, to which a bias is then added. This layer determines the output of the neurons that are connected to local regions of the input by computing the product between the neuron weights and the region of the input they are connected to. Then the rectified linear unit (ReLU), which is an activation function, applies an elementwise function to the output generated by the previous layer. After that, the sampling process takes place in the pooling layer. Finally, the fully-connected layer produces the class scores from the activations, which can then be used for classification [17].

5. DEEP LEARNING
Deep learning lies at the intersection of several fields: neural network research, artificial intelligence, graphical modeling, optimization, pattern recognition, and signal processing. There are three reasons why deep learning is widespread and popular:
1-Increased chip processing capabilities, such as general-purpose graphical processing units (GPGPU).
2-Increased size of the data used in training.
3-Recent progress in machine learning and in signal and information processing [18].

These recent developments have allowed deep learning methods to use complex non-linear functions, to learn distributed and hierarchical feature representations, and to make effective use of both labeled and unlabeled data. Studies by many experts at several universities have demonstrated the success of deep learning in various applications such as computer vision, verbal recognition, voice search, speech recognition, image and character encoding, semantic speech classification, natural language understanding, handwriting recognition, voice processing, information retrieval, robotics, and molecular analysis that may lead to the discovery of new drugs [19].

However, the use of deep and wide networks requires a large amount of computational power during the training process, which is why researchers refrained from working with these networks until recently. Better learning algorithms have also contributed to the success of deep neural networks. Stochastic gradient descent (SGD) algorithms have been the most effective with large training sets, which is the case in most applications, and SGD has recently proven effective for parallel work across several machines in an asynchronous mode [20].

6. IMPLEMENTATION AND RESULTS
Figure 4. Architecture of proposed system.

Figure 4 shows the architecture of the system. The network is used with transfer learning to train and predict, with 85% of the data used for training and 15% for testing.

The proposed system aims to detect lung cancer in its earlier stages based on a deep learning convolutional neural network, using a MATLAB GUI and transfer learning, for the most common lung cancer types: small cell lung cancer, adenocarcinoma, squamous cell cancer, large cell carcinoma, and undifferentiated non-small cell lung cancer, which can be seen in figure 5.
neural network, one output layer with 6 neurons and input layer is shown.

[Figure panels: C. Adenocarcinoma; D. Squamous]

Figure 10. Training configuration.

Figure 10 shows the configuration of the training process that was used: stochastic gradient descent with momentum.

The convolutional network training process can be summarized in the following steps:
Step 1: Initialize all filters, parameters, and weights with random initial values.

                        Actual
                        Positive   Negative
Estimated   Positive    TP         FP
            Negative    FN         TN

TP represents true positives: images that really show cancer and were assessed by the system as cancer.
TN represents true negatives: images that are really normal and were predicted by the system as normal.
FP represents false positives: images that are really normal but were predicted by the system as cancer.
FN represents false negatives: images that really show cancer but were estimated by the system as normal.
In total, 60 images were tested.

Equation (2) is used to compute the accuracy, which measures classification performance; it was found to be 93.33%, as seen in (3). Furthermore, for pattern recognition, information retrieval, and classification, precision and recall are also used [24]. Precision is defined as the fraction of relevant instances among the retrieved instances, while recall is defined as the fraction of the total number of relevant instances that were actually retrieved [24]. Therefore, using (4), the precision was found to be 92.59%, as shown in (5), while with (6) the recall was found to be 1, as indicated in (7).

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2} \]

\[ \frac{50 + 6}{50 + 6 + 4 + 0} = 93.33\% \tag{3} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{4} \]

\[ \frac{50}{50 + 4} = 92.59\% \tag{5} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{6} \]

\[ \frac{50}{50 + 0} = 1 \tag{7} \]

Additionally, Table 2 compares the approach of the proposed system with a couple of similar works.

Table 2. A comparison between the current work and similar works.

             Current work   [9]      [9]          [23]   [23]
             Deep CNN       CNN      Double CNN   NIN    CNN
Accuracy     93.33%         87.69%   99%          90%    90%
Precision    92.59%         -        -            99%    85%
Recall       100%           97.46%   99%          68%    85%

7. CONCLUSION
Advances in technology have inevitably affected all other disciplines, particularly the medical field. It is well known that cancer is one of the most difficult diseases to cure. Many scientists and researchers are trying to find ways to detect cancerous cells earlier in order to increase the chances of survival. One of the approaches to early cancer detection is machine learning, which is considered one of the most efficient techniques. For this reason, an early lung cancer detection system is proposed in this work for 5 different types of lung cancer, with 100 CT scan images of size 227x227 for each type. A deep learning convolutional neural network with seven layers was designed in Deep Network Designer and used. Training and testing were done using transfer learning, and the performance was evaluated using a confusion matrix, which gave 93.33% accuracy, a precision of 92.59%, and a 100% recall. These results were compared and tabulated with two other similar works, which showed that the overall result of the current work is better.

8. REFERENCES
[1]. Who.int, "Cancer", 2018. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/cancer. [Accessed: 14 December 2019].
[2]. S. Ozbay and M. Safar, "Real-time sign languages recognition based on hausdorff distance, Hu invariants and neural network", 2017 International Conference on Engineering and Technology (ICET), 2017. Available: 10.1109/icengtechnol.2017.8308204 [Accessed 14 December 2019].
[3]. M. Sefer, R. Agha and S. Özbay, "Comparison of neural network and hausdorff distance methods in American, British and Turkish sign languages recognition", Proceedings of the First International Conference on Data Science, E-learning and Information Systems - DATA '18, 2018. Available: 10.1145/3279996.3280007 [Accessed 14 December 2019].
[4]. H. Zou, Y. Zhou, J. Yang, H. Jiang, L. Xie and C. Spanos, "WiFi-enabled Device-free Gesture Recognition for Smart Home Automation", 2018 IEEE 14th International Conference on Control and Automation (ICCA), 2018. Available: 10.1109/icca.2018.8444331 [Accessed 14 December 2019].
[5]. O. Aran, Vision based sign language recognition: modeling and recognizing isolated signs with manual and non-manual components. CmpE., Boğaziçi University, 2002.
[6]. E. Beek, "Lung cancer screening: Computed tomography or chest radiographs?", World Journal of Radiology, vol. 7, no. 8, p. 189, 2015. Available: 10.4329/wjr.v7.i8.189 [Accessed 14 December 2019].
[7]. K. Kourou, T. Exarchos, K. Exarchos, M. Karamouzis and D. Fotiadis, "Machine learning applications in cancer prognosis and prediction", Computational and Structural Biotechnology Journal, vol. 13, pp. 8-17, 2015. Available: 10.1016/j.csbj.2014.11.005 [Accessed 14 December 2019].
[8]. S. Sasikala, M. Bharathi and B. Sowmiya, "Lung Cancer Detection and Classification Using Deep CNN", in International Journal of Innovative Technology and Exploring Engineering, 2018.
[9]. G. Jakimovski and D. Davcev, "Using Double Convolution Neural Network for Lung Cancer Stage Detection", Applied Sciences, vol. 9, no. 3, p. 427, 2019. Available: 10.3390/app9030427 [Accessed 14 December 2019].
[10]. A. Kaur and P. Kaur, "Breast Cancer Detection and Classification using Analysis and Gene-Back Proportional Neural Network Algorithm", in International Journal of Innovative Technology and Exploring Engineering, 2019.
[11]. X. Yao and N. Ray, "Cell Detection in Microscopy Images with Deep Convolutional Neural Network and Compressed Sensing.", ArXiv, 2017. [Accessed 14 December 2019].
[12]. A. R.B, J. Abdul Jaleel and S. Salim, "Implementation of ANN Classifier using MATLAB for Skin Cancer Detection", in International Journal of Computer Science and Mobile Computing, 2013.
[13]. M. Bhatt, S. Kant and R. Bhaskar, "Pulmonary tuberculosis as differential diagnosis of lung cancer", South Asian Journal of Cancer, vol. 1, no. 1, p. 36, 2012. Available: 10.4103/2278-330x.96507 [Accessed 15 December 2019].
[14]. Lung Cancer. American Cancer Society, 2018.
[15]. M. Zheng, "Classification and Pathology of Lung Cancer", Surgical Oncology Clinics of North America, vol. 25, no. 3, pp. 447-468, 2016. Available: 10.1016/j.soc.2016.02.003 [Accessed 15 December 2019].
[16]. S. Shiwen, Characterizing Pulmonary Nodules using Machine and Deep Learning Methods to Improve Lung Cancer Diagnosis. 2018.
[17]. R. Agha, M. Sefer and P. Fattah, "A comprehensive study on sign languages recognition systems using (SVM, KNN, CNN and ANN)", Proceedings of the First International Conference on Data Science, E-learning and Information Systems - DATA '18, 2018. Available: 10.1145/3279996.3280024 [Accessed 15 December 2019].
[18]. I. Goodfellow, Y. Bengio and A. Courville, Deep learning.
[19]. J. Gu et al., Recent Advances in Convolutional Neural Networks. 2017.
[20]. N. Khalifa, M. Taha, A. Hassanien and H. Mohamed, "Deep Iris: Deep Learning for Gender Classification Through Iris Patterns", Acta Informatica Medica, vol. 27, no. 2, p. 96, 2019. Available: 10.5455/aim.2019.27.96-102 [Accessed 15 December 2019].
[21]. K1 hospital Kirkuk, Iraq
[22]. Frederick Nat. Lab for Cancer Research at the University of Marburg
[23]. H. Abubakir Ahmed and S. Abdulla Mahmood, "A Deep Learning Technique For Lung Nodule Classification Based on False Positive Reduction", Journal of Zankoy Sulaimani - Part A, vol. 21, no. 1, pp. 107-116, 2019. Available: 10.17656/jzs.10749 [Accessed 16 December 2019].
[24]. C. Manning, P. Raghavan and H. Schütze, Introduction to information retrieval. Cambridge: Cambridge University Press, 2008, pp. 151-175.
[25]. Vangipuram Radhakrishna, Shadi Aljawarneh, and Aravind Cheruvu. 2018. Sequential Approach for Mining of Temporal Itemsets. In Proceedings of the Fourth International Conference on Engineering & MIS 2018 (ICEMIS ’18). Association for Computing Machinery, New York, NY, USA, Article 33, 1–6. DOI:https://doi.org/10.1145/3234698.3234731
[26]. Shadi Aljawarneh, Muneer Bani Yassein, and We’am Adel
Talafha. 2018. A multithreaded programming approach for
multimedia big data: encryption system. Multimedia Tools
Appl. 77, 9 (May 2018), 10997–11016.
DOI:https://doi.org/10.1007/s11042-017-4873-9
[27]. Vangipuram Radhakrishna, Shadi A. Aljawarneh, Puligadda
Veereswara Kumar, and Kim-Kwang Raymond Choo. 2018.
A novel fuzzy gaussian-based dissimilarity measure for
discovering similarity temporal association patterns. Soft
Comput. 22, 6 (March 2018), 1903–1919.
DOI:https://doi.org/10.1007/s00500-016-2445-y
[28]. Shadi Aljawarneh, Shadi Aljawarneh, and Manisha Malhotra.
2017. Critical Research on Scalability and Security Issues in
Virtual Cloud Environments (1st. ed.). IGI Global, USA.
[29]. Shadi Aljawarneh, Muneer Bani Yassein, and We’am Adel
Talafha. 2017. A resource-efficient encryption algorithm for
multimedia big data. Multimedia Tools Appl. 76, 21
(November 2017), 22703–22724.
DOI:https://doi.org/10.1007/s11042-016-4333-y
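
The accuracy, precision, and recall reported in Section 6 follow from the confusion-matrix counts substituted into (3), (5), and (7), namely TP = 50, TN = 6, FP = 4, and FN = 0. The short Python sketch below recomputes them as a cross-check. It is an illustration only: the function name `classification_metrics` is an assumption introduced here and is not part of the MATLAB implementation described in this work.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total   # equation (2)
    precision = tp / (tp + fp)     # equation (4)
    recall = tp / (tp + fn)        # equation (6)
    return accuracy, precision, recall

# Counts taken from equations (3), (5), and (7); 60 test images in total.
acc, prec, rec = classification_metrics(tp=50, tn=6, fp=4, fn=0)
print(f"accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%}")
# prints: accuracy=93.33% precision=92.59% recall=100.00%
```

The sketch assumes that the denominators are non-zero, which holds for the counts above.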