PROJECT REPORT On Lung Cancer Detection Using CNN
TABLE OF CONTENTS
System Architecture
• Implementation Steps
Conclusion
Future Scope
References
Appendix
ABSTRACT
Lung cancer is one of the leading causes of cancer-related deaths worldwide, primarily due to
late-stage detection and limited accessibility to early screening methods. Traditional
diagnostic techniques, such as radiologist-based manual examination of X-ray and CT scans,
are time-consuming, prone to human error, and often unavailable in remote areas. Artificial
Intelligence (AI) and Deep Learning (DL) offer a revolutionary solution by enabling fast,
automated, and highly accurate detection of lung cancer.
Problem Statement:
Existing diagnostic approaches rely heavily on human expertise, making them subjective,
slow, and susceptible to misdiagnosis. Many regions lack access to trained radiologists,
causing delays in diagnosis and treatment. A CNN-based lung cancer detection system can
help automate the process, ensuring faster, more reliable, and accessible healthcare solutions.
Objectives:
• Develop a deep learning-based model for lung cancer detection.
• Utilize a pre-trained ResNet-18 CNN model for high-accuracy classification of lung
cancer from CT scan images.
• Design a Flask-based web interface for real-time analysis and easy image uploads.
Methodology:
• Dataset Preparation: Use the IQ-OTH/NCCD dataset with 1,190 images for model
training and testing.
• Model Training: Fine-tune ResNet-18 to improve lung cancer classification accuracy.
• System Implementation: Deploy a Flask-based web application for real-time detection
and result display.
Key Results:
• Achieved 95% accuracy in detecting lung cancer.
• Reduced diagnostic time, improving patient care.
• Successfully deployed an interactive and user-friendly web interface for X-ray/CT
scan analysis.
1. Introduction
Lung cancer remains a major global health challenge, with high mortality rates due to
late-stage diagnosis. Traditional diagnostic methods, such as radiologist interpretation
of CT scans, are time-intensive and subject to human error, often leading to delayed
treatment. Early detection of lung cancer is crucial in improving survival rates, as
timely medical intervention can significantly enhance patient outcomes. However, the
lack of automated, efficient, and accurate diagnostic tools poses a significant
challenge.
This project aims to develop an AI-driven lung cancer detection system using
Convolutional Neural Networks (CNNs), specifically leveraging the ResNet-18
model. CNNs have revolutionized medical imaging by automating feature extraction
and classification, reducing the dependency on manual analysis. By using a
pre-trained ResNet-18 model, the system classifies lung CT scan images as cancerous or
non-cancerous with high accuracy and efficiency.
2. Problem Statement
Lung cancer detection through traditional methods is time-consuming, prone to
diagnostic errors, and highly dependent on expert radiologists. The high
false-negative rates in early detection stages lead to delayed diagnosis and treatment,
reducing survival rates. There is a need for a fast, accurate, and automated
deep-learning-based approach to assist medical professionals in detecting lung cancer at an
early stage.
3. Objectives
• To develop an AI-based system using CNN (ResNet-18) for lung cancer
detection from CT scan images.
• To enhance the accuracy of lung cancer classification through transfer
learning and hyperparameter tuning.
• To reduce dependency on human analysis by automating the detection
process.
• To evaluate model performance using metrics such as accuracy.
• To develop a web-based interface for real-time image upload and
classification.
4. Scope of Work
• Dataset Selection: Using the IQ-OTH/NCCD lung cancer dataset (available on Kaggle) for
training and testing.
• Data Preprocessing: Applying image resizing, noise reduction, and data
augmentation to improve model robustness.
• Model Development: Training and optimizing ResNet-18 through transfer
learning and hyperparameter tuning.
• Performance Evaluation: Assessing model efficiency using metrics such as
accuracy.
• Deployment: Creating a user-friendly web-based interface for real-world
application.
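For the performance evaluation step listed above, accuracy alone can hide missed cancer cases when most scans are negative. The sketch below is a minimal illustration, not the evaluation code used in this project: it computes accuracy together with sensitivity and specificity from binary labels. The function name `binary_metrics`, the label convention (1 = cancerous, 0 = non-cancerous), and the toy labels are assumptions made for the example.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity for labels in {0, 1}.

    Assumed convention: 1 means "cancerous", 0 means "non-cancerous".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancer correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cancer
    total = tp + tn + fp + fn

    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical toy labels, not results from this project.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the accuracy is 0.8 even though two of the three cancerous samples are missed, which is why sensitivity is worth reporting next to accuracy.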
Literature Review
Introduction
Lung cancer remains one of the most deadly diseases worldwide, with a high mortality rate
due to late detection. Early diagnosis is crucial for improving survival rates, and
advancements in medical imaging and deep learning have shown promise in this area. This
literature review examines previous research and studies related to lung cancer detection
using CT scan images, compares existing solutions, and identifies gaps in the current
research.
3. A Comparative Analysis for Early Diagnosis of Lung Cancer Detection
and Classification by CT Images Processing Using ResNet-50 Model of
CNN (Research Paper 3)
o This research compared various automated methods for lung cancer detection
using CT images, focusing on the ResNet-50 model. The study discussed the
use of different datasets like LIDC, ELCAP, and LUNA-16, and highlighted
the importance of preprocessing, segmentation, and feature extraction. The
ResNet-50 model achieved an accuracy of 66.92% in lung cancer detection,
with higher accuracies in other applications like breast cancer detection
(99.10%) and COVID-19 detection (96.23%).
Comparison with Existing Solutions
• Dataset and Preprocessing: Research Papers 1, 2, 4, and 5 used datasets from Kaggle,
while Research Paper 3 used multiple datasets like LIDC and LUNA-16.
Preprocessing techniques such as noise reduction, image normalization, and data
augmentation were common across all studies, with Research Paper 4 emphasizing
the importance of random oversampling and data augmentation to handle class
imbalance.
• Model Complexity: The hybrid VER-Net model in Research Paper 4 combined three
transfer learning models (VGG19, EfficientNetB0, and ResNet101), which increased
computational complexity but improved accuracy. In contrast, Research Paper 1 used
a simpler CNN architecture, which was easier to implement but had lower accuracy
compared to ResNet-50 variants.
Gaps in Existing Research
1. Dataset Limitations: Most studies used publicly available datasets from Kaggle, LIDC,
and LUNA-16, which may not fully represent the diversity of lung cancer cases in
real-world clinical settings. There is a need for larger and more diverse datasets to
improve model generalizability.
2. Model Interpretability: While deep learning models such as ResNet-50 and other CNNs
achieve high accuracy, they often lack interpretability. Clinicians need models that not only
predict accurately but also provide insights into the decision-making process, which is
currently a gap in existing research.
3. Class Imbalance: Many studies, including Research Papers 1 and 4, faced challenges
with class imbalance, where certain types of lung cancer (e.g., adenocarcinoma) were
overrepresented compared to others (e.g., large cell carcinoma). Techniques like
random oversampling and data augmentation were used, but more robust methods are
needed to handle this issue.
5. Generalization to Other Diseases: While most studies focused on lung cancer, there is
potential to apply these models to other diseases using CT scan images. Research
Paper 4 suggested that VER-Net could be useful for other diseases, but this has not
been extensively explored.
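The random oversampling mentioned under Class Imbalance above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the procedure of any reviewed paper: the function name `random_oversample`, the file names, and the class labels are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def random_oversample(
    samples: Sequence[str], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[str], List[Hashable]]:
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    by_class: Dict[Hashable, List[str]] = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    if not by_class:
        return [], []

    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    target = max(len(items) for items in by_class.values())

    out_samples: List[str] = []
    out_labels: List[Hashable] = []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for item in items + extra:
            out_samples.append(item)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical file names and labels, for illustration only.
samples = ["a1.png", "a2.png", "a3.png", "a4.png", "l1.png"]
labels = ["adenocarcinoma"] * 4 + ["large_cell"]
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only, so that duplicated images do not also appear in the test data.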
Conclusion
The reviewed studies demonstrate significant advancements in lung cancer detection using
CT scan images and deep learning models. ResNet-50 and its variants, along with hybrid
models like VER-Net, have shown superior performance in terms of accuracy and AUC.
However, challenges related to dataset diversity, model interpretability, class imbalance, and
computational efficiency remain. Future research should focus on addressing these gaps to
develop more robust and clinically applicable models for early lung cancer detection.
Methodology
Introduction
Lung cancer remains one of the leading causes of mortality worldwide, and early detection is
crucial for improving survival rates. Traditional diagnostic methods, such as manual
examination of CT scans and X-rays by radiologists, are time-consuming and prone to human
error. This project employs deep learning techniques, specifically ResNet-18, a pre-trained
Convolutional Neural Network (CNN) model, to automate and enhance lung cancer
detection. The model takes X-ray/CT scan images as input and classifies them as either
positive (cancerous) or negative (non-cancerous).
Dataset Used
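The dataset utilized for this project is the IQ-OTH/NCCD Lung Cancer Dataset, which contains
1,190 CT scan images. The dataset groups its cases into normal, benign, and malignant classes,
while this project uses the images for binary classification (cancerous or non-cancerous).
This dataset is crucial for training and evaluating the deep learning model.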
Tools, Technologies, and Frameworks Used
To build and deploy the model efficiently, we used the following:
System Architecture and Flow Diagram
4. Classification: The model predicts whether the scan is positive or negative for lung
cancer.
5. Output: The result is displayed to the user.
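The classification and output steps can be illustrated with a small sketch that turns the two raw scores (logits) of a binary classifier into a label and a confidence value. It is an assumption about how the final decision might be made, not the report's actual code: the class order in `CLASS_NAMES` and the example logits are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed index order of the model outputs; the report does not specify it.
CLASS_NAMES = ("negative", "positive")


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float]) -> Tuple[str, float]:
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# Hypothetical logits for one scan.
label, confidence = classify([-1.2, 2.3])
print(f"{label} ({confidence:.1%})")
```

Showing the probability next to the label lets the user see how certain the model is, rather than only the positive or negative verdict.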
Implementation Steps
1. Dataset Collection: IQ-OTH/NCCD dataset with 1,190 images is used.
Project Implementation
Dataset Preparation:
• The IQ-OTH/NCCD - Lung Cancer Dataset consisting of 1190 images was used for
training and testing.
• Images were resized and normalized for better model performance.
• Data augmentation techniques such as rotation, flipping, and contrast adjustments
were applied to improve generalization.
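The preprocessing steps above (resizing, normalization, flipping) can be shown on a tiny grayscale image stored as nested lists. This is a toy sketch for understanding the operations; a real pipeline would normally use an image library, and the helper names and the `mean`/`std` values of 0.5 are placeholders rather than the settings used in this project.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Resize with nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(img: Image, mean: float = 0.5, std: float = 0.5) -> Image:
    """Scale 8-bit pixels to [0, 1], then standardize with mean and std."""
    return [[(p / 255.0 - mean) / std for p in row] for row in img]


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right (a simple augmentation)."""
    return [row[::-1] for row in img]


tiny = [[0, 255], [255, 0]]            # hypothetical 2x2 image
resized = resize_nearest(tiny, 4, 4)   # the project resizes to 224x224
print(resized)
print(normalize(tiny))
print(horizontal_flip(tiny))
```

Augmentations such as flipping are applied only during training, while resizing and normalization must be applied identically at training and inference time.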
System Development:
• A user-friendly interface was designed to allow users to upload X-ray/CT scan
images.
• The system processes the input image and classifies it as positive (cancer detected) or
negative (no cancer detected).
• The Flask framework was used to integrate the model into a web-based interface.
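Before an uploaded file reaches the model, a web interface like the one described above usually checks that it is an image. The helper below is a minimal, framework-independent sketch of such a check, following a common extension allow-list pattern; the name `allowed_file` and the set of accepted extensions are assumptions, since the report does not list the supported formats.

```python
from typing import FrozenSet

# Assumed list of accepted image formats.
ALLOWED_EXTENSIONS: FrozenSet[str] = frozenset({"png", "jpg", "jpeg"})


def allowed_file(filename: str) -> bool:
    """Return True when the filename ends in an allow-listed extension."""
    if "." not in filename:
        return False
    # Only the text after the last dot counts as the extension.
    extension = filename.rsplit(".", 1)[1].lower()
    return extension in ALLOWED_EXTENSIONS


for name in ("scan_001.PNG", "report.pdf", "scan.png.exe"):
    print(name, allowed_file(name))
```

An extension check only filters obvious mistakes; a deployed system would also want to verify that the file content actually decodes as an image.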
Challenges Faced and Solutions
• Imbalanced Dataset: The dataset had more normal cases than cancer cases.
Solution: Data augmentation techniques were used.
• Overfitting: Initial training resulted in overfitting. Solution: Dropout layers
and L2 regularization were applied.
• Processing Speed: Large images slowed down inference time. Solution:
Images were resized to 224×224 pixels for faster processing.
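To show what the L2 regularization mentioned above does during training, the sketch below performs one gradient-descent update on a loss with an added L2 penalty. It is a simplified illustration, not the project's training loop: the function name and the values of `lr` and `l2_lambda` are hypothetical, as the report does not state its hyperparameters.

```python
from typing import List, Sequence


def sgd_step_with_l2(
    weights: Sequence[float], grads: Sequence[float], lr: float, l2_lambda: float
) -> List[float]:
    """One SGD update on: data_loss + (l2_lambda / 2) * sum(w ** 2).

    The penalty adds l2_lambda * w to each gradient, so every step pulls
    the weights toward zero and discourages overly large values.
    """
    return [w - lr * (g + l2_lambda * w) for w, g in zip(weights, grads)]


# With zero data gradient, only the penalty acts: weights shrink by 5%.
updated = sgd_step_with_l2([1.0, -2.0], [0.0, 0.0], lr=0.1, l2_lambda=0.5)
print(updated)
```

Deep learning frameworks usually expose this behaviour as a weight-decay setting on the optimizer, so it rarely needs to be written by hand.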
Conclusion
By utilizing the IQ-OTH/NCCD dataset, which contains 1,190 images, the model has been
trained to classify lung scans as cancerous or non-cancerous. The methodology involved
image preprocessing, CNN model training, hyperparameter tuning, and performance
evaluation. The model reached about 95% accuracy, which suggests that deep learning-based
approaches can support traditional diagnostic techniques with fast and consistent results.
Despite its success, the model has room for further improvements, including multi-class
classification, integration with hospital systems, real-time detection through mobile
applications, and enhanced interpretability using Explainable AI (XAI). Future advancements
in transfer learning, dataset expansion, and multimodal diagnostic approaches can further
refine the system, making it more robust and clinically applicable.
In conclusion, this project establishes a strong foundation for AI-driven lung cancer
diagnostics. By integrating such intelligent automated detection systems into healthcare
infrastructure, we can significantly improve early detection rates, enhance patient outcomes,
and ultimately contribute to reducing lung cancer-related mortality. With continuous research,
collaboration, and technological advancements, this system can become a vital tool in modern
medical diagnostics, paving the way for a future where AI plays a crucial role in cancer
detection and treatment planning.
Future Scope of Lung Cancer Detection using CNN
The implementation of deep learning-based lung cancer detection using Convolutional Neural
Networks (CNN) has demonstrated significant potential in assisting early diagnosis.
However, there is vast scope for further development and improvement in various aspects of
the project.
3. Multi-Class Classification
The current model focuses on binary classification (cancerous or non-cancerous). Future
improvements could include multi-class classification, differentiating between various stages
of lung cancer, tumor types, and severity levels, allowing for more detailed diagnosis.
6. Explainable AI and Interpretability
A significant challenge in deep learning is model interpretability. Future research can focus
on integrating explainable AI (XAI) techniques such as Grad-CAM or SHAP to provide
insights into how the model makes predictions, increasing trust and usability among medical
professionals.
References
[2] V. Lakide and V. Ganesan, “Precise Lung Cancer Prediction using ResNet-50 Deep
Neural Network Architecture,” J. Electron. Electromed. Eng. Med. Inform., vol. 7, no. 1, pp.
38–46, Jan. 2025, DOI: 10.35882/jeeemi.v7i1.518.
Appendix
Dataset Information
The Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD)
lung cancer dataset was collected in the above-mentioned specialist hospitals over a period of
three months in fall 2019. It includes CT scans of patients diagnosed with lung cancer in
different stages, as well as healthy subjects. IQ-OTH/NCCD slides were marked by
oncologists and radiologists in these two centers. The dataset contains a total of 1190 images
representing CT scan slices of 110 cases. These cases are grouped into three classes: normal,
benign, and malignant. Of these, 40 cases are diagnosed as malignant, 15 as benign, and 55 as
normal. The CT scans were originally collected
in DICOM format. The scanner used is a SOMATOM from Siemens. The CT protocol includes
120 kV, a slice thickness of 1 mm, and breath hold at full inspiration; window widths ranging
from 350 to 1200 HU and window centers from 50 to 600 were used for reading. All
images were de-identified before performing analysis. Written consent was waived by the
oversight review board. The study was approved by the institutional review board of
the participating medical centers. Each scan contains several slices. The number of slices
ranges from 80 to 200, and each slice represents an image of the human chest from
different sides and angles. The 110 cases vary in gender, age, educational attainment, area of
residence and living status. Some of them are employees of the Iraqi ministries of Transport
and Oil, others are farmers and gainers. Most of them come from places in the middle region
of Iraq, particularly, the provinces of Baghdad, Wasit, Diyala, Salahuddin, and Babylon.
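Because the 1,190 images are slices from only 110 cases, slices from the same patient can look very similar. The report does not say how the training and testing split was made, so the following is only a sketch of one cautious option: splitting by case so that no patient contributes slices to both sets. The function name `split_by_case`, the case identifiers, and the 20% test fraction are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_by_case(
    case_ids: Sequence[str], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with no case in both sets.

    case_ids[i] is the case (patient) that image i belongs to.
    """
    unique_cases = sorted(set(case_ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_cases)

    n_test = max(1, round(len(unique_cases) * test_fraction))
    test_cases = set(unique_cases[:n_test])

    train_idx = [i for i, c in enumerate(case_ids) if c not in test_cases]
    test_idx = [i for i, c in enumerate(case_ids) if c in test_cases]
    return train_idx, test_idx


# Hypothetical layout: 10 cases with 3 slices each.
case_ids = [f"case_{i:03d}" for i in range(10) for _ in range(3)]
train_idx, test_idx = split_by_case(case_ids)
print(len(train_idx), len(test_idx))
```

With an image-level random split, near-identical neighbouring slices can land on both sides and make the measured accuracy look better than it would be on new patients.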