Project Report Early Lung Cancer Detection Using Machine Learning and Image Processing
A PROJECT REPORT
Submitted by
PRAJWALA G (111516205042)
RAMYA M (111516205045)
RAMYA GEETHA S V (111516205046)
INFORMATION TECHNOLOGY
APRIL 2020
BONAFIDE CERTIFICATE
Certified that this project report titled “EARLY LUNG CANCER DETECTION
USING MACHINE LEARNING AND IMAGE PROCESSING”, is the
bonafide work of “PRAJWALA G (111516205042), RAMYA M
(111516205045), RAMYA GEETHA S V (111516205046)” who carried out the
project work under my supervision, for the partial fulfillment of the requirements for
the award of the degree of Bachelor of Technology in Information Technology.
Certified further that to the best of my knowledge and belief, the work reported herein
does not form part of any other thesis or dissertation on the basis of which a degree or
an award was conferred on an earlier occasion.
SIGNATURE SIGNATURE
CERTIFICATE OF EVALUATION
Semester : 08
Students: Prajwala G (111516205042), Ramya M (111516205045), Ramya Geetha S V (111516205046)
Project title: EARLY LUNG CANCER DETECTION USING MACHINE LEARNING AND IMAGE PROCESSING
Supervisor: Dr. R. Jothilakshmi, M.E., Ph.D., ASSOCIATE PROFESSOR
The report of the project work submitted by the above students in partial
fulfillment for the award of Bachelor of Technology Degree in
INFORMATION TECHNOLOGY of Anna University was evaluated and
confirmed to be the report of the work done by the above students.
ACKNOWLEDGEMENT
A project of this magnitude and nature requires the kind co-operation and
support from many, for successful completion. We wish to express our sincere
thanks to all those who were involved in the completion of this project.
It is our immense pleasure to express our deep sense of gratitude to our chairman
Thiru R.S.MUNIRATHINAM, our vice chairman Thiru R.M.KISHORE, and
our director Thiru R.JOTHI NAIDU, for the facilities and support given by them
in the college.
ABSTRACT
Lung cancer is the second most common cancer in both men and women, and
is by far the leading cause of cancer death among both men and women. Each year,
more people die of lung cancer than of colon, breast, and prostate cancers combined.
Early detection of lung cancer can increase the chance of survival among people.
The overall 5-year survival rate for lung cancer patients increases from 14 to 49% if
the disease is detected in time. Computed Tomography (CT) can be more
efficient than X-ray. However, problems arise from the time needed to
detect the presence of lung cancer with the several diagnostic methods
in use. Hence, a lung cancer detection system using image processing is used to
classify the presence of lung cancer in CT images. In this project, MATLAB has
been used for every procedure. The image processing involves image
extraction by a neural network. We are aiming to get more accurate results by
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
2 LITERATURE SURVEY
3 SYSTEM ANALYSIS
4 UML DIAGRAMS
5 REQUIREMENT SPECIFICATIONS
6 MODULES
7 SYSTEM TESTING
8 CONCLUSION
9 APPENDIX
SCREENSHOTS
9.1 STEPS INVOLVED IN THE ALGORITHM
9.2 FILTERING PROCESS OF THE IMAGE
9.3 EQUALIZATION PROCESS
9.4 BINARIZATION PROCESS
9.5 CUCKOO SEARCH
9.6 APPLYING K-MEANS ALGORITHM
9.7 PROCESSING OF THE K-MEANS ALGORITHM
9.8 NORMALIZATION PROCESS
9.9 ANN CLASSIFICATION
9.10 FINAL RESULT
10 REFERENCES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CT - COMPUTED TOMOGRAPHY
HU - HOUNSFIELD UNITS
PCA - PRINCIPAL COMPONENT ANALYSIS
SVM - SUPPORT VECTOR MACHINE
PPA - PRINCIPAL PATTERN ANALYSIS
UML - UNIFIED MODELLING LANGUAGE
CHAPTER 1
INTRODUCTION
Among the fields of signal processing, image processing is the one in which
the input is an image and the output is also an image. As
its name suggests, it deals with the processing of images. It can be further divided
into analog image processing and digital image processing.
Digital image processing has come to dominate analog image processing over
time owing to its wider range of applications.
Digital image processing deals with developing a digital system that
performs operations on a digital image.
Image
128 30 123
123 77 89
80 255 255
Each number represents the value of the function f(x,y) at a point. In this case the
values 128, 30 and 123 each represent an individual pixel value. The dimensions of the
picture are the dimensions of this two-dimensional array.
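The idea of an image as a two-dimensional array can be sketched in a few lines. The report's own work was done in MATLAB; the snippet below uses plain Python purely for illustration, and the helper name f and its (x, y) = (column, row) convention are assumptions, not something taken from the project.

```python
# A tiny grayscale image stored as a two-dimensional array (a list of rows).
# The pixel values are the ones from the example matrix above.
image = [
    [128, 30, 123],
    [123, 77, 89],
    [80, 255, 255],
]


def f(x, y):
    """Return the intensity at column x, row y (an assumed convention)."""
    return image[y][x]


# The dimensions of the picture are the dimensions of the array.
height = len(image)
width = len(image[0])

# The first row of pixels, read through f(x, y).
first_row = [f(x, 0) for x in range(width)]
```

Each call to f reads one pixel, so looping over every (x, y) pair visits the whole picture.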
Relationship between a digital image and a signal
If the image is a two-dimensional array, then what does it have to do with a
signal? To understand that, we first need to understand what a signal is.
Signal
In the physical world, any quantity measurable through time, over space or over any
higher dimension can be taken as a signal. A signal is a mathematical function, and
it conveys some information.
Two-dimensional signals are those that are measured over some other
physical quantities. An example of a two-dimensional signal is a digital image.
Relationship
Anything that conveys information or broadcasts a message in the physical
world between two observers is a signal. That includes speech (the human voice) and
images. When we speak, our voice is converted to a sound
wave, a signal that travels over time to the person we are speaking to.
The same applies to a digital camera: acquiring an image with
a digital camera involves the transfer of a signal from one part of the system to
another.
two dimensional array or matrix of numbers which are nothing but a digital image.
Computer graphics
Computer graphics deals with the formation of images from object models,
rather than images captured by some device. An example is object rendering:
generating an image from an object model. Such a system would look something
like this.
Artificial intelligence
Artificial intelligence is more or less the study of putting human intelligence
into machines. Artificial intelligence has many applications in image processing. For
example: developing computer aided diagnosis systems that help doctors in
interpreting X-ray, MRI and similar images and then highlight conspicuous sections
to be examined by the doctor.
Signal processing
Signal processing is an umbrella field, and image processing lies under it. The
light reflected by an object in the physical (3D) world passes
through the lens of the camera and becomes a 2D signal, which results in image
formation. This image is then digitized using methods of signal processing, and
the resulting digital image is manipulated in digital image processing.
Analyzing and manipulating the image, which includes data compression,
image enhancement and spotting patterns that are not visible to the human eye, as in
satellite photographs.
Output is the last stage, in which the result can be an altered image or a report that is
based on image analysis.
Types
The two types of methods used for image processing are analog and digital
image processing. Analog or visual techniques of image processing can be used for
hard copies like printouts and photographs. Image analysts use various
fundamentals of interpretation while using these visual techniques. The image
processing is not confined to the area that has to be studied but also depends on the
knowledge of the analyst. Association is another important tool in image processing
through visual techniques. So analysts apply a combination of personal knowledge and
collateral data to image processing.
Digital processing techniques help in the manipulation of digital images by
using computers. Raw data from imaging sensors on a satellite platform contains
deficiencies. To get over such flaws and to recover the original information, the data has to
undergo various phases of processing. The three general phases that all types of data
have to undergo while using digital techniques are pre-processing, enhancement and
display, and information extraction.
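The three phases can be illustrated with a minimal sketch. The operation chosen for each phase (a 3x3 median filter, a linear contrast stretch and a bright-pixel fraction) is an assumption made for this example, not the project's MATLAB implementation, and all function names are hypothetical.

```python
from statistics import median


def median_filter(img):
    """Pre-processing: 3x3 median filter; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def stretch_contrast(img):
    """Enhancement: linearly rescale intensities to the full 0..255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # constant image: nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def bright_fraction(img, threshold=128):
    """Information extraction: fraction of pixels at or above a threshold."""
    pixels = [p for row in img for p in row]
    return sum(p >= threshold for p in pixels) / len(pixels)


noisy = [
    [10, 10, 10, 10],
    [10, 255, 12, 10],  # 255 is an isolated noise spike
    [10, 12, 14, 10],
    [10, 10, 10, 10],
]
cleaned = median_filter(noisy)
enhanced = stretch_contrast(cleaned)
fraction = bright_fraction(enhanced)
```

In this toy input the median filter removes the single bright spike, so no pixel remains above the threshold after enhancement.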
1.4 Applications
Intelligent Transportation Systems – This technique can be used in
automatic number plate recognition and traffic sign recognition.
Remote Sensing – For this application, sensors on remote sensing satellites, or a
multispectral scanner mounted on an aircraft, capture pictures of the
earth’s surface. These pictures are processed by transmitting them to the
Earth station. Techniques used to interpret the objects and regions are used in
flood control, city planning, resource mobilization, agricultural production
monitoring, etc.
Moving object tracking – This application makes it possible to measure motion
parameters and acquire a visual record of the moving object. The different types
of approach to track an object are:
Motion based tracking
Recognition based tracking
CHAPTER 2
LITERATURE SURVEY
The research papers used for the literature survey in the project are:
PAPER: 1
Feature Extraction and Principal Component Analysis for Lung Cancer
Detection in CT scan Images
Ada, Rajneet Kaur
PAPER: 2
A survey on early detection and prediction of lung cancer
Neha Panpaliya , Neha Tadas , Surabhi Bobade , Rewti Aglawe , Akshay Gudadhe
Lung cancer is the leading cause of cancer death worldwide. Early
detection of lung cancer is a challenging problem due to the structure of cancer
cells, most of which overlap each other.
Image processing techniques are widely used for early detection and treatment
stages. For prediction of lung cancer, identification of genetic as
well as environmental factors is very important in developing novel
methods of lung cancer prevention.
In various tumors such as lung cancer, the time factor is very
important in discovering abnormalities in target images. For prediction of
lung cancer, the authors consider significant patterns and their corresponding
weightage and score using a decision tree algorithm.
A lung cancer prediction system is to be developed using the significant pattern
tool. The proposed system uses Histogram Equalization for
preprocessing of images, feature extraction processes, and a neural
network classifier to check whether the state of the patient is normal or
abnormal.
If lung cancer is successfully detected and predicted in its early stages,
it will provide more treatment options, reduce the risk of invasive surgery
and increase the survival rate.
Therefore a lung cancer detection and prediction system is proposed which
is easy, cost effective and time saving. It is expected to produce promising results
for detection and prediction of lung cancer.
Early detection and prediction of lung cancer should therefore play a vital
role in the diagnosis process and also increase the survival rate of patients.
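Histogram equalization, the preprocessing step named in this paper, can be sketched as follows. This is a minimal textbook version in plain Python (the surveyed paper's own implementation is not shown in the report); the function name equalize_histogram and the 8-bit default of 256 levels are assumptions for illustration.

```python
def equalize_histogram(img, levels=256):
    """Map intensities through the normalised cumulative histogram."""
    pixels = [p for row in img for p in row]

    # Histogram: how many pixels have each intensity.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution function of the histogram.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # constant image: nothing to equalise
        return [row[:] for row in img]

    # Lookup table that spreads the occupied intensities over 0..levels-1.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [[lut[p] for p in row] for row in img]


low_contrast = [[52, 55], [61, 59]]
equalized = equalize_histogram(low_contrast)
```

The four closely spaced input intensities are spread across the full range, which is the contrast gain the technique is used for.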
PAPER: 3
Analysis and Edge Detection of Lung Cancer – Survey
C. Jeya Bharathi, Dr. P. Kabilan
Treating cancer in the early stages can provide more treatment options, less
invasive surgery and an increased survival rate.
This paper deals with the detection of cancerous cells from lung CT scan
images. To analyze the cancerous cells, physicians tackle many challenging
tasks.
Locating lung cancer at an early stage is a challenging task since there are few
or no symptoms in this stage of the disease and the majority of cases are
diagnosed in the later stages of the disease.
The majority of lung cancers originate as a small growth or nodule in the lung.
Screening CT scans are extremely sensitive in detecting nodules as small as 2
or 3 mm within the lungs.
CT screening is efficient in locating the majority of lung cancers. A lung CT scan
helps in detecting lung cancers at an early stage when compared with other
scans like MRI, X-ray, etc.
The work proposes a method, based on Canny edge detection, to detect cancerous
cells effectively from CT scan images by reducing the detection error made by the
physician's naked eye in medical study.
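As a rough illustration of edge detection, the sketch below computes only the gradient stage that Canny detection builds on, using Sobel kernels and a single threshold. It is not a full Canny detector (there is no Gaussian smoothing, non-maximum suppression or hysteresis), and the function names and the threshold value are assumptions.

```python
import math

# Sobel kernels for horizontal and vertical intensity change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(img):
    """Gradient magnitude at each interior pixel; borders stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out


def edge_map(img, threshold):
    """Mark pixels whose gradient magnitude reaches the threshold."""
    return [[1 if m >= threshold else 0 for m in row]
            for row in gradient_magnitude(img)]


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 255, 255] for _ in range(4)]
edges = edge_map(step, threshold=500)
```

A full Canny detector would thin these responses to one-pixel-wide edges and link them with two thresholds.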
PAPER: 4
Prediction of lung cancer using image processing techniques: A review
To predict lung cancer, various features are extracted from the images;
therefore, pattern recognition based approaches are useful to predict lung
cancer.
Here, a comprehensive review of the prediction of lung cancer by previous
researchers using image processing techniques is presented. A summary of
that work is also presented.
PAPER: 5
K-Means Clustering using Fuzzy C-Means Based Image Segmentation for Lung
Cancer
K. Kaviarasu , V. Sakthivel
CHAPTER 3
SYSTEM ANALYSIS
The feasibility of the project is analyzed in this phase, and a business proposal is
put forth with a very general plan for the project and some cost estimates. During
system analysis the feasibility study of the proposed system is to be carried out. This
is to ensure that the proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements for the system is essential.
Two key considerations involved in the feasibility analysis are:
Economic Feasibility
Technical Feasibility
A study is carried out to check the economic impact that the system will have on
the organization. The amount of funds that the company can pour into the research and
development of the system is limited. The expenditures must be justified. The
developed system is well within the budget, and this was achieved because most of
the technologies used are freely available. Only the customized products had to be
purchased.
A study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on
the available technical resources, as this would lead to high demands being placed
on the client. The developed system must have modest requirements, as only minimal
or no changes are required for implementing this system.
CHAPTER 4
UML DIAGRAMS
The Unified Modeling Language (UML) was created to forge a common,
semantically and syntactically rich visual modeling language for the architecture,
design, and implementation of complex software systems both structurally and
behaviorally. UML has applications beyond software development, such as process
flow in manufacturing.
It is analogous to the blueprints used in other fields, and consists of different
types of diagrams. In the aggregate, UML diagrams describe the boundary, structure,
and the behavior of the system and the objects within it.
UML is not a programming language but there are tools that can be used to
generate code in various languages using UML diagrams. UML has a direct relation
with object-oriented analysis and design.
A use case is a list of steps that define interaction between an actor (a human
who interacts with the system or an external system) and the system itself.
Use case diagrams depict the specifications of a use case and model the
functional units of a system.
These diagrams help development teams understand the requirements of their
system, including the role of human interaction therein and the differences
between various use cases.
A use case diagram might display all use cases of the system, or just one
group of use cases with similar functionality.
To begin a use case diagram, add an oval shape to the center of the
drawing.
Type the name of the use case inside the oval.
Represent actors with a stick figure near the diagram, then use lines to
model relationships between actors and use cases.
[Use case diagram: actors User and Application; use cases: image of lungs, extracting features, predict disease]
To create a sequence diagram, write the class instance name and class name in
a rectangular box.
Draw lines between class instances to represent the sender and receiver of
messages.
[Sequence diagram: lifelines :User, :Application and :Image Processing; messages include 2: stored on dataset() and 4: Predict disease()]
[Diagram: objects :User, :Image Processing and :Application; message 2: stored on dataset()]
Activity diagrams show the procedural flow of control between class objects,
along with organizational processes like business workflows. These diagrams are
made of specialized shapes connected with arrows. The notation set for activity
diagrams is similar to that for state diagrams.
[Activity diagram: User, Feed images on dataset, Extract feature of image, Predict disease]
CHAPTER 5
REQUIREMENT SPECIFICATION
The requirements specification is a technical specification of requirements for
the software product. It is the first step in the requirements analysis process. It lists
the requirements of a particular software system, including functional, performance
and security requirements. The purpose of the requirements analysis is to identify
and assess the system requirements for the proposed system.
CHAPTER 6
MODULES
The following are the modules of the project, planned to complete the
project with respect to the proposed system while overcoming the limitations of the
existing system and providing support for future enhancement.
K-means is one of the simplest unsupervised learning algorithms that solve the
well-known clustering problem.
The procedure follows a simple and easy way to classify a given data set
through a certain number of clusters (assume k clusters) fixed a priori. The
main idea is to define k centers, one for each cluster.
These centers should be placed carefully, because different locations
cause different results.
So, the better choice is to place them as far away from each
other as possible. The next step is to take each point belonging to a given data set and
associate it with the nearest center.
When no point is pending, the first step is completed and an early grouping is
done. At this point we need to re-calculate k new centroids as barycenters of the
clusters resulting from the previous step.
After we have these k new centroids, a new binding has to be done between the
same data set points and the nearest new center. A loop has been generated.
As a result of this loop we may notice that the k centers change their location
step by step until no more changes are made; in other words, the centers do not
move any more.
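The loop described above can be sketched in plain Python. This is a minimal version of the standard k-means iteration, not the project's MATLAB code; the function name kmeans, the caller-supplied initial centers and the max_iter guard are assumptions made to keep the example deterministic.

```python
def kmeans(points, centers, max_iter=100):
    """Iterate assignment and update steps from the given initial centers."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: bind each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)

        # Update step: each new center is the barycenter of its cluster.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]

        # Stop when the centers do not move any more.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
# Initial centers chosen far apart, as the text recommends.
centers, clusters = kmeans(points, centers=[(1, 1), (9, 8)])
```

For image segmentation the points would be pixel intensities or feature vectors rather than the toy coordinates used here.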
6.4 CLASSIFIER
CHAPTER 7
SYSTEM TESTING
The main objective of testing is to uncover a host of errors, systematically and with
minimum effort and time.
White box testing is also known as glass box testing. This type of testing examines
the internal structure of the program. It can be applied at the unit, integration and
system levels of testing. Mostly, it is used at the unit level of the software testing
process. Sometimes it may not reveal defects in areas which have not been
implemented. It has its own advantages and disadvantages. One advantage is
that knowing the programming language and being familiar with the code may prove
vital in identifying errors quickly and at times may help in avoiding
them at the earliest.
Black box testing is one of the two most widely used testing methods. It tests the
main functionality of the program. It can be applied to every level of testing, such as
the unit, integration, system and acceptance levels. Exhaustive input testing
is required to find all errors. For this type of testing, knowledge of the internal code
and how it works is not needed, but the person performing the test must know what
the program is supposed to do. The test cases are developed based on the specific
requirements according to the goals. Techniques include Boundary Value Analysis,
Equivalence Class Partitioning and Cause-Effect Graphing.
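Boundary value analysis can be illustrated with a small sketch. The unit under test, is_valid_pixel, is a hypothetical function invented for this example (it accepts 8-bit intensities from 0 to 255); it is not part of the project's code.

```python
def is_valid_pixel(value):
    """Hypothetical unit under test: accept 8-bit intensities 0..255."""
    return isinstance(value, int) and 0 <= value <= 255


def boundary_values(low, high):
    """Values just outside, on, and just inside each boundary."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


# Black box test cases are derived from the specification (0..255),
# not from the internal code of is_valid_pixel.
cases = boundary_values(0, 255)
results = {value: is_valid_pixel(value) for value in cases}
```

Only the two values just outside the range should be rejected, which is where off-by-one defects tend to show up.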
UNIT TESTING
FUNCTIONAL TESTING
INTEGRATION TESTING
VALIDATION TESTING
Validation testing succeeds when the software functions in a manner that can be
reasonably expected by the client. Software validation is achieved through a series of
black box tests which confirm conformance to the requirements. The software is validated
based on the series of tests that it passes according to the conditions posed by
the customer. Mostly, the customer's main requirements are to make every
process as simple as possible and to reduce the complexity of using the final
product. Keeping all these conditions in mind, the validation testing is done and the
various test cases are designed.
SYSTEM TESTING
System testing is done to ensure that the requirements are fulfilled properly.
It performs tests to find discrepancies between the system
and its original objective, current specifications and system documentation. If any
discrepancy is found, the respective errors are rectified and system
testing is performed again to make sure the rectification does not introduce a new
error into the system.
STRUCTURE TESTING
CHAPTER 8
CONCLUSION
APPENDIX
SCREENSHOTS
Equalization process
Binarization process
Cuckoo Search
Normalization process
ANN Classification
Final Result
REFERENCES
1. Ada, Rajneet Kaur, “Feature Extraction and Principal Component Analysis for
Lung Cancer Detection in CT scan Images”, International Journal of Advanced
Research in Computer Science and Software Engineering, Volume 3, Issue 3,
March 2013.
2. Almas Pathan, Bairu.K.saptalkar, “Detection and Classification of Lung
Cancer Using Artificial Neural Network”, International Journal on Advanced
Computer Engineering and Communication Technology, Vol-1 Issue: 2011.
3. American Cancer Society, “Cancer facts & figures 2010”,
http://www.cancer.org/acs/groups/content/@epidemiologysurveilance/documents/document/acspc026238.pdf
(2010).
4. Arvind Kumar Tiwari, “Prediction Of Lung Cancer Using Image Processing
Techniques: A Review”, Advanced Computational Intelligence: An
International Journal (ACII), Vol. 3, No. 1, January 2016.
5. C. Jeya Bharathi, Dr. P. Kabilan, “Analysis and Edge Detection of Lung Cancer
– Survey”, International Journal on Recent and Innovation Trends in
Computing and Communication, ISSN: 2321-8169, Volume: 4, Issue: 5.
6. Dasu Vaman Ravi Prasad, “Lung cancer detection using image processing
techniques”, International Journal of Latest Trends in Engineering and
Technology (2013).
7. Fatma Taher, Naoufel Werghi, Hussain Al-Ahmad, Rachid Sammouda,
“Lung Cancer Detection Using Artificial Neural Network and Fuzzy
Clustering Methods”, American Journal of Biomedical Engineering 2012, 2(3):
136-142.
8. Morphological Operators, CS/BIOEN 4640: “Image Processing Basics”,