Heart Cancer Prediction Using Machine Learning
A Major Project Report Submitted in Partial Fulfillment for the Award of the Degree
of Bachelor of Technology in Information Technology
To
Dr. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY,
LUCKNOW
Submitted by:
Ved Prakash Srivastava (1900100130106)
Saurabh Mishra (1900100130081)
Mohd Shazan (1900100130060)
Shivam Malaviya (1900100130086)
UNDER THE SUPERVISION OF
Mr. Vivek Pandey
Assistant Professor
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING / INFORMATION TECHNOLOGY
UNITED COLLEGE OF ENGINEERING AND RESEARCH,
PRAYAGRAJ
MAY 2023
CANDIDATE DECLARATION
We hereby certify that the project entitled “Prediction Of Heart Cancer By Using
Machine Learning”, submitted by us in partial fulfillment of the requirement for the
award of the degree of B. Tech. (Information Technology) to Dr. A.P.J. Abdul Kalam
Technical University, Lucknow, at United College of Engineering and Research,
Prayagraj, is an authentic record of our own work carried out during the period from
June 2022 to May 2023 under the guidance of Mr. Vivek Pandey, Assistant Professor,
Department of Computer Science & Engineering. The matter presented in this project
has not formed the basis for the award of any other degree, diploma, fellowship or
any other similar title.
Signature of the Student
(Ved Prakash Srivastava, Roll No.: 1900100130106)
Signature of the Student
(Saurabh Mishra, Roll No.: 1900100130081)
Signature of the Student
(Mohd. Shazan, Roll No.: 1900100130060)
Signature of the Student
(Shivam Malaviya, Roll No.: 1900100130086)
Place: Prayagraj
Date:
CERTIFICATE
This is to certify that the project titled “Prediction Of Heart Cancer By Using
Machine Learning” is the bona fide work carried out by Ved Prakash Srivastava
(1900100130106), Saurabh Mishra (1900100130081), Mohd. Shazan
(1900100130060) and Shivam Malaviya (1900100130086) in partial fulfillment of
the requirement for the award of the degree of B. Tech. (Information Technology),
submitted to Dr. A.P.J. Abdul Kalam Technical University, Lucknow, at United
College of Engineering and Research, Prayagraj, and is an authentic record of their
own work carried out during the period from June 2022 to May 2023 under the
guidance of Mr. Vivek Pandey, Assistant Professor, Department of Computer Science
& Engineering. The Major Project Viva-Voce Examination was held on
__________________.
Signature of the Guide _________________________________
[Mr. Vivek Pandey]
Signature of Project Coordinator ________________________
[Mr. Shyam Bahadur Verma]
Signature of the Head of Department _____________________
[Dr. Vijay Kumar Dwivedi]
Place:
Date:
ABSTRACT
The Heart is the centre of breath control and ensures that every cell in the body
receives oxygen. At the same time, it filters the air to prevent harmful substances and
germs from entering the body. The human body has specially designed defence
mechanisms that protect the Heart; however, they are not enough to eliminate the risk
of the various diseases that affect it. Infections, inflammation or more serious
complications, such as the growth of a cancerous tumour, can affect the Heart. Heart
cancer occurs in both men and women due to the uncontrolled growth of cells in the
Heart, and it causes serious breathing problems during both inhalation and
exhalation. According to the World Health Organization, cigarette smoking and
passive smoking are the principal causes of Heart cancer. Compared with other
cancers, the mortality rate due to Heart cancer is rising among both young and older
people. Despite the availability of high-tech medical facilities for careful diagnosis
and effective treatment, the mortality rate has not yet been brought under control. It
is therefore essential to take precautions early, so that the symptoms and effects of
the disease are detected at an initial stage and a better diagnosis can be made.
Machine learning now has a great influence on the healthcare sector because its high
computational capability allows early prediction of diseases through accurate data
analysis. In this work, we used machine learning (ML) methods to build efficient
models for identifying individuals at high risk of Heart cancer, thus enabling earlier
interventions to avoid long-term complications. The model proposed in this report is
Rotation Forest, which achieves high performance when evaluated with well-known
metrics such as precision, recall, F-Measure, accuracy and area under the curve
(AUC). More specifically, the experiments showed that the proposed model
performed best, with an AUC of 99.3% and an F-Measure, precision, recall and
accuracy of 97.1%.
ACKNOWLEDGEMENT
We express our sincere gratitude to Dr. A.P.J. Abdul Kalam Technical University,
Lucknow, for giving us the opportunity to work on the Major Project during the final
year of our B.Tech. (IT), an important part of engineering education. We would like
to thank Dr. H.P. Shukla, Principal, and Dr. Vijay Kumar Dwivedi, Head of
Department, CSE, at United College of Engineering and Research, Prayagraj, for their
kind support. We also owe our sincerest gratitude to Mr. Vivek Pandey, Assistant
Professor, for his valuable advice and constructive criticism throughout our project,
which helped us immensely to complete our work successfully. We would also like to
thank everyone who has knowingly and unknowingly helped us throughout our work.
Last but not least, we thank the authors of all the books and papers that we consulted
during our project work and while preparing this report.
List of Figures
Figure 1. CT Scan image for Heart Cancer
Figure 2. Distribution of participants among the age groups in the balanced data
Figure 3. Models Evaluation Based on AUC ROC Curves
List of Tables
Table 1. Various Methodologies Analysed for Cancer Prediction
Table 2. The order of features in the balanced data
Table 3. The breakdown of participants in the balanced data by feature values and class label
Table 4. Machine learning models' settings
Table 5. Evaluation of performance
Table 6. Models' comparison in terms of accuracy, recall and precision
Table of Contents
Title Page
Declaration of the Student
Certificate
Abstract
Acknowledgement
List of Figures
List of Tables
1. INTRODUCTION
1.1 Types of Heart Cancer
1.1.1 Heart Nodules
1.1.2 Non-Small Cell Heart Cancer
1.1.3 Small Cell Heart Cancer
1.1.4 Mesothelioma
1.1.5 Chest Wall Tumours
1.1.5.1 Chest Wall Tumour Types
1.1.6 Metastasized Cancer
1.2 Risk Factors for Heart Cancer
1.2.1 Smoking
1.2.2 Secondhand Smoke
1.2.3 Radon
1.2.4 Other Substances
1.3 Radiation Therapy to the Chest
1.3.1 Diet
1.4 Symptoms of Heart Cancer
1.5 What are the first signs of Heart cancer?
1.6 Heart cancer staging
1.7 Limited vs. extensive stage
1.8 What tests are done to identify Heart cancer?
1.8.1 Blood tests
1.8.2 Imaging
1.8.3 Biopsy
1.9 Types of Treatment for Heart Cancer
1.10 Side effects of the treatment
2. Literature Survey
2.1 Existing System
2.2 Analysis
2.3 Our Proposed System
2.3.1 Support Vector Machine
2.3.2 Random Forest Classifier
2.3.3 K-Nearest Neighbours Algorithm
2.3.4 Introduction to Django Framework
2.3.5 Conclusion
3. System Analysis & Design
3.1 Dataset Description
3.2 Data Pre-processing
3.3 Training and Testing Samples
3.4 Features Analysis
3.4 Machine Learning Models
3.5 Evaluation Metrics
4. Results and Discussion
4.1 Experimental Setup
4.2 Evaluation
4.3 Discussion
5. Conclusions
Future Work
References
1. INTRODUCTION
Cancer is a condition in which bodily cells proliferate unchecked. When cancer
develops in the Heart, it is referred to as Heart cancer. Heart cancer can spread to
other parts of the body, such as the lymph nodes and organs including the brain, and
cancer that starts in other organs can also spread to the Heart. Cancer cells that have
spread from one organ to another are called metastases. Heart cancers are commonly
divided into two main groups: small cell and non-small cell Heart cancer, the latter
including adenocarcinoma and squamous cell carcinoma. These types of Heart cancer
have distinctive patterns of development and responses to therapy [1]. Non-small cell
Heart cancer is more common than small cell Heart cancer. Heart cancer is
considered one of the deadliest diseases and a major cause of mortality in the modern
world. Compared with other cancers, Heart cancer has a greater impact on people; it
currently ranks seventh in the mortality rate index, accounting for 1.6% of deaths
worldwide [2]. Once it has spread, Heart cancer can also affect the brain. Patients may
experience acute chest pain, a dry cough, shortness of breath, weight loss and other
symptoms [3]. Doctors who study the causes and progression of cancer emphasise
that Heart cancer is primarily caused by active and passive smoking. Heart cancer is
treated with surgery, chemotherapy, radiation therapy, immunotherapy and other
procedures. Despite this, doctors are often able to diagnose Heart cancer only once it
has advanced, which makes the outlook relatively poor [4]. Early prediction, before
the final stage, is therefore essential to lower the mortality rate quickly and
effectively. Even with the right diagnosis and treatment, the prognosis for Heart
cancer is not encouraging [5]. The prognosis depends on factors such as the patient's
age, gender, race and overall health status. The American Cancer Society estimates
that a patient's likelihood of surviving Heart cancer is 47% if it is identified at an
early stage. It is extremely unlikely that early-stage Heart cancer will be discovered
incidentally on an X-ray image [6]; spherical lesions 5–10 millimetres or less in
diameter are notoriously difficult to find. Figure 1 displays a CT scan of a patient with
Heart cancer.
Figure 1. CT Scan image for Heart Cancer
1.1 Types of Heart Cancer
Most Heart cancers arise in the Heart's own tissue. Other, rarer kinds of cancer may
also affect the chest wall and the Heart [7].
1.1.1 Heart Nodules
A Heart nodule is a small growth of tissue. Nodules may be benign, precancerous, or
metastatic tumours that have spread from other parts of the body [8]. In general, a
larger nodule is more likely to be cancerous than a smaller one. Heart nodules are
frequently discovered when a patient is examined for unrelated problems, such as an
injury from an accident or abdominal pain [9].
1.1.2 Non-Small Cell Heart Cancer
Non-small cell Heart cancer is the most common type of Heart cancer. It grows and
spreads more slowly than small cell Heart cancer. Based on the type of cells that make
up the tumour, there are three main categories of non-small cell Heart cancer:
● Adenocarcinoma is the most common form of non-small cell Heart cancer [10].
● Large cell carcinomas are a group of malignancies in which large,
abnormal-looking cells are present [11]. These tumours frequently advance
swiftly and can start anywhere in the Heart.
● Squamous cell carcinoma, also called epidermoid carcinoma, frequently starts
in the bronchi near the centre of the Heart [12].
1.1.3 Small Cell Heart Cancer
Almost all cases of small cell Heart cancer are caused by smoking. It grows and
spreads faster than other forms of Heart cancer. There are two forms of small cell
Heart carcinoma [13]:
● Small cell carcinoma (also called oat cell cancer), which accounts for the
majority of small cell Heart malignancies;
● Combined small cell carcinoma.
1.1.4 Mesothelioma
Mesothelioma, an uncommon cancer of the lining of the chest, is most frequently
caused by asbestos exposure. It accounts for around 5% of Heart cancer cases.
Mesothelioma typically appears 30 to 50 years after exposure to asbestos [14]. Most
people who develop mesothelioma worked in environments where they breathed
asbestos fibres. When mesothelioma is found, it is staged, which tells the patient and
the treating doctor how big the tumour is and how far it has spread from the initial
site. Surgery, radiation therapy and chemotherapy are the available treatments for
mesothelioma [15]. Combined strategies are currently being studied, including
chemotherapy before surgery and novel medications that precisely target
mesothelioma cells.
1.1.5 Chest Wall Tumours
Chest wall tumours are uncommon. Like other tumours, tumours detected in the chest
wall can be benign or malignant [16]. Malignant tumours must be treated, whereas
benign tumours are treated depending on their location and the symptoms they
produce. For instance, a tumour needs to be treated if it presses against a Heart and
prevents the patient from breathing.
1.1.5.1 Chest Wall Tumour Types
Chest wall tumours can also be classified by whether they are primary (beginning in
the chest wall) or metastatic (a malignancy that originated elsewhere, such as in the
breast, and spread to the chest wall). Every metastatic tumour is cancerous [17].
Chest wall tumours are more often primary in children and metastatic in adults.
Primary tumours originate in the muscles or bones that make up the chest wall.
1.1.6 Metastasized Cancer
Some Heart cancers are the result of metastasis, in which cancer spreads to the Heart
from another part of the body through the lymphatic system or the bloodstream. Heart
metastases can develop from almost any malignancy [18]. Malignancies that
frequently spread to the Heart include cancers of the bladder, breast, colon, kidney,
liver and prostate, as well as neuroblastoma, sarcoma and Wilms' tumour.
1.2 Risk Factors for Heart Cancer
Studies have identified multiple risk factors that can increase your chance of
developing Heart cancer.
1.2.1 Smoking
Smoking is the main risk factor for Heart cancer. Cigarette smoking is responsible for
80% to 90% of Heart cancer deaths in the US. Smoking any form of tobacco, including
cigarettes, cigars and pipes, raises the chance of developing Heart cancer. Tobacco
smoke contains about 7,000 compounds, which makes it very poisonous; many of
them are harmful, and at least 70 of them have been linked to cancer in humans or
animals [19].
Smokers are 15 to 30 times more likely than non-smokers to develop Heart cancer or
die from it. Even light or occasional smoking raises the chance of Heart cancer, and
the risk increases the more often and the longer a person smokes. People who have
quit smoking have a lower chance of Heart cancer than they would have had
otherwise, but their risk remains higher than that of non-smokers [20]. Quitting
smoking at any age can lower the risk of Heart cancer.
Smoking increases the risk of cancer in almost every part of the body, including the
voice box (larynx), trachea, bronchus, mouth, throat, oesophagus, stomach, colon,
rectum, liver and pancreas.
1.2.2 Secondhand Smoke
Secondhand smoke from cigarettes, cigars and pipes also increases the risk of Heart
cancer. Inhaling secondhand smoke is much like smoking [21]. In 2013 and 2014, one
in four non-smokers in the United States, including 14 million children, were
exposed to secondhand smoke.
1.2.3 Radon
In the US, smoking and radon are the two leading causes of Heart cancer. Radon, a
naturally occurring radioactive gas, can come from water, soil and rocks. It has no
taste or smell and is colourless. Radon may become trapped and build up in the air
when it enters homes or other buildings through cracks or holes [22]. People who live
or work in these homes and buildings are exposed to high levels of radon, and
long-term exposure can lead to Heart cancer.
The US Environmental Protection Agency (EPA) estimates that radon contributes to
about 21,000 Heart cancer deaths each year. Radon exposure is more likely to cause
Heart cancer in smokers than in non-smokers [23]. However, according to the EPA,
more than 10% of radon-related Heart cancer deaths occur in people who have never
smoked. Nearly one in every fifteen homes in the US has elevated radon levels. Find
out how to test your home for radon and how to reduce radon levels if they are high.
1.2.4 Other Substances
Exposure to pollutants found in some workplaces, such as asbestos, arsenic, diesel
exhaust, and certain types of silica and chromium, also raises the risk [24]. For many
of these substances, the risk of developing Heart cancer is even higher in smokers.
People who live in areas with higher air pollution levels may also have an increased
risk of developing Heart cancer.
1.3 Radiation Therapy to the Chest
Heart cancer risk is increased for cancer patients who underwent chest radiation
therapy.
1.3.1 Diet
Researchers are examining a wide range of foods and dietary supplements to see
whether they affect the chance of developing Heart cancer, and there is still much to
discover. We do know that smokers who take beta-carotene supplements are more
likely to develop Heart cancer [25]. Additionally, contaminants in drinking water,
such as radon and arsenic (mostly from private wells), can increase the risk of Heart
cancer.
1.4 Symptoms of Heart Cancer
Signs and symptoms of Heart cancer differ between individuals. Some people have
symptoms related to the Heart. Some people whose Heart cancer has metastasised
(spread to other organs) experience symptoms specific to that part of the body.
Others have only general symptoms of not feeling well. Usually, patients with Heart
cancer do not have symptoms until the disease has advanced. Signs of Heart cancer
include [26]:
● A cough that gets worse or does not go away.
● Chest pain.
● Shortness of breath.
● Wheezing.
● Coughing up blood.
● Feeling very tired all the time.
● Weight loss with no known cause.
Other changes that can occur with Heart cancer include repeated episodes of
pneumonia and swollen or enlarged lymph nodes (glands) inside the chest, near the
Heart.
1.5 What are the first signs of Heart cancer?
A persistent cold or cough that does not improve with treatment may occasionally be
an early symptom of Heart cancer, although it may also indicate less serious
conditions. Among the most typical signs of Heart cancer is a persistent or worsening
cough, which may be accompanied by shortness of breath, chest pain, hoarseness or
unexplained weight loss.
Depending on where in the Heart the cancer first develops, some of these signs may
appear early (in stage I or II), although they typically do not appear until the disease
has reached a later stage. For this reason, it is important to get checked if you have a
higher than average chance of developing Heart cancer.
1.6 Heart cancer staging
Each stage covers numerous combinations of tumour size and spread. For example,
the main tumour in a stage III cancer may be smaller than one in a stage II cancer, but
other factors can place the cancer at the higher stage. Heart cancer is generally staged
as follows:
● Stage 0 (in situ): Cancer is in the top lining of the Heart or bronchus. It has not
spread to other parts of the Heart or outside the Heart.
● Stage I: The cancer has not spread outside the Heart.
● Stage II: The tumour is larger than in stage I, has spread to lymph nodes inside
the Heart, or there is more than one tumour in the same lobe of the Heart.
● Stage III: The cancer is more advanced than in stage II, has spread to nearby
lymph nodes or structures, or there are tumours in different lobes of the same
Heart.
● Stage IV: The cancer has spread to the fluid around the heart, the other Heart,
the fluid around the Heart, or distant organs.
1.7 Limited vs. extensive stage
Although doctors now describe small cell Heart cancer using stages I through IV, you
may also hear the terms limited stage and extensive stage. These terms depend on
whether the affected area can be treated within a single radiation field.
● Limited-stage SCLC is confined to one Heart and sometimes involves the
lymph nodes in the middle of the chest or above the collarbone on the same
side.
● Extensive-stage SCLC has spread from one Heart to the lymph nodes on the
opposite side of the chest, to the other Heart, or to other parts of the body.
1.8 What tests are done to identify Heart cancer?
Blood tests, imaging procedures, and fluid or tissue biopsies are some of the tests your
healthcare professional might request or carry out.
1.8.1 Blood tests
Blood tests can help your doctor examine your organs and other body components to
see how they are functioning, but they cannot detect cancer on their own.
1.8.2 Imaging
Chest X-rays and CT scans can show your doctor changes in your Heart. PET/CT
scans are frequently performed to assess a concerning CT finding or to determine
whether cancer has spread after a diagnosis.
1.8.3 Biopsy
Your doctor may perform a number of procedures to get a closer look at what is
happening inside your chest. During the same procedures, the doctor may perform a
biopsy to obtain a sample of tissue or fluid that can be examined under a microscope
to look for cancer cells and identify the type of cancer. Samples can also be tested for
genetic abnormalities that might affect your treatment.
1.9 Types of Treatment for Heart Cancer
Numerous treatments are available depending on the type and stage of Heart cancer.
Patients with non-small cell Heart cancer may be treated with surgery, chemotherapy,
radiation therapy, targeted therapy or a combination of these [27]. Patients with small
cell Heart cancer usually receive both chemotherapy and radiation therapy.
Surgery: A procedure in which doctors remove cancerous tissue.
Chemotherapy: The use of special drugs to shrink or kill cancer. The drugs may be
taken orally, given intravenously, or sometimes both.
Radiation therapy: The use of high-energy rays, similar to X-rays, to kill cancer.
Targeted therapy: The use of drugs to block the growth and spread of cancer cells.
The drugs can be taken as tablets or given intravenously. Before treatment begins,
you will be tested to establish whether targeted therapy is suitable for your particular
type of cancer [27].
Many medical specialists typically work together to treat Heart cancer.
Pulmonologists are doctors who specialise in Heart diseases. Surgeons are doctors
who perform operations; thoracic surgeons specialise in surgery of the chest, heart and
Heart [28]. Medical oncologists are doctors who use medication to treat cancer, and
radiation oncologists use radiation to treat cancer.
1.10 Side effects of the treatment
Depending on the Heart cancer treatment chosen, there can be side effects. Your
doctor can explain the potential problems to look out for and the possible side effects
of your specific treatment.
● Any medication, including prescription drugs, over-the-counter (OTC)
medications, alternative, herbal or complementary therapies, and vitamin
supplements, can cause an adverse reaction.
● Before the US Food and Drug Administration (FDA), or a comparable body in
another country, approves a medication, the drug company is required to
identify all known adverse effects of the drug.
● Adverse effects must be disclosed, investigated in human clinical studies, and
listed in patient information leaflets (PILs). A PIL is supplied with
medications and medical supplies sold to the general public.
The FDA encourages patients to report negative pharmaceutical side effects. When a
patient does not follow a doctor's orders, this is known as non-compliance, or
non-adherence, and can have negative consequences. Examples include:
● Refusing to take a medication that a doctor has prescribed.
● Stopping an exercise programme to improve a leg because it hurts.
Adverse effects are most likely to occur when a person takes a medication for the first
time, stops using it, or changes the dosage.
2. LITERATURE SURVEY
In this section, we describe the dataset we used and the two key components of the
methodology we used to predict the risk of Heart cancer: class balancing and feature
ranking in the balanced data. We also note the incidence frequencies of the studied
features in each class. Finally, the ML models and performance metrics are presented.
3.1 Dataset Description
The current study used a public dataset [39]. It contains 309 participants, each
described by 16 attributes: 15 input features for the ML models and 1 target class.
The attributes are as follows:
• Gender [37]: This attribute indicates the participant's gender.
• Age (years) [38]: This feature records the participant's age.
• Smoking [39]: This feature indicates whether or not the participant smokes.
• Yellow fingers [40]: This feature indicates whether or not the participant has yellow
fingers.
• Anxiety [41]: This feature indicates whether or not the participant experiences
anxiety.
• Peer pressure [42]: This feature records whether or not the participant experiences
peer pressure.
• Chronic disease [43]: This feature indicates whether or not the participant has a
chronic illness.
• Fatigue [44]: This feature indicates whether or not the participant experiences
fatigue.
• Allergy [45]: This feature indicates whether or not the participant has an allergy.
• Wheezing [46]: This feature indicates whether or not the participant has wheezing.
• Alcohol [47]: This feature indicates whether or not the participant consumes alcohol.
• Coughing [48]: This feature indicates whether or not the participant coughs.
• Shortness of breath [49]: This feature indicates whether or not the participant
experiences shortness of breath.
• Swallowing difficulty [50]: This feature indicates whether or not the participant has
trouble swallowing.
• Chest pain [51]: This feature records whether or not the participant experiences
chest pain.
• Heart Cancer: This target attribute indicates whether or not the participant has been
diagnosed with Heart cancer.
With the exception of age, which is numeric, all attributes are nominal.
3.2 Data Pre-processing
Since the dataset we used contains no outliers or missing values, no cleaning was
applied to it. We used SMOTE [56] to address the highly skewed class distribution, in
which 88% of the participants had Heart cancer and the rest did not. The widely used
SMOTE technique creates synthetic data [57] for the minority class, Non-Heart
Cancer, by interpolating between a minority sample and one of its 5 nearest
minority-class neighbours (5-NN). The minority class is oversampled in this way until
the two classes contain an equal number of instances.
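The oversampling step can be sketched as follows. This is a minimal pure-Python illustration of the SMOTE idea, assuming numeric feature vectors, at least two minority samples, and k = 5 neighbours as stated above; the functions `smote` and `balance` and the toy data are hypothetical, and this is not the implementation used in this work. For nominal attributes, the interpolated values would additionally need to be mapped back to valid categories.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class samples (SMOTE-style sketch).

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Assumes at least two minority samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (s for j, s in enumerate(minority) if j != i),
            key=lambda s: math.dist(base, s),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(majority, minority, k=5, seed=0):
    """Oversample the minority class until both classes have the same size."""
    n_new = max(0, len(majority) - len(minority))
    return minority + smote(minority, n_new, k=k, seed=seed)


# Hypothetical two-feature toy data: 10 majority vs 3 minority samples.
majority = [[float(i), 0.0] for i in range(10)]
minority = [[0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
balanced = balance(majority, minority)
print(len(majority), len(balanced))  # 10 10
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region they span rather than duplicating existing records. Libraries such as imbalanced-learn offer a maintained implementation (`imblearn.over_sampling.SMOTE`, with a `k_neighbors` parameter).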
3.3 Training and Testing Samples
A neural network, a form of artificial intelligence, is trained on the input data
samples and then tested on them. At the start of the procedure, the weights of the
neural network are initialised randomly. The same dataset that was used to train the
neural network serves as the basis for its evaluation. To determine the frequency of
errors (the error rate) during the classification process, the data are weighted, and
the errors are then corrected by reweighting the dataset.
3.4 Features Analysis
We then calculated the importance score of each feature with respect to the target
class. Two feature ranking techniques, gain ratio and random forest, were considered
for this purpose. The gain ratio (GR) method [58] assigns each feature the score
$GR(f_i) = \frac{H(c) - H(c \mid f_i)}{H(f_i)}$, where $H(c)$ is the entropy of the
variable that captures the class values, and $H(c \mid f_i)$ and $H(f_i)$ are,
respectively, the conditional entropy of the class given the feature and the entropy of
the feature $f_i$ ($i = 1, 2, \ldots, 15$). To evaluate a feature's capacity to
distinguish between instances of the two classes, Random Forest computes the Gini
impurity [59]. Table 2 lists the ranking scores in descending order. Based on the
calculated scores, six of the fifteen features occupy the same position in both
rankings, while some of the other features are placed in nearby or reverse order.
Values close to 0 or negative indicate features of low or no importance. Nevertheless,
all of the features were taken into account when training and validating the models,
because they are considered important predictors of Heart cancer development and
are consistent with the guidance of medical professionals.
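The gain ratio score described above can be computed directly from frequency counts. The following is a minimal sketch, assuming the feature and class values are given as equal-length lists of discrete labels; it uses base-2 logarithms (the base cancels in the ratio), and the helper names and toy data are illustrative, not taken from any library or from this study.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(classes, feature):
    """H(c | f): class entropy within each feature value, weighted by frequency."""
    n = len(feature)
    total = 0.0
    for value, count in Counter(feature).items():
        subset = [c for c, f in zip(classes, feature) if f == value]
        total += (count / n) * entropy(subset)
    return total


def gain_ratio(classes, feature):
    """GR(f) = (H(c) - H(c | f)) / H(f); returns 0.0 for a constant feature."""
    h_f = entropy(feature)
    if h_f == 0:
        return 0.0
    return (entropy(classes) - conditional_entropy(classes, feature)) / h_f


# Toy example: a feature identical to the class, and one independent of it.
cls = ["Yes", "Yes", "No", "No"]
print(gain_ratio(cls, ["Y", "Y", "N", "N"]))  # 1.0
print(gain_ratio(cls, ["Y", "N", "Y", "N"]))  # 0.0
```

Dividing the information gain by $H(f_i)$ penalises features with many distinct values, which would otherwise receive inflated information-gain scores.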
Table 2. The order of features in the balanced data
Random Forest feature | Score | Gain Ratio feature | Score
Age | 0.3463 | Allergy | 0.3952
Allergy | 0.2808 | Alcohol | 0.3698
Alcohol | 0.2664 | SwallowingDifficulty | 0.3255
Wheezing | 0.2568 | Wheezing | 0.3082
Coughing | 0.2443 | PeerPressure | 0.293
SwallowingDifficulty | 0.2328 | Coughing | 0.2475
PeerPressure | 0.2244 | Age | 0.1565
ChronicDisease | 0.1663 | ChronicDisease | 0.1176
ChestPain | 0.0959 | ChestPain | 0.0435
Anxiety | 0.0775 | YellowFingers | 0.0292
Smoking | 0.0752 | Anxiety | 0.028
YellowFingers | 0.0726 | Smoking | 0.023
ShortnessOfBreath | 0.0433 | ShortnessOfBreath | 0.0135
Gender | −0.0055 | Gender | 0.0026
Fatigue | −0.0333 | Fatigue | 0.0009
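The agreement between the two rankings can be checked mechanically by comparing the orderings position by position. The sketch below copies the orderings from Table 2, writing each feature with its attribute name from Section 3.1 (the correspondence between the table labels and these names is an assumption based on the attribute list); the variable names are hypothetical.

```python
# Feature orderings from Table 2, highest score first.
rf_order = ["Age", "Allergy", "Alcohol", "Wheezing", "Coughing",
            "SwallowingDifficulty", "PeerPressure", "ChronicDisease",
            "ChestPain", "Anxiety", "Smoking", "YellowFingers",
            "ShortnessOfBreath", "Gender", "Fatigue"]
gr_order = ["Allergy", "Alcohol", "SwallowingDifficulty", "Wheezing",
            "PeerPressure", "Coughing", "Age", "ChronicDisease",
            "ChestPain", "YellowFingers", "Anxiety", "Smoking",
            "ShortnessOfBreath", "Gender", "Fatigue"]

# (rank, feature) pairs where both methods place the same feature (1-based ranks).
same_rank = [(rank, rf) for rank, (rf, gr)
             in enumerate(zip(rf_order, gr_order), start=1) if rf == gr]
print(len(same_rank))  # 6
print(same_rank)
```

The same lists can be reused to measure how far each feature moves between the two rankings, which makes the "nearby or reverse order" observation concrete.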
Figure 2 also shows the breakdown of participants by age group. The age range 60–64
has the highest number of Heart cancer cases, and people aged 50–79 are the most
commonly affected.
Figure 2. Distribution of participants among the age groups in the balanced data.
Table 3 shows how the feature values are distributed in each class. Men and women
are almost equally likely to be diagnosed with Heart cancer. The table also shows that
each of the examined characteristics is present in Heart cancer patients at rates of
27% to 36%, although a significant number of participants reported these symptoms
without having been diagnosed with Heart cancer. Even before the disease develops,
monitoring risk factors and warning signs, followed by clinical check-ups, may help to
prevent it or reduce its adverse outcomes.
3.4 Machine Learning Models
In this study, several machine learning (ML) models were employed for the task at
hand in order to compare their performance. More specifically, we examined the
Support Vector Machine (SVM), a widely used kernel-based classifier [64].
Additionally, a linear classifier was trained using stochastic gradient descent (SGD)
[65] with the convex loss function of an SVM. Among the ensemble methods, random
forest (RF) [71] was used. Finally, a simple artificial neural network and a
distance-based classifier, K-nearest neighbours (K-NN) [74], were assessed.
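To illustrate the second model above, the following is a minimal pure-Python sketch of a linear classifier trained with stochastic gradient descent on the regularised hinge loss, which is the convex loss used by linear SVMs. The learning rate, regularisation strength, epoch count, function names and toy data are illustrative assumptions, not the settings used in this work (those are listed in Table 4).

```python
import random


def train_linear_svm_sgd(X, y, lr=0.1, lam=0.01, epochs=100, seed=0):
    """Fit w, b by per-sample SGD on lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b)).

    Labels y must be -1 or +1.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            xi, yi = X[i], y[i]
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            hinge_active = margin < 1  # misclassified or inside the margin
            for j, xj in enumerate(xi):
                # Subgradient: L2 term always, hinge term only when active.
                grad = lam * w[j] - (yi * xj if hinge_active else 0.0)
                w[j] -= lr * grad
            if hinge_active:
                b += lr * yi  # the bias is not regularised
    return w, b


def predict(w, b, x):
    """Return +1 if x lies on the positive side of the hyperplane, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy one-feature data that a single threshold separates.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]
w, b = train_linear_svm_sgd(X, y)
print([predict(w, b, x) for x in X])
```

In practice, scikit-learn's `SGDClassifier(loss="hinge")` provides this kind of model with a tuned learning-rate schedule.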
Table 3. The breakdown of participants in the balanced data by feature values and class label
Feature | Value | Heart Cancer: No | Heart Cancer: Yes
Gender | Women | 26.12% | 23.15%
Gender | Men | 23.88% | 26.85%
Allergy | No | 49.07% | 19.05%
Allergy | Yes | 0.92% | 30.94%
Smoking | No | 30.01% | 21.30%
Smoking | Yes | 20.01% | 28.70%
Wheezing | No | 47.45% | 19.82%
Wheezing | Yes | 2.59% | 30.18%
YellowFingers | No | 29.82% | 19.81%
YellowFingers | Yes | 20.18% | 30.19%
Alcohol | No | 48.73% | 19.46%
Alcohol | Yes | 1.31% | 30.52%
Anxiety | No | 33.51% | 23.70%
Anxiety | Yes | 16.49% | 26.30%
Coughing | No | 45.01% | 18.71%
Coughing | Yes | 5.10% | 31.31%
PeerPressure | No | 48.16% | 23.15%
PeerPressure | Yes | 1.87% | 26.85%
ShortnessOfBreath | No | 11.77% | 17.40%
ShortnessOfBreath | Yes | 38.34% | 32.58%
ChronicDisease | No | 41.84% | 23.70%
ChronicDisease | Yes | 8.14% | 26.30%
SwallowingDifficulty | No | 49.06% | 24.06%
SwallowingDifficulty | Yes | 0.93% | 25.92%
Fatigue | No | 15.92% | 15.00%
Fatigue | Yes | 34.06% | 35.00%
ChestPain | No | 32.58% | 20.38%
ChestPain | Yes | 17.42% | 29.64%
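Each feature block in Table 3 appears to report joint percentages over all participants in the balanced data: the four cells of a block sum to roughly 100%, and each class column to roughly 50%. Assuming that reading, one block can be produced from raw records as in the following sketch; the function name and the four toy records are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def joint_percentages(feature_values, class_labels):
    """Share of all participants (in %) for every (feature value, class) pair."""
    n = len(class_labels)
    counts = Counter(zip(feature_values, class_labels))
    return {
        (f, c): 100.0 * counts[(f, c)] / n  # Counter returns 0 for unseen pairs
        for f in sorted(set(feature_values))
        for c in sorted(set(class_labels))
    }


# Toy records: feature value and Heart cancer label for four participants.
smoking = ["Yes", "No", "Yes", "Yes"]
heart_cancer = ["Yes", "No", "No", "Yes"]
for (value, label), pct in joint_percentages(smoking, heart_cancer).items():
    print(f"Smoking={value:<3} HeartCancer={label:<3} {pct:.2f}%")
```

Dividing by the class size instead of the total number of participants would give within-class percentages, which is a different reading of the same counts.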
3.5 Evaluation Metrics
Accuracy, precision, recall, F-Measure and AUC were used to evaluate the
performance of the machine learning models [75]. These metrics are derived from the
confusion matrix, which comprises true positives (TP), true negatives (TN), false
positives (FP) and false negatives (FN), as follows.
Accuracy, $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$, is the proportion
of all instances that were correctly predicted and is used to evaluate the overall
performance of the classification task. We also examined recall,
$\mathrm{Recall} = \frac{TP}{TP + FN}$, which measures a model's sensitivity: the
proportion of participants who actually had Heart cancer that were correctly
classified as positive. While recall is a measure of quantity, precision,
$\mathrm{Precision} = \frac{TP}{TP + FP}$, is a measure of quality: the proportion of
participants classified as positive who actually had Heart cancer. The F-Measure,
$F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$,
combines precision and recall into a single score, which enables the models to be
compared. Finally, the AUC, which ranges from 0 to 1, is used to identify the ML
model that best differentiates cases of Heart cancer from cases of non-Heart cancer.
The AUC measures separability: when it reaches 1, a model can completely
distinguish between the two class distributions.
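The metrics described above follow directly from the four confusion-matrix counts, and the AUC can be computed from predicted scores. The following is a minimal sketch with hypothetical counts and scores (not results from this study); the AUC function uses the pairwise-ranking interpretation, under which the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-Measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the ROC AUC).
    """
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

The pairwise AUC is quadratic in the number of cases, which is fine for a dataset of a few hundred participants; larger evaluations usually sort the scores instead.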