Cervical Cancer Prediction Using Machine Learning
(2100301540012)
ADITI BHATT
(2100301540003)
APOORVA CHAUDHARY
(2100301540013)
GAURAV TIWARI
(2100301540027)
Keywords: machine learning (ML), cervical cancer, human papillomavirus (HPV), gradient
boosting, support vector machine (SVM)
1. Introduction
Women face many health challenges over their lifetime, and one of the most critical is
cervical cancer [1]. The elevated mortality rate of this cancer is attributed to women’s lack
of knowledge about the importance of early detection [2]. Cervical cancer threatens
women’s health worldwide, and its early signs are relatively difficult to detect [3]. It
damages the deep tissues of the cervix and can gradually spread to other parts of the
body, such as the lungs, liver, and vagina, which complicates treatment [4]. However,
because cervical cancer is a slow-growing malignancy, advances in identifying
precancerous changes have made early detection, prevention, and therapy possible.
Cervical cancer rates have fallen in most nations over the past decades as detection
technologies have improved.
This year, 4290 people are predicted to die from cervical cancer [5]. The fatality rate has
dropped by roughly half since the mid-1970s, thanks in part to enhanced screening, which
has resulted in the early identification of cervical cancer. The pace of this decline has
slowed, however, from over 4% per year in 1996–2003 to less than 1% per year in
2009–2018 [6]. The pre-invasive stages of cervical cancer last for a long time, so screening
tests allow precancerous lesions to be treated successfully and the cancer to be prevented.
Nonetheless, the death rate in underdeveloped nations remains exceptionally high, since
they do not benefit from state-provided preventive strategies, such as free immunization
programs and national assessment programs.
Cervical cancer can develop when a human papillomavirus (HPV) infection of the cervix is
left untreated [7]. HPV is the most common infectious agent in cervical cancer because it
drives neoplastic progression, that is, the uncontrolled proliferation of abnormal cervical
cells as they enter a malignant phase [8]. The healthcare industry regularly generates
massive amounts of data from which information can be extracted to forecast future
illness based on a patient’s treatment history and health data. Machine learning (ML)
helps practitioners process these large, complex medical datasets and analyze them for
therapeutic insights, which doctors can then use when providing care. As a result, patient
satisfaction can be improved when ML is employed in healthcare.
Cervical cancer is one of the most common malignancies among women worldwide.
Recently, many studies have applied modern techniques to predict cervical cancer at an
early stage, and machine learning has contributed to this early prediction [9]. The most
important contributors to this disease among female populations are lack of awareness,
lack of access to resources and medical centers, and the expense of undergoing regular
examination in some countries [10]. Machine learning has improved the performance of
analyses and the generation of accurate patient data. One study [11] employed text
mining, machine learning, and econometric tools to determine which core and enhanced
quality attributes and emotions are most relevant in forecasting clients’ satisfaction in
different service scenarios. That paper presents findings related to health product
marketing and services, and proposes an automated, machine-learning-based technique
for generating insights. It also helps managers of healthcare and health product
e-commerce improve the design and execution of e-commerce services. Moreover, the
importance of continuous quality improvement in the performance of machine learning
algorithms, from the point of view of health care management and management
information technologies, is demonstrated in [12], which describes different kinds of
machine learning algorithms and uses them to analyze healthcare data.
Table 1
| Source | Used Dataset | Evaluation Matrix | Classifiers | Findings |
|---|---|---|---|---|
| [13] | UCI: 858 patients and 36 attributes | ROC-AUC | ML method | Cervical cancer diagnosis |
| [14] | Patient demographics | N/A | Neural network | Applied Cox proportional techniques |
| [15] | UCI repository | ROC-AUC | Decision tree | Hinselmann screening methods |
| [16] | EHRs | AUC | Random forest | Traditional approaches |
| [17] | N/A | G-mean and F-measure | ADTree | Handling the data imbalance |
| [18] | Dataset collected from the University of California (UCI) | Using four target parameters: biopsy, cytology, Schiller, and Hinselmann, as well as 32 risk factors | Machine learning (ML) algorithms are applied, such as decision tree and decision jungle approaches. | Decision tree algorithm shows a higher value of 98.5%. |
| [19] | Data mining technique | AUROC | The Microsoft Azure ML tool | Decision tree algorithm, a higher value range of 97.8% on the AUROC curve. |
| [20] | A survey-based study on cervical cancer to collect data from 900 women aged 25 to 49 years | N/A | Using Stata 12.0 software. | A majority of 557 women (70.2%) acquired their information from the radio, while a minority of 120 women (15.1%) got their information from health care organizations. |
| [21] | Unbalanced medical image dataset | Assisted in determining cervical cancer, and benefits and drawbacks of different approaches | Machine learning approaches | Employing deep learning to predict cervical cancer with high probability. |
| [22] | A dataset from the University of California, Irvine | Used Hinselmann screening methods to forecast cervical cancer | Deep-learning neural network | Boosted decision tree, decision forest, and decision jungle approaches. |
| [23] | Electronic health record (EHR) data | Four machine learning classifiers | Random forest algorithm | The boosted decision tree method produced a precise forecast of 98%. |
| [24] | Data radiation on bone metastases in cervical cancer patients | Class imbalance learning (CIL) | Ant-miner, RIPPER, Ridor, PART, ADTree, C4.5, ELM, and Weighted ELM | Suggested genetic assistance as an optional strategy to enhance the validity of the prediction. |
| [25] | N/A | Classification algorithms are used to construct the system | Method based on machine learning approaches | Utilized to improve classification accuracy and shorten the time it takes to develop a classification system. |
| [26] | Data related to diabetes | Health specialists and other stakeholders collaborate | Big data analytics and machine-learning-based approaches may be used for diabetes. | Machine learning-based system might score as high as 86% on the diagnostic accuracy of DL. |
| [27] | UCI repository dataset | Classify patient data to detect cardiac disease | Boosted decision tree, decision forest | Score as high as 92% on the diagnostic accuracy of DL. |
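Several of the studies in Table 1 report G-mean or F-measure, metrics that are often preferred over plain accuracy when one class is much rarer than the other. As a brief illustration of how these are computed from a confusion matrix, the following Python sketch uses made-up labels. The function names are hypothetical and the data are not taken from any cited study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a class is absent."""
    return numerator / denominator if denominator else 0.0


def imbalance_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, F-measure, and G-mean."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = safe_div(tp, tp + fn)   # recall on the positive class
    specificity = safe_div(tn, tn + fp)   # recall on the negative class
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": safe_div(2 * precision * sensitivity,
                              precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


if __name__ == "__main__":
    # Illustrative labels only: 2 positive cases out of 10.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    for name, value in imbalance_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

In this toy case the accuracy is 0.8 even though one of the two positive cases is missed, while the F-measure (0.5) and G-mean (about 0.66) are noticeably lower.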
3. Methodology
The proposed research methodology is divided into four segments: research dataset, data
preprocessing, predictive model selection (PMS), and training method. Figure 1 depicts the
architecture of the proposed research, which is separated into four corresponding phases because
the model performs essential tasks in each stage. The Research Dataset section describes how the
research data were collected. The Data Preprocessing section explains how noise is removed from
the dataset so that it is suitable as input to machine learning. The PMS section presents the type
of predictive model selected to predict cervical cancer. The Training Methods section sets out the
requisites for model training. Finally, we design a platform that provides an overall pipeline for
cervical cancer prediction using the Python programming language. This research aims to
implement the algorithm best suited to classifying cervical cancer diagnoses as negative or
positive for clinical use. The candidate algorithms are decision tree, logistic regression, support
vector machine (SVM), K-nearest neighbours (KNN), adaptive boosting, gradient boosting,
random forest, and XGBoost. The steps and their results are presented in the following
sections.
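The four phases described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the implementation used in this research: the records are synthetic, the feature names and the labelling rule are invented, the "?" missing-value marker is assumed, and KNN is used simply because it is the easiest of the listed algorithms to write without external libraries.

```python
import random
import statistics

MISSING = "?"  # assumed missing-value marker, not specified in the text


def column_medians(rows):
    """Median of the observed (non-missing) values in each feature column."""
    return [
        statistics.median(float(v) for v in col if v != MISSING)
        for col in zip(*rows)
    ]


def impute(rows, medians):
    """Replace each missing entry with the median of its column."""
    return [
        [m if v == MISSING else float(v) for v, m in zip(row, medians)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Predict 0/1 by majority vote among the k nearest training rows."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    positives = sum(label for _, label in by_distance[:k])
    return 1 if 2 * positives > k else 0


def make_synthetic_rows(n, seed=0):
    """Generate made-up records (age, years_smoking) with a 0/1 label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        age = rng.randint(18, 70)
        smoking = rng.randint(0, 30)
        labels.append(1 if age + 2 * smoking > 80 else 0)  # arbitrary, not clinical
        # Hide roughly 10% of the smoking values to mimic missing answers.
        rows.append([age, MISSING if rng.random() < 0.1 else smoking])
    return rows, labels


def run_pipeline(n=200, k=5):
    """Run the four phases on synthetic data and return held-out accuracy."""
    # Phase 1: research dataset (synthetic stand-in).
    rows, labels = make_synthetic_rows(n)
    split = int(0.8 * n)
    # Phase 2: preprocessing, with medians learned from the training part only.
    medians = column_medians(rows[:split])
    train_x = impute(rows[:split], medians)
    test_x = impute(rows[split:], medians)
    train_y, test_y = labels[:split], labels[split:]
    # Phases 3 and 4: the selected model (KNN) is "trained" by storing the rows.
    predictions = [knn_predict(train_x, train_y, q, k) for q in test_x]
    correct = sum(p == t for p, t in zip(predictions, test_y))
    return correct / len(test_y)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_pipeline():.2f}")
```

The medians are computed on the training portion only so that no information from the held-out records leaks into preprocessing. In practice each of the candidate algorithms would be fitted and compared on the same split.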
4. Future Scope
The research opens up new possibilities for future exploration in the field of cervical cancer
detection. The study uses various machine learning algorithms to predict cervical cancer, and
this domain can be probed further along several dimensions:
1. Deep learning algorithms such as convolutional neural networks can be explored to analyze
medical images and identify early signs, enabling even earlier prediction of cervical cancer.
2. The potential of wearable devices and mobile health applications can be investigated to
collect real-time data on lifestyle factors and symptoms, enabling personalized risk assessment
and early intervention.
3. Large-scale population studies can be conducted to validate the predictive model across
diverse demographics and geographical regions to ensure its effectiveness for different
populations. Furthermore, collaboration with healthcare providers and policymakers to
integrate the predictive model into routine screening programs would enable proactive
identification of high-risk individuals and targeted interventions.
4. Natural language processing (NLP) techniques can be applied to electronic health records to
extract pertinent information for the prediction of cervical cancer.
5. Other factors, such as genetic data and biomarkers, can be incorporated to improve accuracy
and early detection.
References
1. Martin C.M., Astbury K., McEvoy L., Toole S., Sheils O., Leary J.J. Inflammation and
Cancer. Volume 511. Springer; Berlin, Germany: 2009. Gene expression profiling in
cervical cancer: Identification of novel markers for disease diagnosis and therapy;
pp. 333–359.
2. Purnami S., Khasanah P., Sumartini S., Chosuvivatwong V., Sriplung H. Cervical
cancer survival prediction using hybrid of SMOTE, CART and smooth support vector
machine. AIP Conf. Proc. 2016;1723:030017.
3. Yang X., Da M., Zhang W., Qi Q., Zhang C., Han S. Role of lactobacillus in cervical
cancer. Cancer Manag. Res. 2018;10:1219–1229. doi: 10.2147/CMAR.S165228.
4. Ghoneim A., Muhammad G., Hossain M.S. Cervical cancer classification using
convolutional neural networks and extreme learning machines. Future Gener.
Comput. Syst. 2020;102:643–649. doi: 10.1016/j.future.2019.09.015.
5. Rehman O., Zhuang H., Muhamed Ali A., Ibrahim A., Li Z. Validation of miRNAs as
breast cancer biomarkers with a machine learning approach. Cancers. 2019;11:431.
doi: 10.3390/cancers11030431.
10. Prabhpreet K., Gurvinder S., Parminder K. Intellectual detection and validation of
automated mammogram breast cancer images by multi-class SVM using deep
learning classification. Inform. Med. Unlocked. 2019;16:100151.
11. Sharif-Khodaei Z., Ghajari M., Aliabadi M.H., Apicella A. SMART Platform for
Structural Health Monitoring of Sensorised Stiffened Composite Panels. Key Eng.
Mater. 2012;52:581–584. doi: 10.4028/www.scientific.net/KEM.525-526.581.
12. Devi M.A., Ravi S., Vaishnavi J., Punitha S. Classification of cervical cancer using
artificial neural networks. Procedia Comput. Sci. 2016;89:465–472.
doi: 10.1016/j.procs.2016.06.105.
13. Singh J., Sharma S. Prediction of Cervical Cancer Using Machine Learning
Techniques. Int. J. Appl. Eng. Res. 2019;14:2570–2577.
14. Asadi F., Salehnasab C., Ajori L. Supervised Algorithms of Machine Learning for
the Prediction of Cervical Cancer. J. Biomed. Phys. Eng. 2020;10:509–513.
15. Nithya B., Ilango V. Evaluation of machine learning based optimized feature
selection approaches and classification methods for cervical cancer prediction. SN
Appl. Sci. 2019;1:641. doi: 10.1007/s42452-019-0645-7.
16. Lu L., Song E., Ghoneim A., Alrashoud M. Machine learning for assisting cervical
cancer diagnosis: An ensemble approach. Future Gener. Comput. Syst.
2020;106:199–205. doi: 10.1016/j.future.2019.12.033.
17. Alam T.M., Khan A., Iqbal A., Abdul W., Mushtaq M. Cervical cancer prediction
through different screening methods using data mining. Int. J. Adv. Comput. Sci.
Appl. 2019;10:346–357. doi: 10.14569/IJACSA.2019.0100251.
18. Mukama T., Ndejjo R., Musabyimana A., Halage A., Musoke D. Women’s
knowledge and attitudes towards cervical cancer prevention: A cross sectional study
in Eastern Uganda. BMC Women’s Health. 2017;17:9.
doi: 10.1186/s12905-017-0365-3.
19. Shetty A., Shah S. Survey of cervical cancer prediction using machine learning: A
comparative approach; Proceedings of the 2018 9th International Conference on
Computing, Communication and Networking Technologies (ICCCNT); Bengaluru,
India. 10–12 July 2018; pp. 1–6.
20. Bahad P., Saxena P. Study of adaboost and gradient boosting algorithms for
predictive analytics; Proceedings of the Intelligent Computing and Smart
Communication; Singapore. 20 December 2019.
21. Weegar R., Sundström K. Using machine learning for predicting cervical cancer
from Swedish electronic health records by mining hierarchical representations. PLoS
ONE. 2020;15:e0237911. doi: 10.1371/journal.pone.0237911.
23. Šarenac T., Mikov M. Cervical cancer, different treatments and importance of bile
acids as therapeutic agents in this disease. Front. Pharmacol. 2019;10:484–513.
doi: 10.3389/fphar.2019.00484.
24. Vos D., Verwer S. Efficient Training of Robust Decision Trees Against Adversarial
Examples; Proceedings of the International Conference on Machine Learning—PMLR
2021; Virtual. 18–24 July 2021; pp. 10586–10595.
25. Wang L. Support Vector Machines: Theory and Applications. Volume 177 Springer
Science & Business Media; Berlin, Germany: 2015.
26. Shankar K., Lakshmanaprabu S.K., Gupta D., Maseleno A., Albuquerque V.H.
Optimal feature-based multi-kernel SVM approach for thyroid disease
classification. J. Supercomput. 2020;76:1128–1143.
doi: 10.1007/s11227-018-2469-4.