
Multiple Disease Prediction using Machine Learning and Deep Learning with the Implementation of Web Technology

2023 IEEE International Conference on Artificial Intelligence, Blockchain, and Internet of Things (AIBThings) | 979-8-3503-2234-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/AIBThings58340.2023.10292488


Mostafizur Rahman, Saiful Islam, Sadia Binta Sarowar, Meem Tasfia Zaman
Department of Electrical and Computer Engineering
North South University
Dhaka, Bangladesh

Abstract—Disease prediction is crucial in healthcare, enabling professionals to diagnose and treat diseases more effectively. In recent years, machine learning and web technology have emerged as powerful tools for predicting various diseases. Machine learning algorithms can analyze large and complex datasets to learn patterns and relationships in the data, enabling them to make accurate disease predictions. Web technology, in turn, can be used to deploy machine learning models on platforms such as websites or mobile apps, making them accessible to users. This research addresses the need for early detection and diagnosis of diseases such as diabetes, heart disease, Parkinson’s disease, lung cancer, and brain stroke. To achieve this goal, various machine learning classification algorithms, including Support Vector Machine, k-nearest neighbors, Decision Tree, Random Forest, AdaBoost, and Gaussian Naive Bayes, and one deep learning model, Long Short-Term Memory, are implemented. The main goal of this paper is to create a web application for predicting multiple diseases and to outline the steps involved in building such a system.

Keywords—machine learning, deep learning, web technology, multiple disease prediction.

I. INTRODUCTION

As society progresses, people’s lifestyles and environmental conditions are gradually changing. This leads to an increase in hidden risks connected with various diseases. Major diseases such as diabetes, heart disease, and brain stroke have a serious impact on a global scale. In recent years, healthcare has witnessed noticeable advances in applying machine learning and deep learning techniques to disease prediction. In previous work, researchers have explored various machine learning techniques for disease prediction. Principal component analysis was used by Dhomse et al. to study the prediction of particular diseases [1]. Heart disease was predicted using multiple linear regression by Polaraju and Prasad [2]. Convolutional neural networks were the main focus of Ambekar and Phalnikar's study on illness risk prediction [3]. Naive Bayes and Random Forest were used by Jackins et al. to predict clinical illness [4]. A human-machine interface for illness pre-diagnosis was created by Gupta et al. [6]. Using machine learning techniques, Mohit et al. explored the identification of several diseases [7]. These previous works did not propose a system that could easily be used in daily life, because they did not develop any software or web application that users could use. In our proposed work, we introduce web-based software that is easy to use. This technology can potentially transform how we approach healthcare by enabling early detection and personalized treatment plans. Disease prediction using machine learning and web technology is a growing field that has the potential to revolutionize healthcare. Machine learning algorithms can analyze medical data to learn patterns and relationships in the data, enabling them to make accurate predictions about diseases. Web technology can then be used to deploy machine learning models on platforms such as websites or mobile apps, making them accessible to users. In this article, we discuss machine learning, deep learning algorithms, and web technology for predicting multiple diseases and outline the steps involved in building such a system. We also highlight some challenges and opportunities of using these technologies in healthcare.

The rest of this paper is arranged as follows. Related work in this field is discussed in Section II. Section III describes the methodology followed to carry out this research. The results are presented in Section IV and compared with prior work in Section V. Finally, future work and the conclusion are discussed in Section VI and Section VII.

II. LITERATURE REVIEW

Numerous studies have addressed disease prediction using different machine learning techniques and algorithms that medical institutions can use. This section reviews some of those studies, their techniques, and their results. Dhomse Kanchan B. et al. used Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM) algorithms for disease prediction and obtained 34.89% accuracy for diabetes and 53% for heart disease. The study explains how Principal Component Analysis (PCA) was used to determine the minimum set of attributes needed to improve the accuracy of different supervised machine learning algorithms for heart disease prediction [1]. K. Polaraju et al. used multiple linear regression models to predict the likelihood of developing heart disease, with an accuracy of about 75%. They used only a statistical model for predicting heart disease. No machine learning model was applied here

Authorized licensed use limited to: Zhejiang University. Downloaded on June 14,2024 at 03:35:32 UTC from IEEE Xplore. Restrictions apply.
[2]. In another study, by Sayali Ambekar et al., convolutional neural network-based disease prediction and other machine learning algorithms were used. Using Naive Bayes, they predicted breast cancer with an accuracy of 82%. They predicted heart disease in three risk categories: high, medium, and low. They used only two algorithms, KNN and Naive Bayes, and did not develop any software [3]. Naive Bayes and Random Forest algorithms were used by V. Jackins et al. to classify diseases. Their accuracy rates for diabetes, coronary heart disease, and cancer data were 74.46%, 82.35%, and 63.74%, respectively. They used only two algorithms and also did not develop any software [4]. To predict diseases, Pahulpreet Singh Kohli et al. used Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), and Adaptive Boosting (AdaBoost). They achieved accuracies of 95.71% for breast cancer, 84.42% for diabetes, and 87.12% for heart disease. The limitation of their work was that they did not implement their machine learning model in any software [5]. Prajval Gupta et al. developed two frameworks for disease pre-diagnosis using machine learning techniques including ANN, SVM, and Decision Tree Induction. They used ANN but did not develop any software. The overall accuracy of the system was about 89% [6]. An online application for illness prediction was created by Indukuri Mohit et al. using k-nearest neighbors, SVM, and Logistic Regression. They reported accuracies of 76.60% for diabetes, 94.55% for breast cancer, and 83.84% for heart disease [7]. Saumya Gupta and Supriya Raheja proposed a method to predict stroke using various machine learning algorithms. Accuracies of 95%, 96%, and 97% were attained using AdaBoost, XGBoost, and Random Forest classifiers [8]. Archana Singh et al. compared the accuracy of k-nearest neighbors, Decision Tree, Linear Regression, and SVM for predicting heart disease, and attained 83% accuracy with SVM [9]. A diabetes prediction system based on machine learning was developed by Priyanka Sonar and her colleagues. They obtained accuracies of 85% for Decision Tree, 77% for Naive Bayes, and 77.3% for SVM. They could have used more algorithms and performance metrics to achieve higher accuracy and select the best model [10]. Chinmayi Thallam et al. proposed a method that compared various classifiers: Support Vector Machine (SVM), k-nearest neighbors (KNN), Random Forest (RF), Artificial Neural Networks (ANN), and a hybrid model named Voting Classifier. SVM gave 95% accuracy with 80% of the data used for training and 20% for testing, while Random Forest gave 97.5%, k-nearest neighbors 97%, the neural network 95.99%, and the Voting Classifier 99.5% [11]. Overall, the previous works mentioned above did not develop any software and did not use a database system to collect user input data for retraining the machine learning model. Also, most of them used only a few machine learning algorithms.

A. Aim of this work

The main goal of this work is to create a user-friendly web application where people can predict multiple diseases accurately and simultaneously. By combining several disease detection techniques, the approach eliminates the need for additional websites or software.

B. Real-life Application

The proposed solution can improve the chances of detecting various diseases at an early stage. With this solution, users do not need several separate websites or pieces of software. Patients can submit their data on a website, where it is subsequently stored on an online server. The computer model, which analyzes the data and forecasts the likelihood of disease, is hosted on a cloud server.

III. METHODOLOGY

A. Data Collection

The datasets for this paper were collected from Kaggle (including the Pima Indians Diabetes Database) and the UCI Machine Learning Repository [12][13][14][15][16].

TABLE I. DATASET OVERVIEW

Dataset name    Number of Features    Number of Instances    Data Format    Number of Classes
Diabetes        9                     768                    CSV            2
Heart           14                    303                    CSV            2
Lung Cancer     16                    309                    CSV            2
Parkinson’s     24                    195                    CSV            2
Brain Stroke    12                    5110                   CSV            2

The five datasets used in this work are summarized in Table I. Each dataset describes a different medical condition and is in comma-separated values (CSV) format. Across the datasets there are between 9 and 24 features and between 195 and 5110 instances. Each dataset poses a binary classification task with two classes. These datasets are useful for medical studies and for building predictive models that identify and help explain various health issues.

B. Mathematical Explanation and Evaluation of Algorithms

Different machine learning and deep learning algorithms have been used, such as Quadratic Discriminant Analysis (QDA), k-nearest neighbors (KNN), SVM, Linear Discriminant Analysis (LDA), Naive Bayes algorithm (NBA), Decision Tree (DT), Random Forest algorithm (RF), AdaBoost (AB), k-means clustering (KMC), XGBoost (XGB), Gradient Boosting (GB), a neural network, and Long Short-Term Memory (LSTM). RMSE (Root Mean Square Error), MAE (Mean Absolute Error), Recall, Precision, F1, R2 (R-squared), and k-fold accuracy are commonly used metrics in machine learning and statistics. These metrics are used in this paper, and their mathematical definitions are given below.

$$\mathrm{RMSE}(X, h) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^{2}} \tag{1}$$

Equation (1) represents the Root Mean Square Error (RMSE), which quantifies the overall accuracy of the model's predictions and provides a single value for comparing different models. Lower RMSE values indicate better predictive performance. Here, $m$ is the total number of observations in the dataset, $h(x^{(i)})$ is the predicted value for the $i$-th observation, and $y^{(i)}$ is the actual value for the $i$-th observation.

$$\mathrm{MAE}(X, h) = \frac{1}{m} \sum_{i=1}^{m} \left| h(x^{(i)}) - y^{(i)} \right| \tag{2}$$
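
As a small numerical check of Equations (1) and (2), the following sketch computes both errors in plain Python. The function names and the five example labels are hypothetical and are not taken from the paper's datasets or code.

```python
import math


def rmse(predicted, actual):
    """Root mean square error as in Equation (1)."""
    m = len(actual)
    squared = sum((p - y) ** 2 for p, y in zip(predicted, actual))
    return math.sqrt(squared / m)


def mae(predicted, actual):
    """Mean absolute error as in Equation (2)."""
    m = len(actual)
    return sum(abs(p - y) for p, y in zip(predicted, actual)) / m


# Hypothetical model outputs h(x) and true labels y for m = 5 patients.
predicted = [1, 0, 1, 1, 0]
actual = [1, 0, 0, 1, 1]

print("RMSE:", rmse(predicted, actual))
print("MAE:", mae(predicted, actual))
```

With 0/1 class labels every error is either 0 or 1, so the MAE equals the fraction of misclassified instances and the RMSE is its square root.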

Equation (2) defines the Mean Absolute Error (MAE), which measures the average absolute difference between the predicted and actual values. Here, $m$ is the total number of observations in the dataset, $h(x^{(i)})$ is the predicted value for the $i$-th observation, and $y^{(i)}$ is the actual value for the $i$-th observation. Like RMSE, it gives a general gauge of prediction accuracy and is used in regression tasks.

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

In binary classification tasks, recall is frequently used, particularly when the goal is to identify the positive class (as in the case of diagnosing diseases). It is the proportion of correctly identified positives among all actual positives, so it measures the model's ability to find all positive instances. In (3), True Positives (TP) is the number of examples the model correctly identified as positive, and False Negatives (FN) is the number of positive examples incorrectly identified as negative.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{4}$$

In Equation (4), True Positives (TP) is the number of examples the model correctly identified as positive, and False Positives (FP) is the number of examples incorrectly identified as positive. Precision is calculated by dividing the number of True Positives by the sum of the True Positives and False Positives. For a highly precise model, most of the examples it labels as positive are truly positive. Precision is often used along with recall, which measures the model's ability to find the positive examples.

$$F1 = \frac{2 \cdot (\mathrm{Precision} \cdot \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$

Equation (5) defines the F1 score. It is the harmonic mean of precision and recall, providing a single metric that balances both. It is useful when precision and recall need to be considered simultaneously. The F1 score ranges from 0 to 1, with higher values indicating better performance. It is often used to evaluate binary classification models, but it can also be used for other types of classification models.

$$R^{2} = 1 - \frac{SS_{res}}{SS_{tot}} \tag{6}$$

Equation (6) defines the R2 score, an important metric for evaluating the performance of a regression-based machine learning model. It is pronounced R squared and is also known as the coefficient of determination. Here, $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the total sum of squares, so R2 measures the proportion of variance in the target variable that is explained by the model.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

Equation (7) defines accuracy, the fraction of instances that were correctly classified. It is calculated by dividing the sum of TP and TN by the total number of instances.

The confusion matrices show the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) for each model. The TPs are the instances that were correctly predicted as positive, the FPs are the instances that were incorrectly predicted as positive, the TNs are the instances that were correctly predicted as negative, and the FNs are the instances that were incorrectly predicted as negative.

C. Software Architecture

The software architecture in Fig. 1 is a three-tier architecture consisting of a web app, a local server, and a database. The web app is served to users through the local server. The local server is responsible for handling requests from the user's computer and communicating with the web server. The database stores all of the data used by the web app, including the user's input. The machine learning model makes predictions based on the user's data. When the user interacts with the web app, their data is sent to the local server. The local server then passes the data to the web app, which sends it to the machine learning model.

Fig. 1. Software Architecture of the Web App.

D. Features of the System

Input Pages: There are different input pages for different diseases. The parameters of each input page are given below.

1) Diabetes: Pregnancies (integer), Glucose (integer), Blood Pressure (integer), Skin Thickness (integer), Insulin (integer), BMI (numerical), Diabetes Pedigree Function (numerical), Age (integer), and Outcome (integer). The objective is to predict the presence (1) or absence (0) of diabetes based on these features.

2) Parkinson’s: Various numerical features such as MDVP:Fo(Hz), MDVP:Fhi(Hz), MDVP:Flo(Hz), MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP, MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA, NHR, HNR, RPDE, DFA, D2, and PPE. The target variable status (integer) indicates the presence (1) or absence (0) of Parkinson's disease.

3) Heart Disease: Features include age (integer), sex (binary), cp (integer), trestbps (integer), chol (integer), fbs (binary), restecg (integer), thalach (integer), exang (binary), oldpeak (float), slope (integer), ca (integer), and thal (integer). The target variable target (integer) denotes the presence (1) or absence (0) of heart disease.

4) Brain Stroke: Categorical features such as gender, work type, residence type, and smoking status, and binary features such as hypertension, heart disease, and ever married. Numerical features include age, average glucose level, and bmi. The target variable stroke (integer) represents the presence (1) or absence (0) of a stroke.

5) Lung Cancer: Categorical feature gender, numerical feature age, and binary features smoking, yellow finger, anxiety, peer pressure, chronic disease, fatigue, allergy, wheezing, alcohol consuming, coughing, shortness of breath, swallowing difficulty, and chest pain. The target variable lung cancer (categorical) indicates the presence or absence of lung cancer.

User Account: The system has three modules: Admin, User (Patient), and Doctor. The website's homepage offers every user two options: log in and sign up. Every new user has to be registered through the admin. After successful registration, the user can log in and predict diseases.

User information database: All user information is recorded in Django's database. This data will be used to retrain the model. After retraining, the model is expected to predict diseases more accurately.

IV. RESULT

A. Performance Evaluations of Machine Learning

This section gives an overview of the accuracy attained by several machine learning algorithms applied to the datasets for the prediction of diabetes, Parkinson’s disease, heart disease, brain stroke, and lung cancer.

TABLE II. ACCURACY OF THE MACHINE LEARNING ALGORITHMS

Algorithm    Diabetes    Parkinson’s    Heart    Brain Stroke    Lung Cancer
SVM          0.83        0.95           0.83     0.77            0.96
DT           0.97        0.88           0.79     0.97            0.95
KNN          0.74        0.70           0.74     0.97            0.95
QDA          0.77        0.95           0.84     0.55            0.91
LDA          0.82        0.82           0.84     0.77            0.96
RF           0.93        0.93           0.82     0.99            0.96
AB           0.83        0.83           0.79     0.82            0.98
XGB          0.90        0.90           0.80     0.97            0.98
GB           0.88        0.88           0.80     0.84            0.96
NBA          0.80        0.80           0.81     0.64            0.95
KMC          0.58        0.58           0.80     0.62            0.46

The accuracies for the different diseases are given in Table II. The highest value in each column of Table II is the best accuracy obtained for that disease. In our proposed system for binary classification, we experimented with several well-known machine learning algorithms: Support Vector Machines (SVM), Decision Trees (DT), k-nearest neighbors (KNN), Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), Random Forests (RF), AdaBoost (AB), XGBoost (XGB), Gradient Boosting (GB), Naive Bayes algorithm (NBA), and K-Means Clustering (KMC). The accuracy numbers in the table show how well each algorithm performed on each diagnosis task. A higher accuracy generally indicates that the algorithm is better at assigning instances to the correct disease category. However, accuracy figures do not, by themselves, give a comprehensive evaluation of an algorithm's performance. To gain a deeper understanding of an algorithm's diagnostic capabilities, other evaluation metrics, including precision, recall, and F1-score, should be considered.

B. Performance Evaluations of Deep Learning

We also implemented deep learning to observe the accuracy and other performance measures for disease prediction. The quantity and quality of the data, the model's complexity, and the particular condition being predicted can all affect how well these models work. Fig. 2, 3 and 4 show accuracy changes per epoch for the Long Short-Term Memory (LSTM) model used for the different disease predictions.

Fig. 2. The graph showing loss over epochs for diabetes and heart disease prediction.

Fig. 3. The graph showing loss over epochs for lung cancer and Parkinson’s prediction.
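
For readers unfamiliar with LSTM networks, the sketch below shows one forward step of a single-unit LSTM cell using the standard gate equations. It is only an illustration: the weights are hypothetical, the input is a made-up scaled feature sequence, and how the authors shaped their tabular data for the LSTM is not stated in the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell on a scalar input x.

    params maps each gate name ("i", "f", "o", "g") to a tuple
    (w, u, b): input weight, recurrent weight, and bias.
    """
    def pre(gate):
        w, u, b = params[gate]
        return w * x + u * h_prev + b

    i = sigmoid(pre("i"))     # input gate
    f = sigmoid(pre("f"))     # forget gate
    o = sigmoid(pre("o"))     # output gate
    g = math.tanh(pre("g"))   # candidate cell state
    c = f * c_prev + i * g    # updated cell state
    h = o * math.tanh(c)      # updated hidden state
    return h, c


def run_sequence(xs, params):
    """Feed a sequence through the cell and return the last hidden state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h


# Hypothetical weights and a hypothetical scaled feature sequence.
params = {
    "i": (0.5, 0.1, 0.0),
    "f": (0.4, 0.2, 1.0),
    "o": (0.3, 0.1, 0.0),
    "g": (0.8, 0.3, 0.0),
}
features = [0.2, 0.7, 0.1]
print("final hidden state:", run_sequence(features, params))
```

In a trained classifier the final hidden state would be passed to an output layer and the weights would be learned over many epochs, which is what accuracy-per-epoch or loss-per-epoch curves track.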

Fig. 4. The graph showing loss over epochs for brain stroke

Using LSTM models for illness prediction is a viable strategy for enhancing patient outcomes. However, as this study's findings show, several variables can affect how well these models function. We compared the accuracy of the Long Short-Term Memory (LSTM) models for the diagnosis of various illnesses: Parkinson's, diabetes, heart disease, lung cancer, and brain stroke. The LSTM training outcomes for these illnesses are displayed in Fig. 2, 3 and 4. The figures show the number of training epochs and the associated accuracy attained by the model for each illness. For Parkinson's disease, the LSTM model was trained for 10 epochs and reached an accuracy of 55%. Parkinson's is a neurological condition that impairs mobility and is marked by symptoms including stiffness, tremors, and trouble coordinating one's movements. For diabetes, the LSTM model was likewise trained for 10 epochs and reached 71% accuracy. Diabetes is a metabolic condition defined by high blood sugar levels caused by insufficient insulin production or inefficient insulin use. The same model identified heart disease with an accuracy of 79% after 150 training iterations. For lung cancer and brain stroke, the LSTM model reached accuracies of 87% and 76%, respectively, after 10 iterations of training.

C. Implementation of Web Technology

A Django web application was built to predict several illnesses, with the aim of empowering users and lowering healthcare expenditures. Machine learning algorithms offer tailored risk estimates and advice based on questions about medical history and lifestyle. While not a replacement for medical advice, the app motivates users to take better care of themselves. The app's database enables ongoing training and increased accuracy. With this solution, users do not need several separate websites or pieces of software. Patients submit their data on a website, where it is subsequently stored on an online server. Fig. 5 shows the home page of our work. It was created using Django in the backend and Bootstrap, HTML5, and CSS in the frontend. To predict diseases, a user must first register. A registered user needs to log in before predicting a disease. The login page is shown in Fig. 6. The input page, where users must correctly enter the attributes needed to diagnose a particular disease (diabetes in this example), is shown in Fig. 7. The result obtained from the user's inputs for diabetes prediction is shown in Fig. 8. All user input data saved in the database is shown in Fig. 9. This data will be used to retrain our model in the future.

Fig. 5. Web app home page

Fig. 6. Login page

Fig. 7. Data input page for diabetes disease prediction

Fig. 8. Result page for diabetes disease prediction
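
The store-then-retrain loop described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the paper uses Django's database layer, whereas the table name, column names, and helper functions here are hypothetical, and only three of the diabetes input fields are shown.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small store for submitted diabetes inputs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS diabetes_inputs ("
        "id INTEGER PRIMARY KEY, glucose REAL, bmi REAL, "
        "age INTEGER, prediction INTEGER)"
    )
    return conn


def save_input(conn, glucose, bmi, age, prediction):
    """Save one submitted form together with the model's output."""
    # The ? placeholders keep user-supplied values out of the SQL text.
    with conn:
        conn.execute(
            "INSERT INTO diabetes_inputs (glucose, bmi, age, prediction) "
            "VALUES (?, ?, ?, ?)",
            (glucose, bmi, age, prediction),
        )


def load_training_rows(conn):
    """Return stored rows as (features, labels) lists for later retraining."""
    rows = conn.execute(
        "SELECT glucose, bmi, age, prediction "
        "FROM diabetes_inputs ORDER BY id"
    ).fetchall()
    features = [list(row[:3]) for row in rows]
    labels = [row[3] for row in rows]
    return features, labels


conn = open_store()
save_input(conn, 148.0, 33.6, 50, 1)
save_input(conn, 85.0, 26.6, 31, 0)
features, labels = load_training_rows(conn)
print(features, labels)
```

The sketch stores the predicted label only for illustration; which label the authors use when retraining (the model's own prediction or a confirmed diagnosis) is not stated in the paper.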

Fig. 9. Input data showing from database

D. Performance Metrics of Different Machine Learning Models

The performance of several machine learning models on various illnesses is shown in Table IV. The illnesses included in the table are Parkinson's, diabetes, heart disease, lung cancer, and brain stroke. The performance measures shown are RMSE, MAE, Recall, Precision, F1-score, R2 score, and k-fold accuracy. RMSE and MAE measure the model's prediction error, with lower values indicating better performance. Table IV covers the algorithms used in our framework and web app, which were chosen because they gave the best performance.

TABLE IV. PERFORMANCE METRICS OF DIFFERENT MACHINE LEARNING MODELS ON VARIOUS DISEASES

Metric             Parkinson’s (QDA)    Diabetes (RF)    Heart Disease (LDA)    Lung Cancer (AB)    Brain Stroke (RF)
RMSE               0.22                 0.30             0.40                   0.12                0.09
MAE                0.05                 0.09             0.16                   0.01                0.008
Recall             1.0                  0.95             0.91                   1.0                 1.0
Precision          0.9                  0.87             0.81                   0.98                0.98
F1                 0.94                 0.91             0.86                   0.99                0.99
R2                 0.79                 0.61             0.33                   0.48                0.96
k-fold Accuracy    80%                  86%              82%                    86%                 77.38%

Recall, precision, and F1-score measure how well the model identifies positive instances, with higher values indicating better performance. The R2 score quantifies the proportion of variation in the target variable explained by the model. K-fold cross-validation is a method for evaluating a machine learning model's performance and generalizability. The values in the table show how well each model performed when identifying a given disease. They can be used to assess how well several machine learning models perform across the disease prediction tasks and then to choose the model that best fits the application’s needs for the desired performance metric. The most suitable performance metric depends on the task's particular needs and characteristics. Hence, a thorough analysis using a variety of metrics is advised to provide a more precise evaluation of model performance.

Fig. 10. Confusion matrices for diabetes and heart disease prediction.

Fig. 10 displays the confusion matrices for diabetes and heart disease prediction. It illustrates that the Random Forest classifier is a good predictor of diabetes, with an accuracy of 0.87, a precision of 0.88, and a recall of 0.61. It also shows that the LDA classifier is a good predictor of heart disease, with an accuracy of 0.78, a precision of 0.85, and a recall of 0.68.

Fig. 11. Confusion matrices for Parkinson's and lung cancer prediction

The confusion matrices shown in Fig. 11 depict that the LDA classifier model is a good predictor of heart disease, whereas the AdaBoost classifier model is a good predictor of lung cancer. The LDA model has an accuracy of 0.78, a precision of 0.85, and a recall of 0.68. The AdaBoost model has an accuracy of 0.80, a precision of 0.85, and a recall of 0.67.

Fig. 12. Confusion matrix for brain stroke.

Fig. 12 shows the confusion matrix for brain stroke, in which the Random Forest model misclassified 6% of the instances. Of these, 5% were false positives and 1% were false negatives. In other words, in 5% of the instances the model predicted a brain stroke for a person who did not have one, and in 1% of the instances it predicted no brain stroke for a person who did have one.
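
The accuracy, precision, recall, and F1 values quoted for the confusion matrices follow directly from the four cell counts. The sketch below shows the arithmetic in plain Python; the function name and the example counts are hypothetical and are not read from Fig. 10, 11, or 12.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Guard against empty denominators, e.g. a model that never predicts positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for 100 test instances.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For these counts the accuracy is 0.85 and the precision is 0.80, while the recall is 40/45, which shows how a single confusion matrix can yield noticeably different values for each metric.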

E. Security Aspects on User Personal Data

A multiple disease prediction program must provide strong protection for users' personal data. Data must be encrypted both in transit and at rest, and HTTPS should be used for secure communication. To secure sensitive information, robust authentication, role-based access control, and protection against SQL injection must be provided. The admin should manage user sessions securely, abide by privacy guidelines, and obtain informed consent before collecting any data. To reduce the risk of data loss, frequent backups should be made, and the developers should keep up with security patch updates. The developers should follow safe coding practices, watch for breaches, and consider data anonymization to preserve users' privacy. To maintain user confidence and data integrity, compliance with data protection rules is crucial.

V. RESULT COMPARISON AND CONTRIBUTION

The analysis in this work was conducted on a real-time database, using machine learning (ML) models that were trained on the same datasets and then deployed. The highest accuracies obtained in this research are 93% for diabetes using the Random Forest algorithm, 95% for Parkinson's using the Random Forest algorithm and QDA, 84% for heart disease using LDA and QDA, 98% for lung cancer using AdaBoost and XGBoost, and 99% for brain stroke using the Random Forest algorithm. Previously, Priyanka Sonar et al. proposed a machine learning-based diabetes prediction system in 2019. They obtained accuracies of 85%, 77%, and 77.3% using the Decision Tree, Naive Bayes, and SVM algorithms, respectively [10]. Moreover, in 2017 Dhomse Kanchan B. et al. studied special disease prediction using principal component analysis with machine learning algorithms such as Naive Bayes classification, Decision Tree, and Support Vector Machine. This approach obtained a diabetes accuracy of 34.89% and a heart disease accuracy of 53% [1]. Overall, the accuracy of our work is higher than that of most of the papers we have reviewed [2][3][4][6][7].

VI. FUTURE WORK

Further work will mainly focus on providing medical assistance and proper medication to patients as soon as possible, so as to build the best infrastructure and the quickest path in the medical sector. Many possible improvements could be explored to diversify the research by discovering and considering extra features. Due to time limitations, the following work remains to be done. We plan to add more diseases, use more classification techniques, and apply different discretization techniques. We would like to use different rules, such as association rules, and various algorithms, such as clustering algorithms. We also intend to use filter-based feature selection methods to achieve more appropriate and functional results.

VII. CONCLUSION

This study aims to build a multi-disease prediction model that uses machine learning algorithms to accurately identify illnesses based on patient symptoms. Users can predict many diseases simultaneously with this method, without extra software or website browsing. Early illness identification can increase life expectancy and reduce financial burden. A model for multiple illness prediction can estimate the likelihood of many diseases and reduce death rates. This paper uses different machine learning algorithms to measure performance, and future work may involve adding more diseases trained with machine learning and deep learning models.

REFERENCES

[1] B. Dhomse Kanchan and M. Mahale Kishor, "Study of machine learning algorithms for special disease prediction using principal component analysis," Proceedings - International Conference on Global Trends in Signal Processing, Information Computing and Communication, ICGTSPICC 2016, pp. 5–10, Jun. 2017, doi: 10.1109/ICGTSPICC.2016.7955260.
[2] K. Polaraju and D. Prasad, "Prediction of Heart Disease using Multiple Linear Regression Model," 2017.
[3] S. Ambekar and R. Phalnikar, "Disease Risk Prediction by Using Convolutional Neural Network," Proceedings - 2018 4th International Conference on Computing, Communication Control and Automation, ICCUBEA 2018, Jul. 2018, doi: 10.1109/ICCUBEA.2018.8697423.
[4] V. Jackins, S. Vimal, M. Kaliappan, and M. Y. Lee, "AI-based smart prediction of clinical disease using random forest classifier and Naive Bayes," Journal of Supercomputing, vol. 77, no. 5, pp. 5198–5219, May 2021, doi: 10.1007/s11227-020-03481-x.
[5] P. S. Kohli and S. Arora, "Application of machine learning in disease prediction," 2018 4th International Conference on Computing Communication and Automation, ICCCA 2018, Dec. 2018, doi: 10.1109/CCAA.2018.8777449.
[6] P. Gupta, A. Suryavanshi, S. Maheshwari, A. Shukla, and R. Tiwari, "Human-machine interface system for pre-diagnosis of diseases using machine learning," ACM International Conference Proceeding Series, vol. Part F137705, pp. 71–75, Apr. 2018, doi: 10.1145/3220511.3220525.
[7] I. Mohit, K. S. Kumar, A. U. K. Reddy, and B. S. Kumar, "An Approach to detect multiple diseases using machine learning algorithm," J Phys Conf Ser, vol. 2089, no. 1, p. 012009, Nov. 2021, doi: 10.1088/1742-6596/2089/1/012009.
[8] S. Gupta and S. Raheja, "Stroke Prediction using Machine Learning Methods," Proceedings of the Confluence 2022 - 12th International Conference on Cloud Computing, Data Science and Engineering, pp. 553–558, 2022, doi: 10.1109/CONFLUENCE52989.2022.9734197.
[9] A. Singh and R. Kumar, "Heart Disease Prediction Using Machine Learning Algorithms," International Conference on Electrical and Electronics Engineering, ICE3 2020, pp. 452–457, Feb. 2020, doi: 10.1109/ICE348803.2020.9122958.
[10] P. Sonar and K. Jaya Malini, "Diabetes prediction using different machine learning approaches," Proceedings of the 3rd International Conference on Computing Methodologies and Communication, ICCMC 2019, pp. 367–371, Mar. 2019, doi: 10.1109/ICCMC.2019.8819841.
[11] C. Thallam, A. Peribonca, S. S. T. Raju, and N. Sampath, "Early Stage Lung Cancer Prediction Using Various Machine Learning Techniques," Proceedings of the 4th International Conference on Electronics, Communication and Aerospace Technology, ICECA 2020, pp. 1285–1292, Nov. 2020, doi: 10.1109/ICECA49313.2020.9297576.
[12] Heart disease (no date) UCI Machine Learning Repository. Available at: https://archive.ics.uci.edu/dataset/45/heart+disease
[13] Akbasli, I.T. (2022) Brain stroke prediction dataset, Kaggle. Available at: https://www.kaggle.com/datasets/zzettrkalpakbal/full-filled-brain-stroke-dataset
[14] Bhat, M.A. (2021) Lung cancer, Kaggle. Available at: https://www.kaggle.com/datasets/mysarahmadbhat/lung-cancer
[15] UCI Machine Learning (2016) Pima Indians Diabetes Database, Kaggle. Available at: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database
[16] Ukani, V. (2020) Parkinson’s disease data set, Kaggle. Available at: https://www.kaggle.com/datasets/vikasukani/parkinsons-disease-data-set
