Evaluation and Improving Prediction Accuracy On Healthcare Using Classifier Algorithms
----------------------------------------------------------------------------------------------------------------------------- ---------
Date of Submission: 01-04-2023 Date of Acceptance: 10-04-2023
----------------------------------------------------------------------------------------------------------------------------- ---------
ABSTRACT: Accurate predictions improve patient outcomes, and classifiers may improve prediction accuracy. To evaluate the prediction accuracy of a classifier, healthcare professionals must gather and analyse data. Clinical data includes medical histories, test results, etc. Preprocessing and cleaning the data helps ensure that it is accurate and ready for analysis. Logistic regression, decision trees, and random forests help healthcare staff analyse data and make predictions. Accuracy, precision, recall, and F1 score measure classifier performance. Feature selection, data augmentation, and ensemble methods may improve healthcare prediction. Adding data enhances classifier performance, whereas feature selection chooses the most relevant data properties. Ensemble classifiers improve performance. Classifiers help doctors increase prediction accuracy and provide high-quality care. Classifying data using these algorithms improves medical decisions and patient outcomes.

AI and ML have improved medical research. Algorithms may give patient diagnostic information that simplifies and verifies decision-making. A binary classification model employing SVM, logistic regression, random forest, or KNN may automate this procedure. In this study, we train a neural network to predict the probability of a class rather than the class itself. We then build a new classifier by applying class thresholds to these probabilities. The new method's adjustable probability threshold yields superior results.

Keywords: Logistic Regression, Prediction accuracy, Healthcare, Classifier, Machine learning, Performance metrics, Medical data, Decision trees, Random forests

I. INTRODUCTION
In recent years, the healthcare industry has witnessed a significant transformation in the way medical data is collected and analyzed. The availability of vast amounts of data has opened up new opportunities for healthcare professionals to make more informed decisions and improve patient outcomes. One way to leverage this data is by using classifiers, which are machine learning algorithms that can analyze and categorize data. Evaluating and improving prediction accuracy through classifiers has become an essential tool for healthcare professionals to provide high-quality care to their patients. By using techniques such as feature selection, data augmentation, and ensemble methods, healthcare professionals can improve prediction accuracy and make more informed decisions. In this context, this article explores the key concepts and techniques involved in evaluating and improving prediction accuracy in healthcare through classifiers.

Medical care is nowadays essential for every human being, and governments are planning to provide many facilities relating to this sickness, which still leads to hospitalisation and is a recent arrival on the international scene. The main risk factors vary by country, but cholesterol, blood pressure, smoking, exercise, and diet are notable among them. Although hereditary factors still play a part, today's major risk factors are determined by lifestyle. Life may be greatly extended with an early and precise diagnosis, followed by the right course of therapy.

Because of its complexity, the process of medical diagnosis must be automated in order to assist medical professionals in their diagnostic
DOI: 10.35629/5252-0504506512 |Impact Factorvalue 6.18| ISO 9001: 2008 Certified Journal Page 506
International Journal of Advances in Engineering and Management (IJAEM)
Volume 5, Issue 4 April 2023, pp: 506-512 www.ijaem.net ISSN: 2395-5252
These studies show that classifiers have the potential to increase prediction accuracy in the healthcare industry. Healthcare professionals can improve patient outcomes and make better decisions by utilising machine learning algorithms and related techniques. Nevertheless, additional study is required to examine the full potential of classifiers in healthcare and to resolve possible ethical and privacy issues.

III METHODOLOGY
The dataset in this study is split into two parts, namely training (80%) and testing (20%). The initial proposed model uses KNN, trained on the training portion of the dataset. To obtain additional features for the proposed model classifier, RF, SVC, and LR are each trained as well, requiring a total of 90 fits. A flow diagram of the proposed paradigm is shown in Fig., and it involves three steps. In the first step, the original training dataset is constructed and used to train KNN, RF, SVC, and LR; RF, SVC, and LR are trained on the 80% training dataset. Once all four models have been trained, each model produces its own predictions. Subsequently, a new dataset is built from the predictions of these base classifiers; it has four dimensions, one per classifier. Additionally, each of these models is trained and analysed separately in order to evaluate the accuracy and effectiveness of the suggested NN model. Furthermore, the performance of the suggested stacking model is compared with that of individual classifiers such as KNN, Naive Bayes, Linear Discriminant Analysis, and Decision Tree in terms of Recall, Precision, and F-Measure. Fig. 1 shows the algorithm of the suggested stacking model.
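The stacking procedure described in this section can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the synthetic data from `make_classification`, the grid of 30 values of `C` for logistic regression, the hidden-layer size, and all variable names are assumptions made for the example. The grid search is included only to show how 30 candidates evaluated over 3 folds add up to 90 fits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): the paper's dataset is not reproduced here.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# 80% training / 20% testing split, as in the methodology.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hypothetical grid: 30 candidate values of C, each evaluated with 3-fold CV.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": np.logspace(-3, 3, 30)},
    cv=3,
)
search.fit(X_train, y_train)
n_fits = len(search.cv_results_["params"]) * search.n_splits_  # 30 * 3

# The four base classifiers named in the text.
base_models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVC": SVC(random_state=0),
    "LR": search.best_estimator_,
}
for model in base_models.values():
    model.fit(X_train, y_train)


def meta_features(models, features):
    """Build the new dataset: one column of predictions per base classifier."""
    return np.column_stack([m.predict(features) for m in models.values()])


Z_train = meta_features(base_models, X_train)  # four-dimensional dataset
Z_test = meta_features(base_models, X_test)

# Small neural network trained on the base predictions; for the default
# solver, max_iter is the number of passes over the training data.
meta_model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=100, random_state=0)
meta_model.fit(Z_train, y_train)
test_accuracy = meta_model.score(Z_test, y_test)
```

In this sketch the base models predict on the same rows they were trained on, which follows the description above but can make their training-set outputs look better than they generalise. A common variation builds the new dataset from out-of-fold predictions instead.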
We obtained the heart disease dataset online from the Machine Learning Repository at the University of California, Irvine. The dataset was then split into training and test sets, and a variety of methods were used to obtain accuracy scores. The study used several classification techniques, including Logistic Regression (LR), K-Nearest Neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM), to identify diseases.
HAEMATO  HAEMOGL  ERYTHROC  LEUCOCYT  THROMBO  MCH   MCHC  MCV   AGE  SEX
35.1     11.8     4.65      6.3       310      25.4  33.6  75.5  1    F
43.5     14.8     5.39      12.7      334      27.5  34    80.7  1    F
33.5     11.3     4.74      13.2      305      23.8  33.7  70.7  1    F
39.1     13.7     4.98      10.5      366      27.5  35    78.5  1    F
30.9     9.9      4.23      22.1      333      23.4  32    73    1    M
IV EXPERIMENTAL SETUP AND RESULT DISCUSSION
To experiment with heart disease prediction, the Spyder scientific environment is used with Anaconda Navigator to run Python code with library imports. Hyperparameter tuning was then used to choose the best parameter values. The experiment proceeded through the following steps.

Step 1: The base classifier KNN is employed in the first step.
Step 2: The classification methods SVM, LR, and RF were fitted with 3 folds for each of the 30 candidates.
Step 3: The number of epochs, a hyperparameter that specifies how many times the learning algorithm passes over the whole training dataset, is set to 100.
Step 4: We chose a range of thresholds for our DSS technique, from 0.1 to 0.65. For each threshold, we check the precision, recall, accuracy, and f-score on the validation data. The threshold with the best performance is then selected for predicting the test results.
Step 5: After the threshold value has been chosen, it is applied to the patient's data. If the probability of continued care is less than the chosen threshold, the patient may be discharged from our care; if it is higher than the chosen threshold, the patient should not be discharged.

Several common performance metrics, such as precision, accuracy, and classification error, were used to assess the effectiveness of the models and to determine how suitable the proposed model is.

Base Algorithm K-NN: Following modelling, we sought to illustrate the accuracy of our approach using the base classifier's algorithm. This plotting procedure determines which base classifier algorithm provides the best predictions. The base classifiers employed in our investigation generally had a similar effect on accuracy. K-nearest neighbours finds the records that are most similar to a given record in terms of shared characteristics.

K-NN Train accuracy: 0.8047605553981297
K-NN Test accuracy: 0.7191392978482446
K-NN Test f-score: 0.6253776435045317
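The threshold selection in Steps 4 and 5 can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the 0.05 step between thresholds, the use of the f-score as the selection metric, the treatment of a probability equal to the threshold as continued care, and the validation probabilities and labels are all hypothetical.

```python
def scores(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = continued care)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def classify(probabilities, threshold):
    """Label 1 (keep in care) when the probability reaches the threshold, else 0 (discharge)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def best_threshold(y_true, probabilities, thresholds, metric="f1"):
    """Pick the threshold with the highest validation score (first one wins ties)."""
    return max(
        thresholds,
        key=lambda t: scores(y_true, classify(probabilities, t))[metric],
    )


# Thresholds from 0.10 to 0.65; the 0.05 step is an assumption.
thresholds = [round(0.10 + 0.05 * i, 2) for i in range(12)]

# Hypothetical validation data: predicted probability of continued care and true label.
val_probs = [0.05, 0.22, 0.31, 0.37, 0.46, 0.57, 0.62, 0.80]
val_labels = [0, 0, 0, 1, 0, 1, 1, 1]

chosen = best_threshold(val_labels, val_probs, thresholds)
val_scores = scores(val_labels, classify(val_probs, chosen))

# Step 5: apply the chosen threshold to new patients (hypothetical probabilities).
decisions = classify([0.20, 0.70], chosen)
```

Selecting on the f-score rather than on accuracy is one possible choice; passing `metric="accuracy"` to the hypothetical `best_threshold` helper would select on accuracy instead.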
DSS based Neural Network: ANN-enabled decision support systems (DSS) for forecasting are directed chiefly towards medicine, because with the help of such a prognosis, a decision on how to care for a sick person can be made quickly and straightforwardly. The outcomes of our studies in this area have been favourable, albeit to varying degrees.
Fscore = 0.696755994358251
Accuracy = 0.7565118912797282
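The text does not spell out the formulas behind the F-score and accuracy values reported above. Assuming they are the conventional confusion-matrix quantities, they can be written as follows, where TP, TN, FP, and FN (symbols introduced here, not taken from the source) denote the counts of true positives, true negatives, false positives, and false negatives:

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align*}
```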
Figure 2 plots the accuracy of the KNN, LR, SVC, RF, and NN models: KNN obtains an accuracy of 71.91%, LR 74.51%, SVC 75.19%, RF 74.40%, and NN 75.65%. According to these findings, the proposed NN model, with its threshold-based approach to prediction, achieves the highest accuracy of the compared classifiers, 0.46 percentage points above the next-best model (SVC).
Figure 3 plots the F1 scores of the KNN, LR, SVC, RF, and NN models: KNN achieves an F1 score of 62.53%, LR 63.65%, SVC 64.84%, RF 64.68%, and NN 69.67%. According to these findings, the proposed NN model, with its threshold-based approach to prediction, achieves the highest F1 score of the compared classifiers, 4.83 percentage points above the next-best model (SVC).
V CONCLUSION
A novel decision-making classifier model was used in this study to forecast whether patients would be discharged from the hospital or would need to stay. While a lot of work has been done in the past to use classifiers to diagnose illnesses, the prediction of discharge has not previously been addressed. In this work, a NN model is used for classification, with KNN, LR, SVC, and RF classifiers used as comparisons. The suggested NN model was compared with these classifiers in terms of accuracy, recall, precision, and f-measure. The simulation results indicate that the suggested model outperforms the other methods in terms of disease classification and prediction. The main reason the suggested strategy performed well is its threshold-based approach to classification.