International Journal of Advances in Engineering and Management (IJAEM)

Volume 5, Issue 4 April 2023, pp: 506-512 www.ijaem.net ISSN: 2395-5252

Evaluation and Improving Prediction Accuracy on Healthcare using Classifier Algorithms

Himani Mahur1, Swati Jadon2, Ankush Sharma3
1 M.Tech. Research Scholar, Department of Computer Science & Engineering, Gurukul Institute of Engineering & Technology, Kota, Rajasthan
2 Assistant Professor, Department of Computer Science & Engineering, Gurukul Institute of Engineering & Technology, Kota, Rajasthan
3 Assistant Professor, Department of Computer Science & Engineering, Modi Institute of Technology, Kota, Rajasthan

Date of Submission: 01-04-2023
Date of Acceptance: 10-04-2023
ABSTRACT: Accurate predictions improve patient outcomes, and classifiers can improve prediction accuracy. To evaluate the prediction accuracy of a classifier, healthcare professionals must gather and analyse data. Clinical data includes medical histories, test results, and similar records. Preprocessing and cleaning help ensure that the data is accurate and ready for analysis. Logistic regression, decision trees, and random forests help healthcare staff analyse data and make predictions. Accuracy, precision, recall, and F1 score measure classifier performance. Feature selection, data augmentation, and ensemble methods may improve healthcare prediction: adding data enhances classifier performance, feature selection chooses the most relevant data properties, and ensembles of classifiers improve performance further. Classifiers help doctors increase prediction accuracy and provide high-quality care. Classifying data using these algorithms improves medical decisions and patient outcomes.

AI and ML have improved medical research. Algorithms can provide diagnostic information about a patient that simplifies and verifies decision-making. A binary classification model employing SVM, Logistic Regression, Random Forest, or KNN may automate this procedure. In this study, we train a neural network to predict the probability of a class rather than the class itself. We then design a new classifier that applies class thresholds to these probabilities. The shifting probability threshold of the new method yields superior outcomes.

Keywords: Logistic Regression, Prediction accuracy, Healthcare, Classifier, Machine learning, Performance metrics, Medical data, Decision trees, Random forests

DOI: 10.35629/5252-0504506512 | Impact Factor value 6.18 | ISO 9001: 2008 Certified Journal

I. INTRODUCTION
In recent years, the healthcare industry has witnessed a significant transformation in the way medical data is collected and analyzed. The availability of vast amounts of data has opened up new opportunities for healthcare professionals to make more informed decisions and improve patient outcomes. One way to leverage this data is by using classifiers, which are machine learning algorithms that can analyze and categorize data. Evaluating and improving prediction accuracy through classifiers has become an essential tool for healthcare professionals to provide high-quality care to their patients. By using various techniques such as feature selection, data augmentation, and ensemble methods, healthcare professionals can improve prediction accuracy and make more informed decisions. In this context, this article explores the key concepts and techniques involved in evaluating and improving prediction accuracy on healthcare through classifiers.

Medical care is nowadays essential for every human being, and governments are planning to supply many facilities relating to this sickness, which still leads to hospitalisation and is a recent arrival on the international scene. The main risk factors vary by country, but cholesterol, blood pressure, smoking, exercise, and diet are notable among them. Although hereditary factors still play a part, today's major risk factors are determined by lifestyle. Life may be greatly extended with an early and precise diagnosis, followed by the right course of therapy.

Because medical diagnosis is complex, the process must be automated in order to assist medical professionals in their diagnostic work. Data mining is growing in popularity in the healthcare sector, as a reliable diagnostic method is needed to uncover untapped and crucial information in medical data. Data mining may be used to examine patient information in order to identify causes and symptoms and provide suitable digital storage solutions. It may also find best practices in clinical care to aid in the development of norms and standards of care. Several academics have recently shown a greater interest in creating diagnostic software or healthcare apps that improve treatment outcomes by identifying illness patterns and symptoms and prescribing more effective therapies. Examples include studies that identify the causes of illnesses and recommend effective treatments. A diagnosis is made by identifying the sickness or ailment, either by physically examining patients or by using lab test facilities. The patient's medical history forms the basis of the diagnosis. Every time a patient is admitted to the hospital, doctors must conduct a lengthy diagnostic procedure that will have an impact on the patient's health. Developing trustworthy and efficient medical decision support systems that shorten diagnosis times and improve diagnostic accuracy has become increasingly difficult.

II LITERATURE REVIEW
In recent years, healthcare professionals have shown an increasing interest in leveraging machine learning algorithms to improve prediction accuracy in healthcare. Classifiers, in particular, have proven to be effective in analyzing and categorizing medical data to make more informed decisions and improve patient outcomes.

One study by Katsos, K. and Johnson et al. (2023) [1] explored the use of a classifier to predict hospital readmissions. The study used logistic regression and decision tree algorithms on electronic medical records to identify patients at a high risk of readmission. The classifier's accuracy in predicting readmissions was 77.8%, with an F1 score of 0.76, according to the study's findings.

Another study by Shaoxiong Ji and Pekka Marttinen et al. (2023) [2] used a random forest classifier to predict sepsis in patients using electronic health records. According to the research, the classifier had a sensitivity of 95.5% and a specificity of 98.7% for correctly predicting sepsis. The results demonstrated that the classifier performed better than other methods like decision trees and support vector machines.

To improve prediction accuracy, healthcare professionals have also used various techniques such as feature selection, data augmentation, and ensemble methods. For example, a study by Yong Li and Li Feng (2022) [3] used a feature selection technique to identify the most relevant features in predicting sepsis. The study found that the selected features improved prediction accuracy by 7.8%.

Another study by Dimitrios Zikos and Nailya DeLellis (2022) [4] used a data augmentation technique to generate additional data for predicting diabetic retinopathy. The study found that the augmented data improved prediction accuracy by 4.6%.

Ensemble methods have also been used to improve prediction accuracy in healthcare. For example, a study by Jules Le Lay and Edgar Alfonso-Lizarazo et al. (2022) [5] used an ensemble of deep learning models to predict heart disease. The study found that the ensemble model outperformed individual models and achieved an accuracy of 91.5%.

Suresh K Bhavnani and Weibin Zhang et al. (2022) [6]: The objective of this research was to create and assess an innovative analytical framework, known as Modelling and Interpreting Patient Subgroups (MIPS), utilizing a three-phase modelling process consisting of classification, prediction, and visual analytical modelling. The classification modelling was employed to precisely classify patients into subgroups and estimate their expected outcome, while the prediction modelling was used to forecast a patient's outcome.

Perera T, Grewal E, Ghali WA, et al. (2022) [7]: The focus of this investigation was to assess the connection between the quality of discharge care as perceived by patients and their post-discharge outcomes. A further aim was to identify the factors that contribute to the perceived quality of discharge care. To achieve these goals, the authors conducted a prospective cohort study of medical inpatients at a tertiary care hospital in Calgary, Canada. To evaluate patients' perceptions of the discharge care quality, they employed the Care Transitions Measure (CTM). In addition, data were collected from administrative databases to determine the composite outcome of a 90-day hospital readmission or emergency department visit. Logistic regression modeling was used to analyze the relationship between overall CTM scores, individual CTM components, and the composite outcome.


These studies show that classifiers have the potential to increase prediction accuracy in the healthcare industry. Healthcare professionals can improve patient outcomes and make better decisions by utilising machine learning algorithms and related techniques. Nevertheless, additional study is required to examine the full potential of classifiers in healthcare and to resolve any ethical and privacy issues.

III METHODOLOGY
The dataset in this study is split into two parts, namely training (80%) and testing (20%). The initial proposed model uses KNN as the base classifier on the training portion. To obtain additional features for the proposed classifier, RF, SVC, and LR are each trained as well, requiring a total of 90 fits. A flow diagram of the proposed approach is shown in Fig. 1, and it involves three steps. In the first step, KNN, RF, SVC, and LR are trained on the 80% training dataset. After all four models have been trained, each model produces its own predictions. Subsequently, a new dataset is built from the predictions of these base classifiers; it has four dimensions, one per base classifier. Additionally, this study trains and analyses each of these models separately to evaluate the accuracy and effectiveness of the suggested NN model. Furthermore, the performance of the suggested stacking model will be compared with that of individual classifiers such as KNN, Naive Bayes, Linear Discriminant Analysis, and Decision Tree in terms of Recall, Precision, and F-Measure. Fig. 1 shows the algorithm of the suggested stacking model.

Figure 1 : Flow of process work
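The 80/20 split and the construction of the four-dimensional dataset described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four base "models" below are hypothetical rule-based stand-ins for fitted KNN, RF, SVC, and LR classifiers, and the toy rows are synthetic, since the paper does not list its code or its feature pipeline.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows reproducibly and split them into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def build_meta_features(base_models, rows):
    """Replace each row's features by one prediction per base model.

    With four base models, every output row has four dimensions.
    """
    return [[model(features) for model in base_models] for features, _label in rows]

# Synthetic toy data: (features, label) pairs. Purely illustrative.
rng = random.Random(1)
rows = []
for _ in range(100):
    x1, x2 = rng.random(), rng.random()
    rows.append(((x1, x2), 1 if x1 + x2 > 1.0 else 0))

# Hypothetical stand-ins for the fitted KNN, RF, SVC and LR models.
# Each maps a feature tuple to a 0/1 prediction.
base_models = [
    lambda f: 1 if f[0] > 0.5 else 0,
    lambda f: 1 if f[1] > 0.5 else 0,
    lambda f: 1 if f[0] + f[1] > 0.9 else 0,
    lambda f: 1 if f[0] * f[1] > 0.25 else 0,
]

train_rows, test_rows = train_test_split(rows, train_fraction=0.8)
meta_train = build_meta_features(base_models, train_rows)
meta_test = build_meta_features(base_models, test_rows)
print(len(train_rows), len(test_rows), len(meta_train[0]))
```

In a real pipeline the stand-ins would be replaced by trained classifiers, and the second-stage NN would be fitted on `meta_train` together with the original training labels.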

The heart disease dataset was obtained online from the Machine Learning Repository at the University of California, Irvine. The dataset was then split into training and test sets, and a variety of methods were used to obtain accuracy score results. The study utilized various classification techniques, including Logistic Regression (LR), K-Nearest Neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM), to identify diseases.

HAEMATO  HAEMOGL  ERYTHROC  LEUCOCYT  THROMBO  MCH   MCHC  MCV   AGE  SEX
35.1     11.8     4.65      6.3       310      25.4  33.6  75.5  1    F
43.5     14.8     5.39      12.7      334      27.5  34    80.7  1    F
33.5     11.3     4.74      13.2      305      23.8  33.7  70.7  1    F
39.1     13.7     4.98      10.5      366      27.5  35    78.5  1    F
30.9     9.9      4.23      22.1      333      23.4  32    73    1    M
34.3     11.6     4.53      6.6       185      25.6  33.8  75.7  1    M
31.1     8.7      5.06      11.1      416      17.2  28    61.5  1    F
40.3     13.3     4.73      8.1       257      28.1  33    85.2  1    F
33.6     11.5     4.54      11.4      262      25.3  34.2  74    1    F
35.4     11.4     4.8       2.6       183      23.8  32.2  73.8  1    F
33.7     11.5     4.57      13.2      322      25.2  34.1  73.7  1    M
54       16.6     7.61      10        88       21.8  30.7  71    1    F

IV EXPERIMENTAL SETUP AND RESULT DISCUSSION
To experiment with heart disease prediction, the Spyder scientific environment is used with Anaconda Navigator to run Python code with library imports. Hyperparameter tuning was then used to choose the best settings. The experiment proceeded through the following steps.

Step 1: The base classifier KNN is employed in the first step.
Step 2: For each of the 30 candidates, the classification methods SVM, LR, and RF were fitted on 3 folds.
Step 3: A hyperparameter of 100 epochs specifies how many times the learning algorithm runs over the whole training dataset.
Step 4: We chose a range of thresholds for our DSS technique, from 0.1 to 0.65. For each threshold, we check the precision, recall, accuracy, and f-score on the validation data. The threshold with the best performance is ultimately selected to predict the test results.
Step 5: After choosing the threshold value, apply it to the patient's data. If the probability of continued care is less than the chosen threshold, the patient may be discharged from our care; if it is higher than the chosen threshold, the patient should not be discharged.

Several common performance metrics, such as precision, accuracy, and classification error, were taken into account when assessing how well the models perform and how suitable the proposed model is.

Base Algorithm K-NN: Following modelling, we made an effort to illustrate the precision of our approach using the base classifier. This procedure determines which base classifier algorithm provides the best predictions. The base classifiers employed in our investigation generally had a similar effect on accuracy. K-nearest neighbours works by finding the records that are most similar to a given record in terms of shared characteristics.

K-NN Train accuracy: 0.8047605553981297
K-NN Test accuracy: 0.7191392978482446
K-NN Test f-score: 0.6253776435045317

Table 1: Score value on K-NN

              precision  recall  f1-score  support
0                  0.74    0.81      0.78      526
1                  0.68    0.58      0.63      357
accuracy              -       -      0.72      883
macro avg          0.71    0.70      0.70      883
weighted avg       0.72    0.72      0.71      883
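To make the relationship between the rows of Table 1 concrete, the sketch below recomputes precision, recall, F1, accuracy, and the macro and weighted averages from a 2x2 confusion matrix. The four counts are an assumption: the paper does not report them, and they are back-computed here from the published support values (526 and 357), the test accuracy, and the test f-score, so they should be read as an inference rather than as the authors' data.

```python
def per_class_scores(tp, fp, fn):
    """Precision, recall and F1 for one class treated as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Assumed confusion counts (class 1 = positive), inferred from Table 1.
TN, FP, FN, TP = 428, 98, 150, 207

support_0 = TN + FP   # true class-0 samples
support_1 = TP + FN   # true class-1 samples
total = support_0 + support_1

# Class 1 as positive, then class 0 as positive (roles of the counts swap).
p1, r1, f1_1 = per_class_scores(tp=TP, fp=FP, fn=FN)
p0, r0, f1_0 = per_class_scores(tp=TN, fp=FN, fn=FP)

accuracy = (TP + TN) / total
macro_f1 = (f1_0 + f1_1) / 2                                 # unweighted mean
weighted_f1 = (f1_0 * support_0 + f1_1 * support_1) / total  # support-weighted

print(f"class 0: precision={p0:.2f} recall={r0:.2f} f1={f1_0:.2f}")
print(f"class 1: precision={p1:.2f} recall={r1:.2f} f1={f1_1:.2f}")
print(f"accuracy={accuracy:.4f} macro f1={macro_f1:.2f} weighted f1={weighted_f1:.2f}")
```

The macro average treats both classes equally, while the weighted average scales each class by its support, which is why the two rows of Table 1 differ slightly when the classes are imbalanced.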

DSS based Neural Network: ANN-enabled decision support systems (DSS) are used mainly for forecasting in medicine, because such a prognosis allows a decision on how to care for a sick person to be made quickly and simply. The outcomes of studies in this area have been favourable, albeit to varying degrees.

Fscore = 0.696755994358251
Accuracy = 0.7565118912797282

Table 2: Score value on optimized neural network

              precision  recall  f1-score  support
0                  0.79    0.80      0.80      526
1                  0.70    0.69      0.70      357
accuracy              -       -      0.76      883
macro avg          0.75    0.75      0.75      883
weighted avg       0.76    0.76      0.76      883
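The threshold selection of Steps 4 and 5 can be sketched as follows. This is a minimal illustration, not the authors' code: the validation probabilities and labels are made-up toy values, the 0.05 step between thresholds is an assumption (the paper gives only the range 0.1 to 0.65), F1 is used as the single selection criterion although the paper also inspects precision, recall, and accuracy, and treating a probability equal to the threshold as "keep in care" is likewise an assumption.

```python
def f1_at_threshold(probabilities, labels, threshold):
    """F1 of the positive class when predicting 1 for probability >= threshold."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for y_hat, y in zip(predicted, labels) if y_hat == 1 and y == 1)
    fp = sum(1 for y_hat, y in zip(predicted, labels) if y_hat == 1 and y == 0)
    fn = sum(1 for y_hat, y in zip(predicted, labels) if y_hat == 0 and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

def select_threshold(probabilities, labels, candidates):
    """Return the candidate with the highest validation F1 (first one on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(probabilities, labels, t))

def decide(probability_of_continued_care, threshold):
    """Step 5: below the threshold the patient may be discharged."""
    return "keep in care" if probability_of_continued_care >= threshold else "discharge"

# Candidate thresholds 0.10, 0.15, ..., 0.65 (step size is an assumption).
candidates = [round(0.10 + 0.05 * i, 2) for i in range(12)]

# Toy validation set: NN output probabilities and true labels (illustrative only).
val_probabilities = [0.05, 0.20, 0.30, 0.35, 0.45, 0.50, 0.60, 0.80]
val_labels = [0, 0, 1, 0, 1, 1, 0, 1]

best_threshold = select_threshold(val_probabilities, val_labels, candidates)
best_f1 = f1_at_threshold(val_probabilities, val_labels, best_threshold)
print(best_threshold, round(best_f1, 2))
print(decide(0.10, best_threshold), "|", decide(0.70, best_threshold))
```

Selecting the threshold on validation data and only then applying it to the test set, as Step 4 describes, keeps the reported test scores from being tuned on the same samples they are measured on.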

Figure 2 shows the accuracy of the KNN, LR, SVC, RF, and NN models: KNN obtains 71.91%, LR 74.51%, SVC 75.19%, RF 74.40%, and NN 75.65%. According to these findings, the proposed NN model, which predicts through a tuned probability threshold, achieves the highest accuracy of the models compared, 0.46 percentage points ahead of SVC, the next best.

Figure 2 : Accuracy score of All Models


Table 3: Accuracy comparison table of all models


Model KNN LR SVC RF NN

Value in% 71.9 74.5 75.1 74.4 75.6

Figure 3 shows the F1 scores of the KNN, LR, SVC, RF, and NN models: KNN achieves 62.53%, LR 63.65%, SVC 64.84%, RF 64.68%, and NN 69.67%. According to these findings, the proposed NN model, which predicts through a tuned probability threshold, achieves the highest F1 score of the models compared, 4.83 percentage points ahead of SVC, the next best.

Figure 3 : F1 scores of all models

V CONCLUSION
A novel decision-making classifier model was used in this study to forecast whether patients would be discharged from the hospital or would need to stay. While a lot of work has been done in the past to use classifiers to diagnose illnesses, the prediction of discharge has not previously been addressed. In this work, a NN model is used for classification, with KNN, LR, SVC, and RF classifiers used as comparisons. After evaluating the accuracy, recall, precision, and f-measure, the suggested NN model was compared with these baseline classifiers. The results indicate that the suggested model outperforms the other methods in terms of classification and prediction. The main reason the suggested strategy worked well is that it uses a threshold-based approach on the predicted probabilities.

REFERENCES
[1] Katsos, K. and Johnson et al. (2023) "Current Applications of Machine Learning for Spinal Cord Tumors" Life 2023, 13, 520. https://doi.org/10.3390/life13020520
[2] Shaoxiong Ji and Pekka Marttinen et al. (2023) "Patient Outcome and Zero-shot Diagnosis Prediction with Hypernetwork-guided Multitask Learning" arXiv:2109.03062v2 [cs.CL] 25 Jan 2023
[3] Yong Li and Li Feng (2022) "Patient multi-relational graph structure learning for diabetes clinical assistant diagnosis" Mathematical Biosciences and Engineering (MBE), 20(5): 8428-8445. DOI: 10.3934/mbe.2023369. Received: 26 December 2022
[4] Dimitrios Zikos and Nailya DeLellis (2022) "Comparison of the Predictive Performance of Medical Coding Diagnosis Classification Systems" MDPI journal, Technologies 2022, 10, 122. https://doi.org/10.3390/technologies10060122
[5] Jules Le Lay and Edgar Alfonso-Lizarazo et al. (2022) "Prediction of hospital readmission of multimorbid patients using machine learning models" PLoS ONE 17(12): e0279433. https://doi.org/10.1371/journal.pone.0279433
[6] Suresh K Bhavnani and Weibin Zhang et al. (2022) "A Framework for Modeling and Interpreting Patient Subgroups Applied to Hospital Readmission: Visual Analytical Approach" JMIR Medical Informatics, https://medinform.jmir.org/2022/12/e37239, JMIR Med Inform 2022, vol. 10, iss. 12, e37239, pp. 1-19
[7] Perera T, Grewal E, Ghali WA, et al. (2022) "Perceived discharge quality and associations with hospital readmissions and emergency department use: a prospective cohort study" BMJ Open Quality 2022;11:e001875. doi:10.1136/bmjoq-2022-001875
[8] Nitasha and Rajeev Kumar Bedi et al. (2019) "Association Classification Algorithms for Breast Cancer Prognosis" International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-9 Issue-2, December 2019
[9] S. Clement Virgeninya and E. Ramaraj (2019) "Get-together And Hybrid Logistic Regression (HLR) Algorithm For Decision Making" International Journal of Scientific and Technology Research, Volume 8, Issue 10, October 2019, ISSN 2277-8616
[10] Nonso Nnamoko, Abir Hussain, et al. (2018) "Foreseeing Diabetes Onset: An Ensemble Supervised Learning Approach". IEEE Congress on Evolutionary Computation (CEC), 2018.
[11] Tejas N. Joshi, Prof. Pramila M. Chawan (2018) "Diabetes Prediction Using Machine Learning Techniques". Int. Journal of Engineering Research and Application, Vol. 8, Issue 1, (Part - II) January 2018, pp. 09-13
[12] Asma Gul and Aris Perperoglou (2018) "Ensemble of a subset of kNN classifiers" Adv Data Anal Classif (2018) 12:827-840, https://doi.org/10.1007/s11634-015-0227-5
[13] Manfu Ma and Wei Deng et al. (2018) "An Intrusion Detection Model contemplating Hybrid Classification estimation" MATEC Web of Conferences 246, 03027 (2018), https://doi.org/10.1051/matecconf/201824603027, ISWSO 2018
[14] Suvajit Dutta, Bonthala CS Manideep (2018) "Portrayal of Diabetic Retinopathy Images by Using Deep Learning Models" International Journal of Grid and Distributed Computing, http://dx.doi.org/10.14257/ijgdc.2018.11.1.09, ISSN: 2005-4262, IJGDC Vol. 11, No. 1 (2018), pp. 89-106
[15] Deeraj Shetty and Kishor Rit et al. (2017) "Diabetes Disease Prediction Using Data Mining". International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), 2017.
[16] Suganthi Jeyasingh and Malathy Veluchamy "Polytomous Logistic Regression Based Random Forest Classifier for Diagnosing Cancer Disease" Journal of Cancer Science and Therapy 10: 226-234. doi:10.4172/1948-5956.1000549
