Heart Disease Prediction Using Hybrid Machine Learning Model
I. INTRODUCTION
The use of machine learning is growing rapidly in medical diagnosis, where computer analysis can reduce manual error and improve accuracy. Machine learning techniques can make the diagnosis of a disease more reliable. Predictions for conditions such as heart disease, liver disease, diabetes, and tumors are made using machine learning concepts [18].

This study aims to predict heart disease with machine learning through an automated medical diagnosis method. We use a hybrid model, as we consider it the strongest classification method for predicting heart disease. In a hybrid model, the probabilities produced by one machine learning model are given as input to another machine learning model.
Proceedings of the Sixth International Conference on Inventive Computation Technologies [ICICT 2021]
IEEE Xplore Part Number: CFP21F70-ART; ISBN: 978-1-7281-8501-9
This hybrid model gives better-optimized results because it draws on both of the machine learning algorithms used in the implementation.

The proposed system predicts heart disease with an automated machine learning diagnosis model built on a novel hybrid model. The Cleveland dataset, which is commonly used by machine learning researchers, is used for processing. It has a total of 303 instances and 14 attributes.

The study treats the task as binary classification: 0 (absence of heart disease) or 1 (presence of heart disease). Patients can seek treatment based on the result generated by the proposed model, and the proposed application helps in taking early measures for patients.

The following sections survey the literature and related work. Section III discusses the proposed system, along with the implementation algorithm and methodology. Section IV presents the results and discussion. Section V concludes the work and discusses enhancements.

II. RELATED WORK

Researchers have published many studies on heart disease prediction and analysis. Some of these works are discussed below.

The author of [1] studies heart disease using random forest with the Cleveland dataset. The study uses a chi-square feature selection model and a genetic algorithm (GA) based feature selection model. The experimental results show that the proposed model with GA feature selection gives higher accuracy than the existing models. However, the results are evaluated only against existing machine learning models.

In [2], the author generates rules with the particle swarm optimization (PSO) algorithm and evaluates them to find a more accurate rule for heart disease identification. After the rules are evaluated, C5.0 is used for binary classification of the disease. The author uses UCI repository data for the implementation and reports high accuracy using PSO and the decision tree algorithm.

A backpropagation neural network for heart disease prediction is discussed in [3]. Deep learning is a highly effective learning approach for disease prediction. The author uses a neural network for learning and prediction, works with the Cleveland dataset, and implements the simulation in MATLAB. However, the work could be extended with deep learning models for higher accuracy and applied to real-world applications.

The author of [8] discusses the prediction of heart disease using data mining techniques. The study evaluates techniques such as the KNN algorithm, the decision tree algorithm, neural network classification, and Bayesian classification. The author also studies the use of a genetic algorithm to select the essential features for heart disease, and reports high accuracy with the decision tree model.

Heart disease prediction using different machine learning algorithms is studied in [9]. Classification and regression models are used for prediction, namely decision tree, KNN, SVM, and linear regression. The experimental results show that the KNN algorithm achieves the highest accuracy. However, this model could be implemented in a real-time environment or application.

A cognitive approach to heart disease prediction is carried out in [10]. Five machine learning algorithms are considered for prediction, and all are evaluated by accuracy. A logistic model tree is implemented to get better prediction results, using AdaBoost and bagging models to forecast heart disease. The experimental results show that random forest achieves high accuracy.

It is inferred from the existing works that novelty is needed in this area and that a robust, optimized model is needed for heart disease prediction. The existing works rely on available machine learning algorithms, implemented with tools such as Weka or MATLAB. Some works also use deep learning models. However, an optimized model has not been studied. Our proposed model addresses this gap: it is implemented as a hybrid model to give more optimized results.

III. PROPOSED WORK

In a hybrid model, the probabilities produced by one machine learning model are given as input to another machine learning model. This hybrid model gives better-optimized results because it draws on both of the machine learning algorithms used in the implementation. The proposed work is implemented with the sklearn libraries, pandas, matplotlib, and other required libraries. The dataset is downloaded from the UCI repository. The downloaded data contains binary classes of heart disease. The machine learning algorithms, namely decision tree and random forest, are implemented along with the hybrid model.

IV. DATASET DETAILS

The dataset has the following attributes: sex indicates the gender of the patient, age indicates the age of the patient, trestbps indicates the resting blood pressure, cp indicates the chest pain type, fbs indicates the fasting blood sugar, chol indicates cholesterol, thalach indicates the maximum heart rate achieved, restecg indicates the resting electrocardiographic result (1 indicates an abnormality), oldpeak indicates the ST depression induced by exercise, exang indicates exercise-induced angina, ca indicates the number of major vessels, slope indicates the slope of the peak exercise ST segment, thal indicates thalassemia, and pred_attribute is the attribute to be predicted. A sample of the collected data is shown in the figure below.
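As a minimal sketch of how the attributes in the dataset details could be read and reduced to the binary label used in this study, the following parses a few comma-separated rows and counts the two classes. The sample rows are made-up illustrative values, not records from the dataset. The column order (assumed to follow the UCI processed Cleveland file), the treatment of `?` as a missing value, the mapping of any `pred_attribute` greater than 0 to class 1, and the helper names are all assumptions.

```python
import csv
import io
from collections import Counter

# Column order assumed to match the UCI processed Cleveland file.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal",
           "pred_attribute"]

# Made-up rows for illustration only (not taken from the dataset).
SAMPLE = """\
63,1,1,145,233,1,2,150,0,2.3,3,0,6,0
67,1,4,160,286,0,2,108,1,1.5,2,3,3,2
41,0,2,130,204,0,2,172,0,1.4,1,?,3,0
"""


def parse_rows(text):
    """Turn CSV text into a list of dicts; '?' becomes None (missing)."""
    records = []
    for raw in csv.reader(io.StringIO(text)):
        if not raw:
            continue  # skip blank lines
        record = {}
        for name, value in zip(COLUMNS, raw):
            record[name] = None if value.strip() == "?" else float(value)
        records.append(record)
    return records


def binarize_target(record):
    """Assumed rule: 0 = absence of heart disease, anything above 0 = presence."""
    return 1 if record["pred_attribute"] > 0 else 0


records = parse_rows(SAMPLE)
labels = [binarize_target(r) for r in records]
class_counts = Counter(labels)  # the counts a class histogram would display

print(labels)              # [0, 1, 0]
print(dict(class_counts))  # {0: 2, 1: 1}
```

With the real file, the same counts would give the number of heart disease cases and normal cases that the histogram plot summarizes.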
The dataset is visualized to show the number of heart disease cases and the number of normal cases. The result is shown as a histogram plot in Figure 2.

The proposed workflow has the following advantages:

o Two machine learning algorithms and a hybrid model are implemented.
o The accuracy of every proposed algorithm is reported to show the best model.
o A hybrid model is implemented to make the proposed work an optimized model.

The execution is carried out with the following methodology:
a. The dataset is collected from uci.edu
b. Data visualization is done
c. The dataset is split into test and train data
d. The DT and RF models are applied for training and analysis
e. The model is trained
f. The trained model is tested and values are predicted
g. A single input is taken from the user and heart disease is predicted through the hybrid model

The Cleveland dataset is used. It is split into two parts, a training set and a testing set. We use 70% of the dataset as training input to fit the machine learning models, and the remaining 30% as testing data for heart disease prediction.

We used the Decision Tree, the Random Forest, and a hybrid of the Decision Tree and Random Forest to predict heart disease for the 30% test input. The predicted values are plotted and compared for accuracy.

a. Decision Tree
A decision tree is a learning model used for classification problems. This technique divides the dataset into two or more sets. In a decision tree, each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf is a decision reached after the preceding tests.

The Decision Tree algorithm is as follows:
i. Set the dataset's best feature as the root of the tree.
ii. Split the training set into subsets. Subsets should be made in such a way that each subset contains data with the same value for that feature.
iii. Repeat the steps above on each subset until leaves are reached in the tree.

To predict the class label of a record, the decision tree starts from the root. The value of the root attribute is compared with the corresponding attribute of the record. Based on this comparison, the branch matching that value is followed to the next node.

b. Random Forest Regression
Random Forest regression aggregates the decisions of multiple trees into a single decision. Random sampling is applied both to the training records and to the sub-features considered when splitting nodes. Each tree is built as described above: the training set is split into subsets such that each subset contains data with the same value for a feature, and the steps are repeated on each subset until leaves are reached in the tree.

The samples for building each tree are drawn by bootstrapping, meaning the same record can be drawn multiple times. The maximum number of features considered for splitting a node can be limited. This algorithm reduces the problem of overfitting.
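The workflow described in this section (a 70/30 split, a Decision Tree, a Random Forest, and a hybrid in which one model's probabilities become an input to the other) can be sketched with sklearn as follows. This is an illustration under stated assumptions, not the authors' implementation: the data is a synthetic stand-in generated with `make_classification`, the hybrid direction (Decision Tree probabilities fed to a Random Forest) and all hyperparameters are assumed, and `evaluate` and `hybrid_predict` are hypothetical helper names. Out-of-fold probabilities are used on the training rows so that the second model does not learn from over-confident in-sample estimates.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, mean_absolute_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Cleveland data (assumption): 303 rows, 13 inputs.
X, y = make_classification(n_samples=303, n_features=13, n_informative=6,
                           random_state=0)

# 70% of the rows for training, the remaining 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)


def evaluate(y_true, y_pred):
    """Compute MSE, MAE, R-squared, RMSE and accuracy for 0/1 predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mse": mse,
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
        "rmse": float(np.sqrt(mse)),
        "accuracy": accuracy_score(y_true, y_pred),
    }


# Stage 1: the two base models (hyperparameters are illustrative).
dt = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=0).fit(X_train, y_train)

# Stage 2 (assumed direction): the Decision Tree's class-1 probability is
# appended as an extra feature for a second Random Forest.
dt_proba_train = cross_val_predict(
    DecisionTreeClassifier(max_depth=4, random_state=0),
    X_train, y_train, cv=5, method="predict_proba")[:, 1]
X_train_h = np.column_stack([X_train, dt_proba_train])
hybrid = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X_train_h, y_train)


def hybrid_predict(rows):
    """Predict with the hybrid: add the tree's probability, then ask the forest."""
    rows = np.atleast_2d(rows)
    proba = dt.predict_proba(rows)[:, 1]
    return hybrid.predict(np.column_stack([rows, proba]))


results = {
    "decision_tree": evaluate(y_test, dt.predict(X_test)),
    "random_forest": evaluate(y_test, rf.predict(X_test)),
    "hybrid": evaluate(y_test, hybrid_predict(X_test)),
}
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})

# Step g of the methodology: predict for a single record.
print("single-record prediction:", int(hybrid_predict(X_test[0])[0]))
```

Because the labels are 0 and 1, MSE and MAE both equal the misclassification rate here, so they carry the same information as accuracy; the scores on synthetic data say nothing about performance on the Cleveland dataset.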
Figure 8 shows the mean square error (MSE), mean absolute error (MAE), R-squared parameter, root mean square error (RMSE), and accuracy for the Random Forest model.

Fig 9: Heart Disease prediction through Hybrid model

Figure 9 shows the mean square error (MSE), mean absolute error (MAE), R-squared parameter, root mean square error (RMSE), and accuracy for the Hybrid model.

VI. CONCLUSION

Heart disease is one of the life-threatening diseases seen around the world. Changing lifestyles and a lack of physical activity increase the threat of the condition. Many diagnosis processes are available in the medical industry. However, in terms of accuracy, machine learning is considered the best choice. The proposed work uses a Python application designed with Tkinter for heart disease prediction. The proposed system uses a combination of Decision Tree and Random Forest as a hybrid model for heart disease prediction. The Cleveland database is used for this study.

VII. FUTURE WORK

Deep learning algorithms are playing a vital role in health care applications, so applying deep learning procedures to heart disease prediction may give better outcomes. We are also interested in treating the task as a multi-class problem to identify the level of the disease.

REFERENCES

[5] Detrano R. VA Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, MD (Doctoral dissertation, Ph. D., Donor: David W. Aha, 1998.
[6] Xing, Yanwei, Jie Wang, and Zhihong Zhao. "Combination data mining methods with new medical data to predicting outcome of coronary heart disease." 2007 International Conference on Convergence Information Technology (ICCIT 2007). IEEE, 2007.
[7] Chen, Jianxin, et al. "Predicting syndrome by NEI specifications: a comparison of five data mining algorithms in coronary heart disease." International Conference on Life System Modeling and Simulation. Springer, Berlin, Heidelberg, 2007.
[8] Soni, Jyoti, et al. "Predictive data mining for medical diagnosis: An overview of heart disease prediction." International Journal of Computer Applications 17.8 (2011): 43-48.
[9] Singh, A., et al. (2020, February). Heart Disease Prediction Using Machine Learning Algorithms. In 2020 International Conference on Electrical and Electronics Engineering (ICE3) (pp. 452-457). IEEE.
[10] Hashi, E.K. and Zaman, M.S.U., 2020. Developing a Hyperparameter Tuning Based Machine Learning Approach of Heart Disease Prediction. Journal of Applied Science & Process Engineering, 7(2), pp.631-647.
[11] Shouman, Mai, Tim Turner, and Rob Stocker. "Using data mining techniques in heart disease diagnosis and treatment." 2012 Japan-Egypt Conference on Electronics, Communications and Computers. IEEE, 2012.
[12] Mohan, Senthilkumar, Chandrasegar Thirumalai, and Gautam Srivastava. "Effective heart disease prediction using hybrid machine learning techniques." IEEE Access 7 (2019): 81542-81554.
[13] Ramalingam, V. V., Ayantan Dandapath, and M. Karthik Raja. "Heart disease prediction using machine learning techniques: a survey." International Journal of Engineering & Technology 7.2.8 (2018): 684-687.
[14] Polat, Kemal, Seral Şahan, and Salih Güneş. "Automatic detection of heart disease using an artificial immune recognition system (AIRS) with fuzzy resource allocation mechanism and k-nn (nearest neighbour) based weighting preprocessing." Expert Systems with Applications 32.2 (2007): 625-631.
[15] Palaniappan, Sellappan, and Rafiah Awang. "Intelligent heart disease prediction system using data mining techniques." 2008 IEEE/ACS International Conference on Computer Systems and Applications. IEEE, 2008.
[16] Das, Resul, Ibrahim Turkoglu, and Abdulkadir Sengur. "Effective diagnosis of heart disease through neural networks ensembles." Expert Systems with Applications 36.4 (2009): 7675-7680.
[17] Jonnavithula, et al. (2020, October). Role of machine learning algorithms over heart diseases prediction. In AIP Conference Proceedings (Vol. 2292, No. 1, p. 040013). AIP Publishing LLC.