Proceedings of the Sixth International Conference on Inventive Computation Technologies [ICICT 2021]

IEEE Xplore Part Number: CFP21F70-ART; ISBN: 978-1-7281-8501-9

Heart Disease Prediction using Hybrid Machine Learning Model
Dr. M. Kavitha (Assistant Professor), G. Gnaneswar, R. Dinesh, Y. Rohith Sai, R. Sai Suraj (Students)
Department of Computer Science and Engineering,
Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP, India.
[email protected], [email protected], [email protected], [email protected],
[email protected]
2021 6th International Conference on Inventive Computation Technologies (ICICT) | 978-1-7281-8501-9/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICICT50816.2021.9358597

Abstract – Heart disease causes significant mortality around the world and has become a health threat for many people. Early prediction of heart disease may save many lives, yet detecting cardiovascular diseases such as heart attacks and coronary artery diseases is a critical challenge for regular clinical data analysis. Machine learning (ML) can provide an effective solution for decision making and accurate prediction, and the medical industry is rapidly adopting machine learning techniques. In this work, a novel machine learning approach is designed to predict heart disease. The study uses the Cleveland heart disease dataset and applies data mining techniques such as regression and classification. Three machine learning models are implemented: 1. Random Forest, 2. Decision Tree, and 3. a Hybrid model (a hybrid of random forest and decision tree). Experimental results show an accuracy of 88.7% for the heart disease prediction model with the hybrid model. An interface is designed to take the user's input parameters and predict heart disease using the hybrid model of Decision Tree and Random Forest.

Key Words: Cleveland Heart Disease Database, Decision Trees, Random forest, Hybrid algorithm, Machine learning

I. INTRODUCTION

Data mining is useful for studying and understanding large amounts of data. It is used to extract information from data and to support decisions in further applications. The most common data mining techniques are clustering, association rule mining, and classification, and many algorithms are available for implementing them. Although tools such as Weka are available for simulation, Python is emerging as a platform for these algorithms through the scikit-learn package. Thus, the real-time implementation of data mining concepts is more reliable than ever.

Machine learning usage is growing rapidly in the medical diagnosis industry, where manual errors can be reduced and accuracy improved through computer analysis. Disease diagnosis is highly reliable with machine learning techniques: diseases such as heart disease, liver disease, diabetes, and tumors are predicted using machine learning concepts [18]. Classification algorithms such as decision tree, naïve Bayes, and SVM (Support Vector Machine) are available; similarly, regression algorithms, namely random forest, lasso, and logistic regression, have been used in the medical industry. For most tumor predictions, deep learning algorithms are widely used in the medical diagnosis field.

As per survey reports, nearly 17 million deaths occur each year due to cardiovascular diseases (CVD). Early detection of disease may save many lives, and mortality can be reduced if patients receive treatment on time [19]. Cardiovascular diseases include many conditions, such as heart disease. With the lack of physical activity caused by lifestyle changes, these diseases are becoming very common even in younger age groups. Smoking, lack of physical exercise, high-cholesterol food, junk food, and poor living habits are the leading causes of heart disease.

Fig 1: Block diagram of Heart Disease prediction

This study aims to predict heart disease through an automated, machine learning based medical diagnosis method. We use a hybrid model, as it is the best-performing classification method for predicting heart disease in our experiments. A hybrid model is a novel technique in which the probabilities produced by one machine learning model are given as input to another machine learning model.
Authorized licensed use limited to: M/s Shanti Education Society. Downloaded on September 04,2024 at 06:07:06 UTC from IEEE Xplore. Restrictions apply.

This hybrid model gives better-optimized results by combining both machine learning algorithms considered in the implementation.

The proposed system predicts heart disease through an automated machine learning diagnosis model built on a novel hybrid model. The Cleveland dataset, which is commonly used by machine learning researchers, is utilized for processing. It has a total of 303 instances and 14 attributes. The study treats the task as binary classification: 0 (absence of heart disease) or 1 (presence of heart disease). Patients can go for treatment based on the result generated by the proposed model, so the proposed application helps in taking advance measures for patients.

The rest of the paper is organized as follows. Section II reviews related work. Section III outlines the proposed work. Section IV describes the dataset, the implementation methodology, and the algorithms. Section V presents the results and discussion, Section VI concludes the work, and Section VII discusses future enhancements.

II. RELATED WORK
Many recent works by researchers address heart disease prediction and analysis. Some of them are discussed below.

In [1], the authors study heart disease prediction using random forest on the Cleveland dataset, with chi-square and genetic algorithm (GA) based feature selection. Their experimental results show that the model with GA-based feature selection achieves higher accuracy than existing models. However, the results are compared only with existing machine learning models.

In [2], the authors generate classification rules with the particle swarm optimization (PSO) algorithm and evaluate them to obtain more accurate rules for heart disease identification. After the rules are evaluated, C5.0 is used for binary classification of the disease. The authors used data from the UCI repository and report high accuracy using PSO and the decision tree algorithm.

A backpropagation neural network for heart disease prediction is discussed in [3]; neural networks are highly effective learning models for disease prediction. The author used a neural network for learning and prediction on the Cleveland dataset, with the simulation implemented in MATLAB. The work could be extended with deep learning models for higher accuracy and applied to real-world applications.

The author in [8] discusses heart disease prediction using data mining practices and evaluates techniques such as the KNN algorithm, the decision tree algorithm, neural network classification, and Bayesian classification algorithms. The author also studies the use of a genetic algorithm to select the essential features for heart disease, and reports high accuracy with the decision tree model.

Heart disease prediction using different machine learning algorithms is studied in [9]. Classification and regression models, namely decision tree, KNN, SVM, and linear regression, are used for prediction. The experimental results show that the KNN algorithm achieves the highest accuracy. However, this model could still be extended to a real-time environment or application.

A cognitive approach to heart disease prediction is carried out in [10]. In this work, five machine learning algorithms are considered for prediction, and all are evaluated for accuracy. A logistic model tree is implemented to obtain better prediction results, and AdaBoost and bagging models are used to forecast heart disease. Their experimental results show that random forest achieves high prediction accuracy.

It is inferred from the existing works that novelty is needed in this area and that a robust, optimized model is needed for heart disease prediction. The existing works use the available machine learning algorithms, implemented with tools such as Weka or MATLAB, and some also use deep learning models. However, an optimized model has not been studied. The novelty of our proposed work is a hybrid model that gives more optimized results.

III. PROPOSED WORK
A hybrid model is a novel technique in which the probabilities produced by one machine learning model are given as input to another machine learning model. This hybrid model gives better-optimized results by combining both machine learning algorithms considered in the implementation. The proposed work is implemented with the sklearn library, pandas, matplotlib, and other required libraries. The Cleveland dataset is downloaded from the UCI repository, and it contains two (binary) classes of heart disease. The decision tree and random forest algorithms are implemented along with the hybrid model.

IV. DATASET DETAILS
The collected dataset has the following attributes: sex indicates the gender of the patient, age indicates the age of the patient, trestbps indicates the resting blood pressure, cp indicates the chest pain type, fbs indicates the fasting blood sugar, chol indicates the serum cholesterol, thalach indicates the maximum heart rate achieved, restecg indicates the resting electrocardiographic result (1 = abnormality), oldpeak indicates the ST depression induced by exercise, exang indicates exercise-induced angina, ca indicates the number of major vessels, slope indicates the slope of the peak exercise ST segment, pred_attribute is the predicted attribute (the diagnosis), and thal indicates thalassemia. A sample of the collected data is shown in the figure below.
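As a minimal sketch of how the 14 attributes above can be loaded and reduced to the binary 0/1 target, the Python below parses records in the comma-separated layout of the UCI file `processed.cleveland.data`. Which file the authors used is an assumption, since the paper says only that the data came from uci.edu. In that file the last column (`num`) ranges from 0 to 4 and missing values are written as `?`; the sketch skips incomplete records and maps any nonzero `num` to 1, a common convention rather than a step the paper states. The helper name `load_cleveland` and the three sample records are hypothetical.

```python
from collections import Counter

# Column order assumed from the UCI "processed" Cleveland file.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def load_cleveland(text):
    """Parse comma-separated records into 13 predictors and a 0/1 label.

    Records with a '?' (missing value) or the wrong number of fields are skipped.
    The raw diagnosis `num` (0-4) is binarized: 0 -> absence, 1-4 -> presence.
    """
    X, y = [], []
    for line in text.strip().splitlines():
        fields = [v.strip() for v in line.split(",")]
        if len(fields) != len(COLUMNS) or "?" in fields:
            continue
        values = [float(v) for v in fields]
        X.append(values[:-1])                  # the 13 predictor attributes
        y.append(1 if values[-1] > 0 else 0)   # binarized diagnosis
    return X, y


# Hypothetical records in the same 14-column layout (not real patients).
sample = """
60.0,1.0,4.0,130.0,250.0,0.0,2.0,140.0,1.0,1.5,2.0,1.0,7.0,2
45.0,0.0,2.0,120.0,210.0,0.0,0.0,170.0,0.0,0.0,1.0,0.0,3.0,0
52.0,1.0,3.0,128.0,230.0,1.0,0.0,150.0,0.0,0.8,1.0,?,3.0,0
"""

X, y = load_cleveland(sample)
counts = Counter(y)  # disease vs. normal counts, the numbers behind a class histogram
print(len(X), "complete records;", counts[1], "with disease,", counts[0], "normal")
# 2 complete records; 1 with disease, 1 normal
```

The same `Counter` of labels gives the heart disease and normal case counts that a histogram such as Fig 2 visualizes.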

978-1-7281-8501-9/21/$31.00 ©2021 IEEE 1330



The dataset is visualized to obtain the number of heart disease cases and the number of normal cases. The counts are shown as a histogram plot in Figure 2.

Fig 2: Data Visualization of heart Disease in Cleveland Dataset

The proposed workflow has the following advantages:
o Two machine learning algorithms and a hybrid model are implemented.
o The accuracy of every proposed algorithm is computed to identify the best model.
o A hybrid model is implemented to make the proposed work an optimized model.

The execution is carried out with the following methodology:
a. The dataset is collected from uci.edu.
b. The data are visualized.
c. The dataset is split into training and test data.
d. The DT and RF models are applied for training and analysis.
e. The models are trained.
f. The trained models are tested and values are predicted.
g. A single input is taken from the user, and heart disease is predicted through the hybrid model.

The Cleveland dataset is split into two parts, training and testing sets: 70% of the dataset is used as training input to fit the machine learning algorithms, and the remaining 30% is used as testing data for heart disease prediction.

We employed the Decision Tree, the Random Forest, and the hybrid of Decision Tree and Random Forest to predict heart disease for the 30% test input, and the predicted values are plotted and compared for accuracy.

Fig 3: Heart disease prediction system architecture

Figure 3 shows the architecture of the proposed system for heart disease prediction through machine learning models, which is briefly explained below.

a. Decision Tree
A decision tree is a learning model used for classification problems. This technique divides the dataset into two or more sets. In a decision tree, internal nodes represent a test on an attribute, branches represent the outcomes of the test, and leaves are the decisions generated after subsequent processing.
The Decision Tree algorithm is as follows:
i. Set the dataset's best feature as the root of the tree.
ii. Split the training data into subsets so that each subset contains records with the same value of that feature.
iii. Repeat the steps above on each subset until leaves are obtained in the tree.
To predict the class label of a record, the decision tree starts from the root: the record's attribute value is compared with the attribute tested at the root, and the outcome of this comparison determines the next node to visit. This continues until a leaf is reached.

b. Random Forest Regression
Random Forest regression aggregates multiple decisions into a single decision. Random sampling is applied both to the training records and to the subset of features considered when splitting nodes.
Split the training data into subsets so that each subset contains records with the same value of the chosen feature.
Repeat the steps above on each subset until leaves are obtained in the tree.
The samples used to build each tree are drawn by bootstrapping, meaning the same record can be selected multiple times. The maximum number of features considered for splitting a node can be limited. This algorithm reduces the problem of overfitting.
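Steps i to iii of subsection a, together with the root-to-leaf prediction rule, can be sketched as a small ID3-style tree over categorical attributes. Choosing the "best" feature by information gain (entropy reduction) is our assumption, since the paper does not name a split criterion; the helper names and the six toy records are hypothetical and not drawn from the Cleveland data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting on one categorical feature."""
    n = len(labels)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def build(rows, labels, features):
    """Return a class label (leaf) or a node {feature, branches, default}."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # pure node or no features left: make a leaf
    # Step i: the most informative feature becomes the test at this node.
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    node = {"feature": best, "branches": {}, "default": majority}
    # Step ii: one subset per value of that feature.
    for value in {row[best] for row in rows}:
        pairs = [(r, lab) for r, lab in zip(rows, labels) if r[best] == value]
        # Step iii: repeat on each subset until leaves are reached.
        node["branches"][value] = build([r for r, _ in pairs],
                                        [lab for _, lab in pairs], remaining)
    return node


def predict(node, row):
    """Start at the root and follow the branch matching the record's value."""
    while isinstance(node, dict):
        node = node["branches"].get(row[node["feature"]], node["default"])
    return node


# Hypothetical toy records (categorical chest pain type and exercise angina).
rows = [
    {"cp": "asymptomatic", "exang": "yes"},
    {"cp": "asymptomatic", "exang": "no"},
    {"cp": "typical", "exang": "no"},
    {"cp": "typical", "exang": "yes"},
    {"cp": "non-anginal", "exang": "no"},
    {"cp": "non-anginal", "exang": "no"},
]
labels = [1, 1, 0, 1, 0, 0]
tree = build(rows, labels, ["cp", "exang"])
print(tree["feature"], predict(tree, {"cp": "typical", "exang": "yes"}))  # cp 1
```

Continuous attributes such as chol or thalach would first need to be binned, or split on thresholds as CART-style trees do. An attribute value never seen during training falls back to the majority class of the node.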
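Subsection b can be made concrete with a compact random forest written from scratch. This is a sketch under stated assumptions, not the scikit-learn model the paper reports using: each tree is a small CART-style tree that picks threshold splits by Gini impurity, each tree is grown on a bootstrap sample of the records (drawn with replacement), each node examines only a random subset of about √p of the p features, and the forest averages the trees' class-1 probabilities (scikit-learn's RandomForestClassifier also combines its trees by averaging probability estimates). The depth limit, the number of trees, the helper names, and the synthetic data are hypothetical choices.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow(X, y, depth, max_depth, n_feats, rng):
    """Grow a CART-style tree. A leaf is a float (fraction of class 1);
    an internal node is a tuple (feature, threshold, left, right)."""
    p1 = sum(y) / len(y)
    if depth == max_depth or p1 in (0.0, 1.0):
        return p1
    best = None
    # Only a random subset of n_feats features is examined at this node.
    for f in rng.sample(range(len(X[0])), n_feats):
        # Thresholds: every distinct value except the largest, so both sides are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every sampled feature is constant at this node
        return p1
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_feats, rng),
            grow([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_feats, rng))


def tree_proba(node, row):
    """Follow thresholds from the root to a leaf and return its class-1 fraction."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    rng = random.Random(seed)
    n_feats = max(1, round(len(X[0]) ** 0.5))  # roughly sqrt(p) features per split
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: len(X) row indices drawn with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow([X[i] for i in idx], [y[i] for i in idx],
                           0, max_depth, n_feats, rng))
    return forest


def forest_proba(forest, row):
    """Soft vote: mean class-1 probability over all trees."""
    return sum(tree_proba(tree, row) for tree in forest) / len(forest)


def forest_predict(forest, row):
    return 1 if forest_proba(forest, row) >= 0.5 else 0


# Hypothetical synthetic data: the label depends only on feature 0.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 10) for _ in range(3)] for _ in range(100)]
y = [1 if row[0] > 5 else 0 for row in X]
forest = fit_forest(X, y)
accuracy = sum(forest_predict(forest, r) == lab for r, lab in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Averaging probabilities rather than counting hard votes also yields a per-record probability, which is the quantity the hybrid model in subsection c passes to its decision tree.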


c. Hybrid Model
We develop a hybrid model using the decision tree and random forest algorithms. The combined model works on the probabilities produced by the random forest: the random forest probabilities are added to the training data, which is then fed to the decision tree algorithm. Similarly, the decision tree probabilities are identified and fed to the test data. Finally, the values are predicted.

Machine learning is applied to the preprocessed dataset, and the predicted cardiovascular disease for the given test dataset is plotted. Figure 4 shows the application we designed for heart disease prediction. Users or patients can give their own input and detect the risk of heart disease. Disease prediction is a binary prediction, where 0 means normal and 1 means heart disease. The application is designed using Tkinter in Python.

V. RESULTS AND DISCUSSIONS
The proposed work is implemented in Python 3.7.6 with the sklearn library, pandas, matplotlib, and other required libraries. The heart disease dataset downloaded from uci.edu is used for the study. The Decision Tree and Random Forest machine learning algorithms were used to predict heart disease. To improve the work and its novelty, we implemented a hybrid model of Decision Tree and Random Forest. The results show that heart disease detection is effective using the Random Forest algorithm and the hybrid model: Decision Tree achieves around 79% accuracy, Random Forest achieves 81% accuracy, and the hybrid model achieves 88% accuracy.

Table 1: Experimental Results
Algorithm                                   Accuracy (%)
Decision Tree                               79
Random Forest                               81
Hybrid (Decision Tree + Random Forest)      88
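The two-stage scheme described under c. Hybrid Model, together with the 70%/30% split from the methodology, can be sketched end to end as follows. This is our reading of the description rather than the authors' code: a random forest is trained first, its class-1 probability is appended to every training record as an extra feature, a single decision tree is trained on the augmented records, and the test records receive the same extra feature before the tree predicts. Both learners are compact from-scratch CART-style stand-ins for the sklearn models, and the helper names and synthetic data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow(X, y, depth, max_depth, n_feats, rng):
    """CART-style tree: a leaf is the class-1 fraction, a node is (f, t, left, right)."""
    p1 = sum(y) / len(y)
    if depth == max_depth or p1 in (0.0, 1.0):
        return p1
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return p1
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_feats, rng),
            grow([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_feats, rng))


def tree_proba(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    rng = random.Random(seed)
    n_feats = max(1, round(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap rows
        forest.append(grow([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, n_feats, rng))
    return forest


def forest_proba(forest, row):
    return sum(tree_proba(tree, row) for tree in forest) / len(forest)


def fit_hybrid(X, y, seed=0):
    """Stage 1: random forest. Stage 2: decision tree on [features + RF probability]."""
    forest = fit_forest(X, y, seed=seed)
    augmented = [row + [forest_proba(forest, row)] for row in X]
    n_all = len(augmented[0])  # the single tree considers every feature at each split
    tree = grow(augmented, y, 0, 4, n_all, random.Random(seed))
    return forest, tree


def predict_hybrid(model, X):
    """Append the forest's probability to each test record, then let the tree decide."""
    forest, tree = model
    return [1 if tree_proba(tree, row + [forest_proba(forest, row)]) >= 0.5 else 0
            for row in X]


def holdout_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle once, keep 30% of the records for testing and 70% for training."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


# Hypothetical synthetic data with a diagonal class boundary.
data_rng = random.Random(7)
X = [[data_rng.uniform(0, 10) for _ in range(3)] for _ in range(150)]
y = [1 if row[0] + row[1] > 10 else 0 for row in X]
X_train, X_test, y_train, y_test = holdout_split(X, y)
model = fit_hybrid(X_train, y_train)
pred = predict_hybrid(model, X_test)
test_accuracy = sum(p == t for p, t in zip(pred, y_test)) / len(y_test)
print(f"hybrid test accuracy: {test_accuracy:.2f}")
```

One design caveat: the forest's probabilities on its own training records are optimistic, because most trees have already seen those records, so the stage-2 tree may over-trust the extra feature. Using out-of-bag or cross-validated (out-of-fold) probabilities for the training records is a common way to reduce this leakage.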

Fig 4: Basic GUI for Heart Disease prediction

Fig 7: Heart Disease prediction through Decision Tree


Figure 7 shows the mean squared error (MSE), mean absolute error (MAE), R-squared, root mean squared error (RMSE), and accuracy for the Decision Tree model.
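The error measures reported for each model can be computed directly from the test labels and the predictions. The sketch below uses the standard definitions; treating the 0/1 class labels as numbers for MSE, MAE, RMSE, and R-squared is our assumption about how those values were produced, and the function name and label vectors are hypothetical. With 0/1 labels every error is 0 or ±1, so MSE and MAE both equal the misclassification rate and accuracy equals 1 minus MAE.

```python
import math


def classification_error_metrics(y_true, y_pred):
    """MSE, MAE, RMSE, R-squared and accuracy for numeric 0/1 labels."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares (must be > 0)
    mse = ss_res / n
    return {
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical test labels and predictions for ten records (two are wrong).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
metrics = classification_error_metrics(y_true, y_pred)
print(metrics)  # MSE and MAE are 0.2, accuracy is 0.8
```

R-squared is undefined when every test label is the same class, because the total sum of squares is then zero.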

Fig 5: Positive Case of Heart Disease prediction

Fig 8: Heart Disease prediction through Random Forest


Fig 6: Negative Case of Heart Disease prediction


Figure 8 shows the mean squared error (MSE), mean absolute error (MAE), R-squared, root mean squared error (RMSE), and accuracy for the Random Forest model.

Fig 9: Heart Disease prediction through Hybrid model

Figure 9 shows the mean squared error (MSE), mean absolute error (MAE), R-squared, root mean squared error (RMSE), and accuracy for the Hybrid model.

VI. CONCLUSION
Heart disease is one of the life-threatening diseases seen around the world. Changing lifestyles and lack of physical activity increase the threat of this condition. Many diagnosis processes are available in the medical industry; however, in terms of accuracy, machine learning is considered the best choice. The proposed work uses a Tkinter-based Python application for heart disease prediction. The proposed system combines Decision Tree and Random Forest as a hybrid model for heart disease prediction. The Cleveland database is used for this study.

VII. FUTURE WORK
Deep learning algorithms play a vital role in health care applications, so applying deep learning procedures to heart disease prediction may give better outcomes. We are also interested in treating the task as a multi-class problem to identify the level of the disease.

REFERENCES

[1] Jabbar, M. A., B. L. Deekshatulu, and Priti Chandra. "Intelligent heart disease prediction system using random forest and evolutionary approach." Journal of Network and Innovative Computing 4.2016 (2016): 175-184.
[2] Alkeshuosh, Azhar Hussein, et al. "Using PSO algorithm for producing best rules in diagnosis of heart disease." 2017 International Conference on Computer and Applications (ICCA). IEEE, 2017.
[3] Al-Milli, Nabeel. "Backpropagation neural network for prediction of heart disease." Journal of Theoretical and Applied Information Technology 56.1 (2013): 131-135.
[4] Mythili, T., et al. "A heart disease prediction model using SVM-Decision Trees-Logistic Regression (SDL)." International Journal of Computer Applications 68.16 (2013).
[5] Detrano, R. VA Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D. Donor: David W. Aha, 1998.
[6] Xing, Yanwei, Jie Wang, and Zhihong Zhao. "Combination data mining methods with new medical data to predicting outcome of coronary heart disease." 2007 International Conference on Convergence Information Technology (ICCIT 2007). IEEE, 2007.
[7] Chen, Jianxin, et al. "Predicting syndrome by NEI specifications: a comparison of five data mining algorithms in coronary heart disease." International Conference on Life System Modeling and Simulation. Springer, Berlin, Heidelberg, 2007.
[8] Soni, Jyoti, et al. "Predictive data mining for medical diagnosis: An overview of heart disease prediction." International Journal of Computer Applications 17.8 (2011): 43-48.
[9] Singh, A., et al. "Heart Disease Prediction Using Machine Learning Algorithms." 2020 International Conference on Electrical and Electronics Engineering (ICE3). IEEE, 2020, pp. 452-457.
[10] Hashi, E. K., and M. S. U. Zaman. "Developing a Hyperparameter Tuning Based Machine Learning Approach of Heart Disease Prediction." Journal of Applied Science & Process Engineering 7.2 (2020): 631-647.
[11] Shouman, Mai, Tim Turner, and Rob Stocker. "Using data mining techniques in heart disease diagnosis and treatment." 2012 Japan-Egypt Conference on Electronics, Communications and Computers. IEEE, 2012.
[12] Mohan, Senthilkumar, Chandrasegar Thirumalai, and Gautam Srivastava. "Effective heart disease prediction using hybrid machine learning techniques." IEEE Access 7 (2019): 81542-81554.
[13] Ramalingam, V. V., Ayantan Dandapath, and M. Karthik Raja. "Heart disease prediction using machine learning techniques: a survey." International Journal of Engineering & Technology 7.2.8 (2018): 684-687.
[14] Polat, Kemal, Seral Şahan, and Salih Güneş. "Automatic detection of heart disease using an artificial immune recognition system (AIRS) with fuzzy resource allocation mechanism and k-nn (nearest neighbour) based weighting preprocessing." Expert Systems with Applications 32.2 (2007): 625-631.
[15] Palaniappan, Sellappan, and Rafiah Awang. "Intelligent heart disease prediction system using data mining techniques." 2008 IEEE/ACS International Conference on Computer Systems and Applications. IEEE, 2008.
[16] Das, Resul, Ibrahim Turkoglu, and Abdulkadir Sengur. "Effective diagnosis of heart disease through neural networks ensembles." Expert Systems with Applications 36.4 (2009): 7675-7680.
[17] Jonnavithula, et al. "Role of machine learning algorithms over heart diseases prediction." AIP Conference Proceedings 2292.1 (2020): 040013. AIP Publishing LLC.
