Heart Failure Prediction Using Hybrid Method

Uploaded by

abhi spdy
Copyright
© All Rights Reserved

HEART FAILURE PREDICTION USING HYBRID MACHINE LEARNING TECHNIQUES


Prof. Shyamala L, Rupali Kumari

ABSTRACT:

Heart failure is one of the major causes of death in the world. The disease directly affects the heart's pumping function; because of this disturbance, nutrients and oxygen are not well circulated and distributed. In this paper, a multi-level risk assessment of developing heart failure is proposed, in which five risk levels of heart failure can be predicted using machine learning models, namely Decision Tree classifier, Logistic Regression, Gradient Boosting classifier, Random Forest classifier, Support Vector Machine and K-Nearest Neighbor, among others. In addition, we improve the early prediction of heart failure by including three main risk factors in the heart failure data set.

Keywords: Logistic Regression, Random Forest Classifier Algorithm, KNN.

INTRODUCTION:

The heart is a vital organ of the human body and is responsible for pumping blood to the other organs. Heart failure (HF) is a serious disorder with high prevalence. In developed countries, HF is prevalent at a rate of approximately 2% in the adult population and about 8% in older subjects. Literature also shows that about 3-5% of hospital admissions are connected with HF incidents. Moreover, HF diagnosis is very costly: in developed countries HF accounts for 2% of total health costs. Hence, the development of non-invasive methods for HF detection based on machine learning and data mining will help improve quality of life and reduce the associated medical costs.

Recently, machine learning researchers have developed numerous models based on feature transformation and machine learning methods for disease detection and mortality prediction. Earlier studies developed logistic regression, C4.5, Naive Bayes, BNNF and BNND algorithms and obtained HF classification accuracies of 71%, 81.11%, 81.48%, 80.95% and 81.11%, respectively. Polat et al. improved the HF classification accuracy to 84.5% by developing an artificial immune system, and proposed another novel system that further improved the accuracy to 87%. Recently, Ali et al. developed a hybrid method in which an L1-regularized SVM was hybridized with a linear discriminant classifier; their hybrid method resulted in an HF classification accuracy of 90%.

Recent research has concentrated on feature transformation and selection for improved HF prediction. In this study, we search for the optimal feature extraction algorithm by evaluating the performance of different feature extraction algorithms, namely Principal Component Analysis (PCA), Sparse PCA, Kernel PCA and Incremental PCA. These algorithms are integrated with five different state-of-the-art machine learning models, including linear regression, Gaussian Naïve Bayes and linear discriminant analysis. To evaluate the performance of the developed integrated models, four evaluation metrics are used: Matthews Correlation Coefficient (MCC), specificity, sensitivity and accuracy.

The remainder of the paper has three sections. The second section briefly explains the HF database and the developed integrated models. Section III presents the simulation results and their discussion, while the last section presents the conclusion.
RELATED WORK

There is ample related work in the fields directly related to this paper. ANN has been introduced to produce the highest accuracy prediction in the medical field. The back-propagation multilayer perceptron (MLP) of ANN is used to predict heart disease. The obtained results are compared with the results of existing models within the same domain and found to be improved. The data of heart disease patients collected from the UCI repository is used to discover patterns with NN, DT, Support Vector Machines (SVM) and Naive Bayes. The results are compared for performance and accuracy across these algorithms. The proposed hybrid method returns results of 86.8% for F-measure, competing with the other existing methods. Classification without segmentation using Convolutional Neural Networks (CNN) has also been introduced. This method considers heart cycles with various start positions from the Electrocardiogram (ECG) signals in the training phase, so that the CNN is able to generate features for various positions in the testing phase. A large amount of data generated by the medical industry has not been used effectively previously. The new approaches presented here decrease the cost and improve the prediction of heart disease in an easy and effective way. The various research techniques considered in this work for prediction and classification of heart disease using ML and deep learning (DL) are highly accurate in establishing the efficacy of these methods.

| YEAR | AUTHOR | PURPOSE | ALGORITHMS USED | ACCURACY |
|------|--------|---------|-----------------|----------|
| 2015 | Sharma Purushottam et al. | Efficient Heart Disease Prediction System using Decision Tree | Decision tree classifier | 86.3% for testing phase; 87.3% for training phase |
| 2015 | Boshra Brahmi et al. | Prediction and Diagnosis of Heart Disease by Data Mining Technique | J48, Naïve Bayes, KNN, SMO | J48 gives better accuracy than the other three techniques |
| 2016 | Ashok Kumar Dwivedi et al. | Evaluate the performance of different machine learning techniques for heart disease prediction | Naïve Bayes, KNN, Logistic Regression, Classification Tree | 83%, 80%, 85%, 77%, respectively |
| 2016 | Jayamin Patel et al. | Heart Disease Prediction using Machine Learning and Data Mining Technique | J48, Logistic model tree, Random forest | J48 gives 56.76%, which is better than the LMT algorithm's accuracy of 55.75% |
| 2017 | P. Sai Chandrasekhar Reddy et al. | Heart disease prediction using ANN algorithm in data mining | ANN | Accuracy proved in JAVA |
| 2018 | Chala Bayen et al. | Prediction and Analysis the occurrence of Heart Disease using data mining techniques | J48, Naïve Bayes, SVM | Gives results in a short time, which helps to give quality of services and reduce cost to individuals |

Table 1. Literature Review

PROBLEM STATEMENT

Invasive diagnostic methods are typically performed only when a patient presents with certain symptoms, often the primary signs from which even a lay person with little medical knowledge can tell that the patient is experiencing heart disease or a stroke at that moment. These methods are also generally very expensive and computationally complex, and the assessments take time. In our research we found no framework that examines the features and symptoms of patients, together with their lifestyle and family history, that could serve as an early warning. We would like to alert patients in advance so that they can be careful and take basic preventive steps to keep such a complex illness from developing. The problem we encountered is that prediction alone cannot eliminate the disease. It needs to be treated through three basic things: 1. Medicine 2. Precautions 3. Changing lifestyle, by suggesting physical activity that takes the patient's different attributes into account. Therefore, our model predicts the level of heart disease and suggests medicinal and non-medicinal ways to get rid of it.

METHODOLOGY

Dataset collection

Kaggle is one of the most popular online community websites for data science and machine learning. Kaggle allows users to find and publish datasets. It hosts datasets on a wide range of topics, so people can easily obtain the datasets they need.

Here the dataset is taken from the Kaggle website. The dataset description has three columns (serial number, attribute and description), with one row per attribute: age, sex, chest pain, cholesterol level, resting electrocardiographic results, fasting blood sugar, thalach, exang, oldpeak, slope, number of major vessels colored, and thal, which denotes the defect type. The attributes in the dataset are listed in Table 2. The database contains NaN values. NaN values cannot be processed by the program, so they need to be converted into numerical values. In this approach, the mean of the column is calculated and the NaN values are replaced by that mean.

B. Splitting

The whole database is split into a training set and a testing set. 80% of the data is used for training, while the remaining 20% is used for testing.

C. Classification

The training data is used to train four different machine learning algorithms, i.e. Decision Tree, KNN, K-means clustering and AdaBoost. Each algorithm is explained in detail.

| SNO | Attributes | Description |
|-----|------------|-------------|
| 1 | age | Age in years |
| 2 | sex | Male or Female |
| 3 | cp | Chest pain type |
| 4 | trestbps | Resting blood pressure |
| 5 | chol | Serum cholesterol in mg/dl |
| 6 | fbs | Fasting blood sugar |
| 7 | restecg | Resting electrocardiographic results |
| 8 | thalach | Maximum heart rate achieved |
| 9 | exang | Exercise induced angina |
| 10 | oldpeak | ST depression induced by exercise relative to rest |
| 11 | slope | Slope of the peak exercise ST segment |
| 12 | ca | No. of major vessels colored |
| 13 | thal | Defect type |

Table 2. Dataset

DATA PRE-PROCESSING

Heart disease data is pre-processed after the collection of the various records. The dataset contains a total of 303 patient records, of which 6 records have some missing values. Those 6 records have been removed from the dataset and the remaining 297 patient records are used in pre-processing. A multi-class variable and binary classification are introduced for the attributes of the given dataset. The multi-class variable is used to check the presence or absence of heart disease. If the patient has heart disease, the value is set to 1; otherwise the value is set to 0, indicating the absence of heart disease in the patient. The pre-processing of data is carried out by converting medical records into diagnosis values. The results of data pre-processing for the 297 patient records indicate that 137 records show the value 1, establishing the presence of heart disease, while the remaining 160 show the value 0, indicating the absence of heart disease.

CLASSIFICATION MODELLING

The clustering of datasets is done on the basis of the variables and criteria of Decision Tree (DT) features. Then, the classifiers are applied to each clustered dataset in order to estimate its performance. The best performing models are identified from these results based on their low rate of error. The performance is further optimized by choosing the DT cluster with a high rate of error and extracting its corresponding classifier features. The performance of the classifier is evaluated for error optimization on this data set.

DECISION TREES

For training samples of data, the trees are constructed based on high-entropy inputs. These trees are simple and fast to construct, using a top-down recursive divide and conquer (DAC) approach. Tree pruning is performed to remove the irrelevant samples.

KNN (K-Nearest Neighbour Algorithm)

K-Nearest Neighbour is used for both classification and regression. This algorithm does not learn parameters; instead it uses the data points themselves to derive the output. It is a lazy learning model, in which computation is deferred until prediction. The basic idea is that the stored data points serve as inputs, and the output for a new sample is derived from the data points nearest to it.

RANDOM FOREST

This algorithm builds a set of decision trees and predicts the output from their combined results. It handles large amounts of data, and it gives accurate output with good efficiency, although the computation is demanding. Comparing the accuracy rates of the neural network and random forest algorithms, the accuracy rate of the neural network is higher.

LOGISTIC REGRESSION

Despite its name, Logistic Regression is a classification model rather than a regression model. This algorithm gives its output in the form of binary values, i.e. 0's and 1's. It is a statistical model and rests on some statistical assumptions. The generated sample information is represented in mathematical form. Logistic regression estimates the outcome from the attributes under these assumptions, and the outcome takes only two possible values, 0 or 1, i.e. false or true. The sigmoid function is frequently used by this algorithm.
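The mean imputation, 80/20 split and K-nearest-neighbour classification described in the methodology can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names (`impute_column_means`, `split_train_test`, `knn_predict`), the value k = 3, the random seed and the ten toy records are all assumptions made for the example, and the toy records are invented rather than taken from the paper's dataset.

```python
import math
import random

NAN = float("nan")


def impute_column_means(rows):
    """Replace each NaN with the mean of the non-NaN values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(sum(present) / len(present))
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(row)]
            for row in rows]


def split_train_test(features, labels, test_fraction=0.2, seed=0):
    """Shuffle the records and hold out test_fraction of them for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_x = [features[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    test_x = [features[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]
    return train_x, train_y, test_x, test_y


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to query (Euclidean)."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Invented toy records (NOT from the paper's dataset):
# columns are (age, maximum heart rate); label 1 = disease present, 0 = absent.
toy_x = [[63, 150], [67, 108], [41, 172], [56, NAN], [62, 160],
         [57, 163], [44, 175], [52, 172], [54, 109], [48, 168]]
toy_y = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0]

clean_x = impute_column_means(toy_x)
train_x, train_y, test_x, test_y = split_train_test(clean_x, toy_y)
predictions = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"train={len(train_x)} test={len(test_x)} accuracy={accuracy:.2f}")
```

Because KNN is distance based, features on very different scales (for example age and cholesterol) would normally be standardized before computing distances; that step is omitted here to keep the sketch short.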
PERFORMANCE MEASURES

Several standard performance metrics, such as accuracy, precision and classification error, have been considered for computing the performance of this model. Accuracy in the current context means the percentage of instances correctly predicted among all the available instances. Precision is defined as the percentage of correct predictions among the instances predicted as the positive class. Classification error is defined as the percentage of instances that are predicted incorrectly. To identify the significant features of heart disease, three performance metrics are used, which help in better understanding the behavior of the various feature-selection combinations. The ML technique focuses on the best performing model compared to the existing models. We introduce a hybrid method, which produces high accuracy and low classification error in the prediction of heart disease. The performance of every classifier is evaluated individually and all results are recorded for further investigation.

PROPOSED WORK

It is important to receive first aid at the time of a heart attack. Many deaths due to heart attack occur because of a lack of awareness and of first aid given to patients. As lifestyles around the globe have changed, which is the fundamental cause of various heart complications, a considerable amount of research has been done on prediction. Here we aim to go one step further, studying the complexity of the heart disease and giving medical and non-medical suggestions to get rid of it. In the proposed work we focus on analysis, prediction, the accuracy obtained with several algorithms and their comparison, and on providing suggestions. It is also important to know whether the person needs to be diagnosed with heart disease or not. In our work we also examine, after training on the dataset, whether the person needs to be examined or not, experimenting with various classification models and checking which yields the greatest accuracy.

In the presented work there are six different sections in which the work takes place: database selection, sample selection, attribute creation, modelling/training, knowledge extraction and, finally, medical suggestions and recommendations. The idea behind dividing the whole work into sections is to work on each section independently so that good results and high accuracy can be achieved.

Fig. 1. Proposed working architecture

As the proposed architecture shows, there are the six sections listed above. First, we select the target database from the heart disease database. After that, samples need to be selected from the pool of data, and it is also important to remove the noisy data present in the dataset. Since the dataset contains many attributes, it is necessary to create the specific attributes required for training, so we extract the relevant attributes that are useful for the process. The next section is the modelling and training of the selected dataset. In this section we apply seven different machine learning models one after another so that the best model, that is, the model with the highest accuracy, can be found. After the different machine learning algorithms and models have been applied, the system extracts the knowledge and finally gives the medical and non-medical suggestions. We have used supervised learning models in the proposed system.

Quantitative research needs numerical data, which can come either from the numbers themselves or from graphs. Statistical methods are applied to it to extract useful information from the data. Qualitative research is expressed in words and thoughts. It requires expert opinion that can draw useful information from the thoughts and feelings of the examinee. Qualitative research aims to understand the concepts, thoughts, experiences and feelings of the patients. This research paper uses both quantitative and qualitative data. We have used the University of California Irvine (UCI) dataset for this paper. There are 3 types of data used in this paper:

Continuous (#): quantitative data that can be measured
Ordinal data: categorical data that has an order to it (0, 1, 2, 3, etc.)
Binary data: data whose unit can take on only two possible states (0 and 1)

There are 13 feature attributes identified in the dataset for heart disease prediction, which are listed below:

1. (age) age (#)
2. (sex) 1 = Male, 0 = Female (Binary)
3. (cp) chest pain type (4 values, Ordinal): Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic
4. (trestbps) resting blood pressure (#)
5. (chol) serum cholesterol in mg/dl (#)
6. (fbs) fasting blood sugar > 120 mg/dl (Binary) (1 = true; 0 = false)
7. (restecg) resting electrocardiography results (values 0, 1, 2)
8. (thalach) maximum heart rate achieved (#)
9. (exang) exercise induced angina (Binary) (1 = yes; 0 = no)
10. (oldpeak) ST depression induced by exercise relative to rest (#)
11. (slope) slope of the peak exercise ST segment (Ordinal) (Value 1: up sloping, Value 2: flat, Value 3: down sloping)
12. (ca) number of major vessels (0-3, Ordinal) colored by fluoroscopy
13. (thal) defect type (Ordinal): 3 = normal; 6 = fixed defect; 7 = reversible defect

EVALUATION RESULTS

The prediction models are developed using 13 features, and the accuracy is calculated for each modeling technique. The best classification methods are compared in terms of accuracy, classification error, precision, F-measure, sensitivity and specificity. The highest accuracy is achieved by the proposed hybrid classification method in comparison with existing methods. Out of the 13 features we examined, the top 4 significant features that helped us distinguish between a positive and a negative diagnosis were chest pain type (cp), maximum heart rate achieved (thalach), number of major vessels (ca), and ST depression induced by exercise relative to rest (oldpeak), as shown in Fig. 2.

Fig. 2. Feature importance graph
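The measures named in the evaluation (accuracy, classification error, precision, F-measure, sensitivity and specificity) can all be derived from the four counts of a binary confusion matrix. The sketch below uses their standard definitions; the function names and the example label vectors are illustrative assumptions, not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Standard binary classification measures from confusion-matrix counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = safe_ratio(tp + tn, tp + tn + fp + fn)
    precision = safe_ratio(tp, tp + fp)      # correct among predicted positives
    sensitivity = safe_ratio(tp, tp + fn)    # recall on actual positives
    specificity = safe_ratio(tn, tn + fp)    # recall on actual negatives
    f_measure = safe_ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "classification_error": 1.0 - accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Invented example labels (1 = disease present, 0 = absent), for illustration only.
example_true = [1, 1, 1, 0, 0, 0, 0, 1]
example_pred = [1, 1, 0, 0, 0, 1, 0, 1]
for name, value in classification_metrics(example_true, example_pred).items():
    print(f"{name}: {value:.2f}")
```

Reporting sensitivity and specificity alongside accuracy matters here because the classes are not perfectly balanced (137 positive versus 160 negative records after pre-processing), so accuracy alone can hide which kind of error a classifier makes.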
Fig. 3. Correlation Matrix

There is a positive correlation between chest pain (cp) and the target (the variable we predict). This makes sense, since a greater amount of chest pain results in a greater chance of having heart disease. Cp (chest pain) is an ordinal feature with 4 values: Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic.

In addition, we see a negative correlation between exercise induced angina (exang) and the target. This makes sense because when you exercise, your heart requires more blood, but narrowed arteries slow down blood flow.

Comparing positive and negative heart disease patients, there are vast differences in the means of many of our features. Examining the details, we can observe that positive patients have a higher average maximum heart rate achieved (thalach). In addition, positive patients exhibit about one third of the amount of ST depression induced by exercise relative to rest (oldpeak).

Our hybrid machine learning algorithm can now classify patients with heart disease, so that patients can be properly diagnosed and given the help they need to recover. By detecting these features early, we may prevent worse symptoms from arising later. Our Random Forest algorithm yields the highest accuracy, 80%. Any accuracy above 70% is considered good, but an extremely high accuracy may be too good to be true (an example of overfitting). Thus, we regard 80% as a reasonable accuracy.

CONCLUSION

Processing raw healthcare data on heart conditions will help in the long-term saving of human lives and in the early detection of abnormalities in heart conditions. Machine learning techniques were used in this work to process raw data and provide new insight into heart disease. Heart disease prediction is challenging and very important in the medical field. However, the mortality rate can be drastically reduced if the disease is detected at an early stage and preventative measures are adopted as soon as possible. Further extension of this study is highly desirable to direct the investigations to real-world datasets instead of just theoretical approaches and simulations. The proposed hybrid approach combines the characteristics of Random Forest (RF) and Linear Method (LM). This method proved to be quite accurate in the prediction of heart disease. Future research can explore diverse combinations of machine learning techniques to improve prediction. Furthermore, new feature selection methods can be developed to get a broader view of the significant features and so increase the performance of heart disease prediction.
