Heart Disease Prediction Using
MACHINE LEARNING
ABSTRACT
In the medical field, the diagnosis of heart disease is one of the most difficult tasks. It is difficult because the decision relies on combining large amounts of clinical and pathological data. Because of this complexity, researchers and clinical professionals have shown significant interest in efficient and accurate heart disease prediction. In the case of heart disease, a correct diagnosis at an early stage is important because time is a critical factor. Heart disease is a leading cause of death worldwide, and predicting it at an early stage is essential. In recent years, machine learning has become an evolving, reliable and supportive tool in the medical domain and, given appropriate training and testing, has provided strong support for predicting disease. The main idea behind this work is to study diverse prediction models for heart disease and to select important heart disease features using the Random Forests algorithm. Random Forests is a supervised machine learning algorithm that often achieves higher accuracy than other supervised machine learning algorithms such as logistic regression. Using the Random Forests algorithm, we predict whether or not a person has heart disease.
INTRODUCTION
The heart is a muscular organ that pumps blood throughout the body and is the central part of the body's cardiovascular system, which also circulates blood through the lungs. The cardiovascular system also comprises a network of blood vessels, such as veins, arteries and capillaries, which deliver blood all over the body. Abnormalities in the normal flow of blood from the heart cause several types of heart disease, commonly known as cardiovascular diseases (CVDs). Heart diseases are the leading cause of death worldwide. According to the World Health Organization (WHO), an estimated 17.5 million deaths worldwide in a year are caused by cardiovascular diseases, and about 80% of these deaths are due to heart attacks and strokes. More than 75% of deaths from cardiovascular diseases occur in low- and middle-income countries.
Therefore, predicting cardiac abnormalities at an early stage, and tools for predicting heart disease, can save many lives and help doctors design effective treatment plans, which ultimately reduces the mortality rate due to cardiovascular diseases. With the development of advanced healthcare systems, large amounts of patient data are now available (i.e. Big Data in Electronic Health Record systems), which can be used to design predictive models for cardiovascular diseases. Data mining, or machine learning, is a discovery method for analyzing big data from different perspectives and summarizing it into useful information. “Data Mining is a non-trivial extraction of implicit, previously unknown and potentially useful information from data.” Nowadays, healthcare industries generate a huge amount of data about disease diagnosis, patients and so on, and data mining provides a number of techniques that discover hidden patterns or similarities in such data. Therefore, in this paper, a machine learning algorithm is proposed for the implementation of a heart disease prediction system, which was validated on two open-access heart disease prediction datasets.
Data mining is the computer-based process of extracting useful information from enormous databases. It is most helpful in exploratory analysis, where nontrivial information must be drawn from large volumes of evidence. Medical data mining has great potential for exploring the hidden patterns in clinical data sets, and these patterns can be utilized for healthcare diagnosis. However, the available raw medical data are widely distributed, voluminous and heterogeneous in nature. These data need to be collected in an organized form, and the collected data can then be integrated to form a medical information system. Data mining provides an efficient, user-oriented approach to discovering novel and previously unknown patterns in the data. Data mining tools are useful for answering business questions, and data mining techniques are useful for predicting various diseases in the healthcare field. Disease prediction plays a significant role in data mining. The patterns discovered in healthcare data can be used for health diagnosis, and the information identified can help healthcare administrators provide better services.
Heart disease is a leading cause of death in countries such as India and the United States. In this project, we predict heart disease using classification algorithms: machine learning classifiers such as Random Forest and Logistic Regression are used to explore different kinds of heart-related problems.
LITERATURE SURVEY
Machine learning techniques are used to analyze medical data resources and to make predictions from them. Diagnosing heart disease is a significant and tedious task in medicine. The term heart disease encompasses the various diseases that affect the heart. Detecting heart disease from various factors or symptoms is not free from false presumptions and is often accompanied by unpredictable effects. The data classification is based on a supervised machine learning algorithm, which results in better accuracy. Here we use Random Forest as the training algorithm to train on the heart disease dataset and to predict heart disease. The results showed that the designed prediction system, together with medical prescription, is capable of successfully predicting heart attacks.
Machine learning techniques have been used to predict early mortality by analyzing heart disease patients and their clinical records (Richards, G. et al., 2001). (Sung, S.F. et al., 2015) applied two machine learning techniques, a k-nearest neighbor model and an existing multiple linear regression model, to predict the stroke severity index (SSI) of patients. Their study showed that k-nearest neighbor performed better than the multiple linear regression model. (Arslan, A. K. et al., 2016) proposed various machine learning techniques, such as support vector machines (SVM) and penalized logistic regression (PLR), to predict stroke. Their results showed that SVM produced the best prediction performance compared to the other models. Boshra Brahmi et al. [20] developed different machine learning techniques to evaluate the prediction and diagnosis of heart disease. Their main objective was to evaluate different classification techniques such as J48, Decision Tree, KNN and Naïve Bayes, and then to compare them using performance measures such as accuracy, precision, sensitivity and specificity.
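The performance measures named above have standard definitions in terms of the confusion matrix (true positives TP, true negatives TN, false positives FP, false negatives FN): accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The Python sketch below computes them for binary labels; the function name `classification_metrics` and the convention that 1 means "disease present" are assumptions for illustration, not details taken from the cited studies.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity and specificity for binary labels.

    Assumed convention (hypothetical helper): 1 = disease present, 0 = absent.
    A metric whose denominator is zero is reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall, true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
    }


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

In this toy example there are 2 TP, 2 TN, 1 FP and 1 FN, so all four measures equal 2/3.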
Data source
Clinical databases have collected a significant amount of information about patients and their medical conditions. Records with medical attributes were obtained from the Cleveland Heart Disease database, and the patterns significant to heart attack diagnosis are extracted with the help of this dataset. The records were split equally into two datasets: a training dataset and a testing dataset. A total of 303 records with 76 medical attributes were obtained, and all the attributes are numeric-valued. We work on a reduced set of only 14 attributes. The following restrictions were introduced to reduce the number of patterns:
1) The rule should separate the various features into different groups.
3) The number of features available from the rule is determined by the medical history of people having heart disease only.
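A minimal Python sketch of the data preparation described above, assuming the 14 attributes follow the column naming used for the processed Cleveland data in the UCI Heart Disease dataset (`num` is the diagnosis column, with values 0 to 4). The helper names `split_equally` and `to_binary_label`, the fixed random seed, and the convention of treating any `num` greater than 0 as "disease present" are assumptions for illustration. Because 303 is odd, an "equal" split necessarily leaves one set with one extra record.

```python
import random

# The 14 attributes commonly used from the Cleveland database
# (UCI Heart Disease naming); "num" is the diagnosis (target) column.
CLEVELAND_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def to_binary_label(num):
    """Assumed convention: 0 = no disease, any value 1-4 = disease present."""
    return 1 if num > 0 else 0


def split_equally(records, seed=42):
    """Shuffle the records reproducibly and split them into two halves.

    Returns (training_set, testing_set); with an odd number of records
    the testing set receives the extra record.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


train, test = split_equally(range(303))
print(len(train), len(test))
```

With 303 records this yields 151 training records and 152 testing records.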
ALGORITHMS
Logistic Regression
Logistic Regression is a popular statistical technique for predicting binomial outcomes (y = 0 or 1); more generally, it predicts categorical outcomes (binomial or multinomial values of y). The predictions of Logistic Regression (henceforth LogR in this article) are probabilities of an event occurring, i.e. the probability that y = 1 given certain values of the input variables x. Thus, the outputs of LogR range between 0 and 1.
Random Forest
• First, select random samples (bootstrap samples, drawn with replacement) from the given dataset.
• Next, the algorithm constructs a decision tree for every sample and obtains the prediction result from every decision tree.
• Finally, the most-voted prediction result is selected as the final prediction result. The following diagram illustrates how it works.
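The three steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in this project: each "decision tree" is reduced to a one-level tree (a stump) chosen by Gini impurity, each stump considers a random subset of about sqrt(number of features) features, and all function names are hypothetical. A full random forest grows deeper trees and draws a random feature subset at every split.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Fit a one-level tree: the (feature, threshold) split with the lowest weighted Gini."""
    majority = Counter(y).most_common(1)[0][0]
    best = (float("inf"), None, None, majority, majority)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, f, threshold,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    _, f, threshold, left_label, right_label = best
    return f, threshold, left_label, right_label


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    if f is None:  # no valid split was found: predict the majority class
        return left_label
    return left_label if row[f] <= threshold else right_label


def fit_random_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    k = max(1, int(n_features ** 0.5))  # features considered by each stump (sqrt rule)
    forest = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample, drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Step 2: build a (one-level) tree on this sample.
        forest.append(fit_stump(Xb, yb, rng.sample(range(n_features), k)))
    return forest


def predict_forest(forest, row):
    # Step 3: the most-voted prediction is the final prediction.
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data with two features; class 1 has larger values.
X = [[1, 1], [2, 2], [3, 1], [7, 8], [8, 9], [9, 7]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_random_forest(X, y)
print(predict_forest(forest, [1, 1]), predict_forest(forest, [9, 9]))
```

Because every stump is trained on a different bootstrap sample, individual stumps can disagree, and the majority vote smooths out their individual errors.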
FEASIBILITY STUDY
A feasibility study is a preliminary study undertaken before the real work of a project starts, to ascertain the likelihood of the project's success. It is an analysis of possible alternative solutions to a problem and a recommendation on the best alternative.