
Paper New

This document summarizes a research paper that used logistic regression to predict heart disease. The paper aimed to improve heart disease prediction accuracy by applying logistic regression to a healthcare dataset that classifies patients as having heart disease or not based on their records. Existing systems for heart disease prediction using machine learning and data mining techniques achieved less than 90% accuracy. The proposed system applied logistic regression to a cardiovascular disease dataset in an attempt to increase prediction accuracy.

Uploaded by

abc xyz
Copyright © All Rights Reserved

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/337427499

Estimation of Prediction for Getting Heart Disease Using Logistic Regression Model of Machine Learning

Conference Paper · November 2019


DOI: 10.1109/ICCCI48352.2020.9104210

Citations: 23 · Reads: 5,467

5 authors, including:

Tarun Saxena, Northeastern University (5 publications, 27 citations)
Nidhi Lal, Motilal Nehru National Institute of Technology (12 publications, 106 citations)


All content following this page was uploaded by Tarun Saxena on 21 November 2019.



Estimation of Prediction for Getting Heart Disease Using
Logistic Regression Model of Machine Learning
Montu Saw, Tarun Saxena, Sanjana Kaithwas, Rahul Yadav, Nidhi Lal
Dept. of Computer Science and Engineering
IIIT Nagpur, India
[email protected], [email protected],
[email protected], [email protected], [email protected]

Abstract- In the current era, deaths due to heart disease have become a major issue. Approximately one person dies per minute due to heart disease. Because of the fast growth of Information Technology, data is generated and has to be stored daily, and the collected data is converted into knowledge by data analysis using various combinations of algorithms. Medical professionals working in the field of heart disease have their own limitations: they cannot predict the chance of getting heart disease with high accuracy. This paper aims to improve heart disease prediction accuracy using the Logistic Regression model of machine learning on a health care dataset that classifies patients as having heart disease or not according to the information in their records.

Keywords: Heart Diseases; Data Analysis; Machine Learning; Logistic Regression Algorithms.

I. INTRODUCTION

The burden of cardiovascular diseases has been increasing rapidly all over the world over the past few years. Even though these diseases have been found to be the most important cause of death, they have also been declared the most manageable and preventable diseases [1]. Blockage in the arteries is the main cause of heart stroke. It occurs when the heart does not pump blood around the body efficiently.

High blood pressure is also one of the main causes of heart disease. A survey says that, from 2011 to 2014, the prevalence of hypertension in the world was about 35%. Similarly, there are many other causes of heart disease, such as obesity, improper nutrition, high cholesterol and lack of physical activity. Prevention is therefore essential, and awareness of heart disease is key to prevention. Around 47% of people die outside the hospital, which suggests that they do not act on early warning signs.

Nowadays, the human lifespan is reduced because of heart diseases. In 2013, the World Health Organization (WHO) therefore developed targets for the prevention of non-communicable diseases (NCDs), including a 25% relative reduction in cardiovascular diseases and ensuring that at least 50% of patients with cardiovascular diseases have access to relevant drugs and medical counselling by 2025 [2]. Around 17.9 million people died of cardiovascular diseases in 2016, which is 31% of deaths around the world.

A major challenge in heart disease is its detection [3]. It is difficult to predict whether a person has heart disease or not. There are instruments available which can predict heart diseases, but they are either expensive or not efficient at calculating the chance of heart disease in humans [4]. A survey by the World Health Organization (WHO) says that medical professionals are able to predict just 67% of heart disease, so there is vast scope for research in this field [5]. In India, access to good doctors and hospitals in rural areas is very low. A 2016 WHO report says that just 58% of the doctors in urban areas and 19% in rural areas have a medical degree.

In the USA, someone has a heart attack every 40 seconds, that is, more than one heart attack every minute. Apart from this, Turkmenistan had the highest rate of deaths due to heart disease as of 2012, with 712 deaths per 100,000 people, while Kazakhstan had the second highest rate. India holds the 56th position in this ranking [6]. A study also shows that, of 1.3 million cardiovascular deaths at ages 30-69 years, 0.9 million (68.4%) were caused by coronary heart disease and 0.4 million (28.0%) by stroke.

Heart diseases are a major challenge in medical science, and Machine Learning could be a good choice for predicting heart disease in humans [7]. Heart diseases can be predicted using Neural Networks, Decision Trees, KNN, etc. Later in this paper, we show how Logistic Regression is used to predict heart disease, how accurate it is, and how ML can help with heart disease in the future.

II. RELATED WORK

There are many works in the literature which diagnose heart diseases using machine learning as well as data mining. A brief survey of them is presented here. A paper named 'A review on heart disease prediction using machine learning and data analytics approach' by M. Marimuthu, M. Abinaya, K.S. Hariesh, K. Madhankumar and V. Pavithra was published in September 2018. Through their literature survey, they concluded that combined and more complex models are needed to increase the accuracy of heart disease prediction.

Some papers published around two to three years ago achieve lower accuracy in predicting heart diseases than is needed today. 'Efficient heart disease prediction system using decision tree' by Sharma Purshottam et al. was published in 2015; it used a decision tree classifier and achieved 86.3% accuracy. Similarly, 'Prediction of heart disease using modified K-means and by using naïve bayes' by Sairabi H Mujawar et al. was published in 2015. Its accuracy was 93% for detection of heart disease and 89% for non-detection [13]. This shows that the accuracy depends on the technique used.
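The two per-class figures quoted above for the naïve Bayes work (93% and 89%) are easiest to read as accuracy on the diseased class and accuracy on the healthy class, which is our interpretation rather than the cited paper's wording. A minimal sketch, assuming binary labels (1 = heart disease, 0 = none), shows how both come out of a confusion matrix as sensitivity and specificity; the function names and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP and FN for binary labels (assumed to be 0 or 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["tp"] += 1  # diseased patient correctly flagged
        elif actual == 0 and predicted == 0:
            counts["tn"] += 1  # healthy patient correctly cleared
        elif actual == 0 and predicted == 1:
            counts["fp"] += 1  # false alarm
        else:
            counts["fn"] += 1  # missed case
    return counts


def rates(counts):
    """Per-class and overall accuracy derived from the confusion counts."""
    tp, tn, fp, fn = counts["tp"], counts["tn"], counts["fp"], counts["fn"]
    positives, negatives = tp + fn, tn + fp
    return {
        "sensitivity": tp / positives if positives else 0.0,  # accuracy on diseased patients
        "specificity": tn / negatives if negatives else 0.0,  # accuracy on healthy patients
        "accuracy": (tp + tn) / (positives + negatives),
    }


# Toy labels, made up for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(rates(confusion_counts(y_true, y_pred)))
```

On these toy labels the sensitivity is 0.5, the specificity about 0.83 and the overall accuracy 0.7: a model can look accurate overall while missing half of the diseased patients, which is why a single accuracy figure hides the difference between techniques.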

Another example is 'Heart disease prediction using machine learning and data mining techniques' by Jaymin Patel, Prof. Tejpal Upadhyay and Dr. Samir Patel from Nirma University, Gujarat [14].

III. EXISTING SYSTEM

Heart disease is often called a silent killer, as it can lead to death without obvious symptoms. The existing systems [6] work with both deep learning and data mining [7]. Medical diagnosis is a vital yet complicated task that needs to be executed efficiently and accurately. To reduce the cost of clinical tests, appropriate computer-based information and decision support should be used. Data mining is the use of software techniques for finding patterns and regularities in sets of data. With the advent of data mining over the last two decades, there is a big opportunity to let computers directly construct and classify the different attributes or classes. Knowledge of the risk factors associated with heart disease helps healthcare professionals recognize patients at high risk of heart disease. Statistical analysis has identified risk factors associated with heart disease such as age, blood pressure, total cholesterol, diabetes, hypertension, family history of heart disease, obesity, lack of physical exercise and fasting blood sugar. However, the accuracy of the existing systems is low [8].

Fig.1: Flowchart of Proposed Work [10]


IV. PROPOSED SYSTEM

The proposed system uses data that classifies whether patients have heart disease or not according to a set of parameters, and uses this data to build a model that predicts whether a patient has the disease (reading the data and data exploration) [9]. The system uses a logistic regression (classification) algorithm from the sklearn library and calculates the model's score. Random search, a technique in which random combinations of hyperparameters are tried to find the best solution for the built model, is used for tuning. Finally, the results are analyzed by comparing models and by using a confusion matrix. The available data is organized into structured data based on features of the patient's heart, and from it we create a model that predicts the patient's disease using a logistic regression algorithm. First, the dataset is imported and read; it should contain variables such as age, sex, chest pain, slope and target. The data is then explored to verify the information. Next, a temporary variable is created and the logistic regression model is built [10]. The model uses a sigmoid function, which maps each patient's score to a probability and also helps in the graphical representation of the classified data. Using logistic regression increases accuracy compared with the previous work in the existing systems.

Source: The dataset shown in Fig.2 is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. The classification goal is to predict whether the patient has a 10-year risk of future coronary heart disease (CHD). The dataset contains demographic, behavioral and medical risk factors, as shown in Fig.3.

Fig.3: 10-year risk of coronary heart disease (CHD)
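The pipeline described above (a logistic regression model whose hyperparameters are chosen by random search) can be sketched without any libraries. The paper itself uses sklearn; this sketch swaps in a from-scratch logistic regression trained by batch gradient descent so it runs on plain Python. The search ranges, epoch count, 0.5 decision threshold and all function names are assumptions for illustration, not values from the paper.

```python
import math
import random


def sigmoid(z):
    """Logistic function: maps a log-odds score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # stable form for large negative z
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr, l2, epochs=200):
    """Batch gradient descent on the L2-regularised mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n  # bias is not regularised
    return w, b


def accuracy(w, b, X, y, threshold=0.5):
    hits = sum((predict_proba(w, b, xi) >= threshold) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)


def random_search(X_tr, y_tr, X_val, y_val, n_trials=10, seed=0):
    """Sample (learning rate, L2 strength) pairs at random; keep the best on validation."""
    rng = random.Random(seed)
    best_score, best_params = -1.0, None
    for _ in range(n_trials):
        params = {"lr": 10 ** rng.uniform(-2, 0), "l2": 10 ** rng.uniform(-4, -1)}
        w, b = train_logreg(X_tr, y_tr, params["lr"], params["l2"])
        score = accuracy(w, b, X_val, y_val)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

Sampling the learning rate and L2 strength on a log scale is the usual choice because their useful values span orders of magnitude. In sklearn the same idea is typically expressed with `LogisticRegression` (where `C` is the inverse of the regularisation strength) wrapped in `RandomizedSearchCV`.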

V. APPROACH AND METHODOLOGY


Introduction: The World Health Organization has estimated that 12 million deaths occur worldwide every year due to heart diseases. Half the deaths in the United States and other developed countries are due to cardiovascular diseases. Early prognosis of cardiovascular diseases can aid in making decisions on lifestyle changes in high-risk patients and in turn reduce complications. This research intends to pinpoint the most relevant risk factors of heart disease as well as predict the overall risk using logistic regression.

Data Preparation: Logistic regression is a type of regression analysis in statistics used to predict the outcome of a categorical dependent variable from a set of predictor or independent variables. In the binary logistic regression used here, the dependent variable is binary. Logistic regression is mainly used for prediction and for calculating the probability of success [11].

Fig.4: Patient's hypertensive nature

Hypertension was the most important single identifiable risk factor for heart failure until the last few decades. The issue has become less clear in recent years, in part because of uncertainties in the documentation of heart failure, the lack of systematic recordings of arterial pressure prior to the onset of, and treatment for, heart failure, and the absence of systematic visualization of epicardial coronary arteries [12]. Fig.4 shows the hypertensive status of the patients in the dataset.
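For reference, the logistic regression model described in this section can be written in standard notation. The symbols are ours, not the paper's: $x_1, \dots, x_k$ are a patient's risk factors, $\beta_0, \dots, \beta_k$ the fitted coefficients, and $y = 1$ means heart disease (or 10-year CHD risk).

```latex
\begin{aligned}
z &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k, \\
P(y = 1 \mid \mathbf{x}) &= \sigma(z) = \frac{1}{1 + e^{-z}}, \\
\ln \frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})} &= z, \\
\mathrm{OR}_j = \frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)} &= e^{\beta_j}.
\end{aligned}
```

The third line follows from the second because $1 - \sigma(z) = e^{-z}/(1 + e^{-z})$, so the odds $\sigma(z)/(1 - \sigma(z))$ equal $e^{z}$. A one-unit increase in $x_j$ therefore multiplies the odds by $e^{\beta_j}$, a change of $(e^{\beta_j} - 1) \times 100\%$. Assuming the 0.2% change in odds reported for glucose in the results is per unit of glucose, it corresponds to an odds ratio of about $1.002$.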

Fig.2: Dataset Distribution


Fig.7 depicts the effect of age on cardiovascular disease. Age is the most important risk factor for developing cardiovascular or heart diseases, with approximately a tripling of risk with each decade of life. Coronary fatty streaks can begin to form in adolescence. It is estimated that 82 percent of people who die of coronary heart disease are 65 or older. Similarly, the risk of stroke doubles every decade after age 55 [13].
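The two rules of thumb above (heart disease risk roughly tripling per decade, stroke risk doubling every decade after 55) compound multiplicatively, which is easy to underestimate. A minimal sketch, assuming the rules apply smoothly so that fractional decades interpolate geometrically; this is arithmetic on the quoted figures, not a clinical risk model, and the function name is hypothetical.

```python
def relative_risk(age, reference_age, factor_per_decade):
    """Risk at `age` relative to `reference_age` when risk multiplies by a
    fixed factor every ten years (fractional decades compound smoothly)."""
    decades = (age - reference_age) / 10.0
    return factor_per_decade ** decades


# Stroke: doubling every decade after 55, so age 75 vs. 55 is two doublings
print(relative_risk(75, 55, 2.0))  # 4.0
# Heart disease: roughly tripling per decade, so age 60 vs. 40
print(relative_risk(60, 40, 3.0))  # 9.0
```

Two decades of tripling already give a ninefold risk, which is consistent with age being treated as the single most important risk factor.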

Fig.5: Cigarettes per day

Fig.5 depicts the effect of cigarette consumption on the heart. Smoking damages the heart and blood vessels very quickly, but for most smokers who stop smoking the damage is repaired quickly. Even a few cigarettes now and then damage the heart, so the only proven strategy to keep your heart safe from the effects of smoking is to quit.

Fig.8: Dataset after Wrangling

Fig.6: Glucose level

Fig.6 depicts the glucose levels of the patients. Researchers found that high blood sugar (glucose) causes stronger contraction of blood vessels and also identified a protein associated with this increased contraction. The findings could lead to new treatments to improve outcomes after heart attack or stroke.

Fig.9: Before Data Wrangling

Fig.7: Age of the patient
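Figs. 8 to 10 show the dataset before and after wrangling, but the wrangling steps themselves are not spelled out. One simple option, assumed here rather than taken from the paper, is to keep the columns of interest and drop every record with a missing value. The column names (`age`, `cigsPerDay`, `glucose`, `TenYearCHD`) are modelled on the public Framingham CSV and are assumptions, as is treating empty strings and `NA` as missing; the rows below are made up.

```python
import csv
import io


def wrangle(csv_text, columns):
    """Keep only `columns` and drop rows with a missing value.

    Missing values are assumed to be empty strings or the literal "NA".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        values = [(row.get(col) or "").strip() for col in columns]
        if any(v in ("", "NA") for v in values):
            continue  # drop incomplete records
        cleaned.append({col: float(v) for col, v in zip(columns, values)})
    return cleaned


# Toy rows, made up for illustration only; the second one lacks a glucose reading
raw = (
    "age,cigsPerDay,glucose,TenYearCHD\n"
    "39,0,77,0\n"
    "46,0,NA,0\n"
    "48,20,70,0\n"
    "61,30,103,1\n"
)
rows = wrangle(raw, ["age", "cigsPerDay", "glucose", "TenYearCHD"])
print(len(rows))  # 3
```

Dropping incomplete rows is the simplest choice but shrinks the dataset; imputing missing values (for example with a column median) is a common alternative when many records would otherwise be lost.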


Fig.10: After Data Wrangling

Fig.11: Accuracy Result

VI. RESULT

From the above statistics, it is clear that the model is more specific than sensitive. Men seem to be more susceptible to heart disease than women. Increases in age, the number of cigarettes smoked per day and systolic blood pressure also show increasing odds of having heart disease. Total cholesterol shows no significant change in the odds of CHD. This could be due to the presence of good cholesterol (HDL) in the total cholesterol reading. Glucose also causes only a negligible change in odds (0.2%). The model predicted with 0.87 accuracy, as can be seen in Fig.11. Overall, the model could be improved with more data and by using more Machine Learning models.

VII. CONCLUSION

The number of heart disease cases may continue to grow beyond the current level. Heart diseases are complicated, and every year many people die of them. It is difficult to determine manually the odds of getting heart disease from the risk factors shown previously. One major drawback of this work is that its main focus is only on applying classification techniques and algorithms to heart disease prediction: various data cleaning and mining techniques are studied to prepare and build a dataset appropriate for data mining, so that a logistic regression model can predict whether a patient has heart disease or not. Any non-medical employee can use this software to predict heart disease, saving doctors time. Heart disease prediction remains an open domain in which accuracy can still be increased.

VIII. FUTURE WORK

In today's world, most data is computerized and stored in the cloud, where it can be accessed, although it is not utilized properly. By analyzing the available data, we can also discover unknown patterns. The primary motive of this research is the prediction of heart diseases with a high rate of accuracy. To predict heart disease, we can use the logistic regression algorithm from sklearn. The future scope of this work is the prediction of heart diseases using advanced techniques and algorithms with lower time complexity.

IX. REFERENCES

[1] Avinash Golande, Pavan Kumar T. Heart disease prediction using effective machine learning techniques.

[2] The Lancet Global Health. The changing patterns of cardiovascular diseases and their risk factors in the states of India: The Global Burden of Disease Study 1990-2016.

[3] Himanshu Sharma, M A Rizvi. Prediction of heart disease using machine learning algorithms: A survey.

[4] World health ranking.

[5] Himanshu Sharma, M A Rizvi. Prediction of heart disease using machine learning algorithms: A survey.

[6] Sana Bharti, 2015. Analytical study of heart disease prediction compared with different algorithms; International Conference on Computing, Communication and Automation (ICCA2015).

[7] Monika Gandhi, 2015. Prediction in heart disease using techniques of data mining; International Conference on Futuristic Trend in Computational Analysis and Knowledge Management (ABLAZE-2015).

[8] Sarath Babu, 2017. Heart disease diagnosis using data mining technique; International Conference on Electronics, Communication and Aerospace Technology (ICECA2017).

[9] A H Chen, 2011. HDPS: heart disease prediction system; 2011 Computing in Cardiology.
[10] Reddy Prasad, Pidaparthi Anjali, S. Adil, N. Deepa (Feb 2019). Heart Disease Prediction using Logistic Regression Algorithm using Machine Learning.

[11] Gritsenko, Elena. "Health Care Analytics: Modeling Behavioral Risk Factors Associated With Disease." (2019).

[12] Kazzam, E., Ghurbana, B., Obineche, E. et al. Hypertension — still an important cause of heart failure? J Hum Hypertens 19, 267–275 (2005). doi:10.1038/sj.jhh.1001820

[13] M. Marimuthu, M. Abinaya, K. S. Hariesh, K. Madhankumar, V. Pavithra. A review on heart disease prediction using machine learning and data analytics approach.

[14] Jaymin Patel, Prof. Tejpal Upadhyay and Dr. Samir Patel. Heart disease prediction using machine learning and data mining technique.
