Project Report First Phase @8 Suhana

This project report describes using machine learning to predict heart disease based on 14 input features. The objective is to optimize heart disease prediction by analyzing risk factors. Logistic regression was used as the machine learning algorithm and provided promising results for predicting heart disease symptoms. The accuracy of logistic regression was evaluated. Further optimization may be possible by additional analysis of risk factors associated with heart disease conditions.

Uploaded by

Shameer k

HEART DISEASE PREDICTION USING

MACHINE LEARNING
PROJECT REPORT

(Submitted in Partial Fulfillment of the Requirement for B-Tech Degree Course in Electronics
and Communication Engineering of APJ Abdul Kalam Technological University)

Submitted By
SUHANA SAINAB.S (AME19EC013)
SUDHARSANA.N (AME19EC012)
I.SAJEED (AME19EC006)

Under the guidance of


Ms. ASHA ARVIND
Assistant Professor,
Department of Electronics and Communication Engineering


Rajadhani Institute of Science and Technology, Post
Mankara, Palakkad – 678613
CERTIFICATE
This is to certify that the project report entitled “HEART DISEASE PREDICTION USING
MACHINE LEARNING” is a bona fide record of the project work done by SUHANA
SAINAB.S (AME19EC013), SUDHARSANA.N (AME19EC012) and I.SAJEED (AME19EC006) at
Rajadhani Institute of Science and Technology, in partial fulfillment of the requirements of the
B.Tech Degree course in Electronics and Communication Engineering of A P J Abdul Kalam
Technological University, 2019-2023 batch.

Ms. SITHARA KRISHNAN
Head of the Department

Ms. ASHA ARVIND
Guide

Dr. RAMANI.K
Principal

Internal Examiner External Examiner


ACKNOWLEDGEMENT

It is with great enthusiasm and learning spirit that we are bringing out this project report.
Here we would like to record our gratitude to all those who influenced us during
the period of our work. We would like to express our sincere thanks to the Management
of Rajadhani Institute of Science and Technology, Palakkad, and to Dr. RAMANI.K, the
Principal of Rajadhani Institute of Science and Technology, for the facilities provided here.
We express our heartfelt gratitude to the Head of the Department, SITHARA KRISHNAN,
Assistant Professor, Department of Electronics and Communication Engineering, for allowing
us to take up this work.
With immense pleasure and gratitude, we express sincere thanks to our guide Ms. ASHA
ARVIND and Co-ordinator Ms. BLESSY RAPHAEL, Assistant Professor, for their
committed guidance, valuable suggestions and constructive criticism. Their stimulating
suggestions and encouragement helped us through our project work. We extend our gratitude to all
teachers in the Department of Electronics and Communication Engineering, Rajadhani
Institute of Science and Technology, Palakkad, for their support and inspiration.
Above all, we praise and thank the Almighty God, who showered abundant grace on us to
make this project a success. We also express our special thanks and gratitude to our families and
all our friends for their support and encouragement.
ABSTRACT

Heart attack is one of the most serious causes of morbidity in the world’s
population. Clinical data analysis treats cardiovascular disease, a very crucial disease,
as one of the most important areas for prediction. Data science and machine
learning (ML) can be very helpful in the prediction of heart attacks, in which different risk
factors such as high blood pressure, high cholesterol, abnormal pulse rate and diabetes can be
considered. The objective of this study is to optimize the prediction of heart disease using
ML. This prediction can help clinically in analyzing the risk factors of the disease and in
interpreting the patient scenario. Boosting the algorithm provided promising results in
predicting symptoms of heart disease. It can be optimized further by working on the risk
factors associated with this condition.

Contents

1 INTRODUCTION

2 LITERATURE SURVEY

3 THEORY OF THE PROJECT

4 METHODOLOGY

5 TECHNOLOGY

6 BLOCK DIAGRAM

7 LOGISTIC REGRESSION

8 TESTING

9 RESULT AND DISCUSSION

10 CONCLUSION

BIBLIOGRAPHY

11 SAMPLE CODE

12 SCREENSHOTS

RAJADHANI INSTITUTE OF SCIENCE AND TECHNOLOGY DEPT OF ECE



List of Figures
4.1 Proposed model
5.1 Machine learning
6.1 Block Diagram
7.1 Logistic Regression
9.1 Accuracy logistic regression
9.2 ROC curve
12.1 Print first 5 rows of the dataset
12.2 Print last 5 rows of the dataset
12.3 Some info about the data
12.4 Checking for missing values
12.5 Statistical measures about the data
12.6 Checking the distribution of Target Variable
12.7 Features
12.8 Target


Chapter 1

INTRODUCTION
A heart attack, also known as acute myocardial infarction (AMI), is one
of the most serious conditions in the category of cardiovascular disease. It occurs when
blood circulation to the heart muscle is interrupted, which damages the muscle.
Diagnosing heart disease is a crucial task that requires the symptoms, a physical examination,
and an understanding of the different signs of the disease.
Factors including cholesterol, genetic heart disease, high blood pressure, low
physical activity, obesity, and smoking can contribute to heart disease.
The major cause of heart attacks is the stoppage of blood flow in the coronary arteries. When
blood flow is reduced, fewer red blood cells (RBC) reach the tissue; as a result the human body
stops getting the oxygen it needs and the person loses consciousness. Early diagnosis through
symptoms and signs can help protect patients from heart attacks if the prediction is accurate
enough. The figure shows different symptoms of a heart attack. The work presented takes 14
features/attributes with numeric values as input. It has been stated that small modifications in
lifestyle, including quitting smoking, alcohol and tobacco, having healthy food habits, and routine
exercise, can help prevent heart attacks. A healthy lifestyle combined with
early treatment after diagnosis can greatly improve outcomes. However, it is difficult
to identify a high risk of heart disease when several risk factors such as diabetes, high blood
pressure, and cholesterol problems are present together. In these scenarios, ML can help in the
early diagnosis of disease.


Chapter 2

LITERATURE SURVEY

A thorough search was made of previous work in the domain of heart disease prediction
using different algorithms. Work from the previous 21 years was considered for study, and
its shortcomings were noted in order to extend our research. A total of 50 papers were collected
from Web of Science, ScienceDirect, and Scopus, of which 27 were selected for the
final study after removing duplicates and papers covering the same domain. A number of
works on disease prediction systems using different machine learning
algorithms have been carried out in medical centres.
Senthil Kumar Mohan et al. proposed Effective Heart Disease Prediction Using Hybrid
Machine Learning Techniques, whose objective is to find significant features
by applying machine learning, thereby improving the accuracy of cardiovascular disease
prediction. The prediction model is built with various combinations of features
and several known classification techniques. They report an improved performance level, with an
accuracy of 88.7 percent, through a prediction model for heart disease that uses a
hybrid random forest with a linear model (HRFLM). They also note that diverse data
mining approaches and prediction techniques such as KNN, LR, SVM, NN, and Vote have
recently become popular for identifying and predicting heart disease.
Sonam Nikhar et al. wrote the paper titled Prediction of Heart Disease Using
Machine Learning Algorithms. This research aims to give a detailed description
of the Naïve Bayes and decision tree classifiers that are applied in their work, especially
in the prediction of heart disease. Experiments were conducted to compare the performance
of predictive data mining techniques on the same dataset, and the results reveal that the
Decision Tree outperforms Bayesian classification.
Aditi Gavhane, Gouthami Kokkula, Isha Pandya and Prof. Kailas Devadkar (PhD), in
Prediction of Heart Disease Using Machine Learning, propose a system that
uses the multi-layer perceptron (MLP) neural network algorithm to train and test the dataset.
In this algorithm there are multiple layers: one for input, one for output, and one
or more hidden layers between the input and output layers. Each node in the
input layer is connected to the output nodes through these hidden layers, and each connection is
assigned a weight. There is an additional constant input called the bias, which has its own weight
and is added to the node to balance the perceptron. The connections between the nodes can be
feedforward or feedback, depending on the requirement.


Abhay Kishore et al. developed Heart Attack Prediction Using Deep Learning.
This paper proposes a heart attack prediction system using deep learning techniques,
specifically a Recurrent Neural Network, to predict the likelihood of heart-related diseases
in a patient. The Recurrent Neural Network is a very powerful classification
algorithm that uses the deep learning approach in artificial neural networks. The paper discusses
in detail the major modules of the system along with the related theory. The
proposed model combines deep learning and data mining to give accurate results with minimal errors.
This paper gives a direction and a precedent for the development of a new kind of heart
attack prediction platform.
Lakshmana Rao et al., in Machine Learning Techniques for Heart Disease Prediction, observe that
the contributing factors for heart disease are many (blood pressure, diabetes, current
smoking, high cholesterol, etc.), so it is difficult to identify heart disease. Different techniques
in data mining and neural networks have been used to determine the severity of heart
disease among people. The nature of CHD is complex, and therefore
the disease must be handled carefully. Failure to identify it early may affect the heart or
cause sudden death. The perspectives of medical science and data mining are
used for finding various sorts of metabolic syndromes. Machine learning is a technique that helps a
system learn from past data samples and examples without being explicitly programmed;
it builds its logic from historical data.
Mr. Santhana Krishnan.J and Dr. Geetha.S, in Prediction of Heart Disease Using Machine Learning
Algorithm, predict heart disease for male patients using classification techniques. Detailed
information about coronary heart disease, such as its facts, common types, and risk factors,
is explained in this paper. The data mining tool used is WEKA (Waikato Environment
for Knowledge Analysis), a good data mining tool for bioinformatics. All three
interfaces available in WEKA are used. Naive Bayes, Artificial Neural Networks and
Decision Trees are the main data mining techniques, and heart disease is predicted through these
techniques in this system. The main methodology used for prediction is decision trees such as the
CART, C4.5, CHAID, J48 and ID3 algorithms, together with Naive Bayes techniques.
Avinash Golande et al. proposed Heart Disease Prediction Using Effective Machine
Learning Techniques, in which specialists use several available data mining techniques
to help doctors identify heart disease. The commonly used
techniques are decision tree, k-nearest neighbour and Naïve Bayes. Other
classification-based techniques used are the bagging algorithm, kernel density, sequential
minimal optimization and neural networks, straight kernel self-organizing map and SVM
(Support Vector Machine). The following section gives details of the techniques that were
used.

V.V. Ramalingam et al. proposed Heart Disease Prediction Using Machine Learning
Techniques, in which machine learning algorithms and techniques are applied to various
medical datasets to automate the analysis of large and complex data. Many researchers, in
recent times, have been using several machine learning techniques to help the health care
industry and professionals in the diagnosis of heart-related diseases. The paper
presents a survey of various models based on such algorithms and techniques and analyses
their performance. Models based on supervised learning algorithms such as Support Vector
Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision Trees (DT),
Random Forest (RF) and ensemble models are found to be very popular among researchers.


Chapter 3

THEORY OF THE PROJECT


Today the greatest challenge for the medical industry is to provide health infrastructure with
better facilities, so that disease can be diagnosed at an initial stage and treated in time, improving
the quality of life through quality of service. Around 31 percent of deaths worldwide are
due to cardiac disease. Developing and underdeveloped countries lack the infrastructure,
technologies and doctors needed to predict the disease at an early stage, avoid
complications and reduce mortality. The growth of information and telecommunication technology
has benefited patients, rich and poor alike, by providing real-time information at a
lower cost of diagnosis and health monitoring. This has dramatically increased the amount of
detailed patient health records, and vast medical records are now available for research.
The medical industry faces enormous challenges in using this huge volume of medical data. Machines
can transform the vast amount of data into valuable and accurate information quickly,
which makes machine learning an important area. Machine learning models are highly useful
for discovering hidden patterns and correlations among the features in a dataset.
Medical datasets are inconsistent and redundant, so appropriate pre-processing is a pivotal step.
Various researchers have considered different risk features, of which the most prevalent are 14 features.
Feature selection is therefore an important part of the study, since the selected features
increase or decrease the prediction accuracy of the model. Predicting cardiac disease
with machine learning at greater accuracy will help healthcare providers diagnose and
treat patients at an early stage and in a short period of time,
thus saving millions of lives.


Chapter 4

METHODOLOGY
This report analyses several machine learning algorithms: K nearest neighbors (KNN),
Logistic Regression and Random Forest classifiers, which can help practitioners or medical
analysts diagnose heart disease accurately. The work includes examining journals, published
papers and recent data on cardiovascular disease. The methodology gives a framework for the
proposed model. It is a process whose steps transform the given
data into recognized data patterns for the knowledge of the users. The proposed methodology
(Figure 4.1) includes the following steps: the first step is the collection of the data, the second
stage extracts significant values, and the third is the preprocessing stage, where we explore
the data. Data preprocessing deals with missing values, cleaning of data and
normalization, depending on the algorithms used. After pre-processing, a classifier is used
to classify the pre-processed data; the classifiers used in the proposed model are KNN,
Logistic Regression and Random Forest. Finally, the proposed model is evaluated
on the basis of accuracy and performance using various
performance metrics. In this model, an Effective Heart Disease Prediction System
(EHDPS) has been developed using different classifiers. The model uses 14 medical
parameters such as chest pain, fasting sugar, blood pressure, cholesterol, age and sex for
prediction.
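The steps described in this chapter can be sketched in Python with scikit-learn. This is only an illustrative sketch, not the project code: the data is synthetic (generated with make_classification as a stand-in for the heart disease records), and the 80:20 split, the number of neighbours and the number of trees are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart disease records (assumption: 303 rows,
# 13 numeric inputs and one binary target); replace with the real dataset.
X, y = make_classification(n_samples=303, n_features=13, n_informative=6,
                           random_state=0)

# Hold out part of the data for evaluation (the 80:20 ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Normalization matters for KNN and helps logistic regression converge;
# random forest does not need it.
models = {
    "KNN": make_pipeline(StandardScaler(),
                         KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Train each classifier and record its accuracy on the held-out data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling statistics learned from the training set only, so no information from the test set leaks into training.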


Figure 4.1: Proposed model


Chapter 5

TECHNOLOGY
Machine learning is an application of AI that enables systems to learn and improve from
experience without being explicitly programmed. Machine learning focuses on developing
computer programs that can access data and use it to learn for themselves. Similar to how the
human brain gains knowledge and understanding, machine learning relies on input, such as
training data or knowledge graphs, to understand entities, domains and the connections
between them. With entities defined, deep learning can begin.

Figure 5.1: Machine learning

The machine learning process begins with observations or data, such as examples, direct
experience or instruction. It looks for patterns in data so it can later make inferences based on
the examples provided. The primary aim of ML is to allow computers to learn autonomously
without human intervention or assistance and adjust actions accordingly. Machine learning
as a concept has been around for quite some time. The term “machine learning” was coined by
Arthur Samuel, a computer scientist at IBM and a pioneer in AI and computer gaming. Samuel
designed a computer program for playing checkers. The more the program played, the more
it learned from experience, using algorithms to make predictions.
ML has proven valuable because it can solve problems at a speed and scale that cannot
be duplicated by the human mind alone. With massive amounts of computational ability behind
a single task or multiple specific tasks, machines can be trained to identify patterns in and
relationships between input data and automate routine processes.


Supervised Learning: More Control, Less Bias. Supervised machine learning algorithms
apply what has been learned in the past to new data using labeled examples to predict future
events. By analyzing a known training dataset, the learning algorithm produces an inferred
function to predict output values. The system can provide targets for any new input after
sufficient training. It can also compare its output with the correct, intended output to find
errors and modify the model accordingly.
Unsupervised Learning: Speed and Scale. Unsupervised machine learning algorithms are
used when the information used to train is neither classified nor labeled. Unsupervised learning
studies how systems can infer a function to describe a hidden structure from unlabeled data.
At no point does the system know the correct output with certainty. Instead, it draws inferences
from datasets as to what the output should be.
Reinforcement Learning: Reinforcement learning is a feedback-based learning method
in which a learning agent gets a reward for each right action and a penalty for each
wrong action. The agent learns automatically from this feedback and improves its
performance. In reinforcement learning, the agent interacts with the environment and
explores it. The goal of the agent is to collect the most reward points, and hence it improves its
performance. A robotic dog that automatically learns the movement of its limbs is an
example of reinforcement learning.
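The reward and penalty idea can be illustrated with a small tabular Q-learning sketch in Python. Everything here is a hypothetical toy setup, not part of the project: a five-cell corridor, a reward of +1 for reaching the last cell, a penalty of -1 for bumping into the left wall, and assumed values for the learning rate and discount factor. To keep the run reproducible, the agent tries every state-action pair in turn instead of exploring at random.

```python
# Tabular Q-learning on a five-cell corridor (hypothetical toy environment).
# Cells 0..3 are ordinary states; reaching cell 4 ends the episode.
GOAL = 4
ACTIONS = {"left": -1, "right": +1}
ALPHA = 0.5   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)


def step(state, action):
    """Return (next_state, reward) for one move in the corridor."""
    nxt = state + ACTIONS[action]
    if nxt < 0:
        return state, -1.0   # penalty: bumped into the left wall
    if nxt == GOAL:
        return nxt, 1.0      # reward: reached the goal
    return nxt, 0.0


# Q-table: estimated return of each action in each non-terminal state.
q = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL)}

# Sweep over every state-action pair instead of exploring at random,
# so that the result is the same on every run.
for _ in range(200):
    for s in range(GOAL):
        for a in ACTIONS:
            nxt, reward = step(s, a)
            future = 0.0 if nxt == GOAL else max(q[nxt].values())
            q[s][a] += ALPHA * (reward + GAMMA * future - q[s][a])

# The learned policy picks the action with the highest estimated return.
policy = {s: max(q[s], key=q[s].get) for s in range(GOAL)}
print(policy)
```

After training, the agent prefers moving right in every cell, because that action leads to the reward while moving left only delays it or incurs the penalty.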


Chapter 6

BLOCK DIAGRAM

Figure 6.1: Block Diagram


Data acquisition: The cardiac disease dataset was obtained from the UCI ML repository. It
contains 14 features and 303 records.
Data pre-processing: The cardiovascular disease UCI dataset is first loaded, and then data
cleaning and a search for missing values are performed on all records. The dataset contains
complete information. The attributes of the dataset are multiclass variables, with a
binary classification target.
Feature selection: Each patient record is identified uniquely by two of the 14 attributes of the
dataset, sex and age, and is assigned an individual id. The rest of
the features consist of medical information, which is vital for
predicting heart disease. Correlation is computed between all 14 attributes and the target value
in order to select the features with a high, positive correlation.
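Correlation-based selection of this kind can be sketched with pandas. The column names, the synthetic data and the cut-off of 0.3 are all assumptions made for illustration; the report does not state which threshold was used.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data with hypothetical column names.
rng = np.random.default_rng(0)
n = 303
target = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "feature_pos": target + rng.normal(0, 0.5, n),    # rises with the target
    "feature_neg": -target + rng.normal(0, 0.5, n),   # falls with the target
    "feature_noise": rng.normal(0, 1, n),             # unrelated to the target
    "target": target,
})

# Pearson correlation of every feature with the target, highest first.
corr = df.corr()["target"].drop("target").sort_values(ascending=False)
print(corr)

# Keep only features whose correlation is positive and above the cut-off.
THRESHOLD = 0.3  # assumed cut-off
selected = corr[corr > THRESHOLD].index.tolist()
print("selected features:", selected)
```

Note that keeping only positively correlated features, as described above, discards features with a strong negative correlation even though they can carry just as much information; selecting on the absolute value of the correlation is a common alternative.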
Splitting dataset: The dataset is split into training and testing sets in different percentage
ratios (five ratios are compared in Chapter 9).
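As a sketch of how such splits might be produced, the snippet below uses scikit-learn's train_test_split on a placeholder array with 303 rows, matching the record count given above. The five ratios 50:50 to 90:10 are an assumption (the report gives only the range), and the array contents are dummy values.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of records as the dataset (303).
X = np.arange(303 * 13).reshape(303, 13)
y = np.arange(303) % 2

# Assumed training percentages; the testing share is the remainder.
split_sizes = {}
for train_pct in (50, 60, 70, 80, 90):
    test_fraction = (100 - train_pct) / 100
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_fraction, random_state=0)
    split_sizes[train_pct] = (len(X_train), len(X_test))
    print(f"{train_pct}:{100 - train_pct} split -> "
          f"{len(X_train)} training rows, {len(X_test)} testing rows")
```

Because 303 is not divisible by ten, the test set size is rounded up, so the 90:10 split holds out 31 records rather than 30.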
Classification: One of the simplest and best ML classification algorithms is Logistic
Regression (LR). LR is a supervised ML binary classification algorithm widely used in most
applications. It works on a categorical dependent variable; the result is a discrete or binary
categorical variable, 0 or 1. The sigmoid function is used as the activation function (the cost
minimized during training is the log loss): it maps a predicted real value to a probability
between ‘0’ and ‘1’.
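A minimal sketch of the sigmoid mapping is given below in plain Python. The weights, bias and feature values are made-up numbers for illustration, and the log loss shown is the cost usually minimized when fitting logistic regression; the report itself does not list its cost function.

```python
import math


def sigmoid(z):
    """Map any real value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true, p):
    """Cost of predicting probability p when the true label is y_true."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# Hypothetical model parameters and one hypothetical patient record.
weights = [0.8, -0.5]
bias = 0.1
features = [1.2, 0.4]

# Linear score, then probability, then class label at the 0.5 threshold.
z = bias + sum(w * x for w, x in zip(weights, features))
p = sigmoid(z)
label = 1 if p >= 0.5 else 0

print(f"score z = {z:.2f}, probability = {p:.3f}, predicted class = {label}")
print(f"loss if truth is 1: {log_loss(1, p):.3f}, if truth is 0: {log_loss(0, p):.3f}")
```

The loss is small when the predicted probability agrees with the true label and grows quickly when the model is confidently wrong, which is what drives the weights towards better predictions during training.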
Model building: In this phase, we build our machine learning model for heart
disease detection.


Chapter 7

LOGISTIC REGRESSION
The World Health Organization has estimated that 12 million deaths occur worldwide every year
due to heart diseases. Half the deaths in the United States and other developed countries are
due to cardiovascular diseases. Early prognosis of cardiovascular diseases can aid in making
decisions on lifestyle changes in high-risk patients and in turn reduce complications.
This research intends to pinpoint the most relevant risk factors of heart disease as well as
predict the overall risk using logistic regression.
Data Preparation
Source: The dataset is publicly available on the Kaggle website, and it is from an ongoing
cardiovascular study on residents of the town of Framingham, Massachusetts. The
classification goal is to predict whether the patient has a 10-year risk of future coronary heart
disease (CHD). The dataset provides the patients’ information. It includes over 4,000 records
and 15 attributes.
Variables: Each attribute is a potential risk factor. There are demographic,
behavioral and medical risk factors.
Demographic:
• Sex: male or female (Nominal)
• Age: age of the patient (Continuous; although the recorded ages have been truncated to
whole numbers, the concept of age is continuous)
Behavioral:
• Current Smoker: whether or not the patient is a current smoker (Nominal)
• Cigs Per Day: the number of cigarettes that the person smoked on average in one day (can be
considered continuous, as one can have any number of cigarettes, even half a cigarette)
Medical (history):
• BP Meds: whether or not the patient was on blood pressure medication (Nominal)
• Prevalent Stroke: whether or not the patient had previously had a stroke (Nominal)
• Prevalent Hyp: whether or not the patient was hypertensive (Nominal)
• Diabetes: whether or not the patient had diabetes (Nominal)
Medical (current):
• Tot Chol: total cholesterol level (Continuous)
• Sys BP: systolic blood pressure (Continuous)
• Dia BP: diastolic blood pressure (Continuous)
• BMI: Body Mass Index (Continuous)
• Heart Rate: heart rate (Continuous; in medical research, variables such as heart rate, though
in fact discrete, are considered continuous because of the large number of possible values)
• Glucose: glucose level (Continuous)
Predict variable (desired target):
• 10-year risk of coronary heart disease CHD (binary: “1” means “Yes”, “0” means “No”)
Logistic Regression: Logistic regression is a type of regression analysis in statistics used to
predict the outcome of a categorical dependent variable from a set of predictor or
independent variables. In binary logistic regression, as used here, the dependent variable is always
binary. Logistic regression is mainly used for prediction and for calculating the probability of success.
The regression results show that some of the attributes have a P-value higher than the preferred
alpha (5 percent) and therefore a statistically weak relationship with the probability
of heart disease. A backward elimination approach is used here to remove the attribute
with the highest P-value, one at a time, running the regression repeatedly until all
attributes have P-values less than 0.05.
Feature Selection: Backward elimination (P-value approach)

Figure 7.1: Logistic Regression

Interpreting the results: Odds Ratio, Confidence Intervals and P-values
• The fitted model shows that, holding all other features constant, the odds of being diagnosed
with heart disease for males (sexmale = 1) relative to females (sexmale = 0) are
exp(0.5815) ≈ 1.79. In terms of percent change, the odds for males are about 79 percent higher
than the odds for females.
• The coefficient for age says that, holding all others constant, we will see a
7 percent increase in the odds of being diagnosed with CHD for a one-year increase
in age, since exp(0.0655) ≈ 1.07.
• Similarly, with every extra cigarette one smokes there is a 2 percent increase in the odds of CHD.
• For total cholesterol level and glucose level there is no significant change.
• There is a 1.7 percent increase in odds for every unit increase in systolic blood
pressure.
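The conversion from a logistic regression coefficient to an odds ratio and a percent change in odds can be checked with a few lines of Python. The two coefficients are the ones quoted above; the helper name odds_change_percent is hypothetical.

```python
import math


def odds_change_percent(beta):
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (math.exp(beta) - 1.0) * 100.0


# Coefficients quoted in the text (rounded to four decimal places there).
coefficients = {"sexmale": 0.5815, "age": 0.0655}

for name, beta in coefficients.items():
    odds_ratio = math.exp(beta)
    print(f"{name}: odds ratio = {odds_ratio:.3f}, "
          f"change in odds = {odds_change_percent(beta):+.1f} percent")
```

Because the coefficients are rounded before exponentiation, the last digits of the odds ratios can differ slightly from values computed from the unrounded model output.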
Model Evaluation - Statistics: From the evaluation statistics it is clear that the model is more
specific than sensitive. The negative values are predicted more accurately than the positives.
Predicted probabilities of 0 (No Coronary Heart Disease) and 1 (Coronary Heart Disease: Yes)
for the test data are obtained with a default classification threshold of 0.5.
Lower the threshold: Since the model is predicting heart disease, too many type II errors are
not advisable. A False Negative (ignoring the probability of disease when there actually is one)
is more dangerous than a False Positive in this case. Hence, in order to increase the sensitivity,
the threshold can be lowered.
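The effect of lowering the threshold can be illustrated with a short NumPy sketch. The labels and predicted probabilities below are made-up values, not model output, and the lower threshold of 0.3 is an assumption chosen for the example.

```python
import numpy as np


def sensitivity_specificity(y_true, y_prob, threshold):
    """Classify at the given threshold and return (sensitivity, specificity)."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)


# Made-up true labels and predicted probabilities of CHD for ten patients.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_prob = np.array([0.80, 0.45, 0.35, 0.20, 0.40, 0.30, 0.15, 0.10, 0.05, 0.55])

for threshold in (0.5, 0.3):
    sens, spec = sensitivity_specificity(y_true, y_prob, threshold)
    print(f"threshold {threshold}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this toy data, moving the threshold from 0.5 to 0.3 catches more of the true cases at the price of more false alarms, which is the trade-off described above.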
• All attributes selected after the elimination process show P-values lower than 5 percent,
suggesting a significant role in heart disease prediction.
• Men seem to be more susceptible to heart disease than women. Increases in age, number of
cigarettes smoked per day and systolic blood pressure also show increasing odds of having
heart disease.
• Total cholesterol shows no significant change in the odds of CHD. This could be due to the
presence of 'good cholesterol' (HDL) in the total cholesterol reading. Glucose too causes a very
negligible change in odds (0.2 percent).
• The model predicted with 0.88 accuracy. The model is more specific than sensitive. The overall
model could be improved with more data.


Chapter 8

TESTING
The testing would be carried out on the Hospital Management System while logged into
the system as a customer or a normal user of the system.

Unit testing tests each individual module of the software to check for errors. It is mainly
done to discover errors in the code of the Hospital Management System. The main goal of
unit testing is to isolate each part of the program and to check the correctness of its code.
In the case of the Hospital Management System, all the web forms and classes will be tested.

In integration testing, the individual software modules are combined and tested as a whole
unit. Integration testing generally follows unit testing, in which each module is tested as a
separate unit. The main purpose of integration testing is to test the functional and
performance requirements of the major items of the project.

Acceptance testing is generally performed when the project is nearing its end. This test
qualifies the project and decides whether it will be accepted by the users of the system. The
users or customers of the project are responsible for this test.

System testing is done on the whole integrated system to make sure that the project that
has been developed meets all the requirements.
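Unit testing of a single module can be sketched with Python's built-in `unittest` package. The function under test, `split_features_target`, is a hypothetical helper written only for this illustration, and the records are made-up values. The sketch shows one module being tested in isolation. It is not the project's actual test suite.

```python
import unittest


def split_features_target(records, target_key="target"):
    """Separate each record (a dict) into its feature dict and its target value."""
    features = [
        {key: value for key, value in record.items() if key != target_key}
        for record in records
    ]
    targets = [record[target_key] for record in records]
    return features, targets


class SplitFeaturesTargetTest(unittest.TestCase):
    def setUp(self):
        # Made-up records, used only to exercise the helper
        self.records = [
            {"age": 63, "chol": 233, "target": 1},
            {"age": 41, "chol": 204, "target": 0},
        ]

    def test_target_is_removed_from_features(self):
        features, _ = split_features_target(self.records)
        self.assertTrue(all("target" not in row for row in features))

    def test_targets_keep_row_order(self):
        _, targets = split_features_target(self.records)
        self.assertEqual(targets, [1, 0])

    def test_missing_target_raises(self):
        with self.assertRaises(KeyError):
            split_features_target([{"age": 50}])


# Run the test case explicitly so the outcome is available as a value
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SplitFeaturesTargetTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```

Integration testing would then combine such modules (loading, splitting, training) and check them together as a whole unit.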


Chapter 9

RESULT AND DISCUSSION


The logistic regression model was tested on the UCI dataset with five different
training/testing split ratios, and the accuracy was recorded for each. Logistic regression
obtained an accuracy of 87.10 percent for a training-to-testing split ratio of 90:10. The
increase in accuracy as the training share increases is shown in Figure 9.1.

Figure 9.1: Accuracy of logistic regression
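The split-ratio experiment can be sketched as follows. Since the UCI CSV file is not included here, the sketch assumes a synthetic stand-in generated with `make_classification` (303 rows and 13 features, sizes chosen as an assumption). The five ratios 50:50 to 90:10, the `random_state` values and `max_iter` are likewise assumptions. The accuracies it prints will therefore not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart disease data (assumption, not the UCI dataset)
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, random_state=0
)

accuracy_by_split = {}
for test_size in (0.5, 0.4, 0.3, 0.2, 0.1):
    # Stratify so each split keeps the same class balance as the full data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=2
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    accuracy_by_split[test_size] = accuracy_score(y_test, model.predict(X_test))

for test_size, accuracy in accuracy_by_split.items():
    train_pct = round((1 - test_size) * 100)
    test_pct = round(test_size * 100)
    print(f"{train_pct}:{test_pct} split -> accuracy {accuracy:.4f}")
```

To run the same loop on the project data, `X` and `y` would be replaced by the feature and target columns loaded from the CSV file.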

The accuracy of logistic regression increases as the training share increases from 50
percent to 90 percent, and the split with 90 percent training and 10 percent testing provides
the highest accuracy, 87.10 percent. The classification report gives the precision, recall,
F1-score and accuracy of the LR classifier for the UCI dataset with 90 percent training and
10 percent testing: the model has a precision of 0.857, a recall of 0.857, an F1-score of
0.857 and an accuracy of 87.10 percent.

The ROC (Receiver Operating Characteristic) curve was used to investigate the model
further. The ROC curve visualizes the performance of the model as the tradeoff between TPR
(True Positive Rate) and FPR (False Positive Rate). The area under the curve ranges from 0
to 1 and signifies the capability of the ML model to distinguish between the classes: the
nearer the area is to one, the more capable the model is of classifying correctly. Previous
research work carried out on Logistic Regression using RapidMiner and Python on the UCI
dataset from 2019 to 2021 is also compared in terms of prediction accuracy.
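The relationship between these metrics can be checked with a short worked example. The confusion-matrix counts below are an assumption: they are not taken from the report, but were chosen because they reproduce the reported values for a test set of 31 samples (roughly 10 percent of a 303-row dataset, which is also an assumption).

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1-score and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Assumed counts (illustrative): 12 true positives, 2 false positives,
# 2 false negatives and 15 true negatives, i.e. 31 test samples in total
metrics = classification_metrics(tp=12, fp=2, fn=2, tn=15)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# precision: 0.8571
# recall: 0.8571
# f1: 0.8571
# accuracy: 0.8710
```

Precision and recall are equal here because the assumed numbers of false positives and false negatives are the same, which also makes the F1-score equal to both.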


Figure 9.2: ROC curve.


Chapter 10

CONCLUSION
The early prognosis of cardiovascular diseases can aid in making decisions on lifestyle changes in
high-risk patients and in turn reduce complications, which can be a great milestone in the field
of medicine. This project applied feature selection, i.e. backward elimination and RFECV,
behind the model and successfully predicted heart disease with 85% accuracy. The model
used was Logistic Regression. As a further enhancement, models can be trained to predict the
types of cardiovascular disease and to provide recommendations to the users.




Chapter 11

SAMPLE CODE
FIRST PHASE
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Data Collection and Processing

# loading the csv data to a Pandas DataFrame
heart_data = pd.read_csv('/content/data.csv')

# print first 5 rows of the dataset
heart_data.head()

# print last 5 rows of the dataset
heart_data.tail()

# number of rows and columns in the dataset
heart_data.shape

# getting some info about the data
heart_data.info()

# checking for missing values
heart_data.isnull().sum()

# statistical measures about the data
heart_data.describe()

# checking the distribution of Target Variable
heart_data['target'].value_counts()

# Splitting the Features and Target
X = heart_data.drop(columns='target', axis=1)
Y = heart_data['target']

print(X)
print(Y)


Chapter 12

SCREENSHOTS
Data Collection and Processing
loaded the csv data to a Pandas DataFrame

Figure 12.1: print first 5 rows of the dataset

Figure 12.2: print last 5 rows of the dataset


Figure 12.3: some info about the data

Figure 12.4: checking for missing values


Figure 12.5: statistical measures about the data

Figure 12.6: checking the distribution of Target Variable

Splitting the Features and Target

Figure 12.7: Features


Figure 12.8: Target
