Heart Disease Identification Using Machine Learning Classification
5.3.4 MODEL TRAINING
5.3.5 TESTING MODEL
5.3.6 PERFORMANCE EVALUATION
5.3.7 PREDICTION
6 RESULT AND DISCUSSION
7 CONCLUSION AND FUTURE WORK
7.1 CONCLUSION
7.2 FUTURE WORK
8 REFERENCES
9 APPENDIX
A. SOURCE CODE
B. OUTPUT SCREENSHOTS
C. PLAGIARISM REPORT
LIST OF FIGURES
LIST OF ABBREVIATIONS
ML – Machine Learning
RF – Random Forest
CHAPTER – 1
INTRODUCTION
1.1 Introduction
Heart disease (HD) is a critical health issue, and numerous people around the world
suffer from it. Common symptoms of HD include shortness of breath, physical weakness,
and swollen feet. Researchers are trying to find an efficient technique for detecting
heart disease, because current diagnosis techniques are not very effective at early
identification for several reasons, such as accuracy and execution time. Diagnosing and
treating heart disease is extremely difficult when modern technology and medical
experts are not available, whereas effective diagnosis and proper treatment can save
many lives. According to the European Society of Cardiology, approximately 26 million
people have been diagnosed with HD, and 3.6 million are newly diagnosed each year.
Many people in the United States suffer from heart disease. Diagnosis of HD is
traditionally done by a physician who analyzes the patient's medical history, the
physical examination report, and the relevant symptoms. However, the results obtained
with this method are not accurate in identifying patients with HD. Moreover, it is
expensive and computationally difficult to analyze. Thus, a non-invasive diagnosis
system based on machine learning classifiers is needed to resolve these issues. Expert
decision systems based on machine learning classifiers and artificial fuzzy logic can
diagnose HD effectively and, as a result, reduce the death rate.
CHAPTER – 2
LITERATURE REVIEW
Dr. M. Kavitha (2021): Heart disease causes a significant mortality rate around the
world, and it has become a health threat for many people. Early prediction of heart
disease may save many lives; detecting cardiovascular diseases such as heart attacks
and coronary artery disease is a critical challenge for regular clinical data analysis.
Machine learning (ML) can provide an effective solution for decision making and
accurate prediction, and the medical industry is making enormous progress in applying
machine learning techniques. The work proposes a novel machine learning approach to
predict heart disease. It uses the Cleveland heart disease dataset together with data
mining techniques such as regression and classification. Three machine learning models
are implemented: 1. Random Forest, 2. Decision Tree, and 3. a hybrid model combining
Random Forest and Decision Tree. Experimental results show an accuracy of 88.7% for
the heart disease prediction model with the hybrid model. An interface is designed to
take the user's input parameters and predict heart disease using the hybrid Decision
Tree and Random Forest model.
Abderrahmane Ed-daoudy (2019): Over the last few decades, heart disease has been the
most common cause of death globally, so early detection of heart disease and
continuous monitoring can reduce the mortality rate. Data from sources such as
wearable sensor devices used in Internet of Things health monitoring and streaming
systems is growing exponentially, producing an enormous amount of data on a continuous
basis. The combination of streaming big data analytics and machine learning is a
breakthrough technology that can have a significant impact on the healthcare field,
especially the early detection of heart disease, and it can be more powerful and less
expensive. To take advantage of this, the paper proposes a real-time heart disease
prediction system based on Apache Spark, a strong large-scale distributed computing
platform that can successfully apply machine learning to streaming data events through
in-memory computation. The system consists of two main subparts: stream processing,
and data storage and visualization. The first uses Spark MLlib with Spark Streaming
and applies a classification model to data events to predict heart disease. The second
uses Apache Cassandra to store the large volume of generated data.
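The cited system runs on Spark and Cassandra, which are not reproduced here. As a plain-Python simulation of the same flow, assuming a model trained offline and events arriving one at a time, the loop below groups events into micro-batches, scores each batch, and appends the results to an in-memory list that stands in for the storage layer. `sensor_stream` and `micro_batches` are hypothetical helpers, not Spark APIs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train offline on historical (here: synthetic) records, standing in for the batch model.
X_hist, y_hist = make_classification(n_samples=500, n_features=13, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_hist, y_hist)


def sensor_stream(n_events, n_features=13, seed=2):
    """Hypothetical event source yielding (event_id, feature_vector) pairs."""
    rng = np.random.default_rng(seed)
    for event_id in range(n_events):
        yield event_id, rng.normal(size=n_features)


def micro_batches(stream, batch_size):
    """Group incoming events into small batches, mimicking micro-batch processing."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


# In-memory list standing in for the storage layer (Cassandra in the cited paper).
storage = []
for batch in micro_batches(sensor_stream(23), batch_size=5):
    ids = [event_id for event_id, _ in batch]
    features = np.vstack([vec for _, vec in batch])
    predictions = model.predict(features)
    storage.extend(zip(ids, predictions.tolist()))

print(f"Stored {len(storage)} predictions")
```

Separating the scoring loop from the storage target mirrors the paper's two subparts: swapping the list for a database writer would not change the prediction step.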
Rahma Atallah (2019): This paper presents a majority voting ensemble method that
predicts the possible presence of heart disease in humans. The prediction is based on
simple, affordable medical tests that can be conducted in any local clinic. The aim of
the work is to give more confidence and accuracy to the doctor's diagnosis, since the
model is trained on real-life data from healthy and ill patients. The model classifies
a patient based on the majority vote of several machine learning models, in order to
give more accurate results than a single model. This approach achieved an accuracy of
90% with the hard voting ensemble model.
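Hard (majority) voting can be sketched with scikit-learn's `VotingClassifier`. The three base models below are assumptions chosen for illustration, not necessarily the ones used in the cited paper, and synthetic data replaces the clinical records. An odd number of voters avoids ties in a binary task.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for patient records (assumption).
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# voting="hard": each base model casts one vote and the majority class wins.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
accuracy = accuracy_score(y_test, ensemble.predict(X_test))
print(f"Hard-voting accuracy: {accuracy:.3f}")
```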
Noor Basha (2019): Analysis and prediction of diseases are two of the most demanding
challenges faced by doctors and data scientists, and data analytics is a very
attractive area in this regard; many health industries work on a variety of human
syndromes and generate huge amounts of data. Heart disease, cancer, tumors, and
Alzheimer's disease are among the chronic human diseases that data scientists and
doctors analyze rapidly and efficiently using many machine learning techniques, in
order to study and predict these diseases and reduce human deaths.
CHAPTER – 3
Negative result = 0: the patient will not be diagnosed with heart disease.
In the proposed work, the user searches for a heart disease diagnosis (heart
disease and treatment-related information) by entering symptoms as a query in the
search engine.
These symptoms are pre-processed to make it easier to find the symptom keywords,
which helps to identify heart disease quickly.
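The pre-processing step can be illustrated with a small keyword matcher. This is a minimal sketch assuming a fixed symptom vocabulary: `SYMPTOM_KEYWORDS`, `normalize`, and `extract_symptoms` are hypothetical names, and the keyword list is illustrative (partly drawn from the symptoms named in Chapter 1), not the system's real vocabulary.

```python
import re

# Hypothetical vocabulary of symptom keywords; the report does not list the real one.
SYMPTOM_KEYWORDS = {
    "chest pain",
    "shortness of breath",
    "weakness",
    "swollen feet",
    "fatigue",
    "dizziness",
}


def normalize(query):
    """Lower-case the query and collapse punctuation and extra whitespace."""
    text = re.sub(r"[^a-z\s]", " ", query.lower())
    return " ".join(text.split())


def extract_symptoms(query):
    """Return the known symptom keywords that appear in the user's query."""
    # Pad with spaces so a keyword only matches on whole-word boundaries.
    text = f" {normalize(query)} "
    return sorted(k for k in SYMPTOM_KEYWORDS if f" {k} " in text)


print(extract_symptoms("Chest pain, and shortness of breath since yesterday!"))
# ['chest pain', 'shortness of breath']
```

Exact phrase matching is the simplest choice; stemming or synonym lists would be needed to catch paraphrases such as "my feet are swollen".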
CFS+PSO (correlation-based feature selection combined with particle swarm
optimization) is used together with instance-based, or lazy, learning, in which the
function is only approximated locally and all computation is deferred until
classification.
This approach has been identified as the most suitable for the present system.
3.2.2 Advantages
1. Signatures are easy to extract from individual data instances because of their
structure; only enough symptoms need to be collected to scale the samples.
2. The heart disease level and severity can be predicted easily using range-level
queries.
3. A lower probability of a vocabulary gap between diverse health seekers makes the
data more consistent compared with other formats of health data.
3.2.3 Disadvantages
1. Existing systems have failed to understand and account for the importance of
misdiagnosis, a very important attribute that interconnects all these issues.
2. The diagnosis varies with the patient's medical history, climatic conditions,
neighborhood, and various other factors.
CHAPTER – 4
Introduction:
In this chapter, we discuss the workflow of a machine learning project, which includes
all the steps required to build a proper machine learning project from scratch.
We also go over data pre-processing, data cleaning, feature exploration, and feature
engineering, and show their impact on machine learning model performance. We also
cover a couple of pre-modeling steps that can help improve model performance.
1. Gathering data
2. Data pre-processing
3. Researching the model that will be best for the type of data
4. Training and testing the model
5. Evaluation
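The five steps above can be sketched end to end with scikit-learn. This is a minimal illustration, assuming synthetic data in place of the real heart disease dataset and a Random Forest as the chosen model; it is not the project's actual source code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Gathering data: synthetic stand-in for the heart disease dataset (assumption).
X, y = make_classification(n_samples=303, n_features=13, random_state=7)

# Hold out a test set before any fitting so the evaluation stays unbiased.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7
)

# 2 + 3. Pre-processing (feature scaling) chained with the chosen model.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=200, random_state=7),
)

# 4. Training and testing.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# 5. Evaluation.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))
```

Tree-based models do not need feature scaling; the scaler is included only to show where pre-processing sits in the pipeline, and it becomes important if the model is swapped for a distance-based or linear classifier.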
The machine learning model is nothing but a piece of code; an engineer or data
scientist makes it smart by training it with data. So, if you give garbage to the
model, you will get garbage in return, i.e. the trained model will make false or wrong
predictions.
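A small cleaning step shows how "garbage" can be removed before training. This is a minimal sketch with a hypothetical five-row sample: the column names and values are illustrative, and the use of "?" as a missing-value marker is an assumption (a convention seen in some public heart disease files). `clean` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; rows 2 and 3 are exact duplicates and "?" marks missing values.
raw = pd.DataFrame(
    {
        "age": [63, 37, 41, 41, 56],
        "chol": [233, 250, "?", "?", 236],
        "thal": ["6", "3", "3", "3", "?"],
        "target": [1, 0, 0, 0, 1],
    }
)


def clean(df):
    """Drop exact duplicates, coerce columns to numbers, and impute missing values."""
    df = df.drop_duplicates()
    # Turn the "?" marker into NaN, then convert every column to a numeric dtype.
    df = df.replace("?", np.nan).apply(pd.to_numeric)
    # Median imputation keeps every row; dropping incomplete rows is the alternative.
    return df.fillna(df.median())


cleaned = clean(raw)
print(cleaned)
```

Median imputation is less sensitive to outliers than the mean, which matters for skewed clinical measurements such as cholesterol.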