Heart Disease Identification Using Machine Learning Classification


ABSTRACT

In this article, we propose an efficient and accurate system for
diagnosing heart disease based on machine learning techniques. The
system is built on classification algorithms, including support
vector machine, logistic regression, artificial neural network,
K-nearest neighbor, naïve Bayes, and decision tree, combined with
standard feature selection algorithms. Feature selection is used to
increase classification accuracy and reduce the execution time of
the classification system. Furthermore, leave-one-subject-out
cross-validation is used for model assessment and hyperparameter
tuning. Performance metrics are used to assess the classifiers,
which are evaluated on the features chosen by the feature selection
algorithms. The experimental results show that the proposed feature
selection algorithm (FCMIM), paired with a support vector machine
classifier, is feasible for designing a high-level intelligent
system to identify heart disease. Additionally, the proposed system
can easily be implemented in healthcare for the identification of
heart disease.
INDEX
TABLE OF CONTENTS

CHAPTER NO. TITLE
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
2 LITERATURE REVIEW
3 AIM AND SCOPE OF THE PROJECT
3.1 AIM OF THE PROJECT
3.2 SCOPE OF THE PROJECT
3.2.1 PROPOSED SYSTEM
3.2.2 ADVANTAGES
3.2.3 DISADVANTAGES
4 WORKING THEORY OF THE PROJECT
4.1 MACHINE LEARNING
4.2 GATHERING DATA
4.3 DATA PRE-PROCESSING
4.4 RESEARCHING THE MODEL THAT WILL BE BEST FOR THE TYPE OF DATA
4.5 TRAINING AND TESTING THE MODEL OF DATA
4.6 EVALUATION
5 IMPLEMENTATION AND METHODOLOGY
5.1 SOFTWARE REQUIREMENT
5.2 HARDWARE REQUIREMENT
5.3 MODULE NAME
5.3.1 DATASET COLLECTION
5.3.2 PRE-PROCESSING
5.3.3 FEATURE EXTRACTION
5.3.4 MODEL TRAINING
5.3.5 TESTING MODEL
5.3.6 PERFORMANCE EVALUATION
5.3.7 PREDICTION
6 RESULT AND DISCUSSION
7 CONCLUSION AND FUTURE WORK
7.1 CONCLUSION
7.2 FUTURE WORK
8 REFERENCE
9 APPENDIX
A. SOURCE CODE
B. OUTPUT SCREENSHOTS
C. PLAGIARISM REPORT

LIST OF FIGURES

FIGURE NO. TITLE
4.1 RESEARCHING THE MODEL
4.2 CLASSIFICATION
4.3 REGRESSION
4.4 CLUSTERED DATA
4.5 OVERVIEW OF MODEL
4.6 TRAINING AND TESTING
4.7 DATA SEGMENTATION
4.8 CONFUSION MATRIX
4.9 ARCHITECTURAL DIAGRAM OF OUR MODEL
6.1 TEST ACCURACY OF RANDOM FOREST
6.2 TEST ACCURACY OF KNN
6.3 TEST ACCURACY OF LOGISTIC REGRESSION
6.4 COMPARISON OF THREE ALGORITHMS
6.5 WEB PAGE INTERFACE

LIST OF ABBREVIATIONS

ML – Machine Learning

CNN – Convolutional Neural Network

SVM – Support Vector Machine

KNN – K-Nearest Neighbor

RF – Random Forest

CHAPTER – 1

INTRODUCTION

1.1 Introduction
Heart disease (HD) is a critical health issue, and numerous people around the world
suffer from it. HD commonly presents with symptoms such as shortness of breath,
physical weakness, and swollen feet. Researchers are seeking an efficient technique
for detecting heart disease, because current diagnostic techniques are not very
effective at early identification for several reasons, such as limited accuracy and
long execution time. Diagnosis and treatment of heart disease are extremely difficult
when modern technology and medical experts are not available, yet effective diagnosis
and proper treatment can save many lives. According to the European Society of
Cardiology, approximately 26 million people have been diagnosed with HD, and
3.6 million are diagnosed annually. Many people in the United States also suffer from
heart disease. Diagnosis of HD is traditionally carried out by a physician through
analysis of the patient's medical history, the physical examination report, and the
relevant symptoms. However, the results of this method are not accurate in identifying
patients with HD; moreover, it is expensive and computationally difficult to analyze.
There is therefore a need to develop a non-invasive diagnosis system based on machine
learning classifiers to resolve these issues. Expert decision systems based on machine
learning classifiers and the application of artificial fuzzy logic can diagnose HD
effectively and, as a result, reduce the death rate.

CHAPTER -2

LITERATURE REVIEW

2.1 LITERATURE REVIEW


Rahul Katarya (2020): Predicting and detecting heart disease has always been a
critical and challenging task for healthcare practitioners. Hospitals and clinics
offer expensive therapies and operations to treat heart disease, so predicting it at
an early stage helps people around the world take the necessary action before it
becomes severe. Heart disease is a significant problem in recent times; the main
causes named are the intake of alcohol and tobacco and a lack of physical exercise.
Over the years, machine learning has shown effective results in making decisions and
predictions from the broad set of data produced by the healthcare industry. The
supervised machine learning techniques used for heart disease prediction include
artificial neural network (ANN), decision tree (DT), random forest (RF), support
vector machine (SVM), naïve Bayes (NB), and the k-nearest neighbor algorithm. The
performance of these algorithms is also summarized.

Dr. M. Kavitha (2021): Heart disease causes significant mortality around the world
and has become a health threat for many people. Early prediction of heart disease may
save many lives, but detecting cardiovascular conditions such as heart attacks and
coronary artery disease is a critical challenge for regular clinical data analysis.
Machine learning (ML) can provide an effective solution for decision making and
accurate prediction, and the medical industry is showing enormous development in its
use of machine learning techniques. The study proposes a novel machine learning
approach to predict heart disease. It uses the Cleveland heart disease dataset and
applies data mining techniques such as regression and classification. Three machine
learning algorithms are implemented: 1. Random Forest, 2. Decision Tree, and
3. a hybrid model (a hybrid of Random Forest and Decision Tree). Experimental results
show an accuracy of 88.7% for the heart disease prediction model with the hybrid
model. An interface is designed to take the user's input parameters and predict heart
disease using the hybrid model of Decision Tree and Random Forest.
Abderrahmane Ed-daoudy (2019): Over the last few decades, heart disease has been the
most common cause of death globally, so early detection and continuous monitoring can
reduce the mortality rate. Sources such as wearable sensor devices used in Internet of
Things health monitoring, streaming systems, and others generate an enormous and
exponentially growing amount of data on a continuous basis. The combination of
streaming big data analytics and machine learning is a breakthrough technology that
can have a significant impact in healthcare, especially in the early detection of
heart disease, and it can be more powerful and less expensive. To handle this data,
the paper proposes a real-time heart disease prediction system based on Apache Spark,
a large-scale distributed computing platform that can apply machine learning to
streaming data events through in-memory computation. The system consists of two main
parts: stream processing, and data storage and visualization. The first uses Spark
MLlib with Spark Streaming and applies a classification model to data events to
predict heart disease. The second uses Apache Cassandra to store the large volume of
generated data.

Rahma Atallah (2019): This paper presents a majority voting ensemble method that is
able to predict the possible presence of heart disease in humans. The prediction is
based on simple, affordable medical tests that can be conducted in any local clinic.
The aim of the project is to give more confidence and accuracy to the doctor's
diagnosis, since the model is trained on real-life data from healthy and ill patients.
The model classifies a patient by the majority vote of several machine learning
models in order to provide more accurate results than a single model. This approach
produced an accuracy of 90% with the hard voting ensemble model.

Noor Basha (2019): The analysis and prediction of diseases are two of the most
demanding tasks facing doctors and data scientists, and data analytics is an area of
considerable interest in this regard. Many health organizations work on a variety of
human conditions and generate huge amounts of data in the process. Heart disease,
cancer, tumors, and Alzheimer's disease are among the chronic human diseases that data
scientists and doctors analyze rapidly and efficiently with machine learning
techniques, studying and predicting them in order to reduce human deaths.

CHAPTER - 3

AIM AND SCOPE OF THE PROJECT

3.1 AIM OF THE PROJECT


• The AIM Cardiology Solution is a cardiology benefits management program that
helps ensure clinically appropriate, cost-effective care for members with heart
disease.

3.2 SCOPE OF THE PROJECT

• The goal of our heart disease prediction project is to determine whether a patient
should be diagnosed with heart disease, which is a binary outcome.

• Positive result = 1: the patient is diagnosed with heart disease.

• Negative result = 0: the patient is not diagnosed with heart disease.
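The binary encoding above can be illustrated with a minimal sketch. It assumes the classifier returns a probability of heart disease and that 0.5 is the decision threshold; both the function name `to_binary_label` and the threshold are illustrative assumptions, not values taken from this project.

```python
def to_binary_label(probability, threshold=0.5):
    """Map a predicted probability of heart disease to the binary outcome.

    Returns 1 (positive) or 0 (negative). The 0.5 threshold is an assumed
    default chosen for illustration only.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 1 if probability >= threshold else 0


# Hypothetical classifier outputs for four patients.
predicted_probabilities = [0.91, 0.12, 0.50, 0.49]
labels = [to_binary_label(p) for p in predicted_probabilities]
print(labels)
```

In a clinical setting the threshold would normally be tuned, for example lowered to reduce missed positive cases at the cost of more false alarms.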

3.2.1 Proposed system

• In the proposed work, the user searches for a heart disease diagnosis (heart
disease and treatment-related information) by entering symptoms as a query in the
search engine.

• These symptoms are pre-processed to make it easier to find the symptom keywords
that help identify heart disease quickly.

• CFS+PSO is a type of instance-based learning, or lazy learning, in which the
function is only approximated locally and all computation is deferred until
classification.

• This feature has been identified as the most suitable for the present system.
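The idea of lazy, instance-based learning mentioned above, where nothing is computed until a query must be classified, can be sketched with a plain k-nearest-neighbor classifier (KNN appears in the list of abbreviations). This is an illustrative assumption rather than the project's CFS+PSO procedure, and the two features and their values are hypothetical toy data.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    There is no training step: the stored instances are only consulted at
    prediction time, which is what "lazy learning" refers to.
    """
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy records: (age, resting blood pressure) and a 0/1 label.
train_X = [(63, 145), (67, 160), (41, 130), (37, 120), (58, 150), (45, 125)]
train_y = [1, 1, 0, 0, 1, 0]

print(knn_predict(train_X, train_y, (60, 148)))
print(knn_predict(train_X, train_y, (40, 122)))
```

Because distances are computed on raw values, features with large numeric ranges dominate the vote, so in practice the inputs would be scaled first.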

3.2.2 Advantages
1. Signatures are easy to extract from individual data instances because of their
structure; it is enough to collect the symptoms needed to scale the samples.
2. The level and severity of heart disease can be predicted easily using range-level
queries.
3. Accounting for the likely vocabulary gap between diverse health seekers makes the
data more consistent than other formats of health data.

3.2.3 Disadvantages
1. Existing systems have failed to recognize and make use of the importance of
misdiagnosis, a very important attribute that interconnects all of these issues.
2. Misdiagnosis varies with the patient's medical history, climatic conditions,
neighborhood, and various other factors.

CHAPTER 4

WORKING THEORY OF OUR PROJECT

4.1 Machine Learning

The 7 steps of machine learning are:

• Step 1: Gathering data
• Step 2: Preparing that data
• Step 3: Choosing a model
• Step 4: Training
• Step 5: Evaluation
• Step 6: Hyperparameter tuning
• Step 7: Prediction

Introduction:

In this chapter, we discuss the workflow of a machine learning project, including all
the steps required to build a proper machine learning project from scratch. We also go
over data pre-processing, data cleaning, feature exploration, and feature engineering,
and show the impact they have on machine learning model performance. Finally, we cover
a couple of pre-modeling steps that can help improve model performance.

Python libraries needed for the task:

1. NumPy
2. Pandas
3. scikit-learn
4. Matplotlib

Understanding the machine learning workflow

We can define the machine learning workflow in 5 stages.

1. Gathering data
2. Data pre-processing
3. Researching the model that will be best for the type of data
4. Training and testing the model
5. Evaluation
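The five stages above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions: the data is synthetic (generated by `make_classification`, not a heart disease dataset), logistic regression is chosen only as an example model, and the 80/20 train/test split is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Gathering data: a synthetic binary classification set stands in
#    for real patient records (assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Hold out 20% of the rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2. and 3. Pre-processing (feature scaling) and the chosen model,
#    combined so the scaler is fitted on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 4. Training and testing the model.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# 5. Evaluation on the held-out test rows.
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in one pipeline keeps test data out of the pre-processing step, which avoids an overly optimistic accuracy estimate.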

But first, let's start with the basics.

What is a machine learning model?

A machine learning model is nothing but a piece of code that an engineer or data
scientist makes smart by training it with data. So, if you give garbage to the model,
you will get garbage in return; that is, the trained model will produce false or wrong
predictions.

4.2 Gathering Data


The process of gathering data depends on the type of project we want to build. If we
want an ML project that uses real-time data, we can build an IoT system that collects
data from different sensors. A data set can be collected from various sources such as
files, databases, and sensors, but the collected data cannot be used directly for
analysis, because it may contain a lot of missing data, extremely large values,
unorganized text, or noise. Data preparation is done to solve this problem. We can
also use free data sets that are available on the internet. Kaggle and the UCI Machine
Learning Repository are among the repositories used most often for building machine
learning models. Kaggle is one of the most visited websites for practicing machine
learning algorithms; it also hosts competitions in which people can participate and
test their knowledge of machine learning.
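A first look at collected data for the problems named above (missing data and extremely large values) might look like the following sketch with pandas. The column names `age`, `chol`, and `target` and all values are hypothetical, the CSV is embedded as a string so the example is self-contained, and the 1.5 x IQR rule is one common convention for flagging extreme values rather than the method used in this project.

```python
import io

import pandas as pd

# Hypothetical raw records; in practice this would be a file or database.
raw_csv = io.StringIO(
    "age,chol,target\n"
    "63,233,1\n"
    "41,,0\n"
    "57,9999,1\n"
    "45,204,0\n"
    "52,212,0\n"
    "60,250,1\n"
    "39,198,0\n"
    "55,240,1\n"
)
df = pd.read_csv(raw_csv)

# Count missing entries in every column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Flag cholesterol values far outside the interquartile range.
q1, q3 = df["chol"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["chol"] < q1 - 1.5 * iqr) | (df["chol"] > q3 + 1.5 * iqr)]
print(outliers)
```

Rows flagged this way are candidates for review, not automatic deletion, since an extreme reading can be either a data entry error or a genuine clinical value.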

4.3 Data pre-processing


Data pre-processing is one of the most important steps in machine learning, because it
helps in building more accurate machine learning models. A commonly cited 80/20 rule
of thumb in machine learning holds that a data scientist should spend about 80% of the
time on data pre-processing and 20% on actually performing the analysis.
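Two typical pre-processing operations, filling in missing values and putting features on a common scale, can be sketched with scikit-learn as follows. The tiny array and its two columns are hypothetical, and median imputation followed by standardization is one reasonable choice among several, not necessarily the one used in this project.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical raw features with one missing value (np.nan).
X_raw = np.array(
    [
        [63.0, 233.0],
        [41.0, np.nan],
        [57.0, 250.0],
        [45.0, 204.0],
    ]
)

# Replace missing entries with the column median, then rescale each
# column to zero mean and unit variance.
preprocess = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X_clean = preprocess.fit_transform(X_raw)
print(X_clean)
```

When a separate test set exists, the pipeline would be fitted on the training rows only and then applied to the test rows with `preprocess.transform`.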