PythonHeartDisease FirstReview


PREDICTION OF HEART DISEASE USING VARIOUS DATA MINING TECHNIQUES
SYNOPSIS
The main motivation of this project is to present a model for predicting the occurrence of heart disease.

Further, this research work aims to identify the best classification algorithm for estimating the possibility of heart disease in a patient.

A convolutional neural network (CNN) architecture is used.
SYNOPSIS (CONTD)
In addition, support vector machine (SVM) based classification and K-Nearest Neighbor (KNN) based classification are also carried out, and the accuracy of each is measured.

The SVM, KNN and CNN classifiers are applied to predict heart disease more accurately on new data.

The coding language used is Python 3.7.
OBJECTIVES
• Identify the safe or risk category of heart records.

• Reduce the features of the data set for the convolutional neural network.

• Apply SVM/KNN classification so that the probability of safe/risk can be estimated for new test data.

• Reduce features before the SVM/KNN risk classification is carried out.
EXISTING SYSTEM
• The existing system focuses on KNN classification algorithms to detect heart disease categories in the dataset.

• The dataset is taken from the UCI repository. Preprocessing such as zero-value, N/A-value and Unicode character elimination is not carried out here.

• Important features are extracted for better classification.

• A confusion matrix is prepared along with an accuracy score calculation.
DISADVANTAGES OF EXISTING SYSTEM
SVM classification is not considered, so the probability of disease (yes/no) cannot be estimated for records in new test data.

Feature reduction before disease identification is not carried out.

Only data columns with numeric values are used for KNN classification.
PROPOSED SYSTEM
• The proposed system focuses on SVM classification algorithms as well as a neural network to detect risk categories in the dataset records.

• The dataset is taken from the UCI repository and preprocessed, for example by Unicode removal. Important features are extracted for better classification. A confusion matrix is prepared along with an accuracy score calculation.
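
The confusion matrix and accuracy score mentioned above can be illustrated with a minimal standard-library sketch. The label lists are hypothetical stand-ins, not project results, and the function names are chosen for illustration only; the slides do not say which library the project uses for this step.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Return a nested list m where m[i][j] counts records whose true
    label is labels[i] and whose predicted label is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy_score(y_true, y_pred):
    """Fraction of records whose predicted label equals the true label."""
    correct = sum(1 for a, p in zip(y_true, y_pred) if a == p)
    return correct / len(y_true)


# Hypothetical labels for eight test records: 0 = safe, 1 = risk.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

matrix = confusion_matrix(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)
print(matrix)    # rows: true safe / true risk, columns: predicted safe / risk
print(accuracy)
```

The diagonal of the matrix holds the correctly classified records, so the accuracy is the diagonal sum divided by the total number of records.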
PROPOSED SYSTEM (CONTD)
The accuracy is also calculated. A convolutional neural network based prediction model is built to evaluate algorithm efficiency.

540 training records and 210 test records are set aside for training and testing the convolutional neural network.

ADVANTAGE OF PROPOSED SYSTEM
• SVM classification is considered, so the probability of the safe/risk category can be estimated for new test data.

• Feature reduction before disease identification is carried out.

• SVM copes well even when the dataset is large.

• A convolutional neural network based prediction model is built to evaluate algorithm efficiency.
HARDWARE SPECIFICATION
Processor : Intel Core 2 Duo
Hard Disk Capacity : 500 GB
RAM : 4 GB DDR RAM
Monitor : 17-inch Color
Keyboard : 102 keys
Mouse : Optical Scroll


SOFTWARE SPECIFICATION

Operating System : Windows 10 Pro
Environment : IDLE/Colab
Language : Python 3.7


MODULES
1. DATA SET COLLECTION
2. DATA SET SUBSETTING BASED ON HEART DISEASE ATTRIBUTES
3. SVM/KNN CLASSIFICATION
4. CNN CLASSIFICATION
1. DATA SET COLLECTION

The dataset, which contains the columns (Age, Sex, Cp_Chestpaintype, t_restingbloodpressure_d, serumcholestrolinmg, fastingbloodsugarlevel, restecg, thalach, exang, oldpeak, peakslope, numvessels, thal, classfactor), is saved as records in a single Excel workbook. This is the input for the project.
2. DATA SET SUBSETTING BASED ON RISK TYPES
The dataset, which contains the columns (Age, Sex, Cp_Chestpaintype, t_restingbloodpressure_d, serumcholestrolinmg, fastingbloodsugarlevel, restecg, thalach, exang, oldpeak, peakslope, numvessels, thal, classfactor), is saved as records in a single Excel workbook.

This is the input for the project. From it, 540 records for training and 210 records for testing (each set containing both Safe 0 and Risk 1 records) are split off and given to the neural network.
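
The 540/210 subsetting described above could be sketched as follows. This is a minimal illustration using only the standard library; the slides do not say how the records are chosen, so the seeded random shuffle, the function name `split_records` and the stand-in records are all assumptions.

```python
import random


def split_records(records, n_train=540, n_test=210, seed=0):
    """Shuffle the records reproducibly and cut off a training subset
    and a non-overlapping test subset of the requested sizes."""
    if len(records) < n_train + n_test:
        raise ValueError("not enough records for the requested split")
    shuffled = list(records)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]


# Hypothetical stand-in data: 750 (features, classfactor) pairs,
# where classfactor is 0 for Safe and 1 for Risk.
records = [([i], i % 2) for i in range(750)]
train, test = split_records(records)
print(len(train), len(test))
```

A plain shuffle does not guarantee the same Safe/Risk ratio in both subsets; if that matters, the records of each class could be shuffled and cut separately.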
3. SVM/KNN CLASSIFICATION
In this module, 80% of the data in the given data set is taken as training data and 20% is taken as test data.

The text (categorical) columns are converted into numerical values.

The model is then trained on the training data and used to predict the test data.

Each test record is classified as disease present or not.
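
The steps of this module (80/20 split, categorical encoding, training, prediction) can be sketched with a from-scratch KNN classifier in plain Python. This is an illustration under assumptions, not the project's code: the records and their values are hypothetical, the helper names are invented here, and the SVM variant is not shown. In practice a library such as scikit-learn, which provides `KNeighborsClassifier` and `SVC`, would normally be used instead.

```python
import math
import random
from collections import Counter


def encode_categorical(rows):
    """Replace every text value with an integer code, column by column."""
    codes = {}  # (column index, text value) -> integer code
    encoded = []
    for row in rows:
        new_row = []
        for col, value in enumerate(row):
            if isinstance(value, str):
                key = (col, value)
                if key not in codes:
                    # next free code for this column
                    codes[key] = sum(1 for c, _ in codes if c == col)
                value = codes[key]
            new_row.append(float(value))
        encoded.append(new_row)
    return encoded


def train_test_split(features, labels, test_fraction=0.2, seed=0):
    """Seeded shuffle, then 80% training and 20% test by default."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    n_test = int(round(len(order) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return ([features[i] for i in train_idx], [features[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training records closest to the query."""
    def distance(row):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(row, query)))
    nearest = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0]))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Hypothetical records: [Age, Sex, t_restingbloodpressure_d]; 0 = safe, 1 = risk.
rows = ([[40 + i, "female", 120 + i] for i in range(10)] +
        [[65 + i, "male", 160 + i] for i in range(10)])
labels = [0] * 10 + [1] * 10

features = encode_categorical(rows)
train_x, test_x, train_y, test_y = train_test_split(features, labels)
predictions = [knn_predict(train_x, train_y, row) for row in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

The stand-in data forms two well-separated groups, so every test record is classified correctly; real heart records overlap far more, and the features would usually be scaled before distances are computed.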
4. CNN CLASSIFICATION
Here the dataset is loaded first. The data is stored as CSV (comma-separated values).

Each record contains the attribute values for one heart disease case.

Once the model is created, it can be imported and then compiled using ‘model.compile’. The model is trained for just ten epochs, but the number of epochs can be increased.

After the training process is completed, predictions can be made on the test set. The accuracy value is displayed during the iterations.
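
The compile, train and predict sequence described above might look like the following Keras sketch. The slides do not give the network architecture, so the layer choices (one `Conv1D` layer, 16 filters, kernel size 3), the optimizer, the loss and all function names here are assumptions made for illustration. It requires TensorFlow and NumPy to be installed.

```python
def add_channel_axis(rows):
    """Turn each record [f1, f2, ...] into [[f1], [f2], ...] so the batch
    has shape (records, features, 1), which Conv1D expects."""
    return [[[float(value)] for value in row] for row in rows]


def build_model(n_features):
    """Create and compile a small 1-D CNN (assumed architecture)."""
    # Imported here so the helper above works without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(n_features, 1)),
        layers.Conv1D(filters=16, kernel_size=3, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(8, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # probability of class 1 (risk)
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def train_and_predict(train_x, train_y, test_x, epochs=10):
    """Train for the given number of epochs and return 0/1 predictions."""
    import numpy as np

    model = build_model(n_features=len(train_x[0]))
    # With the default verbosity, Keras prints loss and accuracy per epoch.
    model.fit(np.array(add_channel_axis(train_x)),
              np.array(train_y, dtype="float32"),
              epochs=epochs)
    probabilities = model.predict(np.array(add_channel_axis(test_x)))
    return [int(p[0] >= 0.5) for p in probabilities]
```

Calling `train_and_predict(train_x, train_y, test_x)` with the 540 training and 210 test records would run the ten epochs mentioned above; raising `epochs` trains for longer.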
SYSTEM FLOW DIAGRAM
MACHINE LEARNING-BASED HEART DISEASE PREDICTION
[Flow diagram. Boxes: Download dataset from UCI repository; Data Set Preprocess (Remove N/A values); Subset train and test records for CNN; Feature Reduction for CNN; Classification (SVM/KNN Classification; CNN Classification in reduced feature set)]
LITERATURE SURVEY PAPERS
[1] Franck Le Duff, Cristian Muntean, Marc Cuggia and Philippe Mabo,
“Predicting Survival Causes After Out of Hospital Cardiac Arrest using Data
Mining Method”, Studies in Health Technology and Informatics, Vol. 107, No. 2,
pp. 1256-1259, 2004.
[2] W.J. Frawley and G. Piatetsky-Shapiro, “Knowledge Discovery in Databases:
An Overview”, AI Magazine, Vol. 13, No. 3, pp. 57-70, 1996.
[3] Kiyong Noh, HeonGyu Lee, Ho-Sun Shon, Bum Ju Lee and Keun Ho Ryu,
“Associative Classification Approach for Diagnosing Cardiovascular Disease”,
Intelligent Computing in Signal Processing and Pattern Recognition, Vol. 345,
pp. 721-727, 2006.
[4] Latha Parthiban and R. Subramanian, “Intelligent Heart Disease Prediction
System using CANFIS and Genetic Algorithm”, International Journal of
Biological, Biomedical and Medical Sciences, Vol. 3, No. 3, pp. 1-8, 2008.
[5] Sellappan Palaniappan and Rafiah Awang, “Intelligent Heart Disease
Prediction System using Data Mining Techniques”, International Journal of
Computer Science and Network Security, Vol. 8, No. 8, pp. 1-6, 2008
CONCLUSION
The project focuses on SVM classification algorithms to detect heart disease types.

The dataset is taken from the UCI repository. Preprocessing such as zero-value, N/A-value and Unicode character elimination is carried out here. Important features are extracted for better classification.

A confusion matrix is prepared along with an accuracy score calculation.

In addition, KNN classification algorithms as well as a neural network are applied to detect risk types.

The dataset is preprocessed, for example by Unicode removal, and important features are extracted for better classification.
THANK YOU
