Heart Disease Prediction Using Frequent Item Set Mining and Classification
Technique

Article in International Journal of Information Engineering and Electronic Business · November 2019
DOI: 10.5815/ijieeb.2019.06.02

I.J. Information Engineering and Electronic Business, 2019, 6, 9-15
Published Online November 2019 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijieeb.2019.06.02

Heart Disease Prediction Using Frequent Item Set Mining and Classification Technique

Sinkon Nayak
School of Computer Engineering, KIIT Deemed University, Bhubaneswar, India
Email: [email protected]

Mahendra Kumar Gourisaria, Manjusha Pandey, Siddharth Swarup Rautaray


School of Computer Engineering, KIIT Deemed University, Bhubaneswar, India
Email: [email protected], [email protected], [email protected]

Received: 30 April 2019; Accepted: 30 May 2019; Published: 08 November 2019

Abstract—The heart is the most important part of the human body. Any abnormality in the heart results in heart-related illness, in which blood vessels are obstructed, causing heart attack, chest pain or stroke. The main goal is the care and improvement of health through the identification, prevention, and treatment of disease. To this end, various predictive analysis methods are used to identify the illness at a preliminary stage so that heart disease can be prevented and treated. This paper emphasizes the care of heart disease at an early stage so that it leads to a successful cure. In this paper, diverse data mining classification methods, namely Decision Tree, Naive Bayes, Support Vector Machine, and k-NN classification, are used for the determination and prevention of the disease.

Index Terms—Heart Disease, Frequent Itemset, Classification, Performance Measurement Parameter.

I. INTRODUCTION

Data mining is a way of uncovering unexplored information in an immense volume of data that is not easy to analyze [9]. Healthcare is a field with a large volume of data. Its aim is the maintenance and improvement of health through the diagnosis, prevention, and treatment of disease. The problem is to provide better care at an affordable cost. Cardiovascular illness refers to problems occurring in the heart, circulatory system, and blood vessels [3]. Heart sickness relates to disorders and deformities in the heart itself. It specifically implies a condition of the heart in which blood vessels are obstructed, resulting in malfunction of the heart. Diverse methods are used for the early prediction of any health-related abnormality, so that a person becomes aware of it early and can prevent the illness or take care of their health. Various predictive methods are used for early prediction, and data mining classification is one of them. Diverse classification methods such as Decision Tree, Naive Bayes, Support Vector Machine, and k-NN are used to spot and prevent disease at an early stage. For the prediction of heart-related illness, this work uses a dataset with 14 attributes and 303 instances. Various performance measurement parameters are used, such as accuracy, sensitivity, specificity, positive predicted value, negative predicted value and the area under the curve.

This paper is organized as follows. Section II summarizes heart disease and gives a brief survey of the literature on heart-related disease. The workflow steps are discussed in Section III. Section IV covers the preprocessing of data and Section V describes attribute filtration. Section VI gives a concise discussion of the classification techniques: Naive Bayes, Decision Tree, SVM, and k-NN. The dataset and its attributes are described in Section VII. Section VIII presents the result analysis and comparison study. Section IX is the conclusion, which gives a brief overview of the content.

II. LITERATURE STUDY

Heart-related illness is a principal cause of death these days. Cardiovascular disease refers to problems occurring in the heart. Life depends completely on the effective functioning of the heart. There are various factors on the basis of which the risk increases [13,15]. The degree of risk is ranked from 1 to 4, where a rank of 1 indicates the highest chance of having heart disease, and so on. The closer the rank is to 1, the higher the probability of having heart illness. Table 1 lists the risk factors and their degree of risk of heart illness.

Table 1. Risk Factor for Heart Disease

Risk Factors          Degree of Risk
Tobacco               1
Diet                  4
Obesity               2
Physical Inactivity   3
Sleep                 4
Air Pollution         2
High Blood Pressure   1
High Blood Sugar      1
High Cholesterol      1
Stressful Work        2

According to various studies, the death rate is 272 per 100 000 people in India and 235 per 100 000 people globally. Every year 610,000 people die of heart-related illness in the United States. In 2009, more than half of the deaths due to heart-related illness were in men.

Aditya Sundar et al. employ data mining classification techniques for heart disease and examine how accurately Naive Bayes and WAC predict the illness [1].

Sellappan Palaniappan et al. give a clear idea of several classification methods, namely Naive Bayes, Decision Tree and Neural Network, for the prediction of heart-related illness [2].

Chaitrali S. Dangare et al. apply Neural Network, Decision Tree and Naive Bayes to the early prediction of heart-related illness [3].

J. Thomas et al. discuss the classification methods k-NN, Naive Bayes, Decision Tree and Neural Network for predicting the risk level of a patient, that is, whether the patient has a heart-related illness or not. They conclude that the accuracy of prediction becomes greater as the number of attributes increases [4].

Aswathy Wilson et al. examined diverse mining methods and concluded that a decision tree with k-means clustering gives better quality [5].

A. Nishara Banu et al. discussed Association Rule Mining, Classification, and Clustering for the detection of heart-related illness. They showed that their proposed system can efficiently identify heart attack [6].

Shabana Asmi P et al. added attributes to the dataset for discovering heart-related illness, which resulted in an increase in accuracy, and they used association rules for this [7].

III. PROPOSED METHOD FOR PREDICTION

To handle medical data, diverse healthcare information systems are in use these days because of the immense volume of data collected. The primary goal is to design a system for the early prediction of heart-related illness. Figure 1 shows the methodology for the prediction of heart disease. The dataset is obtained from UCI/Kaggle in CSV format and then preprocessed, which includes data transformation, data cleaning, and data integration. After preprocessing, data mining classification algorithms such as Decision Tree, SVM, Naive Bayes, and k-NN are applied for prediction with and without attribute filtering, and their performance is compared.

Fig.1. Work Flow for Prediction of Heart Disease

IV. DATA PREPROCESSING

Figure 2 represents the flow diagram for the preprocessing of data.

Fig.2. Data Preprocessing

The heart dataset contains missing values, so the first step is to handle them by replacing each missing value with the mean of that particular attribute. After that, the data is normalized by converting the dataset into binary format, i.e. 0 and 1, on the basis of the following conditions: if age > 45, bp > 120, cholesterol > 240, thal > 3, thalach > 100, target = 1, chest pain type >= 3, resting ecg > 1, induced angina = 1, oldpeak > 0, slope >= 2 and ca > 3, then replace the value with "1", else replace it with "0". "1" refers to a greater chance of the presence of heart illness and "0" refers to the absence of heart illness in the patient [13].

V. ATTRIBUTE FILTRATION

Figure 3 represents the flow diagram for filtering the attributes on the basis of frequent item sets.

When handling a large dataset for the identification of heart disease, it is a complex task to extract the relevant content needed to predict heart attack at an early stage on the basis of the symptoms observed in patients. This makes Knowledge Discovery in Data essential [14]. Mining of knowledge, or data mining, is used for the prediction of various diseases. Numerous symptoms are observed in a patient for a particular disease, and they define the patient's clinical condition. For filtering the attributes of the heart data, Algorithm 1 is used, in which the frequent item set is calculated.
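
The binarization conditions of Section IV and the support-based pruning that the filtration relies on can be illustrated with a short sketch. This is only a hedged illustration in Python, not the authors' implementation (the paper reports using R). The attribute names, the three sample records and the minimum support of 0.5 are made-up assumptions, and only the thresholds come from the text.

```python
# Illustrative sketch only: attribute names, sample records and the
# minimum support value are assumptions, not taken from the paper.

# Binarization conditions as listed in the data preprocessing section:
# a value becomes 1 when its condition holds, otherwise 0.
RULES = {
    "age": lambda v: v > 45,
    "bp": lambda v: v > 120,
    "cholesterol": lambda v: v > 240,
    "thal": lambda v: v > 3,
    "thalach": lambda v: v > 100,
    "target": lambda v: v == 1,
    "chest_pain_type": lambda v: v >= 3,
    "resting_ecg": lambda v: v > 1,
    "induced_angina": lambda v: v == 1,
    "oldpeak": lambda v: v > 0,
    "slope": lambda v: v >= 2,
    "ca": lambda v: v > 3,
}


def binarize(record):
    """Map one patient record (dict of raw values) to 0/1 flags."""
    return {name: int(rule(record[name]))
            for name, rule in RULES.items() if name in record}


def support(rows):
    """Fraction of rows in which each attribute equals 1."""
    n = len(rows)
    return {a: sum(r[a] for r in rows) / n for a in rows[0]}


def prune(rows, min_support):
    """Drop every attribute whose support is below min_support."""
    sup = support(rows)
    keep = [a for a in rows[0] if sup[a] >= min_support]
    return [{a: r[a] for a in keep} for r in rows]


# Made-up records, restricted to four attributes to keep the example small.
records = [
    {"age": 63, "bp": 145, "cholesterol": 233, "target": 1},
    {"age": 37, "bp": 130, "cholesterol": 250, "target": 1},
    {"age": 41, "bp": 110, "cholesterol": 204, "target": 0},
]
rows = [binarize(r) for r in records]
print(support(rows))
print(prune(rows, min_support=0.5))
```

The sketch covers only the binarization, support and pruning steps. The iterative selection of attributes into the frequent item set is left out because its exact row-deletion rules are stated only informally in the text.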

Fig.3. Attribute Filtration by Frequent Itemset

A. Algorithm 1: Algorithm for attribute filtration

Input: Heart dataset in binary form, based on the conditions mentioned in data preprocessing.
Output: Most important attributes in the frequent itemset fk.
Step-1: Import the heart dataset.
Step-2: Convert each attribute into binarized form on the basis of the given condition.
Step-3: Find the sum and support of each attribute and set a minimum support value, which is defined by the user.
Step-4: Prune the attributes which do not satisfy the minimum support.
Step-5: Check if any attribute contains all 1's; if so, add the attribute to the frequent itemset and delete that attribute, else go to the next step.
Step-6: Calculate the sum of each attribute; if the maximum sum is unique, add that attribute to the frequent itemset and delete the attribute along with the rows which contain 0's from the whole dataset.
Step-7: Count the 0's in each row and delete the row with the maximum number of 0's.
Step-8: Repeat steps 3 to 7 until the dataset is empty.
Step-9: Output fk, the most important attributes.
Output: frequent attributes fk = A14, A10, A8, A4, A2, A1.

Fig.4. Attribute Filtration

VI. CLASSIFICATION METHODS

In data mining, classification is used to predict a class for each item and assign the item to a target class. The main purpose of classification is to predict the class of each data item accurately [11]. In this paper four classification methods are taken into consideration. This section gives a detailed idea of the classifiers and their pros and cons. The result a classifier gives depends on these pros and cons, which reflect its characteristics, and also on the dataset to which it is applied.
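
To make the notion of assigning a record to a target class concrete, the following is a minimal sketch of one of the four methods considered in this section, k-NN, applied to binarized records. It is a hedged illustration rather than the authors' implementation: the use of Hamming distance, the value k = 3 and the tiny training set are all assumptions chosen for the example.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=3):
    """Predict the class of `query` by majority vote of its k nearest records.

    Distance is Hamming distance (an assumption suited to binary data);
    sorted() is stable, so ties in distance keep the training order.
    """
    order = sorted(range(len(train_x)),
                   key=lambda i: hamming(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up binarized records (three attributes each) with class labels:
# 1 = heart disease present, 0 = heart disease not present.
train_x = [(1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 1, 0)]
train_y = [1, 1, 0, 0]

print(knn_predict(train_x, train_y, (1, 1, 0)))
print(knn_predict(train_x, train_y, (0, 0, 0)))
```

With two classes, an odd k avoids tied votes. Note that every prediction scans all stored records, so the cost of classifying a new record grows with the size of the training set.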


A. Decision Tree

A decision tree is a tree structure in which internal (branch) nodes represent attributes, terminal (leaf) nodes represent class labels, and branches represent the outcomes of the attribute tests. Test conditions are applied at the root node and the internal nodes, and depending on the outcome of each test a record follows the corresponding branch until it reaches a leaf node, that is, a class label [10,17].

Table 2. Pros and Cons of Decision Tree Classification Techniques

Pros                                        Cons
Robust, simple and easy to implement.       Class conditional independence.
Not sensitive to irrelevant features.       Dependencies are not taken into account.

B. Naive Bayes

Bayesian classification is a probabilistic method for solving the classification problem, based on Bayes' theorem. It can classify data correctly with a small training dataset [10,12,18].

Table 3. Pros and Cons of Naive Bayes Classification Techniques

Pros                       Cons
Robust.                    Overfitting.
Easy to interpret.         Not suitable for predicting continuous variables.
Needs less computation.    Performs poorly with many classes and small data.

C. SVM

A Support Vector Machine can be described by a hyperplane that separates the data into two parts lying on either side of it. It can be used for classification as well as regression. It is typically applied to data that are noisy and tangled [10,19].

Table 4. Pros and Cons of SVM Classification Techniques

Pros                                      Cons
Training of dataset is easy.              Needs a good kernel function.
Scales well for high dimensional data.    Sensitive to noisy data.

D. k-NN

The k-NN classifier is an instance-based method for classifying data. k-NN stores all available records and classifies new records on the basis of a similarity measure [20].

Table 5. Pros and Cons of k-NN Classification Techniques

Pros                                      Cons
Applies to data of any distribution.      Depends on the value of K.
Very simple and intuitive.                Affected by irrelevant attributes.
Works well for large samples.             Needs a huge number of samples for accuracy calculation.
Modeling is not expensive.                Classifying unknown data is very expensive.

VII. DATA SET ELUCIDATION

The dataset is gathered from the UCI machine learning repository. It consists of 75 attributes, but not all of them are relevant for prediction or analysis, so a subset consisting of 14 attributes and 303 patient records is considered [8]. Each attribute in the dataset is described in terms of what it refers to, because predicting heart illness requires examining the characteristics of the illness observed in a particular patient. Depending on these characteristics, one is able to identify what kind of illness it is. Figure 5 gives the description of the dataset attributes.

Fig.5. Detail Description of Dataset

VIII. PERFORMANCE EXAMINATION

Accuracy, sensitivity, specificity, the area under the curve and the ROC curve are computed from the confusion matrix shown in Table 6. Table 7 compares the data mining classification algorithms on the basis of various performance parameters without attribute filtration.

Sensitivity : P(+|1) : percentage of true positives, i.e. patients correctly predicted to have the illness: TP/(TP+FN) (1)


Specificity : P(-|0) : percentage of true negatives, i.e. patients correctly predicted not to have the illness: TN/(TN+FP) (2)
Accuracy : (TP+TN)/(TP+TN+FP+FN) (3), which denotes how well the test predicts both classes.
Positive Predicted Value (PPV) : P(1|+) : probability that a person who is predicted positive (+) has heart disease: TP/(TP+FP)
Negative Predicted Value (NPV) : P(0|-) : probability that a person who is predicted negative (-) does not have heart disease: TN/(TN+FN)

A confusion matrix is a matrix that summarizes the performance of supervised learning methods; here it is used to evaluate the classification techniques. In Table 6 the rows indicate the actual value and the columns indicate the predicted value. If the actual value indicates the presence of illness in a particular patient and the classifier indicates the same, the result is TP; if the actual value indicates the presence of illness and the prediction does not match, the result is FN. If the actual value indicates the absence of illness in a particular patient and the classifier indicates the same, the result is TN; if the actual value indicates the absence of illness and the prediction does not match, the result is FP.

Table 6. Confusion Matrix for Heart Disease

Class Label                  Heart Disease Present    Heart Disease Not Present
Heart Disease Present        TP                       FN
Heart Disease Not Present    FP                       TN

Table 7. Comparison of Various Classifier without Attribute Filtration

Classifier      Acc (%)   Sensitivity (%)   Specificity (%)   PPV (%)   NPV (%)   AUC
Decision Tree   84.91     36.95             61.81             44.73     53.96     .8275
SVM             88.68     39.86             58.20             44.36     53.64     .8848
Naive Bayes     96.23     39.13             57.57             43.54     53.07     .9899
k-NN            58.49     50                48.49             44.80     54.73     .628

The ROC curve is a graphical representation for measuring the performance of classification methods. It plots sensitivity, which is the true positive rate, against 1 - specificity, which is the false positive rate [16].

Fig.6. ROC Curve of Various Classifier without Attribute Filtration

Figure 6 gives the ROC curves of the different classifiers and the area under the curve. Figure 7 gives the performance of the different classifiers with respect to accuracy, sensitivity and specificity.

Fig.7. Performance Graph of Various Classifier without Attribute Filtration

Table 8. Comparison of Various Classifier with Attribute Filtration

Classifier      Acc (%)   Sensitivity (%)   Specificity (%)   PPV (%)   NPV (%)   AUC
Decision Tree   69.81     41.30             56.96             44.53     53.71     .6928
SVM             81.13     41.30             56.96             44.53     53.71     .8080
Naive Bayes     88.67     36.95             61.21             44.34     53.72     .9754
k-NN            71.70     44.20             55.75             45.52     54.43     .7609
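
The measures used in this section all derive from the four counts of the confusion matrix in Table 6. The following Python sketch shows that derivation under stated assumptions: the function name and the example counts are hypothetical and are not the counts behind the reported results, and every denominator is assumed to be non-zero.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Compute the performance measures from confusion-matrix counts.

    tp, fn: actual positives predicted positive / negative.
    fp, tn: actual negatives predicted positive / negative.
    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),                 # Eq. (1)
        "specificity": tn / (tn + fp),                 # Eq. (2)
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (3)
        "ppv": tp / (tp + fp),                         # P(1|+)
        "npv": tn / (tn + fn),                         # P(0|-)
    }


# Hypothetical counts for 100 patients, chosen only for illustration.
m = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

Sensitivity and specificity condition on the actual class, whereas PPV and NPV condition on the predicted class, so the two pairs can differ considerably on the same matrix.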


Table 8 gives the comparison of the data mining classification algorithms on the basis of various performance parameters with attribute filtration.

Fig.8. ROC Curve of Various Classifier with Attribute Filtration

Fig.9. Performance Graph of Various Classifier with Attribute Filtration

Figure 8 shows the ROC curves of the different classifiers with attribute filtration. The area under the curve is maximum for the Naive Bayes classifier compared to the others, but in terms of performance, k-NN improves with attribute filtration while the other classification methods decline. Figure 9 shows the performance of the classification methods with respect to accuracy, sensitivity and specificity graphically.

IX. CONCLUSION AND FUTURE SCOPE

This paper focuses on the early prediction of heart-related illness on the basis of the various symptoms observed in a particular patient, so that the patient can get the appropriate care and treatment for recovery. The aim these days is better medical service, so that every patient is able to recover regardless of the illness. The key challenge is therefore to provide better care and medical support at a reduced cost. For this, various predictive analysis methods are used to achieve the result that is needed. This paper describes the detection and prevention of heart-related illness by diverse classification methods, which are implemented using the R analytical tool, and describes the classification techniques used for early prediction. For the prediction of heart-related illness at an early stage, the accuracy of Naive Bayes is dominant compared to the others. The findings show that the accuracy of predicting heart illness differs between classifiers and also depends on the platform. With the R data analytical tool, the accuracy and area under the curve are highest for the Naive Bayes classifier both with and without attribute filtration; with attribute filtration, however, the performance of k-NN increases (its accuracy rises from 58.49% to 71.70% in Tables 7 and 8) while the performance of the others decreases. As future work, we will try ensemble techniques to optimize the proposed model and compare the result with the existing proposed one.

REFERENCES
[1] Sundar, N. Aditya, P. Pushpa Latha, and M. Rama Chandra. "Performance analysis of classification data mining techniques over heart disease database." International Journal of Engineering Science & Advanced Technology 2.3 (2012): 470-478.
[2] Palaniappan, Sellappan, and Rafiah Awang. "Intelligent heart disease prediction system using data mining techniques." 2008 IEEE/ACS International Conference on Computer Systems and Applications. IEEE, 2008.
[3] Dangare, Chaitrali S., and Sulabha S. Apte. "Improved study of heart disease prediction system using data mining classification techniques." International Journal of Computer Applications 47.10 (2012): 44-48.
[4] Thomas, J., and R. Theresa Princy. "Human heart disease prediction system using data mining techniques." 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT). IEEE, 2016.
[5] Wilson, Aswathy, et al. "Data Mining Techniques For Heart Disease Prediction." (2014).
[6] Banu, MA Nishara, and B. Gomathy. "Disease forecasting system using data mining methods." 2014 International Conference on Intelligent Computing Applications. IEEE, 2014.
[7] Waghulde, Nilakshi P., and Nilima P. Patil. "Genetic neural approach for heart disease prediction." International Journal of Advanced Computer Research 4.3 (2014): 778.
[8] Database: http://archive.ics.uci.edu/ml/datasets/Heart+Disease
[9] Wu, Xindong, et al. "Data mining with big data." IEEE Transactions on Knowledge and Data Engineering 26.1 (2014): 97-107.
[10] Umadevi, S., and KS Jeen Marseline. "A survey on data mining classification algorithms." 2017 International Conference on Signal Processing and Communication (ICSPC). IEEE, 2017.
[11] Tomar, Divya, and Sonali Agarwal. "A survey on Data Mining approaches for Healthcare." International Journal of Bio-Science and Bio-Technology 5.5 (2013): 241-266.
[12] Krishnapuram, B., et al., A Bayesian approach to joint feature selection and classifier design. Pattern Analysis


and Machine Intelligence, IEEE Transactions on, 2004. 6(9): p. 1105-1111
[13] "Heart disease" from http://wikipedia.org
[14] Frawley and Piatetsky-Shapiro, 1996. Knowledge Discovery in Databases: An Overview. The AAAI/MIT Press, Menlo Park, C.A.
[15] "Hospitalization for Heart Attack, Stroke, or Congestive Heart Failure among Persons with Diabetes", Special report: 2001 – 2003, New Mexico.
[16] "ROC curve" from https://en.wikipedia.org
[17] "Decision Tree" from https://en.wikipedia.org
[18] "Naive Bayes" from https://en.wikipedia.org
[19] "Support Vector Machine" from https://en.wikipedia.org
[20] "K Nearest Neighbour" from https://en.wikipedia.org

Authors' Profiles

Sinkon Nayak is a student, currently pursuing M. Tech (Computer Science and Engineering) at the School of Computer Engineering, KIIT University, Bhubaneswar. Areas of interest include Data Analytics, Data Mining, etc. Sinkon Nayak can be reached at [email protected].

Mahendra Kumar Gourisaria is Professor at the School of Computer Engineering, KIIT University, Bhubaneswar. He has more than a decade of teaching and research experience. He has published a number of research papers in peer-reviewed international journals and conferences. His areas of interest include data mining and Cloud Computing. He can be reached at [email protected].

Manjusha Pandey, Ph.D (Computer Science), Member of IEEE, is Professor at the School of Computer Engineering, KIIT University, Bhubaneswar. She has more than a decade of teaching and research experience. Dr. Pandey has published a number of research papers in peer-reviewed international journals and conferences. Her areas of interest are WSN, Data Analytics, etc. She can be reached at [email protected]

Siddharth Swarup Rautaray, Ph.D (Computer Science), Member of IEEE, is Professor at the School of Computer Engineering, KIIT University, Bhubaneswar. He has more than a decade of teaching and research experience. Dr. Rautaray has published a number of research papers in peer-reviewed international journals and conferences. His areas of interest are Image Processing, DA and Human Computer Interaction. He can be reached at [email protected]

How to cite this paper: Sinkon Nayak, Mahendra Kumar Gourisaria, Manjusha Pandey, Siddharth Swarup Rautaray, "Heart Disease Prediction Using Frequent Item Set Mining and Classification Technique", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol.11, No.6, pp. 9-15, 2019. DOI: 10.5815/ijieeb.2019.06.02

Copyright © 2019 MECS I.J. Information Engineering and Electronic Business, 2019, 6, 9-15
