Heart Disease Prediction Using Machine Learning
ABSTRACT
The heart plays a significant role in living organisms. Diagnosis and prediction of heart-
related diseases require great precision and correctness, because even a small mistake can
cause serious harm or the death of a patient; there are numerous heart-related deaths, and
their number is increasing day by day. To deal with this problem, a prediction system is
needed to raise awareness about these diseases. Machine learning, a branch of Artificial
Intelligence (AI), provides valuable support in predicting any kind of event by training on
data from natural events. In this paper, we calculate the accuracy of machine learning
algorithms for predicting heart disease; the algorithms used are k-nearest neighbour, decision
tree, linear regression, and support vector machine (SVM), with the UCI repository dataset
used for training and testing. For the Python implementation, the Anaconda (Jupyter)
notebook is a convenient tool, offering many libraries that make the work more accurate and
precise.
Keywords—supervised learning; unsupervised learning; reinforcement learning; linear
regression; decision tree; Python programming; Jupyter Notebook; confusion matrix
CHAPTER 1
INTRODUCTION
The heart is one of the most vital organs of the human body, so its care is essential. Many
diseases are related to the heart, so the prediction of heart disease is necessary, and a
comparative study in this field is needed. Today many patients die because their disease is
recognized only at a late stage, owing to a lack of accuracy in instruments, so there is a need
to identify the most efficient algorithms for disease prediction. Machine learning is an
efficient technology for this purpose, based on training and testing. It is a branch of Artificial
Intelligence (AI), the broad area of study in which machines emulate human abilities;
machine learning is the specific branch in which systems are trained to learn how to process
and make use of data, and the combination of the two technologies is also called machine
intelligence. Since machine learning learns from natural phenomena, in this project we use
biological parameters as testing data, such as cholesterol, blood pressure, sex, and age, and on
this basis we compare the accuracy of four algorithms: decision tree, linear regression,
k-nearest neighbour, and SVM. From this calculation we conclude which of them performs
best. Section I of this paper introduces machine learning and heart disease. Section II
describes machine learning classification. Section III reviews the related work of other
researchers. Section IV covers the methodology used for this prediction system. Section V
covers the algorithms used in this project. Section VI briefly describes the dataset and its
analysis along with the results of this project. The final Section VII concludes with a
summary of this paper and a brief view of its future scope.
Heart disease (HD) has been considered one of the most complex and deadliest human
diseases in the world. In this disease, the heart is usually unable to push the required amount
of blood to the other parts of the body to fulfil their normal functions, and ultimately heart
failure occurs [1]. The rate of heart disease in the United States is very high [2]. The
symptoms of heart disease include shortness of breath, physical weakness, swollen feet, and
fatigue, together with related signs such as elevated jugular venous pressure and peripheral
edema caused by functional cardiac or non-cardiac abnormalities [3]. The investigation
techniques used to identify heart disease in its early stages were complicated, and the
resulting complexity is one of the major factors affecting the standard of life [4]. Heart
disease diagnosis and treatment are very complex, especially in developing countries, due to
the limited availability of diagnostic apparatus and the shortage of physicians and other
resources, which affects the proper prediction and treatment of heart patients [5]. Accurate
and proper diagnosis of heart disease risk in patients is necessary for reducing the associated
risks of severe heart issues and improving heart health [6]. The European Society of
Cardiology (ESC) has reported that 26 million adults worldwide have been diagnosed with
heart disease and 3.6 million are diagnosed every year. Approximately 50% of people
suffering from HD die within the initial 1-2 years, and the costs of heart disease management
amount to approximately 3% of the health-care budget [7]. Invasive techniques for
diagnosing heart disease are based on the analysis of the patient's medical history, physical
examination reports, and analysis of the relevant symptoms by medical experts. These
techniques often cause imprecise diagnoses and delayed results due to human error;
moreover, they are more expensive, computationally complex, and time-consuming [8]. To
resolve these complexities of invasive diagnosis, non-invasive medical decision support
systems based on machine learning predictive models such as support vector machine
(SVM), k-nearest neighbour (k-NN), artificial neural network (ANN), decision tree (DT),
logistic regression (LR), AdaBoost (AB), Naïve Bayes (NB), fuzzy logic (FL), and rough
sets [9, 10] have been developed by various researchers and widely used for heart disease
diagnosis; owing to these machine-learning-based expert medical decision systems, the heart
disease death ratio has decreased [11]. Heart disease diagnosis through machine-learning-
based systems has been reported in various research studies, and the classification
performance of different machine learning algorithms on the Cleveland heart disease dataset
has been reported in the literature. The Cleveland heart disease dataset is available online in
the University of California Irvine (UCI) data mining repository and has been used by
various researchers [12, 13] to investigate different classification problems related to heart
disease through different machine learning
classification algorithms. Detrano et al. [13] proposed a logistic-regression-based decision
support system for heart disease classification and obtained a classification accuracy of 77%.
The Cleveland dataset was used in [14] with global evolutionary approaches, achieving high
prediction accuracy; that study used feature selection methods, so the classification
performance of the approach depends on the selected features. Gudadhe et al. [15] proposed a
classification system using multilayer perceptron (MLP) and support vector machine
algorithms for heart disease classification and obtained an accuracy of 80.41%. Kahramanli
and Allahverdi [16] designed a heart disease classification system using a hybrid technique
that integrates a fuzzy neural network and an artificial neural network; the proposed system
achieved a classification accuracy of 87.4%. Palaniappan and Awang [17] designed an expert
medical system for diagnosing heart disease and applied machine learning techniques such as
Naïve Bayes, decision tree, and ANN. The Naïve Bayes predictive model obtained an
accuracy of 86.12%, the ANN obtained an accuracy of 88.12%, and the decision tree
classifier achieved 80.4% correct prediction. Olaniyi and Oyedotun [18] proposed a three-
phase ANN-based model to diagnose heart disease in angina and achieved a classification
accuracy of 88.89%; moreover, the proposed system could be easily deployed in healthcare
information systems. Das et al. [19] proposed an ANN-ensemble-based predictive model for
diagnosing heart disease, using SAS Enterprise Miner 5.2 with the classification system, and
achieved 89.01% accuracy, 80.09% sensitivity, and 95.91% specificity. Jabbar et al. [20]
designed a diagnostic system for heart disease using a multilayer perceptron ANN trained
with the back-propagation learning algorithm together with a feature selection algorithm; the
proposed system gives excellent performance in terms of accuracy. To diagnose heart
disease, an integrated medical decision support system based on ANN and Fuzzy AHP was
designed by the authors in [12], utilizing an artificial neural network and fuzzy analytical
hierarchical processing; their proposed classification system achieved a classification
accuracy of 91.10%. The contribution of the proposed research is to design a machine-
learning-based intelligent medical decision support system for the diagnosis of heart disease.
In the present study, various machine learning predictive models such as logistic regression,
k-nearest neighbour, ANN, SVM, decision tree, Naïve Bayes, and random forest have been
used to classify people with heart disease and healthy people. The feature selection
algorithms Relief, minimal-redundancy maximal-relevance (mRMR), and least absolute
shrinkage and selection operator (LASSO) were also used to select the most important and
highly correlated features, which greatly influence the predicted target value. Cross-
validation methods such as k-fold were also used. To evaluate the performance of the
classifiers, various evaluation metrics were used, including classification accuracy,
classification error, specificity, sensitivity, Matthews correlation coefficient (MCC), and
receiver operating characteristic (ROC) curves. Additionally, model execution time was
computed, and data pre-processing techniques were applied to the heart disease dataset. The
proposed system was trained and tested on the Cleveland heart disease dataset (2016), which
is available online in the UCI data mining repository. All computations were performed in
Python on an Intel(R) Core(TM) i5-2400 CPU @ 3.10 GHz PC.
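The evaluation metrics listed above can be computed directly from the entries of a 2x2 confusion matrix. A minimal sketch with hypothetical counts (not results from this study):

```python
# Minimal sketch: evaluation metrics from a 2x2 confusion matrix.
# The counts below are illustrative, not results from the paper.
import math

tp, fn, fp, tn = 120, 15, 10, 155   # hypothetical confusion matrix counts

accuracy    = (tp + tn) / (tp + tn + fp + fn)
error       = 1 - accuracy
sensitivity = tp / (tp + fn)          # true positive rate (recall)
specificity = tn / (tn + fp)          # true negative rate
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"accuracy={accuracy:.3f} error={error:.3f} "
      f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} "
      f"MCC={mcc:.3f}")
```

MCC is often preferred over accuracy alone because it stays informative when the two classes are imbalanced.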
Major contributions of the proposed research work are as follows:
(a) The performance of all classifiers has been checked on the full feature set in terms of
classification accuracy and execution time.
(b) The performance of the classifiers has been checked on the features selected by the
feature selection (FS) algorithms Relief, mRMR, and LASSO with k-fold cross-validation.
(c) The study suggests which feature selection algorithm works best with which classifier for
designing a high-level intelligent system for heart disease that accurately distinguishes heart
disease patients from healthy people.
The remaining parts of the paper are structured as follows: Section 2 reviews the background
of the heart disease dataset and the theoretical and mathematical background of the feature
selection and classification algorithms of machine learning; it also discusses the cross-
validation method and the performance evaluation metrics. Section 3 discusses the
experimental results in detail. The final Section 4 concludes the paper.
[1] A. S. Abdullah and R. R. Rajalaxmi, “A data mining model for predicting the
coronary heart disease using random forest classifier,” 2012.
The proposed work is mainly concerned with the development of a data mining model with
the Random Forest classification algorithm. The developed model will have the
functionalities such as predicting the occurrence of various events related to each patient
record, prevention of risk factors with its associated cost metrics and an improvement in
overall prediction accuracy. As a result, the causes and the symptoms related to each event
will be made in accordance with the record related to each patient and thereby CHD can be
reduced to a great extent. Coronary Heart Disease (CHD) is a common form of disease
affecting the heart and an important cause for premature death. From the point of view of
medical sciences, data mining is involved in discovering various sorts of metabolic
syndromes. Classification techniques in data mining play a significant role in prediction and
data exploration. Classification techniques such as decision trees have been used to predict
the accuracy of events related to CHD. In this paper, a data mining model has been developed
using the Random Forest classifier to improve prediction accuracy and to investigate various
events related to CHD. This model can help medical practitioners in predicting CHD with its
various events and how these might relate to different segments of the population. The events
investigated are Angina, Acute Myocardial Infarction (AMI),
Percutaneous Coronary Intervention (PCI), and Coronary Artery Bypass Graft surgery
(CABG). Experimental results have shown that classification using Random Forest
Classification algorithm can be successfully used in predicting the events and risk factors
related to CHD.
The effects produced due to CHD are constant fatigue, physical disability, mental
stress and depression. This paper focuses on the creation of a data mining model using the
Random forest classification algorithm for evaluating and predicting various events related to
CHD. Some previous studies have been made with the implementation of data mining
algorithms such as k-NN, Naïve Bayes, k-means, ID3, and the Apriori algorithm. The growing
healthcare burden and suffering due to life threatening diseases such as heart disease and the
escalating cost of drug development can be significantly reduced by design and development
of novel methods in data mining technologies and allied medical informatics disciplines. In
CHD, if the risk factors are predicted in advance, two sorts of problems can be solved. First,
various surgical treatments such as angioplasty, coronary stents, coronary artery bypass and
heart transplant can be avoided to a great extent. Second, the associated cost with each risk
factor can be reduced.
[2] A. H. Alkeshuosh, M. Z. Moghadam, I. Al Mansoori, and M. Abdar, “Using PSO
algorithm for producing best rules in diagnosis of heart disease,” 2017.
The experimental results show that the PSO algorithm achieved higher predictive
accuracy and much smaller rule list than C4.5. In this paper we proposed PSO algorithm for
production of best rules in prediction of heart disease. The experiments show that the rules
discovered for the dataset by PSO are generally with higher accuracy, generalization and
comprehensibility. Based on the average accuracy, the accuracy of the PSO method is 87%
and the accuracy of C4.5 is 63%. By using the PSO, one can extract effective classification
rules with acceptable accuracy. Furthermore, we conclude that PSO algorithm in rule
production has good performance for rule discovery on continuous data. For future work we
consider using improved PSO algorithm for producing the best rules in heart disease data set.
Heart disease is still a growing global health issue. In the health care system, limited human
experience and expertise in manual diagnosis lead to inaccurate diagnoses, and the
information about various illnesses is either inadequate or lacking in accuracy, as it is
collected from various types of medical equipment. Since the correct prediction of a person's
condition is of great importance, equipping medical science with intelligent tools for
diagnosing and treating illness can reduce doctors' mistakes and financial losses. In this
paper, the Particle Swarm Optimization (PSO) algorithm, which is one of the most powerful
evolutionary algorithms, is used to generate rules for heart disease. First the random rules are
encoded and then they are optimized based on their accuracy using PSO algorithm. Finally
we compare our results with the C4.5 algorithm.
The task of classification becomes very difficult when the number of possible combinations
of parameters is high. The self-adaptability of population-based evolutionary algorithms is
very useful in rule extraction and selection for data mining.
[3] N. Al-milli, “Back propagation neural network for prediction of heart disease,” 2013.
In this work, an approach based on a back-propagation neural network is presented to
model heart disease diagnosis. The proposed system uses 13 medical attributes for heart
disease prediction, and the experiments conducted in this work have shown the good
performance of the proposed algorithm compared to similar state-of-the-art approaches.
Moreover, new algorithms and new tools continue to be developed day by day. Diagnosing
heart disease is an important issue, and many researchers have investigated the development
of intelligent medical decision support systems to improve the ability of physicians. The
neural network is a widely used tool for predicting heart disease diagnosis.
[4] C. A. Devi, S. P. Rajamhoana, K. Umamaheswari, R. Kiruba, K. Karunya, and R.
Deepika, “Analysis of neural networks based heart disease prediction system,” 2018.
In this research paper, a heart disease prediction system (HDPS) using data mining
and artificial neural network (ANN) techniques is presented. From the ANN family, a
multilayer perceptron neural network with the back-propagation algorithm is used to develop
the system, because the MLPNN model produces better results and helps domain experts,
and even people related to the field, to plan for a better diagnosis and provide the patient with
early diagnosis results, as it performs realistically well even without retraining. The
experimental results show that using neural networks the system predicts heart disease with
nearly 100% accuracy.
This hidden information in medical data is useful for making effective decisions, and
computer-based information combined with advanced data mining techniques is used to
obtain appropriate results. The neural network is a widely used tool for predicting heart
disease diagnosis. In this research paper, a Heart Disease Prediction System (HDPS) is
developed using a neural network. The HDPS predicts the likelihood of a patient getting
heart disease. For prediction, the system uses 13 medical parameters such as sex, blood
pressure, and cholesterol; two more parameters, obesity and smoking, are added for better
accuracy. From the results, it has been seen that the neural network predicts heart disease
with nearly 100% accuracy. Prediction should be done to reduce the risk of heart disease.
Diagnosis is usually based on the signs, symptoms, and physical examination of a patient,
and almost all doctors predict heart disease from learning and experience. The diagnosis of
disease is a difficult and tedious task in the medical field. Predicting heart disease from
various factors or symptoms is a multi-layered problem which may lead to false
presumptions and unpredictable effects. The healthcare industry today generates large
amounts of complex data about patients, hospital resources, disease diagnosis, electronic
patient records, medical devices, and so on. This large amount of data is a key resource to be
processed and analyzed for knowledge extraction that enables support for cost savings and
decision making. Human intelligence alone is not enough for proper diagnosis.
[5] P. K. Anooj, “Clinical decision support system: Risk level prediction of heart disease
using weighted fuzzy rules,” 2012.
The manual diagnosis process is time consuming and depends heavily on medical experts'
opinions, which may be subjective. To handle this problem, machine learning techniques have been
developed to gain knowledge automatically from examples or raw data. Here, a weighted
fuzzy rule-based clinical decision support system (CDSS) is presented for the diagnosis of
heart disease, automatically obtaining knowledge from the patient’s clinical data. The
proposed clinical decision support system for the risk prediction of heart patients consists of
two phases: (1) automated approach for the generation of weighted fuzzy rules and (2)
developing a fuzzy rule-based decision support system. In the first phase, we have used the
mining technique, attribute selection, and the attribute weightage method to obtain the weighted
fuzzy rules. Then, the fuzzy system is constructed in accordance with the weighted fuzzy
rules and chosen attributes. Finally, the experimentation is carried out on the proposed system
using the datasets obtained from the UCI repository and the performance of the system is
compared with the neural network-based system utilizing accuracy, sensitivity and
specificity.
In the proposed work, we have proposed an effective clinical decision support system
using fuzzy logic in which automatically generated weighted fuzzy rules are used. At first,
data pre-processing is applied on the heart disease dataset for removing the missing values
and other noisy information. Then, using the class label, the input database is divided into
two subsets of data that are then used for mining the frequent attribute category individually.
Subsequently, the deviation range is computed using these frequent attribute categories so as
to identify the relevant attributes. Based on the deviation range, attributes are selected
according to whether any deviation exists. Using this deviation range, the decision rules are
constructed and these rules are scanned in the learning database to find its frequency.
According to this frequency, the weightage is calculated for every decision rule obtained, and
the weighted fuzzy rules are obtained with the help of fuzzy membership function. Finally,
the weighted fuzzy rules are given to the Mamdani fuzzy inference system so that the system
can learn these rules and the risk prediction can be carried out on the designed fuzzy system.
[6] L. Baccour, “Amended fused TOPSIS-VIKOR for classification (ATOVIC) applied
to some UCI data sets,” 2016.
Classification procedure is an important task of expert and intelligent systems.
Developing new algorithms of classification which improve accuracy or true positive
rates could have an influence on some life problems such as diagnosis prediction in medical
domain. Multi-criteria decision making (MCDM) methods are expected to search the best
alternative according to some criteria. Each criterion has a value relative to each alternative.
There are only two sets: a set of criteria and a set of alternatives. This work merges MCDM
methods TOPSIS and VIKOR and modifies them to be used for classification where the used
sets are three: the classes, the objects and the attributes (features) describing the objects.
Hence, ATOVIC, a new classification algorithm is proposed. In ATOVIC, criteria are
replaced by features and alternatives are replaced by objects. The latter belong to
corresponding classes. Two sets are employed: one serves as the reference and the other as
the test set. An object from the test set is classified into the relevant class based on the
reference set. ATOVIC is applied to the benchmark (UCI) CLEVELAND data set to predict
heart disease. Given the complexity and importance of the data set, ATOVIC is applied to
different test sets of CLEVELAND using both binary classification and multi-classification.
Moreover, ATOVIC is applied to thyroid data set to detect hyperthyroidism and
hypothyroidism diseases. The obtained results show the efficiency of ATOVIC in medical
domain. In addition, ATOVIC is applied to three other data sets: chess, nursery and titanic,
from the UCI and KEEL websites. The obtained results are compared to those of some
classifiers from the literature. The experimental results demonstrate that the ATOVIC
method improves accuracy and true positive rates compared to most classifiers considered
from the literature. Hence ATOVIC is promising for use in prediction or classification.
CHAPTER 3
MACHINE LEARNING
Machine Learning is an efficient technology based on two phases, namely training and
testing: the system learns directly from data and experience, and based on this training,
testing is applied to different types of tasks as required by the algorithm.
There are three types of machine learning algorithms:
A. Supervised Learning
Supervised learning can be defined as learning with a proper guide, or learning in the
presence of a teacher: there is always a training dataset which acts as the teacher for making
predictions on a given test dataset. Supervised learning is based on the "train me" concept.
Supervised learning has the following processes:
• Classification
• Random Forest
• Decision tree
• Regression
Regression is the phenomenon of recognizing patterns and measuring the probability of
continuous outcomes. The system has the ability to identify numbers, their values, and
groupings of numbers, such as width and height, etc. The following are supervised machine
learning algorithms:
• Linear Regression
• Logistical Regression
• Support Vector Machines (SVM)
• Neural Networks
• Random Forest
• Gradient Boosted Trees
• Decision Trees
• Naive Bayes
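As a sketch of the supervised workflow described above (train on labelled data, then score on held-out test data), the snippet below compares four classifiers by test accuracy. It uses scikit-learn's built-in breast cancer dataset as a stand-in, since the heart disease CSV is not included here, and logistic regression stands in for the project's regression model; these substitutions are assumptions for illustration only.

```python
# Minimal sketch: training and testing several supervised classifiers
# and comparing their accuracy. The built-in breast cancer dataset
# stands in for the UCI heart disease data used in this project.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Logistic regression": LogisticRegression(max_iter=5000),
    "SVM": SVC(kernel="linear"),
}
# Fit each model on the training split and score it on the test split.
accuracies = {name: m.fit(X_train, y_train).score(X_test, y_test)
              for name, m in models.items()}
for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

The same loop would work on the heart disease dataset once it is loaded into X and y.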
B. Unsupervised Learning
Unsupervised learning can be defined as learning without guidance: there is no teacher. In
unsupervised learning, when a dataset is given, the system automatically works on it and
finds the patterns and relationships within it; according to the relationships created, when
new data is given, it classifies that data into one of the existing groups. Unsupervised
learning is based on the "self-sufficient" concept. For example, suppose there is a mixture of
fruits (mangoes, bananas, and apples); when unsupervised learning is applied, it classifies
them into three different clusters on the basis of their relationships with each other, and when
a new data point is given, it automatically assigns it to one of the clusters. Supervised
learning would say there are mangoes, bananas, and apples, but unsupervised learning would
say there are three different clusters. Unsupervised algorithms have the following processes:
• Dimensionality reduction
• Clustering
The following are unsupervised machine learning algorithms:
• t-SNE
• k-means clustering
• PCA
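The fruit example above can be sketched with k-means clustering; the synthetic 2-D points and their "feature" meanings are illustrative assumptions.

```python
# Minimal sketch: unsupervised clustering with k-means. Three synthetic
# groups of points stand in for the mango/banana/apple example.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated 2-D "fruit feature" groups (e.g. weight, length).
data = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.3, size=(50, 2)),
])

# No labels are given: k-means discovers the three clusters on its own.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)

# A new point near the (5, 5) group is assigned to that group's cluster.
new_point = np.array([[4.8, 5.1]])
print("cluster of new point:", kmeans.predict(new_point)[0])
```

Note that k-means reports only cluster indices, not names: it says "three clusters", never "mango, banana, apple", exactly as described above.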
C. Reinforcement
Reinforcement learning is the ability of an agent to interact with the environment and find
out the best outcome. It is based on the "hit and trial" concept. In reinforcement learning, the
agent is awarded positive or negative points for its actions; it is trained on the basis of the
positive rewards, and on the basis of this training, testing is performed on the datasets.
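The "hit and trial" idea with positive and negative feedback can be sketched with a simple epsilon-greedy agent; the three actions and their hidden reward probabilities are hypothetical.

```python
# Minimal sketch of "hit and trial" learning: an epsilon-greedy agent
# learns which of three actions yields the highest average reward.
import random

random.seed(0)
true_rewards = [0.2, 0.5, 0.8]   # hidden payoff probability per action
estimates = [0.0, 0.0, 0.0]      # the agent's learned reward estimates
counts = [0, 0, 0]
epsilon = 0.1                    # fraction of steps spent exploring

for step in range(5000):
    # Explore with probability epsilon, otherwise exploit the best estimate.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])
    # Positive (1) or negative (0) feedback from the environment.
    reward = 1 if random.random() < true_rewards[action] else 0
    counts[action] += 1
    # Incremental update of the running average reward for this action.
    estimates[action] += (reward - estimates[action]) / counts[action]

print("learned best action:", max(range(3), key=lambda a: estimates[a]))
```

After enough trials the agent's estimates converge toward the hidden reward rates, so it settles on the action with the highest payoff.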
Machine learning algorithm
Machines are by nature not intelligent. Initially, machines were designed to perform specific
tasks, such as running on railways, controlling traffic flow, digging deep holes, travelling
into space, and shooting at moving objects. Machines do their tasks much faster
with a higher level of precision compared to humans. They have made our lives easy and
smooth. The fundamental difference between humans and machines in performing their work
is intelligence. The human brain receives data gathered by the five senses: vision, hearing,
smell, taste, and tactility. These gathered data are sent to the human brain via the neural
system for perception and taking action. In the perception process, the data is organized,
recognized by comparing it to previous experiences that were stored in the memory, and
interpreted. Accordingly, the brain takes the decision and directs the body parts to react
against that action. At the end of the experience, it might be stored in the memory for future
benefits. A machine cannot deal with the gathered data in an intelligent way. It does not have
the ability to analyze data for classification, benefit from previous experiences, and store the
new experiences to the memory units; that is, machines do not learn from experience.
Although machines are expected to do mechanical jobs much faster than humans, it is not
expected from a machine to: understand the play Romeo and Juliet, jump over a hole in the
street, form friendships, interact with other machines through a common language, recognize
dangers and the ways to avoid them, decide about a disease from its symptoms and laboratory
tests, recognize the face of the criminal, and so on. The challenge is to make dumb machines
learn to cope correctly with such situations. Because machines were originally created to
help humans in their daily lives, it is necessary for machines to think, understand, solve
problems, and take suitable decisions as humans do. In other words, we need smart
machines. In fact, the term smart machine is symbolic of machine learning's success stories
and its future targets. The question of whether a machine can think was first asked by the
British mathematician Alan Turing in 1950, which marked the start of artificial intelligence
history. He was the one who proposed a test to measure the performance of a machine in
terms of intelligence, the Turing test. Computers are machines that follow programming
instructions to accomplish
the required tasks and help us in solving problems. Our brain is similar to a CPU that solves
problems for us. Suppose that we want to find the smallest number in a list of unordered
numbers. We can perform this job easily. Different persons can have different methods to do
the same job. In other words, different persons can use different algorithms to perform the
same task. These methods or algorithms are basically a sequence of instructions that are
executed to reach from one state to another in order to produce output from input. If there are
different algorithms that can perform the same task, then one is right in questioning which
algorithm is better. For example, if two programs are made based on two different algorithms
to find the smallest number in an unordered list, then for the same list of unordered number
(or same set of input) and on the same machine, one measure of efficiency can be speed or
quickness of program and another can be minimum memory usage. Thus, time and space are
the usual measures to test the efficiency of an algorithm. In some situations, time and space
can be interrelated, that is, the reduction in memory usage leading to fast execution of the
algorithm. For example, an efficient algorithm enabling a program to handle full input data in
cache memory will also consequently allow faster execution of program.
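The smallest-number example above can be made concrete by timing two algorithms for the same task; the list size and the timing method are illustrative choices.

```python
# Minimal sketch: two algorithms for the same task (smallest number in
# an unordered list) compared by running time, the usual measure of
# algorithm efficiency discussed above.
import random
import time

random.seed(1)
numbers = [random.randint(0, 10**9) for _ in range(200_000)]

def smallest_by_scan(nums):
    # Single pass: O(n) time, O(1) extra space.
    best = nums[0]
    for n in nums[1:]:
        if n < best:
            best = n
    return best

def smallest_by_sort(nums):
    # Sort a copy and take the first element: O(n log n) time, O(n) space.
    return sorted(nums)[0]

t0 = time.perf_counter()
a = smallest_by_scan(numbers)
t1 = time.perf_counter()
b = smallest_by_sort(numbers)
t2 = time.perf_counter()

assert a == b  # both algorithms produce the same answer
print(f"scan: {t1 - t0:.4f}s, sort: {t2 - t1:.4f}s")
```

Both functions are correct; they differ only in the time and space they use, which is exactly the distinction the paragraph above draws.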
INTRODUCTION TO DEEP LEARNING
Deep learning
Deep learning is a subset of machine learning in artificial intelligence (AI) based on networks
capable of learning, even unsupervised, from data that is unstructured or unlabeled. It is also
known as deep neural learning or deep neural networks.
A CNN is a feed-forward neural network that is generally used for image recognition and
object classification. In an RNN, by contrast, the previous state is fed as input to the current
state of the network; RNNs can be used in NLP, time series prediction, machine translation,
etc.
Convolutional Neural Network (CNN)
The Convolutional Neural Network is one of the main approaches to image classification and
image recognition in neural networks. Scene labeling, object detection, and face recognition,
etc., are some of the areas where convolutional neural networks are widely used.
A CNN takes an image as input and classifies it under a certain category such as dog, cat,
lion, or tiger. The computer sees an image as an array of pixels whose size depends on the
resolution of the image. Based on the image resolution, it sees h * w * d, where h = height,
w = width, and d = dimension (number of channels). For example, an RGB image is a
6 * 6 * 3 array, and a grayscale image is a 4 * 4 * 1 array.
In CNN, each input image will pass through a sequence of convolution layers along with
pooling, fully connected layers, filters (Also known as kernels). After that, we will apply the
Soft-max function to classify an object with probabilistic values 0 and 1.
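The soft-max step mentioned above can be sketched in a few lines of plain Python. This is a toy illustration of the function itself, not a CNN implementation; the scores are made-up values:

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the max score before exponentiating improves numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)  # three values, each between 0 and 1, summing to 1
```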
Convolution Layer
The convolution layer is the first layer to extract features from an input image. By learning
image features over small squares of input data, the convolutional layer preserves the
relationship between pixels. Convolution is a mathematical operation that takes two inputs:
the image matrix and a kernel (filter).
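A minimal sketch of this operation in plain Python, on a small integer image with a hypothetical 2x2 kernel (note that CNN libraries actually implement cross-correlation, which is what is shown here):

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as used in CNNs) with stride 1."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1      # output shrinks by (kernel size - 1)
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            s = 0.0
            for a in range(kh):            # multiply-accumulate over the window
                for b in range(kw):
                    s += image[i + a][j + b] * kernel[a][b]
            out[i][j] = s
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]   # a toy difference-like filter (hypothetical values)
print(convolve2d(image, kernel))  # [[-4.0, -4.0], [-4.0, -4.0]]
```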
Strides
Stride is the number of pixels by which the filter shifts over the input matrix. When the
stride is 1, we move the filter 1 pixel at a time; similarly, when the stride is 2, we move the
filter 2 pixels at a time. The figure referred to here shows convolution working with a stride
of 2.
Padding
Padding plays a crucial role in building a convolutional neural network. Without padding,
every convolution shrinks the image, and in a network with hundreds of layers the repeated
filtering would leave only a very small image at the end; padding adds pixels around the
border so that the spatial size can be preserved.
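The shrinking effect described above can be checked with the standard output-size formula floor((n + 2p - f) / s) + 1. The input size 224 and filter size 3 below are illustrative assumptions, not values from this project:

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1

# Without padding, a stack of 100 3x3 convolutions shrinks a 224-pixel image:
size = 224
for _ in range(100):
    size = conv_output_size(size, 3)      # stride 1, no padding: lose 2 pixels per layer
print(size)  # 24

# With "same" padding (p = 1 for a 3x3 filter) the size is preserved:
print(conv_output_size(224, 3, padding=1))  # 224
```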
Pooling Layer
The pooling layer reduces the number of parameters when the images are too large. Pooling
is a "downscaling" of the image obtained from the previous layers, comparable to shrinking
an image to reduce its pixel density. Spatial pooling, also called downsampling or
subsampling, reduces the dimensionality of each feature map but retains the important
information. Common variants are:
max pooling
average pooling
sum pooling
The fully connected layer is a layer in which the input from the other layers is flattened into
a vector and passed on; it transforms the output into the number of classes required by the
network.
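Max pooling, the most common of the variants listed above, can be sketched in plain Python; the feature-map values below are made up for illustration:

```python
def max_pool(feature_map, size=2, stride=2):
    """Max pooling: slide a size x size window and keep only the largest value."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window))    # replace max() with sum()/mean for other variants
        out.append(row)
    return out

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 1],
        [3, 4, 0, 8]]
print(max_pool(fmap))  # [[6, 4], [7, 9]]
```

A 4x4 map becomes 2x2: the dimensionality drops by a factor of four while the strongest activation in each window survives.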
A recurrent neural network (RNN) is a kind of artificial neural network mainly used in
speech recognition and natural language processing (NLP). RNNs are used in deep learning
and in the development of models that imitate the activity of neurons in the human brain.
Recurrent networks are designed to recognize patterns in sequences of data, such as text,
genomes, handwriting, the spoken word, and numerical time-series data emanating from
sensors, stock markets, and government agencies.
A recurrent neural network looks similar to a traditional neural network except that a
memory state is added to the neurons, so the computation includes a simple memory of
previous inputs. The RNN is a deep-learning algorithm that follows a sequential approach:
unlike feed-forward networks, where the inputs are assumed to be independent of one
another, an RNN's output depends on the earlier elements of the sequence. These networks
are called recurrent because they perform the same mathematical computation sequentially
for every element of the sequence.
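The memory state described above can be sketched with a toy scalar recurrence. The weights below are arbitrary assumptions, not a trained model; the point is only that the hidden state carries information forward after the input stops:

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the new hidden state mixes the current input
    with the previous hidden state (the network's 'memory')."""
    return math.tanh(w_x * x + w_h * h_prev + b)

# Hypothetical scalar weights, chosen only to show the recurrence:
w_x, w_h, b = 0.5, 0.9, 0.0
h = 0.0
for x in [1.0, 0.0, 0.0, 0.0]:
    h = rnn_step(x, h, w_x, w_h, b)
# After the input drops to zero, h stays nonzero: the state remembers the past.
print(h)
```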
Applications:
Deep learning has many applications, including those mentioned above: image recognition
and object classification, speech recognition, natural language processing, time-series
prediction, and machine translation.
INTRODUCTION TO PYTHON
Python:
Python was conceived in the late 1980s as a successor to the ABC language. Python 2.0,
released in 2000, introduced features like list comprehensions and a garbage
collection system capable of collecting reference cycles. Python 3.0, released in 2008, was a
major revision of the language that is not completely backward-compatible, and much
Python 2 code does not run unmodified on Python 3.
The Python 2 language, i.e. Python 2.7.x, was officially discontinued on 1 January 2020
(first planned for 2015), after which security patches and other improvements are no longer
released for it. With Python 2's end-of-life, only Python 3.5.x and later are supported.
Why Python?:
Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
Python has a simple syntax similar to the English language.
Python has syntax that allows developers to write programs with fewer lines than
some other programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon as it
is written. This means that prototyping can be very quick.
Python can be treated in a procedural way, an object-oriented way or a functional
way.
Python was designed for readability, and has some similarities to the English
language with influence from mathematics.
Python uses new lines to complete a command, as opposed to other programming
languages which often use semicolons or parentheses.
Python relies on indentation, using whitespace, to define scope; such as the scope of
loops, functions and classes. Other programming languages often use curly-brackets
for this purpose.
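The points above about newlines and indentation can be seen in a few lines of code; the function and its age threshold are invented purely for illustration:

```python
# Newlines end statements and indentation defines scope: no semicolons or braces.
# classify() and its threshold are hypothetical, used only to show the syntax.
def classify(age):
    if age >= 60:
        return "senior"
    return "adult"

for person_age in [30, 65]:
    print(classify(person_age))  # prints "adult" then "senior"
```

In a curly-brace language the `if` body would be delimited by `{ }`; here the four-space indent alone marks where the block begins and ends.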
Windows Based
It is highly unlikely that your Windows system shipped with Python already installed.
Windows systems typically do not. Fortunately, installing does not involve much more than
downloading the Python installer from the python.org website and running it. Let’s take a
look at how to install Python 3 on Windows:
If your system has a 32-bit processor, then you should choose the 32-bit installer.
On a 64-bit system, either installer will actually work for most purposes. The 32-bit
version will generally use less memory, but the 64-bit version performs better for
applications with intensive computation.
If you’re unsure which version to pick, go with the 64-bit version.
Note: Remember that if you get this choice “wrong” and would like to switch to another
version of Python, you can just uninstall Python and then re-install it by downloading another
installer from python.org.
Once you have chosen and downloaded an installer, simply run it by double-clicking on the
downloaded file. A dialog should appear that looks something like this:
Important: You want to be sure to check the box that says Add Python 3.x to PATH as
shown to ensure that the interpreter will be placed in your execution path.
Then just click Install Now. That should be all there is to it. A few minutes later you should
have a working Python 3 installation on your system.
Mac OS based
While current versions of macOS (previously known as “Mac OS X”) include a version of
Python 2, it is likely out of date by a few months. Also, this tutorial series uses Python 3, so
let’s get you upgraded to that.
The best way we found to install Python 3 on macOS is through the Homebrew package
manager. This approach is also recommended by community guides like The Hitchhiker’s
Guide to Python.
1. Open a browser and navigate to http://brew.sh/. After the page has finished
loading, select the Homebrew bootstrap code under “Install Homebrew”. Then hit
cmd+c to copy it to the clipboard. Make sure you’ve captured the text of the
complete command because otherwise the installation will fail.
2. Now you need to open a Terminal app window, paste the Homebrew bootstrap
code, and then hit Enter. This will begin the Homebrew installation.
3. If you’re doing this on a fresh install of macOS, you may get a pop up alert asking
you to install Apple’s “command line developer tools”. You’ll need those to
continue with the installation, so please confirm the dialog box by clicking on
“Install”.
At this point, you’re likely waiting for the command line developer tools to finish installing,
and that’s going to take a few minutes. Time to grab a coffee or tea!
You can continue installing Homebrew and then Python after the command line developer
tools installation is complete:
1. Confirm the “The software was installed” dialog from the developer tools installer.
2. Back in the terminal, hit Enter to continue with the Homebrew installation.
3. Homebrew asks you to enter your password so it can finalize the installation. Enter
your user account password and hit Enter to continue.
4. Depending on your internet connection, Homebrew will take a few minutes to
download its required files. Once the installation is complete, you’ll end up back at
the command prompt in your terminal window.
Whew! Now that the Homebrew package manager is set up, let’s continue on with installing
Python 3 on your system.
Once Homebrew has finished installing, return to your terminal and run the following
command:
Assuming everything went well and you saw the output from Pip in your command prompt
window…congratulations! You just installed Python on your system, and you’re all set to
continue with the next section in this tutorial.
Numpy
NumPy is a Python package whose name stands for 'Numerical Python'. It is the core
library for scientific computing: it contains a powerful n-dimensional array object and
provides tools for integrating C, C++, etc. It is also useful for linear algebra, random
number generation, and more.
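A brief sketch of the n-dimensional array object and a few vectorized operations (the values are illustrative):

```python
import numpy as np

# A 2-D array (matrix) and some whole-array operations.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)        # (2, 3)
print(a.mean())       # 3.5
print((a * 2).sum())  # 42

# Basic linear algebra: matrix-vector product.
v = np.array([1, 0, 1])
print(a @ v)          # [ 4 10]
```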
Pandas
Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on
the Numpy package and its key data structure is called the DataFrame. DataFrames allow
you to store and manipulate tabular data in rows of observations and columns of
variables.
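A minimal DataFrame sketch, assuming pandas is installed; the patient values are made up for illustration and are not from the dataset used in this project:

```python
import pandas as pd

# Rows are observations (patients), columns are variables.
df = pd.DataFrame({
    "age":    [63, 45, 58],
    "chol":   [233, 204, 284],   # hypothetical cholesterol readings
    "target": [1, 0, 1],
})
print(df.shape)                              # (3, 3)
# Mean age of the rows where target == 1:
print(df[df["target"] == 1]["age"].mean())   # 60.5
```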
Keras
Keras is a high-level neural networks API, written in Python and capable of running on
top of TensorFlow, CNTK, or Theano. Use Keras if you need a deep learning library that:
Allows for easy and fast prototyping (through user friendliness, modularity, and
extensibility).
Sklearn
Scikit-learn is a free machine learning library for Python. It features various algorithms
such as support vector machines, random forests, and k-nearest neighbors, and it also
supports the Python numerical and scientific libraries NumPy and SciPy.
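A sketch of scikit-learn's fit/predict pattern with the k-nearest neighbors algorithm mentioned above, assuming scikit-learn is installed; the points are toy, linearly separable data, not the heart dataset:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two well-separated clusters (hypothetical values).
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)                                  # train on the labeled points
print(knn.predict([[1.5, 1.5], [8.5, 8.5]]))   # [0 1]
```

The same `fit`/`predict` interface applies to the other classifiers used in this project (decision tree, SVM, logistic regression), which is what makes comparing their accuracies straightforward.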
Scipy
SciPy is an open-source Python library which is used to solve scientific and mathematical
problems. It is built on the NumPy extension and allows the user to manipulate and
visualize data with a wide range of high-level commands.
Tensorflow
TensorFlow is a Python library for fast numerical computing created and released by
Google. It is a foundation library that can be used to create Deep Learning models
directly or by using wrapper libraries that simplify the process built on top
of TensorFlow.
Django
Django is a high-level Python Web framework that encourages rapid development and
clean, pragmatic design. Built by experienced developers, it takes care of much of the
hassle of Web development, so you can focus on writing your app without needing to
reinvent the wheel. It's free and open source.
Pyodbc
pyodbc is an open source Python module that makes accessing ODBC databases simple.
It implements the DB API 2.0 specification but is packed with even more Pythonic
convenience. Precompiled binary wheels are provided for most Python versions on
Windows and macOS. On other operating systems this will build from source.
Matplotlib
Matplotlib is an amazing visualization library in Python for 2D plots of
arrays. Matplotlib is a multi-platform data visualization library built on NumPy arrays
and designed to work with the broader SciPy stack. It was introduced by John Hunter in
the year 2002.
Opencv
OpenCV-Python is a library of Python bindings designed to solve computer vision
problems. Python is a general purpose programming language started by Guido van
Rossum that became very popular very quickly, mainly because of its simplicity and code
readability.
Nltk
NLTK (literally an acronym for Natural Language Toolkit) is one of the leading platforms
for working with human language data in Python; the module is used for natural language
processing, for example to tokenize data by words and by sentences.
SQLAlchemy
SQLAlchemy is a library that facilitates the communication between Python programs
and databases. Most of the times, this library is used as an Object Relational Mapper
(ORM) tool that translates Python classes to tables on relational databases and
automatically converts function calls to SQL statements.
Urllib
urllib is a Python module that can be used for opening URLs. It defines functions and
classes to help in URL actions. With Python you can also access and retrieve data from
the internet like XML, HTML, JSON, etc. You can also use Python to work with this data
directly.
Installation of packages:
Packages are installed from the cmd terminal using pip, e.g. pip install <package-name>.
If the installation succeeds, check the list of already-installed packages with pip list and
then install any packages that are still required.
INTRODUCTION TO OPENCV
Open cv:
OpenCV was started at Intel in 1999 by Gary Bradsky and the first release came out in 2000.
Vadim Pisarevsky joined Gary Bradsky to manage Intel’s Russian software OpenCV team. In
2005, OpenCV was used on Stanley, the vehicle that won the 2005 DARPA Grand Challenge.
Later its active development continued under the support of Willow Garage, with Gary
Bradsky and Vadim Pisarevsky leading the project. Right now, OpenCV supports a lot of
algorithms related to Computer Vision and Machine Learning and it is expanding day-by-
day. Currently OpenCV supports a wide variety of programming languages like C++, Python,
Java etc and is available on different platforms including Windows, Linux, OS X, Android,
iOS etc. Also, interfaces based on CUDA and OpenCL are also under active development for
high-speed GPU operations. OpenCV-Python is the Python API of OpenCV. It combines the
best qualities of OpenCV C++ API and Python language.
Since OpenCV is an open-source initiative, all are welcome to make contributions to this
library, and the same is true for this tutorial. So, if you find any mistake in this tutorial
(whether a small spelling mistake or a big error in code or concepts), feel free to correct it.
That is a good task for freshers who are beginning to contribute to open-source projects:
just fork OpenCV on GitHub, make the necessary corrections and send a pull request to
OpenCV.
OpenCV developers will check your pull request, give you important feedback and, once it
passes the approval of the reviewer, it will be merged into OpenCV. Then you become an
open-source contributor. The same applies to the other tutorials and documentation. As new
modules are added to OpenCV-Python, this tutorial will have to be expanded, so those who
know a particular algorithm can write up a tutorial that includes the basic theory of the
algorithm and code showing its basic usage, and submit it to OpenCV. Remember, together
we can make this project a great success!
Additional Resources
1. OpenCV Documentation
2. OpenCV Forum
We will learn to set up OpenCV-Python on your Windows system. The steps below were
tested on a Windows 7 64-bit machine with Visual Studio 2010 and Visual Studio 2012.
The screenshots show VS2012.
1.1. Python-2.7.x.
1.2. Numpy.
1.3. Matplotlib (Matplotlib is optional, but recommended since we use it a lot in our
tutorials).
2. Install all packages into their default locations. Python will be installed to C:/Python27/.
3. After installation, open Python IDLE. Enter import numpy and make sure Numpy is
working fine.
4. Download the latest OpenCV release from the SourceForge site and double-click to
extract it.
If the results are printed out without any errors, congratulations!!! You have installed
OpenCV-Python successfully.
1. Python 3.6.8.x
2. Numpy
3. Matplotlib (Matplotlib is optional, but recommended since we use it a lot in our tutorials.)
7.2. Click on Browse Build... and locate the build folder we created.
7.4. It will open a new window to select the compiler. Choose appropriate compiler
(here, Visual Studio 11) and click Finish.
8. You will see all the fields are marked in red. Click on the WITH field to expand it. It
decides what extra features you need. So mark appropriate fields. See the below image:
9. Now click on BUILD field to expand it. First few fields configure the build method. See
the below image:
10. The remaining fields specify which modules are to be built. Since GPU modules are not
yet supported by OpenCV-Python, you can skip them completely to save time (but keep
them if you work with them). See the image below:
11. Now click on ENABLE field to expand it. Make sure ENABLE_SOLUTION_FOLDERS
is unchecked (Solution folders are not supported by Visual Studio Express edition). See the
image below:
12. Also make sure that in the PYTHON field, everything is filled. (Ignore
PYTHON_DEBUG_LIBRARY). See image below:
16. In the solution explorer, right-click on the Solution (or ALL_BUILD) and build it. It will
take some time to finish.
17. Again, right-click on INSTALL and build it. Now OpenCV-Python will be installed.
18. Open Python IDLE and enter import cv2. If no error, it is installed correctly
Use the function cv2.imread() to read an image. The image should be in the working
directory, or a full path to the image should be given. The second argument is a flag which
specifies the way the image should be read.
import numpy as np
import cv2
img = cv2.imread('messi5.jpg', 0)  # the flag 0 loads the image in grayscale
Warning: Even if the image path is wrong, it won’t throw any error, but print img will give
you None
Display an image: use the function cv2.imshow() to display an image in a window. The
window automatically fits to the image size. The first argument is a window name, which is
a string; the second argument is our image. You can create as many windows as you wish,
but with different window names.
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Write an image
Use the function cv2.imwrite() to save an image. The first argument is the file name; the
second argument is the image you want to save.
cv2.imwrite('messigray.png',img)
This will save the image in PNG format in the working directory
Below program loads an image in gray scale, displays it, save the image if you press ‘s’ and
exit, or simply exit without saving if you press ESC key.
import numpy as np
import cv2

img = cv2.imread('messi5.jpg', 0)   # load in grayscale
cv2.imshow('image', img)
k = cv2.waitKey(0)
if k == 27:                         # ESC key: exit without saving
    cv2.destroyAllWindows()
elif k == ord('s'):                 # 's' key: save the image, then exit
    cv2.imwrite('messigray.png', img)
    cv2.destroyAllWindows()
CHAPTER 4
SOFTWARE REQUIREMENTS
approaches to find which one is the best among them, with results in favor of SVM.
Kumar et al. [5] worked on various machine learning and data mining algorithms, trained
on the UCI machine learning dataset of 303 samples with 14 input features, and found SVM
to be the best among them; the other algorithms compared were naive Bayes, KNN and
decision tree. Gavhane et al. [2] worked on a multilayer perceptron model for the prediction
of heart disease in human beings and measured the accuracy of the algorithm using CAD
technology. As the number of people using a prediction system for disease prediction
grows, awareness about the disease also increases, which helps reduce the death rate of
heart patients. Some researchers have worked on one or two algorithms for disease
prediction: Krishnan et al. [1] showed that the decision tree is more accurate than the naive
Bayes classification algorithm in their project. Machine learning algorithms are used for
predicting various types of disease, and many researchers have worked on this; Kohli et
al. [7] worked on heart disease prediction using logistic regression, diabetes prediction
using a support vector machine, and breast cancer prediction using an AdaBoost classifier,
and concluded that logistic regression gives an accuracy of 87.1%, the support vector
machine 85.71%, and the AdaBoost classifier up to 98.57%, which is good from a
prediction point of view. A survey paper on heart disease prediction has shown that the
older machine learning algorithms alone do not achieve good prediction accuracy, while
hybridization performs well and gives better accuracy [8].
In the entropy equation (1), Pij is the probability at the node, and from it the entropy of
each candidate node is calculated. The attribute whose split yields the highest information
gain (i.e., the largest reduction in entropy) is selected as the root node, and this process is
repeated until all the nodes of the tree have been processed or the tree is fully constructed.
When the classes at the nodes are imbalanced, the tree tends to overfit, which harms the
calculation, and this is one reason why the decision tree had lower accuracy than linear
regression in our experiments.
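The entropy calculation can be sketched directly from the formula; this is a minimal illustration with hand-supplied class probabilities:

```python
import math

def entropy(probabilities):
    """Shannon entropy: -sum(p * log2(p)).
    0 for a pure node, 1 for a 50/50 binary split."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))            # 1.0  (maximally impure binary node)
print(entropy([1.0]))                 # 0.0  (pure node)
print(round(entropy([0.9, 0.1]), 3))  # 0.469
```

A decision-tree learner would compute this per candidate split and pick the split whose child nodes have the lowest weighted entropy (highest information gain).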
C. Support Vector Machine
It is a category of machine learning technique based on the concept of a hyperplane: it
classifies the data by constructing a hyperplane between the classes. The training dataset is
(Yi, Xi), where i = 1, 2, 3, ..., n, Xi is the i-th input vector and Yi is the corresponding
target value. The form of the hyperplane determines the type of support vector machine;
for example, if a line is used as the hyperplane, the method is called a linear support
vector machine.
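A sketch of a linear SVM on toy 2-D data, assuming scikit-learn is available; the training pairs (Xi, Yi) below are invented for illustration, not taken from the heart dataset:

```python
from sklearn.svm import SVC

# Hypothetical 2-D points with a clear linear separation between the classes.
X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear")   # a line (linear hyperplane) separates the classes
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))  # [0 1]
```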
Three traditional model selection approaches for GLM were used to select predictive models:
(1) stepAIC; (2) drop term and (3) anova. We used a backward direction with a k = log(n) for
stepAIC, chi-square test with a k = log(n) for drop term, and chi-square test for anova. Firstly,
we used stepAIC to choose a model (i.e. GLM1) from a full model, containing all 49
numerical predictors. We then simplified GLM1 using drop term and anova to remove non-
significant predictors and developed a further model (i.e. GLM2). We then considered
possible two-way interactions of remaining predictors in the model with lowest AIC (i.e.
GLM1) and simplified this newly formed model using stepAIC; and we then added a few
second orders based on the relationships of species richness with relevant predictors to this
model and further simplified it using stepAIC, drop term and anova, which led to the third
model.
Description of the dataset
The Cleveland heart dataset from the UCI machine learning repository has been used for the
experiments. The dataset consists of 14 attributes and 303 instances. There are 8 categorical
attributes and 6 numeric attributes. The description of the dataset is shown in Table 1.
Patients from age 29 to 79 have been selected in this dataset. Male patients are denoted by a
sex value of 1 and female patients by a value of 0. Four types of chest pain are recorded,
some of which can be considered indicative of heart disease. Type 1, typical angina, is
caused by reduced blood flow to the heart muscles because of narrowed coronary arteries;
type 2, atypical angina, is chest pain that occurs during mental or emotional stress; type 3,
non-anginal chest pain, may be caused by various reasons and is often not due to actual
heart disease; and the fourth type, asymptomatic, may not be a symptom of heart disease at
all. The next attribute, trestbps, is the resting blood pressure reading. Chol is the cholesterol
level. Fbs is the fasting blood sugar indicator; the value is 1 if the fasting blood sugar is
above 120 mg/dl and 0 if it is below. Restecg is the resting electrocardiographic result,
thalach is the maximum heart rate achieved, exang is exercise-induced angina (recorded as
1 if there is pain and 0 if there is no pain), oldpeak is the ST depression induced by
exercise, slope is the slope of the peak exercise ST segment, ca is the number of major
vessels colored by fluoroscopy, thal records the thalassemia test result (normal, fixed
defect, or reversible defect), and num is the class attribute. The class attribute has a value of
0 for normal subjects and 1 for patients diagnosed with heart disease.
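The 14 attributes described above, under their usual short names for the Cleveland file; the commented load snippet is a sketch, and the local filename is an assumption:

```python
# The 14 attributes of the Cleveland heart dataset, in their conventional order.
columns = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]
print(len(columns))  # 14

# Sketch of loading the file with pandas (filename is a hypothetical local copy):
# import pandas as pd
# df = pd.read_csv("processed.cleveland.data", names=columns, na_values="?")
```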
Classification
• Apply a number of classification techniques to the output of the first phase.
• Classification accuracy, precision, recall and F-measure are used to evaluate the
efficiency of the techniques; Figure 2 shows the classification results on the original
data.
• Eliminate low-efficiency algorithms based on the evaluations from the previous step. This
is done by comparing the values of accuracy, precision, recall and F-measure for each
feature to determine the consistency of the classification on the dataset. We notice that
naive Bayes and SVM always perform better than the others and are never eliminated, the
decision tree is eliminated a couple of times, and KNN is eliminated most of the time.
• Apply hybridization, where we combine the results from the chosen classifiers.
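The evaluation measures named above can be computed directly from a 2x2 confusion matrix; the counts below are hypothetical, not results from this study:

```python
def evaluate(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from a 2x2 confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # of predicted positives, how many are right
    recall = tp / (tp + fn)             # of actual positives, how many were found
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Hypothetical counts: 40 TP, 10 FP, 5 FN, 45 TN.
acc, prec, rec, f1 = evaluate(tp=40, fp=10, fn=5, tn=45)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))  # 0.85 0.8 0.889 0.842
```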
Feature selection
In heart disease datasets generally, the number of features can reach into the tens of
thousands; the present heart disease dataset has 14 attributes. Since a large number of
irrelevant and redundant attributes can be involved in such data, the heart disease
classification task is made more complex. If the complete data are used to perform the
classification, accuracy suffers, and calculation time and costs are high. Therefore feature
selection, as a pre-processing step for machine learning, reduces dimensionality, removes
irrelevant data, increases learning accuracy, and improves the comprehensibility of results.
The recent increase in the dimensionality of data poses a serious problem for feature
selection methods with regard to efficiency and effectiveness. The reliable FCBF method
[8] is adopted to select a subset of discriminative features prior to classification; by
eliminating attributes with little or no effect, FCBF provides good performance with full
consideration of feature correlation and redundancy. In this work, we first standardized the
data and then selected the features with FCBF in WEKA. The number of heart disease
attributes was reduced from 14 to 7.
Effectiveness
In this section, we evaluate the effectiveness of all classifiers in terms of time to build the
model, correctly classified instances, incorrectly classified instances and accuracy. The
results are shown in Table 3 without optimization, Table 4 optimized by FCBF, and Table 5
optimized by FCBF, PSO and ACO. To improve the measurement of classifier
performance, the simulation error is also taken into account in this study. To do this, we
evaluate the effectiveness of each classifier in terms of: Kappa, as a chance-corrected
measure of agreement between the predicted and actual classes; Mean Absolute Error, as a
measure of how closely the predictions approximate the actual outcomes; Root Mean
Squared Error; Relative Absolute Error; and Root Relative Squared Error.
MACHINE LEARNING
Machine learning is a branch of artificial intelligence that aims at enabling machines to
perform their jobs skillfully by using intelligent software. Statistical learning methods
constitute the backbone of the intelligent software that is used to develop machine
intelligence. Because machine learning algorithms require data to learn, the discipline is
closely connected with the discipline of databases. Similarly, there are familiar terms such
as Knowledge Discovery from Data (KDD), data mining, and pattern recognition, and one
wonders how to view the big picture in which such connections are illustrated. SAS
Institute Inc., North Carolina, is the developer of the famous analytical software Statistical
Analysis System (SAS); in order to show the connection of the discipline of machine
learning with these related disciplines, we will use the illustration from SAS.
Machine learning algorithms are helpful in bridging this gap of understanding. The
idea is very simple. We are not targeting to understand the underlying processes that help us
learn. We write computer programs that will make machines learn and enable them to
perform tasks, such as prediction. The goal of learning is to construct a model that takes the
input and produces the desired result. Sometimes, we can understand the model, whereas, at
other times, it can also be like a black box for us, the working of which cannot be intuitively
explained.
Figure 13. Different machine learning techniques and their required data.
There are some tasks that humans perform effortlessly or with some efforts, but we are
unable to explain how we perform them. For example, we can recognize the speech of our
friends without much difficulty. If we are asked how we recognize the voices, the answer is
very difficult for us to explain.
CARDIOVASCULAR DISEASE
Globally, cardiovascular diseases are the number one cause of death and they are
projected to remain so. An estimated 17 million people died from cardiovascular disease in
2005, representing 30% of all global deaths. Of these deaths, 7.2 million were due to heart
attacks and 5.7 million due to stroke. About 80% of these deaths occurred in low- and middle
income countries. If current trends are allowed to continue, by 2030 an estimated 23.6 million
people will die from cardiovascular disease
The heart is one of the essential and vital organs of the human body, and prediction of heart
disease is an important concern for human beings; accordingly, accuracy is one of the
parameters used to analyze the performance of the algorithms. The accuracy of machine
learning algorithms depends on the dataset used for training and testing. When we analyzed
the algorithms on the dataset whose attributes are shown in TABLE 1 and on the basis of
the confusion matrix, we found KNN to be the best. In future work, more machine learning
approaches will be used for better analysis of heart disease and for earlier prediction, so
that the death rate can be minimized through awareness of the disease.
REFERENCES
[1] Santhana Krishnan J and Geetha S, “Prediction of Heart Disease using Machine Learning
Algorithms” ICIICT, 2019.
[2] Aditi Gavhane, Gouthami Kokkula, Isha Panday, Prof. Kailash Devadkar, “Prediction of
Heart Disease using Machine Learning”, Proceedings of the 2nd International conference on
Electronics, Communication and Aerospace Technology(ICECA), 2018.
[3] Senthilkumar Mohan, Chandrasegar Thirumalai and Gautam Srivastava, “Effective Heart
Disease Prediction Using Hybrid Machine Learning Techniques”, IEEE Access, 2019.
[4] Himanshu Sharma and M A Rizvi, “Prediction of Heart Disease using Machine Learning
Algorithms: A Survey” International Journal on Recent and Innovation Trends in Computing
and Communication Volume: 5 Issue: 8 , IJRITCC August 2017.
[5] M. Nikhil Kumar, K. V. S. Koushik, K. Deepak, “Prediction of Heart Diseases Using
Data Mining and Machine Learning Algorithms and Tools” International Journal of Scientific
Research in Computer Science, Engineering and Information Technology ,IJSRCSEIT 2019.
[6] Amandeep Kaur and Jyoti Arora,“Heart Diseases Prediction using Data Mining
Techniques: A survey” International Journal of Advanced Research in Computer Science ,
IJARCS 2015-2019.
[7] Pahulpreet Singh Kohli and Shriya Arora, “Application of Machine Learning in Diseases
Prediction”, 4th International Conference on Computing Communication And
Automation(ICCCA), 2018.
[8] M. Akhil, B. L. Deekshatulu, and P. Chandra, “Classification of Heart Disease Using K-
Nearest Neighbor and Genetic Algorithm,” Procedia Technol., vol. 10, pp. 85–94, 2013.
[9] S. Kumra, R. Saxena, and S. Mehta, “An Extensive Review on Swarm Robotics,” pp.
140–145, 2009.
[10] Hazra, A., Mandal, S., Gupta, A. and Mukherjee, “ A Heart Disease Diagnosis and
Prediction Using Machine Learning and Data Mining Techniques: A Review” Advances in
Computational Sciences and Technology , 2017.
[11] Patel, J., Upadhyay, P. and Patel, “Heart Disease Prediction Using Machine learning and
Data Mining Technique” Journals of Computer Science & Electronics , 2016.
[12] Chavan Patil, A.B. and Sonawane, P.“To Predict Heart Disease Risk and Medications
Using Data Mining Techniques with an IoT Based Monitoring System for Post-Operative
Heart Disease Patients” International Journal on Emerging Trends in Technology, 2017.
[13] V. Kirubha and S. M. Priya, “Survey on Data Mining Algorithms in Disease Prediction,”
vol. 38, no. 3, pp. 124–128, 2016.
[14] M. A. Jabbar, P. Chandra, and B. L. Deekshatulu, “Prediction of risk score for heart
disease using associative classification and hybrid feature subset selection,” Int. Conf. Intell.
Syst. Des. Appl. ISDA, pp. 628–634, 2012.
[15] https://archive.ics.uci.edu/ml/datasets/Heart+Disease