Development of Enhanced Prediction
Research Protocol
For Registration for Ph.D. Programme
Submitted by
----------------------
Date: 20-05-2022
(Signature of Guide)
Table of Contents
i COVER PAGE
iv ABSTRACT
1. INTRODUCTION
7. REFERENCES
ABSTRACT
Non-communicable diseases (NCDs) are among the leading causes of mortality worldwide. This
unseen plague is an under-appreciated cause of poverty that stymies many nations' economic
progress. As the number of people, families, and communities affected rises, so does the weight
of the burden. A large percentage of the population today suffers from some form of cardiac
disease, and the number of individuals suffering and dying from these diseases is rising. Precise
and early identification of cardiac disease, followed by correct and adequate treatment, can
therefore save many patients' lives. However, because of the complex processes, the many
symptoms, and the pathological testing involved, accurate identification of cardiac problems is
challenging, which delays adequate treatment. Machine learning (ML) has a wide range of
applications in many areas of life, including health care. With the fast expansion and
advancement of the internet, traditional patient service approaches have been phased out and
replaced with electronic healthcare systems. In our research work, we propose a methodology in
which we will choose an appropriate machine learning approach for the prediction of
cardiovascular illness and improve its efficiency using dimension reduction techniques. We will
also study methods for preventing and controlling cardiovascular diseases, examine the different
kinds of heart disease, and identify the major factors responsible for them. We will compute
evaluation criteria such as precision, accuracy, recall, and the PR curve, and compare the
results with state-of-the-art methods.
1. INTRODUCTION
The heart is a muscular organ that pumps blood through the circulatory system. The blood it
pumps carries carbon dioxide and other metabolic waste away from the tissues, and the carbon
dioxide is exhaled by the lungs. Approximately the size of a closed fist, the heart lies between
the lungs in the chest's main compartment. It is the muscular powerhouse of the human body.
Humans, other mammals, and birds all have a heart with four chambers: the left and right
atria at the top, and the left and right ventricles at the bottom. The right atrium and ventricle
are together referred to as the right heart, and the left atrium and ventricle as the left heart.
The right atrium receives blood from the veins and pumps it to the right ventricle. The right
ventricle pumps it to the lungs, where it is oxygenated. The oxygenated blood returns to the left
atrium, which pumps it to the left ventricle. The left ventricle pumps oxygen-rich blood to the
rest of the body, which is why it is the most powerful chamber. Blood pressure is generated
during ventricular contraction.
Coronary arteries run along the heart's surface and supply oxygen-rich blood to the heart
muscle. Heart contraction and relaxation are controlled by a complex network of nerve tissue that
carries the necessary signals. The heart pumps blood through the blood vessels to the various
organs of the body, ensuring an appropriate supply of oxygen and other vital nutrients. The
survival of an organism depends on the efficient functioning of the heart: if the heart's pumping
action is disrupted, the body's primary organs, such as the brain and kidneys, suffer. If a
person's heart stops functioning, he or she will die within minutes [1].
"Heart disease" refers to a variety of conditions that impair the regular functioning of the
circulatory system, which includes the heart and blood vessels. The most common kind of heart
disease is coronary artery disease (CAD), which reduces blood flow to the heart. In coronary
heart disease, cholesterol and fat accumulate inside the walls of the arteries, so the heart does
not get the quantity of blood it requires. More broadly, cardiovascular disease affects the heart
and blood vessels and prevents blood from being pumped and circulated correctly throughout the
body.
According to the findings of the Global Burden of Disease study, India has an
age-standardized CVD mortality rate of 272 per 100,000 people, which is much higher than the
global average of 235 [2]. CVDs strike Indians a decade earlier than they do the rest of the
world. Early onset, rapid progression, and a high death rate are all grounds for concern in India
when it comes to CVD.
Machine learning algorithms play a critical role in the extraction and interpretation of
medical data. They have been widely used to implement decision support systems in healthcare for
forecasting, improving health outcomes, shaping public policy, and avoiding clinical mistakes, as
well as for early detection, disease prevention, and the reduction of preventable hospital
fatalities. Machine learning methods can serve as an intelligent system that consistently
interprets a data set and produces appropriate output from raw data for a variety of purposes.
They make it possible to analyse a vast dataset and to discover patterns and relationships that
would otherwise be impossible to notice without advanced analytical tools.
In this age of technology and large populations, it is critical to be able to predict illness
quickly and accurately. This can be achieved by pre-processing raw healthcare data and feeding it
into a well-designed machine learning model. A good machine learning model can detect a condition
quickly and accurately, enhance therapy, decrease human intervention, and lessen the need for
medical lab testing. Given the staggering increase in heart disease mortality, building
appropriate machine learning models for heart disease prediction is a pressing requirement.
2. LITERATURE REVIEW
The heart is one of the most important organs in the body. Its major functions are to pump
oxygenated blood and hormones around the body and to help maintain blood pressure. Heart disease
is caused by irregularities in the heart's functioning. Early identification of this condition
can save a person's life, whereas heart disease that is not diagnosed in its early stages is
difficult to treat. Many machine learning techniques are used to detect cardiac disease. First, a
machine learning algorithm builds a model from a suitable training dataset. The model then
accepts the user's input, analyses it, and returns a prediction based on the training data.
Classification is one of the most significant and widely used decision-making tools in medical
treatment, and many computational intelligence approaches have been developed to serve the
healthcare area.
M. F. Rabbi presented the most popular classification models used in data mining. The study
uses MATLAB to implement k-nearest neighbour (K-NN), a multi-layered feed-forward artificial
neural network (ANN) trained with back-propagation, and a support vector machine (SVM). The
Cleveland heart disease dataset from the UCI machine learning repository, which comprises 303
instances and 76 attributes, was used to evaluate the work. After pre-processing the dataset and
running the trials, the SVM method was found to outperform the K-NN and ANN algorithms with 85
percent classification accuracy. In comparison, K-NN and ANN scored 82 percent and 73 percent,
respectively [4].
tasks. There are 303 patient records in the dataset, each with 14 characteristics. Their findings
showed that NB and SVM were effective for predicting cardiac disease [5].
Jinjri Wada et al. explored effective machine learning algorithms and determined the most
efficient for cardiovascular disease classification using patient data. Multiple classification
algorithms, including SVM, KNN, DT, LR, and NB, were evaluated using metrics such as precision,
recall, F1-score, accuracy, and training time [6]. They report that the most effective approaches
for identifying cardiovascular illness are support vector machine (SVM) and logistic regression
(LR).
Khan Ayub and Algarni Fahad proposed an IoMT-based healthcare monitoring system that uses
MSSO-ANFIS to predict cardiac illness. LCSA for feature selection consistently outperformed all
other options in terms of fitness values. The novel MSSO-ANFIS technique is superior to the
existing HOBDBNN, GA-RFNN, HRFLM, ANN-FuzzyAHP, x2-DNN, logistic regression, ICA with
meta-heuristic, and hybrid intelligent systems methods in terms of precision, recall, F1-score,
and accuracy, and it also achieves the lowest classification error [7]. It can be inferred that
the proposed MSSO-ANFIS is successful in recognising and continually monitoring patients' cardiac
issues. If necessary, the doctor may administer immediate treatment depending on the diagnosed
heart condition.
A comparison was also made between the model's efficiency and the efficiency of an existing
model that used the identical CVD dataset for the experiment. On this dataset, the experimental
results show that the XGB classifier is the most accurate, with an accuracy rate of 75%.
F. Shaik and V. Duggineni used machine learning methods to predict heart illness. Datasets for
heart disease and related factors were obtained from the UCI Machine Learning Repository, and
classification models were applied to them. Five machine learning models, namely Support Vector
Machine, Random Forest, KNN, Gaussian Naive Bayes, and XGBoost, are used to diagnose heart
illness in a short time and retrieve the results, as well as to lower people's costs [9].
Goel Sakshi et al. concluded that machine learning and artificial neural networks produce the
most accurate and dependable results for the prediction of cardiac disorders. If these models are
adopted, they will provide end-user assistance and consulting services to patients, with heart
disease diagnosis that is simple, rapid, and accurate [10].
Nweke et al. reviewed deep learning algorithms that automatically extract features from mobile
and wearable sensor data for human activity recognition. Deep learning methods can be grouped
into generative, discriminative, and hybrid models. Generative approaches include restricted
Boltzmann machines, auto-encoders, sparse coding, and deep mixture models; discriminative
approaches include convolutional and recurrent neural networks [11]. A hybrid technique combines
generative and discriminative models and may improve feature learning. In addition to saving
time and allowing more precise detection, effective feature vectors can be generated from mobile
sensor data. There are, however, open issues with deep-learning-based decision fusion, with
running deep learning on mobile devices, and with transfer learning, as well as concerns in the
research community about class imbalance.
Riazul Islam et al. presented a wide range of medical network architectures, platforms, and
protocols that make an IoT foundation accessible and support the transport and collection of
healthcare data. IoT-based medical services may reduce costs while also improving patient
well-being. IoT healthcare devices, on the other hand, have limited processing power [12].
An extensive study by K. Divya et al. shows that Random Forest is the most suitable candidate
for prediction models and delivers the best performance compared with K-Nearest Neighbor and
Decision Tree. The recommended approach has an execution time of 1.3969 seconds, an accuracy of
96.71%, a recall of 98.74%, a precision of 94.44%, and a specificity of 96.11% [13].
3. DESCRIPTION OF UNDERLYING TECHNOLOGIES
Machine learning is the use of algorithms that are able to learn from data. Big data and
low-cost computing power are driving advancements in machine learning. Rather than being given
explicit rules, a machine learning algorithm builds its model from earlier observations; in its
most basic form, machine learning is learning derived from data. It is a broad multidisciplinary
field that draws on statistics, algebra, data collection, and data processing, among other
things. ML is a fundamental artificial intelligence technology for extracting information from
data through training. The main types of machine learning are the following:
A. Supervised Learning
B. Unsupervised Learning
C. Reinforcement Learning
Machine learning is the study of computer programmes that can learn without being explicitly
programmed, by employing algorithms and statistical models that rely on inference and pattern
recognition. Supervised learning is one of the most basic kinds of machine learning. In
supervised learning, the system is trained on labelled examples of the data. In spite of the need
for exact data labelling, supervised learning may be quite successful when applied in the right
circumstances.
In supervised learning, the ML algorithm is given a small training dataset. Using this
training dataset, the algorithm is given a basic grasp of the problem, solution, and data points that
need to be dealt with. The training dataset has many of the same properties as the final dataset, and
it provides the algorithm with the labelled parameters it needs to solve the problem.
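
As a rough illustration of the supervised setting described above, the sketch below trains a
classifier on labelled examples and checks it on held-out data. It is only a sketch under stated
assumptions: the data are synthetic stand-ins generated with NumPy (not patient records), the
labelling rule is invented for the example, and logistic regression from scikit-learn is used
merely as one convenient supervised algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a labelled dataset (hypothetical, not clinical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))                # 400 records, 4 numeric features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # invented rule: 1 = positive, 0 = negative

# Keep part of the labelled data aside to check generalisation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)                  # learn from the labelled examples
test_accuracy = model.score(X_test, y_test)  # fraction of correct held-out predictions
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The held-out split mirrors the idea that the training data should share the properties of the
data the model will finally be applied to.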
Unsupervised machine learning has the advantage of being able to work with unlabelled data.
No human labour is needed to make the dataset machine-readable, so the programme can work on
considerably larger datasets. In supervised learning, the labels allow the algorithm to determine
the exact nature of any relationship between two data points. Unsupervised learning has no labels
to work with, so the algorithm instead discovers hidden structures in the data. The programme
perceives relationships between data points in an abstract fashion, with no human input
necessary.
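
A minimal sketch of this idea, assuming two well-separated synthetic groups of points and
k-means clustering from scikit-learn as the unsupervised method (both are choices made for
illustration only): the algorithm receives no labels, yet recovers the two groups.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic groups of unlabelled points (hypothetical data).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
group_b = rng.normal(loc=10.0, scale=1.0, size=(50, 2))
X = np.vstack([group_a, group_b])

# No labels are passed in; k-means finds the structure on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
print(cluster_ids[:5], cluster_ids[-5:])
```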
Reinforcement learning takes its cues from how humans learn from the data they encounter in
everyday life. It uses a trial-and-error process that improves itself and learns from new
situations. Unfavourable outputs are discouraged or "punished," while beneficial outputs are
reinforced or "encouraged". Using the principle of conditioning, reinforcement learning places
the algorithm in a working environment with an interpreter and a reward system. The interpreter
receives the output of each iteration of the algorithm and decides whether or not the result is
favourable [14].
The interpreter rewards the algorithm if it finds the correct answer. If the outcome is not
favourable, the algorithm must repeat the procedure until it finds a better result. In most
cases, the reward is directly tied to the effectiveness of the result. More generally, the
challenge in machine learning is to find patterns in the provided data and to generate
predictions from them reliably, so that they help to answer practical questions and solve
problems. Machine learning algorithms analyse data and identify trends. In supervised learning,
the model is "trained" using a significant amount of data, and the trained model is then used to
predict results for future data.
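
The trial-and-error loop of reinforcement learning described above can be sketched with a very
small example. The code below is an epsilon-greedy multi-armed bandit, chosen here only because
it is one of the simplest reward-driven learners; the three reward probabilities, the step count,
and the function name `run_bandit` are hypothetical. The environment plays the role of the
interpreter: it returns a reward of 1 for a favourable outcome and 0 otherwise.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms      # how often each action was tried
    values = [0.0] * n_arms    # running estimate of each action's average reward
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        # The "interpreter" judges the outcome and hands back a reward.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda a: values[a])
print("estimated rewards:", [round(v, 2) for v in values], "best action:", best_arm)
```

Because better outcomes earn more reward, the learner gradually concentrates its attempts on the
most rewarding action.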
IoT technology provides an up-to-date medical device environment for medical professionals
and patients. IoT devices and machine learning are useful in a variety of applications, ranging
from long-range climate monitoring to mechanical automation. Medical care applications are
interested in IoT devices primarily because of cost savings, convenience of use, and improved
patient satisfaction. Intelligent, innovative solutions are still required for the issues that
recent IoT medical care applications currently face in the clinical setting.
3.2 Dimension Reduction
In machine learning classification problems, there are frequently too many factors on which
the final classification is based. These factors are variables referred to as features. The more
features there are, the more difficult it is to visualise the training set and to work on it.
Many of these features are often correlated and hence redundant. Dimensionality reduction methods
are useful in this situation. Dimensionality reduction is the technique of lowering the number of
random variables under consideration by obtaining a set of principal variables. It is split into
two parts: feature selection and feature extraction.
Feature selection: In this step, we strive to find a subset of the original set of variables,
or features, so that we can model the problem with fewer variables. It is generally accomplished
in three ways:
Filter methods
Wrapper methods
Embedded methods
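
To make the first of these concrete, the sketch below shows a filter method: each feature is
scored against the label independently of any classifier, and only the top-scoring features are
kept. The data are synthetic and hypothetical (only the first two of six features carry signal),
and scikit-learn's `SelectKBest` with the ANOVA F-score is used as one possible filter, not
necessarily the one this work will adopt.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: features 0 and 1 determine the label, features 2-5 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Score every feature against the label and keep the two best.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
selected_idx = selector.get_support(indices=True)
print("kept feature indices:", selected_idx)
```

Wrapper and embedded methods differ in that they involve a learning algorithm in the selection
itself, which is usually more expensive than this kind of filter.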
Feature extraction helps to reduce the amount of irrelevant data in a data collection. PCA is
a widely used linear dimensionality reduction approach and an unsupervised learning algorithm.
Precision and recall metrics will be used to evaluate the results obtained.
Principal component analysis (PCA) seeks to reduce the number of variables in a data
collection while retaining as much of the variation as feasible. It achieves this by transforming
the variables into a new set of orthogonal, ordered variables known as principal components
(PCs). The components are ordered so that most of the variation in the original variables is
retained in the first few principal components. The principal components are the eigenvectors of
the covariance matrix and are orthogonal to each other.
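
As a rough illustration of the PCA procedure described above, the sketch below centres the data,
computes the covariance matrix, and takes its leading eigenvectors as the principal components.
It uses NumPy only; the function name `pca` and the synthetic five-variable data (built from two
hidden factors plus a little noise) are assumptions made for the example.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components via the covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]       # columns are orthonormal directions
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return X_centered @ components, components, explained_ratio


# Synthetic data: 5 observed variables driven by 2 hidden factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

scores, components, explained_ratio = pca(X, n_components=2)
print("share of variance kept by 2 components:", round(float(explained_ratio.sum()), 3))
```

Here two components retain almost all of the variation because the five variables were
constructed from two underlying factors; real clinical data will generally need more care in
choosing the number of components.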
3.3 Deep Learning
Deep learning uses artificial neural networks (ANNs), algorithms based on the structure and
function of the brain. To perform classification tasks, a computer model learns directly from
images, text, or sound. Modern deep learning models can match or even outperform human
performance in terms of accuracy. Models are trained using massive amounts of labelled data and
multilayer neural network architectures that extract features directly from the data. The term
"deep" refers to the number of hidden layers in a neural network: in contrast to typical neural
networks, which contain only 2-3 hidden layers, deep neural networks may have as many as 150.
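
The notion of hidden layers can be illustrated with a very small network. The sketch below is
not a deep model in the sense used above; it is a two-hidden-layer perceptron built with
scikit-learn's `MLPClassifier`, trained on synthetic data with an invented non-linear labelling
rule, simply to show where the hidden layers sit between input and output.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic data with a non-linear (XOR-like) labelling rule, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Two hidden layers with 16 and 8 units; deeper networks add more entries here.
mlp = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
mlp.fit(X, y)

# One weight matrix per connection between consecutive layers:
# input -> hidden 1 -> hidden 2 -> output.
n_weight_matrices = len(mlp.coefs_)
print("weight matrices:", n_weight_matrices, "first shape:", mlp.coefs_[0].shape)
```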
Figure 3.2: Types of Deep Learning Techniques (Iqbal H. Sarker 2021)
Health care automation is one application that uses IoT to monitor the patient's state of
health, making medical devices more efficient, identifying bodily ailments, and reducing human
error. A health-care monitoring system tracks a patient's physiological parameters for a specific
disease and collects data about it. The heart rate monitor is one of the IoT devices in such a
system for recognising and monitoring cardiac patients' conditions in emergency scenarios. It
keeps track of the heart rate of a patient who has been diagnosed with long-term cardiovascular
illness.
An Arduino-based microcontroller communicates with the pulse sensor and the ECG sensor. With
the aid of a Raspberry Pi, the system can analyse the signal, extract features from it, and
detect normal or abnormal conditions, and the results for the ECG signals are sent to the web
server. This ensures that the heart rate signal is transmitted to the database via IoT. Doctors
can then monitor their patients' progress using the patient data held in the database. As a
result, the Internet of Things (IoT) provides one of the solutions for cardiac patient monitoring
while also reducing the complexity of the relationship between patient outcome and technology.
4. OBJECTIVES OF THE RESEARCH
Motivation for the Research: Heart disease is a major contributor to rising mortality. It is
usually not detected in the early stages, because the patient usually does not experience severe
symptoms until the blockage in the vessels exceeds a certain level. Invasive methods of
diagnosing heart disease, such as angiography, are costly and risky. Heart disease can instead be
diagnosed from non-invasive tests with the help of a decision support system, which could reduce
the human error associated with diagnosis. An ML-based predictive model can help in diagnosing
heart disease before the condition becomes critical. If heart disease is diagnosed in time, heart
failure can be avoided and human life can be saved. Accuracy is of utmost importance in any
disease detection system. The motivation behind this work is to develop a highly accurate
predictive model that predicts the absence or presence of heart disease from symptoms and
heart-related features based on medical parameters, and that can assist healthcare workers.
The goal of this research is to develop an enhanced machine learning model for the prediction of
cardiovascular disease.
5. PROPOSED METHODOLOGY
STEP 1: Data Collection
We will collect the dataset from machine learning or medical repositories such as UCI, Kaggle,
and IEEE DataPort.
STEP 2: Data Preprocessing and Cleaning
In this stage, we'll remove duplicates from the dataset and extract relevant variables using any of
the data preprocessing techniques and feature extraction methods available, such as Principal
Component Analysis, Linear Discriminant Analysis, and Generalized Discriminant Analysis.
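
A minimal sketch of this step, under stated assumptions: the six records and the column names
below are made up for illustration and do not come from any of the repositories named above, and
PCA is used as the feature extraction method simply because it is the first one listed. The
sketch removes duplicate rows with pandas, standardises the features, and projects them onto two
components with scikit-learn.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical records; values and column names are illustrative only.
records = pd.DataFrame(
    {
        "age": [63, 63, 41, 56, 57, 44],
        "resting_bp": [145, 145, 130, 120, 140, 120],
        "cholesterol": [233, 233, 204, 236, 192, 263],
        "max_heart_rate": [150, 150, 172, 178, 148, 173],
        "target": [1, 1, 0, 0, 1, 0],
    }
)

# Remove exact duplicate rows (the first two records are identical).
cleaned = records.drop_duplicates().reset_index(drop=True)
features = cleaned.drop(columns="target")
labels = cleaned["target"]

# Standardise so that no feature dominates because of its units, then reduce with PCA.
scaled = StandardScaler().fit_transform(features)
pca = PCA(n_components=2)
reduced = pca.fit_transform(scaled)
print(len(records), "->", len(cleaned), "rows; reduced shape:", reduced.shape)
```

In practice the scaler and PCA would be fitted on the training split only and then applied to
the test split, so that no information leaks from the test data.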
STEP 3: Choose an appropriate Machine Learning Algorithm
In the medical field, the disease diagnosis process may be thought of as a decision-making process
in which a medical practitioner makes a diagnosis of a new and unknown case based on clinical
evidence and his or her expertise in the area. This decision-making process may be automated to
make it less expensive, simpler, quicker, more accurate, and more efficient.
For heart disease prediction, we will train our model using one of the following: a multilayer
perceptron neural network trained with back-propagation, CNN, RNN, Naive Bayes, Logistic
Regression, or Support Vector Machine. The trained model will then be tested.
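
One way such a choice could be made is sketched below: several of the candidate algorithms are
compared by cross-validated accuracy on the same data. This is only an illustration under
assumptions: the data are synthetic with an invented labelling rule, only three of the listed
algorithms are included, and 5-fold cross-validation with accuracy is one possible selection
criterion among many.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (hypothetical, not patient records).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

# Candidate models; scaling is placed inside the pipeline so it is refitted per fold.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "naive_bayes": GaussianNB(),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

# Mean 5-fold cross-validated accuracy for each candidate.
cv_accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(cv_accuracy, key=cv_accuracy.get)
print(cv_accuracy, "best:", best_name)
```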
STEP 4: Compute the Efficiency with the help of various parameters
Once the model is trained, we will test it and compute its performance using measures such as
Precision, Accuracy, F1 Score, Specificity, Recall, PR Curve, ROC Curve, and Error Rate, and
compare it with existing methods.
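
For reference, the threshold-based measures named above can all be derived from the four counts
of the confusion matrix. The sketch below computes them in plain Python for a binary problem; the
function name `binary_metrics` and the ten example labels are hypothetical. The PR and ROC curves
are not covered here, since they additionally require predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Compute common binary-classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)           # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, total),
        "error_rate": ratio(fp + fn, total),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical ground truth and predictions: TP=3, FN=1, FP=1, TN=5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a disease-detection setting, recall and specificity are worth reporting alongside accuracy,
because a missed case and a false alarm carry very different costs.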
This will be implemented in Python using Jupyter Notebook or Google Colaboratory.
Flowchart of the Proposed System (figure not reproduced). Recoverable stages: Dimension
Reduction using Feature Extraction Methods; Train with the Machine Learning Models; Evaluating
Model.
6. EXPECTED OUTCOME OF RESEARCH
• The proposed model / methodology, which combines machine learning algorithms with
appropriate dimension reduction algorithms, is expected to improve early-stage detection and
to give better results for a given classification algorithm for cardiovascular disease than
the methods currently available.
• The proposed model will reduce the time taken for decision-making. It will remove the
burden of searching for the required data in large amounts of data.
• The proposed model will provide timely and accurate information to the decision-makers.
• The proposed model will help in deciding what action should be taken when a particular
situation occurs. Sometimes many possible actions can be taken in a particular situation. It
is difficult for human beings to choose any one option among the many possible
alternatives.
REFERENCES
1. B. Hamidreza, M. Maryam, R. Amir Masoud (2021) “Deep learning applications for IoT
in health care: A systematic review” Informatics in Medicine Unlocked, Volume 23, 2021,
100550, ISSN 2352-9148, https://doi.org/10.1016/j.imu.2021.100550.
(https://www.sciencedirect.com/science/article/pii/S235291482100040X)
6. Jinjri Wada et al. (2021) “Machine Learning Algorithms for The Classification of
Cardiovascular Disease- A Comparative Study” 2021 International Conference on
Information Technology (ICIT) | 978-1-6654-2870-5/21/$31.00 ©2021 IEEE | DOI:
10.1109/ICIT52682.2021.9491677
7. Khan Ayub and Algarni Fahad (2020) “A Healthcare Monitoring System for the Diagnosis
of Heart Disease in the IoMT Cloud Environment Using MSSO-ANFIS”, IEEE ACCESS,
Digital Object Identifier 10.1109/ACCESS.2020.3006424
8. R. Y. Tamanna, et al (2021) “Early Prediction of Cardiovascular Diseases Using Feature
Selection and Machine Learning Techniques” Proceedings of the 6th International
Conference on Communication and Electronics Systems (ICCES-2021), IEEE Xplore Part
Number: CFP21AWO-ART; ISBN: 978-0-7381-1405-7, DOI:
10.1109/ICCES51350.2021.9489057
9. F. Shaik and V. Duggineni (2020) “Dynamic Heart Disease Prediction using Multi-
Machine Learning Techniques” IEEE Xplore
10. Goel Sakshi et al (2019) “Comparative Analysis of various Techniques for Heart Disease
Prediction” 2019 4th International Conference on Information Systems and Computer
Networks (ISCON), IEEE Xplore
11. Nweke Henry Friday, Teh Ying Wah, Al-garadi Mohammad Ali, Alo Uzoma Rita.(2018)
“Deep learning algorithms for human activity recognition using mobile and wearable
sensor networks: state of the art and research challenges.” Expert Syst Appl Sep 2018;
105:233–61.
12. Islam Riazul, et al. (2015) “The internet of Things for health care: a comprehensive survey”
IEEE Access 2015;3:678–708.
13. K. Divya, et al (2019) “Prediction of Coronary Heart Disease using Supervised Machine
Learning Algorithms” 2019 IEEE Region 10 Conference (TENCON 2019)
14. Shailaja, K., Seetharamulu, B., & Jabbar, M. (2018). Machine Learning in Healthcare: A
Review. Paper presented at the 2018 Second International Conference on Electronics,
Communication and Aerospace Technology (ICECA).
15. S. H. Iqbal, (2021) “Deep Learning: A Comprehensive Overview on Techniques,
Taxonomy, Applications and Research Directions”, SN Computer Science (2021) 2:420
https://doi.org/10.1007/s42979-021-00815-1