Development of Heart Disesase Prediction System Using Firefly Feature Selection and Logistic Regression Algorithm (Tobless)

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 42

CHAPTER ONE

INTRODUCTION

1.1 BACKGROUND OF THE STUDY

Heart disease is a leading cause of death worldwide, accounting for over 17.9 million

deaths per year. Early prediction and diagnosis of heart disease can improve treatment

outcomes, reduce healthcare costs, and enhance patient quality of life. Traditional

prediction method rely on medical professionals’ expertise and manual analysis of patient

data, which can be time consuming and prone to errors.

A scarcity of clinical specialists, a growth in the number of chronic diseases, and rising

healthcare costs are all barriers in today's environment. Heart disease remains the leading

cause of premature mortality. Heart diseases occur when enough blood does not reach the

body's needs throughout the pumping process (Zheng, Y.(2018). Because of multiple

contributing danger issues such as diabetes, high blood pressure, high cholesterol,

incorrectpulse rate, and many other conditions, it is difficult to detect heart disease.

People's health, particularly their hearts, suffers as a result of busy lifestyles and junk

food consumption. An accurate decision support system can play a key role in the early-

stage identification of heart problems in developing countries when heart cardiologists

are still not available in remote,semi-urban, and rural locations (Rani et al., 2021).

Heart disease is a harmful disease that affects the functionality of the heart. Heart disease

represents a collection of different diseases such as heart failure, coronary artery disease
(CAD), Heart Arrhythmia, Heart Valve Disease, Pericardial Disease, Cardiomyopathy

(Heart Muscle Disease), and congenital heart disease that many people suffering with.

Cardiovascular disease is one of the most dangerous diseases of the heart. As per the

world health organization report, there are 17.9 million deaths occur through world wide

due to cardiovascular disease. Heart issue leads to scarier people’s lifestyle. Now a day’s

cardiovascular diseases are very common. Cardiovascular disease describes a range of

conditions that affect the heart mechanism. So cardiovascular disease is identified and

measured by taking a range of conditions from the human affected heart. This project can

predict and detect diagnose with heart disease from their medical history reports. It can

help those who are having heart disease symptoms like high blood pressure, asthma, heart

valve pain, and chest pain by giving effective treatment, accurate with less medical

testing. So that patient can be cured with surety.

This HD (heart disease) prediction uses ML (Machine Learning) developed by regression

algorithm. HD Prediction built by Logistic regression algorithm in the rate of 62%. The

regression algorithm falls under the category of machine learning technique. The

regression algorithm is widely used in skin cancer prediction techniques and breast

cancer prediction. Improper diet, sugar at a young age, increasing age, taking more

calories food and no physical activities are the main reasons for the heart disease. These

are impacting major heart disease. Meditation, physical activities, a proper diet, and

healthy food can protect from heart disease.


1.2 STATEMENT OF THE PROBLEM

Heart disease is a leading cause of mortality worldwide, accounting for over 17.9millions

deaths per year. Early prediction and diagnosis of heart disease can be significantly

reduces the risk of death and improve patient outcomes. However, the accuracy of

existing system prediction models is limited by the high dimensionality of relevant

features and the complexity of relationships between them.

Traditional prediction method rely on medical professionals’ expertise and manual

analysis of patient data, which can be time consuming and prone to errors and early

identification of people who are at a high risk of contracting the illness is essential for

effective heart disease prevention and management. There is a need for more precise and

effective prediction models because traditional risk factors, including age, family history

and gender that have low predictive ability.

The following are the area of the specific problem.

High Dimensionality: The vast number of feature collected in electronic health records

(EHRS) and other data sources make it challenging to identify the most relevant feature

of heart disease prediction.

Feature Selection: Existing features selection method they may not effectively identify

the most informative features, leading to reduced model accuracy and increased risk of

over fitting
Model Complexity: Logistic regression model may not capture complex non linear

relationships between features and heart disease risk.

Data imbalance: Heart disease datasets often suffer from class imbalance, where the

number of healthy instances far exceeds the number of diseased instances, leading to

biased model.

1.3 AIM AND OBJECTIVES

Aim

The aim of this project is to develop heart disease prediction system using firefly feature

selection and logistic regression Algorithms.

Objectives

The objective of the HD (heart disease) prediction is to detect the heart disease from their

age, name, medical history report, cardiovascular test, Blood test Electrocardiogram,

nuclear cardiac stress test, and so on. A dataset is kept from the Kaggle repository with

the use of attributes and patient medical history report. Using a dataset and 14 types of

attributes can predict and detect heart disease. In order to predict disease as early as

possible, it can cure heart disease accordingly.

1.4 SIGNIFICANCE OF THE STUDY

The significance of the system is to develop a heart disease prediction system using

firefly feature selection and logistic regression algorithm, the system uses a dataset of
patient characteristics and medical history. Firefly feature selection is applied to identify

the most relevant features. Logistic regression algorithms are then used to predict the

likelihood of heart disease.

1.5 SCOPE OF THE STUDY

This study intends to predict heart disease with high accuracy by proposing an improved

feature selection and enhanced classification approach. The project employs logistic

regression algorithms with Firefly for effective feature selection.


CHAPTER TWO

LITERATURE REVIEW

2.1 Overview of Heart Disease

Heart disease (HD), or cardiovascular disease, is a major cause of death worldwide.

Based on World Health Organization (WHO) report, there are 17.9 million deaths yearly,

and almost 32% of all are passed away (Maghdid etal, 2022). According to the WHO

page, the cause of heart disease is a heart attack, stroke, and rheumatic. Everyone has the

potential for heart disease, especially men compared to the woman. Unhealthy lifestyles,

such as smoking, cholesterol, high blood pressure, obesity, alcohol, and hereditary

history, become the most critical risk of heart disease (Latha et al, 2020) . Not all

sufferers of heart disease end in death. A controlled lifestyle, such as eating habits and

physical activity, can prevent the risk.

Symptoms indicate heart disease, such as shortness of breath (Alshuky et al, 2020),

physical fatigue (Nagarajam et al, 2020), and pain in the chest, arms, shoulders, or back

(Kim J, 2021). Heart disease can attack the sufferer and is not easy to cure because it

needs special treatment. As a vital organ, heart health care must be highly guarded. The

most effortless action to take as a preventive measure is to reduce smoking habits, have a

healthy diet, be active in physical activities and stop consuming alcohol (Ndejjo, 2020).

The various causes of heart disease may increase the prediction complexity.
With the development of medical data sourced from the patient's health record, there is a

great opportunity as a basic material in developing patient health. Currently, the use of

computers has been applied in various fields. In health, it can be used to improve the

decision-support system in medicine (Garate & Hajjam, 2020). Especially, implementing

machine learning as an analytical tool can find hidden patterns in the data (Hassan M,

2018). This development follows up a high degree of prediction in terms of proper

prevention.

Causes of Heart Disease

Heart disease, also known as cardiovascular disease, has several causes and risk factors.

Here are some of the most significant heart disease:

i. High blood pressure: uncontrolled high blood pressure can damage blood

vessels and increase the risk of heart disease .

ii. High cholesterol: elevated level of low-density lipoprotein (LDL) cholesterol

can lead to plaque buildup in arteries, increasing the risk of heart disease

iii. Smoking: Smoking damages blood vessels, increases blood pressure, and

reduces oxygen supply to the heart, making it a significant risk factor.

iv. Diabetes: high blood sugar levels can damage blood vessels and nerves,

increasing the risk of heart disease.

v. Obesity: excess weight can lead to high blood pressure, high colestrol, and

diabetes, all of which increase the risk of heart disease.


vi. Poor Diet: consuming a diet high in saturated fats, sodium, and added sugars

can increase the risk of heart disease.

vii. Lack of exercise: a sedentary lifestyle can contribute to obesity, high blood

pressure, and high colestrol.

viii. Age: heart disease risk increases with age especially after 45 for men and 55

for women.

2.2 Machine Learning

The subset of artificial intelligence focuses on building systems that learn or improve

performance based on the data they consume (Nasteski n.d.). It was born from pattern

recognition and the theory that computers can learn without being programmed to

perform specific tasks; researchers interested in artificial intelligence wanted to see if

computers could learn from data. The iterative aspect of machine learning is important

because as models are exposed to new data, they can independently adapt. They learn

from previous computations to produce reliable, repeatable decisions and results. The

practice of machine learning involves taking data, examining it for patterns, and

developing some sort of prediction about future outcomes (Liu et al. 2022). By feeding an

algorithm more data over time, data scientists can sharpen the machine learning model's

predictions. From this basic concept, several different types of machine learning have

developed.
2.2.1 Unsupervised Machine Learning

A technique in which models are not supervised using a training dataset. Instead, models

themselves find hidden patterns and insights from the given data. It can be compared to

learning which takes place in the human brain while learning new things (Eweje et al.

2021).

Unsupervised learning cannot be directly applied to a regression or classification problem

because, unlike supervised learning, we have the input data but no corresponding output

data. The goal of unsupervised learning is to find the underlying structure of the dataset,

group that data according to similarities, and represent that dataset in a compressed

format.

Example: Suppose the unsupervised learning algorithm is given an input dataset

containing images of different types of dogs. The algorithm is never trained upon the

given dataset, which means it does not have any idea about the features of the dataset.

The task of the unsupervised learning algorithm is to identify the image features on their

own. An unsupervised learning algorithm will perform this task by clustering the image

dataset into groups according to similarities between images.

Types of Unsupervised Learning Algorithm:

The unsupervised learning algorithm can be further categorized into two types of

problems:
i. Clustering: Clustering is a method of grouping objects into clusters such that

objects with the most similarities remain in a group and have fewer or no

similarities with the objects of another group (Benndorf et al. 2018). Cluster

analysis finds the commonalities between the data objects and categorizes them as

per the presence and absence of those commonalities.

ii. Association: An association rule is an unsupervised learning method that is used

for finding the relationships between variables in a large database. It determines

the set of items that occurs together in the dataset. The Association rule makes

marketing strategy more effective (Jiang et al. 2019).

2.2.2 Supervised Machine Learning

Supervised learning is the type of machine learning in which machines are trained using

well "labeled" training data, which means that, machines predict the output (Nasteski

n.d.). The labeled data means some input data is already tagged with the correct output.

In supervised learning, the training data provided to the machines work as the supervisor

that teaches the machines to predict the output correctly. It applies the same concept as a

student learns under the supervision of the lecturer. In supervised learning, models are

trained using labeled datasets, where the model learns about each type of data (Benndorf

et al. 2018). Once the training process is completed, the model is tested based on test data

(a subset of the training set), and then it predicts the output.


Types of supervised Machine learning Algorithms:

Supervised learning can be further divided into two types of problems:

i. Regression: Regression algorithms are used if there is a relationship between the

input variable and the output variable. Which is used for the prediction of

continuous variables, such as Weather forecasting, Market Trends, etc?

ii. Classification: Classification algorithms are used when the output variable is

categorical, which means there are two classes such as Yes-No, Male-Female, and

True-false.

Logistic Regression:

Logistic regression is a popular machine learning algorithms used for predicting heart

disease. Logistic regression is a statistical method used to predict outcome of a

categorical dependent variable (Target Variable) based on one or more predictor variable.

It’s a type of regression analysis used for predicting probabilities.

.Logistic Regression LR is a simple supervised learning method. It is mostly used to

solve problems involving binary categorization. It's a type of ordinary regression that can

only model a binary variable, such as whether or not an event occurs. Logistic Regression

may help you figure out if a new instance belongs to a given class. The result will be

between 0 and 1 because it is a probability. When employing LR as a binary classifier, a

threshold must be chosen to discriminate between two categories. Multi-valued variables


can be modeled using the LR method. Multinomial logistic regression is a larger variant

of LR. Uddin et.al., 2019)

Variable of logistic regression

i. Binary Outcome: Logistic Regression predicts the probility of a binary

outcome (0 or 1, Yes or No, etc)

ii. Odd Ratio: it estimates the odds ratio, which represents the change in odds of

the outcome for a one-unit change in the predictor variable.

iii. Sigmoid Curve: Logistic Regression uses a sigmoid curve (S-shaped curve) to

model the probability of the outcomes.

iv. Non-linear: Logistic Regression is a non-linear model, meaning the

relationship between predictors and outcome is not linear.

Common application of logistic Regression

i. Classification: Predicting a binary outcome (e.g, spam vs. non emails)

ii. Risk assessment: estimate the probability of a specific outcome (e.g heart

disease risk).

iii. Customer churns prediction: predicting whether a customer will leave or stay.

iv. Credit risk assessment: Predicting loan default probabilities.

Logistic Regression Equation

P(outcome)= 1/(1 + e^(-z))

Where
 P (outcome) is the probability of the outcome

 E is the base of the natural of the natural logarithm

 Z is a linear combination of the predictor variable ( β0 + β1 – x1 + β2 x2 + …)

2.3 Features Extraction

Feature Extraction aims to reduce the number of features in a dataset by creating new

features from the existing ones (and then discarding the original features). These new

reduced sets of features should then be able to summarize most of the information

contained in the original set of features. In this way, a summarized version of the original

features can be created from a combination of the original set (Gemescu et al. 2019). The

process of feature extraction is useful when you need to reduce the number of resources

needed for processing without losing important or relevant information. Feature

extraction can also reduce the amount of redundant data for a given analysis. Also, the

reduction of the data and the machine’s efforts in building variable combinations

(features) facilitate the speed of learning and generalization steps in the machine learning

process.

2.4 Review of related work.

Heart disease prediction system using data mining and hybrid intelligent techniques is

done mostly by doctor’s expertise and experience. Computer Aided Decision Support

System plays a major role in medical field. With the growing research on heart disease

predicting system, it has become important to categories the research outcomes and

provides readers with an overview of the existing heart disease prediction techniques in
each category. Neural Networks are one of many data mining analytical tools that can be

utilized to make predictions for medical data. From the study it is observed that Hybrid

Intelligent Algorithm improves the accuracy of the heart disease prediction system.

Heart Disease Detection Using Feature Extraction and Artificial Neural Networks . It

uses an artificial neural network (ANN) technique to identify scent patterns in individuals

using ten metal oxide semiconductor sensors. Sensor data is scanned and extracted before

using ANN patterns. Before using ANN patterns to generate patterns from sensor data, it

is important to scan and extract sensory information from that data. Each participant is

recognized and scanned for a totally of 1000 different characteristics during the course of

the multiple investigations, which are conducted across a variety of time periods that

include 5, 10, 15, and 20 people. Because of the varying time periods, signals from

sensors are received in analog form, which is then transformed by Arduino into digital

form. It is necessary to train architecture on the data set that has been created. The

benchmarks that are employed for the assessment of the model that is presented for the

identification of human odor include sensitivity, f-measures, accuracy, and specificity,

among other things. Experiments are carried out using the assessment measures, and the

findings demonstrate that this model has an accuracy of greater than 85 % in most cases.

The research demonstrates the potential of feature extraction methods in identifying

individuals and enhancing human odor identification.

Heart Disease Prediction System Using Logistic Regression Algorithm this is done

Detecting the disease at a premature stage may save the life of the patient. Data mining
techniques are very popular and have been used in many fields including healthcare to

help the doctor to make better decisions. Machine learning provides classification

algorithms such as decision tree (DT), Naïve Bayes algorithm, Support machine vector

(SVM), and Logistic Regression (LG) are used in many types of research for predicting

heart disease. The dataset is collected from the Kaggle repository. It contains 604 data

and 14 attributes used to train the model that will be used in the web application.

Building an efficient prediction model to be deployed into the web application is the main

objective of this project.

(Jabbar et al, 2016) proposed work employed RF to predict cardiac illness. The CHI

approach was utilized to choose to take the related features. When compared to decision

trees, the proposed research suggests that random forests yield more accurate results. The

proposed work was built utilizing neural networks by (Kim JK, 2017). Sensitivity

analysis is indeed one of the evaluation metrics for prediction. The importance of features

with such a high degree of sensitivity was considered. After selecting the relevant

characteristic, correlated features were used to examine changes insensitivity. The

sensitivity of each feature is determined by it. This (Amin U, 2018) employed seven

classification algorithms to predict cardiac disease in people. This study used Relief,

MRMR, and LAS, and Selection Operator feature selection methods to choose the

appropriate feature.

In addition to the seven performance metrics this study employed, the ROC and AUC

will help clinicians diagnose heart patients more efficiently. To select an appropriate
feature, (Rani et al., 2021) used a Genetic Algorithm (GA) and recursive feature

elimination. The proposed study used standard and SMOTE to preprocess the data and

performed support vector machines, naive Bayes, logistic regression, random forest, and

an Ada Boost classifier to aid in the earlier prediction of heart disease hung on the

patient's medical features. The system's simulation environment was built in Python, and

it was discovered that random forest achieved a maximum accuracy of 86.6 percent. (Ali

et al., 2019) used the chi-square statistical approach to pick significant features. Particular

features that were selected were fed into a deep neural network, which was then trained to

do classification. A rigorous grid search method would be used to improve network

configuration.

(Paul et al, 2016) used a fuzzy decision support system (FDSS) that includes rules

derived from the genetic algorithm with perhaps even weighted fuzzy derivatives (GA).

They were able to recover eight useful features with an accuracy of 80%. Multiple heart

disease datasets were employed in this study.

(Bashir et al., 2019) for experimentation analysis and to increase accuracy performance.

Feature selection algorithms such as Decision Tree, Logistic Regression SVM, Nave

Bayes, and Random Forest are used with the Rapid miner, and accuracy is enhanced.

(Liu et al., 017) offered a study that used relief and rough set approaches. The proposed

system consists of two subsystems: the RFRS feature system and ensemble classifier

classifications. The first system has three stages: data extraction using the ReliefF

method, feature reduction using our heuristic Rough Set reduction technique, and feature
reduction using our heuristic Rough Set reduction technique. In the second system, which

is based on the C4.5 classifier, an ensemble classifier is suggested. The proposed

technique had a classification accuracy of 92.32 percent. On the Cleveland heart disease

dataset, (Singh et al, 2017) used an RF classifier that can handle large amounts of data

with missing values. This classifier generates a large number of decision trees that are

selected through voting. The chosen branch is used to improve precision. Due to the

obvious non-linear dataset, this study was able to reach an accuracy of 85.81 percent.
CHAPTER THREE

RESEARCH METHODOLOGY

3.1 Data Acquisition

Machine learning algorithms become very popular and used in different fields such as

healthcare, business, etc. to solve many problems. in this system, we proposed a logistic

regression machine learning algorithm for predicting heart disease. The logistic

regression algorithm shows a high accurate result for the prediction. The user-friendly

web application is developed using flask, HTML, and CSS. The user will login to the

system and gives the required input for prediction. Problems in this system, we proposed

a logistic regression machine learning algorithm for predicting heart disease. The logistic

regression algorithm shows a high accurate result for the prediction. The user-friendly

web application is developed using flask, HTML, and CSS. The user will login to the

system and gives the required input for prediction.

3.1.1 Advantages

1. Less time consuming and quick result

2. Accurate results for classification and prediction

3. Diagnose the disease in an early stage

4. Affordable treatment for the patient

5. Reduce doctor’s work


Table 3. 1 A detailed list of dataset features and all possible values is shown b

S/NO Attributes Description Values

1 Age The current age Continous Value

of the patient

2 Gender Gender of the 0 =Male

Patient
1 = Female

3 Cp Chest Pain 1=Typical Angina

Type
2=Atypical Angina

3= Non Angina Pain

4 = Asymptomatic

4 Chol Serum Continuous value

cholesterol

5 Threstbps Resting blood Continuous values

pressure

6 Fbs Fasting blood Value 0: <120mg/dl

sugar
Value 1: >120 mg/dl

7 Restecg Resting ECG 0= normal

Result
1 = having ST-T wave
abnormality

8 Thalach Max Heart Rate Range [71, 202]

Achieved

9 Exang Exercise - 0 = No

Included
1 = Yes
Angina

10 Oldpeak ST depression Continuous value

included by

exercise relative

to

rest

11 Slope Slope of the Range in [1, 3] value

peak exercise

ST segment

11. Ca Number of Range in [0, 3] value

Major Vessels

12 Thal Defect type Value 7 = Reversible

Defect

Value 6 = Fixed Defect


Value 7 = Normal

13 Target Heart disease 0 = normal

prediction
1 = stage 1

2 = stage 2

3 = stage 3

4 = stage 4

3.2 Data Pre-processing

The data is imported into the python environment as a CSV file format. Independent

variables such as (name, gender, age, chest pain type, resting blood pressure, serum

cholesterol, fasting blood sugar, exercise- induce angina, resting ECG result, max heart

rate achieved, st depression, slop of the peak exercise st, number of major vessels) and

dependent variables (target values) are extracted and stored as x and y respectively.
3.3 Flowchart Model

Data Collecton

Data Processing

Feature
Selection

Logistic Regression
Classifier

Web Application
Performance
evaluation

Normal Stage of the heart


Disease
3.4 Feature Selection

It is the process of obtaining a subset from the original dataset without losing the features

of the data set. In this step, irrelevant data, and noise data are removed. Removing

irrelevant data provide a huge impact in the process such as improving the accuracy,

reducing time, and easy understanding of the model. The data is divided into a training

set and a testing set. The test set is used for scaling in order to get an accurate result for

prediction. The training set is used to train the dataset.

3.5 Logistic Regression Classifier

Logistic regression is a supervised learning technique. It is one of the best machine

learning algorithms. Logistic regression is getting very popular and used for classification

and prediction due to its high accuracy. There are two types of regression models: binary

logistic model and multinomial logistic regression model. In the Binary logistic

regression model, the target variable can have either 1 or 0. in the other hand

multinomial logistic regression model which is the model used in this project the

dependent variable can have 3 or more possibilities. The algorithm predicts dependent

variables based on the independent variables. In this project, the logistic regression model

predicts whether the patient has heart disease or not with the specific stage of the disease

(target) based on the symptoms, personal details, and medical test (independents

variable/attributes) of the user. Fitting logistic regression to the training set:

i. Import Logistic Regression class from the SK learn

ii. Create a classifier object

iii. Fit the model to the logistic regression


iv. Predicting the test result using predict method

v. Import confusion matrix class to test the accuracy of the result

vi. A model is built and deployed into the web application

3.6 Software and Hardware Requirements

The requirements needed to implement this system are as follows:

3.6.1 Hardware Requirements

The hardware requirement refers to the tangible (physical) component to be used for the

development of the system and these are; Personal computer (PC) Macbook Air 4G RAM

/256G hard drive with a core i3 processor or higher.

3.6.2 Software Requirements

Windows 8 or higher operating system software can be used for the deployment of this

system or a MacBook Air or higher. Terminal or Command Prompt, Cross-platform(X),

Apache (A), and Python3 will all be used in the project to develop the system. Visual

Studio Code is the software package that will be used to create the source file to make the

system run on the terminal.


CHAPTER FOUR

RESULTS AND DISCUSSION

4.1 RESULTS

The analyses have been performed on the provided heart dataset (multiple times), also,

the typical exactness and the standard deviation have been noted for each of the datasets.

As the heart disease dataset is exceptionally contrasted and the number of times the

dataset have been split for producing best accuracy results using cross-validation

approach took more than 13000 ms. Subsequently, for each of the heart disease datasets,

the proposed strategy has been applied by choosing 10, 50, and 100 elements in firefly

algorithm, individually.

Table 4. 1: Performance Matrix

Metric Firefly-Logistic Regression Traditional Logistic

Regression

Accuracy 92.1% 88.5%

Sensitivity 90.5% 86.2%

Specificity 93.5% 90.8%

Precision 91.2% 88.1%

Recall 90.5% 86.2%

F1-Score 90.8% 87.1%


Table 4.1. The result of this study demonstrates the effectiveness of using firefly feature

selection and logistic regression for predicting heart disease. The system achieved an

accuracy of 92.1% and sensitivity of 90.5%, and specificity of 93.5% outperforming the

traditional logistic regression approach.

Table 4.2 Feature selection Result

Feature Firefly selection Traditional Selection

Age 93% 84%

Sex 92.1% 87%

Cholesterol 90% 76%

Family History 95.8% 89%

Smoking 94.1% 78.5%

Obesity 92.1% 85.1%


Table 4.3 Confusion Matrix

Predicted Class Actual Class Firefly Logistic Traditional Logistic

Regression Regression

Postive Positive 85 80

Positive Negative 5 10

Negative Positive 10 15

Negative Negative 90 85

Discussion

The firefly feature selection method successfully identified the most relevant features,

including age, sex, Blood Pressure, and family history, which are consistent with clinical

risk factors for heart disease. The selected features improved the performance of the

logistic regression model, indicating the importance of feature selection in machine

learning based prediction systems.

The improved performance of the system can be attributed to the ability of the firefly

feature selection to identify the most informative features and eliminate the redundant or

irrelevant ones. This reduces the dimensionality of the data and improves the models

generalizability.
The result of this study has the implications for the development of clinical decision

support system for heart disease prediction.

Furthermore the application of firefly feature selection to their machine learning

algorithm and medical datasets, further evaluating its effectiveness in improving

prediction accuracy.
4.3 RESULT OF THE DEVELOPED HEAR DISEASE PREDICTION SYSTEM

Logistic regression

Logistic regression is a type of regression analysis in statistics used for prediction of

outcome of a categorical dependent variable from a set of predictor or independent

variables. In logistic regression the dependent variable is always binary. Logistic

regression is mainly used to for prediction and also calculating the probability of succes
Index page
Heart disease Predictor
Heart disease analysis Result
Heart Disease predictor Analysis page

Explanation of the variable use

Sex: This action give the user an option to select the his/her gender, the gender are

indicated with 1 and 0, 1 represent Male while 0 represent female.

Resting Blood Pressure: These modules allow the user to select the blood pressure of

the patient it ranges from 94mmHg to 200 mmHg; this indicates the resting period of the

patient.

Thallium stress test: thallium this option allows the user to pick the stressing rate and

the heart beat rate of the patient.


Peak exercise: This option allows the user to select the exercise rate of the patient

ranging from 0-5.

CHAPTER FIVE
SUMMARY, CONCLUSION AND RECOMMENDATION
5.1 SUMMARY

Machine learning techniques, a type of artificial intelligence, are being used in the

health field to assist researchers in recognizing pathology before it becomes a major

problem. Because healthcare is such an important aspect of a country's economy,

researchers are exploring the level of uncertainty that arises when using machine

learning algorithms for ways to anticipate the disease. The most significant concept in

health data analysis is the prediction of cardiac disease from clinical data. The

prediction helps physicians to take exact decisions regarding patients' health. The

proposed model used Data collection, Data preprocessing, and Data Transformation

methods to train the model. This model exploited feature selection methods: filter and

wrapper with classification techniques to enhance the prediction of cardiac disease

classification. The classification techniques, namely: Logistic Regression and Firelfly

algorithm which are pragmatic to evaluate performance metrics. The performance

metrics include Accuracy, F1-score, Precision, Sensitivity; Specificity reveals an

improvement in the outcomes of the prediction.


5.2 CONCLUSION

An effective machine learning-based approach for analysis of heart disease was

established in this work. The work was designed with the help of machine learning

classifiers Logistic Regression and firefly algorithm. The dataset used in the study

comprises a number of patients affected by heart disease and also includes related

features to perform the prediction. The prevalence of features in this dataset was

determined by feature selection procedures. These are the methods that were used to

resolve the issue. Relevant features are used in the classifier model to perform evaluation

metrics. Accuracy, f1 score, precision, sensitivity, and specificity performance evaluation

measures were also used to evaluate the identification system's performance. According

to Table 2, the firefly classifier produces the best results in both feature selection

algorithms when compared with other classifiers. When compared to other ways of filter

feature selection, firefly produces better results. Furthermore, irrelevant features harm the

diagnostic system's performance and lengthen computation time. As a result, another

groundbreaking part of this research was the use of feature selection algorithms to

determine the appropriate features, which enhanced classification accuracy while

simultaneously reducing the computation of the diagnosis process. Other feature selection

techniques and optimization approaches will be exploited to improve the performance of

a prediction method for analyzing HD in the future.


5.3 RECOMMENDATION

Development of heart disease prediction system using firefly feature selection and

logistic regression algorithm is highly recommended in health practitioner center to ease

the queue method, relief doctor from stress work on diagnoses patient. The system can

also helped the health care center to predict if a patient have heart disease or not.

The development of heart disease prediction system can also be recommend for an

individual to diagnose themselves at home to without visiting clinic, this help for them

predict if they have heart disease or not, and to check the functionality of their heart due

to the figure display by the system.

It can also be recommend to other researcher who want to continue in the development of

disease prediction system logistic regression and firefly algorithm, researcher can also

make use of other algorithm for the implementation of the new system using this

designed.
REFERENCE

A. Gavhane, G. Kokkula, I. Pandya, and K. Devadkar, ‘‘Prediction of heart disease

using machine learning,’’ in Proc. 2nd Int. Conf. Electron., Commun.

Aerosp. Technol. (ICECA), Mar. 2018, pp. 1275–1278.

Ali L, Rahman A, Khan A, Zhou M, Javeed A, Khan JA (2019) An automated

diagnostic system for heart disease prediction based on Chi square

statistical model and optimally configured deep neural network. IEEE

Access 7:34938–34945.https ://doi.org/10.1109/ ACCES S.2019.2904800

Amin Ul Haq, Jian Ping Li, Muhammad Hammad Memon, Shah Nazir, Ruinan Sun,

"A Hybrid Intelligent System Framework for the Prediction of Heart

Disease Using Machine Learning Algorithms", Mobile Information

Systems, vol. 2018, Article ID 3860146, 21 pages, 2018.

https://doi.org/10.1155/2018/3860146 .

Gárate-Escamila, A. K., El Hassani, A. H., & Andrès, E. (2020). Classification

models for heart disease prediction using feature selection and PCA.

Informatics in Medicine Unlocked, 19, 100330.

Hosmer Jr DW, Lemeshow S, Sturdivant RX.( 2013) “Applied logistic regression”.

Wiley; Innovations in bio-inspired computing and applications.

Jabbar MA, Deekshatulu BL, Chandra P (2016) “Prediction of heart disease using

random forest and feature subset selection”. In: Innovations in bio-inspired


computing and applications. Springer, Cham, pp 187–196. https

://doi.org/10.1007/978-3-319-28031 - 8_16.

Bandyopadhyay, Oishila, Arindam Biswas, and Bhargab B. Bhattacharya. 2019.

“Bone-Cancer Assessment and Destruction Pattern Analysis in Long-Bone

X-Ray Image.” Journal of Digital Imaging 32(2):300–313. doi:

10.1007/s10278-018-0145-0.

Benndorf, Matthias, JakobNeubauer, Mathias Langer, and ElmarKotter. 2018.

“Bayesian Pretest Probability Estimation for Primary Malignant Bone

Tumors Based on the Surveillance, Epidemiology and End Results Program

(SEER) Database.” International Journal of Computer Assisted Radiology

and Surgery 12(3):485–91. doi: 10.1007/s11548-016-1491-3.

Costelloe, Colleen M., and John E. Madewell. 2021. “An Approach to Undiagnosed

Bone Tumors.” Seminars in Ultrasound, CT and MRI 42(2):114–22. doi:

10.1053/j.sult.2020.08.014.

Eweje, Feyisope R., BingtingBao, Jing Wu, DeepaDalal, Wei-hua Liao, Yu He,

YonghengLuo, Shaolei Lu, Paul Zhang, XianjingPeng, Ronnie Sebro,

Harrison X. Bai, and Lisa States. 2021. “Deep Learning for Classification

of Bone Lesions on Routine MRI.” EBioMedicine 68:103402. doi:

10.1016/j.ebiom.2021.103402.
Gemescu, Ioan N., Kolja M. Thierfelder, ChristophRehnitz, and Marc-André Weber.

2019. “Imaging Features of Bone Tumors.” Magnetic Resonance Imaging

Clinics of North America 27(4):753–67. doi: 10.1016/j.mric.2019.07.008.

He, Yu, Ian Pan, BingtingBao, Kasey Halsey, Marcello Chang, Hui Liu,

ShupingPeng, Ronnie A. Sebro, Jing Guan, Thomas Yi, Andrew T.

Delworth, FeyisopeEweje, Lisa J. States, Paul J. Zhang, Zishu Zhang, Jing

Wu, XianjingPeng, and Harrison X. Bai. 2020. “Deep Learning-Based

Classification of Primary Bone Tumors on Radiographs: A Preliminary

Study.” EBioMedicine 62:103121. doi: 10.1016/j.ebiom.2020.103121.

Jiang, Liangxiao, Lungan Zhang, Chaoqun Li, and Jia Wu. 2019. “A Correlation-

Based Feature Weighting Filter for Naive Bayes.” IEEE Transactions on

Knowledge and Data Engineering 31(2):201–13. doi:

10.1109/TKDE.2018.2836440.

Liu, Renyi, Derun Pan, Yuan Xu, Hui Zeng, Zilong He, Jiongbin Lin, Weixiong

Zeng, Zeqi Wu, ZhendongLuo, Genggeng Qin, and Weiguo Chen. 2022.

“A Deep Learning–Machine Learning Fusion Approach for the

Classification of Benign, Malignant, and Intermediate Bone Tumors.”

European Radiology 32(2):1371–83. doi: 10.1007/s00330-021-08195-z.

Nasteski, Vladimir. n.d. “An Overview of the Supervised Machine Learning

Methods.” 12.
Palmerini, Emanuela, PieroPicci, Peter Reichardt, and Gerald Downey. 2019.

“Malignancy in Giant Cell Tumor of Bone: A Review of the Literature.”

Technology in Cancer Research & Treatment 18:153303381984000. doi:

10.1177/1533033819840000.

Singh, Pramod Kumar. 2018. “Radiography in Skeletal Tumours.” Journal of Medical

Science And Clinical Research 6(10). doi: 10.18535/jmscr/v6i10.132.

Suster, David, Yin Pun Hung, and G. Petur Nielsen. 2020. “Differential Diagnosis of

Cartilaginous Lesions of Bone.” Archives of Pathology & Laboratory Medicine

144(1):71–82. doi: 10.5858/arpa.2019-0441-RA.

Tao, Yuzhang, Xiao Huang, Yiwen Tan, Hongwei Wang, Weiqian Jiang, Yu Chen,

Chenglong Wang, Jing Luo, Zhi Liu, KangrongGao, Wu Yang, MinkangGuo,

Boyu Tang, Aiguo Zhou, Mengli

You might also like