0% found this document useful (0 votes)

9 views

A Genetic Algorithm-Based Feature Selection Approach For Diabetes Prediction

Genetic algorithms have emerged as a powerful optimization technique for feature selection due to their ability to search through a vast feature space efficiently. This study discusses the importance of feature selection for prediction in healthcare and prominently focuses on diabetes mellitus. Feature selection is essential for improving the performance of prediction models, by finding significant features and removing unnecessary among them. The study aims to identify the most informative subset of features. Diabetes is a chronic metabolic disorder that poses significant health challenges worldwide. For the experiment, two datasets related to diabetes were downloaded from Kaggle and the results of both (datasets) with and without feature selection using the genetic algorithm were compared. Machine learning classifiers and genetic algorithms were combined to increase the precision of diabetes risk prediction. In the preprocessing phase, feature selection, machine learning classifiers, and performance metrics methods were applied to make this study feasible. The results of the experiment showed that genetic algorithm + logistic regression i.e., 80% (accuracy) works better for PIMA diabetes, and for Germany diabetes dataset genetic algorithm + random forest and genetic algorithm + K-Nearest Neighbor i.e., 98.5% performed better than other chosen classifiers. The researchers can better comprehend the importance of feature selection in healthcare through this study.

Uploaded by

IAES IJAI

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

9 views

A Genetic Algorithm-Based Feature Selection Approach For Diabetes Prediction

Uploaded by

IAES IJAI

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

IAES International Journal of Artificial Intelligence (IJ-AI)

Vol. 13, No. 2, June 2024, pp. 1489~1498

ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i2.pp1489-1498  1489

A genetic algorithm-based feature selection approach for

diabetes prediction

Kirti Kangra1, Jaswinder Singh2

1
Department of Computer Science and Engineering, Research Scholar of Guru Jambheshwar University of Science and Technology,
Hisar, India
2
Department of Computer Science and Engineering, Faculty of Guru Jambheshwar University of Science and Technology, Hisar, India

Article Info ABSTRACT

Article history: Genetic algorithms have emerged as a powerful optimization technique for
feature selection due to their ability to search through a vast feature space
Received Jul 28, 2023 efficiently. This study discusses the importance of feature selection for
Revised Oct 16, 2023 prediction in healthcare and prominently focuses on diabetes mellitus.
Accepted Nov 15, 2023 Feature selection is essential for improving the performance of prediction
models, by finding significant features and removing unnecessary among
them. The study aims to identify the most informative subset of features.
Keywords: Diabetes is a chronic metabolic disorder that poses significant health
challenges worldwide. For the experiment, two datasets related to diabetes
Chronic disease were downloaded from Kaggle and the results of both (datasets) with and
Feature selection without feature selection using the genetic algorithm were compared.
Genetic algorithm Machine learning classifiers and genetic algorithms were combined to
Health-care increase the precision of diabetes risk prediction. In the preprocessing phase,
Machine learning feature selection, machine learning classifiers, and performance metrics
Optimisation methods were applied to make this study feasible. The results of the
experiment showed that genetic algorithm + logistic regression i.e., 80%
(accuracy) works better for PIMA diabetes, and for Germany diabetes
dataset genetic algorithm + random forest and genetic algorithm + K-Nearest
Neighbor i.e., 98.5% performed better than other chosen classifiers. The
researchers can better comprehend the importance of feature selection in
healthcare through this study.
This is an open access article under the CC BY-SA license.

Corresponding Author:
Kirti Kangra
Department of Computer Science and Engineering Guru Jambheshwar University of Science and Technology
Hisar, Haryana, India
Email: [email protected]

1. INTRODUCTION
Feature selection is a crucial step in machine learning (ML) and data analysis, which involves
selecting the most pertinent and instructive features (variables or attributes) from a broader set of potential
features [1]. The goal of feature selection is to improve the performance of an ML model by reducing
dimensionality, mitigating the risk of overfitting, speeding up the training process, and providing an optimal
solution using Optimisation strategies. It can also help in improving model interpretability and reducing
noise. Some common methods of optimization strategies for feature selection are:
Filter methods: These methods assess the relevance of each feature independently of the others and
select features based on statistical measures [2]. Some common filter methods include correlation, chi-
squared test, and variance thresholding. Wrapper methods: These methods involve training and evaluating
the ML model with different subsets of features to determine which combination yields the best performance

Journal homepage: http://ijai.iaescore.com

1490  ISSN: 2252-8938

[3]. Common wrapper methods include: Forward selection, backward elimination and recursive feature
elimination (RFE). Embedded methods: Some ML algorithms have built-in feature selection mechanisms.
For example L1 regularization (Lasso), tree-based methods, dimensionality reduction techniques, domain
knowledge, feature importance from tree-based models, and sequential feature selection.
Metaheuristic algorithms: These are optimisation techniques used to find solutions to complex
problems by exploring the search space efficiently. In the context of feature selection, metaheuristic
algorithms can be employed to search for an optimal or near-optimal subset of features [4]. These algorithms
aim to find the best subset of features by evaluating different combinations based on a fitness or objective
function such as genetic algorithms, particle swarm optimisation, or simulated annealing.
Hybrid methods: Combine multiple feature selection techniques to take advantage of their strengths
and mitigate their weaknesses [5]. For example, using a filter method to preselect a subset of relevant features
and then applying a wrapper method for fine-tuning. The type of data being used, the problem being addressed,
and the available computational resources all influence the feature selection. It's often a crucial part of the
feature engineering process in ML projects as it can significantly impact model performance and efficiency.
This study primarily focuses on the genetic algorithm (GA) for feature selection to predict diabetes
mellitus with the help of different ML classifiers. The GA [6], a search heuristic, is based on Charles
Darwin's idea of natural evolution that replicates the process of natural selection in which the fittest
individuals are chosen for procreation to develop offspring for the future generation. GA is part of
metaheuristic algorithm that is used in feature selection to optimize the solutions to different computer
science problems. The key components of a metaheuristic algorithm are intensity and dispersion. To
effectively address the real situation, a balance between these components is essential. Metaheuristic
algorithms fall into two main categories in computer science: single-solution and population-based
metaheuristic algorithms. GA comes under population-based metaheuristic algorithm. It generally
incorporates biologically inspired operators like mutation, crossover, and selection to optimize and solve
search problems with high-quality solutions. It is frequently employed in problem-solving, research, and ML.
The correct sequence of operators (discussed below) is required to be followed while implementing GA
(shown in Figure 1 [7]):
− Initialization/Population: In the first step of the GA, the initial population is produced at random for each
unique solution. The population is determined by the type of problem, which may have a number of
solutions. For the subsequent steps, some encoding schemes must be implemented at this step.
− Selection: The next phase in GA is the reproductive phase. It randomly selects chromosomes from a
population, based on an objective function. The objective functions are used to select individuals through
survival of the fittest [8].
− Crossover: To produce a new chromosome or offspring, two chromosomes are fused. This procedure is
used to develop a new offspring after selection.
− Mutation: It alters the value of one or more genes on the chromosome. This operator randomly flips few
chromosome bits [9].
− Fitness Function/Objective Function: It examines various chromosomes to determine which is optimal.
This study used the fitness function noted:

𝐹(𝑡) = 𝑒𝑣𝑎𝑙(𝑧) (1)

𝑎+𝑏
Where 𝑧 =
𝑎+𝑏+𝑐+𝑑
a = observation is positive & predicted to be positive
b = observation is negative & predicted to be negative
c = observation is negative & predicted to be positive
d = observation is positive & predicted to be negative
There are various encoding schemes for the chromosome representation (see Figure 1.). In this study
Binary encoding scheme is used in which chromosomes are encoded as binary strings and have two potential
gene variants, 0 and 1. It is assumed that chromosomes are points in the solution space. These are handled by
repeatedly replacing its population with genetic operators. Nowadays, GA is used to solve real-world
problems emanating from a variety of disciplines, including economics, medicine, politics, management, and
engineering. This study focuses on the use of GA in the healthcare sector.
The world is currently dealing with a number of chronic diseases, including diabetes, cancer,
tuberculosis, and heart disease. It is essential to find these diseases early on. These diseases must be endured
for a very long time by the sufferer and are spreading more widely every day. To control these diseases, more
study is needed. Among them, for this article, we have chosen Diabetes chronic disease. GA and diabetes are
the topics of this research.

Int J Artif Intell, Vol. 13, No. 2, June 2024: 1489-1498

Int J Artif Intell ISSN: 2252-8938  1491

Predicting and categorizing diabetes mellitus is one of the most difficult tasks in biomedical
sciences [10]. Diabetes is one of the top causes of death in developing nations. High diabetes prevalence rates
in people aged 20 to 79 were reported for China, India, and Pakistan in 2021. With 0.6 million deaths, India
ranks third in the world due to its sizable population and high rate of diabetes. Diabetes is one of the global
health calamities with the fastest rate of growth in the twenty-first century, according to the tenth edition of
the International Diabetes Federation Atlas. Due to inadequate insulin, glucose levels in people with diabetes
continue to rise. A research report from the International Diabetes Federation projects that by 2040, there will
be 642 million cases of diabetes worldwide [11]. So, to reduce this number machine learning optimisation
techniques are required for better results.
The Key objectives of this paper is to provide an overview of genetic algorithm. Experiment and
compare the values of different evaluation parameters using with and without genetic algorithm to
demonstrate the importance of feature selection. This paper is divided into sections. The first section of the
paper provides an overview of GA and diabetes. The second section discusses Literature related to GA. The
rest of the sections discuss the methodology used and experimental results.
Problem statement diabetes can be diagnosed as a binary classification problem that divides all
experiment subjects into two groups: those who have diabetes and those who do not. In the article by Kangra
and Singh [12] perform an experiment on conventional machine learning classifiers. We have compared
different ML classifiers for diabetes prediction but the results are not satisfactory. To enhance the prediction
rate, this study will use feature selection. It plays a significant part in disease prediction by removing pointless
work and shrinking the dataset. Today, diabetes has developed into a significant issue that requires effective
treatment. But using conventional ML classifiers it cannot be predicted in a precise way. To get more precise
results this article will use GA for feature selection with conventional ML classifier to predict diabetes.

Figure 1.Taxonomy of genetic algorithm

2. REVIEW OF LITERATURE
This section highlights pertinent research on feature selection using GA and classification
algorithms that have been used to identify diabetes and other medical conditions including heart disease. To
identify diabetes, a number of feature selection methods have been used. Table 1 which was created after
analysing the studies described below helps in the selection of classifiers for experimental analysis.
In this study, Patil et al. suggested a stacking-based non deterministic sorting genetic algorithm
(NSGA-II) technique for type 2 diabetes prediction [13]. On-dominated sorting GA was utilised in NSGA-II.
This strategy made use of two diabetes-related datasets. Utilising Matlab experiment was conducted.
Comparisons between the proposed NSGA-II stacking methodology and the boosting, bagging, random
forest (RF), and random subspace methods were designed. K-Nearest neigbour (KNN) outperforms decision
trees when used as a stacking combiner. A novel diabetes detection technique was presented by
Domínguez et al. [14] employing design of experiment, GA, and multi-layer perceptron (MLP). GA was
A genetic algorithm-based feature selection approach for diabetes prediction (Kirti Kangra)
1492  ISSN: 2252-8938

used to optimise the parameters in the MLP. The MLP serves as a model within GA to provide the fitness
evaluation of the solutions. The diabetes statistics were gathered from the "Sylhet Diabetes Hospital in
Bangladesh". The proposed model's accuracy rate was 98%. Arukonda and Cheruku [15] designed a medical
support system for diabetes prediction. In this research, the researcher combined feature selection algorithms
with ML classifiers. Feature selection was performed by the Akaike information criterion and GA. Six well-
known classifier algorithms, including support vector machine (SVM), RF, KNN, gradient boosting, extra
trees, and naive bayes (NB), were coupled with these methodologies. The dataset was produced using patient
records from the general hospital "Centro Médico Siglo XXI" in Mexico. S. Arukonda and R. Cheruku [15]
their study they had discussed different chronic diseases i.e. Diabetes, Heart, Kidney, and Breast Cancer.
Datasets were split in the ratio of 90:10. 10% of test data was further split using 5-fold cross-validation. The
bootstrapped technique was used to apply logistic regression (LR), SVM, decision tree (DT), and KNN basis
learners to create 20 different base learners. GA was used to find the best ensemble learner. PIMA diabetes
dataset (PIDD), kidney, heart, and breast cancer dataset showed 90.9%, 96.05%, 97.56%, and 98.08%
accuracy rates for the proposed model. E1-Shafiey et al. [16] discussed heart disease. The researcher
proposed a hybrid model consisting of GA, particle swarm optimization (PSO), and RF. For the experiment
Python was used on the PIDD [17]. The result achieved by the model was 95.60%. Togatorop et al. [18]
proposed stacked generalization GA to predict heart disease. PID and Diabetes 130-US hospital datasets were
used to check the validity of the model. The result of the proposed model was 98.8 and 99.01%. Researchers
in Tan et al. [19] proposed an optimisation model to predict heart disease. In this GA and RF were used for
the prediction. The result of the anticipated model was 85.83%. Rajagopal et al. [20] proposed a model to
predict diabetes. In this research, the researcher used “Qingdao desensitization physical examination data
from 1 January 2017 to 31 December 2019”. GA, DT, CNN, SVM, and KNN were used. The suggested
model produced a result of 98.71%. Nagarajan et al. [21] stat ed a customized hybrid model of artificial
neural network (ANN) and GA for diabetes prediction. With an accuracy rate of 80%, the mentioned
customised hybrid model and its supporting decision-making algorithm were applied to PIDD obtained from
the University of California Irvine (UCI) ML Repository. Ashri et al. [22] proposed a system for diagnosing
cardiovascular disease. The objective of this project was to design a hybrid genetic-based crow search
algorithm (GCSA) for feature selection and classification using deep convolution neural networks”. The
accuracy of the GCSA model was 95.34% for extracted features and 88.78% for original features.
El-Shafiey et al. [23] introduced “hybrid classifiers using the ensembled model with majority voting”
technique to boost prediction for cardiovascular disease. The dataset was acquired from UCI. LR, SVM, RF,
DT, and KNN were used as ML classifiers. The experiment was performed using Python. The result of the
proposed model was 98.18%.
Li et al. [24] introduced GA-RF based heart disease model. The dataset was taken from UCI. The
experiment was performed using Python. The experimental results show that the proposed technique
predicted heart disease with 95.6% accuracy on the Cleveland dataset. Dinesh and Prabha [25] proposed a
method that had three steps: preprocessing, feature selection, and classification to predict diabetes. “Harmony
search algorithm, GA, and PSO algorithms were examined with K-means for feature selection”. The diabetes
dataset was classified using KNN. The proposed method had 91.65% accuracy.To anticipate diabetes,
Rani et al. [26] proposed a hybrid model. In the proposed work, the features were transformed using Kernel
principal component analysis (KPCA). SVM was used for classification, and GA to select features. To
experiment, Python was used. The presented method had 97.3% accuracy rate. A hybrid decision support
system was put forth by Alharan et al. [27] in as a tool for the early diagnosis of cardiovascular disease. GA
and Synthetic Minority Oversampling Technique were used in the preprocessing phase. NB, LR, SVM, and
RF were used as classification models. Among them, RF performed better with an accuracy of 86.6%. To
obtain the dataset UCI was used and for the experiment, Python was used. In this study, Dweekat and Lam
[28] proposed a diabetes diagnosis system by analysing two different diabetes datasets, namely “PIDD and
Dr. John Schorling's data”. Python was used for the experiment. For feature selection, linear discriminant
analysis and GA methods were used. For classification, RF, logistic model tree, and JRip algorithms were
used. The datasets' accuracy was 90.89% and 91.44%, respectively.
ML methods can help with the early identification of diabetes by looking at related research. But
there are also some problems with these investigations. The associated research uncovers the following
knowledge gap: i) A few studies failed to use parameter measurements to display their findings. Accuracy is
crucial, but performance evaluation also takes into account other factors, such as error rates. As a result,
using more prominent features to minimise calculations, GA may be used as a potential tool to make
predictions more efficiently. It is possible to design more effective solutions for the healthcare industry by
integrating GA with other optimisation techniques. This will aid practitioners in making more accurate
decisions in the healthcare industry.

Int J Artif Intell, Vol. 13, No. 2, June 2024: 1489-1498

Int J Artif Intell ISSN: 2252-8938  1493

Table 1.Classifier count

ML Classifiers References Count
DT [13], [15], [17], [22] 4
SVM [14], [15], [17], [19], [22], [25], [26] 7
KNN [13]–[15], [19], [22], [24] 6
LR [15], [22], [26] 3
NB [14], [17], [26] 3
RF [13], [14], [16], [17], [18], [22], [23], [26], [27] 9
ANN [20] 1
MLP [28], [17] 2

3. METHODOLOGY
The essential phases used in this research are shown in Figure 2. There are four phases to this
methodology. In the first phase, Preprocessing was used to generate a properly organised dataset. In the
second phase, the optimum features were chosen using GA, with the help of an objective function (1). In the
third phase, six ML classifiers were used for classification. Finally, the results of the classifier were evaluated
using the performance metrics.

Figure 2. Methodology

3.1. Datasets description

The pima Indian diabetes (PID) [29] and the Germany diabetes [30] datasets were used to evaluate
the utility of the feature-selection approach. Both databases are often used to forecast diabetes and are freely
accessible. The PID dataset, which was used to evaluate whether a patient had diabetes and was from the
“National Institute of diabetes and digestive and kidney diseases (sNIDDK)”, was acquired from the UCI ML
repository. The PID dataset contains data on 768 females and eight attributes. The Hospital Frankfurt in
Frankfurt, Germany provided the second dataset. Eight features and 2,000 cases are included in the dataset.

3.2. Pre-processing phase

In order to use data in an ML model, it must first be cleaned and formatted. This process is known
as data pre-processing. It is correct to conclude that this is the first and most crucial step in the process of
creating an ML model. There are number of outliers in the dataset like missing values, zeros instead of
values, etc these errors need to be tackled. When creating an ML model, having access to adequate, well-
structured data is beneficial, but it is not always a given. To make the process more manageable, data
preprocessing is divided into four steps: information extraction, data aggregation, data compression, and
transformation of data. In this study, zeros were replaced with the mean value.

3.3. Feature selection

The inclusion of numerous additional aspects and features has contributed to a substantial growth in
medical data. The majority of features do not influence the outcomes of predictive models but increase
calculation time and resource requirements. As a result, in order to achieve high accuracy rates, picking up a
limited number of features is required. On the PID and Germany diabetes datasets in this work, the ideal
subset of characteristics was selected using GA. The number of features in the datasets was decreased by
using the objective function provided in the algorithm. Algorithm opted for the feature selection described:
A genetic algorithm-based feature selection approach for diabetes prediction (Kirti Kangra)
1494  ISSN: 2252-8938

Input: set parameters, Produce P random population solutions with n max number of generations using
binary encoding
Output: P(n) the best features
Start
For each individual form i to P do
Evaluate Fitness function
While iteration number < n
Select= SelectBst(i);
If Select then // using tournament selection
If Cross-over then
Choose two parents ia and ib
Produce off-spring ic= cross-over
Else
Choose one individual
Produce off-spring by Mutate(ic)
Terminate
Evaluate the fitness value of ic;
Replace with least fitness value individual;

By following the above steps number of times pregnant, ‘Plasma glucose concentration a 2 hours in
an oral glucose tolerance test’; ‘Body mass index (weight in kg/(height in m 2)); Diabetes pedigree function;
Age (years); these features among the selected dataset were chosen for PIDD. The same steps were applied to
the other diabetes dataset and ‘Glucose’; ‘skinthickness’; ‘Insulin’; ‘BMI’; ‘Diabetes pedigree function’
opted for optimised results.

3.4. Algorithm selection

The dataset and the type of prediction determine which classifier is being used. The NB, LR, SVM,
DT, RF, and KNN ML classifiers were chosen from the literature. From Table 1 classifiers whose count>2
were selected for the experiment.

3.5. Software
The experiment was carried out through Python. The Python scikit-learn library contained the ML
classifiers (Python offers built-in libraries that can be used to implement different feature selection
algorithms). They can be implemented using libraries such as Jupiter Notebook, NumPy, Pandas, and Scikit-
learn. For the experiment, Jupiter notebook was run on “AMD Ryzen 5 5500U with radeon graphics and 16
GB RAM under x64 bit Windows 11 operating system”.

4. EXPERIMENT ANALYSIS
For the experiment, this study used PIMA and the Germany diabetes datasets, downloaded from
Kaggle. The experiment was conducted in two phases: with and without feature selection using GA. This
study shows the comparison between results with and without using GA. The mean was utilised to replace all
zeros throughout the pre-processing phase. Table 2 gives details on the parameter values used by GA during
the trial. Following preprocessing, 10-fold cross-validation was used to divide the data into training and
testing sets. While only 30% of the data was used for testing, 70% of it was used for training.
Accuracy [31], Precision [32], Recall, F-1 score, MCC, Kappa value, AUC [33], MAE, RMSE,
RAE, RRSE, and MSTSS parameter metrics and error rates were used in the analysis. Mean absolute error:
As computed by averaging the absolute difference over the dataset, it reflects the difference between the
actual and expected values [34]. Root mean squared error (RMSE): It is a well-known method for assessing
model error when predicting statistical data. RMSE values between 0.0 and 0.5 indicate that the model can
make precise predictions about the data. Relative absolute error (RAE): It is a technique for assessing a
predictive model’s potency. It contrasts mean mistakes to trivial errors and is expressed as a ratio [35]. Root
relative squared error (RRSE) is a key indicator that sheds light on a model's performance. It also has a
relative squared error (RSE) variation. Matthews’ correlation coefficient measures the accuracy of
categorisation by accounting for true and erroneous positives and negatives. In this, a perfect forecast is
represented by a value of 1, an imperfect prediction by a value of 0, and a total difference between the
prediction and the observation by a value of 1 [36]. Kappa Value: It is a statistic that measures the agreement
between the observed classification and the classification produced by a model while accounting for the
possibility of agreement occurring by chance [37].

Int J Artif Intell, Vol. 13, No. 2, June 2024: 1489-1498

Int J Artif Intell ISSN: 2252-8938  1495

4.1. Results for diabetes dataset

From Table 3 different classifiers NB, SVM, DT, KNN, LR, and RF showed 77%, 75%, 70%, 74%,
76%, and 77% accuracy scores. Among them, NB and RF perform better with 77%. From Table 4 different
classifiers NB, SVM, DT, KNN, LR, and RF showed 70%, 76%, 73%, 78%, 80%, and 78% accuracy scores
after feature selection. Among them, LR performs better with 80%. If kappa values (Table 3 and 4) were
analysed all the classifiers were not enough strong to predict diabetes for the PID dataset. Figures 3 and 4
depict that among all the selected classifiers NB was the only one whose accuracy and auc decreased after
feature selection.

Table 2. Genetic algorithm parameter

Parameter name Value
Size of Population 100
Number of generations 50
Crossover Probability 0.6
Mutation Probability 0.3
Mutation type Uniform
Selection type Tournament
Fitness function Accuracy
Encoding Binary

Table 3. PIDD without GA

Classifiers Accuracy Precision F-1 score Recall MCC Kappa Value Area Under Curve (AUC)
NB 77 71 65 60 0.43 0.43 71
SVM 75 76 63 55 0.45 0.43 67
DT 70 63 59 56 0.43 0.43 65
KNN 74 65 55 47 0.31 0.30 68
LR 76 78 66 57 0.49 0.47 69
RF 77 71 65 60 0.43 0.29 71

Table 4. PIDD with GA

Classifiers Accuracy Precision F-1 score Recall MCC Kappa Value AUC
NB 70 72 76 77 0.44 0.43 60
SVM 76 77 76 77 0.43 0.42 70
DT 73 73 73 74 0.24 0.23 68
KNN 78 70 60 56 0.40 0.39 72
LR 80 72 62 53 0.46 0.45 72
RF 78 70 65 58 0.44 0.43 73

Figure 3. Comparison between accuracy Figure 4. Comparison between AUC

4.2. Results for Germany diabetes dataset

From Table 5 different classifiers NB, SVM, DT, KNN, LR, and RF showed 76.5%, 77%, 94%, 98%,
77%, and 98% accuracy scores. Among them, KNN and RF perform better with 98%. From Table 6 different
classifiers NB, SVM, DT, KNN, LR, and RF showed 78.6%, 77.8%, 95%, 98%, 77.5%, and 99% accuracy
scores after feature selection. Among them, KNN and RF perform better with 98.5 %. Figures 5 and 6
demonstrate that RF and KNN show almost 1% improvement after using GA. By analysing kappa values
(Tables 5 and 6) DT, KNN, and RF are enough strong to predict diabetes for the Germany dataset.
A genetic algorithm-based feature selection approach for diabetes prediction (Kirti Kangra)
1496  ISSN: 2252-8938

Table 5.Without GA
Classifiers Accuracy Precision F1-score Recall MCC Kappa Value AUC
NB 76.5 75 62 76 0.47 0.45 82
SVM 77 76 70 77 0.50 0.45 71
DT 94 94 92 94 0.87 0.89 97
KNN 98 98 93 98 0.97 0.97 98
LR 77 77 60 77 0.50 0.45 83
RF 98 98 94 98 0.97 0.98 99

Table 6.With GA
Classifiers Accuracy Precision F1-score Recall MCC Kappa Valuue AUC
NB 78.6 77 62 77 0.39 0.39 84
SVM 77.8 78 72 78 0.47 0.45 72
DT 95 93 92 95 0.87 0.87 98
KNN 98.5 98 94 98 0.97 0.98 99
LR 77.5 78 64 78 0.51 0.50 84
RF 98.5 98 96 98 0.97 0.98 98

Figure 5. Comparison between accuracy Figure 6. Comparison between AUC

4.3. Error rates of chosen classifiers

For each classifier, the error rates were examined in Tables 7 and 8. For best outcomes, the error rate
should be as low as possible. One would frequently choose the classifier with the lowest RAE and RRSE or the
lowest MAE and RMSE, depending on their needs. Low RRSE values (in Tables 7 and 8) are desirable for
improved classifier predictions. In the PID dataset, RRSE values are very high as compared to the Germany
dataset for some classifiers. All classifiers have the same RAE of 0.86, so they perform equally well in terms of
RAE for the PID dataset. If both MAE and RMSE are equally important, the study could consider LR as the
best choice since it has the lowest values for both. DT and KNN produce low RRSE values for the Germany
dataset, making them suitable as classifiers. Given the emphasis on MAE and RMSE as measures of prediction
accuracy, the DT classifier appears to be the best performer among all the classifiers in this specific context.
Without feature selection error rates were computed in the previous article of ours.
The study used two different datasets for diabetes that showed results with and without feature
selection. It provides a clear view that after feature selection accuracy of almost every classifier was
increased and also reduced error rates. Therefore, based on the results of the experiment described above, it
can be said that feature selection is crucial for improving prediction accuracy.

Table 7. Error rates of PIDD using GA Table 8. Error rates of Germany using GA
Classifiers MAE RMSE RAE RRSE% Classifiers MAE RMSE RAE RRSE%
NB 0.26 0.51 0.86 51.80 NB 0.26 0.51 0.77 51.63
SVM 0.27 0.52 0.86 52.22 SVM 0.27 0.52 0.77 52.12
DT 0.36 0.60 0.86 59.94 DT 0.05 0.23 0.77 25.16
KNN 0.28 0.53 0.86 53.45 KNN 0.23 0.48 0.77 48.47
LR 0.25 0.50 0.86 50.96 LR 0.27 0.52 0.77 52.12
RF 0.26 0.51 0.86 51.80 RF 0.26 0.51 0.77 51.63

Int J Artif Intell, Vol. 13, No. 2, June 2024: 1489-1498

Int J Artif Intell ISSN: 2252-8938  1497

5. CONCLUSION
Thus, GA can be used as an optimisation technique to find the best optimal solution. The GA provides
optimal features to predict diabetes through initialisation, selection, crossover, mutation, and replacement. To
obtain results the study followed four steps: Pre-processing, feature selection, ML classifiers, and performance
metrics. Consequently, after feature extraction LR performs better with an accuracy rate of 80% for PIMA
diabetes and 98.5% accuracy rate for the Germany diabetes dataset for RF and KNN. In addition, the study
compared the results of both datasets with and without feature selection. It shows that by using the feature
selection algorithm results for the selected ML classifiers can be increased. Accuracy, Precision, Recall, F-1
score, MCC, Kappa value, AUC, MAE, RMSE, RAE, and RRSE parameter metrics and error rates were used in
the analysis phase. Thus, from this experiment related to diabetes, we can say that GA alone is not sufficient for
the accurate prediction of diabetes. There should be some other method that integrates with GA to make its
performance much better for other parameters i.e., Kappa value, and MCC. The combining GA with some other
optimisation methods is the future concern of this research paper. The datasets utilised here have fewer
instances and properties, but the study may use some sizable datasets.

REFERENCES
[1] H. Polat, H. Danaei Mehr, and A. Cetin, “Diagnosis of chronic kidney disease based on support vector machine by feature
selection methods,” Journal of Medical Systems, vol. 41, no. 4, p. 55, Apr. 2017, doi: 10.1007/s10916-017-0703-x.
[2] R. Parthiban et al., “Prognosis of chronic kidney disease (CKD) using hybrid filter wrapper embedded,” European Journal of
Molecular & Clinical Medicine, vol. 07, no. 09, pp. 2511–2530, 2021.
[3] M. Manonmani and S. Balakrishnan, “An ensemble feature selection method for prediction of CKD,” 2020 International
Conference on Computer Communication and Informatics, ICCCI 2020, 2020, doi: 10.1109/ICCCI48352.2020.9104137.
[4] V. Kumar, J. K. Chhabra, and D. Kumar, “Parameter adaptive harmony search algorithm for unimodal and multimodal
optimization problems,” Journal of Computational Science, vol. 5, no. 2, pp. 144–155, Mar. 2014, doi:
10.1016/j.jocs.2013.12.001.
[5] A. Prabha, J. Yadav, A. Rani, and V. Singh, “Design of intelligent diabetes mellitus detection system using hybrid feature
selection based XGBoost classifier,” Computers in Biology and Medicine, vol. 136, 2021, doi:
10.1016/j.compbiomed.2021.104664.
[6] Z. Michalewicz, “Genetic Algorithms + Data Structures = Evolution Programs 3ed _Michalewicz.PDF,” 3rd ed. Springer Berlin
Heidelberg, 1996.
[7] U. Mehboob, J. Qadir, S. Ali, and A. Vasilakos, “Genetic algorithms in wireless networking: techniques, applications, and
issues,” Soft Computing, vol. 20, no. 6, pp. 2467–2501, Jun. 2016, doi: 10.1007/s00500-016-2070-9.
[8] A. Ramesh, C. Kambhampati, J. Monson, and P. Drew, “Artificial intelligence in medicine,” Annals of The Royal College of
Surgeons of England, vol. 86, no. 5, pp. 334–338, Sep. 2004, doi: 10.1308/147870804290.
[9] P. Ghodmare, “A review paper on brief introduction of genetic algorithm,” International Journal of Science Technology &
Engineering, vol. 4, no. 8, pp. 42–44, 2018.
[10] G. R. Ashisha, X. A. Mary, H. M. Ashif, I. Karthikeyan, and J. Roshan, “Early diabetes prediction with optimal feature selection
using ML based prediction framework,” in 2023 4th International Conference on Signal Processing and Communication
(ICSPC), Mar. 2023, pp. 391–395. doi: 10.1109/ICSPC57692.2023.10125956.
[11] I. D. Federation, “Facts & Figures,” International Diabetes Federation, 2023.
[12] K. Kangra and J. Singh, “Comparative analysis of predictive machine learning algorithms for diabetes mellitus,” Bulletin of
Electrical Engineering and Informatics, vol. 12, no. 3, pp. 1728–1737, Jun. 2023, doi: 10.11591/eei.v12i3.4412.
[13] R. N. Patil, S. Rawandale, N. Rawandale, U. Rawandale, and S. Patil, “An efficient stacking based NSGA-II approach for
predicting type 2 diabetes,” International Journal of Electrical and Computer Engineering (IJECE), vol. 13, no. 1, p. 1015, Feb.
2023, doi: 10.11591/ijece.v13i1.pp1015-1023.
[14] A. García-Domínguez et al., “Diabetes detection models in mexican patients by combining machine learning algorithms and
feature selection techniques for clinical and paraclinical attributes: a comparative evaluation,” Journal of Diabetes Research, vol.
2023, pp. 1–19, Jun. 2023, doi: 10.1155/2023/9713905.
[15] S. Arukonda and R. Cheruku, “A novel diversity-based ensemble approach with genetic algorithm for effective disease
diagnosis,” Soft Computing, vol. 27, no. 14, pp. 9907–9926, Jul. 2023, doi: 10.1007/s00500-023-08393-5.
[16] M. G. El-Shafiey, A. Hagag, E.-S. A. El-Dahshan, and M. A. Ismail, “A hybrid GA and PSO optimized approach for heart-
disease prediction based on random forest,” Multimedia Tools and Applications, vol. 81, no. 13, pp. 18155–18179, May 2022,
doi: 10.1007/s11042-022-12425-x.
[17] J. Abdollahi and B. Nouri-Moghaddam, “Hybrid stacked ensemble combined with genetic algorithms for diabetes prediction,”
Iran Journal of Computer Science, vol. 5, no. 3, pp. 205–220, Sep. 2022, doi: 10.1007/s42044-022-00100-1.
[18] P. R. Togatorop, M. Sianturi, D. Simamora, and D. Silaen, “Optimizing random forest using genetic algorithm for heart disease
classification,” Lontar Komputer : Jurnal Ilmiah Teknologi Informasi, vol. 13, no. 1, p. 60, Aug. 2022, doi:
10.24843/LKJITI.2022.v13.i01.p06.
[19] Y. Tan, H. Chen, J. Zhang, R. Tang, and P. Liu, “Early risk prediction of diabetes based on GA-stacking,” Applied Sciences, vol.
12, no. 2, p. 632, Jan. 2022, doi: 10.3390/app12020632.
[20] A. Rajagopal, S. Jha, R. Alagarsamy, S. G. Quek, and G. Selvachandran, “A novel hybrid machine learning framework for the
prediction of diabetes with context-customized regularization and prediction procedures,” Mathematics and Computers in
Simulation, vol. 198, pp. 388–406, Aug. 2022, doi: 10.1016/j.matcom.2022.03.003.
[21] S. M. Nagarajan, V. Muthukumaran, R. Murugesan, R. B. Joseph, M. Meram, and A. Prathik, “Innovative feature selection and
classification model for heart disease prediction,” Journal of Reliable Intelligent Environments, vol. 8, no. 4, pp. 333–343, Dec.
2022, doi: 10.1007/s40860-021-00152-3.
[22] S. E. A. Ashri, M. M. El-Gayar, and E. M. El-Daydamony, “HDPF: heart disease prediction framework based on hybrid
classifiers and genetic algorithm,” IEEE Access, vol. 9, pp. 146797–146809, 2021, doi: 10.1109/ACCESS.2021.3122789.
[23] M. G. El-Shafiey, A. Hagag, E.-S. A. El-Dahshan, and M. A. Ismail, “Heart-disease prediction method using random forest and

A genetic algorithm-based feature selection approach for diabetes prediction (Kirti Kangra)
1498  ISSN: 2252-8938

genetic algorithms,” in 2021 International Conference on Electronic Engineering (ICEEM), Jul. 2021, pp. 1–6. doi:
10.1109/ICEEM52022.2021.9480625.
[24] X. Li, J. Zhang, and F. Safara, “Improving the accuracy of diabetes diagnosis applications through a hybrid feature selection
algorithm,” Neural Processing Letters, vol. 55, no. 1, pp. 153–169, Feb. 2023, doi: 10.1007/s11063-021-10491-0.
[25] M. G. Dinesh and D. Prabha, “Diabetes mellitus prediction system using hybrid KPCA-GA-SVM feature selection techniques,”
Journal of Physics: Conference Series, vol. 1767, no. 1, p. 012001, Feb. 2021, doi: 10.1088/1742-6596/1767/1/012001.
[26] P. Rani, R. Kumar, N. M. O. S. Ahmed, and A. Jain, “A decision support system for heart disease prediction based upon machine
learning,” Journal of Reliable Intelligent Environments, vol. 7, no. 3, pp. 263–275, Sep. 2021, doi: 10.1007/s40860-021-00133-6.
[27] A. F. H. Alharan, Z. M. Algelal, N. S. Ali, and N. Al-Garaawi, “Improving classification performance for diabetes with linear
discriminant analysis and genetic algorithm,” in 2021 Palestinian International Conference on Information and Communication
Technology (PICICT), Sep. 2021, pp. 38–44. doi: 10.1109/PICICT53635.2021.00019.
[28] O. Y. Dweekat and S. S. Lam, “Optimized design of hybrid genetic algorithm with multilayer perceptron to predict patients with
diabetes,” Soft Computing, vol. 27, no. 10, pp. 6205–6222, May 2023, doi: 10.1007/s00500-023-07876-9.
[29] UCI Machine Learning, “Pima Indians diabetes database,” 2021, doi: Pima Indians Diabetes Database | Kaggle.
[30] DaSilva John, “diabetes | Kaggle,” 2022, [Online]. Available:
https://www.kaggle.com/datasets/johndasilva/diabetes?resource=download
[31] N. Cahyani and M. A. Muslim, “Increasing accuracy of C4.5 algorithm by applying discretization and correlation-based feature
selection for chronic kidney disease diagnosis,” Journal of Telecommunication, Elctronic and Computer Engineering, vol. 12,
no. 1, pp. 25–32, 2020.
[32] M. Gollapalli et al., “A novel stacking ensemble for detecting three types of diabetes mellitus using a Saudi Arabian dataset: Pre-
diabetes, T1DM, and T2DM,” Computers in Biology and Medicine, vol. 147, p. 105757, Aug. 2022, doi:
10.1016/j.compbiomed.2022.105757.
[33] H. Lu, S. Uddin, F. Hajati, M. A. Moni, and M. Khushi, “A patient network-based machine learning model for disease prediction:
The case of type 2 diabetes mellitus,” Applied Intelligence, vol. 52, no. 3, pp. 2411–2422, Feb. 2022, doi: 10.1007/s10489-021-
02533-w.
[34] D. Chicco, M. J. Warrens, and G. Jurman, “The coefficient of determination R-squared is more informative than SMAPE, MAE,
MAPE, MSE and RMSE in regression analysis evaluation,” PeerJ Computer Science, vol. 7, p. e623, Jul. 2021, doi:
10.7717/peerj-cs.623.
[35] R. Naseem et al., “Empirical assessment of machine learning techniques for software requirements risk prediction,” Electronics,
vol. 10, no. 2, p. 168, Jan. 2021, doi: 10.3390/electronics10020168.
[36] L. Ali et al., “An optimized stacked support vector machines based expert system for the effective prediction of heart failure,”
IEEE Access, vol. 7, pp. 54007–54014, 2019, doi: 10.1109/ACCESS.2019.2909969.
[37] K. C. Howlader et al., “Machine learning models for classification and identification of significant attributes to detect type 2
diabetes,” Health Information Science and Systems, vol. 10, no. 1, p. 2, Feb. 2022, doi: 10.1007/s13755-021-00168-2.

BIOGRAPHIES OF AUTHORS

Kirti Kangra is pursuing her Ph.D. from Guru Jambheshwar University of

Science and Technology, Hisar, Haryana in Computer Science and Engineering. She has
teaching experience for 2 years. She has completed her B. Tech in Information and
Technology from Vaish College of Engineering, Rohtak, and M. Tech in Computer Science
and Engineering from the University Institute of Engineering and Technology affiliated with
Maharishi Dayanand University, Rohtak, Haryana. Her areas of research include machine
learning, data mining, and artificial intelligence. She can be contacted at email:
[email protected].

Jaswinder Singh is working as Professor in the Department of Computer Science

& Engineering at Guru Jambheshwar University of Science &Technology, Hisar, Haryana. He
has teaching experience of more than 20 years and he has published more than 25 research
papers in international journals and conferences. He has completed his Ph.D in Computer
Science & Engineering from Deenbandhu Chhotu Ram University of Science and
Technology, Murthal, Sonepat, Haryana and completed his M.Tech in Computer Science &
Engineering from Kurukshetra University, Kurukshetra, Haryana. His areas of research
include machine learning, opinion mining, web information retrieval, search engine
optimisation, web mining, information processing, information system, and social network
analysis. He can be contacted at email: [email protected].

Int J Artif Intell, Vol. 13, No. 2, June 2024: 1489-1498

10 1109@cesys 2018 8723882
No ratings yet
10 1109@cesys 2018 8723882
5 pages
Latent Profile Analysis in R: A Tutorial and Comparison To Mplus
No ratings yet
Latent Profile Analysis in R: A Tutorial and Comparison To Mplus
19 pages
International Journal of Computational Engineering Research (IJCER)
No ratings yet
International Journal of Computational Engineering Research (IJCER)
4 pages
Artificial Intelligence in Biomedical Engineering
No ratings yet
Artificial Intelligence in Biomedical Engineering
25 pages
Diagnosis and Prognosis of Breast Cancer Using Multi Classification Algorithm
No ratings yet
Diagnosis and Prognosis of Breast Cancer Using Multi Classification Algorithm
5 pages
1-2 The Problem 3-4 Proposed Solution 5-7 The Experiment 8-9 Experimental Results 10-11 Conclusion 12 References 13
No ratings yet
1-2 The Problem 3-4 Proposed Solution 5-7 The Experiment 8-9 Experimental Results 10-11 Conclusion 12 References 13
14 pages
ls4 PDF
No ratings yet
ls4 PDF
9 pages
Data Mining - Fuzzy Neural Genetic Algorithm in Predicting Diabetes
No ratings yet
Data Mining - Fuzzy Neural Genetic Algorithm in Predicting Diabetes
5 pages
Minor Project FINAL Review in Signal Processing Domain: Supervisor
No ratings yet
Minor Project FINAL Review in Signal Processing Domain: Supervisor
32 pages
1 s2.0 S1877050915004536 Main
No ratings yet
1 s2.0 S1877050915004536 Main
8 pages
Decision Tree Classifiers With Ga Based Feature Selection
No ratings yet
Decision Tree Classifiers With Ga Based Feature Selection
10 pages
UNIT - 5
No ratings yet
UNIT - 5
9 pages
Binary Ebola Optimization Search Algorithm For Feature Selection and Classification Problems
No ratings yet
Binary Ebola Optimization Search Algorithm For Feature Selection and Classification Problems
46 pages
Observations From Survey: Analysis Based On Output
No ratings yet
Observations From Survey: Analysis Based On Output
2 pages
Exploring The Efficacy of Machine Learning Algorithms For Diabetes Prediction A Comparative Prediction
No ratings yet
Exploring The Efficacy of Machine Learning Algorithms For Diabetes Prediction A Comparative Prediction
9 pages
CP0270 12-Apr-2012 RM01
No ratings yet
CP0270 12-Apr-2012 RM01
31 pages
A Sampling of Various Other Learning Methods
No ratings yet
A Sampling of Various Other Learning Methods
34 pages
GADataMining CNA
No ratings yet
GADataMining CNA
73 pages
Woa 1
No ratings yet
Woa 1
5 pages
Genetic Algorithms
No ratings yet
Genetic Algorithms
37 pages
Chapter+3+ +Genetic+Algorithms+Population+Representation+ +fitness
No ratings yet
Chapter+3+ +Genetic+Algorithms+Population+Representation+ +fitness
23 pages
TSP_CMC_44065
No ratings yet
TSP_CMC_44065
26 pages
Sample - IND ASSINGMENT-2-9 (ISP611)
No ratings yet
Sample - IND ASSINGMENT-2-9 (ISP611)
8 pages
Survey On GA and Rules
No ratings yet
Survey On GA and Rules
15 pages
Analysisonfitnessfunction
No ratings yet
Analysisonfitnessfunction
5 pages
Contoh Makalah
No ratings yet
Contoh Makalah
4 pages
3
No ratings yet
3
159 pages
The Applications of Genetic Algorithms in Medicine
No ratings yet
The Applications of Genetic Algorithms in Medicine
11 pages
GA Optimisation
No ratings yet
GA Optimisation
48 pages
Metaheuristic_Algorithms_on_Feature_Selection_A_Survey_of_One_Decade_of_Research_2009-2019
No ratings yet
Metaheuristic_Algorithms_on_Feature_Selection_A_Survey_of_One_Decade_of_Research_2009-2019
26 pages
A feature selection method with feature ranking using genetic programming
No ratings yet
A feature selection method with feature ranking using genetic programming
24 pages
Review 2 Final
No ratings yet
Review 2 Final
27 pages
2
No ratings yet
2
12 pages
Module 5
No ratings yet
Module 5
11 pages
Learning with AI : Intelligent Optimisation
From Everand
Learning with AI : Intelligent Optimisation
Coleman Colman
No ratings yet
Smart Business Problems and Analytical Hints in Cancer Research
From Everand
Smart Business Problems and Analytical Hints in Cancer Research
Zemelak Goraga
No ratings yet
5.binary Grosopher
No ratings yet
5.binary Grosopher
20 pages
batch34_diabetis prediction ML_formatted
No ratings yet
batch34_diabetis prediction ML_formatted
81 pages
An Approach For Systems Identification Using Artificial Inteligence
No ratings yet
An Approach For Systems Identification Using Artificial Inteligence
4 pages
1 s2.0 S1568494623005768 Main
No ratings yet
1 s2.0 S1568494623005768 Main
19 pages
Aide MATLab Sur Les Algorithmes Génétiques
100% (2)
Aide MATLab Sur Les Algorithmes Génétiques
21 pages
Limitation of Apriori Algo
No ratings yet
Limitation of Apriori Algo
7 pages
Healthy Food Suggestions Based On Blood Parameters Web Application
No ratings yet
Healthy Food Suggestions Based On Blood Parameters Web Application
23 pages
1 s2.0 S0010482524011569 Main
No ratings yet
1 s2.0 S0010482524011569 Main
15 pages
Fitness Function Evaluation Using Ga
No ratings yet
Fitness Function Evaluation Using Ga
12 pages
Genetic Based ID3 Classification Algorithm Diagnosis and Prognosis of Oral Cancer
No ratings yet
Genetic Based ID3 Classification Algorithm Diagnosis and Prognosis of Oral Cancer
3 pages
Evolutionary algorithms
No ratings yet
Evolutionary algorithms
66 pages
Genetic Algorithm and Data Mining
No ratings yet
Genetic Algorithm and Data Mining
2 pages
MSC Dissertation - Matthew Sparkes
100% (7)
MSC Dissertation - Matthew Sparkes
47 pages
Iot Hospital Management System and Analysis With Accessing Data From Cloud Using Machine Learning
No ratings yet
Iot Hospital Management System and Analysis With Accessing Data From Cloud Using Machine Learning
7 pages
An Entropy-Based Adaptive Genetic Algorithm For Learning Classification Rules
No ratings yet
An Entropy-Based Adaptive Genetic Algorithm For Learning Classification Rules
8 pages
Kernels, Model Selection and Feature Selection
No ratings yet
Kernels, Model Selection and Feature Selection
5 pages
PPT-Template (1)
No ratings yet
PPT-Template (1)
31 pages
Disease Prediction Using Machine Learning
No ratings yet
Disease Prediction Using Machine Learning
10 pages
MCTS-GA
No ratings yet
MCTS-GA
5 pages
Tài Liệu Việt Nam -Anh Bảo
No ratings yet
Tài Liệu Việt Nam -Anh Bảo
4 pages
Summary on Artificial Intelligence In Medicine
No ratings yet
Summary on Artificial Intelligence In Medicine
9 pages
Diabetes Prediction Using Machine Learning
No ratings yet
Diabetes Prediction Using Machine Learning
8 pages
Multiple Disease Prediction System
100% (1)
Multiple Disease Prediction System
9 pages
P3Permutation Encoding TSP
No ratings yet
P3Permutation Encoding TSP
6 pages
Ga Parameters
No ratings yet
Ga Parameters
10 pages
Multi-task deep learning for Vietnamese capitalization and punctuation recognition
No ratings yet
Multi-task deep learning for Vietnamese capitalization and punctuation recognition
11 pages
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
No ratings yet
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
10 pages
Developing a website for English-speaking practice to English as a foreign language learners at the university level
No ratings yet
Developing a website for English-speaking practice to English as a foreign language learners at the university level
12 pages
A proposed approach for plagiarism detection in Myanmar Unicode text
No ratings yet
A proposed approach for plagiarism detection in Myanmar Unicode text
9 pages
Evaluating ChatGPT’s Mandarin “yue” pronunciation system in language learning
No ratings yet
Evaluating ChatGPT’s Mandarin “yue” pronunciation system in language learning
8 pages
Deep learning-based techniques for video enhancement, compression and restoration
No ratings yet
Deep learning-based techniques for video enhancement, compression and restoration
13 pages
Abstractive summarization using multilingual text-to-text transfer transformer for the Turkish text
No ratings yet
Abstractive summarization using multilingual text-to-text transfer transformer for the Turkish text
10 pages
A contest of sentiment analysis: k-nearest neighbor versus neural network
No ratings yet
A contest of sentiment analysis: k-nearest neighbor versus neural network
9 pages
A comparative study of natural language inference in Swahili using monolingual and multilingual models
No ratings yet
A comparative study of natural language inference in Swahili using monolingual and multilingual models
8 pages
Graph-based methods for transaction databases: a comparative study
No ratings yet
Graph-based methods for transaction databases: a comparative study
10 pages
Video forgery: An extensive analysis of inter-and intra-frame manipulation alongside state-of-the-art comparisons
No ratings yet
Video forgery: An extensive analysis of inter-and intra-frame manipulation alongside state-of-the-art comparisons
13 pages
Hybrid object detection and distance measurement for precision agriculture: integrating YOLOv8 with rice field sidewalk detection algorithm
No ratings yet
Hybrid object detection and distance measurement for precision agriculture: integrating YOLOv8 with rice field sidewalk detection algorithm
11 pages
Automatic detection of dress-code surveillance in a university using YOLO algorithm
No ratings yet
Automatic detection of dress-code surveillance in a university using YOLO algorithm
8 pages
Artificial intelligence algorithms to predict customer satisfaction: a comparative study
No ratings yet
Artificial intelligence algorithms to predict customer satisfaction: a comparative study
9 pages
Enhancing emotion recognition model for a student engagement use case through transfer learning
No ratings yet
Enhancing emotion recognition model for a student engagement use case through transfer learning
11 pages
Hindi spoken digit analysis for native and non-native speakers
No ratings yet
Hindi spoken digit analysis for native and non-native speakers
7 pages
Improved convolutional neural networks for aircraft type classification in remote sensing images
No ratings yet
Improved convolutional neural networks for aircraft type classification in remote sensing images
8 pages
Primary phase Alzheimer's disease detection using ensemble learning model
No ratings yet
Primary phase Alzheimer's disease detection using ensemble learning model
9 pages
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on deep neural network
No ratings yet
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on deep neural network
13 pages
Hybrid model detection and classification of lung cancer
No ratings yet
Hybrid model detection and classification of lung cancer
11 pages
Adaptive kernel integration in visual geometry group 16 for enhanced classification of diabetic retinopathy stages in retinal images
No ratings yet
Adaptive kernel integration in visual geometry group 16 for enhanced classification of diabetic retinopathy stages in retinal images
12 pages
U-Net for wheel rim contour detection in robotic deburring
No ratings yet
U-Net for wheel rim contour detection in robotic deburring
14 pages
A novel scalable deep ensemble learning framework for big data classification via MapReduce integration
No ratings yet
A novel scalable deep ensemble learning framework for big data classification via MapReduce integration
15 pages
Deep ensemble learning with uncertainty aware prediction ranking for cervical cancer detection using Pap smear images
No ratings yet
Deep ensemble learning with uncertainty aware prediction ranking for cervical cancer detection using Pap smear images
11 pages
Event detection in soccer matches through audio classification using transfer learning
No ratings yet
Event detection in soccer matches through audio classification using transfer learning
9 pages
Enhancing fall detection and classification using Jarratt‐butterfly optimization algorithm with deep learning
No ratings yet
Enhancing fall detection and classification using Jarratt‐butterfly optimization algorithm with deep learning
10 pages
Optimizing deep learning models from multi-objective perspective via Bayesian optimization
No ratings yet
Optimizing deep learning models from multi-objective perspective via Bayesian optimization
10 pages
Exploring DenseNet architectures with particle swarm optimization: efficient tomato leaf disease detection
No ratings yet
Exploring DenseNet architectures with particle swarm optimization: efficient tomato leaf disease detection
9 pages
Squeeze-excitation half U-Net and synthetic minority oversampling technique oversampling for papilledema image classification
No ratings yet
Squeeze-excitation half U-Net and synthetic minority oversampling technique oversampling for papilledema image classification
10 pages
Detecting road damage utilizing retinanet and mobilenet models on edge devices
No ratings yet
Detecting road damage utilizing retinanet and mobilenet models on edge devices
11 pages
ML-Unit III - K-Means Clustering
No ratings yet
ML-Unit III - K-Means Clustering
22 pages
Question Bank For Digital Image Processing
0% (1)
Question Bank For Digital Image Processing
9 pages
Pattern Recognition
No ratings yet
Pattern Recognition
52 pages
Finalproject Review PPT
No ratings yet
Finalproject Review PPT
39 pages
A Survey On Chat-Bot System For Agriculture Domain
No ratings yet
A Survey On Chat-Bot System For Agriculture Domain
5 pages
Deployment: Cheat Sheet: Machine Learning With KNIME Analytics Platform
No ratings yet
Deployment: Cheat Sheet: Machine Learning With KNIME Analytics Platform
1 page
ML LAB REPORT
No ratings yet
ML LAB REPORT
6 pages
A Machine Learning-Based Clinical Decision Support System To Identify Prescriptions With A High Risk of Medication Error
No ratings yet
A Machine Learning-Based Clinical Decision Support System To Identify Prescriptions With A High Risk of Medication Error
7 pages
Ada Boost
No ratings yet
Ada Boost
2 pages
Learning Style Detection in E-Learning Systems Using Machine Learning Techniques
No ratings yet
Learning Style Detection in E-Learning Systems Using Machine Learning Techniques
12 pages
A Technical Report On:: "Artificial Intelligence"
No ratings yet
A Technical Report On:: "Artificial Intelligence"
20 pages
Transmission Line Faults in Power System and The Different Algorithms For Identification, Classification and Localization: A Brief Review of Methods
No ratings yet
Transmission Line Faults in Power System and The Different Algorithms For Identification, Classification and Localization: A Brief Review of Methods
24 pages
Artificial Inteligence PDF
No ratings yet
Artificial Inteligence PDF
328 pages
Detecting Alzheimer From Speech
No ratings yet
Detecting Alzheimer From Speech
11 pages
Overview of Data Mining
No ratings yet
Overview of Data Mining
4 pages
Bab 01 - Pengenalan Data Mining
No ratings yet
Bab 01 - Pengenalan Data Mining
37 pages
1-s2.0-S0167404823005710-main
No ratings yet
1-s2.0-S0167404823005710-main
19 pages
Cs3491 - Aiml - Unit III - Introduction To Machine Learning1
100% (1)
Cs3491 - Aiml - Unit III - Introduction To Machine Learning1
23 pages
Anirudh Jain CV
No ratings yet
Anirudh Jain CV
2 pages
ML Mid Key PDF
No ratings yet
ML Mid Key PDF
6 pages
005 Model Clasification
No ratings yet
005 Model Clasification
22 pages
Aiml Lab Manual - Srmtrpec - 20 Feb 24
No ratings yet
Aiml Lab Manual - Srmtrpec - 20 Feb 24
86 pages
Gebreyes Gebeyehu Coffee Thesis 2021
No ratings yet
Gebreyes Gebeyehu Coffee Thesis 2021
124 pages
Machine Learning Based Approach For Predicting Fault in Software Engineering by GMFPA: A Survey
No ratings yet
Machine Learning Based Approach For Predicting Fault in Software Engineering by GMFPA: A Survey
6 pages
FINAL Thesis Nirali Mistry 14-Sep-2022
No ratings yet
FINAL Thesis Nirali Mistry 14-Sep-2022
92 pages
DWM
100% (1)
DWM
19 pages
Texture TMI
No ratings yet
Texture TMI
6 pages
Hierarchical Cluster Analysis - R Tutorial
No ratings yet
Hierarchical Cluster Analysis - R Tutorial
3 pages
Computer Vision
No ratings yet
Computer Vision
6 pages

A Genetic Algorithm-Based Feature Selection Approach For Diabetes Prediction

Uploaded by

A Genetic Algorithm-Based Feature Selection Approach For Diabetes Prediction

Uploaded by

IAES International Journal of Artificial Intelligence (IJ-AI)

Vol. 13, No. 2, June 2024, pp. 1489~1498

A genetic algorithm-based feature selection approach for

Kirti Kangra1, Jaswinder Singh2

Article Info ABSTRACT

Journal homepage: http://ijai.iaescore.com

𝐹(𝑡) = 𝑒𝑣𝑎𝑙(𝑧) (1)

Int J Artif Intell, Vol. 13, No. 2, June 2024: 1489-1498

Figure 1.Taxonomy of genetic algorithm

Int J Artif Intell, Vol. 13, No. 2, June 2024: 1489-1498

Table 1.Classifier count

3.1. Datasets description

3.2. Pre-processing phase

3.3. Feature selection

3.4. Algorithm selection

Int J Artif Intell, Vol. 13, No. 2, June 2024: 1489-1498

4.1. Results for diabetes dataset

Table 2. Genetic algorithm parameter

Table 3. PIDD without GA

Table 4. PIDD with GA

Figure 3. Comparison between accuracy Figure 4. Comparison between AUC

4.2. Results for Germany diabetes dataset

Figure 5. Comparison between accuracy Figure 6. Comparison between AUC

4.3. Error rates of chosen classifiers

Int J Artif Intell, Vol. 13, No. 2, June 2024: 1489-1498

Kirti Kangra is pursuing her Ph.D. from Guru Jambheshwar University of

Jaswinder Singh is working as Professor in the Department of Computer Science

Int J Artif Intell, Vol. 13, No. 2, June 2024: 1489-1498

You might also like