Implementation of Machine Learning Techniques With Big Data and IoT To Create Effective Prediction Models For Health Informatics
6 authors, including:
Md Mobin Akhtar
All content following this page was uploaded by Abu Sarwar Zamani on 16 April 2024.
ARTICLE INFO

Keywords: Prediction Model; Health Informatics; Big Data; Machine Learning Techniques; Hybrid Flower Pollination Bumblebees Optimization Algorithm; Neural Networks; Fuzzy Classifier; K-Nearest Neighbour

ABSTRACT

The sheer volume of healthcare data now available requires big data analytics in this industry to keep growing so that new and effective opportunities can be realized. Such analytics support the early prevention, prediction, and detection of disease, and thereby help to improve the overall quality of life of individuals. In this paper, a machine learning-based big data analytics model is developed for predicting multiple diseases, providing a better decision support system for various healthcare applications. The developed framework uses MapReduce to handle and process big data: the map phase performs feature extraction and the reduce phase performs feature selection. The required healthcare data is collected from external web sources. In the map phase, statistical features and Principal Component Analysis (PCA) features are extracted. In the reduce phase, the optimal features are selected with the aid of the developed Hybrid Flower Pollination Bumblebees Optimization Algorithm (HFPBOA). Then, an Ensemble Learning (EL) model is developed to predict the multiple diseases, and the parameters of the EL classifiers are optimized by the same HFPBOA. The final prediction output is obtained by averaging, with a weight function, the outputs of the Neural Network (NN), K-Nearest Neighbour (KNN), and fuzzy classifier. The offered model improves on SSA-EL, DOA-EL, BOA-EL, and FA-EL by 40.1%, 28.7%, 23.6%, and 10.5%, respectively, in terms of best value. The effectiveness of the developed multi-disease prediction framework is verified by comparing its results with those of recently developed prediction approaches.
1. Introduction

Supported by massive internet connectivity and abundant bandwidth, the Internet of Things (IoT) has developed considerably in recent years. IoT plays an essential role in the development of various electronics industries, as it paves a new way of operating devices in an automated and remote manner. Like any other field, the medical domain has been transformed with the aid of IoT, and the technology is also helpful in regenerating various applications in this domain [1]. IoT-based healthcare systems provide access to data movement, information exchange, machine-to-machine communication, and interoperability, which in turn helps in the delivery of various healthcare services [2]. For the convenience of patients, IoT-based wireless devices are worn on or implanted in the patient's body to collect the patient's healthcare data remotely. This technology is known as the Internet of Medical Things (IoMT). In an IoMT system, sensor devices transmit the significant symptoms of the patients to the healthcare providers wirelessly and remotely with the aid of IoT technology [3]. The data obtained through IoT-based applications helps in providing effective treatment plans and, in some cases, in holding consultations or interventions with the respective doctors by webcam. Moreover, the data supports predictive analytics and thus helps doctors in evaluating the disease with a high degree of accuracy, using the health patterns gathered with
* Corresponding author at: Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam Bin Abdulaziz University, Saudi Arabia.
E-mail addresses: [email protected] (A.S. Zamani), [email protected] (A.H.A. Hashim), [email protected] (A.S.A. Shatat), mohammed.akhtar@riyadh.edu.sa (Md.M. Akhtar), [email protected] (M. Rizwanullah), [email protected] (S.S.I. Mohamed).
https://doi.org/10.1016/j.bspc.2024.106247
Received 14 October 2023; Received in revised form 2 January 2024; Accepted 20 March 2024
Available online 3 April 2024
1746-8094/© 2024 Elsevier Ltd. All rights reserved.
A.S. Zamani et al. Biomedical Signal Processing and Control 94 (2024) 106247
the aid of the IoT-based sensing devices at a much faster rate [4]. These benefits are unparalleled and assure significant decision-support information. On the other hand, this is considered a challenging task, as the information contains a high level of noise and such systems have to deal with a huge variety of information [5]. The most essential threats in processing IoMT information relate to the privacy and security of the data. Since these devices record and transmit data over a practical environment independent of the standard data protocols, and to systems without the support of data ownership regulations, the generated and transmitted information is susceptible to hacking and fraudulent activity [6].

In conventional methods, information extraction, visualization, processing, and storage are challenging when huge volumes of data of varying types are involved. The most significant challenge in big data analytics is to explore new ways of efficiently acquiring the information required by diverse types of users [7]. Recently, diverse varieties of healthcare data have been continuously gathered from both non-clinical and clinical scenarios, in which the most essential data is regarded as a digital representation of the medical history of the patients under healthcare analytics [8]. Hence, a distributed data system is required to solve the following challenges. The first challenge is the generation of a huge volume of heterogeneous data, which makes it difficult to gather the data by means of shared locations [9]. The next major issue concerns storage, because of the utilization of massive and heterogeneous datasets [10]; a big data system requires an effective storage system in order to assure efficiency. The final challenge arises from big data analytics itself, more precisely from analyzing massive datasets over real-time platforms, which includes optimization, prediction, visualization, and modeling [11]. These challenges have to be solved with a novel processing procedure that acts as the data management system, as the present data management systems are considered inefficient in handling practical and heterogeneous data [12].

MapReduce is a parallel processing approach used to analyze huge volumes of data distributed over a commodity cluster. The technique comprises Map and Reduce operations. The MapReduce framework is used for solving problems related to computational complexity and training complexity when utilizing big data [13]. Based on the MapReduce framework, Hadoop was developed as a batch-wise processing architecture that is commonly used for shared storage and analysis of big data [14]. Machine learning is considered a subdivision of computer science. It is one of the branches of artificial intelligence that enables machines to learn independently without explicit programming [15]. Machine learning is derived from pattern identification as well as from computational learning theory [16]. The increasing quantity of data and data types has led to the development of powerful tools for storing and analyzing the data and for obtaining useful insights that support decision-making [17]. Moreover, machine learning schemes are employed in the healthcare industry for predicting disease, which helps the patient to undergo effective measures at the right time. The motivation behind the study is as follows. Existing health information systems often fail to provide effective outcomes in terms of data threats and transmission delay errors. Moreover, they cause misdiagnosis issues while handling larger datasets. Therefore, this paper aims to develop a healthcare data prediction model with a combination of various machine learning approaches. The developed model helps to handle large amounts of data effectively, while an accurate feature extraction process is performed to provide significant information. The result analysis is then validated to show the effective performance.

This research paper has the following as its major contributions.

• To implement an effectual healthcare data prediction model for big data by using an ensemble of various machine learning models together with the support of a hybrid optimization algorithm, in order to provide effective decisions in the field of medical science.
• To perform feature selection with the support of the developed HFPBOA for enhancing the overall efficiency of healthcare data prediction, by selecting the optimal features and the weights for the corresponding features so as to maximize the variance of the weighted features.
• To propose an efficient ensemble-based prediction model that combines NN, fuzzy, and KNN classifiers for predicting three kinds of diseases in patients: diabetes, kidney-related ailments, and heart diseases. In addition, parameter optimization takes place with the help of the developed HFPBOA in order to get accurate and precise results in the disease prediction process.
• To suggest a hybrid optimization algorithm named the HFPBOA for feature selection and for the optimization of parameters such as the hidden nodes in the NN, the learning rate in the NN, the membership function in the fuzzy classifier, and the neighbors count in KNN, in order to improve the overall performance capability of the prediction model.
• To evaluate the implemented healthcare data prediction framework by comparing it against various existing state-of-the-art models using various performance metrics.

The rest of this paper is organized as follows. Section 2 reviews the existing medical data prediction models with their implementation details, advantages, and disadvantages. Section 3 describes the development of the implemented disease prediction model for big data. Section 4 describes the MapReduce framework for big data analysis. Section 5 provides the steps involved in the development of the ensemble learning model that assists in the execution of the disease prediction framework. Section 6 provides the results obtained and their respective discussions. Section 7 gives the conclusion of the developed prediction model.

2. Literature survey

2.1. Related works

Kumar et al. [18] have designed a genetic-based Fuzzy C-means data clustering approach to identify the state of the affected over the edge. Clustering generated precise data for all objects, which were further modified genetically to avoid stagnation in local optima. A deep learning strategy was finally employed for performing the classification task. The framework was validated, and the results obtained from the executed model were compared with those of other conventional techniques with respect to time complexity. This comparison has shown that the data clustering approach achieved a better reduction in the processing step, even when practical data streams were used.

Tan et al. [19] have implemented a clinical data system that was developed with assistance from big data analytics. The patient's information was arbitrarily divided into training and testing phases for the implemented model. Based on the processing of the electronic health records, this model has employed the synthetic minority oversampling approach for determining the five resultant outcomes. The developed model has been evaluated, and the evaluation results have shown the better competence of the implemented model over other conventional frameworks.

Safa et al. [20] have implemented a big data-based framework for detecting cardiac diseases in individuals by utilizing IoT devices with the consideration of fuzzy rules. Here, the features were generated by means of the fuzzy rule in order to assist in the diagnosis of cardiac diseases, and were then utilized in training this model. The detection was done with the optimized recurrent neural network (RNN). From the RNN, the final classification was performed. The high performance rate was observed through the experimental analysis of the
suggested model in comparison with the conventional approaches.

Ed-daoudy et al. [21] have implemented a novel architecture for predicting the health status of an individual in practical scenarios, and have built an analytics system based on a big data approach. Using certain input attributes, the system was made capable of predicting the health status along with providing alert messages to the care providers. This information has been preserved in a distributed database for performing stream reporting and health data analytics in real-time applications. The performance evaluation was made with the consideration of execution time and throughput. The performance offered by the developed model was observed to be higher than that of various conventional models.

Manogaran et al. [22] have designed a "Bayesian Hidden Markov Model (HMM) along with the support of the Clustering technique" for modeling a system to predict the number of changes in the DNA copy over the genome. The developed Clustering technique was correlated with the conventional techniques and was related to the segment neighborhood and binary segmentation techniques. The analysis report has shown the effectiveness of the developed model in detecting the changes in the DNA copy number.

Nibareke et al. [23] have presented an ensemble machine learning technique for predicting diabetes. Moreover, performance analytics was carried out over flight delays. The overview was performed with big data tools and with the aid of machine learning frameworks. Several metrics were used to evaluate the accuracy of prediction. The diabetic prediction was performed and analyzed, and the results were compared with distinct conventional models. The results suggested the enhanced performance offered by this implemented model over other conventional frameworks.

Ashiku et al. [24] have explored the ability of an open-source model named Apache Spark by analyzing a huge amount of data over clusters for evaluating the big data, and have also integrated the technologies for assuring decision support systems in healthcare environments. Further, the developed machine learning frameworks have utilized Apache Spark for aiding the decision-making task at the time of allocating organs, such as the selection of a kidney for the appropriate candidate, hence enhancing donor utilization by localizing the recipient within the allotted time. The developed framework has shown the identification of the waitlisted candidates for accepting kidneys that had been neglected.

Pustokhin et al. [25] have recommended a novel feature selection and disease diagnostic framework using big data analytics and a Deep Belief Network (DBN). To minimize the number of features and to reduce the curse of dimensionality, the "Link-based Quasi Oppositional Binary Particle Swarm Optimization Algorithm" was used for selecting the features in order to obtain the optimal feature set. Finally, the DBN was used for classifying the existing disease with the support of the feature-reduced data. The experimental analysis has shown the enhanced performance offered by the developed model when considering various aspects and comparing it with other traditional disease detection approaches.

Marcin et al. [36] have suggested IoT-based models integrated with the Bi-LSTM, decision tree, and data balancing strategy for an automated diagnosis decision support system. In order to provide optimal solutions, a data pre-processing model has been applied for effective network training. The developed model has then been validated with different measures to provide effective outcomes in terms of accuracy, precision, and recall. Vara et al. [37] have developed IoT-based 6G healthcare using artificial intelligence. Here, the research work has focused on artificial intelligence and IoT that have been built into the medical infrastructure in some clinical medical fields. A diverse analysis has been carried out to show the effective performance of the developed model compared with other state-of-the-art methods.

2.2. Problem statement

The unprecedented growth of the electronic healthcare sector and the on-demand nature of the healthcare system have increased the demand for data analysis and big data opportunities with the utilization of machine learning models. Managing disparate and massive data with conventional methods is highly expensive and incredibly difficult. Hence, several machine learning-based models are used along with big data analytics-based health monitoring systems. The advantages and disadvantages of these existing machine learning-aided big data analytics-based healthcare monitoring systems are listed in Table 1. Fuzzy [18] provides better results by reducing the computational overhead and increasing the overall detection accuracy. In addition, during emergency incidents, this model also triggers alarms without relying on the backend servers. However, it faces several difficulties while traversing, grouping, and selectively tapping the IoMT's data traffic. Also, it provides poor accuracy when detecting stress, baseline, and amusement in individuals. XGBoost and logistic regression [19] give promising results regarding data integrity and data security. Moreover, this model makes decisions with low bandwidth and less response time. Yet, the delay and path loss that occur in the system are much higher. Also, it provides low performance when fetching patterns from the mental behavior of an individual. In Fuzzy [20], the routing defects are effectively analyzed and the risks that arise due to congestion in the IoMT are resolved. Furthermore, it produces premature influence rates, which is helpful in the early diagnosis of diseases. However, it needs a little more flexibility in deciding which data has to be discarded or selected. Also, it requires a huge storage system for retrieval and analytical operations. Decision Tree [21] effectively reduces the normalization error. In addition, the cost function is highly reduced in this model. Nevertheless, it incurs high power leakage, which may lead to more energy consumption. Moreover, it is difficult to apply an encryption algorithm in this model. Gaussian Mixture (GM) clustering [22] minimizes the loss function and also highly decreases the false acceptance rate. Yet, it consumes more time for analyzing a large volume of data. In addition, it needs high storage space for saving the entire health data. Linear Regression, Naïve Bayes, and Decision Tree [23] improve the effectiveness of the model in terms of jitter, latency, power consumption, and network bandwidth. Moreover, this model can handle disparate and complex data. However, it is affected by a lack of platform interoperability, which generates a massive burden in monitoring and controlling the healthcare system. Also, the computational cost required for the system is excessively high. In PCA [24], the change point indices are effectively computed. Furthermore, it improves the veracity and reliability of the system while dealing with big data. But it fails to provide high robustness. In addition, this model also suffers from high latency and energy consumption. DBN [25] highly reduces the bottleneck effect. Moreover, it improves the scalability of the health monitoring system. Yet, it leads to a wastage of resources during data training, as it does not encounter the changes every second of the day. It is also affected by congestion problems.

2.3. Motivation

In recent times, advanced technology has been adopted that is built around IoT and artificial intelligence-based applications. Here, the existing techniques struggle to provide accurate outcomes while handling large amounts of data. Moreover, they often cause misdiagnosis issues when considering large amounts of data in healthcare informatics. These challenges in the existing big data analytics for IoT healthcare systems are resolved by using machine learning techniques for the development of a new big data analytics model. The developed HFPBOA algorithm has the ability to tune the parameters optimally in order to show accurate outcomes. Moreover, the offered model provides better performance
diseases. Here, parameters such as the hidden nodes in the NN, the learning rate in the NN, the membership function in the fuzzy classifier, and the neighbors count in KNN are optimized with the aid of the implemented HFPBOA to enhance the overall performance of the prediction. The final predicted disease outcome is obtained from the ensemble model. The output of the executed model is validated by comparing it with other existing disease detection approaches.
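
As a minimal sketch of how the ensemble output could be formed, the snippet below averages per-class scores from the three member classifiers and returns the class with the highest average. The function name `ensemble_predict`, the score values, and the default of equal weights are illustrative assumptions; the paper states only that the outputs of the NN, KNN, and fuzzy classifier are averaged with a weight function.

```python
from typing import Dict, List, Optional


def ensemble_predict(
    scores: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> int:
    """Return the class index with the highest weighted-average score.

    scores maps a classifier name to its per-class scores (equal lengths).
    weights maps a classifier name to a non-negative weight. Equal weights
    are used when omitted (an assumption, not taken from the paper).
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    n_classes = len(next(iter(scores.values())))
    averaged = [
        sum(weights[name] * scores[name][c] for name in scores) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: averaged[c])


# Hypothetical per-class scores [no disease, disease] from the three members.
example = {"nn": [0.30, 0.70], "knn": [0.60, 0.40], "fuzzy": [0.20, 0.80]}
predicted = ensemble_predict(example)
```

With equal weights, two of the three members favour the second class, so the averaged score selects it. Raising the weight of one member can flip the decision, which is why the weighting itself is a natural target for tuning.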
Table 2
Dataset description of the proposed healthcare prediction model.

Dataset name | Data link | Dataset description
Dataset 1: "Pima Indians Diabetes Database" | https://www.kaggle.com/uciml/pima-indians-diabetes-database (access date: 2023-01-11) | This dataset is used for accurately predicting diabetes in patients from several diagnostic measurements, which are provided in the dataset. The data are gathered from female patients who are at least 21 years old. The dataset contains several medical predictor variables and a single target variable; the predictors include the pregnancy count of the patient along with the insulin level, age, and BMI.
Dataset 2: Diabetes | https://datahub.io/machine-learning/diabetes#data (access date: 2023-01-11) | This dataset contains diabetes-related data in three data files, named diabetes-off, diabetes, and diabetes-zip. The data have nine attributes, where the final attribute states whether a patient tested positive or negative for diabetes.
Dataset 3: "Heart Disease Dataset (Comprehensive)" | https://www.kaggle.com/sid321axn/heart-statlog-cleveland-hungary-final (access date: 2023-01-11) | This dataset contains 1190 records of patients from Hungary, Switzerland, the UK, and the US. Each record provides various attributes that are required to analyze the presence of heart disease. These 11 features include the "patient's age, sex, chest pain type, cholesterol, resting bp, fasting blood sugar, resting ECG, max heart rate, exercise angina, and old peak".
Dataset 4: waitlist_kidney_brazil | https://www.kaggle.com/datasets/gustavomodelli/waitlist-kidney-brazil (access date: 2023-01-11) | This dataset supports predicting the waiting time for a deceased-donor kidney transplant, which is useful to clinicians and patients for suggesting the management and contributing to the effective usage of resources.
Considering the key-value pairs, the map function takes a key-value pair as input and produces a collection of intermediate key-value pairs as output. Before executing the reduce phase, the MapReduce library combines all intermediate values related to the same intermediate key and transmits them to speed up the processing in the reduce phase.
• Reduce-phase: The header node obtains the solutions to all sub-problems and fuses them in a certain way to generate the final output. In terms of key-value pairs, the reduce phase takes the intermediate keys provided by the MapReduce library and generates the final results for the respective set of values and keys.

The MapReduce framework is diagrammatically illustrated in Fig. 2.

4.2. Map Phase: Feature extraction phase

The map phase in the MapReduce framework performs the task of feature extraction in this research work. Thus, this model takes the collected data MD_f^{dataset} as the input. Here, the feature extraction is performed with statistical and PCA-based methods, which are described in detail as follows.

PCA [26]: The collected healthcare data MD_f^{dataset} is used as input to this technique to minimize the feature dimension while preserving the essential information. Higher dimensional features create more complexity when they are used for training the model. Hence, PCA is involved in this work for minimizing the feature length as well as for enhancing the interoperability among the healthcare data. At first, PCA computes the patterns among the healthcare data. These patterns are then used for determining the variations and similarities between them. After the patterns are determined, data compression takes place to minimize the dimensions. PCA proceeds using the variance, covariance, and standard deviation, which are all generated as output when provided with the input healthcare data. Further, the mean is computed for every data dimension. Moreover, the covariance matrix of the healthcare data is determined together with its eigenvectors and eigenvalues. Then, the eigenvector elements are selected to design the feature vector, which is an eigenvector matrix. The extracted features from PCA are represented as Fx_b^{pca}.

Statistical features [27]: The minimum, maximum, mean, standard deviation, median, entropy, correlation, and contrast are all extracted by means of the statistical feature extraction method. These extracted features are described as follows.

Maximum Value: Maximum explains "the largest value that is present within the collected medical data. Regarding the statistical measures, it refers to the largest observation and the sample maximum".

Mean: Mean is defined as "the sum of a collection of numbers divided by the total numbers of samples in the collection" as in Eq. (1).

ME_n = \frac{1}{rq} \sum_{rl=1}^{rq} rf_{rl}    (1)

In Eq. (1), the term rq denotes the total number of samples and the term rf_{rl} indicates the data values. The arithmetic mean is represented as ME_n.

Minimum Value: Minimum defines "the minimum value that is available in the gathered medical images. With respect to the statistical measures, it is the smallest observation and the sample minimum".

Standard deviation \sigma: Standard deviation is "the distribution in each sample value". It is computed using Eq. (2).

\sigma = \sqrt{\frac{1}{rg} \sum_{rj=1}^{rg} \left( rJ_{rj} - \overline{rJ} \right)^{2}}    (2)

In Eq. (2), the term \overline{rJ} denotes the mean of the distribution.

Median MD_n: Median is defined as "a simple measure of the central tendency. It is determined by organizing the sample values from the minimum to maximum value". It is computed as given in Eq. (3).

MD_n = \left\{ \frac{rq + 1}{2} \right\}^{th} value    (3)

Correlation: Correlation shows "how to correlate a sample to its neighboring sample within the entire observations". It is computed as given in Eq. (4).

correlation = \sum_{ru,rv=0}^{rm-1} MD_{ru,rv} \frac{(ru - \mu)(rv - \mu)}{\sigma^{2}}    (4)

In Eq. (4), the term MD_{ru,rv} denotes the dimensionality of the data at position (ru, rv), and the variance of the intensity values is denoted by \sigma^{2}. The term \mu denotes the average value or mean of the entire data.

Contrast: Contrast is computed using Eq. (5).

contrast = \sum_{ru,rv=0}^{rm-1} MD_{ru,rv} (ru - rv)^{2}    (5)

Entropy: Entropy gives "the amount of data loss in a sample which provides the required information for the following observation". It is computed using Eq. (6).

entropy = \sum_{ru,rv=0}^{rm-1} MD_{ru,rv} \log MD_{ru,rv}    (6)

The extracted statistical features are represented as Fx_d^{sts}.

4.3. Reduce Phase: Feature selection phase

In the reduce phase, the extracted PCA-based and statistical features, Fx_b^{pca} and Fx_d^{sts}, respectively, are used. In order to select the optimal features, the developed HFPBOA is employed. This HFPBOA reduces the feature length by selecting the best features for predicting the disease in the individual. Based on the developed HFPBOA, optimal feature set 1 is selected from Fx_b^{pca} and is indicated as Fx_g^{Opt1}, and optimal feature set 2 is selected from Fx_d^{sts} by the same HFPBOA and is represented as Fx_h^{Opt2}. Here, the number of
optimal features that are selected from each of the extracted feature sets is taken as 7. Then, the weight optimization in the feature fusion stage, which minimizes the training complexity, is done using the same HFPBOA, where the optimized weight 1 is indicated as $We1_g$ and the optimized weight 2 is denoted as $We2_h$. The weighted feature selection with HFPBOA is mathematically depicted by means of Eq. (7).

$$Ffk^{Wop} = \left(Fx^{Opt1}_g \ast We1_g\right) + \left(Fx^{Opt2}_h \ast We2_h\right) \tag{7}$$

In Eq. (7), the weights $We1_g$ and $We2_h$ are optimized in the range of [0.01, 0.99]. The weighted optimal features that are obtained at the end of this stage are indicated as $Ffk^{Wop}$. The objective of this optimization is given by Eq. (8).

$$OB1 = \underset{\{We1_g,\, We2_h,\, Fx^{Opt1}_g,\, Fx^{Opt2}_h\}}{\arg\min} \left(\frac{1}{vRN}\right) \tag{8}$$

As given in Eq. (8), the objective of optimal feature selection and weight optimization is to maximize the variance among the weighted features, which is represented by $vRN$; minimizing $1/vRN$ is equivalent to maximizing $vRN$. Variance is defined as “the computation that is carried out by taking the average of squared deviations from the mean of sentence depths in each data”. Variance is computed as provided in Eq. (9).

$$vRN = \sigma^2 = \frac{1}{rg}\sum_{rj=1}^{rg}\left(rJ_{rj} - \overline{rJ}\right)^2 \tag{9}$$

The developed HFPBOA-based weighted feature selection for multi-disease prediction is diagrammatically represented in Fig. 3.

4.4. Feature selection using HFPBOA

The feature selection is performed with the help of the developed HFPBOA, which is also used for optimizing the parameters of the implemented ensemble learning-based disease detection model. Selecting the appropriate features helps to enhance the data quality without the loss of significant information. Accurate feature selection is also meant to significantly speed up the training process and save time while solving the problems regarding the analytical queries. However, the conventional BMOA has certain challenges: it has a low convergence rate and easily gets stuck in local optima. So, the FPO [29] algorithm is used along with the BMOA to enhance the performance rate by solving these problems of the conventional BMOA. In the proposed HFPBOA, the random parameter $Rand$ is adaptively updated with the aid of the fitness mechanism as shown in Eq. (10).

$$Rand = \left(\frac{worFit}{bstFit}\right) \ast R_0 \tag{10}$$

In Eq. (10), the term $worFit$ denotes the worst fitness value and $bstFit$ indicates the best fitness value among the solutions. The arbitrary value is denoted by $R_0$. If the condition ($Rand < 0.5$) is satisfied, then the FPA-based update of the solution takes place; otherwise, the BMOA is used for updating the solution.

BMOA [28]: The BMOA is developed based on the mating characteristics of bumble bees for resolving various complex optimization problems. Here, the bumble bees represent candidate solutions in the search space, which are arranged according to the objective function. This algorithm considers the best solution as the queen bumble bee, and the drone bumble bees are considered as the potential candidate solutions. The mating behavior of the queen bumble bee is formulated in Eq. (11).

$$brood_{ab}(s) = \begin{cases} Q_b(s) & \text{if } (R \le Dr1) \\ drone\text{-}grnty_{nb}(s) & \text{otherwise} \end{cases} \tag{11}$$

In Eq. (11), the term $a$ indicates the brood index, $n$ denotes the index of the arbitrarily selected drone, $s$ indicates the iteration count, and the uniformly distributed arbitrary variable is indicated by the term $R \in [0, 1]$. The selection of the mother bumble bees and worker bumble bees is performed using Eq. (12), Eq. (13), and Eq. (14).

$$nQ_{ab}(s+1) = nQ_{ab}(s) + b \cdot \left(nQ_{ab}(s) - Q_b(s)\right) + c \cdot \sum_{l=1}^{x}\left(nQ_{ab}(s) - Wkr_{lb}(s)\right) \tag{12}$$
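The weighted fusion and its variance-based objective in Eq. (7) to Eq. (9) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the two selected feature sets are equally long numeric vectors combined element-wise, and that the variance is taken over the entries of the fused vector. The function names and the small `eps` guard against division by zero are hypothetical.

```python
from statistics import pvariance


def fuse_weighted_features(fx_opt1, fx_opt2, we1, we2):
    """Eq. (7): element-wise weighted sum of two selected feature vectors."""
    # The source restricts both weights to the range [0.01, 0.99].
    if not (0.01 <= we1 <= 0.99 and 0.01 <= we2 <= 0.99):
        raise ValueError("weights must lie in [0.01, 0.99]")
    # Assumption: both feature vectors have the same length.
    if len(fx_opt1) != len(fx_opt2):
        raise ValueError("feature vectors must have the same length")
    return [a * we1 + b * we2 for a, b in zip(fx_opt1, fx_opt2)]


def objective_ob1(fx_opt1, fx_opt2, we1, we2, eps=1e-12):
    """Eq. (8): value to be minimized, 1 / vRN."""
    fused = fuse_weighted_features(fx_opt1, fx_opt2, we1, we2)
    # Eq. (9): population variance, i.e. the mean of squared deviations.
    v_rn = pvariance(fused)
    # eps is an assumed guard so that constant features do not divide by zero.
    return 1.0 / (v_rn + eps)
```

An optimizer such as HFPBOA would call `objective_ob1` for each candidate pair of weights and feature subsets and keep the candidate with the smallest value, which corresponds to the largest variance.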
A.S. Zamani et al. Biomedical Signal Processing and Control 94 (2024) 106247
In Eq. (13) and Eq. (14), the terms $c_{max}$ and $c_{min}$ denote the food quantity given by the corresponding queen and worker bumble bees under the local feeding phase for supporting the optimal queen bumble bee. Further, the parameters $Lsi$ and $Lsi_{max}$ indicate the present and maximum number of iterations under the brood feeding phase. The term $x$ is selected to show the chosen worker bumble bee. The modified position of the bumble bee is represented with the aid of Eq. (15).

$$Drn_{ab} = Drn_{ab} + \alpha \cdot \left(Drn_{lb} - Drn_{mb}\right) \tag{15}$$

In Eq. (15), the term $\alpha$ denotes the significant factor that helps the neighboring drones $Drn_{ab}$. Finally, the optimal solution at the final iteration is obtained.

FPO [29]: The FPO algorithm is performed based on the characteristics of the pollination. In the FPO algorithm, the pollination is carried out based on four rules, which are mentioned below.

• First Rule: The cross and biotic pollination are determined as global pollination, in which the pollinators follow the Lévy distribution.
• Second Rule: The self and abiotic pollination are considered as the local pollination.
• Third Rule: The property of flower constancy is observed to be the reproduction ratio, which is related to the similarity degree between two flowers (solutions).
• Fourth Rule: Owing to the wind and physical proximity, local pollination has certain advantages over global pollination. Both are managed with the utilization of the variable $T$.

Global pollination assures the fittest reproduction with the insects that have traveled for longer distances, where the fittest is indicated as $h^{*}$. The first rule, together with flower constancy, is depicted by Eq. (16).

$$y_a^{s+1} = y_a^{s} + \gamma M \left(h^{*} - y_a^{s}\right) \tag{16}$$

In Eq. (16), the term $y_a^{s}$ indicates the pollen $a$, which is also considered as the solution vector at the $s$th iteration, the optimal solution is represented as $h^{*}$, the step size scaling factor is denoted as $\gamma$, and the pollination power or step size is denoted as $M$.

Local pollination, based on rule 2 and flower constancy, is determined as given in Eq. (17).

$$y_a^{s+1} = y_a^{s} + \varepsilon \left(y_b^{s} - y_m^{s}\right) \tag{17}$$

In Eq. (17), the terms $y_b^{s}$ and $y_m^{s}$ denote the solution vectors taken from different flowers. The variable $\varepsilon$ is drawn from the uniform distribution over the range [0, 1]. The pollination process is either global or local, and so the switching probability $T$, which is in the range of [0, 1], is applied as given in Rule 4.

The pseudocode of the suggested HFPBOA is represented in Algo
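As a rough illustration of how Eq. (10), Eq. (15), Eq. (16), and Eq. (17) fit together, one hybrid update step might be sketched in Python as follows. This is a hedged sketch rather than the authors' pseudocode: the Lévy step is drawn with Mantegna's method, the direction of the switching test (global pollination when a uniform draw is below `switch_t`), the default values of `switch_t`, `gamma`, `alpha`, and `beta`, and the reduction of the BMOA branch to the drone update of Eq. (15) are all assumptions, and every function name is hypothetical.

```python
import math
import random


def adaptive_rand(worst_fit, best_fit, r0):
    """Eq. (10): Rand = (worFit / bstFit) * R0; best_fit must be non-zero."""
    return (worst_fit / best_fit) * r0


def levy_step(rng, beta=1.5):
    """Draw one Levy-distributed step using Mantegna's method (assumed choice)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid a division by zero in the extremely rare case v == 0
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def fpa_update(y_a, best, y_b, y_m, rng, switch_t=0.8, gamma=0.1):
    """FPA branch: global move of Eq. (16) or local move of Eq. (17)."""
    if rng.random() < switch_t:
        # Eq. (16): step towards the best solution h*, scaled by a Levy step M.
        return [ya + gamma * levy_step(rng) * (h - ya) for ya, h in zip(y_a, best)]
    eps = rng.random()  # Eq. (17): epsilon drawn uniformly from [0, 1]
    return [ya + eps * (yb - ym) for ya, yb, ym in zip(y_a, y_b, y_m)]


def bmoa_drone_update(drn_a, drn_l, drn_m, alpha=0.5):
    """BMOA branch, reduced here to the drone position update of Eq. (15)."""
    return [a + alpha * (l - m) for a, l, m in zip(drn_a, drn_l, drn_m)]


def hfpboa_step(y_a, best, y_b, y_m, worst_fit, best_fit, rng):
    """One hybrid update: FPA if Rand < 0.5, otherwise BMOA."""
    rand = adaptive_rand(worst_fit, best_fit, rng.random())
    if rand < 0.5:
        return fpa_update(y_a, best, y_b, y_m, rng)
    return bmoa_drone_update(y_a, y_b, y_m)
```

Here `y_a` is the solution being updated, `best` is the current best solution, and `y_b` and `y_m` are two other members of the population. A full optimizer would loop `hfpboa_step` over the population, re-evaluate the fitness, and track the best and worst fitness values that feed Eq. (10).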
respectively.

$$Acr = \frac{TRE_{pos} + TRE_{neg}}{TRE_{pos} + TRE_{neg} + FLE_{pos} + FLE_{neg}} \tag{19}$$

$$Pns = \frac{TRE_{pos}}{TRE_{pos} + FLE_{pos}} \tag{20}$$

In Eq. (19) and Eq. (20), the true positive is denoted by $TRE_{pos}$, the true negative is denoted by $TRE_{neg}$, the false positive is denoted by $FLE_{pos}$, and the false negative is denoted by $FLE_{neg}$. The developed ensemble learning-based disease prediction model is depicted in Fig. 5.

5.2. Machine learning model 1: Neural network

NN [27] is used in this developed ensemble-based multi-disease prediction model with the weighted features as its input. In this model, the input units pass the data function p. The device handles various modules of the hidden layer and generates results at the output level. The entire unit is comprised of an activation function, which is the sigmoid function in this case as provided in Eq. (21).

In Eq. (21), the term $\gamma$ denotes the constant controlling the slant (slope) of the function. The complete input is accumulated by the processing unit $d$. The overall computation process is given in Eq. (22).

$$Net_d = \sum_{c} W_{cd}\, y'_c + \theta_d \tag{22}$$

In Eq. (22), the term $y'_c$ denotes the outputs obtained from the previous layer and $W_{cd}$ indicates the weights of the connections from the $c$th unit to the $d$th unit. The graphical representation of the NN-based prediction model is given in Fig. 6.

5.3. Machine learning model 2: Fuzzy classifier

The multi-disease prediction is done utilizing the fuzzy classifier [30]. This model is used for identifying health-related diseases. The fuzzy classifier relies on fuzzy set theory to manipulate and express uncertainty and ambiguity. The rules are produced with the support of fuzzy logic. The input given to this model is the weighted features $Ffk^{Wop}$. This has been performed using the triangular
Fig. 7. Diagrammatic representation of Fuzzy-based Prediction Model.

The MCC is computed using Eq. (26).

$$MCC = \frac{TRE_{pos} \times TRE_{neg} - FLE_{pos} \times FLE_{neg}}{\sqrt{\left(TRE_{pos} + FLE_{pos}\right)\left(TRE_{pos} + FLE_{neg}\right)\left(TRE_{neg} + FLE_{pos}\right)\left(TRE_{neg} + FLE_{neg}\right)}} \tag{26}$$

6.3. ROC analysis of the implemented model against various disease prediction models

The ROC curve analysis is used to evaluate the effectiveness of the developed HFPBOA-EL-based multi-disease prediction system on four different benchmark datasets, in comparison with the conventional disease detection models. The evaluation results are shown in Fig. 9. By analyzing the plots, it is seen that the offered HFPBOA-EL-based multi-disease prediction system has given better prediction results than the prior works.

Fig. 9. ROC analysis on the suggested HFPBOA-EL-based multi-disease prediction model when compared with the prior works regarding “(a) Dataset 1 (b) Dataset 2 (c) Dataset 3 and (d) Dataset 4”.

6.4. Evaluation of the performance of the executed multi-disease prediction model using the cost function

The cost function computation of the offered HFPBOA-EL-based multi-disease prediction model for four different datasets is depicted in Fig. 10. In this evaluation, various heuristic strategies
are taken into consideration to ensure the effectiveness of the implemented multi-disease prediction model. The suggested approach attained cost function values that are 72.72 %, 77.77 %, 79.31 %, and 82.85 % better than the heuristic strategies SSA-EL, DOA-EL, BOA-EL, and FPA-EL, respectively, on dataset 3. The suggested HFPBOA-EL-based multi-disease prediction model performed well on all the datasets for all iterations when compared with the other algorithms.

Fig. 10. Cost function analysis on the suggested HFPBOA-EL-based multi-disease prediction model when contrasted against distinct existing optimization algorithms with respect to “(a) Dataset 1 (b) Dataset 2 (c) Dataset 3 and (d) Dataset 4”.

6.5. Effectiveness analysis of the executed multi-disease prediction model over various conventional optimization algorithms and prediction models

The performance of the developed HFPBOA-EL-based multi-disease prediction model is assessed by comparing it with various optimization strategies and previously developed prediction models on various evaluation metrics, as depicted in Fig. 11 and Fig. 12, respectively. The precision of the suggested model is 9.09 %, 12 %, 23.52 %, and 29.23 % higher than that of the heuristic strategies SSA-EL, DOA-EL, BOA-EL, and FPA-EL, respectively, on dataset 4. In addition, the accuracy and precision rate of the developed HFPBOA-EL-based multi-disease prediction model is also higher than that of the previously used prediction methodologies.

Fig. 11. Effectiveness analysis of the suggested HFPBOA-EL-based multi-disease prediction model when contrasted against various optimization algorithms regarding “(a) Accuracy (b) FPR (c) FDR and (d) Precision”.

6.6. Effectiveness computation of the executed multi-disease prediction model in terms of statistical measures

The computational effectiveness of the developed HFPBOA-EL-based multi-disease prediction model, analyzed in terms of various statistical measures, is given in Table 3. For performing this statistical analysis, various heuristic algorithms are considered and contrasted against the implemented multi-disease prediction model for validating the performance of the developed prediction model. The suggested model attained a median value that is 9.81 %, 19.03 %, 18.44 %, and 61.23 % better than the existing heuristic strategies SSA-EL, DOA-EL, BOA-EL, and FPA-EL, respectively, on the 1st dataset. All the statistical measures that have been utilized to check the effectiveness of the implemented model have given superior results for the developed model when compared with various heuristic algorithms.

7. Conclusion

This research work aimed at the implementation of an effective multi-disease prediction framework with big data using ensemble machine learning approaches. Initially, four datasets were used as the source for collecting the necessary healthcare data. The collected healthcare data was used in the feature extraction phase, where the statistical features and PCA features were obtained. The extracted features were further utilized in the weighted feature selection phase, in which the selection of the optimal features and the weights was carried out using the implemented HFPBOA. The acquired optimal weighted features were employed for the disease prediction phase. In this prediction phase, the ensemble learning model was utilized, in which machine learning models such as the NN, fuzzy, and KNN were utilized for accurately predicting the diseases. The parameter optimization has taken place in the ensemble learning model using the developed HFPBOA for enhancing the overall disease prediction performance. The experimental validations were carried out on the executed multi-disease prediction model. Experimental results have shown that the prediction rate of the suggested ensemble model in terms of precision was 9.09 %, 12 %, 23.52 %, and 29.23 % higher than the heuristic strategies SSA-EL, DOA-EL, BOA-EL, and FPA-EL, respectively, on dataset 4. Thus, the enhanced prediction outcomes provided by the executed multi-disease prediction model were assured. In practical applications, health informatics can be applied in various real-time environments like disaster management, electronic health records, and healthcare in remote areas. These applications could enhance the quality and efficiency of the data while diagnosing diabetes, mental disorders, and other disorders. Hence, technicians would be able to identify better treatment for particular individuals.

Advantages and limitations of the model.

The developed HFPBOA-EL model is designed to provide better performance in healthcare informatics. Here, the developed model predicts the disease in an effective manner, which could enhance the accuracy and precision of outcomes. Also, the developed HFPBOA algorithm is utilized to provide the optimal solutions to enhance the feature propagation of the model. Thus, it effectively optimizes the parameters to enhance the performance. However, the
Fig. 12. Effectiveness analysis of the suggested HFPBOA-EL-based multi-disease prediction model when contrasted against prior works regarding “(a) Accuracy (b)
FPR (c) FDR and (d) Precision”.
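The evaluation measures of Eq. (19), Eq. (20), and Eq. (26) can be computed directly from the four confusion-matrix counts. The following Python sketch is illustrative only: the function names are hypothetical, the MCC follows its standard definition, and returning 0.0 for a zero denominator is an assumed convention rather than something stated in the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Eq. (19): share of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Eq. (20): share of predicted positives that are true positives."""
    return tp / (tp + fp)


def mcc(tp, tn, fp, fn):
    """Eq. (26): Matthews correlation coefficient (standard definition)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0.0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, with hypothetical counts of 40 true positives, 45 true negatives, 5 false positives, and 10 false negatives, `accuracy(40, 45, 5, 10)` gives 0.85 and `mcc(40, 45, 5, 10)` is roughly 0.70.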
[35] Pooja Singh, Marcello Carvalho Reis, Victor Hugo C. Albuquerque, Design of Artificial Intelligence Enabled Dingo Optimizer for energy Management in 6G communication networks, AI-Enabled 6G Networks and Applications, Wiley, 2023.
[36] M. Woźniak, M. Wieczorek, J. Siłka, BiLSTM deep neural network model for imbalanced medical data of IoT systems, Futur. Gener. Comput. Syst. 141 (2023) 489–499.
[37] M. Vara Siddardha Reddy, R. Sai Prasad, R. Sai Jagan, M. Selvi, Artificial intelligence for IoT-based Healthcare System, 2023 International Conference on Computer Communication and Informatics (ICCCI) (2023) 1–5.