
Implementation of Machine Learning Techniques With Big Data and IoT To Create Effective Prediction Models For Health Informatics



Implementation of machine learning techniques with big data and IoT to


create effective prediction models for health informatics

Article · April 2024


All content following this page was uploaded by Abu Sarwar Zamani on 16 April 2024.



Biomedical Signal Processing and Control 94 (2024) 106247

Contents lists available at ScienceDirect

Biomedical Signal Processing and Control


journal homepage: www.elsevier.com/locate/bspc

Implementation of machine learning techniques with big data and IoT to create effective prediction models for health informatics
Abu Sarwar Zamani a,b,*, Aisha Hassan Abdalla Hashim b, Abdallah Saleh Ali Shatat c, Md. Mobin Akhtar d, Mohammed Rizwanullah a, Sara Saadeldeen Ibrahim Mohamed a
a Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, Saudi Arabia
b Department of Electrical and Computer Engineering, International Islamic University Malaysia, Malaysia
c Department of Management Information Systems, College of Administrative Sciences, Applied Science University, Kingdom of Bahrain
d College of Applied Medical Science, Department of Basic Sciences, Riyadh Elm University (REU), Saudi Arabia

ARTICLE INFO

Keywords: Prediction Model; Health Informatics; Big Data; Machine Learning Techniques; Hybrid Flower Pollination Bumblebees Optimization Algorithm; Neural Networks; Fuzzy Classifier; K-Nearest Neighbour

ABSTRACT

Because healthcare data is now available at very large scale, big data analytics in this industry must keep growing to open new and effective opportunities. Such analytics supports early prevention, prediction, and detection of disease, and thereby improves the overall quality of life of individuals. In this paper, a machine learning-based big data analytics model is developed for predicting multiple diseases to provide a better decision support system for various healthcare applications. The developed framework uses MapReduce to handle and process big data: the map phase performs feature extraction and the reduce phase performs feature selection. The required healthcare data is collected from external web sources. In the map phase, statistical features and Principal Component Analysis (PCA) features are extracted. In the reduce phase, the optimal features are selected with the aid of the developed Hybrid Flower Pollination Bumblebees Optimization Algorithm (HFPBOA). Then, an Ensemble Learning (EL) model is developed to predict the diseases, and the parameters of the EL classifiers are optimized by the same HFPBOA. The final prediction output is obtained by a weighted average of the outputs of the NN, KNN, and fuzzy classifiers. In terms of best value, the offered model improves on SSA-EL, DOA-EL, BOA-EL, and FA-EL by 40.1%, 28.7%, 23.6%, and 10.5%, respectively. The effectiveness of the developed multi-disease prediction framework is confirmed by comparing its results with those of recently developed prediction approaches.

1. Introduction

Supported by massive internet connectivity and abundant bandwidth, the Internet of Things (IoT) has developed considerably in recent years. IoT plays an essential role in the development of various electronics industries, as it paves a new way for operating devices in an automated and remote manner. Like any other field, the medical domain has been transformed with the aid of IoT, which has also helped to renew various applications in this domain [1]. IoT-based healthcare systems are incorporated to provide data movement, information exchange, machine-to-machine communication, and interoperability, which in turn helps in the delivery of various healthcare services [2]. For the convenience of the patients, these IoT-based wireless devices are worn on or implanted in the patient's body in order to collect the patient's healthcare data remotely. This technology is referred to as the Internet of Medical Things (IoMT). In an IoMT system, sensor devices transmit the patients' significant symptoms to healthcare providers wirelessly and remotely with the aid of IoT technology [3]. The data obtained by means of IoT-based applications helps in providing effective treatment plans and, in some cases, in making consultations or interventions with the respective doctors by webcam. Moreover, the data supports predictive analytics and thus helps doctors in evaluating the disease with a high degree of accuracy, using the gathered health patterns with

* Corresponding author at: Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam Bin Abdulaziz University, Saudi Arabia.
E-mail addresses: [email protected] (A.S. Zamani), [email protected] (A.H.A. Hashim), [email protected] (A.S.A. Shatat), mohammed.akhtar@
riyadh.edu.sa (Md.M. Akhtar), [email protected] (M. Rizwanullah), [email protected] (S.S.I. Mohamed).

https://doi.org/10.1016/j.bspc.2024.106247
Received 14 October 2023; Received in revised form 2 January 2024; Accepted 20 March 2024
Available online 3 April 2024
1746-8094/© 2024 Elsevier Ltd. All rights reserved.

the aid of the IoT-based sensing devices at a much faster rate [4]. These benefits are unparalleled and assure significant decision-support information. On the other hand, processing this information is considered a challenging task, as it contains a high level of noise and such systems have to deal with a huge variety of information [5]. The most essential threats that arise while processing IoMT information relate to the privacy and security of the data. Since these devices record and transmit data in practical environments independent of standard data protocols, and to systems without the support of data ownership regulations, the generated and transmitted information is susceptible to hacking and fraudulent activity [6].

In conventional methods, information extraction, visualization, processing, and storage are observed to be challenging when huge volumes of data of varying types are involved. The most significant challenge in big data analytics is to explore new ways to efficiently acquire the information required by diverse types of users [7]. Recently, diverse varieties of healthcare data have been continuously gathered from both non-clinical and clinical scenarios, in which the most essential data is regarded as a digital representation of the medical history of the patients who are under healthcare analytics [8]. Hence, the development of a distributed data system is required to solve the challenges discussed as follows. The first challenge is the generation of a huge volume of heterogeneous data, which makes it difficult to gather the data by means of shared locations [9]. The next major issue concerns storage, because of the utilization of massive and heterogeneous datasets [10]; a big data system requires an effective storage system in order to assure efficiency. The final challenge arises in big data analytics itself, more precisely while analyzing massive datasets over real-time platforms, which includes optimization, prediction, visualization, and modeling [11]. These challenges have to be solved with a novel processing procedure that acts as the data management system, as the present data management systems are considered inefficient in handling practical and heterogeneous data [12].

MapReduce is a parallel processing approach that is utilized to analyze huge volumes of data distributed over a commodity cluster. This technique comprises both Map and Reduce operations. The MapReduce framework is used for solving problems related to computational complexity and training complexity when utilizing big data [13]. Based on the MapReduce framework, Hadoop was developed as a batch-wise processing architecture that is commonly utilized for shared storage and analysis of big data [14]. The machine learning domain is considered a subdivision of computer science. It is one of the branches of artificial intelligence that gives machines the capability of learning independently, without explicit programming [15]. Machine learning is derived from pattern identification as well as from computational learning theory [16]. The increasing quantity of data and of data types has led to the development of powerful tools for storing and analyzing the data and for obtaining useful insights that support decision-making [17]. Moreover, machine learning schemes are employed in the healthcare industry for predicting disease, which helps the patient to undergo effective measures at the right time. The motivation behind the study is as follows. Existing health information systems often fail to provide effective outcomes in terms of data threats and transmission delay errors. Moreover, they cause misdiagnosis issues while handling larger datasets. Therefore, this paper aims to develop a healthcare data prediction model with a combination of various machine learning approaches. The developed model helps to effectively handle larger amounts of data, while an accurate feature extraction process is performed to provide significant information. The result analysis has been validated to show the effective performance.

This research paper has the following as its major contributions.

• To implement an effectual healthcare data prediction model for big data by using an ensemble of various machine learning models together with the support of a hybrid optimization algorithm in order to provide effective decisions in the field of medical science.
• To perform feature selection with the support of the developed HFPBOA for enhancing the overall efficiency of healthcare data prediction, by selecting optimal features and weights for the corresponding features for the purpose of maximizing the variance of the weighted features.
• To propose an efficient ensemble-based prediction model that is developed by combining NN, fuzzy, and KNN for predicting three kinds of diseases, namely diabetes, kidney-related ailments, and heart diseases, in the patients. In addition, parameter optimization takes place with the help of the developed HFPBOA in order to get accurate and precise results in the disease prediction process.
• To suggest a hybrid optimization algorithm named the HFPBOA for feature selection and for the optimization of parameters such as the hidden nodes in NN, the learning rate in NN, the membership function in fuzzy, and the neighbors count in KNN in order to improve the overall performance capability of the prediction model.
• To evaluate the implemented healthcare data prediction framework by comparing it against various existing state-of-the-art models with the utilization of various performance metrics.

The rest of the sections in this paper are given as follows. Section II provides the existing medical data prediction models with their implementation details, advantages, and disadvantages. Section III describes the development of the implemented disease prediction models with big data. Section IV describes the map-reduce framework for big data analysis. Section V provides the steps that are involved in the development of the ensemble learning model for assisting in the execution of the disease prediction framework. Section VI provides the results obtained and their respective discussions. Section VII gives the conclusion of the developed prediction model.

2. Literature survey

2.1. Related works

Kumar et al. [18] have designed a genetic-based Fuzzy C-means data clustering approach to identify the state of the affected over the edge. Clustering generated precise data for all objects, which were further modified genetically to avoid stagnation in local optima. A deep learning strategy was finally employed for performing the classification task. The framework was validated, and the results obtained from the executed model were compared with other conventional techniques with respect to time complexity. This comparison has shown that the data clustering approach achieved a better reduction in the processing step, even with the utilization of practical data streams.

Tan et al. [19] have implemented a clinical data system that was developed with the assistance of big data analytics. The patient's information was arbitrarily divided into training and testing phases for the implemented model. Based on the processing of the electronic health records, this model employed the synthetic minority oversampling approach for determining the five resultant outcomes. The developed model has been evaluated, and the evaluation results have shown the better competence of the implemented model over other conventional frameworks.

Safa et al. [20] have implemented a big data-based framework for detecting cardiac diseases in individuals by utilizing IoT devices with the consideration of fuzzy rules. Here, the features were generated by means of the fuzzy rules in order to assist in the diagnosis of cardiac diseases, and were then utilized in model training. The detection was done with the utilization of the optimized recurrent neural network (RNN). From the RNN, the final classification was performed. The high performance rate was observed through the experimental analysis of the


suggested model in comparison with the conventional approaches.

Ed-daoudy et al. [21] have implemented a novel architecture for predicting the health status of an individual in practical scenarios, as well as an analytics system based on a big data approach. Using certain input attributes, the system was made capable of predicting the health status along with providing alert messages to the care providers. This information was preserved in a distributed database for performing stream reporting and health data analytics in real-time applications. The performance evaluation was made with the consideration of execution time and throughput. The performance offered by the developed model was observed to be higher than that of various conventional models.

Manogaran et al. [22] have designed a “Bayesian Hidden Markov Model (HMM) along with the support of the Clustering technique” for modeling a system to predict the number of changes in the DNA copy over the genome. The developed Clustering technique was compared with the conventional techniques, namely the segment neighborhood and binary segmentation techniques. The analysis report has shown the effectiveness of the developed model in detecting the changes in the DNA copy number.

Nibareke et al. [23] have presented an ensemble machine learning technique for predicting diabetes disease. Moreover, performance analytics was carried out over flight delays. The overview was performed with big data tools and also with the aid of machine learning frameworks. Several metrics were used to evaluate the accuracy of prediction. The diabetic prediction was performed and analyzed. The results were compared with distinct conventional models and suggested the enhanced performance offered by this implemented model over other conventional frameworks.

Ashiku et al. [24] have explored the ability of an open-source model named Apache Spark by analyzing a huge amount of data over clusters for evaluating big data, and have also integrated the technologies for assuring decision support systems in healthcare environments. Further, the developed machine learning frameworks have utilized Apache Spark for aiding the decision-making task at the time of allocating organs, such as the selection of a kidney for the appropriate candidate, hence enhancing donor utilization by localizing the recipient within the allotted time. The developed framework has shown the identification of waitlisted candidates for accepting kidneys that would otherwise have been neglected.

Pustokhin et al. [25] have recommended a novel feature selection as well as a disease diagnostic framework using big data analytics and a Deep Belief Network (DBN). To minimize the number of features and to reduce the curse of dimensionality, the “Link-based Quasi Oppositional Binary Particle Swarm Optimization Algorithm” was used for selecting the features in order to obtain the optimal feature set. Finally, the DBN was used for classifying the existing disease with the support of the feature-reduced data. The experimental analysis has shown the enhanced performance offered by the developed model while considering various aspects and comparing it with other traditional disease detection approaches.

Marcin et al. [36] have suggested IoT-based models integrated with Bi-LSTM, decision tree, and a data balancing strategy for an automated diagnosis decision support system. In order to provide optimal solutions, a data pre-processing model has been applied for effective network training. The developed model has then been validated with different measures to show effective outcomes in terms of accuracy, precision, and recall. Vara et al. [37] have developed IoT-based 6G healthcare using artificial intelligence. Here, the research work has focused on artificial intelligence and IoT, which have been built into the medical infrastructure in some of the clinical medical fields. Diverse analyses have been carried out to show the effective performance of the developed model when compared with other state-of-the-art methods.

2.2. Problem statement

The unprecedented growth of the electronic healthcare sector and the on-demand nature of the healthcare system have increased the demand for data analysis and big data opportunities with the utilization of machine learning models. Managing disparate and massive data with the aid of conventional methods is highly expensive and incredibly difficult. Hence, several machine learning-based models are used along with big data analytics-based health monitoring systems. The advantages and the disadvantages of these existing machine learning-aided big data analytics-based healthcare monitoring systems are listed in Table 1. Fuzzy [18] provides better results by reducing the computational overhead and increasing the overall detection accuracy. In addition, during emergency incidents, this model also triggers the alarms without relying on the backend servers. However, it faces several difficulties while traversing, grouping, and selectively tapping the IoMT's data traffic. Also, it provides poor performance in terms of accuracy when detecting stress, baseline, and amusement in individuals. XGBoost and logistic regression [19] give promising results regarding data integrity and data security. Moreover, this model makes decisions with low bandwidth and short response time. Yet, the delay and path loss that occur in the system are much higher. Also, it provides low performance when fetching the patterns from the mental behavior of an individual. In Fuzzy [20], the routing defects are effectively analyzed and the risks that arise due to congestion in the IoMT are resolved. Furthermore, it produces premature influence rates, which is helpful in the early diagnosis of the diseases. However, it needs a little more flexibility in deciding which data has to be discarded or selected. Also, it requires a huge storage system for retrieval and analytical operations. Decision Tree [21] effectively reduces the normalization error. In addition, the cost function is highly reduced in this model. Nevertheless, it incurs high power leakage, which may lead to more energy consumption. Moreover, it is difficult to apply an encryption algorithm in this model. Gaussian Mixture (GM) clustering [22] minimizes the loss function and also highly decreases the false acceptance rate. Yet, it consumes more time for analyzing a large volume of data. In addition, it needs high storage space for saving the entire health data. Linear Regression, Naïve Bayes, and Decision Tree [23] improve the effectiveness of the model in terms of jitter, latency, power consumption, and network bandwidth. Moreover, this model can handle disparate and complex data. However, it is affected by a lack of platform interoperability that generates a massive burden in monitoring and controlling the healthcare system. Also, the computational cost required for the system is excessively high. In PCA [24], each change point index is effectively computed. Furthermore, it improves the veracity and reliability of the system while dealing with big data. But it fails to provide high robustness. In addition, this model is also accompanied by high latency and energy consumption. DBN [25] highly reduces the bottleneck effect. Moreover, it improves the scalability of the health monitoring system. Yet, it leads to a wastage of resources during data training as it does not encounter the changes every second of the day. Meanwhile, it is affected by congestion problems.

2.3. Motivation

In recent times, advanced technology has been adopted that is built on IoT and artificial intelligence based applications. The existing techniques struggle to provide accurate outcomes when handling large amounts of data, and they often cause misdiagnosis issues when considering large amounts of data in healthcare informatics. These challenges that arise in the existing big data analytics regarding the IoT healthcare systems are resolved by using machine learning techniques for the development of a new big data analytic model. The developed HFPBOA algorithm has the ability to tune the parameters optimally in order to show accurate outcomes. Moreover, the offered model provides better performance


while validating with the standard performance measures, and it also enhances the performance in healthcare informatics when handling a huge amount of data. Thus, accurate treatment is suggested for the patients effectively.

Table 1
Features and challenges of big data analytics model with IoT in the healthcare systems.

Author [citation]: Kumar et al. [18]
Methodology: Fuzzy
Features:
• It provides better results by reducing the computational overhead and increasing the overall detection accuracy.
• During emergency incidents, this model also triggers the alarms without relying on the backend servers.
Challenges:
• It faces several difficulties while traversing, grouping, and selectively tapping the IoMT's data traffic.
• It provides poor performance in terms of accuracy when detecting stress, baseline, and amusement in individuals.

Author [citation]: Tan et al. [19]
Methodology: XGBoost, logistic regression
Features:
• It gives promising results in data integrity and data security.
• It makes decisions with the computation of low bandwidth and less response time.
Challenges:
• The delay and path loss that occurs in the system is much higher.
• It provides low performance when fetching the patterns from the mental behavior of an individual.

Author [citation]: Safa and A. Pandian [20]
Methodology: Fuzzy
Features:
• The routing defects are effectively analyzed and the risks that arise due to congestion in the IoMT are resolved.
• It produces premature influence rates which is helpful in the early diagnosis of the diseases.
Challenges:
• It needs a little more flexibility in deciding which data has to be discarded or selected.
• It requires a huge storage system for retrieval and analytical operations.

Author [citation]: Ed-daoudy and Maalmi [21]
Methodology: Decision Tree
Features:
• It effectively reduces the normalization error.
• The cost function is highly reduced in this model.
Challenges:
• It acquires high power leakage which may lead to more energy consumption.
• It is difficult to apply the encryption algorithm in this model.

Author [citation]: Manogaran et al. [22]
Methodology: GM clustering
Features:
• It minimizes the loss function.
• It highly decreases the false acceptance rate.
Challenges:
• It consumes more time for analyzing a large volume of data.
• It needs high storage space for saving the entire health data.

Author [citation]: Nibareke and Laassiri [23]
Methodology: Linear Regression, Naïve Bayes, Decision Tree
Features:
• It improves the effectiveness of the model in terms of jitter, latency, power consumption, and network bandwidth.
• It can handle disparate and complex data.
Challenges:
• It is affected by the lack of platform interoperability which generates a massive administrative burden in the healthcare system.
• The computational cost required for the system is excessively high.

Author [citation]: Ashiku et al. [24]
Methodology: PCA
Features:
• Each change point index is effectively computed by using this approach.
• It improves the veracity and reliability of the system while dealing with big data.
Challenges:
• It fails to provide high robustness.
• This model is also accompanied by high latency and energy consumption.

Author [citation]: Pustokhin et al. [25]
Methodology: DBN
Features:
• It highly reduces the bottleneck effect.
• It improves the scalability of the health monitoring system.
Challenges:
• It leads to a wastage of resources during data training as it does not encounter the changes every second of the day.
• It is affected by congestion problems.

3. Intelligent prediction models for health informatics using advanced machine learning techniques on big data

3.1. Proposed model and description

Nowadays, big data has become an increasingly popular domain in various fields that are associated with society, technology, science, and engineering. A huge amount of data is recorded and produced from diverse sectors, from distinct resources like sensor networks, mobile applications, high throughput instruments, and streaming machines, and also in other fields like the healthcare industry. This high data volume is represented by the big data. The challenges that are caused while analyzing big data include time for data storage, analysis, retrieval, and information gain. Most of the conventional models have lacked in handling this big data due to the voluminous behavior and heterogeneous nature of the big data. This may result in certain difficulties while collecting the data from various sources. Hence, this work makes use of the MapReduce technique to solve the computational complexity problem. Also, this work makes the healthcare data prediction with the utilization of an ensemble learning technique which is depicted in Fig. 1.

An effective healthcare data prediction framework with the aid of big data is implemented in this work with the utilization of an ensemble learning model in which various machine learning models are used together along with the aid of a hybrid optimization algorithm in order to provide effective decisions in the clinical field. Initially, four datasets are used for collecting the healthcare data related to diabetes, heart, and kidney for predicting the presence of the diseases in individuals. The composed healthcare information is used for obtaining the features, whereas the statistical and PCA-based techniques are used for obtaining the features in order to determine the type of diseases. The obtained


features are further utilized for selecting the optimal features with the aid of the HFPBOA. The optimization of the weights is also carried out with the help of the implemented HFPBOA. This optimal selection of the features and optimization of the weights are performed to maximize the variance between the features. The selected features are combined with the optimal weights to obtain the optimal weighted features. The acquired optimal weighted features are employed in the disease prediction phase. In this prediction phase, an ensemble learning strategy is utilized in which various machine learning approaches, namely NN, fuzzy, and KNN, are used for accurately predicting the type of disease. Here, parameters such as the hidden nodes in NN, the learning rate in NN, the membership function in fuzzy, and the neighbors count in KNN are optimized with the aid of the implemented HFPBOA for enhancing the overall performance of the prediction. From the ensemble model, the final predicted disease outcome is obtained. The output of the executed model is validated by comparing it with other existing disease detection approaches.

Fig. 1. Structural view of the developed healthcare data prediction framework.

3.2. Healthcare Dataset details

The developed multi-disease prediction model collects the required data from four online datasets, “Pima Indians Diabetes Database, Diabetes, Heart Disease Dataset (Comprehensive), and waitlist-kidney-brazil”, which are briefly described in Table 2. The data collected from the four standard medical databases are represented as $MD^{dataset}_{f}$, where $f = 1, 2, \ldots, F$ and $F$ denotes the total number of gathered medical data. These data are given to the implemented model for further processing.

4. Map reduce framework for big data-based healthcare data prediction

4.1. The architecture of map reduce framework

The MapReduce framework is a parallel programming technique that is used for evaluating big data with the support of a group of machines. Commonly, this framework comprises two stages, the map stage and the reduce stage. The map stage is initialized first. In the map stage, the input data is analyzed by the map function, which generates certain intermediate results; in the second stage, these intermediate results are used to create the final output. In particular, the “MapReduce model” works on a basic data structure in the form of a key-value pair, and all data processed under MapReduce is expressed in terms of key-value pairs. In this way, the map phase and reduce phase are performed, which are given elaborately as follows.

• Map-phase: The header node is involved in the segmentation of the input dataset to generate autonomous blocks and share them with the block nodes. Further, the block node is utilized to process the smaller problems and forward the solution back to the header node.

Table 2
Dataset Description of the Proposed Healthcare Prediction Model.

Dataset name: Dataset 1: “Pima Indians Diabetes Database”
Data link: https://www.kaggle.com/uciml/pima-indians-diabetes-database (access date: 2023-01-11)
Dataset description: This dataset is used for accurately predicting diabetic disease in patients by concentrating on several diagnostic computations. These details are also provided in this dataset. Here, the data are gathered from female patients whose age is at least 21 years. This dataset contains several variables that act as medical predictors and a single target parameter, where the predictor parameters comprise the pregnancy count of the patients along with the insulin level, age, and BMI.

Dataset name: Dataset 2: Diabetes
Data link: https://datahub.io/machine-learning/diabetes#data (access date: 2023-01-11)
Dataset description: This dataset contains data related to diabetes under three data file types, which are named diabetes-off, diabetes, and diabetes-zip. In this dataset, the diabetes-related databases are considered with nine attributes, where the final attribute states whether a patient tested positive or negative for diabetes.

Dataset name: Dataset 3: “Heart Disease Dataset (Comprehensive)”
Data link: https://www.kaggle.com/sid321axn/heart-statlog-cleveland-hungary-final (access date: 2023-01-11)
Dataset description: This dataset contains 1190 records about patients obtained from Hungary, Switzerland, the UK, and the US. Each record provides various attributes that are required to analyze the presence of heart disease. These 11 features include the “patient’s age, sex, chest pain type, cholesterol, resting bp, fasting blood sugar, resting ECG, max heart rate, exercise angina, and old peak”.

Dataset name: Dataset 4: waitlist_kidney_brazil
Data link: https://www.kaggle.com/datasets/gustavomodelli/waitlist-kidney-brazil (access date: 2023-01-11)
Dataset description: This dataset supports predicting the waiting time of the deceased donor for a kidney transplant, which is useful for clinicians and patients for suggesting the management and contributing to the effective usage of the resources.

A.S. Zamani et al. Biomedical Signal Processing and Control 94 (2024) 106247

  Considering the key-value pairs, the map function obtains the key-value pair as the input, and produces the collection of intermediate key-value pairs as outputs. Before executing the reduce phase, the MapReduce library combines the entire intermediate values related to the relevant intermediate key and transmits them to speed up the processing in the reduce phase.
• Reduce-phase: The header node obtains the solution to the entire sub-problems and fuses them in a certain way to generate the final output. With the consideration of key-value pairs, the reduce phase involves the transitional keys that are assured from the MapReduce library and will generate the final results with regard to the respective set of values and keys.

The MapReduce framework is diagrammatically illustrated in Fig. 2.

Fig. 2. MapReduce Framework Model.

4.2. Map Phase: Feature extraction phase

The map phase in the MapReduce framework is used for performing the task of feature extraction in this research work. Thus, this model gets the collected data $MD_f^{dataset}$ as the input. Here, the feature extraction is performed based on the statistical and PCA-based methods, which are described in detail as follows.

PCA [26]: The collected healthcare data $MD_f^{dataset}$ is used as input to this technique to minimize the feature dimension while preserving the essential information. When utilizing a higher dimensional feature, more complexity is created when utilizing these features for training the model. Hence, the PCA is involved in this work for the minimization of the feature-length as well as for enhancing the interoperability among the healthcare data. At first, the PCA computes the patterns among the healthcare data. Then these patterns are used for determining the variations and similarities between them. After determining the patterns, data compression takes place to minimize the dimensions. PCA is progressed with the utilization of variance, covariance, and standard deviation which are all generated as output when provided with the input of healthcare data. Further, the mean computation is performed with every data dimension. Moreover, the covariance matrix for the healthcare data is determined together with the eigenvectors and eigenvalues. Then, the eigenvector elements are selected to design the feature vector, which is observed to be an eigenvector matrix. The extracted features from PCA are represented as $Fx_b^{pca}$.

Statistical features [27]: The minimum, maximum, mean, standard deviation, median, entropy, correlation, and contrast are all extracted by means of the statistical feature extraction method. These extracted features are described as follows.

Maximum Value: Maximum explains “the largest value that is present within the collected medical data. Regarding the statistical measures, it refers to the largest observation and the sample maximum”.

Mean: Mean is defined as “the sum of a collection of numbers divided by the total numbers of samples in the collection” as in Eq. (1).

$$ MEn = \frac{1}{rq} \sum_{rl=1}^{rq} rf_{rl} \tag{1} $$

In Eq. (1), the term $rq$ denotes the total number of samples and the term $rf_{rl}$ indicates the data values. The arithmetic mean is represented as $MEn$.

Minimum Value: Minimum defines “the minimum value that is available in the gathered medical images. With respect to the statistical measures, it is the smallest observation and the sample minimum”.

Standard deviation σ: Standard deviation is “the distribution in each sample value”. It is computed using Eq. (2).

$$ \sigma = \sqrt{\frac{1}{rg} \sum_{rj=1}^{rg} \left( rJ_{rj} - \overline{rJ} \right)^2} \tag{2} $$

In Eq. (2), the term $\overline{rJ}$ denotes the mean of the distribution.

Median MDn: Median is defined as “a simple measure of the central tendency. It is determined by organizing the sample values from the minimum to maximum value”. It is computed as given in Eq. (3).

$$ MDn = \left\{ \frac{(rq + 1)}{2} \right\}^{th} \text{value} \tag{3} $$

Correlation: Correlation shows “how to correlate a sample to its neighboring sample within the entire observations”. It is computed as given in Eq. (4).

$$ correlation = \sum_{ru,rv=0}^{rm-1} MD_{ru,rv} \frac{(ru - \mu)(rv - \mu)}{\sigma^2} \tag{4} $$

In Eq. (4), the term $MD_{ru,rv}$ denotes the dimensionality of the data at the position $ru$ and $rv$, and the variance of the intensity values is denoted by $\sigma^2$. The term $\mu$ denotes the average value or mean for the entire data.

Contrast: Contrast is computed using Eq. (5).

$$ contrast = \sum_{ru,rv=0}^{rm-1} MD_{ru,rv} (ru - rv)^2 \tag{5} $$

Entropy: Entropy gives “the amount of data loss in a sample which provides the required information for the following observation”. It is computed using Eq. (6).

$$ entropy = \sum_{ru,rv=0}^{rm-1} MD_{ru,rv} \log MD_{ru,rv} \tag{6} $$

The extracted statistical features are represented as $Fx_d^{sts}$.

4.3. Reduce Phase: Feature selection phase

In the reduce phase, the extracted PCA-based as well as the statistical features $Fx_b^{pca}$ and $Fx_d^{sts}$, respectively are used. In order to select the optimal features, the developed HFPBOA is employed. This HFPBOA is used for reducing the feature-length with the selection of the best features for predicting the disease in the individual. Based on the developed HFPBOA, the optimal feature set 1 is selected from $Fx_b^{pca}$, which is indicated as $Fx_g^{Opt1}$, and the optimal feature set 2 is selected from $Fx_d^{sts}$ by the same HFPBOA, which is represented as $Fx_h^{Opt2}$. Here, the number of optimal features that are selected from each of the extracted feature sets is taken as 7. Then, the weight optimization in the feature fusion stage for minimizing the training complexity is done using the same HFPBOA, where the optimized weight 1 is indicated as $We1_g$ and the optimized weight 2 is denoted as $We2_h$. The weighted feature selection with HFPBOA is mathematically depicted by means of Eq. (7).

$$ Ff_k^{Wop} = \left( Fx_g^{Opt1} * We1_g \right) + \left( Fx_h^{Opt2} * We2_h \right) \tag{7} $$

In Eq. (7), the weights $We1_g$ and $We2_h$ are optimized in the range of [0.01, 0.99]. The weighted optimal features that are obtained at the end of this stage are indicated as $Ff_k^{Wop}$. The objective of this optimization is given by Eq. (8).

$$ OB1 = \underset{\{We1_g,\, We2_h,\, Fx_g^{Opt1},\, Fx_h^{Opt2}\}}{\arg\min} \left( \frac{1}{vRN} \right) \tag{8} $$

As given in Eq. (8), the objective of optimal feature selection and weight optimization is to maximize the variance among the weighted features which is represented by $vRN$. Variance is defined as “the computation that is carried out by taking the average of squared deviations from the mean of sentence depths in each data”. Variance is computed as provided in Eq. (9).

$$ vRN = \sigma^2 = \frac{1}{rg} \sum_{rj=1}^{rg} \left( rJ_{rj} - \overline{rJ} \right)^2 \tag{9} $$

The developed HFPBOA-based weighted feature selection for multi-disease prediction is diagrammatically represented in Fig. 3.

4.4. Feature selection using HFPBOA

The feature selection is performed with the help of the developed HFPBOA which is also used for optimizing the parameter in the implemented ensemble learning-based disease detection model. Selecting the appropriate features helps to enhance the data quality without the loss of significant information. Moreover, accurate feature selection significantly speeds up the training process of the model. Henceforth, the HFPBOA is developed by integrating the Bumble Bees Mating Optimization Algorithm (BMOA) with the Flower Pollination Algorithm (FPO) for performing an effective optimization in the developed multi-disease prediction model. Here, BMOA [28] is used in the developed model due to its advantage of reducing the total evaluation cost which further results in the minimization of the response time while solving the problems regarding the analytical queries. However, this algorithm has certain challenges like a low convergence rate, and it also easily gets stuck in local optima. So, the FPO [29] algorithm is used along with the BMOA to enhance the performance rate by solving the existing problems in the conventional BMOA. In the proposed HFPBOA, the random parameter Rand is adaptively updated with the aid of the fitness mechanism as shown in Eq. (10).

$$ Rand = \left( \frac{worFit}{bstFit} \right) * R_0 \tag{10} $$

In Eq. (10), the term $worFit$ denotes the worst fitness value and $bstFit$ indicates the best fitness value among the solutions. The arbitrary value is denoted by $R_0$. If the condition (Rand < 0.5) is satisfied, then the FPA-based update of the solution takes place for the respective problem, or else, the BMOA is used for updating the solution for the provided problem.

BMOA [28]: The BMOA optimization algorithm is developed based on the mating characteristics of the bumble bees for resolving various complex optimization problems. Here, the bumble bees are represented as the candidate solution over the search space, which is arranged according to the objective function. This algorithm considers the best solution as the queen bumble bees and the drone bumble bees are considered as the potential candidate solution. The mating behavior of the queen bumble bees is formulated in Eq. (11).

$$ brood_{ab}(s) = \begin{cases} Q_b(s) & \text{if } (R \le Dr1) \\ drone - grnty_{nb}(s) & \text{otherwise} \end{cases} \tag{11} $$

In Eq. (11), the term $a$ indicates the brood index, $n$ denotes the index of the arbitrarily selected drone, $s$ indicates the iteration count, and the uniformly shared arbitrary variable is indicated by the term $R \in [0, 1]$. The selection of the mother bumble bees and worker bumble bees is performed using Eq. (12), Eq. (13), and Eq. (14).

$$ nQ_{ab}(s+1) = nQ_{ab}(s) + b \cdot \left( nQ_{ab}(s) - Q_b(s) \right) + c \cdot \sum_{l=1}^{x} \left( nQ_{ab}(s) - Wkr_{lb}(s) \right) \tag{12} $$

$$ b = c_{max} - \left( \frac{(c_{max} - c_{min}) * Lsi}{Lsi_{max}} \right) \tag{13} $$

$$ c = c_{min} - \left( \frac{(c_{min} - c_{max}) * Lsi}{Lsi_{max}} \right) \tag{14} $$

Fig. 3. Optimal Weighted Feature Selection with the Developed HFPBOA.
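The weighted fusion of Eq. (7) and the variance-based objective of Eqs. (8) and (9) can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation: the function names `fuse_weighted_features` and `objective_ob1`, the sample feature values, the fixed weights, and the small constant `eps` that guards against division by zero are all hypothetical.

```python
from statistics import pvariance


def fuse_weighted_features(fx_opt1, fx_opt2, we1, we2):
    """Eq. (7): weighted sum of the two selected feature vectors, element by element."""
    def clip(w):
        # Keep each weight inside the range [0.01, 0.99] stated for Eq. (7).
        return min(max(w, 0.01), 0.99)

    return [f1 * clip(w1) + f2 * clip(w2)
            for f1, f2, w1, w2 in zip(fx_opt1, fx_opt2, we1, we2)]


def objective_ob1(fused, eps=1e-12):
    """Eq. (8): 1 / vRN, where vRN is the population variance of Eq. (9).

    Minimizing this value corresponds to maximizing the variance.
    eps is an assumption added only to avoid division by zero.
    """
    return 1.0 / (pvariance(fused) + eps)


# Hypothetical values: 7 features per set, matching the count used in the text.
fx_opt1 = [0.2, 0.5, 0.9, 0.1, 0.7, 0.3, 0.6]
fx_opt2 = [1.0, 0.4, 0.8, 0.2, 0.5, 0.9, 0.1]
we1 = [0.6] * 7
we2 = [0.4] * 7

fused = fuse_weighted_features(fx_opt1, fx_opt2, we1, we2)
cost = objective_ob1(fused)
```

In a full pipeline, an optimizer such as the HFPBOA would propose the weights and the feature subsets, and `objective_ob1` would serve as the fitness it minimizes.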


In Eq. (13) and Eq. (14), the terms $c_{max}$ and $c_{min}$ denote the food quantity given by the corresponding queen and worker bumble bees under the local feeding phase for supporting the optimal queen bumble bee. Further, the parameters $Lsi$ and $Lsi_{max}$ indicate the present and maximum number of iterations under the brood feeding phase. The term $x$ is selected to show the chosen worker bumble bee. The modified position of the bumble bee is represented with the aid of Eq. (15).

$$ Drn_{ab} = Drn_{ab} + \alpha \cdot (Drn_{lb} - Drn_{mb}) \tag{15} $$

In Eq. (15), the term $\alpha$ denotes the significant factor that helps the neighboring drones $Drn_{ab}$. Finally, the optimal solution at the final iteration is obtained.

FPO [29]: The FPO algorithm is performed based on the characteristics of the pollination. In the FPO algorithm, the pollination is carried out based on four rules, which are mentioned below.

• First Rule: The cross and biotic pollination are determined as global pollination, in which the pollinators maintain the Levy distribution.
• Second Rule: The self and abiotic pollination are considered as the local pollination.
• Third Rule: The property of flower constancy is observed to be the reproduction ratio which is related to the similarity degree between two flowers (solutions).
• Fourth Rule: Owing to the wind and physical proximity, local pollination contains few advantages over global pollination. Both are managed with the utilization of the variable T.

Global pollination assures the fittest reproduction with the insects that have traveled for longer distances where the fittest is indicated as $h^*$. The first rule of flower constancy is depicted by Eq. (16).

$$ y_a^{s+1} = y_a^s + \gamma M \left( h^* - y_a^s \right) \tag{16} $$

In Eq. (16), the term $y_a^s$ indicates the pollen $a$ that is also considered as the solution vector at the $s$th iteration, the optimal solution is represented as $h^*$, the step size scaling factor is denoted as $\gamma$, and the pollination power or step size is denoted as $M$.

Local pollination based on rule 2 for the flower constancy is determined as given in Eq. (17).

$$ y_a^{s+1} = y_a^s + \varepsilon \left( y_b^s - y_m^s \right) \tag{17} $$

In Eq. (17), the terms $y_b^s$ as well as $y_m^s$ denote the solution vectors that are changed from flowers to flowers. The variable $\varepsilon$ is obtained through the uniform distribution which is under the range of [0, 1]. The pollination process is either considered as global or local and so, the switching probability, $\varepsilon$ which is in the range of [0, 1] is considered as given in Rule 4.

The pseudocode of the suggested HFPBOA is represented in Algorithm 1 and the flowchart of the suggested HFPBOA is illustrated in Fig. 4.

Fig. 4. Flowchart of the developed HFPBOA.
Algorithm 1: Suggested HFPBOA

Initialize the flower and bumble bee populations
Initialize the parameters R0, α, and h∗
For y → 1 to ITmax
    While (the stopping state is not reached)
        Determine the fitness of the solutions
        Update Rand using Eq. (10)
        If (Rand < 0.5)
            Improve the position using FPA
            Modify the solution using Eq. (16) or Eq. (17)
        Else
            Improve the position using BMOA
            Modify the solution using Eq. (12)
        End
    End
End
Return the best solution acquired
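The control flow of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the authors' code: the function names, the default values of `c_min`, `c_max`, `gamma`, and `switch_prob`, the greedy replacement step, the use of a single worker in the Eq. (12) update, the Mantegna method for the Levy step, and the `sphere` test function are all hypothetical choices. The fitness is assumed to be positive and minimized so that the ratio in Eq. (10) is well defined.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Levy-distributed step via Mantegna's method (an assumed choice for M in Eq. (16))."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / (abs(v) ** (1 / beta) + 1e-12)


def hfpboa_sketch(fitness, dim, bounds, pop_size=10, max_iter=25,
                  c_min=0.1, c_max=0.9, gamma=0.1, switch_prob=0.8, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return min(max(v, lo), hi)

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(p) for p in pop]
    best_idx = min(range(pop_size), key=fit.__getitem__)
    best, best_fit = pop[best_idx][:], fit[best_idx]
    history = [best_fit]

    for it in range(max_iter):
        # Eq. (10): adaptive switch between the FPA and BMOA updates.
        rand = (max(fit) / (best_fit + 1e-12)) * rng.random()
        # Eqs. (13)-(14): coefficients that vary linearly with the iteration.
        b = c_max - (c_max - c_min) * it / max_iter
        c = c_min - (c_min - c_max) * it / max_iter
        for a in range(pop_size):
            if rand < 0.5:
                if rng.random() < switch_prob:
                    # Eq. (16): global pollination towards the best solution.
                    step = levy_step(rng)
                    cand = [clip(x + gamma * step * (h - x))
                            for x, h in zip(pop[a], best)]
                else:
                    # Eq. (17): local pollination between two random solutions.
                    j, m = rng.sample(range(pop_size), 2)
                    eps = rng.random()
                    cand = [clip(x + eps * (xb - xm))
                            for x, xb, xm in zip(pop[a], pop[j], pop[m])]
            else:
                # Eq. (12) as printed, with one randomly chosen worker.
                w = rng.randrange(pop_size)
                cand = [clip(x + b * (x - q) + c * (x - wk))
                        for x, q, wk in zip(pop[a], best, pop[w])]
            cand_fit = fitness(cand)
            if cand_fit < fit[a]:  # greedy replacement (assumption)
                pop[a], fit[a] = cand, cand_fit
                if cand_fit < best_fit:
                    best, best_fit = cand[:], cand_fit
        history.append(best_fit)
    return best, best_fit, history


def sphere(x):
    """Hypothetical test objective: sum of squares."""
    return sum(v * v for v in x)


best, best_fit, history = hfpboa_sketch(sphere, dim=5, bounds=(-5.0, 5.0))
```

The defaults `pop_size=10` and `max_iter=25` mirror the population size and iteration count reported in the experimental setup; in the paper's setting the fitness would be OB1 or OB2 rather than the sphere function.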

5. Big data-based prediction models for health informatics using ensemble learning

5.1. Proposed ensemble learning

The multi-disease prediction model is developed by utilizing an ensemble strategy with the aid of distinct machine learning methods like the NN, fuzzy, and KNN for analyzing and predicting healthcare data using Big Data approaches. Here, the parameters like hidden nodes in NN, learning rate in NN, membership function in fuzzy and neighbors count in KNN are optimized for enhancing the accuracy and precision of the implemented disease prediction model. The objective function of the developed ensemble learning approach for disease prediction is given in Eq. (18).

$$ OB2 = \underset{\{Hnd_i^{NN},\, Lrt_j^{NN},\, Mmf_k^{fuzzy},\, Ng_m^{knn}\}}{\arg\min} \left( \frac{1}{Acr + Pns} \right) \tag{18} $$

In Eq. (18), the term $Hnd_i^{NN}$ denotes the optimized hidden nodes in NN, which is optimized in the range [5, 255], $Lrt_j^{NN}$ indicates the optimized learning rate in NN, which is optimized in the range [0.01, 0.09], $Mmf_k^{fuzzy}$ represents the optimized membership function of fuzzy, which is optimized in the range [0.01, 0.09] and the term $Ng_m^{knn}$ denotes the optimized neighbor count in KNN, which is optimized in the range [0, 7]. These parameters are optimized using the developed HFPBOA in order to enhance the prediction accuracy $Acr$ and precision $Pns$. Accuracy is computed using Eq. (19) and the precision is determined using Eq. (20), respectively.

$$ Acr = \frac{TRE_{pos} + TRE_{neg}}{TRE_{pos} + TRE_{neg} + FLE_{pos} + FLE_{neg}} \tag{19} $$

$$ Pns = \frac{TRE_{pos}}{TRE_{pos} + FLE_{pos}} \tag{20} $$

In Eq. (19) and Eq. (20), the true positive is denoted by $TRE_{pos}$, true negative is denoted by $TRE_{neg}$, false positive is denoted by $FLE_{pos}$, and false negative is denoted by $FLE_{neg}$. The developed ensemble learning-based disease prediction model is depicted in Fig. 5.

Fig. 5. Developed Ensemble Learning-based Multi-disease Prediction Model.

5.2. Machine learning model 1: Neural network

NN [27] is used in this developed ensemble-based multi-disease prediction model with the weighted features as its input. In this model, the input units pass the data function $p$. The device handles various modules of the hidden layer and generates results at the output level. The entire unit is comprised of an activation function, which is the sigmoid function in this case as provided in Eq. (21).

$$ f(p) = \frac{1}{1 + e^{-\gamma p}} \tag{21} $$

In Eq. (21), the term $\gamma$ denotes the constant controlling the slant capacity. The complete contribution of processing is by means of the handling unit $d$. The overall computation process is given in Eq. (22).

$$ Net_d = \sum_{c} W_{cd}\, y'_c + \theta_d \tag{22} $$

In Eq. (22), the term $y'_c$ denotes the outputs obtained from the previous layer and $W_{cd}$ indicates the weights of the connecting interfaces from the $c$th unit to the $d$th unit. The graphical representation of the NN-based prediction model is given in Fig. 6.

5.3. Machine learning model 2: Fuzzy classifier

The multi-disease prediction is done utilizing the fuzzy classifier [30]. This model is used for identifying health-related diseases. The fuzzy classifier is involved with the fuzzy set theory to manipulate and express uncertainty and ambiguity. The rules are produced with the support of fuzzy logic. The input given to this model is the weighted features $Ff_k^{Wop}$. This has been performed using the triangular


Fig. 6. Diagrammatic representation of NN-based Prediction Model.

membership function as given in Eq. (23).

$$ Mmf_k^{fuzzy} = \begin{cases} 0, & a \le Low \\ \dfrac{a - Low}{Medium - Low}, & Low < a \le Medium \\ \dfrac{High - a}{High - Medium}, & Medium < a < High \\ 0, & a \ge High \end{cases} \tag{23} $$

In Eq. (23), the triangular membership function is denoted by $Mmf_k^{fuzzy}$. Here, the considered medium operators are said to be “high, low, and medium”. From this fuzzy classifier, various diseases are predicted. The diagrammatical representation of the fuzzy-based prediction model is given in Fig. 7.

5.4. Machine learning model 3: K-nearest Neighbour

KNN [31] is employed in this multi-disease prediction model for diagnosing various diseases. This algorithm is considered as the non-parametric approach, where the input is considered as the k closest training samples over the feature space. The class membership function is observed as the output function. The classification is performed with the voting mechanism of the neighbors that are assigned to the classes between the nearest neighbors. The diagrammatic representation of the KNN-based multi-disease prediction model is given in Fig. 8.

6. Results and discussions

6.1. Experimental setup

The Python platform was used for implementing the multi-disease prediction framework with the assistance of a machine learning structure to provide precise and accurate prediction results regarding diseases. Furthermore, the prediction results accomplished from the developed HFPBOA-EL-based multi-disease prediction system were analyzed by comparing them with different optimization algorithms and with various other standard multi-disease prediction models to ensure computation efficiency. The population size and the maximum number of iterations that were considered for the implementation of the developed model were 10 and 25, respectively. Here, the ROC analysis, cost function analysis, and validation measures analysis have been performed to guarantee the enhanced efficiency offered by the suggested multi-disease prediction model. The traditional healthcare prediction techniques that were considered to analyze the effectiveness of the implemented model were XG-Boost [32], DT [32], NN [30], Fuzzy [31], and KNN [33]. Moreover, the heuristic algorithms that were considered for the evaluation of the performance of the implemented prediction model were the Salp Swarm Optimization (SSO) [34], Dingo Optimization Algorithm (DOA) [35], BOA [28], and FPA [29]. The experimental details of this research work are as follows: an Intel Core i3 processor with 8 GB of RAM on a 64-bit system was used, and the PyCharm software was used for the implementation.
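Looking back at the triangular membership function of Eq. (23), a small Python sketch may help make the piecewise definition concrete. The function name `triangular_membership` and the breakpoints used in the example are assumptions for illustration; the paper does not state the actual Low, Medium, and High values.

```python
def triangular_membership(a, low, medium, high):
    """Eq. (23): 0 outside (low, high), rising to 1 at medium, then falling back to 0."""
    if a <= low or a >= high:
        return 0.0
    if a <= medium:
        return (a - low) / (medium - low)
    return (high - a) / (high - medium)


# Hypothetical breakpoints for a normalized feature value.
LOW, MEDIUM, HIGH = 0.0, 5.0, 10.0
degrees = [triangular_membership(a, LOW, MEDIUM, HIGH) for a in (0.0, 2.5, 5.0, 7.5, 10.0)]
```

A fuzzy classifier would typically evaluate several such functions per feature and combine the resulting degrees through its rule base.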

6.2. Evaluation metrics

Fig. 7. Diagrammatic representation of Fuzzy-based Prediction Model.

The validation metrics that are used to compute the effectiveness of the suggested HFPBOA-EL-based healthcare recognition model are given as follows.

The value of FPR is determined using Eq. (24).

$$ FPR = \frac{FLE_{pos}}{TRE_{neg} + FLE_{pos}} \tag{24} $$

The value of FDR is computed with the aid of Eq. (25).

$$ FDR = \frac{FLE_{pos}}{FLE_{pos} + TRE_{pos}} \tag{25} $$

The MCC is computed using Eq. (26).
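The evaluation measures of Eqs. (19), (20), and (24) to (26) can all be derived from the four confusion-matrix counts. The sketch below is an illustration under assumptions: the function name `classification_metrics` and the example counts are hypothetical, FDR and MCC follow their standard definitions, and a metric whose denominator is zero is reported as 0.0 by convention.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, FPR, FDR, and MCC from confusion-matrix counts."""
    def safe_div(num, den):
        # Convention (assumption): a metric with a zero denominator is reported as 0.0.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tn + fp) * (tp + fn) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),   # Eq. (19)
        "precision": safe_div(tp, tp + fp),                 # Eq. (20)
        "fpr": safe_div(fp, tn + fp),                       # Eq. (24)
        "fdr": safe_div(fp, fp + tp),                       # Eq. (25), standard definition
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),        # Eq. (26), standard definition
    }


# Hypothetical counts for a binary disease / no-disease test set of 100 samples.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these definitions, precision and FDR always sum to 1 whenever at least one positive prediction is made, which is a quick sanity check on any reported pair of values.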


Fig. 8. Diagrammatic Representation of KNN-based disease Prediction Model.

$$ MCC = \frac{TRE_{pos} \times TRE_{neg} - FLE_{pos} \times FLE_{neg}}{\sqrt{\left( TRE_{pos} + FLE_{pos} \right)\left( TRE_{neg} + FLE_{pos} \right)\left( TRE_{pos} + FLE_{neg} \right)\left( TRE_{neg} + FLE_{neg} \right)}} \tag{26} $$

6.3. ROC analysis of the implemented model against various disease prediction models

The ROC curve analysis is considered for evaluating the effectiveness of the developed HFPBOA-EL-based multi-disease prediction system using four different benchmark datasets and is compared with the conventional disease detection models. The evaluation results are shown in Fig. 9. By analyzing the plots, it is seen that the offered HFPBOA-EL-based multi-disease prediction system has given better prediction results than the prior works.

6.4. Evaluation of the performance of the executed multi-disease prediction model using the cost function

The cost function computation of the offered HFPBOA-EL-based multi-disease prediction model for four different datasets is depicted in the following Fig. 10. In this evaluation, various heuristic strategies

Fig. 9. ROC analysis on the suggested HFPBOA-EL-based multi-disease prediction model when compared with the prior works regarding “(a) Dataset 1 (b) Dataset 2
(c) Dataset 3 and (d) Dataset 4”.


Fig. 10. Cost function analysis on the suggested HFPBOA-EL-based multi-disease prediction model when contrasted against distinct existing optimization algorithms
with respect to “(a) Dataset 1 (b) Dataset 2(c) Dataset 3 and (d) Dataset 4”.

are taken into consideration to ensure the effectiveness of the implemented multi-disease prediction model. The suggested approach attained enhanced cost function values of 72.72 %, 77.77 %, 79.31 %, and 82.85 % better than the heuristic strategies like SSA-EL, DOA-EL, BOA-EL, and FPA-EL, respectively while considering dataset 3. All the datasets performed well by the suggested HFPBOA-EL-based multi-disease prediction model for all iterations when compared with the other algorithms.

6.5. Effectiveness analysis of the executed multi-disease prediction model over various conventional optimization algorithms and prediction models

The performance of the developed HFPBOA-EL-based multi-disease prediction model is assured by comparing its performance with various optimization strategies and previously developed prediction models while concerning various evaluation metrics are depicted in the below Fig. 11 and Fig. 12, respectively. The suggested model prediction rate in terms of precision is 9.09 %, 12 %, 23.52 %, and 29.23 % higher than the heuristic strategies like SSA-EL, DOA-EL, BOA-EL, and FPA-EL, respectively while considering dataset 4. In addition, the accuracy and precision rate of the developed HFPBOA-EL-based multi-disease prediction model is also higher than the previously used prediction methodologies.

6.6. Effectiveness computation of the executed multi-disease prediction model in terms of statistical measures

The computational effectiveness of the developed HFPBOA-EL-based multi-disease prediction model when analyzed in terms of various statistical measures is given in below Table 3. For performing this statistical analysis, various heuristic algorithms are considered and contrasted against the implemented multi-disease prediction model for validating the performance of the developed prediction model. The suggested model attained a median value of 9.81 %, 19.03 %, 18.44 %, and 61.23 % improved than the existing heuristic strategies like SSA-EL, DOA-EL, BOA-EL, and FPA-EL, respectively while taking the 1st dataset. All the statistical measures that have been utilized to check the effectiveness of the implemented model have given superior results for the developed model when compared with various heuristic algorithms.

7. Conclusion

This research work aimed at the implementation of an effective multi-disease prediction framework with big data using ensemble machine learning approaches. Initially, four datasets were used as the source for collecting the necessary healthcare data. The collected healthcare data was used in the feature extraction phase, where the statistical feature and PCA feature were obtained. The extracted features were further utilized in the weighted feature selection phase, in which the selection of the optimal features and the weights were carried out using the implemented HFPBOA. The acquired optimal weighted features were employed for the disease prediction phase. In this prediction phase, the ensemble learning model was utilized, in which machine learning models such as the NN, fuzzy, and KNN were utilized for accurately predicting the diseases. The parameter optimization has taken place in the ensemble learning model using the developed HFPBOA for enhancing the overall disease prediction performance. The


Fig. 11. Effectiveness analysis of the suggested HFPBOA-EL-based multi-disease prediction model when contrasted against various optimization algorithms
regarding “(a) Accuracy (b) FPR (c) FDR and (d) Precision”.

experimental validations were carried out on the executed multi-disease prediction model. Experimental results have shown that the prediction rate of the suggested ensemble model in terms of precision was 9.09 %, 12 %, 23.52 %, and 29.23 % higher than the heuristic strategies like SSA-EL, DOA-EL, BOA-EL, and FPA-EL, respectively while considering dataset 4. Thus, the enhanced prediction outcomes provided by the executed multi-disease prediction model were assured. In practical applications, health informatics can be applied in various real-time environments like disaster management, electronic health records, and healthcare in remote areas. Moreover, these applications could enhance the quality and efficiency of the data while diagnosing diabetes, mental disorders and other disorders. Hence, technicians would be able to identify a better treatment for the particular individuals.

Advantages and limitations of the model.

The developed HFPBOA-EL model is designed to provide better performance in healthcare informatics. Here, the developed model predicts the disease in an effective manner, which could enhance the accuracy and precision of outcomes. Also, the developed HFPBOA algorithm is utilized to provide the optimal solutions to enhance the feature propagation of the model. Thus, it effectively optimizes the parameters to enhance the performance. However, the


Fig. 12. Effectiveness analysis of the suggested HFPBOA-EL-based multi-disease prediction model when contrasted against prior works regarding “(a) Accuracy (b)
FPR (c) FDR and (d) Precision”.

diverse performance metrics are validated to provide better performance when compared with the existing methods and algorithms. Although the developed model shows better results, it still needs to focus on a few areas. The developed model needs to be evaluated in real-time applications. Additionally, the developed model needs to concentrate on the different disorders using multi-class classification.

Future work.

In future work, the developed model will be applied in various clinical applications for the early prediction of various heart, kidney, and diabetic diseases. Additionally, enhanced ensemble models will be implemented and applied in real-time applications.

CRediT authorship contribution statement

Abu Sarwar Zamani: Software. Aisha Hassan Abdalla Hashim: Software. Abdallah Saleh Ali Shatat: Software. Md. Mobin Akhtar: Software. Mohammed Rizwanullah: Software. Sara Saadeldeen Ibrahim Mohamed: Software.


Table 3
Algorithmic validation on developed multi-disease prediction system for different datasets.

Performance measures   SSA-EL [34]   DOA-EL [35]   BOA-EL [28]   FPA-EL [29]   HFPBOA-EL

For dataset 1
Worst                  2.716303      3.363279      2.905897      1.539785      3.313618
Best                   2.460593      2.041682      1.747127      1.539785      1.507474
Mean                   2.511735      2.332303      2.674143      1.539785      2.424926
Median                 2.260593      2.085518      2.095897      1.539785      2.482445
Standard deviation     0.102284      0.515767      0.463508      2.22E-16      0.889622

For dataset 2
Worst                  5.463833      4.107082      2.914553      4.482257      1.830504
Best                   2.535012      1.686394      1.636323      1.567797      1.554054
Mean                   2.848209      2.296648      1.764146      2.14912       1.719924
Median                 2.535012      1.686394      1.636323      1.567797      1.830504
Standard deviation     0.872795      0.937584      0.383469      1.120798      0.135433

For dataset 3
Worst                  2.814894      5.892745      4.119697      2.644616      2.329871
Best                   2.319189      2.133735      1.896879      1.732887      1.593327
Mean                   2.472381      2.509636      2.805089      2.188751      1.666981
Median                 2.33039       2.133735      3.268735      2.188751      1.593327
Standard deviation     0.224388      1.127703      0.78118       0.455864      0.220963

For dataset 4
Worst                  2.330728      4.323422      5.542967      4.606126      4.640647
Best                   2.330728      1.955043      1.825276      1.558554      1.393905
Mean                   2.330728      3.171602      2.305341      2.821035      2.490484
Median                 2.330728      3.301077      1.825276      2.921229      2.210922
Standard deviation     0             1.155454      1.091217      1.182603      1.199788

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

Acknowledgements

This study is supported via funding from Prince Sattam bin Abdulaziz University project number (PSAU/2024/R/1445).

References

[1] W. Yu, Y. Liu, T. Dillon, W. Rahayu, F. Mostafa, An integrated framework for health state monitoring in a smart factory employing IoT and big data techniques, IEEE Internet Things J. 9 (3) (2022) 2443–2454.
[2] M. Zheng, S. Bai, Implementation of universal health management and monitoring system in resource-constrained environment based on internet of things, IEEE Access 9 (2021) 138744–138752.
[3] N.R. Sivakumar, F.K.D. Karim, An IoT based big data framework using equidistant heuristic and duplex deep neural network for diabetic disease prediction, J. Ambient Intell. Humanized Comput. (2021).
[4] V. Subramaniyaswamy, R. Gunasekaran Manogaran, V.V. Logesh, D.N.C. Malathi, N. Senthilselvan, Retracted article: an ontology-driven personalized food recommendation in IoT-based healthcare system, J. Supercomput. 75 (2019) 3184–3216.
[5] J. Andrew Onesimu, J. Karthikeyan, Y. Sei, An efficient clustering-based anonymization scheme for privacy-preserving data collection in IoT based healthcare services, Peer-to-Peer Networking and Applications 14 (2021) 1629–1649.
[6] Y. Zhong, L. Chen, C. Dan, A. Rezaeipanah, A systematic survey of data mining and big data analysis in internet of things, J. Supercomput. (2022).
[7] A. Kumar, K. Abhishek, P. Nerurkar, M.R. Khosravi, Muhammad Rukunuddin Ghalib, Achyut Shankar, Big data analytics to identify illegal activities on Bitcoin blockchain for IoMT, Pers. Ubiquit. Comput. (2021).
[8] Sunder Ali Khowaja, Parus Khuwaja, Kapal Dev, Giuseppe D’Aniello, VIRFIM: an AI and internet of medical things-driven framework for healthcare using smart sensors, Neural Comput. & Applic. (2021).
[9] M. Hashem, A.E. Youssef, Teeth infection and fatigue prediction using optimized neural networks and big data analytic tool, Clust. Comput. 23 (2020) 1669–1682.
[10] S.A. Moqurrab, N. Tariq, A. Anjum, A. Asheralieva, S.U.R. Malik, H. Malik, H. Pervaiz, S.S. Gill, A deep learning-based privacy-preserving model for smart healthcare in internet of medical things using fog computing, Wirel. Pers. Commun. 126 (2022) 2379–2401.
[11] C. Ji, Y. Hu, K. Wang, P. Zhan, X. Li, X. Zheng, Identifiable temporal feature selection via horizontal visibility graph towards smart medical applications, Interdiscip. Sci.: Comput. Life Sci. 13 (2021) 717–730.
[12] Y.-C. Shen, T.-C. Hsia, C.-H. Hsu, Software optimization in ultrasound imaging technique using improved deep belief learning network on the internet of medical things platform, Wirel. Pers. Commun. (2021).
[13] J.-W. Lin, J.M. Arul, J.-T. Kao, A bottom-up tree based storage approach for efficient IoT data analytics in cloud systems, Journal of Grid Computing 19 (2021) 10.
[14] A.G. Mohapatra, J. Talukdar, Ch. Tarini Mishra, Sameer Anand, Ajay Jaiswal, Ashish Khanna, Deepak Gupta, Fiber Bragg grating sensors driven structural health monitoring by using multimedia-enabled IoT and big data technology, Multimed. Tools Appl. 81 (2022) 34573–34593.
[15] E. Adi, A. Anwar, Zubair Baig, Sherali Zeadally, Machine learning and data analytics for the IoT, Neural Comput. & Applic. 32 (2020) 16205–16233.
[16] Y. Himeur, M. Elnour, Fodil Fadli, Nader Meskin, Ioan Petri, Yacine Rezgui, Faycal Bensaali, Abbes Amira, AI-big data analytics for building automation and management systems: a survey, actual challenges and future perspectives, Artif. Intell. Rev. (2022).
[17] H. Jamil, T. Umer, Celal Ceken, Fadi Al-Turjman, Decision based model for real-time IoT analysis using big data and machine learning, Wirel. Pers. Commun. 121 (2021) 2947–2959.
[18] A. Kumar, K. Sharma, A. Sharma, Genetically optimized fuzzy C-means data clustering of IoMT-based biomarkers for fast affective state recognition in intelligent edge analytics, Appl. Soft Comput. 109 (September 2021) 107525.
[19] Tian-Hoe Tan, Chien-Chin Hsu, Chia-Jung Chen, Shu-Lien Hsu, Tzu-Lan Liu, Hung-Jung Lin, Jhi-Joung Wang, Chung-Feng Liu, Chien-Cheng Huang, Predicting outcomes in older ED patients with influenza in real time using a big data-driven and machine learning approach to the hospital information system, BMC Geriatrics 21 (2021) 280.
[20] M. Safa, A. Pandian, Intelligent big data analytics model for efficient cardiac disease prediction with IoT devices in WSN using fuzzy rules, Wirel. Pers. Commun. (2021).
[21] A. Ed-daoudy, Khalil Maalmi, A new Internet of Things architecture for real-time prediction of various diseases using machine learning on big data environment, Journal of Big Data 6 (2019) 104.
[22] Gunasekaran Manogaran, V. Vijayakumar, R. Varatharajan, Priyan Malarvizhi Kumar, Revathi Sundarasekar, Ching-Hsien Hsu, Machine learning based big data processing framework for cancer diagnosis using hidden Markov model and GM clustering, Wirel. Pers. Commun. 102 (2018) 2099–2116.
[23] T. Jalal Laassiri, Using big data-machine learning models for diabetes prediction and flight delays analytics, Journal of Big Data 7 (2020) 78.
[24] Lirim Ashiku, Md. Al-Amin, Sanjay Madria, Cihan Dagli, Machine learning models and big data tools for evaluating kidney acceptance, Procedia Comput. Sci. 185 (2021) 177–184.
[25] D.A. Pustokhin, I.V. Pustokhina, P. Rani, V. Kansal, M. Elhoseny, G.P. Joshi, K. Shankar, Optimal deep learning approaches and healthcare big data analytics for mobile networks toward 5G, Comput. Electr. Eng. 95 (2021) 107376.
[26] Myoung Soo Park, Jin Hee Na, Jin Young Choi, PCA-based feature extraction using class information, IEEE International Conference on Systems, Man and Cybernetics, Waikoloa, HI, USA, vol. 1 (2005) 341–345.
[27] F. Khelifi, J. Jiang, k-NN regression to improve statistical feature extraction for texture retrieval, IEEE Trans. Image Process. 20 (1) (2011) 293–298.
[28] Y. Marinakis, M. Marinaki, Bumble bees mating optimization algorithm for the vehicle routing problem, Handbook of Swarm Intelligence (2011) 347–369.
[29] D.F. Alam, D.A. Yousri, M.B. Eteiba, Flower pollination algorithm based solar PV parameter estimation, Energy Conv. Manag. 101 (2015) 410–422.
[30] A. de Medeiros Martins, A.D.D. Neto, J.D. de Melo, Neural networks applied to classification of data based on Mahalanobis metrics, Proc. Int. Joint Conf. Neural Net. 4 (2003) 3071–3076.
[31] P.P. Angelov, X. Zhou, Evolving fuzzy-rule-based classifiers from data streams, IEEE Trans. Fuzzy Syst. 16 (6) (2008) 1462–1475.
[32] R. Ghorbani, R. Ghousi, Comparing different resampling methods in predicting students’ performance using machine learning techniques, IEEE Access 8 (2020) 67899–67911.
[33] H. Zhu, X. Wang, R. Wang, Fuzzy monotonic K-nearest neighbor versus monotonic fuzzy K-nearest neighbor, IEEE Trans. Fuzzy Syst. 30 (9) (2022) 3501–3513.
[34] M. Yaghoubi, M. Eslami, M. Noroozi, H. Mohammadi, O. Kamari, S. Palani, Modified salp swarm optimization for parameter estimation of solar PV models, IEEE Access 10 (2022) 110181–110194.


[35] Pooja Singh, Marcello Carvalho Reis, Victor Hugo C. Albuquerque, Design of artificial intelligence enabled dingo optimizer for energy management in 6G communication networks, AI-Enabled 6G Networks and Applications, Wiley, 2023.
[36] M. Woźniak, M. Wieczorek, J. Siłka, BiLSTM deep neural network model for imbalanced medical data of IoT systems, Futur. Gener. Comput. Syst. 141 (2023) 489–499.
[37] M. Vara Siddardha Reddy, R. Sai Prasad, R. Sai Jagan, M. Selvi, Artificial intelligence for IoT-based healthcare system, 2023 International Conference on Computer Communication and Informatics (ICCCI) (2023) 1–5.
