Optimized deep neural network architectures
Keywords: Photovoltaic production; Deep neural networks; Meta-heuristic algorithms; Time series forecasting

Abstract: Accurate time-series forecasting of energy consumption and photovoltaic (PV) production is essential for effective energy management and sustainability. Deep Neural Networks (DNNs) are effective tools for learning complex patterns in such data; however, optimizing their architecture remains a significant challenge. This paper introduces a novel hybrid optimization approach that integrates Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) to enhance the DNN architecture for more accurate energy forecasting. The performance of GA-PSO is compared with leading hyperparameter optimization techniques, such as Bayesian Optimization and Evolutionary Strategy, across various optimization benchmarks and DNN hyperparameter tuning tasks. The study evaluates the GA-PSO-enhanced Optimized Deep Neural Network (ODNN) against traditional DNNs and state-of-the-art machine learning methods on multiple real-world energy forecasting tasks. The results demonstrate that ODNN outperforms the average performance of other methods, achieving a 27% improvement in forecasting accuracy and a 22% reduction in error across various metrics. These findings demonstrate the significant potential of GA-PSO as an effective tool for optimizing DNN models in energy forecasting applications.
1. Introduction

In today's energy landscape, the growing global demand and the finite fossil fuel dependence pose a critical challenge. Time series energy consumption forecasting, utilizing algorithms like metaheuristics, is crucial for sustainable resource allocation. This task involves predicting future energy usage, addressing challenges such as data variability and nonlinearity. Methods like statistical models and machine learning are employed for efficient resource management, cost optimization, and stable energy supply. Optimizing neural network architectures is key, improving accuracy and adaptability to evolving energy patterns. This optimization enhances reliability and efficiency, contributing to cost savings, resource conservation, and more effective decision-making in the energy sector toward a sustainable future.

Meta-heuristic algorithms, inspired by swarm intelligence, nature, biomimicry, physics, and scientific theories, excel in solving complex optimization problems. By mimicking collaborative behaviors observed in nature and leveraging insights from diverse disciplines, these algorithms efficiently navigate intricate solution spaces. Their adaptability and collective intelligence make them indispensable tools for addressing real-world optimization challenges. Several notable algorithms in this domain include Particle Swarm Optimization (PSO) [1], the Artificial Bee Colony algorithm [2], Social Spider Optimization [3], and Genetic Algorithm [4]. Beyond these, a multitude of other algorithms in this field are documented, spanning Refs. [5–17], and [18].

DNNs, a crucial component of machine learning, employ diverse strategies to learn new tasks based on data. They stand out for their exceptional predictive accuracy, harnessing insights from historical data, and providing powerful computational learning approaches. Gradient-based optimization methods fine-tune model parameters to minimize cost functions, enhancing model adaptability across diverse settings. Despite their effectiveness, challenges arise with the backpropagation algorithm, vital for neural network training, due to sensitivity to noisy data, time-consuming processes, and susceptibility to local minima. Additionally, these methods grapple with issues like determining optimal step sizes, the possibility of converging to multiple local optima, and high computational complexity.

Recent advancements in machine learning have introduced physics-informed methodologies, which integrate domain-specific physical laws into data-driven models to enhance accuracy and efficiency. Among these, the Deep Energy Method (DEM) offers a unique alternative to the widely used Physics-Informed Neural Networks (PINNs). Unlike PINNs,
Table 1
Advantages and disadvantages of existing energy forecasting methods.

| Category | Methodology | Advantages | Disadvantages |
|---|---|---|---|
| Traditional statistical models | ARIMA, ARIMAX, Exponential Smoothing | Simple and interpretable; effective for short-term forecasting with linear patterns | Struggles with nonlinear patterns; requires strong assumptions on stationarity |
| Deep learning models | LSTM, GRU, CNN, DNN | Capable of capturing temporal dependencies; handles large-scale datasets effectively | High computational cost; susceptible to overfitting; requires extensive hyperparameter tuning |
| Hybrid statistical-ML models | ARIMA-DNN, ARIMA-LSTM, XGBoost-based | Combines strengths of statistical and ML approaches; improves accuracy over standalone models | Increased complexity; needs careful feature engineering and hyperparameter tuning |
| Metaheuristic-based optimization | GA, PSO, ABC, Social Spider | Efficient for optimizing hyperparameters; avoids local minima in optimization | Computationally expensive; convergence can be slow; risk of premature convergence |
| Physics-informed deep learning | PINNs, Deep Energy Method (DEM) | Incorporates domain knowledge; improves generalization in physics-based problems | Not directly applicable to purely data-driven forecasting; requires integration of physical laws |
optimization techniques to detect such attacks and improve system security through state forecasting. On the other hand, in the realm of smart grids, wind speed prediction plays a crucial role in ensuring efficient energy distribution and system stability. [56] employed a two-layer nonlinear combination technique for accurate short-term wind speed prediction, demonstrating the effectiveness of combining extreme learning machines and neural networks to enhance forecasting accuracy. These studies highlight the increasing importance of advanced machine learning algorithms and hybrid techniques in addressing forecasting challenges in energy systems.

Time series forecasting methods range from traditional models such as ARIMA [57], which are simple but struggle with nonlinear patterns, to combinations of ARIMA and DNN [58], and advanced techniques like deep neural networks (LSTM, GRU) [59], which excel in capturing temporal dependencies but require intensive computation and tuning. Meta-heuristic algorithms, such as GA and PSO, optimize model parameters efficiently, with GA offering robust global search and PSO providing faster local refinement. However, these methods face challenges like overfitting and computational demands. By combining GA and PSO with DNNs, this study addresses these limitations, improving forecast accuracy and efficiency. Harnessing metaheuristics to enhance DNN training for time series energy consumption forecasting offers a promising avenue for improving accuracy and efficiency. This study introduces a novel hybrid framework that combines GA and PSO to optimize DNN architectures, referred to as ODNN. The approach leverages GA's global exploration and PSO's local refinement to balance optimization, achieving precise and efficient forecasting.

Extensive experiments have been conducted to compare the performance of DNNs with and without the hybrid optimization, demonstrating notable improvements in accuracy and generalization. This study also highlights the practical implications of ODNN in energy management and sustainable planning, addressing key challenges such as scalability and adaptability. Unlike traditional methods focusing on either accuracy or computational efficiency, ODNN achieves both by dynamically optimizing DNN architectures. The proposed framework establishes itself as a state-of-the-art solution for energy consumption and PV production forecasting. Existing methods for neural network optimization often face challenges in balancing global exploration and local exploitation, which can lead to suboptimal architectures and reduced scalability in practical applications. The main contributions of this work are:

1. Introducing ODNN, a hybrid model that incorporates GA, PSO, and DNN to enhance DNN efficiency.
2. Demonstrating how ODNN improves both accuracy and efficiency in forecasting models for precise energy consumption predictions.
3. Optimizing neural network architectures efficiently through a balanced approach, combining global exploration and exploitation.

Table 1 compares different energy forecasting methods, outlining their respective advantages and disadvantages. It covers traditional statistical models, deep learning models, hybrid statistical-ML models, metaheuristic-based optimization, and physics-informed deep learning approaches. Each method is evaluated based on its strengths and limitations in the context of energy forecasting.

The ODNN framework effectively addresses several critical gaps identified in the literature and highlighted in Table 1. Traditional statistical models fail to capture nonlinear patterns, and deep learning models often suffer from high computational costs and overfitting risks. Hybrid statistical-ML models increase complexity and rely heavily on feature engineering, while existing optimization techniques tend to focus on either accuracy or efficiency. Existing neural network optimization techniques often suffer from suboptimal architectures due to static configurations or the reliance on single optimization methods, which reduces their practical applicability to complex forecasting scenarios.

ODNN overcomes these challenges by dynamically tuning DNN architectures to achieve both accuracy and efficiency. It effectively addresses scalability and adaptability issues that many existing models face by balancing global exploration (GA) with local refinement (PSO). This hybrid optimization approach ensures that ODNN remains robust, adaptable, and accurate under various forecast conditions, filling the gap left by previous methods that fail to combine these aspects efficiently.

2. Dataset overview

This section explores three key aspects: data description, time series analysis, and feature extraction. We analyze the characteristics of the dataset, identify temporal patterns, and investigate the extraction of meaningful features. Together, these discussions lay the foundation for a deeper understanding and informed analysis in the following sections. For our experiments, a comprehensive set of twelve datasets, namely AEP, COMED, DAYTON, DEOK, DOM, DUQ, EKPC, FE, NI, PJM, PJME, and PJMW, was employed. These datasets cover over a decade of hourly energy consumption data from PJM Interconnection LLC (PJM) in megawatts.
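As a concrete illustration, one of these datasets can be loaded as a pandas time series. The file and column names below follow the common public CSV export of the PJM hourly data and are assumptions, not the paper's code:

```python
# Minimal sketch: load one PJM hourly-consumption dataset as a time series.
# File/column names ("AEP_hourly.csv", "Datetime") are assumptions based on
# the common public export of this data, not the authors' code.
import pandas as pd

df = pd.read_csv("AEP_hourly.csv", index_col="Datetime", parse_dates=True)
df = df.sort_index()  # hourly load in megawatts, indexed by timestamp
print(df.head())
```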
3. Proposed method
3.1. Steps of the algorithm

The algorithm begins by employing GA to methodically explore and optimize the architecture of the neural network. In this phase, GA operates by iterating through various architectures, assessing each one based on a fitness function to identify the most promising configurations. The fitness function evaluates performance, typically using a metric such as mean squared error (MSE), to ensure that the network is capable of making accurate predictions. Once the GA identifies the best-performing architecture, this optimal design serves as the initial candidate for the next phase, where the algorithm transitions to PSO. PSO refines and further optimizes the architecture by dynamically adjusting the parameters based on the behavior of particles in a search space, ultimately enhancing the performance of the network. This dual-phase approach ensures that the neural network's architecture is optimized both globally and locally, balancing exploration with fine-tuning. The key steps within this GA-PSO-DNN optimization framework include fitness evaluation, crossover, mutation, selection, and the refinement of architectures using particle-based adjustments, all aimed at achieving the most effective deep learning model. The pivotal stages within the GA-PSO-DNN optimization algorithm encompass the following key steps:

Algorithm 1 GA-PSO-DNN Algorithm
1: Initialize GA Population
2: Generate random neural network architectures
3: Encode architectures as chromosomes (neuron counts for two hidden layers)
4: while GA termination criteria not met do
5:     Evaluate Fitness Function
6:     Calculate mean squared error (MSE) on validation dataset
7:     Apply Genetic Operators
8:     Perform two-point crossover
9:     Apply uniform integer mutation
10:    Selection
11:    Use tournament selection to choose individuals for reproduction
12:    Generate New Population
13: end while
14: Initialize PSO
15: Set initial particle as the best architecture from GA
16: Define upper and lower bounds for neuron counts
17: while PSO termination criteria not met do
18:    Evaluate Objective Function
19:    Calculate MSE on validation dataset
20:    Update Particle Positions
21:    Adjust neuron counts based on PSO principles
22: end while
23: Return Optimal Architecture
24: Output the neural network architecture with the lowest MSE

1. Fitness Function: A fitness function is defined to quantitatively assess the performance of a given neural network architecture. The mean squared error loss on a validation dataset serves as the evaluative metric within this function. The neural network's architecture is encoded as a chromosome, where genes represent the number of neurons in the two hidden layers. The fitness function evaluates the performance of the neural network with this architecture on the validation dataset.
2. Chromosome Representation: Each member of the GA population is represented by two numbers, which indicate the number of neurons in the two hidden layers of the neural network.
3. Initialization: The GA population is initialized by generating a set of individuals, each characterized by a random neural network architecture.
4. Genetic Operators: Two fundamental genetic operators, the two-point crossover strategy for crossover and the uniform integer mutation for mutation, are implemented. These operators enhance the exploration of diverse architectures within the population.
5. Evolution: The GA population progresses through a series of generations, as specified by the user. Individuals with superior architectures are more likely to be selected and carried over to subsequent generations.
6. Selection: Tournament selection is utilized to determine individuals for reproduction in each generation, promoting the retention of superior architectures. At the end of the GA process, the optimal architecture is identified based on the fitness function, representing the configuration that achieved the minimum mean squared error on the validation dataset.
7. Initialization and Bounds (PSO): The initial particle in the PSO phase is set as the best architecture uncovered by the GA. The search space for PSO is confined by upper and lower bounds, which delineate the acceptable range for the number of neurons in each hidden layer.
8. Objective Function: In the PSO phase, the objective function aligns with the fitness function utilized in the GA phase, assessing the mean squared error on the validation dataset. The primary aim of PSO is to minimize this error.
9. Optimization: PSO is employed iteratively to update the architecture parameters, striving to minimize the objective function. This optimization method fine-tunes the neural network architecture by dynamically adjusting the number of neurons in the hidden layers, taking cues from the optimal solution identified during the GA phase.

The pseudocode describing the detailed steps of the proposed algorithm is presented in Algorithm 1, while the corresponding flowchart that illustrates the overall process and decision flow is shown in Fig. 2. Together, these visual and textual representations provide a clear and comprehensive understanding of the algorithm's operation, specifically within the context of optimizing DNNs. The process begins with the initialization phase using the GA to explore and refine the DNN architecture, followed by the PSO phase, which further fine-tunes the architecture to enhance performance. These tools not only clarify the structure of the algorithm, but also highlight the key operations and decision points that drive its optimization process.
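To make the two-phase procedure concrete, the following is a minimal, self-contained Python sketch of Algorithm 1, assuming toy data and a cheap stand-in fitness function; the actual implementation described in Section 3.2.2 trains real DNNs with TensorFlow and uses DEAP and PySwarms for the metaheuristics.

```python
# Compact sketch of the GA-PSO search in Algorithm 1 (toy data, stand-in fitness).
import random
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                      # toy stand-in for the energy features
y = X.sum(axis=1) + 0.1 * rng.normal(size=500)
Xtr, Xva, ytr, yva = X[:400], X[400:], y[:400], y[400:]
LOW, HIGH = 4, 128                                 # bounds on neurons per hidden layer

def fitness(arch):
    """Validation MSE of a tiny two-hidden-layer network with the given neuron
    counts; random fixed hidden weights + ridge-fitted output keep it cheap."""
    n1, n2 = (int(np.clip(round(v), LOW, HIGH)) for v in arch)
    r = np.random.default_rng(n1 * 1009 + n2)      # deterministic per architecture
    W1, W2 = r.normal(size=(8, n1)), r.normal(size=(n1, n2))
    H = np.tanh(np.tanh(Xtr @ W1) @ W2)
    w = np.linalg.solve(H.T @ H + 1e-3 * np.eye(n2), H.T @ ytr)
    Hv = np.tanh(np.tanh(Xva @ W1) @ W2)
    return float(np.mean((Hv @ w - yva) ** 2))     # MSE on the validation set

def ga_search(pop_size=20, gens=15, cx=0.8, mut=0.1):
    pop = [rng.integers(LOW, HIGH, size=2).astype(float) for _ in range(pop_size)]
    for _ in range(gens):
        nxt = sorted(pop, key=fitness)[:2]          # elitism
        while len(nxt) < pop_size:
            a = min(random.sample(pop, 3), key=fitness)   # tournament selection
            b = min(random.sample(pop, 3), key=fitness)
            child = np.where(rng.random(2) < 0.5, a, b) if rng.random() < cx else a.copy()
            if rng.random() < mut:                  # uniform integer mutation
                child[rng.integers(2)] = float(rng.integers(LOW, HIGH))
            nxt.append(np.asarray(child, dtype=float))
        pop = nxt
    return min(pop, key=fitness)

def pso_refine(seed, n=20, iters=15, w=0.9, c1=2.0, c2=2.0):
    P = np.vstack([seed] + [rng.uniform(LOW, HIGH, 2) for _ in range(n - 1)])
    V = np.zeros_like(P)
    pbest, pcost = P.copy(), np.array([fitness(p) for p in P])
    for _ in range(iters):
        g = pbest[pcost.argmin()]                   # global best architecture
        r1, r2 = rng.random(P.shape), rng.random(P.shape)
        V = w * V + c1 * r1 * (pbest - P) + c2 * r2 * (g - P)
        P = np.clip(P + V, LOW, HIGH)
        cost = np.array([fitness(p) for p in P])
        better = cost < pcost
        pbest[better], pcost[better] = P[better], cost[better]
    return pbest[pcost.argmin()]

best = pso_refine(ga_search())
print("selected hidden-layer sizes:", np.round(best).astype(int))
```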
3.2. Practical implementation details

3.2.1. Feature extraction

A crucial facet of feature engineering lies in creating time-based features, particularly relevant for time-series data. These features, derived from temporal information like day of the week, month, or season, prove instrumental in capturing nuanced temporal patterns. In this research, we have crafted several time-based features from the core dataset, including Hour, Day of the week, Quarter, Month, Year, Day of the year, Day of the month, and Week of the year.

To visually interpret the distribution characteristics of energy consumption, we employ a box plot. Fig. 3 showcases each hour as an individual box, delineating the interquartile range (IQR) of energy consumption within that timeframe. A horizontal line within the box denotes the median energy consumption, while whiskers extend to illustrate the consumption range within 1.5 times the IQR. Outliers beyond this range are visualized separately. This box plot provides a comprehensive overview of the dataset, quickly revealing central tendencies, data spread, and any anomalies or extreme consumption patterns. The visualization offers valuable insights into energy usage variability throughout the day, empowering informed decision-making for energy management and optimization strategies.
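A sketch of how such calendar features can be derived with pandas, assuming an hourly DatetimeIndex as in the earlier loading example; the column names are ours, not the authors' exact feature-engineering code:

```python
# Sketch of the time-based features listed above, derived from a DatetimeIndex.
import pandas as pd

def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    idx = out.index  # assumed to be a pandas DatetimeIndex of hourly records
    out["hour"] = idx.hour
    out["dayofweek"] = idx.dayofweek
    out["quarter"] = idx.quarter
    out["month"] = idx.month
    out["year"] = idx.year
    out["dayofyear"] = idx.dayofyear
    out["dayofmonth"] = idx.day
    out["weekofyear"] = idx.isocalendar().week.to_numpy()
    return out
```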
Analyzing the annual energy consumption patterns provides a holistic understanding of the dynamics shaping demand throughout the year. Seasonal variations, long-term trends, and the distinctive impact of holidays and special events emerge as key facets of this comprehensive examination. Beyond mere observation, this data-driven approach serves as a linchpin for optimizing infrastructure planning, ensuring energy efficiency, strategic budgeting, seamless integration of renewable energy sources, and unwavering adherence to regulatory compliance. The graphical representation in Fig. 3 effectively encapsulates this wealth of information by delineating each month as a distinct box, vividly illustrating the interquartile range (IQR) of energy consumption within specific timeframes. This visual aid not only enhances comprehension but also facilitates more informed decision-making for both energy providers and consumers.

Although a typical split ratio often allocates 70%–80% of the data for training and 20%–30% for testing, these percentages may vary based on the size and characteristics of the dataset. In some cases, more advanced techniques, such as cross-validation, may be employed for robust evaluation. In our specific dataset, the data post-2015 are designated as the test set for some experiments, and September 2022 for others, as illustrated in Fig. 4.
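The date-based split described above can be expressed directly on the timestamp index. The cutoff is the one stated in the text; the variable names are illustrative and `df` is the feature frame from the previous sketches:

```python
# Chronological train/test split at the cutoff described above
# (post-2015 data held out as the test set).
split_date = "2015-01-01"
train = df.loc[df.index < split_date]
test = df.loc[df.index >= split_date]
print(len(train), "training rows,", len(test), "test rows")
```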
3.2.2. Fine-tuning hyperparameters

The implementation of the proposed GA-PSO-DNN framework requires careful attention to several practical considerations, such as parameter tuning, computational resource allocation, and scalability for larger datasets or real-time applications.

1. Optimization Parameters: The optimization process heavily relies on selecting appropriate parameters for both the GA and PSO phases. The key parameters that require tuning include:

(a) GA Parameters: The population size, crossover rate, mutation rate, and the number of generations need to be configured. Typically, a larger population and more generations improve exploration but increase computational costs. We performed a sensitivity analysis to find the most suitable values, which balance performance and efficiency.
(b) PSO Parameters: The number of particles, inertia weight, cognitive and social coefficients, and the maximum number of iterations are crucial for PSO's effectiveness in refining the architecture. We found that adjusting these parameters dynamically during the optimization process enhanced convergence speed and accuracy.

2. Optimization Process of GA-PSO for DNN: The GA-PSO framework operates in a hybrid manner to optimize the architecture and hyperparameters of the DNN. The detailed steps of the process are as follows:

(a) GA Phase:
- The GA is employed for the initial global search to explore potential DNN architectures.
- Each candidate solution (chromosome) represents a unique DNN architecture with encoded parameters which indicate the number of neurons in the two hidden layers of the neural network.
- The fitness of each solution is evaluated based on the model's performance, using the mean squared error (MSE) or mean absolute error (MAE) on the validation dataset. The fitness function quantitatively assesses the performance by computing the MSE loss on the validation dataset.
- Genetic operations such as selection, crossover, and mutation are applied to generate new candidate solutions, iteratively improving the population over multiple generations. Specifically, the two-point crossover strategy and uniform integer mutation are implemented for effective exploration.
- Tournament selection is used to promote the retention of superior architectures for reproduction in each generation, with individuals with superior architectures being more likely to be selected for subsequent generations.
- At the end of the GA phase, the optimal architecture is identified based on the fitness function, which achieves the minimum MSE on the validation dataset.

(b) PSO Phase:
- The best-performing architecture from the GA phase is used as the initial particle for the PSO phase.
- PSO fine-tunes the hyperparameters and weights of the DNN by optimizing the particle positions in the solution space. Each particle adjusts its position based on its own best-known solution and the global best solution found by the swarm, guided by inertia, cognitive, and social coefficients (see the update equations below).
- The search space for PSO is confined by upper and lower bounds, which delineate the acceptable range for the number of neurons in each hidden layer.
- The objective function in the PSO phase aligns with the fitness function from the GA phase, focusing on minimizing the MSE on the validation dataset.
- PSO is employed iteratively to update the architecture parameters, dynamically adjusting the number of neurons in the hidden layers to refine the architecture based on the optimal solution identified during the GA phase.
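For reference, the particle adjustment described in the PSO phase follows the standard PSO update, with inertia weight w, cognitive coefficient c1, and social coefficient c2 (values as listed in Table 2); this is the textbook form, which the paper's variant is assumed to follow:

\[
v_i^{t+1} = w\,v_i^{t} + c_1 r_1 \bigl(p_i^{\mathrm{best}} - x_i^{t}\bigr) + c_2 r_2 \bigl(g^{\mathrm{best}} - x_i^{t}\bigr),
\qquad
x_i^{t+1} = x_i^{t} + v_i^{t+1},
\]

where \(x_i^t\) is the neuron-count vector of particle \(i\) at iteration \(t\), \(p_i^{\mathrm{best}}\) and \(g^{\mathrm{best}}\) are the personal and global best positions, and \(r_1, r_2 \sim U(0,1)\).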
Table 2
Hyperparameters for the GA-PSO-DNN model.

| Hyperparameter | Description | Value/Max |
|---|---|---|
| GA population size | Number of solutions in each generation | 20 |
| GA crossover rate | Probability of crossover between two parent solutions | 0.8 |
| GA mutation rate | Probability of mutation in offspring | 0.1 |
| GA generations | Number of generations to evolve the population | 50 |
| GA selection method | Method for selecting individuals for reproduction | Tournament |
| PSO number of particles | Number of particles in the PSO swarm | 20 |
| PSO inertia weight | Weight that controls the influence of the previous velocity | 0.9 |
| PSO cognitive coefficient | Weight that controls the particle's self-awareness | 2.0 |
| PSO social coefficient | Weight that controls the particle's social influence | 2.0 |
| PSO max iterations | Maximum number of iterations for PSO optimization | 50 |
| DNN learning rate | Step size for updating weights during training | 0.001 |
| DNN batch size | Number of training samples per batch | 32 |
| DNN epochs | Number of training iterations | 20 |
| DNN dropout rate | Fraction of input units to drop for regularization | 0.2 |
| DNN hidden layers | Number of hidden layers in the neural network | 2 |
| DNN neurons per layer | Number of neurons in each hidden layer | Determined by GA-PSO |
(c) Convergence and Output:
- The optimization process concludes when a convergence criterion is met, such as reaching a predefined number of iterations or achieving a satisfactory fitness threshold.
- The optimized DNN architecture and its hyperparameters are then finalized for further training and testing.

3. Computational Resources: For our experiments, we utilized the following resources:

(a) Hardware: The optimization was performed on a high-performance computing cluster equipped with multiple GPUs to accelerate the training of DNNs and the optimization process.
(b) Software: The implementation was done using Python with libraries such as TensorFlow for the neural network training, DEAP for genetic algorithm operations, and PySwarms for PSO. Parallelization techniques were employed to speed up the evaluation of fitness functions across multiple generations and particles.

The overall computational time for each experiment varied depending on the size of the dataset, the complexity of the DNN, and the number of generations or iterations in the GA and PSO phases. On average, the training time per model ranged from a few hours to several days, depending on the configuration.

4. Scalability and Real-Time Applications: While the proposed method demonstrates strong performance on smaller datasets, its scalability to larger datasets and real-time forecasting applications remains a key consideration:

(a) Larger Datasets: To handle larger datasets, the algorithm can be further parallelized. The GA-PSO optimization phases can be distributed across multiple machines to accelerate the evaluation of potential architectures. Additionally, model size reduction techniques, such as pruning or weight quantization, can be employed to optimize memory usage and inference time without compromising accuracy.
(b) Real-Time Forecasting: For real-time applications, such as predicting energy consumption and forecasting PV production, the algorithm must be adapted to achieve faster convergence. Techniques such as transfer learning could be explored, where a pre-trained model is fine-tuned with new data, reducing the computational overhead during deployment.

Table 2 shows the details of the hyperparameters for the GA-PSO-DNN model.

3.3. Limitations of the proposed framework

While the proposed GA-PSO-DNN framework demonstrates substantial improvements in forecast accuracy and computational efficiency, it is important to recognize the following limitations:

1. Computational Overhead: The hybrid nature of the GA and PSO approach introduces significant computational demands, especially for larger datasets or more complex architectures. Although parallelization strategies can mitigate this to some extent, further optimization of the metaheuristic parameters is necessary to reduce training time.
2. Scalability to Real-Time Applications: The current implementation focuses on offline optimization and lacks direct integration with real-time forecasting systems. Adapting the framework for real-time applications requires further exploration, such as incremental learning or faster convergence techniques.
3. Dataset Dependency: The performance of the framework is highly dependent on the quality and characteristics of the dataset. The handling of noisy, incomplete, or highly dynamic datasets may require additional preprocessing steps or robust mechanisms to ensure model reliability.
4. Generality Across Domains: Although the framework has shown strong performance in energy forecasting tasks, its adaptability and effectiveness in other domains with different data patterns remain untested. More experiments across diverse applications are needed to validate its generalizability.

4. Computational results

In this section, we employ the proposed algorithm within the domain of deep learning, evaluating its performance using six distinct criteria. We conduct a comparative analysis between two scenarios: the Basic Deep Neural Network (BDNN) and the Optimized Deep Neural Network (ODNN). Our evaluation involves benchmarking against state-of-the-art algorithms to assess the algorithm's effectiveness in both configurations. First, we compare the proposed GA-PSO algorithm with Bayesian Optimization and an Evolutionary Strategy to analyze the efficiency and robustness of each optimization technique. This comparison will highlight the relative strengths and weaknesses of these methods in optimizing deep neural network architectures, enabling us to better understand their performance across various test scenarios.

4.1. Evaluation criteria

In this study, we evaluate the performance of the regression model using six criteria, namely Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R-squared (R2) Score, Median Absolute Error (MedAE), Explained Variance Score (EV), and Relative Absolute Error (RAE). These criteria are defined as follows:
1. Root Mean Squared Error (RMSE): RMSE quantifies the model's error in predicting the target value. Mathematically, it is defined as:

\[ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (1) \]

where \(n\) is the number of observations, \(y_i\) represents the actual values, and \(\hat{y}_i\) represents the predicted values.

2. Mean Absolute Error (MAE): MAE measures the average absolute differences between predicted and actual values, offering robustness against outliers. It is calculated as:

\[ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \quad (2) \]

3. R-squared (\(R^2\)) Score: \(R^2\) indicates the proportion of variance in the dependent variable that is predictable from the independent variables (features). It ranges from 0 to 1, with higher values denoting a better fit:

\[ R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2} \quad (3) \]

4. Median Absolute Error (MedAE): MedAE utilizes the median instead of the mean and is resilient to outliers. It is calculated as the median of \(|y_i - \hat{y}_i|\):

\[ \text{MedAE} = \operatorname{median}\left(\left|y_i - \hat{y}_i\right|\right) \quad (4) \]

5. Explained Variance Score (EV): The EV metric quantifies the proportion of the variance in the dependent variable that the model explains. It is calculated as:

\[ \text{EV} = 1 - \frac{\operatorname{Var}(y - \hat{y})}{\operatorname{Var}(y)} \quad (5) \]

where Var denotes the variance.

6. Relative Absolute Error (RAE): RAE normalizes the MAE by dividing it by the average absolute error of a simple baseline model (e.g., a model predicting the mean of the target variable). This normalization helps in assessing performance relative to a baseline model. The RAE is given by:

\[ \text{RAE} = \frac{\text{MAE}}{\frac{1}{n}\sum_{i=1}^{n}\left|y_i - \bar{y}\right|} \quad (6) \]

where \(\bar{y}\) is the mean of the target variable.

These criteria collectively provide a comprehensive evaluation of the regression model's predictive capabilities.

4.2. GA-PSO vs. hyperparameter optimizers

In this subsection, we conduct a detailed comparison between the proposed GA-PSO algorithm and two other widely used optimization techniques, Bayesian Optimization and Evolutionary Strategy, on both optimization benchmarks and DNN tuning.

4.2.1. Comparison on optimization benchmarks

The focus of this comparison is on the performance of these algorithms in solving optimization test functions, specifically targeting their ability to efficiently minimize the objective function in multiple benchmark problems. These test functions include both unimodal and multimodal problems, designed to challenge the algorithms in the different optimization landscapes shown in Fig. 5.

We compare GA-PSO with Bayesian Optimization and Evolutionary Strategy by evaluating their convergence behavior, computational efficiency, and the quality of the final solutions. Each algorithm was applied to a set of optimization test functions, such as the Sphere, Rastrigin, and Ackley functions, which are known for their varying degrees of complexity.

The performance of the algorithms was measured by their best achieved fitness values and the number of generations required to converge to the optimal solution. The results of this comparison are shown in Fig. 6, which provides insight into the relative advantages of combining GA and PSO for solving complex optimization problems, and highlights how Bayesian Optimization and Evolutionary Strategy perform in comparison to a hybrid approach like GA-PSO.

4.2.2. Comparison on DNN

We present a comparative analysis of GA-PSO with Bayesian Optimization and Evolutionary Strategy using multiple evaluation metrics. The comparison is performed across nine key metrics: MSE, MAE, RMSE, R2, MAPE, Explained Variance, Median Absolute Error, Max Error, and Adjusted R2. These metrics provide a comprehensive evaluation of model performance, covering both the prediction accuracy and the explanatory power of the model. The performance of each optimization method is assessed across generations of model tuning.

The synthetic dataset used in this study consists of 1000 samples, each with 10 features, generated through a random process. The target values are derived as the sum of the features, with added Gaussian noise to simulate real-world forecasting conditions. The dataset is split into training and validation sets to evaluate the model's generalization capability.

We used a feedforward neural network for the model, consisting of one hidden layer with a variable number of neurons, which is optimized by the different hyperparameter optimization methods. This simple architecture is commonly referred to as a DNN, where the number of neurons in the hidden layer is the key hyperparameter being tuned. The plots in Fig. 7 provide a comprehensive comparison of the performance of GA-PSO, Bayesian Optimization, and Evolutionary Strategy on nine different evaluation criteria. These criteria encompass both error-based metrics, such as mean absolute error and root mean square error, and accuracy-driven indicators. As observed in these figures, GA-PSO consistently outperforms the other two hyperparameter optimization methods, demonstrating superior performance in minimizing prediction errors and maximizing accuracy. These results offer valuable insight into the strengths and limitations of each optimization technique, providing a clear understanding of their respective advantages in energy forecasting tasks.

Table 3
Optimization algorithm statistics: mean and standard deviation of each evaluation metric for GA-PSO, Bayesian Optimization, and Evolutionary Strategy.

| Metric | GA-PSO mean | GA-PSO std | Bayesian Opt. mean | Bayesian Opt. std | Evol. Strategy mean | Evol. Strategy std |
|---|---|---|---|---|---|---|
| MSE | 0.0435 | 0.0840 | 0.3292 | 1.9883 | 0.0572 | 0.0204 |
| MAE | 0.1763 | 0.0762 | 0.2923 | 0.4483 | 0.1907 | 0.0322 |
| RMSE | 0.1546 | 0.0135 | 0.3462 | 0.4576 | 0.2360 | 0.0385 |
| R2 | 0.9767 | 0.0152 | 0.6270 | 2.2528 | 0.9352 | 0.0231 |
| MAPE | 20.8455 | 0.1095 | 21.7854 | 6.5686 | 20.8541 | 0.1951 |
| Max Error | 0.0531 | 0.0324 | 0.3467 | 0.5783 | 0.0790 | 0.0821 |
| Adjusted R2 | 0.9460 | 0.0092 | 0.6321 | 2.2467 | 0.9361 | 0.0246 |

Table 3 presents the optimization algorithm statistics for GA-PSO, Bayesian Optimization, and Evolutionary Strategy on a set of evaluation metrics. The mean and standard deviation values provide a detailed view of the variability in the performance of each algorithm. GA-PSO consistently demonstrates the lowest mean error values in all metrics, including MSE, MAE, RMSE, and R2, indicating its superior ability to minimize prediction errors. The standard deviation values for GA-PSO are relatively smaller, reflecting its robustness in performance consistency. In contrast, Bayesian Optimization exhibits higher mean errors and larger standard deviations, suggesting less stability and efficiency in its optimization. Evolutionary Strategy shows competitive performance but with higher mean errors in all metrics, along with more considerable fluctuations in standard deviation. These statistics highlight the relative strengths and weaknesses of each optimization method, with GA-PSO emerging as the most reliable choice.
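For completeness, the three test functions used in the Section 4.2.1 benchmarks have standard closed forms (each with a global minimum of 0 at the origin); the sketch below gives the textbook definitions, which we assume match the paper's setup:

```python
# Standard definitions of the Section 4.2.1 benchmark functions (assumed setup).
import numpy as np

def sphere(x):                     # unimodal, smooth
    return float(np.sum(x ** 2))

def rastrigin(x):                  # highly multimodal
    return float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

def ackley(x):                     # multimodal with a narrow global basin
    n = x.size
    return float(-20 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / n))
                 - np.exp(np.sum(np.cos(2 * np.pi * x)) / n) + 20 + np.e)
```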
4.3. GA-PSO on DNN for energy forecasting
Table 4
Prediction performance over RMSE (↓).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 1542.6 1228.4 229.2 383.5 1593.6 203.0 290.2 790.9 1315.8 2577.3 3289.9 700.5
BDNN 1935.9 1404.6 261.5 399.7 1816.9 222.6 301.5 925.1 1339.4 2877.3 3889.9 707.3
XGBst 1649.4 1390.7 232.8 400.9 1732.0 186.2 308.1 845.8 1344.5 2944.3 3726.8 706.0
RF 1929.5 1595.6 276.1 462.9 2145.6 237.1 353.7 971.6 1440.9 3395.5 4339.9 911.1
KNN 2098.1 1740.7 311.5 498.5 2221.6 256.6 372.9 1084.9 1680.9 3746.6 4734.2 910.9
SVM 2623.6 2275.2 380.2 620.8 2604.6 307.5 382.2 1324.3 2300.4 5929.0 6457.5 999.0
DT 2189.2 1876.2 239.6 478.8 2022.0 261.6 334.0 1069.9 1776.3 4262.6 5295.9 822.6
LR 2225.7 2007.5 336.5 550.3 2288.4 278.8 371.1 1155.7 2104.9 4838.0 5683.9 915.5
In this study, we conducted a thorough analysis of energy consumption prediction using a diverse set of machine learning algorithms, namely XGBoost, k-nearest neighbors (KNN), Decision Tree, Random Forest, and Linear Regression. The primary aim was to evaluate the effectiveness of these models in capturing intricate patterns inherent in energy consumption data. The figures in this section depict a comparison between the predicted values generated by each algorithm and the actual energy consumption values. Tables 4, 5, 6, 7, 8, and 9 present the prediction performance of the predictive models, evaluating metrics such as RMSE, MAE, R2, MedAE, EV, and RAE. Furthermore, Figs. 8, 9, 10, and 11 illustrate the actual versus predicted data for the AEP, FE, PV, and BC datasets, respectively, offering a comprehensive visualization of the models' performance across different scenarios. Upon close examination of the plots, it becomes apparent that the neural network model fine-tuned by our methodology demonstrates a remarkable ability to closely approximate true energy consumption values, showcasing its superior predictive performance. The KNN algorithm, which relies on local patterns, also produces commendable results, particularly in scenarios characterized by discernible clusters. In contrast, the decision tree and linear regression models exhibit varying degrees of accuracy, with potential limitations in capturing non-linear dependencies within the data.

These visual representations not only illuminate the potential strengths and weaknesses of each model in the context of energy consumption prediction, but also provide valuable insight. The neural network, with its ability to discern complex patterns and relationships, emerges as a promising choice for accurate forecasting in this domain. Its ability to adapt to non-linear dependencies is particularly evident in regions of the plot where traditional linear models falter.

The KNN algorithm, using its proximity-based approach, excels in capturing localized patterns and is particularly effective when energy consumption exhibits distinct clusters. However, its performance may vary in regions with sparse data or less defined clusters. Meanwhile, the decision tree model demonstrates a propensity for capturing hierarchical dependencies, but its performance may plateau in intricate scenarios where more sophisticated models, like the neural network, prove advantageous.

Linear regression, while providing a baseline for comparison, reveals its limitations in accommodating the intricate dynamics of energy consumption, especially when nonlinear relationships play a significant role. These findings underscore the importance of selecting a modeling approach that aligns with the underlying complexity of the data.

To further explore the landscape of energy consumption prediction, we included the XGBoost algorithm in our analysis. XGBoost, as an ensemble learning method, combines the predictive power of multiple decision trees, sequentially correcting the errors of its predecessors.

In this section, we thoroughly examine the training dynamics of our ODNN designed for predicting energy consumption. We juxtapose its learning curve with that of a BDNN. The learning curve serves as a crucial diagnostic tool, elucidating the progression of training and validation performance metrics over successive epochs.
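Since Tables 4-9 report the six criteria defined in Section 4.1, the following NumPy sketches mirror Eqs. (1)-(6) and show how each value can be computed; scikit-learn provides equivalent built-ins for most of them:

```python
# NumPy sketches of the evaluation criteria, matching Eqs. (1)-(6).
import numpy as np

def rmse(y, yhat):  # Eq. (1)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):   # Eq. (2)
    return float(np.mean(np.abs(y - yhat)))

def r2(y, yhat):    # Eq. (3)
    return float(1 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2))

def medae(y, yhat): # Eq. (4)
    return float(np.median(np.abs(y - yhat)))

def ev(y, yhat):    # Eq. (5)
    return float(1 - np.var(y - yhat) / np.var(y))

def rae(y, yhat):   # Eq. (6)
    return float(mae(y, yhat) / np.mean(np.abs(y - np.mean(y))))
```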
Table 5
Prediction performance over MAE (↓).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 1281.5 889.1 178.0 299.5 1161.9 168.0 212.5 591.1 940.2 1947.1 2766.5 538.7
BDNN 1577.1 1027.2 213.3 308.2 1354.7 172.1 220.7 686.2 1010.4 2274.1 3366.5 546.1
XGBst 1319.0 983.4 178.5 309.2 1298.7 145.5 224.1 647.9 988.3 2138.2 2902.2 541.8
RF 1448.3 1089.4 206.0 346.3 1520.6 177.7 245.4 716.1 1009.0 2466.3 3201.1 639.2
KNN 1604.1 1224.1 234.3 370.1 1601.7 191.7 255.4 809.8 1230.5 2817.2 3564.9 664.2
SVM 2181.7 1703.3 301.5 474.0 1950.3 247.0 291.1 1027.2 1743.5 4568.8 5116.7 799.7
DT 1853.9 1482.2 239.6 376.8 1556.5 217.5 252.7 849.0 1398.9 3236.4 4312.1 664.3
LR 1820.5 1481.7 260.3 441.5 1806.7 229.2 278.9 897.2 1705.3 3748.8 4600.3 710.2
Table 6
Prediction performance over R2 (↑).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 0.65 0.68 0.64 0.61 0.59 0.53 0.39 0.64 0.67 0.81 0.73 0.62
BDNN 0.40 0.62 0.53 0.46 0.49 0.50 0.32 0.60 0.59 0.74 0.71 0.58
XGBst 0.57 0.63 0.63 0.57 0.51 0.61 0.31 0.59 0.66 0.75 0.67 0.51
RF 0.41 0.51 0.47 0.43 0.25 0.36 0.09 0.46 0.61 0.67 0.55 0.50
KNN 0.30 0.41 0.33 0.34 0.20 0.25 −0.01 0.33 0.47 0.59 0.46 0.18
SVM −0.09 0.00 0.00 −0.03 −0.10 −0.07 −0.06 0.00 0.00 −0.02 0.00 0.01
DT 0.24 0.32 0.39 0.39 0.34 0.22 0.19 0.35 0.40 0.47 0.33 0.33
LR 0.22 0.22 0.22 0.19 0.15 0.12 0.00 0.24 0.16 0.32 0.22 0.24
The BDNN learning curve unveils valuable insights into the model's inherent behavior during training, revealing trends related to convergence speed, potential overfitting, and overall stability. A significant disparity between the training and validation curves may signify challenges in generalization and efficiency.

Our scrutiny extends to the ODNN, wherein the proposed optimization method is applied. By comparing the learning curves of both models, our goal is to elucidate the impact of optimization on the convergence rate, generalization performance, and resource efficiency.

This comparative analysis serves multiple purposes, including the evaluation of optimization techniques in terms of model stability, mitigating overfitting, and achieving resource-efficient convergence. Furthermore, the insights gleaned from the learning curves contribute to informed decision-making regarding the selection of the most suitable model for deployment in real-world energy consumption prediction applications.

Figs. 12 and 13 visually depict the learning curves for both BDNN and ODNN across various datasets. These figures provide a tangible representation of their respective training dynamics, highlighting the discernible benefits derived from the optimization process.

4.4. Statistical analysis

We have incorporated a comprehensive statistical analysis of the results, which is presented in Table 10. This ensures that the data is analyzed rigorously and that the results are reported with appropriate statistical measures, providing a clearer understanding of the significance and practical relevance of the findings. The following statistical analyses were applied:

• Statistical Tests: To evaluate the significance of the differences observed between the methods, we performed both t-tests and Analysis of Variance (ANOVA). Both were used for pairwise comparisons between two methods to assess the statistical significance of the differences observed.
• Effect Sizes and Confidence Intervals (CIs): In addition to p-values, we calculated effect sizes to assess the magnitude of the observed differences. Effect sizes help to understand not just whether a difference exists, but how substantial that difference is in practical terms. Furthermore, we report 95% confidence intervals (CIs) for key parameters, which provide a range of values within which the true population parameter is likely to fall with 95% confidence. This enhances the reliability of our findings and adds precision to the statistical reporting.
• Statistical Reporting: All statistical results are presented with p-values, effect sizes, and confidence intervals where applicable. This approach allows for a transparent interpretation of the data. The inclusion of effect sizes and CIs provides a more complete picture, facilitating a better understanding of the importance of observed differences.

Table 10 summarizes the statistical analysis of the results for key comparisons across methods. As shown in the table, all p-values are below the commonly accepted threshold of 0.05, indicating that the observed differences between the methods are statistically significant.
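The tests described above map directly onto SciPy. The sketch below uses hypothetical per-run error arrays (`odnn_err`, `baseline_err`) purely for illustration; the values are not the paper's data:

```python
# Hedged sketch of the significance analysis: pairwise t-test, Cohen's d effect
# size, and a 95% confidence interval.
import numpy as np
from scipy import stats

odnn_err = np.array([0.154, 0.149, 0.158, 0.151, 0.156])       # hypothetical
baseline_err = np.array([0.236, 0.229, 0.241, 0.233, 0.238])   # hypothetical

t_stat, p_value = stats.ttest_ind(odnn_err, baseline_err)      # pairwise t-test
pooled_sd = np.sqrt((odnn_err.var(ddof=1) + baseline_err.var(ddof=1)) / 2)
cohens_d = (odnn_err.mean() - baseline_err.mean()) / pooled_sd  # effect size
ci = stats.t.interval(0.95, len(odnn_err) - 1,
                      loc=odnn_err.mean(), scale=stats.sem(odnn_err))
print(f"p = {p_value:.4g}, Cohen's d = {cohens_d:.2f}, 95% CI = {ci}")
```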
Table 7
Prediction performance over MEDAE (↓).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 981.7 650.4 141.9 241.5 850.2 151.9 152.5 445.4 649.9 1452.9 2210.8 418.9
BDNN 1385.9 768.0 190.3 255.0 937.5 168.3 154.8 451.0 709.4 1623.3 2310.8 436.3
XGBst 1128.4 694.4 138.7 239.1 955.6 124.5 155.2 515.1 751.4 1569.9 2337.4 424.5
RF 1075.8 681.9 152.1 257.3 1028.9 134.7 159.0 513.2 710.6 1781.3 2342.2 450.1
KNN 1236.6 827.4 178.4 275.2 1124.9 145.0 160.2 602.5 906.9 2166.6 2740.2 473.6
SVM 1920.1 1370.1 257.5 375.3 1447.8 208.3 227.8 848.7 1452.2 3734.1 4222.5 669.8
DT 1784.9 1262.7 210.0 308.0 1242.8 200.0 186.4 713.1 1180.0 2648.2 3797.3 589.0
LR 1629.7 1156.4 209.2 384.8 1546.6 207.3 221.2 743.6 1522.2 3130.8 4047.3 590.1
Table 8
Prediction performance over EV (↑).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 0.67 0.69 0.65 0.62 0.60 0.68 0.39 0.65 0.69 0.78 0.73 0.53
BDNN 0.62 0.64 0.57 0.60 0.59 0.60 0.30 0.62 0.65 0.76 0.71 0.50
XGBst 0.58 0.63 0.63 0.57 0.52 0.61 0.32 0.59 0.66 0.75 0.67 0.51
RF 0.44 0.51 0.47 0.43 0.25 0.42 0.10 0.47 0.63 0.67 0.55 0.48
KNN 0.33 0.41 0.33 0.34 0.20 0.33 0.00 0.34 0.54 0.60 0.46 0.19
SVM 0.01 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.01
DT 0.39 0.33 0.40 0.39 0.35 0.35 0.19 0.37 0.41 0.49 0.35 0.33
LR 0.23 0.23 0.26 0.20 0.15 0.22 0.07 0.24 0.27 0.33 0.23 0.28
Table 9
Prediction performance of the predictive models over RAE (↓).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 0.58 0.47 0.59 0.62 0.59 0.74 0.71 0.58 0.54 0.43 0.52 0.61
BDNN 0.78 0.60 0.71 0.68 0.67 0.78 0.76 0.62 0.58 0.45 0.58 0.70
XGBst 0.65 0.58 0.59 0.64 0.66 0.64 0.75 0.63 0.57 0.47 0.57 0.67
RF 0.72 0.64 0.68 0.72 0.77 0.79 0.82 0.70 0.58 0.55 0.63 0.75
KNN 0.79 0.72 0.78 0.77 0.81 0.85 0.86 0.79 0.71 0.62 0.71 0.82
SVM 1.08 1.00 1.00 0.98 0.99 1.09 0.98 1.00 1.01 1.01 1.01 0.99
DT 0.91 0.87 0.79 0.78 0.79 0.96 0.85 0.83 0.81 0.72 0.85 0.82
LR 0.90 0.87 0.86 0.92 0.91 1.02 0.94 0.87 0.98 0.83 0.91 0.90
dual-phase strategy ensures both extensive search of the architecture space and fine-tuned optimization, addressing limitations of individual methods when applied independently.
2. Application to Energy Forecasting: Unlike prior studies that focus solely on traditional machine learning or optimization benchmarks, our work applies the hybrid GA-PSO framework to real-world energy forecasting tasks, including photovoltaic (PV) production and energy consumption. This focus on practical applications underscores the utility and impact of the method in the energy sector.
3. Scalability: The GA-PSO-enhanced ODNN demonstrates superior scalability and adaptability when applied to various datasets. The results show a significant improvement of 27% in the accuracy of the prediction and 22% in the error metrics compared to the state-of-the-art methods.
4. Dynamic Parameter Tuning: A novel aspect of the framework is the use of dynamic parameter adjustment during the PSO phase, which enhances convergence speed and accuracy. This dynamic tuning is a key innovation that differentiates the method from traditional optimization techniques.
5. Superior Performance: The GA-PSO framework consistently outperforms state-of-the-art hyperparameter optimization approaches, such as Bayesian Optimization and Evolutionary Strategies, across both benchmark tests and real-world forecasting tasks.

The proposed method offers significant advantages, including improved forecast accuracy, computational efficiency, and scalability. However, the hybrid nature of GA and PSO introduces a potential computational overhead, which could be mitigated by further optimizing the metaheuristic parameters. Despite this, the benefits outweigh the challenges, particularly for applications requiring high accuracy and adaptability. These contributions collectively establish the novelty of the proposed method and its potential to advance the state-of-the-art in energy forecasting, offering actionable insights and tools for practitioners in the energy sector.
5. Conclusion and future works

In this paper, we proposed a hybrid metaheuristic framework combining GA and PSO to optimize DNN architectures for accurate forecasting of energy consumption and PV production. The framework effectively automates neural network configuration, improving both prediction accuracy and computational efficiency. Experimental results demonstrate the robustness of the approach, with notable performance improvements over traditional methods, making it suitable for applications like smart grid management and renewable energy integration.

Future research could extend this framework to:

• Explore its adaptability to convolutional and recurrent networks for diverse forecasting tasks.
• Develop multi-objective optimization for balancing error minimization, computational efficiency, and interpretability.
• Integrate real-time data and online learning to enhance adaptability in dynamic environments.
• Apply the GA-PSO-DNN framework to optimize hybrid energy systems for better energy management and storage optimization.

By addressing these areas, the proposed framework can further advance neural network optimization and its application across energy and other sectors.

CRediT authorship contribution statement

Eghbal Hosseini: Developed the original concept, Methodology, Writing – original draft, Writing – review & editing. Barzan Saeedpour: Writing – original draft, Writing – review & editing. Mohsen Banaei: Data collection, Analysis, Interpretation. Razgar Ebrahimy: Writing – review & editing, Read and approved the final version.

Funding

This research work is supported by DTU Compute, Technical University of Denmark, Copenhagen, Denmark.
References

[43] E. Hosseini, A.M. Al-Ghaili, D.H. Kadir, S.S. Gunasekaran, A.N. Ahmed, N. Jamil, M. Deveci, R.A. Razali, Meta-heuristics and deep learning for energy applications: Review and open research challenges (2018–2023), Energy Strat. Rev. 53 (2024) 101409.
[44] A. Ponmalar, K. Vijayakumar, C. Lakshmipriya, M. Karthikeyan, B.P. PJ, Meta-heuristics and machine learning applications in complex systems, in: Meta-heuristic and Machine Learning Optimization Strategies for Complex Systems, IGI Global, 2024, pp. 257–275.
[45] A. Saha, S. Rajak, J. Saha, C. Chowdhury, A survey of machine learning and meta-heuristics approaches for sensor-based human activity recognition systems, J. Ambient. Intell. Humaniz. Comput. 15 (1) (2024) 29–56.
[46] B. Gao, S. Peng, T. Li, F. Wang, J. Guo, C. Liu, H. Zhang, Integration of improved meta-heuristic and machine learning for optimizing energy efficiency in additive manufacturing process, Energy 306 (2024) 132518.
[47] S.K. Chauhan, V.S. Chauhan, Meta-heuristic algorithms for optimal sizing of hybrid renewable energy systems, in: Metaheuristic and Machine Learning Optimization Strategies for Complex Systems, IGI Global, 2024, pp. 184–200.
[48] H. Hu, S. Gong, B. Taheri, Energy demand forecasting using convolutional neural network and modified war strategy optimization algorithm, Heliyon 10 (6) (2024).
[49] E. Hosseini, A.M. Al-Ghaili, D.H. Kadir, N. Jamil, M. Deveci, S.S. Gunasekaran, R.A. Razali, Extra dimension algorithm: a breakthrough for optimization and enhancing DNN efficiency, Artif. Intell. Rev. 58 (1) (2025) 1–35.
[50] R. Nikou, A. Goli, A. Zackery, Improving electricity demand forecasting through hybrid neural networks and meta-heuristics: A case study in Iran, J. Dyn. Games 12 (3) (2025) 243–266.
[51] K. Pathmapriya, P.J. Prathap, Integrating deep learning and meta-heuristics for healthcare: A survey, in: 2025 6th International Conference on Mobile Computing and Sustainable Informatics, ICMCSI, IEEE, 2025, pp. 910–918.
[52] E.C. Blessie, B. Sundaravadivazhagan, V. Kumutha, V. Sumesh, Predictive modeling of household power consumption using machine learning and meta-heuristic optimization technique, in: Machine Learning for Radio Resource Management and Optimization in 5G and Beyond, CRC Press, 2025, pp. 140–155.
[53] Z. Wang, W. Xue, K. Li, Z. Tang, Y. Liu, F. Zhang, S. Cao, X. Peng, E.Q. Wu, H. Zhou, Dynamic combustion optimization of a pulverized coal boiler considering the wall temperature constraints: A deep reinforcement learning-based framework, Appl. Therm. Eng. 259 (2025) 124923.
[54] Z. Wang, H. Zhou, X. Peng, S. Cao, Z. Tang, K. Li, S. Fan, W. Xue, G. Yao, S. Xu, A predictive model with time-varying delays employing channel equalization convolutional neural network for NOx emissions in flexible power generation, Energy 306 (2024) 132495.
[55] K.D. Lu, Z.G. Wu, T. Huang, Differential evolution-based three stage dynamic cyber-attack of cyber–physical power systems, IEEE/ASME Trans. Mechatronics 28 (2) (2022) 1137–1148.
[56] M.R. Chen, G.Q. Zeng, K.D. Lu, J. Weng, A two-layer nonlinear combination method for short-term wind speed prediction based on ELM, ENN, and LSTM, IEEE Internet Things J. 6 (4) (2019) 6997–7010.
[57] R.H. Shumway, D.S. Stoffer, ARIMA models, in: Time Series Analysis and Its Applications: With R Examples, 2017, pp. 75–163.
[58] A.A. Alsuwaylimi, Comparison of ARIMA, ANN and hybrid ARIMA-ANN models for time series forecasting, Inf. Sci. Lett. 12 (2) (2023) 1003–1016.
[59] P.T. Yamak, L. Yujian, P.K. Gadosey, A comparison between ARIMA, LSTM, and GRU for time series forecasting, in: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, 2019, pp. 49–55.