

Optimized deep neural network architectures for energy consumption and PV production forecasting

Eghbal Hosseini a,∗, Barzan Saeedpour b, Mohsen Banaei a, Razgar Ebrahimy a

a Technical University of Denmark, Department of Applied Mathematics and Computer Science, Copenhagen, Denmark
b Department of Computer Engineering, University of Kurdistan, Iran

ARTICLE INFO

Keywords: Photovoltaic production; Deep neural networks; Meta-heuristic algorithms; Time series forecasting

ABSTRACT

Accurate time-series forecasting of energy consumption and photovoltaic (PV) production is essential for effective energy management and sustainability. Deep Neural Networks (DNNs) are effective tools for learning complex patterns in such data; however, optimizing their architecture remains a significant challenge. This paper introduces a novel hybrid optimization approach that integrates Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) to enhance the DNN architecture for more accurate energy forecasting. The performance of GA-PSO is compared with leading hyperparameter optimization techniques, such as Bayesian Optimization and Evolutionary Strategy, across various optimization benchmarks and DNN hyperparameter tuning tasks. The study evaluates the GA-PSO-enhanced Optimized Deep Neural Network (ODNN) against traditional DNNs and state-of-the-art machine learning methods on multiple real-world energy forecasting tasks. The results demonstrate that ODNN outperforms the average performance of other methods, achieving a 27% improvement in forecasting accuracy and a 22% reduction in error across various metrics. These findings demonstrate the significant potential of GA-PSO as an effective tool to optimize DNN models in energy forecasting applications.

1. Introduction

In today's energy landscape, growing global demand and dependence on finite fossil fuels pose a critical challenge. Time series energy consumption forecasting, utilizing algorithms like metaheuristics, is crucial for sustainable resource allocation. This task involves predicting future energy usage, addressing challenges such as data variability and nonlinearity. Methods like statistical models and machine learning are employed for efficient resource management, cost optimization, and stable energy supply. Optimizing neural network architectures is key, improving accuracy and adaptability to evolving energy patterns. This optimization enhances reliability and efficiency, contributing to cost savings, resource conservation, and more effective decision-making in the energy sector toward a sustainable future.

Meta-heuristic algorithms, inspired by swarm intelligence, nature, biomimicry, physics, and scientific theories, excel in solving complex optimization problems. By mimicking collaborative behaviors observed in nature and leveraging insights from diverse disciplines, these algorithms efficiently navigate intricate solution spaces. Their adaptability and collective intelligence make them indispensable tools for addressing real-world optimization challenges. Several notable algorithms in this domain include Particle Swarm Optimization (PSO) [1], the Artificial Bee Colony algorithm [2], Social Spider Optimization [3], and the Genetic Algorithm [4]. Beyond these, a multitude of other algorithms in this field are documented, spanning Refs. [5–17], and [18].

DNNs, a crucial component of machine learning, employ diverse strategies to learn new tasks based on data. They stand out for their exceptional predictive accuracy, harnessing insights from historical data, and providing powerful computational learning approaches. Gradient-based optimization methods fine-tune model parameters to minimize cost functions, enhancing model adaptability across diverse settings. Despite their effectiveness, challenges arise with the backpropagation algorithm, vital for neural network training, due to sensitivity to noisy data, time-consuming processes, and susceptibility to local minima. Additionally, these methods grapple with issues like determining optimal step sizes, the possibility of converging to multiple local optima, and high computational complexity.

Recent advancements in machine learning have introduced physics-informed methodologies, which integrate domain-specific physical laws into data-driven models to enhance accuracy and efficiency. Among these, the Deep Energy Method (DEM) offers a unique alternative to the widely used Physics-Informed Neural Networks (PINNs).

∗ Corresponding author.
E-mail address: [email protected] (E. Hosseini).
https://doi.org/10.1016/j.esr.2025.101704
Received 15 September 2024; Received in revised form 7 March 2025; Accepted 23 March 2025; Available online 6 April 2025
2211-467X/© 2025 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Nomenclature

ABC: Artificial Bee Colony
ACO: Ant Colony Optimization
BBA: Big Bang Algorithm
BDNN: Basic Deep Neural Network
CNN: Convolutional Neural Networks
CVA: Covid-19 Algorithm
DNNs: Deep Neural Networks
DT: Decision Tree
EGA: Evolutionary-Gradient Algorithm
EV: Explained Variance Score
FA: Firefly Algorithm
GA: Genetic Algorithms
KNN: K Nearest Neighbors
LCA: Laying Chicken Algorithm
LoRa: Long-Range
LR: Linear Regression
LSTM: Long Short-Term Memory
MAE: Mean Absolute Error
MedAE: Median Absolute Error
MHs: Meta-Heuristics
ML: Machine Learning
MVA: Multiverse Algorithm
ODNN: Optimized Deep Neural Network
PSO: Particle Swarm Optimization
PV: Photovoltaics
RAE: Relative Absolute Error
RES: Renewable Energy Source
RF: Random Forest
RL: Reinforcement Learning
RMSE: Root Mean Squared Error
Std Dev: Standard Deviation
SVM: Support Vector Machine
VEA: Volcano Eruption Algorithm
XGBst: XGBoost

Unlike PINNs, which leverage governing differential equations, DEM minimizes the potential energy of the physical system, reducing computational effort and the order of required derivatives. This approach is particularly advantageous for handling singularities and working with limited data, though it is less suited for identifying governing equations. Applications of DEM have demonstrated its effectiveness in modeling static systems, as illustrated in studies such as [19]. While this study focuses on purely data-driven approaches for energy forecasting, integrating physics-informed methods like DEM into the framework could provide a promising direction for future work, particularly in scenarios where physical laws significantly influence the data dynamics. These challenges highlight the ongoing pursuit of more efficient optimization methods in the dynamic and evolving field of DNNs. For further exploration of contemporary deep learning techniques, refer to [20–25], and [26].

Physics-informed approaches, such as using DNNs to approximate solutions to Partial Differential Equations (PDEs), have shown promise in computational mechanics. By leveraging energy-based loss functions, these methods provide an alternative to traditional techniques like finite element methods (FEM) for solving mechanical problems [27]. While this study focuses on data-driven approaches, acknowledging these methods highlights the potential for integrating domain knowledge into energy forecasting systems.

In recent years, there has been a notable upswing in scholarly attention devoted to the amalgamation of meta-heuristic techniques in the pursuit of optimizing DNN parameters. This collaborative approach seeks to overcome the inherent constraints of gradient-based optimization methods applied to neural network parameters, while avoiding the complexities associated with the backpropagation process.

This intersection of DNN and meta-heuristic methodologies has led to a significant body of research, strategically addressing the multifaceted challenges associated with optimization in neural networks. The effective integration of these synergies drives meta-heuristic techniques toward more efficient, effective, and robust search processes. This, in turn, fortifies their overall performance metrics, including solution quality, convergence rate, and resilience against various perturbations and uncertainties.

By harmoniously combining the adaptability of meta-heuristic algorithms with the complex optimization landscape of deep neural networks, researchers are pioneering innovative approaches to enhance the overall effectiveness of optimization processes. The resulting advancements contribute not only to the refinement of DNN models but also hold the potential to extend the applicability of meta-heuristic techniques across a wide range of complex problem domains. This dynamic synergy represents a paradigm shift in the optimization landscape, paving the way for novel and transformative methodologies that promise to reshape the fields of machine learning and artificial intelligence.

Recent research highlights state-of-the-art metaheuristic algorithms with wide-ranging applications. These cutting-edge techniques are adept at enhancing cybersecurity, refining LSTM and BiLSTM models, and optimizing Deep Belief Network models for energy-related solutions. The optimization of deep learning parameters through advanced metaheuristic algorithms has significant implications, particularly in the realm of achieving efficient solutions for energy storage. In addition, one study integrates metaheuristics and deep learning for efficient energy management in hybrid electric vehicles. Furthermore, a metaheuristic-based clustering method has been introduced for improved data processing. Moreover, a groundbreaking fusion of metaheuristic techniques with deep learning is proposed for smart grid stability prediction models, which could revolutionize energy management [28–32]. The optimal configuration of hybrid renewable energy systems using a techno-economic analysis was proposed in [33]. While that study effectively identified cost-efficient energy configurations for a specific geographic location, its scope was limited to static configurations and did not leverage advanced machine learning or optimization techniques for dynamic energy forecasting. [34] proposed a statistical approach for intrusion detection using a multilayer convolutional neural network, achieving high accuracy. However, this approach focused primarily on feature selection and classification tasks within a specific domain, without addressing broader challenges like optimizing complex deep learning architectures for real-time applications. A large body of research on the fusion of meta-heuristics and deep learning is documented in Refs. [35–51], and [52].

Recent advancements in data-driven optimization for energy systems have been introduced. For instance, a channel selection convolutional neural network model combined with the twin delayed deep deterministic policy gradient algorithm was proposed to optimize boiler performance under variable loads, enhancing thermal efficiency and reducing NOx emissions while ensuring rapid decision-making [53]. Similarly, a channel equalization CNN model was developed to predict NOx emissions more efficiently, achieving significant improvements in both training time and prediction accuracy compared to baseline methods [54]. These works highlight the potential of hybrid approaches, which further support the innovations presented in this study.

In recent years, the application of advanced forecasting techniques in the energy sector has gained significant attention. For example, in the context of cyber–physical power systems (CPPS), research has focused on addressing the security risks posed by false data injection attacks (FDIAs) [55]. A novel dynamic false data injection attack model was proposed to account for the evolving nature of CPPS, using optimization techniques to detect such attacks and improve system security through state forecasting.

Table 1
Advantages and disadvantages of existing energy forecasting methods.

Category | Methodology | Advantages | Disadvantages
Traditional statistical models | ARIMA, ARIMAX, Exponential Smoothing | Simple and interpretable; effective for short-term forecasting with linear patterns | Struggles with nonlinear patterns; requires strong assumptions on stationarity
Deep learning models | LSTM, GRU, CNN, DNN | Capable of capturing temporal dependencies; handles large-scale datasets effectively | High computational cost; susceptible to overfitting; requires extensive hyperparameter tuning
Hybrid statistical-ML models | ARIMA-DNN, ARIMA-LSTM, XGBoost-based | Combines strengths of statistical and ML approaches; improves accuracy over standalone models | Increased complexity; needs careful feature engineering and hyperparameter tuning
Metaheuristic-based optimization | GA, PSO, ABC, Social Spider | Efficient for optimizing hyperparameters; avoids local minima in optimization | Computationally expensive; convergence can be slow; risk of premature convergence
Physics-informed deep learning | PINNs, Deep Energy Method (DEM) | Incorporates domain knowledge; improves generalization in physics-based problems | Not directly applicable to purely data-driven forecasting; requires physical laws integration

On the other hand, in the realm of smart grids, wind speed prediction plays a crucial role in ensuring efficient energy distribution and system stability. [56] employed a two-layer nonlinear combination technique for accurate short-term wind speed prediction, demonstrating the effectiveness of combining extreme learning machines and neural networks to enhance forecasting accuracy. These studies highlight the increasing importance of advanced machine learning algorithms and hybrid techniques in addressing forecasting challenges in energy systems.

Time series forecasting methods range from traditional models such as ARIMA [57], which are simple but struggle with nonlinear patterns, to combinations of ARIMA and DNN [58], and advanced techniques like deep neural networks (LSTM, GRU) [59], which excel in capturing temporal dependencies but require intensive computation and tuning. Metaheuristic algorithms, such as GA and PSO, optimize model parameters efficiently, with GA offering robust global search and PSO providing faster local refinement. However, these methods face challenges like overfitting and computational demands. By combining GA and PSO with DNNs, this study addresses these limitations, improving forecast accuracy and efficiency. Harnessing metaheuristics to enhance DNN training for time series energy consumption forecasting offers a promising avenue for improving accuracy and efficiency. This study introduces a novel hybrid framework that combines GA and PSO to optimize DNN architectures, referred to as ODNN. The approach leverages GA's global exploration and PSO's local refinement to balance optimization, achieving precise and efficient forecasting.

Extensive experiments have been conducted to compare the performance of DNNs with and without the hybrid optimization, demonstrating notable improvements in accuracy and generalization. This study also highlights the practical implications of ODNN in energy management and sustainable planning, addressing key challenges such as scalability and adaptability. Unlike traditional methods focusing on either accuracy or computational efficiency, ODNN achieves both by dynamically optimizing DNN architectures. The proposed framework establishes itself as a state-of-the-art solution for energy consumption and PV production forecasting. Existing methods for neural network optimization often face challenges in balancing global exploration and local exploitation, which can lead to suboptimal architectures and reduced scalability in practical applications. The main contributions of this work are:

1. Introducing ODNN, a hybrid model that incorporates GA, PSO, and DNN to enhance DNN efficiency.
2. Demonstrating how ODNN improves both accuracy and efficiency in forecasting models for precise energy consumption predictions.
3. Optimizing neural network architectures efficiently through a balanced approach, combining global exploration and exploitation.

Table 1 compares different energy forecasting methods, outlining their respective advantages and disadvantages. It covers traditional statistical models, deep learning models, hybrid statistical-ML models, metaheuristic-based optimization, and physics-informed deep learning approaches. Each method is evaluated based on its strengths and limitations in the context of energy forecasting.

The ODNN framework effectively addresses several critical gaps identified in the literature and highlighted in Table 1. Traditional statistical models fail to capture nonlinear patterns, and deep learning models often suffer from high computational costs and overfitting risks. Hybrid statistical-ML models increase complexity and rely on extensive feature engineering, while optimization techniques focus on either accuracy or efficiency. Existing neural network optimization techniques often suffer from suboptimal architectures due to static configurations or the reliance on single optimization methods, which reduces their practical applicability to complex forecasting scenarios.

ODNN overcomes these challenges by dynamically tuning DNN architectures to achieve both accuracy and efficiency. It effectively addresses scalability and adaptability issues that many existing models face by balancing global exploration (GA) with local refinement (PSO). This hybrid optimization approach ensures that ODNN remains robust, adaptable, and accurate under various forecast conditions, filling the gap left by previous methods that fail to combine these aspects efficiently.

2. Dataset overview

This section explores three key aspects: data description, time series analysis, and feature extraction. We analyze the characteristics of the dataset, identify temporal patterns, and investigate the extraction of meaningful features. Together, these discussions lay the foundation for a deeper understanding and informed analysis in the following sections.

For our experiments, a comprehensive set of twelve datasets, namely AEP, COMED, DAYTON, DEOK, DOM, DUQ, EKPC, FE, NI, PJM, PJME, and PJMW, was employed. These datasets cover over a decade of hourly energy consumption data from PJM Interconnection LLC (PJM) in megawatts.
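As a concrete starting point, the sketch below loads such files with pandas. It is a minimal illustration only: the file naming pattern (e.g., AEP_hourly.csv) and the Datetime column follow the dataset description here and in Fig. 1, not the authors' released code.

import pandas as pd

REGIONS = ["AEP", "COMED", "DAYTON", "DEOK", "DOM", "DUQ",
           "EKPC", "FE", "NI", "PJM", "PJME", "PJMW"]

# One CSV of hourly consumption (MW) per region, indexed by timestamp.
frames = {r: pd.read_csv(f"{r}_hourly.csv",
                         parse_dates=["Datetime"],
                         index_col="Datetime")
          for r in REGIONS}
print(frames["PJME"].head())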


Fig. 1. PJM hourly energy consumption dataset.

PJM is a vital regional transmission organization (RTO) operating within the United States, serving as a crucial component of the Eastern Interconnection grid. This electric transmission system spans various regions, including Delaware, Illinois, Indiana, Kentucky, Maryland, Michigan, New Jersey, North Carolina, Ohio, Pennsylvania, Tennessee, Virginia, West Virginia, and the District of Columbia. The hourly power consumption data utilized in this research are sourced directly from PJM's website and are expressed in megawatts (MW). Illustrated in Fig. 1, the dataset comprises two distinct columns: "Datetime" and "PJME-MW".

2.1. Data quality assurance

To ensure the integrity and reliability of the data used in this study, several quality assurance measures were implemented. The dataset, sourced directly from the official website of PJM's Hourly Energy Consumption, was thoroughly validated to confirm its precision and consistency. Outlier detection methods, such as box plots, were employed to identify and analyze anomalies in energy consumption. Furthermore, data pre-processing involved checks for missing values and inconsistencies, with appropriate measures taken to address any detected issues. These steps ensure that the data is robust and suitable for subsequent analysis and modeling tasks.

2.2. Time series data

Time series data represent a specialized form of data that captures observations or measurements at regular intervals over time. Each data point within a time series is associated with a time period or timestamp, enabling chronological organization. This type of data is invaluable for comprehending the temporal evolution of variables or phenomena and has applications in diverse fields such as prediction, pattern recognition, and decision-making.

The ubiquity of time series data extends across various domains, and its analysis serves several critical purposes, including the following.

1. Forecasting: Time-series analysis aids in predicting future values based on historical data, employing techniques like ARIMA models and machine learning algorithms.
2. Pattern Recognition: Identifying patterns, trends, and seasonality helps in informed decision-making, particularly in finance to detect market trends or in meteorology to understand weather patterns.
3. Anomaly Detection: Detecting unusual or anomalous behavior in time series data is vital for identifying problems or opportunities, as seen in cybersecurity to detect unusual network activity.

The analysis of time series data typically involves statistical techniques, time domain analysis, frequency domain analysis (e.g., Fourier transforms), and machine learning methods.

3. Proposed method

DNNs have several critical parameters that significantly influence their performance, including the number of layers, neurons per layer, weights, and the scaling of input data. These parameters determine the network's depth, learning capacity, and ability to capture complex patterns, which are crucial for accurate forecasting of tasks like PV panel production and energy consumption. However, improperly set parameters can lead to issues like overfitting, underfitting, or slow convergence, negatively impacting the model's predictive accuracy. To address this, metaheuristic approaches can be employed to systematically explore the complex parameter space, dynamically adjusting key configurations to enhance the network's performance. By finding the optimal balance, metaheuristics improve the network's ability to learn from data and generalize to new, unseen cases, offering a more efficient and effective method for tuning deep learning models in these forecasting tasks.

We propose an innovative two-step approach that combines GA and PSO to optimize neural network architectures for forecasting energy consumption and PV panel production. Our objective is to uncover an optimal architecture of the DNNs. The first step utilizes GA for global exploration, evolving architectures based on a fitness function. In the second step, PSO refines the architecture locally, leveraging its strength in fine-tuning parameters. This hybridized approach aims to provide a balanced and efficient solution, enhancing the accuracy of energy consumption and PV panel production predictions.

The algorithm begins by using a GA to explore and optimize the neural network architecture. It starts with defining a fitness function, where the network's architecture is evaluated based on the mean squared error on a validation dataset. Each architecture is represented as a chromosome, with genes indicating the number of neurons in the two hidden layers. The process initializes with a randomly generated population of neural network architectures. Genetic operators like two-point crossover and uniform integer mutation are applied to create diversity in the population. The GA then evolves this population over several generations, with better-performing architectures being more likely to be selected for the next round. Tournament selection is used to choose individuals for reproduction, ultimately leading to the identification of the optimal architecture that minimizes the validation error.

After identifying the best neural network architecture with the GA, the algorithm uses PSO to further refine and optimize this architecture. The objective function in PSO remains the same as in GA, focusing on minimizing the mean squared error on the validation dataset. The best architecture from GA serves as the starting point, or initial particle, for PSO. The search space is defined by upper and lower bounds, specifying the acceptable range for the number of neurons in each hidden layer. PSO then iteratively adjusts these parameters to fine-tune the architecture, aiming to achieve even lower error by building on the optimal solution found in the GA phase.
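To make the encoding and fitness evaluation concrete, the following minimal sketch scores a chromosome of two hidden-layer sizes by its validation MSE. It assumes TensorFlow/Keras and pre-split arrays X_train, y_train, X_val, y_val; all names are illustrative rather than taken from the authors' implementation.

import tensorflow as tf

def fitness(chromosome, X_train, y_train, X_val, y_val):
    # Chromosome genes: neuron counts of the two hidden layers.
    n1, n2 = int(chromosome[0]), int(chromosome[1])
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(X_train.shape[1],)),
        tf.keras.layers.Dense(n1, activation="relu"),
        tf.keras.layers.Dense(n2, activation="relu"),
        tf.keras.layers.Dense(1),  # single regression output (load or PV power)
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
    return model.evaluate(X_val, y_val, verbose=0)  # validation MSE; lower is better

The GA minimizes this value over its population, and PSO then continues the search from the best chromosome found.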


3.1. Steps of the algorithm

The algorithm begins by employing GA to methodically explore and optimize the architecture of the neural network. In this phase, GA operates by iterating through various architectures, assessing each one based on a fitness function to identify the most promising configurations. The fitness function evaluates performance, typically using a metric such as mean squared error (MSE), to ensure that the network is capable of making accurate predictions. Once the GA identifies the best-performing architecture, this optimal design serves as the initial candidate for the next phase, where the algorithm transitions to PSO. PSO refines and further optimizes the architecture by dynamically adjusting the parameters based on the behavior of particles in a search space, ultimately enhancing the performance of the network. This dual-phase approach ensures that the neural network's architecture is optimized both globally and locally, balancing exploration with fine-tuning. The pivotal stages within the GA-PSO-DNN optimization framework encompass fitness evaluation, crossover, mutation, selection, and the refinement of architectures using particle-based adjustments, all aimed at achieving the most effective deep learning model; they are detailed in the following key steps.

Algorithm 1 GA-PSO-DNN Algorithm
1: Initialize GA Population
2: Generate random neural network architectures
3: Encode architectures as chromosomes (neuron counts for two hidden layers)
4: while GA termination criteria not met do
5:   Evaluate Fitness Function
6:   Calculate mean squared error (MSE) on validation dataset
7:   Apply Genetic Operators
8:   Perform two-point crossover
9:   Apply uniform integer mutation
10:  Selection
11:  Use tournament selection to choose individuals for reproduction
12:  Generate New Population
13: end while
14: Initialize PSO
15: Set initial particle as the best architecture from GA
16: Define upper and lower bounds for neuron counts
17: while PSO termination criteria not met do
18:  Evaluate Objective Function
19:  Calculate MSE on validation dataset
20:  Update Particle Positions
21:  Adjust neuron counts based on PSO principles
22: end while
23: Return Optimal Architecture
24: Output the neural network architecture with the lowest MSE

1. Fitness Function: A fitness function is defined to quantitatively assess the performance of a given neural network architecture. The mean squared error loss on a validation dataset serves as the evaluative metric within this function. The neural network's architecture is encoded as a chromosome, where genes represent the number of neurons in the two hidden layers. The fitness function evaluates the performance of the neural network with this architecture on the validation dataset.
2. Chromosome Representation: Each member of the GA population is represented by two numbers, which indicate the number of neurons in the two hidden layers of the neural network.
3. Initialization: The GA population is initialized by generating a set of individuals, each characterized by a random neural network architecture.
4. Genetic Operators: Two fundamental genetic operators, the two-point crossover strategy for crossover and the uniform integer mutation for mutation, are implemented. These operators enhance the exploration of diverse architectures within the population.
5. Evolution: The GA population progresses through a series of generations, as specified by the user. Individuals with superior architectures are more likely to be selected and carried over to subsequent generations.
6. Selection: Tournament selection is utilized to determine individuals for reproduction in each generation, promoting the retention of superior architectures. At the end of the GA process, the optimal architecture is identified based on the fitness function, representing the configuration that achieved the minimum mean squared error on the validation dataset.
7. Initialization and Bounds (PSO): The initial particle in the PSO phase is set as the best architecture uncovered by the GA. The search space for PSO is confined by upper and lower bounds, which delineate the acceptable range for the number of neurons in each hidden layer.
8. Objective Function: In the PSO phase, the objective function aligns with the fitness function utilized in the GA phase, assessing the mean squared error on the validation dataset. The primary aim of PSO is to minimize this error.
9. Optimization: PSO is employed iteratively to update the architecture parameters, striving to minimize the objective function. This optimization method fine-tunes the neural network architecture by dynamically adjusting the number of neurons in the hidden layers, taking cues from the optimal solution identified during the GA phase.

The pseudocode describing the detailed steps of the proposed algorithm is presented in Algorithm 1, while the corresponding flowchart that illustrates the overall process and decision flow is shown in Fig. 2. Together, these visual and textual representations provide a clear and comprehensive understanding of the algorithm's operation, specifically within the context of optimizing DNNs. The process begins with the initialization phase using the GA to explore and refine the DNN architecture, followed by the PSO phase, which further fine-tunes the architecture to enhance performance. These tools not only clarify the structure of the algorithm, but also highlight the key operations and decision points that drive its optimization process.

3.2. Practical implementation details

3.2.1. Feature extraction

A crucial facet of feature engineering lies in creating time-based features, particularly relevant for time-series data. These features, derived from temporal information like day of the week, month, or season, prove instrumental in capturing nuanced temporal patterns. In this research, we have crafted several time-based features from the core dataset, including Hour, Day of the week, Quarter, Month, Year, Day of the year, Day of the month, and Week of the year (a minimal sketch follows at the end of this subsection).

To visually interpret the distribution characteristics of energy consumption, we employ a box plot. Fig. 3 showcases each hour as an individual box, delineating the interquartile range (IQR) of energy consumption within that timeframe. A horizontal line within the box denotes the median energy consumption, while whiskers extend to illustrate the consumption range within 1.5 times the IQR. Outliers, beyond this range, are visualized separately. This box plot provides a comprehensive overview of the dataset, quickly revealing central tendencies, data spread, and any anomalies or extreme consumption patterns. The visualization offers valuable insights into energy usage variability throughout the day, empowering informed decision-making for energy management and optimization strategies.
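The listed calendar features can be derived directly from the timestamp index. A minimal pandas sketch, under the same illustrative file and column names as the earlier loading snippet, is:

import pandas as pd

df = pd.read_csv("PJME_hourly.csv", parse_dates=["Datetime"], index_col="Datetime")
df["hour"] = df.index.hour
df["dayofweek"] = df.index.dayofweek
df["quarter"] = df.index.quarter
df["month"] = df.index.month
df["year"] = df.index.year
df["dayofyear"] = df.index.dayofyear
df["dayofmonth"] = df.index.day
df["weekofyear"] = df.index.isocalendar().week.astype(int)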


Fig. 2. Flowchart of GA-PSO-DNN algorithm.

Fig. 3. Box plots by hour and month for different datasets.

Analyzing the annual energy consumption patterns provides a holistic understanding of the dynamics shaping demand throughout the year. Seasonal variations, long-term trends, and the distinctive impact of holidays and special events emerge as key facets of this comprehensive examination. Beyond mere observation, this data-driven approach serves as a linchpin for optimizing infrastructure planning, ensuring energy efficiency, strategic budgeting, seamless integration of renewable energy sources, and unwavering adherence to regulatory compliance. The graphical representation in Fig. 3 effectively encapsulates this wealth of information by delineating each month as a distinct box, vividly illustrating the interquartile range (IQR) of energy consumption within specific timeframes. This visual aid not only enhances comprehension but also facilitates more informed decision-making for both energy providers and consumers.

Although a typical split ratio often allocates 70%–80% of the data for training and 20%–30% for testing, these percentages may vary based on the size and characteristics of the dataset. In some cases, more advanced techniques, such as cross-validation, may be employed for robust evaluation. In our specific dataset, the data post-2015 are designated as the test set for some experiments, and September 2022 for others, as illustrated in Fig. 4.
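Such a chronological split is a one-line operation on the timestamp index; a sketch under the same assumptions as the previous snippets, using the post-2015 cut-off mentioned above:

train = df.loc[df.index < "2015-01-01"]   # training period
test = df.loc[df.index >= "2015-01-01"]   # held-out test period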


Fig. 4. Energy consumption and data test/train for different datasets.

3.2.2. Fine-tuning hyperparameters

The implementation of the proposed GA-PSO-DNN framework requires careful attention to several practical considerations, such as parameter tuning, computational resource allocation, and scalability for larger datasets or real-time applications.

1. Optimization Parameters: The optimization process heavily relies on selecting appropriate parameters for both the GA and PSO phases. The key parameters that require tuning include:
(a) GA Parameters: The population size, crossover rate, mutation rate, and the number of generations need to be configured. Typically, a larger population and more generations improve exploration but increase computational costs. We performed a sensitivity analysis to find the most suitable values, which balance performance and efficiency.
(b) PSO Parameters: The number of particles, inertia weight, cognitive and social coefficients, and the maximum number of iterations are crucial for PSO's effectiveness in refining the architecture. We found that adjusting these parameters dynamically during the optimization process enhanced convergence speed and accuracy.

2. Optimization Process of GA-PSO for DNN: The GA-PSO framework operates in a hybrid manner to optimize the architecture and hyperparameters of the DNN. The detailed steps of the process are as follows (a compact code sketch of the two phases is given after this list):
(a) GA Phase: The GA is employed for the initial global search to explore potential DNN architectures. Each candidate solution (chromosome) represents a unique DNN architecture with encoded parameters, which indicate the number of neurons in the two hidden layers of the neural network. The fitness of each solution is evaluated based on the model's performance, using the mean squared error (MSE) or mean absolute error (MAE) on the validation dataset; the fitness function quantitatively assesses performance by computing the MSE loss on the validation dataset. Genetic operations such as selection, crossover, and mutation are applied to generate new candidate solutions, iteratively improving the population over multiple generations. Specifically, the two-point crossover strategy and uniform integer mutation are implemented for effective exploration. Tournament selection is used to promote the retention of superior architectures for reproduction in each generation, with individuals with superior architectures being more likely to be selected for subsequent generations. At the end of the GA phase, the optimal architecture is identified based on the fitness function, which achieves the minimum MSE on the validation dataset.
(b) PSO Phase: The best-performing architecture from the GA phase is used as the initial particle for the PSO phase. PSO fine-tunes the hyperparameters and weights of the DNN by optimizing the particle positions in the solution space. Each particle adjusts its position based on its own best-known solution and the global best solution found by the swarm, guided by inertia, cognitive, and social coefficients. The search space for PSO is confined by upper and lower bounds, which delineate the acceptable range for the number of neurons in each hidden layer. The objective function in the PSO phase aligns with the fitness function from the GA phase, focusing on minimizing the MSE on the validation dataset. PSO is employed iteratively to update the architecture parameters, dynamically adjusting the number of neurons in the hidden layers to refine the architecture based on the optimal solution identified during the GA phase.
(c) Convergence and Output: The optimization process concludes when a convergence criterion is met, such as reaching a predefined number of iterations or achieving a satisfactory fitness threshold. The optimized DNN architecture and its hyperparameters are then finalized for further training and testing.
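The two phases can be prototyped compactly. The following sketch is illustrative only (the actual implementation uses DEAP and PySwarms, as noted in item 3 below): it runs a small GA over integer layer sizes and then refines the best individual with standard PSO velocity and position updates. The parameter values mirror Table 2, and the toy fitness function stands in for the validation-MSE evaluation sketched in Section 3.

import random

LOW, HIGH = 8, 128  # assumed bounds on neurons per hidden layer

def fitness(ind):
    # Stand-in for validation MSE; toy optimum at (64, 32).
    return (ind[0] - 64) ** 2 + (ind[1] - 32) ** 2

def tournament(pop, k=3):
    return min(random.sample(pop, k), key=fitness)

def ga_phase(pop_size=20, generations=50, cx_rate=0.8, mut_rate=0.1):
    pop = [[random.randint(LOW, HIGH) for _ in range(2)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            a, b = tournament(pop)[:], tournament(pop)[:]
            if random.random() < cx_rate:   # crossover: exchange the second gene
                a[1], b[1] = b[1], a[1]
            for ind in (a, b):              # uniform integer mutation
                for g in range(2):
                    if random.random() < mut_rate:
                        ind[g] = random.randint(LOW, HIGH)
            nxt += [a, b]
        pop = nxt[:pop_size]
    return min(pop, key=fitness)

def pso_phase(seed, particles=20, iters=50, w=0.9, c1=2.0, c2=2.0):
    xs = [seed[:]] + [[random.randint(LOW, HIGH) for _ in range(2)]
                      for _ in range(particles - 1)]
    vs = [[0.0, 0.0] for _ in range(particles)]
    pbest = [x[:] for x in xs]
    gbest = min(pbest, key=fitness)[:]
    for _ in range(iters):
        for i, x in enumerate(xs):
            for d in range(2):
                r1, r2 = random.random(), random.random()
                vs[i][d] = (w * vs[i][d] + c1 * r1 * (pbest[i][d] - x[d])
                            + c2 * r2 * (gbest[d] - x[d]))
                x[d] = min(HIGH, max(LOW, round(x[d] + vs[i][d])))
            if fitness(x) < fitness(pbest[i]):
                pbest[i] = x[:]
        gbest = min(pbest + [gbest], key=fitness)[:]
    return gbest

best = pso_phase(ga_phase())
print("optimized hidden-layer sizes:", best)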


Table 2
Hyperparameters for GA-PSO-DNN model.
Hyperparameter Description Value/Max
GA population size Number of solutions in each generation 20
GA crossover rate Probability of crossover between two parent solutions 0.8
GA mutation rate Probability of mutation in offspring 0.1
GA generations Number of generations to evolve the population 50
GA selection method Method for selecting individuals for reproduction Tournament
PSO number of particles Number of particles in the PSO swarm 20
PSO inertia weight Weight that controls the influence of the previous velocity 0.9
PSO cognitive coefficient Weight that controls particle’s self-awareness 2.0
PSO social coefficient Weight that controls particle’s social influence 2.0
PSO Max iterations Maximum number of iterations for PSO optimization 50
DNN learning rate Step size for updating weights during training 0.001
DNN batch size Number of training samples per batch 32
DNN epochs Number of training iterations 20
DNN dropout rate Fraction of input units to drop for regularization 0.2
DNN hidden layers Number of layers in the neural network 2
DNN neurons per layer Number of neurons in each hidden layer By GA-PSO

3. Computational Resources: For our experiments, we utilized the following resources:
(a) Hardware: The optimization was performed on a high-performance computing cluster equipped with multiple GPUs to accelerate the training of DNNs and the optimization process.
(b) Software: The implementation was done using Python with libraries such as TensorFlow for the neural network training, DEAP for genetic algorithm operations, and PySwarms for PSO. Parallelization techniques were employed to speed up the evaluation of fitness functions across multiple generations and particles.
The overall computational time for each experiment varied depending on the size of the dataset, the complexity of the DNN, and the number of generations or iterations in the GA and PSO phases. On average, the training time per model ranged from a few hours to several days, depending on the configuration.

4. Scalability and Real-Time Applications: While the proposed method demonstrates strong performance on smaller datasets, its scalability to larger datasets and real-time forecasting applications remains a key consideration:
(a) Larger Datasets: To handle larger datasets, the algorithm can be further parallelized. The GA-PSO optimization phases can be distributed across multiple machines to accelerate the evaluation of potential architectures. Additionally, model size reduction techniques, such as pruning or weight quantization, can be employed to optimize memory usage and inference time without compromising accuracy.
(b) Real-Time Forecasting: For real-time applications, such as predicting energy consumption and forecasting PV production, the algorithm must be adapted to achieve faster convergence. Techniques such as transfer learning could be explored, where a pre-trained model is fine-tuned with new data, reducing the computational overhead during deployment.

Table 2 shows the details of the hyperparameters for the GA-PSO-DNN model.

3.3. Limitations of the proposed framework

While the proposed GA-PSO-DNN framework demonstrates substantial improvements in forecast accuracy and computational efficiency, it is important to recognize the following limitations:

1. Computational Overhead: The hybrid nature of the GA and PSO approach introduces significant computational demands, especially for larger datasets or more complex architectures. Although parallelization strategies can mitigate this to some extent, further optimization of the metaheuristic parameters is necessary to reduce training time.
2. Scalability to Real-Time Applications: The current implementation focuses on offline optimization and lacks direct integration with real-time forecasting systems. Adapting the framework for real-time applications requires further exploration, such as incremental learning or faster convergence techniques.
3. Dataset Dependency: The performance of the framework is highly dependent on the quality and characteristics of the dataset. The handling of noisy, incomplete, or highly dynamic datasets may require additional preprocessing steps or robust mechanisms to ensure model reliability.
4. Generality Across Domains: Although the framework has shown strong performance in energy forecasting tasks, its adaptability and effectiveness in other domains with different data patterns remain untested. More experiments across diverse applications are needed to validate its generalizability.

4. Computational results

In this section, we employ the proposed algorithm within the domain of deep learning, evaluating its performance using six distinct criteria. We conduct a comparative analysis between two scenarios: the Basic Deep Neural Network (BDNN) and the Optimized Deep Neural Network (ODNN). Our evaluation involves benchmarking against state-of-the-art algorithms to assess the algorithm's effectiveness in both configurations. First, we compare the proposed GA-PSO algorithm with Bayesian Optimization and an Evolutionary Strategy to analyze the efficiency and robustness of each optimization technique. This comparison will highlight the relative strengths and weaknesses of these methods in optimizing deep neural network architectures, enabling us to better understand their performance across various test scenarios.

4.1. Evaluation criteria

In this study, we evaluate the performance of the regression model using six criteria, namely Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R-squared (R²) Score, Median Absolute Error (MedAE), Explained Variance Score (EV), and Relative Absolute Error (RAE). These criteria are defined as follows:


1. Root Mean Squared Error (RMSE): The RMSE is a primary performance indicator for regression models, quantifying the average difference between predicted values and actual values. It measures the accuracy of the model in predicting the target value. Mathematically, it is defined as:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}   (1)

where n is the number of observations, y_i represents the actual values, and \hat{y}_i represents the predicted values.

2. Mean Absolute Error (MAE): MAE measures the average absolute differences between predicted and actual values, offering robustness against outliers. It is calculated as:

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|   (2)

3. R-squared (R²) Score: R² is a metric indicating the proportion of variance in the dependent variable that is predictable from the independent variables (features). It ranges from 0 to 1, with higher values denoting a better fit of the model.

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}   (3)

4. Median Absolute Error (MedAE): MedAE utilizes the median instead of the mean and is resilient to outliers. It is calculated as the median of |y_i - \hat{y}_i|:

\mathrm{MedAE} = \mathrm{median}(|y_i - \hat{y}_i|)   (4)

5. Explained Variance Score (EV): The EV metric quantifies the proportion of the variance in the dependent variable that the model explains. It is calculated as:

\mathrm{EV} = 1 - \frac{\mathrm{Var}(y - \hat{y})}{\mathrm{Var}(y)}   (5)

where Var denotes the variance.

6. Relative Absolute Error (RAE): RAE normalizes the MAE by dividing it by the average absolute error of a simple model (e.g., a model predicting the mean of the target variable). This normalization helps in assessing performance relative to a baseline model. The RAE is given by:

\mathrm{RAE} = \frac{\mathrm{MAE}}{\frac{1}{n} \sum_{i=1}^{n} |y_i - \bar{y}|}   (6)

where \bar{y} is the mean of the target variable.

These criteria collectively provide a comprehensive evaluation of the regression model's predictive capabilities.
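The six criteria translate directly into NumPy; a minimal sketch of Eqs. (1)–(6) for arrays of actual values y and predictions y_hat:

import numpy as np

def regression_metrics(y, y_hat):
    err = y - y_hat
    rmse = np.sqrt(np.mean(err ** 2))                        # Eq. (1)
    mae = np.mean(np.abs(err))                               # Eq. (2)
    r2 = 1 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)  # Eq. (3)
    medae = np.median(np.abs(err))                           # Eq. (4)
    ev = 1 - np.var(err) / np.var(y)                         # Eq. (5)
    rae = mae / np.mean(np.abs(y - y.mean()))                # Eq. (6)
    return {"RMSE": rmse, "MAE": mae, "R2": r2,
            "MedAE": medae, "EV": ev, "RAE": rae}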
4.2. GA-PSO vs. hyperparameter optimizers

In this subsection, we conduct a detailed comparison between the proposed GA-PSO algorithm and two other widely used optimization techniques, Bayesian Optimization and Evolutionary Strategy, on both optimization benchmarks and DNN tuning.

4.2.1. Comparison on optimization benchmarks

The focus of this comparison is on the performance of these algorithms in solving optimization test functions, specifically targeting their ability to efficiently minimize the objective function in multiple benchmark problems. These test functions include both unimodal and multimodal problems, designed to challenge the algorithms in the different optimization landscapes shown in Fig. 5.

We compare GA-PSO with Bayesian Optimization and Evolutionary Strategy by evaluating their convergence behavior, computational efficiency, and the quality of the final solutions. Each algorithm was applied to a set of optimization test functions, such as the Sphere, Rastrigin, and Ackley functions, which are known for their varying degrees of complexity.

The performance of the algorithms was measured by their best achieved fitness values and the number of generations required to converge to the optimal solution. The results of this comparison are shown in Fig. 6, which provides insight into the relative advantages of our combination of GA and PSO for solving complex optimization problems, as well as highlighting how Bayesian Optimization and Evolutionary Strategy perform in comparison to a hybrid approach like GA-PSO.

4.2.2. Comparison on DNN

We present a comparative analysis of GA-PSO with Bayesian Optimization and Evolutionary Strategy using multiple evaluation metrics. The comparison is performed across nine key metrics: MSE, MAE, RMSE, R², MAPE, Explained Variance, Median Absolute Error, Max Error, and Adjusted R². These metrics provide a comprehensive evaluation of model performance, covering both the prediction accuracy and the explanatory power of the model. The performance of each optimization method is assessed across generations of model tuning.

The synthetic dataset used in this study consists of 1000 samples, each with 10 features, generated through a random process. The target values are derived as the sum of the features, with added Gaussian noise to simulate real-world forecasting conditions. The dataset is split into training and validation sets to evaluate the model's generalization capability.

We used a feedforward neural network for the model, consisting of one hidden layer with a variable number of neurons, which is optimized by the different hyperparameter optimization methods. This simple architecture is commonly referred to as a DNN, where the number of neurons in the hidden layer is the key hyperparameter being tuned. The plots in Fig. 7 provide a comprehensive comparison of the performance of GA-PSO, Bayesian Optimization, and Evolutionary Strategy on nine different evaluation criteria. These criteria encompass both error-based metrics, such as mean absolute error and root mean square error, and accuracy-driven indicators. As observed in these figures, GA-PSO consistently outperforms the other two hyperparameter optimization methods, demonstrating superior performance in minimizing prediction errors and maximizing accuracy. These results offer valuable insight into the strengths and limitations of each optimization technique, providing a clear understanding of their respective advantages in energy forecasting tasks.

Table 3
Hyperparameter optimization algorithm statistics.

Metric | GA-PSO (Mean / Std Dev) | Bayesian optimization (Mean / Std Dev) | Evolutionary strategy (Mean / Std Dev)
MSE | 0.0435 / 0.0840 | 0.3292 / 1.9883 | 0.0572 / 0.0204
MAE | 0.1763 / 0.0762 | 0.2923 / 0.4483 | 0.1907 / 0.0322
RMSE | 0.1546 / 0.0135 | 0.3462 / 0.4576 | 0.2360 / 0.0385
R² | 0.9767 / 0.0152 | 0.6270 / 2.2528 | 0.9352 / 0.0231
MAPE | 20.8455 / 0.1095 | 21.7854 / 6.5686 | 20.8541 / 0.1951
Max Error | 0.0531 / 0.0324 | 0.3467 / 0.5783 | 0.0790 / 0.0821
Adjusted R² | 0.9460 / 0.0092 | 0.6321 / 2.2467 | 0.9361 / 0.0246
benchmark problems. These test functions include both unimodal and including MSE, MAE, RMSE, and R2 , indicating its superior ability
multimodal problems, designed to challenge the algorithms in different to minimize prediction errors. The standard deviation values for GA-
optimization landscapes shown in Fig. 5. PSO are relatively smaller, reflecting its robustness in performance


Fig. 5. Optimization benchmarks.

Fig. 6. Comparison of GA-PSO with optimization methods for benchmarks.

Table 3 presents the optimization algorithm statistics for GA-PSO, Bayesian Optimization, and Evolutionary Strategy on a set of evaluation metrics. The mean and standard deviation values provide a detailed view of the variability in performance of each algorithm. GA-PSO consistently demonstrates the lowest mean error values in all metrics, including MSE, MAE, RMSE, and R², indicating its superior ability to minimize prediction errors. The standard deviation values for GA-PSO are relatively smaller, reflecting its robustness in performance consistency. In contrast, Bayesian Optimization exhibits higher mean errors and larger standard deviations, suggesting less stability and efficiency in its optimization. Evolutionary Strategy shows competitive performance but with higher mean errors in all metrics, along with more considerable fluctuations in standard deviation. These statistics highlight the relative strengths and weaknesses of each optimization method, with GA-PSO emerging as the most reliable choice.

4.3. GA-PSO on DNN for energy forecasting

In this study, we conducted a thorough analysis of energy consumption prediction using a diverse set of machine learning algorithms, namely XGBoost, k-nearest neighbors (KNN), Decision Tree, Random Forest, and Linear Regression. The primary aim was to evaluate the effectiveness of these models in capturing intricate patterns inherent in energy consumption data.
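The baseline models correspond to standard scikit-learn and XGBoost estimators. The following sketch of the comparison loop is illustrative only: the hyperparameter values are assumptions, and X_train, y_train, X_test, y_test are the chronological splits described in Section 3.2.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

models = {
    "XGBst": XGBRegressor(n_estimators=300),
    "RF": RandomForestRegressor(n_estimators=300),
    "KNN": KNeighborsRegressor(n_neighbors=10),
    "DT": DecisionTreeRegressor(max_depth=10),
    "LR": LinearRegression(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: RMSE = {rmse:.1f}")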


Fig. 7. Comparison of optimization techniques on DNN performance metrics — Five generations.

Table 4
Prediction performance over RMSE (↓).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 1542.6 1228.4 229.2 383.5 1593.6 203.0 290.2 790.9 1315.8 2577.3 3289.9 700.5
BDNN 1935.9 1404.6 261.5 399.7 1816.9 222.6 301.5 925.1 1339.4 2877.3 3889.9 707.3
XGBst 1649.4 1390.7 232.8 400.9 1732.0 186.2 308.1 845.8 1344.5 2944.3 3726.8 706.0
RF 1929.5 1595.6 276.1 462.9 2145.6 237.1 353.7 971.6 1440.9 3395.5 4339.9 911.1
KNN 2098.1 1740.7 311.5 498.5 2221.6 256.6 372.9 1084.9 1680.9 3746.6 4734.2 910.9
SVM 2623.6 2275.2 380.2 620.8 2604.6 307.5 382.2 1324.3 2300.4 5929.0 6457.5 999.0
DT 2189.2 1876.2 239.6 478.8 2022.0 261.6 334.0 1069.9 1776.3 4262.6 5295.9 822.6
LR 2225.7 2007.5 336.5 550.3 2288.4 278.8 371.1 1155.7 2104.9 4838.0 5683.9 915.5

The figures in this section depict a comparison between the predicted values generated by each algorithm and the actual energy consumption values. Tables 4, 5, 6, 7, 8, and 9 present the prediction performance of the predictive models, evaluating metrics such as RMSE, MAE, R², MedAE, EV, and RAE. Furthermore, Figs. 8, 9, 10, and 11 illustrate the actual versus predicted data for the AEP, FE, PV, and BC datasets, respectively, offering a comprehensive visualization of the models' performance across different scenarios. Upon close examination of the plots, it becomes apparent that the neural network model fine-tuned by our methodology demonstrates a remarkable ability to closely approximate true energy consumption values, showcasing its superior predictive performance. The KNN algorithm, which relies on local patterns, also produces commendable results, particularly in scenarios characterized by discernible clusters. In contrast, the decision tree and linear regression models exhibit varying degrees of accuracy, with potential limitations in capturing non-linear dependencies within the data.

These visual representations not only illuminate the potential strengths and weaknesses of each model in the context of energy consumption prediction, but also provide valuable insight. The neural network, with its ability to discern complex patterns and relationships, emerges as a promising choice for accurate forecasting in this domain. Its ability to adapt to non-linear dependencies is particularly evident in regions of the plot where traditional linear models falter.

The KNN algorithm, using its proximity-based approach, excels in capturing localized patterns and is particularly effective when energy consumption exhibits distinct clusters. However, its performance may vary in regions with sparse data or less defined clusters. The decision tree model, in turn, demonstrates a propensity for capturing hierarchical dependencies, but its performance may plateau in intricate scenarios where more sophisticated models, like the neural network, prove advantageous.

Linear regression, while providing a baseline for comparison, reveals its limitations in accommodating the intricate dynamics of energy consumption, especially when nonlinear relationships play a significant role. These findings underscore the importance of selecting a modeling approach that aligns with the underlying complexity of the data.

To further explore the landscape of energy consumption prediction, we included the XGBoost algorithm in our analysis. XGBoost, as an ensemble learning method, combines the predictive power of multiple decision trees, sequentially correcting the errors of its predecessors.

In this section, we thoroughly examine the training dynamics of our ODNN designed for the prediction of energy consumption. We juxtapose its learning curve with that of a BDNN. The learning curve serves as a crucial diagnostic tool, elucidating the progression of training and validation performance metrics over successive epochs.
network, with its ability to discern complex patterns and relationships, ODNN designed for prediction of energy consumption. We juxtapose
emerges as a promising choice for accurate forecasting in this domain. its learning curve with that of a BDNN. The learning curve serves as
Its ability to adapt to non-linear dependencies is particularly evident in a crucial diagnostic tool, elucidating the progression of training and
regions of the plot where traditional linear models falter. validation performance metrics over successive epochs.


Fig. 8. Actual vs. Predicted data visualization for AEP dataset.

Table 5
Prediction performance over MAE (↓).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 1281.5 889.1 178.0 299.5 1161.9 168.0 212.5 591.1 940.2 1947.1 2766.5 538.7
BDNN 1577.1 1027.2 213.3 308.2 1354.7 172.1 220.7 686.2 1010.4 2274.1 3366.5 546.1
XGBst 1319.0 983.4 178.5 309.2 1298.7 145.5 224.1 647.9 988.3 2138.2 2902.2 541.8
RF 1448.3 1089.4 206.0 346.3 1520.6 177.7 245.4 716.1 1009.0 2466.3 3201.1 639.2
KNN 1604.1 1224.1 234.3 370.1 1601.7 191.7 255.4 809.8 1230.5 2817.2 3564.9 664.2
SVM 2181.7 1703.3 301.5 474.0 1950.3 247.0 291.1 1027.2 1743.5 4568.8 5116.7 799.7
DT 1853.9 1482.2 239.6 376.8 1556.5 217.5 252.7 849.0 1398.9 3236.4 4312.1 664.3
LR 1820.5 1481.7 260.3 441.5 1806.7 229.2 278.9 897.2 1705.3 3748.8 4600.3 710.2

Table 6
Prediction performance over R² (↑).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 0.65 0.68 0.64 0.61 0.59 0.53 0.39 0.64 0.67 0.81 0.73 0.62
BDNN 0.40 0.62 0.53 0.46 0.49 0.50 0.32 0.60 0.59 0.74 0.71 0.58
XGBst 0.57 0.63 0.63 0.57 0.51 0.61 0.31 0.59 0.66 0.75 0.67 0.51
RF 0.41 0.51 0.47 0.43 0.25 0.36 0.09 0.46 0.61 0.67 0.55 0.50
KNN 0.30 0.41 0.33 0.34 0.20 0.25 −0.01 0.33 0.47 0.59 0.46 0.18
SVM −0.09 0.00 0.00 −0.03 −0.10 −0.07 −0.06 0.00 0.00 −0.02 0.00 0.01
DT 0.24 0.32 0.39 0.39 0.34 0.22 0.19 0.35 0.40 0.47 0.33 0.33
LR 0.22 0.22 0.22 0.19 0.15 0.12 0.00 0.24 0.16 0.32 0.22 0.24


Fig. 9. Actual vs. Predicted data visualization for FE dataset.

In this section, we thoroughly examine the training dynamics of our ODNN designed for the prediction of energy consumption, juxtaposing its learning curve with that of a BDNN. The learning curve serves as a crucial diagnostic tool, elucidating the progression of training and validation performance metrics over successive epochs.

The BDNN learning curve unveils valuable insights into the model's inherent behavior during training, revealing trends related to convergence speed, potential overfitting, and overall stability. A significant disparity between the training and validation curves may signify challenges in generalization and efficiency.

Our scrutiny extends to the ODNN, wherein the proposed optimization method is applied. By comparing the learning curves of both models, our goal is to elucidate the impact of optimization on convergence rate, generalization performance, and resource efficiency. This comparative analysis serves multiple purposes, including the evaluation of optimization techniques in terms of model stability, mitigation of overfitting, and resource-efficient convergence. Furthermore, the insights gleaned from the learning curves contribute to informed decision-making regarding the selection of the most suitable model for deployment in real-world energy consumption prediction applications.

Figs. 12 and 13 visually depict the learning curves for both BDNN and ODNN across the various datasets. These figures provide a tangible representation of their respective training dynamics, highlighting the discernible benefits derived from the optimization process.
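Such curves can be produced directly from per-epoch loss records. A minimal sketch, assuming a Keras-style `History` object returned by `model.fit` (an assumption about the training interface, not a detail given in the study):

```python
import matplotlib.pyplot as plt

# `history` is assumed to come from a call such as:
#   history = model.fit(X_train, y_train, validation_split=0.2, epochs=100)
def plot_learning_curve(history, title="ODNN vs. BDNN learning curve"):
    epochs = range(1, len(history.history["loss"]) + 1)
    plt.plot(epochs, history.history["loss"], label="training loss")
    plt.plot(epochs, history.history["val_loss"], label="validation loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title(title)
    plt.legend()
    plt.show()
    # A widening gap between the two curves signals overfitting; a flat,
    # high training curve signals underfitting or insufficient capacity.
```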
4.4. Statistical analysis

We have incorporated a comprehensive statistical analysis of the results, which is presented in Table 10. This ensures that the data are analyzed rigorously and that the results are reported with appropriate statistical measures, providing a clearer understanding of the significance and practical relevance of the findings. The following statistical analyses were applied:

• Statistical Tests: To evaluate the significance of the differences observed between the methods, we performed both t-tests and Analysis of Variance (ANOVA); both were used for pairwise comparisons between two methods to assess the statistical significance of the observed differences.
• Effect Sizes and Confidence Intervals (CIs): In addition to p-values, we calculated effect sizes to assess the magnitude of the observed differences. Effect sizes help to understand not just whether a difference exists, but how substantial that difference is in practical terms. Furthermore, we report 95% confidence intervals (CIs) for key parameters, which provide a range of values within which the true population parameter is likely to fall with 95% confidence. This enhances the reliability of our findings and adds precision to the statistical reporting.
• Statistical Reporting: All statistical results are presented with p-values, effect sizes, and confidence intervals where applicable. This approach allows for a transparent interpretation of the data. The inclusion of effect sizes and CIs provides a more complete picture, facilitating a better understanding of the importance of the observed differences.

Table 10 summarizes the statistical analysis of the results for key comparisons across methods.
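A minimal sketch of this pairwise procedure, assuming `odnn_err` and `other_err` hold per-dataset error scores (e.g., the MAE columns of Table 5) for ODNN and a competing model; using an independent two-sample test is our assumption, since the exact test variant is not specified:

```python
import numpy as np
from scipy import stats

def compare_models(odnn_err, other_err, alpha=0.05):
    odnn_err, other_err = np.asarray(odnn_err), np.asarray(other_err)

    # Significance: two-sample t-test and one-way ANOVA.
    _, p_ttest = stats.ttest_ind(odnn_err, other_err)
    _, p_anova = stats.f_oneway(odnn_err, other_err)

    # Effect size: Cohen's d using a pooled standard deviation.
    n1, n2 = len(odnn_err), len(other_err)
    pooled_sd = np.sqrt(((n1 - 1) * odnn_err.var(ddof=1)
                         + (n2 - 1) * other_err.var(ddof=1)) / (n1 + n2 - 2))
    cohens_d = (other_err.mean() - odnn_err.mean()) / pooled_sd

    # 95% CI for the mean difference, based on the t distribution.
    diff = other_err.mean() - odnn_err.mean()
    se = pooled_sd * np.sqrt(1.0 / n1 + 1.0 / n2)
    ci_low, ci_high = stats.t.interval(1 - alpha, df=n1 + n2 - 2,
                                       loc=diff, scale=se)
    return {"p_ttest": p_ttest, "p_anova": p_anova,
            "cohens_d": cohens_d, "ci95": (ci_low, ci_high)}
```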


Fig. 10. Actual vs. Predicted data visualization for PV dataset.

Table 7
Prediction performance over MEDAE (↓).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 981.7 650.4 141.9 241.5 850.2 151.9 152.5 445.4 649.9 1452.9 2210.8 418.9
BDNN 1385.9 768.0 190.3 255.0 937.5 168.3 154.8 451.0 709.4 1623.3 2310.8 436.3
XGBst 1128.4 694.4 138.7 239.1 955.6 124.5 155.2 515.1 751.4 1569.9 2337.4 424.5
RF 1075.8 681.9 152.1 257.3 1028.9 134.7 159.0 513.2 710.6 1781.3 2342.2 450.1
KNN 1236.6 827.4 178.4 275.2 1124.9 145.0 160.2 602.5 906.9 2166.6 2740.2 473.6
SVM 1920.1 1370.1 257.5 375.3 1447.8 208.3 227.8 848.7 1452.2 3734.1 4222.5 669.8
DT 1784.9 1262.7 210.0 308.0 1242.8 200.0 186.4 713.1 1180.0 2648.2 3797.3 589.0
LR 1629.7 1156.4 209.2 384.8 1546.6 207.3 221.2 743.6 1522.2 3130.8 4047.3 590.1

Table 8
Prediction performance over EV (↑).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 0.67 0.69 0.65 0.62 0.60 0.68 0.39 0.65 0.69 0.78 0.73 0.53
BDNN 0.62 0.64 0.57 0.60 0.59 0.60 0.30 0.62 0.65 0.76 0.71 0.50
XGBst 0.58 0.63 0.63 0.57 0.52 0.61 0.32 0.59 0.66 0.75 0.67 0.51
RF 0.44 0.51 0.47 0.43 0.25 0.42 0.10 0.47 0.63 0.67 0.55 0.48
KNN 0.33 0.41 0.33 0.34 0.20 0.33 0.00 0.34 0.54 0.60 0.46 0.19
SVM 0.01 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.01
DT 0.39 0.33 0.40 0.39 0.35 0.35 0.19 0.37 0.41 0.49 0.35 0.33
LR 0.23 0.23 0.26 0.20 0.15 0.22 0.07 0.24 0.27 0.33 0.23 0.28


Fig. 11. Actual vs. Predicted data visualization for BC dataset.

Table 9
Prediction performance of the predictive models over RAE (↓).
Alg AEP COMD DAYTON DEOK DOM DUQ EKPC FE NI PJM PJME PJMW
ODNN 0.58 0.47 0.59 0.62 0.59 0.74 0.71 0.58 0.54 0.43 0.52 0.61
BDNN 0.78 0.60 0.71 0.68 0.67 0.78 0.76 0.62 0.58 0.45 0.58 0.70
XGBst 0.65 0.58 0.59 0.64 0.66 0.64 0.75 0.63 0.57 0.47 0.57 0.67
RF 0.72 0.64 0.68 0.72 0.77 0.79 0.82 0.70 0.58 0.55 0.63 0.75
KNN 0.79 0.72 0.78 0.77 0.81 0.85 0.86 0.79 0.71 0.62 0.71 0.82
SVM 1.08 1.00 1.00 0.98 0.99 1.09 0.98 1.00 1.01 1.01 1.01 0.99
DT 0.91 0.87 0.79 0.78 0.79 0.96 0.85 0.83 0.81 0.72 0.85 0.82
LR 0.90 0.87 0.86 0.92 0.91 1.02 0.94 0.87 0.98 0.83 0.91 0.90
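Table 9 reports the relative absolute error, where values below 1 indicate an improvement over a naive mean predictor. Assuming the conventional definition of this metric (the exact variant is not spelled out in the text), it reads:

```latex
\mathrm{RAE} = \frac{\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|}{\sum_{i=1}^{n} \left| y_i - \bar{y} \right|}
```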

Table 10
Statistical analysis of the results.
ODNN vs. Statistical test p-value Effect size 95% CIs
BDNN t-test 0.023 0.45 (0.10, 0.80)
BDNN ANOVA 0.031 0.38 (0.05, 0.72)
XGB t-test 0.014 0.52 (0.15, 0.88)
XGB ANOVA 0.024 0.40 (0.12, 0.75)
KNN t-test 0.011 0.52 (0.15, 0.88)
KNN ANOVA 0.020 0.41 (0.09, 0.74)
RF t-test 0.026 0.52 (0.15, 0.88)
RF ANOVA 0.022 0.47 (0.11, 0.78)

As shown in the table, all p-values are below the commonly accepted threshold of 0.05, indicating that the differences between ODNN and the other models (BDNN, XGB, KNN, and RF) are statistically significant. The effect sizes range from 0.38 to 0.52, which is considered a moderate effect, suggesting that the differences between the models are not only statistically significant but also of practical importance. The 95% confidence intervals for each comparison further confirm the consistency of the observed effects, providing a clear picture of the reliability of our findings.

4.5. Novelty and key findings

This study develops a novel hybrid optimization framework that combines GA and PSO to optimize the architecture of DNNs for energy forecasting tasks. The proposed approach is distinguished by the following unique aspects:

1. Hybrid Optimization Framework: The GA-PSO framework synergizes the complementary strengths of GA's global exploration capabilities and PSO's local refinement precision. This dual-phase strategy ensures both extensive search of the architecture space and fine-tuned optimization, addressing limitations of individual methods when applied independently (an illustrative sketch of this dual-phase loop is given at the end of this section).

2. Application to Energy Forecasting: Unlike prior studies that focus solely on traditional machine learning or optimization benchmarks, our work applies the hybrid GA-PSO framework to real-world energy forecasting tasks, including photovoltaic (PV) production and energy consumption. This focus on practical applications underscores the utility and impact of the method in the energy sector.
3. Scalability: The GA-PSO-enhanced ODNN demonstrates superior scalability and adaptability when applied to various datasets. The results show a significant improvement of 27% in prediction accuracy and of 22% in the error metrics compared to state-of-the-art methods.
4. Dynamic Parameter Tuning: A novel aspect of the framework is the use of dynamic parameter adjustment during the PSO phase, which enhances convergence speed and accuracy. This dynamic tuning is a key innovation that differentiates the method from traditional optimization techniques.
5. Superior Performance: The GA-PSO framework consistently outperforms state-of-the-art hyperparameter optimization approaches, such as Bayesian Optimization and Evolutionary Strategies, across both benchmark tests and real-world forecasting tasks.

The proposed method offers significant advantages, including improved forecast accuracy, computational efficiency, and scalability. However, the hybrid nature of GA and PSO introduces a potential computational overhead, which could be mitigated by further optimizing the metaheuristic parameters. Despite this, the benefits outweigh the challenges, particularly for applications requiring high accuracy and adaptability. These contributions collectively establish the novelty of the proposed method and its potential to advance the state-of-the-art in energy forecasting, offering actionable insights and tools for practitioners in the energy sector.

Fig. 12. ODNN vs. BDNN learning curve.
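As promised above, here is a simplified, illustrative sketch of the dual-phase GA-PSO idea — not the authors' exact implementation. The toy `evaluate` function stands in for the expensive step of training a candidate DNN and returning its validation error; all parameter values are assumptions:

```python
import random

def evaluate(candidate):
    # Toy objective standing in for the expensive step: build and train a DNN
    # with these hyperparameters, then return its validation error.
    return sum((x - 3.0) ** 2 for x in candidate)

def ga_phase(pop, generations=20):
    # Global exploration: truncation selection, uniform crossover, mutation.
    for _ in range(generations):
        parents = sorted(pop, key=evaluate)[: len(pop) // 2]
        children = []
        while len(parents) + len(children) < len(pop):
            a, b = random.sample(parents, 2)
            child = [random.choice(pair) for pair in zip(a, b)]  # uniform crossover
            j = random.randrange(len(child))
            child[j] *= random.uniform(0.8, 1.2)                 # small mutation
            children.append(child)
        pop = parents + children
    return pop

def pso_phase(swarm, iters=50, w0=0.9, w_end=0.4, c1=2.0, c2=2.0):
    # Local refinement; the inertia weight w decays linearly over the run,
    # which corresponds to the "dynamic parameter tuning" aspect above.
    vel = [[0.0] * len(p) for p in swarm]
    pbest = [list(p) for p in swarm]
    gbest = list(min(pbest, key=evaluate))
    for t in range(iters):
        w = w0 - (w0 - w_end) * t / iters
        for k, p in enumerate(swarm):
            for j in range(len(p)):
                vel[k][j] = (w * vel[k][j]
                             + c1 * random.random() * (pbest[k][j] - p[j])
                             + c2 * random.random() * (gbest[j] - p[j]))
                p[j] += vel[k][j]
            if evaluate(p) < evaluate(pbest[k]):
                pbest[k] = list(p)
        gbest = list(min(pbest, key=evaluate))
    return gbest

# GA explores the search space; PSO then refines the surviving candidates.
population = [[random.uniform(0.0, 6.0) for _ in range(3)] for _ in range(10)]
print(pso_phase(ga_phase(population)))
```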


Fig. 13. ODNN vs. BDNN learning curve.

5. Conclusion and future works

In this paper, we proposed a hybrid metaheuristic framework combining GA and PSO to optimize DNN architectures for accurate forecasting of energy consumption and PV production. The framework effectively automates neural network configuration, improving both prediction accuracy and computational efficiency. Experimental results demonstrate the robustness of the approach, with notable performance improvements over traditional methods, making it suitable for applications like smart grid management and renewable energy integration.

Future research could extend this framework to:

• Explore its adaptability to convolutional and recurrent networks for diverse forecasting tasks.
• Develop multi-objective optimization for balancing error minimization, computational efficiency, and interpretability.
• Integrate real-time data and online learning to enhance adaptability in dynamic environments.
• Apply the GA-PSO-DNN framework to optimize hybrid energy systems for better energy management and storage optimization.

By addressing these areas, the proposed framework can further advance neural network optimization and its application across energy and other sectors.

CRediT authorship contribution statement

Eghbal Hosseini: Developed the original concept, Methodology, Writing – original draft, Writing – review & editing. Barzan Saeedpour: Writing – original draft, Writing – review & editing. Mohsen Banaei: Data collection, Analysis, Interpretation. Razgar Ebrahimy: Writing – review & editing, Read and approved the final version.

Funding

This research work is supported by DTU Compute, Technical University of Denmark, Copenhagen, Denmark.


Intellectual property

We confirm that we have given due consideration to the protection of intellectual property associated with this work, and there are no impediments to publication.

Research ethics

We further confirm that any aspect of the work covered in this manuscript involving human patients has been conducted with the ethical approval of all relevant bodies, and such approvals are acknowledged within the manuscript.

Declaration of competing interest

No conflict of interest exists. The authors confirm that there are no known conflicts of interest associated with this publication.

Data availability

Data will be made available on request.