
Machine Learning Approaches for Power System Parameters Prediction: A


Systematic Review

Article in IEEE Access · May 2024


DOI: 10.1109/ACCESS.2024.3397676



Received 1 February 2024, accepted 2 May 2024, date of publication 6 May 2024, date of current version 17 May 2024.
Digital Object Identifier 10.1109/ACCESS.2024.3397676

Machine Learning Approaches for Power System Parameters Prediction: A Systematic Review
TOLULOPE DAVID MAKANJU, THOKOZANI SHONGWE (Senior Member, IEEE),
AND OLUWOLE JOHN FAMORIJI (Member, IEEE)
Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Johannesburg 2006, South Africa
Corresponding author: Tolulope David Makanju ([email protected])

ABSTRACT Prediction in the power system network is very crucial as expansion is needed in the network.
Several methods have been used to predict the load on a network, from short to long time load prediction,
to ensure adequate planning for future use. Since the power system network is dynamic, other parameters,
such as voltage and frequency prediction, are necessary for effective planning against contingencies. Also,
most power systems are interconnected networks; using isolated variables to predict any part of the network
tends to reduce prediction accuracy. This review analyzed different machine learning approaches used
for load, frequency, and voltage prediction in power systems and proposed a machine learning predictive
approach using network topology behavior as input variables to the model. The proposed approach
was tested using a regression model, a decision tree regressor, and long short-term memory (LSTM). The
results indicate that with network topology behavior as input to the model, the prediction is
more accurate than when isolated variables of a particular bus in a network are used for prediction. This
work suggests that network topology behavior data should be used for prediction in a power system network
rather than the use of isolated data of a particular bus or exogenous data for prediction in a power system.
Therefore, this research recommends that the accuracy of different predictive models be tested on power
system parameters by hybridizing the network topology behavior dataset and the exogenous dataset.

INDEX TERMS Frequency, load prediction, machine learning, power system, voltage prediction.

The associate editor coordinating the review of this manuscript and approving it for publication was Nagesh Prabhu.
2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 12, 2024
T. D. Makanju et al.: Machine Learning Approaches: A Systematic Review

I. INTRODUCTION
Electrical energy is one of the critical elements of any nation's development [1]. Many factors affect the generation and distribution of clean and reliable power to consumers. Modern electrical systems face many challenges, such as a mismatch between generation and load caused by the loss of a generator or a load, which disturbs the system's parameters and can cause total or partial collapse [2]. Various control strategies reduce the effects of these challenges and allow the power system to operate normally during contingencies. In addition, approaches such as Supervisory Control and Data Acquisition (SCADA) systems and smart grid technology have been used to monitor and control power system parameters to reduce the system's downtime and failure rate. However, these approaches are still not widely implemented in some developing countries, which reduces the reliability of the electricity grid in those countries. Electricity supply is a major problem for many developing countries, especially those in sub-Saharan Africa. The inability of many grid networks to supply adequate and reliable electricity to consumers is due to a lack of proper planning in the grid network, which in most cases results in a system failure called collapse. With prediction techniques, critical operating parameters of the power system, such as load, voltage, and frequency, can be predicted, which helps the utility company plan properly. The accuracy of the prediction technique is very important, as many variables need to be considered to develop an accurate predictive model for any specific operation.

Various parameters contribute to any power system failure; keeping the operating parameters within permissible limits keeps the system safe. One of the tasks of the field engineer is to ensure that the power system network parameters operate within acceptable limits, which has led to different monitoring and predictive techniques for determining when certain operations need to be initiated. Since power flow in any grid system is dynamic, there are various approaches for predicting parameters such as voltage, frequency, and real and reactive power, whose operation outside the rated values can lead to failure in the system. The techniques used for predicting these parameters are classified into traditional and machine learning techniques. Traditional approaches can be based on statistical models or rule-based models.

A rule-based prediction model depends on predefined conditions. Experts in the system create these conditions through a knowledge acquisition process, and they can take the form of if statements, if-then statements, logic operators, mathematical functions, etc. In a rule-based model, variables are predicted by subjecting the input data to the rules and determining the output, which makes the rule-based prediction model transparent and interpretable. However, rule-based prediction has limitations in handling complex or dynamic data patterns, which makes it less accurate for power system parameter prediction, since operating parameters in power systems are dynamic. Also, updating or modifying the rules can be time-consuming and requires system expertise.

Statistical model prediction uses statistical techniques to forecast or estimate future outcomes based on historical data. Statistical models identify patterns and relationships in the data and use them to predict future observations. However, statistical models have the following limitations:
I) Assumptions: statistical models often make assumptions about the underlying data, such as linearity or independence. Violations of these assumptions can lead to inaccurate predictions.
II) Overfitting or underfitting: if a statistical model is overly complex or lacks complexity, it may result in overfitting (fitting to noise) or underfitting (failing to capture the underlying patterns).
III) Data quality: the accuracy of predictions relies heavily on the quality and relevance of the input data. Inaccurate or incomplete data can negatively impact the performance of statistical models.
IV) Inability to handle complex relations: statistical models tend not to capture complex relationships or nonlinear patterns in the data. This makes them less efficient for power system operations because interconnected power system grid networks are nonlinear.

Since power system operating parameters are dynamic in nature, machine learning techniques offer the following advantages over the traditional approaches, which are based on statistical models and rule-based systems, for predicting power system parameters:
I) Improved Accuracy: machine learning algorithms can analyze large amounts of data and identify complex patterns that may not be apparent to human operators. This leads to more accurate predictions of power system parameters such as load demand, voltage, power generated, etc.
II) Flexibility and Adaptability: machine learning models are highly flexible and can adapt to changing conditions in the power system. They can learn from new data and update their predictions accordingly, allowing for real-time decision-making and optimization of power system operations.
III) Handling Nonlinearity: power systems exhibit nonlinear behaviors due to the interaction of various components, such as generators, transmission lines, and loads. Machine learning techniques, such as neural networks and support vector machines, excel at modeling nonlinear relationships, making them well-suited for power system prediction tasks.
IV) Integrating Uncertainty: power system predictions are subject to various uncertainties, such as fluctuations in power generation and unexpected changes in load demand. Machine learning algorithms can incorporate probabilistic models and uncertainty estimation techniques to provide probabilistic forecasts, enabling better risk assessment and decision-making.
V) Scalability: power systems generate enormous amounts of data from numerous sources, including smart meters, sensors, and SCADA systems. Machine learning techniques can efficiently handle large-scale datasets and perform computations in parallel, making them scalable for analyzing power system data.
VI) Reducing Costs and Improving Efficiency: by accurately predicting power system parameters, machine learning can optimize the scheduling of generation units, minimize transmission losses, and improve energy management, which leads to cost savings, improved system reliability, and better infrastructure utilization.

The ability of machine learning approaches to capture nonlinear relationships between variables and to handle enormous amounts of data makes them superior to statistical and rule-based prediction. The value of a prediction technique lies in its accuracy, which depends on the algorithm and on the data formulation used for the prediction. Many predictions in power systems have been carried out using exogenous data, such as temperature, population, and consumer income, as the input to the model to predict different parameters in a particular substation or network. However, since most power system networks are interconnected, the prediction of any parameter at any power station should also depend on the parameters of the other interconnected buses. This is known as the network topology behavior (NTB): the operating condition of all the interconnected buses in the network, together with the measures used to keep the grid network stable, such as load shedding and the use of compensators for voltage improvement. Using NTB tends to improve the accuracy of prediction and reduce system failure by ensuring effective planning. The use of network topology behavior is essential in a power system


because most of the grid networks are linked to one another, which implies that the parts of the system have an interconnected effect on one another.

Since the parameters of the sending end bus affect the parameters of the receiving end bus, predicting the parameters of the receiving end bus using only its own isolated variables may not give an accurate predictive model, because the input is based on the operating parameters of the receiving end bus only. However, predicting the receiving end bus parameters using the variables of the sending end bus or other interconnected buses in the network tends to increase the accuracy of the predictive model, because all the occurrences in the network are used as the input to the model. With this approach, the parameters of any bus in an interconnected network can be predicted using the operating variables of other buses. This approach allows the system operator to plan well against contingencies.

Furthermore, since the power system network is dynamic and unique to each country, the dataset for prediction may differ from region to region. However, a model can suggest different variables that can be used as input to improve model accuracy despite the differences between datasets. It is, therefore, necessary to model the prediction of power system parameters based on the network topology behavior to improve the accuracy of predictive models for parameters that can lead to system failure if they operate outside the permissible limits. Accurate prediction of power system load, frequency, and voltage will enable system operators to initiate adequate action to avoid system collapse.

This research reviews and analyzes existing literature on load, frequency, and voltage prediction, highlights the pros and cons of each technique used, and proposes an input variable formulation for power system prediction based on network topology behavior. Load flow simulation was carried out on the IEEE 5-bus system under different operating conditions to establish the effect of network topology behavior on bus voltage. The data generated was used for load and voltage prediction in the network. Likewise, one week of hourly data for 21 load buses and the total active load was obtained from the Nigeria Electricity Regulating Commission (NERC) to test the proposed input formulation for hourly load prediction based on network topology behavior. In addition, an hourly dataset, which includes loading, voltage, and frequency, was obtained from a 330/132 kV transmission station in Nigeria to evaluate the proposed network topology behavior as the input formulation for hourly frequency prediction. Furthermore, three models were used to test the proposed input formulation for load, voltage, and frequency prediction, which are classified into a supervised (Regression and Decision Tree) algorithm and an unsupervised (Long Short-Term Memory, LSTM) machine learning algorithm, using different data formulations as the input to the model. The input data formulation was based on two cases: case one, when only the isolated bus parameter is used as the input to the model, and case two, when the proposed network topology behavior is considered as the input to the model. The research establishes that the proposed input formulation based on the network topology behavior is better than using the isolated bus parameter or a single bus parameter for prediction in the power system network.

II. MACHINE LEARNING TECHNIQUES USED FOR POWER SYSTEM PARAMETERS
Due to the importance of prediction in power systems, the machine learning algorithms that can be used for prediction in power systems include, but are not limited to, the following, according to [3], [4], [5], [6], [7], [8], [9], and [10].

Auto Regression Moving Average (ARMA) Model: ARMA is a time series prediction model that uses the historical values of the targeted variable to predict the future. Other features associated with the targeted variable cannot be used as inputs to the model; only the past values of the targeted variable can be used as inputs. The model is simple to implement but unsuitable for prediction where other features strongly influence the targeted variable, because it cannot handle linear or nonlinear relationships between the targeted variable and additional features. There are two major components in the ARMA model: Auto Regression (AR) and Moving Average (MA). The AR model forecasts the future value based on past values. The model has an order k, which is the number of past values in the time series dataset used to predict the value at time instant t. The AR model is presented in equation (1). The model is often used for load prediction in power systems.

$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \dots + \alpha_k y_{t-k} + \mu_t \qquad (1)$$

where $\alpha_0, \alpha_1, \dots, \alpha_k$ are the parameters of the model, $y_t$ is the predicted value at time t, and $\mu_t$ represents the average of the time series.

The Moving Average (MA) is an indicator from technical analysis that is widely used to smooth noise based on lagging. The order l in MA models refers to the l previous errors. MA can be formalized as presented in equation (2).

$$X_t = \beta_0 + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \dots + \beta_l \varepsilon_{t-l} + \varepsilon_t \qquad (2)$$

where $\beta_0, \beta_1, \beta_2, \dots, \beta_l$ are the parameters of the model and $\varepsilon_t$ are the errors in the model until time t.

ARMA combines Auto Regression (AR) and Moving Average (MA) and can be formulated as presented in equation (3).

$$y_t = C + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \dots + \alpha_k y_{t-k} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \dots + \beta_l \varepsilon_{t-l} \qquad (3)$$

where $y_t$ is the value of the targeted variable at time t, C is a constant representing a baseline in the time series, $\alpha_1, \alpha_2, \dots, \alpha_k$ are the AR coefficients that indicate the relationship between the current value $y_t$ and the past values $y_{t-1}, y_{t-2}, \dots, y_{t-k}$ up to lag k, and $\varepsilon_t$ is the white noise or error term at instant t. The error term represents other, unobserved factors that influence the targeted data.
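As a rough illustration of equation (3), the sketch below simulates an ARMA series and computes a one-step-ahead forecast in plain Python. The function names `simulate_arma` and `arma_one_step`, the coefficient values, and the Gaussian noise are illustrative assumptions and do not come from the reviewed works. The forecast replaces the unknown current error with zero, its expected value.

```python
import random


def simulate_arma(n, c, alpha, beta, sigma=1.0, seed=0):
    """Generate n samples following equation (3).

    alpha holds the AR coefficients (alpha_1..alpha_k) and beta the MA
    coefficients (beta_1..beta_l). Lags before the start of the series
    are treated as zero (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]  # white-noise errors
    y = []
    for t in range(n):
        ar = sum(a * y[t - 1 - i] for i, a in enumerate(alpha) if t - 1 - i >= 0)
        ma = sum(b * eps[t - 1 - j] for j, b in enumerate(beta) if t - 1 - j >= 0)
        y.append(c + ar + ma + eps[t])
    return y, eps


def arma_one_step(y_hist, eps_hist, c, alpha, beta):
    """Forecast the next value from past values and past errors.

    The current error term is unknown at forecast time, so it is replaced
    by its expected value, zero.
    """
    if len(y_hist) < len(alpha) or len(eps_hist) < len(beta):
        raise ValueError("not enough history for the requested orders")
    ar = sum(a * y_hist[-1 - i] for i, a in enumerate(alpha))
    ma = sum(b * eps_hist[-1 - j] for j, b in enumerate(beta))
    return c + ar + ma


if __name__ == "__main__":
    # Hypothetical ARMA(2, 1) coefficients chosen only for demonstration.
    y, eps = simulate_arma(200, c=0.1, alpha=[0.5, 0.2], beta=[0.4], seed=42)
    forecast = arma_one_step(y[:150], eps[:150], 0.1, [0.5, 0.2], [0.4])
    print(f"forecast for t=150: {forecast:.3f}, actual: {y[150]:.3f}")
```

In practice the coefficients and past errors are not known in advance and have to be estimated from the data; the sketch only shows how the terms of equation (3) combine.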


$\beta_1, \beta_2, \dots, \beta_l$ are the MA coefficients that represent the relationship between the current error at time instant t and the past errors $\varepsilon_{t-1}, \varepsilon_{t-2}, \dots, \varepsilon_{t-l}$ up to lag l. ARMA cannot handle non-stationarity or remove seasonality in time series data, which makes the Auto Regression Integrated Moving Average (ARIMA) model, which can handle non-stationary data and remove the trend in a time series dataset, a better choice.

Auto Regression Integrated Moving Average (ARIMA): ARIMA is a machine learning algorithm based on the time variation in the targeted variable. The idea of the ARIMA model is that it uses past occurrences, with respect to the season or time, to predict the future at a particular time instant. The input to the model can only be the past values of the targeted variable. Other exogenous data that affect the targeted variable cannot be used as the input to the model. Since seasonality and trends influence the accuracy of a time series model, ARIMA converts non-stationary data to stationary data and also removes the trends or seasonality in the dataset. The ARIMA model is not effective in a dynamic system such as an interconnected power system network, where operating features have strong effects on one another. ARIMA can be used for prediction in power systems when the progression in load is to be considered for a region or community in order to plan for future expansion. ARIMA is generally represented by ARIMA (P, D, Q), where P and Q stand for the orders of the AR and MA parts of the model, and D represents the number of differencing (integration) steps used. To determine the orders P and Q that give the best results from the ARIMA model, the best approaches are autocorrelation (AC) and partial autocorrelation (PAC) [6]. The AC is the degree of similarity between a time series and its lags; it takes values between -1 and 1. It is used to determine the order Q of the moving average component in ARIMA. If there is any seasonality in the data, the AC plot shows a remarkable spike. By pointing out the lag after which the autocorrelation becomes statistically insignificant, the AC plot helps determine the proper order of the moving average component.

Additionally, the PAC plot can be used to estimate the order P of the auto regression. By eliminating the indirect correlations caused by intervening lags, the PAC plot illustrates the direct link between a time series and its lagged values. In addition, tests are conducted to ascertain the dataset's stationarity, which determines the order D. Reference [6] proposed two tests: the Dickey-Fuller (DF) test and the rolling statistics plot test. The rolling statistics plot test is a chart analysis approach that plots a rolling average of the data to determine whether a trend exists. The data is considered stationary if the rolling average does not indicate a trend. The DF test is based on a null hypothesis, and the nature of the series (i.e., stationary or not) is determined by evaluating the p-value returned by the test. If the p-value is less than 0.05, which corresponds to a 5% significance level, the null hypothesis is rejected in favor of the alternative hypothesis, and the data can be said to be stationary [6]. It should be noted that stationarity may not be achievable for some datasets; in such a situation, ARIMA is not suitable for prediction. Hence, seasonal ARIMA (SARIMA) can be used to capture seasonal data. In addition to the ARIMA parameters (P, D, Q), a fourth parameter that captures seasonality is added, and the model is formulated as SARIMA (P, D, Q, M), where P and Q represent the AR and MA orders respectively, D represents the integration, and M represents the seasonal period, which can be in hours, days, months, etc. For hourly data M is 24, and for daily data M is 7. Since ARIMA and SARIMA involve integration to remove the trend that makes the data non-stationary, another machine learning model designed for predicting time series data points with trend or seasonality is Exponential Smoothing (ES).

Exponential Smoothing (ES) Model: ES is a time series forecasting method used for predicting future data points based on the historical values of a time series. It is particularly useful when the data has a trend and/or seasonality. The basic idea behind exponential smoothing is to give more weight to recent observations and progressively less weight to older observations [6]. There are various types of ES that can be used for prediction depending on the complexity of the dataset. Equation (4) represents the simple exponential smoothing (SES) model. SES consists of three components: the level, the error, and the smoothing parameter. The forecast for the next time period in the dataset is equal to the current level, as presented in equation (4).

$$y_{t+1} = \alpha k_t + (1 - \alpha) y_t + \varepsilon \qquad (4)$$

where $y_{t+1}$ is the predicted value at time t+1, $k_t$ is the actual value of the variable at time t, $y_t$ is the predicted value at time t, $\alpha$ is the smoothing parameter, and $\varepsilon$ is the error in the model. The advantage of ES over ARMA, ARIMA, and SARIMA is that it can capture the trend in the dataset without any integration. However, ES can only use the historical data of the targeted variable to predict future values. It is difficult for ES to capture the relationship with other variables that influence the targeted variable. Due to this constraint, a regression-based model is often used where other features affect the targeted variable.

Linear Regression (LR): Linear regression underlies many regression-based machine learning models, in which the fundamental approach to prediction is the regression model. The regression model allows the targeted variable to be predicted using past data related to it. The model is trained on the past values of the targeted variable and the associated data to predict a future value of the targeted variable. This type of prediction is more robust than the ARMA, ARIMA, or SARIMA models because other variables related to the targeted variable can be used as input to the model. This approach can be used effectively for power system operating parameter prediction since it can capture the relationship between the operating parameters. Linear regression model can be used for power system prediction


where the parameters of the network have a linear dependency on one another. However, linear regression models assume linear relationships between variables, so they do not give an accurate forecast for nonlinear variables [6], [9].

Decision Tree (DT): DT is a supervised machine learning algorithm for classification and regression tasks. It recursively partitions the data into subsets based on the most significant attribute at each step. The goal is to create a tree structure where each internal node represents a decision based on an attribute, each branch represents the outcome, and each leaf node represents the final decision or prediction. The algorithm automatically selects the most relevant features for decision-making, making it robust to irrelevant or redundant features. DT is better than the regression model at handling nonlinear relationships between the features and the target variable. It can be used for prediction in a dynamic system such as an interconnected power system network since it can capture the nonlinear relationships between features. However, the decision tree model tends to overfit, and it can fit outliers in the dataset. This limitation can be overcome or mitigated by using a large dataset or by pruning techniques, including setting a minimum sample size, and by data preprocessing to check for outliers. In addition, DT may struggle to capture complex relationships in the dataset that the random forest model can handle better. DT is not suitable for prediction when the relationships between the features are complex.

Random Forest (RF): Random forest is an ensemble learning method that constructs many decision trees during training and outputs the mode of the classes (classification) or the mean prediction of the individual trees (regression). It is one of the most powerful and widely used machine learning algorithms due to its versatility and high predictive accuracy. Random forest generally provides high accuracy in classification and regression tasks. In addition, the ensemble approach and the randomness introduced during tree construction make random forest less prone to overfitting than decision trees, which makes it better than DT and linear regression. It can also handle large datasets with less training time compared to DT and linear regression. However, because of its ensemble nature, the random forest model cannot handle the effect that an extreme outlier has on prediction accuracy; in such cases, data preprocessing is recommended. Also, RF tends to perform worse on small datasets than DT and linear regression because the random sampling and feature selection are less effective with a small dataset. Moreover, information may be lost when random subsets of features are drawn at each node, which tends to affect the model's accuracy. Despite these shortcomings, random forest is widely used when accuracy is the primary goal. Because random forest is ineffective with small datasets, it is also important to consider a machine learning model that is less sensitive to outliers and can handle complex relations with small datasets.

Support Vector Regression (SVR): SVR is used for regression-based tasks. It is an extension of Support Vector Machines (SVM) for regression problems. SVR works by finding a hyperplane in a high-dimensional space that best represents the relationship between the input features and the target variable. It can handle complex relationships between the features of small datasets. It requires more training time because it is computationally complex. Training an SVR model is expensive for large datasets because it requires large memory, which makes it less efficient for large datasets. Furthermore, SVR cannot handle outliers; this can be addressed by preprocessing the dataset. The application of SVR for prediction in a power system can be more expensive because of the large memory required to train on a large dataset.

FIGURE 1. Neural network architecture.

Neural Network (NN): A neural network is a computational model inspired by the structure and function of the human brain. It is composed of interconnected nodes (artificial neurons) organized in layers. Neural networks are a fundamental component of deep learning, a subset of machine learning that involves training models with multiple layers (deep architectures). The neural network architecture consists of the input layer, hidden layers, and output layer. The architecture is presented in Figure 1. Each layer contains nodes (neurons), and the network connections are associated with weights. In the feedforward process, input data is fed into the input layer, passes through the hidden layers, and the output layer generates an output. Each connection has a weight, and each node has an associated activation function that determines its output. The activation functions introduce nonlinearity to the model. Standard activation functions include sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU). An NN learns from data through backpropagation. The difference between the predicted and actual output (the error) is calculated during training. This error is then propagated backward through the network, and the weights are updated to minimize the error. Optimization algorithms, such as gradient descent, adjust the weights iteratively, reducing the error and improving the model's performance. The implementation of a neural network in a power system for prediction parameters such


as load, voltage, and frequency will increase the accuracy of
prediction because the neural network can adapt to complex
patterns and scales to handle large and high-dimensional
datasets. However, neural networks are complex to implement,
and the training process requires more time compared to other
machine learning models.
Long Short-Term Memory (LSTM): LSTM is a type of recurrent
neural network (RNN) architecture designed to address the
vanishing gradient problem, a challenge that traditional RNNs
face when trying to capture long-term dependencies in
sequential data. LSTMs are particularly effective for tasks
involving time series, which makes them more suitable for
power system parameter prediction because the model is trained
on past values of the target variables together with other
features that can influence them. LSTMs introduce memory cells
and various gates to control the flow of information, enabling
them to selectively remember or forget information over long
sequences. The core component of an LSTM is the memory cell.
It allows the network to store and access information over long
sequences. LSTM has three different gates to control the flow
of information: the input gate determines which information
from the current input should be stored in the cell state, the
forget gate determines whether the information from the
previous state should be discarded, and the output gate
regulates the information that gets passed to the output and
the next hidden state. One of the unique attributes of LSTM
that makes it suitable for power system prediction is that it
updates the weights of its gates and memory cells to optimize
its performance. The LSTM can quickly deal with outliers in the
data because the forget gates can discard irrelevant patterns
and focus on relevant patterns. One of the major setbacks of
neural network-based models is the computational complexity,
which also affects the LSTM model. The LSTM tends to be less
accurate with a small dataset because it requires a significant
amount of labeled data to capture patterns effectively and
generalize to new instances. Moreover, LSTM is liable to
overfitting with a small dataset. Despite these limitations,
LSTM is very useful for power system prediction that involves
sequential data because of its ability to model long-term
dependencies and capture complex patterns over time.
Researchers continue to explore improvements and variations of
LSTM architectures to address their limitations. Most of the
time, a single machine learning model cannot capture all the
attributes required to ensure accuracy in prediction; in such a
situation, a hybridization of different machine learning models
is suggested.
LSTM-CNN Model: LSTM-CNN is a hybrid model that combines Long
Short-Term Memory (LSTM) networks and Convolutional Neural
Networks (CNNs). This architecture is often used for tasks
involving sequential and spatial data, such as video analysis,
action recognition, and spatiotemporal data analysis. In the
implementation of the LSTM-CNN model, the CNN is used for input
feature extraction, which is one of the best attributes of CNN,
and the output of the CNN is used as the input of the LSTM
model. The architecture of implementing LSTM-CNN, which is a
series cascaded approach, is presented in Figure 2.

FIGURE 2. Dataflow of series CNN-LSTM model.

The CNN component is responsible for processing the spatial
aspects of the input data. It typically consists of
convolutional layers, activation functions (such as ReLU),
pooling, and possibly fully connected layers. The LSTM
component is designed to capture temporal dependencies in
sequential data. It processes the output from the CNN and
learns the sequential patterns over time. The LSTM helps the
model remember relevant information and make predictions based
on the context of the sequence. LSTM-CNN is well-suited for
tasks that involve both spatial and temporal aspects of data,
making it effective for the prediction of power system
parameters such as load, which is affected by the complexity of
human behaviors, time, weather data, etc. Since loading on an
interconnected power system network affects the frequency and
voltage of the network, the LSTM-CNN is suitable for the
prediction of load based on these complex factors, and the same
models are suitable for voltage and frequency. The
implementation of LSTM-CNN increases the accuracy of prediction
because it uses an optimization algorithm that minimizes the
difference between predicted and actual values. However, the
limitations of this model are that it is computationally
complex, it requires a large amount of data for training to
avoid overfitting, and, like other neural network-based models,
its hyperparameters need to be carefully tuned in order to
achieve optimal performance. Despite these limitations, the
LSTM-CNN models will perform better in the prediction of power
system parameters if the shortcomings can be avoided. The model
is better than the existing time series and regression-based
models because it combines two neural network-based models used
for different operations to enhance the accuracy. Since the
input of the LSTM is the output of the CNN model, the model may
be prone to error if there is an error in tuning the
hyperparameters, which will affect the model's accuracy. A
parallel hybridization of the machine learning model was
proposed by [6] to avoid errors in the series cascaded
LSTM-CNN.

FIGURE 3. Parallel LSTM-CNN architecture [6].
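The series arrangement described above, in which CNN-style feature extraction feeds an LSTM whose forget, input, and output gates update a memory cell, can be illustrated with a minimal sketch. It is not the implementation used in any of the reviewed studies: the scalar LSTM cell, the smoothing kernel, the random untrained weights, and the load values are all assumptions chosen only to make the data flow visible.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_relu(series, kernel):
    """Valid 1-D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(series) - k + 1):
        s = sum(series[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, s))
    return out


def max_pool(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step with scalar input and scalar state.

    w maps a gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        wx, wh, b = w[name]
        return activation(wx * x + wh * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to store
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate values for the cell state
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def series_cnn_lstm(load, kernel, weights):
    """Series cascade: the CNN features are the input sequence of the LSTM."""
    features = max_pool(conv1d_relu(load, kernel))
    h, c = 0.0, 0.0
    for x in features:
        h, c = lstm_step(x, h, c, weights)
    return features, h


# Hypothetical normalized load values and untrained random weights.
rng = random.Random(0)
weights = {name: (rng.uniform(-1, 1), rng.uniform(-1, 1), 0.0)
           for name in ("forget", "input", "output", "candidate")}
load = [0.42, 0.45, 0.51, 0.60, 0.72, 0.70, 0.66, 0.58, 0.50, 0.47]
kernel = [0.25, 0.5, 0.25]
features, h_final = series_cnn_lstm(load, kernel, weights)
```

Because the LSTM only ever sees `features`, any weakness in the feature extraction stage propagates to the recurrent stage, which is the dependency the parallel arrangement is intended to remove.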


the PLCNet model data flow and how the input data is
processed. According to the diagram, the number of data points
in each batch can differ, depending on the purpose
of the prediction. In other words, for different time horizons,
the number of look-back steps is different. After choosing
the look-back number, the load data are batched with the
same size. For example, if the number of look-back steps
is chosen to be 25, the first batch will contain data points
0 to 24, the second batch will contain data points 1 to
25, the third batch will contain 2 to 26, and so on. The merge
layer is used to combine the LSTM and CNN outputs, which
can take the form of concatenation. The combined features
capture both the temporal and spatial aspects of the data.
The prediction process involves the training and testing of
the model. The implementation of parallel
hybridization of machine learning algorithms helps to
increase the accuracy of the predictive model.
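The look-back batching described above can be sketched as a sliding window over the load series. The function name and the choice of pairing each window with the next value as its prediction target are assumptions for illustration; the placeholder series simply uses each point's index as its value so the window boundaries are easy to read.

```python
def look_back_windows(series, look_back):
    """Return overlapping windows of `look_back` consecutive points,
    each paired with the value that immediately follows the window."""
    windows, targets = [], []
    for start in range(len(series) - look_back):
        windows.append(series[start:start + look_back])
        targets.append(series[start + look_back])
    return windows, targets


# Placeholder series: the value of each point equals its index.
load = list(range(30))
windows, targets = look_back_windows(load, look_back=25)

print(windows[0][0], windows[0][-1])  # 0 24
print(windows[1][0], windows[1][-1])  # 1 25
print(windows[2][0], windows[2][-1])  # 2 26
```

A longer prediction horizon would typically use a larger `look_back`, which reduces the number of windows that a series of fixed length can provide.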
Since each machine learning model has unique attributes, the
strengths and weaknesses of each
machine learning model are summarized in Table 1A to Table 1C.
In addition, the choice of model for any network depends
on the network parameters and behavior. Accurate
prediction of load, voltage, and frequency using the models
is essential in a power system because it enables power
system engineers to determine when to initiate the necessary
action in order to avoid disturbance in the network. Load is
FIGURE 4. Flowchart for PLCNet.

Parallel LSTM-CNN Network (PLCNet) Model: The major
reason for hybridizing machine learning models is to increase
the accuracy of the model. The problem with series
LSTM-CNN networks is that the extracted features affect the
training of the LSTM. In order to solve this problem, PLCNet,
which combines LSTM and CNN networks, uses them in two
different paths without any correlation between those two
paths, as presented in Figure 3. These two paths extract the
features, and in the CNN path, capturing the features of
local trends is the main objective. In the CNN path, data
is convolved through a Conv-1D layer [6], and after the
convolution layer, a max-pooling layer is used to reduce
the data's dimensionality by downsampling while keeping
its quality. The data from the final layer continues through a
flatten layer. All of the units are activated by the activation
function. A fully connected path, including dense and dropout
layers, is then implemented to carry out the final prediction,
which is compared with the actual values.
In addition, the LSTM path is used to capture the long-term
dependency within the data, and data go through a flatten layer
to start working with the LSTM network. After passing through
the flatten layer, input data is ready to be entered as an
LSTM layer input. After passing through the LSTM and CNN
paths, the processed data is ready to be entered into the fully
connected layer. However, there is no correlation between
the two paths, which distinguishes the model from the series or
cascaded LSTM-CNN model. Figure 4 shows a diagram of

predicted in the power system to accommodate the expansion
in the network due to an increase in load and to assess the
increase required in the rating of power system elements such
as transformers, generators, and protective elements. Moreover,
voltage and frequency are monitored in a power system network
to avoid partial or total collapse because values
of voltage or frequency outside the permissible limit in any
network can lead to system collapse. The collapse of power
systems in developing countries such as Nigeria cannot be
avoided in a year, which has reduced the ability of the network
to supply adequate power to the electricity consumer. Also,
in some power grids in Africa, the power generated is not
commensurate with the power consumed, which overloads the
network and can lead to the collapse of the system. To avoid
this problem, load shedding is used as a measure, which leaves
some users in blackout; this approach reduces the reliability
of the grid network. The consumer load is dynamic in nature and
cannot be evaluated using a deterministic approach,
but it can be estimated with the use of appropriate predictive
techniques using historical data and occurrences in the
network, which helps to determine the period when a particular
region or part of the network is due for expansion.

A. LOAD PREDICTION IN THE POWER SYSTEM
Load is one of the most essential parameters in power system
operation and needs much attention since loading affects
other variables, such as voltage and frequency. The recent
development in power system operation is load prediction,
which will enable accurate load forecasting and help the


TABLE 1. Comparison of different machine learning models.



FIGURE 6. Effect of temperature on load consumption [11].

FIGURE 5. Impact of population on load consumption [12].

FIGURE 7. Effect of relative humidity on load consumption [11].

power system operator determine what needs to be generated
to meet the consumer's demand. Loading in any power system
network affects whether other operating variables remain
within or fall outside the permissible range. Accurate load
prediction increases the power system's efficiency,
reliability, and safety. Several factors affect the loading of
the power system, which were classified by [11] into
population, weather variables, time factors, consumer personal
income, the occurrence of random events, and tariff rate. For
load prediction to be accurate, two or more of these variables
must be considered in the modeling. Since all these variables
affect the load consumption, using only one or two of these
variables as input to the predicting techniques or model will
tend to affect the accuracy of the model used for prediction.
The effects of these variables on load consumption, as
suggested by [11], [12], [13], and [14], will invariably affect
the load prediction, as follows:
I) Population: The population of an area can impact the
overall electricity demand. As the population increases, the
demand for electricity will also increase. Load prediction
models can consider population data when forecasting future
electricity consumption. Reference [12] shows that electricity
consumption varies with population, as presented in Figure 5.
This implies that, in predicting load consumption, population
should be used as one of the inputs to the model, which


FIGURE 11. Effect of weekdays and weekends on load consumption [11].

FIGURE 8. Effect of wind speed on load consumption [11].

FIGURE 9. Effect of yearly variation on load consumption [11].

FIGURE 12. Impact of random occurrences on load consumption [13].

FIGURE 10. Effect of quarters variation on load consumption [11].

FIGURE 13. Load consumption and different tariff rates [14].


will tend to increase the accuracy of the predictive techniques
used.
II) Weather Variables: Weather conditions, such as temperature,
wind speed, and humidity, play a significant role
in electricity consumption. During hot weather, the demand
for air conditioning increases, leading to higher electricity
loads, and during cold weather, the need for thermal
energy increases in buildings, leading to an increase in
electricity consumption. Load prediction models often
incorporate weather variables to capture the relationship between
weather patterns and electricity consumption accurately.
Reference [11] shows the effects of temperature, relative
humidity and wind speed on electricity consumption as
presented in Figures 6 to 8. The results show that temperature


parameters should be used as input to the model, which tends
to increase the model's accuracy.
III) Time Factors: Time factors, such as day of the week,
time of day, and holidays, can influence electricity usage
patterns. For example, on weekdays, residential loads tend to
be higher during morning and evening, while commercial and
industrial loads may have different patterns. Load prediction
models consider these time factors to capture the daily and
weekly variations in electricity demand. According to [11],
time variation affects the load consumption of France,
as presented in Figures 9 to 11. The boxplot of the load
consumption shows that the average load is almost the same
across the years, as shown in Figure 9. However, considering
the quarterly load consumption in Figure 10, the plot
shows a difference in the level of consumption between
the quarters of the year. Reference [11] reported that power
demand in France increases in the first and fourth quarters
of the year due to the heating load in the winter months. Also,
the consumption during the weekend is high compared to
weekdays, as presented in Figure 11.
This shows that electricity consumption varies with
time. Therefore, the model for predicting loads will be more
accurate if time is also a feature considered as input to the
model.
IV) Consumer Personal Income: Personal income can
affect electricity consumption patterns. Higher income levels
may lead to an increase in the use of electricity-intensive
appliances and devices. Load prediction techniques may take
consumer personal income data into account to understand
the relationship between income and electricity demand.
V) Occurrence of Random Events: Random events such
as political crisis or war can increase or decrease electricity
consumption. Such random occurrences can lead to the
shutting down of power plants and the migration of people,
which can impact electricity usage in specific locations during
the period. Reference [13] reported that the occurrence
of random events affects the consumption of electricity in
Abidjan. The result is presented in Figure 12. According to
the Technical Distribution Department (TDD), as reported
by [13], the decrease in electricity consumption in 2003 was
associated with the unstable political conditions in the Ivory
Coast between 2002 and 2004, which led to the shutting down
of the power plant and decreased electricity consumption.
Also, the drop in the country's electricity consumption in
2010 is associated with three random occurrences, which are
as follows: first, the period of January to April 2010 was
accounted to be the post-election crisis; secondly, there was
an increase in load shedding, which occurred due to damage
to a 150 MW turbine in a generating station; and finally, the
drop in 2010 was due to political instability from the
post-electoral crisis, in which residents moving and plants,
businesses and administrations closing resulted in lower
electricity consumption [13]. Since random occurrences can have
a significant effect on load prediction, it is necessary that
load prediction techniques capture the random occurrences in
order to predict the load demand accurately.
VI) Tariff Rate: Energy tariff rates can influence consumer
behavior and electricity consumption. Higher tariff rates may
incentivize consumers to reduce their electricity usage,
resulting in lower loads. Reference [14] reported that
consumers with low tariffs tend to consume more. Table 2
presents the tariff rate of the two customer groups. The
results presented in Figure 13 show the consumption of two
customers against the tariff rate. The results indicate that
customers with lower tariff rates consume more electricity than
customers with higher tariff rates. Load prediction techniques
can incorporate tariff rate data to understand how pricing
affects electricity demand. The use of all the factors as input
to the predictive model tends to increase the accuracy of the
load prediction techniques, but the network topology behavior,
which gives the actual operating state of an interconnected
power system network, tends to give better accuracy in
prediction.

TABLE 2. Tariffs for monthly energy consumption (subsidized and official electricity tariffs) [14].

FIGURE 14. Load prediction techniques [15].

There are different classifications of load forecasting
techniques, which are model-based (conventional, artificial
intelligence, and traditional) and prediction period (very
short time load prediction (VSTLP), short time load prediction
(STLP), middle time load prediction (MTLP), and long time


TABLE 3. Performance of different models on STLP using Malaysia dataset [6].

TABLE 4. Performance of different models on STLP using Germany dataset [6].

load prediction (LTLP)), as presented in Figure 14. The
traditional approach for load prediction depends on
statistical analysis and rule-based systems in which
historical data are used to forecast future load demand; it
relies on time series analysis to identify patterns and trends
in historical load data. The traditional approach tends not
to consider other external factors, such as operating
conditions in a grid network, weather conditions, and economic
indicators that can influence load demand. It is simple and
less computationally intensive compared to conventional
model prediction techniques. Conventional model prediction
techniques involve mathematical models that simulate the
behavior of the power system and its components. They
consider other factors that can influence the demand for load.
However, complex algorithms and mathematical models are
required to implement conventional techniques for load
prediction. Conventional model-based techniques can consider
historical load data and additional external factors, such as
operating conditions in the grid network, temperature,
humidity, economic indicators, and other relevant parameters.
These techniques may require more computational
resources and time to develop and train the models compared
to traditional techniques. However, their load prediction is
more accurate and more reliable than that of the traditional
method.
MTLP:
Medium-term load prediction is used to predict for a month
up to a year [16]. The thirty-day-ahead load was predicted in
the Bono region of Ghana by hybridizing different machine
learning algorithms such as Multilayer Perceptron (MLP),
Support Vector Machine (SVM) and DT. The historical load
consumption and weather data were used as the input to the
model. The results reported by [17] show that 95% accuracy was
obtained with the model. However, the model fails to include
the network topology behavior in the prediction as an input
variable, which tends to improve the prediction accuracy.
STLP:
STLP is used for predictions from an hour up to a few weeks
ahead in power systems based on the available
historical data [18]. Different predictive models can be used
for short time prediction, and the performance of each
model is measured by its accuracy and error.
Reference [19] used six different models (Support Vector
Machine, random forest, ensemble XGBoost, K-Nearest
Neighbors (KNN), Decision Trees (DT), and Neural Network
(NN)) to predict the daily load consumption of the Greek grid
system. The previous daily load consumption was used as
the input to the model, and the Mean Absolute Percentage
Error (MAPE) was used to evaluate the model's performance
compared with the prediction made by the operators of the
electricity supply chain in Greece. The results show a 4.7%
reduction in error compared to the prediction of the Greek
electricity supply chain operators. The model could still be
improved if all the load buses in the network were used as
input to the model; though this may increase the complexity of
computation, accuracy is very important in a power system.
Reference [20] performed STLP on the Chhattisgarh state
electricity load demand using the Artificial Neural Network
(ANN) algorithm, with historical weather data used as input to
the model. The findings reveal that ANN performs efficiently in
STLP.
Likewise, [21] used the ANN model to predict the short time
load demand of the Iraqi National Grid using weather
parameters such as temperature, fog, humidity, and cloudiness
as the input to the model. The results show that the accuracy
of the model is high, with a small error margin. The unique
finding of [21] is that ANN is suitable for STLP with the
use of exogenous parameters that affect load in the power
system network. However, the evaluation metrics used to test
the performance of the model were not stated. The predicted
load and actual load were only compared with one another.
Reference [22] used ANN to experiment with STLP
for a period of an hour up to 24 hours, using temperature,
wind speed, and the wind chill index, which combines the
effect of wind speed and temperature, as predictors; other
predictors such as previous load and random occurrences were
also used as input to the model. The study was tested on
the Abbottabad grid network located in Pakistan. The study
established that the major predictors for short-term
prediction are the previous day's load, the previous week's
load, and temperature. The major finding of [22] is that
exogenous data such as temperature affect the short time


load prediction with the use of ANN. However, using other
predictors, such as network topology behavior, for predicting
the load of each interconnected bus in the network, tariff
rate, and average consumer income, can improve the model
performance because electricity consumers tend to consume
more energy, if available, when income is high.

TABLE 5. Summary of the related literature based on the input formulation and the model used for prediction.
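The predictors that [22] identified as most important for short-term prediction (the previous day's load, the previous week's load, and temperature) can be assembled into model inputs as in the following minimal sketch. It assumes hourly sampling, so the lags are 24 and 168 steps; the function name, field names, and placeholder values are hypothetical and are not taken from [22].

```python
def lag_features(hourly_load, hourly_temp, day_lag=24, week_lag=168):
    """Build one input row per hour from lagged load and temperature.

    Rows start at index `week_lag`, the first hour for which both
    lagged loads are available.
    """
    rows = []
    for t in range(week_lag, len(hourly_load)):
        rows.append({
            "prev_day_load": hourly_load[t - day_lag],
            "prev_week_load": hourly_load[t - week_lag],
            "temperature": hourly_temp[t],
            "target_load": hourly_load[t],
        })
    return rows


# Placeholder data: eight days of hourly values.
hours = 24 * 8
load = [100 + h for h in range(hours)]
temp = [20.0] * hours
rows = lag_features(load, temp)

print(len(rows))  # 24
print(rows[0]["prev_day_load"], rows[0]["prev_week_load"], rows[0]["target_load"])  # 244 100 268
```

Further columns, such as tariff rate or average consumer income, could be added to each row in the same way when such data are available.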
LTLP:
Reference [23] used the previous load and various exogenous
data such as temperature, humidity, average salary, gross
domestic product, oil price, population, residence, passengers,
currency earning rate, and total import and export
as input to the model to predict ten (10) years of electricity
consumption for the Kuwait grid network using ARIMA
and Neural Network (NN). The findings show that the NN
model is better than ARIMA in terms of accuracy and that
temperature and humidity, as input data, are more
significant than the other variables used. The study
established that, for long term load prediction with numerous
samples available, a neural network-based model is suitable
for prediction, while ARIMA is suitable for short term load
prediction or long-term load prediction with few sample data.
Reference [24] obtained two different data sets of the same
grid network from different sources: historical load data and
Twitter data based on people's opinions, the latter of
which was used as an independent variable to test the
performance of ANN and Support Vector Machine (SVM). The
load obtained was used as the input to the model, and the
results show that ANN performed better than SVM in terms
of accuracy. Also, there was no improvement in the model
performance when weather data was used as input variables
to the model. The research established that mining data
from social media or questionnaire surveys for load
prediction is unreliable. Reference [24] therefore recommends that
historical data of the load consumption and other exogenous
data that influence the load consumption should be
used as the input formulation for load prediction.
Reference [25] compared the results of hybridizing
different machine learning models (SVM, RF, and Generalized
Linear Model (GLM)) with the ARIMA model to form
two-stage cascaded predictive models, namely ARIMA-SVM,
ARIMA-RF, and ARIMA-GLM, to reduce the error in the model.
The models were tested using datasets from the Iberian
electricity market, which consist of load, temperature, and the
tariff rate, which is also known as the electricity market
price. The results of the hybrid models were compared with one
another, and performance was tested for long term prediction
using MAPE. The findings established that the ARIMA-GLM
combination gives better accuracy for long term
prediction compared to the other models. This implies that the
hybridization of two different machine learning models with
different unique features tends to increase the accuracy of
prediction.
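Since MAPE is the metric used to rank the hybrid models in [25], a minimal sketch of how two candidate predictions could be compared with it is given here. The load values and the two sets of predictions are made up for illustration and do not come from any of the reviewed studies.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes the two sequences have equal length and that no actual
    value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical actual loads and predictions from two candidate models.
actual = [100.0, 200.0, 400.0]
model_a = [110.0, 190.0, 400.0]
model_b = [120.0, 220.0, 360.0]

print(round(mape(actual, model_a), 2))  # 5.0
print(round(mape(actual, model_b), 2))  # 13.33
```

The model with the lower MAPE, here `model_a`, would be preferred under this criterion.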
LTLP is used to look into the future for a year up to
several years. The prediction model tends to be more accurate
when other exogenous data, such as humidity, temperature,
etc., are used. The use of LTLP in power system network


will not allow effective planning because power system loads
are dynamic in nature. The shorter the prediction period, the
more effective the planning, because shorter periods like a
month and a day allow effective planning for the nearest
future, and that is what utility companies require to know
what is needed to supply the consumer and ensure system
reliability.
VSTLP:
Reference [6] evaluated the performance of different models
for STLP using hourly consumption and supply datasets from two
regions (Germany and Malaysia). The authors combined two
models, Long Short-Term Memory (LSTM) and Convolutional Neural
Networks (CNN), in parallel, in a model called the parallel
LSTM-CNN network (PLCNet). The performance of the
model is compared with other machine learning models as
presented in Table 3 and Table 4. The findings show that
the hybridized model performs better than the initial models
used. Also, the result of [6] shows that the prediction model
performs better on one dataset and gives less accuracy
on another. This reaffirms that the choice of a prediction
algorithm in a power system depends on the characteristics of
the dataset, and that the operating behavior of the network,
which is determined by the various components used in the
network, is one of the major variables to be considered. This
implies that load prediction in any grid network is dynamic and
can depend on various factors such as network topology
behavior and geographical location. Reference [26] suggested
that weather parameters, income of consumers, and tariff rate
can be used to influence the performance of the load prediction
model.

TABLE 6. Comparison of ELM and CCELMN model in terms of accuracy [43].

FIGURE 15. Model performance between the proposed model and ELM only [41].

The findings of this review on load prediction are as follows:
I) in the case of an interconnected grid network, load
prediction tends to be more accurate when the network topology
behavior (NTB) data is used as the input to the model.
II) when the power generated is less than load demand, the
power generation should also be used as an input variable to
the model.


III) when the tariff band determines the hours of electricity
supply to consumers, the tariff band should also be used as an
input variable to the model for prediction of load.
IV) in the situation described in (III), the supply hours
should also be used as input variables for demand side load
prediction.
V) weather data should be admitted as part of the independent
variables for load data.
VI) the dataset must be well structured; the interval should be
regular if it is time data.
VII) hourly data should be used for STLP.
VIII) monthly data should be used for MTLP.
IX) yearly load data should be used for LTLP.
X) when different companies handle the generation,
transmission and distribution of power, the distribution data
should be used for load prediction, and all other data from
transmission and generation should be used as input with other
exogenous data.
The summary of the state of the art of each of the
related studies on load prediction, based on the input
formulation and the machine learning model used, is presented
in Table 5A and Table 5B.
Furthermore, precise load estimates increase revenue
for electric companies by enhancing energy efficiency,
lowering operational expenses through wise planning and
decision-making for future expansion, and decreasing
operating costs. It has been demonstrated by several studies
that a 1% reduction in load forecasting error results in savings
of hundreds of thousands or even millions of dollars [27],
[28]. This means that accuracy in load prediction is very
important; since load prediction is essential in a power system
network, it is also necessary to predict the effect of loading on
other variables, such as frequency and voltage, that can lead
to collapse of the power system network. The studies [6],
[17], [18], [19], [20], [21], [22], [23], and [24] suggest that
using the previous load only as input to the predictive model
performs worse than when other features that affect the
load are also used as input to the model. This implies that
in predicting the load in the power system, all features that
affect the load should be used as input to the model. Likewise,
in an interconnected grid network, bus load prediction should
be based on the network topology behavior that captures the
behavior in the network.

B. FREQUENCY PREDICTION IN POWER SYSTEM
Many factors affect the frequency of a grid network, such
as sudden changes in load, fluctuations in power generated,
power imbalance between the load and power generated,
interconnected grid networks, and contingencies in the
network. Since all these factors affect the frequency, they
tend to affect the prediction of frequency.
Load Mismatch: sudden changes in load can result in a
mismatch between the power generated and the power consumed
in an electrical grid. This mismatch can cause frequency
deviations and affect the accuracy of the frequency prediction
technique.
Frequency response: fluctuations in power generated can
impact the frequency response of the system. Power system
components such as generators, turbines, and governors have
specific control mechanisms to regulate frequency. However,
sudden changes in load or power fluctuations can disturb the
balance between power generation and consumption, leading
to frequency variations.
Time response: frequency prediction techniques rely on
historical data and mathematical models to forecast future
frequency patterns. However, sudden changes in load and
power fluctuations can disrupt the accuracy of these
predictions. The time response of frequency prediction
techniques, including the time it takes to detect changes and
update predictions, may be affected by the rapid changes in
power conditions.
The resilience of prediction models: the resilience of
frequency prediction models to sudden changes in load and power
fluctuations is an important factor that needs to be
considered for frequency prediction. Some prediction
techniques may be designed to handle such variations more
effectively than others. It is essential to consider the
robustness and adaptability of the prediction models to ensure
accurate forecasts despite unpredictable load and power
conditions.
Since sudden changes in load and fluctuations in power
generated can impact the accuracy and reliability of
frequency prediction techniques, it is important to consider
real-time power conditions and to design prediction models
that can adapt to dynamic power system behavior, for which
machine learning is a good technique to apply.
However, there are many techniques used to predict and
control the frequency of a grid network to maintain stability,
such as frequency response analysis, dynamic simulation
of the grid network, and steady state analysis of the grid
network, which assumes that the grid is stable. Load flow
analysis [29], which is mainly used for power and voltage
calculation, can also be used to estimate the frequency of
the grid network for different load values, and since load
values can be predicted in any power system network, the
corresponding frequency values can be determined. However,
much of the literature [30], [31], [32], [33], [34], [35], [36],
[37], [38], [39] deals with frequency control in a power system
network. The machine learning approach can be used to predict
the frequency of the network based on the network's historical
data and network topology. However, in selecting any of
these techniques, the specific requirements of the grid network
must be determined. The use of any prediction technique depends
on the time response of the particular device used. Since the
power system is dynamic and the parameters vary with time, it
is easy to use historical data to predict the condition of a
particular network.
However, [40] suggests that in a situation where the
historical data is insufficient, the machine learning
method tends to be less efficient in prediction. The use of
transfer learning is proposed, whose fundamental concept
is to transfer eligible samples from the source domain's data
that have similar features to the target system in order to

66660 VOLUME 12, 2024


T. D. Makanju et al.: Machine Learning Approaches: A Systematic Review

TABLE 7. Summary of related literature based on the input formulation and the model used for frequency prediction.

FIGURE 16. Comparison of accuracy of the two scenarios [43].

increase the number of samples in the target system's training sample set and improve the precision and dependability of the prediction model [40]. Model-driven techniques for frequency prediction and control are limited by the conflict between accuracy and efficiency, while data-driven methods demonstrate strong abilities for online decision-making support with the advancement of various data mining techniques. Instead of applying data-driven methods directly to the power system, [41] proposed an approach that integrates model-driven and data-driven techniques for frequency prediction. The model-driven part is based on the characteristics of the frequency response model of the network, and the data-driven part is based on the extreme learning machine (ELM), which is a machine learning model. The proposed model was tested on three different networks: the WSCC 9-bus, New England 39-bus, and NPCC 140-bus systems. The prediction results in terms of accuracy and speed were compared for the ELM-only case and for the integrated model-driven and data-driven case, as presented in Figure 15.
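To make the data-driven component more concrete, the following is a minimal sketch of a generic ELM regressor: the hidden-layer weights are drawn at random and left fixed, and only the output weights are solved by regularized least squares. It is not the integrated method of [41]. The names `TinyELM` and `solve_linear`, the hidden-layer size, the ridge term, and the toy data (power imbalance mapped to frequency deviation) are all assumptions made for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class TinyELM:
    """Single-hidden-layer ELM: random fixed hidden weights, solved output weights."""

    def __init__(self, n_inputs, n_hidden=10, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = None

    def _hidden(self, x):
        # tanh activation of each randomly initialised hidden neuron
        return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        h = [self._hidden(x) for x in xs]
        k = len(self.b)
        # Regularised normal equations: (H^T H + ridge * I) beta = H^T y
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
        self.beta = solve_linear(hth, hty)
        return self

    def predict(self, x):
        return sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x)))


# Synthetic, purely illustrative data (not from any cited study):
# power imbalance in p.u. -> frequency deviation in Hz.
xs = [[-1.0 + 0.05 * i] for i in range(41)]
ys = [-0.5 * x[0] for x in xs]
model = TinyELM(n_inputs=1).fit(xs, ys)
print(round(model.predict([0.4]), 3))
```

Because only the output weights are trained, fitting reduces to one linear solve, which is why ELM is attractive where prediction speed matters.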
The result presented in Figure 15 indicates that the integrated method is better than the ELM-only method in accuracy on the WSCC 9-bus test system. When the test bed is replaced with the New England 39-bus test system, the accuracy advantage of the proposed integrated method is high. The ELM-only method has more difficulty describing the dynamic features of a larger system, where the variables are numerous, while the integrated method builds on the prediction results of the frequency response characteristic method, which helps to construct an optimization-based prediction model by reducing the feasible solution space. For both reasons, the accuracy gap between the integrated and ELM-only methods differs between small and large test systems. Furthermore, the RMSE indexes for the integrated and ELM-only methods on the New England 39-bus test system are (0.0518 Hz, 0.427 s, 0.0166 Hz) and (0.112 Hz, 0.7236 s, 0.0256 Hz), respectively, listed in the order of maximum frequency deviation, time at maximum frequency deviation, and steady-state frequency. This indicates that the proposed integrated method is superior to the ELM-only method in prediction reliability. Hence, it can be concluded that the integrated method is better than the SFR-only and ELM-only methods in accuracy, with little sacrifice of computing speed.
According to [41] and [42], frequency prediction improves power system control and protection techniques. Reference [43] developed a predictive model called the cellular computational extreme learning machine network (CCELMN), which is based on a modification of ELM for frequency prediction. A cellular computational network is a scalable and distributed architecture for learning the dynamics of large interconnected systems. It is a directed graph with computational nodes (cells), and the nodes are placed to represent the topology of the network. A cell consists of a computational unit, a learning unit, and a communication unit.


The computational unit uses an intelligent algorithm to process the available input data and give a final output. The technique was compared with independent extreme learning machine (ELM) models and was tested on a 12-bus network. The inputs to the model are the loading parameters and the generator parameters in the interconnected network. The mean absolute percentage error (MAPE) for the two models is presented in Table 6 [43]. This result shows that CCELMN is more accurate for frequency prediction than ELM because it is capable of dynamically learning from interconnected buses in the system. Hence, CCELMN is more suitable for large interconnected networks such as power system networks.
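As a minimal illustration of the MAPE metric used for this kind of model comparison, the sketch below computes it for two hypothetical predictors. The function name `mape` and all frequency values are invented for the example and are not taken from Table 6 or from [43].

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up bus frequencies in Hz, for illustration only.
actual_hz = [50.00, 49.95, 50.05, 49.90]
model_a_hz = [50.01, 49.93, 50.04, 49.93]
model_b_hz = [50.05, 49.85, 50.10, 49.80]

print("model A MAPE (%):", mape(actual_hz, model_a_hz))
print("model B MAPE (%):", mape(actual_hz, model_b_hz))
```

Grid frequency stays close to its nominal value, so MAPE figures for frequency prediction are typically small in absolute terms; what matters in a comparison is the ratio between models.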
FIGURE 17. Classification of voltage stability based on disturbance [66].

According to [44] and [45], the major factors affecting the frequency response characteristic in the power grid are total power generation, total power consumed, grid capacity, and power disturbance, which can be associated with the network topology behavior. Reference [46] obtained datasets from the Northwest power grid based on the factors that affect the frequency. The data comprised total power generation, power disturbance, total power received, power generation under different power generation methods, and grid-connected capacity. Using the maximum relevance and minimum redundancy (MRMR) input extraction algorithm, the following data were extracted: total thermal power generation, new energy generation ratio, hydropower grid connection capacity, and power disturbance. The extracted data were grouped into four datasets. Dataset 1 is the data extracted using MRMR. In dataset 2, hydropower, total wind power, and total photovoltaic power were added to dataset 1. Thermal power grid-connected capacity and total thermal power load rate were added to dataset 2 to obtain dataset 3. Dataset 4 uses the extraction based on the factors affecting frequency response. The prediction is classified into two cases: one in which all scenarios were used, and one in which some scenarios were dropped out. The model's accuracy in terms of MAPE is presented in Figure 16. The operation of any power system network is unique because many factors affect the operating behavior, such as transformer rating, protective element rating, and line parameters. This implies that prediction in a power system is network specific; that is, it is difficult to use a model developed for one network to predict for another unless the two have the same network parameters and operating conditions.
The frequency of a particular network can be predicted using an appropriate machine learning algorithm, which will help the power system engineer initiate the necessary control techniques during contingencies or abnormal operating conditions. The summary of the related literature for frequency prediction based on input formulation and the machine learning model used for prediction is presented in Table 7.
This review suggests the following for frequency prediction in power systems:
I) Feature extraction of the factors affecting frequency response should be carried out.
II) Parameters such as total power generated and power consumed must be used as part of the input to the model.
III) All network disturbance cases must be used as input parameters.
IV) Since removing and adding load or generators to the power system network affects the frequency, the prediction model will be more accurate using short-time load prediction data.
Since the frequency of a network depends on the network's loading, and machine learning is used for predicting load for effective planning for future expansion, it can also be used for frequency prediction, which will help operators to plan against contingencies in the network. It will also help power system operators to use predictive maintenance techniques rather than reactive maintenance techniques.

C. VOLTAGE PREDICTION IN POWER SYSTEM
The voltage profile in a power system is a critical parameter that ensures the stable and efficient operation of the electrical grid. Several factors can affect the voltage profile in a power system, and it is essential to maintain voltage within acceptable limits to prevent equipment damage and ensure reliable electricity supply [47]. According to [48], [49], and [50], several factors affect the voltage level in a power system network, including:
I) Load Variations
Changes in electrical load demand can significantly impact the voltage profile. When there is a sudden increase in load, voltage levels can drop, leading to voltage instability. Conversely, a decrease in load can result in over-voltage conditions.
II) Output of generators
The output of generators connected to the power grid affects the voltage profile. An increase in generator output can raise voltage levels, while a decrease can cause voltage to drop.
III) Distance and impedance of transmission: The impedance of transmission lines can cause voltage drops due to line


TABLE 8. Comparison of different prediction methods [78].

TABLE 9. Comparative analysis of various regression models for the base case [79].

TABLE 10. Comparative analysis of various regression models for an N-1 contingency case [79].

TABLE 11. Comparative analysis of various regression models for unseen N-1 contingencies [79].

losses. Longer transmission lines with higher resistance and reactance will have a more significant voltage drop.
IV) Transformer tap settings
Transformers are used to step voltage levels up or down, and adjusting the tap settings on transformers can impact voltage levels at various points in the grid. This approach is used to regulate the voltage of a power system network so that it operates within acceptable limits.
V) Faults and short circuits
Faults on power system networks tend to disrupt the voltage profile. Faults can lead to voltage sags or spikes depending on the location and the nature of the fault.
VI) Reactive power compensation
Reactive power is very important in maintaining voltage stability. The addition or removal of reactive power sources such as capacitors or reactors can affect voltage levels.
VII) Network Topology
The structure of the power grid, including the arrangement of substations, transformers, and feeders, can impact the voltage profile. Grid topology can determine how voltage propagates and is distributed.
VIII) Renewable energy integration: The intermittent nature of renewable energy sources like solar and wind can introduce voltage fluctuations and require advanced control strategies to maintain voltage stability.
IX) Load characteristics
The type of loads connected to the grid, such as motors or sensitive electronic equipment, can affect voltage stability. The starting of a motor, in particular, can cause voltage drops.
X) Operating conditions
Factors like ambient temperature and humidity can affect the performance of electrical equipment, including transformers and generators, which can in turn affect voltage levels.
XI) Grid expansion and upgrades
Changes in the power grid's infrastructure, including the addition of new substations, transmission lines, and generation sources, can influence the voltage profile.
However, to maintain voltage within permissible limits in a power system, operators use several control strategies to mitigate the effect of the aforementioned factors on the voltage profile in the network. According to [51], [52], [53], [54], and [55], the control strategies include:
I) Voltage Regulators
These devices automatically adjust the tap settings on transformers to regulate the voltage within desired limits, which maintains a stable voltage profile.
II) Reactive Power Compensation
Installing capacitor banks at strategic locations in the power system can provide reactive power support and help improve voltage levels. They are particularly effective in distribution systems.
III) Synchronous Condensers
Synchronous condensers are rotating machines that can provide or absorb reactive power as needed to stabilize the voltage. They are especially useful in areas with high penetration of renewable energy sources.
IV) Static Var Compensators (SVCs) and Static Synchronous Compensators (STATCOMs)
These are advanced power electronic devices that can quickly inject or absorb reactive power to maintain voltage stability and improve the voltage profile.
V) Distributed Energy Resources (DERs)
Solar photovoltaic (PV) systems and wind turbines can be equipped with inverters that provide reactive power support,


TABLE 12. Comparison results in terms of PICP and Winkler score for each test system [80], [81], [82].

helping to regulate voltage levels. Advanced inverters with voltage control capabilities are increasingly being deployed.
VI) Voltage and Reactive Power Control
Implementation of advanced voltage and reactive power control strategies in the power system, such as a supervisory control and data acquisition (SCADA) system, which uses real-time data to adjust voltage and reactive power settings.
VII) Load Shedding and Load Management
Implementing load shedding schemes that can disconnect non-critical loads during voltage emergencies to help maintain voltage stability. This approach reduces the system's reliability.
VIII) Network Reconfiguration
Modify the network configuration by changing the operation of switches and circuit breakers to redistribute power flows and alleviate voltage violations.
IX) Distributed Voltage Control
Use smart grid technologies and communication systems to enable distributed control of voltage. Decentralized control strategies can help maintain voltage levels locally.
X) Voltage Monitoring and Data Analysis
Deploy voltage monitoring devices and perform regular analysis of voltage data to identify areas with voltage problems and take corrective actions.
XI) Grid Planning and Expansion
Plan for future grid expansion and upgrades to accommodate growing power demand and the integration of renewable energy sources. This may involve adding new substations, transformers, and transmission lines. One of the ways to ensure effective planning in a power grid network is the use of forecasting techniques.
In any power network, three things are essential to the operator: to supply good-quality voltage, to ensure supply availability on demand, and to increase reliability [56]. To maintain these three states in the power system, some factors need to be considered, and one is to predict the operating parameters of the power system network based on the network topology behavior. Many researchers are interested in load flow prediction to know the network's capacity and plan for expansion. But another factor contributes to the supply of suitable voltage to consumers: the operating voltage at each bus in the network, which is determined by the loading on the bus, the sending voltage from the previous buses in the network, the length of the transmission or distribution line, and the impedance of the lines. The recommended standard for operating voltage in any network is 0.95 ≤ V ≤ 1.05 p.u. [57], [58]. Many techniques are used to improve the operating voltage of a network, including FACTS devices, distributed generators, and model predictive control [59], [60], [61], [62]. All these are among the methods used to maintain the operating voltage of a power network within a permissible range. However, in most power system networks, all the approaches to improving the operating voltage result from reactive maintenance in the network. There is a need to predict the operating voltage of a power system network under different scenarios, so that the predictions can be used to improve the voltage profile of the network. Malkovich, a Soviet academic, first identified the issue of voltage in the 1940s and proposed the criterion; nevertheless, it was not until the end of the 1970s or the beginning of the 1980s that this issue was seriously studied [63].
Due to voltage collapse accidents, some of the world's most extensive power grids have collapsed, including the French power grid accident in 1978, the Swedish power grid accident in 1983, the Tokyo power outage in 1998, and the western United States power grid blackout in 1996, among others [64], [65]. With more research being done, it is gradually being realized that the dynamic nature of voltage stability and the dynamic properties of many power system components are closely related. Therefore, defining the research scope for static and transient voltage stability is essential before selecting the right mathematical models, analysis techniques, and simulation tools. As presented in Figure 17, voltage stability is typically split into two categories: small disturbance voltage stability and large disturbance voltage stability. Different methods are used to analyze transient voltage stability: time domain simulation, non-linear dynamic, and transient energy function [66].
Despite extensive investigation, the mechanism of transient voltage instability is still unknown. Many researchers first thought that an imbalance between power supply and demand between the system and the load brought on voltage instability. As a result, the transient voltage stability problem was studied as a load stability problem, with the main focus on the transient voltage instability brought on by the load's dynamic properties. References [67], [68], [69], [70], [71], [72], [73], [74], [75], and [76] studied the effect of different loading conditions on voltage stability. However, for a very long time, researchers generally concentrated on the load side, represented by induction motors, and rarely involved the system side, which hampered their research's integrity. According to [77], it is essential to examine the mechanism of transient voltage stability from the three perspectives of the transmission network's capacity constraint, the dynamic properties of the load, and the receiving end system's voltage

TABLE 13. Summary of the related works based on the prediction model and input for voltage prediction.

FIGURE 18. Network topology of IEEE bus 5.

TABLE 14. a) Load and generator data for scenario 1. b) Transmission line data for scenario 1.

support capacity. It sees the issue of voltage stability as a systemic one rather than a load stability issue. The use of machine learning techniques to predict voltage stability in power systems supplements traditional simulation techniques such as load flow analysis and probabilistic analysis.
Reference [78] proposed a pool-based active learning solution to enhance existing machine learning applications by actively interacting with the online prediction and offline training process. The technique identifies operating points where machine learning predictions based on power system measurements contradict actual system conditions and creates the training set around the identified operating points. The technique also accelerates the offline training process by reducing the number of simulations on a detailed power system model around operating points where correct predictions are made. The approach is implemented on the WECC system, which consists of 29 generators, 179 buses, 263 transmission lines, 42 shunts, and 104 loads. The knowledge base prepared by the oracle includes 5078 "stable", 2540 "alert", and 2529 "critical" labeled operating points (OPs). A total of 256 channels of simulated phasor data were gathered, covering the 10147 selected operating points. Three machine learning models were used for the experiment: random forest (RF), support vector machine (SVM), and artificial neural network (ANN). The machine learning models were used for three major tasks: training on the dataset using the labeled pool, estimating uncertainty using a probabilistic approach, and determining the class of the operating point in the


TABLE 15. a) Load and generator data for scenario 2. b) Transmission line data for scenario 2.

TABLE 16. a) Load and generator data for scenario 3. b) Load and generator data for scenario 3.

FIGURE 19. Voltage profile results of all the four scenarios.

labeled dataset, which is called the margin. The accuracy of each of the models used is presented in Table 8. The results show that RF performs better in terms of accuracy for the three conditions. Moreover, the proposed pool-based active learning approach can build datasets for a machine learning model to train on more efficiently.
Reference [79] uses Gaussian process regression (GPR), artificial neural network (ANN), support vector machine (SVM), and decision tree (DT) models to predict the voltage stability margin (VSM). The voltage magnitude and angle at each bus are the inputs to the machine learning algorithms for evaluating the voltage stability margin. The analysis was done on the New England 39-bus system under normal conditions and in case of contingency in the network. The results presented in Tables 9-11 indicate that GPR has the highest accuracy in most cases. The MSE, which is measured in megawatts squared, is the mean of the square of the difference between the VSM predicted by the regression model and the actual value obtained using continuous power flow (CPF). While the computational time needed to train on the dataset is greater than for SVM and DT, both the training and testing MSE are noticeably lower for GPR and ANN. Even though DT uses less processing power than the other regression models, its accuracy is low, which is in direct opposition to the value that it represents. The size of the training dataset has little effect on the overall performance of the regression models in either scenario. In terms of training and testing MSE, GPR and ANN performed better. However, training these models requires additional computation time, highlighting the need for parallel computing techniques. Moreover, the following MSE values in Table 11 for different unseen network topologies, GPR (2.29E+03), ANN (0.3641, 0.9119, 0.10456, 0.2863, 1.6715, 0.6869, 0.1164, 1.9961, 1.4257), SVM (0.2588, 119.95, 1.3537, 0.5467, 1.2148, 0.6965, 0.8520), and DT (9.9156, 2.8741, 0.1067, 0.2460, 0.2059, 0.1068, 0.1528), represent unacceptable values that require the model to be updated.


TABLE 17. a) Load and generator data for scenario 4. b) Transmission line data for scenario 4.

In most scenarios of unknown topologies, the GPR model outperformed the other regression models in predicting the voltage stability margin, with the lowest MSE value. In contrast, in scenario 2, which represents the outage of generator G2, GPR cannot predict the VSM with the desired accuracy, which signifies the need for a model update for this case. The rest of the regression models cannot predict the VSM accurately in most cases and thus require frequent updating of the database with these unknown topologies during the initial stage for data preparation and retraining of the models. This results in an overall increase in the computational time of the estimation process. Additional tests were done for the unknown operating points with unseen topologies not mentioned in Table 9. Although in some cases GPR cannot predict with the desired accuracy, it requires fewer model updates than the other regression models to estimate the VSM for online applications.
However, in determining the voltage stability margin, the operating voltage of the bus was not used. The prediction of voltage stability may be more accurate if the voltage profile of each bus in the network is determined both under normal conditions and in case of contingency. Reference [80] proposed an online probabilistic extreme learning machine (ELM) algorithm based on the power transformation technique. The prediction interval (PI) estimation for the voltage stability margin (VSM) is formulated as a Box-Cox transformation (B.T.) model to consider uncertainties associated with predictions. The proposed method was tested on two networks: the NREL-118 test system and a practical power system network, the Taiwan power system (TPS). The result metrics were compared with the studies of [81] and [82], as depicted in Table 12.
The scores for the PI generated by the proposed method are the lowest in the comparison. In other words, the proposed method provides the best performance regarding prediction interval coverage probability (PICP) and Winkler scores for all test cases. Reference [83] predicts the voltage stability of Nigeria's (30) 330 kV bus using auditory machine intelligence (AMI) techniques, and the method was compared with the Group Method of Data Handling (GMDH). The results of the simulation studies show that the AMI technique is competitive with the GMDH time-series technique for several experimental simulation runs.
The summary of related literature on voltage prediction in power systems is presented in Table 13.
Keeping the voltage level of buses within permissible values gives good stability to the network. Many factors determine the prediction of voltage level in interconnected systems, such as the load at each bus, the total load in the network, behavior in the network (load shedding), the voltage of previous buses in the network, and any compensating techniques used in the network. To validate the effects of loading and distance on the voltage profile, load flow analysis of IEEE bus 5 was carried out under four scenarios in which the voltage profile of all the buses in the network was observed; the values of the voltage profile on each of the buses in the network determine the voltage stability of the system. The data generated from the simulation of IEEE bus 5 under the four scenarios were later used to predict the voltage of each load bus in the network. Voltage prediction in power systems tends to depend on the network topology behavior.

III. EFFECTS OF LOADING AND DISTANCE ON VOLTAGE PROFILE
This section establishes the effects of distance and loading on the voltage profile of an interconnected network using the IEEE Bus 5 network in Figure 18. Four scenarios were considered when simulating IEEE bus 5: the base case, an increase in load, an increase in transmission distance, and an increase in both load and transmission distance. The load flow simulation was carried out on IEEE Bus 5 using the Newton-Raphson algorithm in NEPLAN to validate the effects of loading and distance on the voltage profile of a power system network under the four scenarios. The Newton-Raphson algorithm was chosen because of its fast rate of convergence. The four scenarios were created because they are the most common scenarios associated with power system networks in developing countries, such as those in Africa, where load shedding occurs and networks are expanded beyond the maximum distance they can withstand before voltage violation occurs. The load variation in the network is considered as load shedding in this work, and the distance increase is considered as the expansion of the network, in which all the transmission lines in the network were assumed to have the same length of 1 km for scenarios 1 and 2 and 2 km for scenarios 3 and 4. This was done to balance the power that each line carries by ensuring an equal share of the total power being transmitted, to minimize losses in the network, and to improve voltage stability in the


network since voltage drop will be minimized. The input data used for simulation under the four scenarios are presented in Tables 14a-17b. The voltage profile results for all four scenarios, presented in Figure 19, show that loading and distance have a significant effect on the voltage profile of the network. The higher the loading on a particular bus, the higher the chance of a reduction in the voltage level of that bus. Also, the longer the distribution and transmission lines, the higher the chance of low voltage at the receiving end. The network topology approach is very important in a power system network; for example, the true picture of what occurs at Bus 2 cannot be determined using only data associated with Bus 2 or exogenous data such as weather data. The use of network topology behavior data is essential to prediction accuracy in power systems.
Reference [84] established that the closer the voltage profile is to the nominal voltage, the higher the voltage stability. For a power system to be stable, the V-Q values must be positive for the buses in the network, and the degree of stability depends on the V-Q values. The lower the V-Q value at any bus, the better the voltage stability at that bus. Since voltage stability depends on the voltage profile, it is better to predict the voltage profile in a network in order to ensure stability. In most power system networks, the voltage profile is improved using different techniques to maintain high voltage stability. Reference [64] classified the techniques that can be used for voltage stability into three. The first is conventional techniques, which involve the addition of non-renewable energy sources, such as diesel generators, to meet the load demand in order to stabilize the voltage profile. However, adding a non-renewable generator to meet the daily increase in load in order to stabilize the voltage is not good because of the emission of carbon monoxide that gives rise to depletion of the ozone layer [85]. The second is the use of capacitor banks to inject reactive power into the network to improve the voltage profile [86]. Load shedding is also a method used to maintain the voltage level of a power system; this method is mostly used in developing countries, and the majority of African countries use it, which leaves part of the network in blackout and thus reduces system reliability. References [64]
using the machine learning approach. The power system network's operating voltage needs to be predicted based on the expected load in the network, since the operating voltage of any power system network also determines the quality of voltage supplied to the consumer and affects the voltage stability of the power system network. Therefore, the operating voltage should be predicted in order to have well-planned strategies in case of contingency. This research uses machine learning for prediction of voltage by formulating different input variables for voltage, load, and frequency prediction in a power system network based on the network topology behavior: load shedding, voltage compensators, and the voltage of all the load buses connected in the network must be used as inputs to the model in order to ensure accuracy in the prediction. Other operating parameters such as power losses and voltage drops were not considered because the effect of the losses and voltage drops has been observed on the bus voltage. Since the system is interconnected, the losses in terms of power and voltage drop between the sending end bus and the receiving end bus have effects on the voltage level of the receiving end bus, and all the buses in the network were considered as input for prediction of the load buses. Bus one was not predicted because the load under the four scenarios has no effect on the voltage profile of bus one, as presented in Figure 19, which makes the correlation with other variables in the network not valid.

IV. PROBLEM FORMULATION FOR VOLTAGE, LOAD, AND FREQUENCY PREDICTION
Since the voltage profile in a power system depends on many operating conditions or factors in the network, it is necessary to model a formulation technique for voltage prediction in power systems. The network topology behavior considered in the prediction of voltage was divided into two cases: one in which the load at the bus and the total load in the network were considered as the input to the model, and a second in which the loads of all buses in the network, the total load in the network, and the operating behavior in the network, such as load shedding and the use of a compensator for voltage improvement, were considered. In the IEEE Bus 5 network, the compensator was not considered.
Case 1: The use of load and total load in the network as
and [85] suggests that keeping the reactance of the transmis- the input to the model to predict the load bus voltage as
sion line low increases the voltage profile of the network and represented by equation (1)
thus increase the voltage stability.
References [52] and [86] also established that use of renew- Vi = F(Li , LT ) (5)
able energy sources for distributed generator such as winds
and photovoltaic tends to maintain the power system stability where Vi is the predicted bus voltage Li is the bus load and
by improving the voltage profile in the network. LT is the total load in the network.
More so, the use of Flexible AC transmission stem devices Case 2: The use of load of each bus in the network, voltage
(FACTS) such as static Var compensator (SVC), Static of each bus in the network, total power in the network and
synchronous compensator, interline power flow controller, the operating behavior in the network such as load shedding
(IPFC), static synchronous series compensator (SSSC), uni- and compensating techniques this is based on the network
fied power flow controller (UPFC). Machine learning models topology behavior.
have not been fully explored for voltage prediction in power
system networks. Few researchers worked on voltage stability Vi = F(Li , Lj , LT , Vj , Oc ) (6)

66668 VOLUME 12, 2024


T. D. Makanju et al.: Machine Learning Approaches: A Systematic Review

where $V_i$ is the predicted bus voltage, $L_i$ is the load at the predicted bus, $L_T$ is the total active load in the network, $V_j$ is the voltage of the other buses in the network, and $O_c$ is the operating condition in the network, such as load shedding and the use of a compensator for voltage profile improvement.

FIGURE 20. Flowchart of the step-by-step approach for machine learning prediction.

FIGURE 21. Correlation matrix of IEEE Bus 5 dataset.
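The two voltage input formulations of equations (5) and (6) can be sketched as a feature-selection step. This is only an illustration under stated assumptions: the column names (`L_bus2`, `V_bus3`, `L_total`, `load_shedding`), the helper `build_voltage_features` and the toy values are hypothetical and are not taken from the simulated IEEE Bus 5 dataset.

```python
import pandas as pd

def build_voltage_features(df, target_bus, case):
    """Return (X, y) for predicting the voltage of `target_bus`.

    Hypothetical column layout: L_bus<k> = load at bus k, V_bus<k> = voltage
    at bus k, L_total = total network load, load_shedding = operating flag.
    """
    load_cols = [c for c in df.columns if c.startswith("L_bus")]
    volt_cols = [c for c in df.columns if c.startswith("V_bus")]
    target = f"V_bus{target_bus}"
    y = df[target]
    if case == 1:
        # Equation (5): own bus load and total load only.
        X = df[[f"L_bus{target_bus}", "L_total"]]
    elif case == 2:
        # Equation (6): all bus loads, total load, voltages of the other
        # buses and the operating condition.
        other_volts = [c for c in volt_cols if c != target]
        X = df[load_cols + ["L_total"] + other_volts + ["load_shedding"]]
    else:
        raise ValueError("case must be 1 or 2")
    return X, y

# Toy values, for illustration only.
toy = pd.DataFrame({
    "L_bus2": [20.0, 25.0, 30.0],
    "L_bus3": [45.0, 50.0, 40.0],
    "L_total": [65.0, 75.0, 70.0],
    "V_bus2": [0.98, 0.97, 0.96],
    "V_bus3": [0.97, 0.95, 0.96],
    "load_shedding": [0, 0, 1],
})

X1, y1 = build_voltage_features(toy, target_bus=2, case=1)
X2, y2 = build_voltage_features(toy, target_bus=2, case=2)
print(list(X1.columns))
print(list(X2.columns))
```

The target voltage is deliberately excluded from the case 2 inputs so that the model never sees the quantity it is asked to predict.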
FIGURE 22. Correlation matrix of NERC dataset of 330 kV load buses in Nigeria grid network.

Also, the load prediction was done based on the one-week hourly data obtained from NERC. The Lagos load bus was predicted using the input variable formulations of equation (7) and equation (8), based on the network topology behavior, because all the 21 load buses in the network are interconnected. The network topology behavior considered was all the load buses connected to the network and the total load in the network. Case 1 is when the load bus and the total load were used as input to the machine learning algorithm, and case 2 is when all the load buses in the network and the total load were used as input to the model, as presented in equation (7) and equation (8) respectively. Equation (7) represents the individual bus parameters and equation (8) represents the network topology behavior.

$$L_i = F(L_i, L_j) \tag{7}$$

$$L_i = F(L_i, L_j, L_T) \tag{8}$$

where $L_i$, $L_j$ and $L_T$ are the load at bus i, the load at bus j and the total load in the network.

Moreover, the approach was also implemented for hourly frequency prediction, using the isolated station parameters as the input formulation to the machine learning model and the use of network topology behavior as the input to the model.


The approach was tested on a dataset obtained from a 330/132 kV transmission station in Nigeria, which consists of three outgoing 132 kV feeders. The dataset consists of the bus load, voltage and frequency on each bus and the total load from the station. The case 1 and case 2 input formulations to the model, based on the data obtained, are presented in equation (9) and equation (10) respectively. Case 1 uses the load and voltage of the bus of the targeted variable, and case 2 uses the network topology parameters that are connected with the targeted variable in the network.

$$F_i = F(L_{330}, V_{330}) \tag{9}$$

$$F_i = F(L_{330}, V_{330}, L_i, L_j, L_k, V_i, V_j, V_k) \tag{10}$$

where $L_{330}$ and $V_{330}$ are the load and voltage on the 330 kV bus in the network, and $L_i$, $L_j$, $L_k$, $V_i$, $V_j$ and $V_k$ are the loads and voltages on the three outgoing feeders i, j and k respectively.

However, equation (6), equation (8) and equation (10) can be modified to suit any operating condition in any power system network for voltage, load and frequency prediction respectively. Where available, the tariff rate, the generating capacity, the bus voltage, etc. can be considered as part of the network topology behavior data.

V. MODEL USED FOR PREDICTION AND EVALUATION
The machine learning techniques for power system prediction were implemented in a Python Jupyter notebook, into which the simulated IEEE bus dataset, the historical NERC dataset and the TCN 330/132 kV dataset were imported using the pandas library. The step-by-step approach for implementing the machine learning techniques for voltage, load and frequency prediction is presented in Figure 20. The data were preprocessed by cleaning and checking for missing data, since raw data tend to include missing values. Two approaches were used, the identification approach and the treatment approach. There is no missing data in the simulated dataset, therefore treatment was not required. However, in the NERC dataset there are four rows with missing values, and these rows were removed from the dataset since they can affect the accuracy of the model. In the TCN dataset there are no missing values. This makes the shapes of the simulated dataset, the NERC dataset and the TCN dataset (36, 11), (164, 22) and (60, 7) respectively. Moreover, the power system is dynamic and the data collected are complex and diverse, which can affect the accuracy of the machine learning algorithm. Hence, data normalization was done by setting the data values within a range. There are two common approaches to data normalization: Z-score normalization and Min-Max normalization, which is also referred to as unity normalization. Reference [90] stated that Z-score normalization is not good for energy data; therefore, the Min-Max normalization presented in equation (11) is used in this research. Also, feature selection was carried out using correlation, as presented in equation (12).

$$y_n = \frac{y - \min(y)}{\max(y) - \min(y)} \tag{11}$$

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \tag{12}$$

where $y_n$ is the normalized value, $y$ is the value to be normalized, $\min(y)$ is the minimum value in variable y, and $\max(y)$ is the maximum value in variable y. $r$ is the correlation coefficient, $x_i$ is the first variable, $y_i$ is the second variable, $\bar{x}$ is the mean of variable x, and $\bar{y}$ is the mean of variable y.

FIGURE 23. Correlation matrix of TCN dataset for 330/132 kV substation.

The correlation results of all the variables in the simulated dataset, the NERC dataset and the TCN dataset are presented in Figures 21 to 23 respectively. The results in Figure 21 indicate that there is strong correlation between all the voltages in the simulated network. However, the correlation values are not valid for the voltage of Bus 1, because the voltage of Bus 1 remains constant under the four scenarios considered. Likewise, there is strong correlation between the loads on all the load buses in the IEEE Bus 5 dataset. Also, the correlation results of the NERC dataset presented in Figure 22 indicate that there is strong correlation between all the load buses in the NERC dataset and the total load in the network. Among the load buses themselves, the correlation is strong between some selected load buses and weak between others. Since there is a relationship between all the load buses in the NERC network and the total load, the dynamic nature of the power system based on its network topology should be considered in predicting any of the network variables, because most power systems are interconnected. In addition, Figure 23 shows that there is a relationship between the frequency and the load and voltage of each of the outgoing buses in the network for the TCN dataset. Irrespective of the strength of the correlation, all the features that have a correlation are selected as input to the model because of the dynamic nature of the power system. After feature extraction, the voltage of all the load buses in


the IEEE Bus 5 network was predicted using the machine learning approach and the input variable formulations of the case 1 and case 2 scenarios. The dataset was split into a training set and a testing set: 80% of the data was used for training each model and 20% was used for testing. Machine learning models are classified as supervised and unsupervised. In this research, linear regression and the decision tree regressor, which are supervised machine learning models, were used to predict the voltage of all the load buses of the IEEE Bus 5 network using equation (14). However, the Bus 1 voltage was not predicted because it has no relationship with the other variables in the network, as shown in the correlation matrix result, and it is not a load bus in the network.

Moreover, the models were also used to predict the Lagos load bus in the NERC dataset because Lagos is an industrial city and it accounted for the highest load consumption in the NERC dataset.

FIGURE 24. Voltage prediction of bus 2 in IEEE Bus 5 network using regression, decision tree and LSTM.

FIGURE 25. Voltage prediction of bus 3 in IEEE Bus 5 network using regression, decision tree and LSTM.
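The preprocessing steps described in this section (removing rows with missing values, Min-Max normalization as in equation (11), correlation as in equation (12), and an 80/20 split) can be sketched as follows. This is a minimal sketch, not the authors' notebook: the column names and values are made up, and the split is taken in row order, which is an assumption because the paper does not state how the rows were assigned to the training and testing sets.

```python
import numpy as np
import pandas as pd

def min_max_normalize(df):
    """Scale every column to the range [0, 1], as in equation (11)."""
    return (df - df.min()) / (df.max() - df.min())

def pearson_r(x, y):
    """Correlation coefficient of equation (12)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Illustrative values only; one row has a missing reading.
raw = pd.DataFrame({
    "L_bus_a": [100.0, 120.0, np.nan, 150.0, 130.0, 110.0],
    "L_total": [400.0, 450.0, 470.0, 520.0, 480.0, 430.0],
})

clean = raw.dropna()                # treatment: drop rows with missing values
scaled = min_max_normalize(clean)   # equation (11), column by column
r = pearson_r(scaled["L_bus_a"], scaled["L_total"])  # equation (12)

split = int(0.8 * len(scaled))      # 80% training, 20% testing, in row order
train, test = scaled.iloc[:split], scaled.iloc[split:]
print(clean.shape, round(r, 3), len(train), len(test))
```

Note that a column that never changes has a zero denominator in both equations, which matches the remark that the correlation values for the constant Bus 1 voltage are not valid.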
$$y_k = \alpha_0 + \alpha_1 x_k + \varepsilon_k \tag{13}$$

When there is more than one variable x, the equation becomes a multiple linear regression, as presented in equation (14).

$$y = \alpha_0 + \alpha_1 x_1 + \dots + \alpha_n x_n + \varepsilon_k \tag{14}$$

where $y$ is the predicted value of the dependent variable, $\alpha_0$ is the intercept when all the variables are set to zero, $\alpha_1$ is the coefficient of the first independent variable, $\alpha_n$ is the coefficient of the last independent variable and $\varepsilon_k$ is the error in the model when predicting k. The decision tree classifier was not used because the variable to be predicted is continuous; that is the reason for using the decision tree regressor.

FIGURE 26. Voltage prediction of bus 4 in IEEE Bus 5 network using regression, decision tree and LSTM.

Moreover, LSTM was also used for voltage and load prediction. The voltage of the load bus was predicted on the IEEE Bus 5 network. The LSTM consists of three major gates: the input gate, the forget gate and the output gate. The computation of the LSTM is shown by equations (15) to (19), adopted from [6].

$$i(t) = \alpha(W_i \ast h(t-1) + k_i) \tag{15}$$

$$f(t) = \alpha(W_f \ast h(t-1) + k_f) \tag{16}$$


$$O(t) = \alpha(W_0 \ast h(t-1) + k_0) \tag{17}$$

$$C(t) = f(t) \odot C(t-1) + i(t) \odot \alpha(W_c \ast h(t-1) + k_c) \tag{18}$$

$$h(t) = \beta(C(t)) \odot O(t) \tag{19}$$

where $i(t)$ is the input gate, $f(t)$ is the forget gate, $O(t)$ is the output gate, $C(t)$ is the state of the cell that encodes information from the input sequence, $h(t)$ is the network output, $W_i$, $W_f$, $W_0$ are parameters to be learned, $k_i$, $k_f$, $k_0$, $k_c$ are bias vectors, $\beta$ is the hyperbolic tangent and $\alpha$ is the sigmoid activation function.

TABLE 18. Error in voltage prediction model using case 1.

TABLE 19. Error in voltage prediction model using case 2.

TABLE 20. Error in voltage prediction model using LSTM for voltage prediction.

FIGURE 27. Voltage prediction of bus 5 in IEEE Bus 5 network using regression, decision tree and LSTM.
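A single LSTM time step in the spirit of equations (15) to (19) can be sketched in NumPy. The sketch follows the commonly used formulation, which is an assumption here: each gate sees the previous output concatenated with the current input `x_t`, and the candidate cell state uses the hyperbolic tangent, whereas the equations as printed show only h(t-1) and the sigmoid. The names `lstm_step` and `params`, the layer sizes and the random weights are illustrative and do not come from the paper or from [6].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; gates act on [h(t-1), x(t)] (assumed formulation)."""
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(params["W_i"] @ z + params["k_i"])    # input gate, cf. (15)
    f_t = sigmoid(params["W_f"] @ z + params["k_f"])    # forget gate, cf. (16)
    o_t = sigmoid(params["W_o"] @ z + params["k_o"])    # output gate, cf. (17)
    c_hat = np.tanh(params["W_c"] @ z + params["k_c"])  # candidate state (tanh assumed)
    c_t = f_t * c_prev + i_t * c_hat                    # cell state, cf. (18)
    h_t = np.tanh(c_t) * o_t                            # network output, cf. (19)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                                      # illustrative sizes
params = {}
for gate in ("i", "f", "o", "c"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
    params[f"k_{gate}"] = np.zeros(n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):                  # five random time steps
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

In practice the weights are learned by backpropagation through time in a deep learning library; the loop above only shows how the gates combine at each step.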
The models were evaluated using mean absolute percentage error (MAPE), mean squared error (MSE) and root mean squared error (RMSE), as presented in equations (20) to (22).

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t^p - y_t}{y_t}\right| \times 100 \tag{20}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t^p - y_t\right)^2 \tag{21}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t^p - y_t\right)^2} \tag{22}$$

where $y_t^p$ is the predicted value at any time t, $y_t$ is the actual value at time t, and $n$ is the total number of predicted values.

The input formulation for the regression model, the decision tree and the LSTM used to predict the voltage profile of the load buses in the IEEE Bus 5 network is based on the case 1 and case 2 input formulations in equations (5) and (6). Also, the input formulation for the regression model, the decision tree and the LSTM used for predicting the Lagos load bus in the NERC dataset is based on the case 1 and case 2 input formulations in equations (7) and (8). Moreover, the input formulation for the regression model, the decision tree and the LSTM used for predicting the frequency of the 330/132 kV station in the TCN dataset is based on case 1 and case 2 in equations (9) and (10).

This approach can be implemented in any power system simulation software that can work with a Python API, such as NEPLAN, DIgSILENT PowerFactory and Electrical Transient Analyzer Program (ETAP).

VI. RESULTS OF THE VOLTAGE PREDICTION USING THE PROPOSED INPUT FORMULATION VARIABLES
The load bus voltage prediction in the IEEE Bus 5 network based on the simulated data is presented in this section. The voltage predictions of Bus 2, Bus 3, Bus 4 and Bus 5 when the input formulations in case 1 and case 2 were used as the input to the regression, decision tree and LSTM models are presented in Figures 24 to 27 respectively. The metric evaluation is presented in Table 18 to Table 20. In Figure 24, the predicted


voltage values at each instance of prediction are the same for the regression and decision tree models when the case 1 input formulation was used as the input to the two models to predict the Bus 2 voltage. Also, the predicted values are the same for the two models at each instance of prediction when the case 2 input formulation was used as the input to the models.

However, with the LSTM model the predicted voltage values of Bus 2 differ at each instance of prediction from those of the regression and decision tree models for both the case 1 and case 2 input formulations. Moreover, in Figure 25, the predicted voltage of Bus 3 at each instance of prediction is the same for the decision tree and regression models and differs for LSTM under the case 1 input formulation. Likewise, the predicted voltage is the same for the regression and decision tree models but differs for LSTM when the case 2 input formulation was used. The same was obtained in the voltage prediction of Bus 4 and Bus 5, as presented in Figures 26 and 27 respectively: the predicted bus voltage is the same for the regression and decision tree models and differs for LSTM when the input formulations in case 1 and case 2 were used as the input to the model.

The results of the model accuracy when the input formulation of case 1 is used are presented in Table 18. RMSE values of 0.0351, 0.2315, 0.0596 and 0.2482 were obtained for both the regression and decision tree models for Bus 2, Bus 3, Bus 4 and Bus 5 respectively. Moreover, the MAPE under the case 1 input formulation for both regression and decision tree is 0.0376%, 0.0711%, 0.0679% and 0.0610% for Bus 2 to Bus 5 respectively. This model is good for voltage prediction since the MAPE is less than 5%, which is the acceptable variation for the voltage level. The use of the model for voltage prediction can help system operators to initiate the necessary action when the voltage is below the acceptable limit. Moreover, the model evaluation metrics presented in Table 19, based on the input formulation of case 2, indicate that the regression and decision tree models perform better under the case 2 input formulation, as the RMSE and MAPE are very close to zero. The model evaluation for LSTM presented in Table 20 for the prediction of Bus 2 to Bus 5 using the input formulation of case 1 indicates RMSE of 0.0514, 0.0835, 0.0828 and 0.095 respectively.

Also, the RMSE for the predicted bus voltage when the input formulation in case 2 was used is 0.045, 0.0686, 0.0721 and 0.0800 for Bus 2 to Bus 5 respectively, which is smaller than when the input formulation of case 1 was used. This indicates that machine learning models for voltage prediction tend to perform better when the network topology behavior is considered as the input to the model.

FIGURE 28. Load prediction using regression with problem formulation case 1.

FIGURE 29. Load prediction using decision tree regression with problem formulation case 1.
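The RMSE and MAPE values quoted in this section follow the usual definitions of the evaluation metrics in equations (20) to (22). A minimal NumPy sketch is given below; the function names and the sample per-unit voltages are illustrative assumptions and are not values from the paper's tables.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, equation (20), in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)

def mse(y_true, y_pred):
    """Mean squared error, equation (21)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

def rmse(y_true, y_pred):
    """Root mean squared error, equation (22)."""
    return float(np.sqrt(mse(y_true, y_pred)))

# Made-up per-unit bus voltages, for illustration only.
actual = [1.00, 0.98, 0.96, 0.95]
predicted = [0.99, 0.98, 0.97, 0.95]

print(mape(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
```

MAPE divides by the actual value, so it is undefined when an actual value is zero; this is not an issue for bus voltages or frequency, but it can matter for a load series that contains zero readings.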
VII. RESULTS OF THE LOAD PREDICTION USING THE PROPOSED INPUT FORMULATION VARIABLES
The load prediction using the problem formulations in equations (7) and (8) was carried out using 80% of the 7-day hourly data for training and 20% for testing, in order to validate the effect of network topology behavior on the prediction model. The results of the prediction using regression, decision tree regressor and LSTM are presented in Figures 28 to 33 and the results of the evaluation metrics are presented in Table 21 and Table 22. The results in Figures 28 to 30 indicate the prediction of load based on regression, decision tree and


LSTM for the case 1 input formulation. The result in Figure 28 compares the predicted Lagos load with the actual load using the regression model with the input formulation of case 1. The result indicates that there is variation between the actual and predicted Lagos load. Also, the results presented in Figure 29 compare the predicted Lagos load and the actual load when the decision tree model was used with the case 1 input formulation, in which there is a variation between the predicted values and the actual values.

FIGURE 30. Load prediction using LSTM regression with problem formulation case 1.

FIGURE 31. Load prediction using regression with case 2 input formulation.

FIGURE 32. Load prediction using decision tree with case 2 input formulation.

FIGURE 33. Load prediction using LSTM with case 2 input formulation.

TABLE 21. Error in load prediction using case 1.

Moreover, the results presented in Figure 30 indicate the variations between the predicted Lagos load for 3 hours and 12 hours and the actual Lagos load in the NERC dataset using the LSTM model with the input formulation of case 1. The


results of the evaluation metrics when the input formulation of case 1 was used, as presented in Table 21, show that the RMSE values are 87.83, 95.42, 107.43 and 236.46 for regression, decision tree, and LSTM for 3-hour and 12-hour load prediction respectively. Although the RMSE is high, the LSTM error for the 3-hour prediction is lower than that of the 12-hour prediction. This implies that the predictive model tends to be more accurate for VSTLP. Also, the results presented in Figures 31 to 33 indicate the variation between the actual and predicted load of the Lagos bus when the regression, decision tree and LSTM models were used respectively. The evaluation metrics presented in Table 22 were used to test the performance of the models in terms of accuracy. The RMSE reduced to 5.0185, 57.59, 21.37 and 167.7 for the regression model, decision tree regressor, LSTM for 3-hour prediction and LSTM for 12-hour prediction respectively when the input formulation of case 2 was used as the input to the models. Load prediction in an interconnected grid network using the other load buses in the network as multiple inputs to the model tends to give better accuracy. Comparing the results in Table 21 and Table 22, the errors in the prediction models for regression, decision tree and LSTM reduced when the input formulation of case 2 was used compared to when the input formulation of case 1 was used. This indicates that the prediction model tends to be more accurate when network topology behavior variables are used than when only the parameters of the predicted bus are used. However, the prediction accuracy of LSTM is low compared to the regression and decision tree models. This is due to the small dataset, because LSTM performs better with a large dataset.

TABLE 22. Error in load prediction using case 2.

FIGURE 34. Frequency prediction using regression with problem formulation case 1.

FIGURE 35. Frequency prediction using decision tree with problem formulation case 1.

VIII. RESULTS OF FREQUENCY PREDICTION USING THE PROPOSED INPUT FORMULATION VARIABLES
The results of the prediction when the parameters of an isolated single bus in the network were used, and when the network topology behavior features were used as the input formulation variables, are presented in Figures 34 to 39. The results in Figures 34 to 36 show the regression model, DT and LSTM for case 1, when only the isolated bus parameters were used as the input to the model. There is variation between the predicted frequency and the actual value for the regression model, DT and LSTM. The variation in each model was tested using the evaluation metrics to determine the error in the model. The results presented in Table 23 show RMSE of 0.4399, 0.5502, 0.5182 and 0.9672 for the regression model, DT, LSTM for 3 hours and LSTM for 12 hours respectively. Since the regression model has the lowest error, it performs better than DT and LSTM; this may be due to the insufficient dataset and the fewer features used for prediction. In addition, the LSTM model performs better for the 3-hour prediction than for the 12-hour prediction. This implies that the model tends to be more accurate when looking to the near future than to the far future. Frequency is a critical parameter in a power system, and operation outside the recommended acceptable values can lead to collapse of the network. Therefore, a short-time prediction approach is recommended for predicting frequency with high accuracy. Furthermore, the study considered case 2, which uses the network topology behavior as the input formulation to the models. The results presented in Figures 37 to 39 indicate that the regression, DT and LSTM models perform better than when case 1 was used as the input formulation. This is because more features have been considered. Moreover, the evaluation metrics are presented in Table 24 to determine the error in the models for the case 2 input formulation.

The results show that the DT model, with the least RMSE of 0.000, performs better than regression and LSTM because DT tends to capture the relationship between the features more accurately than regression. However, the DT model giving an RMSE of zero indicates that the dataset is small, and


DT can easily memorize the entire dataset, including noise, leading to zero error. This model might fail to generalize to new data. Despite the high ability of LSTM to capture the nonlinear relationship between features, its accuracy is still low compared to DT. This is due to the small dataset: the larger the dataset, the better the accuracy of the LSTM model. The findings established that in predicting the frequency of a power system network, the use of network topology behavior as input formulation increases the accuracy of the model.

FIGURE 36. Frequency prediction using LSTM with problem formulation case 1.

FIGURE 37. Frequency prediction using regression with problem formulation case 2.

FIGURE 38. Frequency prediction using decision tree with problem formulation case 2.

FIGURE 39. Frequency prediction using LSTM with problem formulation case 2.

TABLE 23. Error in frequency prediction using case 1.

TABLE 24. Error in frequency prediction using case 2.

IX. CONCLUSION
This study has considered various literature on predicting voltage, load and frequency in power system networks. Many researchers have explored machine learning techniques for load prediction from STLP to LTLP. However, machine learning techniques for voltage and frequency prediction in power systems are not fully explored. More research has been carried out on network voltage stability, for which few researchers used machine learning for prediction. The analysis carried out on load, voltage and frequency prediction using network topology behavior variables indicates that, in predicting load, frequency and voltage in a power system, the predictive model performs better using network topology behavior variables as input to the model than when individual data associated with the isolated bus in the network is used.

The research established that the use of network topology behavior features as an input to a machine learning model for power system prediction gives better accuracy than when the isolated data for a particular part of the network is used


for the prediction. Also, the use of operating condition variables, such as load shedding and compensators for the voltage profile, together with the parameters of other buses in an interconnected network improves the prediction accuracy. This research therefore suggests that, for high accuracy in power system prediction, the input variables to the model should be based on network topology behavior features such as the loading of buses, the voltage of all buses and the operating conditions (such as load shedding, the use of a compensator for voltage improvement, tariff rate, etc.). Also, research can be carried out on hybridizing the network topology behavior variables with the exogenous dataset associated with an interconnected grid network for power system parameter prediction using machine learning. Furthermore, the specific task, the features of the data, and the intended trade-offs between simplicity, adaptability, efficiency, real-time application, resilience and sensitivity all play a role in the selection of a machine learning algorithm in power systems. Every algorithm has advantages and disadvantages, and choosing the best one requires taking into account the particular needs of the datasets used in power system applications.

REFERENCES
[1] X. Luo, J. Wang, M. Dooner, and J. Clarke, "Overview of current development in electrical energy storage technologies and the application potential in power system operation," Appl. Energy, vol. 137, pp. 511–536, Jan. 2015.
[2] D. Novosel, M. M. Begovic, and V. Madani, "Shedding light on blackouts," IEEE Power Energy Mag., vol. 2, no. 1, pp. 32–43, Jan. 2004.
[3] W. Kong, Z. Y. Dong, Y. Jia, D. J. Hill, Y. Xu, and Y. Zhang, "Short-term residential load forecasting based on LSTM recurrent neural network," IEEE Trans. Smart Grid, vol. 10, no. 1, pp. 841–851, Jan. 2019.
[4] H. Wang, Y. Zhao, and S. Tan, "Short-term load forecasting of power system based on time convolutional network," in Proc. 8th Int. Symp. Next Gener. Electron. (ISNE), Zhengzhou, China, Oct. 2019, pp. 1–3.
[5] M. Zou, D. Fang, G. Harrison, and S. Djokic, "Weather based day-ahead and week-ahead load forecasting using deep recurrent neural network," in Proc. IEEE 5th Int. Forum Res. Technol. Soc. Ind. (RTSI), Florence, Italy, Sep. 2019, pp. 341–346.
[6] B. Farsi, M. Amayri, N. Bouguila, and U. Eicker, "On short-term load forecasting using machine learning techniques and a novel parallel deep LSTM-CNN approach," IEEE Access, vol. 9, pp. 31191–31212, 2021.
[7] N. Singh, C. Vyjayanthi, and C. Modi, "Multi-step short-term electric load forecasting using 2D convolutional neural networks," in Proc. IEEE-HYDCON, Hyderabad, India, Sep. 2020, pp. 1–5.
[8] C. Tian, J. Ma, C. Zhang, and P. Zhan, "A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network," Energies, vol. 11, no. 12, p. 3493, Dec. 2018, doi: 10.3390/en11123493.
[9] P.-H. Kuo and C.-J. Huang, "A high precision artificial neural networks model for short-term energy load forecasting," Energies, vol. 11, no. 1, p. 213, Jan. 2018, doi: 10.3390/en11010213.
[10] L. Hu and G. Taylor, "A novel hybrid technique for short-term electricity
[14] A. Albatayneh, A. Juaidi, R. Abdallah, A. Peña-Fernández, and F. Manzano-Agugliaro, "Effect of the subsidised electrical energy tariff on the residential energy consumption in Jordan," Energy Rep., vol. 8, pp. 893–903, Nov. 2022.
[15] L. Hernandez, C. Baladron, J. M. Aguiar, B. Carro, A. J. Sanchez-Esguevillas, J. Lloret, and J. Massana, "A survey on electric power demand forecasting: Future trends in smart grids, microgrids and smart buildings," IEEE Commun. Surveys Tuts., vol. 16, no. 3, pp. 1460–1495, 3rd Quart., 2014.
[16] W.-C. Hong, "Electric load forecasting by seasonal recurrent SVR (support vector regression) with chaotic artificial bee colony algorithm," Energy, vol. 36, no. 9, pp. 5568–5578, Sep. 2011.
[17] I. K. Nti, S. Asafo-Adjei, and M. Agyemang, "Predicting monthly electricity demand using soft-computing technique," Int. Res. J. Eng. Technol., vol. 6, pp. 1967–1973, Jan. 2019.
[18] S. Li, P. Wang, and L. Goel, "Short-term load forecasting by wavelet transform and evolutionary extreme learning machine," Electric Power Syst. Res., vol. 122, pp. 96–103, May 2015.
[19] P. N. Kouroupetroglou, "Machine learning techniques for short-term electric load forecasting," Fac. Sci., School Inform., Dept. Comput. Sci., Aristotle Univ. Thessaloniki, Thessaloniki, Greece, Tech. Rep., 2017, pp. 1–73.
[20] S. Ghore and A. Goswami, "Short-term load forecasting of Chhattisgarh grid using artificial neural network," Int. J. Eng. Dev. Res., vol. 3, pp. 391–397, Jan. 2011.
[21] H. Kuhba and H. A. H. Al-Tamimi, "Power system short-term load forecasting using artificial neural networks," Int. J. Eng. Dev. Res., vol. 4, pp. 78–87, Jan. 2016.
[22] M. U. Fahad and N. Arbab, "Factor affecting short term load forecasting," J. Clean Energy Technol., vol. 2, no. 4, pp. 305–309, 2014.
[23] S. Zakarya, H. Abbas, and M. Belal, "Long-term deep learning load forecasting based on social and economic factors in the Kuwait region," J. Theor. Appl. Inf. Technol., vol. 95, pp. 1524–1535, Jan. 2017.
[24] P. Stavast, "Prediction of energy consumption using historical data and Twitter," Fac. Math. Natural Sci., Univ. Groningen, Groningen, The Netherlands, Tech. Rep., 2014, pp. 1–43.
[25] R. Angamuthu Chinnathambi, A. Mukherjee, M. Campion, H. Salehfar, T. Hansen, J. Lin, and P. Ranganathan, "A multi-stage price forecasting model for day-ahead electricity markets," Forecasting, vol. 1, no. 1, pp. 26–46, Jul. 2018, doi: 10.3390/forecast1010003.
[26] N. Mohan, K. P. Soman, and S. Sachin Kumar, "A data-driven strategy for short-term electric load forecasting using dynamic mode decomposition model," Appl. Energy, vol. 232, pp. 229–244, Dec. 2018.
[27] B. F. Hobbs, S. Jitprapaikulsarn, S. Konda, V. Chankong, K. A. Loparo, and D. J. Maratukulam, "Analysis of the value for unit commitment of improved load forecasts," IEEE Trans. Power Syst., vol. 14, no. 4, pp. 1342–1348, Nov. 1999.
[28] D. W. Bunn, "Forecasting loads and prices in competitive power markets," Proc. IEEE, vol. 88, no. 2, pp. 163–169, Feb. 2000.
[29] N. Mlilo, J. Brown, and T. Ahfock, "Impact of intermittent renewable energy generation penetration on the power system networks - A review," Technol. Econ. Smart Grids Sustain. Energy, vol. 6, no. 1, pp. 25–35, Dec. 2021.
[30] M. R. Vedady Moghadam, R. T. B. Ma, and R. Zhang, "Distributed frequency control in smart grids via randomized demand response," IEEE Trans. Smart Grid, vol. 5, no. 6, pp. 2798–2809, Nov. 2014.
[31] P. J. C. Vogler-Finck and W.-G. Früh, "Evolution of primary frequency control requirements in great Britain with increasing wind generation," Int. J. Electr. Power Energy Syst., vol. 73, pp. 377–388, Dec. 2015.
[32] K. Dehghanpour and S. Afsharnia, "Electrical demand side contribution to frequency control in power systems: A review on technical aspects,"
price forecasting in UK electricity markets,’’ J. Int. Council Electr. Eng., Renew. Sustain. Energy Rev., vol. 41, pp. 1267–1276, Jan. 2015.
vol. 4, no. 2, pp. 114–120, Apr. 2014, doi: 10.5370/jicee.2014.4.2.114. [33] S. A. Pourmousavi and M. H. Nehrir, ‘‘Real-time central demand response
[11] S. Bouktif, A. Fiaz, A. Ouni, and M. Serhani, ‘‘Optimal deep learning for primary frequency regulation in microgrids,’’ IEEE Trans. Smart Grid,
LSTM model for electric load forecasting using feature selection and vol. 3, no. 4, pp. 1988–1996, Dec. 2012.
genetic algorithm: Comparison with machine learning approaches,’’ Ener- [34] C. Zhao, U. Topcu, and S. H. Low, ‘‘Frequency-based load control in power
gies, vol. 11, no. 7, p. 1636, Jun. 2018, doi: 10.3390/en11071636. systems,’’ in Proc. Amer. Control Conf. (ACC), Jun. 2012, pp. 4423–4430.
[12] P. J. Zarco-Perinan, I. M. Zarco-Soto, and F. J. Zarco-Soto, ‘‘Influence [35] P. Gupta, R. S. Bhatia, and D. K. Jain, ‘‘Average absolute frequency
of the population density of cities on energy consumption of their house- deviation value based active islanding detection technique,’’ IEEE Trans.
holds,’’ Sustainability, vol. 13, pp. 1–15, 2021, doi: 10.3390/su13147542. Smart Grid, vol. 6, no. 1, pp. 26–35, Jan. 2015.
[13] G. K. Akara, B. Hingray, A. Diawara, and A. Diedhiou, ‘‘Effect of weather [36] P. Moutis and N. D. Hatziargyriou, ‘‘Decision trees-aided active power
on monthly electricity consumption in three coastal cities in West Africa,’’ reduction of a virtual power plant for power system over-frequency mitiga-
AIMS Energy, vol. 9, no. 3, pp. 446–464, 2021. tion,’’ IEEE Trans. Ind. Informat., vol. 11, no. 1, pp. 251–261, Feb. 2015.


[37] S. Sun, M. Dong, and B. Liang, “Real-time welfare-maximizing regulation allocation in dynamic aggregator-EVs system,” IEEE Trans. Smart Grid, vol. 5, no. 3, pp. 1397–1409, May 2014.
[38] M. R. Tur, M. Wadi, A. Shobole, and S. Ay, “Load frequency control of two area interconnected power system using fuzzy logic control and PID controller,” in Proc. 7th Int. Conf. Renew. Energy Res. Appl. (ICRERA), Oct. 2018, pp. 1253–1258.
[39] H. Zhu, Y. Hu, and X. Wang, “Frequency stability control method of AC/DC power system based on convolutional neural network,” in Proc. IEEE Sustain. Power Energy Conf. (iSPEC), Nov. 2020, pp. 2609–2615.
[40] R. Ramprasad, R. Batra, G. Pilania, A. Mannodi-Kanakkithodi, and C. Kim, “Machine learning in materials informatics: Recent applications and prospects,” Npj Comput. Mater., vol. 3, no. 1, pp. 54–64, Dec. 2017.
[41] Q. Wang, F. Li, Y. Tang, and Y. Xu, “Integrating model-driven and data-driven methods for power system frequency stability assessment and control,” IEEE Trans. Power Syst., vol. 34, no. 6, pp. 4557–4568, Nov. 2019.
[42] J. Dong, X. Ma, S. M. Djouadi, H. Li, and Y. Liu, “Frequency prediction of power systems in FNET based on state-space approach and uncertain basis functions,” IEEE Trans. Power Syst., vol. 29, no. 6, pp. 2602–2612, Nov. 2014.
[43] I. Jayawardene and G. K. Venayagamoorthy, “Cellular computational extreme learning machine network based frequency predictions in a power system,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), May 2017, pp. 3377–3384.
[44] A. Fernández-Guillamón, E. Gómez-Lázaro, E. Muljadi, and Á. Molina-García, “Power systems with high renewable energy sources: A review of inertia and frequency control strategies over time,” Renew. Sustain. Energy Rev., vol. 115, Nov. 2019, Art. no. 109369.
[45] H. Bevrani, A. Ghosh, and G. Ledwich, “Renewable energy sources and frequency regulation: Survey and new perspectives,” IET Renew. Power Gener., vol. 4, no. 5, pp. 438–457, 2010.
[46] K. Zhang, B. Wang, D. Liu, J. Zhao, Y. Guo, and Z. Wu, “Prediction modeling of frequency response characteristic of power system based on historical data,” in Proc. IEEE/IAS Ind. Commercial Power Syst. Asia, Jul. 2020, pp. 1486–1490.
[47] M. Rotimi Adu, I. Osilama Oshiobugie, and T. David Makanju, “Electrical load flow analysis of Auchi distribution network without load shedding,” Int. J. Eng. Res. Updates, vol. 4, no. 2, pp. 35–44, May 2023.
[48] T.-H. Chen, L.-S. Chiang, and N.-C. Yang, “Examination of major factors affecting voltage variation on distribution feeders,” Energy Buildings, vol. 55, pp. 494–499, Dec. 2012.
[49] R. Yan and T. K. Saha, “Investigation of voltage variations in unbalanced distribution systems due to high photovoltaic penetrations,” in Proc. IEEE Power Energy Soc. Gen. Meeting, Jul. 2011, pp. 1–8.
[50] A. Debnath and C. Nandi, “Voltage profile analysis during fault with STATCOM,” Int. J. Comput. Appl., vol. 72, no. 11, pp. 16–22, Jun. 2013.
[51] J. O. Petinrin and M. Shaabanb, “Impact of renewable generation on voltage control in distribution systems,” Renew. Sustain. Energy Rev., vol. 65, pp. 770–783, Nov. 2016.
[52] S. N. Salih and P. Chen, “On coordinated control of OLTC and reactive power compensation for voltage regulation in distribution systems with wind power,” IEEE Trans. Power Syst., vol. 31, no. 5, pp. 4026–4035, Sep. 2016.
[53] H. Bakir and A. A. Kulaksiz, “Modelling and voltage control of the solar-wind hybrid micro-grid with optimized STATCOM using GA and BFA,” Eng. Sci. Technol., Int. J., vol. 23, no. 3, pp. 576–584, Jun. 2020.
[54] V. B. Pamshetti, S. Singh, and S. P. Singh, “Combined impact of network reconfiguration and volt-VAR control devices on energy savings in the presence of distributed generation,” IEEE Syst. J., vol. 14, no. 1, pp. 995–1006, Mar. 2020.
[55] N. Tshivhase, A. N. Hasan, and T. Shongwe, “An average voltage approach to control energy storage device and tap changing transformers under high distributed generation,” IEEE Access, vol. 9, pp. 108731–108753, 2021, doi: 10.1109/ACCESS.2021.3101463.
[56] A. Moreno-Munoz, J. J. G. De-la-Rosa, M. A. Lopez-Rodriguez, J. M. Flores-Arias, F. J. Bellido-Outerino, and M. Ruiz-de-Adana, “Improvement of power quality using distributed generation,” Int. J. Electr. Power Energy Syst., vol. 32, no. 10, pp. 1069–1076, 2010.
[57] Z. J. Wang, “Analytical algorithm and application of critical clearing time for voltage sag of induction motor,” Shandong Univ., Tech., China, Rep., 2014.
[58] Y. Q. Jing, X. Y. Li, and X. M. Guo, “Transient voltage stability fast criterion considering induction motor load model,” Power Syst. Automat., vol. 35, no. 5, pp. 10–14, 2011.
[59] F. Karbalaei, M. Kalantar, and A. Kazemi, “Diagnosis of voltage collapse due to induction motor stalling using static analysis,” Energy Convers. Manag., vol. 49, no. 2, pp. 151–156, Feb. 2008.
[60] P. Sharma and A. Kumar, “Thevenin’s equivalent based P–Q–V voltage stability region visualization and enhancement with FACTS and HVDC,” Int. J. Electr. Power Energy Syst., vol. 80, pp. 119–127, Sep. 2016.
[61] S.-J. Chuang, C.-M. Hong, and C.-H. Chen, “Improvement of integrated transmission line transfer index for power system voltage stability,” Int. J. Electr. Power Energy Syst., vol. 78, pp. 830–836, Jun. 2016.
[62] W. C. Zhang, B. Zhang, and J. Pan, “Analysis of transient voltage instability mechanism based on grid load mutual feed characteristics of induction motor,” Power Syst. Autom., vol. 41, no. 7, pp. 8–14, 2017.
[63] Z. Y. Gu, Y. Tang, and J. Yi, “Analysis of interaction mechanism between power system angle instability and local induction motor instability,” Power Grid Technol., vol. 41, no. 8, pp. 2499–2505, 2017.
[64] R. Diao, Z. Wang, D. Shi, Q. Chang, J. Duan, and X. Zhang, “Autonomous voltage control for grid operation using deep reinforcement learning,” in Proc. IEEE Power Energy Soc. Gen. Meeting (PESGM), Aug. 2019, pp. 1–5.
[65] I. Oladeji, R. Zamora, and T. T. Lie, “Optimal placement of renewable energy sources distributed generation in an unbalanced network for modern grid operations,” in Proc. Int. Conf. Smart Energy Syst. Technol. (SEST), Sep. 2021, pp. 1–6.
[66] Y. Naderi, S. H. Hosseini, S. Ghassem Zadeh, B. Mohammadi-Ivatloo, J. C. Vasquez, and J. M. Guerrero, “An overview of power quality enhancement techniques applied to distributed generation in electrical distribution networks,” Renew. Sustain. Energy Rev., vol. 93, pp. 201–214, Oct. 2018.
[67] C. L. T. Borges and D. M. Falcão, “Optimal distributed generation allocation for reliability, losses, and voltage improvement,” Int. J. Electr. Power Energy Syst., vol. 28, no. 6, pp. 413–420, Jul. 2006.
[68] A. R. Gupta and A. Kumar, “Deployment of distributed generation with D-FACTS in distribution system: A comprehensive analytical review,” IETE J. Res., vol. 68, no. 2, pp. 1195–1212, Mar. 2022.
[69] A. S. N. Huda and R. Živanović, “Large-scale integration of distributed generation into distribution networks: Study objectives, review of models and computational tools,” Renew. Sustain. Energy Rev., vol. 76, pp. 974–988, Sep. 2017.
[70] O. G. I. Okwe Gerald Ibe, “Concepts of reactive power control and voltage stability methods in power system network,” IOSR J. Comput. Eng., vol. 11, no. 2, pp. 15–25, 2013.
[71] D. N. Kosterev, C. W. Taylor, and W. A. Mittelstadt, “Model validation for the August 10, 1996 WSCC system outage,” IEEE Trans. Power Syst., vol. 14, no. 3, pp. 967–979, Aug. 1999.
[72] Y. Ma, S. Lv, X. Zhou, and Z. Gao, “Review analysis of voltage stability in power system,” in Proc. IEEE Int. Conf. Mechatronics Autom. (ICMA), Aug. 2017, pp. 7–12.
[73] M. S. S. Danish, A. Yona, and T. Senjyu, “A review of voltage stability assessment techniques with an improved voltage stability indicator,” Int. J. Emerg. Electric Power Syst., vol. 16, no. 2, pp. 107–115, Apr. 2015.
[74] M. Z. El-Sadek, “Voltage instabilities subsequent to short-circuit recoveries,” Electric Power Syst. Res., vol. 21, no. 1, pp. 9–16, Apr. 1991.
[75] H. D. Sun, “Analysis and application of voltage stability considering induction motor load,” China Electr. Power Res. Inst., 2015.
[76] L. Li, C. Lu, and Z. G. Huang, “Analytical evaluation method for transient voltage stability of load nodes considering induction motor,” Power Syst. Automat., vol. 33, no. 7, pp. 1–5, 2009.
[77] R. Y. Tang, Voltage Stability Analysis of Power System. Beijing, China: Science Press Beijing, 2011.
[78] V. Malbasa, C. Zheng, P.-C. Chen, T. Popovic, and M. Kezunovic, “Voltage stability prediction using active machine learning,” IEEE Trans. Smart Grid, vol. 8, no. 6, pp. 3117–3124, Nov. 2017.
[79] A. Adhikari, S. Naetiladdanon, A. Sagswang, and S. Gurung, “Comparison of voltage stability assessment using different machine learning algorithms,” in Proc. IEEE 4th Conf. Energy Internet Energy Syst. Integr. (EI), China, Oct. 2020, pp. 2023–2026.
[80] H.-Y. Su and H.-H. Hong, “An intelligent data-driven learning approach to enhance online probabilistic voltage stability margin prediction,” IEEE Trans. Power Syst., vol. 36, no. 4, pp. 3790–3793, Jul. 2021.


[81] R. D. de Veaux, J. Schumi, J. Schweinsberg, and L. H. Ungar, “Prediction intervals for neural networks via nonlinear regression,” Technometrics, vol. 40, no. 4, pp. 273–282, Nov. 1998.
[82] A. Khosravi, E. Mazloumi, S. Nahavandi, D. Creighton, and J. W. C. van Lint, “Prediction intervals to account for uncertainties in travel time prediction,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 537–547, Jun. 2011.
[83] B. A. Wokoma, E. N. Osegi, and A. O. Idachaba, “Predicting voltage stability indices of Nigerian 330 kV 30 bus power network using an auditory machine intelligence technique,” in Proc. IEEE AFRICON, Sep. 2019, pp. 1–4, doi: 10.1109/AFRICON46755.2019.9133915.
[84] R. Yan and T. K. Saha, “Investigation of voltage stability for residential customers due to high photovoltaic penetrations,” IEEE Trans. Power Syst., vol. 27, no. 2, pp. 651–662, May 2012.
[85] C. W. Gellings, The Smart Grid: Enabling Energy Efficiency and Demand Response. Boca Raton, FL, USA: CRC Press, 2020.
[86] V. Calderaro, G. Conio, V. Galdi, and A. Piccolo, “Reactive power control for improving voltage profiles: A comparison between two decentralized approaches,” Electric Power Syst. Res., vol. 83, no. 1, pp. 247–254, Feb. 2012.
[87] U. Sultana, A. B. Khairuddin, M. M. Aman, A. S. Mokhtar, and N. Zareen, “A review of optimum DG placement based on minimization of power losses and voltage stability enhancement of distribution system,” Renew. Sustain. Energy Rev., vol. 63, pp. 363–378, Sep. 2016.
[88] B. Singh, V. Mukherjee, and P. Tiwari, “A survey on impact assessment of DG and FACTS controllers in power systems,” Renew. Sustain. Energy Rev., vol. 42, pp. 846–882, Feb. 2015.
[89] C. D. Iweh, S. Gyamfi, E. Tanyi, and E. Effah-Donyina, “Distributed generation and renewable energy integration into the grid: Prerequisites, push factors, practical options, issues and merits,” Energies, vol. 14, no. 17, p. 5375, Aug. 2021.
[90] S. Zhan, Z. Liu, A. Chong, and D. Yan, “Building categorization revisited: A clustering-based approach to using smart meter data for building energy benchmarking,” Appl. Energy, vol. 269, Jul. 2020, Art. no. 114920, doi: 10.1016/j.apenergy.2020.114920.

TOLULOPE DAVID MAKANJU received the B.Eng. and M.Eng. degrees in electrical and electronics engineering from the Federal University of Technology Akure, Nigeria, in 2018 and 2021, respectively. He is currently pursuing the Ph.D. degree with the University of Johannesburg, South Africa, with a special interest in applying artificial intelligence in power systems. From 2018 to 2021, he was with Transmission Company of Nigeria (TCN). He is a Lecturer with the Department of Electrical and Information Engineering, Achievers University, Nigeria.

THOKOZANI SHONGWE (Senior Member, IEEE) received the B.Eng. degree in electronic engineering from the University of Swaziland, Swaziland, in 2004, the M.Eng. degree in telecommunications engineering from the University of the Witwatersrand, South Africa, in 2006, and the D.Eng. degree from the University of Johannesburg, South Africa, in 2014. He is currently an Associate Professor of telecommunications and the Head of the School of Electrical and Electronic Engineering, University of Johannesburg. He is the Co-Founder of a research group at the University of Johannesburg called Artificial Intelligence for Electrical Engineering Applications (AI for EE Applications). This research group is currently composed of five staff members, two postdoctoral researchers working in the fields of telecommunications and machine learning, five Ph.D. students, and ten master’s degree students working in the fields of power line communications, visible light communications, application of ML in PLC, power systems, agriculture, and object tracking. His research interests include digital communications, error-correcting coding, power-line communications, cognitive radio, smart grids, visible light communications, machine learning, and artificial intelligence. He was a recipient of the 2014 University of Johannesburg Global Excellence Stature (GES) Award, which was awarded to him to carry out postdoctoral research with the University of Johannesburg. In 2016, he was a recipient of the TWAS-DFG Cooperation Visits Programme funding to do research in Germany. Other awards that he has received include the post-graduate merit award scholarship to pursue the master’s degree with the University of the Witwatersrand, in 2005, which is awarded on a merit basis. In 2012, he (and his co-authors) received the Best Student Paper Award at the IEEE ISPLC 2012 (power line communications conference), Beijing, China. In 2020, he (and his co-authors) received the Best Paper Award at the International Conference on Science and Technology (ICST 2020), Yogyakarta, Indonesia.

OLUWOLE JOHN FAMORIJI (Member, IEEE) received the B.Tech. degree in electrical and electronic engineering from the Ladoke Akintola University of Technology, Ogbomoso, Nigeria, in 2009, the M.Eng. degree in communications engineering from the Federal University of Technology Akure, Nigeria, in 2014, and the Ph.D. degree in electronic science and technology from the University of Science and Technology of China (USTC), Hefei, China, in 2019. He is currently an Associate Professor with Achievers University, Owo, Nigeria, and a Research Fellow with the University of Johannesburg, South Africa. His research interests include signals and systems, array processing, electromagnetic sensing, antenna, and propagation. He was a recipient of one of the Best Paper and Oral Presentation Awards from the 2018 IEEE International Conference on Integrated Circuits and Technology Applications (ICTA), Beijing, and the 2016 Innovation Spirit Award of the Micro-/Nano Electronics System Integration.
