Dhiman - Machine Intelligent and Deep Learning
DOI: 10.1002/2050-7038.12818
1 Department of Electrical Engineering, Adani Institute of Infrastructure Engineering, Ahmedabad, India
2 Department of Electrical Engineering, Institute of Infrastructure Technology Research and Management, Ahmedabad, India
Correspondence
Harsh S. Dhiman, Department of Electrical Engineering, Adani Institute of Infrastructure Engineering, Ahmedabad 382421, India.
Email: [email protected]
Summary
Fluctuation in wind speed over time can impact the operation of a utility grid. A sudden large-scale variation in turbine power, often termed a ramp event, can threaten the security of the power system. In this work, we predict ramp events through a hybrid method based on discrete wavelet transform (DWT) and learning algorithms such as Twin support vector regression (TSVR), random forest regression (RFR), and Convolutional neural networks (CNN) for onshore, offshore and hilly sites. Wavelet transform-based signal processing helps extract features from wind speed. Results suggest that SVR based prediction models are the best in forecasting among the available models. Besides, CNN predicts ramp events closer to the TSVR model and gives a better prediction performance for larger training datasets. The proposed hybrid versions of the TSVR, RFR, and CNN models are compared with existing SVM, ANN, and ELM models and indicate significant improvement in predicting ramp events. Compared to SVM, TSVR and RFR are 17.88% and 4.87% more efficient in terms of RMSE, respectively. Further, randomness in the ramp event signal for all the wind farm sites considered is evaluated using log-energy entropy. Results reveal that, compared to wavelet transform, empirical mode decomposition yields lower randomness in predicted ramp event signals.
KEYWORDS
empirical mode decomposition (EMD), ramp events, support vector regression (SVR), twin
support vector regression (TSVR), wavelet transform (WT), wind forecasting
1 | INTRODUCTION
Environmental deterioration and greenhouse gas emissions are a cause of concern for many developing and developed
countries.1-4 Among renewable energy sources, by 2020, wind power is likely to cater to 12% of global electricity
List of Symbols and Abbreviations: ANN, Artificial neural network; ARIMA, Autoregressive integrated moving average; ARMA,
Autoregressive moving average; CNN, Convolutional neural network; DWT, Discrete wavelet transform; ELM, Extreme learning machine; EMD,
Empirical mode decomposition; GBM, Gradient boosted machines; LSTM, Long short-term memory networks; MAE, Mean absolute error; MAPE,
Mean absolute percentage error; MLP, Multi-layer perceptron; RFR, Random forest regression; RMSE, Root mean squared error; SSR, Sum of squared
residuals; SSR/SST, Ratio of sum of squared residuals to sum of squared deviation; SST, Sum of squared deviation of testing samples; SVR, Support
vector regression; TSVR, Twin support vector regression; U1, Theil's U1 statistic; U2, Theil's U2 statistic; ε-TSVR, ε-Twin support vector regression.
Int Trans Electr Energ Syst. 2021;e12818. wileyonlinelibrary.com/journal/etep © 2021 John Wiley & Sons Ltd 1 of 15
https://doi.org/10.1002/2050-7038.12818
2 of 15 DHIMAN AND DEB
demands.5 Although the wind resource is available in abundance, the resulting stochasticity creates several challenges
in terms of control, scheduling and maintenance of wind power from large wind parks.6 Uncontrolled vibrations from
the combination of wind and wave force tend to disrupt the structural integrity of the turbines.7 A ramp event is a sud-
den change in wind power capture, and when the change exceeds a specified power level, a ramp-up or ramp-down
event gets identified. Growing wind power capacity has led to a more focused study of wind power ramp events. Mathe-
matically, a wind power ramp event8 is given as the arithmetic difference between wind power at consecutive time
instants (t and t + Δt):
$$\Delta P_w^{\mathrm{Ramp}} = P_w(t + \Delta t) - P_w(t). \tag{1}$$
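As a minimal sketch of Equation (1), assuming the wind power series is already available as a list sampled at a fixed interval, the ramp signal is a lagged difference. The function name `ramp_signal`, the `step` argument and the sample values are illustrative assumptions and are not taken from the study.

```python
def ramp_signal(power, step=1):
    """Return P(t + step) - P(t) for every index t where both samples exist."""
    if step < 1:
        raise ValueError("step must be a positive number of samples")
    return [power[i + step] - power[i] for i in range(len(power) - step)]


# Hypothetical 10-minute wind power samples in MW (made-up values).
power_mw = [1.0, 1.2, 2.5, 2.4, 0.9]
delta_p = ramp_signal(power_mw)  # one value per pair of consecutive samples
```

With `step=1` the difference spans one sampling interval; a larger `step` corresponds to a longer Δt expressed in samples.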
Gallego-Castillo et al reviewed the preliminary definition of a ramp event and the threshold values that various studies have used to formulate a ramp event.9 Forecasting ramp events can be considered equivalent to wind speed forecasting, where the aim is to ensure stable and reliable power transfer to the grid.10 Intermittent grid failures can be prevented when wind forecasting data are available.11,12 Setting proper technical standards to identify ramp events is
an integral part of wind farm planning.
Traditionally, wind power ramp events can be divided into ramp-up and ramp-down events. The corresponding
ramp event types are related to the change in the magnitude of the wind speed in a short span of time. A forecast methodology that characterizes ramp events with a derivative filtering edge detection approach and uses numerical weather prediction ensembles for probabilistic forecasts of ramp occurrence is presented in Reference 13. Understanding the weather patterns and
regimes helps predict ramp events and often complements various modern forecast methods.14 Couto et al performed a
classification task for the weather regimes over mainland Portugal for severe wind power ramps.15
Machine learning (ML) models with good generalization capability are usually preferred to capture the non-
linearity in wind speed adequately. Among ML models, Support vector machines (SVM), Extreme learning machine
(ELM), and Gaussian process regression (GPR) find usage for classification and regression tasks. These models use the
historical data to ascertain the patterns between a set of input and output quantities.5 For ML models, the size of the training data strongly affects both the performance and the computational complexity of the model. Cui et al detect ramp event probability characteristics from bands captured by scenarios to evaluate a ramp event forecasting method that uses a
modified genetic algorithm with multi-objective fitness functions.16,17 In other ramp event prediction related works,
Dhiman et al have discussed SVR and its variants18 for predicting wind power ramp events using five datasets for ramp-
up and ramp-down events.19,20
In Reference 21, the authors presented a ramp function for identifying a large scale ramp event using Wavelet trans-
form technique to decompose the wind speed time-series. Neural networks and evolutionary optimization techniques are
often used for classification and regression analysis. The ability of evolutionary techniques to work in a large search space
makes them a popular candidate. ML algorithms such as SVM and ELM have been applied to classify wind power ramp
events.22 Data accuracy of hybrid models is assessed for classifying wind power ramp events. In ML, the performance on
unseen data is highly dependent on the input feature set, and the dimensionality of input features during training
phase governs the computational time and the generalization performance. In Reference 23, the prediction performance
of a feature extraction technique based on Gabor filtering and considering atmospheric pressure fields, is found to be bet-
ter than the state-of-the-art neural network method. Figure 1 illustrates a graphic for ML performance with respect to the
number of features. The performance metric can be squared error loss, accuracy or mean absolute error (MAE).
A common example of computational complexity can be seen in the SVM based classification and regression models, where the input matrix during the training phase is of order N × L and the resulting kernel matrix is of order N × N, where N is the number of samples and L denotes the dimension of the input feature space. Feature selection using wavelet transform is studied by Amin et al where the relative wavelet energy for approximate and detail signals is calculated.24 Popular feature selection techniques include Principal component analysis, Neighborhood component analysis, and Grey relational analysis, which are used in selecting optimal features for short-term wind forecasting. Selecting optimal and relevant features for an ML algorithm is likely to reduce its computation time and improve its generalization performance.25-28 Apart from ML models, deep learning techniques are also implemented for a holistic understanding of renewable energy sources, specifically wind speed forecasting and predictive maintenance.29,30 Kulkarni et al explored deep learning techniques such as non-linear auto-regressive neural network and Long short-term memory (LSTM) models for wind speed forecasting and prediction of fatigue analysis for a 5 MW NREL wind turbine model31 using 5 years of environmental data as an input. Liu et al
developed a hybrid model based on wavelet decomposition for forecasting wind speed.32 The wind speed data is
decomposed into several sub-bands, and are divided into low and high frequency signals which are forecasted using
LSTM model and Elman neural network (ENN) respectively. The results for this hybrid model are then compared with
ARIMA, Back propagation neural network (BPNN) and Generalized regression neural network (GRNN). Chen et al
explored deep learning and ensemble methods for multi-step wind speed forecasting for a wind farm site in China.33
Wind speed time-series with time intervals of 15 minutes, 1 hour, 4 hours, 8 hours, and 24 hours over a duration of six
months are taken. A stacked denoising autoencoder (SDAE) based feature extraction technique is employed in tandem
with deep learning model like LSTM to predict wind speed. Results indicate that SDAE based deep learning model out-
performs ENNs for different time-intervals. Zhang et al discussed a variational mode decomposition (VMD) based
hybrid model for wind power prediction.34 The proposed method is compared with SVM, BPNN, and ARIMA. Results
reveal that RMSE obtained with VMD based prediction is lower than the other methods.
Hong and Rioflorido discussed a CNN topology for 24 hours ahead wind speed forecasting35 by cascading with
radial basis function neural network (RBFNN). Results are then compared with multi feed-forward neural network
(MFNN) and BPNN and it is found that the forecasting method with double Gaussian function (DGF) as its activation
function outperforms others. The accuracy is further validated with Diebold-Mariano test. In Reference 36, a
3-dimensional CNN is implemented to forecast wind power using relevant feature extraction technique and Numerical
weather prediction (NWP) data. A duration of 19 months for training and 9 months for testing is considered. Benchmark methods like 2-dimensional CNN, Persistence and Principal component analysis are applied to validate the superiority of the 3-dimensional CNN. In Reference 37, a wavelet transform based CNN method is applied to wind farm sites in China, and the uncertainties in the wind speed time-series, analysed by modeling a probabilistic forecast, reveal that the prediction performance based on the ensemble learning method is superior to Persistence, SVM and BPNN. In Reference 38, several signal pre-processing techniques such as Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and wavelet transform are applied with Back-propagation, GRNN and Radial basis function neural network (RBFNN). Results indicate that the wavelet transform based forecasting model outperforms the EMD based forecasting models. In Reference 39, 1-dimensional CNN architectures are employed for wind speed forecasting for meteorological sites in Stuttgart and the
Netherlands. The problem is framed as a classification task where accuracies up to 95% are achieved. In Reference 40,
authors present a detailed review of hybrid EMD methods for wind speed and power prediction. Authors indicate that
SVM and ANN based hybrid methods have shown significant improvement over classical methods.
In the current manuscript, ML techniques such as Twin support vector regression (TSVR) and Random forest are compared against a deep learning technique based on Convolutional neural network (CNN). To the best of our knowledge, we are the first group to analyze the prediction performance for wind power ramp events when large training data is considered. Further, wind power ramp event forecasting based on wavelet based TSVR, RFR and CNN adds new insights to the existing literature. In terms of the training time, we observed the superiority of TSVR among tested methods. More specifically, the broader aim is to assess the performance of these methods for a large training set. The
current work addresses the following objectives:
1. Ramp prediction models for onshore and offshore wind farms are discussed. The wind powers corresponding to the
wind speed time series are calculated to identify ramp events. The wind power ramp events are predicted using a
hybrid method involving wavelet transform decomposition and machine learning methods. The underlying problem
is modeled as a regression task with Δt = 10 min due to the relevance of minute-scale wind forecasting with modern
day market operation.
2. The potential capability of deep learning techniques like CNN and ML models like Random forest regression (RFR) and TSVR
is examined. Several error metrics for the entire wind speed time series and the absolute error for ramp events are evaluated.
3. Log energy entropy based randomness is discussed. The randomness in a ramp signal is an important feature to be
dealt with. Higher order randomness calls for accurate wind resource assessment and micro-siting.
This manuscript is organized as follows. Section 2 describes machine learning methods, that is, TSVR, RFR and
CNN followed by framework of ramp prediction models in Section 3. In Section 4 outcomes of the proposed models
and discussions are highlighted, followed by comparative analysis in Section 5 and Conclusions in Section 6.
2 | METHODS FOR WIND POWER RAMP PREDICTION
Next, we discuss various methods employed to predict ramp events in onshore, offshore and hilly wind farms. A hybrid ML
model based on Wavelet decomposition serves the purpose of eliminating stochastic trends in the wind speed time series.
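To make the decomposition step concrete, the following is a minimal sketch of a multi-level discrete wavelet decomposition. It swaps in the Haar wavelet, because its filters fit in a few lines of dependency-free code, whereas the study itself uses a 5-level db4 transform. The function names, the coefficient layout and the sample values are assumptions made for illustration only.

```python
import math


def haar_step(signal):
    """One Haar analysis step: pairwise sums and differences scaled by 1/sqrt(2)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, levels):
    """Cascade haar_step on the approximation; return (A_levels, [D1, ..., D_levels])."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Illustrative 10-minute wind speed samples in m/s (made-up values).
wind_speed = [5.0, 5.4, 6.1, 5.9, 7.2, 7.0, 6.3, 6.5]
a3, details = haar_wavedec(wind_speed, levels=3)
```

The approximation coefficients carry the low-frequency trend and each detail list carries the fluctuations at one scale. A library implementation with the db4 filter would replace `haar_step` in practice.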
Support vector regression (SVR) is derived from SVMs41 and is used in several branches of forecasting like solar radia-
tion forecasting, wind forecasting and hydrological time series. In 2010, Peng developed a TSVR where two smaller
sized quadratic programming problems are solved to arrive at the final regressor.42
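For training data $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n) \in \mathcal{X} \times \mathbb{R}$, where $\mathcal{X}$ represents the input feature space with dimension $\mathbb{R}^n$, consider $Y = (y_1, y_2, \dots, y_n)$ as target output, $i = 1, 2, \dots, n$ and $y_i \in \mathbb{R}$. The mathematical formulation of TSVR is

$$\min \ \frac{1}{2}\sum_{i=1}^{n} (y_i - e\varepsilon_1 - \psi_{1i})^T (y_i - e\varepsilon_1 - \psi_{1i}) + C_1 e^T \sum_{i=1}^{n} \xi_i, \quad \text{s.t.} \ y_i - \psi_{1i} \ge e\varepsilon_1 - \xi_i,$$

$$\min \ \frac{1}{2}\sum_{i=1}^{n} (y_i - e\varepsilon_2 - \psi_{2i})^T (y_i - e\varepsilon_2 - \psi_{2i}) + C_2 e^T \sum_{i=1}^{n} \eta_i, \quad \text{s.t.} \ \psi_{2i} - y_i \ge e\varepsilon_2 - \eta_i,$$

where $\psi_{1i} = x_i w_1 + e b_1$, $\psi_{2i} = x_i w_2 + e b_2$, $C_1, C_2 > 0$ and $\varepsilon_1, \varepsilon_2 \ge 0$ are the TSVR hyperparameters and $\xi_i$, $\eta_i$ denote the slack variables acting as soft margin to the error $\varepsilon$. The formulation of the dual of the TSVR problem in terms of the Lagrangian function is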
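$$L(w_1, b_1, \xi_i, \alpha_i, \beta_i) = \frac{1}{2}\sum_{i=1}^{n} (y_i - e\varepsilon_1 - (x_i w_1 + e b_1))^T (y_i - e\varepsilon_1 - (x_i w_1 + e b_1)) + C_1 e^T \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i (y_i - e\varepsilon_1 - (x_i w_1 + e b_1) + \xi_i) - \sum_{i=1}^{n} \beta_i \xi_i, \tag{2}$$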
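where $\alpha_i$, $\beta_i$ are the Lagrangian multipliers. The KKT conditions can be evaluated as follows

$$\begin{cases}
\dfrac{\partial L}{\partial w_1} = 0 \Rightarrow -X^T (Y - X w_1 - e b_1 - e\varepsilon_1) + X^T \alpha = 0 \\
\dfrac{\partial L}{\partial b_1} = 0 \Rightarrow -e^T (Y - X w_1 - e\varepsilon_1 - e b_1) + e^T \alpha = 0 \\
\dfrac{\partial L}{\partial \xi} = 0 \Rightarrow C_1 e^T - \alpha - \beta = 0 \\
\dfrac{\partial L}{\partial \alpha} = 0 \Rightarrow Y - (X w_1 + e b_1) \ge e\varepsilon_1 - \xi, \ \xi \ge 0.
\end{cases}$$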
For the TSVR optimization problem, the equality constraints are given as
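where $\alpha \in [0, C_1 e]$ for $\beta \ge 0$, and Equation (3) is unified and written as

$$-Q^T t + Q^T Q u_1 + Q^T \alpha = 0, \qquad u_1 = (Q^T Q)^{-1} Q^T (t - \alpha), \tag{4}$$

where $Q = [X \ e]$ and $t = Y - e\varepsilon_1$. The dual corresponding to Equation (2) can be simplified as

$$\max \ -\frac{1}{2}\alpha^T Q (Q^T Q)^{-1} Q^T \alpha + t^T Q (Q^T Q)^{-1} Q^T \alpha - t^T \alpha, \quad \text{s.t.} \ \alpha \in [0, C_1 e], \tag{5}$$

$$\max \ -\frac{1}{2}\gamma^T Q (Q^T Q)^{-1} Q^T \gamma + m^T Q (Q^T Q)^{-1} Q^T \gamma - m^T \gamma, \quad \text{s.t.} \ \gamma \in [0, C_2 e], \tag{6}$$

where $m = Y + e\varepsilon_2$ and $u_2 = (Q^T Q)^{-1} Q^T (m - \gamma)$. The final prediction for new data samples is given as a mean regressor

$$f_{TSVR}(x) = \frac{1}{2}(w_1 + w_2)^T x + \frac{1}{2}(b_1 + b_2), \tag{7}$$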
where w1, w2 are the weights corresponding to the two hyperplanes and b1, b2 are the bias terms. As studied by Dhiman et al, among the variants of SVR models for wind forecasting, TSVR is found to be the best regressor.19,20 Specifically for wind speed forecasting, where the available data is large, it is essential to use algorithms that are computationally fast. Thus, to assess the computational prospects of machine learning techniques, the ramp event prediction study is extended to RFR and CNN. Wind farm sites with onshore, offshore and hilly terrain are
studied.
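The two box-constrained duals and the mean regressor can be sketched numerically as follows. This is a minimal linear-kernel illustration that assumes NumPy is available and solves each dual by projected gradient ascent with unit step size, which is a swapped-in solver and not the quadratic programming routine of the study. For the up-bound problem the sketch uses the sign convention of Peng's derivation (Reference 42), that is, gradient m − H(m + γ) and u2 = (QᵀQ)⁻¹Qᵀ(m + γ). Treat that convention as an assumption to be checked against the reference, together with every function and parameter name below.

```python
import numpy as np


def fit_linear_tsvr(X, y, C1=1.0, C2=1.0, eps1=0.1, eps2=0.1, iters=2000):
    """Return (u1, u2), where each u stacks the weights and the bias of one bound."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Q = np.hstack([X, np.ones((X.shape[0], 1))])  # Q = [X e]
    P = np.linalg.pinv(Q)   # equals (Q^T Q)^{-1} Q^T when Q has full column rank
    H = Q @ P               # projection matrix Q (Q^T Q)^{-1} Q^T
    t = y - eps1            # shifted targets of the down-bound problem
    m = y + eps2            # shifted targets of the up-bound problem
    alpha = np.zeros_like(y)
    gamma = np.zeros_like(y)
    for _ in range(iters):
        # Ascent step on dual (5): gradient is H (t - alpha) - t, then clip to [0, C1].
        alpha = np.clip(alpha + (H @ (t - alpha) - t), 0.0, C1)
        # Ascent step on the up-bound dual (assumed signs): gradient is m - H (m + gamma).
        gamma = np.clip(gamma + (m - H @ (m + gamma)), 0.0, C2)
    u1 = P @ (t - alpha)
    u2 = P @ (m + gamma)
    return u1, u2


def predict_tsvr(X, u1, u2):
    """Mean of the down-bound and up-bound regressors, as in Equation (7)."""
    X = np.asarray(X, dtype=float)
    u = 0.5 * (u1 + u2)
    return X @ u[:-1] + u[-1]


# Noise-free illustrative data on the line y = 2x + 1.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [1.0, 3.0, 5.0, 7.0]
u1, u2 = fit_linear_tsvr(X_train, y_train)
y_new = predict_tsvr([[4.0]], u1, u2)
```

A unit step is admissible here because H is a projection matrix, so the gradient of each dual is 1-Lipschitz. A kernelized variant would replace `Q` by a kernel matrix augmented with a column of ones.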
Random forest is an ensemble technique formulated by Breiman to build a cluster of trees from given training inputs.43 The generalization performance obtained through such methods is superior to that of individual methods in capturing the non-linear trend in the wind speed. The random forest segregates the input matrix into different subsets and performs regression or classification on each one of them. Similar to the hyperparameters (σ and C) in SVR, the number of trees and the number of random features in each tree decomposition are the parameters that decide the performance of the regression task. For a given decision tree, a set of input samples is fed to a function to create a model at the end of the training phase for performance testing on unseen data. The name “random” applies because the method randomly selects the input samples
from a given set and creates the best fit model. Mathematically, the regression task is
$$\hat{A} = \frac{1}{k}\sum_{i=1}^{k} \hat{r}(X, V_i), \tag{8}$$

where $\hat{r}(X, V_i)$ is the representative tree at the end of the training process, $X$ is the set of input feature vectors and $V_i$ is the collective set representing the input-output pairs $V_i = (x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. The predicted output gets averaged over $k$ decision trees. An added advantage is the insensitivity to noise from uncorrelated trees via differential input sampling. The common problem of over-fitting is mitigated by tuning the hyperparameters, such as the maximum number of features and the tree depth of a random forest model.
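The averaging in Equation (8) can be illustrated with a deliberately small ensemble. The sketch below, a simplification under stated assumptions, replaces full regression trees by depth-1 trees (stumps), draws one bootstrap sample and one random feature subset per tree, and averages the k predictions. The names `fit_stump`, `fit_forest`, `predict_forest`, the parameters `k` and `mtry`, and the data are hypothetical.

```python
import random


def fit_stump(rows, feature_ids):
    """Best single split by squared error over the allowed features."""
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue
            mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((y - mean_l) ** 2 for y in left)
                   + sum((y - mean_r) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, mean_l, mean_r)
    if best is None:  # no valid split exists: fall back to the sample mean
        mean = sum(y for _, y in rows) / len(rows)
        return (feature_ids[0], float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    f, threshold, mean_l, mean_r = stump
    return mean_l if x[f] <= threshold else mean_r


def fit_forest(X, y, k=50, mtry=1, seed=0):
    """Train k stumps, each on a bootstrap sample V_i and a random feature subset."""
    rng = random.Random(seed)
    rows = list(zip(X, y))
    forest = []
    for _ in range(k):
        sample = [rng.choice(rows) for _ in rows]       # bootstrap sample V_i
        features = rng.sample(range(len(X[0])), mtry)   # random feature subset
        forest.append(fit_stump(sample, features))
    return forest


def predict_forest(forest, x):
    """Average of the individual tree predictions, as in Equation (8)."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)


# Made-up two-feature data in which the first feature separates the targets.
X_demo = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y_demo = [0.0, 0.0, 10.0, 10.0]
forest = fit_forest(X_demo, y_demo, k=25, mtry=2, seed=0)
```

In a full random forest the trees are grown deeper and the feature subset is redrawn at every split; with stumps there is only one split, so the two coincide.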
Proposed by Yann LeCun in 1988, a CNN is a deep learning tool widely used for classification, regression analysis and more specifically in the field of image and handwriting recognition.44 CNN works on a multi-layer perceptron (MLP) topology with a large number of hidden layers that allows the network to select the optimal features needed for a classification or regression task. A CNN classification or regression task can be divided into three main processes: (a) convolution, (b) pooling and (c) classification or regression. In the convolution step, filters are applied to an input, and an appropriate algorithm is used to track the changes caused by the filter. Given a training input of size N × N, a filter with size Z × Z is applied to obtain a filtered input with size E × E, where E = N − Z + 1.45,46 The filtered input is then passed through a non-linear activation function “ReLU.” The next step involves pooling. The pooling layer helps in sub-sampling the filtered data and reduces the computation time in the training process. Moreover, the over-fitting problem that persists in machine learning algorithms is mitigated by CNN. Over-fitting is a phenomenon that leads to poor generalization performance, where the trained model causes a large error on new unseen data. The final step is a CNN fully connected layer where the classification or regression task is achieved. The extracted features from the convolution and pooling procedures are now evaluated to obtain the target values. Activation functions in CNN hold significant importance, with “Sigmoid(x),” “ReLU(x)” and “tanh(x)” being widely used. Furthermore, optimization of the weights of the hidden layers results in an accurate classification or regression task. Commonly used Stochastic gradient descent (SGD) and Adaptive moment estimation (ADAM) algorithms tend to reduce the computational cost of training. Mathematically, the ADAM algorithm can be expressed as
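$$\begin{aligned}
g_t &= \nabla_\theta B_t(\theta_{t-1}) \\
r_t &= \gamma_1 r_{t-1} + (1 - \gamma_1) g_t \\
s_t &= \gamma_2 s_{t-1} + (1 - \gamma_2) g_t^2 \\
\hat{r}_t &= \frac{r_t}{1 - \gamma_1^t}, \quad \hat{s}_t = \frac{s_t}{1 - \gamma_2^t} \\
\theta_t &= \theta_{t-1} - \frac{\alpha \hat{r}_t}{\sqrt{\hat{s}_t} + \varepsilon},
\end{aligned} \tag{9}$$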
where $\gamma_1$, $\gamma_2$ are the exponential decay rates, $\alpha$ is the step size, $B(\theta)$ is the loss function, $\theta_0$ is the initial parameter estimate and $r_t$, $s_t$ are the first and second moment vectors. A commonly used value of the tolerance term $\varepsilon$ is $10^{-8}$, and $g_t^2$ denotes the element-wise square of $g_t$. Figure 2 illustrates different layers in a CNN model. Layers CN1, CN2 and CN3 are the
convolution layers that perform the convolution operation on the input data.
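The ADAM update of Equation (9) can be traced on a one-dimensional toy problem. The sketch below is a scalar illustration under assumed settings (step size 0.1, 2000 iterations, a quadratic loss with minimum at 3); none of these values, nor the function name `adam_minimize`, come from the study.

```python
import math


def adam_minimize(grad, theta0, alpha=0.1, gamma1=0.9, gamma2=0.999,
                  eps=1e-8, steps=2000):
    """Scalar ADAM: returns the parameter after a fixed number of updates."""
    theta, r, s = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)                           # g_t
        r = gamma1 * r + (1.0 - gamma1) * g       # first moment r_t
        s = gamma2 * s + (1.0 - gamma2) * g * g   # second moment s_t
        r_hat = r / (1.0 - gamma1 ** t)           # bias-corrected first moment
        s_hat = s / (1.0 - gamma2 ** t)           # bias-corrected second moment
        theta -= alpha * r_hat / (math.sqrt(s_hat) + eps)
    return theta


# Toy loss B(theta) = (theta - 3)^2 with gradient 2 (theta - 3).
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta0=0.0)
```

For vector parameters the same recursion is applied element-wise, which is why the second moment uses the element-wise square of the gradient.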
Convolution preserves the spatial correlation among input training data items by learning features from small squares of input data. The training data input is treated as a matrix of values. In CNN terms, a “filter” is the matrix applied to the input training data, and the matrix formed by sliding the filter over the image and computing the inner product is named a “Convolved Feature.” Further, a CNN has a Pooling layer which is responsible for dimension reduction of the input training data. A large input training data can lead to skewed projections that may cause decreased accuracy. Pooling layers are also largely insensitive to small transformations, translations or distortions occurring in the input data. The final layers in a CNN are fully connected layers FC1 and FC2 that are essentially similar to an MLP network. The relevant features from the convolution and pooling layers are fed to the fully-connected layers, which yield the output in
the end in terms of a classification or regression task.
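The convolution, activation and pooling stages can be sketched without any deep learning library. The example below assumes a square N × N input and a square Z × Z filter with unit stride and no padding, so the convolved feature has size E = N − Z + 1; the mean filter, the 2 × 2 max pooling window and all names are illustrative choices.

```python
def conv2d_valid(image, kernel):
    """Slide the filter over the image and take the inner product at each position."""
    n, z = len(image), len(kernel)
    e = n - z + 1  # output size E = N - Z + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(z) for b in range(z))
             for j in range(e)]
            for i in range(e)]


def relu(feature_map):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a square window."""
    e = len(feature_map)
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, e - size + 1, size)]
            for i in range(0, e - size + 1, size)]


image = [[float(i + j) for j in range(6)] for i in range(6)]  # N = 6
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]                  # Z = 3, mean filter
feature_map = relu(conv2d_valid(image, kernel))               # E = 4
pooled = max_pool(feature_map, size=2)                        # 2 x 2 after pooling
```

The pooled map would then be flattened and passed to the fully connected layers.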
Wind speed variation depends on the terrain under study. For all the wind farms operating in a neutral atmospheric boundary layer, the wind speed follows a logarithmic profile with height. The surface roughness length (z0) is around 0.005 m for onshore wind farms, 0.0002 m for offshore wind farms and 1 m for hilly terrain. The current hybrid prediction model is based on the combination of a wavelet transform technique and a machine learning/deep learning algorithm. Wavelet transform is widely used for decomposition of a time-series signal into low and high frequency sub-signals, called approximate and detail signals respectively. A 5-level db4 wavelet transform is used to decompose the wind speed time series. The approximate signals (A1, A2, …, A5) along with the detail signals (D1, D2, …, D5) form the input feature set for all the prediction models. The wind speed datasets are collected for three different time periods of 3, 12, and 36 months at a height of 10 m above the ground, averaged over a time interval of 10 minutes. For an accurate wind power ramp event study, it is desirable to transform the available wind speed to a hub height of 90 m using the logarithmic law.47 In this work, we consider all the wind farms with identical wind turbines having a rotor diameter of 120 m and a hub height of 90 m. The datasets McCain Foods (onshore), Gemini (offshore) and Starfish hills (hilly) with their site
coordinates and statistical parameters are depicted in Table 1.
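The height transformation can be sketched with the standard logarithmic wind profile for a neutral boundary layer, v(h) = v_ref · ln(h / z0) / ln(h_ref / z0). The function name `to_hub_height` and its argument names are assumptions; the roughness lengths and the 10 m and 90 m heights are the values quoted in the text.

```python
import math


def to_hub_height(v_ref, z0, h_ref=10.0, h_hub=90.0):
    """Scale a wind speed measured at h_ref to h_hub with the logarithmic law."""
    if not 0.0 < z0 < h_ref:
        raise ValueError("roughness length must be positive and below h_ref")
    return v_ref * math.log(h_hub / z0) / math.log(h_ref / z0)


# Scaling factors for a unit wind speed at 10 m, one per terrain type.
factor_onshore = to_hub_height(1.0, z0=0.005)
factor_offshore = to_hub_height(1.0, z0=0.0002)
factor_hilly = to_hub_height(1.0, z0=1.0)
```

A rougher surface gives a steeper profile, so the same 10 m measurement maps to a higher hub-height speed over hilly terrain than over open sea.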
Mean represents the mean of the wind speed time-series and SD indicates its standard deviation. The variability in wind speed is high, as changes in wind speed on a minute-to-minute scale are likely to cause large power changes. Figure 3 illustrates a generic block diagram of the wavelet transform based ramp prediction model for onshore, offshore and hilly wind farms. With reference to the block diagram, a wind speed time-series is transformed into a wind power time-series, followed by calculation of the signal $\Delta P_w^{\mathrm{Ramp}}$, which is our wind power ramp signal. Given the two thresholds for ramp-up event (Pup) and ramp-down event (Pdown), the ramp signal is compared against them to identify the type of event at
a particular time instance. The absolute error at these ramp instances is calculated as
$$R_{up} = |\hat{p}_u - p_u|, \quad R_{down} = |\hat{p}_d - p_d|, \tag{10}$$

where $\hat{p}_u, p_u$ and $\hat{p}_d, p_d$ are predicted and actual values of power at ramp-up and ramp-down instances.
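The comparison against the two thresholds and the absolute error of Equation (10) can be sketched as follows. The sketch assumes that the thresholds are ±α_th times a nominal power `p_nom`, that each ramp is attributed to the later of the two samples that define it, and that the error is averaged over all ramp instants of one kind. These conventions, the function names and the numbers are assumptions for illustration.

```python
def classify_ramps(power, p_nom, alpha_th=0.10):
    """Label each interval between consecutive samples as 'up', 'down' or 'none'."""
    p_up, p_down = alpha_th * p_nom, -alpha_th * p_nom
    labels = []
    for previous, current in zip(power, power[1:]):
        delta = current - previous
        if delta > p_up:
            labels.append("up")
        elif delta < p_down:
            labels.append("down")
        else:
            labels.append("none")
    return labels


def ramp_abs_error(actual, predicted, labels, kind):
    """Mean |p_hat - p| over the samples that close a ramp of the given kind."""
    errors = [abs(p_hat - p)
              for p, p_hat, label in zip(actual[1:], predicted[1:], labels)
              if label == kind]
    return sum(errors) / len(errors) if errors else None


# Made-up actual and predicted power values (same units as p_nom).
actual = [10.0, 12.0, 11.9, 9.0]
predicted = [10.0, 11.5, 12.0, 9.4]
labels = classify_ramps(actual, p_nom=10.0)
r_up = ramp_abs_error(actual, predicted, labels, "up")
r_down = ramp_abs_error(actual, predicted, labels, "down")
```

Here the ramp instants are located on the actual series; locating them on the predicted series instead would answer a different question, namely whether the model raises false alarms.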
Ramp events occur when the wind power increases or decreases suddenly in a short duration of time which is typi-
cally in the range from 5 minutes to 6 hours.9 For a given wind turbine, let us say the ramp threshold power is αth% of
the nominal wind power. Then we can define two ramp thresholds, that is,
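$$\Delta P_w^{\mathrm{ramp}} = \begin{cases} +\alpha_{th}\% \ \text{of} \ P_{nom} = P_{up}, \\ -\alpha_{th}\% \ \text{of} \ P_{nom} = P_{down}, \end{cases} \tag{11}$$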
TABLE 1 Statistical features of wind speed data
Wind farm (Dataset)            Site coordinates     Mean    SD
McCain Foods, UK (A)           52.56 N, 0.172 W     6.491   3.519
Gemini, Netherlands (B)        54.03 N, 5.96 E      7.577   4.174
Starfish hills, Australia (C)  35.58 S, 138.14 E    6.571   4.515
where Pup and Pdown are the upper and lower ramp thresholds respectively, depicting ramp-up and ramp-down events in a short period of time. The rated wind speed considered for the ramp event study is 12 m/s and the threshold level for ramp events is chosen as 10%. Further, in order to forecast the ramp events in the wind speed time series, wind power is calculated for the given sample of data. A ramp event is identified when the change in generated wind power crosses the lower (Pdown) or upper (Pup) threshold value. In order to assess the wind power ramp event prediction, performance metrics such as Root mean squared error (RMSE), MAE, Mean absolute percentage error (MAPE), and Theil's U1 and U2 statistics are calculated, and their expressions are given as

$$RMSE = \left[\frac{1}{n}\sum_{i=1}^{n} (\hat{p}_i - p_i)^2\right]^{1/2} \tag{12}$$

$$U1 = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{p}_i - p_i)^2}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} p_i^2} + \sqrt{\frac{1}{n}\sum_{i=1}^{n} \hat{p}_i^2}} \tag{13}$$

$$U2 = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} \left((p_{i+1} - \hat{p}_{i+1})/p_i\right)^2}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} \left((p_{i+1} - \hat{p}_i)/p_i\right)^2}}, \tag{14}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n} |p(i) - \hat{p}(i)| \tag{15}$$

$$MAPE(\%) = \frac{100}{n}\sum_{i=1}^{n} \frac{|p(i) - \hat{p}(i)|}{p(i)}, \tag{16}$$

where $\hat{p}_i$, $p_i$, $\bar{p}$ are the predicted, actual and mean values of the testing samples. Additionally, we have computed Theil's U1 and U2 (TH1/TH2) statistics, which are used in economics to indicate the quality of forecasts made by a particular method. Figure 4 illustrates the wind speed characteristics of wind farm sites considered in this analysis.
FIGURE 4 Wind speed characteristics of the wind farm sites (one panel per dataset; x-axis: Samples/10 min)
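A few of these metrics can be computed directly from paired lists of actual and predicted values. The sketch below covers RMSE, MAE, MAPE and Theil's U1 as written in Equations (12), (13), (15) and (16); it assumes strictly positive actual values for MAPE, and the function names and sample numbers are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, Equation (12)."""
    n = len(actual)
    return math.sqrt(sum((ph - p) ** 2 for p, ph in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error, Equation (15)."""
    return sum(abs(p - ph) for p, ph in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent, Equation (16); needs p > 0."""
    return 100.0 / len(actual) * sum(abs(p - ph) / p for p, ph in zip(actual, predicted))


def theil_u1(actual, predicted):
    """Theil's U1, Equation (13): 0 for a perfect forecast, at most 1."""
    n = len(actual)
    numerator = math.sqrt(sum((ph - p) ** 2 for p, ph in zip(actual, predicted)) / n)
    denominator = (math.sqrt(sum(p * p for p in actual) / n)
                   + math.sqrt(sum(ph * ph for ph in predicted) / n))
    return numerator / denominator


# Made-up actual and predicted power values.
p_actual = [2.0, 4.0]
p_predicted = [1.0, 5.0]
scores = {
    "RMSE": rmse(p_actual, p_predicted),
    "MAE": mae(p_actual, p_predicted),
    "MAPE": mape(p_actual, p_predicted),
    "U1": theil_u1(p_actual, p_predicted),
}
```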
4 | RESULTS
Based on the formulation of prediction methods, we now discuss their performance for wind power ramp events. In this analysis, we consider a threshold of 10%, that is, αth = 0.1 for identifying ramp events. In order to avoid data leakage, the wind power time-series is segmented into training (80%) and testing (20%) sets and is treated with a 10-fold cross-validation scheme. For TSVR, the hyperparameters, RBF bandwidth (σ) and regularization parameter (C), are chosen from the set $[2^{-10}, 2^{-9}, \dots, 2^{9}, 2^{10}]$. For RFR, the simulations in RStudio use the “randomForest” package with 1000 trees in the training phase. For CNN, the Keras library is utilized with the input to the convolution layer and pooling layer having “ReLU” as its activation function and a “linear” activation function for the final output layer. Further, it is important to understand that the input matrix to the regression model is obtained from discrete wavelet transform (DWT) decomposition.18 A more prediction specific block diagram is illustrated in Figure 5.
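The experimental protocol described above can be sketched in a few helper functions. The sketch assumes a chronological 80/20 split and contiguous folds, neither of which is stated in the text, and it only builds the hyperparameter grid and the index sets; model fitting is left out. All names are hypothetical.

```python
def power_of_two_grid(low=-10, high=10):
    """Candidate values 2^low, ..., 2^high for sigma and C."""
    return [2.0 ** k for k in range(low, high + 1)]


def chronological_split(series, train_fraction=0.8):
    """Keep the first part for training and the remainder for testing."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


grid = power_of_two_grid()
train, test = chronological_split(list(range(100)))
folds = list(k_fold_indices(len(train), k=10))
```

Each (σ, C) pair from the grid would be scored on the validation folds of the training set only, so that the held-out 20% is never seen during tuning.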
The analysis is then done to identify ramp events based on the set threshold αth = 0.1. The onshore, offshore and hilly wind farm sites are labeled as datasets A, B, and C respectively. From Table 2, we observe that the prediction performance of the three methods depends significantly on the time period over which they are considered. Machine learning techniques like the TSVR and RFR models perform better than the CNN model for time periods of 3 and 12 months, as indicated by the error metrics RMSE, MAE, and MAPE (%). This ability of ML models allows wind farm companies to forecast wind speed from shorter historical data. Market clearing procedures are also dependent on the availability of short-term forecasts. For a time duration of 3 months, TSVR is 74.04% better than CNN, while the same method is 53.97% better than CNN for a duration of 12 months. This decrease in the efficiency of TSVR when compared to CNN indicates that, with a large training dataset, the deep learning technique is a far better candidate than machine learning methods. Furthermore, for all the datasets A, B, and C, the predictive performance in terms of RMSE for CNN over the TSVR and RF models is much better, thus strengthening the case for deep learning techniques for large training data. For dataset A, CNN outperforms RFR and TSVR by 9.09% and 38.88% respectively in terms of RMSE when the wind speed data considered is of 36 months. Further, for the same dataset A, CNN gives prediction errors of 1.0312 and 1.1151 for ramp-up and ramp-down events, which are the minimum when compared to TSVR and RFR. The statistical accuracy of CNN over the TSVR and RFR models is also validated in terms of Theil's U1 and U2 indices, where the quality of forecast is assessed. In terms of the computation time, as denoted by CPU time in Table 2, the CNN model takes less time compared to the TSVR and RFR models, specifically for a time duration of 36 months. The time complexity is a natural phenomenon for machine learning and deep learning techniques and depends heavily on the number
FIGURE 5 Wind power prediction using hybrid DWT machine and deep learning models
of hyper-parameters involved during the training phase. Figure 6 illustrates a ramp event signal for the offshore wind farm
Gemini, Netherlands. The figure shows the change in wind power observed for different lengths of time-series
data. Further, changes in the wind power ramp signal are likely to be observed when the sample interval is on an hourly basis.
5 | COMPARATIVE ANALYSIS
While predicting wind power ramp-up (Rup) and ramp-down (Rdown) events, the TSVR and RF models give lower absolute
error than the CNN method. However, for a longer duration, like 36 months, CNN outperforms the TSVR and RF models
due to its deep network topology and efficient optimization algorithm that adjusts the weights for the hidden layers.
The section above shows that CNN-based regressors have the upper hand over RFR and TSVR based prediction
methods. In Reference 48, authors have tried to address the wind power ramp prediction based on Markov chain
(MC) and auto-regression (AR) methods. However, the prediction process using AR methods is always prone to period-
icity which may not capture the non-linearity in the underlying time-series. In Reference 49, methods such as support
vector regression (SVR), Gaussian process regression (GPR), multilayer perceptron neural network (MLPNN), and
Extreme learning machine (ELM) are deployed for ramp event prediction.
The current work predicts wind power ramp events with a 10-minute sampling interval for onshore, offshore,
and hilly wind sites. At offshore wind farms, where the wind speed is high and variable, the probability of a ramp event
increases. Thus, in the current work, we extend the ramp event prediction study by considering a large training dataset
and testing it on hybrid DWT based methods incorporating TSVR, RFR, and CNN. Results reveal that the TSVR based
prediction model yields the lowest error for ramp-up and ramp-down events for the periods of 3 and 12 months. The
comparative analysis is done for dataset A, and the results are given in Table 3. The proposed hybrid TSVR, RFR, and CNN
methods are compared with SVM (Reference 50) and ANN (Reference 49). It is observed that the TSVR and RFR models
outperform the SVM and ANN models in terms of RMSE and other performance metrics. Further, in terms of computation
speed, TSVR is observed to be the fastest among all models. The model parameters for SVM are regularization parameter
C = 29 and RBF kernel bandwidth σ = 28. Meanwhile, for ANN, the number of epochs is 1000 and the gradient is 10^-6.
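As a quick check of the RMSE comparison, the relative improvement over SVM can be recomputed from the RMSE values listed in Table 3 for dataset A. The sketch below assumes that "efficiency" means the relative reduction in RMSE with respect to the SVM baseline; the helper name `relative_improvement` is hypothetical.

```python
def relative_improvement(baseline, candidate):
    """Percentage reduction of `candidate` relative to `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


# RMSE (%) values for dataset A, as listed in Table 3.
rmse = {"SVM": 1.23, "TSVR": 1.01, "RFR": 1.17}

for model in ("TSVR", "RFR"):
    gain = relative_improvement(rmse["SVM"], rmse[model])
    print(f"{model}: {gain:.3f}% lower RMSE than SVM")
```

Under this assumption the reductions come out at roughly 17.886% for TSVR and 4.878% for RFR, which agree with the reported 17.88% and 4.87% when truncated to two decimals.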
The time and space complexity trade-off is an important analysis that helps identify an optimal algorithm for a
regression task. In this work, the time and space complexity of the tested models are compared in
Table 4. For the random forest algorithm, mtry represents the number of variables considered at each node, ntrees is the
optimal number of trees, and n is the total number of features. As far as space complexity is concerned, for TSVR it depends on
the number of support vectors (nsv). For the random forest algorithm, the space complexity is related to the number of
TABLE 2 Performance metrics based on hybrid machine and deep learning methods
[Figure 6 plot: ramp event signal (ramp power, ×10^6) versus samples at 10-min intervals; recovered panel titles: 12 months, 36 months]
TABLE 3 Comparison of hybrid DWT based TSVR, RFR and CNN models with previous studies
Dataset  Model               RMSE (%)  Rup     Rdown   U1       U2       CPU time (sec)  MAE    MAPE (%)
A        TSVR                1.01      0.7653  0.8891  0.0138   0.0645   1221            2.012  2.672
A        RFR                 1.17      1.0503  1.182   0.0731   1.0000   2071            2.673  2.874
A        CNN                 2.39      1.4105  1.5911  0.0003   0.0046   1901            3.001  2.986
A        SVM (Reference 50)  1.23      1.2131  1.3424  0.08314  1.01138  1391            2.853  2.951
A        ANN (Reference 49)  1.53      1.3151  1.3512  0.0923   1.0715   1591            2.993  2.921
A        ELM (Reference 49)  1.18      1.0653  1.2451  0.0813   1.0325   1614            2.703  2.759
FIGURE 7 Ramp event signal and its decomposition using wavelet transform and EMD
models m (or decision trees) and the p nodes in a tree. For CNN, the time and space complexity depend on the number of
epochs υ, the number of training samples n, and f(L), a function of the number of neurons present in the L layers; l, b, and
h represent the dimensions of the image matrix.
TABLE 5 Randomness in wind power ramp signal (log energy entropy)
Dataset  Algorithm  Wavelet transform  Empirical mode decomposition
A        TSVR       5.7295 × 10^3      2.5646 × 10^3
A        RFR        5.6120 × 10^3      2.2297 × 10^3
A        CNN        5.6112 × 10^3      2.4621 × 10^3
B        TSVR       2.1295 × 10^3      1.6396 × 10^3
B        RFR        2.6120 × 10^3      1.2287 × 10^3
B        CNN        2.3112 × 10^3      2.4721 × 10^3
C        TSVR       4.4295 × 10^3      1.5646 × 10^3
C        RFR        4.7120 × 10^3      2.3397 × 10^3
C        CNN        4.9112 × 10^3      2.4311 × 10^3
Frequent wind power ramp events tend to cause large power reversals and adversely affect the associated power
systems. Fundamentally, the entropy of a signal indicates the information it carries. The inherent randomness present in
the wind speed time-series can be quantified using the log-energy entropy. It is observed that the
random nature of wind speed, coupled with turbulence, can lead to increased tower vibrations. Randomness in wind
speed can be examined by decomposing the ramp event signal using wavelet transform (WT) decomposition and
empirical mode decomposition (EMD). Wavelet decomposition yields a low-frequency and a high-frequency component,
known as the approximate and detail signals, respectively. For EMD, the time-series is decomposed into intrinsic
mode functions (IMFs) and a residual signal. The log energy entropy for a signal z(t) given T samples is
$$E\{z(t)\} = \sum_{t=0}^{T} \log\left(z(t)^{2}\right) \qquad (17)$$
The low-frequency component from the wavelet transform and the low-frequency IMF from EMD are illustrated in Figure 7 for
a given ramp event signal. Ramp event signals obtained from the wind powers predicted by TSVR, RFR, and CNN are
used to calculate the randomness, as depicted in Table 5.
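The entropy calculation in Equation (17) can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: it uses a single-level Haar wavelet (the wavelet family is chosen here only for simplicity and is not specified in this section), it skips zero-valued samples so that the logarithm stays defined, and the function names and sample values are hypothetical.

```python
import math


def haar_dwt(signal):
    """Single-level Haar DWT; returns (approximation, detail) coefficients.

    Assumption: an odd-length input is truncated by one sample.
    """
    if len(signal) % 2:
        signal = signal[:-1]
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def log_energy_entropy(signal):
    """Sum of log(z(t)^2) over all samples, following Equation (17).

    Assumption: zero-valued samples contribute nothing to the sum.
    """
    return sum(math.log(z * z) for z in signal if z != 0.0)


# Hypothetical ramp event signal (arbitrary units).
ramp = [3.0, 5.0, -2.0, 4.0, 6.0, -1.0, 2.0, 8.0]
approx, detail = haar_dwt(ramp)
print("entropy of low-frequency component:", log_energy_entropy(approx))
print("entropy of high-frequency component:", log_energy_entropy(detail))
```

The same `log_energy_entropy` function could be applied to a low-frequency IMF produced by an EMD routine, which is how the two decompositions would be compared on a common footing.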
For dataset A, the randomness observed with empirical mode decomposition is lower than with the wavelet transform.
Further, among the prediction algorithms, CNN yields the minimum log-energy entropy value using the wavelet transform
and empirical mode decomposition. This suggests the use of CNN as a prediction algorithm for providing accurate wind
power schedules to the utility grid. Similar behavior is observed for datasets B and C.
6 | CONCLUSION
In this work, wind power ramp event prediction is studied for three different wind farm terrains, where samples are
collected at a 10-minute interval. To analyze the ramp events, a threshold of α = 10% is considered. The main highlight of
this analysis is the large amount of training data considered, with the ML models (TSVR and RFR) compared against
a deep learning model, CNN. The TSVR based model provides the minimum absolute error for ramp events when the data
(training + testing) considered span 3 and 12 months. Further, the error in predicting ramp-down events is larger than for
ramp-up events. It is observed that TSVR and RFR are in good agreement for predicting 10-minute wind ramps for
shorter durations of data (3-12 months). Compared to SVM, TSVR and RFR are 17.88% and 4.87% more efficient in terms of
RMSE, respectively. The CNN model becomes applicable for more than 12 months of training data,
where it outperforms standard ML models like SVM, RFR, and ANN. We calculate the log-energy entropy for
the ramp event signal, indicating the randomness in the wind speed, for all three datasets and find that EMD based
decomposition yields less randomness, suggesting its use for prediction. Thus, an accurate machine intelligent model for
a training period of 3 to 12 months and a deep learning tool like CNN for a duration longer than 12 months can enable
smooth grid integration of a wind power system.
PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/2050-7038.12818.
ORCID
Harsh S. Dhiman https://orcid.org/0000-0002-7394-7102
REFERENCES
1. Dhiman HS, Deb D, Carroll J, Muresan V, Unguresan M-L. Wind turbine gearbox condition monitoring based on class of support vector
regression models and residual analysis. Sensors. 2020;20(23):6742.
2. Mohamed MA, Jin T, Su W. An effective stochastic framework for smart coordinated operation of wind park and energy storage unit.
Appl Energy. 2020;272:115228.
3. Abdolaziz Mohamed M, Almalaq A, Awwad EM, El-Meligy MA, Sharaf M, Ali ZM. A modified balancing approach for renewable based
microgrids using deep adversarial learning. IEEE Trans Ind Appl. 2020;1-1.
4. Guo J, Xu X, Lian W, Zhu H. A new approach for interval forecasting of photovoltaic power based on generalized weather classification.
Int Trans Electr Energy Syst. 2018;29(4):e2802.
5. Hu Q, Zhang S, Yu M, Xie Z. Short-term wind speed or power forecasting with Heteroscedastic support vector regression. IEEE Trans
Sustainable Energy. 2016;7(1):241-249.
6. Dhiman HS, Deb D, Foley AM. Lidar assisted wake redirection in wind farms: a data driven approach. Renewable Energy. 2020;152:
484-493.
7. Colwell S, Basu B. Tuned liquid column dampers in offshore wind turbines for structural control. Eng Struct. 2009;31(2):358-368.
8. Kamath C. Understanding wind ramp events through analysis of historical data. Paper presented at: IEEE PES T&D 2010; April 19-22,
2010; New Orleans, LA.
9. Gallego-Castillo C, Cuerva-Tejero A, Lopez-Garcia O. A review on the recent history of wind power ramp forecasting.
Renewable Sustainable Energy Rev. 2015;52:1148-1157.
10. Dhiman HS, Deb D. Wake management based life enhancement of battery energy storage system for hybrid wind farms.
Renewable Sustainable Energy Rev. 2020;130:109912.
11. Vincent CL, Pinson P, Giebela G. Wind fluctuations over the North Sea. Int J Climatol. 2010;31(11):1584-1595.
12. Gjerstad J, Aasen SE, Andersson HI, Brevik I, Løvseth J. An analysis of low-frequency maritime atmospheric turbulence. J Atmos Sci.
1995;52(15):2663-2669.
13. Bossavy A, Girard R, Kariniotakis G. Forecasting ramps of wind power production with numerical weather prediction ensembles. Wind
Energy. 2012;16(1):51-63.
14. Bracale A, Caramia P, Carpinelli G, De Falco P. Day-ahead probabilistic wind power forecasting based on ranking and combining
NWPs. Int Trans Electr Energy Syst. 2020;30(7).
15. Couto A, Costa P, Rodrigues L, Lopes VV, Estanqueiro A. Impact of weather regimes on the wind power ramp forecast in Portugal. IEEE
Trans Sustainable Energy. 2015;6(3):934-942.
16. Cui M, Ke D, Sun Y, Gan D, Zhang J, Hodge B-M. Wind power ramp event forecasting using a stochastic scenario generation method.
IEEE Trans Sustainable Energy. 2015;6(2):422-433.
17. Cui M, Ke D, Gan D, Sun Y. Statistical scenarios forecasting method for wind power ramp events using modified neural networks.
J Mod Power Syst Clean Energy. 2015;3(3):371-380.
18. Dhiman HS, Anand P, Deb D. Wavelet transform and variants of SVR with application in wind forecasting. In: Deb D, Balas V, Dey R,
eds. Innovations in Infrastructure. Advances in Intelligent Systems and Computing. Singapore: Springer; 2018:501-511.
19. Dhiman HS, Deb D, Guerrero JM. Hybrid machine intelligent SVR variants for wind forecasting and ramp events. Renewable Sustainable
Energy Rev. 2019;108:369-379.
20. Dhiman HS, Deb D, Balas VE. Supervised Machine Learning in Wind Forecasting and Ramp Event Prediction (Wind Energy Engineering).
Cambridge, MA: Academic Press; 2020.
21. Gallego C, Cuerva A, Costa A. Detecting and characterising ramp events in wind power time series. J Phys: Conf Ser. 2014;555:012040.
22. Cornejo-Bueno L, Camacho-Gómez C, Aybar-Ruiz A, Prieto L, Barea-Ropero A, Salcedo-Sanz S. Wind power ramp event detection with
a hybrid neuro-evolutionary approach. Neural Comput Appl. 2018;32:391-402.
23. Li Y, Musilek P, Lozowski E. Improving the prediction of wind power ramps using texture extraction techniques applied to atmospheric
pressure fields. Int J Data Sci Anal. 2017;4(4):237-250.
24. Amin HU, Malik AS, Ahmad RF, et al. Feature extraction and classification for EEG signals using wavelet transform and machine
learning techniques. Australas Phys Eng Sci Med. 2015;38(1):139-149.
25. Du P. Ensemble machine learning-based wind forecasting to combine NWP output with data from weather station.
IEEE Trans Sustainable Energy. 2019;10(4):2133-2141.
26. Zhang C-Y, Philip Chen CL, Gan M, Chen L. Predictive deep Boltzmann machine for multiperiod wind speed forecasting. IEEE Trans
Sustainable Energy. 2015;6(4):1416-1425.
27. Khodayar M, Wang J. Spatio-temporal graph deep neural network for short-term wind speed forecasting. IEEE Trans Sustainable Energy.
2019;10(2):670-681.
28. Li S, Wang P, Goel L. Wind power forecasting using neural network ensembles with feature selection. IEEE Trans Sustainable Energy.
2015;6(4):1447-1456.
29. Bali V, Kumar A, Gangwar S. Deep learning based wind speed forecasting-A review. Paper presented at: 2019 9th International
Conference on Cloud Computing, Data Science & Engineering (Confluence); January 10-11, 2019; Noida, India.
30. Sergio AT, Ludermir TB. Deep learning for wind speed forecasting in northeastern region of Brazil. Paper presented at: 2015 Brazilian
Conference on Intelligent Systems (BRACIS); November 4-7, 2015; Natal, Brazil.
31. Kulkarni PA, Dhoble AS, Padole PM. Deep neural network-based wind speed forecasting and fatigue analysis of a large composite wind
turbine blade. Proc. Inst. Mech. Eng., Part C: J Mech Eng Sci. 2018;233(8):2794-2812.
32. Liu L, Ji T, Li M, Chen Z, Wu Q. Short-term local prediction of wind speed and wind power based on singular spectrum analysis and
locality-sensitive hashing. J Mod Power Syst Clean Energy. 2018;6(2):317-329.
33. Chen L, Li Z, Zhang Y. Multiperiod-ahead wind speed forecasting using deep neural architecture and ensemble learning. Math Probl
Eng. 2019;2019:1-14.
34. Zhang G, Liu H, Zhang J, et al. Wind power prediction based on variational mode decomposition multi-frequency combinations. J Mod
Power Syst Clean Energy. 2018;7(2):281-288.
35. Hong Y-Y, Rioflorido CLPP. A hybrid deep learning-based neural network for 24-h ahead wind power forecasting. Appl Energy. 2019;
250:530-539.
36. Higashiyama K, Fujimoto Y, Hayashi Y. Feature extraction of NWP data for wind power forecasting using 3D-convolutional neural
networks. Energy Procedia. 2018;155:350-358.
37. Wang H, Li G, Wang G, Peng J, Jiang H, Liu Y. Deep learning based ensemble approach for probabilistic wind power forecasting. Appl
Energy. 2017;188:56-70.
38. Liu H, Mi X, Li Y. Comparison of two new intelligent wind speed forecasting approaches based on Wavelet Packet Decomposition,
Complete Ensemble Empirical Mode Decomposition with Adaptive Noise and Artificial Neural Networks. Energy Convers Manage.
2018;155:188-200.
39. Harbola S, Coors V. One dimensional convolutional neural network architectures for wind prediction. Energy Convers Manage. 2019;
195:70-75.
40. Bokde N, Feijóo A, Villanueva D, Kulat K. A review on hybrid empirical mode decomposition models for wind speed and wind power
prediction. Energies. 2019;12(2):254.
41. Vapnik Vladimir N. The Nature of Statistical Learning Theory. New York: Springer; 2000.
42. Peng X. TSVR: an efficient twin support vector machine for regression. Neural Netw. 2010;23(3):365-372.
43. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. London, UK: Routledge; 2017.
44. LeCun Y, Boser B, Denker JS, et al. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989;1(4):541-551.
45. Hoseinzade E, Haratizadeh S. CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst Appl. 2019;129:
273-285.
46. Chen F-C, Jahanshahi MR. NB-CNN: deep learning-based crack detection using convolutional neural network and Naïve Bayes data
fusion. IEEE Trans Ind Electron. 2018;65(5):4392-4400.
47. Irwin JS. A theoretical variation of the wind profile power-law exponent as a function of surface roughness and stability. Atmos Environ
(1967). 1979;13(1):191-194.
48. Ouyang T, Zha X, Qin L, He Y, Tang Z. Prediction of wind power ramp events based on residual correction. Renewable Energy. 2019;136:
781-792.
49. Cornejo-Bueno L, Cuadra L, Jiménez-Fernández S, Acevedo-Rodríguez J, Prieto L, Salcedo-Sanz S. Wind power ramp events prediction
with hybrid machine learning regression techniques and reanalysis data. Energies. 2017;10(11):1784.
50. Kramer O, Treiber NA, Sonnenschein M. Wind power ramp event prediction with support vector machines. In: Polycarpou M, de
Carvalho ACPLF, Pan JS, Woźniak M, Quintian H, Corchado E, eds. Hybrid Artificial Intelligence Systems. HAIS 2014. Cham,
Switzerland: Springer International Publishing; 2014:37-48.
How to cite this article: Dhiman HS, Deb D. Machine intelligent and deep learning techniques for large
training data in short-term wind speed and ramp event forecasting. Int Trans Electr Energ Syst. 2021;e12818.
https://doi.org/10.1002/2050-7038.12818