
IEEE SENSORS JOURNAL, VOL. 21, NO. 6, MARCH 15, 2021

A Review of Deep Learning Models for Time Series Prediction

Zhongyang Han, Member, IEEE, Jun Zhao, Member, IEEE, Henry Leung, Fellow, IEEE, King Fai Ma, and Wei Wang, Senior Member, IEEE

Abstract—In order to approximate the underlying process of temporal data, time series prediction has been a hot research topic for decades. Developing predictive models plays an important role in interpreting complex real-world elements. With the sharp increase in the quantity and dimensionality of data, new challenges, such as extracting deep features and recognizing deep latent patterns, have emerged, demanding novel approaches and effective solutions. Deep learning, composed of multiple processing layers that learn representations with multiple levels of abstraction, is now commonly deployed for overcoming these newly arisen difficulties. This paper reviews the state-of-the-art developments in deep learning for time series prediction. Based on modeling from the perspective of conditional or joint probability, we categorize the models into discriminative, generative, and hybrid models. Experiments are implemented on both benchmarks and real-world data to elaborate the performance of the representative deep learning-based prediction methods. Finally, we conclude with comments on possible future perspectives and ongoing challenges in time series prediction.

Index Terms—Review, discriminative models, generative models, deep learning, time series prediction.

Manuscript received May 13, 2019; revised June 5, 2019; accepted June 6, 2019. Date of publication June 20, 2019; date of current version February 17, 2021. This work was supported in part by the National Key R&D Program under Grant 2017YFA0700300, in part by the National Natural Sciences Foundation of China under Grant 61833003, Grant 61703071, Grant 61603069, and Grant 61533005, and in part by the Fundamental Research Funds for the Central Universities of China under Grant DUT18RC(3)074. The associate editor coordinating the review of this article and approving it for publication was Dr. You Li. (Corresponding author: Zhongyang Han.)

Z. Han, J. Zhao, and W. Wang are with the School of Control Sciences and Engineering, Dalian University of Technology, Dalian 116023, China (e-mail: [email protected]; [email protected]; [email protected]).

H. Leung and K. F. Ma are with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/JSEN.2019.2923982

I. INTRODUCTION

TIME series, as collections of temporal observations, have attracted intensive attention, initiating various studies and developments in the fields of machine learning and artificial intelligence. Among research aspects ranging from dimensionality reduction to data segmentation, time series prediction for acquiring future trends and tendencies is one of the most important subjects. The results can provide a basis for various applications, e.g., production planning, control, optimization, etc. [1]–[3]. Therefore, numerous models have been proposed for solving this problem, e.g., Auto Regressive Integrated Moving Average (ARIMA) [4], [5], filtering-based methods [6], [7], support vector machines [8], etc.

Conventional techniques for time series prediction were limited in their ability to process big data with high dimensionality, as well as to efficiently represent complex functions [9]. Also, designing an effective machine learning system requires considerable domain expertise about the data. Recently, deep learning has emerged at the forefront of advanced artificial intelligence. Deep learning describes models that utilize multiple layers to represent latent features at a higher and more abstract level [10]. The representations are learned from data rather than constructed by human engineers. Owing to these advantages, deep learning-based models have been successfully applied in many fields pertinent to time series prediction, including remote sensing [11] and multi-sensor fusion [12], [13]. They have also been explored as an effective approach to discovering complex relationships between multiple time series. Despite their effectiveness, we note that the various deep learning models have their own individual advantages and drawbacks.

In this paper, we review a variety of deep learning models for time series prediction that have been developed to explicitly capture temporal relationships. The rest of this paper is organized as follows: Section II gives a brief description of time series prediction, including three groups of definitions and mathematical formulations. Section III elaborates a wealth of
models for time series prediction, which are categorized as discriminative, generative and others. In Section IV, experimental studies are implemented on data involving commonly deployed benchmarks and real-world representative industrial data to report the performance of some state-of-the-art deep learning models. Finally, we conclude with a brief discussion of possible future topics in Section V.

II. TIME SERIES PREDICTION

Generally, the aim of time series prediction is to forecast the value of a series at some future time t + h using observations available at time t. The detailed definition or mathematical formulation varies with respect to different cases. To provide background for the following sections, this study introduces three typical groups of definitions formulating the problem of time series prediction.

A. Observations at Equal/Unequal Intervals of Time

For a large number of existing studies, the observations are assumed to be available at equispaced intervals of time [14], [15]. In such a case, the problem of time series prediction can be formulated as follows:

\hat{x}_{t+h} = f(x_t, x_{t-1}, \ldots, x_{t-N+1})    (1)

where x_t, x_{t-1}, \ldots, x_{t-N+1} refer to the time series data points and \hat{x}_{t+h} is the predicted result. N denotes the number of inputs, also named the embedded dimension in some studies [16], [17]. The horizon h could be 1 [18], [19], or any positive integer, in which case the task is named multi-step-ahead prediction [20], [21].

Some studies also break the equispaced-interval assumption and instead process time series data observed at unequal intervals of time [22], [23]. Under such circumstances, the problem of time series prediction should be expressed as follows:

\hat{x}_{t+h} = f(x_{t-l_1}, x_{t-l_2}, \ldots, x_{t-l_N}, l_1, l_2, \ldots, l_N)    (2)

where the timestamps l_1, l_2, \ldots, l_N denote the various time spacings among the observations.

B. Recursive/Direct Prediction Strategy

In order to predict the several-timesteps-ahead values of a time series, one of the most intuitive approaches is to deploy a recursive prediction strategy [24]–[26]. This kind of iterative process can be expressed as follows:

\hat{x}_{t+1} = f(x_t, x_{t-1}, \ldots, x_{t-N+1})
\hat{x}_{t+2} = f(\hat{x}_{t+1}, x_t, \ldots, x_{t-N+2})
\ldots
\hat{x}_{t+M} = f(\hat{x}_{t+M-1}, \hat{x}_{t+M-2}, \ldots, x_{t+M-N})    (3)

where M denotes the number of iterations.

The main drawback of recursive prediction is its accumulated error, which gradually deteriorates the prediction accuracy. As such, some researchers focus on predicting multiple data points in a single iteration. For instance, Granular Computing [27]–[29], modeling on granules rather than single data points, embraces time-series prediction with data segments of both equal and unequal length. Such a direct prediction strategy can be expressed as follows:

\hat{X} = f(x_t, x_{t-1}, \ldots, x_{t-N+1})    (4)

where \hat{X} = [\hat{x}_{t+1}, \hat{x}_{t+2}, \ldots, \hat{x}_{t+M}]^T refers to the predicted vector.

C. Univariate/Multivariate Modeling

Time series prediction may also be defined with regard to the dimensionality of the modeling variables. The problems are named univariate [30] and multivariate [31], terms initially proposed for classifying ARIMA models [32], [33]. Detailed definitions for univariate and multivariate time series prediction can be described as in Eqs. (5) and (6), respectively:

\hat{x}_{t+1} = f(x_t, x_{t-1}, \ldots, x_{t-N+1})    (5)

\hat{x}_{t+1} = f(x_t^1, \ldots, x_{t-N+1}^1, x_t^2, \ldots, x_{t-N+1}^2, \ldots, x_t^L, \ldots, x_{t-N+1}^L)    (6)

where L denotes the number of related variables. Eq. (5) can be regarded as a special case of Eq. (1) in which h = 1.

One of the most common conventional approaches when dealing with multivariate series is the Vector Autoregressive (VAR) model, which considers linear relationships between variables and has been widely applied in the field of economics [34]. Other approaches will be discussed in the following sections. The problem of embedded dimension selection also grows with multivariate data.

III. CLASSIC AND DEEP LEARNING MODELS FOR TIME SERIES PREDICTION

By modeling in view of conditional or joint probability, the deep learning models can be categorized as discriminative and generative approaches. This categorization is not only effective for classification, but also for time series prediction, for which the mechanism of the models remains the same. Besides, other unclassifiable but typical hybrid approaches are also described in this section.

In order to introduce the models in a natural way as well as to clarify their differences, we first review one or two classical methods for each category. Then, several state-of-the-art deep learning models for time series prediction are discussed in detail.

A. Discriminative

Discriminative models, or conditional models, are a class of methods that depend on the observed data and learn to act from the given statistics. These models are oriented toward the conditional probability of the target Y given an observation X, which means they can be used to 'discriminate' the target given an observation [35]. As such, two classic and four deep learning-based models are deployed as representatives to be discussed in this section.
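Before turning to the individual models, the recursive and direct strategies of Section II-B (Eqs. (3) and (4)) can be made concrete with a minimal sketch. It is not from the paper: `one_step_model` and `vector_model` are hypothetical fitted predictors with the indicated signatures.

```python
import numpy as np

def recursive_forecast(one_step_model, history, M):
    """Eq. (3): iterate a one-step model, feeding each prediction back as an input."""
    window = list(history)                               # the last N observations, newest last
    preds = []
    for _ in range(M):
        x_next = one_step_model(np.asarray(window))      # \hat{x}_{t+1} = f(x_t, ..., x_{t-N+1})
        preds.append(x_next)
        window = window[1:] + [x_next]                   # slide the window over the new prediction
    return np.asarray(preds)

def direct_forecast(vector_model, history):
    """Eq. (4): predict the whole vector [\hat{x}_{t+1}, ..., \hat{x}_{t+M}] in one call."""
    return vector_model(np.asarray(history))
```

The recursive variant accumulates error because predicted values re-enter the input window, which is exactly the drawback noted above; the direct variant avoids this at the cost of learning a multi-output mapping.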

1) Representative Classical Model - Support Vector Machine: The support vector machine (SVM), proposed by Vapnik [36] in 1995, is a supervised kernel-based method that is often applied to nonlinear time series prediction [37]–[41]. The classical SVM amounts to solving a quadratic programming problem, which tends to suffer from the curse of dimensionality as the training dataset size increases. As a result, algorithms such as Sequential Minimal Optimization (SMO) were proposed to overcome this shortcoming [42]. Also, Suykens and his colleagues constructed the Least Squares Support Vector Machine (LSSVM), in which the inequality constraints are replaced by equality constraints so that the computing efficiency is remarkably enhanced [43].

The SVM has not only been applied independently, owing to its strengths in modeling and predicting time series, but has also been combined with other techniques so as to obtain higher predictive accuracy in view of the non-stationarity and complexity of time series. For example, in [44], the trend and fluctuation parts hidden in the original time series were decomposed by singular spectrum analysis before the establishment of an SVM. Multiple LSSVM models were built in a neuro-fuzzy framework to construct different local regimes of the input space for improving the accuracy [18]. In addition, SVM models have also been applied to multivariate time series consisting of multiple spatial observations. To overcome the instabilities of spatiotemporal forecasting, a multi-output SVM model with multi-task learning was reported in [45].

2) Representative Classical Model - Shallow Neural Networks: Another classical discriminative model for time series prediction is Neural Networks (NNs). In order to distinguish between machine learning and deep learning models, here we only investigate the shallow, in other words traditional or vanilla, NNs. With almost 70 years of development, NNs have successfully solved problems ranging from regression and classification to feature extraction, inference, etc. Regarding time series prediction, feedforward NNs were among the most commonly deployed approaches, represented by the Multi-Layer Perceptron (MLP) trained with error Back Propagation (BP) [46]–[51]. These models consist of intuitive calculations among neurons, so they are easy to interpret and realize. They have also been used for nonlinear prediction in the multivariate domain [52] by combining all inputs in the input layer. However, their shortcomings are also evident when facing complex tasks with high dimensionality and variance, namely low efficiency and an inability to approximate complicated functions. By using Radial Basis Functions (RBFs) as activation functions, the RBF network has been an alternative for forecasting time series data, exhibiting superior convergence rates and being capable of approximating any nonlinear function [53]–[57].

Another wealth of classical networks is recurrent structures. Unlike feedforward NNs, Recurrent Neural Networks (RNNs) connect the neurons' outputs back to their inputs. Such a closed loop endows RNNs with the ability to memorize information involving trend and tendency. The training process for RNNs is a well-known strategy named Backpropagation-Through-Time (BPTT) [58]–[62]. As well, the Echo State Network (ESN), first proposed in 2001, has also been widely applied for regression and forecasting. Although it currently draws less attention compared with the RNNs, its application to time series prediction still performs well in terms of both computational cost and accuracy [63]–[66].

3) Representative Deep Learning Model - Convolutional Neural Network: In 1959, two neurobiologists, Hubel and Wiesel, found a unique neuron structure during research on the cat's receptive fields in the striate cortex, which is the prototype of the Convolutional Neural Network (CNN) [67]. The CNN was successfully applied to processing images, speech and time series in 1995 by LeCun and Bengio [68]. Using a variation of the MLP designed to require minimal preprocessing, the CNN is also known as the Shift Invariant Artificial Neural Network (SIANN) owing to its shared-weights structure and translation invariance characteristics.

Similar to conventional NNs, a CNN also consists of three parts, i.e., an input layer, multiple hidden layers and an output layer. Each layer may contain activation functions, such as the Rectified Linear Unit (ReLU). The hidden layers typically include fully connected layers and operators involving convolutional layers along with pooling. The purpose of pooling is to achieve invariance to small local distortions and to reduce the dimensionality of the feature space [69]. In order to mine deep information, connected convolution and pooling operators are often repeated many times in the network. The use of convolutional operations allows the number of parameters to be far smaller than in a fully connected network, thus resulting in efficient training and inference. The convolution technique has also been deployed on the deep belief network, a generative deep learning model reviewed in Section III-B, to introduce probability into the pooling process [70].

The CNN model for a univariate sequence operates using a set of filters or weights in the convolutional layer, and the output of the layer o_l is obtained by simply computing the dot product between the overlapping input x and the weights w in a manner similar to an autoregressive model: o_l = \sum_{i=1}^{N} w_i x_{t-i} + b_i, where the receptive field of the filter is of size N. The receptive field determines the number of inputs that can influence the output, similar to the autoregressive model. We note that a multivariate sequence may be thought of as a 2D image; for example, several works have used the spectrogram as the input for acoustic event processing [71]. By considering time as one axis and the frequency (or, in general, the multivariate observations at each time) as the other, an "image patch" is formed. This structure allows for finding local patterns in the input series. This is followed by multiple convolutional and pooling layers, and finally a fully connected layer. Convolution uses the same weights over the sliding window, and thus tries to learn the correlations and repeating patterns between the variables. However, the receptive field or window size is typically fixed in conventional CNN architectures. Recently, dilated convolutions were introduced, where the filter is applied to every d inputs, such that o_l = \sum_{i=1}^{k} w_i x_{t-d\times i} + b_i. The use of dilated convolutions in a CNN allows the filter to increase its receptive field and thus gain access to a longer history of time [74]. The dilated causal convolution architecture
without pooling operations was proposed in WaveNet for improved speech modeling [72].

Several studies have reported using CNNs for predicting chaotic as well as real-world time series. For instance, [76] reported a deep CNN model for dynamic occupancy grid prediction with data from multiple sensors. One recent work adapted the dilated CNN architecture to the stock market prediction problem [77]. A deep spatio-temporal residual network, as an extension of the CNN, is presented in [78] for citywide crowd flow prediction. An ensemble model is established in [79] for wind power forecasting. Besides, remote sensing is also one of the hot topics for real applications of CNNs [80]. To the best of our knowledge, more works utilize CNN architectures for classification rather than for prediction of the next value of a multivariate time series, such as predicting heart failure from heartbeat data [73] or activity from biometric time series [75]. Compared with the extensive application to image recognition [81]–[83], using CNNs for time series prediction still requires further study to demonstrate their superiority in learning deep features so as to obtain better performance.

4) Representative Deep Learning Model - Long-Short Term Memory: Standard RNNs suffer from vanishing and exploding gradient problems, as the BPTT procedure depends exponentially on the weights at each timestep, thus failing to learn information over more than typically 5-10 timesteps [84]. The Long-Short Term Memory (LSTM) recurrent network overcomes the vanishing gradient problem by introducing a linear unit (cell) called a Constant Error Carousel (CEC), to which information can be added at each timestep. Error flow control with a CEC is conducted using 'gates'. For example, the input gate controls the information added to the cell, the output gate regulates the flow of information out to the rest of the network, while the forget gate decays the activation of the previous timesteps. Thus, the LSTM can maintain temporal information in its state over a long number of timesteps and is widely used in sequential data analysis, prediction and classification tasks [85]–[88] in univariate and multivariate [89], [90] domains.

We describe the standard LSTM from the Auto Regressive (AR) model perspective in Eq. (7). For a single timestep k, i_k, f_k, u_k, o_k and c_k represent the input gate, forget gate, cell-state update, output gate and cell state, respectively [85]. The weights are W = [W_i, W_g, W_u, W_o] and U = [U_i, U_g, U_u, U_o], and the bias is denoted b. The internal LSTM states h_0, c_0 are initialized as zero.

x_{t+1} = f(x_t, x_{t-1}, x_{t-2}, \ldots, x_{t-N+1})    (7)

For k = t - N + 1 to t:

[i_k, f_k, u_k, o_k]^T = W x_k + U h_{k-1} + b    (8)
c_k = c_{k-1} \odot \sigma(f_k) + \tanh(u_k) \odot \sigma(i_k)    (9)
h_k = \sigma(o_k) \odot \tanh(c_k)    (10)

where \sigma is the sigmoid function and \odot denotes the element-wise product. h_k at each timestep becomes the feature learned from the history of the input. Finally, the features of the latest timestep obtained from Eqs. (8)-(10) are used in a prediction algorithm, typically a feedforward neural network such as an MLP, as in Eq. (11). The weights are learned by BPTT with a loss function such as the mean squared error in Eq. (12).

\hat{x}_{t+1} = f(h_t)    (11)

L = \frac{1}{N}\sum_{t=1}^{N}(x_{t+1} - \hat{x}_{t+1})^2    (12)

The input gate and update i_k, u_k control the value of the current cell-state update c_k, while the forget gate f_k controls the forgetting factor of the previous state c_{k-1}.

There are several common variants proposed for the LSTM. From Eq. (8), we see that the current state c_{k-1} has no effect on the gates. The output of the LSTM, h_{k-1}, has an effect, which is close to zero if the output gate is zero. The main usage is when extracting information over long timesteps is required. The peephole LSTM [94] modifies this so that c_{k-1} does have an effect, with the gates in Eq. (8) computed as W x_k + U h_{k-1} + V c_{k-1} + b. The peephole LSTM was able to learn highly nonlinear spike trains with constant time delays and to count delays between sharp spikes.

The Gated Recurrent Unit (GRU) is a simplified version of the LSTM that combines the forget and input gates to form an 'update' gate [96]. As a result, it has fewer parameters and reduced complexity compared with the LSTM. The cell state and hidden state are also combined, and a 'reset' gate is used. While there have been several improvements and modifications to the standard (vanilla) LSTM, recent studies [87], [95] show that most variants do not significantly improve the performance on sequential tasks such as speech, music and handwriting recognition. In fact, it was shown that initializing the forget-gate bias to a large value of 1 or 2 improved LSTM performance on long-term dependencies [95].

The GRU authors also proposed an encoder-decoder framework [96], where instead of predicting at each timestep, the prediction is decomposed into two steps. An RNN encoder such as an LSTM is used to obtain a hidden state from a variable-length sequence, h_t = f(h_{t-1}, x_t). The representation at the end of the sequence is denoted as c. A decoder RNN then decodes the hidden state to predict multiple timesteps, h_t = f(h_{t-1}, y_{t-1}, c). Many conventional architectures utilize only the previous and current timesteps, while the bidirectional LSTM utilizes the future timesteps and was used in sequence-to-sequence speech recognition [100]. However, the encoder-decoder framework was shown to deteriorate in performance with longer sequences [99]. The use of an attention mechanism in the encoder-decoder framework [96] addressed this: a weighting of the previous hidden states is used, which allows the network to select relevant hidden states. Recently, the attention mechanism was modified for use in time series prediction [97], [98]. Besides, real-world examples of using LSTM and GRU for time series analysis are reported in the diagnosis of neurodegenerative diseases [101], hydrologic prediction [102], multi-sensor fusion [98], [103], remote sensing [104]–[106], etc.
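The following is a minimal NumPy sketch of one LSTM step following the formulation of Eqs. (8)-(10) above, with the gates stacked into a single affine map and element-wise products. The toy dimensions and random weights are illustrative assumptions, not the configuration used in the experiments of Section IV.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_k, h_prev, c_prev, W, U, b):
    """One timestep k: returns (h_k, c_k) from the input x_k and the previous states."""
    gates = W @ x_k + U @ h_prev + b                             # Eq. (8): stacked [i_k; f_k; u_k; o_k]
    i_k, f_k, u_k, o_k = np.split(gates, 4)
    c_k = c_prev * sigmoid(f_k) + np.tanh(u_k) * sigmoid(i_k)    # Eq. (9)
    h_k = sigmoid(o_k) * np.tanh(c_k)                            # Eq. (10)
    return h_k, c_k

# Usage: run the cell over the last N observations, then predict via Eq. (11)
# with any regressor on the final hidden state (a linear map here as a stand-in).
rng = np.random.default_rng(0)
N, d_in, d_hid = 8, 1, 16
W = rng.normal(scale=0.1, size=(4 * d_hid, d_in))
U = rng.normal(scale=0.1, size=(4 * d_hid, d_hid))
b = np.zeros(4 * d_hid)
h, c = np.zeros(d_hid), np.zeros(d_hid)              # h_0, c_0 initialized as zero
for x_k in rng.normal(size=(N, d_in)):               # stand-in for x_{t-N+1}, ..., x_t
    h, c = lstm_step(x_k, h, c, W, U, b)
x_hat = rng.normal(scale=0.1, size=d_hid) @ h        # Eq. (11): \hat{x}_{t+1} = f(h_t)
```

In practice the weights would of course be learned by BPTT against the loss of Eq. (12) rather than drawn at random; the sketch only fixes the data flow of a single forward pass.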

The LSTM can be extended to the multivariate domain [88], [90], [91] by adding a dimension to the parameters W, U and b. We note that the multivariate LSTM assumes equispaced intervals of time, thus many common approaches interpolate the data [88], [92] to align it and fit the recurrent model. Other LSTM variants have sought to incorporate the timestep into the network architecture, such as adding a new time gate in the Phased LSTM [93].

5) Representative Deep Learning Model - Auto-Encoder: As a discriminative method for directly learning a parametric map from input to representation, the Auto-Encoder (AE) explicitly defines a feature-extracting function, i.e., the encoder f_\theta, in a specific parameterized closed form [107]. Given each sample x_i from a dataset x_1, x_2, \ldots, x_N, the feature vector or code is computed as

h_i = f_\theta(x_i)    (13)

Another parameterized closed-form function g_\theta, called the decoder, maps from the feature space back into the input space, producing a reconstruction

r_i = g_\theta(h_i)    (14)

The forms of f_\theta and g_\theta can be simple affine mappings such as

f_\theta(x_i) = \sigma_f(W x + b)    (15)
g_\theta(h_i) = \sigma_g(W' h + b')    (16)

where W and W' are the weight matrices of the encoder and decoder, with b and b' the respective bias vectors. \sigma_f and \sigma_g denote the activation functions, which can be sigmoid, logistic, etc. [108], [109]. Obviously, the parameters can be determined by minimizing the reconstruction error E(x, r), which is usually carried out by a stochastic gradient descent technique [110].

After initially being deployed for dimensionality reduction in its original form, the AE has been developed into many variants, such as the sparse AE and regularized AEs including Contractive AEs (CAEs) and the Denoising AE (DAE) [111]–[113]. Its applications to time series prediction range from multimodal fusion of sensor data [114] and traffic flow [115] to host load in cloud computing [116]. Reference [117] also reported an extreme deep learning approach using a stacked AE to predict building energy consumption. Besides, the AE is commonly combined with other deep learning approaches for forecasting time series; we will investigate these in the following sections.

6) Representative Deep Learning Model - Deep Stacking Network: Inspired by the idea of stacking, a novel discriminative deep learning method that emerged recently is the Deep Stacking Network (DSN), where simple functions are composed first and then "stacked" on top of each other so as to learn complex functions [118], [119]. The basic architecture of a DSN is shown in Fig. 1 and involves a number of layered modules, in which each module is a specialized neural network consisting of a single hidden layer and two sets of weights [120]. The figure only shows 4 such modules, while there could be up to hundreds in practice, especially for image and speech classification [121], [122].

Fig. 1. A basic DSN architecture using input-output stacking. Dashed lines denote copying layers.

The weight matrix in the lower layer, W, connects the linear input layer and the hidden nonlinear layer, and the one in the upper layer, U, connects the hidden nonlinear layer with the linear output layer. Given the fact that U = (HH^T)^{-1}HT^T = F(W), where H = [h_1, \ldots, h_i, \ldots, h_N], h_i = \sigma(W^T x_i), x_i denotes a training vector and T the target matrix, the weights in a DSN can be learned by simple gradient computation with batch training and a parallel strategy [123].

The above DSN has already been generalized to a tensorized version, i.e., the Tensor DSN (TDSN), which provides higher-order feature interactions compared with the conventional DSN [124]. Motivated by increasing the size of the hidden units without increasing the number of parameters to learn, the Kernel DSN (KDSN) has been proposed using the kernel trick [125]. The DSN was also reported to be connected with a Conditional Random Field (CRF), which was successfully developed and applied to Natural Language Processing (NLP) [126].

As for time series prediction, [127] and [128] both presented a typical DSN-based approach, namely Deep-STEP, for spatiotemporal prediction of remote sensing data. Besides, the DSN is commonly combined with auto-encoders for forecasting time-series benchmarks [129] and practical data, such as the crude oil price [130]. Reference [131] also presented a novel double deep ELMs ensemble system for time series prediction, which utilized a DSN for generalization.

B. Generative

Different from the discriminative models, a generative model considers the joint probability distribution of both the observation X and the target Y. It can be used to 'generate' random instances of an observation-target pair, i.e., (X, Y). We will also review two conventional approaches along with three deep architectures in this section.

1) Representative Classical Model - Gaussian Process: The Gaussian Process (GP) [132], as a kernel-based probabilistic model, builds probabilistic relationships between the latent
functions of the samples [133]–[136]. The advantage of a GP lies in its ability to model the uncertainty hidden in the data, which is provided through predictive distributions. Considering the non-stationarity of a time series, a covariance matrix in a GP was designed in [137]. As for multiple-step-ahead time series forecasting, in a noisy-input-based GP model, the propagation of the uncertainty at each prediction step was realized by a Gaussian approximation [138]. For cases with missing points in the time series, a GP framework based on semi-described and semi-supervised learning was reported in [139], in which the posterior distribution over the missing points was provided by using variational inference. Moreover, the time series in some fields, such as finance, often exhibit heteroscedastic characteristics, i.e., the volatility or fluctuation of the data is time-varying rather than constant. To forecast such time series data, GP-based volatility models were reported in recent years [140]–[142].

In a nutshell, the GP-based models can give excellent predictive performance with prediction uncertainty. However, as black-box models, they cannot give a clear interpretation of how the time series evolves in a certain application situation. Besides, these methods may suffer from long computation times owing to the calculation of the inverse of the covariance matrix, especially with large datasets.

2) Representative Classical Model - Bayesian Networks and Hidden Markov Model: Bayesian Networks (BNs) [143], [144], treated as directed probabilistic graph models, are a class of powerful tools for dealing with uncertainty issues. There already exist numerous studies on using BNs for time series prediction [145]–[148]. Compared with other generative models (e.g., a GP), BNs can be naturally applied to model multivariate time series, where both the relations between variables and the evolution over time are effectively captured [149]–[151].

Particularly, the Hidden Markov Model (HMM), a special BN which has discrete latent nodes, is often employed to accomplish time series prediction tasks [152]–[154]. These HMM-based prediction methods assume that the time series evolves under the control of 'events' or 'patterns' hidden in the data, which then transform over time with a certain probability [155]–[158].

It should be noted that although the BN-based prediction models can capture uncertain relations between multivariate time series, they may consume more training time to find an effective network structure that produces high accuracy. And as for the HMM-based models, how to determine the cardinality of the latent nodes is still an open topic deserving further discussion.

3) Representative Deep Learning Model - Restricted Boltzmann Machine: The Restricted Boltzmann Machine (RBM), proposed by Hinton and Sejnowski in 1986 [159], is a generative deep model concerning the probabilistic relationship between the input, i.e., the visible units v, and the latent, i.e., hidden, units h [160]. The visible and hidden units have biases a and b, respectively, and they are connected by a weight matrix W. The energy function E(v, h) of an RBM is defined as follows:

E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{ij} W_{ij} v_i h_j    (17)

where v_i, h_j, a_i, b_j and W_{ij} are the elements of v, h, a, b and W, respectively. Based on this energy function, the joint probability distribution is given as

P(v, h) = \frac{1}{Z} e^{-E(v,h)}    (18)

where Z is the partition function guaranteeing that the distribution is normalized. Considering the special structure of the RBM, in which connections exist only between layers, the states of the units in the hidden layer are conditionally independent given the visible units, and vice versa. Therefore, the conditional probabilities P(v_i|h) and P(h_j|v) are given by

P(v_i|h) = \sigma(a_i + \sum_j W_{ij} h_j),    P(h_j|v) = \sigma(b_j + \sum_i v_i W_{ij})    (19)

where \sigma(\cdot) is the activation function, of which the logistic function \sigma(x) = 1/(1 + e^{-x}) is a common choice [161]. The RBM has several extensions, e.g., the conditional RBM [162], the gated RBM [163], etc.

Some studies have already reported using the RBM to forecast or predict time series data. For instance, [164] proposed a deep network structure consisting of two RBMs to realize prediction. One of the authors has also reviewed extended RBMs for time series, which often have a recurrent structure so that BPTT is employed to learn the parameters [165]. Focusing on multiperiod wind speed prediction, [166] designed a deep Boltzmann machine in view of its competitive capability in approximating nonlinear and nonsmooth functions. Besides, a dynamic RBM is presented in [167] which considers Gaussian properties of the data. In summary, this kind of deep learning approach is suitable for modeling relationships or discovering features from the perspective of probability.

4) Representative Deep Learning Model - Deep Belief Network: In order to further develop the ability of abstraction and the capacity of information, RBMs are typically stacked into an integrated structure to form a Deep Belief Network (DBN). A DBN with l layers models the joint distribution between the observed variables v_i and the hidden layers h^{(k)}, k = 1, 2, \ldots, l, consisting of binary units h_j^{(k)}, which can be described as follows:

p(v, h^{(1)}, h^{(2)}, \ldots, h^{(l)}) = P(v|h^{(1)}) P(h^{(1)}|h^{(2)}) \cdots P(h^{(l-2)}|h^{(l-1)}) P(h^{(l-1)}, h^{(l)})    (20)

Assuming v = h^{(0)}, with a^{(k)} the bias vector of layer k and W^{(k)} the weight matrix between layers k and k + 1, the factorial conditional distribution in a DBN can be formulated as follows:

P(h^{(k)}|h^{(k+1)}) = \prod_i P(h_i^{(k)}|h^{(k+1)})    (21)

where P(h_i^{(k)} = 1|h^{(k+1)}) = \sigma(a_i^{(k)} + \sum_j W_{ij}^{(k)} h_j^{(k+1)}). As such, p(h^{(l-1)}, h^{(l)}) refers to an RBM [168].
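A small NumPy sketch of the factorized conditionals of Eq. (19) and one step of block Gibbs sampling for an RBM follows. The dimensions and random parameters are illustrative assumptions; for time series, the visible units v would be filled with a binarized or rescaled window of observations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rbm_gibbs_step(v, W, a, b, rng):
    """Sample h ~ P(h|v) and then v' ~ P(v|h) using the conditionals of Eq. (19)."""
    p_h = sigmoid(b + v @ W)                             # P(h_j = 1 | v) = sigma(b_j + sum_i v_i W_ij)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(a + W @ h)                             # P(v_i = 1 | h) = sigma(a_i + sum_j W_ij h_j)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return h, v_new, p_v

rng = np.random.default_rng(0)
n_visible, n_hidden = 20, 10
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
a, b = np.zeros(n_visible), np.zeros(n_hidden)
v = (rng.random(n_visible) < 0.5).astype(float)          # stand-in for a binarized data window
h, v_reconstructed, p_v = rbm_gibbs_step(v, W, a, b, rng)
```

Training procedures such as contrastive divergence would alternate such sampling steps with parameter updates; the sketch only fixes the sampling mechanics implied by Eq. (19).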

Compared with the RBM, the DBN has been more widely deployed as a learning model for predicting temporal data. The predictive objects are not only chaotic time series [169], [170], but also range from traffic flow and energy to drought indices, etc. [171]–[176]. As for the structure of this method, some scholars have constructed ensemble DBNs which aggregate the outputs so as to obtain better accuracy [177], [178]. The DBN has also been used in combination with some classical models, such as ARIMA [179]. In the future, optimization techniques are still highly demanded to reduce the computational cost of this deep model, especially for versions exhibiting an ensemble structure.

5) Representative Deep Learning Model - Generative Adversarial Nets: At the twenty-eighth Conference on Neural Information Processing Systems (NIPS), Ian J. Goodfellow et al. proposed a new framework for estimating generative models via an adversarial process, i.e., Generative Adversarial Nets (GANs) [180]. Two neural networks are typically involved and simultaneously trained in GANs: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G. The objective of D is to discriminate between the training and generated data, whereas G is trained to confuse D as much as possible. In the most straightforward application of GANs, in which G and D are both multilayer perceptrons, the entire system can simply be trained with the backpropagation technique. Given input data x and noise z, the objective of training GANs can be described as a two-player minimax game with value function V(G, D) between D and G:

V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]    (22)

where p_{data}(x) denotes the data distribution over x, and p_z(z) the prior distribution over the noise. In practice, the training process is implemented by some iterative, numerical approach. One of the most commonly used techniques is minibatch stochastic gradient descent, which updates the discriminator by ascending its stochastic gradient and the generator by descending its stochastic gradient:

\frac{1}{m}\sum_{i=1}^{m} \nabla_{\theta_d}\left[\log D(x_i) + \log(1 - D(G(z_i)))\right]    (23)

\frac{1}{m}\sum_{i=1}^{m} \nabla_{\theta_g} \log(1 - D(G(z_i)))    (24)

where \theta_d and \theta_g denote the parameters of D and G, and x_i and z_i are samples from the data distribution p_{data}(x) and the noise prior p_z(z) [181].

The variants of GANs, which vary the objectives of G and D and also the overall architecture, include conditional GANs [182], InfoGAN [183], SeqGAN [184], Wasserstein GANs [185], least squares GANs [186], etc. The reported application cases are mostly in image processing [187]. Still, a number of studies applying GANs to time-series prediction have also emerged recently. For instance, [188] presented a study of high-frequency stock market prediction using a GAN. A GAN-based parallel prediction model is proposed in [189] to forecast building energy consumption. Reference [190] reported an LSTM-based GAN for predicting traffic flow. Besides, a deep generative adversarial architecture for network-wide spatial-temporal traffic-state estimation is proposed in [191]. Considering the capability of generating missing data points [192], prediction for incomplete time series is a possible future topic deserving further study.

C. Others

1) Representative Classical Model - Clustering-Based Model: As the most commonly deployed semi-supervised technique, clustering-based methods are reported in the literature for predicting and forecasting time series data [193], [194]. The clusters, as a typical representative of information granules, are used to construct a time series prediction model with the aid of techniques such as Fuzzy C-Means (FCM). Generally, this framework first clusters the data so as to produce prototypes, which are then employed to extract fuzzy rules and finally obtain long-term prediction results [27], [28], [195]–[197]. These methods model the information granules rather than single points so that the accumulated errors caused by iterative prediction are avoided. However, these granules are still intuitive and need to be constructed at a deeper level.

Another typical clustering-based model is the k-Nearest Neighbor (k-NN). This simple but powerful technique has been applied in many different fields, particularly for classification and prediction [198]–[201]. The points having the shortest distance to the data in the training set are selected as the nearest neighbors [202]–[204]. However, k-NN has its shortcomings, such as high computational cost and space complexity, which limit its scope of application.

2) Representative Deep Learning Model - Hybrids: Besides the above-mentioned studies reporting individual deep learning models, there also exist some hybrid or combinational structures which have been successfully applied to time series prediction [205], [206]. For instance, a deep learning approach using Self Organizing Maps (SOMs) and an MLP was proposed in [207] for multi-sensor data prediction. Reference [115] reported a deep architecture model for traffic flow prediction, in which a stacked autoencoder is used to extract traffic flow features and a logistic regression layer is applied for prediction. Combining CNN and LSTM, [208] proposed a hybrid deep learning framework, also to forecast future traffic flow. Reference [209] discussed prediction models using an autoencoder and LSTM with various activation functions for solar power forecasting. An integrated approach is proposed in [210] which combines discriminatively trained predictive models with deep neural networks accounting for the joint statistics of a set of weather-related variables.

Such hybrid approaches take advantage of both discriminative and generative models for applying deep learning to time series prediction. Still, resolving how to combine the various models in a reasonable and effective way requires special consideration for different application fields.

In order to give a brief summarization, we also organize a table in the Appendix to describe the advantage, disadvantage, or both, of each model.
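As a complement to Eqs. (22)-(24), the following is a hedged PyTorch sketch of one minibatch update for a GAN whose generator produces fixed-length series windows. The network sizes, window length and data source (`real_batch`) are illustrative assumptions rather than any configuration used in the experiments of Section IV.

```python
import torch
import torch.nn as nn

T, noise_dim, m = 24, 8, 32                      # window length, noise size, minibatch size
G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, T))
D = nn.Sequential(nn.Linear(T, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_d = torch.optim.SGD(D.parameters(), lr=1e-3)
opt_g = torch.optim.SGD(G.parameters(), lr=1e-3)

real_batch = torch.randn(m, T)                   # stand-in for m real series windows x_i
z = torch.randn(m, noise_dim)                    # noise samples z_i ~ p_z(z)
eps = 1e-8                                       # numerical safety inside the logs

# Discriminator ascent, Eq. (23): maximize log D(x_i) + log(1 - D(G(z_i)))
d_loss = -(torch.log(D(real_batch) + eps) + torch.log(1 - D(G(z).detach()) + eps)).mean()
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator descent, Eq. (24): minimize log(1 - D(G(z_i)))
g_loss = torch.log(1 - D(G(z)) + eps).mean()
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```

The generator's samples are detached in the discriminator step so that only \theta_d is updated there, mirroring the alternation between Eqs. (23) and (24).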

IV. EXPERIMENTS AND ANALYSIS

This section selects six methods, i.e., RNN with BPTT, CNN, LSTM, GANs, DBN, and sparse AEs with LSTM as a hybrid, as representatives for demonstrating the performance of deep learning models on time series prediction. The data we use are two benchmarks, the Mackey-Glass and Lorenz chaotic time series, along with Blast Furnace Gas (BFG) generation and Coke Oven Gas (COG) generation as typical gaseous energy data from the steel industry. The Mean Absolute Percentage Error (MAPE) [211] and Root Mean Square Error (RMSE) [212] are deployed as indices for the error statistics, which are defined as follows:

\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|    (25)

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}    (26)

where N denotes the length of the predicted results \hat{y}_i, and y_i refers to the real value. In this study, N = 60. Parameters include the number of units in the input layer n_{input}, the hidden layers n_{hidden}^{(1)}, n_{hidden}^{(2)}, \ldots, n_{hidden}^{(k)}, and the output layer n_{output} for each model. As for the CNN, this study deploys max pooling. The parameters further include the size of the convolutional kernel and the number of neurons n_{fc} in the fully connected layer between the convolutional layer and the predictor. The predictor for the CNN in this study is a simple backpropagation neural network. The step size for the convolutional nets is set to 1, and the size of the convolutional kernel to 1 × 5. The hybrid also includes the sparsity ρ and the number of divided layers n_{divided}. All the parameters are determined by the Differential Evolution (DE) algorithm [213]. The results are calculated on testing data.

A. Benchmarks

1) Mackey-Glass: The Mackey-Glass time series prediction is a typical nonlinear fitting problem which has attracted great attention. The series is generated by the following time-delay differential equation:

\frac{dx(t)}{dt} = \frac{\alpha x(t-\tau)}{1 + x^{\beta}(t-\tau)} + \gamma x(t)    (27)

where x(t) refers to the generated data, τ denotes the time delay parameter, and α, β and γ are real numbers [214].

Predicted results of the different models for the Mackey-Glass time series are given in Fig. 2. It can be seen that almost all six models successfully forecast the tendency of this data. In particular, LSTM performs best on accuracy, which can also be concluded from the error statistics shown in Table I.

Fig. 2. Predicted results of different deep learning models for Mackey-Glass time series.

TABLE I. Error statistics and parameters of different deep learning models for predicting Mackey-Glass time series.

2) Lorenz: The Lorenz system can be described by the following set of equations:

\frac{dx(t)}{dt} = \sigma\,(y(t) - x(t))
\frac{dy(t)}{dt} = r\,x(t) - x(t)\,z(t) - y(t)
\frac{dz(t)}{dt} = x(t)\,y(t) - b\,z(t)    (28)

where σ, r and b refer to the chaotic parameters. The series was generated from Eq. (28) by the Runge-Kutta algorithm [215]. Compared with Mackey-Glass, the Lorenz data exhibits no obvious periodic variations, so prediction on this data is somewhat more difficult. As a result, the numbers of hidden layers of the deep learning models are all higher than the ones for the Mackey-Glass time series. Besides the RNN and CNN, the other four models still output satisfactory results, which demonstrates their applicability to time series prediction, as shown in Fig. 3. Table II gives the error statistics along with the determined parameters.

B. Real-World Data

For the steel industry, the generated amount of gaseous energy plays a crucial role in supporting production and saving energy. Prediction on this time series data enables the operating staff to acknowledge the trend of gas flow in
advance, which is then beneficial for scheduling and optimizing the utilization of the gaseous energy. As a result, here we employ BFG generation and COG generation as the representative data applications.

Fig. 3. Predicted results of different deep learning models for Lorenz time series.

TABLE II. Error statistics and parameters of different deep learning models for predicting Lorenz time series.

1) BFG Generation: By implementing LSTM and DBN on the BFG generation data, we obtain the predicted results shown in Fig. 4. The RNN, CNN and GANs perform well for the first 40 points, then behave less well than they did for predicting the Mackey-Glass and Lorenz chaotic time series, while the three deep models give excellent results on this industrial data over all 60 points. As shown in Table III, the different models exhibit little difference in accuracy in terms of MAPE and RMSE.

Fig. 4. Predicted results of different deep learning models for BFG generation.

TABLE III. Error statistics and parameters of different deep learning models for predicting BFG generation.

2) COG Generation: As shown in Fig. 5, compared with the BFG generation time series, the LSTM and the hybrid behave substantially differently from the other models when predicting the COG generation amount. The LSTM and the hybrid still perform well, whereas the RNN, CNN, GANs and DBN fail to accurately estimate the variation of this data, which can also be concluded from the error statistics in Table IV.

Fig. 5. Predicted results of different deep learning models for COG generation.

In summary, these two representative models perform well for the selected time series, including both benchmarks and real-world data. The analysis shows that LSTM seems to be more stable than DBN in the implemented experiments.
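For reference, the two error indices of Eqs. (25) and (26) reported in Tables I-IV can be computed as follows; this is a straightforward NumPy rendering, and the toy arrays are placeholders for the 60-point test predictions.

```python
import numpy as np

def mape(y, y_hat):
    """Eq. (25): mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 / y.size * np.sum(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    """Eq. (26): root mean square error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

y_true = np.linspace(1.0, 2.0, 60)                               # placeholder real values y_i
y_pred = y_true + 0.01 * np.random.default_rng(0).normal(size=60)
print(mape(y_true, y_pred), rmse(y_true, y_pred))
```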

TABLE IV. Error statistics and parameters of different deep learning models for predicting COG generation.

TABLE V. Brief summary of the reviewed models.

Considering the time needed for modeling deep features with multiple hidden layers, the training time for the deep learning models is comparatively longer than that for conventional machine learning models, while the time for prediction is far less and exhibits little difference from the traditional models. Bearing this in mind, real-world applications always run these two procedures separately, i.e., train the model offline, possibly with a parallel strategy, and predict the data online, so that the practical requirements on real-time performance can be satisfied.

In summary, these representative models perform well for the selected time series, including both benchmarks and real-world data. The analysis shows that LSTM and the hybrid seem to be more stable than the other ones in the implemented experiments.

V. FUTURE PERSPECTIVES

Deep learning-based models are good at discovering intricate structure in large data sets. In addition, they have been shown to learn to discriminate patterns from multiple time series information. This superiority in the feature learning process also results in some mystery in interpretation with regard to the output. As for time series prediction, providing explainable results is beneficial for measuring reliability. Therefore, enhancing interpretability becomes a future topic attracting intensive attention [216], [217]. For instance, IJCAI has organized a workshop discussing Explainable Artificial Intelligence (XAI) since 2017 [218], [219]. As well, the best paper of ICML 2017 was awarded to a study on understanding black-box predictions [220].

Acceleration for the learning process is another area in [17] L. Zhang, W.-D. Zhou, P.-C. Chang, J.-W. Yang, and F.-Z. Li, “Iterated
which deep learning needs to be improved over the next few time series prediction with multiple support vector regression models,”
Neurocomputing, vol. 99, pp. 411–422, Jan. 2013.
years. In order to determine the weights of multiple layers in [18] A. Miranian and M. Abdollahzade, “Developing a local least-squares
the network, deep learning-based approaches usually requires support vector machines-based neuro-fuzzy model for nonlinear and
hours or even days for training, even using the latest GPU chaotic time series prediction,” IEEE Trans. Neural Netw. Learn. Syst.,
vol. 24, no. 2, pp. 207–218, Feb. 2013.
processors, which is unacceptable for time series prediction [19] C.-H. Lee, F.-Y. Chang, and C.-M. Lin, “An efficient interval type-2
in some real-world application cases. For improving the fuzzy CMAC for chaos time-series prediction and synchronization,”
computational efficiency, some studies have been conducted IEEE Trans. Cybern., vol. 44, no. 3, pp. 329–341, Mar. 2014.
[20] S. B. Taieb, G. Bontempi, A. F. Atiya, and A. Sorjamaa, “A review and
to propose algorithms and strategies for accelerating deep comparison of strategies for multi-step ahead time series forecasting
learning [221]–[223]. based on the NN5 forecasting competition,” Expert Syst. Appl., vol. 39,
Besides combining with classic models, deep learning-based no. 8, pp. 7067–7083, Jun. 2012.
[21] A. G. Parlos, O. T. Rais, and A. F. Atiya, “Multi-step-ahead prediction
methods are also typically merged with meta-learning and using dynamic recurrent neural networks,” Neural Netw., vol. 13, no. 7,
reinforcement learning so as to utilize the ability of inter- pp. 765–786, Sep. 2000.
action with environment of these two methods [224]–[227]. [22] E. Ramasso, M. Rombaut, and N. Zerhouni, “Joint prediction of
continuous and discrete states in time-series based on belief functions,”
There already exist some literatures which successfully make IEEE Trans. Cybern., vol. 43, no. 1, pp. 37–50, Feb. 2013.
predictions by deep meta-learning or deep reinforcement learning [228], [229]. We believe that more studies will emerge in the near future as improvements in accuracy, efficiency, and architectures are proposed for time series analysis.

APPENDIX

See Table V.

REFERENCES

[1] G. E. P. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control. Hoboken, NJ, USA: Wiley, 2015.
[2] P. J. Brockwell, R. A. Davis, and M. V. Calder, Introduction to Time Series and Forecasting. New York, NY, USA: Springer, 2002.
[3] C. Chatfield, Time-Series Forecasting. London, U.K.: Chapman & Hall, 2000.
[4] C. Liu, S. C. H. Hoi, and P. Zhao, "Online ARIMA algorithms for time series prediction," in Proc. 13th AAAI Conf. Artif. Intell., 2016, pp. 1–7.
[5] A. A. Adebiyi, A. O. Adewumi, and C. K. Ayo, "Comparison of ARIMA and artificial neural networks models for stock price prediction," J. Appl. Math., vol. 2014, Mar. 2014, Art. no. 614342.
[6] X. Wu and Y. Wang, "Extended and unscented Kalman filtering based feedforward neural networks for time series prediction," Appl. Math. Model., vol. 36, no. 3, pp. 1123–1131, 2012.
[7] T. W. Joo and S. B. Kim, "Time series forecasting based on wavelet filtering," Expert Syst. Appl., vol. 42, no. 8, pp. 3868–3874, 2015.
[8] Z. Han, Y. Liu, J. Zhao, and W. Wang, "Real time prediction for converter gas tank levels based on multi-output least square support vector regressor," Control Eng. Pract., vol. 20, no. 12, pp. 1400–1409, Dec. 2012.
[9] Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," Large-Scale Kernel Mach., vol. 34, no. 5, pp. 1–41, 2007.
[10] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, p. 436, 2015.
[11] L. Zhang, L. Zhang, and B. Du, "Deep learning for remote sensing data: A technical tutorial on the state of the art," IEEE Geosci. Remote Sens. Mag., vol. 4, no. 2, pp. 22–40, Jun. 2016.
[12] X. X. Zhu et al., "Deep learning in remote sensing: A comprehensive review and list of resources," IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 8–36, Dec. 2017.
[13] J. C. B. Gamboa, "Deep learning for time-series analysis," 2017, arXiv:1701.01887. [Online]. Available: https://arxiv.org/abs/1701.01887
[14] T. V. Gestel et al., "Financial time series prediction using least squares support vector machines within the evidence framework," IEEE Trans. Neural Netw., vol. 12, no. 4, pp. 809–821, Jul. 2001.
[15] B. Doucoure, K. Agbossou, and A. Cardenas, "Time series prediction using artificial wavelet neural network and multi-resolution analysis: Application to wind speed data," Renew. Energy, vol. 92, pp. 202–211, Jul. 2016.
[16] X. An, D. Jiang, M. Zhao, and C. Liu, "Short-term prediction of wind power using EMD and chaotic theory," Commun. Nonlinear Sci. Numer. Simul., vol. 17, no. 2, pp. 1036–1042, 2012.
[23] C. H. Aladag, U. Yolcu, E. Egrioglu, and A. Z. Dalar, "A new time invariant fuzzy time series forecasting method based on particle swarm optimization," Appl. Soft Comput., vol. 12, no. 10, pp. 3291–3299, 2012.
[24] P. C. Young, Recursive Estimation and Time-Series Analysis: An Introduction. Berlin, Germany: Springer, 2012.
[25] D. T. Mirikitani and N. Nikolaev, "Recursive Bayesian recurrent neural networks for time-series modeling," IEEE Trans. Neural Netw., vol. 21, no. 2, pp. 262–274, Feb. 2010.
[26] H. Liu, H.-Q. Tian, and Y.-F. Li, "Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction," Appl. Energy, vol. 98, pp. 415–424, Oct. 2012.
[27] J. Zhao, Z. Y. Han, and W. Pedrycz, "Granular model of long-term prediction for energy system in steel industry," IEEE Trans. Cybern., vol. 46, no. 2, pp. 388–400, Jul. 2016.
[28] Z. Y. Han, J. Zhao, Q. Liu, and W. Wang, "Granular-computing based hybrid collaborative fuzzy clustering for long-term prediction of multiple gas holders levels," Inf. Sci., vol. 330, pp. 175–185, Feb. 2016.
[29] Z. Han, J. Zhao, W. Wang, and Y. Liu, "A two-stage method for predicting and scheduling energy in an oxygen/nitrogen system of the steel industry," Control Eng. Pract., vol. 52, pp. 35–45, Jul. 2016.
[30] A. Pankratz, Forecasting With Univariate Box-Jenkins Models: Concepts and Cases. Hoboken, NJ, USA: Wiley, 2009.
[31] M. Khashei and M. Bijari, "A novel hybridization of artificial neural networks and ARIMA models for time series forecasting," Appl. Soft Comput., vol. 11, no. 2, pp. 2664–2675, Mar. 2011.
[32] E. Cadenas, W. Rivera, R. Campos-Amezcua, and C. Heard, "Wind speed prediction using a univariate ARIMA model and a multivariate NARX model," Energies, vol. 9, no. 2, p. 109, 2016.
[33] C. Yuan, S. Liu, and Z. Fang, "Comparison of China's primary energy consumption forecasting by using ARIMA (the autoregressive integrated moving average) model and GM (1,1) model," Energy, vol. 100, pp. 384–390, Apr. 2016.
[34] S. Johansen, "Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models," Econometrica, vol. 59, no. 6, pp. 1551–1580, 1991.
[35] A. Y. Ng and M. I. Jordan, "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes," in Proc. Adv. Neural Inf. Process. Syst., 2002, pp. 841–848.
[36] V. N. Vapnik, The Nature of Statistical Learning Theory. New York, NY, USA: Springer, 1995.
[37] F. E. Tay and L. Cao, "Application of support vector machines in financial time series forecasting," Omega, vol. 29, no. 4, pp. 309–317, Aug. 2001.
[38] M. das Chagas Moura, E. Zio, I. D. Lins, and E. Droguett, "Failure and reliability prediction by support vector machines regression of time series data," Rel. Eng. Syst. Saf., vol. 96, no. 11, pp. 1527–1534, 2011.
[39] A. Mellit, A. M. Pavan, and M. Benghanem, "Least squares support vector machine for short-term prediction of meteorological time series," Theor. Appl. Climatol., vol. 111, nos. 1–2, pp. 297–307, 2013.
[40] N. I. Sapankevych and R. Sankar, "Time series prediction using support vector machines: A survey," IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 24–38, May 2009.
[41] T. Quan, X. Liu, and Q. Liu, "Weighted least squares support vector machine local region method for nonlinear time series prediction," Appl. Soft Comput., vol. 10, no. 2, pp. 562–566, Mar. 2010.
[42] J. C. Platt, “12 fast training of support vector machines using sequen- [66] N. Chouikhi, B. Ammar, N. Rokbani, and A. M. Alimi, “PSO-based
tial minimal optimization,” in Proc. Adv. Kernel Methods, 1999, analysis of echo state network parameters for time series forecasting,”
pp. 185–208. Appl. Soft Comput., vol. 55, pp. 211–225, Jun. 2017.
[43] J. A. K. Suykens and J. Vandewalle, “Least squares support vector [67] T. N. Wiesel and D. H. Hubel, “Receptive fields of single neurones in
machine classifiers,” Neural Process. Lett., vol. 9, no. 3, pp. 293–300, the cat’s striate cortex,” J. Physiol., vol. 148, no. 3, pp. 574–591, 1959.
Jun. 1999. [68] Y. LeCun and Y. Bengio, “Convolutional networks for images, speech,
[44] X. Wang, J. Wu, C. Liu, S. Wang, and W. Niu, “A hybrid model based and time series,” in The Handbook of Brain Theory and Neural
on singular Spectrum analysis and support vector machines regression Networks, vol. 3361, no. 10. Cambridge, MA, USA: MIT Press, 1995.
for failure time series prediction,” Qual. Rel. Eng. Int., vol. 32, no. 8,
[69] C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE
pp. 2717–2738, 2016.
Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 1–9.
[45] Y. Zhou, F. J. Chang, and L. C. Chang, “Multi-output support vector
machine for regional multi-step-ahead PM2.5 forecasting,” Sci. Total [70] H. Lee, R. Grosse, R. Ranganath, and A. N. Ng, “Convolutional deep
Environ., vol. 651, pp. 230–240, Feb. 2019. belief networks for scalable unsupervised learning of hierarchical
[46] A. Mellit and A. M. Pavan, “A 24-h forecast of solar irradiance using representations,” in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009,
artificial neural network: Application for performance prediction of a pp. 609–616.
grid-connected PV plant at Trieste, Italy,” Sol. Energy, vol. 84, no. 5, [71] S. Hershey et al., “CNN architectures for large-scale audio
pp. 807–821, 2010. classification,” in Proc. IEEE Int. Conf. Acoust., Speech Signal
[47] G. Lachtermacher and J. D. Fuller, “Back propagation in time-series Process. (ICASSP), Mar. 2017, pp. 131–135.
forecasting,” J. Forecast., vol. 14, no. 4, pp. 381–393, 1995. [72] A. Oord, S. Dieleman, and H. Zen, “WaveNet: A generative
[48] L. Wang, Y. Zeng, and T. Chen, “Back propagation neural network with model for raw audio,” 2016, arXiv:1609.03499. [Online]. Available:
adaptive differential evolution algorithm for time series forecasting,” https://arxiv.org/abs/1609.03499
Expert Syst. Appl., vol. 42, no. 2, pp. 855–863, 2015. [73] Y. Zheng, Q. Liu, E. Chen, Y. Ge, and J. L. Zhao, “Time series
[49] D. S. K. Karunasinghe and S. Y. Liong, “Chaotic time series prediction classification using multi-channels deep convolutional neural
with a global model: Artificial neural network,” J. Hydrol., vol. 323, networks,” in Proc. Int. Conf. Web-Age Inf. Manage. Cham,
nos. 1–4, pp. 92–105, 2006. Switzerland: Springer, 2014, pp. 298–310.
[50] N. K. Kasabov and Q. Song, “DENFIS: Dynamic evolving neural-fuzzy [74] A. Borovykh, S. Bohte, and C. W. Oosterlee, “Conditional time series
inference system and its application for time-series prediction,” IEEE forecasting with convolutional neural networks,” 2017, arXiv:1703.
Trans. Fuzzy Syst., vol. 10, no. 2, pp. 144–154, Feb. 2002. 04691. [Online]. Available: https://arxiv.org/abs/1703.04691
[51] F. S. Wong, “Time series forecasting using backpropagation neural
[75] J. Yang, M. N. Nguyen, P. P. San, X. Li, and S. Krishnaswamy, “Deep
networks,” Neurocomputing, vol. 2, no. 4, pp. 147–159, 1991.
convolutional neural networks on multichannel time series for human
[52] K. Chakraborty, K. Mehrotra, C. K. Mohan, and S. Ranka, “Forecasting
activity recognition,” in Proc. IJCAI, vol. 15, 2015, pp. 3995–4001.
the behavior of multivariate time series using neural networks,” Neural
Netw., vol. 5, no. 6, pp. 961–970, 1992. [76] S. Hoermann, M. Bach, and K. Dietmayer, “Dynamic occupancy grid
[53] E. S. Chng, S. Chen, and B. Mulgrew, “Gradient radial basis function prediction for urban autonomous driving: A deep learning approach
networks for nonlinear and nonstationary time series prediction,” IEEE with fully automatic labeling,” in Proc. IEEE Int. Conf. Robot. Autom.
Trans. Neural Netw., vol. 7, no. 1, pp. 190–194, Jan. 1996. (ICRA), May 2018, pp. 2056–2063.
[54] Z. Ramedani, M. Omid, A. Keyhani, S. Shamshirband, and B. Khosh- [77] X. Ding, Y. Zhang, T. Liu, and J. Duan, “Deep learning for event-driven
nevisan, “Potential of radial basis function based support vector regres- stock prediction,” in Proc. IJCAI, 2015, pp. 2327–2333.
[78] J. Zhang, Y. Zheng, and D. Qi, “Deep spatio-temporal residual
sion for global solar radiation prediction,” Renew. Sustain. Energy Rev.,
networks for citywide crowd flows prediction,” in Proc. AAAI, 2017,
vol. 39, pp. 1005–1011, Nov. 2014.
pp. 1655–1661.
[55] H. Leung, T. Lo, and S. Wang, “Prediction of noisy chaotic time series [79] H. Z. Wang, G.-Q. Li, G.-B. Wang, J.-C. Peng, H. Jiang, and Y.-T.
using an optimal radial basis function neural network,” IEEE Trans. Liu, “Deep learning based ensemble approach for probabilistic wind
Neural Netw., vol. 12, no. 5, pp. 1163–1172, Sep. 2001. power forecasting,” Appl. Energy, vol. 188, pp. 56–70, Feb. 2017.
[56] G. Sideratos and N. D. Hatziargyriou, “Probabilistic wind power [80] J. You, X. Li, M. Low, D. Lobell, and S. Ermon, “Deep Gaussian
forecasting using radial basis function neural networks,” IEEE Trans. process for crop yield prediction based on remote sensing data,” in
Power Syst., vol. 27, no. 4, pp. 1788–1796, Nov. 2012. Proc. 31st AAAI Conf. Artif. Intell., 2017, pp. 1–7.
[57] C. Harpham and C. W. Dawson, “The effect of different basis functions [81] I. Wallach, M. Dzamba, and A. Heifets, “AtomNet: A deep
on a radial basis function network for time series prediction: A com- convolutional neural network for bioactivity prediction in structure-
parative study,” Neurocomputing, vol. 69, nos. 16–18, pp. 2161–2170, based drug discovery,” 2015, arXiv:1510.02855. [Online]. Available:
2006. https://arxiv.org/abs/1510.02855
[58] R. Chandra, “Competition and collaboration in cooperative coevolution [82] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, “Learning traffic as
of Elman recurrent neural networks for time-series prediction,” IEEE images: A deep convolutional neural network for large-scale transporta-
Trans. Neural Netw. Learn. Syst., vol. 26, no. 12, pp. 3123–3136, tion network speed prediction,” Sensors, vol. 17, no. 4, p. 818, 2017.
Dec. 2015. [83] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
[59] R. Chandra and M. Zhang, “Cooperative coevolution of Elman recur- with deep convolutional neural networks,” in Proc. Adv. Neural Inf.
rent neural networks for chaotic time series prediction,” Neurocomput- Process. Syst. (NIPS), 2012, pp. 1097–1105.
ing, vol. 86, pp. 116–123, Jun. 2012. [84] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to
[60] S. Anbazhagan and N. Kumarappan, “Day-ahead deregulated electricity forget: Continual prediction with LSTM,” Istituto Dalle Molle
market price forecasting using recurrent neural network,” IEEE Syst. di Studi sull’Intelligenza Artificiale, Manno, Switzerland, Tech.
J., vol. 7, no. 4, pp. 866–872, Dec. 2013. Rep. IDSIA-01-99, 1999.
[61] E. Egrioglu, U. Yolcu, C. H. Aladag, and E. Bas, “Recurrent mul- [85] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
tiplicative neuron model artificial neural network for non-linear time Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[86] A. Graves, N. Jaitly, and A.-R. Mohamed, “Hybrid speech recognition
series forecasting,” Neural Process. Lett., vol. 41, no. 2, pp. 249–258,
with deep bidirectional LSTM,” in Proc. IEEE Workshop Autom.
2015.
Speech Recognit. Understand. (ASRU), Dec. 2013, pp. 273–278.
[62] E. Guresen, G. Kayakutlu, and T. U. Daim, “Using artificial neural [87] K. Greff, R. K. Srivastava, J. Koutnìk, B. R. Steunebrink, and
network models in stock market index prediction,” Expert Syst. Appl., J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Trans.
vol. 38, no. 8, pp. 10389–10397, 2011. Neural Netw. Learn. Syst., vol. 28, no. 10, pp. 2222–2232, Oct. 2017.
[63] S.-X. Lun, X.-S. Yao, H.-Y. Qi, and H.-F. Hu, “A novel model [88] Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, “Learning to diag-
of leaky integrator echo state network for time-series prediction,” nose with LSTM recurrent neural networks,” 2015, arXiv:1511.03677.
Neurocomputing, vol. 159, pp. 58–66, Jul. 2015. [Online]. Available: https://arxiv.org/abs/1511.03677
[64] L. Wang, Z. Wang, and S. Liu, “An effective multivariate time series [89] R. Fu, Z. Zhang, and L. Li, “Using LSTM and GRU neural network
classification approach using echo state network and adaptive differ- methods for traffic flow prediction,” in Proc. Youth Acad. Annu. Conf.
ential evolution algorithm,” Expert Syst. Appl., vol. 43, pp. 237–249, Chin. Assoc. Automat. (YAC), Nov. 2016, pp. 324–328.
Jan. 2016. [90] P. Filonov, A. Lavrentyev, and A. Vorontsov, “Multivariate industrial
[65] D. Li, M. Han, and J. Wang, “Chaotic time series prediction based on time series with cyber-attack simulation: Fault detection using
a novel robust echo state network,” IEEE Trans. Neural Netw. Learn. an LSTM-based predictive data model,” 2016, arXiv:1612.06676.
Syst., vol. 23, no. 5, pp. 787–799, May 2012. [Online]. Available: https://arxiv.org/abs/1612.06676
[91] K. Ma, H. Leung, E. Jalilian, and D. Huang, “Fiber-optic acoustic- [116] Q. Yang, Y. Zhou, Y. Yu, J. Yuan, X. Xing, and S. Du, “Multi-step-
based disturbance prediction in pipelines using deep learning,” IEEE ahead host load prediction using autoencoder and echo state networks
Sensors Lett., vol. 1, no. 6, Dec. 2017, Art. no. 6001404. in cloud computing,” J. Supercomput., vol. 71, no. 8, pp. 3037–3053,
[92] Y. Bengio and F. Gingras, “Recurrent neural networks for missing 2015.
or asynchronous data," Adv. Neural Inf. Process. Syst., vol. 1996, [117] C. Li, Z. Ding, D. Zhao, J. Yi, and G. Zhang, "Building energy
pp. 395–401. consumption prediction: An extreme deep learning approach,”
[93] D. Neil, M. Pfeiffer, and S. C. Liu, “Phased LSTM: Accelerating Energies, vol. 10, no. 10, p. 1525, 2017.
recurrent network training for long or event-based sequences,” in Proc. [118] L. Deng and D. Yu, “Deep learning: Methods and applications,”
Adv. Neural Inf. Process. Syst., 2016, pp. 3882–3890. Found. Trends Signal Process., vol. 7, nos. 3–4, pp. 197–387, 2014.
[94] F. A. Gers and J. Schmidhuber, “Recurrent nets that time and count,” [119] C. Sun, M. Ma, Z. B. Zhao, and X. Chen, “Sparse deep stacking
in Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Netw. (IJCNN), network for fault diagnosis of motor,” IEEE Trans. Ind. Informat.,
vol. 3, Jul. 2000, pp. 189–194. vol. 14, no. 7, pp. 3261–3270, Jul. 2018.
[95] R. Jozefowicz, W. Zaremba, and I. Sutskever, “An empirical exploration [120] H. Palangi, R. Ward, and L. Deng, “Convolutional deep stacking
of recurrent network architectures,” in Proc. Int. Conf. Mach. Learn., networks for distributed compressive sensing,” Signal Process.,
2015, pp. 2342–2350. vol. 131, pp. 181–189, Feb. 2017.
[96] K. Cho et al., “Learning phrase representations using RNN encoder- [121] J. Li, H. Chang, and J. Yang, “Sparse deep stacking network for image
decoder for statistical machine translation,” 2014, arXiv:1406.1078. classification,” in Proc. 29th AAAI Conf. Artif. Intell., 2015, pp. 1–7.
[Online]. Available: https://arxiv.org/abs/1406.1078 [122] Z. Q. Wang and D. L. Wang, “Recurrent deep stacking networks
[97] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. Cottrell, “A for supervised speech separation,” in Proc. IEEE Int. Conf. Acoust.,
dual-stage attention-based recurrent neural network for time series Speech Signal Process. (ICASSP), Mar. 2017, pp. 71–75.
prediction,” 2017, arXiv:1704.02971. [Online]. Available: https://arxiv. [123] L. Deng, B. Hutchinson, and D. Yu, “Parallel training for deep
org/abs/1704.02971 stacking networks,” in Proc. 13th Annu. Conf. Int. Speech Commun.
[98] Y. Liang, S. Ke, J. Zhang, X. Yi, and Y. Zheng, “GeoMAN: Multi-level Assoc., 2012, pp. 1–4.
attention networks for geo-sensory time series prediction,” in Proc. [124] B. Hutchinson, L. Deng, and D. Yu, “Tensor deep stacking networks,”
IJCAI, 2018, pp. 3428–3434. IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1944–1957,
[99] K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio, “On Dec. 2013.
the properties of neural machine translation: Encoder-decoder [125] P. S. Huang, L. Deng, M. Hasegawa-Johnson, and X. He, “Random
approaches,” 2014, arXiv:1409.1259. [Online]. Available: https://arxiv. features for kernel deep convex network,” in Proc. IEEE Int. Conf.
org/abs/1409.1259 Acoust., Speech Signal Process., May 2013, pp. 3143–3147.
[100] A. Graves, S. Fernández, and J. Schmidhuber, “Bidirectional LSTM [126] W. Cohen and R. V. de Carvalho, “Stacked sequential learning,” in
networks for improved phoneme classification and recognition,” in Proc. Int. Joint Conf. Artif. Intell. (IJCAI), 2005, pp. 671–676.
Proc. Int. Conf. Artif. Neural Netw. Berlin, Germany: Springer, 2005, [127] M. Das and S. K. Ghosh, “Deep-STEP: A deep learning approach
pp. 799–804. for spatiotemporal prediction of remote sensing data,” IEEE Geosci.
[101] A. Zhao, L. Qi, and J. Li, “LSTM for diagnosis of neurodegenerative Remote Sens. Lett., vol. 13, no. 12, pp. 1984–1988, Dec. 2016.
diseases using gait data,” in Proc. 9th Int. Conf. Graph. Image Process. [128] M. Das and S. K. Ghosh, “A deep-learning-based forecasting ensemble
(ICGIP), vol. 10615, 2018, Art. no. 106155B. to predict missing data for remote sensing analysis,” IEEE J. Sel. Topics
[102] D. Zhang, G. Lindholm, and H. Ratnaweera, “Use long short-term Appl. Earth Observ. Remote Sens., vol. 10, no. 12, pp. 5228–5236,
memory to enhance Internet of Things for combined sewer overflow Dec. 2017.
monitoring,” J. Hydrol., vol. 556, pp. 409–418, Jan. 2018. [129] P. Romeu, F. Zamora-Martínez, and P. Botella–Rocamora, “Stacked
[103] J. Cowton, I. Kyriazakis, T. Plötz, and J. Bacardit, “A combined deep denoising auto-encoders for short-term time series forecasting,”
learning gru-autoencoder for the early detection of respiratory disease in Artificial Neural Networks. Cham, Switzerland: Springer, 2015,
in pigs using multiple environmental sensors,” Sensors, vol. 18, no. 8, pp. 463–486.
p. 2521, 2018. [130] Y. Zhao, J. Li, and L. Yu, “A deep learning ensemble approach for crude
[104] A. X. Wang, C. Tran, N. Desai, D. Lobell, and S. Ermon, “Deep oil price forecasting,” Energy Econ., vol. 66, pp. 9–16, Aug. 2017.
transfer learning for crop yield prediction with remote sensing data,” [131] G. Song and Q. Dai, “A novel double deep ELMs ensemble system
in Proc. 1st ACM SIGCAS Conf. Comput. Sustain. Soc., 2018, p. 50. for time series forecasting,” Knowl.-Based Syst., vol. 134, pp. 31–49,
[105] X. Jia, A. Khandelwal, and G. Nayak, “Predict land covers with Oct. 2017.
transition modeling and incremental learning,” in Proc. SIAM Int. [132] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for
Conf. Data Mining, 2017, pp. 171–179. Machine Learning. Cambridge, MA, USA: MIT Press, 2006.
[106] X. Jia et al., “Incremental dual-memory LSTM in land cover [133] J. Hu and J. Wang, “Short-term wind speed prediction using empirical
prediction,” in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery wavelet transform and Gaussian process regression,” Energy, vol. 93,
Data Mining, 2017, pp. 867–876. pp. 1456–1466, Dec. 2015.
[107] Q. Zhang, L. T. Yang, Z. Chen, and P. Li, “A survey on deep learning [134] R. Palm, “Multiple-step-ahead prediction in control systems with
for big data,” Inf. Fusion, vol. 42, pp. 146–157, Jul. 2018. Gaussian process models and TS-fuzzy models,” Eng. Appl. Artif.
[108] J. Zheng and L. Peng, “An autoencoder-based image reconstruction for Intell., vol. 20, no. 8, pp. 1023–1035, 2007.
electrical capacitance tomography,” IEEE Sensors J., vol. 18, no. 13, [135] W. Yan, H. Qiu, and Y. Xue, “Gaussian process for long-term time-
pp. 5464–5474, Jul. 2018. series forecasting,” in Proc. IEEE Int. Joint Conf. Neural Netw.,
[109] J. Snoek, R. P. Adams, and H. Larochelle, “Nonparametric guidance Jun. 2009, pp. 3420–3427.
of autoencoder representations using label information,” J. Mach. [136] D. Lee and R. Baldick, “Short-term wind power ensemble prediction
Learn. Res., vol. 13, pp. 2567–2588, Sep. 2012. based on Gaussian processes and neural networks,” IEEE Trans. Smart
[110] A. P. S. Chandar et al., “An autoencoder approach to learning bilingual Grid, vol. 5, no. 1, pp. 501–510, Jan. 2014.
word representations,” in Proc. Adv. Neural Inf. Process. Syst., 2014, [137] S. Brahim-Belhouari and A. Bermak, “Gaussian process for
pp. 1853–1861. nonstationary time series prediction,” Comput. Statist. Data Anal.,
[111] Z. Shao, L. Zhang, and L. Wang, “Stacked sparse autoencoder modeling vol. 47, no. 4, pp. 705–712, 2004.
using the synergy of airborne LiDAR and satellite optical and SAR data [138] A. Girard, C. E. Rasmussen, J. Q. Candela, and R. Murray-Smith,
to map forest above-ground biomass,” IEEE J. Sel. Topics Appl. Earth “Gaussian process priors with uncertain inputs application to multiple-
Observ. Remote Sens., vol. 10, no. 12, pp. 5569–5582, Dec. 2017. step ahead time series forecasting,” in Proc. Adv. Neural Inf. Process.
[112] J. Geng, H. Wang, J. Fan, and X. Ma, “Deep supervised and contractive Syst., 2003, pp. 545–552.
neural network for SAR image classification,” IEEE Trans. Geosci. [139] A. Damianou and N. D. Lawrence, “Semi-described and semi-
Remote Sens., vol. 55, no. 4, pp. 2442–2459, Apr. 2017. supervised learning with Gaussian processes,” 2015, arXiv:1509.01168.
[113] X. Zhang, G. Chen, W. Wang, Q. Wang, and F. Dai, “Object-based land- [Online]. Available: https://arxiv.org/abs/1509.01168
cover supervised classification for very-high-resolution UAV images [140] M. Lázaro-Gredilla and M. K. Titsias, “Variational heteroscedastic
using stacked denoising autoencoders,” IEEE J. Sel. Topics Appl. Gaussian process regression,” in Proc. Int. Conf. Mach. Learn., 2011,
Earth Observ. Remote Sens., vol. 10, no. 7, pp. 3373–3385, Jul. 2017. pp. 841–848.
[114] P. Zhang et al., “Multimodal fusion for sensor data using stacked [141] P. Kou, D. Liang, L. Gao, and J. Lou, “Probabilistic electricity price
autoencoders,” in Proc. IEEE 10th Int. Conf. Intell. Sensors, Sensor forecasting with variational heteroscedastic Gaussian process and active
Netw. Inf. Process. (ISSNIP), Apr. 2015, pp. 1–2. learning,” Energy Convers. Manage., vol. 89, pp. 298–308, Jan. 2015.
[115] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic flow [142] J. Han, X.-P. Zhang, and F. Wang, “Gaussian process regression
prediction with big data: A deep learning approach,” IEEE Trans. stochastic volatility model for financial time series,” IEEE J. Sel.
Intell. Transp. Syst., vol. 16, no. 2, pp. 865–873, Apr. 2015. Topics Signal Process., vol. 10, no. 6, pp. 1015–1028, Sep. 2016.
[143] D. Heckerman, “A tutorial on learning with Bayesian networks,” in [167] S. Dasgupta and T. Osogami, “Nonlinear dynamic Boltzmann machines
Learning in Graphical Models. Dordrecht, The Netherlands: Springer, for time-series prediction,” in Proc. AAAI, 2017, pp. 1833–1839.
1998, pp. 301–354. [168] N. Le Roux and Y. Bengio, “Representational power of restricted
[144] K. P. Murphy and S. Russell, “Dynamic Bayesian networks: Boltzmann machines and deep belief networks,” Neural Comput.,
Representation, inference and learning,” Ph.D. dissertation, Dept. vol. 20, no. 6, pp. 1631–1649, Jun. 2008.
Comput. Sci., Graduate Division, Univ. California, Berkeley, Berkeley, [169] T. Kuremoto, M. Obayashi, K. Kobayashi, T. Hirata, and S. Mabu,
CA, USA, 2002. “Forecast chaotic time series data by DBNs,” in Proc. 7th Int. Congr.
[145] Q. Xiao, C. Chaoqin, and Z. Li, “Time series prediction using dynamic Image Signal Process. (CISP), Oct. 2014, pp. 1130–1135.
Bayesian network,” Optik, vol. 135, pp. 98–103, Apr. 2017. [170] T. Kuremoto, S. Kimura, K. Kobayashi, and M. Obayashi, “Time series
[146] L. Chen, Y. Liu, J. Zhao, W. Wang, and Q. Liu, “Prediction intervals forecasting using a deep belief network with restricted Boltzmann
for industrial data with incomplete input using kernel-based dynamic machines,” Neurocomputing, vol. 137, pp. 47–56, Aug. 2014.
Bayesian networks,” Artif. Intell. Rev., vol. 46, no. 3, pp. 307–326, [171] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for
2016. traffic flow prediction: Deep belief networks with multitask learning,”
[147] M. Naili, M. Bourahla, M. Naili, and A. K. Tari, “Stability-based IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 2191–2201,
dynamic Bayesian network method for dynamic data mining,” Eng. Oct. 2014.
Appl. Artif. Intell., vol. 77, pp. 283–310, Jan. 2019. [172] H. Z. Wang, G. B. Wang, G. Q. Li, J. C. Peng, and Y. T. Liu,
[148] M. Zou and S. D. Conzen, “A new dynamic Bayesian network “Deep belief network based deterministic and probabilistic wind speed
(DBN) approach for identifying gene regulatory networks from time forecasting approach,” Appl. Energy, vol. 182, pp. 80–93, Nov. 2016.
course microarray data,” Bioinformatics, vol. 21, no. 1, pp. 71–79, [173] A. Dedinec, S. Filiposka, A. Dedinec, and L. Kocarev, “Deep belief
Aug. 2004. network based electricity load forecasting: An analysis of macedonian
[149] K. Kourou, C. Papaloukas, and D. I. Fotiadis, “Integration of pathway case,” Energy, vol. 115, pp. 1688–1700, Nov. 2016.
knowledge and dynamic Bayesian networks for the prediction of oral [174] J. Chen, Q. Jin, and J. Chao, “Design of deep belief networks for
cancer recurrence,” IEEE J. Biomed. Health Inform., vol. 21, no. 2, short-term prediction of drought index using data in the Huaihe river
pp. 320–327, Dec. 2017. basin,” Math. Problems Eng., vol. 2012, Feb. 2012, Art. no. 235929.
[150] M. Das and S. K. Ghosh, “Spatio-temporal prediction of meteorological [175] P. Jiang, C. Chen, and X. Liu, “Time series prediction for evolutions
time series data: An approach based on spatial Bayesian network of complex systems: A deep learning approach,” in Proc. IEEE Int.
(SpaBN),” in Proc. Int. Conf. Pattern Recognit. Mach. Intell. Cham, Conf. Control Robot. Eng. (ICCRE), Apr. 2016, pp. 1–6.
Switzerland: Springer, 2017, pp. 615–622. [176] R. Soua, A. Koesdwiady, and F. Karray, “Big-data-generated traffic
[151] H. Guo, X. Liu, and Z. Sun, “Multivariate time series prediction using flow prediction using deep learning and dempster-shafer theory,”
a hybridization of VARMA models and Bayesian networks,” J. Appl. in Proc. IEEE Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2016,
Statist., vol. 43, no. 16, pp. 2897–2909, 2016. pp. 3195–3202.
[152] N. Lethanh, K. Kaito, and K. Kobayashi, “Infrastructure deterioration [177] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, “Empirical
prediction with a Poisson hidden Markov model on time series data,” mode decomposition based ensemble deep learning for load demand
J. Infrastruct. Syst., vol. 21, no. 3, 2014, Art. no. 04014051. time series forecasting,” Appl. Soft Comput., vol. 54, pp. 246–255,
[153] S. Bhardwaj, S. Srivastava, S. Vaishnavi, and J. R. P. Gupta, “Chaotic May 2017.
time series prediction using combination of hidden Markov model and [178] X. Qiu, L. Zhang, Y. Ren, P. N. Suganthan, and G. Amaratunga,
neural nets,” in Proc. IEEE Int. Conf. Comput. Inf. Syst. Ind. Manage. “Ensemble deep learning for regression and time series forecasting,”
Appl. (CISIM), Oct. 2010, pp. 585–589. in Proc. IEEE Symp. Comput. Intell. Ensemble Learn. (CIEL),
[154] A. M. Mikaeil, B. Guo, X. Bai, and Z. Wang, “Primary user channel Dec. 2014, pp. 1–6.
state prediction based on time series and hidden Markov model,” [179] T. Hirata, T. Kuremoto, M. Obayashi, S. Mabu, and K. Kobayashi,
in Proc. 2nd IEEE Int. Conf. Syst. Inform. (ICSAI), Nov. 2014, “Time series prediction using DBN and ARIMA,” in Proc. Int. Conf.
pp. 866–870. Comput. Appl. Technol. (CCATS), Apr. 2015, pp. 24–29.
[155] K. Wakabayashi and T. Miura, “Data stream prediction using incre- [180] I. Goodfellow et al., “Generative adversarial nets,” in Proc. Adv.
mental hidden Markov models,” in Proc. Int. Conf. Data Warehousing Neural Inf. Process. Syst., 2014, pp. 2672–2680.
Knowl. Discovery. Berlin, Germany: Springer, 2009, pp. 63–74. [181] I. Goodfellow, “NIPS 2016 tutorial: Generative adversarial
[156] H. Guo, W. Pedrycz, and X. Liu, “Hidden Markov models based networks,” 2016, arXiv:1701.00160. [Online]. Available: https://arxiv.
approaches to long-term prediction for granular time series,” IEEE org/abs/1701.00160
Trans. Fuzzy Syst., vol. 26, no. 5, pp. 2807–2817, Oct. 2018. [182] M. Mirza and S. Osindero, “Conditional generative adversarial nets,”
[157] R. Hassan, B. Nath, and M. Kirley, “HMM based fuzzy model for time 2014, arXiv:1411.1784. [Online]. Available: https://arxiv.org/abs/
series prediction,” in Proc. IEEE Int. Conf. Fuzzy Syst., Jan. 2006, 1411.1784
pp. 2120–2126. [183] X. Chen, Y. Duan, and R. Houthooft, “InfoGAN: Interpretable repre-
[158] M. Dong, D. Yang, Y. Kuang, D. He, S. Erdal, and D. Kenski, “PM2.5 sentation learning by information maximizing generative adversarial
concentration prediction using hidden semi-Markov model-based times nets,” in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 2172–2180.
series data mining,” Expert Syst. Appl., vol. 36, no. 5, pp. 9046–9055, [184] L. Yu, W. Zhang, J. Wang, J. Schulman, I. Sutskever, and P. Abbeel,
2009. “Seqgan: Sequence generative adversarial nets with policy gradient,”
[159] G. E. Hinton and T. J. Sejnowski, “Learning and relearning in in Proc. 21st AAAI Conf. Artif. Intell., 2017, pp. 2172–2180.
Boltzmann machines,” in Parallel Distributed Processing–Explorations [185] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adver-
in the Microstructure of Cognition, vol. 1. Cambridge, MA, USA: sarial networks,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 214–223.
MIT Press, 1986, pp. 282–317. [186] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley,
[160] N. Lu, T. Li, X. Ren, and H. Miao, “A deep learning scheme for “Least squares generative adversarial networks,” in Proc. IEEE Int.
motor imagery classification based on restricted Boltzmann machines,” Conf. Comput. Vis., Oct. 2017, pp. 2794–2802.
IEEE Trans. Neural Syst. Rehabil. Eng., vol. 25, no. 6, pp. 566–576, [187] X. Yi, E. Walia, and P. Babyn, “Generative adversarial network
Jun. 2017. in medical imaging: A review,” 2018, arXiv:1809.07294. [Online].
[161] I. Sutskever, G. E. Hinton, and G. W. Taylor, “The recurrent temporal Available: https://arxiv.org/abs/1809.07294
restricted Boltzmann machine,” in Proc. Adv. Neural Inf. Process. [188] X. Zhou, Z. Pan, G. Hu, S. Tang, and C. Zhao, “Stock market
Syst., 2009, pp. 1601–1608. prediction on high-frequency data using generative adversarial nets,”
[162] V. Mnih, H. Larochelle, and G. E. Hinton, “Conditional restricted Math. Problems Eng., vol. 2018, 2018, Art. no. 4907423.
Boltzmann machines for structured output prediction,” 2012, [189] C. Tian, C. Li, G. Zhang, and Y. Lv, “Data driven parallel prediction
arXiv:1202.3748. [Online]. Available: https://arxiv.org/abs/1202.3748 of building energy consumption using generative adversarial nets,”
[163] K. Sohn, G. Zhou, C. Lee, and H. Lee, “Learning and selecting Energy Buildings, vol. 186, pp. 230–243, Mar. 2019.
features jointly with point-wise gated Boltzmann machines,” in Proc. [190] Y. Lv, Y. Chen, L. Li, and F. Wang, “Generative adversarial networks
Int. Conf. Mach. Learn., 2013, pp. 217–225. for parallel transportation systems,” IEEE Intell. Transp. Syst. Mag.,
[164] T. Kuremoto, S. Kimura, and K. Kobayashi, “Time series forecasting vol. 10, no. 3, pp. 4–10, 2018.
using restricted Boltzmann machine,” in Proc. Int. Conf. Intell. [191] Y. Liang, Z. Cui, Y. Tian, H. Chen, and Y. Wang, “A deep
Comput. Berlin, Germany: Springer, 2012, pp. 17–22. generative adversarial architecture for network-wide spatial-temporal
[165] T. Osogami, “Boltzmann machines for time-series,” 2017, arXiv:1708. traffic-state estimation,” Transp. Res. Rec., vol. 2672, pp. 87–105,
06004. [Online]. Available: https://arxiv.org/abs/1708.06004 Jan. 2018.
[166] C.-Y. Zhang, C. L. P. Chen, M. Gan, and L. Chen, “Predictive deep [192] C. Bowles et al., “GAN augmentation: Augmenting training data using
Boltzmann machine for multiperiod wind speed forecasting,” IEEE generative adversarial networks,” 2018, arXiv:1810.10863. [Online].
Trans. Sustain. Energy, vol. 6, no. 4, pp. 1416–1425, Oct. 2015. Available: https://arxiv.org/abs/1810.10863
[193] Y. K. Bang and C. H. Lee, "Fuzzy time series prediction using hierarchical clustering algorithms," Expert Syst. Appl., vol. 38, no. 4, pp. 4312–4325, 2011.
[194] V. A. Gromov and E. A. Borisenko, "Predictive clustering on non-successive observations for multi-step ahead chaotic time series prediction," Neural Comput. Appl., vol. 26, no. 8, pp. 1827–1838, 2015.
[195] T. Wang, Z. Han, J. Zhao, and W. Wang, "Adaptive granulation-based prediction for energy system of steel industry," IEEE Trans. Cybern., vol. 48, no. 1, pp. 127–138, Nov. 2018.
[196] W. Lu, J. Yang, X. Liu, and W. Pedrycz, "The modeling and prediction of time series based on synergy of high-order fuzzy cognitive map and fuzzy c-means clustering," Knowl.-Based Syst., vol. 70, pp. 242–255, Nov. 2014.
[197] C. Smith and D. Wunsch, "Time series prediction via two-step clustering," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2015, pp. 1–4.
[198] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967.
[199] H. Zhang, A. C. Berg, M. Maire, and J. Malik, "SVM-KNN: Discriminative nearest neighbor classification for visual category recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 2, Jun. 2006, pp. 2126–2136.
[200] J. Myung, D.-K. Kim, S.-Y. Kho, and C.-H. Park, "Travel time prediction using k-nearest neighbor method with combined data from vehicle detector system and automatic toll collection system," Transp. Res. Rec., vol. 2256, no. 1, pp. 51–59, 2011.
[201] Z. Xing, J. Pei, and S. Y. Philip, "Early prediction on time series: A nearest neighbor approach," in Proc. IJCAI, 2009, pp. 1297–1302.
[202] L. Zhang, Q. Liu, W. Yang, N. Wei, and D. Dong, "An improved k-nearest neighbor model for short-term traffic flow prediction," Procedia-Social Behav. Sci., vol. 96, pp. 653–662, Nov. 2013.
[203] F. H. Al-Qahtani and S. F. Crone, "Multivariate k-nearest neighbour regression for time series data—A novel algorithm for forecasting UK electricity demand," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Aug. 2013, pp. 1–8.
[204] B. Yu, X. Song, F. Guan, Z. Yang, and B. Yao, "K-nearest neighbor model for multiple-time-step prediction of short-term traffic condition," J. Transp. Eng., vol. 142, no. 6, 2016, Art. no. 04016018.
[205] Y. Bai, Z. Chen, J. Xie, and C. Li, "Daily reservoir inflow forecasting using multiscale deep feature learning with hybrid models," J. Hydrol., vol. 532, pp. 193–206, Jan. 2016.
[206] H. Liu, X.-W. Mi, and Y.-F. Li, "Wind speed forecasting method based on deep learning strategy using empirical wavelet transform, long short term memory neural network and Elman neural network," Energy Convers. Manage., vol. 156, pp. 498–514, Jan. 2018.
[207] N. Kussul, A. Shelestov, M. Lavreniuk, I. Butko, and S. Skakun, "Deep learning approach for large scale land cover mapping based on remote sensing data fusion," in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2016, pp. 198–201.
[208] Y. Wu and H. Tan, "Short-term traffic flow forecasting with spatial-temporal correlation in a hybrid deep learning framework," 2016, arXiv:1612.01022. [Online]. Available: https://arxiv.org/abs/1612.01022
[209] A. Gensler, J. Henze, and B. Sick, "Deep learning for solar power forecasting—An approach using AutoEncoder and LSTM neural networks," in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Oct. 2016, pp. 2858–2865.
[210] A. Grover, A. Kapoor, and E. Horvitz, "A deep hybrid model for weather forecasting," in Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2015, pp. 379–386.
[211] I. Colak, S. Sagiroglu, and M. Yesilbudak, "Data mining and wind power prediction: A literature review," Renew. Energy, vol. 46, pp. 241–247, Oct. 2012.
[212] T. Chai and R. R. Draxler, "Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature," Geosci. Model Develop., vol. 7, no. 3, pp. 1247–1250, 2014.
[213] S. P. Singh, S. Urooj, and A. Lay-Ekuakille, "Breast cancer detection using PCPCET and ADEWNN: A geometric invariant approach to medical X-ray image sensors," IEEE Sensors J., vol. 16, no. 12, pp. 4847–4855, Jun. 2016.
[214] J. Zhao and X. Yu, "Adaptive natural gradient learning algorithms for Mackey–Glass chaotic time prediction," Neurocomputing, vol. 157, pp. 41–45, Jun. 2015.
[215] V. A. Gromov and A. N. Shulga, "Chaotic time series prediction with employment of ant colony optimization," Expert Syst. Appl., vol. 39, no. 9, pp. 8474–8478, 2012.
[216] T. Miller, "Explanation in artificial intelligence: Insights from the social sciences," Artif. Intell., vol. 267, pp. 1–38, Feb. 2018.
[217] A. Adadi and M. Berrada, "Peeking inside the black-box: A survey on explainable artificial intelligence (XAI)," IEEE Access, vol. 6, pp. 52138–52160, 2018.
[218] T. Miller, P. Howe, and L. Sonenberg, "Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences," in Proc. IJCAI Workshop Explainable AI (XAI), 2017, p. 36.
[219] O. Biran and C. Cotton, "Explanation and justification in machine learning: A survey," in Proc. IJCAI Workshop Explainable AI (XAI), 2017, p. 8.
[220] P. W. Koh and P. Liang, "Understanding black-box predictions via influence functions," 2017, arXiv:1703.04730. [Online]. Available: https://arxiv.org/abs/1703.04730
[221] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, "Continuous deep q-learning with model-based acceleration," in Proc. Int. Conf. Mach. Learn., 2016, pp. 2829–2838.
[222] C. Zhang, G. Sun, Z. Fang, P. Zhou, P. Pan, and J. Cong, "Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., to be published.
[223] T. Gokmen and Y. Vlasov, "Acceleration of deep neural network training with resistive cross-point devices: Design considerations," Frontiers Neurosci., vol. 10, p. 333, Jul. 2016.
[224] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, p. 529, 2015.
[225] H. Van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," in Proc. AAAI, vol. 2, 2016, p. 5.
[226] V. Mnih et al., "Playing Atari with deep reinforcement learning," 2013, arXiv:1312.5602. [Online]. Available: https://arxiv.org/abs/1312.5602
[227] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," 2017, arXiv:1703.03400. [Online]. Available: https://arxiv.org/abs/1703.03400
[228] A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, "Meta-learning with memory-augmented neural networks," in Proc. Int. Conf. Mach. Learn., 2016, pp. 1842–1850.
[229] A. Dosovitskiy and V. Koltun, "Learning to act by predicting the future," 2016, arXiv:1611.01779. [Online]. Available: https://arxiv.org/abs/1611.01779

Zhongyang Han (M'18) received the B.S. degree from the City Institute, Dalian University of Technology, Dalian, China, and the Ph.D. degree in engineering from the Dalian University of Technology, in 2010 and 2016, respectively. He is currently an Associate Professor with the School of Control Science and Engineering, Dalian University of Technology. His current research interests include computer-integrated manufacturing systems, artificial intelligence, data mining, and machine learning.

Jun Zhao (M'10) received the B.S. degree in control theory from Dalian Jiaotong University, Dalian, China, and the Ph.D. degree in engineering from the Dalian University of Technology, Dalian, in 2003 and 2008, respectively. He is currently a Professor with the School of Control Science and Engineering, Dalian University of Technology. His current research interests include industrial production scheduling, computer-integrated manufacturing, intelligent optimization, and machine learning.

Henry Leung (F'15) was with the Department of National Defence (DND) of Canada as a Defense Scientist. He is a Professor with the Department of Electrical and Computer Engineering, University of Calgary. His current research interests include information fusion, machine learning, IoT, nonlinear dynamics, robotics, and signal and image processing. He is a Fellow of SPIE. He is an Associate Editor of the IEEE Circuits and Systems Magazine and the IEEE Transactions on Aerospace and Electronic Systems. He is the Topic Editor on Robotic Sensors of the International Journal of Advanced Robotic Systems. He is an Editor of the Springer book series on Information Fusion and Data Science.
King Fai Ma received the B.S. (Hons.) degree in electrical engineering and the M.S. degree in electrical engineering from the University of Calgary in 2017, where he is currently pursuing the Ph.D. degree in electrical engineering with the Sensor Networks and Robotics Laboratory. His research interests include big data analytics, deep learning, applied signal processing, computer vision, and pipeline monitoring.

Wei Wang (SM'13) received the B.S., M.S., and Ph.D. degrees from Northeastern University, Shenyang, China, in 1982, 1986, and 1988, respectively, all in industrial automation. He was a Post-Doctoral Fellow with the Division of Engineering Cybernetics, Norwegian Science and Technology University, Trondheim, Norway, from 1990 to 1992, a Professor and Vice Director of the National Engineering Research Center of Metallurgical Automation of China, Beijing, China, from 1995 to 1999, and a Research Fellow with the Department of Engineering Science, University of Oxford, Oxford, U.K., from 1998 to 1999. He is currently a Professor with the School of Control Sciences and Engineering, Dalian University of Technology, Dalian, China. His current research interests include adaptive controls, computer-integrated manufacturing, and computer controls of industrial processes.