Forecasting of Nonlinear Time Series Using Artificial Neural Network

A. Tealab a, H. Hefny a, A. Badr b

a Computer Science Department, Institute of Statistical Studies and Research, Cairo University, Giza, Egypt
b Computer Science Department, Faculty of Computers and Information, Cairo University, Giza, Egypt

Future Computing and Informatics Journal 3 (2018) 143-151
Received 4 February 2017; revised 18 April 2017; accepted 18 June 2017
Available online 5 July 2017

Abstract
When forecasting time series, it is important to classify them according to their linearity behavior; linear time series remain at the forefront of academic and applied research. It has often been found that simple linear time series models usually leave certain aspects of economic and financial data unexplained. The dynamic behavior of most real-life time series, with its autoregressive and inherited moving average terms, poses the challenge of forecasting nonlinear time series that contain inherited moving average terms using computational intelligence methodologies such as neural networks. It is rare to find studies that concentrate on forecasting nonlinear time series that contain moving average terms. In this study, we demonstrate that common neural networks are not efficient at recognizing the behavior of nonlinear or dynamic time series that have moving average terms and hence have low forecasting capability. This leads to the importance of formulating new neural network models, such as Deep Learning neural networks, with or without hybrid methodologies such as Fuzzy Logic.

© 2017 Faculty of Computers and Information Technology, Future University in Egypt. Production and hosting by Elsevier B.V. All rights reserved.
1. Introduction

Although time series have traditionally been described with linear models such as Autoregressive (AR), Moving Averages (MA), ARMA and AR Integrated MA (ARIMA) models [1,2], it has been found that in reality systems often have an unknown nonlinear structure [3]. To address this problem, several nonlinear models have been proposed, such as bilinear models, AR Conditional Heteroskedasticity (ARCH) and its extensions, Smooth Transition AR (STAR), Nonlinear AR (NAR), wavelet networks and Artificial Neural Networks (ANN) [1-7]. With regard to ANN, its theory is very broad, and it has been applied to modeling and forecasting data from different fields.

To address this case, some authors suggest using the neural network NARMA and the autoregressive neural network ARNN of high order [15,16]. However, in reviewing the relevant literature it was found that:

- The theory of the NARMA(p,q) model considers that the data generation process corresponds to a nonlinear structure with both AR and MA components; a nonlinear moving average (NLMA) model is obtained by ignoring the AR component (making p = 0). However, in the literature there are no studies that examine the forecasting capability of NARMA(0,q) when it is applied to a nonlinear time series that presents an inherent MA component.
- There is no evidence reported that a nonlinear MA model can be approximated by a nonlinear infinite-order AR model, as happens in the case of linear models when they meet certain invertibility conditions.

The objective of this paper is to answer the following research questions in order to clarify the above-mentioned gaps:

1. Can a nonlinear high-order AR model, represented by an ARNN network, be well approximated to a nonlinear reduced-order MA model?
2. When a NARMA model assumes that there is no AR process, can a nonlinear time series containing inherent MA components be predicted adequately?

These questions will be resolved on the basis of the invertibility of nonlinear MA models and the use of experimental data simulations. The importance and originality of this work lie in the fact that, to date, there is no evidence in the reviewed literature of studies that analyze and identify the problems that arise when modeling and forecasting time series with inherent MA components using neural networks.

The rest of this paper is organized as follows. Sections 2 and 3 present the nonlinear MA model and the NARMA and NAR neural networks, respectively. Section 4 gives the methodology used in this paper, and the results obtained to assess the capacity of these networks to predict nonlinear time series with an MA component are given in Section 5. Section 6 provides answers to the research questions raised. Finally, Section 7 concludes.
In turn, an artificial neural network is defined as a
2. Nonlinear moving average model
composition of nonlinear functions of the form:
In the nonlinear moving average model of order q, denoted
y ¼ g1 +g2 +…+gN f1 ðx; uÞ; f2 ðx; uÞ…; fp ðx; uÞ
as NLMA (q), the current value of the time series, yt, is a
nonlinear function known as h($) of the q past innovations where:
{ εt1,…, εtq} and the current innovation εt.
This is: y is the response variable or output of the artificial neural
DU
network.
yt ¼ εt þ h εt1 ; /; εtq ; q t ¼ 1; 2; 3; … ð1Þ
g1 for i ¼ 1,….,N are nonlinear functions.
where q represents the parameters vector of function ℎ($) and fj ðx; uÞ for j ¼ 1,…,p are functions defined as in (1).
N represents the number of hidden layers in the network.
y{ εt } is a sequence of independent random variables which
p denotes the number of neurons in the hidden layers.
are identically distributed, centered at zero and with constant
The symbol + between functions indicates the operation
variance.
composition.
Depending on the form of the function ℎ($), the following
NLMA models have been proposed:
The neural networks, according to its architecture and
Polynomial moving averages proposed by Robinson [17]. interconnection between neurons, can be classified into two
Asymmetric moving averages proposed by Kurt et al. [18]. classes: feed-forward networks and feed-back (recurrent)
networks. The feed-forward network, also known as static,
Nonlinear response moving averages with long scope
constitutes a nonlinear function of their entries and is repre-
proposed by Robinson and Zaffaroni [19].
sented as a set of interconnected neurons, in which informa-
Nonlinear integrated moving average by Engle and Smith
tion flows only in the forward direction, from inputs to
[20].
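As a concrete illustration of the NLMA(q) process in Eq. (1), the sketch below simulates such a series in Python. The quadratic form chosen for h(·), the parameter values and the function names are purely hypothetical, since the paper does not fix a particular h(·) at this point; it is a minimal sketch, not the data-generating model used later in the experiments.

```python
import numpy as np

def simulate_nlma(n, q, theta, h, burn_in=100, seed=0):
    """Simulate y_t = eps_t + h(eps_{t-1}, ..., eps_{t-q}; theta), t = 1..n (Eq. (1))."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, 1.0, size=n + burn_in + q)   # iid innovations, zero mean
    y = []
    for t in range(q, q + burn_in + n):
        past = eps[t - q:t][::-1]                      # (eps_{t-1}, ..., eps_{t-q})
        y.append(eps[t] + h(past, theta))
    return np.asarray(y[burn_in:])                     # drop the burn-in segment

# Hypothetical nonlinear h(.): one linear term plus one cross-product of past innovations.
def h_quadratic(past_eps, theta):
    return theta[0] * past_eps[0] + theta[1] * past_eps[0] * past_eps[1]

series = simulate_nlma(n=360, q=2, theta=(0.6, 0.4), h=h_quadratic)
```

A series generated this way carries an inherent MA component and no AR component, which is exactly the situation targeted by the research questions above.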
3. Neural network models associated with moving average components

Mathematically, a neuron is a nonlinear, bounded and parameterized function of the form [23]:

o = f(x_1, x_2, …, x_n; w_1, w_2, …, w_p) = f(x, w)

where:

- x = (x_1, x_2, …, x_n) is the vector of input variables of the neuron;
- w = (w_1, w_2, …, w_p) is the weight (parameter) vector associated with the inputs of the neuron;
- f(·) is a nonlinear activation function.

In turn, an artificial neural network is defined as a composition of nonlinear functions of the form:

y = g_1 ∘ g_2 ∘ … ∘ g_N(f_1(x, w), f_2(x, w), …, f_p(x, w))

where:

- y is the response variable, or output, of the artificial neural network;
- g_i, for i = 1, …, N, are nonlinear functions;
- f_j(x, w), for j = 1, …, p, are neuron functions defined as above;
- N is the number of hidden layers in the network;
- p is the number of neurons in the hidden layers;
- the symbol ∘ between functions indicates the composition operation.

Neural networks, according to their architecture and the interconnection between neurons, can be classified into two classes: feed-forward networks and feed-back (recurrent) networks. The feed-forward network, also known as static, constitutes a nonlinear function of its inputs and is represented as a set of interconnected neurons in which information flows only in the forward direction, from inputs to outputs. Specifically, in [24] a feed-forward network model
with a single output neuron and q hidden neurons is defined as follows:

o_t = Φ( β_0 + Σ_{i=1}^{q} β_i Ψ( α_i + Σ_{j=1}^{n} ω_{ij} x_{j,t} ) ) =: f(x_t; θ)   (2)

where θ represents the parameter vector of the neural network, which is calculated by minimizing the sum of squared differences

Σ_{t=1}^{n} (y_t − ô_t)².
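The following sketch evaluates Eq. (2) for a batch of input vectors. The choice of tanh for Ψ and the identity for Φ, as well as the parameter values in the usage lines, are illustrative assumptions only; the paper does not fix the activation functions here, and the function name feedforward_eq2 is hypothetical.

```python
import numpy as np

def feedforward_eq2(X, beta0, beta, alpha, omega, psi=np.tanh, phi=lambda z: z):
    """o_t = Phi(beta0 + sum_i beta_i * Psi(alpha_i + sum_j omega_ij * x_{j,t}))  (Eq. (2)).

    X:     (T, n) matrix whose rows are the input vectors x_t
    beta0: output bias;  beta: (q,) output weights
    alpha: (q,) hidden biases;  omega: (q, n) hidden weights
    """
    hidden = psi(X @ omega.T + alpha)     # (T, q) hidden-neuron activations
    return phi(hidden @ beta + beta0)     # (T,) network outputs o_t

# Usage with arbitrary (hypothetical) parameters: n = 3 inputs, q = 2 hidden neurons.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
o = feedforward_eq2(X, beta0=0.1, beta=np.array([0.5, -0.3]),
                    alpha=np.zeros(2), omega=rng.normal(size=(2, 3)))
```

Fitting the network then amounts to choosing beta0, beta, alpha and omega so that the sum of squared differences above is minimized, for example with any gradient-based optimizer.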
It is noteworthy that this kind of neural network is the most studied and applied in the literature, mainly because such networks are universal function approximators [25-27]; moreover, in practice they are simpler networks in their implementation and simulation. Meanwhile, the feed-back network, also known as dynamic or recurrent, has an architecture characterized by cycles: the outputs of the neurons in a layer can be inputs to the same neuron or inputs to neurons of previous layers. For more information on this type of network, it is suggested to check [23] and [28]. Below, special cases of these two types of networks are described: the autoregressive neural network ARNN, which is of the feed-forward type, and the recurrent neural network NARMA.

A nonlinear ARMA process takes the form

y_t = h(y_{t-1}, …, y_{t-p}, ε_{t-1}, …, ε_{t-q}) + ε_t

where h(·) is a known nonlinear function and {ε_t} is defined as in (3). This model is called NARMA(p,q). Since the sequence ε_{t-1}, …, ε_{t-q} is not directly observable, it must be approximated under appropriate initial conditions [16]. By considering the approximation in (5) and (6), the recurrent neural network model NARMA(p,q) can be expressed using the recurrent network:

ŷ_t = α_0 + Σ_{j=1}^{h} α_j g( β_{0j} + Σ_{i=1}^{p} β_{ij} y_{t-i} + Σ_{i=p+1}^{p+q} β_{ij} ε̂_{t+p-i} )   (7)

where ε̂_{t+p-i} = y_{t+p-i} − ŷ_{t+p-i}.

From the mathematical formulation of model (7), it can be seen that an alternative to a nonlinear time series model with an inherent moving averages component is to use a NARMA(0,q) model. This observation will be discussed in the following section.
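A minimal sketch of how the recurrence in Eq. (7) can be rolled forward over an observed series is given below. Taking g = tanh, starting the residuals at zero, and the weight values in the usage lines are assumptions; parameter estimation is not shown. With p = 0 the same code gives the NARMA(0,q) case discussed above.

```python
import numpy as np

def narma_recurrent(y, p, q, alpha0, alpha, beta0, beta, g=np.tanh):
    """One-step-ahead fitted values of the recurrent network in Eq. (7).

    alpha0: scalar; alpha: (h,) output weights; beta0: (h,) hidden biases;
    beta:   (h, p+q) hidden weights, first p columns for lagged y,
            last q columns for the fed-back residuals eps_hat.
    """
    T = len(y)
    y_hat = np.full(T, np.nan)
    eps_hat = np.zeros(T)                                      # initial condition: residuals start at 0
    for t in range(max(p, 1), T):
        y_lags = y[t - p:t][::-1] if p > 0 else np.empty(0)    # y_{t-1}, ..., y_{t-p}
        e_lags = eps_hat[max(t - q, 0):t][::-1]                # eps_hat_{t-1}, ..., eps_hat_{t-q}
        e_lags = np.pad(e_lags, (0, q - len(e_lags)))          # zero-pad the earliest steps
        z = np.concatenate([y_lags, e_lags])
        y_hat[t] = alpha0 + alpha @ g(beta0 + beta @ z)
        eps_hat[t] = y[t] - y_hat[t]                           # feed the new residual back
    return y_hat

# Hypothetical NARMA(0, 2): no AR part, two residual lags, three hidden nodes.
rng = np.random.default_rng(2)
y = rng.normal(size=50)
fitted = narma_recurrent(y, p=0, q=2, alpha0=0.0, alpha=rng.normal(size=3),
                         beta0=np.zeros(3), beta=rng.normal(size=(3, 2)))
```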
4. Used methodology

The evaluation of the forecasting ability of the NARMA(p,q) and ARNN(p) neural network models was performed using two sets of experimental data from the models described below (Model 1 and Model 2).
Fig. 1. Example of time series generated by Model 1.

The experiments focused on two aspects: (i) analyzing the ability to capture all of the nonlinear moving averages process using a recurrent neural network NARMA(0,q) or an ARNN(p) with large enough p (for which Model 1 was used), and (ii) comparing the results obtained with any of the networks considered in this work with those found in the literature for NLMA model processes. In this case Model 2 was used, and the results were compared with those of Burges and Refenes [15], obtained with an ARNN(p) network. The methodology used for each model has some distinctive aspects.

Model 1:

- Different sample sizes, n = {100, 200, 360}, and data percentages for network training (50, 65 and 80) are considered, to examine the effect of their selection on the predicted values.
- For the ARNN model, large lags of p = {10, 15, 25, 50, 100} were examined, for the purpose of answering the first research question.
- The network structure was chosen based on the results found by Zhang et al. [29], who show via simulation that the best network structure corresponds to a hidden layer with a maximum of two neurons. The objective function was the minimization of the mean square error (MSE).
- In the case of the NARMA model, in addition to the previous network structure, the following orders for the moving averages process were considered: q = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
- A set of 150 additional observations was generated and used as test data.

Model 2: the same experimental conditions employed by Burges and Refenes [15] were considered, in order to be able to compare the results:

- The size of the series was 400 observations, of which the initial 70% is used to train the network and the remaining 30% for validation.
- The objective function was the minimization of the normalized mean square error (NMSE).
- All networks used had one hidden layer with four neurons.
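Before turning to the evaluation procedure, the sketch below shows one way the Model 1 grid described above (sample sizes, training percentages and lag orders) could be organized in code. The series used here is a random stand-in, since the data-generating equations for Model 1 are not reproduced in this excerpt, and the helper names lagged_design and split_series are hypothetical.

```python
import numpy as np

def lagged_design(y, p):
    """Rows of X hold (y_{t-1}, ..., y_{t-p}); the target is y_t."""
    X = np.column_stack([y[p - j:len(y) - j] for j in range(1, p + 1)])
    return X, y[p:]

def split_series(y, train_frac):
    """Chronological split into training and validation segments."""
    cut = int(len(y) * train_frac)
    return y[:cut], y[cut:]

sample_sizes = [100, 200, 360]
train_fracs = [0.50, 0.65, 0.80]
lag_orders = [10, 15, 25, 50, 100]

rng = np.random.default_rng(4)
for n in sample_sizes:
    y = rng.normal(size=n)                 # stand-in for a simulated Model 1 series
    for frac in train_fracs:
        y_train, y_val = split_series(y, frac)
        for p in lag_orders:
            if p >= len(y_train) or p >= len(y_val):
                continue                   # lag exceeds the available data (the '*' cases below)
            X_train, t_train = lagged_design(y_train, p)
            # ... here an ARNN(p) with at most two hidden neurons would be fitted ...
```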
The networks were trained and evaluated following the procedure suggested by Zemouri et al. [30], namely:

1. Repeat for i = 1 to M = 1000 times, from different starting points:
   - Train the network using the training data.
   - Validate the trained network using the n.val validation data. Calculate the forecasting mean error E(i) and standard deviation std(i) on the validation set:

     E(i) = (1/n.val) Σ_{j=1}^{n.val} (y_j − ŷ_j)   (9)

     std(i) = (1/n.val) Σ_{j=1}^{n.val} (y_j − ŷ_j)²   (10)

2. Calculate the following measures to evaluate the forecasting performance of the network (see the sketch after this list):
   - M1 = Ē = (1/M) Σ_{i=1}^{M} E(i). It corresponds to an estimate of the average of the overall forecasting mean errors and evaluates the proximity between the predicted and actual values. If M1 = 0, then the probability that the forecasts are centered on the actual data is very high.
   - M2 = s̄td = (1/M) Σ_{i=1}^{M} std(i). It is used for measuring forecast accuracy (in terms of variability). The ideal value is M2 = 0, because it indicates that there is a significant probability that the predicted values are not scattered (i.e., they have low variability).
   - M3 = ( sqrt( (1/M) Σ_{i=1}^{M} [E(i) − Ē]² ) + sqrt( (1/M) Σ_{i=1}^{M} [std(i) − s̄td]² ) ) / 2. It is used to indicate whether the training process of the network is repeatable (in which case M3 = 0), so that the same structure of the neural network is always obtained in each run of the training process, regardless of the initial values.
   - M4 = 1 / (M1 + M2 + M3). It is used to examine the accuracy of the forecast. If the outputs of the network are very close to the actual values, then the measures M1, M2 and M3 are close to zero, and in that case M4 takes very large values, so M4 >> 0 is the ideal value for having confidence in the forecasts.
3. Perform the verification using the test data: select as the best candidate the network having the highest M4 value and the lowest M1, M2 and M3 values on the validation set. This will avoid over-fitting and under-fitting problems. Finally, […]
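A minimal sketch of the M1-M4 computation described above is given here, assuming that E(i) and std(i) have already been collected from M repeated training runs; the forecasts in the usage lines are synthetic stand-ins rather than outputs of a trained network, and the function names are hypothetical.

```python
import numpy as np

def validation_measures(y_true, y_pred):
    """E(i) and std(i) for one run, following Eqs. (9) and (10)."""
    err = y_true - y_pred
    return err.mean(), (err ** 2).mean()

def zemouri_measures(runs):
    """M1-M4 from a list of (E_i, std_i) pairs, one pair per training run i = 1..M."""
    E = np.array([e for e, _ in runs])
    S = np.array([s for _, s in runs])
    M1 = E.mean()                                  # closeness of forecasts to actual values
    M2 = S.mean()                                  # variability of the forecasts
    M3 = 0.5 * (np.sqrt(((E - M1) ** 2).mean()) +  # repeatability of the training process
                np.sqrt(((S - M2) ** 2).mean()))
    M4 = 1.0 / (M1 + M2 + M3)                      # large M4 signals a trustworthy network
    return M1, M2, M3, M4                          # assumes M1 + M2 + M3 > 0

# Usage with M = 3 synthetic runs standing in for trained networks.
rng = np.random.default_rng(3)
runs = []
for i in range(3):
    y_val = rng.normal(size=100)
    y_hat = y_val + rng.normal(scale=0.1, size=100)
    runs.append(validation_measures(y_val, y_hat))
print(zemouri_measures(runs))
```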
5. Results

The obtained results are presented below for each considered model.

A. Model 1

Figs. 2-4 show the values obtained for the measures M1-M4 on the validation set of the ARNN network for each sample size, under the different numbers of lags and training percentages considered. In turn, Fig. 5 contains the values of the performance measures E(i) and std(i) obtained on the validation set. Table 2 shows the results found for the ARNN network on the test data under the nine considered scenarios and the values of large lags p = {10, 15, 25, 50, 100}. The first column contains the sample size, the second the number of lags p, and the last three columns show the E(i) and std(i) values found on the test set for each training percentage. In this table, the * symbol indicates that the value of the lag p is greater than the size of the whole validation set, so the forecasting ability cannot be examined on that group of data.

From Figs. 2-5, it appears that, whatever the value of the […] values.

In addition, from Table 2 and Figs. 2-4 it follows that the number of lags selected in the final ARNN model depends on the size of the series and the percentage of data used for network training: for the network to be able to predict adequately, it is necessary to choose the maximum number of lags allowed and the largest training set, which leads one to expect that the use of ARNN networks to forecast series with an inherent MA component tends to suffer from over-parameterization problems. This is confirmed by examining the behavior of the MSE according to the number of lags and layers of the network. It was observed that, as the order of the nonlinear AR model increases, the MSE tends to decrease regardless of the number of nodes considered; however, the lowest MSE is obtained when considering the network with two nodes in the hidden layer (see Fig. 6).

Fig. 2. Performance measures for the ARNN model with n = 100, p = {10, 15, 25} and training percentage (50, 65, 80).
Fig. 3. Performance measures for the ARNN model with n = 200, p = {10, 15, 25, 50} and training percentage (50, 65, 80).
Fig. 4. Performance measures for the ARNN model with n = 360, p = {10, 15, 25, 50, 100} and training percentage (50, 65, 80).
Table 2. Performance measures for the ARNN model with the test data.

n    p    Measure  50       65       80
100  10   E(i)     2.334    0.1958   0.2934
          std(i)   1.5836   1.7273   1.7324
100  15   E(i)     0.7113   0.3311   0.3016
          std(i)   1.8923   2.2368   1.5623
100  25   E(i)     0.3702   0.316    *
          std(i)   1.5233   1.5139   *
200  10   E(i)     0.6903   0.339    0.2041
          std(i)   1.7291   1.5689   1.6601
200  15   E(i)     0.3939   0.3065   0.3379
          std(i)   1.6468   1.6451   1.7154
200  25   E(i)     0.3945   0.1167   0.129
          std(i)   1.5177   1.5716   1.5575
200  50   E(i)     0.5299   0.2362   *
          std(i)   1.756    1.4851   *
360  10   E(i)     0.3284   0.299    0.346
          std(i)   1.6458   1.7647   1.5909
360  15   E(i)     0.3678   0.371    0.2601
          std(i)   1.559    1.5777   1.5391
360  25   E(i)     0.1201   0.123    0.2232
          std(i)   1.5785   1.5136   1.6092
360  50   E(i)     0.1744   0.1713   0.1388
          std(i)   1.2746   1.3208   1.2823
360  100  E(i)     0.2222   0.05965  *
          std(i)   1.06824  1.01172  *

The best result found for the ARNN network (in terms of the measures on the test data) was obtained when considering 360 observations, of which 65% were used to train the network, with the maximum number of lags (100) and two nodes in the hidden layer. However, it is not able to capture all of the nonlinear moving averages process (see graph (a) in Fig. 7).

Fig. 6. Behavior of the MSE for the ARNN network with one (ARNN1) and two (ARNN2) nodes in the hidden layer.

Moreover, the results found on the predictive ability of the recurrent neural network NARMA in the presence of moving averages are shown in Table 3 and in graph (b) of Fig. 7. In Table 3, the first column shows the sample size, and the remaining columns show, for each training percentage, the following results: the selected configuration (number of lags p and number of nodes in the hidden layer), the measurement values obtained for M1-M4 on the validation set, and the E(i) and std(i) values for the whole test set.

From that table it is concluded that the NARMA network requires considering large sample sizes to fit models capable of reducing the heterogeneity of the forecasts on the test set. Likewise, as in the ARNN networks, the percentage of data used for training the network has a direct relationship with the accuracy of the forecast, for any sample size. It was found that the best outcome for the NARMA network (in terms of the measures on the test data) was provided by considering two nodes in the hidden layer, q = 2 lags and 360 observations, of which 80 percent was used to train the network. It is noted that, although the NARMA network is also not capable of capturing all the behavior of data with a moving averages component (see Fig. 7, graph (b)), it is found that, with a lower number of parameters to be estimated, it has a better performance than the ARNN network.

B. Model 2

Table 4 contains the values of the normalized mean square error (NMSE) found by Burges and Refenes [15] for NARMA models with 1, 2 and 3 lags (first three rows), and those obtained in this work by using the high-order ARNN network and the NARMA network with 1, 2 and 3 lags. The information for the ARNN and NARMA models considered in this table is presented in Table 5, which contains, for each model, the performance measures suggested by Zemouri et al. [30]. The actual test values were compared against the best forecasts of the networks found for the three data sets (see Table 4). Following the approach proposed by Zemouri et al. [30], the best models are ARNN(25) and NARMA(3). Note that there is consistency in selecting the best model using either the NMSE or the E(i) measure (obtained for the test data). However, there is evidence that these models do not have good predictive capability, given that in Fig. 8 the clouds of points are far from the 45° line.
Fig. 7. Comparison between the test data and the forecasts found with the best networks: (a) ARNN(100) and (b) NARMA (q = 2, k = 2).

Table 3. Measures of performance for the NARMA model.

Table 5. Performance measures of the NARMA and ARNN models.

Model        M1      M2     M3      M4     E(i)     std(i)
ARNN (10)    0.115   1.999  0.134   0.445  0.0394   1.0708
ARNN (25)    0.0904  1.852  0.15    0.478  0.00544  1.101
ARNN (50)    0.129   1.607  0.0565  0.558  0.0417   0.153
NARMA (1)    0.17    2.004  0.0276  0.537  0.0841   1.89
NARMA (2)    0.218   1.912  0.202   0.527  0.0672   1.865
NARMA (3)    0.254   1.211  0.248   0.584  0.0249   1.852

Table 4. Comparison of results for the simulated data of model (11).

Model            Training data  Validation data  Test data
NARMA (1) [15]   0.813          0.846            NA
NARMA (2) [15]   0.692          0.755            NA
NARMA (3) [15]   0.689          0.789            NA
ARNN (10)        0.714          0.858            0.0858
ARNN (25)        0.636          0.864            0.0198
ARNN (50)        0.623          0.767            0.139
NARMA (1)        0.743          0.783            0.909
NARMA (2)        0.773          0.714            0.876
NARMA (3)        0.757          0.787            0.855

6. Discussion

In this section we answer the research questions raised.

1. Can a nonlinear high-order AR model, represented by an ARNN network, be well approximated to a nonlinear reduced-order MA model?

In examining whether the ARNN network with a high order for the lag p is capable of approximating an NLMA correctly, it was found that, while increasing the number of lags p makes the MSE decrease, it leads to models that are not parsimonious or short-term and to over-parameterization problems.

If, in addition to this, it is considered that the NLMA model is not globally invertible, then the answer to the question is that a nonlinear autoregressive model (in this case approximated by an ARNN network) of a high order is not capable of representing a nonlinear moving averages model (NLMA) of low order.

2. When a NARMA model assumes that there is no autoregressive process, can a nonlinear time series containing inherent moving averages components be predicted adequately?

In Figs. 7 and 8 and in Tables 2 and 5, it is observed that, although the selected NARMA model has better performance
(in terms of the performance measures proposed by the Zemouri et al. [30] approach and of closeness to the 45° straight line) than the other tested networks, the values predicted by this model are far from the actual values of the nonlinear time series with an MA component (see graphs (b) of Figs. 7 and 8). Considering this fact, the answer is that a recurrent network NARMA(0,q) cannot adequately predict nonlinear time series containing inherent moving averages components.

However, in testing it was noted that, as with the mathematical expressions, in practice the NARMA network approximates the NLMA model better (from the point of view of forecasting capacity measures) than the ARNN network. This indicates that this network can be a good candidate for modeling nonlinear data containing moving averages components, but it needs to be studied in detail, and so a new research question arises: from the theoretical point of view, what are the considerations that the recurrent network NARMA(0,q) must satisfy so that it can properly predict nonlinear time series containing inherent moving average components?

Fig. 8. Comparison between the test data and the forecasts found with the networks (a) ARNN(25) and (b) NARMA(3).

References

[2] Iacus SM. Statistical data analysis of financial time series and option pricing in R. Chicago: R/Finance; 2011.
[3] Teräsvirta T. Forecasting economic variables with nonlinear models. SSE/EFI Working Paper in Economics and Finance No. 598. Stockholm: Department of Economic Statistics; 2005.
[4] Engle RF. Risk and volatility: econometric models and financial practice. New York: New York University, Department of Finance (Salomon Centre); December 2003.
[5] van Dijk D, Medeiros MC, Teräsvirta T. Linear models, smooth transition autoregressions, and neural networks for forecasting macroeconomic time series: a re-examination. Rio de Janeiro: Department of Economics, Pontifical Catholic University of Rio de Janeiro (PUC-Rio); 2004.
[6] Jorda O, Escribano A. Improved testing and specification of smooth transition regression models. Universidad Carlos III de Madrid; 1997.
[7] Robinson PM. Modelling memory of economic and financial time series. London School of Economics and Political Science; 2005.
[8] La Rocca M, Perna C. Model selection for neural network models: a statistical perspective. In: Emmert-Streib F, Pickl S, Dehmer M, editors. Computational network theory: theoretical foundations and applications. 1st ed. Wiley-VCH Verlag GmbH & Co. KGaA; 2015.
[17] Robinson PM. The estimation of a nonlinear moving average model. Stoch Process Appl 1977;5(1):81-90.
[18] Brännäs K, Ohlsson H. Asymmetric time series and temporal aggregation. Rev Econ Stat 1999;81(2):341-4.
[19] Robinson PM, Zaffaroni P. Modelling nonlinearity and long memory in time series. Fields Inst Commun 1997;11:161-70.
[20] Engle RF, Smith A. Stochastic permanent breaks. Rev Econ Stat 1999;81(4):553-74.
[21] Turkman K, Scotto MG, de Zea Bermudez P. Nonlinear time series models. In: Extreme events and integer value problems; 2014. p. 23-90.
[22] Chen D, Wang H. The stationarity and invertibility of a class of nonlinear ARMA models. Sci China Math 2011;54(3):469-78.
[23] Haykin SO. Neural networks and learning machines. 3rd ed. 2006.
[24] Gencay R, Liu T. Nonlinear modeling and prediction with feedforward and recurrent networks. Phys D 1997;108(1-2):119-34.
[25] Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw 1989;2(2):359-66.
[26] Hornik K, Stinchcombe M, White H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw 1990;3(5):551-60.
[27] Hornik K. Approximation capabilities of multilayer feedforward networks. Neural Netw 1991;4(2):251-7.
[28] Rosa JLG. Artificial neural networks: models and applications. 2016.
[29] Zhang GP, Patuwo BE, Hu MY. A simulation study of artificial neural networks for nonlinear time-series forecasting. Comput Oper Res 2001;28(4):381-96.
[30] Zemouri R, Gouriveau R, Zerhouni N. Defining and applying prediction performance metrics on a recurrent NARX time series model. Neurocomputing 2010;73(13-15):2506-21.