Zhiyong Cui, Ruimin Ke, Ziyuan Pu, and Yinhai Wang are with the
Department of Civil and Environmental Engineering, University of
Washington, Seattle, WA 98195 USA.
data, LSTMs have been proved to be able to process sequence data [4] and applied in many real-world problems, like speech recognition [6], image captioning [7], music composition [8] and human trajectory prediction [9]. In recent years, LSTMs have been gaining popularity in traffic forecasting due to their ability to model long-term dependencies. Several studies [2, 22-25, 34, 43, 44, 45] have examined the applicability of LSTMs in traffic forecasting, and the results demonstrate the advantages of LSTMs. However, the potential of LSTMs is far from being fully exploited in the domain of transportation. The three primary limitations in previous work on LSTMs in traffic forecasting can be summarized as follows: 1) traffic forecasting has generally focused on a small collection of locations rather than on the network level; 2) most LSTM-based methods have shallow structures; and 3) the long-term dependencies are normally learned from chronologically arranged input data considering only forward dependencies, while backward dependencies learned from reverse-chronologically ordered data have rarely been explored.

From the perspective of the scale of the prediction area, predicting large-scale transportation network traffic has become an important and challenging topic. Most existing studies utilize traffic data at a sensor location or along a corridor, and thus network-wide prediction could not be achieved unless N models were trained for a traffic network with N nodes [22]. Learning the complex spatial-temporal features of a large-scale traffic network with only one model should therefore be explored.

Regarding the depth of LSTM-based models, the structure should be able to capture the dynamic nature of the traffic system. Most of the newly proposed LSTM-based prediction models have relatively shallow structures with only one hidden layer to deal with time series data [2, 22, 44]. Existing studies [20, 21] have shown that deep LSTM architectures with several hidden layers can build up progressively higher levels of representations of sequence data. Although some studies [23-25] utilized more than one hidden LSTM layer, the influence of the number of LSTM layers in different LSTM-based models needs to be further compared and explained.

In terms of the dependency in prediction problems, all of the information contained in time series data should be fully utilized. Normally, the dataset fed to an LSTM model is chronologically arranged, with the result that the information in the LSTMs is passed in a positive direction from the time step 𝑡 − 1 to the time step 𝑡 along the chain-like structure. Thus, the LSTM structure only makes use of the forward dependencies [5]. But in this process, it is highly possible that useful information is filtered out or not efficiently passed through the chain-like gated structure. Therefore, it may be informative to take backward dependencies, which pass information in a negative direction, into consideration. Another reason for including backward dependency in our study is the periodicity of traffic. Unlike wind speed forecasting [15], traffic incident forecasting [16], or many other time series forecasting problems with strong randomness, traffic conditions have strong periodicity and regularity, and even short-term periodicity can be observed [17]. Analysing the periodicity of time series data, especially for recurring traffic patterns, from both forward and backward temporal perspectives will enhance the predictive performance [28]. However, based on our review of the literature, few studies on traffic analysis utilized the backward dependency. To fill this gap, a bidirectional LSTM (BDLSTM) with the ability to deal with both forward and backward dependencies is adopted as a component of the network structure in this study.

In addition, when predicting the network-wide traffic speed, rather than the speed at a single location, the impact of upstream and downstream speeds on each location in the traffic network should not be neglected. Previous studies [26, 27], which only make use of the forward dependencies of time series data, have found that the past speed values of upstream as well as downstream locations influence the future speed values of a location along a corridor. However, for complicated traffic networks with intersections and loops, upstream and downstream both refer to relative positions, and two arbitrary locations can be upstream and downstream of each other. Upstream and downstream are defined with respect to space, while forward and backward dependencies are defined with respect to time. With the help of forward and backward dependencies of spatial-temporal data, the learned features will be more comprehensive.

In this paper, we propose a stacked bidirectional and unidirectional LSTM (SBU-LSTM) neural network, combining LSTM and BDLSTM, for network-wide traffic speed prediction. The proposed model is capable of handling input data with missing values and is tested on both large-scale freeway and urban traffic networks in the Seattle area. Experimental results show that our model achieves network-wide traffic speed prediction with a high prediction accuracy. The influence of the number of layers, the number of time lags (the length of time series input), the dimension of weight matrices in LSTM/BDLSTM layers, and the impact of additional volume and occupancy data are further analysed. The model's scalability and its potential applications are also discussed. In summary, our contributions can be stated as follows: 1) we expand the traffic forecasting area from a specific location or several adjacent locations along a corridor to large-scale traffic networks, varying from a freeway traffic network to a complex urban traffic network; 2) we propose a deep architecture considering backward dependencies by combining LSTM and BDLSTM to enhance the feature learning from the large-scale spatial time series data; 3) a masking mechanism is adopted to handle missing values; and 4) we evaluate many of the model's internal and external influential factors.

II. METHODOLOGY
In this section, the components and the architecture of the proposed SBU-LSTM are introduced in detail. Here, speed prediction is defined as predicting future speed based on historical speed information. The illustrations of the models in the following sub-sections all take traffic speed prediction as an example.
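The definition of speed prediction as "future speed from historical speed" can be made concrete with a small sliding-window sketch. This is only an illustration under stated assumptions: the function name `make_samples`, the parameter `n_lags`, and the toy two-location series are hypothetical and do not come from the paper; each frame is assumed to hold the speeds of all locations at one time step.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # speeds (mph) at all locations for one time step


def make_samples(series: Sequence[Frame], n_lags: int) -> Tuple[List[List[Frame]], List[Frame]]:
    """Pair every window of n_lags consecutive frames with the frame that follows it."""
    if n_lags < 1 or len(series) <= n_lags:
        raise ValueError("the series must contain more time steps than n_lags")
    inputs: List[List[Frame]] = []
    targets: List[Frame] = []
    for start in range(len(series) - n_lags):
        inputs.append(list(series[start:start + n_lags]))  # historical speeds
        targets.append(series[start + n_lags])             # next-step speeds to predict
    return inputs, targets


# Toy series: 5 time steps, 2 locations (hypothetical values).
speeds = [[60.0, 55.0], [58.0, 54.0], [40.0, 50.0], [35.0, 52.0], [45.0, 53.0]]
X, y = make_samples(speeds, n_lags=3)
```

With T time steps and `n_lags` lags, the sketch yields T − `n_lags` samples, each a window of past frames paired with the single frame to be predicted.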
Fig. 1 Standard RNN architecture and an unfolded structure with T time steps
Fig. 4 Masking layer for time series data with missing values
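As a rough illustration of what a masking step can do for a time series with missing values, the sketch below skips any time step that contains a missing reading and carries the previous state forward. Everything here is an assumption made for illustration: the excerpt does not show the paper's exact masking rule, missing readings are assumed to be encoded as `None`, and the exponential-smoothing update merely stands in for a recurrent cell (it is not an LSTM).

```python
from typing import List, Optional, Sequence


def masked_recurrence(frames: Sequence[Sequence[Optional[float]]],
                      alpha: float = 0.5) -> List[List[float]]:
    """Toy recurrence h_t = alpha * x_t + (1 - alpha) * h_{t-1} that skips masked steps."""
    state = [0.0] * len(frames[0])
    outputs: List[List[float]] = []
    for frame in frames:
        if any(value is None for value in frame):
            # Masked time step: do not update, just carry the previous state forward.
            outputs.append(list(state))
            continue
        state = [alpha * x + (1 - alpha) * h for x, h in zip(frame, state)]
        outputs.append(list(state))
    return outputs


# Hypothetical readings for 2 locations; the second time step has a missing value.
readings = [[60.0, 50.0], [None, 52.0], [40.0, 48.0]]
states = masked_recurrence(readings)
```

The design choice shown is that a masked step leaves the recurrent state untouched, so the missing reading does not distort what is passed on to later time steps.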
Fig. 5 SBU-LSTM architecture, which necessarily consists of a BDLSTM layer and an LSTM layer. A masking layer for handling missing values and multiple LSTM or BDLSTM middle layers are optional.
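The stacking rule in the Fig. 5 caption can be summarised as a small layer-list builder. This is a sketch of the layer ordering only, not of any training code: the function `sbu_lstm_layers` and its parameters are hypothetical names, and the layers are represented by plain strings rather than by a deep learning library.

```python
from typing import List


def sbu_lstm_layers(n_middle: int = 0, middle_kind: str = "BDLSTM",
                    use_masking: bool = False) -> List[str]:
    """Return the layer order of an SBU-LSTM stack, bottom (input side) to top."""
    if n_middle < 0:
        raise ValueError("n_middle must be non-negative")
    if middle_kind not in ("LSTM", "BDLSTM"):
        raise ValueError("middle layers must be 'LSTM' or 'BDLSTM'")
    layers: List[str] = ["Masking"] if use_masking else []  # optional, for missing values
    layers.append("BDLSTM")                  # mandatory first feature-learning layer
    layers.extend([middle_kind] * n_middle)  # optional middle layers
    layers.append("LSTM")                    # mandatory last (top) layer
    return layers


basic = sbu_lstm_layers()
deep_masked = sbu_lstm_layers(n_middle=2, use_masking=True)
```

Here `basic` corresponds to the variant with no middle layer, and `deep_masked` adds a masking layer and two middle BDLSTM layers.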
backward dependencies. When feeding the spatial-temporal information of the traffic network to the BDLSTMs, both the spatial correlation of the speeds in different locations of the traffic network and the temporal dependencies of the speed values can be captured during the feature learning process. In this regard, the BDLSTMs are very suitable for being the first layer of a model to learn more useful information from spatial time series data. When predicting future speed values, the top layer of the architecture only needs to utilize learned features, namely the outputs from lower layers, to calculate iteratively along the forward direction and generate the predicted values. Thus, an LSTM layer, which is fit for capturing forward dependency, is a better choice to be the last (top) layer of the model.

In this study, we propose a novel deep architecture named stacked bidirectional and unidirectional LSTM network (SBU-LSTM) to predict the network-wide traffic speed values. Fig. 5 illustrates the graphical architecture of the proposed model. If the input contains missing values, a masking layer should be adopted by the SBU-LSTM. Each SBU-LSTM contains a BDLSTM layer as the first feature-learning layer and an LSTM layer as the last layer. To make full use of the input data and learn complex and comprehensive features, the SBU-LSTM can optionally be filled with one or more LSTM/BDLSTM layers in the middle. Fig. 5 shows that the SBU-LSTM takes the spatial time series data as the input and predicts future speed values for one time step. The SBU-LSTM is also capable of predicting values for multiple future time steps based on historical data, but this property is not shown in Fig. 5, since the target of this study is to predict network-wide traffic speed for one future time step. The detailed spatial structure of input data is described in the experiment section.

III. EXPERIMENTS

A. Dataset Description
In this study, two types of traffic state datasets are utilized to carry out experiments to test the proposed model. One is a station-/point-based dataset, called loop detector data [46], collected by inductive loop detectors deployed on the roadway surface. Multiple loop detectors are connected to a detector station, with stations deployed around every half a mile. The collected data from each station are grouped and aggregated as station-based traffic state data according to directions. This aggregated and quality-controlled dataset contains traffic speed, volume, and occupancy information. In the experiments, the loop detector data cover four connected freeways, which are I-5, I-405, I-90 and SR-520 in the Seattle area, and are extracted from the Digital Roadway Interactive Visualization and Evaluation Network (DRIVE Net) system [29, 30]. The traffic sensor stations are shown in Fig. 6 (a), where they are represented by small blue icons. This dataset contains traffic state data of 323 sensor stations in 2015, and the time step interval of this dataset is 5 minutes.

The other dataset used in this study is a segment-based dataset, called INRIX data [46], which measures traffic speeds of both freeway and urban roadway segments. INRIX data is selected by the U.S. Federal Highway Administration as the National Performance Management Research Data Set. INRIX
data provide wide coverage and accurate traffic information by aggregating GPS probe data from a wide array of commercial vehicle fleets, connected cars and mobile apps. An entire traffic network in the Seattle downtown area, which contains more than 1000 roadway segments, shown in Fig. 6 (b), is selected as the experimental dataset. The dataset covers the whole year of 2012 and its time step interval is also 5 minutes.

Fig. 6 Loop detector stations on the freeway network in Seattle area

B. Experiment Results Analysis and Comparison
In this sub-section, only the loop detector data, due to its high data quality [46], are used to measure the performance of the proposed approach and compare it with other models. Hence, the network-wide traffic is characterized by the 323 station speed values and the spatial dimension of the input data is set as $P = 323$. Since the unit of a time step in loop detector data is 5 minutes, the dataset has $\frac{60\,\text{(min)}}{5} \times 24\,\text{(hours per day)} \times 365\,\text{(days per year)} = 105120$ time steps in total. Suppose the number of time lags is set as $n = 10$, which means the model uses a set of data with 10 consecutive time steps (covering 50 minutes) to predict the following 5-minute speed value. The dataset is then separated into samples with 10 time lags, and the sample size is $N = 105120 - 10 = 105110$.

Based on the descriptions of the model, each sample of the input data, $X_T^P$, is a 2-D vector with the dimension of $[n, P] = [10, 323]$, and each sample of the output data is a 1-D vector with 323 components. The input of the model is a 3-D vector, whose dimension is $[N, n, P]$. Before being fed into the model, all the samples are randomized and divided into training set, validation set, and test set with the ratio 7:2:1.

In the training process, the mini-batch gradient descent method is used when the model optimizes the mean squared error (MSE) loss using the RMSProp optimizer, and an early stopping mechanism is used to avoid over-fitting. To measure the effectiveness of different traffic speed prediction algorithms, the Mean Absolute Errors (MAE) and Mean Absolute Percentage Errors (MAPE) are computed using the following equations:

$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \hat{x}_i \right|$ (12)

$\text{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{x_i - \hat{x}_i}{x_i} \right|$ (13)

where $x_i$ is the observed traffic speed, and $\hat{x}_i$ is the predicted speed. All the compared models in this section are trained and tested multiple times to eliminate outliers, and the presented results are averaged to reduce random errors.

In this section, the results of the proposed SBU-LSTMs are analyzed and compared with classical methods and other RNN-based models. Further analyses of the influence of the number of time lags, the dimension of weight matrices, the number of layers, the impact of volume and occupancy information, spatial feature learning, and model robustness are carried out to shed more light on the characteristics of the proposed model.

1) Comparison with Classical Models for Single Location Traffic Speed Prediction
Many classical baseline models are used in traffic forecasting problems, such as ARIMA [2, 23], Support Vector Regression (SVR) [37], and the Kalman filter [35]. Based on our literature review [2], the performances of the ARIMA and Kalman filter methods are far behind the others, and thus these two methods are not compared in this study. Most of the mentioned classical models are not suitable for predicting network-wide traffic speed via a single model, since they normally cannot process 3-D spatial-temporal vectors. To compare our proposed model with these baseline models, experiments are carried out for single loop detector stations, whose input data is a 2-D vector without the spatial dimension. The results are averaged to measure the overall performance of these models.

We compared the performance of the SBU-LSTMs with SVR, random forest, feed-forward NN, and GRU NN. In this comparison, the proposed model does not use the masking layer or optional middle layers. Among these baseline models, the feed-forward NN model, also called Multilayer Perceptron (MLP), has superior performance for traffic flow prediction [32], and decision tree and SVR are very efficient models for prediction [23, 37]. For the SVR method, the Radial Basis Function (RBF) kernel is utilized, and for the Random Forest method, 10 trees are built with no limit on the maximum tree depth. In this experiment, the feed-forward NN model consists of two hidden layers with 323 nodes in each layer.

TABLE Ⅰ
PERFORMANCE COMPARISON OF THE PROPOSED MODEL WITH OTHER BASELINE MODELS FOR SINGLE DETECTOR STATIONS SPEED PREDICTION
Models                               MAE (mph)   MAPE (%)
SVM                                  9.23        20.39
Random Forest                        2.64        6.30
Feed-forward NN (2-hidden layers)    2.63        6.41
GRU NN                               3.43        8.02
SBU-LSTMs                            2.42        5.67

Table Ⅰ demonstrates the prediction performance of different algorithms for the single detector stations. The number of input time lags in this experiment is set as 10. Among the non-neural network algorithms, random forest performs much better, with the MAE of 2.64, than the SVM method, which makes sense
due to the majority votes mechanism of random forest. The feed-forward NN, whose MAE is 2.63, performs very close to the random forest method. Although GRU NN is a kind of recurrent NN, it does not outperform the feed-forward NN or random forest. The single-layer structure and the simplified gates in GRU NN may be the reasons. To sum up, the proposed SBU-LSTM model is clearly superior to the other four methods in this single detector station based experiment.

2) Comparison with LSTM-based models for Network-wide Traffic Speed Prediction
The SBU-LSTM is proposed for predicting the network-wide traffic speed, and thus other methods with the ability to predict multi-dimensional time series data are compared in this section. Since the proposed model combines BDLSTMs and LSTMs, the pure deep (N-layer) BDLSTMs and LSTMs are compared. A deep LSTM NN with an added fully connected deep neural network (DNN) layer, which is reported to boost the LSTM NN [33], is also compared. To measure the influence of temporal information on the network-wide traffic speed, a multilayer LSTM model combining day of week and hour of day is also tested in this experiment. Meanwhile, the influence of the depth of the neural networks, namely the number of layers of the models, is tested in this section. All the experiments undertaken in this section used the dataset covering the whole traffic network with 10 time lags. The number of time lags, 10, is set within a reasonable range for traffic forecasting based on the literature [25, 32] and our experiments. The spatial dimension of weight matrices in each LSTM or BDLSTM layer in this experiment is set as the number of loop detector stations, 323, to ensure the spatial features can be fully captured. The comparison results are averaged from multiple tests to reduce random errors.

TABLE Ⅱ
PERFORMANCE COMPARISON OF THE PROPOSED MODEL WITH OTHER LSTM-BASED MODELS FOR NETWORK-WIDE TRAFFIC SPEED PREDICTION
Number of LSTM / BDLSTM layers (MAE in mph, MAPE in %)
Model                                        N=0            N=1            N=2            N=3            N=4
                                             MAE    MAPE    MAE    MAPE    MAE    MAPE    MAE    MAPE    MAE    MAPE
N-layers LSTM                                -      -       2.886  6.585   2.502  5.929   2.483  5.950   2.529  6.114
N-layers LSTM + 1-layer DNN                  -      -       2.652  6.489   2.581  6.332   2.630  6.438   2.646  6.586
N-layers LSTM + Hour of Day + Day of Week    -      -       2.668  6.506   2.557  6.274   2.595  6.447   2.647  6.602
N-layers BDLSTM                              -      -       3.021  6.758   2.472  5.819   2.476  5.846   2.526  5.988
SBU-LSTMs: 1-layer BDLSTM
  + N middle BDLSTM layers + 1-layer LSTM    2.426  5.674   2.465  5.787   2.502  5.950   2.549  6.191   2.576  6.227

Table Ⅱ shows the comparison results, where the column headers show the number of LSTM or BDLSTM layers in the models. In terms of the influence of depth of the neural network, all the compared models achieve their best performance when they have two layers, and their performances have the same trend that the values of MAE and MAPE increase as the number of layers increases from two to four. Table Ⅱ contains a special “(N=0)” column, denoting no middle layer, to represent the basic structure of the SBU-LSTM. The performance of SBU-LSTM is in conformity with the trends of the compared models that the MAE and MAPE increase as the number of layers rises from zero to four.

The proposed SBU-LSTM outperforms the others for all the layer numbers. When the SBU-LSTM has no middle layer, it achieves the best MAE, 2.426 mph, and MAPE, 5.674%. The test errors of the multilayer LSTM NN and BDLSTM NN turn out to be larger than that of the proposed model. They achieve their best MAEs of 2.502 and 2.472, respectively, when they both have two layers. It should be noted that, for the one-layer case, the BDLSTM NN model gets the worst performance in our experiments shown in Table Ⅱ. This indicates that a one-layer BDLSTM may be good enough for capturing features, but it is not satisfactory for predicting the results. Except for the one-layer case, the model combining deep LSTM and DNN is not comparable with the others. These test results show that adding DNN layers to a deep LSTM does not improve network-wide traffic prediction, which is consistent with the finding in a previous study [33]. The performance of the multilayer LSTM with added temporal information is very close to that of the LSTM combined with DNN. Thus, incorporating the day of week and time of day features does not improve the performance in this study. This is in accordance with the results of previous works [23, 24].

Fig. 6 Boxplot of MAE versus number of time lags in SBU-LSTMs. One unit of time lag is 5 minutes.

3) Influence of number of time lags
The number of time lags, 𝑛, is the temporal dimension of the input data, which may influence the performance of the
proposed model. Fig. 6 shows the boxplot of the MAE versus the number of time lags, in which the spatial dimensions of all weight matrices are set as 𝑃 = 323. When the number of time lags equals 8, 10, and 12, the MAEs are very close, around 2.4. The deviations of these MAEs are relatively small. When the number of time lags is set as 6, the MAE is much higher, and the deviation is much larger than in the other cases. That means, given the 5-minute time step interval and the studied traffic network, input data with 6 time steps are not enough for the model to accurately predict network-wide traffic speed. To sum up, the number of time lags tends to influence the predictive performance, especially when the number is relatively small.

4) Influence of dimension of weight matrices
In the experiment, the dimension of each data sample is [𝑛, 𝑃], where 𝑃 is the spatial dimension representing the number of loop detector stations. According to the matrix multiplication rule, the spatial dimension of the weight matrices in the first layer of the SBU-LSTM must be in accordance with the value of 𝑃. But the spatial dimension of weight matrices in other layers can be customized. In this section, we measure the influence of the dimension of weight matrices in the basic SBU-LSTM.

When the model's last LSTM layer has different spatial dimensions, including $\lceil \frac{1}{4}P \rceil$, $\lceil \frac{1}{2}P \rceil$, $P$, $2P$ and $4P$, very close prediction results are observed. Here, $P$ equals 323 and $\lceil \cdot \rceil$ is the ceiling function. Table Ⅲ shows the comparison results. The MAE, MAPE, and standard deviations are nearly the same. Hence, the variation of the dimension of the weight matrices in the LSTM layer has almost no influence on the predictive performance, if the dimension is set as a reasonable value close to the number of sensor locations.

5) Spatial features learning
Spatial features of a traffic network are critical for predicting network-wide traffic states. By carefully studying the LSTM methodology, we can find that the spatial features can be inherently learned by the weights in LSTM or BDLSTM layers during the training process. No matter what the network's spatial structure is, and no matter what the spatial order of the input data is, the traffic speed relationship between each pair of locations in the traffic network can be captured by the LSTM weight matrices.

Fig. 7 Heatmaps of ground truth and predicted speed values for the freeway traffic network on 01/09/2015. The two plots share the same meanings of the two axes, where the two horizontal axes represent the index and the arrangement order of sensor stations based on the mileposts and directions of the four freeways, respectively.

In this section, we measure the influence of the spatial order of the input data on the spatial feature learning. Firstly, we order the spatial dimension of input data based on the milepost and direction of freeways. Fig. 7 displays the heatmaps of true speed and predicted speed for the freeway network on a randomly selected day, taking 01/09/2015, a Friday, as an example. The strong similarity between the shapes in the two heatmaps shows that the proposed model is capable of learning spatial features. Then, we randomly rearrange the spatial dimension of input data. By training and testing the model multiple times, we find that the predictive performance barely changes, and the MAEs are all around 2.42 mph. To the best of our knowledge, at least two reasons lead to the good performance. One is that the BDLSTM, measuring both forward and backward dependencies, helps learn better features. The other is that the inherent spatial correlation between locations is learned and stored in the weight matrices during the training process. Hence, the order of the spatial dimension of input data basically does not affect the model performance.
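One way to see why the spatial order need not matter is that a fully connected weight matrix can absorb any reordering of its inputs: permuting the input locations and the matching weight columns gives the same output. The sketch below shows this for a single dense map only; it is an illustration under that assumption, not the paper's experiment, and the names `dense`, `permute_inputs`, and the toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def dense(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Apply a fully connected map y = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def permute_inputs(weights: Sequence[Sequence[float]], x: Sequence[float],
                   order: Sequence[int]) -> Tuple[List[List[float]], List[float]]:
    """Reorder the input locations and the matching weight columns in the same way."""
    x_perm = [x[j] for j in order]
    w_perm = [[row[j] for j in order] for row in weights]
    return w_perm, x_perm


# Toy example: 3 locations, 2 output units (hypothetical values).
weights = [[0.5, 0.25, 0.25], [0.0, 1.0, 0.0]]
speeds = [60.0, 40.0, 20.0]
order = [2, 0, 1]  # an arbitrary rearrangement of the spatial dimension

original = dense(weights, speeds)
w_perm, x_perm = permute_inputs(weights, speeds, order)
shuffled = dense(w_perm, x_perm)
```

In other words, a model retrained on reordered inputs can reach an equivalent solution by learning correspondingly reordered weights, which is consistent with the nearly unchanged MAE reported above.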
TABLE Ⅲ
PERFORMANCE COMPARISON OF SBU-LSTMS WITH DIFFERENT SPATIAL DIMENSIONS OF WEIGHT MATRICES
Spatial dimension of weight matrices
in the last layer (LSTM layer)          MAE     MAPE    STD
1/4 𝑃                                   2.486   5.903   0.675
1/2 𝑃                                   2.425   5.680   0.643
𝑃 = 323                                 2.426   5.674   0.630
2 𝑃                                     2.431   5.736   0.636
4 𝑃                                     2.411   5.696   0.636

6) Influence of volume and occupancy
Speed, volume (flow), and occupancy are the three fundamental factors for analyzing traffic flow. Considering that the loop detector data contain speed, volume, and occupancy information, it is informative to investigate the influence of these factors on the proposed model's predictive performance. In previous experiments, each element of the model input, $x_t^p$, is the speed ($s$) at a specific location, $p$, at time $t$, where $x_t^p = s_t^p$. In this experiment, by contrast, an element of the model input combines speed ($s$) with volume ($v$) and occupancy ($o$), where
[6] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6645–6649.
[7] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and tell: A neural image caption generator,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3156–3164.
[8] D. Eck and J. Schmidhuber, “A first look at music composition using LSTM recurrent neural networks,” Istituto Dalle Molle Di Studi Sull Intelligenza Artificiale, vol. 103, 2002.
[9] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, “Social LSTM: Human trajectory prediction in crowded spaces,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 961–971.
[10] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” 1999.
[11] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.
[12] K. Cho, B. Van Merriënboer, D. Bahdanau, and Y. Bengio, “On the properties of neural machine translation: Encoder-decoder approaches,” arXiv preprint arXiv:1409.1259, 2014.
[13] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
[14] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Transactions on Neural Networks and Learning Systems, 2017.
[15] M.-D. Wang, Q.-R. Qiu, and B.-W. Cui, “Short-term wind speed forecasting combined time series method and ARCH model,” in Machine Learning and Cybernetics (ICMLC), 2012 International Conference on, vol. 3. IEEE, 2012, pp. 924–927.
[16] X. Zheng and M. Liu, “An overview of accident forecasting methodologies,” Journal of Loss Prevention in the Process Industries, vol. 22, no. 4, pp. 484–491, 2009.
[17] X. Jiang and H. Adeli, “Wavelet packet-autocorrelation function method for traffic flow pattern analysis,” Computer-Aided Civil and Infrastructure Engineering, vol. 19, no. 5, pp. 324–337, 2004.
[18] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[19] A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” Neural Networks, vol. 18, no. 5, pp. 602–610, 2005.
[20] A. Graves, N. Jaitly, and A.-r. Mohamed, “Hybrid speech recognition with deep bidirectional LSTM,” in Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013, pp. 273–278.
[21] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[22] Y. Duan, Y. Lv, and F.-Y. Wang, “Travel time prediction with LSTM neural network,” in Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016, pp. 1053–1058.
[23] Y.-y. Chen, Y. Lv, Z. Li, and F.-Y. Wang, “Long short-term memory model for traffic congestion prediction with online open data,” in Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016, pp. 132–137.
[24] Y. Wu and H. Tan, “Short-term traffic flow forecasting with spatial-temporal correlation in a hybrid deep learning framework,” arXiv preprint arXiv:1612.01022, 2016.
[25] R. Yu, Y. Li, C. Shahabi, U. Demiryurek, and Y. Liu, “Deep learning: A generic approach for extreme condition traffic forecasting,” in Proceedings of the 2017 SIAM International Conference on Data Mining. SIAM, 2017, pp. 777–785.
[26] S. R. Chandra and H. Al-Deek, “Predictions of freeway traffic speeds and volumes using vector autoregressive models,” Journal of Intelligent Transportation Systems, vol. 13, no. 2, pp. 53–72, 2009.
[27] Y. Kamarianakis, H. O. Gao, and P. Prastacos, “Characterizing regimes in daily cycles of urban traffic using smooth-transition regressions,” Transportation Research Part C: Emerging Technologies, vol. 18, no. 5, pp. 821–840, 2010.
[28] G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time series analysis: forecasting and control. John Wiley & Sons, 2015.
[29] Z. Cui, S. Zhang, K. C. Henrickson, and Y. Wang, “New progress of DRIVE Net: An e-science transportation platform for data sharing, visualization, modeling, and analysis,” in Smart Cities Conference (ISC2), 2016 IEEE International. IEEE, 2016, pp. 1–2.
[30] X. Ma, Y.-J. Wu, and Y. Wang, “DRIVE Net: E-science transportation platform for data sharing, visualization, modeling, and analysis,” Transportation Research Record: Journal of the Transportation Research Board, no. 2215, pp. 37–49, 2011.
[31] K. Henrickson, Y. Zou, and Y. Wang, “Flexible and robust method for missing loop detector data imputation,” Transportation Research Record: Journal of the Transportation Research Board, no. 2527, pp. 29–36, 2015.
[32] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic flow prediction with big data: a deep learning approach,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2015.
[33] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, “Convolutional, long short-term memory, fully connected deep neural networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4580–4584.
[34] X. Song, H. Kanasugi, and R. Shibasaki, “DeepTransport: Prediction and simulation of human mobility and transportation mode at a citywide level,” in IJCAI, 2016, pp. 2618–2624.
[35] J. Guo, W. Huang, and B. M. Williams, “Adaptive Kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification,” Transportation Research Part C: Emerging Technologies, vol. 43, pp. 50–64, 2014.
[36] Q. Ye, W. Y. Szeto, and S. C. Wong, “Short-term traffic speed forecasting based on data recorded at irregular intervals,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 4, pp. 1727–1737, 2012.
[37] C.-H. Wu, J.-M. Ho, and D.-T. Lee, “Travel-time prediction with support vector regression,” IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 4, pp. 276–281, 2004.
[38] M. G. Karlaftis and E. I. Vlahogianni, “Statistical methods versus neural networks in transportation research: Differences, similarities and some insights,” Transportation Research Part C: Emerging Technologies, vol. 19, no. 3, pp. 387–399, 2011.
[39] J. Hua and A. Faghri, “Applications of artificial neural networks to intelligent vehicle-highway systems,” Transportation Research Record, vol. 1453, p. 83, 1994.
[40] H. Yin, S. Wong, J. Xu, and C. Wong, “Urban traffic flow prediction using a fuzzy-neural approach,” Transportation Research Part C: Emerging Technologies, vol. 10, no. 2, pp. 85–98, 2002.
[41] D. Park and L. R. Rilett, “Forecasting freeway link travel times with a multilayer feedforward neural network,” Computer-Aided Civil and Infrastructure Engineering, vol. 14, no. 5, pp. 357–367, 1999.
[42] J. Van Lint, S. Hoogendoorn, and H. Van Zuylen, “Freeway travel time prediction with state-space neural networks: modeling state-space dynamics with recurrent neural networks,” Transportation Research Record: Journal of the Transportation Research Board, no. 1811, pp. 30–39, 2002.
[43] H. Yu, Z. Wu, S. Wang, Y. Wang, and X. Ma, “,” arXiv preprint arXiv:1705.02699, 2017.
[44] R. Fu, Z. Zhang, and L. Li, “Using LSTM and GRU neural network methods for traffic flow prediction,” in Chinese Association of Automation (YAC), Youth Academic Annual Conference of. IEEE, 2016, pp. 324–328.
[45] Z. Zhao, W. Chen, X. Wu, P. C. Chen, and J. Liu, “LSTM network: a deep learning approach for short-term traffic forecast,” IET Intelligent Transport Systems, vol. 11, no. 2, pp. 68–75, 2017.
[46] Y. Wang, R. Ke, W. Zhang, Z. Cui, and K. Henrickson, “Digital roadway interactive visualization and evaluation network applications to WSDOT operational data usage,” University of Washington Seattle, Washington, Tech. Rep., 2016.
[47] J. Li and H. Zhang, “Fundamental diagram of traffic flow: new identification scheme and further evidence from empirical data,” Transportation Research Record: Journal of the Transportation Research Board, no. 2260, pp. 50–59, 2011.