A Spatial-Temporal Attention Approach for Traffic Prediction
Abstract— Accurate traffic forecasting is important to enable intelligent transportation systems in a smart city. This problem is challenging due to the complicated spatial, short-term temporal, and long-term periodical dependencies. Existing approaches have considered these factors in modeling. Most solutions apply CNN, or its extension Graph Convolution Networks (GCN), to model the spatial correlation. However, the convolution operator may not adequately model the non-Euclidean pair-wise correlations. In this paper, we propose a novel Attention-based Periodic-Temporal neural Network (APTN), an end-to-end solution for traffic forecasting that captures spatial, short-term, and long-term periodical dependencies. APTN first uses an encoder attention mechanism to model both the spatial and periodical dependencies. Our model can capture these dependencies more easily because every node attends to all other nodes in the network, which brings a regularization effect to the model and avoids overfitting between nodes. Then, a temporal attention is applied to select relevant encoder hidden states across all time steps. We evaluate our proposed model using real-world traffic datasets and observe consistent improvements over state-of-the-art baselines.

Index Terms— Attention mechanism, traffic prediction, neural networks.

I. INTRODUCTION

Time-series based approaches, such as autoregressive integrated moving average (ARIMA), Kalman filtering, and the latent space model, have been widely applied to traffic prediction problems [4], [9], [14]. But these approaches do not capture the complex non-linear spatial-temporal dependency well. Recent advances in deep learning enable promising results in modeling the complex spatiotemporal relationship in traffic forecasting. Existing deep learning approaches usually adopt CNN for spatial correlation extraction, and RNN or its variants LSTM/GRU for temporal dependency modeling. For example, several studies [27], [28] have modeled citywide traffic as a heatmap image and used CNN to model the non-linear spatial dependency, and [25] used a recurrent neural network based framework for modeling temporal dependency. Recent studies further proposed methods to jointly model spatial, temporal, and external feature dependencies by integrating CNN and LSTM [10], [16], [22], [23]. However, these convolution based approaches may not adequately model the spatial correlation, including non-Euclidean pair-wise correlations, since the convolution is based on Euclidean distance to capture spatial correlation. [5] alleviates this problem using multi-graph convolutions, which take into account distance,
the entire graph. In other words, these nodes with different weights can reconstruct the entire graph. The motivation is that the values in the entire graph are related to the values of several most important nodes, which brings a regularization effect to the model and avoids overfitting between nodes. The main contributions of this paper are:
• We proposed a novel end-to-end framework for traffic prediction, which can model spatial, short-term, and long-term periodical dependencies using attention mechanisms.
• We designed an attention mechanism for obtaining dynamic spatial correlation. Compared with CNN/GCN approaches, it better captures the spatial dependencies, leading to a significant performance gain.
• Through extensive experiments, we showed that our model outperforms the state-of-the-art methods.
This paper is organized as follows. Section II presents related work. In Section III, we introduce notations and give the problem definition. Section IV describes the detailed design of our proposed model. The experimental evaluations are presented in Section V, and Section VI concludes this paper.
Fig. 4. The architecture of the encoder and decoder. The spatial attention mechanism computes the attention weights conditioned on the previous hidden state $e_{t-1}$ in the encoder and $h^L_{t,n}$ in the recurrent-skip network. Then the newly computed $r_t$ is fed into the encoder LSTM unit. The temporal attention computes the attention weights based on the previous decoder hidden state $d_{t-1}$ and represents the input information as a weighted sum of the encoder hidden states across all the time steps. The generated context vector $\tilde{c}_t$ and $z_{t,n+1}$ are then used as inputs to the decoder LSTM unit.
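To make the data flow in Fig. 4 concrete, the following is a minimal NumPy sketch of the two attention steps as the caption describes them. The score functions here are illustrative placeholders (a simple bilinear form), not the paper's exact parameterization, and the names `spatial_attention`, `temporal_attention`, `W_s`, and `W_t` are ours.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def spatial_attention(x_t, e_prev, h_skip, W_s):
    """Re-weight the N node readings x_t, conditioned on the previous
    encoder state e_prev and the recurrent-skip state h_skip (Fig. 4)."""
    cond = np.concatenate([e_prev, h_skip])   # conditioning vector
    scores = x_t * (W_s @ cond)               # illustrative bilinear score per node
    alpha = softmax(scores)                   # attention weight of every node
    return alpha * x_t                        # r_t, fed into the encoder LSTM

def temporal_attention(d_prev, enc_states, W_t):
    """Weighted sum of encoder hidden states across all time steps,
    conditioned on the previous decoder state d_prev (Fig. 4)."""
    scores = enc_states @ (W_t @ d_prev)      # one score per encoder time step
    beta = softmax(scores)
    return beta @ enc_states                  # context vector fed to the decoder

# Toy shapes: N = 5 nodes, m = 8 hidden units, T = 6 encoder steps
rng = np.random.default_rng(0)
r_t = spatial_attention(rng.normal(size=5), rng.normal(size=8),
                        rng.normal(size=8), rng.normal(size=(5, 16)))
c_t = temporal_attention(rng.normal(size=8), rng.normal(size=(6, 8)),
                         rng.normal(size=(8, 8)))
```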
where $f_2$ is the mapping function that the decoder LSTM learned. Then a fully connected neural network is applied to obtain the final output,

$$o_{T_s} = \mathrm{ReLU}\big(V_i\,\mathrm{ReLU}(W_i [e_{T_s}; d_{T_s}] + b_{w_i}) + b_{v_i}\big), \tag{15}$$

where $W_i \in \mathbb{R}^{v \times 2m}$ and $V_i \in \mathbb{R}^{v \times v}$ are learnable parameters, and $o_{T_s} \in \mathbb{R}^{v}$ is the output representation of the neural network, which will be used to generate predictions for each horizon.
D. Generating the Prediction

As shown in Figure 1, the same encoder-decoder architecture is used for all future $K$ horizons, with shared parameters. In this way, we can achieve the effect of multi-task learning, obtain all features in the $K$ horizons, and reduce model overfitting. After the encoder-decoder, we use a feedforward neural network to obtain the final output of the neural network, and the predicted output $\hat{x}^{nn}_{T+k}$ at time $T+k$ is

$$\hat{x}^{nn}_{T+k} = V^k_m\big(\mathrm{ReLU}(W^k_m o_{T_s} + b^k_{wm})\big) + b^k_m, \tag{16}$$

where $W^k_m \in \mathbb{R}^{N \times v}$, $b^k_{wm} \in \mathbb{R}^{N}$, $V^k_m \in \mathbb{R}^{N \times N}$, and $b^k_m \in \mathbb{R}^{N}$ are learnable parameters.
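As a concrete reading of Eqs. (15)-(16), here is a minimal NumPy sketch: a shared representation $o_{T_s}$ is built from the final encoder and decoder states, and a separate affine head maps it to the prediction for each horizon $k$. Shapes follow the text ($m$ hidden units, $v$ representation size, $N$ nodes, $K$ horizons); the random initialization is purely illustrative.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

m, v, N, K = 128, 128, 307, 12            # sizes from the experiment settings
rng = np.random.default_rng(0)

e_Ts = rng.normal(size=m)                 # final encoder hidden state
d_Ts = rng.normal(size=m)                 # final decoder hidden state

# Eq. (15): shared output representation
W_i, b_wi = rng.normal(size=(v, 2 * m)) * 0.01, np.zeros(v)
V_i, b_vi = rng.normal(size=(v, v)) * 0.01, np.zeros(v)
o_Ts = relu(V_i @ relu(W_i @ np.concatenate([e_Ts, d_Ts]) + b_wi) + b_vi)

# Eq. (16): one affine head per horizon k, all reading the same o_Ts
x_nn = np.empty((K, N))
for k in range(K):
    W_m, b_wm = rng.normal(size=(N, v)) * 0.01, np.zeros(N)
    V_m, b_m = rng.normal(size=(N, N)) * 0.01, np.zeros(N)
    x_nn[k] = V_m @ relu(W_m @ o_Ts + b_wm) + b_m   # prediction for horizon k, all N nodes
```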
Due to the non-linearity of the recurrent components, one major drawback of the neural network model is that the scale of its outputs is not sensitive to the scale of its inputs. However, for real traffic datasets, the scale of the input constantly changes in a non-periodic manner, which lowers the forecasting accuracy of the neural network model. To address this deficiency, following the previous work [11], we decompose the final prediction into a linear part obtained by an autoregressive (AR) model, which primarily focuses on the local scaling issue, plus a non-linear part containing recurring patterns. This stabilizes the gradient flow and makes the neural network easier to train. The output from the autoregressive part at time $T+k$ is

$$\hat{x}^{ar}_{T+k} = \sum_{j=0}^{T_{ar}-1} W^{k,j}_{ar}\, x_{T-j} + b^k_{ar}, \tag{17}$$

where $T_{ar}$ is the size of the input window over the short-term input, $W^{k,j}_{ar}$ is the $j$-th value of the learnable parameters $W^k_{ar} \in \mathbb{R}^{T_{ar}}$, and $b^k_{ar} \in \mathbb{R}$.

Then the final prediction of APTN is the integration of the outputs of the neural network and the AR component,

$$\hat{x}_{T+k} = \hat{x}^{nn}_{T+k} + \hat{x}^{ar}_{T+k}. \tag{18}$$
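A minimal sketch of the AR component and the final integration follows. We read Eq. (17) as applying one scalar weight per lag, shared across all nodes (since $W^k_{ar} \in \mathbb{R}^{T_{ar}}$); `x_nn` stands in for the output of Eq. (16).

```python
import numpy as np

T_ar, K, N = 24, 12, 307                   # window size, horizons, nodes (illustrative)
rng = np.random.default_rng(1)

x_hist = rng.normal(size=(T_ar, N))        # x_{T-j} for j = 0..T_ar-1, all N nodes
W_ar = rng.normal(size=(K, T_ar)) * 0.01   # one lag-weight vector per horizon k
b_ar = np.zeros(K)

# Eq. (17): linear AR forecast, lag weights shared across nodes
x_ar = np.stack([W_ar[k] @ x_hist + b_ar[k] for k in range(K)])   # (K, N)

# Eq. (18): final prediction = non-linear output + linear AR output
x_nn = rng.normal(size=(K, N))             # stand-in for the output of Eq. (16)
x_hat = x_nn + x_ar
```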
In our experiment, we adopt the squared error as the loss function of our model in training,

$$L = \frac{1}{SK} \sum_{i=1}^{S} \sum_{k=1}^{K} \left| \hat{x}^{i}_{T+k} - x^{i}_{T+k} \right|^2, \tag{19}$$

where $S$ is the number of training samples, and $\hat{x}^{i}_{T+k}$ and $x^{i}_{T+k}$ are the prediction and ground truth of the $i$-th sample in horizon $k$, respectively.
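Eq. (19) in code form, as a small sketch (the batch-axis layout is our choice; $|\cdot|^2$ is taken to sum over the node dimension):

```python
import numpy as np

def aptn_loss(x_hat, x_true):
    """Squared-error training loss of Eq. (19).
    x_hat, x_true: arrays of shape (S, K, N) -- S samples, K horizons,
    N nodes; the node dimension is summed inside |.|^2."""
    S, K = x_true.shape[:2]
    return float(np.sum((x_hat - x_true) ** 2) / (S * K))
```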
V. EXPERIMENTS

A. Settings

1) Dataset: To test the performance of our model, we use two large-scale public real-world datasets, PeMSD4 and PeMSD8, from California [6]. The data is collected in real time every 30 seconds and is aggregated into 5-minute intervals from the raw data.
1) PeMSD4: It refers to the traffic data in the San Francisco Bay Area, containing 3848 detectors on 29 roads, from which we choose 307 detectors. The time span of this dataset is from January to February in 2018.
2) PeMSD8: It is the traffic data in San Bernardino from July to August in 2016, containing 1979 detectors on 8 roads, from which we choose 170 detectors.
The data is split into training, validation, and test sets with a ratio of 6:2:2.
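For reference, the 30-second-to-5-minute aggregation can be reproduced with a simple resample; the column names below are hypothetical, not the datasets' actual schema.

```python
import pandas as pd

# One detector's raw readings, one every 30 seconds (hypothetical schema)
raw = pd.DataFrame({
    "timestamp": pd.date_range("2018-01-01", periods=20, freq="30s"),
    "detector_id": 400001,
    "flow": range(20),
})

# Sum the ten 30-second readings that fall into each 5-minute slot
five_min = (raw.set_index("timestamp")
               .groupby("detector_id")["flow"]
               .resample("5min").sum()
               .reset_index())
```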
2) Baselines: We compare APTN with the following widely used baselines:
1) Historical Average (HA): which models the traffic demand as a seasonal process and uses the average of previous seasons as the prediction. The period used is one week, and the prediction is based on aggregated data from the same time in previous weeks.
2) Auto-Regressive Integrated Moving Average (ARIMA) [20]: which is a generalization of the autoregressive moving average (ARMA) model, considering both moving average and autoregressive components.
3) Vector Auto-Regressive (VAR) [29]: which is also a time series model, capturing the pairwise relationships among all traffic flow series.
4) Long Short-Term Memory Network (LSTM): a special RNN for sequence modeling. The size of the LSTM unit is set to be the same as in our model.
5) Dual-stage Attention-based Recurrent Neural Network (DA-RNN) [15]: a dual-stage attention-based recurrent neural network for time series prediction. In the first stage, it introduces an input attention mechanism to adaptively extract relevant driving series at each time step by referring to the previous encoder hidden state. In the second stage, it uses a temporal attention mechanism to select relevant encoder hidden states across all time steps.
6) Geo-sensory Multi-level Attention Networks (GeoMAN) [13]: a multi-level attention network for geo-sensory time series prediction. It introduces global spatial attention to capture the correlation between the target time series of a sensor and the time series of other sensors, and uses local spatial attention to capture the correlation between a feature time series and other time series.
7) Spatial-Temporal Graph Convolution Network (STGCN) [24]: which comprises several spatio-temporal convolutional blocks, each a combination of graph convolutional layers and convolutional sequence learning layers, to model spatial and temporal dependencies.
8) Diffusion Convolutional Recurrent Neural Network (DCRNN) [12]: a diffusion convolutional recurrent neural network for traffic forecasting. The diffusion convolution operation builds a latent representation by scanning a diffusion process across each node in a graph-structured input, where the diffusion process is based on random walks on the graph.
9) Graph WaveNet [21]: a graph convolution network using a constructed adjacency matrix to uncover unseen graph structures from data. It proposes a CNN-based graph convolution layer in which a self-adaptive adjacency matrix can be learned from the data through end-to-end supervised training, where the self-adaptive adjacency matrix preserves hidden spatial dependencies.
10) Attention based Spatial-Temporal Graph Convolutional Networks (ASTGCN) [6]: an attention-based spatial-temporal graph convolutional network for traffic flow forecasting. It designs a novel spatial-temporal convolution module consisting of graph convolutions, which capture spatial features from the original graph-based traffic network structure, and convolutions in the temporal dimension, which describe dependencies from nearby time slices.

In our experiments, five commonly used metrics are used to evaluate the models: root mean square error (RMSE), mean absolute error (MAE), median absolute error (MdAE), mean absolute scaled error (MASE) [8], and mean absolute percentage error (MAPE). They are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{SK} \sum_{i=1}^{S} \sum_{k=1}^{K} \left| \hat{x}^{i}_{T+k} - x^{i}_{T+k} \right|^2}, \tag{20}$$

$$\mathrm{MAE} = \frac{1}{SK} \sum_{i=1}^{S} \sum_{k=1}^{K} \left| \hat{x}^{i}_{T+k} - x^{i}_{T+k} \right|, \tag{21}$$

$$\mathrm{MdAE} = \operatorname*{median}_{1 \le i \le S,\ 1 \le k \le K} \left| \hat{x}^{i}_{T+k} - x^{i}_{T+k} \right|, \tag{22}$$

$$\mathrm{MASE} = \frac{1}{SK} \sum_{i=1}^{S} \sum_{k=1}^{K} \frac{\left| \hat{x}^{i}_{T+k} - x^{i}_{T+k} \right|}{\mathrm{MAE}_{\text{in-sample}}}, \tag{23}$$

$$\mathrm{MAPE} = \frac{1}{SK} \sum_{i=1}^{S} \sum_{k=1}^{K} \left| \frac{\hat{x}^{i}_{T+k} - x^{i}_{T+k}}{x^{i}_{T+k}} \right|, \tag{24}$$

where $\hat{x}^{i}_{T+k}$ and $x^{i}_{T+k}$ are the prediction and ground truth of the $i$-th sample at the $k$-th horizon over all nodes, respectively, and $\mathrm{MAE}_{\text{in-sample}}$ is the mean absolute error of the twelve-step-forecast random walk method; its values on the PeMSD4 and PeMSD8 datasets are 42.66 and 35.48, respectively. In addition, on the PeMSD4 and PeMSD8 test sets, the RMSE and MAE of the random walk one-step forecast are (34.28, 21.06) and (24.82, 16.04).
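The five metrics of Eqs. (20)-(24) translate directly into NumPy; as a sketch, `mae_insample` is the random-walk MAE quoted above (42.66 for PeMSD4, 35.48 for PeMSD8).

```python
import numpy as np

def evaluate(x_hat, x_true, mae_insample):
    """Eqs. (20)-(24). x_hat, x_true: shape (S, K), predictions and
    ground truth for S samples over K horizons (per node or flattened)."""
    err = x_hat - x_true
    rmse = np.sqrt(np.mean(err ** 2))        # Eq. (20)
    mae = np.mean(np.abs(err))               # Eq. (21)
    mdae = np.median(np.abs(err))            # Eq. (22)
    mase = mae / mae_insample                # Eq. (23): MAE scaled by random-walk MAE
    mape = np.mean(np.abs(err / x_true))     # Eq. (24)
    return {"RMSE": rmse, "MAE": mae, "MdAE": mdae,
            "MASE": mase, "MAPE": mape}
```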
3) Hyperparameter Settings: We use grid search to find the optimal hyperparameters and optimization algorithm. From $n$ = [1, 2, 4, 6, 7, 8, 9, 10, 11, 12], $T_s$ = [1, 2, 4, 8, 12, 16, 24, 32], $m = v$ = [32, 64, 128, 256], batch size = [16, 32, 64, 128], learning rate = [0.1, 0.01, 0.001, 0.0001], dropout = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5], and optimizer = {SGD, Momentum, Adagrad, Adam}, we select the combination with which the model performs best on the validation set. One time slot of the PeMSD4 and PeMSD8 datasets is 5 minutes. We set $T_s$ to 24 (corresponding to two hours) and $n$ to 7 (corresponding to one week) for all datasets. For long-term temporal information, we set the period time interval $T_l$ to be one day, i.e., $T_l$ is 288 (288 × 5 mins = 1 day). The dimension of the hidden state of all LSTM units $m$ is set to 128, and the feature representation dimension $v$ is set to 128 as well. In our experiments, the batch size is set to 64, the learning rate is set to 0.001, and the Adam optimization algorithm is used to train the model. Both the dropout and recurrent dropout rates in the LSTM are set to 0.2.
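The reported search space and the selected configuration, restated as a small script; `train_and_eval` is a placeholder for a routine that trains the model under a given configuration and returns its validation error.

```python
from itertools import product

# Search space as reported above
GRID = {
    "n":          [1, 2, 4, 6, 7, 8, 9, 10, 11, 12],
    "Ts":         [1, 2, 4, 8, 12, 16, 24, 32],
    "m_v":        [32, 64, 128, 256],     # m and v searched jointly
    "batch_size": [16, 32, 64, 128],
    "lr":         [0.1, 0.01, 0.001, 0.0001],
    "dropout":    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "optimizer":  ["SGD", "Momentum", "Adagrad", "Adam"],
}

def grid_search(train_and_eval):
    """Exhaustively evaluate every combination on the validation set."""
    best_err, best_cfg = float("inf"), None
    for values in product(*GRID.values()):
        cfg = dict(zip(GRID.keys(), values))
        err = train_and_eval(cfg)
        if err < best_err:
            best_err, best_cfg = err, cfg
    return best_cfg

# Configuration the search settled on (per the text above)
BEST = {"n": 7, "Ts": 24, "Tl": 288, "m_v": 128,
        "batch_size": 64, "lr": 0.001, "dropout": 0.2, "optimizer": "Adam"}
```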
The source code and the two datasets are available at https://github.com/Maple728/APTN.

B. Results

Table II shows the results of our proposed method as compared to the other baselines on the PeMSD4 and PeMSD8 datasets, respectively. We report the average prediction results of traffic volume over the next one hour (K = 12). We can see that our proposed APTN outperforms all competing baselines, which suggests the effectiveness of the proposed approach for spatiotemporal correlation modeling. In Figure 5, we show the actual and predicted time series at different horizons
TABLE II: Comparison With Different Baselines on PeMSD4 and PeMSD8
Fig. 7. Peak and off-peak performance of all models.

For MAPE, due to the small volume in off-peak hours, it is higher than in peak hours. We also test the performance of all baselines during peak and off-peak periods, and the results are shown in Figure 7. We can see that our model outperforms all competing baselines.

C. Spatial Attention Analysis

The weights obtained from the spatial attention measure the importance of each node to the entire road network. Figure 8(a) illustrates the attention weights. The six roads marked by darker red are the most weighted roads. Next, we show how these six roads affect the prediction. Figure 8(b) shows how the traffic volume of each road changes with time, where the six solid red lines represent the selected six roads with the largest weights, and the black dotted lines represent the rest of the roads. The trends of the solid red lines at the top and bottom represent the trends of the high-traffic-volume and low-traffic-volume sets of roads. Because these two groups have fewer roads, only one road in each group has a higher attention weight. There are many roads with medium volume, which account for a large proportion of the weight in the entire graph and have various trends. Therefore, the four roads with large attention weights (the four solid red lines in the middle) represent the trend of medium traffic volume. It can be seen that the trend of each road with a higher attention weight reflects the main trend of its group, and these groups collectively reflect the main trend of the entire graph.

D. Ablation Analysis

Our proposed APTN model mainly consists of the following four components: spatial attention, temporal attention, periodical information, and AR. To further investigate the effectiveness of each component, we compare APTN with its variants as follows:
1) APTN/SA: remove the spatial attention. The input of the encoder LSTM is the concatenation of $z_{t,n+1}$ and $h^L_{t,n}$ instead of the weighted input $r_t$ and $h^L_{t,n}$. It means that our model deteriorates to a standard LSTM model
based on an encoder-decoder architecture with temporal attention.
2) APTN/TA: remove the temporal attention. The input of the decoder LSTM is the hidden state of the encoder LSTM $e_t$ instead of the context vector $\tilde{c}_t$.
3) APTN/PI: remove the periodical input. The long-term periodical component is removed, and the input of the encoder LSTM is just the weighted input $r_t$.
4) APTN/AR: remove the AR part. The prediction of our model is just the output of the neural network $\hat{x}^{nn}_{T+k}$.

Table IV shows the results of each model. We can see that among all components, spatial attention has the biggest effect. Without spatial attention, the RMSE increases from 31.00 to 34.78. The spatial attention can selectively focus on certain roads rather than treating all roads equally, and this data-driven attention mechanism captures the dynamic correlations as well.

From Table IV, AR has the second biggest effect. This shows that AR can better obtain the linear output, complementing the non-linear output from the neural network.

Fig. 9. Effect of hyperparameters.

VI. CONCLUSION

In this paper, we investigated the traffic prediction problem. We proposed a novel Attention-based Periodic-Temporal neural Network (APTN), which captures the spatial, temporal, and periodical correlations. When evaluated on real-world datasets, the proposed approach achieved better results than state-of-the-art baselines. However, one potential limitation of our model is that it has more parameters (2.2M for APTN versus 1.3M for ASTGCN), which means that our model needs more data to perform better than other baselines. For future work, we plan to take external factors, e.g., weather, social events, and POIs, into account to further improve the forecasting accuracy.

REFERENCES

[1] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, arXiv:1409.0473.
[2] K. Cho et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," 2014, arXiv:1406.1078.
[3] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," 2014, arXiv:1412.3555.
[4] D. Deng, C. Shahabi, U. Demiryurek, L. Zhu, R. Yu, and Y. Liu, "Latent space model for road networks to predict time-varying traffic," in Proc. KDD, 2016, pp. 1525–1534.
[5] X. Geng et al., "Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting," in Proc. AAAI Conf. Artif. Intell., vol. 33, Jul. 2019, pp. 3656–3663.
[6] S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan, "Attention based spatial-temporal graph convolutional networks for traffic flow forecasting," in Proc. AAAI Conf. Artif. Intell., vol. 33, Jul. 2019, pp. 922–929.
[7] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[8] R. J. Hyndman and A. B. Koehler, "Another look at measures of forecast accuracy," Int. J. Forecasting, vol. 22, no. 4, pp. 679–688, Oct. 2006.
[9] S. Ishak and H. Al-Deek, "Performance evaluation of short-term time-series traffic prediction model," J. Transp. Eng., vol. 128, no. 6, pp. 490–498, Nov. 2002.
[10] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach," Transp. Res. C, Emerg. Technol., vol. 85, pp. 591–608, Dec. 2017.
[11] G. Lai, W.-C. Chang, Y. Yang, and H. Liu, "Modeling long- and short-term temporal patterns with deep neural networks," in Proc. 41st Int. ACM SIGIR Conf. Res. Develop. Inf. Retr. (SIGIR), 2018, pp. 95–104.
[12] Y. Li, R. Yu, C. Shahabi, and Y. Liu, "Diffusion convolutional recurrent neural networks: Data-driven traffic forecasting," in Proc. ICLR, 2018, pp. 1–15.
[13] Y. Liang, S. Ke, J. Zhang, X. Yi, and Y. Zheng, "GeoMAN: Multi-level attention networks for geo-sensory time series prediction," in Proc. 27th Int. Joint Conf. Artif. Intell., Jul. 2018, pp. 3428–3434.
[14] I. Okutani and Y. J. Stephanedes, "Dynamic prediction of traffic volume through Kalman filtering theory," Transp. Res. B, Methodol., vol. 18, no. 1, pp. 1–11, Feb. 1984.
[15] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. Cottrell, "A dual-stage attention-based recurrent neural network for time series prediction," 2017, arXiv:1704.02971.
[16] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W. Wong, and W. Woo, "Convolutional LSTM network: A machine learning approach for precipitation nowcasting," in Proc. NIPS, 2015, pp. 802–810.
[17] X. Shi, Z. Gao, L. Lausen, H. Wang, and D.-Y. Yeung, "Deep learning for precipitation nowcasting: A benchmark and a new model," in Proc. NIPS, 2017, pp. 5617–5627.
[18] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 3104–3112.
[19] A. Vaswani et al., "Attention is all you need," in Proc. NIPS, 2017, pp. 5998–6008.
[20] B. M. Williams and L. A. Hoel, "Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results," J. Transp. Eng., vol. 129, no. 6, pp. 664–672, Nov. 2003.
[21] Z. Wu, S. Pan, G. Long, J. Jiang, and C. Zhang, "Graph WaveNet for deep spatial-temporal graph modeling," in Proc. 28th Int. Joint Conf. Artif. Intell., Aug. 2019, pp. 1907–1913.
[22] H. Yao, X. Tang, H. Wei, G. Zheng, and Z. Li, "Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction," in Proc. AAAI Conf. Artif. Intell., vol. 33, Jul. 2019, pp. 5668–5675.
[23] H. Yao et al., "Deep multi-view spatial-temporal network for taxi demand prediction," in Proc. AAAI, 2018, pp. 2588–2595.
[24] B. Yu, H. Yin, and Z. Zhu, "Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting," in Proc. 27th Int. Joint Conf. Artif. Intell., Jul. 2018, pp. 1–5.
[25] R. Yu, Y. Li, C. Shahabi, U. Demiryurek, and Y. Liu, "Deep learning: A generic approach for extreme condition traffic forecasting," in Proc. SIAM Int. Conf. Data Mining, 2017, pp. 777–785.
[26] J. Zhang, X. Shi, J. Xie, H. Ma, I. King, and D.-Y. Yeung, "GaAN: Gated attention networks for learning on large and spatiotemporal graphs," in Proc. UAI, 2018, pp. 1–10.
[27] J. Zhang, Y. Zheng, and D. Qi, "Deep spatio-temporal residual networks for citywide crowd flows prediction," in Proc. AAAI, 2017, pp. 1655–1661.
[28] J. Zhang, Y. Zheng, D. Qi, R. Li, and X. Yi, "DNN-based prediction model for spatio-temporal data," in Proc. 24th ACM SIGSPATIAL Int. Conf. Adv. Geographic Inf. Syst. (GIS), 2016, pp. 1–4.
[29] E. Zivot and J. Wang, "Vector autoregressive models for multivariate time series," in Modeling Financial Time Series with S-PLUS, 2006, pp. 385–429.

Xiaoming Shi received the B.S. degree from Dalian Maritime University in 2017. He is currently pursuing the master's degree in computer science with the Dalian University of Technology. His research interests include data mining and time series forecasting.

Heng Qi received the B.S. degree from Hunan University in 2004, and the M.E. and Ph.D. degrees from the Dalian University of Technology in 2006 and 2012, respectively. He has been a JSPS Overseas Research Fellow with the Graduate School of Information Science, Nagoya University, Japan, from 2016 to 2017. He is currently an Associate Professor with the School of Computer Science and Technology, Dalian University of Technology, China. His research interests include computer networks and multimedia computing.

Yanming Shen received the B.S. degree in automation from Tsinghua University in 2000, and the Ph.D. degree from the Department of Electrical and Computer Engineering, Polytechnic University (now NYU Tandon School of Engineering) in 2007. He is currently a Professor with the School of Computer Science and Technology, Dalian University of Technology, China. His general research interests include big data analytics, distributed systems, and networking. He is a recipient of the 2011 Best Paper Award for Multimedia Communications (awarded by the IEEE Communications Society).

Genze Wu is currently pursuing the bachelor's degree with the Dalian University of Technology, majoring in computer science. His research interests include data mining and computer vision.

Baocai Yin (Member, IEEE) received the M.S. and Ph.D. degrees in computational mathematics from the Dalian University of Technology, Dalian, China, in 1988 and 1993, respectively. He is currently a Professor of computer science and technology with the Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology. He is also a Researcher with the Beijing Key Laboratory of Multimedia and Intelligent Software Technology and the Beijing Advanced Innovation Center for Future Internet Technology. He has authored or coauthored more than 200 academic articles in prestigious international journals, including the IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), the IEEE Transactions on Multimedia (T-MM), the IEEE Transactions on Image Processing (T-IP), the IEEE Transactions on Neural Networks and Learning Systems (T-NNLS), the IEEE Transactions on Cybernetics (T-CYB), and the IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), and top-level conferences, such as CVPR, AAAI, INFOCOM, IJCAI, and ACM SIGGRAPH. His research interests include multimedia, image processing, computer vision, and pattern recognition.