Multi-Encoder Spatio-Temporal Feature Fusion Network For Electric Vehicle Charging Load Prediction
https://doi.org/10.1007/s10846-024-02125-z
REGULAR PAPER
Received: 2 March 2024 / Accepted: 28 May 2024 / Published online: 9 July 2024
© The Author(s) 2024
Abstract
Electric vehicles (EVs) have emerged as a preferred option for decarbonizing road transport. Accurate charging load prediction
is essential for the systematic construction of EV charging facilities and for the coordination of EV energy demand with
the requisite peak power supply. It is noted that the charging load of EVs exhibits high complexity and randomness due to
temporal and spatial uncertainties. Therefore, this paper proposes a SEDformer-based charging load prediction method to
capture the spatio-temporal characteristics of charging load data. As a deep learning model, SEDformer comprises multiple
encoders and a single decoder. In particular, the proposed model includes a Temporal Encoder Block based on the self-attention
mechanism and a Spatial Encoder Block based on the channel attention mechanism with sequence decomposition, followed
by an aggregated decoder for information fusion. It is shown that the proposed method outperforms various baseline models
on a real-world dataset from Palo Alto, U.S., demonstrating its superiority in addressing spatio-temporal data-driven load
forecasting problems.
Keywords Electric vehicle · Load forecasting · Transformer · Spatio-temporal features · Series decomposition
The existing research methodologies on EV charging load prediction can be broadly categorized into probability-based methods and artificial intelligence-based methods [6]. Probability-based methods were the first to be widely used in studying the operation patterns of EVs. For example, a statistical model [7], relying on a small number of samples of user daily driving data, is used to predict the temporal distribution of charging loads, where the impact of large-scale EV charging behavior on the power grid is then evaluated through a small-scale simulation. Ikegami et al. [8] analyze the impact of high penetration rates of EVs on the overall power demand of the electrical system. By employing probabilistic methods, Faridimehr et al. [9] utilize the Markov chain to model the EV charging network as a two-stage stochastic problem to estimate the charging load. It is noted that these methods often require the involvement of complex influencing factors during modeling; however, the variables included in these models are usually difficult to observe directly. For example, the battery state of charge (SOC), which serves as a fundamental variable, can only be estimated through internal resistance, charge-discharge current, and battery voltage [10]. This process tends to introduce biases and increase the difficulty of prediction.

In recent years, machine-learning-based approaches have emerged and have proven to be simple and effective for load prediction. Traditional machine learning methods, known for their ease of execution and rapid computation, are widely employed in time series forecasting. For instance, Liao et al. [11] propose a support vector machine (SVM) for load forecasting. Buzna et al. [12] employ two classical approaches, i.e., random forests (RF) and gradient boosting regression trees (GBRT), to compare predictive performance using EV load data from 169 charging stations in the Netherlands. Nevertheless, as these methods typically involve numerous parameters, the prediction accuracy tends to decline significantly as the temporal scale increases.

More recently, by means of deep learning (DL) theories and the abundance of computing resources, an increasing number of neural network models have been presented for larger-scale spatio-temporal prediction. Specifically, the long short-term memory neural network (LSTM) [13] is first applied in intelligent transportation systems (ITS) for traffic flow prediction. Convolution and LSTM are combined to create a Conv-LSTM [14, 15] module for extracting the spatial-temporal features of traffic flow, where LSTM captures temporal features and convolutional neural networks (CNN) [16] capture the spatial features of gridded discrete data. DGCRN [17] is a recurrent neural network (RNN)-based model, where the gate recurrent unit (GRU) [18] addresses the problem of gradient disappearing and dynamic graph convolution replaces matrix multiplication in the GRU. Nonetheless, real-world scenarios usually generate data in non-Euclidean spaces, raising challenges for most RNN-based or CNN-based methods. The increasing interest in leveraging the expressive abilities of graph structures for machine learning-based graph analysis has brought attention to graph convolutional networks (GCN) [19]. The spatio-temporal graph convolutional network (STGCN) [20, 21] generates prediction results by stacking spatio-temporal convolutional blocks, which are composed of temporal gated convolutional layers and GCN layers. The attention-based spatial-temporal graph convolutional network (ASTGCN) [22] is built upon STGCN by incorporating an attention mechanism into the spatio-temporal convolutional block to enhance the modeling of dynamic spatio-temporal correlations. The global spatio-temporal dynamic capturing network (GSTDCN) [23] describes dynamic traffic flow information through a combination of a global encoding block and GCN.

It is worth mentioning that the over-smoothing issue of GCN makes it difficult to stack multiple layers [24], and GCN-based models exhibit slow convergence speeds on large-scale node datasets. The Transformer [25] initially achieves remarkable success in natural language processing (NLP) [26]. Its encoder-decoder architecture is capable of capturing dependencies between sequence inputs. In particular, there has been an increasing number of Transformer-based models for time series forecasting. Autoformer [27] embeds a decomposition architecture in both the encoder and decoder, and replaces the multi-head attention mechanism of the vanilla Transformer with an auto-correlation mechanism to better extract features in long time series. Informer [28] reduces the time and space complexity of Transformer computation through ProbSparse self-attention and leverages a generative-style decoder to mitigate the prediction errors of long sequence time-series forecasting (LSTF). However, Transformer-based models often prioritize temporal information; the resulting insufficient utilization of spatial information leaves much room for improvement.

Motivated by the Transformer-based models, we develop a spatio-temporal prediction network, SEDformer, to tackle the task of EV load forecasting. The primary contributions of this study are outlined as follows:

• We propose SEDformer to address the EV long-term load prediction problem by capturing features in both the temporal and spatial dimensions.
• A multi-encoder architecture is employed to maximize the utilization of information. Temporal features are extracted through a self-attention mechanism-based Temporal Encoder Block, while spatial features, after sequence decomposition, are extracted through a Squeeze-and-Excitation block.
• Experiments on various time scales are conducted on an EV charging dataset in Palo Alto, U.S. SEDformer demonstrates excellent predictive performance, surpassing
all the baseline models based on different kinds of architectures.

2 Methodology

2.1 Problem Formulation

As a time series prediction problem, we aim to forecast EV charging load data for the next T time steps based on the historical data spanning H time steps, which can be formulated as follows:

\hat{Y}^{t+1}, \hat{Y}^{t+2}, \ldots, \hat{Y}^{t+T} = F\left(Y^{t}, Y^{t-1}, \ldots, Y^{t-H+1}\right),  (1)

where Y^{i} = [y_{1}^{i}, y_{2}^{i}, \ldots, y_{N}^{i}], Y^{i} \in \mathbb{R}^{N} is a vector of length N, and y_{j}^{i} signifies the load value of the j-th charging station at time step i. The mapping F denotes the proposed model, which endeavors to minimize prediction errors as much as possible.

2.2 Overall Architecture

We design SEDformer, featuring the S-E block and the trend decomposition block, to effectively harness the spatio-temporal information in the charging load data. SEDformer is structured as an encoder-decoder, which takes the raw data from the embedding layer as inputs and outputs the prediction of sequence data. The encoder consists of the Temporal Encoder Block (T-Encoder) and the Spatial Encoder Block (S-Encoder). T-Encoder employs a self-attention mechanism to preserve temporal information of the input data. Meanwhile, the S-Encoder utilizes trend decomposition to capture spatial information by yielding trend and fluctuation subsequences. Subsequences of the two modes are fed into individual S-E Blocks, which essentially employ a form of channel attention to establish positional connections between different charging stations. Then the outputs of T-Encoder and S-Encoder are fused through attention and gating mechanisms. Finally, the aggregated Decoder Block takes the fused output sequences from the encoder and masked sequences as inputs, ultimately generating the sequential prediction as output. The architecture of SEDformer is depicted in Fig. 1.
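To make the fusion step concrete, the following PyTorch-style sketch shows one generic way of combining the two encoder outputs with a learned gate. It is only an illustration under assumed tensor shapes and a sigmoid gate; it is not necessarily the exact attention-and-gating formulation used in SEDformer.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative gate for combining two encoder outputs of shape (B, d_model, L)."""
    def __init__(self, d_model: int):
        super().__init__()
        # The gate is computed from the concatenated temporal and spatial features.
        self.gate = nn.Conv1d(2 * d_model, d_model, kernel_size=1)

    def forward(self, temporal_out: torch.Tensor, spatial_out: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([temporal_out, spatial_out], dim=1)))
        # Convex combination of the two branches, weighted per channel and time step.
        return g * temporal_out + (1 - g) * spatial_out

# Example: fuse dummy encoder outputs for a batch of 8 sequences of length 96.
fusion = GatedFusion(d_model=64)
fused = fusion(torch.randn(8, 64, 96), torch.randn(8, 64, 96))
print(fused.shape)  # torch.Size([8, 64, 96])
```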
2.3 Temporal Encoder Block

Embedding To ensure comprehensive understanding and effective utilization of information, preprocessing of the input data is required before feeding it into the model. For the initial input X_raw ∈ R^(N×L), a one-dimensional convolution is applied to map the N-channel input tensor to a higher dimension, enhancing non-linear expressive capabilities. Meanwhile, the temporal features, including month, weekday, and date information of the input values, are also extracted using the convolution network for time embedding. Additionally, as the attention mechanism cannot inherently utilize the positional features in sequences, we employ sine and cosine functions for positional encoding [25]. The entire embedding module can be represented as:

X_{in} = PE(X_{raw}) + \mathrm{Conv1d}(X_{raw}) + \mathrm{TimeEmbed}(X_{raw}),  (2)

PE(pos, i) = \begin{cases} \sin\left(pos/10000^{\,i/d_{model}}\right), & i \text{ even}, \\ \cos\left(pos/10000^{\,i/d_{model}}\right), & i \text{ odd}, \end{cases}  (3)
Fig. 1 The complete structure of SEDformer. The input sequence is processed in parallel by the temporal encoder block and spatial encoder block,
then fused and input into the aggregated decoder block to produce the predicted sequence
where X_in ∈ R^(d_model×L), and d_model is the dimension of the features after the whole embedding. PE represents the positional encoding embedding. Decoder inputs share the same embedding scheme as encoder inputs.
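A minimal PyTorch-style sketch of such an embedding stack is given below: a one-dimensional convolution for the value embedding, sinusoidal positional encoding as in Eq. (3), and a convolutional projection of calendar features for the time embedding. The kernel sizes, the three calendar features, and the tensor layout are assumptions for illustration.

```python
import math
import torch
import torch.nn as nn

class DataEmbedding(nn.Module):
    """Value embedding (Conv1d) + sinusoidal positional encoding + time-feature embedding."""
    def __init__(self, n_stations: int, d_model: int, n_time_feats: int = 3, max_len: int = 5000):
        super().__init__()
        # Map the N-channel load series to d_model channels (value embedding).
        self.value_embed = nn.Conv1d(n_stations, d_model, kernel_size=3, padding=1)
        # Project calendar features (e.g. month, weekday, day) to d_model.
        self.time_embed = nn.Conv1d(n_time_feats, d_model, kernel_size=3, padding=1)
        # Precompute sinusoidal positional encodings, stored as (1, d_model, max_len).
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)   # even indices
        pe[:, 1::2] = torch.cos(pos * div)   # odd indices
        self.register_buffer("pe", pe.t().unsqueeze(0))

    def forward(self, x_raw: torch.Tensor, x_time: torch.Tensor) -> torch.Tensor:
        # x_raw: (B, N, L) load values; x_time: (B, n_time_feats, L) calendar features.
        L = x_raw.size(-1)
        return self.value_embed(x_raw) + self.time_embed(x_time) + self.pe[:, :, :L]

# Example: embed a batch of 8 windows of length 120 over 50 stations.
emb = DataEmbedding(n_stations=50, d_model=64)
x_in = emb(torch.randn(8, 50, 120), torch.randn(8, 3, 120))
print(x_in.shape)  # torch.Size([8, 64, 120])
```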
Multi-Head Attention The attention mechanism helps to selectively filter out the more important information from a vast pool of data and to focus on these crucial details while disregarding much of the remaining information. Considering the elements in the input source as key (K) and value (V), for the query (Q) of each input element, V is calculated based on the similarity between Q and K. The attention score is obtained by weighted summation of V. The attention mechanism can be formulated as follows:

\mathrm{Att}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V,  (4)

where d_k is the dimension of K. Self-attention is employed here; hence Q, K, and V are identical elements. Additionally, the sequence is decomposed, utilizing multiple heads to create subspaces that focus on different aspects of information, namely the multi-head attention mechanism [25], which is formulated as follows:

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_H)W^{O},  (5)

\mathrm{head}_j = \mathrm{Attention}(Q_j, K_j, V_j), \quad j = 1, \ldots, H,  (6)

Q_j = QW_j^{Q}, \quad K_j = KW_j^{K}, \quad V_j = VW_j^{V}, \quad j = 1, \ldots, H,  (7)

where W^{O}, W_j^{Q}, W_j^{K}, and W_j^{V} are the weight matrices of linear layers.

Feed-Forward Neural Network Layer normalization (LayerNorm) [29] operates independently of the batch size, computing the mean and variance for individual samples and normalizing across different channels of the sample. Following LayerNorm, a feed-forward network is utilized, comprising linear layers and an activation function, formulated as follows:

\mathrm{FFN}(x) = \max(0, xW_1 + b_1)W_2 + b_2,  (8)

where W_1, b_1, W_2, and b_2 are the parameters of the linear layers. Residual connections [30] are utilized in the encoder.
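Putting Eqs. (4)-(8) together, a single T-Encoder layer can be sketched as follows. The head count, hidden sizes, and the use of PyTorch's built-in multi-head attention are assumptions; the exact layer layout of SEDformer may differ.

```python
import torch
import torch.nn as nn

class TemporalEncoderLayer(nn.Module):
    """Self-attention (Eqs. 4-7) followed by LayerNorm, FFN (Eq. 8), and residual connections."""
    def __init__(self, d_model: int = 64, n_heads: int = 8, d_ff: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, L, d_model); Q = K = V = x for self-attention.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)          # residual connection around attention
        return self.norm2(x + self.ffn(x))    # residual connection around the FFN

# Example: encode a batch of 8 embedded sequences of length 120.
layer = TemporalEncoderLayer()
out = layer(torch.randn(8, 120, 64))
print(out.shape)  # torch.Size([8, 120, 64])
```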
2.4 Spatial Encoder Block

Trend Decomposition Time series decomposition [31] can partition the original data into trend components and periodic components. The trend components encapsulate the low-frequency characteristics of the data, while the periodic components capture the high-frequency features. The trend decomposition block is devised to effectively draw out distinct characteristics of the sequence. Since the training data is obtained using a sliding window, to prevent potential data leakage issues during the process of model training, a moving average strategy [27] is employed for data decomposition. The process of trend decomposition is depicted as follows:

X_{tr} = \mathrm{MovingAvg}(\mathrm{Padding}(X)),  (9)

X_{pe} = X - X_{tr},  (10)

where X, X_{tr}, X_{pe} ∈ R^(N×L). A padding strategy is required to maintain the same dimensions between the subsequences and the original load sequence.
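A minimal sketch of this moving-average decomposition, with replicate padding at both ends so that the trend series keeps the original length, is shown below; the kernel size is an assumption.

```python
import torch
import torch.nn.functional as F

def series_decomp(x: torch.Tensor, kernel_size: int = 25):
    """Split x of shape (B, N, L) into trend and periodic parts via a moving average."""
    # Replicate the boundary values so that average pooling preserves the length L.
    pad_left = x[:, :, :1].repeat(1, 1, (kernel_size - 1) // 2)
    pad_right = x[:, :, -1:].repeat(1, 1, kernel_size // 2)
    padded = torch.cat([pad_left, x, pad_right], dim=-1)
    trend = F.avg_pool1d(padded, kernel_size=kernel_size, stride=1)  # X_tr, Eq. (9)
    periodic = x - trend                                             # X_pe, Eq. (10)
    return trend, periodic

# Example: decompose a batch of 8 load windows over 50 stations and 120 days.
x = torch.randn(8, 50, 120)
x_tr, x_pe = series_decomp(x)
print(x_tr.shape, x_pe.shape)  # torch.Size([8, 50, 120]) torch.Size([8, 50, 120])
```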
S-E Block We introduce two Squeeze-and-Excitation blocks (S-E blocks) to extract inter-channel dependencies within sub-sequences at different scales. Squeeze-and-excitation networks (SENet) [32] are widely used in CNN-based models to capture relationships between different channels in images. Modifications have been made here to enable the application of SENet to sequential data. The workflow of a single S-E block is illustrated in Fig. 2. For the input X_in ∈ R^(d_model×L), a one-dimensional average pooling layer is utilized to compress the temporal information. Subsequently, two fully connected layers are applied to obtain weights for each channel, representing the weights for each charging station. The obtained weight vector is multiplied element-wise with X_in through the broadcasting mechanism, assigning different attention to the charging stations.
The calculation of a single S-E block follows this squeeze-and-excitation workflow.
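A minimal PyTorch-style sketch of such a sequence-oriented S-E block is given below: squeeze by average pooling over time, two fully connected layers, and channel-wise rescaling. The reduction ratio and the sigmoid gate follow common SENet practice and are assumptions here.

```python
import torch
import torch.nn as nn

class SEBlock1d(nn.Module):
    """Channel attention over the station (channel) dimension for sequences of shape (B, C, L)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool1d(1)   # compress the temporal dimension to one value
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.squeeze(x).squeeze(-1)     # (B, C): one descriptor per channel
        w = self.excite(w).unsqueeze(-1)    # (B, C, 1): per-channel attention weights
        return x * w                        # broadcast multiply: re-weight each channel

# Example: re-weight one decomposed subsequence for a batch of 8 windows.
se = SEBlock1d(channels=64)
out = se(torch.randn(8, 64, 120))
print(out.shape)  # torch.Size([8, 64, 120])
```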
3 Experiments

3.1 Dataset

We conduct experiments on a dataset containing usage records of EV charging stations in Palo Alto, U.S., which span from July 2011 to December 2020 [33]. To address potential data sparsity arising from the limited charging behavior of EVs at the early stage, the sampling interval is set to 1 day. The dataset is divided into training, validation, and testing sets in a ratio of 7:2:1. The locations of several selected charging stations, along with the historical load curves, are illustrated in Fig. 3.

3.2 Implementation Details

We conduct experiments using PyTorch [34] on a personal computer equipped with an AMD Ryzen 5 7500F CPU, 32 GB of RAM, and an NVIDIA GeForce RTX 4060Ti 16 GB GPU. In the experiments, the mean square error (MSE) loss and the Adam optimizer [35] are employed with an initial learning rate of 10^-4. The batch size is set to 32. The number of epochs is configured as 1000, and to prevent overfitting, an early stopping strategy is implemented during the training process. According to practical demands, we set the size of the look-back sliding window as L ∈ {30, 60, 120}, corresponding to the prediction length O ∈ {7, 15, 30}. For the 7-day and 15-day prediction lengths, SEDformer includes 1 layer of T-Encoder and 1 layer of Decoder. For the 30-day prediction length, SEDformer includes 2 layers of T-Encoder and 1 layer of Decoder.

3.3 Baselines

We select five different neural network models, including LSTM, a CNN-based model, a GCN-based model, and the Transformer family. They have all demonstrated satisfactory performance in time series prediction tasks. These models are evaluated on the same dataset as our proposed SEDformer for comparison.

(1) Long Short-Term Memory (LSTM): LSTM addresses the issue of gradient disappearing and exploding in traditional RNN models by introducing memory cells, enabling it to capture complex dependencies in long sequences.
(2) Convolutional Neural Network and Long Short-Term Memory (Conv-LSTM): Conv-LSTM transforms spatially distributed data into grid structures, then employs Conv2d and LSTM to extract spatial and temporal features from traffic data.
(3) Temporal Graph Convolutional Network (T-GCN) [36]: T-GCN incorporates GCN into the temporal model to capture complex geographical topological structures.
(4) Transformer: The Transformer features an encoder-decoder framework, where the attention mechanism is capable of capturing complex temporal dependencies.
(5) Informer: Informer introduces a ProbSparse self-attention mechanism along with a self-attention distilling operation to specifically address LSTF, further enhancing the process of inference through a generative-style decoder.
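As an illustration of the setup in Sect. 3.1 and 3.2, the following sketch builds sliding-window samples as in Eq. (1) and applies the 7:2:1 split described above. Array shapes and helper names are illustrative and are not the authors' code.

```python
import numpy as np

def make_windows(loads: np.ndarray, look_back: int, horizon: int):
    """Build (look_back, N) input / (horizon, N) target pairs from a (T, N) daily load matrix."""
    xs, ys = [], []
    for t in range(len(loads) - look_back - horizon + 1):
        xs.append(loads[t:t + look_back])
        ys.append(loads[t + look_back:t + look_back + horizon])
    return np.stack(xs), np.stack(ys)

# Example: daily loads for 50 stations, 30-day look-back window, 7-day prediction length.
loads = np.random.rand(1200, 50)
x, y = make_windows(loads, look_back=30, horizon=7)

# 7:2:1 split into training, validation, and test sets along the sample axis.
n = len(x)
n_train, n_val = int(0.7 * n), int(0.2 * n)
x_train, y_train = x[:n_train], y[:n_train]
x_val, y_val = x[n_train:n_train + n_val], y[n_train:n_train + n_val]
x_test, y_test = x[n_train + n_val:], y[n_train + n_val:]
print(x_train.shape, x_val.shape, x_test.shape)
```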
Fig. 3 Charging station nodes and historical load curves. (a) A partial map of Palo Alto city, with several charging stations denoted by a blue circle.
(b) The charging loads of five distinct charging stations over a 600-day period commencing in July 2011
3.4 Results

Experiments are conducted on three different time-scale datasets. Root mean square error (RMSE), mean absolute error (MAE), and weighted mean absolute percentage error (WMAPE) are used as three evaluation metrics to comprehensively assess the model's performance. The results are presented in Table 1. The metrics are computed as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{NL}\sum_{i=1}^{L}\sum_{j=1}^{N}\left(y_{j}^{i} - \hat{y}_{j}^{i}\right)^{2}},  (15)

\mathrm{MAE} = \frac{1}{NL}\sum_{i=1}^{L}\sum_{j=1}^{N}\left|y_{j}^{i} - \hat{y}_{j}^{i}\right|,  (16)

\mathrm{WMAPE} = \sum_{i=1}^{L}\sum_{j=1}^{N}\frac{y_{j}^{i}}{\sum_{i=1}^{L}\sum_{j=1}^{N} y_{j}^{i}} \cdot \frac{\left|y_{j}^{i} - \hat{y}_{j}^{i}\right|}{y_{j}^{i}},  (17)
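These metrics can be computed directly from the stacked predictions; a small NumPy sketch under assumed array shapes is:

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray):
    """Compute RMSE, MAE, and WMAPE for arrays of shape (L, N): L time steps, N stations."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))                      # Eq. (15)
    mae = np.mean(np.abs(err))                             # Eq. (16)
    # Eq. (17): for non-negative loads the weighted form simplifies to sum|err| / sum(y).
    wmape = np.sum(np.abs(err)) / np.sum(np.abs(y_true))
    return rmse, mae, wmape

# Example with dummy 7-day predictions for 50 stations.
y_true = np.random.rand(7, 50) * 80
y_pred = y_true + np.random.randn(7, 50)
print(evaluate(y_true, y_pred))
```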
Overall, SEDformer maintains its status as the top-performing model in the EV load forecasting task. The comparative analysis of all models is as follows. First, it is observed that, in most cases, an increase in the prediction length contributes to higher complexity in long-term load forecasting, which results in the model's accuracy tending to decrease. However, when the prediction length increases from 7 days to 15 days, LSTM, T-GCN, and SEDformer show a slight decrease in prediction errors. This phenomenon can be attributed to the enlarged look-back window size, which provides the models with more spatio-temporal information from historical data. Second, Conv-LSTM exhibits relatively weaker performance in the prediction task, owing to the difficulty introduced in predicting sparse grid data in the dataset from Palo Alto city, resulting in lower accuracy compared to SEDformer and other baseline models. Third, it is observed that LSTM and Transformer experience a significant performance decline when the prediction length increases to 30 days, while SEDformer shows a relatively small increase in errors, indicating its better capability in long-term load forecasting.
(Figure: EV charging load (kW) versus day, shown for 7-day, 15-day, and 30-day windows.)
Furthermore, compared to several similar methods, SEDformer exhibits the smallest prediction error.

4.2 Ablation Studies

To demonstrate the effectiveness of the proposed model, we design ablation experiments targeting each module. Experiments are conducted on four models: SEDformer, SEDformer without T-Encoder (SEDformer w/o T-Encoder), SEDformer without S-Encoder (SEDformer w/o S-Encoder), and SEDformer without the feature aggregation module (SEDformer w/o gating), using RMSE, MAE, and WMAPE metrics. It is noted that if the model involves only a single encoder, the feature aggregation module is unnecessary and thus excluded.

The results in Fig. 5 reveal that the removal of any module can result in a varying degree of performance degradation for SEDformer, which becomes more pronounced as the prediction horizon increases. Particularly for long-term load forecasting, such as 15 days, SEDformer w/o T-Encoder exhibits a conspicuous performance decline, indicating the necessity of T-Encoder for capturing long temporal features and improving predictive capability. It is also shown that S-Encoder helps extract implicit similar relationships among charging stations and enhance predictive performance. The gap between SEDformer w/o gating and SEDformer increases with the extension of the
prediction horizon, suggesting the pivotal role of the aggregation module in combining spatio-temporal information in complex scenarios. As a data-driven approach, the collaborative dual-encoder architecture of the proposed model effectively explores the dependencies between long-range and cross-regional information.

5 Conclusion

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
13. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
14. Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., Woo, W.-c.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. Adv. Neural Inform. Process. Syst. 28 (2015)
15. Liu, Y., Zheng, H., Feng, X., Chen, Z.: Short-term traffic flow prediction with Conv-LSTM. In: 2017 9th International Conference on Wireless Communications and Signal Processing, pp. 1–6 (2017)
16. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
17. Li, F., Feng, J., Yan, H., Jin, G., Yang, F., Sun, F., Jin, D., Li, Y.: Dynamic graph convolutional recurrent network for traffic prediction: benchmark and solution. ACM Trans. Knowl. Discov. Data 17, 1–21 (2023)
18. Cho, K., Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1724–1734 (2014)
19. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. Adv. Neural Inform. Process. Syst. 29 (2016)
20. Yu, B., Yin, H., Zhu, Z.: Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. ArXiv:1709.04875 (2017)
21. Zhang, D., Peng, Y., Zhang, Y., Wu, D., Wang, H., Zhang, H.: Train time delay prediction for high-speed train dispatching based on spatio-temporal graph convolutional network. IEEE Trans. Intell. Transp. Syst. 23(3), 2434–2444 (2021)
22. Guo, S., Lin, Y., Feng, N., Song, C., Wan, H.: Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 922–929 (2019)
23. Sun, H., Wei, Y., Huang, X., Gao, S., Song, Y.: Global spatio-temporal dynamic capturing network-based traffic flow prediction. IET Intell. Transp. Syst. (2023)
24. Chen, L., Wu, L., Hong, R., Zhang, K., Wang, M.: Revisiting graph based collaborative filtering: a linear residual graph convolutional network approach. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 27–34 (2020)
25. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Adv. Neural Inform. Process. Syst. 30 (2017)
26. Basly, H., Zayene, M.A., Sayadi, F.E.: Spatiotemporal self-attention mechanism driven by 3D pose to guide RGB cues for daily living human activity recognition. J. Intell. & Robot. Syst. 109(1), 2 (2023)
27. Wu, H., Xu, J., Wang, J., Long, M.: Autoformer: decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inform. Process. Syst. 34, 22419–22430 (2021)
28. Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., Zhang, W.: Informer: beyond efficient transformer for long sequence time-series forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11106–11115 (2021)
29. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. ArXiv:1607.06450 (2016)
30. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
31. Cleveland, R.B., Cleveland, W.S., McRae, J.E., Terpenning, I.: STL: a seasonal-trend decomposition. J. Off. Stat. 6(1), 3–73 (1990)
32. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
33. CityofPaloAlto: Electric vehicle charging station usage (July 2011 - Dec 2020). https://data.cityofpaloalto.org/dataviews/257812/electric-vehicle-charing-station-usage-july-2011-dec-2020/ (2023)
34. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inform. Process. Syst. 32 (2019)
35. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. ArXiv:1412.6980 (2014)
36. Hüttel, F.B., Peled, I., Rodrigues, F., Pereira, F.C.: Deep spatio-temporal forecasting of electrical vehicle charging demand. ArXiv:2106.10940 (2021)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yufan Chen received his B.S. degree from the School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China in 2022. He is currently pursuing his M.S. degree in the School of Automation at Southeast University. His current interests include time series prediction and intelligent transportation systems.

Mengqin Wang received her B.S. degree from Changsha University of Science & Technology, Changsha, China in 2022. She is currently pursuing her master's degree in the School of Automation at Southeast University. Her current research interest is electric vehicle routing problems with charging scheduling.

Yanling Wei received the Ph.D. degree in control science and engineering from the Harbin Institute of Technology, Harbin, China, in June 2014. He was a Research Fellow with the Department of Electrical Engineering and Computer Science, Technical University of Berlin, Berlin, Germany, and a Senior Research Fellow with Katholieke Universiteit Leuven (KU Leuven), Leuven, Belgium. He currently serves as a Full Professor with the School of Automation, Southeast University, Nanjing, China. His current research interests include collective behaviors of complex networks, intelligent decision-making and control, and biorobotics.

Xueliang Huang received the B.S., M.S., and Ph.D. degrees in electrical engineering from Southeast University, Nanjing, China, in 1991, 1994, and 1997, respectively. From 2002 to 2004, he was a Postdoctoral Researcher with the University of Tokyo. Since 2004, he has been a Professor with the Electrical Engineering Department, Southeast University. His research interests include novel wireless power transfer systems, analysis of electromagnetic field, applied electromagnetics, intelligent electricity technology, etc.

Shan Gao received the Ph.D. degree from Southeast University, Nanjing, China, in 2000. He is currently an Associate Professor with the School of Electrical Engineering, Southeast University, and the Jiangsu Provincial Key Laboratory of Smart Grid Technology and Equipment, Southeast University. His research interests include power system planning and operation, renewable energy integration, and active distribution networks.