4 Cycle-Based Signal Timing With Traffic Flow Prediction For Dynamic Environment
Physica A
journal homepage: www.elsevier.com/locate/physa
Article info
Article history: Received 30 December 2022; Received in revised form 21 April 2023; Available online 23 May 2023
Keywords: Traffic flow prediction; Data-model hybrid driven; Traffic signal control; Human–machine collaboration; Soft Actor–Critic
Abstract
This article studies adaptive traffic signal control problem of single intersection in dynamic environment. A novel cycle-based signal timing method with traffic flow prediction (CycleRL) is proposed to improve the traffic efficiency under dynamic traffic flow. Firstly, the empirical mode decomposition is applied to denoise the flow data. Then a data-model hybrid driven traffic flow prediction strategy is designed to predict the traffic flow, which combines a model-based Kalman filter and an LSTM network-based predictor and adopts another Kalman filter to fuse both prediction results to improve the prediction precision. Besides, a robust signal cycle timing strategy based on human–machine collaboration is developed to deal with dynamic traffic flow, which firstly designs a rule-based signal cycle scheme according to the predicted flow data as the preliminary scheme, and then finetunes the preliminary scheme based on Soft Actor–Critic (SAC) algorithm according to the real-time traffic dynamics. The experiments in both synthetic scenario and real-world scenario show that the proposed data-model hybrid driven traffic flow prediction algorithm has better prediction performance and the proposed CycleRL method outperforms rule-based methods, flow-based allocation methods and traditional reinforcement learning method. Moreover, it is also shown that the proposed CycleRL method has better transferability to bridge the discrepancy across domains.
© 2023 Elsevier B.V. All rights reserved.
1. Introduction
With the development of modern transportation systems, the problem of traffic congestion has attracted increasing
attention. From the perspective of traffic management, traffic signal control (TSC) is a commonly used and feasible
approach [1].
Traditional traffic signal control methods are generally based on rules. The rule-based methods, such as FIXTIME [2],
MAXQUEUE [3], and MAXPRESSURE [4,5], are either heuristic or from the experience of experts [6,7]. Moreover, traditional
traffic signal control methods also make certain assumptions about the traffic system [8], which are often not satisfied in
practical traffic environment, and thus not applicable to some practical environments with dynamic traffic flow.
In the past few years, as a powerful artificial intelligence paradigm for dynamic control, reinforcement learning (RL) has
shown great potential in the field of traffic signal control. Abdulhai et al. [9] attempted to control a two-phase signalized
intersection based on Q-learning. Prashanth and Bhatnagar [10] adopted linear function approximation reinforcement
∗ Corresponding author at: School of Automation, Southeast University, Nanjing, 210096, China.
E-mail address: [email protected] (Y. Zhang).
https://doi.org/10.1016/j.physa.2023.128877
0378-4371/© 2023 Elsevier B.V. All rights reserved.
Y. Li, G. Chen and Y. Zhang Physica A 623 (2023) 128877
learning to handle the curse of dimensionality. However, the Q table and linear function approximation used by the
above methods are not suitable when the traffic flow is highly variable. With the rapid development of deep
learning (DL), which can automatically capture features from data, deep reinforcement learning (DRL), which
combines RL and DL, has emerged to overcome the drawbacks of the Q table and linear function approximation.
Li et al. [11] proposed a traffic signal control method using the deep Q network (DQN) and achieved performance that
was superior to that of traditional rule-based and tabular reinforcement learning methods. In order to further improve the
traffic efficiency, Wei et al. [12] proposed a signal control strategy considering different phases in the network structure
on the basis of DQN. However, since the action of the above methods is to set the next phase, it may lead to instability
and safety problems [13]. Therefore, Nakanishi and Namerikawa [14] proposed a model-based reinforcement learning
algorithm for real-time timing of traffic signal cycle. Liang et al. [13] enriched traffic features by using images as state, and
finetuned the signal cycle to ensure traffic safety. Zheng et al. [15] showed that a complex state representation
might not be helpful for control, and might even degrade the control performance. The above RL-based control methods all
require a large amount of traffic data and a long training time; that is, their sample utilization rate is low [16,17].
Xiong et al. [16] proposed a signal control scheme to speed up the training process by pretraining the RL model based on
the samples obtained by traditional control methods. Alegre et al. [17] adopted a simple linear function approximation
to accelerate the learning process and it achieved the same control performance as the traditional control methods.
On one hand, using the observation of traffic flow in previous cycles for training may cause inappropriate and
suboptimal timing. On the other hand, the traffic flow in current cycle is prior information and cannot be utilized for RL-
based traffic signal control. There have been some studies that combine traffic signal control with traffic flow prediction.
Hu et al. [18] applied the traffic flow prediction of the next state as the input of the RL model to improve the traffic
efficiency. Nakanishi and Namerikawa [14] achieved real-time timing of traffic signal cycle by modeling the system
dynamics of the flow at intersection. In order to further improve the integration performance of prediction and control,
a hierarchical control idea was proposed [19] where RL model was adopted for real-time control and then the flow
prediction model was adopted to further adjust the action output by the RL model. However, the existing signal control
methods with flow prediction predict real-time flow information as a part of the RL model input at each step, which
largely increases the training time and design difficulty of the prediction model. The control of the RL model is also not
robust enough [20].
As to the traffic flow prediction, there are mainly two types of approaches. One is the model-based prediction method
such as Kalman filter. The other is model-free prediction method based on neural network. Xie et al. [21] applied discrete
wavelet transform to denoise the flow data, and then used the classic Kalman filter to predict the traffic flow. Wang et al.
[22] proposed an adaptive Kalman filter, taking into account both process and observation Gaussian noise to improve the
quality of denoising. Zhou et al. [23] designed a hybrid dual Kalman filtering method by compensating for the difference
in preliminary predictions. The key idea is to introduce another dynamic system to simulate the propagation of Kalman
filter prediction error and random walk model prediction error.
The prediction model based on Kalman filter has the advantages of simple implementation and optimization for
linear system, while it is not applicable to complex traffic systems with unmodeled traffic features, which influences
the prediction precision or even causes divergence of prediction error. As a result, deep neural network, especially the
recurrent neural network for time series data, is widely used in traffic flow prediction. A special recurrent neural network,
called Long Short-Term Memory (LSTM) [24], was developed to capture the temporal features over a long period of time.
Ma et al. [25] applied LSTM to predicting traffic speed from the data of remote microwave sensors. Tian and Pan [26]
used LSTM for traffic prediction and claimed that LSTM was superior to most of nonparametric models. Cai et al. [27]
proposed a noise immune LSTM network, whose loss function was noise-immune derived from the correlation entropy
criterion to eliminate the influence of non-Gaussian noise in traffic flow. Chen et al. [28] purified traffic flow data through
various smoothing models including the empirical mode decomposition (EMD) and wavelet filters with different bases
and predicted traffic flow with an LSTM model. Li et al. [29] also applied various smoothing models to denoising the
traffic flow data and proposed a Bi-LSTM model for prediction.
This paper focuses on designing a cycle-based signal timing strategy for single intersection based on traffic flow
prediction and DRL. Although the drawback of model-based prediction methods can be overcome by the model-free
prediction methods, the model-free methods may suffer from training instability and are thus not reliable enough
in practice [30]. To make full use of the advantages of model-based methods (such as Kalman filter) and deep neural
network (such as LSTM), this paper proposes a data-model hybrid driven traffic flow prediction strategy which combines
a Kalman filter and an LSTM network based predictor and adopts another Kalman filter to fuse both prediction results.
In addition, to reduce the training cost of the prediction model, we use the predictor to predict the traffic flow during
short term periodically rather than the real-time flow at each step. Then, a robust signal cycle timing strategy based on
the flow prediction and human–machine collaboration is designed. In the timing process, by using the predicted traffic
flow, a preliminary signal cycle timing algorithm is firstly designed based on some rules, and then a real-time signal cycle
timing algorithm based on Soft Actor–Critic is designed to finetune the preliminary scheme according to the real-time
traffic dynamics.
In summary, the major contributions of the paper are as follows.
• A data-model hybrid driven traffic flow prediction strategy is proposed to predict short-term traffic flow data, which
can reduce the training cost of the prediction model and has been evaluated to have higher prediction precision than
Kalman filter and LSTM network based predictor.
• A novel and robust signal cycle timing strategy based on flow prediction and human–machine collaboration, which
is composed of a rule-based preliminary signal cycle scheme and an SAC-based finetuning scheme, is designed to
improve the traffic efficiency. The proposed CycleRL also has some transferability.
The remainder of this paper is organized as follows. In Section 2 the proposed CycleRL is introduced. Traffic flow
data from a synthetic scenario and from a certain area of British highway are used in the experiments in Section 3 to
illustrate the effectiveness of the proposed method. Finally, some conclusions are drawn in Section 4.
2. Methods
As is shown in Fig. 1, the proposed cycle-based signal timing method with traffic flow prediction (CycleRL) in this
paper has two modules including a data-model hybrid driven traffic flow prediction module and a robust signal cycle
timing module based on human–machine collaboration.
Different from the existing flow prediction based signal control methods which predict real-time flow information
at each step, to reduce the training cost of the prediction model, this paper only predicts the short-term flow whose
computation period is larger than the duration of signal cycle. Firstly the historical traffic flow data is preprocessed by
the EMD to denoise it. After that, the denoised flow data is divided into the training set and validation set to estimate the
energy threshold and process noise covariance matrices and to train the LSTM prediction model. Once the prediction from
the traffic flow prediction module is obtained, it is adopted by the signal cycle timing module for training the real-time
timing model.
At the beginning of the kth computation period, input the traffic flow sequence of the previous n computation periods
to the traffic flow prediction module to obtain the flow prediction of the kth computation period. The length of the traffic
flow sequence n is arbitrarily set. Then input the flow prediction as the reference flow of the kth computation period to
the signal cycle timing module since the actual flow of the kth computation period cannot be obtained at this moment.
At the end of each signal cycle, the timing module will output the practical signal cycle program to the intersection
as the signal timing scheme of the next signal cycle in reaction to the real-time traffic dynamics.
• Firstly, by taking the raw traffic flow data as the original signal, the extreme points are obtained from it, and then
the interpolation algorithm is used to connect the extreme points to obtain the upper and the lower envelopes of
the signal.
• Secondly, the mean value of the upper and lower envelopes is subtracted from the original signal to obtain the
intermediate signal. It is checked whether the intermediate signal satisfies the condition for becoming an intrinsic
mode function (IMF) of the original signal, i.e., the difference between the number of extreme points and the number
of zero-crossing points does not exceed 1 and the mean value of the upper and lower envelopes is 0 in the given
historical data set. If not, repeat the first two steps until an IMF appears.
• Next, subtract the IMF from the original signal to obtain a new signal, which replaces the original data, and repeat
the first three steps until the new signal is monotone or has only one extreme point. This final signal is called the
residual signal.
• Finally, add the residual signal and the IMFs whose energy is larger than the energy threshold of noise to recover
the denoised traffic flow data. Here the energy of an IMF is defined as the mean of the squares of the signal values,
which corresponds to the energy level. The flow data is denoised at each candidate energy level, and the prediction
model is trained and tested on the denoised flow data; the energy threshold is then set to the energy level under
which the minimum prediction error is obtained on the validation set.
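The final reconstruction step above can be sketched as follows. This is a minimal illustration, assuming the IMFs and the residual signal have already been extracted by some EMD routine; the function names `imf_energy` and `reconstruct_denoised` are hypothetical and not taken from the paper.

```python
def imf_energy(imf):
    """Energy of an IMF: the mean of the squared signal values."""
    return sum(x * x for x in imf) / len(imf)


def reconstruct_denoised(imfs, residual, energy_threshold):
    """Add the residual and every IMF whose energy exceeds the noise threshold.

    imfs: list of equal-length sequences (assumed output of an EMD routine).
    residual: sequence of the same length (the monotone residual signal).
    """
    denoised = list(residual)
    for imf in imfs:
        if imf_energy(imf) > energy_threshold:
            denoised = [d + x for d, x in zip(denoised, imf)]
    return denoised


# Toy example: the first IMF is low-energy "noise", the second carries signal.
imfs = [[0.1, -0.1, 0.1, -0.1], [2.0, -2.0, 2.0, -2.0]]
residual = [10.0, 11.0, 12.0, 13.0]
denoised = reconstruct_denoised(imfs, residual, energy_threshold=1.0)
```

In practice the threshold would be chosen by the validation procedure described above, that is, by trying each candidate energy level and keeping the one with the smallest prediction error.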
$$
\begin{aligned}
\hat{\theta}(k \mid k-1) &= F\,\hat{\theta}(k-1 \mid k-1),\\
P(k \mid k-1) &= F P(k-1 \mid k-1) F^{\top} + Q,\\
K(k) &= P(k \mid k-1) H^{\top}(k)\left(H(k) P(k \mid k-1) H^{\top}(k) + O\right)^{-1},
\end{aligned}
$$
where θ̂ (k|k − 1) and θ̂ (k|k) are respectively the prior and posterior estimates of the coefficients in the linear
combination at period k, obtained by the Kalman filter algorithm; P (k|k − 1) and P (k|k) are respectively the covariance
matrices of the prior estimate and the posterior estimate at period k; F represents the system matrix, which is the
n-dimensional identity matrix In since the change of traffic flow between every two prediction periods is gentle; O is
the covariance matrix of the observation noise, which is set to the zero matrix since during training the true value of the
observation data can be obtained; v (k) is the latest traffic flow value; Q is the process noise covariance matrix, which is
obtained by minimizing the following performance indicator according to the EM algorithm:
$$
-\ln(L(Q)) = \sum_{k=1}^{N}\left\{\ln(X(k)) + Y^{\top}(k) X^{-1}(k) Y(k)\right\} + C, \tag{2}
$$
where
C is a constant and N is the amount of the historical traffic flow data. Therefore, the prediction value of traffic flow at
period k + 1 based on Kalman filter is:
Remark 1. The traffic flow prediction methods in most of existing works are either model-driven or data-driven
[21,22,26,27]. There is also a work adopting the fusion of Kalman filter and the random walk model [23] to predict
traffic flow. This paper proposes a novel data-model hybrid driven prediction strategy which designs another Kalman
filter to combine the predictions from Kalman filter and LSTM network to improve the prediction precision.
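The Kalman recursion given earlier in this subsection, with F equal to the identity and a scalar flow observation, can be sketched in plain Python as follows. The posterior update lines are the textbook Kalman update and are an assumption here, since only the prediction and gain equations appear in this section. The names `kalman_step`, `predict_flow`, `h` (one row of H(k), taken to hold the previous n flow values) and `obs_var` (the scalar counterpart of O) are illustrative.

```python
def kalman_step(theta, P, h, v, Q, obs_var=0.0):
    """One predict/update step with F = I and a scalar observation v = h . theta.

    theta: coefficient estimate (list of n floats)
    P, Q:  n x n symmetric covariance matrices (lists of lists)
    h:     observation row (assumed to be the previous n flow values)
    """
    n = len(theta)
    # Predict: F is the identity, so the prior mean is theta and P grows by Q.
    P_prior = [[P[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    # Gain: K = P h^T (h P h^T + O)^-1, where the bracket is a scalar here.
    Ph = [sum(P_prior[i][j] * h[j] for j in range(n)) for i in range(n)]
    S = sum(h[i] * Ph[i] for i in range(n)) + obs_var
    K = [Ph[i] / S for i in range(n)]
    # Textbook posterior update (assumed): theta + K * innovation, (I - K h) P.
    innovation = v - sum(h[i] * theta[i] for i in range(n))
    theta_post = [theta[i] + K[i] * innovation for i in range(n)]
    # Uses the symmetry of P_prior: (h P_prior)[j] == Ph[j].
    P_post = [[P_prior[i][j] - K[i] * Ph[j] for j in range(n)] for i in range(n)]
    return theta_post, P_post


def predict_flow(theta, h_next):
    """Flow prediction as a linear combination of recent flow values."""
    return sum(c * x for c, x in zip(theta, h_next))


theta, P = kalman_step(
    theta=[0.5, 0.5],
    P=[[1.0, 0.0], [0.0, 1.0]],
    h=[10.0, 12.0],
    v=13.0,
    Q=[[0.1, 0.0], [0.0, 0.1]],
)
```

With `obs_var` set to zero, matching the choice of O as the zero matrix, the updated coefficients reproduce the latest observation exactly, so a nonzero Q is what keeps the innovation variance positive from one period to the next.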
This subsection designs a signal cycle timing module by using the rule and reinforcement learning algorithm and the
fused prediction v̂ (k) is used by the rule to obtain the preliminary signal cycle scheme.
on the allowable directions of phase i, $v_{in,j}$ represents the flow of entry lane j on the allowable directions of phase i, T is
the total duration of the signal cycle and $g_{sac\_i} \in \{-1, 0, 1\}$ is the output of the real-time timing algorithm in phase i to
finetune the preliminary rule-based signal cycle scheme. In order to keep T unchanged, the overall output of the real-time
timing algorithm in all phases should satisfy the constraint $\sum_{i=1}^{phase\_num} g_{sac\_i} = 0$.
The state transition probability is determined by the intersection environment and the reward function is defined by
$R = -\sum_{j=1}^{lane\_num} q_{in,j}$, where $q_{in,j}$ is the length of the real-time queue on the entry lane j. Since the goal is to maximize the
cumulative rewards, the negative value of the sum of the queue lengths is taken as the reward.
An example is shown in Fig. 2. In a two-phase intersection, there are two entry lanes in the east–west direction and
two entry lanes in the north–south direction. Therefore, the state space is {v1 , v2 , v3 , v4 , f1 , f2 , f3 , f4 }, where v1 , v2 , v3 , v4
denote the traffic flow predictions of the four entry lanes obtained from the prediction module, and f1 , f2 , f3 , f4 denote the
current number of vehicles on each entry lane. As Fig. 2 shows, at this moment, f1 = 2, f2 = 1, f3 = 0, f4 = 3. The reward
is set as the negative value of the sum of the queue lengths on the four entry lanes. Note that the queue length on lane 1 is
1 since a vehicle on lane 1 is still moving, and thus the reward equals −5.
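The example of Fig. 2 can be reproduced with a few lines of Python. This is only a sketch: the helper names `build_state` and `reward` are hypothetical, and the predicted flows used below are made-up placeholder values, since the figure only specifies the vehicle counts and queue lengths.

```python
def build_state(predicted_flows, vehicle_counts):
    """State = predicted flow of each entry lane followed by its vehicle count."""
    return list(predicted_flows) + list(vehicle_counts)


def reward(queue_lengths):
    """Negative sum of the queue lengths on all entry lanes."""
    return -sum(queue_lengths)


# Placeholder predictions (assumed values), vehicle counts f1..f4 from Fig. 2.
state = build_state([12.0, 9.0, 7.0, 15.0], [2, 1, 0, 3])
# Lane 1 holds two vehicles but one is still moving, so its queue length is 1.
r = reward([1, 1, 0, 3])
```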
In order to train the timing model more stably and faster, a target Q network is adopted whose structure is the
same as that of the Q network and whose parameters are updated smoothly from the Q network by
$\tilde{\phi} \leftarrow (1 - \tau)\tilde{\phi} + \tau\phi$, where τ is the exponential moving average coefficient, which is often set to 0.01.
Besides, the SAC algorithm uses two Q networks (and correspondingly two target networks) to deal with the deviation
problem of Q-value estimation, that is
$$
Q_{\phi}(s, a) = \min\left(Q_{\phi_1}(s, a),\, Q_{\phi_2}(s, a)\right).
$$
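The two stabilization devices in this paragraph can be sketched on flat parameter lists as follows. The sketch assumes the common convention in which the target network keeps weight 1 − τ on its previous parameters and takes weight τ from the Q network, which is what makes the update smooth for τ = 0.01; the function names are illustrative.

```python
def soft_update(target_params, online_params, tau=0.01):
    """Exponential moving average of the Q-network parameters (assumed convention)."""
    return [(1.0 - tau) * t + tau * p for t, p in zip(target_params, online_params)]


def clipped_double_q(q1_values, q2_values):
    """Element-wise minimum of the two Q estimates, one entry per action."""
    return [min(a, b) for a, b in zip(q1_values, q2_values)]


new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.01)
q_min = clipped_double_q([1.0, 3.0], [2.0, 0.5])
```

Taking the minimum of the two estimates makes the value used in the target pessimistic, which counteracts the tendency of a single Q network to overestimate.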
The regularization coefficient α is used to coordinate the importance of the cumulative rewards and the entropy of the
policy, and it can be adaptively adjusted by minimizing the following loss function:
$$
J(\alpha) = \mathbb{E}_{s \sim D}\left[\sum_{a} \pi_{\theta}(a \mid s)\left(-\alpha \log \pi_{\theta}(a \mid s) + \alpha\kappa\right)\right], \tag{9}
$$
where κ is a hyperparameter representing the target entropy and πθ (a|s) represents the output value of the policy
network. The loss functions of the policy network and the Q network are as follows:
$$
J_Q(\phi) = \mathbb{E}_{(s,a,r,s') \sim D}\left[\left(Q_{\phi}(s, a) - y(r, s')\right)^{2}\right],
$$
$$
y(r, s') = r + \gamma \sum_{a'} \pi_{\theta}(a' \mid s')\left(Q_{\tilde{\phi}}(s', a') - \alpha \log \pi_{\theta}(a' \mid s')\right),
$$
where $Q_{\phi}(s, a)$ represents the value of the state–action pair, that is, the output value of the Q network; $y(r, s')$ indicates
the target output value of the Q network; $Q_{\tilde{\phi}}(s', a')$ represents the value of the corresponding target network; $a'$ is the
Table 1
The settings of generation function.
Direction1 Direction2
(µ1 , µ2 ) (48, 102) (42, 108)
(σ1 , σ2 ) (10, 20) (15, 25)
σ3 1 2
3. Experiments
The commonly used traffic simulator Cityflow [31] with a synthetic dataset and a real-world dataset from British
highway is applied to verify the feasibility and effectiveness of the proposed method. To reflect the portability of the
algorithm, both two-phase and four-phase intersections are considered. Firstly, divide both datasets into the training set
and validation set respectively. The training set serves as historical traffic flow data which is used to compute the energy
threshold for the EMD and the process noise covariance matrices. It is also used for the training of the proposed traffic flow
prediction module. Then, based on the trained traffic flow prediction module, the signal cycle timing module is trained
to improve the traffic efficiency under both the synthetic scenario and the real-world scenario by using corresponding
validation sets.
A synthetic scenario with two phases as is shown in Fig. 2 is generated to test the feasibility of the proposed method,
where the flow is generated through sampling from a double normal distribution function double_normal (µ1 , µ2 , σ1 , σ2 )
and translating from the sampled probability to simulate morning and evening traffic peak. Assume that the flow of two
directions follows two independent double normal distributions with different expectations and variances added with
white Gaussian noise ∆ ∼ N (0, σ3 ), where the expectation represents the time of traffic peak. Besides, the flow value is
limited between 5 s/vehicle and 20 s/vehicle. The specific parameters are listed in Table 1.
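One possible reading of this generation procedure is sketched below. Several points are assumptions rather than details given in the text: the two normal components are mixed with equal weights, the time axis is a sequence of integer indices over one day, the density is mapped linearly to a headway so that the peak gets the shortest headway, and σ3 is treated as a standard deviation. The function names are hypothetical.

```python
import math
import random


def double_normal(t, mu1, mu2, sigma1, sigma2):
    """Equal-weight mixture of two normal densities evaluated at time index t."""
    def pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return 0.5 * pdf(t, mu1, sigma1) + 0.5 * pdf(t, mu2, sigma2)


def synthetic_headways(n_steps, mu1, mu2, sigma1, sigma2, sigma3,
                       lo=5.0, hi=20.0, seed=0):
    """Headways in s/vehicle: short at the two peaks, long off-peak, clipped to [lo, hi]."""
    rng = random.Random(seed)
    density = [double_normal(t, mu1, mu2, sigma1, sigma2) for t in range(n_steps)]
    d_max = max(density)
    headways = []
    for d in density:
        base = hi - (hi - lo) * d / d_max      # highest density -> shortest headway
        noisy = base + rng.gauss(0.0, sigma3)  # additive white Gaussian noise
        headways.append(min(hi, max(lo, noisy)))
    return headways


# Direction 1 parameters from Table 1; 144 time indices per day is an assumption.
noisy_day = synthetic_headways(144, 48, 102, 10, 20, sigma3=1.0)
clean_day = synthetic_headways(144, 48, 102, 10, 20, sigma3=0.0)
```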
Traffic flow data of seven days is generated in the synthetic scenario where the data of the first five days is adopted
to train the traffic flow prediction module, while that of the last two days is adopted to test it. Moreover the traffic flow
data of day 6 is used to train the signal cycle timing module whose transferability is tested by using the traffic flow data
of day 7. The generated traffic flow on day 6 is shown in Fig. 3(a).
As to the real-world dataset on the British highway, the intersection of the M4 road and the y57 road is chosen, and
its traffic flow data in October 2019 is applied. At this intersection there are also two phases for two directions which is
the same as Fig. 2 and the flow data of one day is given in Fig. 3(b). The traffic flow data of the first 25 days is adopted to
train the traffic flow prediction module, and the other data is used as the test set. Similar to the synthetic scenario, the
traffic flow data of the first 4 days in the test set is adopted to train the signal cycle timing module and that of the last
day is adopted to test its transferability.
To further illustrate the portability of the proposed model, a four-phase intersection shown in Fig. 4 is used and the
flow data is generated through sampling from a uniform distribution function whose parameters are respectively set to
2 s/vehicle and 20 s/vehicle. Besides, the simulation time is set to 2 hr to show the different performance of these models
in terms of throughput. Other settings are the same as in the two-phase intersection environment.
Table 2
The hyperparameters used by the LSTM model.
Hyperparameter Value
Number of layers 3
The dimension of hidden vector 128
Batch size 50
Train epochs 30
Learning rate 0.001
• Kalman filter: directly use a Kalman filter to predict the traffic flow.
• EMD + Kalman filter: preprocess the raw flow data by the EMD, then input the denoised flow data to a Kalman filter
to predict the traffic flow.
• LSTM: directly use an LSTM network to predict the traffic flow.
• EMD + LSTM: preprocess the raw flow data by the EMD, then input the denoised flow data to an LSTM network to
predict the traffic flow.
• EMD + dual Kalman filter + LSTM (proposed): preprocess the raw flow data by the EMD, then input the denoised flow
data to a Kalman filter and an LSTM network to predict the traffic flow of the next computation period respectively,
finally use another Kalman filter to fuse the two predictions.
During the training process of the LSTM model, the Adam optimizer is adopted and the hyperparameters used by the
LSTM model are listed in Table 2.
In order to demonstrate the effectiveness of the proposed CycleRL, we apply three traditional methods including
FIXTIME, MAXQUEUE and MP-TT, three allocation methods based on three kinds of traffic flow, three CycleRLs based
on three kinds of traffic flow and a pure RL based method for comparison.
• FIXTIME [2]: assign the same time to each signal of one cycle.
• MAXQUEUE [3]: always allow the direction that has the longest queue length to pass (it is a simplified version of
the max pressure algorithm [4] under a single intersection).
Table 3
The hyperparameters used by the CycleRL and PureRL
methods.
Hyperparameter Value
Number of layers 3
The dimension of hidden vector 128
Batch size 50
Train episodes 30
Learning rate 0.001
exponential moving average coefficient τ 0.01
Discount factor γ 0.99
Explore steps τ 200
Reward normal factor 50
The size of replay buffer 1000
Gradient update steps 3
• MP-TT [32]: compute the pressure of each phase (the absolute difference between the average travel time of vehicles
on entry lanes and that on exit lanes) and allocate the next signal cycle in proportion to the pressure.
• REALCUR: allocate the signal cycle scheme of the next computation period in proportion to the traffic flow of the
previous computation period.
• REAL: assume the true traffic flow of the next computation period is available and allocate the signal cycle scheme
in proportion to the true traffic flow.
• PRED (proposed): allocate the signal cycle scheme of the next computation period in proportion to the predicted
traffic flow obtained by the proposed data-model hybrid driven traffic flow prediction module.
• CycleRL_REALCUR: apply the proposed robust signal cycle timing module based on human–machine collaboration
to finetune the signal cycle scheme allocated by REALCUR.
• CycleRL_REAL: apply the proposed robust signal cycle timing module based on human–machine collaboration to
finetune the signal cycle scheme allocated by REAL.
• CycleRL_PRED (proposed): apply the proposed robust signal cycle timing module based on human–machine
collaboration to finetune the signal cycle scheme allocated by PRED.
• PureRL [33]: directly apply the reinforcement learning algorithm (SAC) to allocate the signal cycle scheme of the
next computation period.
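The flow-proportional allocation shared by REALCUR, REAL and PRED, and the finetuning step added by the CycleRL variants, can be illustrated as follows. This is a sketch under stated assumptions: flows are given per phase in vehicles per unit time, each finetuning output in {−1, 0, 1} shifts a phase by one second, and the names `allocate_green` and `finetune` are hypothetical.

```python
def allocate_green(cycle_length, phase_flows):
    """Split the cycle among phases in proportion to their (reference) flows."""
    total = sum(phase_flows)
    return [cycle_length * f / total for f in phase_flows]


def finetune(green_times, adjustments, step=1.0):
    """Apply per-phase adjustments in {-1, 0, 1} that sum to zero.

    The zero-sum condition keeps the total cycle duration unchanged;
    the 1-second step size is an assumption.
    """
    if any(g not in (-1, 0, 1) for g in adjustments) or sum(adjustments) != 0:
        raise ValueError("adjustments must lie in {-1, 0, 1} and sum to zero")
    return [t + step * g for t, g in zip(green_times, adjustments)]


preliminary = allocate_green(60, [30, 10])   # reference flows of the two phases
final = finetune(preliminary, [1, -1])       # shift one second to phase 1
```

The three allocation baselines differ only in which flow is passed as `phase_flows`: the flow of the previous computation period, the true flow of the next period, or the predicted flow.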
The main difference among CycleRL_PRED, CycleRL_REALCUR and CycleRL_REAL is the flow data used for preliminary
timing. Moreover, for fair comparison, the CycleRL and the PureRL methods use the same hyperparameters listed in
Table 3.
3.2.2. Metrics
For prediction, there are two metrics used to compare the performance of different traffic flow prediction models.
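For reference, three of the error measures that appear as columns in Table 6 (RMSE, MAE and MAPE) can be computed as below. The sketch uses their textbook definitions, which are assumed here and may differ in detail from the exact formulas used in the experiments.

```python
import math


def rmse(true, pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def mae(true, pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def mape(true, pred):
    """Mean absolute percentage error as a fraction; true values must be nonzero."""
    return sum(abs(t - p) / abs(t) for t, p in zip(true, pred)) / len(true)
```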
Fig. 5. The predicted flows and true flows in the synthetic scenario.
Table 4
The overall average queue length in the synthetic scenario under different
methods.
Day6 Day7(only tested)
FIXTIME 13.3753 11.5413
MAXQUEUE 7.6748 7.6748
MP-TT 8.4375 7.6979
REALCUR 9.7548 10.2757
REAL 9.2920 9.5224
PRED 9.4680 9.6116
CycleRL_REALCUR 9.6690 9.9527
CycleRL_REAL 9.2033 9.4302
CycleRL_PRED 9.4371 9.5752
PureRL 17.1508 11.5665
CycleRL_REALCUR(DDPG) 9.4517 10.0638
CycleRL_REAL(DDPG) 9.1089 9.4188
CycleRL_PRED(DDPG) 9.3437 9.4974
PureRL(DDPG) 13.5613 12.6817
CycleRL_REALCUR(PPO) 9.6193 10.0953
CycleRL_REAL(PPO) 9.2153 9.4400
CycleRL_PRED(PPO) 9.4263 9.6146
PureRL(PPO) 13.4421 11.1745
For signal timing, there are also two main metrics for comparing the efficiency of the intersection: the average queue
length of the intersection (the sum of the queue lengths on all entry lanes after one signal cycle) and the average travel
time of vehicles. In addition, throughput (the number of passed vehicles) is used to illustrate the effectiveness of the proposed model.
Table 5
The overall average travel time in the synthetic scenario under different
methods.
Day6 Day7(only tested)
FIXTIME 195.5914 65.6354
MAXQUEUE 119.0461 62.7847
MP-TT 137.9947 63.1472
REALCUR 113.3493 62.7211
REAL 106.2333 60.1429
PRED 109.7421 60.7034
CycleRL_REALCUR 111.0151 61.7619
CycleRL_REAL 104.9461 59.7644
CycleRL_PRED 106.8260 60.3319
PureRL 307.1430 67.1269
CycleRL_REALCUR(DDPG) 108.6481 62.1491
CycleRL_REAL(DDPG) 102.6855 59.8867
CycleRL_PRED(DDPG) 106.6224 60.2973
PureRL(DDPG) 140.1753 84.0975
CycleRL_REALCUR(PPO) 111.0972 62.2904
CycleRL_REAL(PPO) 104.5897 60.1651
CycleRL_PRED(PPO) 109.3894 60.5763
PureRL(PPO) 152.4202 67.2584
Fig. 6. The queue lengths at the evening peak in the synthetic scenario under four methods.
and PureRL are worse than other methods. Because MAXQUEUE always gives priority to the direction with the
longest queue, it achieves the shortest average queue length. However, it makes the vehicles on a direction with a
short queue wait too long, thus making the travel time longer (as is
shown in Table 5). MP-TT has queue length and travel time performance similar to that of MAXQUEUE, because its key
idea is to improve the practical applicability rather than the performance of the max-pressure algorithm by taking
travel time instead of queue length as input. On the other hand, PRED achieves results superior to those of REALCUR and
inferior to those of REAL, which indirectly shows the effectiveness of the traffic flow prediction module. Moreover, by
adding the proposed signal cycle timing module to REALCUR, REAL and PRED, we obtain CycleRL_REALCUR, CycleRL_REAL
and CycleRL_PRED respectively, which reduce the average queue length accordingly. This demonstrates that although the
flow-based allocation methods can obtain better results than FIXTIME due to the usage of flow information, they can
only handle the situation in which the change of flow during one computation period is small, which cannot be guaranteed in
practical applications. Therefore the CycleRL methods can further improve the traffic efficiency by tackling the real-time
traffic dynamics.
To further illustrate the improved performance of the proposed CycleRL under different state-of-the-art DRL algorithms,
DDPG and PPO are used and the corresponding CycleRL models are trained. As shown in Table 4, CycleRL_REALCUR,
CycleRL_REAL and CycleRL_PRED based on DDPG and PPO can all achieve better performance than REALCUR, REAL, PRED and
the corresponding PureRL in terms of average queue length.
Since the flow at the evening peak in the synthetic scenario is much larger than that at the rest time and the true flow
data of the next computation period is not available which makes REAL and CycleRL_REAL not feasible in practice, we
compare the queue lengths at the evening peak under REALCUR, PRED, CycleRL_REALCUR and CycleRL_PRED. As illustrated
in Fig. 6, the queue lengths under both CycleRL_REALCUR and CycleRL_PRED are shorter than those under REALCUR and
PRED, which illustrates the effectiveness of the prediction module and CycleRL.
The overall average travel time under different timing methods in the synthetic scenario is listed in Table 5 and the
trajectories of the travel time under REALCUR, PRED, CycleRL_REALCUR and CycleRL_PRED are given in Fig. 7. Similar to
the results in Table 4 and Fig. 6, the prediction module and CycleRL methods can improve the traffic efficiency. Moreover,
as the evening peak arrives, the more crowded the traffic is, the larger the improvement the CycleRL methods can obtain.
Fig. 7. The travel time in the synthetic scenario under four methods.
Fig. 8. The predicted flows and true flows in the British highway scenario.
Table 6
The metrics under different traffic flow prediction models.
                        Direction1 (M4 road)                   Direction2 (y57 road)
Model                   RMSE    Error rate  MAE      MAPE      RMSE    Error rate  MAE      MAPE
Kalman filter           43.76   0.251       26.138   0.251     27.85   0.3094      18.3141  0.310
EMD+Kalman              20.83   0.152       13.257   0.1449    14.17   0.1507      9.6808   0.151
LSTM                    20.63   0.172       17.567   0.172     15.66   0.1775      12.6218  0.178
EMD+LSTM                13.17   0.122       9.450    0.122     11.01   0.1409      8.1316   0.147
EMD+dual Kalman+LSTM    13.15   0.119       9.434    0.119     10.90   0.14        8.1304   0.146
Table 7
The overall average queue length in the British highway scenario under different methods.
26th 27th 28th 29th 30th (only tested)
FIXTIME 21.98 21.76 19.70 17.77 14.84
MAXQUEUE 7.91 7.36 7.10 8.63 7.12
MP-TT 7.73 9.75 9.14 7.78 8.82
REALCUR 11.64 13.89 11.65 14.00 9.79
REAL 9.94 12.74 10.02 12.75 9.32
PRED 10.72 13.81 10.47 13.23 9.55
CycleRL_REALCUR 11.03 13.14 11.25 13.32 9.54
CycleRL_REAL 9.52 11.05 9.62 12.70 9.29
CycleRL_PRED 9.51 12.38 9.72 12.79 9.37
PureRL 10.95 13.95 20.23 13.85 20.02
CycleRL_REALCUR(DDPG) 11.55 13.39 11.01 12.51 9.73
CycleRL_REAL(DDPG) 9.44 11.02 9.73 11.40 9.36
CycleRL_PRED(DDPG) 9.39 12.66 9.72 11.71 9.47
PureRL(DDPG) 52.69 50.97 50.96 51.86 49.77
CycleRL_REALCUR(PPO) 10.99 13.45 10.56 12.04 9.59
CycleRL_REAL(PPO) 9.88 11.97 9.82 11.24 9.34
CycleRL_PRED(PPO) 10.15 12.23 10.13 13.21 9.49
PureRL(PPO) 29.19 25.57 23.41 29.03 22.72
Table 8
The overall average travel time in the British highway scenario under different methods.
Method 26th 27th 28th 29th 30th (only tested)
FIXTIME 270.48 307.93 225.86 252.71 127.19
MAXQUEUE 70.55 69.53 68.13 70.22 66.41
MP-TT 73.64 80.82 71.42 82.59 68.59
REALCUR 72.51 79.87 69.96 73.45 64.89
REAL 68.03 70.86 65.11 70.72 64.00
PRED 69.44 71.44 65.93 71.32 64.45
CycleRL_REALCUR 72.31 78.45 68.74 70.54 64.81
CycleRL_REAL 67.17 68.94 64.44 70.93 63.92
CycleRL_PRED 67.14 69.55 64.64 69.70 64.09
PureRL 71.63 80.15 228.09 71.44 184.15
CycleRL_REALCUR(DDPG) 72.33 78.65 68.21 69.90 64.85
CycleRL_REAL(DDPG) 67.12 70.21 64.64 67.83 64.12
CycleRL_PRED(DDPG) 66.99 70.46 64.64 67.86 64.33
PureRL(DDPG) 832.67 1493.86 1632.60 1718.34 1050.81
CycleRL_REALCUR(PPO) 71.68 78.77 68.74 68.58 64.54
CycleRL_REAL(PPO) 67.89 69.62 65.87 67.21 64.06
CycleRL_PRED(PPO) 68.41 69.72 64.83 70.84 64.34
PureRL(PPO) 524.02 435.01 323.30 485.30 228.47
As in the synthetic scenario, FIXTIME performs worst on most days. PureRL achieves results close to those of REALCUR
on the 26th, 27th and 29th, but performs even worse than FIXTIME on the 28th and 30th, which is due to the training
instability of reinforcement learning. For the same reason, the PureRL models based on DDPG and PPO perform worst on all
five days. Similarly, PRED achieves results that are superior to those of REALCUR and inferior to those of REAL, which
demonstrates the effectiveness of the proposed prediction module. Notably, by adding the proposed signal cycle timing
module, CycleRL_REALCUR, CycleRL_REAL and CycleRL_PRED achieve better results than REALCUR, REAL and PRED. It is worth
noting that CycleRL_PRED achieves the best results, which are close to those of REAL, a method that is not feasible in
practice. This demonstrates the effectiveness of the proposed signal cycle timing module. On the final day, the signal
cycle timing models of CycleRL_REALCUR, CycleRL_REAL and CycleRL_PRED are directly transferred from the trained models of
the previous day, which reveals the transferability of the proposed CycleRL.
The queue lengths of the five days under PRED, REALCUR, CycleRL_PRED and CycleRL_REALCUR are shown in Fig. 9. The
figure shows that CycleRL reduces the queue length at traffic peaks and maintains the performance of the preliminary
rule-based timing during the rest of the day.
Similarly, the overall average travel times from October 26th to 29th and those directly transferred to October 30th
under different signal cycle timing methods are listed in Table 8, and the results are consistent with those in Table 7.
The travel time of the five days under PRED, REALCUR, CycleRL_PRED and CycleRL_REALCUR is shown in Fig. 10, which also
illustrates the effectiveness of the prediction module and CycleRL methods.
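To make the size of the gain concrete, the following sketch computes the relative reduction in average travel time of CycleRL_PRED over PRED for each day, using the values listed in Table 8. The helper name `relative_reduction` is an illustrative assumption and not part of the paper.

```python
def relative_reduction(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Average travel times copied from Table 8 (PRED and CycleRL_PRED rows).
days = ["26th", "27th", "28th", "29th", "30th"]
pred = [69.44, 71.44, 65.93, 71.32, 64.45]
cyclerl_pred = [67.14, 69.55, 64.64, 69.70, 64.09]

reductions = {
    day: relative_reduction(base, ours)
    for day, base, ours in zip(days, pred, cyclerl_pred)
}

for day, value in reductions.items():
    print(f"{day}: {value:.2f}% lower travel time than PRED")
```

The reduction on the 30th, where the model is only tested, is the smallest of the five days, yet it remains positive.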
In the following, we test three different models on the 26th in the British highway scenario to illustrate the
effectiveness of the proposed model on throughput. Similar results can be obtained in other scenarios but are not shown
due to page limitations. It should be noted that the total throughput over one day is fixed because the same flow data are
used, so the methods are compared by their throughput over the course of the day.
Fig. 9. The queue lengths of the five days in the British highway scenario under four methods.
Fig. 10. The travel time of the five days in the British highway scenario under four methods.
As shown in Fig. 11, MP-TT and CycleRL_PRED let more vehicles pass at the beginning of each traffic peak, thus
leaving fewer vehicles waiting at the intersection at the end of the traffic peak. Moreover, the proposed CycleRL_PRED
has throughput performance similar to that of MP-TT while significantly decreasing the average travel time, as mentioned
above.
Table 9
The performance of different models in a four-phase intersection and the transferability of the proposed model.
Average queue length (vehicles)
Scenario FIXTIME MAXQUEUE MP-TT PRED CycleRL_PRED
Train scenario 541.3 206.5 530.2 534.4 523.3
Only test scenario 446.7 110.05 449.8 363.3 359.1
Average travel time (seconds)
Scenario FIXTIME MAXQUEUE MP-TT PRED CycleRL_PRED
Train scenario 1105.6 970.9 1019.2 941.4 932.2
Only test scenario 1506.1 1342.2 1368.2 1321.4 1307.2
Total throughput (vehicles)
Scenario FIXTIME MAXQUEUE MP-TT PRED CycleRL_PRED
Train scenario 7094 2083 7264 7401 7433
Only test scenario 9535 5327 10120 10186 10226
4. Conclusion
In this paper, a novel cycle-based signal timing method with traffic flow prediction is proposed. The method consists of
two parts: a data-model hybrid driven traffic flow prediction module and a robust signal cycle timing module
based on human–machine collaboration. The traffic flow prediction module adopts empirical mode decomposition
to denoise the raw traffic flow data and uses an additional Kalman filter to fuse the predictions of a model-based Kalman
filter and an LSTM network. The signal cycle timing module is composed of a rule-based preliminary signal cycle scheme
and an SAC-based finetuning scheme. This paper studies traffic signal control for a single intersection. How to predict
traffic flow and design traffic signal control schemes for large-scale multi-intersection transportation systems is of
interest for our future research.
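The idea of fusing two predictions with a further Kalman filter can be illustrated with a single scalar Kalman update, in which the model-based prediction acts as the prior and the LSTM prediction acts as the measurement. This is a minimal sketch under stated assumptions: the function name, the variances and the numeric values are hypothetical, and the state-space model actually used in the paper is not reproduced here.

```python
def fuse_predictions(kf_pred, kf_var, lstm_pred, lstm_var):
    """Fuse two scalar flow predictions with one Kalman-style update.

    kf_pred, kf_var     : model-based prediction and its assumed error variance
    lstm_pred, lstm_var : LSTM prediction and its assumed error variance
    Returns the fused prediction and its variance.
    """
    # The gain is larger when the prior (Kalman) prediction is less certain.
    gain = kf_var / (kf_var + lstm_var)
    fused = kf_pred + gain * (lstm_pred - kf_pred)
    fused_var = (1.0 - gain) * kf_var
    return fused, fused_var


# Hypothetical values: the LSTM is assumed more reliable (smaller variance).
fused, fused_var = fuse_predictions(
    kf_pred=120.0, kf_var=16.0, lstm_pred=130.0, lstm_var=4.0
)
print(fused, fused_var)
```

In this setup the fused value always lies between the two inputs, closer to the one with the smaller assumed variance, and the fused variance is smaller than either input variance.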
Funding
This work is supported by the National Key R&D Program of China under Grant 2021ZD0112700, the National Natural
Science Foundation (NNSF) of China under Grants 61973082 and 62233003, and the Natural Science Foundation of Jiangsu
Province of China under Grant BK20202006.
CRediT authorship contribution statement
Yisha Li: Conceptualization, Methodology, Software, Validation, Writing – original draft, Writing – review & editing.
Guoxi Chen: Formal analysis, Methodology, Software. Ya Zhang: Supervision, Project administration, Funding acquisition.
Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential
competing interests: Ya Zhang reports financial support was provided by National Key R&D Program of China. Ya Zhang
reports financial support was provided by National Natural Science Foundation (NNSF) of China. Ya Zhang reports financial
support was provided by the Natural Science Foundation of Jiangsu Province of China.
Data availability
References
[1] X. Liang, T. Yan, J. Lee, G. Wang, A distributed intersection management protocol for safety, efficiency, and driver’s comfort, IEEE Internet Things
J. 5 (3) (2018) 1924–1935.
[2] A.J. Miller, Settings for fixed-cycle traffic signals, J. Oper. Res. Soc. 14 (4) (1963) 373–386.
[3] M. Georg, C. Jechlitschek, S. Gorinsky, Improving individual flow performance with multiple queue fair queuing, in: 2007 Fifteenth IEEE
International Workshop on Quality of Service, IEEE, 2007, pp. 141–144.
[4] P. Varaiya, Max pressure control of a network of signalized intersections, Transp. Res. C 36 (2013) 177–195.
[5] P. Mercader, W. Uwayid, J. Haddad, Max-pressure traffic controller based on travel times: An experimental analysis, Transp. Res. C 110 (2020)
275–290.
[6] I. Porche, S. Lafortune, Adaptive look-ahead optimization of traffic signals, J. Intell. Transp. Syst. 4 (3–4) (1999) 209–254.
[7] S.-B. Cools, C. Gershenson, B. D’Hooghe, Self-organizing traffic lights: A realistic simulation, in: Advances in Applied Self-Organizing Systems,
Springer, 2013, pp. 45–55.
[8] Y. Du, A. Kouvelas, W. ShangGuan, M.A. Makridis, Dynamic capacity estimation of mixed traffic flows with application in adaptive traffic
signal control, Physica A Stat. Mech. Appl. (ISSN: 0378-4371) 606 (2022) 128065, http://dx.doi.org/10.1016/j.physa.2022.128065, URL
https://www.sciencedirect.com/science/article/pii/S037843712200663X.
[9] B. Abdulhai, R. Pringle, G.J. Karakoulas, Reinforcement learning for true adaptive traffic signal control, J. Transp. Eng. 129 (3) (2003).
[10] L. Prashanth, S. Bhatnagar, Reinforcement learning with function approximation for traffic signal control, IEEE Trans. Intell. Transp. Syst. 12 (2)
(2010) 412–421.
[11] L. Li, Y. Lv, F.-Y. Wang, Traffic signal timing via deep reinforcement learning, IEEE/CAA J. Autom. Sin. 3 (3) (2016) 247–254.
[12] H. Wei, G. Zheng, H. Yao, Z. Li, Intellilight: A reinforcement learning approach for intelligent traffic light control, in: Proceedings of the 24th
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 2496–2505.
[13] X. Liang, X. Du, G. Wang, Z. Han, A deep reinforcement learning network for traffic light cycle control, IEEE Trans. Veh. Technol. 68 (2) (2019)
1243–1253.
[14] H. Nakanishi, T. Namerikawa, Optimal traffic signal control for alleviation of congestion based on traffic density prediction by model predictive
control, in: 2016 55th Annual Conference of the Society of Instrument and Control Engineers of Japan, SICE, IEEE, 2016, pp. 1273–1278.
[15] G. Zheng, X. Zang, N. Xu, H. Wei, Z. Yu, V. Gayah, K. Xu, Z. Li, Diagnosing reinforcement learning for traffic signal control, 2019, arXiv preprint
arXiv:1905.04716.
[16] Y. Xiong, G. Zheng, K. Xu, Z. Li, Learning traffic signal control from demonstrations, in: Proceedings of the 28th ACM International Conference
on Information and Knowledge Management, 2019, pp. 2289–2292.
[17] L.N. Alegre, T. Ziemke, A.L. Bazzan, Using reinforcement learning to control traffic signals in a real-world scenario: an approach based on linear
function approximation, IEEE Trans. Intell. Transp. Syst. (2021).
[18] X. Hu, C. Zhao, G. Wang, A traffic light dynamic control algorithm with deep reinforcement learning based on GNN prediction, 2020, arXiv
preprint arXiv:2009.14627.
[19] M. Abdoos, A.L. Bazzan, Hierarchical traffic signal optimization using reinforcement learning and traffic prediction with long-short term memory,
Expert Syst. Appl. 171 (2021) 114580.
[20] A. Bignold, F. Cruz, R. Dazeley, P. Vamplew, C. Foale, Human engagement providing evaluative and informative advice for interactive
reinforcement learning, Neural Comput. Appl. (2022) 1–16.
[21] Y. Xie, Y. Zhang, Z. Ye, Short-term traffic volume forecasting using Kalman filter with discrete wavelet decomposition, Comput.-Aided Civ.
Infrastruct. Eng. 22 (5) (2007) 326–334.
[22] Y. Wang, J.H. Van Schuppen, J. Vrancken, Prediction of traffic flow at the boundary of a motorway network, IEEE Trans. Intell. Transp. Syst. 15
(1) (2013) 214–227.
[23] T. Zhou, D. Jiang, Z. Lin, G. Han, X. Xu, J. Qin, Hybrid dual Kalman filtering model for short-term traffic flow forecasting, IET Intell. Transp. Syst.
13 (6) (2019) 1023–1032.
[24] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (8) (1997) 1735–1780.
[25] X. Ma, Z. Tao, Y. Wang, H. Yu, Y. Wang, Long short-term memory neural network for traffic speed prediction using remote microwave sensor
data, Transp. Res. C 54 (2015) 187–197.
[26] Y. Tian, L. Pan, Predicting short-term traffic flow by long short-term memory recurrent neural network, in: 2015 IEEE International Conference
on Smart City/SocialCom/SustainCom (SmartCity), IEEE, 2015, pp. 153–158.
[27] L. Cai, M. Lei, S. Zhang, Y. Yu, T. Zhou, J. Qin, A noise-immune LSTM network for short-term traffic flow forecasting, Chaos 30 (2) (2020)
023135.
[28] X. Chen, H. Chen, Y. Yang, H. Wu, W. Zhang, J. Zhao, Y. Xiong, Traffic flow prediction by an ensemble framework with data denoising and deep
learning model, Physica A Stat. Mech. Appl. 565 (2021) 125574.
[29] Z. Li, H. Ge, R. Cheng, Traffic flow prediction based on BILSTM model and data denoising scheme, Chin. Phys. B 31 (4) (2022) 040502.
[30] X. Bai, X. Wang, X. Liu, Q. Liu, J. Song, N. Sebe, B. Kim, Explainable deep learning for efficient and robust pattern recognition: A survey of
recent developments, Pattern Recognit. 120 (2021) 108102.
[31] H. Zhang, S. Feng, C. Liu, Y. Ding, Y. Zhu, Z. Zhou, W. Zhang, Y. Yu, H. Jin, Z. Li, Cityflow: A multi-agent reinforcement learning environment
for large scale city traffic scenario, in: The World Wide Web Conference, 2019, pp. 3620–3624.
[32] P. Mercader, W. Uwayid, J. Haddad, Max-pressure traffic controller based on travel times: An experimental analysis, Transp. Res. C 110 (2020)
275–290.
[33] T. Haarnoja, A. Zhou, P. Abbeel, S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,
in: International Conference on Machine Learning, PMLR, 2018, pp. 1861–1870.