Physica A 623 (2023) 128877

Contents lists available at ScienceDirect

Physica A
journal homepage: www.elsevier.com/locate/physa

Cycle-based signal timing with traffic flow prediction for dynamic environment

Yisha Li, Guoxi Chen, Ya Zhang∗
School of Automation, Southeast University, Nanjing, 210096, China
Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Nanjing, 210096, China

Article info

Article history:
Received 30 December 2022
Received in revised form 21 April 2023
Available online 23 May 2023

Keywords:
Traffic flow prediction
Data-model hybrid driven
Traffic signal control
Human–machine collaboration
Soft Actor–Critic

Abstract

This article studies the adaptive traffic signal control problem of a single intersection in a dynamic environment. A novel
cycle-based signal timing method with traffic flow prediction (CycleRL) is proposed to improve the traffic efficiency under
dynamic traffic flow. Firstly, the empirical mode decomposition is applied to denoise the flow data. Then a data-model
hybrid driven traffic flow prediction strategy is designed to predict the traffic flow, which combines a model-based Kalman
filter and an LSTM network-based predictor and adopts another Kalman filter to fuse both prediction results to improve the
prediction precision. Besides, a robust signal cycle timing strategy based on human–machine collaboration is developed to
deal with dynamic traffic flow, which firstly designs a rule-based signal cycle scheme according to the predicted flow data
as the preliminary scheme, and then finetunes the preliminary scheme based on the Soft Actor–Critic (SAC) algorithm
according to the real-time traffic dynamics. The experiments in both a synthetic scenario and a real-world scenario show
that the proposed data-model hybrid driven traffic flow prediction algorithm has better prediction performance and that
the proposed CycleRL method outperforms rule-based methods, flow-based allocation methods and a traditional
reinforcement learning method. Moreover, it is also shown that the proposed CycleRL method has better transferability to
bridge the discrepancy across domains.
© 2023 Elsevier B.V. All rights reserved.

1. Introduction

With the development of modern transportation systems, the problem of traffic congestion has attracted increasing
attention. From the perspective of traffic management, traffic signal control (TSC) is a commonly used and feasible
approach [1].
Traditional traffic signal control methods are generally based on rules. The rule-based methods, such as FIXTIME [2],
MAXQUEUE [3], and MAXPRESSURE [4,5], are either heuristic or from the experience of experts [6,7]. Moreover, traditional
traffic signal control methods also make certain assumptions about the traffic system [8], which are often not satisfied in
practical traffic environment, and thus not applicable to some practical environments with dynamic traffic flow.
In the past few years, as a powerful artificial intelligence paradigm for dynamic control, reinforcement learning (RL) has
shown great potential in the field of traffic signal control. Abdulhai et al. [9] attempted to control a two-phase signalized
intersection based on Q-learning. Prashanth and Bhatnagar [10] adopted reinforcement learning with linear function
approximation to handle the curse of dimensionality. However, neither the Q table nor linear function approximation is
suitable when the traffic flow is highly variable. With the rapid development of deep learning (DL), which is able to
automatically capture features from data, deep reinforcement learning (DRL), the combination of RL and DL, has emerged
to overcome the drawback of the Q table and linear function approximation.
Li et al. [11] proposed a traffic signal control method using the deep Q network (DQN) and achieved performance that
was superior to that of traditional rule-based and tabular reinforcement learning methods. In order to further improve the
traffic efficiency, Wei et al. [12] proposed a signal control strategy considering different phases in the network structure
on the basis of DQN. However, since the action of the above methods is to set the next phase, it may lead to instability
and safety problems [13]. Therefore, Nakanishi and Namerikawa [14] proposed a model-based reinforcement learning
algorithm for real-time timing of the traffic signal cycle. Liang et al. [13] enriched traffic features by using images as the
state, and finetuned the signal cycle to ensure traffic safety. Zheng et al. [15] proved that a complex state representation
might not be helpful for control and might even degrade the control performance. The above RL-based control methods all
require a large amount of traffic data and a long training time, that is, their sample utilization rate is low [16,17].
Xiong et al. [16] proposed a signal control scheme to speed up the training process by pretraining the RL model based on
the samples obtained by traditional control methods. Alegre et al. [17] adopted a simple linear function approximation
to accelerate the learning process and achieved the same control performance as the traditional control methods.

∗ Corresponding author at: School of Automation, Southeast University, Nanjing, 210096, China.
E-mail address: [email protected] (Y. Zhang).

https://doi.org/10.1016/j.physa.2023.128877
0378-4371/© 2023 Elsevier B.V. All rights reserved.

On one hand, using the observation of traffic flow in previous cycles for training may cause inappropriate and
suboptimal timing. On the other hand, the traffic flow in current cycle is prior information and cannot be utilized for RL-
based traffic signal control. There have been some studies that combine traffic signal control with traffic flow prediction.
Hu et al. [18] applied the traffic flow prediction of the next state as the input of the RL model to improve the traffic
efficiency. Nakanishi and Namerikawa [14] achieved real-time timing of traffic signal cycle by modeling the system
dynamics of the flow at intersection. In order to further improve the integration performance of prediction and control,
a hierarchical control idea was proposed [19] where RL model was adopted for real-time control and then the flow
prediction model was adopted to further adjust the action output by the RL model. However, the existing signal control
methods with flow prediction predict real-time flow information as a part of the RL model input at each step, which
largely increases the training time and design difficulty of the prediction model. The control of the RL model is also not
robust enough [20].
As to the traffic flow prediction, there are mainly two types of approaches. One is the model-based prediction method
such as Kalman filter. The other is model-free prediction method based on neural network. Xie et al. [21] applied discrete
wavelet transform to denoise the flow data, and then used the classic Kalman filter to predict the traffic flow. Wang et al.
[22] proposed an adaptive Kalman filter, taking into account both process and observation Gaussian noise to improve the
quality of denoising. Zhou et al. [23] designed a hybrid dual Kalman filtering method by compensating for the difference
in preliminary predictions. The key idea is to introduce another dynamic system to simulate the propagation of Kalman
filter prediction error and random walk model prediction error.
The prediction model based on Kalman filter has the advantages of simple implementation and optimization for
linear system, while it is not applicable to complex traffic systems with unmodeled traffic features, which influences
the prediction precision or even causes divergence of prediction error. As a result, deep neural network, especially the
recurrent neural network for time series data, is widely used in traffic flow prediction. A special recurrent neural network,
called Long Short-Term Memory (LSTM) [24], was developed to capture the temporal features over a long period of time.
Ma et al. [25] applied LSTM to predicting traffic speed from the data of remote microwave sensors. Tian and Pan [26]
used LSTM for traffic prediction and claimed that LSTM was superior to most of nonparametric models. Cai et al. [27]
proposed a noise immune LSTM network, whose loss function was noise-immune derived from the correlation entropy
criterion to eliminate the influence of non-Gaussian noise in traffic flow. Chen et al. [28] purified traffic flow data through
various smoothing models including the empirical mode decomposition (EMD) and wavelet filters with different bases
and predicted traffic flow with an LSTM model. Li et al. [29] also applied various smoothing models to denoising the
traffic flow data and proposed a Bi-LSTM model for prediction.
This paper focuses on designing a cycle-based signal timing strategy for single intersection based on traffic flow
prediction and DRL. Although the drawback of model-based prediction methods can be overcome by the model-free
prediction methods, the model-free methods may suffer from training instability and are thus not reliable enough
in practice [30]. To make full use of the advantages of model-based methods (such as Kalman filter) and deep neural
network (such as LSTM), this paper proposes a data-model hybrid driven traffic flow prediction strategy which combines
a Kalman filter and an LSTM network based predictor and adopts another Kalman filter to fuse both prediction results.
In addition, to reduce the training cost of the prediction model, we use the predictor to predict the traffic flow during
short term periodically rather than the real-time flow at each step. Then, a robust signal cycle timing strategy based on
the flow prediction and human–machine collaboration is designed. In the timing process, by using the predicted traffic
flow, a preliminary signal cycle timing algorithm is firstly designed based on some rules, and then a real-time signal cycle
timing algorithm based on Soft Actor–Critic is designed to finetune the preliminary scheme according to the real-time
traffic dynamics.
In summary, the major contributions of the paper are as follows.

• A data-model hybrid driven traffic flow prediction strategy is proposed to predict short-term traffic flow data, which
can reduce the training cost of the prediction model and has been evaluated to have higher prediction precision than
Kalman filter and LSTM network based predictor.
• A novel and robust signal cycle timing strategy based on flow prediction and human–machine collaboration, which
is composed of a rule-based preliminary signal cycle scheme and an SAC-based finetuning scheme, is designed to
improve the traffic efficiency. The proposed CycleRL also has some transferability.

Fig. 1. The overall framework of the proposed CycleRL.

The remainder of this paper is organized as follows. In Section 2 the proposed CycleRL is introduced. The traffic flow
data in a synthetic scenario and a certain area of British highway is used in the experiments in Section 3 to illustrate the
effectiveness of the proposed method. Finally, some conclusions are drawn in Section 4.

2. Methods

2.1. Overall framework of the proposed CycleRL

As is shown in Fig. 1, the proposed cycle-based signal timing method with traffic flow prediction (CycleRL) in this
paper has two modules including a data-model hybrid driven traffic flow prediction module and a robust signal cycle
timing module based on human–machine collaboration.
Different from the existing flow prediction based signal control methods which predict real-time flow information
at each step, to reduce the training cost of the prediction model, this paper only predicts the short-term flow whose
computation period is larger than the duration of signal cycle. Firstly the historical traffic flow data is preprocessed by
the EMD to denoise it. After that, the denoised flow data is divided into the training set and validation set to estimate the
energy threshold and process noise covariance matrices and to train the LSTM prediction model. Once the prediction from
the traffic flow prediction module is obtained, it is adopted by the signal cycle timing module for training the real-time
timing model.
At the beginning of the kth computation period, the traffic flow sequence of the previous n computation periods is input
to the traffic flow prediction module to obtain the flow prediction of the kth computation period. The length n of the
traffic flow sequence can be set arbitrarily. The flow prediction is then input to the signal cycle timing module as the
reference flow of the kth computation period, since the actual flow of the kth computation period cannot be obtained at
this moment.
At the end of each signal cycle, the timing module will output the practical signal cycle program to the intersection
as the signal timing scheme of the next signal cycle in reaction to the real-time traffic dynamics.

2.2. Data-model hybrid driven traffic flow prediction module

2.2.1. Data preprocessing

In order to reduce the influence of the noise in raw traffic flow data, empirical mode decomposition (EMD) is applied
and its detailed process is as follows.

• Firstly, taking the raw traffic flow data as the original signal, its extreme points are located, and an interpolation
algorithm is used to connect the maxima and the minima to obtain the upper and the lower envelopes of the signal,
respectively.
• Secondly, the mean value of the upper and lower envelopes is subtracted from the original signal to obtain the
intermediate signal. It is then checked whether the intermediate signal satisfies the conditions for being an intrinsic
mode function (IMF) of the original signal, i.e., the number of extreme points and the number of zero-crossing points
differ by at most 1, and the mean value of the upper and lower envelopes is 0 over the given historical data set. If
not, the first two steps are repeated on the intermediate signal until an IMF appears.
• Next, the IMF is subtracted from the original signal to obtain a new signal, which replaces the original signal, and
the first three steps are repeated until the new signal is monotone or has only one extreme point. This final signal
is called the residual signal.
• Finally, the residual signal and the IMFs whose energy is larger than the energy threshold of noise are added to
recover the denoised traffic flow data. Here the energy of an IMF, referred to as its energy level, is defined as the
mean of the squares of its signal values. Each energy level is tried as the threshold: the flow data is denoised with
it, the prediction model is trained and tested on the denoised data, and the energy level that yields the minimum
prediction error on the validation set is taken as the energy threshold.
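
The final step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the IMFs and the residual are taken as already computed by some EMD routine, and the names `imf_energy`, `reconstruct_denoised`, `select_threshold` and the callback `validation_error` are hypothetical, not part of the original method description. Adding 0.0 as a candidate threshold (so that all IMFs can be kept) is also an assumption.

```python
def imf_energy(imf):
    """Energy level of an IMF: the mean of the squares of its values."""
    return sum(x * x for x in imf) / len(imf)


def reconstruct_denoised(imfs, residual, energy_threshold):
    """Add the residual and every IMF whose energy exceeds the threshold."""
    denoised = list(residual)
    for imf in imfs:
        if imf_energy(imf) > energy_threshold:
            denoised = [d + x for d, x in zip(denoised, imf)]
    return denoised


def select_threshold(imfs, residual, validation_error):
    """Try each energy level as the threshold and keep the best one.

    `validation_error` is a hypothetical callback: it receives a denoised
    series, trains and tests a predictor on it, and returns the validation
    error. The candidate 0.0 (keep all IMFs) is an added assumption.
    """
    candidates = [0.0] + sorted(imf_energy(imf) for imf in imfs)
    return min(
        candidates,
        key=lambda th: validation_error(reconstruct_denoised(imfs, residual, th)),
    )


# Toy usage: a low-energy "noise" IMF and a high-energy "signal" IMF.
imfs = [[0.1, -0.1, 0.1, -0.1], [2.0, -2.0, 2.0, -2.0]]
residual = [10.0, 11.0, 12.0, 13.0]
denoised = reconstruct_denoised(imfs, residual, energy_threshold=1.0)
```

In the toy usage only the second IMF passes the threshold, so the low-energy component is discarded as noise.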

2.2.2. Kalman filter based prediction algorithm

Supposing that the traffic flow in the next computation period is a linear combination of previous n traffic flows, it is
feasible to apply Kalman filter as an estimator of the coefficients in the linear combination. Taking the denoised flow data
of the previous n periods H(k) = [v (k − 1) , v (k − 2) , · · · , v (k − n)] as the input, the update formulas are as follows:

θ̂ (k | k) = θ̂ (k | k − 1) + K (k) · (v (k) − H(k) · θ̂ (k | k − 1)),

P(k | k) = (In − K (k)H(k))P(k | k − 1) (1)

where θ̂ (k|k − 1) and θ̂ (k|k) are the prior and posterior estimate of the coefficients in the linear combination at period
k which are obtained by using the Kalman filter algorithm respectively; P (k|k − 1) and P (k|k) are the covariance matrix
of the prior estimate and the posterior estimate at period k respectively; F represents the system matrix, which is the
identity matrix In with n dimensions since the change of traffic flow between every two prediction periods is gentle; O is
the covariance matrix of the observation noise, which is set to the zero matrix since during training the true value of the
observation data can be obtained; v (k) is the latest traffic flow value; Q is the process noise covariance matrix, which is
obtained by minimizing the following performance indicators according to the EM algorithm:
\[
-\ln(L(Q)) = \sum_{k=1}^{N} \left\{ \ln(X(k)) + Y^{\top}(k)\, X^{-1}(k)\, Y(k) + C \right\}, \tag{2}
\]

where

X (k) = H (k) P (k|k − 1) H ⊤ (k) + O,

Y (k) = v (k) − H (k) · θ̂ (k|k − 1) ,
P(k | k − 1) = FP(k − 1 | k − 1)F ⊤ + Q , (3)

C is a constant and N is the amount of the historical traffic flow data. Therefore, the prediction value of traffic flow at
period k + 1 based on Kalman filter is:

v̂kal (k + 1) = H (k + 1) θ̂ (k + 1 | k). (4)
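
As a rough illustration of Eqs. (1), (3) and (4), the following pure-Python sketch runs one predict/update cycle for the coefficient vector and then forms the one-step flow prediction. It assumes the standard Kalman gain K(k) = P(k|k−1)H⊤(k)X⁻¹(k), which is not spelled out in the equations above, and takes Q as given rather than estimating it by the EM algorithm. The function names and the numbers in the usage loop are illustrative only.

```python
def kalman_update(theta, P, H, v, Q, obs_var=0.0):
    """One predict/update cycle for the coefficient estimate.

    theta   : previous posterior estimate of the n coefficients
    P       : previous n x n posterior covariance
    H       : the n most recent denoised flows [v(k-1), ..., v(k-n)]
    v       : flow v(k) observed in the current period
    Q       : n x n process noise covariance (assumed known here)
    obs_var : observation noise variance O (zero in the text)
    """
    n = len(theta)
    # Prediction with F = I: prior mean is unchanged, covariance grows by Q.
    P_prior = [[P[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    # X = H P H^T + O is a scalar because a single flow value is observed.
    PHt = [sum(P_prior[i][j] * H[j] for j in range(n)) for i in range(n)]
    X = sum(H[i] * PHt[i] for i in range(n)) + obs_var
    K = [p / X for p in PHt]  # assumed standard Kalman gain
    Y = v - sum(H[i] * theta[i] for i in range(n))  # innovation
    theta_post = [theta[i] + K[i] * Y for i in range(n)]
    # P(k|k) = (I - K H) P(k|k-1)
    HP = [sum(H[i] * P_prior[i][j] for i in range(n)) for j in range(n)]
    P_post = [[P_prior[i][j] - K[i] * HP[j] for j in range(n)] for i in range(n)]
    return theta_post, P_post


def predict_flow(theta, H_next):
    """Eq. (4): with F = I the prior at k+1 equals the posterior at k."""
    return sum(h * t for h, t in zip(H_next, theta))


# Illustrative run with n = 2 on made-up flow values.
flows = [100.0, 104.0, 103.0, 108.0, 110.0]
theta = [0.5, 0.5]
P = [[1.0, 0.0], [0.0, 1.0]]
Q = [[0.01, 0.0], [0.0, 0.01]]
for k in range(2, len(flows)):
    theta, P = kalman_update(theta, P, [flows[k - 1], flows[k - 2]], flows[k], Q)
forecast = predict_flow(theta, [flows[-1], flows[-2]])
```

With zero observation noise the update reproduces the latest observation exactly, which is why Q must be non-zero for the gain to be well defined.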

2.2.3. LSTM network based prediction algorithm

This paper applies the long short-term memory network to predict the traffic flow, where the loss function is selected
as the mean square error and the hidden state in the last cell of the last layer of the network is used as the prediction
of the traffic flow in the next computation period. According to the historical observation data, the training set and the
validation set are divided, and the LSTM network is trained with the goal of minimizing the loss function. When the loss
keeps falling on the training set and remains unchanged on the validation set for a long time, the training is stopped, and
the network weights are fixed. Therefore, after inputting the denoised flow data of the previous n periods H(k + 1), the
prediction value of the traffic flow at period k + 1 based on LSTM network is:

v̂lstm (k + 1) = LSTM(H (k + 1)). (5)
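
The stopping rule described above (training loss still falling, validation loss flat for a long time) can be written as a small check. This is a sketch only: the `patience` and `min_delta` parameters are assumptions introduced here to make "a long time" and "unchanged" concrete, and they do not come from the paper.

```python
def should_stop(train_losses, val_losses, patience=5, min_delta=1e-4):
    """Return True when the validation loss has stalled while training loss falls.

    train_losses, val_losses : per-epoch loss histories of equal length
    patience  : number of recent epochs without validation improvement (assumed)
    min_delta : smallest decrease that counts as an improvement (assumed)
    """
    if len(val_losses) <= patience or len(train_losses) != len(val_losses):
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    val_stalled = recent_best > best_before - min_delta
    train_falling = train_losses[-1] < train_losses[-patience - 1]
    return val_stalled and train_falling


# Illustrative histories: validation loss is flat from the second epoch on.
train_hist = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2]
val_hist = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]
stop_now = should_stop(train_hist, val_hist, patience=3)
```

Once the check fires, the network weights are frozen and used for the prediction in Eq. (5).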


2.2.4. Fusion of predictions

After obtaining the prediction values v̂kal (k) from the Kalman filter and v̂lstm (k) from the LSTM network, to make full
use of the advantages of both, this paper fuses the two predictions v̂kal , v̂lstm by constructing another Kalman filter.
This Kalman filter is designed to estimate the prediction errors of the Kalman filter and the LSTM network and has the
following form:

\[
\begin{aligned}
\hat{x}(k \mid k-1) &= A\,\hat{x}(k-1 \mid k-1),\\
P(k \mid k-1) &= A\,P(k-1 \mid k-1)\,A^{\top} + Q,\\
K_k &= P(k \mid k-1)\,C^{\top}\left(C\,P(k \mid k-1)\,C^{\top} + R\right)^{-1},\\
\hat{x}(k \mid k) &= \hat{x}(k \mid k-1) + K_k\left(y(k) - C\,\hat{x}(k \mid k-1)\right),\\
P(k \mid k) &= \left(I_2 - K_k C\right)P(k \mid k-1),
\end{aligned} \tag{6}
\]

where the system state

\[
x(k) = \begin{bmatrix} \hat{v}_{lstm}(k) - v(k) \\ v(k) - \hat{v}_{kal}(k) \end{bmatrix}
\]

is a vector consisting of the prediction errors and is unknown since v (k) is unavailable at period k; x̂(k | k − 1),
P (k | k − 1) and x̂(k | k), P (k | k) represent the corresponding prior estimate and prior covariance matrix, and the
posterior estimate and posterior covariance matrix, respectively; A is the system matrix of prediction errors, which is
taken as the identity matrix; the observation value y(k) = v̂lstm (k) − v̂kal (k) can be obtained by using the above two
algorithms at period k and thus the observation matrix C is equal to [1, 1]; R is the observation noise covariance matrix,
which is set to the zero matrix; Q is the process noise covariance matrix, which can be estimated from the historical
traffic flow data. Then, the fused prediction is designed as

\[
\hat{v}(k) = 0.5\left(\hat{v}_{lstm}(k) + \hat{v}_{kal}(k) - [1, -1]\,\hat{x}(k \mid k)\right). \tag{7}
\]
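
A minimal pure-Python sketch of one fusion step is given below, assuming A = I, C = [1, 1] used as a row vector in the innovation and covariance update, and a scalar observation noise R. The function name `fuse_step` and the numeric values are illustrative; in particular the process noise Q is simply supplied here, whereas the text estimates it from historical data.

```python
def fuse_step(x, P, v_lstm, v_kal, Q, R=0.0):
    """One step of the error-state Kalman filter followed by the fused prediction.

    x : previous posterior estimate [e_lstm, e_kal], where
        e_lstm = v_lstm - v and e_kal = v - v_kal
    P : previous 2 x 2 posterior covariance
    Q : 2 x 2 process noise covariance (supplied, not estimated here)
    R : observation noise variance (zero in the text)
    """
    C = [1.0, 1.0]
    # Prediction with A = I.
    x_prior = list(x)
    P_prior = [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
    # Gain: K = P C^T (C P C^T + R)^-1, with a scalar innovation variance S.
    PCt = [P_prior[i][0] * C[0] + P_prior[i][1] * C[1] for i in range(2)]
    S = C[0] * PCt[0] + C[1] * PCt[1] + R  # must be non-zero
    K = [p / S for p in PCt]
    # Observation y = v_lstm - v_kal = e_lstm + e_kal.
    y = v_lstm - v_kal
    innovation = y - (C[0] * x_prior[0] + C[1] * x_prior[1])
    x_post = [x_prior[i] + K[i] * innovation for i in range(2)]
    CP = [C[0] * P_prior[0][j] + C[1] * P_prior[1][j] for j in range(2)]
    P_post = [[P_prior[i][j] - K[i] * CP[j] for j in range(2)] for i in range(2)]
    # Fused prediction: remove the estimated errors and average.
    fused = 0.5 * (v_lstm + v_kal - (x_post[0] - x_post[1]))
    return fused, x_post, P_post


# Illustrative numbers: the Kalman predictor is assumed three times noisier.
fused, x_post, P_post = fuse_step(
    x=[0.0, 0.0],
    P=[[0.0, 0.0], [0.0, 0.0]],
    v_lstm=104.0,
    v_kal=96.0,
    Q=[[1.0, 0.0], [0.0, 3.0]],
)
```

In this example the gap of 8 between the two predictions is split 2 to 6 according to the assumed error variances, so the fused value lies closer to the LSTM prediction.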

Remark 1. The traffic flow prediction methods in most of existing works are either model-driven or data-driven
[21,22,26,27]. There is also a work adopting the fusion of Kalman filter and the random walk model [23] to predict
traffic flow. This paper proposes a novel data-model hybrid driven prediction strategy which designs another Kalman
filter to combine the predictions from Kalman filter and LSTM network to improve the prediction precision.

2.3. Robust signal cycle timing module based on human–machine collaboration

This subsection designs a signal cycle timing module by using the rule and reinforcement learning algorithm and the
fused prediction v̂ (k) is used by the rule to obtain the preliminary signal cycle scheme.

2.3.1. Traffic signal control problem formulation

The traffic signal control problem of a single intersection can generally be formulated as the following optimization
problem:
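\[
\begin{aligned}
\min\ & \sum_{j=1}^{\mathrm{lane\_num}} \mathrm{time}_j,\\
\text{s.t.}\ & g_{\min} \leqslant g_{\mathrm{phase}\_i} \leqslant g_{\max}, \quad i = 1, \ldots, \mathrm{phase\_num},\\
& \frac{g_j}{T} \geqslant \frac{f_{in,j}}{u_{sat,j}}, \quad j = 1, \ldots, \mathrm{lane\_num},
\end{aligned} \tag{8}
\]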
where phase_num and lane_num denote the number of phases and lanes respectively; timej denotes the total travel time of
vehicles entering the intersection from lane j, gphase_i is the green time allocated to phase i, g min and g max denote the lower
and upper bound of green time respectively; gj is the green time for the entry lane j, T is the total duration of the signal
cycle at the intersection, and fin,j , usat ,j denote the traffic flow and the saturation flow on the entry lane j respectively.

2.3.2. Definition of state, action and reward

The traffic signal cycle timing process is modeled as an MDP and expressed as ⟨S , A, R, P , γ ⟩ with each element
described in detail as follows.
The state space of the environment is defined as S = {vin,j , fin,j , j = 1, . . . , lane_num}, where vin,j represents the
predicted traffic flow at each computation period on the entry lane j, which is obtained by the traffic flow prediction
module, and fin,j denotes the real-time number of vehicles on the entry lane j.
The action space is defined as A = {gphase_i , i = 1, . . . , phase_num}, where gphase_i denotes the selected green time in
the signal cycle timing. In this paper, the green time is composed of two parts. One is the preliminary green time obtained
from the rule-based timing algorithm. The other is the finetuned value obtained from the RL. In other words, the green
time for phase i can be calculated by gphase_i = gbase_i + gsac_i , where

\[
g_{base\_i} = \left\lfloor T \times \frac{v_{phase\_i}}{\sum_{k=1}^{\mathrm{phase\_num}} v_{phase\_k}} + \frac{1}{2} \right\rfloor
\]

is the output of the preliminary rule-based timing algorithm, which is proportional to vphase_i ; $v_{phase\_i} = \sum_j v_{in,j}$
represents the traffic flow on the allowable directions of phase i, where vin,j represents the flow of entry lane j on the
allowable directions of phase i; T is the total duration of the signal cycle; and gsac_i ∈ {−1, 0, 1} is the output of the
real-time timing algorithm in phase i to finetune the preliminary rule-based signal cycle scheme. In order to keep T
unchanged, the overall output of the real-time timing algorithm in all phases should satisfy the constraint
$\sum_{i=1}^{\mathrm{phase\_num}} g_{sac\_i} = 0$.
The state transition probability is determined by the intersection environment and the reward function is defined by
$R = -\sum_{j=1}^{\mathrm{lane\_num}} q_{in,j}$, where qin,j is the length of the real-time queue on the entry lane j. Since the
goal is to maximize the cumulative rewards, the negative value of the sum of the queue lengths is taken as the reward.

Fig. 2. Example of a two-phase intersection.

An example is shown in Fig. 2. In a two-phase intersection, there are two entry lanes in the east–west direction and
two entry lanes in the north–south direction. Therefore, the state space is {v1 , v2 , v3 , v4 , f1 , f2 , f3 , f4 }, where v1 , v2 , v3 , v4
denote the traffic flow predictions of the four entry lanes that are obtained from the prediction module, and f1 , f2 , f3 , f4
denote the current number of vehicles on each entry lane. As Fig. 2 shows, at this moment, f1 = 2, f2 = 1, f3 = 0, f4 = 3.
The reward is set as the negative value of the sum of the queue lengths on the four entry lanes. Note that the queue
length on lane 1 is 1 since a vehicle on lane 1 is still moving, and thus the reward equals −5.
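
The action and reward definitions of this subsection can be illustrated with a short Python sketch. The function names, the cycle length of 60 s and the predicted phase flows are made up for illustration; the queue lengths [1, 1, 0, 3] are the ones implied by the Fig. 2 example (one vehicle on lane 1 is still moving and the reward is −5).

```python
import math


def preliminary_green_times(cycle_length, phase_flows):
    """Rule-based split: floor(T * v_i / sum(v) + 1/2) for each phase."""
    total = sum(phase_flows)
    return [math.floor(cycle_length * v / total + 0.5) for v in phase_flows]


def final_green_times(base, adjustment):
    """Add the finetuning output, which must lie in {-1, 0, 1} and sum to zero."""
    if any(a not in (-1, 0, 1) for a in adjustment) or sum(adjustment) != 0:
        raise ValueError("adjustments must be in {-1, 0, 1} and sum to 0")
    return [b + a for b, a in zip(base, adjustment)]


def reward(queue_lengths):
    """Negative sum of the real-time queue lengths on all entry lanes."""
    return -sum(queue_lengths)


# Illustrative two-phase case: T = 60 s, predicted phase flows 300 and 200.
base = preliminary_green_times(60, [300, 200])
green = final_green_times(base, [1, -1])
r = reward([1, 1, 0, 3])
```

Because the adjustments sum to zero, the finetuned green times add up to the same total as the preliminary ones.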

2.3.3. Robust signal cycle timing based on human–machine collaboration

In this paper, to guarantee the stability of the timing algorithm and speed up the training process, we combine the
rule-based timing scheme with the DRL to develop a human–machine collaborative signal cycle timing strategy. Firstly,
by using the traffic flow periodically predicted by the proposed prediction strategy, a signal cycle timing algorithm is
designed based on some rules, which is treated as the preliminary module of the DRL module. Then a real-time signal
cycle timing algorithm based on Soft Actor–Critic (SAC) is designed to finetune the preliminary scheme according to the
real-time traffic dynamics. More specifically, based on SAC, a neural network is used as the function approximation to
estimate Q-value of action for the given traffic state, while another neural network is used as the policy to generate action
to finetune the preliminary signal cycle scheme. At the end of each signal cycle, the traffic state is input into the policy
network to output every action probability and the action with the highest probability will be taken. The state will also
be input into the Q network to output the Q-values of all actions which are used to update the parameters of the two
networks. Then, after taking the action in the intersection, the next state and reward will be given from the environment,
and the sample (s, a, r, s′) will be stored into a replay buffer D. Finally, if the number of samples in the replay buffer is
larger
than the batch size, then compute loss values from batch size samples for both the Q network and the policy network to
update their parameters. The duration of interacting with intersection is defined as simulate_time and the specific training
process is shown in Algorithm 1.

Algorithm 1 Robust Signal Cycle Timing based on Human–Machine Collaboration

Hyperparameters: target entropy κ, learning rates λQ , λπ , λα , exponential moving average coefficient τ
Inputs: prediction module, initial parameters of the policy network θ, initial parameters of the two Q networks φ1 , φ2
D = ∅, φ̃1 = φ1 , φ̃2 = φ2
for episode = 0 to train_episodes do
    reset the environment and get the initial state s0 → s
    for step = 0 to ⌊simulate_time / T⌋ do
        if at the end of one computation period then
            predict the flow data of the next computation period from the prediction module and input the
            predicted flow data into the rule-based signal cycle timing algorithm to obtain the preliminary scheme
        end if
        during the real-time timing training, sample a from πθ (·|s) and output the practical signal cycle program
        to interact with the environment, thus obtaining the reward r and the next state s′
        D = D ∪ {(s, a, r, s′)}
        if the number of samples in buffer D reaches batch_size then
            conduct multi-step gradient update:
            φi = φi − λQ ∇JQ (φi ), for i = 1, 2
            θ = θ − λπ ∇Jπ (θ)
            α = α − λα ∇J (α)
            φ̃i = (1 − τ ) φi + τ φ̃i , for i = 1, 2
        end if
    end for
end for
Return: the final network parameters θ, φ1 , φ2
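
The replay buffer handling in Algorithm 1 (store each sample, start updating once batch_size samples are available) might look as follows in Python. This is a sketch rather than the authors' implementation: the class name, the bounded `capacity` and the `seed` argument are assumptions added here, since the algorithm itself only states D = D ∪ {(s, a, r, s′)}.

```python
import random
from collections import deque


class ReplayBuffer:
    """Stores (s, a, r, s') samples and draws uniform random mini-batches."""

    def __init__(self, capacity=10000, seed=None):
        # A bounded buffer is an assumption; the oldest samples are dropped first.
        self._data = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self._data.append((state, action, reward, next_state))

    def ready(self, batch_size):
        """True once enough samples are stored to form one mini-batch."""
        return len(self._data) >= batch_size

    def sample(self, batch_size):
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)


# Illustrative use inside the inner loop of the training procedure.
buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(4):
    buffer.add(state=step, action=0, reward=-5.0, next_state=step + 1)
batch = buffer.sample(2) if buffer.ready(2) else []
```

Sampling uniformly from past transitions decorrelates consecutive signal cycles, which is the usual motivation for a replay buffer in off-policy methods such as SAC.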

In order to train the timing model more stably and faster, a target Q network is adopted whose structure is the
same as that of the Q network and whose parameters are updated smoothly from the Q network by φ̃ = (1 − τ ) φ + τ φ̃ ,
where τ is the exponential moving average coefficient, which is often set to 0.01. Besides, the SAC algorithm uses two
Q networks (and correspondingly two target networks) to deal with the deviation problem of Q-value estimation, that is,
Qφ (s, a) = min (Qφ1 (s, a) , Qφ2 (s, a)).
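The regularization coefficient α is used to coordinate the importance of the cumulative rewards and the entropy of the
policy, and it can be adaptively adjusted by minimizing the following loss function:

\[
\begin{aligned}
J(\alpha) &= \mathbb{E}_{s \sim D,\, a \sim \pi_\theta}\left[-\alpha \log \pi_\theta(a \mid s) + \alpha\kappa\right]\\
&= \mathbb{E}_{s \sim D}\left[\sum_{a} \pi_\theta(a \mid s)\left(-\alpha \log \pi_\theta(a \mid s) + \alpha\kappa\right)\right],
\end{aligned} \tag{9}
\]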

where κ is a hyperparameter representing the target entropy and πθ (a|s) represents the output value of the policy
network. The loss functions of the policy network and the Q network are as follows:
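\[
\begin{aligned}
J_Q(\varphi) &= \mathbb{E}_{(s,a,r,s') \sim D}\left[\left(Q_\varphi(s,a) - y(r,s')\right)^2\right],\\
y(r,s') &= r + \gamma \sum_{a'} \pi_\theta(a' \mid s')\left(Q_{\tilde{\varphi}}(s',a') - \alpha \log \pi_\theta(a' \mid s')\right),\\
J_\pi(\theta) &= \mathbb{E}_{s \sim D}\left[\mathbb{E}_{a \sim \pi_\theta}\left[\alpha \log \pi_\theta(a \mid s) - Q_\varphi(s,a)\right]\right]\\
&= \mathbb{E}_{s \sim D}\left[\sum_{a} \pi_\theta(a \mid s)\left(\alpha \log \pi_\theta(a \mid s) - Q_\varphi(s,a)\right)\right],
\end{aligned} \tag{10}
\]

where Qφ (s, a) represents the value of the state–action pair, that is, the output value of the Q network; y (r , s′) indicates
the target output value of the Q network; Qφ̃ (s′ , a′) represents the value of the corresponding target network; a′ is the
next action to take.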
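
For a single sample with a discrete action set, the target value and the policy loss in Eq. (10), together with the minimum over the two Q networks, reduce to short sums. The sketch below evaluates them for given probabilities and Q-values in plain Python; it does not train any network, and the function names and numbers are illustrative assumptions.

```python
import math


def clipped_q(q1, q2):
    """Element-wise minimum of the two Q-network outputs."""
    return [min(a, b) for a, b in zip(q1, q2)]


def soft_q_target(reward, gamma, alpha, next_probs, next_target_q):
    """y(r, s') = r + gamma * sum_a' pi(a'|s') * (Q_target(s', a') - alpha * log pi(a'|s'))."""
    soft_value = sum(
        p * (q - alpha * math.log(p))
        for p, q in zip(next_probs, next_target_q)
        if p > 0.0  # terms with zero probability contribute nothing
    )
    return reward + gamma * soft_value


def policy_loss(alpha, probs, q_values):
    """Per-state policy loss: sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s, a))."""
    return sum(
        p * (alpha * math.log(p) - q)
        for p, q in zip(probs, q_values)
        if p > 0.0
    )


# Illustrative values for a state with two candidate actions.
q_min = clipped_q([1.0, 3.0], [2.0, 2.5])
y = soft_q_target(reward=-5.0, gamma=0.9, alpha=1.0,
                  next_probs=[0.5, 0.5], next_target_q=[1.0, 3.0])
j_pi = policy_loss(alpha=1.0, probs=[0.5, 0.5], q_values=[1.0, 3.0])
```

Averaging these per-sample quantities over a mini-batch drawn from D gives the empirical losses that the gradient steps minimize.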

It should be noted that in the robust human–machine collaboration based signal cycle timing algorithm, the preliminary
rule-based timing algorithm can be replaced by any other rule-based timing method, and different state-of-the-art RL
algorithms can be used as the real-time timing algorithm. In this paper, the Soft Actor–Critic algorithm is used as a
state-of-the-art RL algorithm.

Table 1
The settings of generation function.

              Direction 1    Direction 2
(µ1 , µ2 )    (48, 102)      (42, 108)
(σ1 , σ2 )    (10, 20)       (15, 25)
σ3            1              2

Fig. 3. The traffic flow in different scenarios.

3. Experiments

The commonly used traffic simulator CityFlow [31], with a synthetic dataset and a real-world dataset from a British
highway, is applied to verify the feasibility and effectiveness of the proposed method. To reflect the portability of the
algorithm, both a two-phase intersection and a four-phase intersection are considered. Firstly, both datasets are divided
into a training set and a validation set. The training set serves as historical traffic flow data, which is used to compute
the energy threshold for the EMD and the process noise covariance matrices. It is also used for the training of the
proposed traffic flow prediction module. Then, based on the trained traffic flow prediction module, the signal cycle timing
module is trained to improve the traffic efficiency under both the synthetic scenario and the real-world scenario by using
the corresponding validation sets.

3.1. Data and experimental scenarios

A synthetic scenario with two phases as is shown in Fig. 2 is generated to test the feasibility of the proposed method,
where the flow is generated through sampling from a double normal distribution function double_normal (µ1 , µ2 , σ1 , σ2 )
and translating from the sampled probability to simulate morning and evening traffic peak. Assume that the flow of two
directions follows two independent double normal distributions with different expectations and variances added with
white Gaussian noise ∆ ∼ N (0, σ3 ), where the expectation represents the time of traffic peak. Besides, the flow value is
limited between 5 s/vehicle and 20 s/vehicle. The specific parameters are listed in Table 1.
Traffic flow data of seven days is generated in the synthetic scenario where the data of the first five days is adopted
to train the traffic flow prediction module, while that of the last two days is adopted to test it. Moreover the traffic flow
data of day 6 is used to train the signal cycle timing module whose transferability is tested by using the traffic flow data
of day 7. The generated traffic flow on day 6 is shown in Fig. 3(a).
As to the real-world dataset on the British highway, the intersection of the M4 road and the y57 road is chosen, and
its traffic flow data in October 2019 is applied. At this intersection there are also two phases for two directions which is
the same as Fig. 2 and the flow data of one day is given in Fig. 3(b). The traffic flow data of the first 25 days is adopted to
train the traffic flow prediction module, and the other data is used as the test set. Similar to the synthetic scenario, the
traffic flow data of the first 4 days in the test set is adopted to train the signal cycle timing module and that of the last
day is adopted to test its transferability.
To further illustrate the portability of the proposed model, a four-phase intersection shown in Fig. 4 is used, and the
flow data is generated by sampling from a uniform distribution whose bounds are set to
2 s/vehicle and 20 s/vehicle. The simulation time is set to 2 h so that the models can be compared in terms of
throughput. Other settings are the same as in the two-phase intersection environment.
Y. Li, G. Chen and Y. Zhang Physica A 623 (2023) 128877

Fig. 4. A four-phase intersection.

Table 2
The hyperparameters used by the LSTM model.
Hyperparameter Value
Number of layers 3
The dimension of hidden vector 128
Batch size 50
Train epochs 30
Learning rate 0.001

3.2. Different methods for comparison and metrics

3.2.1. Different methods for comparison

Firstly, in order to demonstrate the high accuracy of the proposed traffic flow prediction module, an ablation
experiment is conducted in the British highway scenario, and the methods for comparison are listed as follows.

• Kalman filter: directly use a Kalman filter to predict the traffic flow.
• EMD + Kalman filter: preprocess the raw flow data by the EMD, then input the denoised flow data to a Kalman filter
to predict the traffic flow.
• LSTM: directly use an LSTM network to predict the traffic flow.
• EMD + LSTM: preprocess the raw flow data by the EMD, then input the denoised flow data to an LSTM network to
predict the traffic flow.
• EMD + dual Kalman filter + LSTM (proposed): preprocess the raw flow data by the EMD, then input the denoised flow
data to a Kalman filter and an LSTM network to predict the traffic flow of the next computation period respectively,
finally use another Kalman filter to fuse the two predictions.
During the training process of the LSTM model, the Adam optimizer is adopted and the hyperparameters used by the
LSTM model are listed in Table 2.
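The fusion step in the proposed method (the last item of the list above) can be illustrated with a scalar Kalman update. This is only a sketch: it assumes that the model-based prediction acts as the prior and the LSTM prediction as the measurement, and the variance arguments are hypothetical placeholders for the noise covariances used in the paper.

```python
def fuse_predictions(kf_pred, kf_var, lstm_pred, lstm_var):
    """Scalar Kalman update: kf_pred is the prior, lstm_pred the measurement."""
    gain = kf_var / (kf_var + lstm_var)             # Kalman gain
    fused = kf_pred + gain * (lstm_pred - kf_pred)  # corrected estimate
    fused_var = (1.0 - gain) * kf_var               # posterior variance
    return fused, fused_var


# Illustrative numbers only: equal confidence in both predictors.
fused, fused_var = fuse_predictions(100.0, 4.0, 110.0, 4.0)
```

With equal variances the fused value is the midpoint of the two predictions; as the variance of one predictor grows, the estimate moves towards the other.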
In order to demonstrate the effectiveness of the proposed CycleRL, we apply three traditional methods including
FIXTIME, MAXQUEUE and MP-TT, three allocation methods based on three kinds of traffic flow, three CycleRLs based
on three kinds of traffic flow and a pure RL based method for comparison.

• FIXTIME [2]: assign the same time to each signal of one cycle.
• MAXQUEUE [3]: always allow the direction that has the longest queue length to pass (it is a simplified version of
the max pressure algorithm [4] under a single intersection).

Table 3
The hyperparameters used by the CycleRL and PureRL
methods.
Hyperparameter Value
Number of layers 3
The dimension of hidden vector 128
Batch size 50
Train episodes 30
Learning rate 0.001
exponential moving average coefficient τ 0.01
Discount factor γ 0.99
Explore steps τ 200
Reward normal factor 50
The size of replay buffer 1000
Gradient update steps 3

• MP-TT [32]: compute the pressure of each phase (the absolute difference between the average travel time of vehicles
on entry lanes and that on exit lanes) and allocate the next signal cycle in proportion to the pressure.
• REALCUR: allocate the signal cycle scheme of the next computation period in proportion to the traffic flow of the
previous computation period.
• REAL: assume the true traffic flow of the next computation period is available and allocate the signal cycle scheme
in proportion to the true traffic flow.
• PRED (proposed): allocate the signal cycle scheme of the next computation period in proportion to the predicted
traffic flow obtained by the proposed data-model hybrid driven traffic flow prediction module.
• CycleRL_REALCUR: apply the proposed robust signal cycle timing module based on human–machine collaboration
to finetune the signal cycle scheme allocated by REALCUR.
• CycleRL_REAL: apply the proposed robust signal cycle timing module based on human–machine collaboration to
finetune the signal cycle scheme allocated by REAL.
• CycleRL_PRED (proposed): apply the proposed robust signal cycle timing module based on human–machine
collaboration to finetune the signal cycle scheme allocated by PRED.
• PureRL [33]: directly apply the reinforcement learning algorithm (SAC) to allocate the signal cycle scheme of the
next computation period.

The main difference among CycleRL_PRED, CycleRL_REALCUR and CycleRL_REAL is the flow data used for preliminary
timing. Moreover, for fair comparison, the CycleRL and the PureRL methods use the same hyperparameters listed in
Table 3.
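The flow-proportional allocation shared by REALCUR, REAL and PRED can be sketched as below. The function name allocate_green is hypothetical, and the sketch assumes one yellow interval per phase, no minimum green time, and flows expressed as vehicle counts per computation period; the three methods differ only in which flow (previous, true or predicted) is passed in.

```python
def allocate_green(flows, cycle=100.0, yellow=3.0):
    """Split the green time of one cycle across phases in proportion to flow."""
    usable = cycle - yellow * len(flows)  # assumes one yellow interval per phase
    total = sum(flows)
    if total <= 0:
        # No demand information: fall back to an equal split.
        return [usable / len(flows)] * len(flows)
    return [usable * f / total for f in flows]


# Two phases, the first carrying three times the flow of the second.
greens = allocate_green([300.0, 100.0])
```

Under these assumptions the CycleRL variants would start from such a split and then finetune it.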

3.2.2. Metrics
For prediction, four metrics are used to compare the performance of different traffic flow prediction models.

• RMSE: The root mean square error, whose expression is as follows:

\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{n-1}}, \tag{11} \]

where $y_i$ and $\hat{y}_i$ respectively represent the true flow and the predicted flow, and $n$ is the number of samples.
• Error rate: The absolute value of the error divided by the true value:

\[ \text{Error rate} = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{y_i}. \tag{12} \]

• MAE: Mean absolute error:

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|. \tag{13} \]

• MAPE: Mean absolute percentage error:

\[ \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{\max(\epsilon, |y_i|)}, \tag{14} \]

where $\epsilon$ is a small positive number.
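For reference, Eqs. (11)–(14) translate directly into code. The sketch below is a plain-Python transcription that keeps the n − 1 denominator of Eq. (11); the function names and the default value of eps are illustrative choices rather than values taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Eq. (11): root mean square error with an n - 1 denominator."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / (n - 1))


def error_rate(y_true, y_pred):
    """Eq. (12): mean of |error| / true value (requires non-zero true values)."""
    return sum(abs(y - p) / y for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Eq. (13): mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred, eps=1e-8):
    """Eq. (14): mean absolute percentage error, guarded against zero flow."""
    return sum(abs(y - p) / max(eps, abs(y)) for y, p in zip(y_true, y_pred)) / len(y_true)
```

For strictly positive flows that are larger than eps, error_rate and mape return the same value, which is consistent with the near-identical Error rate and MAPE columns reported later.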


Fig. 5. The predicted flows and true flows in the synthetic scenario.

Table 4
The overall average queue length in the synthetic scenario under different
methods.
Day6 Day7(only tested)
FIXTIME 13.3753 11.5413
MAXQUEUE 7.6748 7.6748
MP-TT 8.4375 7.6979
REALCUR 9.7548 10.2757
REAL 9.2920 9.5224
PRED 9.4680 9.6116
CycleRL_REALCUR 9.6690 9.9527
CycleRL_REAL 9.2033 9.4302
CycleRL_PRED 9.4371 9.5752
PureRL 17.1508 11.5665
CycleRL_REALCUR(DDPG) 9.4517 10.0638
CycleRL_REAL(DDPG) 9.1089 9.4188
CycleRL_PRED(DDPG) 9.3437 9.4974
PureRL(DDPG) 13.5613 12.6817
CycleRL_REALCUR(PPO) 9.6193 10.0953
CycleRL_REAL(PPO) 9.2153 9.4400
CycleRL_PRED(PPO) 9.4263 9.6146
PureRL(PPO) 13.4421 11.1745

For signal timing, two main metrics are used to compare the efficiency of the intersection: the average queue length of the
intersection (the sum of the queue lengths on all entry lanes after one signal cycle) and the average travel time of vehicles.
In addition, throughput (the number of passed vehicles) is used to illustrate the effectiveness of the proposed model.

3.3. Experimental results and discussion

3.3.1. Synthetic scenario of two-phase intersection

In the synthetic scenario, the period of flow computation is set to 10 min, and the duration of the signal cycle is fixed to
100 s, including the yellow time between phases, which is set to 3 s. The number of signal control steps during one
day is therefore 864. To illustrate the effectiveness of the proposed traffic flow prediction module, the true
flow data of day 6 is compared with the corresponding flow data predicted by the proposed prediction module in Fig. 5. It can
be seen that the prediction errors are small. Due to page limitation, the ablation experiment of the traffic flow prediction
model is only conducted in the British highway scenario.
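The step count quoted above follows from simple arithmetic, reproduced below under the assumption that one signal cycle corresponds to one control step. The variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
cycle_s = 100          # fixed signal cycle duration, in seconds
period_s = 10 * 60     # flow computation period in the synthetic scenario, in seconds

steps_per_day = SECONDS_PER_DAY // cycle_s     # 864 signal control steps per day
cycles_per_period = period_s // cycle_s        # 6 signal cycles per computation period
periods_per_day = SECONDS_PER_DAY // period_s  # 144 flow computation periods per day
```

Each flow prediction therefore covers six consecutive signal cycles in this scenario.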
Then, the overall average queue lengths on day 6 under different signal cycle timing methods and the overall average
queue lengths under them which are directly transferred to day 7 are listed in Table 4. The testing results under FIXTIME

Table 5
The overall average travel time in the synthetic scenario under different
methods.
Day6 Day7(only tested)
FIXTIME 195.5914 65.6354
MAXQUEUE 119.0461 62.7847
MP-TT 137.9947 63.1472
REALCUR 113.3493 62.7211
REAL 106.2333 60.1429
PRED 109.7421 60.7034
CycleRL_REALCUR 111.0151 61.7619
CycleRL_REAL 104.9461 59.7644
CycleRL_PRED 106.8260 60.3319
PureRL 307.1430 67.1269
CycleRL_REALCUR(DDPG) 108.6481 62.1491
CycleRL_REAL(DDPG) 102.6855 59.8867
CycleRL_PRED(DDPG) 106.6224 60.2973
PureRL(DDPG) 140.1753 84.0975
CycleRL_REALCUR(PPO) 111.0972 62.2904
CycleRL_REAL(PPO) 104.5897 60.1651
CycleRL_PRED(PPO) 109.3894 60.5763
PureRL(PPO) 152.4202 67.2584

Fig. 6. The queue lengths at the evening peak in the synthetic scenario under four methods.

and PureRL are worse than those under the other methods. Because MAXQUEUE always gives priority to the direction with the
longest queue, it achieves the shortest average queue length. However, vehicles in a direction with a short queue
must then wait too long, which makes the travel time longer (as
shown in Table 5). MP-TT performs similarly to MAXQUEUE in queue length and travel time, because its key
idea is to improve the practical applicability of the max-pressure algorithm, rather than its performance, by taking
travel time instead of queue length as input. On the other hand, PRED achieves results superior to those of REALCUR and
inferior to those of REAL, which indirectly shows the effectiveness of the traffic flow prediction module. Moreover, by
adding the proposed signal cycle timing module to REALCUR, REAL and PRED, we obtain CycleRL_REALCUR, CycleRL_REAL
and CycleRL_PRED respectively, each of which reduces the average queue length accordingly. This demonstrates that although the
flow-based allocation methods obtain better results than FIXTIME by using flow information, they can
only handle situations in which the flow changes little during one computation period, a condition that cannot be guaranteed in
practical applications. The CycleRL methods therefore further improve the traffic efficiency by tackling the real-time
traffic dynamics.
To further illustrate the improved performance of the proposed CycleRL under different state-of-the-art DRL algorithms,
DDPG and PPO are used and the corresponding CycleRL models are trained. As shown in Table 4, CycleRL_REALCUR,
CycleRL_REAL and CycleRL_PRED based on DDPG and PPO can all achieve better performance than REALCUR, REAL, PRED and
the corresponding PureRL in terms of average queue length.
Since the flow at the evening peak in the synthetic scenario is much larger than that at other times, and since the true flow
data of the next computation period is not available, which makes REAL and CycleRL_REAL infeasible in practice, we
compare the queue lengths at the evening peak under REALCUR, PRED, CycleRL_REALCUR and CycleRL_PRED. As illustrated
in Fig. 6, the queue lengths under both CycleRL_REALCUR and CycleRL_PRED are shorter than those under REALCUR and
PRED, which illustrates the effectiveness of the prediction module and CycleRL.
The overall average travel time under different timing methods in the synthetic scenario is listed in Table 5, and the
trajectories of the travel time under REALCUR, PRED, CycleRL_REALCUR and CycleRL_PRED are given in Fig. 7. Similar to
the results in Table 4 and Fig. 6, the prediction module and the CycleRL methods improve the traffic efficiency. Moreover,
as the evening peak arrives, the more crowded the traffic is, the larger the improvement the CycleRL methods obtain.

Fig. 7. The travel time in the synthetic scenario under four methods.

Fig. 8. The predicted flows and true flows in the British highway scenario.

Table 6
The metrics under different traffic flow prediction models.
                        Direction 1 (M4 road)                    Direction 2 (y57 road)
Model                   RMSE    Error rate  MAE      MAPE        RMSE    Error rate  MAE       MAPE
Kalman filter           43.76   0.251       26.138   0.251       27.85   0.3094      18.3141   0.310
EMD+Kalman              20.83   0.152       13.257   0.1449      14.17   0.1507      9.6808    0.151
LSTM                    20.63   0.172       17.567   0.172       15.66   0.1775      12.6218   0.178
EMD+LSTM                13.17   0.122       9.450    0.122       11.01   0.1409      8.1316    0.147
EMD+dual Kalman+LSTM    13.15   0.119       9.434    0.119       10.90   0.14        8.1304    0.146

3.3.2. British highway scenario of two-phase intersection

In the British highway scenario, the period of flow computation is 15 min and other settings are consistent with those
in the synthetic scenario. The true flow data on October 28th and the corresponding prediction are shown in Fig. 8, which
illustrates the effectiveness of the proposed prediction module.
In order to show the effect of each part in the proposed traffic flow prediction module, the ablation experiment is
conducted and the results are listed in Table 6.
It is shown that all prediction metrics under the proposed traffic flow prediction module are the best and each part of
it (EMD, dual Kalman filter and LSTM) can accordingly improve the performance.
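The contribution of each component can be quantified as a relative reduction with respect to the plain Kalman filter. The short sketch below does this for the RMSE of Direction 1 (M4 road), using the values reported in Table 6; the helper name relative_reduction is illustrative.

```python
def relative_reduction(baseline, value):
    """Fraction by which `value` improves on `baseline` (lower is better)."""
    return (baseline - value) / baseline


# RMSE values for Direction 1 (M4 road), copied from Table 6.
rmse_m4 = {
    "Kalman filter": 43.76,
    "EMD+Kalman": 20.83,
    "LSTM": 20.63,
    "EMD+LSTM": 13.17,
    "EMD+dual Kalman+LSTM": 13.15,
}

baseline = rmse_m4["Kalman filter"]
for name, value in rmse_m4.items():
    print(f"{name:22s} RMSE {value:6.2f}  reduction {relative_reduction(baseline, value):6.1%}")
```

The same helper can be applied to the other metrics and to Direction 2.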
Then, the overall average queue lengths from October 26th to 29th and the ones directly transferred to October 30th
under different signal cycle timing methods are listed in Table 7.

Table 7
The overall average queue length in the British highway scenario under different methods.
26th 27th 28th 29th 30th (only tested)
FIXTIME 21.98 21.76 19.70 17.77 14.84
MAXQUEUE 7.91 7.36 7.10 8.63 7.12
MP-TT 7.73 9.75 9.14 7.78 8.82
REALCUR 11.64 13.89 11.65 14.00 9.79
REAL 9.94 12.74 10.02 12.75 9.32
PRED 10.72 13.81 10.47 13.23 9.55
CycleRL_REALCUR 11.03 13.14 11.25 13.32 9.54
CycleRL_REAL 9.52 11.05 9.62 12.70 9.29
CycleRL_PRED 9.51 12.38 9.72 12.79 9.37
PureRL 10.95 13.95 20.23 13.85 20.02
CycleRL_REALCUR(DDPG) 11.55 13.39 11.01 12.51 9.73
CycleRL_REAL(DDPG) 9.44 11.02 9.73 11.40 9.36
CycleRL_PRED(DDPG) 9.39 12.66 9.72 11.71 9.47
PureRL(DDPG) 52.69 50.97 50.96 51.86 49.77
CycleRL_REALCUR(PPO) 10.99 13.45 10.56 12.04 9.59
CycleRL_REAL(PPO) 9.88 11.97 9.82 11.24 9.34
CycleRL_PRED(PPO) 10.15 12.23 10.13 13.21 9.49
PureRL(PPO) 29.19 25.57 23.41 29.03 22.72

Table 8
The overall average travel time in the British highway scenario under different methods.
26th 27th 28th 29th 30th (only tested)
FIXTIME 270.48 307.93 225.86 252.71 127.19
MAXQUEUE 70.55 69.53 68.13 70.22 66.41
MP-TT 73.64 80.82 71.42 82.59 68.59
REALCUR 72.51 79.87 69.96 73.45 64.89
REAL 68.03 70.86 65.11 70.72 64.00
PRED 69.44 71.44 65.93 71.32 64.45
CycleRL_REALCUR 72.31 78.45 68.74 70.54 64.81
CycleRL_REAL 67.17 68.94 64.44 70.93 63.92
CycleRL_PRED 67.14 69.55 64.64 69.70 64.09
PureRL 71.63 80.15 228.09 71.44 184.15
CycleRL_REALCUR(DDPG) 72.33 78.65 68.21 69.90 64.85
CycleRL_REAL(DDPG) 67.12 70.21 64.64 67.83 64.12
CycleRL_PRED(DDPG) 66.99 70.46 64.64 67.86 64.33
PureRL(DDPG) 832.67 1493.86 1632.60 1718.34 1050.81
CycleRL_REALCUR(PPO) 71.68 78.77 68.74 68.58 64.54
CycleRL_REAL(PPO) 67.89 69.62 65.87 67.21 64.06
CycleRL_PRED(PPO) 68.41 69.72 64.83 70.84 64.34
PureRL(PPO) 524.02 435.01 323.30 485.30 228.47

As in the synthetic scenario, the performance under FIXTIME is the worst on most days. PureRL
achieves results close to those of REALCUR on the 26th, 27th and 29th, but results even worse than those of FIXTIME on the 28th and
30th, which is due to the training instability of reinforcement learning. For the same reason, the PureRL models based on
DDPG and PPO perform worst on all five days. Similarly, PRED achieves results that are superior to
those of REALCUR and inferior to those of REAL, which demonstrates the effectiveness of the proposed prediction module.
Significantly, by adding the proposed signal cycle timing module, CycleRL_REALCUR, CycleRL_REAL and CycleRL_PRED
achieve better results than REALCUR, REAL and PRED. It is worth noting that CycleRL_PRED achieves
results close to those of REAL, which is not feasible in practice. This demonstrates the effectiveness of the proposed signal cycle
timing module. On the final day, the signal cycle timing models of CycleRL_REALCUR, CycleRL_REAL and CycleRL_PRED
are directly transferred from the models trained on the previous days, which reveals the transferability of the proposed
CycleRL.
The queue lengths of the five days under PRED, REALCUR, CycleRL_PRED and CycleRL_REALCUR are shown in Fig. 9. It
shows that CycleRL can reduce the queue length at traffic peak and maintain the performance of preliminary rule-based
timing at the rest time.
Similarly, the overall average travel time from October 26th to 29th and that directly transferred to October 30th
under different signal cycle timing methods are listed in Table 8, and the results are consistent with those of Table 7. In addition, the
travel time of the five days under PRED, REALCUR, CycleRL_PRED and CycleRL_REALCUR is shown in Fig. 10, which also
illustrates the effectiveness of the prediction module and the CycleRL methods.
In the following, we test three different models on the 26th in the British highway scenario to illustrate the effectiveness
of the proposed model in terms of throughput. Similar results can be obtained in other scenarios but are not shown due to page

Fig. 9. The queue lengths of the five days in the British highway scenario under four methods.

Fig. 10. The travel time of the five days in the British highway scenario under four methods.

limitation. Note that the total throughput during one day is fixed because the flow data is the same, so the
methods are compared by how throughput evolves during the day.
As shown in Fig. 11, MP-TT and CycleRL_PRED let more vehicles pass at the beginning of each traffic peak, thus
leaving fewer vehicles waiting to pass through the intersection at the end of the traffic peak. Moreover, the proposed CycleRL_PRED
has throughput performance similar to that of MP-TT while significantly decreasing the average travel time, as mentioned
above.
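Because the daily total is the same for every method, the comparison is in effect one of cumulative throughput curves. The sketch below illustrates this with made-up per-cycle counts; the two series are hypothetical and are not taken from Fig. 11.

```python
from itertools import accumulate


def cumulative_throughput(passed_per_cycle):
    """Running total of vehicles that have passed the intersection."""
    return list(accumulate(passed_per_cycle))


# Hypothetical per-cycle counts with the same total but different timing.
early_release = [5, 5, 2, 0]
late_release = [2, 2, 4, 4]

early_curve = cumulative_throughput(early_release)  # [5, 10, 12, 12]
late_curve = cumulative_throughput(late_release)    # [2, 4, 8, 12]
```

Both curves end at the same total, but the first one stays higher throughout the peak, which corresponds to fewer vehicles waiting at any given time.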

Fig. 11. Throughput under three algorithms.

Table 9
The performance of different models in a four-phase intersection and the transferability of the proposed model.
Average queue length/vehicle
FIXTIME MAXQUEUE MP-TT PRED CycleRL_PRED
Train scenario 541.3 206.5 530.2 534.4 523.3
Only test scenario 446.7 110.05 449.8 363.3 359.1
Average travel time/second
FIXTIME MAXQUEUE MP-TT PRED CycleRL_PRED
Train scenario 1105.6 970.9 1019.2 941.4 932.2
Only test scenario 1506.1 1342.2 1368.2 1321.4 1307.2
Total throughput/vehicle
FIXTIME MAXQUEUE MP-TT PRED CycleRL_PRED
Train scenario 7094 2083 7264 7401 7433
Only test scenario 9535 5327 10120 10186 10226

3.3.3. Four-phase intersection

Five different models are compared in the four-phase intersection. The simulation time is set to 2 h so that the
models can be compared in terms of throughput. As shown in Table 9, the CycleRL_PRED model is first trained
and compared with the others in the four-phase intersection and then tested under different flow data to illustrate its
effectiveness. MP-TT significantly increases throughput compared with MAXQUEUE owing to the principle explained above,
and CycleRL_PRED achieves better performance in a similar environment compared with the other models under all
metrics.

4. Conclusion

In this paper, a novel cycle-based signal timing method with traffic flow prediction is proposed. The method has two
parts: the data-model hybrid driven traffic flow prediction module and the robust signal cycle timing module
based on human–machine collaboration. The traffic flow prediction module adopts the empirical mode decomposition
to denoise the raw traffic flow data and uses an additional Kalman filter to fuse the predictions of a Kalman filter and an LSTM
network. The proposed signal cycle timing module is composed of a rule-based preliminary signal cycle scheme and an
SAC-based finetuning scheme. This paper studies traffic signal control for a single intersection. How to predict
traffic flow and design traffic signal control schemes for large-scale transportation systems with multiple intersections is a topic
for our future research.

Funding

This work is supported by the National Key R&D Program of China under Grant 2021ZD0112700, the National Natural
Science Foundation (NNSF) of China under Grants 61973082 and 62233003, and the Natural Science Foundation of Jiangsu
Province of China under Grant BK20202006.

CRediT authorship contribution statement

Yisha Li: Conceptualization, Methodology, Software, Validation, Writing – original draft, Writing – review & editing.
Guoxi Chen: Formal analysis, Methodology, Software. Ya Zhang: Supervision, Project administration, Funding acquisition.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential
competing interests: Ya Zhang reports financial support was provided by National Key R&D Program of China. Ya Zhang
reports financial support was provided by National Natural Science Foundation (NNSF) of China. Ya Zhang reports financial
support was provided by the Natural Science Foundation of Jiangsu Province of China.

Data availability

Data will be made available on request.

References

[1] X. Liang, T. Yan, J. Lee, G. Wang, A distributed intersection management protocol for safety, efficiency, and driver’s comfort, IEEE Internet Things
J. 5 (3) (2018) 1924–1935.
[2] A.J. Miller, Settings for fixed-cycle traffic signals, J. Oper. Res. Soc. 14 (4) (1963) 373–386.
[3] M. Georg, C. Jechlitschek, S. Gorinsky, Improving individual flow performance with multiple queue fair queuing, in: 2007 Fifteenth IEEE
International Workshop on Quality of Service, IEEE, 2007, pp. 141–144.
[4] P. Varaiya, Max pressure control of a network of signalized intersections, Transp. Res. C 36 (2013) 177–195.
[5] P. Mercader, W. Uwayid, J. Haddad, Max-pressure traffic controller based on travel times: An experimental analysis, Transp. Res. C 110 (2020)
275–290.
[6] I. Porche, S. Lafortune, Adaptive look-ahead optimization of traffic signals, J. Intell. Transp. Syst. 4 (3–4) (1999) 209–254.
[7] S.-B. Cools, C. Gershenson, B. D’Hooghe, Self-organizing traffic lights: A realistic simulation, in: Advances in Applied Self-Organizing Systems,
Springer, 2013, pp. 45–55.
[8] Y. Du, A. Kouvelas, W. ShangGuan, M.A. Makridis, Dynamic capacity estimation of mixed traffic flows with application in adaptive traffic
signal control, Physica A Stat. Mech. Appl. (ISSN: 0378-4371) 606 (2022) 128065, http://dx.doi.org/10.1016/j.physa.2022.128065, URL
https://www.sciencedirect.com/science/article/pii/S037843712200663X.
[9] B. Abdulhai, R. Pringle, G.J. Karakoulas, Reinforcement learning for true adaptive traffic signal control, J. Transp. Eng. 129 (3) (2003).
[10] L. Prashanth, S. Bhatnagar, Reinforcement learning with function approximation for traffic signal control, IEEE Trans. Intell. Transp. Syst. 12 (2)
(2010) 412–421.
[11] L. Li, Y. Lv, F.-Y. Wang, Traffic signal timing via deep reinforcement learning, IEEE/CAA J. Autom. Sin. 3 (3) (2016) 247–254.
[12] H. Wei, G. Zheng, H. Yao, Z. Li, Intellilight: A reinforcement learning approach for intelligent traffic light control, in: Proceedings of the 24th
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 2496–2505.
[13] X. Liang, X. Du, G. Wang, Z. Han, A deep reinforcement learning network for traffic light cycle control, IEEE Trans. Veh. Technol. 68 (2) (2019)
1243–1253.
[14] H. Nakanishi, T. Namerikawa, Optimal traffic signal control for alleviation of congestion based on traffic density prediction by model predictive
control, in: 2016 55th Annual Conference of the Society of Instrument and Control Engineers of Japan, SICE, IEEE, 2016, pp. 1273–1278.
[15] G. Zheng, X. Zang, N. Xu, H. Wei, Z. Yu, V. Gayah, K. Xu, Z. Li, Diagnosing reinforcement learning for traffic signal control, 2019, arXiv preprint
arXiv:1905.04716.
[16] Y. Xiong, G. Zheng, K. Xu, Z. Li, Learning traffic signal control from demonstrations, in: Proceedings of the 28th ACM International Conference
on Information and Knowledge Management, 2019, pp. 2289–2292.
[17] L.N. Alegre, T. Ziemke, A.L. Bazzan, Using reinforcement learning to control traffic signals in a real-world scenario: an approach based on linear
function approximation, IEEE Trans. Intell. Transp. Syst. (2021).
[18] X. Hu, C. Zhao, G. Wang, A traffic light dynamic control algorithm with deep reinforcement learning based on GNN prediction, 2020, arXiv
preprint arXiv:2009.14627.
[19] M. Abdoos, A.L. Bazzan, Hierarchical traffic signal optimization using reinforcement learning and traffic prediction with long-short term memory,
Expert Syst. Appl. 171 (2021) 114580.
[20] A. Bignold, F. Cruz, R. Dazeley, P. Vamplew, C. Foale, Human engagement providing evaluative and informative advice for interactive
reinforcement learning, Neural Comput. Appl. (2022) 1–16.
[21] Y. Xie, Y. Zhang, Z. Ye, Short-term traffic volume forecasting using Kalman filter with discrete wavelet decomposition, Comput.-Aided Civ.
Infrastruct. Eng. 22 (5) (2007) 326–334.
[22] Y. Wang, J.H. Van Schuppen, J. Vrancken, Prediction of traffic flow at the boundary of a motorway network, IEEE Trans. Intell. Transp. Syst. 15
(1) (2013) 214–227.
[23] T. Zhou, D. Jiang, Z. Lin, G. Han, X. Xu, J. Qin, Hybrid dual Kalman filtering model for short-term traffic flow forecasting, IET Intell. Transp. Syst.
13 (6) (2019) 1023–1032.
[24] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (8) (1997) 1735–1780.
[25] X. Ma, Z. Tao, Y. Wang, H. Yu, Y. Wang, Long short-term memory neural network for traffic speed prediction using remote microwave sensor
data, Transp. Res. C 54 (2015) 187–197.
[26] Y. Tian, L. Pan, Predicting short-term traffic flow by long short-term memory recurrent neural network, in: 2015 IEEE International Conference
on Smart City/SocialCom/SustainCom (SmartCity), IEEE, 2015, pp. 153–158.
[27] L. Cai, M. Lei, S. Zhang, Y. Yu, T. Zhou, J. Qin, A noise-immune LSTM network for short-term traffic flow forecasting, Chaos 30 (2) (2020)
023135.
[28] X. Chen, H. Chen, Y. Yang, H. Wu, W. Zhang, J. Zhao, Y. Xiong, Traffic flow prediction by an ensemble framework with data denoising and deep
learning model, Physica A Stat. Mech. Appl. 565 (2021) 125574.
[29] Z. Li, H. Ge, R. Cheng, Traffic flow prediction based on BILSTM model and data denoising scheme, Chin. Phys. B 31 (4) (2022) 040502.
[30] X. Bai, X. Wang, X. Liu, Q. Liu, J. Song, N. Sebe, B. Kim, Explainable deep learning for efficient and robust pattern recognition: A survey of
recent developments, Pattern Recognit. 120 (2021) 108102.
[31] H. Zhang, S. Feng, C. Liu, Y. Ding, Y. Zhu, Z. Zhou, W. Zhang, Y. Yu, H. Jin, Z. Li, Cityflow: A multi-agent reinforcement learning environment
for large scale city traffic scenario, in: The World Wide Web Conference, 2019, pp. 3620–3624.
[32] P. Mercader, W. Uwayid, J. Haddad, Max-pressure traffic controller based on travel times: An experimental analysis, Transp. Res. C 110 (2020)
275–290.
[33] T. Haarnoja, A. Zhou, P. Abbeel, S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,
in: International Conference on Machine Learning, PMLR, 2018, pp. 1861–1870.

A Survey of Reinforcement and Deep Reinforcement Learning For Coordination in Intelligent Traffic Light Control
No ratings yet
A Survey of Reinforcement and Deep Reinforcement Learning For Coordination in Intelligent Traffic Light Control
15 pages
Neural Networks For Real-Time Traffic Signal Control (Srinivasan Et Al., 2006)
No ratings yet
Neural Networks For Real-Time Traffic Signal Control (Srinivasan Et Al., 2006)
12 pages
Ecolight: Reward Shaping in Deep Reinforcement Learning For Ergonomic Traffic Signal Control
No ratings yet
Ecolight: Reward Shaping in Deep Reinforcement Learning For Ergonomic Traffic Signal Control
5 pages
Using Digital Twins To Manage Traffic Flows
No ratings yet
Using Digital Twins To Manage Traffic Flows
6 pages
Distributed Traffic Light Control at Uncoupled Intersections With Real-World Topology by Deep Reinforcement Learning
No ratings yet
Distributed Traffic Light Control at Uncoupled Intersections With Real-World Topology by Deep Reinforcement Learning
9 pages
Revised Project Proposal (RUNIX)
No ratings yet
Revised Project Proposal (RUNIX)
6 pages
1 s2.0 S2095756425000480 Main
No ratings yet
1 s2.0 S2095756425000480 Main
34 pages
Optimizing Bus Operations at Autonomous
No ratings yet
Optimizing Bus Operations at Autonomous
14 pages
Pone 0298417
No ratings yet
Pone 0298417
22 pages
Intelligent Technologies for Automated Electronic Systems
From Everand
Intelligent Technologies for Automated Electronic Systems
S. Kannadhasan
No ratings yet
Stats-Edited Answers
No ratings yet
Stats-Edited Answers
30 pages
EBSCO-FullText-20 04 2025
No ratings yet
EBSCO-FullText-20 04 2025
15 pages
Matthew Hong JMP
No ratings yet
Matthew Hong JMP
48 pages
TB frq5
No ratings yet
TB frq5
13 pages
MTech QROR PQB 2016
No ratings yet
MTech QROR PQB 2016
24 pages
McNeish (2023, Categorical DFI)
No ratings yet
McNeish (2023, Categorical DFI)
41 pages
04 Probability Distributions
No ratings yet
04 Probability Distributions
47 pages
Weed Control Strips Influences On The Rubber Tree Growth
No ratings yet
Weed Control Strips Influences On The Rubber Tree Growth
10 pages
CTL - SC0x Supply Chain Analytics: Key Concepts Document
No ratings yet
CTL - SC0x Supply Chain Analytics: Key Concepts Document
11 pages
Kalman-Type Filtering Using The Wavelet Transform: Olivier Renaud Jean-Luc Starck Fionn Murtagh
No ratings yet
Kalman-Type Filtering Using The Wavelet Transform: Olivier Renaud Jean-Luc Starck Fionn Murtagh
24 pages
Midterm I Review - Final
No ratings yet
Midterm I Review - Final
25 pages
Drill Hole Spacing Analysis For Classification and Cost Optimisation
No ratings yet
Drill Hole Spacing Analysis For Classification and Cost Optimisation
13 pages
Estad Istica II Chapter 5. Regression Analysis (Second Part)
No ratings yet
Estad Istica II Chapter 5. Regression Analysis (Second Part)
39 pages
M Stat (2015) - Revised PDF
No ratings yet
M Stat (2015) - Revised PDF
59 pages
Mathematical Modeling and Computation in Finance
No ratings yet
Mathematical Modeling and Computation in Finance
3 pages
I-YOLO: A Novel Single-Stage Framework For Small Object Detection
No ratings yet
I-YOLO: A Novel Single-Stage Framework For Small Object Detection
18 pages
Test Code: STB (Short Answer Type) 2015
No ratings yet
Test Code: STB (Short Answer Type) 2015
3 pages
Ch.3 Normal Distribution
No ratings yet
Ch.3 Normal Distribution
1 page
Course Handout - MA2201 - Jan-May 2023
No ratings yet
Course Handout - MA2201 - Jan-May 2023
5 pages
A Study On The Effects of The Library Services and Resources To The Learning Performance of Isa Town Secondary School
No ratings yet
A Study On The Effects of The Library Services and Resources To The Learning Performance of Isa Town Secondary School
5 pages
Towards Smart-Data - Improving Predictive Accuracy in Long-Term Football Team Performance - 2
No ratings yet
Towards Smart-Data - Improving Predictive Accuracy in Long-Term Football Team Performance - 2
28 pages
Chapter 4
100% (1)
Chapter 4
166 pages
Ast Part1 PDF
No ratings yet
Ast Part1 PDF
20 pages
A Comprehensive Survey On Generative Diffusion Models For Structured Data
No ratings yet
A Comprehensive Survey On Generative Diffusion Models For Structured Data
20 pages
Send-Up Examination-2017 Subject: Statistics: Fazaia Degree College Arf Kamra
No ratings yet
Send-Up Examination-2017 Subject: Statistics: Fazaia Degree College Arf Kamra
3 pages
IB Maths Exploration Guide 2025
No ratings yet
IB Maths Exploration Guide 2025
35 pages

4 Cycle-Based Signal Timing With Traffic Flow Prediction For Dynamic Environment


Physica A 623 (2023) 128877


Fig. 1. The overall framework of the proposed CycleRL.

2.1. Overall framework of the proposed CycleRL

2.2. Data-model hybrid driven traffic flow prediction module

2.2.1. Data preprocessing

2.2.2. Kalman filter based prediction algorithm

$$\hat{\theta}(k \mid k) = \hat{\theta}(k \mid k-1) + K(k)\,\bigl(v(k) - H(k)\,\hat{\theta}(k \mid k-1)\bigr),$$

$$X(k) = H(k)\,P(k \mid k-1)\,H^{\top}(k) + O,$$

$$\hat{v}_{\mathrm{kal}}(k+1) = H(k+1)\,\hat{\theta}(k+1 \mid k). \tag{4}$$
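As a minimal sketch of how the update and prediction equations of this subsection fit together, the Python below performs one measurement update for a scalar flow observation and then forms the one-step prediction of Eq. (4). Several pieces are assumptions rather than statements from the paper: the gain is taken as the standard K(k) = P(k|k−1) H⊤(k) / X(k), the covariance update as P(k|k) = (I − K(k)H(k)) P(k|k−1), and the parameter vector is assumed to follow a random walk so that θ̂(k+1|k) = θ̂(k|k). The function names `kalman_update` and `predict_flow` are hypothetical.

```python
def kalman_update(theta, P, H, v, O):
    """One measurement update for a scalar observation v(k).

    theta: prior estimate theta_hat(k|k-1), list of length n
    P:     prior covariance P(k|k-1), n x n nested list
    H:     regressor row H(k), list of length n
    v:     observed flow v(k)
    O:     observation-noise variance
    """
    n = len(theta)
    # Innovation: v(k) - H(k) * theta_hat(k|k-1)
    innovation = v - sum(H[i] * theta[i] for i in range(n))
    # Column vector P(k|k-1) * H^T(k)
    PHt = [sum(P[i][j] * H[j] for j in range(n)) for i in range(n)]
    # Innovation variance: X(k) = H(k) P(k|k-1) H^T(k) + O
    X = sum(H[i] * PHt[i] for i in range(n)) + O
    # Assumed standard gain: K(k) = P(k|k-1) H^T(k) / X(k)
    K = [PHt[i] / X for i in range(n)]
    # theta_hat(k|k) = theta_hat(k|k-1) + K(k) * innovation
    theta_new = [theta[i] + K[i] * innovation for i in range(n)]
    # Assumed covariance update: P(k|k) = (I - K(k) H(k)) P(k|k-1)
    HP = [sum(H[m] * P[m][j] for m in range(n)) for j in range(n)]
    P_new = [[P[i][j] - K[i] * HP[j] for j in range(n)] for i in range(n)]
    return theta_new, P_new, X


def predict_flow(theta, H_next):
    """Eq. (4) under the assumed random-walk model theta_hat(k+1|k) = theta_hat(k|k)."""
    return sum(h * t for h, t in zip(H_next, theta))


# Illustrative numbers only, not data from the paper
theta, P, X = kalman_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], 2.0, 1.0)
v_kal_next = predict_flow(theta, [1.0, 0.0])
```

Because the observation is scalar, X(k) is a single number and the gain needs a division rather than a matrix inverse, which keeps the sketch free of external libraries.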

2.2.3. LSTM network based prediction algorithm

$$\hat{v}_{\mathrm{lstm}}(k+1) = \mathrm{LSTM}\bigl(H(k+1)\bigr). \tag{5}$$

2.2.4. Fusion of predictions

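$$\hat{x}(k \mid k-1) = A\,\hat{x}(k-1 \mid k-1),$$

$$\hat{x}(k \mid k) = \hat{x}(k \mid k-1) + K_k\,\bigl(y(k) - C^{\top}\hat{x}(k \mid k-1)\bigr),$$

$$\hat{v}(k) = 0.5\,\bigl(\hat{v}_{\mathrm{lstm}}(k) + \hat{v}_{\mathrm{kal}}(k) - [1, -1]\,\hat{x}(k \mid k)\bigr).$$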

2.3. Robust signal cycle timing module based on human–machine collaboration

2.3.1. Traffic signal control problem formulation

$$\text{s.t.}\quad g_{\min} \le g_{\mathrm{phase}_i} \le g_{\max},\quad i = 1, \dots, \mathrm{phase\_num},$$

2.3.2. Definition of state, action and reward

Fig. 2. Example of a two-phase intersection.

2.3.3. Robust signal cycle timing based on human–machine collaboration

Algorithm 1 Robust Signal Cycle Timing based on Human–Machine Collaboration

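$$J(\alpha) = \mathbb{E}\bigl[-\alpha \log \pi_{\theta}(a \mid s) + \alpha\kappa\bigr],$$

$$J_{\pi}(\theta) = \mathbb{E}_{s \sim D}\Bigl[\mathbb{E}_{a \sim \pi_{\theta}}\bigl[\alpha \log \pi_{\theta}(a \mid s) - Q_{\phi}(s, a)\bigr]\Bigr],$$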

next action to take.

Fig. 3. The traffic flow in different scenarios.

3.1. Data and experimental scenarios

Fig. 4. A four-phase intersection.

3.2. Different methods for comparison and metrics

3.2.1. Different methods for comparison

• RMSE: The root mean square error whose expression is as follows:

• MAE: Mean absolute error:

• MAPE: Mean absolute percentage error:

where ϵ is a small positive number.
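The three prediction metrics listed above can be sketched in a few lines of Python. The expressions themselves are not reproduced in this text, so the code uses the standard textbook definitions of RMSE and MAE; for MAPE it is an assumption that ϵ is added to the denominator to avoid division by zero when the observed flow is 0. The function names and the sample numbers are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and predicted flows."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted, eps=1e-8):
    """Mean absolute percentage error, returned as a fraction.

    Assumption: eps (the small positive number) is added to the
    denominator so that zero observed flow does not divide by zero.
    """
    total = sum(abs(a - p) / (abs(a) + eps) for a, p in zip(actual, predicted))
    return total / len(actual)


# Illustrative flows (vehicles per interval), not data from the paper
observed = [100.0, 200.0]
forecast = [110.0, 190.0]
scores = (rmse(observed, forecast), mae(observed, forecast), mape(observed, forecast))
```

RMSE penalizes large errors more heavily than MAE, while MAPE normalizes by the observed flow, so the three together give a fuller picture than any one alone.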

3.3. Experimental results and discussion

3.3.1. Synthetic scenario of two-phase intersection

3.3.2. British highway scenario of two-phase intersection

Fig. 11. Throughput under three algorithms.

3.3.3. Four-phase intersection

CRediT authorship contribution statement

Declaration of competing interest

Data will be made available on request.
