Abstract—In order to solve the problem that existing traffic signal control methods usually need to predefine the phase sequence or phase duration, a single-intersection traffic signal control method based on improved deep reinforcement learning Double DQN (DDQN) is proposed. Firstly, a traffic signal control model based on the DDQN deep reinforcement learning algorithm is constructed to optimize the estimated value of the action-value function and the iterative process of the target value; the action space is extended to a three-dimensional space of phase selection, phase duration and a loss factor, so that the agent adaptively decides the next phase and its duration. (2) Secondly, a new loss function based on the loss factor is proposed, in which the weight of each loss term is assigned intelligently and the update process of the loss function is dynamically optimized. (3) Thirdly, a random vehicle generation model based on the Weibull distribution is designed to simulate the traffic flow in different periods.

TSC methods based on deep reinforcement learning (DRL) can be divided into three categories according to their action setting scheme: phase switching, phase selection and phase duration [4]. Aslani et al. [5] designed the state space based on the queue length at the intersection and set the traffic signal control action as phase switching. Tan et al. [6] designed the state space based on the current traffic light phase, queue length and average speed at the intersection.
II. THREE ELEMENTS OF TRAFFIC SIGNAL CONTROL BASED
ON DEEP REINFORCEMENT LEARNING
In this study, a simulation environment of a typical intersection with 12 lanes is built based on SUMO. As shown in Fig. 1, each entrance direction contains three lanes: a left-turn lane, a straight lane, and a shared straight and right-turn lane. The intersection contains four phases: north-south straight (lanes 1, 2, 7, 8), north-south left turn (lanes 3, 9), east-west straight (lanes 4, 5, 10, 11), and east-west left turn (lanes 6, 12), where right-turn vehicles are not controlled by the traffic signals.
Fig. 1 Schematic diagram of an intersection.

The TSC model can be approximated as a typical Markov Decision Process (MDP), that is, the dynamic process described by the tuple (S, A, P, R, γ) [13]. Here, S is the state space, the set of variables that describe the state information of the intersection, and s represents the traffic state at a given scale. A is the action space, the set of all phases in the traffic signal control phase library. P is the state transition probability, i.e., the probability that the intersection moves to the next state after an action is executed in the current state; in this study, the temporal difference method is used to approximate P. R is the reward set, and r represents the reward fed back by the environment after performing action a in state s. γ is the discount factor, which indicates the degree to which the executed action influences subsequent states.

A. State Space
In this study, the traffic state of the intersection is discretized along the entrance lanes of the intersection. The number of discretized monitoring grids H is determined by the effective monitoring length of the entrance lane and the effective grid length of a standard vehicle [14]. As shown in Fig. 2, the effective monitoring length of each incoming lane is discretized into grids, and each grid describes the discrete index of a vehicle in the monitoring area. The distribution of vehicles in a given approach lane is shown in Fig. 2b, and the corresponding vehicle position table and speed table are shown in Fig. 2c and Fig. 2d. In Fig. 2c, a grid value of 0 indicates that there is no vehicle at the monitoring location corresponding to that grid, while a grid value of 1 indicates that a vehicle occupies the corresponding monitoring location. In Fig. 2d, a grid value greater than 0 gives the speed of the vehicle at the corresponding monitoring location.

Fig. 2 Schematic diagram of the state space.
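To make the grid encoding of Fig. 2 concrete, the following is a minimal sketch of how one entrance lane could be discretized into the position and speed tables (Fig. 2c/2d) using SUMO's TraCI Python API; the cell length, monitoring length and lane ID are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
import traci  # SUMO's Python TraCI client

CELL_LENGTH = 7.0        # assumed effective grid length of a standard vehicle (m)
MONITOR_LENGTH = 400.0   # assumed effective monitoring length of an entrance lane (m)
N_CELLS = int(MONITOR_LENGTH // CELL_LENGTH)

def lane_state(lane_id):
    """Discretize one entrance lane into position and speed grids (Fig. 2c/2d)."""
    position = np.zeros(N_CELLS)   # 1 -> a vehicle occupies this cell, 0 -> empty
    speed = np.zeros(N_CELLS)      # >0 -> speed (m/s) of the vehicle in this cell
    lane_length = traci.lane.getLength(lane_id)
    for veh in traci.lane.getLastStepVehicleIDs(lane_id):
        # Distance to the stop line, so cell 0 is the cell closest to the intersection.
        dist = lane_length - traci.vehicle.getLanePosition(veh)
        cell = int(dist // CELL_LENGTH)
        if cell < N_CELLS:
            position[cell] = 1.0
            speed[cell] = traci.vehicle.getSpeed(veh)
    return position, speed
```

Stacking these per-lane vectors over the 12 entrance lanes would give the full state tensor fed to the agent.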
B. Action Space
A typical traffic signal control action space mainly consists of a non-conflicting phase selection scheme or a variable phase duration scheme [15]. Although the phase selection scheme can quickly switch to the phase with the largest current traffic demand, it requires a predefined phase duration as the agent's decision interval. When traffic demand is unbalanced across directions, the non-conflicting phase selection scheme therefore cannot flexibly exploit the potential of DRL in TSC. In the proposed method, the original one-dimensional action space is discretized into a three-dimensional space containing phase selection, phase duration and a loss factor, and the weight of each term in the loss function is adjusted by the agent. The action is represented as

  a_t = p_t ∪ g_t ∪ β_t,  p_t ∈ P,  g_t ∈ G,                    (2)

where β_t is the loss factor, p_t denotes a phase from the predefined phase library P, and g_t denotes a duration from the predefined phase duration range G.
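As an illustration of the three-dimensional action in Eq. (2), the sketch below enumerates a flat index over (phase, duration, loss factor) triples. The four phases follow Fig. 1; the duration grid inside [Gmin, Gmax] = [10, 30] s and the candidate loss-factor values are assumptions made only for this example.

```python
from itertools import product

PHASES = ["NS_straight", "NS_left", "EW_straight", "EW_left"]   # phase library P (Fig. 1)
DURATIONS = [10, 15, 20, 25, 30]       # assumed discretization of G = [Gmin, Gmax] in seconds
LOSS_FACTORS = [0.25, 0.5, 0.75, 1.0]  # assumed candidate values of the loss factor beta

# Flat index <-> (phase p, duration g, loss factor beta), as in a_t of Eq. (2).
ACTIONS = list(product(range(len(PHASES)), DURATIONS, LOSS_FACTORS))

def decode(action_index):
    phase_idx, duration, beta = ACTIONS[action_index]
    return PHASES[phase_idx], duration, beta

print(len(ACTIONS))   # 4 * 5 * 4 = 80 discrete actions
print(decode(0))      # ('NS_straight', 10, 0.25)
```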
C. Reward Function
A reward is the reward or punishment that the environment gives to the agent after the agent has performed an action; this reward and punishment information is the direction and key for the agent to find the optimal decision [16]. In DRL-based TSC methods, the reward function is usually defined by traffic efficiency indicators (such as the total number of queued vehicles or the total waiting time of vehicles). In this study, considering the high saturation of the traffic flow, the reward function is designed as the difference in the number of queued vehicles, with the queue length at the intersection at the previous decision time taken as the baseline, so as to guide the agent to accurately judge the quality of an action. The reward at decision time t is given by

  r_t = q_{t-1} - q_t,                                          (3)

where r_t and q_t are the reward obtained by the agent at time t and the number of queued vehicles at the intersection, respectively. When r_t > 0, the traffic condition at the intersection at time t has improved relative to that at time t-1; the agent then positively updates the neural network parameters according to this feedback signal and positively rewards the actions performed within the time step.
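A minimal sketch of the queue-difference reward in Eq. (3), using TraCI's halted-vehicle count as the queue measure; the lane IDs are placeholders.

```python
import traci

ENTRANCE_LANES = ["E_in_0", "E_in_1", "E_in_2"]   # placeholder lane IDs for the monitored approaches

def queue_length():
    """Total number of halted (queued) vehicles on the monitored entrance lanes."""
    return sum(traci.lane.getLastStepHaltingNumber(lane) for lane in ENTRANCE_LANES)

prev_queue = queue_length()           # q_{t-1}, the baseline at the last decision time
# ... the agent acts and the simulation advances to the next decision time ...
curr_queue = queue_length()           # q_t
reward = prev_queue - curr_queue      # Eq. (3): r_t > 0 means the queue shrank
prev_queue = curr_queue
```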
III. DEEP REINFORCEMENT LEARNING ALGORITHM DESIGN

A. Basic Framework of TSC Based on DRL
Reinforcement learning is a field of machine learning that emphasizes observing the environment and acting so as to maximize the expected benefit. As shown in Fig. 3, the agent senses the state of the environment and decides the optimal action to take in the environment. After the environment executes the action, the state changes and a reward-and-punishment signal is generated and fed back to the agent. The agent uses this feedback signal to continuously optimize its policy until it receives the termination signal from the environment. The state of the environment, the feedback reward-and-punishment signal and the action output by the agent constitute the basic framework of a reinforcement learning algorithm, forming a dynamic Markov decision system.

Fig. 3 A reinforcement learning model.

The goal of DRL is to decide the optimal policy among different strategies through continuous interaction with the environment and trial and error, so that the expected value of the cumulative return under this policy is maximized [17].

B. TSC Method Based on Improved DDQN Algorithm
The core of the improved DDQN algorithm is to introduce a phase duration action space while keeping phase selection as the original action space, so that the agent can decide both the phase and the phase duration of the TSC model and flexibly determine the optimal timing scheme [19]. Therefore, in order to intelligently adjust the weight of each loss term in the loss function, this study designs the Q function as an action-value function with phase selection, phase duration and the loss factor as parameters:

  Q(s_t, p_t, g_t, β_t; θ) = r_t + γ · Q(s_{t+1}, argmax_{p, g, β} Q(s_{t+1}, p, g, β; θ); θ'),   (4)

where p_t, g_t and β_t represent the phase selection, phase duration and loss factor of the agent at time t, respectively, θ denotes the parameters of the evaluation network and θ' those of the target network.
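The following is a minimal PyTorch sketch of the double-estimator target in Eq. (4): the greedy action for the next state is chosen with the evaluation network θ and then valued with the target network θ'. For brevity the (p, g, β) triples are flattened into a single action index; the network objects and batch tensors are assumptions, not the paper's implementation.

```python
import torch

GAMMA = 0.95  # discount factor from Table I

@torch.no_grad()
def ddqn_target(reward, next_state, q_net, target_net):
    """Eq. (4): select the greedy next action with q_net (theta), evaluate it with target_net (theta')."""
    next_q_online = q_net(next_state)                        # shape [batch, n_actions]
    best_action = next_q_online.argmax(dim=1, keepdim=True)  # argmax over flattened (p, g, beta)
    next_q_target = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + GAMMA * next_q_target
```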
The TSC method proposed in this paper is based on the improved DDQN; its specific execution process is summarized as Algorithm 1, and the framework of the improved DRL algorithm is shown in Fig. 4.

Algorithm 1: TSC method based on improved DDQN
1  Initialize the training parameters: training rounds N, maximum number of training steps Q, etc.;
2  Initialize the experiment parameters: iteration period T, greedy coefficient ε, learning rate α, etc.;
3  for episode = 0 to N do
4      Initialize the road network environment and load the traffic flow data;
5      for t = 1 to Q do
6          Observe the traffic state s_t and calculate the estimated Q values;
7          Perform action a_t with phase duration g_t;
8          Calculate the reward r_t and obtain the next state s_{t+1};
9          Store (s_t, a_t, g_t, r_t, s_{t+1}) in the experience pool;
10         Sample B transitions to train the neural network;
11         Update the evaluation network parameters θ;
12         if (number of decisions mod T) = 0 then
13             Update the target network parameters θ';
14         end if
15     end for
16 end for
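As a small illustration of steps 11-14 of Algorithm 1, the sketch below keeps an evaluation network θ and a target network θ' with identical structure and copies θ into θ' every T decisions; the layer sizes are arbitrary placeholders, while T follows Table I.

```python
import torch.nn as nn

# Minimal stand-ins for the evaluation network (theta) and the target network (theta').
q_net = nn.Sequential(nn.Linear(48, 128), nn.ReLU(), nn.Linear(128, 80))
target_net = nn.Sequential(nn.Linear(48, 128), nn.ReLU(), nn.Linear(128, 80))
target_net.load_state_dict(q_net.state_dict())   # start both networks from identical parameters

SYNC_PERIOD = 100   # iteration period T from Table I

def maybe_sync_target(decision_count):
    """Steps 12-14 of Algorithm 1: copy theta into theta' every T decisions."""
    if decision_count % SYNC_PERIOD == 0:
        target_net.load_state_dict(q_net.state_dict())
```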
In order to verify the performance of the proposed method, an intersection simulation environment is built on the microscopic traffic simulation platform SUMO, and a control platform written in Python interacts with SUMO through the TraCI interface.
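A minimal sketch of such a Python/TraCI control loop is shown below; the configuration file name, traffic-light ID, decision interval and phase indices are placeholders rather than the settings used in this study.

```python
import traci

# "cross.sumocfg" and "tls0" are placeholder names for the SUMO scenario and traffic light.
traci.start(["sumo", "-c", "cross.sumocfg"])

step = 0
while step < 3600:                    # one simulated hour at 1 s per step
    if step % 30 == 0:                # a decision point (fixed interval here, for illustration only)
        traci.trafficlight.setPhase("tls0", (step // 30) % 4)   # pick one of the four phases
        traci.trafficlight.setPhaseDuration("tls0", 30)         # and its green duration
    traci.simulationStep()
    step += 1

traci.close()
```

In the actual method, the phase index and duration at each decision point would come from the agent rather than from this fixed cycle.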
Intersection parameter setting: As shown in Fig. 1, a typical intersection is taken as the research object in this study. Each road is bidirectional, and the entrance of each direction contains three lanes: a left-turn lane, a straight lane, and a shared straight and right-turn lane. The length of each lane is 400 m, and the maximum speed limit is 50 km/h. In addition, the maximum phase duration of the intersection is set to 100 s, and the yellow-light switching time between different phases is set to 3 s.
Traffic flow setting at the intersection: The traffic flow information is shown in Table 1, where E, W, S and N represent east, west, south and north, respectively, and EW denotes the lanes running from east to west. The proportions of straight, right-turn and left-turn traffic flow in the entrance lanes of the intersection are set based on the statistics of the data set. The arrival of traffic flow follows a Weibull distribution, whose probability density is

  f(x; λ, k) = (k/λ)(x/λ)^(k-1) e^(-(x/λ)^k),  x ≥ 0
  f(x; λ, k) = 0,                              x < 0            (9)

where x is a random variable, λ is the scale coefficient, and k is the shape coefficient.
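A minimal sketch of drawing vehicle departure times from a Weibull distribution with NumPy is given below; the shape, scale and demand values are illustrative assumptions, not the parameters fitted in this study.

```python
import numpy as np

rng = np.random.default_rng(42)

K, LAM = 2.0, 900.0      # illustrative shape k and scale lambda (seconds into the hour)
N_VEHICLES = 600         # illustrative demand for one simulated hour

# np.random.Generator.weibull samples with unit scale, so multiply by lambda to apply the scale coefficient.
depart_times = np.sort(LAM * rng.weibull(K, size=N_VEHICLES))
depart_times = depart_times[depart_times < 3600.0]   # keep departures inside the simulation horizon

print(depart_times[:5])  # earliest departure times, e.g. for writing a SUMO route file
```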
Algorithm parameter setting: In the improved DDQN algorithm, the target network and the evaluation network are both fully connected neural networks; each network outputs three different value distributions, and the different value distributions share the first three fully connected layers. The specific parameters are shown in Table I.

TABLE I. PARAMETERS CONFIGURATION
Parameters                              Value
Learning rate α                         0.0003
Discount factor γ                       0.95
Experience pool size M                  50000
Mini-batch size B                       256
Greedy strategy coefficient ε           0.01
Iteration period T                      100
Simulation rounds N                     300
Maximum simulation steps Q              3600
Phase duration range Gmin, Gmax (s)     10, 30
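The sketch below illustrates the described structure with a hypothetical PyTorch module: three shared fully connected layers followed by three separate output heads for phase, duration and loss factor. The state dimension, layer widths and head sizes are assumptions; the learning rate is the α value from Table I.

```python
import torch
import torch.nn as nn

class ThreeHeadQNet(nn.Module):
    """Shared trunk of three FC layers with separate value heads for phase, duration and loss factor."""
    def __init__(self, state_dim=48, n_phases=4, n_durations=5, n_factors=4):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.phase_head = nn.Linear(128, n_phases)        # value distribution over phases
        self.duration_head = nn.Linear(128, n_durations)  # value distribution over durations
        self.factor_head = nn.Linear(128, n_factors)      # value distribution over loss factors

    def forward(self, state):
        h = self.trunk(state)
        return self.phase_head(h), self.duration_head(h), self.factor_head(h)

net = ThreeHeadQNet()
optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)   # learning rate alpha from Table I
```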
B. Analysis of comparative algorithms
Timing control: Timed traffic signal control is based on a single-intersection scenario in which vehicles arrive uniformly over a certain historical period. A mathematical model is used to calculate the cycle length and phase splits, and the controller switches sequentially according to the predefined phase sequence and phase durations.
Adaptive control: Adaptive traffic signal control counts the number of vehicles in each lane from the vehicles arriving at the single intersection in real time. The agent extends the current phase or switches phases sequentially according to whether there are waiting vehicles in the lanes served by the current phase (a sketch of this extend-or-switch rule is given after these descriptions).
Phase switching traffic signal control: In the phase switching method, the agent monitors the single intersection in real time, extracts characteristic information of the traffic state, and intelligently decides whether to extend the current phase or switch to the next phase in order.
Phase selection traffic signal control: In the phase selection method, the agent extracts characteristic information of the traffic state of the single intersection in real time and intelligently decides whether to extend the current phase or jump to any phase in the predefined phase library.
Phase duration traffic signal control: In the phase duration method, the agent intelligently decides the duration of each phase within a fixed phase sequence based on the traffic state information of the intersection.
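A minimal sketch of the extend-or-switch rule shared by the adaptive and phase-switching baselines above; the mapping from phases to their green lanes uses placeholder lane IDs.

```python
import traci

PHASE_LANES = {                      # placeholder mapping from phase index to its green lanes
    0: ["NS_s_0", "NS_s_1"],         # north-south straight
    1: ["NS_l_0"],                   # north-south left turn
    2: ["EW_s_0", "EW_s_1"],         # east-west straight
    3: ["EW_l_0"],                   # east-west left turn
}

def next_phase(current_phase):
    """Extend the current phase while its lanes still have waiting vehicles, else switch in sequence."""
    waiting = sum(traci.lane.getLastStepHaltingNumber(l) for l in PHASE_LANES[current_phase])
    return current_phase if waiting > 0 else (current_phase + 1) % len(PHASE_LANES)
```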
C. Experimental Results
Comparison of traffic benefit indicators in the training set: Fig. 5 shows the cumulative reward curves of the four control methods during training on the training set. Through comparison, the convergence of the DDQN algorithm based on improved DRL is significantly better than that of the other three control methods, which indicates that it can more accurately decide the optimal phase timing scheme. To evaluate the performance of the control methods on the TSC task, the traffic efficiency indicators of the different control methods are compared. Figs. 5-8 show the change curves of the average queue length, the average waiting time and the average number of stops of the six control methods. In the early stage of training, because the agent is still exploring and the sample size of the experience pool is small, the traffic signal timing scheme decided by the agent is not yet accurate, so the traffic efficiency indicators increase significantly. As the number of training rounds increases, the traffic efficiency indicators decrease and gradually converge. As shown in Table II, compared with the other five algorithms, the average queue length is reduced by 29.7%, 38.3%, 45.6%, 53.9% and 55.5%, the average waiting time is reduced by 28.1%, 31.5%, 36.5%, 43.9% and 56.7%, and the average number of stops is reduced by 18.8%, 32.5%, 44.2%, 48.4% and 52.6%, respectively. In summary, the DDQN algorithm based on improved deep reinforcement learning achieves better performance on the three traffic benefit indicators of average queue length, average waiting time and average number of stops. This shows that controlling the phase selection and the phase duration of traffic signals at the same time can more effectively alleviate traffic congestion and guide the agent toward the optimal phase timing decision process.
TABLE II. COMPARISON OF THE PERFORMANCE OF DIFFERENT METHODS ON THE TRAINING SET
Algorithm            Queue length/m   Waiting time/s   Stops
Timing control       376.8            87.6             44.7
Phase duration       363.9            67.6             41.1
Adaptive control     308.1            59.7             38.0
Phase switching      271.6            55.3             31.4
Phase selection      238.4            52.7             26.1
Improved DDQN        167.5            37.9             21.2

Fig. 7 Average waiting time of different TSC methods.
Fig. 10 Average waiting time of different TSC methods.
Fig. 11 Average braking times of different TSC methods.

TABLE III. COMPARISON OF THE PERFORMANCE OF DIFFERENT METHODS
Algorithm            Queue length/m   Waiting time/s   Stops
Phase duration       556.6            103.7            62.6
Timing control       528.7            99.6             61.3
Adaptive control     466.1            95.2             58.9
Phase switching      420.1            90.8             48.9
Phase selection      340.7            83.1             38.6
Improved DDQN        200.3            54.3             24.3

V. CONCLUSION
We propose a DDQN algorithm based on improved DRL, which uses a neural network to extract the traffic state features of the intersection, maps them to the next phase and phase duration of the TSC model, designs a new loss function based on the loss factor, and dynamically optimizes the update process of the loss function. An intersection traffic signal control scene is built on the SUMO microscopic traffic simulation platform; a real single-intersection traffic flow data set is used to train the traffic signal control model, and multiple traffic flow test sets are constructed to evaluate the model. The training and test results show that the DDQN algorithm based on improved DRL effectively overcomes the shortcomings of a predefined phase sequence and phase duration. Compared with the other traffic signal control algorithms, the average queue length, average waiting time and average number of stops of vehicles are significantly reduced, which effectively improves the efficiency of intersection traffic. However, the current research is limited to the TSC problem of a single intersection; the coordination of arterial traffic signals and the cooperative control of a local road network will be the focus of the next stage of research.

REFERENCES
[1] Z. Yu, N.-W. Ning, Y.-L. Zheng, et al., "Review of Intelligent Traffic Signal Control Strategy Driven by Deep Reinforcement Learning," Computer Science, 2023, 50(04): 159-171.
[2] B. Zhou, X.-D. Wu, D.-F. Ma, et al., "A Review of Deep Reinforcement Learning Application in Urban Traffic Signal Control Methods," Modern Transportation and Metallurgical Materials, 2022, 2(03): 84-93.
[3] W.-C. Yang, L. Zhang, Y.-P. Shi, et al., "Application Review of Agent Technology in Urban Traffic Signal Control System," Journal of Wuhan University of Technology (Transportation Science & Engineering), 2014, 38(04): 709-718.
[4] X.-Q. Chen, Y.-Z. Zhu, C.-F. Lv, "Intersection Signal Phase and Timing Optimization Method Based on Mixed Proximal Policy Optimization," Journal of Transportation Systems Engineering and Information Technology, 2023, 23(01): 106-113.
[5] M. Aslani, M. S. Mesgari, M. Wiering, "Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events," Transportation Research Part C: Emerging Technologies, 2017, 85: 732-752.
[6] K. L. Tan, A. Sharma, S. Sarkar, "Robust deep reinforcement learning for traffic signal control," Journal of Big Data Analytics in Transportation, 2020, 2: 263-274.
[7] T. Wu, P. Zhou, K. Liu, et al., "Multi-Agent Deep Reinforcement Learning for Urban Traffic Light Control in Vehicular Networks," IEEE Transactions on Vehicular Technology, 2020, 69(8): 8243-8256.
[8] G.-Q. Zhang, F.-R. Chang, J.-L. Jin, et al., "Safety-Driven Adaptive Signal Control Method for Urban Intersections," China Safety Science Journal, 2023, 19(10): 192-199.
[9] H. Wei, C. Chen, G. Zheng, et al., "PressLight: Learning max pressure control to coordinate traffic signals in arterial network," Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019.
[10] M. Xu, J. Wu, L. Huang, et al., "Network-wide traffic signal control based on the discovery of critical nodes and deep reinforcement learning," Journal of Intelligent Transportation Systems, 2019, 24(1): 1-10.
[11] X. Liang, X. Du, G. Wang, et al., "A deep reinforcement learning network for traffic light cycle control," IEEE Transactions on Vehicular Technology, 2019, 68(2): 1243-1253.
[12] B.-L. Ye, W. Wu, K. Ruan, et al., "A survey of model predictive control methods for traffic signal control," IEEE/CAA Journal of Automatica Sinica, 2019, 6(3): 623-640.
[13] Y. Hua, X.-F. Wang, B. Jin, "A Survey on Multi-Agent Reinforcement Learning for Urban Traffic Signal Optimization," Operations Research Transactions, 2023, 27(02): 49-62.
[14] M. Kolat, B. Kővári, T. Bécsi, et al., "Multi-agent reinforcement learning for traffic signal control: A cooperative approach," Sustainability, 2023, 15(4): 3479.
[15] Z.-M. Liu, B.-L. Ye, Y.-D. Zhu, et al., "Traffic Signal Control Method Based on Deep Reinforcement Learning," Journal of Zhejiang University (Engineering Science), 2022, 56(6): 1249-1256.
[16] A. Jamal, M. Tauhidur Rahman, H. M. Al-Ahmadi, et al., "Intelligent intersection control for delay optimization: Using meta-heuristic search algorithms," Sustainability, 2020, 12(5): 1896.
[17] Z.-D. Zhang, Y.-N. Wang, Y.-K. Liu, et al., "Reinforcement Learning Algorithm for Road Network Traffic Control Based on Nash-Stackelberg Hierarchical Game Model," Journal of Southeast University (Natural Science Edition), 2023, 53(02): 334-341.
[18] S.-F. Ding, W. Du, L.-L. Guo, et al., "Multi-Agent Deep Deterministic Policy Gradient Method Based on Dual Critics," Journal of Computer Research and Development, 2023, 60(10): 2394-2404.
[19] L. Zhu, P. Peng, Z. Lu, et al., "MetaVIM: Meta variationally intrinsic motivated reinforcement learning for decentralized traffic signal control," IEEE Transactions on Knowledge and Data Engineering, 2023, 35(11): 11570-11584.
[20] X.-Y. Peng, H. Wang, "A Review of Combined Optimization of Traffic Assignment and Signal Control," Journal of Transportation Engineering and Information, 2023, 21(01): 1-18.