An Information Fusion Approach to Intelligent Traffic Signal Control Using the Joint Methods of Multiagent Reinforcement Learning and Artificial Intelligence of Things

Abstract— With the development of communication technology and the artificial intelligence of things (AIoT), transportation systems have become much smarter than ever before. However, the volumes of vehicles and traffic flows have rapidly increased. Optimizing and improving urban traffic signal control is a potential way to relieve traffic congestion. In general, traffic signal control is a sequential decision process that conforms to the characteristics of reinforcement learning, in which an agent constantly interacts with its environment and thereby obtains a strategy for optimizing its behavior in accordance with the feedback it receives. In this paper, we propose multiagent reinforcement learning for traffic signals (MARL4TS) to support the control and deployment of traffic signals. First, information on traffic flows and multiple intersections is formalized as the input environment for reinforcement learning. Second, we design a new reward function to continuously select the most appropriate control strategy during multiagent learning and thereby track actions for the traffic signals. Finally, we use a supporting tool, Simulation of Urban MObility (SUMO), to simulate the proposed traffic signal control process and compare it with other methods. The experimental results show that our proposed MARL4TS method is superior to the baselines. In particular, our method can reduce vehicle delay.

Index Terms— Traffic signal control, artificial intelligence of things, collaborative computing, multiagent reinforcement learning, information fusion.

Manuscript received 1 February 2021; revised 1 May 2021 and 24 July 2021; accepted 13 August 2021. Date of publication 30 August 2021; date of current version 8 July 2022. This work was supported by the National Natural Science Foundation of China under Grant 61902236. The Associate Editor for this article was S. Wan. (Corresponding authors: Yueshen Xu; Honghao Gao.)
Xiaoxian Yang is with the School of Computer and Information Engineering, Shanghai Polytechnic University, Shanghai 201209, China (e-mail: [email protected]).
Yueshen Xu and Zhiying Wang are with the School of Computer Science and Technology, Xidian University, Xi'an 710126, China (e-mail: [email protected]; [email protected]).
Li Kuang is with the School of Computer Science and Engineering, Central South University, Changsha 410075, China (e-mail: [email protected]).
Honghao Gao and Xuejie Wang are with the School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China (e-mail: [email protected]; [email protected]).
Digital Object Identifier 10.1109/TITS.2021.3105426

I. INTRODUCTION

THE goal of automated driving is to integrate advanced sensors, embedded cameras, fifth-generation/sixth-generation (5G/6G) wireless communication, traffic signals, and cloud-fog-edge computing to collaboratively enable the intelligent navigation of vehicles in a safe way [1], [2] without human interaction. Currently, a large number of artificial intelligence of things (AIoT) devices are deployed in transportation systems to support information collection and situation awareness. However, another problem has arisen: the volume of cars has substantially increased, leading to a variety of traffic problems, such as road congestion, traffic accidents and traffic disputes [3]–[5], and travel efficiency is decreasing. Additionally, slow driving and long wait times for vehicles at intersections contribute to poor air quality. We consider the idea of AIoT-based traffic signal control, in which urban traffic signals are controlled by dynamic forecasts and feedback from AIoT monitoring and data mining. Vehicle-to-roadside (V2R) collaborative computing is considered to relieve the traffic congestion problem. Traffic signals control and regulate vehicles in urban road networks and at intersections to improve the travel efficiency of users and the overall efficiency of traffic circulation [6]–[8]. The principle of urban traffic signalization is to control and optimize the roles of vehicles and pedestrians at intersections by adjusting the duration of each traffic signal. One important goal is to ensure the safety of urban traffic [9], [10]. Vehicles and pedestrians should be able to pass through intersections in a safe and orderly manner without interfering with each other. However, due to the randomness, complexity and time variability of urban traffic flows, existing traffic signal control methods cannot meet the needs of current traffic conditions to solve the problem of urban traffic congestion. Therefore, an efficient method is needed to improve the utilization rate of roads and intersections. Predicting traffic flow is a challenging research topic for controlling traffic signals in intelligent transportation systems.

There are two common control methods for traffic signals at intersections [11]–[14]. The first method is a simple statistical method that operates in accordance with the statistical characteristics of the intersection traffic conditions. The second method is based on the durability of the roads, seasonal factors such as climate, and timing plans for morning and evening traffic. However, neither of these methods can successfully adapt to changing traffic conditions or to improper management of other intersections. It is difficult to use conventional methods to solve the traffic signal control problem; instead, this
task is better suited to the field of artificial intelligence and collaborative technologies. For example, fuzzy control, neural networks, and generative adversarial networks (GANs) have been utilized for this purpose. Although fuzzy control is convenient for real-time control, it lacks learning capabilities and validity over time. Moreover, although applications involving neural networks or GANs have yielded satisfactory results, adequate time for data training is required for such methods, and real-time response cannot be guaranteed.

Hence, we are motivated to use reinforcement learning [15], [16], which is suitable for the adaptive control of traffic signals. In reinforcement learning, an agent obtains useful information on current traffic conditions and then selects an appropriate strategy to interact with its environment. The environment returns feedback, and the agent then selects a more suitable strategy in accordance with the feedback. Such a method of learning for traffic signal control can run continuously. Therefore, the multiagent reinforcement learning for traffic signals (MARL4TS) method proposed in this paper has good learning capabilities, a satisfactory training speed, and a powerful instant feedback ability. MARL4TS utilizes communication among multiple intersections to achieve an optimal joint mode of multiagent learning. Thus, this method is beneficial for urban development and can play a significant role in guaranteeing the safety and smoothness of urban road traffic.

Based on the infrastructure of AIoT, we use the Q-learning algorithm, a reinforcement learning method, to perform timing control for urban regional traffic signals. The Q-learning algorithm does not estimate an environment model but rather directly optimizes an iteratively computable value function. When certain conditions are met, only simple strategies are needed to ensure convergence of the algorithm. These characteristics are consistent with the characteristics of urban regional traffic signal control. Thus, we model traffic flows and their changes as inputs representing environmental changes, and the traffic signals at intersections are optimized as the output behaviors for signal control. In summary, the contributions of this paper are detailed as follows:

1) We study the signal control of regional intersections using AIoT devices, for which information on traffic flows and intersections, including vehicle length, traffic capacity, and signal duration, can be formalized as the input environments for reinforcement learning.

2) We build a reinforcement learning model based on multiple agents and collaborative computing to address the problem of traffic congestion. When using the Q-learning algorithm to perform reinforcement learning, we select the most appropriate strategy for traffic signal control as feedback after evaluating the reward function.

3) We demonstrate that our method can reduce vehicle delay through simulations implemented using the Simulation of Urban MObility (SUMO) software package. Furthermore, we design an emergency vehicle routing scenario and perform comparative experiments to demonstrate the practical application of our method.

The remainder of this paper is organized as follows: Section II discusses related works. Section III shows how to model the traffic environment. Section IV introduces applications and the proposed techniques for traffic signal control. Section V presents the experimental results and provides analysis and discussion. We conclude this paper and identify topics of future research in Section VI.

II. RELATED WORK

Reinforcement learning has enabled many breakthroughs in theory and experiments in interdisciplinary studies involving psychology, intelligent computing, operations research and control theory. In particular, reinforcement learning has become the most promising method in the field of traffic signal control, and related research is flourishing. In this section, we present a review of the major techniques and methods most closely related to our work.

Some studies aim to optimize traffic efficiency and simulate traffic networks with real-world data. These works have achieved promising results. Qiao et al. [17] proposed a two-stage fuzzy logic control model for isolated signalized intersections that considers both traffic efficiency and fairness. An offline genetic algorithm (GA) was proposed to optimize the fuzzy rules and membership functions for the controller. Wei et al. [18] addressed traffic signal control using an improved reinforcement learning method. However, this approach cannot handle multiway traffic signal control. Liang et al. [19] aimed to address the problems of long traffic delays and wasted energy. By collecting traffic data and dividing a whole intersection into small grids, a complex traffic scene was quantified in terms of states. Chu et al. [20] integrated a macroscopic fundamental diagram (MFD) into their microscopic urban traffic flow model to constrain the search space for control policies. After construction of the MFD model, Q-learning was implemented to reduce the computational cost of the large-scale stochastic control problem. Rodrigues et al. [21] attempted to improve reliability in highly dynamic urban areas. Darmoul et al. [22] aimed to optimize a common conceptual framework and address the lack of a built-in adaptation mechanism to achieve specific interference management. They applied the concepts and mechanisms of biological immunity to build a distributed and adaptive traffic signal control system. Pandit et al. [23] employed vehicular ad hoc networks (VANETs) to collect and aggregate the speed and position information of each vehicle to optimize signal control at traffic intersections. They modeled traffic signal control as a job scheduling problem on processors. Baskar et al. [24] introduced a driving reimbursement coefficient and a delay time to estimate the effectiveness of a time distribution system and reduce the delay due to wait times. However, this approach does not perform well with traffic congestion.

Current traffic environments are dynamic and vary over time, making scheduled traffic signal timing methods obsolete. Zeng et al. [25] combined a recurrent neural network (RNN) with a deep Q-network (DQN) and introduced a variant deep reinforcement learning agent. This method takes advantage of real-time Global Positioning System (GPS) data and learns how to control traffic signals at a single intersection. Srinivasan et al. [26] proposed multiagent reinforcement learning based distributed unsupervised traffic signal control for
can collaborate to respond to changing traffic flows in order to take optimal actions in controlling the traffic signals. The agents work in accordance with the following steps. First, each agent reads the environmental information and obtains the state space at the initial time. Second, a strategy selection process based on a reward function is implemented to select a suitable action for the current state in the action space and calculate the return value Q using reinforcement learning. Third, each agent chooses an action to perform at its corresponding traffic intersection in accordance with the returned strategy. Fourth, by continuously repeating these processes, the agents output the optimal control strategy.

The signal control of intersections by agents is a training process with real-time feedback. The goal of the Q-learning algorithm is to maximize the expected cumulative reward, for which the value Q is the important element at each time instance. The Q-learning algorithm continues to apply policy selection actions until the loop is completed or the final state is reached. Fundamentally, a reward is given each time the process is updated; thus, there are n rewards, and the corresponding cumulative values, which are considered knowledge, can be used to determine which strategy to apply. This process is the core focus of our research.

B. Traffic Environment Information

At an intersection, the urban semaphore model comprises two-way control. There are four main phases of the signal lamps. Each lamp for each traffic direction has a certain duration in one specified period. In practice, the signal duration should be appropriately configured depending on the intersection environment. If the duration is too short, this is not conducive to vehicle safety; that is, the intersection will be prone to traffic accidents. If the duration is too long, the wait time of vehicles will be increased, leading to traffic congestion.

Fig. 2. Intersection phase model under different signal lamps.

We use the symbol i ∈ {1, 2, 3, 4} to represent the four different phases, as illustrated in Figure 2. Generally, a traffic signal uses three colors: red, green and amber. Since the amber signal duration does not affect the driving and stopping of the vehicles, this time is disregarded. We use the symbol r_i to represent the red signal duration in each of the four phases and the symbol g_i to represent the green signal duration in each of the four phases. We use the symbol C to represent the total semaphore period, that is,

    C = \sum_{i=1}^{4} r_i + \sum_{i=1}^{4} g_i    (1)

There are many indicators for evaluating signal control schemes, such as the number of vehicles that pass through an intersection in a unit of time, the vehicle delay time, and the vehicle queue length. In our study, the vehicle delay time serves as the basis for judging the reward function. We use the symbol d_i to represent the wait time for vehicles in each of the four phases. The vehicle delay time follows Eq. (2):

    d = \frac{C(1-\lambda)}{2(1-x)} + \frac{x^2}{2q(1-x)}    (2)

where λ = g/C is the green signal ratio, q is the traffic flow in the signal phase of interest, and x is the traffic saturation in this phase, as defined below.

Considering the balance between the time consumption of the traffic flow calculation and the time interval of traffic flow data collection, we use the vehicle length l_i in each phase within the time interval Δt to represent the traffic flow in this phase. The symbol D represents the maximum traffic capacity of the road. The traffic saturation x_i in a given phase is the ratio of the vehicle length l_i in that phase to the total traffic capacity D:

    x_i = \frac{l_i}{D}    (3)

    l_i(t + \Delta t) = l_i(t) + \Delta l_i(t) = l_i(t) + \Delta l_i    (4)

Based on the AIoT-based framework shown in Figure 1, AIoT devices play an important role in dynamically monitoring and computing the index parameters of interest during a period of time. For example, the vehicle length l_i as monitored and computed by AIoT devices is defined as the total length of all vehicles that pass the intersection in a certain phase during a given time interval. Inner-lane vehicles approaching an intersection have the option of either going straight or turning left; in the latter case, this may conflict with oncoming traffic. Note that the probabilities of these two scenarios in the inner lane at an intersection are equal.
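To make Eqs. (2)–(4) concrete, the short Python sketch below computes the per-phase saturation and the corresponding delay estimate from the quantities defined above. The variable names and the numerical values in the example are illustrative assumptions, not settings taken from the paper's implementation.

    def saturation(vehicle_length: float, capacity: float) -> float:
        """Eq. (3): traffic saturation x_i = l_i / D for one phase."""
        return vehicle_length / capacity

    def phase_delay(cycle_time: float, green_time: float,
                    flow: float, x: float) -> float:
        """Eq. (2): vehicle delay for one phase.

        cycle_time : total semaphore period C
        green_time : green duration g of the phase, so lambda = g / C
        flow       : traffic flow q in the phase
        x          : traffic saturation of the phase, from Eq. (3)
        """
        lam = green_time / cycle_time  # green signal ratio lambda = g / C
        uniform_term = cycle_time * (1.0 - lam) / (2.0 * (1.0 - x))
        overflow_term = x ** 2 / (2.0 * flow * (1.0 - x))
        return uniform_term + overflow_term

    # Example with assumed values: C = 120 s, g = 30 s, q = 0.3 veh/s,
    # measured vehicle length 40 m on a segment with capacity D = 100 m.
    x = saturation(vehicle_length=40.0, capacity=100.0)
    d = phase_delay(cycle_time=120.0, green_time=30.0, flow=0.3, x=x)
    print(f"saturation x = {x:.2f}, estimated delay d = {d:.1f} s")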
C. Reward Function

The reward function r(s, a) is the most important component in the Q-learning algorithm. However, establishing the reward function r(s, a) is difficult. We need to consider the feasibility of the reward function and guarantee that it is easy to compute and can satisfy the needs of the intended application. Here, the vehicle delay time, as an important metric for measuring traffic conditions, is used as a parameter in the reward function. The reward function r_t(s, a) is given in Eq. (5):

    r_t(s, a) = -(\phi \cdot d_t + \varphi \cdot l_t), \quad \text{s.t. } \phi + \varphi = 1    (5)

where r_t(s, a) is the reward obtained after adopting strategy a_t in state s_t. We can adjust φ and ϕ to output an optimized strategy. The parameter d_t is the vehicle delay time at time t. The parameter l_t is the length of the vehicles waiting at time t. Maximizing the reward therefore means minimizing this weighted time index. The cumulative discounted return is

    G_t = r_t(s, a) + \gamma r_{t+1}(s, a) + \gamma^2 r_{t+2}(s, a) + \cdots + \gamma^{T-t} r_{t+T}(s, a) = \sum_{i=0}^{T-t} \gamma^i r_{t+i}(s, a)    (6)
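As an illustration of Eqs. (5) and (6), the following Python sketch computes the per-step reward from a measured delay and queue length and accumulates the discounted return over an episode. The weights and the sample measurements are assumed values rather than settings reported by the paper.

    def step_reward(delay: float, queue_length: float,
                    phi: float = 0.5, varphi: float = 0.5) -> float:
        """Eq. (5): r_t = -(phi * d_t + varphi * l_t), with phi + varphi = 1."""
        assert abs(phi + varphi - 1.0) < 1e-9
        return -(phi * delay + varphi * queue_length)

    def discounted_return(rewards, gamma: float = 0.99) -> float:
        """Eq. (6): G_t = sum_i gamma^i * r_{t+i} over the remaining episode."""
        return sum((gamma ** i) * r for i, r in enumerate(rewards))

    # Assumed measurements for three consecutive control steps:
    # (delay time in seconds, waiting-vehicle length in meters).
    measurements = [(40.0, 35.0), (32.0, 28.0), (25.0, 20.0)]
    rewards = [step_reward(d, l) for d, l in measurements]
    print("rewards:", rewards)
    print("return G_t:", discounted_return(rewards))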
D. State-Action Pairs

When using reinforcement learning, descriptions of the state space, action space, and reward function of the problem must be given. The state space of the intersection at time k is expressed as s_k = (L_{k,1}, L_{k,2}, L_{k,3}, L_{k,4}), where L_{k,i} represents the time difference between the durations of two green signals in each phase. The state set is S(a) = {s_1, s_2, ..., s_k, ...}. In the case in which the state space is expressed in terms of the vehicle length, because the duration of the cycle has been determined, we choose the green signal ratio for signal control and use the different green signal ratios in the different phases as the action space. Therefore, the action space at time k is a_k = (g_1, g_2, g_3, g_4), and the action set is A(s) = {a_1, a_2, ..., a_k, ...}. The target formulae used to evaluate the strategy π supported by actions are expressed as follows:

    Q_{\pi}(s, a) = E[G_t \mid s_t = s, a_t = a, \pi], \quad \exists s \in S(a), \ \exists a \in A(s), \quad \text{s.t. } \arg\max_{a} Q_{\pi^*}(s, a)    (7)

from which it can be seen that the optimal strategy π* will maximize Q(s, a), which means that the current action used to control the traffic signals is the best choice.

IV. AGENT LEARNING APPLICATION

The multiagent reinforcement learning method is introduced to learn how to take the optimal actions for traffic signal control. In this section, we consider two different application scenarios: agent learning at a single intersection and agent learning for multiple intersections.

A. Modeling a Single Intersection for Agent Learning

As shown in Figure 3, in the learning process that we implement, each intersection is represented by an agent to perform reinforcement learning. As introduced in the traffic environment information section, the traffic volume, traffic saturation, and signal phase are encoded to represent the environment. The appropriate action for controlling the traffic signal is returned in terms of the green signal ratio. The green signal ratio is an important consideration for reducing the delay time.

For this learning process, the current traffic state is input to represent the traffic environment. We randomly select an action from the action set A(s) to act on the current state. Next, based on the received feedback, we force the traffic state to the next state in accordance with the updated Q value, traffic status and time. We then make a decision during the iterative process. More details are shown in Algorithm 1, Agent Learning for a Single Intersection (ALSI).

Fig. 3. Learning process based on information fusion for traffic signal control.

Algorithm 1 Agent Learning for a Single Intersection (ALSI)
Input: Current traffic state s, action set A(s)
Output: Action a_t for handling traffic signal control
1: while T in episode:
2:   Initialize s and Q(s, a) ← 0
3:   for step in each T
4:     Choose action a_t from A(s_t) via the ε-greedy method
5:     Take this action a_t and obtain feedback r_t
6:     Q_t(s_t, a_t) = (1 − β_t) Q_t(s_t, a_t) + β_t [r_t + γ max_a Q_t(s_{t+1}, a_t)]
7:     s_t ← s_{t+1} && t ← t + 1
8:   end for
9: end while

As the ALSI algorithm shows, for each episode, the first step is to reset the environmental state s and initialize the Q(s, a) value to 0. Second, for each step of the episode, the current traffic state s_t is observed, and a suitable action a_t is selected in accordance with the current strategy via the ε-greedy method. Action a is selected from the action set A(s) to act on the current state s_t and obtain feedback r_t. The state of the environment will vary with changes in traffic flow, and the traffic state of the environment will change to the next state s_{t+1} for the estimated value Q_t(s_t, a_t). Thus, the Q value is designed to be updated in accordance with the temporal difference (TD) method, that is,

    Q_t(s_t, a_t) = (1 - \beta_t) Q_t(s_t, a_t) + \beta_t \left[ r_t + \gamma \max_{a} Q_t(s_{t+1}, a_t) \right]    (8)

Because the reward function is expressed in terms of the delay, the goal of the strategy is to earn the maximum reward. Next, the state space and time phase are updated; that is, Q(s, a) is obtained to determine whether the value has changed from a small value to a large value, where β_t ∈ (0, 1) is the update rate; then, s ← s_{t+1} and t ← t + 1. Subsequently, the process loops through the previously described steps until the episode is complete.
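The sketch below is a minimal Python rendering of the tabular update in Algorithm 1 and Eq. (8). The environment interface (reset_fn, step_fn) and the discretized state representation are assumptions introduced for illustration; they stand in for the SUMO-based environment described later rather than reproducing the authors' implementation.

    import random
    from collections import defaultdict

    def alsi_q_learning(reset_fn, step_fn, actions, episodes=100,
                        steps_per_episode=120, beta=0.00001, gamma=0.99,
                        epsilon=0.1):
        """Tabular Q-learning loop following Algorithm 1 (ALSI) and Eq. (8).

        reset_fn()             -> initial (hashable) state, e.g. discretized queues
        step_fn(state, action) -> (reward, next_state), e.g. reward from Eq. (5)
        actions                -> list of candidate green-signal-ratio settings
        """
        q = defaultdict(float)  # Q(s, a), initialized to 0

        def greedy_action(state):
            return max(actions, key=lambda a: q[(state, a)])

        for _ in range(episodes):
            state = reset_fn()
            for _ in range(steps_per_episode):
                # epsilon-greedy action selection (line 4 of Algorithm 1)
                if random.random() < epsilon:
                    action = random.choice(actions)
                else:
                    action = greedy_action(state)
                # apply the action and observe the feedback (line 5)
                reward, next_state = step_fn(state, action)
                # temporal-difference update, Eq. (8) (line 6)
                best_next = max(q[(next_state, a)] for a in actions)
                q[(state, action)] = ((1.0 - beta) * q[(state, action)]
                                      + beta * (reward + gamma * best_next))
                state = next_state  # line 7
        return q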
B. Modeling Multiple Intersections for Agent Learning

Traffic information and decision making for a single intersection affect the surrounding traffic flow and other intersections. Therefore, an agent that controls multiple intersections must globally balance the traffic flow to achieve the shortest overall wait time and average queue length at intersections for the entire area. We consider multiagent reinforcement learning to better understand the traffic environment.
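The following Python fragment is only one plausible arrangement of such a multi-intersection setup, under the assumption that each intersection runs the ALSI update above and additionally fuses the delays reported by neighboring agents into its own reward, in the spirit of the information-fusion behavior discussed in the experiments. The weighting scheme is a hypothetical illustration.

    def fused_reward(own_delay, own_queue, neighbor_delays,
                     phi=0.5, varphi=0.5, neighbor_weight=0.2):
        """Assumed reward for one agent in a multi-intersection setting.

        The local term follows Eq. (5); the neighbor term is a hypothetical
        fusion of delays collected by nearby agents.
        """
        local = -(phi * own_delay + varphi * own_queue)
        if not neighbor_delays:
            return local
        neighbor_penalty = sum(neighbor_delays) / len(neighbor_delays)
        return local - neighbor_weight * neighbor_penalty

Each agent would then feed this fused reward into the same ε-greedy TD update used in the single-intersection case.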
A. SUMO-Based Simulation

To build the simulation, we used SUMO as a tool to support traffic scenario visualization. SUMO can simulate the precise movement of a single vehicle under the conditions of various transportation facilities at the micro level. When SUMO is employed as a test platform for intelligent traffic control algorithms, it needs to interact well with external programs. Thus, we used Python to connect SUMO with the Q-learning algorithm based on the Traffic Control Interface (TraCI) API. Figure 6 shows an integrated example of SUMO. In general, the SUMO connection process requires the following three steps:

1) Road Network and Routing Files: SUMO uses the SUMO graphical user interface (SUMO-GUI) as a visual interface. A road network file generated by the netconvert module, which accepts a node file and an edge file as inputs, is imported into SUMO-GUI. The user needs to define the node file to represent intersections and the edge file to represent roads.

2) Sumo.cfg File: While SUMO is running, road network files, routing files, etc., are the basic descriptions used, but the SUMO-GUI execution file will not autonomously call these files. Thus, these files must be integrated into a configuration file. The configuration file is named sumo.cfg and contains the important XML labels <net-file> and <route-files>.

3) TraCI API: The TraCI interface obtains data from the SUMO traffic simulation environment and modifies and controls these data in real time. Currently, the Python version of TraCI is the most comprehensive, followed by the C++, .NET, MATLAB, and Java versions.
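As a concrete illustration of step 3, the minimal TraCI loop below starts SUMO, advances the simulation, and adjusts a signal phase duration. The file name cross.sumocfg, the traffic-light ID "tl0", and the edge ID "edge_in" are placeholder assumptions standing in for whatever network and route files are produced in steps 1 and 2.

    import traci  # shipped with SUMO; SUMO_HOME/tools must be on PYTHONPATH

    # Start SUMO (use "sumo-gui" instead of "sumo" for the visual interface).
    traci.start(["sumo", "-c", "cross.sumocfg"])

    step = 0
    while step < 1000:
        traci.simulationStep()  # advance the simulation by one step
        # Read simple traffic measurements that a learning agent could use.
        halted = traci.edge.getLastStepHaltingNumber("edge_in")
        waiting = traci.edge.getWaitingTime("edge_in")
        # Example control action: lengthen the current green phase
        # when the queue or the accumulated waiting time grows.
        if halted > 10 or waiting > 120.0:
            traci.trafficlight.setPhaseDuration("tl0", 40.0)
        step += 1

    traci.close()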
B. Experimental Settings

To specify how the experiments were carried out, Table I describes the experimental environment. For example, the operating system was Windows 10, and the platform language was Python using the PyCharm development platform.

TABLE I: Platform Environment

1) Traffic Data Settings: The parameters of a road network include the potential left-turn conflicts at intersections, the number of lanes, traffic signals, the number of phases, intersections, the vehicle generation frequency, and the number of roads. The experimental parameter configuration for a single-intersection cross-traffic network is shown in Table II. Moreover, the experimental parameter configuration for a multiple-intersection traffic network is defined in Table III.

TABLE II: Parameter Settings for a Single Intersection

TABLE III: Parameter Settings for Multiple Intersections

Based on the road parameters listed in these tables, we set the maximum vehicle capacity D of each road segment to 100 and the cycle time C to 120. During the experiment, we performed a total of 1000 cycles.

2) Algorithm Parameter Settings: As previously mentioned, γ is the discount factor, with γ ∈ [0, 1]. If γ is set to a larger value, the agent selection strategy will be more focused on future returns; if γ is small, then the agent selection strategy will be more focused on immediate returns. In our experiments, β_t was set to 0.00001, and γ was set to 0.99.

3) Working Process and Data: Figure 7 shows the overall process. The first step is to simulate the traffic environment. The SUMO tool utilized in this study requires road network
files and route files as input to provide the desired road information. The second step is to run the multiagent reinforcement learning implementation with the related parameter settings. Note that the green signal time in each phase could be obtained only after training and was then input into SUMO to complete the test. During the learning process, the agent will return the optimal strategy for traffic signal control.

C. Experimental Results

We obtained the experimental results through SUMO-GUI and performed some basic preprocessing. Mousavi et al. [47] introduced a method for converting a snapshot from SUMO-GUI into desired experimental data, such as the average reward, delay time and queue length. We needed to convert each snapshot from a red-green-blue representation to a grayscale representation, resize it to 128 × 128 frames, and provide the results as input to the system. For compatibility with the Q-learning algorithm model, we used the DQN architecture, with the output layer corresponding to the action value. The training time for this network was 1000 cycles. The agent learning strategy was evaluated every 5 training cycles. After running SUMO-GUI 5 times, data such as the average delay time and queue length were obtained.
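A minimal version of the snapshot preprocessing described above might look like the following. It assumes the SUMO-GUI screenshots have been saved as image files (the file name is a placeholder) and uses Pillow and NumPy, which are not named in the paper.

    import numpy as np
    from PIL import Image

    def preprocess_snapshot(path: str) -> np.ndarray:
        """Convert an RGB SUMO-GUI snapshot to a 128 x 128 grayscale frame in [0, 1]."""
        img = Image.open(path).convert("L")   # RGB -> grayscale
        img = img.resize((128, 128))          # resize to 128 x 128
        return np.asarray(img, dtype=np.float32) / 255.0

    frame = preprocess_snapshot("snapshot_0001.png")  # placeholder file name
    print(frame.shape)  # (128, 128)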
1) Performance Analysis: Based on the simulation environment described above, control experiments using Q-learning were carried out. To compare the performance of the Q-learning algorithm under different conditions, we applied the Q-learning algorithm with different parameters. The parameters that we varied included the frequency of vehicle generation. For the different phases, we ran SUMO-GUI simulations using the previously described configuration settings. Table IV shows the results, from which we can compare the average delay time and average queue length when the number of cycles, the cycle time and the maximum lane capacity are changed.

TABLE IV: Performance Comparison of the Q-Learning Algorithm With Different Parameters

As shown in Table IV, as the vehicle generation frequency increases, the average delay time and average queue length also increase. For example, the configuration (0.5, 1000, 120, 100) is worse than (0.3, 1000, 120, 100). However, when the vehicle generation frequency is fixed, a longer cycle time results in a lower average delay time and a shorter average queue length. Thus, better results may be achieved by appropriately extending the cycle time. In our experiments, the best configuration for the delay time was found to be (0.5, 1000, 120, 50), and the best configuration for the queue length was found to be (0.3, 1000, 120, 100). Moreover, the maximum capacity of each lane is also a key factor. Similarly, we can conclude that it is better to increase the lane capacity where possible, which should be considered when traffic signals are deployed, mainly based on location and deployment issues. On the other hand, the average queue length slowly increases with increasing complexity of the road information. This increase reflects the stability of the agents using the Q-learning algorithm.

2) Comparative Analysis: Next, we compared four traffic signal control methods. The other three traffic signal control methods considered for comparison are fixed timing control, random timing control and longest queue control. Fixed timing and random timing are different options regarding the green time ratio at an intersection. The first option is to choose a fixed green time ratio, and the second option is to choose a random green time ratio. The longest queue control method refers to selecting the phase timing based on the longest queue length when timing each phase at an intersection. We ran the SUMO-GUI simulator using the previously described method again and compared Q-learning with the fixed timing, random timing, and longest queue algorithms to evaluate the average delay time and queue length.
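To make the three baselines concrete, the sketch below gives one possible Python rendering of each policy. The phase count, the queue measurements, and the candidate green-time ratios are illustrative assumptions rather than the exact settings used in the experiments.

    import random

    GREEN_RATIOS = [0.15, 0.20, 0.25, 0.40]  # assumed candidate green-time ratios

    def fixed_timing(_queues):
        """Fixed timing control: always use the same green-time ratio per phase."""
        return [0.25, 0.25, 0.25, 0.25]

    def random_timing(_queues):
        """Random timing control: pick a random green-time ratio for each phase."""
        return [random.choice(GREEN_RATIOS) for _ in range(4)]

    def longest_queue(queues):
        """Longest queue control: give the largest green share to the phase
        with the longest queue, splitting the rest evenly."""
        longest = max(range(4), key=lambda i: queues[i])
        return [0.4 if i == longest else 0.2 for i in range(4)]

    print(longest_queue([12, 30, 8, 5]))  # e.g. [0.2, 0.4, 0.2, 0.2]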
TABLE V: Results of Algorithms Used for Traffic Signal Control

Table V and Figure 8 compare the results in terms of the average delay time and the average queue length when the different algorithms are used. These results indicate that when the Q-learning algorithm is employed for traffic signal timing at intersections, the average delay time and average queue length are much lower than those under the other three algorithms. The results also demonstrate the efficiency of the Q-learning algorithm in optimizing urban traffic signal control. In multiagent reinforcement learning, an agent can not only collect traffic environment information for its own node but also learn through information fusion based on information collected by other agents at nearby nodes.
3) Emergency Vehicle Routing Scenario: We also performed an experiment based on a real-world road network to illustrate the performance of the proposed Q-learning algorithm and
system [50], [51] using probabilistic model checking, for example, before deploying a traffic signal control strategy, to quantitatively verify the reliability and correctness of the returned strategy.

REFERENCES

[1] H. Gao, C. Liu, Y. Li, and X. Yang, "V2VR: Reliable hybrid-network-oriented V2V data transmission and routing considering RSUs and connectivity probability," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 6, pp. 3533–3546, Jun. 2021, doi: 10.1109/TITS.2020.2983835.
[2] X. Xu, J. Zhao, Y. Li, H. Gao, and X. Wang, "BANet: A balanced atrous net improved from SSD for autonomous driving in smart transportation," IEEE Sensors J., early access, Oct. 28, 2020, doi: 10.1109/JSEN.2020.3034356.
[3] Z. Bi, L. Yu, H. Gao, P. Zhou, and H. Yao, "Improved VGG model-based efficient traffic sign recognition for safe driving in 5G scenarios," Int. J. Mach. Learn. Cybern., pp. 1–12, Aug. 2020, doi: 10.1007/s13042-020-01185-5.
[4] Z. Liu and W. Liang, "Advances in urban traffic signal control," J. Highway Transp. Res. Develop., vol. 20, no. 6, pp. 121–125, 2003.
[5] C. Chen, Z. Liu, S. Wan, J. Luan, and Q. Pei, "Traffic flow prediction based on deep learning in internet of vehicles," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 6, pp. 3776–3789, Jun. 2020.
[6] L. Kuang et al., "Traffic volume prediction based on multi-sources GPS trajectory data by temporal convolutional network," Mobile Netw. Appl., vol. 25, no. 4, pp. 1405–1417, Aug. 2020, doi: 10.1007/s11036-019-01458-6.
[7] H. Abuella, F. Miramirkhani, S. Ekin, M. Uysal, and S. Ahmed, "ViLDAR—Visible light sensing-based speed estimation using vehicle headlamps," IEEE Trans. Veh. Technol., vol. 68, no. 11, pp. 10406–10417, Nov. 2019.
[8] C. Chen, B. Liu, S. Wan, P. Qiao, and Q. Pei, "An edge traffic flow detection scheme based on deep learning in an intelligent transportation system," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 3, pp. 1840–1852, Mar. 2021.
[9] J. K. Naufal et al., "A2CPS: A vehicle-centric safety conceptual framework for autonomous transport systems," IEEE Trans. Intell. Transp. Syst., vol. 19, no. 6, pp. 1925–1939, Jun. 2018.
[10] D. Zhao et al., "Accelerated evaluation of automated vehicles safety in lane-change scenarios based on importance sampling techniques," IEEE Trans. Intell. Transp. Syst., vol. 18, no. 3, pp. 595–607, Mar. 2017.
[11] L. Qi, M. Zhou, and W. Luan, "Emergency traffic-light control system design for intersections subject to accidents," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 1, pp. 170–183, Jan. 2016.
[12] Y. Zhang, Y. Zhang, and R. Su, "Pedestrian-safety-aware traffic light control strategy for urban traffic congestion alleviation," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 1, pp. 178–193, Jan. 2021.
[13] Z. Ouyang, J. Niu, Y. Liu, and M. Guizani, "Deep CNN-based real-time traffic light detector for self-driving vehicles," IEEE Trans. Mobile Comput., vol. 19, no. 2, pp. 300–313, Feb. 2020.
[14] Y.-T. Chuang, C.-W. Yi, Y.-C. Tseng, C.-S. Nian, and C.-H. Ching, "Discovering phase timing information of traffic light systems by stop-go shockwaves," IEEE Trans. Mobile Comput., vol. 14, no. 1, pp. 58–71, Jan. 2015.
[15] T. Tan, F. Bao, Y. Deng, A. Jin, Q. Dai, and J. Wang, "Cooperative deep reinforcement learning for large-scale traffic grid signal control," IEEE Trans. Cybern., vol. 50, no. 6, pp. 2687–2700, Jun. 2020.
[16] L. A. Prashanth and S. Bhatnagar, "Reinforcement learning with function approximation for traffic signal control," IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 412–421, Jun. 2011.
[17] J. Qiao, N. Yang, and J. Gao, "Two-stage fuzzy logic controller for signalized intersection," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 41, no. 1, pp. 178–184, Jan. 2011.
[18] H. Wei, G. Zheng, H. Yao, and Z. Li, "IntelliLight: A reinforcement learning approach for intelligent traffic light control," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2018, pp. 2496–2505.
[19] X. Liang, X. Du, G. Wang, and Z. Han, "A deep reinforcement learning network for traffic light cycle control," IEEE Trans. Veh. Technol., vol. 68, no. 2, pp. 1243–1253, Feb. 2019.
[20] T. Chu and J. Wang, "Traffic signal control with macroscopic fundamental diagrams," in Proc. Amer. Control Conf. (ACC), Jul. 2015, pp. 4380–4385.
[21] F. Rodrigues and C. L. Azevedo, "Towards robust deep reinforcement learning for traffic signal control: Demand surges, incidents and sensor failures," in Proc. IEEE Intell. Transp. Syst. Conf. (ITSC), Oct. 2019, pp. 3559–3566.
[22] S. Darmoul, S. Elkosantini, A. Louati, and L. B. Said, "Multi-agent immune networks to control interrupted flow at signalized intersections," Transp. Res. C, Emerg. Technol., vol. 82, pp. 290–313, Sep. 2017.
[23] K. Pandit, D. Ghosal, H. M. Zhang, and C. N. Chuah, "Adaptive traffic signal control with vehicular ad hoc networks," IEEE Trans. Veh. Technol., vol. 62, no. 4, pp. 1459–1471, May 2013.
[24] L. D. Baskar, B. de Schutter, J. Hellendoorn, and Z. Papp, "Traffic control and intelligent vehicle highway systems: A survey," IET Intell. Transp. Syst., vol. 5, no. 1, pp. 38–52, Mar. 2011.
[25] J. Zeng, J. Hu, and Y. Zhang, "Adaptive traffic signal control with deep recurrent Q-learning," in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2018, pp. 1215–1220.
[26] D. Srinivasan, M. C. Choy, and R. L. Cheu, "Neural networks for real-time traffic signal control," IEEE Trans. Intell. Transp. Syst., vol. 7, no. 3, pp. 261–272, Sep. 2006.
[27] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Design of reinforcement learning parameters for seamless application of adaptive traffic signal control," J. Intell. Transp. Syst., vol. 18, no. 3, pp. 227–245, Jul. 2014.
[28] S. S. Mousavi, M. Schukat, and E. Howley, "Traffic light control using deep policy-gradient and value-function-based reinforcement learning," IET Intell. Transp. Syst., vol. 11, no. 7, pp. 417–423, 2017.
[29] L. Li, Y. Lv, and F.-Y. Wang, "Traffic signal timing via deep reinforcement learning," IEEE/CAA J. Autom. Sinica, vol. 3, no. 3, pp. 247–254, Apr. 2016.
[30] Z. Wang, T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, and N. Freitas, "Dueling network architectures for deep reinforcement learning," in Proc. Int. Conf. Mach. Learn., 2016, pp. 1995–2003.
[31] J. Jin and X. Ma, "Hierarchical multi-agent control of traffic lights based on collective learning," Eng. Appl. Artif. Intell., vol. 68, pp. 236–248, Feb. 2018.
[32] D. A. Vidhate and P. Kulkarni, "Cooperative multi-agent reinforcement learning models (CMRLM) for intelligent traffic control," in Proc. 1st Int. Conf. Intell. Syst. Inf. Manage. (ICISIM), Oct. 2017, pp. 325–331.
[33] T. Chu, J. Wang, L. Codecà, and Z. Li, "Multi-agent deep reinforcement learning for large-scale traffic signal control," IEEE Trans. Intell. Transp. Syst., vol. 21, no. 3, pp. 1086–1095, Mar. 2020.
[34] M. A. Wiering, "Multi-agent reinforcement learning for traffic light control," in Proc. Int. Conf. Mach. Learn. (ICML), 2000, pp. 1151–1158.
[35] K. J. Prabuchandran, A. N. H. Kumar, and S. Bhatnagar, "Multi-agent reinforcement learning for traffic signal control," in Proc. 17th Int. IEEE Conf. Intell. Transp. Syst. (ITSC), Oct. 2014, pp. 2529–2534.
[36] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): Methodology and large-scale application on downtown Toronto," IEEE Trans. Intell. Transp. Syst., vol. 14, no. 3, pp. 1140–1150, Sep. 2013.
[37] M. Aslani, M. S. Mesgari, and M. Wiering, "Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events," Transp. Res. C, Emerg. Technol., vol. 85, pp. 732–752, Dec. 2017.
[38] T. Wu et al., "Multi-agent deep reinforcement learning for urban traffic light control in vehicular networks," IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 8243–8256, Aug. 2020.
[39] J. Ma and F. Wu, "Feudal multi-agent deep reinforcement learning for traffic signal control," in Proc. 19th Int. Conf. Auton. Agents Multiagent Syst. (AAMAS), 2020, pp. 816–824.
[40] T. Wang, T. Liang, J. Li, W. Zhang, Y. Zhang, and Y. Lin, "Adaptive traffic signal control using distributed MARL and federated learning," in Proc. IEEE 20th Int. Conf. Commun. Technol. (ICCT), Oct. 2020, pp. 1242–1248.
[41] C. Chen et al., "Toward a thousand lights: Decentralized deep reinforcement learning for large-scale traffic signal control," in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 4, pp. 3414–3421.
[42] H. Wei et al., "PressLight: Learning max pressure control to coordinate traffic signals in arterial network," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 1290–1298.
[43] X. Zang, H. Yao, G. Zheng, N. Xu, K. Xu, and Z. Li, "MetaLight: Value-based meta-reinforcement learning for traffic signal control," in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 1, pp. 1153–1160.
[44] T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, "Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications," IEEE Trans. Cybern., vol. 50, no. 9, pp. 3826–3839, Sep. 2020.
[45] S.-M. Hung and S. N. Givigi, "A Q-learning approach to flocking with UAVs in a stochastic environment," IEEE Trans. Cybern., vol. 47, no. 1, pp. 186–197, Jan. 2017.
[46] L. Cui, X. Wang, and Y. Zhang, "Reinforcement learning-based asymptotic cooperative tracking of a class multi-agent dynamic systems using neural networks," Neurocomputing, vol. 171, pp. 220–229, Jan. 2016.
[47] S. S. Mousavi, M. Schukat, and E. Howley, "Deep reinforcement learning: An overview," in Proc. SAI Intell. Syst. Conf., 2018, pp. 426–440.
[48] W. Wei, R. Yang, H. Gu, W. Zhao, C. Chen, and S. Wan, "Multi-objective optimization for resource allocation in vehicular cloud computing network," IEEE Trans. Intell. Transp. Syst., early access, Aug. 3, 2021, doi: 10.1109/TITS.2021.3091321.
[49] L. Kuang, T. Gong, S. OuYang, H. Gao, and S. Deng, "Offloading decision methods for multiple users with structured tasks in edge computing for smart cities," Future Gener. Comput. Syst., vol. 105, pp. 717–729, Apr. 2020, doi: 10.1016/j.future.2019.12.039.
[50] H. Gao, W. Huang, and X. Yang, "Applying probabilistic model checking to path planning in an intelligent transportation system using mobility trajectories and their statistical data," Intell. Automat. Soft Comput., vol. 25, no. 3, pp. 547–559, Jan. 2019.
[51] H. Gao, H. Miao, L. Liu, J. Kai, and K. Zhao, "Automated quantitative verification for service-based system design: A visualization transform tool perspective," Int. J. Softw. Eng. Knowl. Eng., vol. 28, no. 10, pp. 1369–1397, 2018.

Xiaoxian Yang received the Ph.D. degree in management science and engineering from Shanghai University, China, in 2017. She is currently an Assistant Professor with Shanghai Polytechnic University, China. She has published more than 20 articles in academic journals, such as IEEE Transactions on Intelligent Transportation Systems, IEEE Transactions on Computational Social Systems, ACM TOMM, ACM TOIT, FGCS, MONET, IJSEKE, and COMNET. She has obtained two patents and three registered software copyrights in China involving wireless networks, workflow management, and formal verification. Her research interests include business process management, formal verification, wireless networks, and mobile health. She has participated in organizing international conferences and workshops, such as CollaborateCom 2018–2020, ChinaCom 2019–2020, and Mobicase 2019–2020. She has worked as a Guest Editor of MONET, WINE, and JIT, and served as a Reviewer for IEEE Transactions on Industrial Informatics, IEEE Transactions on Intelligent Transportation Systems, IEEE Transactions on Automation Science and Engineering, IEEE Journal of Biomedical and Health Informatics, FGCS, PPNA, Wireless Networks, and Computer Networks.

Yueshen Xu (Member, IEEE) received the Ph.D. degree from Zhejiang University. He was a co-trained Ph.D. student at the University of Illinois at Chicago. He is currently an Associate Professor with the School of Computer Science and Technology, Xidian University. He has published more than 40 papers in a series of international conference proceedings and journals. His research interests include mobile computing, the Internet of Things technology, edge computing, and recommender systems. He is a Reviewer and a PC Member of many journals and conferences, such as IEEE ICDCS, Mobile Networks and Applications, Neurocomputing, and Knowledge-Based Systems.

Zhiying Wang is currently pursuing the master's degree with the School of Computer Science and Technology, Xidian University. Her research interests include mobile computing, sensor networks, and service computing.

Li Kuang received the Ph.D. degree in computer science from Zhejiang University, China, in 2009. She is currently a Professor with the School of Computer Science and Engineering, Central South University. Her research interests include service computing, mobile computing, and privacy preservation.

Honghao Gao (Senior Member, IEEE) received the Ph.D. degree in computer science from Shanghai University, China, in 2012. He started his academic career at Shanghai University in 2012. He is currently with the School of Computer Engineering and Science, Shanghai University. He is also a Professor at Gachon University, South Korea. Prior to that, he was a Research Fellow with the Software Engineering Information Technology Institute, Central Michigan University, USA, and an Adjunct Professor at Hangzhou Dianzi University, China. Moreover, he has broad working experience in cooperative industry-university research. He is a European Union Institutions-appointed external expert for reviewing and monitoring EU projects. He has published articles in IEEE Transactions on Industrial Informatics, IEEE Transactions on Intelligent Transportation Systems, IEEE Internet of Things Journal, IEEE Transactions on Network Science and Engineering, IEEE Transactions on Cognitive Communications and Networking, IEEE Transactions on Green Communications and Networking, IEEE Transactions on Computational Social Systems, IEEE Transactions on Emerging Topics in Computational Intelligence, IEEE/ACM Transactions on Computational Biology and Bioinformatics, ACM TOIT, ACM TOMM, ACM TMIS, IEEE Journal of Biomedical and Health Informatics, and IEEE Network Magazine. His research interests include software formal verification, Industrial IoT networks, vehicular communication, and intelligent medical image processing. Prof. Gao is a fellow of IET, BCS, and EAI, a Senior Member of CCF and CAAI, a member of the EPSRC Peer Review Associate College for U.K. Research and Innovation, U.K., and a Founding Member of the IEEE Computer Society Smart Manufacturing Standards Committee. He was a recipient of the Best Paper Award at IEEE Transactions on Industrial Informatics in 2020 and EAI CollaborateCom in 2020. He is the Editor-in-Chief of the International Journal of Intelligent Internet of Things Computing (IJIITC), an Editor of Wireless Networks (WINE) and IET Wireless Sensor Systems (IET WSS), and an Associate Editor of IEEE Transactions on Intelligent Transportation Systems (IEEE T-ITS), IET Intelligent Transport Systems (IET ITS), IET Software, the International Journal of Communication Systems (IJCS), the Journal of Internet Technology (JIT), and Engineering Reports (EngReports).

Xuejie Wang is currently pursuing the M.S. degree in computer science with the School of Computer Engineering and Science, Shanghai University, China. His research interests include reinforcement learning and edge computing.