An Information Fusion Approach to Intelligent Traffic Signal Control Using the Joint Methods of Multiagent Reinforcement Learning and Artificial Intelligence of Things

Abstract— With the development of communication technology and the artificial intelligence of things (AIoT), transportation systems have become much smarter than ever before. However, the volumes of vehicles and traffic flows have rapidly increased. Optimizing and improving urban traffic signal control is a potential way to relieve traffic congestion. In general, traffic signal control is a sequential decision process that conforms to the characteristics of reinforcement learning, in which an agent constantly interacts with its environment and thereby obtains a strategy for optimizing its behavior in accordance with the feedback it receives. In this paper, we propose multiagent reinforcement learning for traffic signals (MARL4TS) to support the control and deployment of traffic signals. First, information on traffic flows and multiple intersections is formalized as the input environment for reinforcement learning. Second, we design a new reward function to continuously select the most appropriate control strategy during multiagent learning and thereby track actions for the traffic signals. Finally, we use a supporting tool, Simulation of Urban MObility (SUMO), to simulate the proposed traffic signal control process and compare it with other methods. The experimental results show that our proposed MARL4TS method is superior to the baselines. In particular, our method can reduce vehicle delay.

Index Terms— Traffic signal control, artificial intelligence of things, collaborative computing, multiagent reinforcement learning, information fusion.

Manuscript received 1 February 2021; revised 1 May 2021 and 24 July 2021; accepted 13 August 2021. Date of publication 30 August 2021; date of current version 8 July 2022. This work was supported by the National Natural Science Foundation of China under Grant 61902236. The Associate Editor for this article was S. Wan. (Corresponding authors: Yueshen Xu; Honghao Gao.)
Xiaoxian Yang is with the School of Computer and Information Engineering, Shanghai Polytechnic University, Shanghai 201209, China (e-mail: [email protected]).
Yueshen Xu and Zhiying Wang are with the School of Computer Science and Technology, Xidian University, Xi'an 710126, China (e-mail: [email protected]; [email protected]).
Li Kuang is with the School of Computer Science and Engineering, Central South University, Changsha 410075, China (e-mail: [email protected]).
Honghao Gao and Xuejie Wang are with the School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China (e-mail: [email protected]; [email protected]).
Digital Object Identifier 10.1109/TITS.2021.3105426

I. INTRODUCTION

THE goal of automated driving is to integrate advanced sensors, embedded cameras, fifth-generation/sixth-generation (5G/6G) wireless communication, traffic signals, and cloud-fog-edge computing to collaboratively enable the intelligent navigation of vehicles in a safe way [1], [2] without human interaction. Currently, a large number of artificial intelligence of things (AIoT) devices are deployed in transportation systems to support information collection and situation awareness. However, another problem has arisen: the volume of cars has substantially increased, leading to a variety of traffic problems, such as road congestion, traffic accidents and traffic disputes [3]–[5], and travel efficiency is decreasing. Additionally, slow driving and long wait times for vehicles at intersections contribute to poor air quality. We consider the idea of AIoT-based traffic signal control, in which urban traffic signals are controlled by dynamic forecasts and feedback from AIoT monitoring and data mining. Vehicle-to-roadside (V2R) collaborative computing is considered to relieve the traffic congestion problem. Traffic signals control and regulate vehicles in urban road networks and at intersections to improve the travel efficiency of users and the overall efficiency of traffic circulation [6]–[8]. The principle of urban traffic signalization is to control and optimize the roles of vehicles and pedestrians at intersections by adjusting the duration of each traffic signal. One important goal is to ensure the safety of urban traffic [9], [10]. Vehicles and pedestrians should be able to pass through intersections in a safe and orderly manner without interfering with each other. However, due to the randomness, complexity and time variability of urban traffic flows, existing traffic signal control methods cannot meet the needs of current traffic conditions to solve the problem of urban traffic congestion. Therefore, an efficient method is needed to improve the utilization rate of roads and intersections. Predicting traffic flow is a challenging research topic for controlling traffic signals in intelligent transportation systems.

There are two common control methods for traffic signals at intersections [11]–[14]. The first method is a simple statistical method that operates in accordance with the statistical characteristics of the intersection traffic conditions. The second method is based on the durability of the roads, seasonal factors such as climate, and timing plans for morning and evening traffic. However, neither of these methods can successfully adapt to changing traffic conditions or to improper management of other intersections. It is difficult to use conventional methods to solve the traffic signal control problem; instead, this
task is better suited to the field of artificial intelligence and collaborative technologies. For example, fuzzy control, neural networks, and generative adversarial networks (GANs) have been utilized for this purpose. Although fuzzy control is convenient for real-time control, it lacks learning capabilities and validity over time. Moreover, although applications involving neural networks or GANs have yielded satisfactory results, adequate time for data training is required for such methods, and real-time response cannot be guaranteed.

Hence, we are motivated to use reinforcement learning [15], [16], which is suitable for the adaptive control of traffic signals. In reinforcement learning, an agent obtains useful information on current traffic conditions and then selects an appropriate strategy to interact with its environment. The environment returns feedback, and the agent then selects a more suitable strategy in accordance with the feedback. Such a method of learning for traffic signal control can run continuously. Therefore, the multiagent reinforcement learning for traffic signals (MARL4TS) method proposed in this paper has good learning capabilities, a satisfactory training speed, and a powerful instant feedback ability. MARL4TS utilizes communication among multiple intersections to achieve an optimal joint mode of multiagent learning. Thus, this method is beneficial for urban development and can play a significant role in guaranteeing the safety and smoothness of urban road traffic.

Based on the infrastructure of AIoT, we use the Q-learning algorithm, a reinforcement learning method, to perform timing control for urban regional traffic signals. The Q-learning algorithm does not estimate an environment model but rather directly optimizes an iteratively computable value function. When certain conditions are met, only simple strategies are needed to ensure convergence of the algorithm. These characteristics are consistent with the characteristics of urban regional traffic signal control. Thus, we model traffic flows and their changes as inputs representing environmental changes, and the traffic signals at intersections are optimized as the output behaviors for signal control. In summary, the contributions of this paper are detailed as follows:

1) We study the signal control of regional intersections using AIoT devices, for which information on traffic flows and intersections, including vehicle length, traffic capacity, and signal duration, can be formalized as the input environments for reinforcement learning.

2) We build a reinforcement learning model based on multiple agents and collaborative computing to address the problem of traffic congestion. When using the Q-learning algorithm to perform reinforcement learning, we select the most appropriate strategy for traffic signal control as feedback after evaluating the reward function.

3) We demonstrate that our method can reduce vehicle delay through simulations implemented using the Simulation of Urban MObility (SUMO) software package. Furthermore, we design an emergency vehicle routing scenario and perform comparative experiments to demonstrate the practical application of our method.

The remainder of this paper is organized as follows: Section II discusses related works. Section III shows how to model the traffic environment. Section IV introduces applications and the proposed techniques for traffic signal control. Section V presents the experimental results and provides analysis and discussion. We conclude this paper and identify topics of future research in Section VI.

II. RELATED WORK

Reinforcement learning has enabled many breakthroughs in theory and experiments in interdisciplinary studies involving psychology, intelligent computing, operations research and control theory. In particular, reinforcement learning has become the most promising method in the field of traffic signal control, and related research is flourishing. In this section, we present a review of the major techniques and methods most closely related to our work.

Some studies aim to optimize traffic efficiency and simulate traffic networks with real-world data. These works have achieved promising results. Qiao et al. [17] proposed a two-stage fuzzy logic control model for isolated signalized intersections that considers both traffic efficiency and fairness. An offline genetic algorithm (GA) was proposed to optimize the fuzzy rules and membership functions for the controller. Wei et al. [18] addressed traffic signal control using an improved reinforcement learning method. However, this approach cannot handle multiway traffic signal control. Liang et al. [19] aimed to address the problems of long traffic delays and wasted energy. By collecting traffic data and dividing a whole intersection into small grids, a complex traffic scene was quantified in terms of states. Chu et al. [20] integrated a macroscopic fundamental diagram (MFD) into their microscopic urban traffic flow model to constrain the search space for control policies. After construction of the MFD model, Q-learning was implemented to reduce the computational cost of the large-scale stochastic control problem. Rodrigues et al. [21] attempted to improve reliability in highly dynamic urban areas. Darmoul et al. [22] aimed to optimize a common conceptual framework and address the lack of a built-in adaptation mechanism to achieve specific interference management. They applied the concepts and mechanisms of biological immunity to build a distributed and adaptive traffic signal control system. Pandit et al. [23] employed vehicular ad hoc networks (VANETs) to collect and aggregate the speed and position information of each vehicle to optimize signal control at traffic intersections. They modeled traffic signal control as a job scheduling problem on processors. Baskar et al. [24] introduced a driving reimbursement coefficient and a delay time to estimate the effectiveness of a time distribution system and reduce the delay due to wait times. However, this approach does not perform well with traffic congestion.

Current traffic environments are dynamic and vary over time, making scheduled traffic signal timing methods obsolete. Zeng et al. [25] combined a recurrent neural network (RNN) with a deep Q-network (DQN) and introduced a variant deep reinforcement learning agent. This method takes advantage of real-time Global Positioning System (GPS) data and learns how to control traffic signals at a single intersection. Srinivasan et al. [26] proposed multiagent reinforcement learning based distributed unsupervised traffic signal control for
can collaborate to respond to changing traffic flows in order to take optimal actions in controlling the traffic signals. The agents work in accordance with the following steps. First, each agent reads the environmental information and obtains the state space at the initial time. Second, a strategy selection process based on a reward function is implemented to select a suitable action for the current state in the action space and calculate the return value Q using reinforcement learning. Third, each agent chooses an action to perform at its corresponding traffic intersection in accordance with the returned strategy. Fourth, by continuously repeating these processes, the agents output the optimal control strategy.

The signal control of intersections by agents is a training process with real-time feedback. The goal of the Q-learning algorithm is to maximize the expected cumulative reward, for which the value Q is the important element at each time instance. The Q-learning algorithm continues to apply policy selection actions until the loop is completed or the final state is reached. Fundamentally, a reward is given each time the process is updated; thus, there are n rewards, and the corresponding cumulative values, which are considered knowledge, can be used to determine which strategy to apply. This process is the core focus of our research.

B. Traffic Environment Information

At an intersection, the urban semaphore model comprises two-way control. There are four main phases of the signal lamps. Each lamp for each traffic direction has a certain duration in one specified period. In practice, the signal duration should be appropriately configured depending on the intersection environment. If the duration is too short, this is not conducive to vehicle safety; that is, the intersection will be prone to traffic accidents. If the duration is too long, the wait time of vehicles will be increased, leading to traffic congestion.

Fig. 2. Intersection phase model under different signal lamps.

We use the symbol i ∈ {1, 2, 3, 4} to represent the four different phases, as illustrated in Figure 2. Generally, a traffic signal uses three colors: red, green and amber. Since the amber signal duration does not affect the driving and stopping of the vehicles, this time is disregarded. We use the symbol r_i to represent the red signal duration in each of the four phases and the symbol g_i to represent the green signal duration in each of the four phases. We use the symbol C to represent the total semaphore period, that is,

    C = \sum_{i=1}^{4} r_i + \sum_{i=1}^{4} g_i    (1)

There are many indicators for evaluating signal control schemes, such as the number of vehicles that pass through an intersection in a unit of time, the vehicle delay time, and the vehicle queue length. In our study, the vehicle delay time serves as the basis for judging the reward function. We use the symbol d_i to represent the wait time for vehicles in each of the four phases. The vehicle delay time follows Eq. (2):

    d = \frac{C(1-\lambda)}{2(1-x)} + \frac{x^2}{2q(1-x)}    (2)

where λ = g/C is the green signal ratio, q is the traffic flow in the signal phase of interest, and x is the traffic saturation in this phase, as defined below.

Considering the balance between the time consumption of the traffic flow calculation and the time interval of traffic flow data collection, we use the vehicle length l_i in each phase within the time interval Δt to represent the traffic flow in this phase. The symbol D represents the maximum traffic capacity of the road. The traffic saturation x_i in a given phase is the ratio of the vehicle length l_i in that phase to the total traffic capacity D:

    x_i = \frac{l_i}{D}    (3)

    l_i(t + \Delta t) = l_i(t) + \Delta l_i(t) = l_i(t) + \Delta l_i    (4)

Based on the AIoT-based framework shown in Figure 1, AIoT devices play an important role in dynamically monitoring and computing the index parameters of interest during a period of time. For example, the vehicle length l_i as monitored and computed by AIoT devices is defined as the total length of all vehicles that pass the intersection in a certain phase during a given time interval. Inner-lane vehicles approaching an intersection have the option of either going straight or turning left; in the latter case, this may conflict with oncoming traffic. Note that the probabilities of these two scenarios in the inner lane at an intersection are equal.
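To make Eqs. (2)–(4) concrete, the short Python sketch below computes the per-phase saturation and the corresponding delay estimate from the quantities defined above. The variable names and the numerical values in the example are illustrative assumptions, not settings taken from the paper's implementation.

    def saturation(vehicle_length: float, capacity: float) -> float:
        """Eq. (3): traffic saturation x_i = l_i / D for one phase."""
        return vehicle_length / capacity

    def phase_delay(cycle_time: float, green_time: float,
                    flow: float, x: float) -> float:
        """Eq. (2): vehicle delay for one phase.

        cycle_time : total semaphore period C
        green_time : green duration g of the phase, so lambda = g / C
        flow       : traffic flow q in the phase
        x          : traffic saturation of the phase, from Eq. (3)
        """
        lam = green_time / cycle_time  # green signal ratio lambda = g / C
        uniform_term = cycle_time * (1.0 - lam) / (2.0 * (1.0 - x))
        overflow_term = x ** 2 / (2.0 * flow * (1.0 - x))
        return uniform_term + overflow_term

    # Example with assumed values: C = 120 s, g = 30 s, q = 0.3 veh/s,
    # measured vehicle length 40 m on a segment with capacity D = 100 m.
    x = saturation(vehicle_length=40.0, capacity=100.0)
    d = phase_delay(cycle_time=120.0, green_time=30.0, flow=0.3, x=x)
    print(f"saturation x = {x:.2f}, estimated delay d = {d:.1f} s")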
C. Reward Function

The reward function r(s, a) is the most important component in the Q-learning algorithm. However, establishing the reward function r(s, a) is difficult. We need to consider the feasibility of the reward function and guarantee that it is easy to compute and can satisfy the needs of the intended application. Here, the vehicle delay time, as an important metric for measuring traffic conditions, is used as a parameter in the reward function. The reward function r_t(s, a) is given in Eq. (5):

    r_t(s, a) = -(\phi \cdot d_t + \varphi \cdot l_t), \quad \text{s.t. } \phi + \varphi = 1    (5)

where r_t(s, a) is the reward obtained after adopting strategy a_t in state s_t. We can adjust φ and ϕ to output an optimized strategy. The parameter d_t is the vehicle delay time at time t. The parameter l_t is the length of the vehicles waiting at time t. Maximizing the reward therefore means minimizing this weighted time index. The cumulative discounted return is

    G_t = r_t(s, a) + \gamma r_{t+1}(s, a) + \gamma^2 r_{t+2}(s, a) + \cdots + \gamma^{T-t} r_{t+T}(s, a) = \sum_{i=0}^{T-t} \gamma^i r_{t+i}(s, a)    (6)
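As an illustration of Eqs. (5) and (6), the following Python sketch computes the per-step reward from a measured delay and queue length and accumulates the discounted return over an episode. The weights and the sample measurements are assumed values rather than settings reported by the paper.

    def step_reward(delay: float, queue_length: float,
                    phi: float = 0.5, varphi: float = 0.5) -> float:
        """Eq. (5): r_t = -(phi * d_t + varphi * l_t), with phi + varphi = 1."""
        assert abs(phi + varphi - 1.0) < 1e-9
        return -(phi * delay + varphi * queue_length)

    def discounted_return(rewards, gamma: float = 0.99) -> float:
        """Eq. (6): G_t = sum_i gamma^i * r_{t+i} over the remaining episode."""
        return sum((gamma ** i) * r for i, r in enumerate(rewards))

    # Assumed measurements for three consecutive control steps:
    # (delay time in seconds, waiting-vehicle length in meters).
    measurements = [(40.0, 35.0), (32.0, 28.0), (25.0, 20.0)]
    rewards = [step_reward(d, l) for d, l in measurements]
    print("rewards:", rewards)
    print("return G_t:", discounted_return(rewards))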
D. State-Action Pairs

When using reinforcement learning, descriptions of the state space, action space, and reward function of the problem must be given. The state space of the intersection at time k is expressed as s_k = (L_{k,1}, L_{k,2}, L_{k,3}, L_{k,4}), where L_{k,i} represents the time difference between the durations of two green signals in each phase. The state set is S(a) = {s_1, s_2, ..., s_k, ...}. In the case in which the state space is expressed in terms of the vehicle length, because the duration of the cycle has been determined, we choose the green signal ratio for signal control and use the different green signal ratios in the different phases as the action space. Therefore, the action space at time k is a_k = (g_1, g_2, g_3, g_4), and the action set is A(s) = {a_1, a_2, ..., a_k, ...}. The target formulae used to evaluate the strategy π supported by actions are expressed as follows:

    Q_{\pi}(s, a) = E[G_t \mid s_t = s, a_t = a, \pi], \quad \exists s \in S(a), \ \exists a \in A(s), \quad \text{s.t. } \arg\max_{a} Q_{\pi^*}(s, a)    (7)

from which it can be seen that the optimal strategy π* will maximize Q(s, a), which means that the current action used to control the traffic signals is the best choice.

IV. AGENT LEARNING APPLICATION

The multiagent reinforcement learning method is introduced to learn how to take the optimal actions for traffic signal control. In this section, we consider two different application scenarios: agent learning at a single intersection and agent learning for multiple intersections.

A. Modeling a Single Intersection for Agent Learning

As shown in Figure 3, in the learning process that we implement, each intersection is represented by an agent to perform reinforcement learning. As introduced in the traffic environment information section, the traffic volume, traffic saturation, and signal phase are encoded to represent the environment. The appropriate action for controlling the traffic signal is returned in terms of the green signal ratio. The green signal ratio is an important consideration for reducing the delay time.

For this learning process, the current traffic state is input to represent the traffic environment. We randomly select an action from the action set A(s) to act on the current state. Next, based on the received feedback, we force the traffic state to the next state in accordance with the updated Q value, traffic status and time. We then make a decision during the iterative process. More details are shown in Algorithm 1, Agent Learning for a Single Intersection (ALSI).

Fig. 3. Learning process based on information fusion for traffic signal control.

Algorithm 1 Agent Learning for a Single Intersection (ALSI)
Input: Current traffic state s, action set A(s)
Output: Action a_t for handling traffic signal control
1: while T in episode:
2:   Initialize s and Q(s, a) ← 0
3:   for step in each T
4:     Choose action a_t from A(s_t) via the ε-greedy method
5:     Take this action a_t and obtain feedback r_t
6:     Q_t(s_t, a_t) = (1 − β_t) Q_t(s_t, a_t) + β_t [r_t + γ max_a Q_t(s_{t+1}, a_t)]
7:     s_t ← s_{t+1} && t ← t + 1
8:   end for
9: end while

As the ALSI algorithm shows, for each episode, the first step is to reset the environmental state s and initialize the Q(s, a) value to 0. Second, for each step of the episode, the current traffic state s_t is observed, and a suitable action a_t is selected in accordance with the current strategy via the ε-greedy method. Action a is selected from the action set A(s) to act on the current state s_t and obtain feedback r_t. The state of the environment will vary with changes in traffic flow, and the traffic state of the environment will change to the next state s_{t+1} for the estimated value Q_t(s_t, a_t). Thus, the Q value is designed to be updated in accordance with the temporal difference (TD) method, that is,

    Q_t(s_t, a_t) = (1 - \beta_t) Q_t(s_t, a_t) + \beta_t \left[ r_t + \gamma \max_{a} Q_t(s_{t+1}, a_t) \right]    (8)

Because the reward function is expressed in terms of the delay, the goal of the strategy is to earn the maximum reward. Next, the state space and time phase are updated; that is, Q(s, a) is obtained to determine whether the value has changed from a small value to a large value, where β_t ∈ (0, 1) is the update rate; then, s ← s_{t+1} and t ← t + 1. Subsequently, the process loops through the previously described steps until the episode is complete.
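The sketch below is a minimal Python rendering of the tabular update in Algorithm 1 and Eq. (8). The environment interface (reset_fn, step_fn) and the discretized state representation are assumptions introduced for illustration; they stand in for the SUMO-based environment described later rather than reproducing the authors' implementation.

    import random
    from collections import defaultdict

    def alsi_q_learning(reset_fn, step_fn, actions, episodes=100,
                        steps_per_episode=120, beta=0.00001, gamma=0.99,
                        epsilon=0.1):
        """Tabular Q-learning loop following Algorithm 1 (ALSI) and Eq. (8).

        reset_fn()             -> initial (hashable) state, e.g. discretized queues
        step_fn(state, action) -> (reward, next_state), e.g. reward from Eq. (5)
        actions                -> list of candidate green-signal-ratio settings
        """
        q = defaultdict(float)  # Q(s, a), initialized to 0

        def greedy_action(state):
            return max(actions, key=lambda a: q[(state, a)])

        for _ in range(episodes):
            state = reset_fn()
            for _ in range(steps_per_episode):
                # epsilon-greedy action selection (line 4 of Algorithm 1)
                if random.random() < epsilon:
                    action = random.choice(actions)
                else:
                    action = greedy_action(state)
                # apply the action and observe the feedback (line 5)
                reward, next_state = step_fn(state, action)
                # temporal-difference update, Eq. (8) (line 6)
                best_next = max(q[(next_state, a)] for a in actions)
                q[(state, action)] = ((1.0 - beta) * q[(state, action)]
                                      + beta * (reward + gamma * best_next))
                state = next_state  # line 7
        return q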
B. Modeling Multiple Intersections for Agent Learning

Traffic information and decision making for a single intersection affect the surrounding traffic flow and other intersections. Therefore, an agent that controls multiple intersections must globally balance the traffic flow to achieve the shortest overall wait time and average queue length at intersections for the entire area. We consider multiagent reinforcement learning to better understand the traffic environment.
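The following Python fragment is only one plausible arrangement of such a multi-intersection setup, under the assumption that each intersection runs the ALSI update above and additionally fuses the delays reported by neighboring agents into its own reward, in the spirit of the information-fusion behavior discussed in the experiments. The weighting scheme is a hypothetical illustration.

    def fused_reward(own_delay, own_queue, neighbor_delays,
                     phi=0.5, varphi=0.5, neighbor_weight=0.2):
        """Assumed reward for one agent in a multi-intersection setting.

        The local term follows Eq. (5); the neighbor term is a hypothetical
        fusion of delays collected by nearby agents.
        """
        local = -(phi * own_delay + varphi * own_queue)
        if not neighbor_delays:
            return local
        neighbor_penalty = sum(neighbor_delays) / len(neighbor_delays)
        return local - neighbor_weight * neighbor_penalty

Each agent would then feed this fused reward into the same ε-greedy TD update used in the single-intersection case.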
A. SUMO-Based Simulation

To build the simulation, we used SUMO as a tool to support traffic scenario visualization. SUMO can simulate the precise movement of a single vehicle under the conditions of various transportation facilities at the micro level. When SUMO is employed as a test platform for intelligent traffic control algorithms, it needs to interact well with external programs. Thus, we used Python to connect SUMO with the Q-learning algorithm based on the Traffic Control Interface (TraCI) API. Figure 6 shows an integrated example of SUMO. In general, the SUMO connection process requires the following three steps:

1) Road Network and Routing Files: SUMO uses the SUMO graphical user interface (SUMO-GUI) as a visual interface. A road network file generated by the netconvert module, which accepts a node file and an edge file as inputs, is imported into SUMO-GUI. The user needs to define the node file to represent intersections and the edge file to represent roads.

2) Sumo.cfg File: While SUMO is running, road network files, routing files, etc., are the basic descriptions used, but the SUMO-GUI execution file will not autonomously call these files. Thus, these files must be integrated into a configuration file. The configuration file is named sumo.cfg and contains the important XML labels <net-file> and <route-files>.

3) TraCI API: The TraCI interface obtains data from the SUMO traffic simulation environment and modifies and controls these data in real time. Currently, the Python version of TraCI is the most comprehensive, followed by the C++, .NET, MATLAB, and Java versions.
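As a concrete illustration of step 3, the minimal TraCI loop below starts SUMO, advances the simulation, and adjusts a signal phase duration. The file name cross.sumocfg, the traffic-light ID "tl0", and the edge ID "edge_in" are placeholder assumptions standing in for whatever network and route files are produced in steps 1 and 2.

    import traci  # shipped with SUMO; SUMO_HOME/tools must be on PYTHONPATH

    # Start SUMO (use "sumo-gui" instead of "sumo" for the visual interface).
    traci.start(["sumo", "-c", "cross.sumocfg"])

    step = 0
    while step < 1000:
        traci.simulationStep()  # advance the simulation by one step
        # Read simple traffic measurements that a learning agent could use.
        halted = traci.edge.getLastStepHaltingNumber("edge_in")
        waiting = traci.edge.getWaitingTime("edge_in")
        # Example control action: lengthen the current green phase
        # when the queue or the accumulated waiting time grows.
        if halted > 10 or waiting > 120.0:
            traci.trafficlight.setPhaseDuration("tl0", 40.0)
        step += 1

    traci.close()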
B. Experimental Settings

To specify how the experiments were carried out, Table I describes the experimental environment. For example, the operating system was Windows 10, and the platform language was Python using the PyCharm development platform.

TABLE I: Platform Environment

1) Traffic Data Settings: The parameters of a road network include the potential left-turn conflicts at intersections, the number of lanes, traffic signals, the number of phases, intersections, the vehicle generation frequency, and the number of roads. The experimental parameter configuration for a single-intersection cross-traffic network is shown in Table II. Moreover, the experimental parameter configuration for a multiple-intersection traffic network is defined in Table III.

TABLE II: Parameter Settings for a Single Intersection

TABLE III: Parameter Settings for Multiple Intersections

Based on the road parameters listed in these tables, we set the maximum vehicle capacity D of each road segment to 100 and the cycle time C to 120. During the experiment, we performed a total of 1000 cycles.

2) Algorithm Parameter Settings: As previously mentioned, γ is the discount factor, with γ ∈ [0, 1]. If γ is set to a larger value, the agent selection strategy will be more focused on future returns; if γ is small, then the agent selection strategy will be more focused on immediate returns. In our experiments, β_t was set to 0.00001, and γ was set to 0.99.

3) Working Process and Data: Figure 7 shows the overall process. The first step is to simulate the traffic environment. The SUMO tool utilized in this study requires road network
files and route files as input to provide the desired road information. The second step is to run the multiagent reinforcement learning implementation with the related parameter settings. Note that the green signal time in each phase could be obtained only after training and was then input into SUMO to complete the test. During the learning process, the agent will return the optimal strategy for traffic signal control.

C. Experimental Results

We obtained the experimental results through SUMO-GUI and performed some basic preprocessing. Mousavi et al. [47] introduced a method for converting a snapshot from SUMO-GUI into desired experimental data, such as the average reward, delay time and queue length. We needed to convert each snapshot from a red-green-blue representation to a grayscale representation, resize it to 128 × 128 frames, and provide the results as input to the system. For compatibility with the Q-learning algorithm model, we used the DQN architecture, with the output layer corresponding to the action value. The training time for this network was 1000 cycles. The agent learning strategy was evaluated every 5 training cycles. After running SUMO-GUI 5 times, data such as the average delay time and queue length were obtained.
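A minimal version of the snapshot preprocessing described above might look like the following. It assumes the SUMO-GUI screenshots have been saved as image files (the file name is a placeholder) and uses Pillow and NumPy, which are not named in the paper.

    import numpy as np
    from PIL import Image

    def preprocess_snapshot(path: str) -> np.ndarray:
        """Convert an RGB SUMO-GUI snapshot to a 128 x 128 grayscale frame in [0, 1]."""
        img = Image.open(path).convert("L")   # RGB -> grayscale
        img = img.resize((128, 128))          # resize to 128 x 128
        return np.asarray(img, dtype=np.float32) / 255.0

    frame = preprocess_snapshot("snapshot_0001.png")  # placeholder file name
    print(frame.shape)  # (128, 128)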
1) Performance Analysis: Based on the simulation environment described above, control experiments using Q-learning were carried out. To compare the performance of the Q-learning algorithm under different conditions, we applied the Q-learning algorithm with different parameters. The parameters that we varied included the frequency of vehicle generation. For the different phases, we ran SUMO-GUI simulations using the previously described configuration settings. Table IV shows the results, from which we can compare the average delay time and average queue length when the number of cycles, the cycle time and the maximum lane capacity are changed.

TABLE IV: Performance Comparison of the Q-Learning Algorithm With Different Parameters

As shown in Table IV, as the vehicle generation frequency increases, the average delay time and average queue length also increase. For example, the configuration (0.5, 1000, 120, 100) is worse than (0.3, 1000, 120, 100). However, when the vehicle generation frequency is fixed, a longer cycle time results in a lower average delay time and a shorter average queue length. Thus, better results may be achieved by appropriately extending the cycle time. In our experiments, the best configuration for the delay time was found to be (0.5, 1000, 120, 50), and the best configuration for the queue length was found to be (0.3, 1000, 120, 100). Moreover, the maximum capacity of each lane is also a key factor. Similarly, we can conclude that it is better to increase the lane capacity where possible, which should be considered when traffic signals are deployed, mainly based on location and deployment issues. On the other hand, the average queue length slowly increases with increasing complexity of the road information. This increase reflects the stability of the agents using the Q-learning algorithm.

2) Comparative Analysis: Next, we compared four traffic signal control methods. The other three traffic signal control methods considered for comparison are fixed timing control, random timing control and longest queue control. Fixed timing and random timing are different options regarding the green time ratio at an intersection. The first option is to choose a fixed green time ratio, and the second option is to choose a random green time ratio. The longest queue control method refers to selecting the phase timing based on the longest queue length when timing each phase at an intersection. We ran the SUMO-GUI simulator using the previously described method again and compared Q-learning with the fixed timing, random timing, and longest queue algorithms to evaluate the average delay time and queue length.
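To make the three baselines concrete, the sketch below gives one possible Python rendering of each policy. The phase count, the queue measurements, and the candidate green-time ratios are illustrative assumptions rather than the exact settings used in the experiments.

    import random

    GREEN_RATIOS = [0.15, 0.20, 0.25, 0.40]  # assumed candidate green-time ratios

    def fixed_timing(_queues):
        """Fixed timing control: always use the same green-time ratio per phase."""
        return [0.25, 0.25, 0.25, 0.25]

    def random_timing(_queues):
        """Random timing control: pick a random green-time ratio for each phase."""
        return [random.choice(GREEN_RATIOS) for _ in range(4)]

    def longest_queue(queues):
        """Longest queue control: give the largest green share to the phase
        with the longest queue, splitting the rest evenly."""
        longest = max(range(4), key=lambda i: queues[i])
        return [0.4 if i == longest else 0.2 for i in range(4)]

    print(longest_queue([12, 30, 8, 5]))  # e.g. [0.2, 0.4, 0.2, 0.2]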
TABLE V: Results of Algorithms Used for Traffic Signal Control

Table V and Figure 8 compare the results in terms of the average delay time and the average queue length when the different algorithms are used. These results indicate that when the Q-learning algorithm is employed for traffic signal timing at intersections, the average delay time and average queue length are much lower than those under the other three algorithms. The results also demonstrate the efficiency of the Q-learning algorithm in optimizing urban traffic signal control. In multiagent reinforcement learning, an agent can not only collect traffic environment information for its own node but also learn through information fusion based on information collected by other agents at nearby nodes.
3) Emergency Vehicle Routing Scenario: We also performed an experiment based on a real-world road network to illustrate the performance of the proposed Q-learning algorithm and
system [50], [51] using probabilistic model checking, for example, before deploying a traffic signal control strategy, to quantitatively verify the reliability and correctness of the returned strategy.

REFERENCES

[1] H. Gao, C. Liu, Y. Li, and X. Yang, "V2VR: Reliable hybrid-network-oriented V2V data transmission and routing considering RSUs and connectivity probability," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 6, pp. 3533–3546, Jun. 2021, doi: 10.1109/TITS.2020.2983835.
[2] X. Xu, J. Zhao, Y. Li, H. Gao, and X. Wang, "BANet: A balanced atrous net improved from SSD for autonomous driving in smart transportation," IEEE Sensors J., early access, Oct. 28, 2020, doi: 10.1109/JSEN.2020.3034356.
[3] Z. Bi, L. Yu, H. Gao, P. Zhou, and H. Yao, "Improved VGG model-based efficient traffic sign recognition for safe driving in 5G scenarios," Int. J. Mach. Learn. Cybern., pp. 1–12, Aug. 2020, doi: 10.1007/s13042-020-01185-5.
[4] Z. Liu and W. Liang, "Advances in urban traffic signal control," J. Highway Transp. Res. Develop., vol. 20, no. 6, pp. 121–125, 2003.
[5] C. Chen, Z. Liu, S. Wan, J. Luan, and Q. Pei, "Traffic flow prediction based on deep learning in internet of vehicles," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 6, pp. 3776–3789, Jun. 2020.
[6] L. Kuang et al., "Traffic volume prediction based on multi-sources GPS trajectory data by temporal convolutional network," Mobile Netw. Appl., vol. 25, no. 4, pp. 1405–1417, Aug. 2020, doi: 10.1007/s11036-019-01458-6.
[7] H. Abuella, F. Miramirkhani, S. Ekin, M. Uysal, and S. Ahmed, "ViLDAR—Visible light sensing-based speed estimation using vehicle headlamps," IEEE Trans. Veh. Technol., vol. 68, no. 11, pp. 10406–10417, Nov. 2019.
[8] C. Chen, B. Liu, S. Wan, P. Qiao, and Q. Pei, "An edge traffic flow detection scheme based on deep learning in an intelligent transportation system," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 3, pp. 1840–1852, Mar. 2021.
[9] J. K. Naufal et al., "A2CPS: A vehicle-centric safety conceptual framework for autonomous transport systems," IEEE Trans. Intell. Transp. Syst., vol. 19, no. 6, pp. 1925–1939, Jun. 2018.
[10] D. Zhao et al., "Accelerated evaluation of automated vehicles safety in lane-change scenarios based on importance sampling techniques," IEEE Trans. Intell. Transp. Syst., vol. 18, no. 3, pp. 595–607, Mar. 2017.
[11] L. Qi, M. Zhou, and W. Luan, "Emergency traffic-light control system design for intersections subject to accidents," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 1, pp. 170–183, Jan. 2016.
[12] Y. Zhang, Y. Zhang, and R. Su, "Pedestrian-safety-aware traffic light control strategy for urban traffic congestion alleviation," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 1, pp. 178–193, Jan. 2021.
[13] Z. Ouyang, J. Niu, Y. Liu, and M. Guizani, "Deep CNN-based real-time traffic light detector for self-driving vehicles," IEEE Trans. Mobile Comput., vol. 19, no. 2, pp. 300–313, Feb. 2020.
[14] Y.-T. Chuang, C.-W. Yi, Y.-C. Tseng, C.-S. Nian, and C.-H. Ching, "Discovering phase timing information of traffic light systems by stop-go shockwaves," IEEE Trans. Mobile Comput., vol. 14, no. 1, pp. 58–71, Jan. 2015.
[15] T. Tan, F. Bao, Y. Deng, A. Jin, Q. Dai, and J. Wang, "Cooperative deep reinforcement learning for large-scale traffic grid signal control," IEEE Trans. Cybern., vol. 50, no. 6, pp. 2687–2700, Jun. 2020.
[16] L. A. Prashanth and S. Bhatnagar, "Reinforcement learning with function approximation for traffic signal control," IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 412–421, Jun. 2011.
[17] J. Qiao, N. Yang, and J. Gao, "Two-stage fuzzy logic controller for signalized intersection," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 41, no. 1, pp. 178–184, Jan. 2011.
[18] H. Wei, G. Zheng, H. Yao, and Z. Li, "IntelliLight: A reinforcement learning approach for intelligent traffic light control," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2018, pp. 2496–2505.
[19] X. Liang, X. Du, G. Wang, and Z. Han, "A deep reinforcement learning network for traffic light cycle control," IEEE Trans. Veh. Technol., vol. 68, no. 2, pp. 1243–1253, Feb. 2019.
[20] T. Chu and J. Wang, "Traffic signal control with macroscopic fundamental diagrams," in Proc. Amer. Control Conf. (ACC), Jul. 2015, pp. 4380–4385.
[21] F. Rodrigues and C. L. Azevedo, "Towards robust deep reinforcement learning for traffic signal control: Demand surges, incidents and sensor failures," in Proc. IEEE Intell. Transp. Syst. Conf. (ITSC), Oct. 2019, pp. 3559–3566.
[22] S. Darmoul, S. Elkosantini, A. Louati, and L. B. Said, "Multi-agent immune networks to control interrupted flow at signalized intersections," Transp. Res. C, Emerg. Technol., vol. 82, pp. 290–313, Sep. 2017.
[23] K. Pandit, D. Ghosal, H. M. Zhang, and C. N. Chuah, "Adaptive traffic signal control with vehicular ad hoc networks," IEEE Trans. Veh. Technol., vol. 62, no. 4, pp. 1459–1471, May 2013.
[24] L. D. Baskar, B. de Schutter, J. Hellendoorn, and Z. Papp, "Traffic control and intelligent vehicle highway systems: A survey," IET Intell. Transp. Syst., vol. 5, no. 1, pp. 38–52, Mar. 2011.
[25] J. Zeng, J. Hu, and Y. Zhang, "Adaptive traffic signal control with deep recurrent Q-learning," in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2018, pp. 1215–1220.
[26] D. Srinivasan, M. C. Choy, and R. L. Cheu, "Neural networks for real-time traffic signal control," IEEE Trans. Intell. Transp. Syst., vol. 7, no. 3, pp. 261–272, Sep. 2006.
[27] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Design of reinforcement learning parameters for seamless application of adaptive traffic signal control," J. Intell. Transp. Syst., vol. 18, no. 3, pp. 227–245, Jul. 2014.
[28] S. S. Mousavi, M. Schukat, and E. Howley, "Traffic light control using deep policy-gradient and value-function-based reinforcement learning," IET Intell. Transp. Syst., vol. 11, no. 7, pp. 417–423, 2017.
[29] L. Li, Y. Lv, and F.-Y. Wang, "Traffic signal timing via deep reinforcement learning," IEEE/CAA J. Autom. Sinica, vol. 3, no. 3, pp. 247–254, Apr. 2016.
[30] Z. Wang, T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, and N. Freitas, "Dueling network architectures for deep reinforcement learning," in Proc. Int. Conf. Mach. Learn., 2016, pp. 1995–2003.
[31] J. Jin and X. Ma, "Hierarchical multi-agent control of traffic lights based on collective learning," Eng. Appl. Artif. Intell., vol. 68, pp. 236–248, Feb. 2018.
[32] D. A. Vidhate and P. Kulkarni, "Cooperative multi-agent reinforcement learning models (CMRLM) for intelligent traffic control," in Proc. 1st Int. Conf. Intell. Syst. Inf. Manage. (ICISIM), Oct. 2017, pp. 325–331.
[33] T. Chu, J. Wang, L. Codecà, and Z. Li, "Multi-agent deep reinforcement learning for large-scale traffic signal control," IEEE Trans. Intell. Transp. Syst., vol. 21, no. 3, pp. 1086–1095, Mar. 2020.
[34] M. A. Wiering, "Multi-agent reinforcement learning for traffic light control," in Proc. Int. Conf. Mach. Learn. (ICML), 2000, pp. 1151–1158.
[35] K. J. Prabuchandran, A. N. H. Kumar, and S. Bhatnagar, "Multi-agent reinforcement learning for traffic signal control," in Proc. 17th Int. IEEE Conf. Intell. Transp. Syst. (ITSC), Oct. 2014, pp. 2529–2534.
[36] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): Methodology and large-scale application on downtown Toronto," IEEE Trans. Intell. Transp. Syst., vol. 14, no. 3, pp. 1140–1150, Sep. 2013.
[37] M. Aslani, M. S. Mesgari, and M. Wiering, "Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events," Transp. Res. C, Emerg. Technol., vol. 85, pp. 732–752, Dec. 2017.
[38] T. Wu et al., "Multi-agent deep reinforcement learning for urban traffic light control in vehicular networks," IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 8243–8256, Aug. 2020.
[39] J. Ma and F. Wu, "Feudal multi-agent deep reinforcement learning for traffic signal control," in Proc. 19th Int. Conf. Auton. Agents Multiagent Syst. (AAMAS), 2020, pp. 816–824.
[40] T. Wang, T. Liang, J. Li, W. Zhang, Y. Zhang, and Y. Lin, "Adaptive traffic signal control using distributed MARL and federated learning," in Proc. IEEE 20th Int. Conf. Commun. Technol. (ICCT), Oct. 2020, pp. 1242–1248.
[41] C. Chen et al., "Toward a thousand lights: Decentralized deep reinforcement learning for large-scale traffic signal control," in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 4, pp. 3414–3421.
[42] H. Wei et al., "PressLight: Learning max pressure control to coordinate traffic signals in arterial network," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 1290–1298.
[43] X. Zang, H. Yao, G. Zheng, N. Xu, K. Xu, and Z. Li, "MetaLight: Value-based meta-reinforcement learning for traffic signal control," in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 1, pp. 1153–1160.
[44] T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, "Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications," IEEE Trans. Cybern., vol. 50, no. 9, pp. 3826–3839, Sep. 2020.
[45] S.-M. Hung and S. N. Givigi, "A Q-learning approach to flocking with UAVs in a stochastic environment," IEEE Trans. Cybern., vol. 47, no. 1, pp. 186–197, Jan. 2017.
[46] L. Cui, X. Wang, and Y. Zhang, "Reinforcement learning-based asymptotic cooperative tracking of a class multi-agent dynamic systems using neural networks," Neurocomputing, vol. 171, pp. 220–229, Jan. 2016.
[47] S. S. Mousavi, M. Schukat, and E. Howley, "Deep reinforcement learning: An overview," in Proc. SAI Intell. Syst. Conf., 2018, pp. 426–440.
[48] W. Wei, R. Yang, H. Gu, W. Zhao, C. Chen, and S. Wan, "Multi-objective optimization for resource allocation in vehicular cloud computing network," IEEE Trans. Intell. Transp. Syst., early access, Aug. 3, 2021, doi: 10.1109/TITS.2021.3091321.
[49] L. Kuang, T. Gong, S. OuYang, H. Gao, and S. Deng, "Offloading decision methods for multiple users with structured tasks in edge computing for smart cities," Future Gener. Comput. Syst., vol. 105, pp. 717–729, Apr. 2020, doi: 10.1016/j.future.2019.12.039.
[50] H. Gao, W. Huang, and X. Yang, "Applying probabilistic model checking to path planning in an intelligent transportation system using mobility trajectories and their statistical data," Intell. Automat. Soft Comput., vol. 25, no. 3, pp. 547–559, Jan. 2019.
[51] H. Gao, H. Miao, L. Liu, J. Kai, and K. Zhao, "Automated quantitative verification for service-based system design: A visualization transform tool perspective," Int. J. Softw. Eng. Knowl. Eng., vol. 28, no. 10, pp. 1369–1397, 2018.

Xiaoxian Yang received the Ph.D. degree in management science and engineering from Shanghai University, China, in 2017. She is currently an Assistant Professor with Shanghai Polytechnic University, China. She has published more than 20 articles in academic journals, such as IEEE Transactions on Intelligent Transportation Systems, IEEE Transactions on Computational Social Systems, ACM TOMM, ACM TOIT, FGCS, MONET, IJSEKE, and COMNET. She has obtained two patents and three registered software copyrights in China involving wireless networks, workflow management, and formal verification. Her research interests include business process management, formal verification, wireless networks, and mobile health. She has participated in organizing international conferences and workshops, such as CollaborateCom 2018–2020, ChinaCom 2019–2020, and Mobicase 2019–2020. She has worked as a Guest Editor of MONET, WINE, and JIT, and served as a Reviewer for IEEE Transactions on Industrial Informatics, IEEE Transactions on Intelligent Transportation Systems, IEEE Transactions on Automation Science and Engineering, IEEE Journal of Biomedical and Health Informatics, FGCS, PPNA, Wireless Networks, and Computer Networks.

Yueshen Xu (Member, IEEE) received the Ph.D. degree from Zhejiang University. He was a co-trained Ph.D. student at the University of Illinois at Chicago. He is currently an Associate Professor with the School of Computer Science and Technology, Xidian University. He has published more than 40 papers in a series of international conference proceedings and journals. His research interests include mobile computing, the Internet of Things technology, edge computing, and recommender systems. He is a Reviewer and a PC Member of many journals and conferences, such as IEEE ICDCS, Mobile Networks and Applications, Neurocomputing, and Knowledge-Based Systems.

Zhiying Wang is currently pursuing the master's degree with the School of Computer Science and Technology, Xidian University. Her research interests include mobile computing, sensor networks, and service computing.

Li Kuang received the Ph.D. degree in computer science from Zhejiang University, China, in 2009. She is currently a Professor with the School of Computer Science and Engineering, Central South University. Her research interests include service computing, mobile computing, and privacy preservation.

Honghao Gao (Senior Member, IEEE) received the Ph.D. degree in computer science from Shanghai University, China, in 2012. He started his academic career at Shanghai University in 2012. He is currently with the School of Computer Engineering and Science, Shanghai University. He is also a Professor at Gachon University, South Korea. Prior to that, he was a Research Fellow with the Software Engineering Information Technology Institute, Central Michigan University, USA, and an Adjunct Professor at Hangzhou Dianzi University, China. Moreover, he has broad working experience in cooperative industry-university research. He is a European Union Institutions-appointed external expert for reviewing and monitoring EU projects. He has published articles in IEEE Transactions on Industrial Informatics, IEEE Transactions on Intelligent Transportation Systems, IEEE Internet of Things Journal, IEEE Transactions on Network Science and Engineering, IEEE Transactions on Cognitive Communications and Networking, IEEE Transactions on Green Communications and Networking, IEEE Transactions on Computational Social Systems, IEEE Transactions on Emerging Topics in Computational Intelligence, IEEE/ACM Transactions on Computational Biology and Bioinformatics, ACM TOIT, ACM TOMM, ACM TMIS, IEEE Journal of Biomedical and Health Informatics, and IEEE Network Magazine. His research interests include software formal verification, Industrial IoT networks, vehicular communication, and intelligent medical image processing. Prof. Gao is a fellow of IET, BCS, and EAI, a Senior Member of CCF and CAAI, a member of the EPSRC Peer Review Associate College for U.K. Research and Innovation, U.K., and a Founding Member of the IEEE Computer Society Smart Manufacturing Standards Committee. He was a recipient of the Best Paper Award at IEEE Transactions on Industrial Informatics in 2020 and EAI CollaborateCom in 2020. He is the Editor-in-Chief of the International Journal of Intelligent Internet of Things Computing (IJIITC), an Editor of Wireless Networks (WINE) and IET Wireless Sensor Systems (IET WSS), and an Associate Editor of IEEE Transactions on Intelligent Transportation Systems (IEEE T-ITS), IET Intelligent Transport Systems (IET ITS), IET Software, the International Journal of Communication Systems (IJCS), the Journal of Internet Technology (JIT), and Engineering Reports (EngReports).

Xuejie Wang is currently pursuing the M.S. degree in computer science with the School of Computer Engineering and Science, Shanghai University, China. His research interests include reinforcement learning and edge computing.