
Computer Standards & Interfaces 87 (2024) 103771

Contents lists available at ScienceDirect

Computer Standards & Interfaces


journal homepage: www.elsevier.com/locate/csi

Intelligent traffic light systems using edge flow predictions


Adam Rizvi Thahirᵃ, Mustafa Coşkunᵇ,∗, Sultan Kübra Kılıçᵃ, Vehbi Cagri Gungorᵃ
ᵃ Electrical & Computer Engineering, Abdullah Gul University, Kayseri, Turkey
ᵇ Artificial Intelligence and Data Engineering, Ankara University, Ankara, Turkey

ARTICLE INFO
Keywords: Artificial intelligence; Reinforcement learning; Traffic flow; Congestion

ABSTRACT
In this paper, we propose a novel graph-based semi-supervised learning approach for traffic light management in multiple intersections. Specifically, the basic premise behind our paper is that if we know some of the occupied roads and predict which roads will be congested, we can dynamically change the traffic lights at the intersections connected to the roads anticipated to be congested. Comparative performance evaluations show that the proposed approach produces comparable average vehicle waiting times while reducing the time needed to learn adequate traffic light configurations for all intersections to a few seconds, whereas a deep learning-based approach needs a few days of training to learn similar light configurations.

∗ Corresponding author.

https://doi.org/10.1016/j.csi.2023.103771
Received 26 February 2023; Received in revised form 5 June 2023; Accepted 29 June 2023
Available online 4 July 2023
0920-5489/© 2023 Elsevier B.V. All rights reserved.

1. Introduction

With the advent of mega-cities, the traffic congestion problem has been ever-increasing and has introduced additional problems, such as lost time, air pollution, and economic burdens. For example, an average driver in the United States lost 99 h in traffic congestion in 2019, at an estimated cost of $1,377 [1]. Moreover, the time and cost associated with congestion can be more severe in large cities: in Boston, up to 149 h at an estimated cost of $2,205 [1]. Thus, relieving traffic congestion by utilizing ‘‘adaptable’’ traffic management systems has been one of the central research interests.

In today’s traffic management systems, most cities operate their traffic lights on a predefined fixed-time schedule [2,3] or design manual, hand-crafted traffic rules by observing traffic in real time [4]. However, predefined rule-based traffic management systems are vulnerable to changes that are inevitable as cities grow and the number of vehicles on the roads increases. As a result, these hand-crafted rule-based systems are often deemed limited [5].

Inspired by the fascinating developments in Artificial Intelligence (AI), recent research based on AI approaches aims at solving traffic management problems with the hope of learning adaptive traffic light configurations for intersections. To this end, earlier studies have proposed traditional AI approaches, such as reinforcement learning (RL) algorithms [6–8]. In this context, traffic management systems (TMSs) are formulated as an RL problem by treating each intersection as an intelligent agent. The overall flow of RL in the context of TMS is depicted in Fig. 1. More specifically, a TMS consists of three main parts:

• Environment: composed of traffic light phases and traffic conditions, including congestion and outward traffic flow.
• State: a feature representation of the environment, usually a grid representation of an intersection up to a certain extent around the intersection.
• Agent: creates a model that is able to make a decision about the light configuration based on the input sent from the environment.

Fig. 1. Visualization of how a generic Reinforcement Learning algorithm would work on a traffic light environment.

Concretely, following the decision-making process of the agent, the environment returns a reward value along with the updated state. This reward is then used as an additional input to the agent, and the model updates its parameters to guide the agent toward better decisions. However, RL-based approaches can suffer from the large state space problem [9]. Thus, more recent AI-based approaches use deep neural networks as function approximators for RL [5,9–11] to circumvent the large state space problem.

Despite the effectiveness of the existing deep learning-based TMSs [5,9–11], when the number of intersections increases, training the neural networks to find the optimal parameters for traffic light adjustment can be computationally costly. To alleviate this problem, in this paper, we propose to use graph-based semi-supervised learning for edge label prediction [12]. More specifically, we treat the TMS problem as a graph problem where intersections are represented by nodes and the roads connecting them are represented as edges in the graph. We then translate the TMS problem into a flow prediction problem on a graph. Ideally, if we know that some of the roads are congested by the load of the traffic, we aim at predicting which roads would be most affected by this flow of congestion. To this end, we use an edge label propagation algorithm, where an edge label of 1 indicates that the road is congested and 0 otherwise [12].

To assess the performance of our proposed approach against state-of-the-art deep learning approaches as well as heuristic-based approaches, we conduct our experiments on the Simulation of Urban MObility (SUMO) platform [13], selecting average vehicle waiting time as the evaluation metric. Specifically, we compare our proposed approach against two deep learning-based algorithms, A2C [14] and PPO [15], as well as two heuristic-based approaches based on vehicle occupancy [5] and scoring. Comparative performance evaluations on various traffic intersection configurations show that our approach can produce comparable average vehicle waiting times and drastically reduce the time needed to learn adequate traffic light configurations for all intersections to a few seconds, whereas a deep learning-based approach needs a few days of training to learn similar light configurations. Fig. 2 visualizes the general workflow of the proposed approach.

Fig. 2. Flowchart representation breaking down the steps of how the proposed algorithm works.

Our contributions can be summarized as follows:

• We develop a system that is able to predict traffic flow between multiple intersections using edge-based semi-supervised active learning.
• Our approach learns traffic light configurations in a few seconds, while deep learning-based approaches need a few days to learn similar light configurations.
• By assessing the average waiting times of vehicles in a traffic environment, we observe that, within a few seconds, our method yields better results than heuristic approaches and results comparable to those of deep learning approaches, which take a few days to train.

The rest of the paper is organized as follows. Section 2 discusses related work and other approaches. Section 3 formally defines the problem and explains the methodology. Section 4 describes the simulation environment and experimental setup, and Section 5 discusses the results obtained from the simulations. Finally, Section 6 concludes the study.

2. Related work

We can broadly divide traffic management system development into three categories: (i) data collection, (ii) data processing/cleaning, and (iii) optimization. Data collection and processing are two consecutive steps in a traffic management system, as the information gathered in these two steps is used to understand vehicle queues and occupancy at the intersections. Studies [19–23] focus on communication with intelligent traffic management systems at different network layers as well as at the hardware level. The study [19] employs a multi-layer system to ensure the reliability and security of the data during the collection and processing phases. The works [20,21] propose a system where hardware devices are used to store and exchange information with a given intelligent traffic light system. The literature also contains data collection techniques based on, among others, image processing [24,25] and inductive loops [26,27].

The final phase, optimization of the traffic management system, aims at solving traffic congestion problems by taking the number of vehicles and queue sizes into account. Research on optimizing traffic management systems, which is also the interest of this paper, can be viewed in two broad categories: (i) conventional traffic light control systems and (ii) Reinforcement Learning (Deep Learning)-based traffic light control systems.

2.1. Conventional traffic control systems

Earlier approaches for traffic light control systems aim to optimize the waiting time at traffic intersections [2,16]. These approaches observe the current traffic flow, set manual rules for the traffic lights accordingly, and then apply these sets of rules to the intersections. Similar systems have been developed in the studies of [2,3,16]. Another conventional approach is to use real-time traffic data when computing the optimal times for a given traffic phase [4,28]. Although these types of traffic management systems can offer certain merits in managing traffic lights, the calculated optimal times can become obsolete because of out-of-date traffic data, or the optimal times may not be reached because these systems have no insight into future traffic.


Table 1
Comparison table (✓ = yes, ✗ = no).
Approach | Simulation | Predict flows | Real-time data | Historical data | Multi-intersection
IntelliLight [11] | ✓ | ✗ | ✓ (online) | ✓ (offline) | ✗
Francois Dion et al. [16] | ✗ | ✗ | ✗ | ✓ | ✗
Alan J. Miller [2] | ✗ | ✗ | ✗ | ✓ | ✗
Brian L. Smith [17] | ✗ | ✓ | ✗ | ✓ | ✗
FRAP [18] | ✓ | ✗ | ✓ | ✗ | ✓
CoLight [10] | ✓ | ✗ | ✓ | ✗ | ✓
Our approach | ✓ | ✓ | ✓ | ✓ | ✓

2.2. Reinforcement learning for traffic light control

Because the traffic management system problem is a time-dependent, dynamic problem, Reinforcement Learning/Deep Learning approaches [5] have recently been employed, following the development of AI-based algorithms, to dynamically adjust traffic lights by learning traffic behaviors.

Previous studies relate traffic data to the state space of the environment. The studies [6,7,29–31] use the number of vehicles queued on a given road as their state space, while [32,33] use the traffic flow exiting a given intersection, as measured by sensors. Action spaces are defined based on signal phases and can include all available signal phases, as in [33,34], or only the relevant green signal phase, as in [30,32]. The reward functions of these environments are typically related to how directly a change can be observed from the actions taken: [33,34] define their reward function based on the change in delay for the vehicles in the environment; similarly, [30,32] define their reward function based on the change in the number of queued vehicles on the given roads of the intersection.

The agents in a reinforcement learning solution act on the environment through the actions defined in the environment's action space; the outcome of each action is observed through the state space, and the reward for each action is calculated from the reward function. Deep Q-learning and policy gradient approaches were studied in [5], which proposes using the traffic flow and traffic delay of vehicles in the network as the reward function. Moreover, other deep learning-based approaches have been studied in the Stable Baselines project [35], using the Asynchronous method for Deep Reinforcement Learning (A2C) [14] and the Proximal Policy Optimization algorithm [15]. Compared with traditional methods [7,32] in the context of traffic management systems, deep learning-based approaches [5,10,11,35] have repeatedly been shown to be more effective, since they can learn the dynamic nature of traffic through trial and error by updating the neural network parameters. However, in these deep learning algorithms, training the neural network parameters becomes computationally expensive as the number of intersections increases.

In this paper, we propose an alternative and efficient algorithm by formulating the TMS as a flow prediction problem. Our approach predicts the outward flow of vehicles from a given intersection, which enables the TMS to identify vehicle congestion that may occur at a following intersection. Thus, our method prepares the intersection that is anticipated to be congested. More specifically, in the multi-intersection setting, our approach provides information on where the traffic flow takes place, pinpointing within seconds the intersections that are going to be congested, based on semi-supervised active learning on edges [12]. The main difference between our approach and other deep learning or traditional methods is that the flow of traffic can be anticipated beforehand. Hence, the intersections that are predicted to be congested can adjust their traffic lights accordingly. More importantly, since we treat each intersection as a node, and a large city can have thousands of traffic light intersections, our graph-based approach can process these nodes (intersections) efficiently. Thus, our approach can scale to a city-level traffic management system.

To assess the performance of our proposed approach against state-of-the-art deep learning approaches, such as the Asynchronous method for Deep Reinforcement Learning (A2C) [14] and the Proximal Policy Optimization algorithm [15], as well as two heuristic-based approaches [36], we conduct our experiments on the Simulation of Urban Mobility (SUMO) [13] platform, selecting average vehicle waiting time as the evaluation metric. The first heuristic-based approach simply decides and prioritizes the next traffic light phase based on the current vehicle occupancy at a given intersection. The other heuristic-based approach is based on a local scoring mechanism [36] and changes the traffic phase and priority based on local scores. For a better overview, Table 1 compares our approach with existing recent studies on traffic light management systems.

3. Methods

In this section, we first define the ‘‘intelligent’’ traffic management system problem. We then give a brief background on state-of-the-art approaches to this problem, i.e., heuristic and deep learning-based approaches, respectively. Finally, we present our graph edge-based semi-supervised learning approach.

Problem Definition: The aim of this paper is to find an optimal traffic light configuration for multiple intersections so that traffic congestion is prevented as much as possible. Before we delve into formal approaches to solving this traffic management system (TMS) problem, let us first introduce some notation that will be used throughout the paper. We define the TMS environment as consisting of multiple intersections. In this environment, each intersection is assumed to have an ‘‘intelligent’’ traffic light agent $TL_i$; for any two agents $TL_i$ and $TL_j$ ($i \neq j$), ‘‘Red’’ and ‘‘Green’’ are used as the default phase settings. Moreover, when a light changes from one phase to another, an additional ‘‘yellow’’ phase is set for a given period of time (7 s by default).

3.1. State-of-the-art approaches

3.1.1. Occupancy-based algorithm

The occupancy-based algorithm [37] is a heuristic algorithm that tries to solve the TMS problem by using predefined historical traffic data to determine the occupancies of the roads. More specifically, the occupancies of all roads with traffic incoming onto the intersection are taken into account, and this data is then used to prioritize traffic phases, i.e., if a road has higher occupancy, its green phase duration is incremented up to a certain threshold. This prioritization formula is given as follows:

$$D = T_{\max} \cdot \left( N \cdot \frac{W_i}{W_T} \right) \tag{1}$$

where $D$ is the duration, which is also capped at a certain limit to prevent blocking the other roads, $T_{\max}$ is the total time, $N$ is the number of incoming roads, $W_i$ is the occupancy value of incoming road $i$, which is heuristically set to a certain value by observing historical traffic data, and $W_T$ is the sum of the occupancies of all roads coming into the intersection.

Despite the effectiveness and simplicity of the occupancy algorithm, it sets its parameters based on historical data, which is prone to change: some roads may have had higher occupancy values in the past, but their occupancy may decrease in the future. Thus, this algorithm introduces a certain bias toward historically occupied roads.
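Eq. (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary interface, the cap `d_max` (standing in for the "certain limit" mentioned above), and the fallback for an intersection with no vehicles are assumptions made for this example.

```python
from typing import Dict


def occupancy_durations(occupancy: Dict[str, float], t_max: float,
                        d_max: float) -> Dict[str, float]:
    """Green-phase duration per incoming road following Eq. (1).

    D_i = T_max * (N * (W_i / W_T)), then capped at d_max (a hypothetical cap
    standing in for the limit that keeps one road from blocking the others).
    """
    n = len(occupancy)                 # N: number of incoming roads
    w_total = sum(occupancy.values())  # W_T: total incoming occupancy
    if w_total == 0:
        # Empty intersection (assumption): use T_max, which is what Eq. (1)
        # yields when all incoming roads are equally occupied.
        return {road: min(t_max, d_max) for road in occupancy}
    return {road: min(t_max * (n * (w / w_total)), d_max)
            for road, w in occupancy.items()}


durations = occupancy_durations(
    {"north": 6, "south": 2, "east": 0, "west": 0}, t_max=10, d_max=30)
print(durations)  # {'north': 30.0, 'south': 10.0, 'east': 0.0, 'west': 0.0}
```

With equal occupancies every road receives exactly $T_{\max}$, so the factor $N$ in Eq. (1) makes $T_{\max}$ the balanced-case duration; the cap is what stops a single busy road (here "north") from taking an unbounded share.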


3.1.2. Scoring algorithm

The scoring algorithm is another heuristic algorithm, proposed by Sébastien Faye et al. [36]. The basic premise behind the scoring algorithm is to score each incoming road of an intersection and assign priorities based on these scores. Formally, the score of each incoming road is computed as follows:

$$LS = \alpha \cdot \left( \frac{N_{(s,d)}}{\sum_{\{a,b\} \in D} N_{(a,b)}} \right) + \beta \cdot \left( \frac{TF_{(s,d)}}{\sum_{\{a,b\} \in D} TF_{(a,b)}} \right) \tag{2}$$

where $\alpha$ and $\beta$ are user-defined weights that balance average vehicle waiting time against starvation (we use the default value of 1 for both), $(s, d)$ is a possible movement from a direction $s$ to another direction $d$ within the set $D$ of all possible directions, $N_{(s,d)}$ is the weighted sum of the number of vehicles present on the incoming lanes that compose the movement, and $TF_{(s,d)}$ is the time since the incoming road last had a green phase. If no vehicles are present on any of the roads, $LS$ is set to 0.

Finally, once the scores are obtained, the system calculates the priority duration with the same form as Eq. (1). Replacing occupancy with the local score gives:

$$D = T_{\max} \cdot \left( N \cdot \frac{S_i}{S_T} \right) \tag{3}$$

where $D$ is the duration, $T_{\max}$ is the total time, $N$ is the number of incoming roads, and $S_i$ and $S_T$ represent the local score of incoming road $i$ and the sum of all scores, respectively.

3.1.3. Reinforcement learning algorithms

In addition to the heuristic-based approaches defined above, we also employ two deep reinforcement learning algorithms for the TMS problem. More specifically, we use PPO [15] and A2C [14], trained with the Stable Baselines library [35]. While the classical DQN derives the policy by learning the Q-value function, these two policy-based algorithms improve the policy directly.

Proximal Policy Optimization (PPO) [15]: Maximizing the TRPO surrogate objective in Eq. (4) without a constraint can lead to excessively large policy updates and hence instability. PPO is a policy gradient method that simplifies TRPO by using a clipped surrogate objective: it forces the probability ratio $r(\theta)$ into the interval $[1 - \epsilon, 1 + \epsilon]$ using the function $\mathrm{clip}(r(\theta), 1 - \epsilon, 1 + \epsilon)$. In addition, PPO adds a value-function error term and an entropy term that encourages exploration, as shown in Eq. (5), where $\epsilon$, $c_1$, and $c_2$ are hyper-parameters.

$$J^{\mathrm{TRPO}}(\theta) = \mathbb{E}\left[ r(\theta)\, \hat{A}_{\theta_{\mathrm{old}}}(s, a) \right] \tag{4}$$

$$J^{\mathrm{CLIP}'}(\theta) = \mathbb{E}\left[ J^{\mathrm{CLIP}}(\theta) - c_1 \left( V_\theta(s) - V_{\mathrm{target}} \right)^2 + c_2\, H\!\left(s, \pi_\theta(\cdot)\right) \right] \tag{5}$$

Advantage Actor Critic (A2C) [14]: Actor-critic methods use two separate networks: (i) an actor and (ii) a critic. The critic network learns the value function, while the actor network is trained to update the policy according to the value function learned by the critic. A2C uses the advantage function in Eq. (6) to calculate the advantage of a specific action over the average action in a given state:

$$A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s) \tag{6}$$

For the A2C [14] and PPO [15] algorithms, the reward is calculated from the waiting times of vehicles and the respective road movements as follows:

$$R = \begin{cases} -1 \cdot \left( \sum T + \sum V \right), & N = 0 \\[4pt] -1 \cdot \dfrac{\sum T + \sum V}{N}, & N \geq 1 \end{cases}$$

where $R$ is the calculated reward, $T$ is an array of waiting times of different vehicles, $V$ is the number of new vehicles compared to the previous iteration, and $N$ is the number of vehicles that moved to a different road.

3.2. Our proposed graph-based semi-supervised learning approach

The basic idea behind our proposed approach is to predict which intersections are going to be occupied by vehicles by representing the TMS problem as a graph problem and predicting the edge flows on this graph. More specifically, in our graph-based modeling, we treat each intersection as well as each exit point as a node, and we treat the roads connecting these nodes as edges. For example, in the 17-intersection network shown in Fig. 5, there are 12 additional nodes corresponding to vehicle spawn/despawn points and exit points. Thus, in our graph-based representation, the graph has 29 nodes in total, marked with red dots in the figure. Any road that connects two distinct intersections is considered an edge between these nodes. In order to predict the traffic flow from one intersection (node) to another node connected via roads (edges), we use the methodology proposed in [12], which is an edge-based semi-supervised learning algorithm. In this way, we aim to determine the green phase time for upcoming intersections. With the help of edge-based semi-supervised learning, each intersection's predicted inward flow allows the intersection to optimize its total phase cycle according to the predicted traffic flows. In addition, the outward prediction of one intersection is used as the inward prediction for the following intersections.

Mathematically, we have a set of vertices $V$ (traffic intersections and exit points), a set of edges $\varepsilon$ (roads) connecting the vertices, and a labeled set of edge flows (traffic flows) $\varepsilon^L \subseteq \varepsilon$. The goal of the algorithm is to predict the unlabeled edge flows $\varepsilon^U \equiv \varepsilon \setminus \varepsilon^L$. It is important to note that our aim is edge-based semi-supervised learning, rather than the well-known node-based semi-supervised learning method [38].

In graph-based semi-supervised learning for vertices [38], a graph is constructed whose nodes are split into labeled samples $V^L$ and unlabeled samples $V^U$, and whose edges are based on the similarities among the samples $V$. The goal of the algorithm is to assign labels to the unlabeled samples $V^U$ based on the existing data such that the assigned labels vary smoothly across neighboring nodes. The notion of smoothness can be defined by the loss function

$$\|B^T y\|^2 = \sum_{(i,j) \in \varepsilon} (y_i - y_j)^2 \tag{7}$$

where $y$ is the vector containing the vertex labels and $B \in \mathbb{R}^{n \times m}$ is the incidence matrix of the network. This loss function can be written as $\|B^T y\|^2 = y^T L y$ in terms of the graph Laplacian $L = B B^T$. We can then obtain labels for the unknown nodes by minimizing the quadratic form $y^T L y$ with respect to $y$ while keeping the labeled vertices fixed.

In the case of graph-based semi-supervised learning for edge flows [12], we represent the edge flows in the network by the vector $f$. If we account only for the net flow along an edge, we obtain $f_r > 0$ when the flow orientation of edge $r$ aligns with its reference orientation and $f_r < 0$ otherwise. The true edge flow in the network is denoted by $\hat{f}$. The divergence of a vertex is the sum of its outgoing flows minus its incoming flows:

$$(Bf)_i = \sum_{\varepsilon_r \in \varepsilon:\ \varepsilon_r \equiv (i,j),\ i<j} f_r \;-\; \sum_{\varepsilon_r \in \varepsilon:\ \varepsilon_r \equiv (j,i),\ j<i} f_r \tag{8}$$

Additionally, we can define a loss function for edge flows that enforces a notion of flow conservation, expressed as the sum-of-squares vertex divergence:

$$\|Bf\|^2 = f^T B^T B f = f^T L_e f \tag{9}$$

where $L_e = B^T B$ is the edge Laplacian matrix.

In order to obtain the set of labeled edges that is most helpful in determining the overall edge flows in the network, we use active semi-supervised learning. Similar studies for vertex-based semi-supervised learning can be found in [39,40]. Using active semi-supervision in our problem, we can determine which set of roads is most vital for decision-making. In terms of efficiency, we could use this information to deploy sensors only on this set of roads to obtain optimal results, and thus reduce both the algorithm's cost and the real-world deployment budget.
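The formulations in Eqs. (7)–(9) can be made concrete with a small NumPy sketch. It builds the incidence matrix $B$ with a reference orientation u → v per edge, interpolates vertex labels by minimizing $y^T L y$ with the labeled vertices fixed, and interpolates edge flows by minimizing the divergence loss $\|Bf\|^2$ with the labeled flows fixed. The function names and the small ridge term $\lambda^2 \|f_U\|^2$ are our own assumptions (the ridge makes the minimizer unique when the road graph contains cycles); this is an illustration, not the Julia implementation used in the paper.

```python
import numpy as np


def incidence_matrix(n_nodes, edges):
    """B (n x m): B[u, r] = +1, B[v, r] = -1 for edge r oriented u -> v.

    With this sign convention (B @ f)[i] is outflow minus inflow at vertex i,
    i.e. the divergence of Eq. (8).
    """
    B = np.zeros((n_nodes, len(edges)))
    for r, (u, v) in enumerate(edges):
        B[u, r] = 1.0
        B[v, r] = -1.0
    return B


def interpolate_node_labels(B, labeled, y_labeled):
    """Node SSL (Eq. 7): minimise y^T L y, L = B B^T, with y_L fixed."""
    L = B @ B.T
    n = L.shape[0]
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    y = np.zeros(n)
    y[labeled] = y_labeled
    # Zero gradient with respect to y_U gives L_UU y_U = -L_UL y_L.
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    L_ul = L[np.ix_(unlabeled, labeled)]
    y[unlabeled] = np.linalg.solve(L_uu, -L_ul @ y[labeled])
    return y


def interpolate_edge_flows(B, labeled, f_labeled, lam=0.1):
    """Edge SSL: minimise ||B f||^2 + lam^2 ||f_U||^2 with f_L fixed (Eq. 9).

    The ridge term lam^2 ||f_U||^2 is an assumption of this sketch; without it
    the minimiser is not unique whenever the road graph contains cycles.
    """
    m = B.shape[1]
    unlabeled = np.setdiff1d(np.arange(m), labeled)
    f = np.zeros(m)
    f[labeled] = f_labeled
    B_l, B_u = B[:, labeled], B[:, unlabeled]
    # Zero gradient with respect to f_U: (B_U^T B_U + lam^2 I) f_U = -B_U^T B_L f_L.
    lhs = B_u.T @ B_u + lam ** 2 * np.eye(len(unlabeled))
    rhs = -B_u.T @ (B_l @ f[labeled])
    f[unlabeled] = np.linalg.solve(lhs, rhs)
    return f


# Four roads forming a loop 0 -> 1 -> 2 -> 3 -> 0; only road 0 is observed.
B = incidence_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
f = interpolate_edge_flows(B, labeled=[0], f_labeled=[5.0])
print(np.round(f, 2))
```

In this loop example the interpolated flows on the three unobserved roads stay close to the observed 5 vehicles, because a constant circulation has zero divergence at every vertex; the ridge term shrinks them only slightly. On a dead-end path, by contrast, the divergence penalty at the terminal node makes the interpolated flow decay along the path.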
Fig. 3. Single intersection network design.

The approach we use is called Rank-Revealing QR (RRQR), a heuristic method for optimal column subset selection that approximates the well-known maximum submatrix volume problem, which is NP-hard [41]. To select $\varepsilon^L$, the method chooses $m_L$ rows from $V_0$ that maximize the smallest singular value of the resulting submatrix. The RRQR heuristic computes

$$V_C^T \Pi = Q \begin{bmatrix} R_1 & R_2 \end{bmatrix} \tag{10}$$

where $\Pi$ is a permutation matrix that keeps $R_1$ well conditioned. The labeled edge set $\varepsilon^L$ then consists of the edges indicated by the first $m_L$ columns of the column permutation $\Pi$.
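The selection step can be sketched as greedy QR with column pivoting (the classical Businger–Golub rule, the simplest rank-revealing QR heuristic) applied to $V_C^T$. Here we assume that $V_C$ is an orthonormal basis of the null space of $B$, i.e., of the divergence-free (cyclic) flows; this choice, the tolerances, and the function names are assumptions of the sketch rather than details given above.

```python
import numpy as np


def cycle_space_basis(B, tol=1e-10):
    """Orthonormal basis (m x k) of the null space of B.

    Its columns span the divergence-free flows (B f = 0). Using this basis
    as V_C is an assumption of this sketch.
    """
    _, s, vt = np.linalg.svd(B)
    rank = int(np.sum(s > tol))
    return vt[rank:].T


def rrqr_select_edges(V, m_l, tol=1e-12):
    """Choose up to m_l edges (rows of V) by QR with column pivoting on V^T.

    At each step the column of V^T with the largest residual norm is taken
    and every column is orthogonalised against it (Businger-Golub pivoting).
    The chosen order is the leading part of the permutation Pi in
    V^T Pi = Q [R1 R2], which tends to keep R1 well conditioned.
    """
    A = np.array(V.T, dtype=float)      # k x m: one column per edge
    chosen = []
    for _ in range(min(m_l, A.shape[1])):
        norms = np.linalg.norm(A, axis=0)
        if chosen:
            norms[chosen] = -1.0        # never pick the same edge twice
        j = int(np.argmax(norms))
        if norms[j] <= tol:             # remaining columns add no new rank
            break
        chosen.append(j)
        q = A[:, j] / norms[j]
        A -= np.outer(q, q @ A)         # remove the q-component from every column
    return chosen


# Two loops sharing road (2, 0): 4 junctions, 5 roads, 2 independent cycles.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
B = np.zeros((4, len(edges)))
for r, (u, v) in enumerate(edges):
    B[u, r], B[v, r] = 1.0, -1.0
V_c = cycle_space_basis(B)              # shape (5, 2)
labeled = rrqr_select_edges(V_c, m_l=2)
print(labeled)
```

Requesting more edges than the cycle space has dimensions simply stops early, since any further edge would be determined by the ones already chosen.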
In summary, we represent the TMS problem as a graph problem and aim to determine beforehand which intersections will be congested, so that, using the edge-based semi-supervised algorithm [12], we can allocate more green time to the intersections that are predicted to be occupied by vehicles.
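For comparison with the proposed approach, the baseline formulas of Section 3.1 can also be written as small Python functions: the local score of Eq. (2), the score-based duration of Eq. (3), the reward $R$ used for A2C and PPO, and the clipped surrogate term $J^{\mathrm{CLIP}}$ that Eq. (5) builds on (we use the standard PPO definition, $\mathbb{E}[\min(r\hat{A}, \mathrm{clip}(r, 1-\epsilon, 1+\epsilon)\hat{A})]$, since the text references but does not spell it out). These are illustrative sketches: the dictionary inputs, the fallback when all scores are zero, and treating $V$ as a list of per-road counts are assumptions.

```python
from typing import Dict, Hashable, Sequence


def local_scores(vehicles: Dict[Hashable, float],
                 time_since_green: Dict[Hashable, float],
                 alpha: float = 1.0, beta: float = 1.0) -> Dict[Hashable, float]:
    """Local score LS of Eq. (2) for every movement (s, d)."""
    n_total = sum(vehicles.values())
    if n_total == 0:
        return {mv: 0.0 for mv in vehicles}    # no vehicles on any road
    tf_total = sum(time_since_green.values())
    return {
        mv: alpha * n / n_total
        + beta * (time_since_green[mv] / tf_total if tf_total else 0.0)
        for mv, n in vehicles.items()
    }


def score_durations(scores: Dict[Hashable, float],
                    t_max: float) -> Dict[Hashable, float]:
    """Eq. (3): D_i = T_max * (N * (S_i / S_T)) for N incoming roads."""
    n, s_total = len(scores), sum(scores.values())
    if s_total == 0:
        return {mv: t_max for mv in scores}   # assumption: equal split
    return {mv: t_max * (n * (s / s_total)) for mv, s in scores.items()}


def reward(waiting_times: Sequence[float], new_vehicles: Sequence[float],
           moved: int) -> float:
    """Reward R for A2C/PPO: -(sum T + sum V), divided by N when N >= 1."""
    total = sum(waiting_times) + sum(new_vehicles)
    return -total if moved == 0 else -total / moved


def ppo_clip_objective(ratios: Sequence[float], advantages: Sequence[float],
                       eps: float = 0.2) -> float:
    """Sample mean of min(r*A, clip(r, 1-eps, 1+eps)*A), the J^CLIP term."""
    terms = [min(r * a, max(1 - eps, min(r, 1 + eps)) * a)
             for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


scores = local_scores({"N->S": 3, "E->W": 1}, {"N->S": 10, "E->W": 10})
print(score_durations(scores, t_max=20))  # {'N->S': 25.0, 'E->W': 15.0}
```

In the example, the movement with three times as many waiting vehicles receives the longer green phase, while the time-since-green term (equal here) would otherwise pull starved movements back up.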

4. Results

In this section, we first give a brief background on the simulation environment used to evaluate the algorithms in this paper. We then explain the hyper-parameters of the experimental setups. Finally, we give a comparative performance evaluation of the proposed algorithm against two deep learning-based algorithms, i.e., A2C [14] and PPO [15], as well as against two heuristic-based approaches [36], in terms of waiting time.

4.1. Simulation environment: Simulation of urban mobility (SUMO) platform

SUMO is an open-source traffic simulation suite that allows us to design, model, and simulate traffic networks [13]. During a simulation, SUMO provides the necessary tools to interact with it. We use several SUMO tools: the NetEdit tool to create and model various traffic networks, the RandomTrips tool to create random vehicle trips, and the Duarouter tool to generate vehicle routes based on a previously created network. After obtaining a complete network model with these tools, we simulate the network using the graphical and command-line interface tools. Interaction with an ongoing simulation is done through the Traffic Control Interface (TraCI).

TraCI allows us to obtain information directly from an ongoing simulation. For the scope of this study, we extract the network structure at the start of a simulation, and the occupancy of the relevant roads each time a traffic phase algorithm is run.

In TraCI, vehicle occupancy is defined as the number of vehicles on a given road (edge). An edge may have any number of lanes. The number of lanes within an edge affects the total queue in the edge, but the vehicle occupancy remains the same. The size of the queue at an intersection can cause delays for the vehicles on the edge, because vehicles need time to start moving again after stopping. In other words, a continuous flow of vehicles would be ideal to reduce such delays. In this paper, all four-way intersections are designed as shown in Fig. 3, and three-way intersections are designed in a similar manner.

Fig. 4. Network design for simulation environment with 5 traffic light intersections.

4.2. Hyper-parameters

Network model and Datasets: We evaluate our proposed method on a synthetically generated network model built with SUMO's NetEdit tool. The vehicles were generated using a combination of the RandomTrips tool and the Duarouter tool. We run the simulations with the following hyper-parameters:

• Number of steps: The number of steps determines how long a specific simulation runs. Ideally, we want all generated vehicles to enter the simulation before it ends. The time at which vehicles leave the simulation varies depending on which algorithm is used for traffic phase selection. In this paper, the number of steps is set to 1600.
• Number of simulations: The number of simulations determines how many times we run each algorithm. To evaluate the algorithms fairly, we run each algorithm multiple times and average the results. Thus, we repeat the simulations 10 times for each algorithm and report the average performance.
• Traffic Phases: Each intersection in the network has predefined traffic phases, which are generated by SUMO. We do not define any custom traffic phases, as our proposed method aims at waiting time optimization only.

A sample network structure of a classic single intersection is shown in Fig. 3. Given that our approach focuses on multiple intersections, we integrate the proposed idea into networks with different numbers of intersections and then connect them. Examples of the final networks with 5, 17, and 20 intersections can be seen in Figs. 4, 5, and 6, respectively. Additionally, not all intersections within the multi-intersection networks are the same: some of them intersect with only three other roads (unlike the classic scenario, where four other roads meet at an intersection). This setup makes our application more representative of real-world road networks.

Fig. 5. Network design for simulation environment with 17 traffic light intersections.

Fig. 6. Network design for simulation environment with 20 traffic light intersections.

Fig. 7. Average wait time obtained when simulating network models with 5, 17 and 20 intersections.
Preprocessing: In order to integrate the Flow Prediction algorithm,
we pre-process some of the data obtained from the SUMO environment
using TraCI.
At the start of a simulation using the Flow Prediction algorithm [12],
we create additional files related to the network; a file to note the Fig. 8. Maximum wait time obtained when simulating network models with 5, 17 and
cartesian coordinates of every intersection in the network, and a 20 intersections.
secondary file noting all connections between intersections.
While running the simulation, we also use TraCI to obtain occu-
pancy and occupancy change information on all the relevant roads.
Each time we call the Flow Prediction algorithm, another file was
generated noting all the occupancy change data with regard to the
connection data previously generated.
Our experiments are conducted using Python 3. The flow prediction algorithm is implemented in the Julia programming language.
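To make the flow prediction step concrete, the sketch below follows the edge-flow formulation of [12]: unlabeled edge flows are chosen to minimize ‖Bf‖² + λ²‖f‖², where B is the node-edge incidence matrix, while labeled edges keep their observed values. It also includes the ZeroFill baseline (unlabeled edges set to 0) and the Pearson correlation used for Fig. 12. It is written in Python with NumPy rather than Julia, omits the RRQR step, and the toy graph, λ = 0.1, computing the correlation over all edges, and all function names are illustrative assumptions; it is not the authors' implementation.

```python
import numpy as np


def incidence_matrix(n_nodes, edges):
    """Node-edge incidence matrix B (n_nodes x n_edges).

    Column e has -1 at the tail and +1 at the head of edge e = (u, v), so
    (B @ f)[v] is the net inflow at node v for edge flows f.
    """
    B = np.zeros((n_nodes, len(edges)))
    for e, (u, v) in enumerate(edges):
        B[u, e] = -1.0
        B[v, e] = 1.0
    return B


def interpolate_flows(B, labeled, values, lam=0.1):
    """Fill unlabeled edge flows by minimizing ||B f||^2 + lam^2 ||f||^2
    with f[labeled] fixed to the observed values."""
    m = B.shape[1]
    labeled = np.asarray(labeled, dtype=int)
    values = np.asarray(values, dtype=float)
    unlabeled = np.setdiff1d(np.arange(m), labeled)
    B_L, B_U = B[:, labeled], B[:, unlabeled]
    # One stacked least-squares system: divergence term on top,
    # ridge term on the unlabeled edges below.
    A = np.vstack([B_U, lam * np.eye(len(unlabeled))])
    b = np.concatenate([-B_L @ values, np.zeros(len(unlabeled))])
    f_U, *_ = np.linalg.lstsq(A, b, rcond=None)
    f = np.zeros(m)
    f[labeled] = values
    f[unlabeled] = f_U
    return f


def zero_fill(m, labeled, values):
    """ZeroFill baseline: keep the labeled flows and predict 0 elsewhere."""
    f = np.zeros(m)
    f[np.asarray(labeled, dtype=int)] = values
    return f


def pearson(a, b):
    """Pearson correlation between two flow vectors."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    # Toy 4-node network whose true flow is a sum of two cycles.
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    truth = np.array([2.0, 2.0, 1.0, 1.0, -1.0])
    B = incidence_matrix(4, edges)
    labeled = [0, 2]
    est = interpolate_flows(B, labeled, truth[labeled])
    base = zero_fill(len(edges), labeled, truth[labeled])
    print("interpolated:", np.round(est, 3), "r =", round(pearson(est, truth), 3))
    print("zero fill:   ", base, "r =", round(pearson(base, truth), 3))
```

Because the toy ground truth is divergence-free, the interpolated flows track it closely, while ZeroFill cannot recover the unlabeled edges; this mirrors, on a tiny scale, the kind of comparison plotted in Fig. 12.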

5. Discussion on performance evaluations

To assess the performance of the proposed algorithm against two deep learning-based algorithms, A2C [14] and PPO [15], as well as two heuristic-based approaches [36], we use the average wait time, defined as the average time vehicles spend waiting during the simulation, which is a commonly used performance metric in the related literature. To this end, we create three different intersection configurations, with 5, 17 and 20 intersections, in the SUMO platform and interact with them through its TraCI interface. The traffic routes are randomly generated each time the simulation is run.

Fig. 9. Results obtained for simulation with 1600 steps. The simulation was performed 10 times and the average wait times were taken.
Figs. 7 and 8 show the average total waiting time and the average maximum waiting time of a vehicle in a simulation. These results are obtained by running the simulation 10 times, generating new vehicle routes each time. Specifically, the simulations are conducted over a period of 1600 steps, and each algorithm is invoked a total of 10 times, i.e., roughly once every 160 steps.
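The evaluation protocol above can be summarized in a short sketch. It assumes evenly spaced controller calls (1600 // 10 = 160 steps apart) and that "average maximum waiting time" means the per-run maximum averaged over runs; the function names are hypothetical. Per-vehicle waiting times could, for example, be collected with `traci.vehicle.getAccumulatedWaitingTime(vehID)`, but how the authors collected them is not stated.

```python
from statistics import mean


def controller_call_steps(n_steps=1600, n_calls=10):
    """Simulation steps at which the traffic light controller is invoked.

    Assumes evenly spaced calls: with the defaults, every 1600 // 10 = 160 steps.
    """
    interval = n_steps // n_calls
    return list(range(interval, n_steps + 1, interval))


def summarize_runs(runs):
    """Return (average waiting time, maximum waiting time), each averaged over runs.

    runs: one list per simulation run, holding the waiting time of every
        vehicle in that run.
    """
    per_run_avg = [mean(waits) for waits in runs]
    per_run_max = [max(waits) for waits in runs]
    return mean(per_run_avg), mean(per_run_max)


if __name__ == "__main__":
    print(controller_call_steps())
    # Two hypothetical runs with three vehicles each.
    print(summarize_runs([[10, 20, 30], [40, 50, 60]]))
```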


Table 2
Summary of the evaluations based on average and maximum waiting time.
Algorithm               Average waiting time   Maximum waiting time   Number of intersections
Occupancy-based Alg.    63                     69                     5
Flow-based Alg. (Ours)  32                     19                     5
Scoring-based Alg.      32                     15                     5
A2C [14]                57                     48                     5
PPO2 [15]               57                     45                     5
Occupancy-based Alg.    85                     241                    17
Flow-based Alg.         64                     91                     17
Scoring-based Alg.      64                     126                    17
A2C [14]                45                     39                     17
PPO2 [15]               52                     46                     17
Occupancy-based Alg.    68                     131                    20
Flow-based Alg.         55                     88                     20
Scoring-based Alg.      56                     118                    20
A2C [14]                52                     49                     20
PPO2 [15]               53                     50                     20
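To quantify what "comparable" means in Table 2, the snippet below recomputes, from the table values only, the relative difference between the flow-based algorithm and each baseline. The table does not state units, so the values are treated as unitless; the dictionary layout and function names are illustrative.

```python
# Values copied from Table 2: (average, maximum) waiting time per algorithm.
TABLE2 = {
    5: {"Occupancy-based": (63, 69), "Flow-based": (32, 19), "Scoring-based": (32, 15),
        "A2C": (57, 48), "PPO2": (57, 45)},
    17: {"Occupancy-based": (85, 241), "Flow-based": (64, 91), "Scoring-based": (64, 126),
         "A2C": (45, 39), "PPO2": (52, 46)},
    20: {"Occupancy-based": (68, 131), "Flow-based": (55, 88), "Scoring-based": (56, 118),
         "A2C": (52, 49), "PPO2": (53, 50)},
}


def relative_change(ours, baseline):
    """Percentage change of our value relative to a baseline (negative = lower wait)."""
    return 100.0 * (ours - baseline) / baseline


def compare(metric=0):
    """Relative change of the flow-based algorithm vs. every baseline.

    metric: 0 for average waiting time, 1 for maximum waiting time.
    """
    out = {}
    for n, rows in TABLE2.items():
        ours = rows["Flow-based"][metric]
        out[n] = {name: round(relative_change(ours, vals[metric]), 1)
                  for name, vals in rows.items() if name != "Flow-based"}
    return out


if __name__ == "__main__":
    for n, diffs in compare().items():
        print(n, diffs)
```

For instance, at 5 intersections the flow-based average waiting time is about 49% below the occupancy-based baseline, while at 17 intersections it is about 42% above A2C.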

Fig. 10. Results obtained for simulation with 20 traffic light intersections, 1600 steps for each simulation. The simulation was performed 10 times and the average wait times were taken.
Fig. 11. Run time (in hours) for each algorithm in the network model with 5, 17 and 20 intersections.

To complement Figs. 7 and 8, Table 2 summarizes the average and maximum waiting times per vehicle in the simulation. As can be seen in the table, our proposed approach delivers results comparable to those of the baseline algorithms in terms of average and maximum waiting time, with much less running time (see Fig. 11).
Figs. 9 and 10 show the average waiting time for a single vehicle against the time step in the 17-intersection and 20-intersection simulations, respectively; the values shown are averaged over 10 distinct simulation runs. From the figures, we observe that our algorithm (the flow-based algorithm) delivers better results than the heuristic algorithms and results comparable to those of the deep learning algorithms.
We also compare the runtime performance of the algorithms in Fig. 11. The runtime of our algorithm is in the order of minutes, while the deep learning algorithms may take several days. This finding suggests that the proposed algorithm is more practical for real-world traffic datasets than deep learning algorithms such as A2C [14] and PPO2 [15].
Finally, we examine the effect of the known edge labels on our algorithm. To this end, we gradually increase the number of known edge labels and observe their effect. Fig. 12 compares the prediction performance of our algorithm with that of the ZeroFill algorithm in all of the different network models. The algorithm performs better as the number of intersections increases, indicating that graph topological information is useful.
The main findings of the performance evaluation can be summarized as follows:
• The proposed flow prediction algorithm outperforms other algorithms, since it can predict where traffic congestion will occur.
• A larger number of nodes (intersections) yields more accurate vehicle flow predictions: larger networks are more informative, and thus prediction accuracy improves.
• To the best of our knowledge, no other study uses edge-based semi-supervised learning to predict vehicle flows between traffic intersections.

6. Conclusion

In this paper, we propose an adaptive traffic light management system that is able to predict traffic flow from one intersection to another. The principal algorithm behind the proposed system is graph-based semi-supervised learning for edge flows, where each traffic light intersection and each vehicle spawn and despawn point is considered a vertex, and the roads connecting any two vertices are considered edges. The magnitudes of the edge connections are then calculated using the proposed RRQR method. The obtained information is then used to select and optimize the predefined traffic phases.
Future efforts in this direction would include further optimization of the traffic selection system itself. Comparative performance evaluations on various traffic intersection configurations show that our approach can produce comparable average vehicle waiting time and drastically
reduce the time needed to learn adequate traffic light configurations for all intersections to a few seconds, whereas deep learning-based approaches can take a few days of training to learn similar light configurations.

Fig. 12. Graph-based SSL for synthetic flows from the simulation on the 5, 17 and 20 intersection network models. The plots show the Pearson correlation between the estimated flow 𝑓* and the ground truth flow 𝑓̂ as a function of the ratio of labeled edges.

Declaration of competing interest

This paper has not been published in any other journal.

Data availability

Data will be made available on request.

Acknowledgments

This work was supported by the Turkish Scientific and Technical Research Council (TUBITAK) TEYDEB Program under Project No: 3220798 and produced from the master thesis [42].

References

[1] INRIX, Home.
[2] Alan J. Miller, Settings for fixed-cycle traffic signals, J. Oper. Res. Soc. 14 (4) (1963) 373–386.
[3] F.V. Webster, Traffic Signal Settings, Technical Report, 1958.
[4] Seung-Bae Cools, Carlos Gershenson, Bart D'Hooghe, Self-organizing traffic lights: A realistic simulation, in: Advances in Applied Self-Organizing Systems, Springer, 2013, pp. 45–55.
[5] M. Coşkun, A. Baggag, S. Chawla, Deep reinforcement learning for traffic light optimization, in: 2018 IEEE International Conference on Data Mining Workshops, ICDMW, 2018, pp. 564–571.
[6] Lior Kuyer, Shimon Whiteson, Bram Bakker, Nikos Vlassis, Multiagent reinforcement learning for urban traffic control using coordination graphs, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2008, pp. 656–671.
[7] M.A. Wiering, Multi-agent reinforcement learning for traffic light control, in: Machine Learning: Proceedings of the Seventeenth International Conference, ICML'2000, 2000, pp. 1151–1158.
[8] Elise Van der Pol, Frans A. Oliehoek, Coordinated deep reinforcement learners for traffic light control, Proc. Learn., Inference Control Multi-Agent Syst., (At NIPS 2016) (2016).
[9] Hua Wei, Guanjie Zheng, Vikash Gayah, Zhenhui Li, Recent advances in reinforcement learning for traffic signal control: A survey of models and evaluation, ACM SIGKDD Explor. Newsl. 22 (2) (2021) 12–18.
[10] Hua Wei, Nan Xu, Huichu Zhang, Guanjie Zheng, Xinshi Zang, Chacha Chen, Weinan Zhang, Yanmin Zhu, Kai Xu, Zhenhui Li, Colight: Learning network-level cooperation for traffic signal control, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1913–1922.
[11] Hua Wei, Guanjie Zheng, Huaxiu Yao, Zhenhui Li, Intellilight: A reinforcement learning approach for intelligent traffic light control, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2018, pp. 2496–2505.
[12] Junteng Jia, Michael T. Schaub, Santiago Segarra, Austin R. Benson, Graph-based semi-supervised & active learning for edge flows, 2019, arXiv preprint arXiv:1905.07451.
[13] Pablo Alvarez Lopez, Michael Behrisch, Laura Bieker-Walz, Jakob Erdmann, Yun-Pang Flötteröd, Robert Hilbrich, Leonhard Lücken, Johannes Rummel, Peter Wagner, Evamarie Wießner, Microscopic traffic simulation using SUMO, in: The 21st IEEE International Conference on Intelligent Transportation Systems, IEEE, 2018.
[14] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous methods for deep reinforcement learning, 2016, arXiv:1602.01783.
[15] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, Proximal policy optimization algorithms, 2017.
[16] Francois Dion, Hesham Rakha, Youn-Soo Kang, Comparison of delay estimates at under-saturated and over-saturated pre-timed signalized intersections, Transp. Res. B 38 (2) (2004) 99–122.
[17] Brian L. Smith, Michael J. Demetsky, Short-term traffic flow prediction: Neural network approach, Transp. Res. Rec. (1453) (1994).
[18] Guanjie Zheng, Yuanhao Xiong, Xinshi Zang, Jie Feng, Hua Wei, Huichu Zhang, Yong Li, Kai Xu, Zhenhui Li, Learning phase competition for traffic signal control, 2019, arXiv preprint arXiv:1905.04722.
[19] Hsin-Hung Pan, Shu-Ching Wang, Kuo-Qin Yan, An integrated data exchange platform for intelligent transportation systems, Comput. Stand. Interfaces 36 (3) (2014) 657–671, URL https://www.sciencedirect.com/science/article/pii/S0920548913000949.
[20] S.L. Toral, F. Barrero, F. Cortés, D. Gregor, Analysis of embedded CORBA middleware performance on urban distributed transportation equipments, Comput. Stand. Interfaces 35 (1) (2013) 150–157.
[21] D. Gregor, S. Toral, T. Ariza, F. Barrero, R. Gregor, J. Rodas, M. Arzamendia, A methodology for structured ontology construction applied to intelligent transportation systems, Comput. Stand. Interfaces 47 (2016) 108–119.
[22] Federico Barrero, Jean A. Guevara, Enrique Vargas, Sergio Toral, Manuel Vargas, Networked transducers in intelligent transportation systems based on the IEEE 1451 standard, Comput. Stand. Interfaces 36 (2) (2014) 300–311.
[23] I. Román, G. Madinabeitia, L. Jimenez, G.A. Molina, J.A. Ternero, Experiences applying RM-ODP principles and techniques to intelligent transportation system architectures, Comput. Stand. Interfaces 35 (3) (2013) 338–347, RM-ODP: Foundations, Experience and Applications.
[24] Yoichiro Iwasaki, Image processing system to measure vehicular queues and an adaptive traffic signal control by using the information of the queues, Comput. Stand. Interfaces 20 (6) (1999) 444.
[25] Waris Hooda, Pradeep Kumar Yadav, Amogh Bhole, Deptii D. Chaudhari, An image processing approach to intelligent traffic management system, in: Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, ICTCS '16, Association for Computing Machinery, New York, NY, USA, 2016.
[26] Ninad Lanke, Sheetal Koul, Smart traffic management system, Int. J. Comput. Appl. 75 (7) (2013).
[27] S. Sheik Mohammed Ali, Boby George, Lelitha Vanajakshi, Jayashankar Venkatraman, A multiple inductive loop vehicle detection system for heterogeneous and lane-less traffic, IEEE Trans. Instrum. Meas. 61 (5) (2011) 1353–1360.
[28] Isaac Porche, Stéphane Lafortune, Adaptive look-ahead optimization of traffic signals, J. Intell. Transp. Syst. 4 (3–4) (1999) 209–254.
[29] Baher Abdulhai, Rob Pringle, Grigoris J. Karakoulas, Reinforcement learning for true adaptive traffic signal control, J. Transp. Eng. 129 (3) (2003) 278–285.
[30] Yit Kwong Chin, Nurmin Bolong, Aroland Kiring, Soo Siang Yang, Kenneth Tze Kin Teo, Q-learning based traffic optimization in management of signal timing plan, Int. J. Simul., Syst., Sci. Technol. 12 (3) (2011) 29–35.
[31] Patrick Mannion, Jim Duggan, Enda Howley, An experimental review of reinforcement learning algorithms for adaptive traffic signal control, in: Autonomic Road Transport Support Systems, Springer, 2016, pp. 47–66.
[32] P.G. Balaji, X. German, Dipti Srinivasan, Urban traffic signal control using reinforcement learning agents, IET Intell. Transp. Syst. 4 (3) (2010) 177–188.
[33] Itamar Arel, Cong Liu, Tom Urbanik, Airton G. Kohls, Reinforcement learning-based multi-agent system for network traffic signal control, IET Intell. Transp. Syst. 4 (2) (2010) 128–135.
[34] Samah El-Tantawy, Baher Abdulhai, Hossam Abdelgawad, Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): Methodology and large-scale application on downtown Toronto, IEEE Trans. Intell. Transp. Syst. 14 (3) (2013) 1140–1150.
[35] Ashley Hill, Antonin Raffin, Maximilian Ernestus, Adam Gleave, Anssi Kanervisto, Rene Traore, Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, Stable baselines, 2018, https://github.com/hill-a/stable-baselines.
[36] Sébastien Faye, Claude Chaudet, Isabelle Demeure, A distributed algorithm for multiple intersections adaptive traffic lights control using a wireless sensor networks, in: Proceedings of the First Workshop on Urban Networking, UrbaNe '12, Association for Computing Machinery, New York, NY, USA, 2012, pp. 13–18.
[37] Sultan Kübra Can, Adam Thahir, Mustafa Coşkun, V. Çağrı Güngör, Traffic light management systems using reinforcement learning, in: 2022 Innovations in Intelligent Systems and Applications Conference, ASYU, IEEE, 2022, pp. 1–6.
[38] Mustafa Coskun, Burcu Bakir Gungor, Mehmet Koyuturk, Expanding label sets for graph convolutional networks, 2019, arXiv preprint arXiv:1912.09575.
[39] Akshay Gadde, Aamir Anis, Antonio Ortega, Active semi-supervised learning using sampling theory for graph signals, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 492–501.
[40] Andrew Guillory, Jeff A. Bilmes, Label selection on graphs, in: Advances in Neural Information Processing Systems, 2009, pp. 691–699.
[41] Ali Çivril, Malik Magdon-Ismail, On selecting a maximum volume sub-matrix of a matrix and related problems, Theoret. Comput. Sci. 410 (47–49) (2009) 4801–4811.
[42] Adam Rizvi Thahir, Graph Theory Based Traffic Light Management (Master's thesis), Abdullah Gül Üniversitesi, Fen Bilimleri Enstitüsü, 2022.