Intelligent Traffic Light Systems Using Edge Flow Predictions

Keywords: Artificial intelligence; Reinforcement learning; Traffic flow; Congestion

Abstract: In this paper, we propose a novel graph-based semi-supervised learning approach for traffic light management in multiple intersections. Specifically, the basic premise behind our paper is that if we know some of the occupied roads and predict which roads will be congested, we can dynamically change traffic lights at the intersections that are connected to the roads anticipated to be congested. Comparative performance evaluations show that the proposed approach can produce comparable average vehicle waiting time and reduce the training/learning time of learning adequate traffic light configurations for all intersections within a few seconds, while a deep learning-based approach can be trained in a few days for learning similar light configurations.
2. Related work
Table 1
Comparison of our approach with existing traffic light management studies.

Approach | Simulation | Predict flows | Real-time data | Historical data | Multi-intersection
IntelliLight [11] | ✓ | ✗ | ✓ (online) | ✓ (offline) | ✗
Francois Dion et al. [16] | ✗ | ✗ | ✗ | ✓ | ✗
Alan J. Miller [2] | ✗ | ✗ | ✗ | ✓ | ✗
Brian L. Smith [17] | ✗ | ✓ | ✗ | ✓ | ✗
FRAP [18] | ✓ | ✗ | ✓ | ✗ | ✓
CoLight [10] | ✓ | ✗ | ✓ | ✗ | ✓
Our approach | ✓ | ✓ | ✓ | ✓ | ✓
2.2. Reinforcement learning for traffic light control

As the traffic management system problem is a time-dependent dynamic problem, with the development of AI-based algorithms, Reinforcement Learning/Deep Learning approaches [5] have recently been employed to dynamically adjust the traffic lights by learning traffic behaviors.

Previous studies relate traffic data with the state space of the environment. [6,7,29–31] use the number of vehicles queued at a given road as their state space. [32,33] use traffic flow obtained by sensors exiting a given intersection. Action spaces are defined based on signal phases, which can include all available signal phases, as seen in [33,34], or only the relevant green signal phase, as seen in [32] and [30]. Reward functions of the environments are typically related to how a direct change can be observed from the actions taken: [33,34] define their reward function based on the change in delay for the vehicles in the environment; similarly, [30,32] define their reward function based on the change in the number of queued vehicles at the given roads in the intersection.

The agents in a reinforcement learning solution act on the environment through the actions defined in the action space of the environment, the effect of each action is observed through the state space, and the reward for each action taken is calculated from the reward function. The deep Q-learning and policy gradient approach was studied in [5], which proposes using traffic flow and traffic delay of vehicles in the network as the reward function. Moreover, other deep learning-based approaches have been studied in the Stable Baselines project [35], including the Advantage Actor Critic (A2C) method [14] and the Proximal Policy Optimization algorithm [15].

Compared with the traditional methods [7,32] in the context of traffic management systems, deep learning-based approaches [5,10,11,35] have been repeatedly shown to be more effective, since these methods can learn the dynamic nature of the traffic through trial-and-error by updating the neural network parameters. However, in these deep learning algorithms, training the neural network parameters is computationally expensive as the number of intersections increases.

In this paper, we propose an alternative and efficient algorithm by stating the TMS as a flow prediction problem. Our approach predicts the outward flow of vehicles from a given intersection and enables the TMS to identify vehicle congestion that would possibly occur at a following intersection. Thus, our method prepares the intersection that is anticipated to be congested. More specifically, our approach in the multi-intersection setting can provide information on where the traffic flow takes place, pinpointing the intersections that are going to be congested in the order of seconds (efficiently) based on semi-supervised active learning on edges [12]. Here, the main difference of our proposed approach over other deep learning or traditional methods is that the flow of traffic can be anticipated beforehand. Hence, the intersections that are predicted to be congested can adjust their traffic lights accordingly. More importantly, since we treat each intersection as a node and a large city can have traffic light intersections in the range of thousands, our graph-based approach can efficiently process these nodes (intersections). Thus, our approach can scale to a city-level traffic management system.

To assess the performance of our proposed approach against state-of-the-art deep learning approaches, namely Asynchronous Methods for Deep Reinforcement Learning (A2C) [14] and the Proximal Policy Optimization algorithm [15], as well as two heuristic-based approaches [36], we conduct our experiments on the Simulation of Urban Mobility (SUMO) [13] platform, selecting average vehicle waiting time as the evaluation metric. The first heuristic-based approach simply decides and prioritizes the next traffic light phase based on the current vehicle occupancy at a given intersection. The other heuristic-based approach is based on a local scoring mechanism [36] and changes the traffic phase and priority based on local scores. For a better visualization, Table 1 compares our approach with the existing recent studies focusing on traffic light management systems.

3. Methods

In this section, we first define the "intelligent" traffic management system problem. We then give a brief background on state-of-the-art approaches to solve this problem, i.e., heuristic and deep learning-based approaches, respectively. We finally present our graph edge-based semi-supervised learning approach.

Problem Definition: The aim of this paper is to find an optimum traffic light configuration in multi-intersections so that traffic congestion can be prevented as much as possible. Before we delve into formal approaches to solve this traffic management system (TMS) problem, let us first give some brief notations that will be used throughout the paper. We define the TMS environment as ℰ, which consists of multiple intersections. Here, in this environment ℰ, each intersection is assumed to have an "intelligent" traffic light agent TL_i, where the agents TL_i and TL_j (for i ≠ j) control different intersections and "Red" and "Green" are used as the default phase settings. Moreover, when a light changes from one phase to another, an additional "Yellow" phase is set for a given period of time (the default time is 7 s).

3.1. State-of-the-art approaches

3.1.1. Occupancy-based algorithm

The occupancy-based algorithm [37] is one of the heuristic algorithms that tries to solve the TMS problem by relying on the basic idea of using predefined historical traffic data to determine the occupancies of the roads. More specifically, in this algorithm, the occupancies of all roads with incoming traffic onto the intersection are taken into account, and this data is then used to prioritize traffic phases, i.e., if a road has higher occupancy, then its green phase duration is incremented up to a certain threshold. This prioritization formula can be given as follows:

D = T_max * (N * (W_i / W_T))    (1)

where D is the duration, which is also capped at a certain limit to prevent blocking the other roads, T_max is the total time, N is the number of incoming roads, W_i is the occupancy value for incoming road i, which is heuristically set to a certain value by observing the historical traffic data, and W_T is the sum of all the road occupancies coming into the intersection.

Despite the effectiveness and simplicity of the occupancy algorithm, it sets its parameters based on historical data, which is prone to change; i.e., some roads may have had higher occupancy values in the past while their occupancy decreases in the future. Thus, this algorithm introduces a certain bias towards the historically occupied roads.
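As a minimal illustration of Eq. (1), the Python sketch below computes a green phase duration for each incoming road of a single intersection. The function name, the 90 s cap, and the example occupancy values are illustrative assumptions, not the exact implementation evaluated in this paper.

```python
# Minimal sketch of the occupancy-based prioritization in Eq. (1).
# Names and numeric values are illustrative assumptions.
def green_phase_durations(occupancies, t_max, d_cap=90.0):
    """Return one green-phase duration per incoming road.

    occupancies: historical occupancy value W_i of each incoming road
    t_max:       total time T_max available to the intersection
    d_cap:       upper limit on a single duration, so one road cannot block the others
    """
    n = len(occupancies)              # N, number of incoming roads
    w_total = sum(occupancies)        # W_T, sum of all incoming occupancies
    durations = []
    for w_i in occupancies:
        d = t_max * (n * (w_i / w_total))   # Eq. (1)
        durations.append(min(d, d_cap))     # "also set to a certain limit"
    return durations

# Example: four incoming roads, one clearly busier than the rest.
print(green_phase_durations([0.6, 0.2, 0.1, 0.1], t_max=60.0))
```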
3.1.2. Scoring algorithm

The scoring algorithm is another heuristic algorithm, proposed by Sébastien Faye et al. [36]. The basic premise behind the scoring algorithm is to score each incoming road in an intersection and assign priorities based on these scores. Formally, the score of each incoming road can be computed as follows:

LS = α * (N_(s,d) / Σ_{(a,b)∈D} N_(a,b)) + β * (TF_(s,d) / Σ_{(a,b)∈D} TF_(a,b))    (2)

where α and β are user-defined weights to trade off average vehicle time and starvation (we use a default value of 1 for both), (s, d) is a possible movement going from a direction s to another direction d within the set of all possible directions D, N_(s,d) is the weighted sum of the number of vehicles present on the incoming lanes that compose the movement, and TF_(s,d) is the time since the incoming road last had a green phase. In the case that there are no vehicles present on any of the roads, LS is set to 0.

Finally, once the scores are obtained, the system calculates the priority duration using the same equation as Eq. (1). Replacing occupancy with the local score gives us:

D = T_max * (N * (S_i / S_T))    (3)

where D is the duration, T_max is the total time, and N is the number of incoming roads. S_i and S_T represent the local score for incoming road i and the sum of all scores, respectively.
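A minimal sketch of Eqs. (2) and (3) is given below; the helper names, the default weights α = β = 1, and the example traffic values are our own illustrative assumptions.

```python
# Sketch of the local score (Eq. (2)) and the resulting duration (Eq. (3)).
def local_score(n_sd, tf_sd, n_all, tf_all, alpha=1.0, beta=1.0):
    """Local score LS for one movement (s, d).

    n_sd:   weighted number of vehicles on the lanes composing the movement
    tf_sd:  time since this incoming road last had a green phase
    n_all:  N_(a,b) for every possible movement (a, b) in D
    tf_all: TF_(a,b) for every possible movement (a, b) in D
    """
    if sum(n_all) == 0:      # no vehicles anywhere -> LS = 0
        return 0.0
    return alpha * (n_sd / sum(n_all)) + beta * (tf_sd / sum(tf_all))

def score_duration(s_i, scores, t_max, n_roads):
    """Green-phase duration from Eq. (3): D = T_max * (N * (S_i / S_T))."""
    return t_max * (n_roads * (s_i / sum(scores)))

# Example: three movements with their vehicle counts and times since green.
counts, times = [8, 3, 1], [30, 10, 5]
scores = [local_score(n, tf, counts, times) for n, tf in zip(counts, times)]
print([round(score_duration(s, scores, t_max=60.0, n_roads=3), 1) for s in scores])
```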
3.1.3. Reinforcement learning algorithms

In this paper, in addition to the heuristic-based approaches defined above, we also employ two deep learning algorithms for the TMS problem. More specifically, we use PPO [15] and A2C [14] trained using the Stable Baselines library [35]. While the classical DQN derives the policy by learning the Q-value function, these two policy-based algorithms improve the policy directly.

Proximal Policy Optimization (PPO) [15]: Maximizing the TRPO objective given in formula (4) can lead to excessively large parameter updates, hence instability. PPO is a policy gradient method that simplifies TRPO by using a clipped surrogate objective. It forces r(θ) to fit into a smaller interval [1 − ε, 1 + ε] using the clip(r(θ), 1 − ε, 1 + ε) function. In addition, PPO adds a value-function error term and an entropy term that encourages exploration, as shown in formula (5), where ε, c1, and c2 are hyper-parameters.

J^TRPO(θ) = E[ r(θ) Â_θold(s, a) ]    (4)

J^CLIP'(θ) = E[ J^CLIP(θ) − c1 (V_θ(s) − V_target)^2 + c2 H(s, π_θ(·)) ]    (5)

Advantage Actor Critic (A2C) [14]: Actor–critic methods use two separate networks: (i) an actor and (ii) a critic. The critic network learns the value function, while the actor network is trained to update the policy according to the value function learned by the critic network. A2C makes use of the advantage function (6) in order to calculate the advantage of a specific action over the average action at a given state as follows:

A^π(s, a) = Q^π(s, a) − V^π(s)    (6)
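The snippet below shows, in outline, how PPO2 and A2C are trained with the Stable Baselines library [35]. Our SUMO-based TMS environment is not reproduced here, so a standard Gym task is used purely as a stand-in, and the hyper-parameters are illustrative.

```python
# Generic Stable Baselines training loop; CartPole-v1 is only a stand-in
# for the SUMO-based traffic environment, so the snippet runs end to end.
import gym
from stable_baselines import A2C, PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make("CartPole-v1")])

for algo in (A2C, PPO2):
    model = algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=10_000)    # policy-gradient updates
    obs = env.reset()
    action, _states = model.predict(obs)   # action for the current observation
    print(algo.__name__, "sample action:", action)
```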
For the A2C [14] and PPO [15] algorithms, the reward is calculated based on the difference in the waiting time of vehicles and the respective road movement as follows:

R = −1 * (Σ T + Σ V)        if N == 0
R = −1 * (Σ T + Σ V) / N    if N >= 1

where R is the calculated reward, T is an array of waiting times for the different vehicles, V is the number of new vehicles compared to the previous iteration, and N is the number of vehicles that moved to a different road.
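The piecewise reward above can be read as the following small sketch; the variable names are our own.

```python
# Sketch of the reward used to train A2C/PPO, following the piecewise
# definition above (variable names are illustrative).
def reward(wait_times, new_vehicles, moved_vehicles):
    """wait_times:     array T of per-vehicle waiting times
    new_vehicles:   V, number of new vehicles since the previous iteration
    moved_vehicles: N, number of vehicles that moved to a different road
    """
    total = sum(wait_times) + new_vehicles
    if moved_vehicles == 0:
        return -1.0 * total                    # nothing moved: full penalty
    return -1.0 * total / moved_vehicles       # penalty shrinks as more vehicles move

print(reward([5.0, 12.0, 3.0], new_vehicles=2, moved_vehicles=4))   # -5.5
```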
3.2. Our proposed graph-based semi-supervised learning approach

The basic idea behind our proposed approach is to predict which intersections are going to be occupied by vehicles by representing the TMS problem as a graph problem and predicting the edge flow on this graph. More specifically, in our graph-based modeling, we treat each intersection as well as each exit point as a node, and we treat the roads that connect these nodes as edges. More concretely, if we consider the 17-intersection network shown in Fig. 5, we can observe that there are an additional 12 nodes that relate to different vehicle spawn/despawn points and exit points. Thus, in our graph-based representation, the graph has 29 nodes in total, which are marked with red dots in the figure. Any road that connects two distinct intersections is considered to be an edge between these nodes. In order to predict the traffic flow from one intersection (node) to another node connected via roads (edges), we use the methodology proposed in [12], which is an edge-based semi-supervised learning algorithm. This way, we aim at determining the green phase time for upcoming intersections. With the help of edge-based semi-supervised learning, each intersection's predicted inward flow allows the intersection to optimize the total phase cycle according to the predicted traffic flows. In addition, the outward prediction of one intersection is used as the inward prediction for the following intersections.
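A small sketch of this graph representation is shown below using NetworkX; the topology and the flow values are illustrative and do not correspond to the Fig. 5 network.

```python
# Illustrative graph representation of a TMS: intersections and
# spawn/despawn (exit) points are nodes, roads are directed edges.
import networkx as nx

G = nx.DiGraph()
G.add_nodes_from(["I1", "I2", "I3"], kind="intersection")
G.add_nodes_from(["S1", "E1"], kind="spawn_exit")

# 'flow' carries a measured edge flow where available and None where the
# flow is unlabeled and has to be predicted.
G.add_edge("S1", "I1", flow=12.0)
G.add_edge("I1", "I2", flow=None)
G.add_edge("I2", "I3", flow=None)
G.add_edge("I3", "E1", flow=7.0)

labeled = [(u, v) for u, v, d in G.edges(data=True) if d["flow"] is not None]
print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "roads,",
      len(labeled), "labeled flows")
```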
Mathematically, we have a set of vertices (traffic intersections and exit points) V, a set of edges (roads) ε that connect these vertices, and a labeled set of edge flows (traffic flow) ε^L ⊆ ε. The goal of the algorithm is to predict the unlabeled edge flows ε^U ≡ ε ∖ ε^L. Here, it is important to note that our aim is to conduct an edge-based semi-supervised learning approach, rather than the well-known node-based semi-supervised learning method [38].

In Graph-based Semi-Supervised Learning for vertices [38], a graph is constructed with nodes and edges, where the nodes are split into labeled samples V^L and unlabeled samples V^U. Edges are based on the similarities among the samples V. The goal of the algorithm is to assign labels to the unlabeled samples V^U based on the existing data, such that the assigned labels vary smoothly across neighboring nodes. The notion of smoothness can be defined by the following loss function:

‖B^T y‖^2 = Σ_{(i,j)∈ε} (y_i − y_j)^2    (7)

where y represents the vector containing the vertex labels, and B ∈ R^{n×m} represents the incidence matrix of the network. This loss function can be written as ‖B^T y‖^2 = y^T L y in terms of the graph Laplacian L = B B^T. We can now obtain labels for the unknown nodes by minimizing the quadratic form y^T L y with respect to y while keeping the labeled vertices fixed.
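The minimization of y^T L y with the labeled vertices held fixed has the closed-form (harmonic) solution sketched below on a toy graph; the graph and the labels are illustrative only.

```python
# Harmonic label interpolation: minimize y^T L y with labeled vertices fixed.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (0, 2)]      # toy graph with 4 vertices
n, m = 4, len(edges)

B = np.zeros((n, m))                          # incidence matrix B (n x m)
for r, (i, j) in enumerate(edges):
    B[i, r], B[j, r] = 1.0, -1.0

L = B @ B.T                                   # graph Laplacian L = B B^T

labeled, unlabeled = [0, 3], [1, 2]           # vertices 0 and 3 carry known labels
y_labeled = np.array([1.0, 0.0])

# Setting the gradient of y^T L y to zero on the unlabeled block gives
#   L_UU y_U = -L_UL y_L
L_UU = L[np.ix_(unlabeled, unlabeled)]
L_UL = L[np.ix_(unlabeled, labeled)]
y_unlabeled = np.linalg.solve(L_UU, -L_UL @ y_labeled)
print(y_unlabeled)                            # smooth labels for vertices 1 and 2
```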
In the case of Graph-based Semi-Supervised Learning for edge flows [12], we can represent the edge flows in the network with the vector f. If we account only for the net flow along an edge, we obtain f_r > 0 when the flow orientation of edge r aligns with its reference orientation and f_r < 0 in all other cases. The true edge flow in the network is denoted by f̂. The divergence of a vertex can be found by calculating the sum of the outgoing flows minus the incoming flows at that vertex. This can be shown as follows:

(Bf)_i = Σ_{ε_r∈ε: ε_r≡(i,j), i<j} f_r − Σ_{ε_r∈ε: ε_r≡(j,i), j<i} f_r    (8)

Additionally, we can also create a loss function for edge flows which enforces a notion of flow conservation. This can be expressed using the sum-of-squares vertex divergence:

‖Bf‖^2 = f^T B^T B f = f^T L_e f    (9)

In the above equation, L_e = B^T B is the edge Laplacian matrix.
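Analogously, the unlabeled edge flows can be interpolated by minimizing the vertex divergence ‖Bf‖^2 of Eq. (9) while keeping the labeled flows fixed. The sketch below shows this basic variant (without the additional regularization used in [12]) on the same toy graph.

```python
# Edge-flow interpolation: minimize ||B f||^2 with the labeled flows fixed.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (0, 2)]      # reference orientation: tail -> head
n, m = 4, len(edges)

B = np.zeros((n, m))
for r, (i, j) in enumerate(edges):
    B[i, r], B[j, r] = 1.0, -1.0

labeled, unlabeled = [0, 2], [1, 3]           # flows on edges 0 and 2 are measured
f_labeled = np.array([4.0, 3.0])

# Split B by columns and solve the least-squares problem
#   min_{f_U} || B_U f_U + B_L f_L ||^2   (flow conservation, Eq. (9))
B_L, B_U = B[:, labeled], B[:, unlabeled]
f_unlabeled, *_ = np.linalg.lstsq(B_U, -B_L @ f_labeled, rcond=None)
print(f_unlabeled)                            # interpolated flows on edges 1 and 3
```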
In order to obtain the set of labeled edges that are most helpful in determining the overall edge flows in the network, we use Active Semi-Supervised Learning. Similar studies for vertex-based semi-supervised
4. Results
Fig. 5. Network design for simulation environment with 17 traffic light intersections.
Fig. 6. Network design for simulation environment with 20 traffic light intersections.
Fig. 7. Average wait time obtained when simulating network models with 5, 17 and 20 intersections.
Table 2
Summary of average and maximum vehicle waiting times for each algorithm and network size.

Algorithm | Average waiting time | Maximum waiting time | Number of intersections
Occupancy-based Alg. | 63 | 69 | 5
Flow-based Alg. (Ours) | 32 | 19 | 5
Scoring-based Alg. | 32 | 15 | 5
A2C [14] | 57 | 48 | 5
PPO2 [15] | 57 | 45 | 5
Occupancy-based Alg. | 85 | 241 | 17
Flow-based Alg. (Ours) | 64 | 91 | 17
Scoring-based Alg. | 64 | 126 | 17
A2C [14] | 45 | 39 | 17
PPO2 [15] | 52 | 46 | 17
Occupancy-based Alg. | 68 | 131 | 20
Flow-based Alg. (Ours) | 55 | 88 | 20
Scoring-based Alg. | 56 | 118 | 20
A2C [14] | 52 | 49 | 20
PPO2 [15] | 53 | 50 | 20
Fig. 10. Results obtained for simulation with 20 traffic light intersections, 1600 steps for each simulation. The simulation was performed 10 times and the average wait times were taken.
Fig. 11. Run time (in hours) for each algorithm in the network model with 5, 17 and 20 intersections.
To better highlight Figs. 7 and 8, we also present Table 2, which summarizes the average and maximum waiting times for each vehicle in the simulation. As can be seen in the table, our proposed approach delivers comparable results to the baseline algorithms in terms of average and maximum waiting time with much less running time (see Fig. 11).

Figs. 9 and 10 show the average waiting time for a single vehicle against the time step in a 17-intersection and a 20-intersection simulation; the values shown in the graphs are averages over 10 distinct simulation runs. From the figures, we can observe that our algorithm (the Flow-based algorithm) delivers better results than the heuristic algorithms and comparable results to the deep learning algorithms.

Specifically, we also compare the runtime performances of the algorithms in Fig. 11. It can be clearly seen that the runtime of our algorithm is in minutes, while the runtime may take several days for the deep learning algorithms. This finding suggests that the proposed algorithm is more practical for real-world traffic datasets compared with the deep learning algorithms, i.e., A2C [14] and PPO2 [15].

Finally, we also examine the effect of the number of known edge labels on our algorithm. To this end, we gradually increase the known edge labels and observe their effect. Fig. 12 shows the performance of our algorithm in all of the different network models, compared with the ZeroFill algorithm in terms of prediction. We see that the algorithm performs better when the number of intersections increases, indicating that graph topological information is useful.

The main findings of the performance evaluation can be summarized as follows:

• The proposed flow prediction algorithm outperforms the other algorithms, since it can predict where traffic congestion will occur.
• Having a larger number of nodes (intersections) provides more accurate results in terms of vehicle flow prediction. A large number of intersections is more informative, and thus the prediction accuracy is improved.
• To the best of our knowledge, no other study uses edge-based semi-supervised learning to predict vehicle flows between traffic intersections.

6. Conclusion

In this paper, we propose an adaptive traffic light management system that is able to predict traffic flow from one intersection to another. The principal algorithm behind the proposed system is graph-based semi-supervised learning for edge flows, where each traffic light intersection and each vehicle spawn and despawn point are considered as a vertex, and the roads connecting any two vertices are considered as an edge. Magnitudes of edge connections are then calculated using the proposed RRQR method. The obtained information is then used to select and optimize the predefined traffic phases.

Future efforts in this direction would include further optimization of the traffic selection system itself. Comparative performance evaluations on various traffic intersection configurations show that our approach can produce comparable average vehicle waiting time and drastically
reduce the training/learning time of learning adequate traffic light configurations for all intersections within a few seconds, while a deep learning-based approach can be trained in a few days for learning similar light configurations.

Fig. 12. Graph-based SSL for synthetic flows from the simulation on the 5, 17 and 20 intersection network models. The plots show the Pearson correlation between the estimated flow f* and the ground truth flow f̂ as a function of the ratio of labeled edges.

Declaration of competing interest

This paper has not been published in any other journal.

Data availability

Data will be made available on request.

Acknowledgments

This work was supported by the Turkish Scientific and Technical Research Council (TUBITAK) TEYDEB Program under Project No: 3220798 and produced from the master's thesis [42].

References

[1] INRIX, Home.
[2] Alan J. Miller, Settings for fixed-cycle traffic signals, J. Oper. Res. Soc. 14 (4) (1963) 373–386.
[3] F.V. Webster, Traffic Signal Settings, Technical Report, 1958.
[4] Seung-Bae Cools, Carlos Gershenson, Bart D'Hooghe, Self-organizing traffic lights: A realistic simulation, in: Advances in Applied Self-Organizing Systems, Springer, 2013, pp. 45–55.
[5] M. Coşkun, A. Baggag, S. Chawla, Deep reinforcement learning for traffic light optimization, in: 2018 IEEE International Conference on Data Mining Workshops, ICDMW, 2018, pp. 564–571.
[6] Lior Kuyer, Shimon Whiteson, Bram Bakker, Nikos Vlassis, Multiagent reinforcement learning for urban traffic control using coordination graphs, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2008, pp. 656–671.
[7] M.A. Wiering, Multi-agent reinforcement learning for traffic light control, in: Machine Learning: Proceedings of the Seventeenth International Conference, ICML'2000, 2000, pp. 1151–1158.
[8] Elise Van der Pol, Frans A. Oliehoek, Coordinated deep reinforcement learners for traffic light control, Proc. Learn., Inference Control Multi-Agent Syst. (At NIPS 2016) (2016).
[9] Hua Wei, Guanjie Zheng, Vikash Gayah, Zhenhui Li, Recent advances in reinforcement learning for traffic signal control: A survey of models and evaluation, ACM SIGKDD Explor. Newsl. 22 (2) (2021) 12–18.
[10] Hua Wei, Nan Xu, Huichu Zhang, Guanjie Zheng, Xinshi Zang, Chacha Chen, Weinan Zhang, Yanmin Zhu, Kai Xu, Zhenhui Li, CoLight: Learning network-level cooperation for traffic signal control, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1913–1922.
[11] Hua Wei, Guanjie Zheng, Huaxiu Yao, Zhenhui Li, IntelliLight: A reinforcement learning approach for intelligent traffic light control, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2018, pp. 2496–2505.
[12] Junteng Jia, Michael T. Schaub, Santiago Segarra, Austin R. Benson, Graph-based semi-supervised & active learning for edge flows, 2019, arXiv preprint arXiv:1905.07451.
[13] Pablo Alvarez Lopez, Michael Behrisch, Laura Bieker-Walz, Jakob Erdmann, Yun-Pang Flötteröd, Robert Hilbrich, Leonhard Lücken, Johannes Rummel, Peter Wagner, Evamarie Wießner, Microscopic traffic simulation using SUMO, in: The 21st IEEE International Conference on Intelligent Transportation Systems, IEEE, 2018.
[14] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous methods for deep reinforcement learning, 2016, arXiv:1602.01783.
[15] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, Proximal policy optimization algorithms, 2017.
[16] Francois Dion, Hesham Rakha, Youn-Soo Kang, Comparison of delay estimates at under-saturated and over-saturated pre-timed signalized intersections, Transp. Res. B 38 (2) (2004) 99–122.
[17] Brian L. Smith, Michael J. Demetsky, Short-term traffic flow prediction: Neural network approach, Transp. Res. Rec. (1453) (1994).
[18] Guanjie Zheng, Yuanhao Xiong, Xinshi Zang, Jie Feng, Hua Wei, Huichu Zhang, Yong Li, Kai Xu, Zhenhui Li, Learning phase competition for traffic signal control, 2019, arXiv preprint arXiv:1905.04722.
[19] Hsin-Hung Pan, Shu-Ching Wang, Kuo-Qin Yan, An integrated data exchange platform for intelligent transportation systems, Comput. Stand. Interfaces 36 (3) (2014) 657–671, URL https://www.sciencedirect.com/science/article/pii/S0920548913000949.
[20] S.L. Toral, F. Barrero, F. Cortés, D. Gregor, Analysis of embedded CORBA middleware performance on urban distributed transportation equipments, Comput. Stand. Interfaces 35 (1) (2013) 150–157.
[21] D. Gregor, S. Toral, T. Ariza, F. Barrero, R. Gregor, J. Rodas, M. Arzamendia, A methodology for structured ontology construction applied to intelligent transportation systems, Comput. Stand. Interfaces 47 (2016) 108–119.
[22] Federico Barrero, Jean A. Guevara, Enrique Vargas, Sergio Toral, Manuel Vargas, Networked transducers in intelligent transportation systems based on the IEEE 1451 standard, Comput. Stand. Interfaces 36 (2) (2014) 300–311.
[23] I. Román, G. Madinabeitia, L. Jimenez, G.A. Molina, J.A. Ternero, Experiences applying RM-ODP principles and techniques to intelligent transportation system architectures, Comput. Stand. Interfaces 35 (3) (2013) 338–347, RM-ODP: Foundations, Experience and Applications.
[24] Yoichiro Iwasaki, Image processing system to measure vehicular queues and an adaptive traffic signal control by using the information of the queues, Comput. Stand. Interfaces 20 (6) (1999) 444.
[25] Waris Hooda, Pradeep Kumar Yadav, Amogh Bhole, Deptii D. Chaudhari, An image processing approach to intelligent traffic management system, in: Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, ICTCS '16, Association for Computing Machinery, New York, NY, USA, 2016.
[26] Ninad Lanke, Sheetal Koul, Smart traffic management system, Int. J. Comput. Appl. 75 (7) (2013).
[27] S. Sheik Mohammed Ali, Boby George, Lelitha Vanajakshi, Jayashankar Venkatraman, A multiple inductive loop vehicle detection system for heterogeneous and lane-less traffic, IEEE Trans. Instrum. Meas. 61 (5) (2011) 1353–1360.
[28] Isaac Porche, Stéphane Lafortune, Adaptive look-ahead optimization of traffic signals, J. Intell. Transp. Syst. 4 (3–4) (1999) 209–254.
[29] Baher Abdulhai, Rob Pringle, Grigoris J. Karakoulas, Reinforcement learning for true adaptive traffic signal control, J. Transp. Eng. 129 (3) (2003) 278–285.
[30] Yit Kwong Chin, Nurmin Bolong, Aroland Kiring, Soo Siang Yang, Kenneth Tze Kin Teo, Q-learning based traffic optimization in management of signal timing plan, Int. J. Simul., Syst., Sci. Technol. 12 (3) (2011) 29–35.
[31] Patrick Mannion, Jim Duggan, Enda Howley, An experimental review of reinforcement learning algorithms for adaptive traffic signal control, in: Autonomic Road Transport Support Systems, Springer, 2016, pp. 47–66.
[32] P.G. Balaji, X. German, Dipti Srinivasan, Urban traffic signal control using reinforcement learning agents, IET Intell. Transp. Syst. 4 (3) (2010) 177–188.
[33] Itamar Arel, Cong Liu, Tom Urbanik, Airton G. Kohls, Reinforcement learning-based multi-agent system for network traffic signal control, IET Intell. Transp. Syst. 4 (2) (2010) 128–135.
[34] Samah El-Tantawy, Baher Abdulhai, Hossam Abdelgawad, Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): Methodology and large-scale application on downtown Toronto, IEEE Trans. Intell. Transp. Syst. 14 (3) (2013) 1140–1150.
[35] Ashley Hill, Antonin Raffin, Maximilian Ernestus, Adam Gleave, Anssi Kanervisto, Rene Traore, Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, Stable baselines, 2018, https://github.com/hill-a/stable-baselines.
[36] Sébastien Faye, Claude Chaudet, Isabelle Demeure, A distributed algorithm for multiple intersections adaptive traffic lights control using a wireless sensor networks, in: Proceedings of the First Workshop on Urban Networking, UrbaNe '12, Association for Computing Machinery, New York, NY, USA, 2012, pp. 13–18.
[37] Sultan Kübra Can, Adam Thahir, Mustafa Coşkun, V. Çağrı Güngör, Traffic light management systems using reinforcement learning, in: 2022 Innovations in Intelligent Systems and Applications Conference, ASYU, IEEE, 2022, pp. 1–6.
[38] Mustafa Coskun, Burcu Bakir Gungor, Mehmet Koyuturk, Expanding label sets for graph convolutional networks, 2019, arXiv preprint arXiv:1912.09575.
[39] Akshay Gadde, Aamir Anis, Antonio Ortega, Active semi-supervised learning using sampling theory for graph signals, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 492–501.
[40] Andrew Guillory, Jeff A. Bilmes, Label selection on graphs, in: Advances in Neural Information Processing Systems, 2009, pp. 691–699.
[41] Ali Çivril, Malik Magdon-Ismail, On selecting a maximum volume sub-matrix of a matrix and related problems, Theoret. Comput. Sci. 410 (47–49) (2009) 4801–4811.
[42] Adam Rizvi Thahir, Graph Theory Based Traffic Light Management (Master's thesis), Abdullah Gül Üniversitesi, Fen Bilimleri Enstitüsü, 2022.