Development of a Signal-Free Intersection Control System for CAVs and Corridor Level Impact Assessment
1 Transportation Engineering Department, NJIT (New Jersey Institute of Technology), Newark, NJ 07102, USA
2 Civil Engineering Department, NJIT (New Jersey Institute of Technology), Newark, NJ 07102, USA;
[email protected]
3 NJDOT ITS Resource Centre, NJIT (New Jersey Institute of Technology), Newark, NJ 07102, USA;
[email protected]
* Correspondence: [email protected]
Abstract: Assuming a full market penetration rate of connected and autonomous vehicles (CAVs) would
provide an opportunity to remove costly and inefficient traffic lights from intersections, this paper
presents a signal-free intersection control system relying on CAVs’ communicability. This method
deploys a deep reinforcement learning algorithm and pixel reservation logic to avoid potential collisions and minimize the overall delay at the intersection. To facilitate a traffic-oriented assessment, the proposed model is coupled with the VISSIM traffic microsimulation
software, and its performance is compared with other intersection control systems, including fixed
traffic lights, actuated traffic lights, and the Longest Queue First (LQF) control system. The simulation results revealed that the proposed model reduces delay by 50%, 29%, and 23% in moderate, high, and extreme volume regimes, respectively, compared to another signal-free control system.
Noticeable improvements are also gained in travel time, fuel consumption, emission, and Surrogate
Safety Measures.
Keywords: connected and autonomous vehicles; signal-free intersection control systems; deep reinforcement learning; deep Q-networks; machine learning; pixel reservation control system
Citation: Mirbakhsh, A.; Lee, J.; Besenski, D. Development of a Signal-Free Intersection Control System for CAVs and Corridor Level Impact Assessment. Future Transp. 2023, 3, 552–567. https://doi.org/10.3390/futuretransp3020032

Academic Editors: Raj Mani Shukla and Lakshmi Babu-Saheer

Received: 26 March 2023; Revised: 10 April 2023; Accepted: 20 April 2023; Published: 1 May 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Due to rapid population growth and the increased number of vehicles, traffic congestion, collisions, and pollution have become leading causes of decreased living standards. In the USA in 2017, traffic congestion caused 8.8 billion hours of delay and 3.3 billion gallons of fuel waste, resulting in a total cost of 179 billion USD. Several studies have proved that human errors are pivotal in traffic congestion and accidents, and driver error contributes to up to 75% of roadway crashes worldwide. Among urban traffic facilities, intersections are crucial in traffic delays and collisions. In the United States, 44% of reported traffic accidents occur at urban intersections, leading to 8500 fatalities and around 1 million injuries annually [1].

In conventional intersection control systems, an intersection controller such as a traffic light dictates the rules to the vehicles. However, taking full advantage of CAVs' communication capabilities demands communication between vehicles or between vehicles and the controller system. Based on this idea, CAV-based intersection control logic has been a point of interest for the last couple of decades. Several approaches, such as trajectory planning, real-time optimization, and rule-based intersection control logics, have been developed to establish purposive communication between facilities at intersections. However, the most critical task in developing an intelligent intersection control system is to make the algorithm adjustable to stochastic or unprecedented circumstances. Existing CAV-based control systems are mostly based on fixed rules or conventional multi-objective optimization methods and are therefore not fully adjustable to such circumstances.

Dresner et al. [5] proposed the "First Come, First Serve" (FCFS) control policy, assigning the passing priority to the vehicle with the earliest
arrival time. All other vehicles must yield to the vehicle with priority. In a following study,
Dresner et al. [6] added several complementary regulations to the FCFS policy to make
it work more reliably, safely, and efficiently. Simulation results revealed that the FCFS
policy noticeably reduces intersection delay compared to traffic light and stop sign control
systems.
Zhang et al. [7] proposed a state–action control logic based on Prior First in First
Out (PriorFIFO) logic. They assumed spatial–temporal and kinetic parameters for vehicle
movement based on a centralized scheduling mechanism. This study aimed to reduce the
control delay for vehicles with higher priority. The simulation results with a combination
of high-, average-, and low-priority vehicles showed that the algorithm works well for vehicles with higher priority while causing some extra delays for regular vehicles with lower priority.
Carlino et al. [8] developed an auction-based intersection control logic based on the
Clarke Groves tax mechanism and pixel reservation. In this approach, if commonly reserved
tiles exist between vehicles, an auction is held between the involved vehicles. All vehicles
in each direction contribute to their leading vehicle to win the auction, and the control
logic decides which leading vehicles receive a pass order first. The bid’s winner and
its contributors (followers) have to pay the runner-up’s bid amount with a proportional
payment (based on their contribution value in the bid). A “system wallet” component was
added to auction-based intersection control to ensure low-budget vehicles or emergency
vehicles would not be over-delayed. A comparison of simulation results showed that the
auction-based control logic outperforms the FIFO logic.
Fayazi et al. [11] proposed a Mixed-Integer Linear Programming (MILP)-based intersection controller that receives the arrival and departure times from approaching vehicles and optimizes the
vehicles’ arrival times. The optimization goal is to minimize the difference between the
current time and the last vehicle’s expected arrival time at the intersection. To ensure
all vehicles are not forced to travel near the speed limit, a cost value was defined as
a function of the difference between the assigned and desired crossing time for each
vehicle. Several constraints, such as speed limit, maximum acceleration, minimum headway,
and minimum cushion for conflicting movements, were applied to the model. The two-
movement intersection simulation results showed that the MILP-based controller reduces
average travel time by 70.5% and average stop delay by 52.4%. It was also proved that the
control logic encourages platooning under a specific gap setting.
Lee et al. [12] proposed a Cumulative Travel-time Responsive (CTR) intersection
control algorithm under CAVs’ imperfect market penetration rate. They considered the
elapsed time spent by vehicles from when they entered the network to the current position
as a real-time measure of travel time. The Kalman filtering approach was deployed to
cover the imperfect market penetration rate of CAVs for travel time estimation. Simulations
were run in VISSIM for an isolated intersection with 40 volume scenarios covering the
volume capacity ratio ranging from 0.3 to 1.1 and different CAV market penetration rates.
Simulation results showed that the CTR algorithm improved mobility measures such as
travel time, average speed, and throughput by 34%, 36%, and 4%, respectively, compared
to the actuated control system. The CO2 emission and fuel consumption were also reduced
by 13% and 10%. It was also revealed that the CTR would produce more significant benefits
as the market penetration rate passes the threshold of 30%. In general, more benefits were
observed as the total intersection volume increased.
Wu et al. [19] developed a decentralized coordination learning approach for autonomous intersection management in which each vehicle's state included its lane, speed, and moving intention. The intersection area was divided
into an n × n grid, and each vehicle was supposed to reserve its desired pixel ahead of
time. Vehicles could enter either a coordinated or an independent state, based on having
a reserved pixel in common or not. Conventional multiagent Q-learning was deployed
in this study, meaning that new states and their Q-values were added to a matrix as they
came up, and the controller agent had to refer to a specific cell in the matrix to make a
decision. Aligned with the multiagent optimization goal, each agent’s effort to maximize its
Q-value resulted in a maximized global reward for all agents. Simulation results revealed
that the proposed model outperforms the FCFS method, fixed-time traffic lights, and the
LQF control system in delay reduction.
3. Methodology
3.1. Deep Reinforcement Learning and Deep Q-Networks
The DRL process can be expressed as an agent taking actions and the environment responding to the agent by presenting a new state and a reward based on the quality of the last action. The agent seeks to maximize the reward over a series of interactions with the environment in discrete time steps (t) [20]. The DQN formulation for estimating return values, called the Q-function, is presented in Equation (1).
Q(s, a) = r + γ max Q(s′, a′)    (1)

where:
• Q(s, a), or the Q-value, defines the value of the current state–action pair.
• Q(s′, a′) defines the value of the next state–action pair.
• γ, or the "discount rate", ranges from zero to one and defines the present value of future rewards. If γ = 0, the agent is myopic, only concerned with maximizing the immediate reward. As γ approaches one, the agent cares more about future rewards and becomes more farsighted.
• r is the immediate reward for the current action.
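The update in Equation (1) can be illustrated with a short, tabular Python sketch (a lookup table stands in for the Q-network here, and the state labels, actions, and reward values are hypothetical, not taken from the paper):

```python
# Tabular sketch of the Q-update in Equation (1):
# Q(s, a) = r + gamma * max_a' Q(s', a')
GAMMA = 0.9  # discount rate, matching the paper's hyperparameter table

# Hypothetical Q-table: (state, action) -> value
Q = {
    ("near_conflict", "decelerate"): 0.0,
    ("clear", "accelerate"): 5.0,
    ("clear", "keep_speed"): 3.0,
}

def q_update(state, action, reward, next_state, actions):
    """Return the updated Q-value for (state, action) per Equation (1)."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    return reward + GAMMA * best_next

# A vehicle decelerates near a conflict (reward -1) and then finds
# itself in the "clear" state, whose best action is worth 5.0.
new_q = q_update("near_conflict", "decelerate", -1.0,
                 "clear", ["accelerate", "keep_speed", "decelerate"])
print(new_q)  # -1 + 0.9 * 5.0 = 3.5
```

In the actual DQN, the table lookup is replaced by a neural network that maps a state to Q-value estimates for all actions at once.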
AI training and trial and error processes are usually performed in simulation testbeds,
enabling a trained agent to tackle a broad scope of real-world scenarios. In the DQN
approach, states are the input layer of the neural network, and the neural network’s output
layer is an estimation of Q-values for all possible actions in the current state. The training
occurs by updating the neural network weights based on a batch of recent historical data
(states, actions, Q-values). In this study, the simulation testbed is developed in VISSIM software, and the DQN is developed as a Python application; a high-level training process of a DQN agent appears in Figure 1.
Figure 1. The training process of the proposed intersection control system.
Figure 2. Training environment's physical settings.
Based on classical physics theory, the survey distance equals (V∆t + ½a∆t²) if the vehicle is moving slower than the maximum speed and can still accelerate, or it equals (V∆t) if the vehicle is already moving at the maximum speed; the stop distance equals (V²/2a). The acceleration and deceleration rates are set to 3.5 and 7 m/s². The stop and survey distance formulas are shown in Equation (2).

DS = max[V²/(2a), V∆t + ½a∆t²],  V < Vm
DS = max[V²/(2a), V∆t],  V = Vm    (2)
where:
• ∆t: each step’s time length.
• V: current speed.
• a: acceleration or deceleration capability.
• DS: number of desired cells.
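Equation (2) can be sketched in a few lines of Python; the 3.5 and 7 m/s² rates and the 1 s step come from the text, while the function name, cell length, and example speeds are illustrative assumptions:

```python
# Sketch of the desired-cell count DS in Equation (2).
# Rates from the text: acceleration 3.5 m/s^2, deceleration 7 m/s^2, dt = 1 s.
A_ACC, A_DEC, DT = 3.5, 7.0, 1.0

def desired_cells(v, v_max, cell_len=1.0):
    """DS: how far ahead (in cells) a vehicle should reserve."""
    stop_dist = v ** 2 / (2 * A_DEC)               # V^2 / 2a
    if v < v_max:                                   # can still accelerate
        survey_dist = v * DT + 0.5 * A_ACC * DT ** 2
    else:                                           # already at maximum speed
        survey_dist = v * DT
    return max(stop_dist, survey_dist) / cell_len

print(desired_cells(8.0, v_max=11.2))   # survey distance governs: 9.75
print(desired_cells(11.2, v_max=11.2))  # at the limit: max(8.96, 11.2) = 11.2
```

Taking the maximum of the stop and survey distances ensures the reservation is long enough both to brake safely and to cover the ground reachable in the next step.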
If the algorithm detects a shared desired cell between vehicles, they will enter a
coordinated state; otherwise, they will be in an independent state. For example, vehicles 2
and 3 in Figure 2 are in a coordinated state since they have reserved common cells. Each
leading vehicle’s state includes its current speed, current cell, and the queue length behind
it. Possible actions for each vehicle to avoid collision are acceleration, deceleration, or
maintaining the current speed. Equations (3) and (4) present the state and possible actions. The maximum acceleration and deceleration rates for CAVs are assumed to be the same as the Ford Fusion 2019 hybrid specifications.
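The shared-cell check that separates coordinated from independent vehicles can be sketched as follows (the vehicle ids and reservation sets below are hypothetical):

```python
# Sketch of the coordinated/independent classification: vehicles that
# reserve at least one common grid cell enter a coordinated (joint) state.

def classify(reservations):
    """Map vehicle id -> 'coordinated' or 'independent'."""
    states = {}
    for vid, cells in reservations.items():
        shared = any(cells & other
                     for o_vid, other in reservations.items() if o_vid != vid)
        states[vid] = "coordinated" if shared else "independent"
    return states

reservations = {
    1: {(3, 4), (3, 5)},   # vehicle 1 reserves two cells
    2: {(3, 5), (3, 6)},   # shares cell (3, 5) with vehicle 1
    3: {(7, 1)},           # no overlap with anyone
}
print(classify(reservations))
# {1: 'coordinated', 2: 'coordinated', 3: 'independent'}
```

This mirrors the example in Figure 2, where vehicles 2 and 3 are coordinated because they reserved common cells.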
RL’s goal is to optimize the total return (reward) at the end of each episode and not the
immediate reward. Therefore, being independent in this environment does not necessarily
result in acceleration or maintaining the maximum speed.
r(s_j, a_j) = Σ_{b=1}^{h} r_{jb}    (7)

R(S, A) = Σ_{b=1}^{N} r_{(i or j)b}    (8)
where:
• Subscript i refers to an agent in an independent state;
• Subscript j refers to an agent in a joint state;
• (s, a) refers to taking a specific action (a) in a specific state (s);
• h is the number of agents involved in a joint state;
• N is the total number of agents in the environment;
• R is the global reward.
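Equations (7) and (8) amount to simple summations; a sketch with hypothetical per-step rewards:

```python
# Sketch of the reward aggregation in Equations (7) and (8): a joint
# reward sums the step rewards of the h agents involved in a coordinated
# state, and the global reward sums over all N agents.

def joint_reward(agent_rewards):
    """r(s_j, a_j): sum over the h agents in a joint state (Eq. 7)."""
    return sum(agent_rewards)

def global_reward(all_rewards):
    """R(S, A): sum over all N agents in the environment (Eq. 8)."""
    return sum(all_rewards)

# Two coordinated vehicles plus one independent vehicle, with
# hypothetical per-step rewards (negative values for delay).
coordinated = [-2.0, -1.5]
independent = [0.5]
print(joint_reward(coordinated))                 # -3.5
print(global_reward(coordinated + independent))  # -3.0
```

Because each agent maximizes its own Q-value under this shared-reward structure, improving a joint reward also improves the global reward.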
3.4. Transition from Independent to Coordinated State or from Coordinated to Independent State
Any transition from an independent state to a coordinated state or vice versa requires
specific settings for updating Q-value in MADQN. Three possible transition types are
explained below:
Type 1: when an agent moves from a coordinated state to another coordinated state or
from an independent state to another independent state, the coordinated or independent
Q-value is updated by:
Type 2: when an agent moves from a coordinated state to an independent state, the
joint Q-value is updated by:
Q(s, a)_j = Σ_{j=1}^{h} [r(s, a)_j + γ max Q(s′, a′)_i]    (10)
Type 3: when an agent moves from an independent state to a coordinated state, the
independent Q-value is updated by:
Q(s, a)_i = r(s, a)_i + γ (1/h) max Q(s′, a′)_i    (11)
Based on the existing literature [5,6], the pixel reservation system guarantees a collision-free vehicle maneuver system by itself. Therefore, unlike most other RL-based AV management systems, collision avoidance is not considered in the reward function as an optimization goal.
Hyperparameter                                    Value
Number of steps per epoch                         1500
Step size (∆t), equal to simulation resolution    1 s
Replay memory size                                10,000
Minibatch size                                    32
Target network update frequency                   5 epochs
Initial ε                                         1
ε-decay factor                                    0.999
Discount (γ)                                      0.9
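The exploration schedule implied by the table (initial ε = 1, decay factor 0.999) can be sketched as below; the decay unit (per epoch) and the floor value are assumptions here, as the text does not state them:

```python
import random

# Sketch of epsilon-greedy action selection with the decay schedule from
# the hyperparameter table: initial epsilon = 1, decay factor = 0.999.
EPS_DECAY = 0.999

def select_action(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best action."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))       # explore: random action
    return max(q_values, key=q_values.get)      # exploit: argmax Q-value

epsilon = 1.0
for epoch in range(3500):                       # 3500 training epochs
    epsilon = max(0.01, epsilon * EPS_DECAY)    # floor value is an assumption

print(round(epsilon, 4))  # ~0.03 after 3500 decays
```

Decaying ε shifts the agent from exploration early in training toward exploitation of the learned Q-values later on.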
Figure 3. Global reward convergence over training epochs.
4. Proof of Concept Test

The trained model is applied to a corridor of four intersections to assess its impact on traffic flow in an extensive network. The testbed layout is shown in Figure 4.

Figure 4. Testbed layout.
The DQN control system's performance is compared with other conventional and CAV-based intersection control systems, including (1) fixed traffic signal, (2) actuated traffic signal, and (3) Longest Queue First (LQF) control logic, which was developed by Wunderlich et al. [21]. In this algorithm, the phases have no specific order and are triggered based on queue length in different approaches. In our study, the LQF logic is modeled based on pixel reservation for connected and autonomous vehicles, so the only difference between LQF and DQN would be the optimization approach. The speed limit is 40.2 km/h, and three different volume regimes (with different random seeds for vehicle production), described below, are considered for evaluation purposes:
1. Moderate volume: approximately 1150 and 850 Veh/h on the major and minor streets, respectively. This volume combination leads to Level Of Service (LOS) B for the major street and LOS C for the minor street.
2. High volume: approximately 1600 and 1100 Veh/h on the major and minor streets, respectively. This volume combination leads to LOS D for both major and minor streets.
3. Extreme volume: approximately 2000 and 1300 Veh/h on the major and minor streets, respectively. This volume combination leads to LOS F, or congestion, for major and minor streets.
The LOS is calculated based on the Highway Capacity Manual (HCM) 6th edition in Vistro, assuming the intersection operates under an optimized traffic light. Vistro is a traffic analysis and signal optimization software. Simulation time and resolution are set to 60 min and two steps/sec, respectively; accordingly, 20 simulation runs with different random seeds are run in VISSIM software for each scenario.
4.1.
4.1. The Target
Target Traffic
TrafficMeasure
MeasureComparison
Comparison
Since
Since the
the control
control system’s
system’s optimization goal is to minimize the delay, delay, the
the delay
delay is
is
specified
specified as the target traffic measure. A comparison of the average delay between
target traffic measure. A comparison of the average delay between differ- dif-
ferent control
ent control systems
systems appears
appears in in Figure
Figure 5. According
5. According to the
to the figure,
figure, thethe
DQNDQN effectively
effectively re-
reduces delayininall
duces delay allvolume
volumeregimes,
regimes,specifically
specificallyin
in moderate
moderate volumes,
volumes, with
with a 50% delay
reduction
reduction compared
compared to to the
the second-best
second-best control system.
system. Delay reductions of 29% and 23%
are
are achieved
achieved compared
compared to the the second-best
second-best control
control systems
systems inin high
high and
and extreme
extreme volume
volume
regimes, respectively.
regimes, respectively.
Figure 5.
Figure 5. Average
Average delay
delay comparison.
comparison.
Followed by delay reductions, travel time improvement is also expected. The average travel time values, shown in Figure 6, reveal that 22% and 16% travel time reductions are gained compared to the LQF control system in moderate and high volume circumstances, respectively. In the extreme volume regime, the proposed control system outperforms the actuated control system with an 11% travel time reduction.
Figure 7. Average fuel consumption.

Figure 8. Average CO2 emission.

Despite the acceleration and deceleration rates for the DQN being limited to fixed values only, fuel consumption and emission improve slightly in the moderate and high volume regimes, and a slight increment occurs in the extreme volume condition. Fuel consumption and CO2 emission are estimated based on VISSIM software's built-in models.

Among various Surrogate Safety Measures, the post-encroachment time (PET) is a well-fitting measure used to identify safety threats for crossing vehicles at an intersection. It represents the time between the departure of the encroaching vehicle from the conflict point and the arrival of the vehicle with the right-of-way at the conflict point [23]. Safety analysis is performed in the SSAM software, which automatically identifies, classifies, and evaluates traffic conflicts in the vehicle trajectory data output from microscopic traffic simulation models. The comparison of PET for different traffic control systems appears in Figure 9. According to the figure, pixel-reservation-based logics, including DQN and LQF, have an almost equal PET. Both perform better than the actuated traffic light, specifically in extreme volume regimes, with a 22% improvement in PET. Pixel-reservation-based control systems involve more stop-and-go, which increases the chance of an accident at regular intersections. Therefore, even a minor improvement in PET compared to conventional control systems is noticeable.
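The PET definition above reduces to a timestamp difference over conflict-point events, sketched here with hypothetical event times:

```python
# Sketch of post-encroachment time (PET): the gap between the moment the
# encroaching vehicle leaves the conflict point and the moment the vehicle
# with the right-of-way arrives at it. Event times below are hypothetical.

def pet(encroacher_exit_time, row_vehicle_arrival_time):
    """PET in seconds; smaller values indicate a more severe conflict."""
    return row_vehicle_arrival_time - encroacher_exit_time

# Encroaching vehicle clears the conflict point at t = 12.0 s; the
# vehicle with the right-of-way reaches the same point at t = 13.5 s.
print(pet(12.0, 13.5))  # 1.5 s
```

SSAM computes this measure automatically for every conflict it extracts from the simulated trajectories; the sketch only illustrates the underlying arithmetic.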
Compared to the fixed traffic lights, the proposed model's gains are statistically significant for all measures in all volume regimes. The proposed model's minor losses or gains in fuel consumption and emission are not statistically significant (with a confidence interval of 95%) compared to the actuated traffic lights. DQN gains compared to LQF are statistically significant in all measures except for safety, which is predictable since both models are based on pixel reservation logic to avoid collisions. The t-test results appear in Table 2.
The pixel reservation logic handles collision avoidance, allowing the DQN to target delay reduction as the only optimization goal. The traffic volume and the vehicle arrivals' random seed are constant during 3500 epochs of model training iterations. However, the trained model can deal with any volume combination or stochastic vehicle arrivals due to the chain impact of the agent's random actions in a DQN training course, known as exploration. The DQN is developed as a Python application, controlling individual vehicles in VISSIM software.
Performance of the proposed model is compared with conventional and CAV-based
intersection control systems, including fixed traffic lights, actuated traffic lights, and LQF,
in a corridor of four intersections. Based on the simulation results, the DQN noticeably outperforms the LQF control system, with up to 50% delay and 20% travel time reductions. Both CAV-based control systems have shown the same performance in safety
measures. However, the DQN outperforms the LQF in fuel consumption and CO2 emission
by up to 30%. Although the proposed model is based on fixed acceleration and deceleration
rates, it does not cause any significant losses in fuel consumption and CO2 emission
compared to the conventional intersection control systems, which is a promising result.
This study’s future directions can be (1) developing a passenger throughput optimization-
based or emergency vehicle priority-based control system by adjusting the reward function,
(2) assuming several accelerations and declaration rates for each vehicle to improve environ-
mental measures, and (3) using other DRL methods such as Double DQN and Prioritized
Experience Replay to improve the model’s performance.
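As a pointer for direction (3), the Double DQN target selects the next action with the online network but evaluates it with the target network; a tabular sketch with hypothetical values (the real method uses two neural networks):

```python
# Sketch of the Double DQN target: the online estimates pick the argmax
# action, and the target estimates evaluate it, reducing the
# overestimation bias of the standard DQN max operator.
GAMMA = 0.9

def double_dqn_target(reward, next_state, q_online, q_target, actions):
    best = max(actions, key=lambda a: q_online[(next_state, a)])
    return reward + GAMMA * q_target[(next_state, best)]

# Hypothetical tables: the online network prefers "go", but the target
# network values it more conservatively.
q_online = {("s1", "go"): 2.0, ("s1", "stop"): 1.0}
q_target = {("s1", "go"): 1.5, ("s1", "stop"): 3.0}

t = double_dqn_target(0.0, "s1", q_online, q_target, ["go", "stop"])
print(t)  # evaluates "go" (the online argmax) on the target table: 0.9 * 1.5
```

A standard DQN would instead take the max directly over the target values, here yielding the larger, potentially overestimated 0.9 × 3.0.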
Author Contributions: Conceptualization, A.M. and J.L.; methodology, A.M. and J.L.; software, A.M.;
validation, A.M.; formal analysis, A.M.; writing—original draft preparation, A.M.; writing—review
and editing, A.M. and D.B.; supervision, J.L. Resources, D.B. and J.L. All authors have read and
agreed to the published version of the manuscript.
Funding: This research received no external funding.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data sharing is not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Azimi, R.; Bhatia, G. STIP: Spatio-temporal intersection protocols for autonomous vehicles. In Proceedings of the IEEE International Conference on Cyber-Physical Systems, Berlin, Germany, 14–17 April 2014.
2. U.S. Department of Transportation. Available online: https://www.its.dot.gov/pilots/ (accessed on 6 April 2021).
3. GreyB. Autonomous Vehicle Market Report. May 2021. Available online: https://www.greyb.com/autonomous-vehicle-
companies/ (accessed on 21 July 2021).
4. Ghayoomi, H.; Partohaghighi, M. Investigating lake drought prevention using a DRL-based method. Eng. Appl. 2023, 2, 49–59.
5. Dresner, K.; Stone, P. Multiagent traffic management: A reservation-based intersection control mechanism. In Proceedings of the
Third International Joint Conference on Autonomous Agents and Multiagent Systems, New York, NY, USA, 19–23 July 2004.
6. Dresner, K.; Stone, P. A Multiagent Approach to Autonomous Intersection Management. J. Artif. Intell. Res. 2008, 31, 591–656.
[CrossRef]
7. Zhang, K.; Fortelle, A. Analysis and modeled design of one state-driven autonomous passing-through algorithm for driverless
vehicles at intersections. In Proceedings of the 16th IEEE International Conference on Computational Science and Engineering,
Sydney, Australia, 3–5 December 2015.
8. Carlino, D.; Boyles, S.; Stone, P. Auction-based autonomous intersection management. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, The Hague, The Netherlands, 6–9 October 2013.
9. Yan, F.; Dridi, M. Autonomous vehicle sequencing algorithm at isolated intersections. In Proceedings of the IEEE Conference on
Intelligent Transportation Systems, St. Louis, MO, USA, 4–7 October 2009.
10. Jia, W.; Abdeljalil, A. Discrete methods for urban intersection traffic controlling. In Proceedings of the IEEE Vehicular Technology
Conference, Barcelona, Spain, 26–29 April 2009.
11. Fayazi, A.; Vahidi, A. Optimal scheduling of autonomous vehicle arrivals at intelligent intersections via MILP. In Proceedings of
the American Control Conference, Seattle, WA, USA, 24–26 May 2017.
12. Lee, J.; Park, B. Cumulative travel-time responsive real-time intersection control algorithm in the connected vehicle environment.
J. Transp. Eng. 2013, 139, 1020–1029. [CrossRef]
Future Transp. 2023, 3 567
13. Lee, J.; Park, B. Development and Evaluation of a Cooperative Vehicle Intersection Control Algorithm Under the Connected
Vehicle Environment. In Proceedings of the IEEE Transactions on Intelligent Transportation Systems, Anchorage, AK, USA, 16–19
September 2012; p. 13.
14. Gutesa, S.; Lee, J. Development and Evaluation of Cooperative Intersection Management Algorithm under Connected Vehicles
Environment. Ph.D. Thesis, NJIT, Newark, NJ, USA, 2018.
15. Krajewski, R.; Themann, P.; Eckstein, L. Decoupled cooperative trajectory optimization for connected highly automated vehicles
at urban intersections. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, 19–22 June 2016;
pp. 741–746.
16. Lamouik, I.; Yahyaouy, A. Smart multi-agent traffic coordinator for autonomous vehicles at intersections. In Proceedings of the International Conference on Advanced Technologies for Signal and Image Processing, Fez, Morocco, 22–24 May 2017.
17. Liang, X.; Du, X.; Wang, G.; Han, Z. Multi-Agent Deep reinforcement learning for traffic light control in vehicular networks. IEEE
Trans. Veh. Technol. 2018, 69, 8243–8256.
18. Touhbi, S.; Babram, M.A. Traffic Signal Control: Exploring Reward Definition For Reinforcement Learning. In Proceedings of the
8th International Conference on Ambient Systems, Madeira, Portugal, 16–19 May 2017.
19. Wu, Y.; Chen, H. DCL-AIM: Decentralized coordination learning of autonomous intersection management for connected and
automated vehicles. Transp. Res. Part C Emerg. Technol. 2019, 103, 246–260. [CrossRef]
20. Graesser, L.; Keng, W.L. Foundations of Deep Reinforcement Learning; Pearson Education Inc.: New York, NY, USA, 2020.
21. Wunderlich, R.; Elhanany, I.; Urbanik, T. A Stable Longest Queue First Signal Scheduling Algorithm for an Isolated Intersection.
In Proceedings of the IEEE International Conference on Vehicular Electronics and Safety, ICVES, Shanghai, China, 13–15
December 2006.
22. Rakha, H.; Ahn, K.; Trani, A. Development of VT-Micro model for estimating hot stabilized light duty vehicle and truck emissions.
Transp. Res. Part D Transp. Environ. 2004, 9, 49–74. [CrossRef]
23. Gettman, D.; Head, L. Surrogate Safety Measures from Traffic Simulation Models. Transp. Res. Rec. 2003, 1840, 104–115. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.