Expert Systems With Applications
Dynamic traffic signal control for heterogeneous traffic conditions using Max Pressure and Reinforcement Learning✩,✩✩

Amit Agarwal a,∗, Deorishabh Sahu a, Rishabh Mohata a, Kuldeep Jeengar b, Anuj Nautiyal a, Dhish Kumar Saxena b

a Department of Civil Engineering, Indian Institute of Technology Roorkee, Haridwar, Uttarakhand, 247667, India
b Department of Mechanical and Industrial Engineering, Indian Institute of Technology Roorkee, Haridwar, Uttarakhand, 247667, India
Keywords: Adaptive Traffic Signal Control; Max Pressure; Mixed traffic; Reinforcement Learning

Abstract: Optimization of green signal timing for each phase at a signalized intersection in an urban area is critical for efficacious traffic management and congestion mitigation. Many algorithms have been developed, yet very few target cities in developing nations, where traffic is characterized by its heterogeneous nature. While some recent studies have explored different variants of Max Pressure (MP) and Reinforcement Learning (RL) for optimizing phase timing, their focus is limited to homogeneous traffic conditions. In developing nations, such as India, control systems like fixed and actuated are still predominantly used in practice. The Composite Signal Control Strategy (CoSiCoSt) is also employed at a few intersections. However, there is a notable absence of advanced models addressing heterogeneous traffic behavior, which have great potential to reduce delays and queue lengths. The present study proposes a hybrid algorithm for an adaptive traffic control system for real-world heterogeneous traffic conditions. The proposed algorithm integrates Max Pressure with Reinforcement Learning. The former dynamically determines the phase order by performing pressure calculations for each phase. The latter optimizes the timing of each phase to minimize delays and queue lengths using the proximal policy optimization algorithm. In contrast to past RL models, in which the phase timing is determined for all phases at once, in the proposed algorithm the phase timings are determined after the execution of every phase. To assess the impact, classified traffic volume is extracted from surveillance videos of an intersection in Ludhiana, Punjab, and simulated using Simulation of Urban Mobility (SUMO). Traffic volume data is collected over three distinct time periods of the day. The results of the proposed algorithm are compared with benchmark algorithms, such as Actuated, CoSiCoSt, acyclic & cyclic Max Pressure, and Reinforcement Learning-based algorithms. To assess the performance, queue length, delay, and queue dissipation time are considered as key performance indicators. Of actuated and CoSiCoSt, the latter performs better, and thus the performance of the proposed hybrid algorithm is compared with CoSiCoSt. The proposed algorithm effectively reduces total delay and queue dissipation time in the range of 77.07%–87.66% and 53.95%–62.07%, respectively. Similarly, with respect to the best-performing RL model, the drop in delay and queue dissipation time ranges from 55.63% to 77.12% and from 22.13% to 43.7%, respectively, which is significant at the 99% confidence level. The proposed algorithm is deployed on a wireless hardware architecture to confirm the feasibility of real-world implementation. The findings highlight the algorithm's potential as an efficient solution for queues and delays at signalized intersections where mixed traffic conditions prevail.
✩ This work was supported by the IDEAS-TIH ISI Kolkata (Grant No. - ISI-1970-MID) under the National Mission on Interdisciplinary Cyber-Physical System
(NM-ICPS) of the Department of Science and Technology, Government of India.
✩✩ The authors also wish to thank the Punjab Police (Traffic Wing), Traffic Research Center, Mohali, and Safety Alliance for Everyone Society, Mohali, for their
support in facilitating the data.
∗ Corresponding author.
E-mail addresses: [email protected] (A. Agarwal), [email protected] (D. Sahu), [email protected] (R. Mohata), [email protected]
(K. Jeengar), [email protected] (A. Nautiyal), [email protected] (D.K. Saxena).
URL: https://faculty.iitr.ac.in/~amitfce/ (A. Agarwal).
https://doi.org/10.1016/j.eswa.2024.124416
Received 7 March 2024; Received in revised form 14 May 2024; Accepted 3 June 2024
Available online 6 June 2024
0957-4174/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Table 1
A list of past methods for adaptive traffic signal control systems.

Study | Methodology/model | Study area | Mixed traffic | Benchmark | Performance
Wongpiromsarn, Uthaicharoenpong, Wang, Frazzoli, and Wang (2012) | MP | Toy scenario | – | SCATS | queue lengths ↓
Varaiya (2013a) | MP | Toy scenario | – | ACS | better stabilization
Gregoire, Frazzoli, de La Fortelle, and Wongpiromsarn (2014) | MP | Toy scenario | – | O-BP | performs well under heavy load; performance gap of 10%–20%
Kouvelas, Lioris, Fayazi, and Varaiya (2014) | MP | Los Angeles, California | – | ACS | queue stabilization
Pumir, Anderson, Triantafyllos, and Bayen (2015) | MP | San Diego, California | – | FCS, ACS | queue stabilization
Le et al. (2017) | MP | Toy scenario | – | BP | network performance ↑, congestion ↓
Anderson, Pumir, Triantafyllos, and Bayen (2018) | MP-C | San Diego, California | – | ACS | demonstrated better control stability and system performance under defined demand limits
Sun and Yin (2018) | MP | Toy scenario | – | A-MP, C-MP, ACS | original non-cyclic MP outperformed others, especially under high-demand scenarios
Dixit, Nair, Chand, and Levin (2020) | MP | Thane, Noida, India; Bandung, Indonesia | – | FCS | delay ↓ by 12%–30%
Levin, Hu, and Odell (2020) | MP | Austin, Texas | – | MP (w/o cycle constraints) | slightly worse throughput due to cyclical constraints but offers improved predictability and acceptance in practice
Mercader, Uwayid, and Haddad (2020) | MP | Jerusalem, Israel | – | FCS | queue stabilization
Levin, Barman, Robbennolt, Hu, Odell, and Kang (2022) | MP | Hennepin County, Minnesota | – | ACS | delay ↓
Liu and Gayah (2022) | D-MP | Toy scenario | – | V-MP | delay ↓
Agarwal, Sahu, Nautiyal, Agarwal and Gupta (2024) | MP-C | Ludhiana, India | Yes | ACS, MP | delay ↓ by 22.96%–54.8%; queue dissipation time ↓ by 5.45%–8.32%
Abdulhai, Pringle, and Karakoulas (2003) | RL (Q-Learning) | Toy scenario | – | FCS | delay ↓ by 56%–62%
Prashanth and Bhatnagar (2011) | RL-Q, PGA | Toy scenario | – | FCS | PGA > FCS
Patel, Mathew, and Venkateswaran (2016) | Optimization model | Toy scenario | Yes | ACS | delay ↓ by 35.5%–38.8%
Van der Pol and Oliehoek (2016) | RL (DQN with transfer planning) | Toy scenario | – | Wiering et al. (2000); Kuyer, Whiteson, Bakker, and Vlassis (2008) | DQN > TTLC in terms of TT ↓
Wei et al. (2019) | MP + RL (DQN) | State College, USA; Jinan, China; New York City, USA | – | FCS, GreenWave, MP, LIT, GRL | travel time ↓
Dixit et al. (2020) | D-MP | Noida, India; Thane, India; Bandung, Indonesia | Yes | FCS | delay ↓ by 12%–30%
Huang and Qu (2020) | RL (PPO + LSTM networks) | Toy scenario | – | ACS, SCOOTS | queue ↓, congestion ↓
Bouktif, Cheniki, and Ouni (2021) | RL + P-DQN | Toy scenario | – | FCS, DDPG | TT ↓
Wang, Yin, Feng, and Liu (2022) | MP + RL (PPO) | Ann Arbor, Michigan | – | Traditional MP | delay ↓
Maripini, Vanajakshi, and Chilukuri (2022) | ATCS (TT) | Toy scenario | Yes | Webster method | delay ↓ by 10.41%–11.78%
Zhao et al. (2022) | IPDALight | Jinan and Hangzhou, China | – | FCS, PressLight, CoLight | travel time ↓ by 20%
Boukerche, Zhong, and Sun (2022) | RL | Toy scenario | – | T-RL (w/o delays) | queue ↓ by 30%, wait time ↓ by 25%
Table 1 (continued).

Study | Methodology/model | Study area | Mixed traffic | Benchmark | Performance
Mao, Li, and Li (2022) | RL-PPO, SAC, DQN, DDQN, QR (DQN) | Hangzhou, China | – | Traditional MP | SAC > (DRL, MP)
Ghosh, Anusha, Babu, and Vanajakshi (2023) | ATCS (RFID) | Trivandrum, Kerala, India | Yes | FCS | delay ↓ by 12.6%
Maripini et al. (2024) | ATCS (TT) | Chennai, India | Yes | FCS | delay ↓ by 15.42%
Wei, Gao, Yang, and Li (2023) | DDPG | Toy scenario | – | FCS, DQN-TSC | queue ↓ by 40%, delay ↓ by 30%
Zhao, Wang, Wang, and Liang (2024) | RL (GNN + DDQN + Dueling DQN) | Hangzhou, China | – | FCS; DQN; Double Deep Q-Network (DDQN); DDQN enhanced with Prioritized Replay Sampling (DQN-PER) | improvement up to 13% for average reward, delay, queue length, and waiting time
Saiki and Arai (2023) | MORL | Toy scenario | – | MP and single-objective RL | travel time ↓
Yazdani et al. (2023) | DDQN | Melbourne, Victoria, Australia | – | Actuated | reduced total delays
Ducrocq and Farhi (2023) | D3QN | Toy scenario | – | MP and SOTL | reduced total delays
Zhang, Chang, Jin, Yang, and Huang (2024) | D3QN | Changsha, China | – | Fixed, Actuated, SCOOT | 16% ↓ traffic conflicts, 4% ↓ carbon emissions, 18% ↓ waiting time compared to traditional ATSC, although with a slight increase (0.64%) in waiting time compared to the efficiency-only DRL model

Models: Max Pressure: MP; Max Pressure based on crowd-sourced data: MP-C; Reinforcement learning: RL; Reinforcement learning using Q-learning: RL-Q; Traditional reinforcement learning: T-RL; Policy gradient actor–critic algorithm: PGA; Soft Actor–Critic: SAC; Multi-Objective Reinforcement Learning: MORL; Delay-based Max Pressure: D-MP; Parameterized Deep Q-Networks: P-DQN; Deep Deterministic Policy Gradient: DDPG; Optimal Back Pressure: O-BP; Quantile Regression DQN: QR (DQN); Dueling Double Deep Q-Network: D3QN; Double Deep Q-Network: DDQN.
Benchmarks: Non-cyclic or acyclic MP: A-MP; Variants of Max Pressure: V-MP; Cyclic MP: C-MP; Adaptive traffic control system: ATCS; Travel time: TT; Traditional traffic light control algorithms: TTLC; Sydney Coordinated Adaptive Traffic System: SCATS; Split Cycle Offset Optimisation Technique: SCOOT; Actuated control system: ACS; Fixed control system: FCS.
Others: Bluetooth data: BD; Crowdsourced data: CD; Radio Frequency Identification: RFID.
whose input is based on travel times instead of queue lengths. The authors consider this methodology advantageous as travel times are easier to estimate, and a controller based on travel times has the property of accounting for each link's finite capacity. Similarly, Agarwal, Sahu et al. (2024) introduced a wireless architecture for traffic signal control utilizing crowdsourced data, where the green signal timing is determined using a travel time-based Max Pressure control algorithm. Moreover, Liu and Gayah (2022) proposed a Max Pressure model based on travel delays, which inherits the desirable maximum-stability feature of queue-based Max Pressure frameworks while addressing their limitations, such as challenges in estimation, simplified vehicle depiction, and arbitrary delays at intersections with low demand. Dixit et al. (2020) proposed a novel traffic control system that uses only real-time delay data, eliminating the need for traffic volume or queue length information. The approaches proposed by Agarwal, Sahu et al. (2024), Dixit et al. (2020) simplify signal control, avoiding the installation and upkeep of multiple detectors/sensors (intrusive/non-intrusive) at each approach of an intersection, a common requirement in traditional methods like Max Pressure control.

The initial Max Pressure signal control suffered from an acyclic nature, in which a phase may be skipped multiple times due to the Max Pressure calculation. This creates issues in providing green signals for pedestrians in urban areas. The problem is aggravated if one of the approaches to the intersection is a minor road with low volume. The acyclic Max Pressure control algorithm was revised to include a cyclical phase structure with a defined maximum cycle length (Levin et al., 2020). The modified Max Pressure controller ensures that a predetermined sequence of phases is activated, with each phase being activated at least once within a cycle. Sun and Yin (2018) conducted a comparison between Max Pressure control and coordinated actuated traffic signals in VISSIM. As expected, the additional constraints of the cyclic structure degrade performance with respect to acyclic Max Pressure.

In the previous studies that have built upon Varaiya (2013a), the actuation of phases has often been done purely based on the pressure calculation, lacking the concept of a signal cycle. However, Le et al. (2017) introduced a different approach where the time step was defined as one signal cycle, in contrast to selecting one phase per time step as commonly done in earlier studies. Their model did not impose a minimum green time allocation for each phase, which could limit its effectiveness due to factors such as startup delays. Similarly, Anderson et al. (2018), Pumir et al. (2015) also employed a methodology where one phase was activated per time step. They also mandated that each phase be activated at least once throughout a signal cycle and for a minimum duration. Most of the studies have used an overly simplistic approach for determining the green signal timing for each phase using a constant cycle time, which potentially limits their effectiveness in complex traffic scenarios. Thus, the present study explores and introduces Reinforcement Learning (RL) to compute the green signal timing. The idea behind the integration of RL is to make the green timing and, consequently, the cycle time dynamic.

1.3. Reinforcement learning

The application of Reinforcement Learning (RL) in Adaptive Traffic Signal Control (ATSC) has gained traction and garnered significant interest in recent years, primarily due to its potential to optimize traffic flow, mitigate congestion, and deliver superior performance. These techniques enable controllers to learn the optimal control policy through direct environmental interaction. This section briefs the recent RL-based ATSC studies (also refer to Table 1) and highlights the limitations of different approaches.

Initial applications of RL in ATSC focused on simple models using algorithms like Q-learning and State–Action–Reward–State–Action (SARSA) (Abdulhai et al., 2003; Prashanth & Bhatnagar, 2011). These
early methods laid the preliminary work by displaying the prominent use of RL to optimize traffic signal control in a dynamic environment. Their ability to adjust to fluctuating traffic scenarios without requiring predefined traffic models gave them an edge over existing strategies. However, their dependence on simplistic state representations and discrete state–action spaces proved to be major limitations, hindering their effectiveness in complex traffic conditions. Also, the computational complexity of large-scale traffic networks and their reliance on simulated traffic data left a significant gap to be filled.

Subsequent research recognized and addressed these challenges by introducing continuous state–action spaces, including real-time traffic data, and enhancing the responsiveness of ATSC systems (Mannion et al., 2016). However, even after these enhancements, models still struggled with high-dimensional state spaces and often required extensive training to achieve considerable results. More recent studies have shifted towards integrating existing RL techniques with Deep Learning (Deep Reinforcement Learning). DRL, exemplified by works like Van der Pol and Oliehoek (2016), enables the handling of high-dimensional state spaces and reduces average travel times in comparison to existing approaches. Bouktif et al. (2021) showcase the determination of the next phase and optimal signal timings simultaneously by using Multi-Pass Deep Q-Networks (Multi-Pass DQN). Significant improvements in traffic flow optimization and signal timing efficiency were recorded by the use of this hybrid DRL variant. However, immense amounts of training data and susceptibility to non-stationary environments were major constraints to work on. Huang and Qu (2020) highlight the use of Proximal Policy Optimization (PPO) to train neural networks for an optimal signal control strategy, which has proven highly effective. The research highlights the algorithm's ability to balance the exploration–exploitation trade-off more effectively, leading to more efficient and realistic traffic flow management solutions. Yazdani et al. (2023) implemented a Double Deep Q-Network (DDQN) algorithm for traffic signal control at intersections, focusing on optimizing the balance between pedestrian and vehicular traffic flows. The reward function was formulated by aggregating the individual rewards of various road users through a weighted sum, with adjustments made to penalize the extra delays resulting from interactions among these users. The results of this study indicate that the proposed DDQN algorithm yielded a significant improvement in pedestrian travel times, reducing them by over 13% in comparison to actuated signal control. However, this benefit was accompanied by a marginal increase of 1% in the average travel time when compared to actuated control. Saiki and Arai (2023) proposed a multi-objective RL (MORL) to avoid the necessity of multiple policies owing to different traffic flows at different times of the day at an intersection; the inclusion of multiple policies would lead to a decay in performance. The authors compared MORL with SORL and MP methods, finding that MORL achieved better average travel times than SORL and similar performance to MP. Along similar lines, Zhang et al. (2024) also implemented a multi-objective deep reinforcement learning framework using a Dueling Double Deep Q-Network (D3QN) to optimize traffic signals. The proposed RL model integrates multiple objectives related to safety, efficiency, and carbon emissions at an intersection. A normalized model and the entropy weight method are applied to harmonize the different dimensions of these objectives. Despite these improvements, the findings showed no substantial performance increase over the D3QN that was purely focused on efficiency optimization; in reality, a 0.6% increase in waiting time was observed.

Ducrocq and Farhi (2023) employed a Dueling Double Deep Q-Network (D3QN) to optimize traffic signal control. However, the authors presumed that the data from connected vehicles (CVs) were precise despite the potential inaccuracies in the infrastructure. Furthermore, they trained a distinct agent for each simulated scenario. The results demonstrated that the proposed algorithm significantly outperformed conventional traffic management strategies, such as Max Pressure and Self-Organizing Traffic Lights (SOTL), in minimizing total traffic delays.

From the above discussion, it is evident that reinforcement learning models have the potential to nudge the traffic signal timing towards higher performance. However, a few issues are apparent, for instance, discrete state–action spaces, large data requirements for training, computational resources, applicability to heterogeneous traffic conditions, acyclic nature, etc. Additionally, the focus of many studies is limited to calculating phase timings; very few studies have tried to consider the learning mechanism for ordering the phases, which may also play a critical role.

1.4. MP, RL integrated studies

The literature has very few studies that propose hybrid algorithms (cf. Table 1). Some of them have used reinforcement learning with a time-series model (e.g., Huang & Qu, 2020), and others integrated different RL models (e.g., Bouktif et al., 2021; Zhao et al., 2024). From the discussions in Sections 1.2 and 1.3, it is evident that the MP-based algorithms have great potential to prioritize the phases at an intersection, and the RL-based algorithms improve the performance by optimizing signal timing. This points to a hybrid approach in which MP and RL are integrated. So far, in the literature, only two studies have proposed hybrid algorithms that integrate MP and RL (Wang et al., 2022; Wei et al., 2019). Wei et al. (2019) integrate pressure as a reward in a DQN RL model, whereas Wang et al. (2022) incorporate the Max Pressure controller as an agent in a policy gradient RL model. The authors showed that the proposed hybrid approaches perform better than conventional and existing approaches. The use of DQN in the former increases the cost of learning (Wei et al., 2019) and thus limits real-world applications. The complexity of DQN is overcome by using RL-PPO (Wang et al., 2022). Both of these studies are acyclic, i.e., these approaches will not suit urban intersections where pedestrians are crossing the streets. Moreover, the RL-PPO in the hybrid algorithm (Wang et al., 2022) determines the signal timing for all phases in one go; however, determining the phase timing after the execution of every phase is likely to further enhance the performance. Lastly, the applicability of both studies is limited to homogeneous traffic conditions, and the feasibility of the proposed algorithms is not tested in a real-world system architecture.

The present study addresses these constraints by introducing a hybrid algorithm tailored to heterogeneous traffic conditions. This algorithm integrates a cyclic framework to facilitate pedestrian crossings within each cycle. The proposed hybrid algorithm integrates RL and MP. For the former, a novel DRL framework, Proximal Policy Optimization (PPO), is used due to its robustness and ability to adjust traffic signals dynamically, which have been proven to mitigate congestion and improve traffic flows. Being computationally light for large-scale traffic, trained on real-time traffic data, and using continuous state–action spaces, PPO enhances the responsiveness of ATSC systems, making it a suitable candidate for real-time signal control operations. Its capability to adapt and optimize the policy in non-stationary environments reduces the chance of intersection cross-blocking. Moreover, in RL-PPO, the phase timing is adjusted following the completion of each phase, a feature anticipated to further optimize traffic signal performance.

1.5. Traffic signal control for mixed traffic conditions

Signalized intersections play a crucial role in shaping overall roadway network performance, yet heterogeneous traffic conditions impose additional challenges for traffic flow optimization. The existing literature provides ample evidence supporting the necessity of dynamic signal control across diverse traffic scenarios, leading to improved performance. Traditional guidelines/models from developed countries may not be directly applicable to developing nations like India, which necessitates tailored measures. However, significant development for
traffic signal control under heterogeneous traffic conditions is limited in defining the saturation flow and determining delays for a traffic intersection under traffic conditions without lane discipline (Padinjarapat & Mathew, 2020). For instance, CRRI (2017), Mondal, Arya, and Gupta (2022), Padinjarapat and Mathew (2020), Radhakrishnan and Mathew (2011) proposed different methods to formulate a saturation flow model based on Passenger Car Units (PCUs). Very few studies attempt to determine the traffic signal phase timings to optimize the traffic flow under mixed traffic conditions. For instance, Verghese, Chenhui, Subramanian, Vanajakshi, and Sharma (2017) created a state feedback controller using macroscopic models for over-saturated circumstances. The conservation equation and a site-specific empirical relation served as the foundation for the model. Likewise, George et al. (2020) introduced area occupancy as a metric for formulating a model-based traffic signal control strategy. This approach accounts for fluctuating vehicle dimensions and effectively addresses lane indiscipline and traffic heterogeneity.

Further, Patel et al. (2016) proposed an optimization model that minimizes total average control delay without explicit demand prediction for non-lane-following heterogeneous road traffic. Simulation results showed a significant reduction in average control delay and queue length compared to a vehicle-actuated system and a real-time reinforcement learning model for non-lane-following heterogeneous traffic. Similarly, Maripini et al. (2022) proposed an adaptive signal design for mixed traffic using sample travel times originating from GPS/Bluetooth/Wi-Fi sensors. The results indicated that real-time optimal signal design may be achieved using data from only four probe cars per phase. When applied in VISSIM, the suggested model outperformed Webster's signal design process, showing a theoretical delay reduction of 11.78% and a real decrease of 10.41%. Moreover, Maripini et al. (2024) devised a traffic control system for mixed traffic, incorporating travel time data from probe vehicles to enhance intersection performance. The optimized model successfully attained an average reduction of 15.42% in total intersection delay over 14 cycles, specifically addressing near-saturated traffic conditions. Ghosh et al. (2023) concentrated on creating a dynamic signal control system for mixed traffic conditions by leveraging sparse data gathered from radio frequency identification (RFID) sensors. They reformulated the existing delay equation into an optimization problem for dynamic signal control. Compared to fixed-time signal control, a calibrated VISSIM microsimulation model demonstrated a 12.6% enhancement in the overall average intersection delay. Agarwal, Sahu et al. (2024), Dixit et al. (2020) proposed Max Pressure-based traffic signal control algorithms, which use travel time and travel delay, respectively. The proposed approaches were applied to mixed traffic conditions. The application of such an approach is limited to specific phases (turning movements are not available) and locations where crowdsourced data is available. The majority of studies have predominantly concentrated on the estimation of saturation flow, and a few others have tried to optimize signal timings using Passenger Car Units (PCU). A few others have demonstrated the use of various measures (e.g., travel time using probe vehicles, density, etc.) to detect the traffic conditions on the approaches. These models often fail to capture the fluctuations and the heterogeneous nature of traffic, resulting in sub-optimal signal timings that do not adequately respond to actual traffic conditions. The applicability of these models/algorithms remains limited to simulation models, and feasibility in the field is yet to be tested. The literature lacks a dynamic model to identify the order of phases at an intersection and also compute the green signal timing under heterogeneous traffic conditions. To address this gap, the reflection of heterogeneous traffic in a signal optimization problem, deployable in the real world, becomes imperative. By doing so, the accuracy and applicability of signal timing optimizations can be enhanced, ensuring a more comprehensive and realistic approach to traffic management.

1.6. Research gaps and objectives

Building upon the preceding discussion, a notable research gap exists in the exploration of various traffic signal control algorithms tailored for heterogeneous traffic scenarios. The present study has the following research contributions: (a) a comprehensive evaluation of various algorithms (actuated, CoSiCoSt, RL, MP) for optimization of traffic signal phase timings, (b) development of a hybrid traffic signal control algorithm suitable for heterogeneous traffic scenarios using the Max Pressure (MP) algorithm and Reinforcement Learning (RL) method to make the phase order and cycle time dynamic, (c) a comparative analysis of various performance indicators, such as delay, queue length, and queue dissipation time, using a real-world case study, and (d) testing the feasibility in the real world by deploying the proposed algorithm on a hardware setup.

To facilitate a comprehensive exploration of our research, the rest of the paper is organized as follows: Section 2 focuses on the multiple methods used in this study along with the theoretical and mathematical aspects of the Max Pressure algorithm as well as the reinforcement learning approaches used in the design and implementation of adaptive traffic control systems. Section 3 elaborates on the scenario development of the network and showcases the algorithms of the proposed design in a step-by-step approach for a total of six scenarios with the help of micro-simulation. Section 4 provides detailed results obtained in this study, performance analysis, practical implementation, and sensitivity of the proposed model. Section 5 contrasts the efficacy of the proposed algorithm with findings from two recent studies and outlines the limitations inherent in the current research. Finally, Section 6 concludes the study.

2. Methods

2.1. Benchmark algorithms

Contemporary literature reveals that most studies primarily utilize conventional benchmarks such as fixed and actuated signal controls (cf. Table 1). A few studies have used other models for benchmarking. In developing nations, such as India, where heterogeneous traffic conditions prevail, fixed signal control is most common for field implementation. At a few intersections, actuated and CoSiCoSt algorithms are also deployed in the field. Moreover, due to advancements in the literature around ATCS, the performance is compared with two other categories of models, i.e., MP and RL; these are becoming popular but are rarely executed in practice, possibly due to their complexity and resource-intensive nature. In this study, actuated and CoSiCoSt models are used as benchmarks (Muralidharan et al., 2010; Saikrishna & Anusha, 2021). The proposed model is compared to several MP and RL variants.

2.1.1. Actuated traffic signal control

Actuated traffic signal control is an adaptive approach that dynamically modifies traffic signal phases according to the prevailing traffic conditions. This approach is distinguished by its capacity to extend the duration of a traffic phase in response to a continuous stream of vehicles. The transition to the next phase is deliberately begun when a suitable gap is detected in the vehicle stream. The prolonged green timing per phase is limited to the maximum green time. The minimum gap and maximum green time are the key parameters within this system, which are subject to adjustment (cf. Section 3.3 for the values of the key parameters in the present study).

2.1.2. Composite Signal Control Strategy (CoSiCoSt)

The Composite Signal Control Strategy (CoSiCoSt) traffic signal control system employs the philosophy of Split, Cycle, and Offset
Optimization (SCOOT) to achieve optimal throughput. This approach dictates that the duration of the green signal for each phase is set equal to the queue service time, which is determined by the ratio of the queue length (in vehicles or PCU) and the average queue service rate (in vehicles/s or PCU/s). At every scan interval, the system computes the gap between vehicles. If this gap exceeds a predefined threshold (minimum gap), the green phase is terminated; otherwise, the green phase duration is extended. However, there is a stipulated maximum limit to the duration of the green phase. Consequently, if the duration of the green phase surpasses this maximum threshold, the green phase is terminated to facilitate the transition to the next phase of the traffic signal cycle. The key parameters are the scan interval, minimum gap threshold, and maximum green time, which are configurable (cf. Section 3.3 for the values of the key parameters in the present study).

2.2. Max Pressure (MP)

The present study employs the Max Pressure algorithm proposed by Varaiya (2013a). In this, the stationary traffic flow on each approach is determined to estimate the pressure on each approach of an intersection. The concept of stationary traffic flow, measured in Passenger Car Units (PCU), is utilized to represent the vehicles queued on a link, serving as a standardized metric for evaluating congestion levels on that link. Consequently, the terms stationary traffic flow and queue length are used interchangeably throughout the study. In contrast to Max Pressure based on travel time and travel delays, the present study uses stationary traffic flow, which allows greater flexibility for phasing; in other words, vehicle counts allow the use of turning movements, and thus, different phasing may be opted. A pseudo-code for the Max Pressure algorithm is shown in Algorithm 1.

In numerous South Asian urban areas, traffic conditions are markedly heterogeneous, characterized by a diverse array of vehicles, each possessing distinct kinematic characteristics. This diversity contributes to varying degrees of congestion. In the present study, a real-world use case is Ludhiana, India, where the traffic conditions are heterogeneous. A Passenger Car Unit (PCU) (CRRI, 2017) is used to convert mixed traffic to a common unit. To effectively manage all the phasing systems of individual intersections, it is feasible to distill the concept of pressure into a more streamlined form, as presented in Eq. (1).

$p_{z,n}(t) = \sum_{(j,k) \in M_{z,n}} y_{in} - \sum_{(j,k) \in M_{z,n}} y_{out}$ (1)

The incoming and outgoing traffic flows (in PCUs) for phase $z$ are denoted by $y_{in}$ and $y_{out}$, respectively. The set of links that define the movement in a particular phase $z$ is symbolized by $M_{z,n}$.

Finally, the green time of a phase $z$ is determined by apportioning the effective green time in the proportion of the pressure (cf. Eq. (2)). The resulting green time is bounded within a configurable threshold $[g_{min}, g_{max}]$. The former ($g_{min}$) is estimated based on the time required by pedestrians to cross, and the latter ($g_{max}$) is set to allow amber time and minimum green on the three other approaches.

$g_{min} \le g_{z,n}(t) = \frac{p_{z,n}(t)}{\sum_{z \in F_n} p_{z,n}(t)} \cdot G_n \le g_{max}, \quad \forall z \in F_n$ (2)

The share of the effective green time allocated to phase $z$ during time period $t$ is represented by $g_{z,n}$ [s], the total effective green time is depicted by $G_n$ for intersection $n$, and $F_n$ denotes the set of phases at intersection $n$.

Algorithm 1 Max Pressure and Max Pressure with heterogeneous traffic mix (MPH)
1: procedure Max_Pressure(T_end, y_in, y_out, G_n, g_min, g_max)
2:   T_current ← 0, z_current ← Initial Phase  ⊳ Initial phase is phase 1 in Fig. 1
3:   while T_current < T_end do  ⊳ Run the simulation until the end time
4:     if phase duration expired then
5:       for all phases z do
6:         Calculate pressure p_{z,n}(t)  ⊳ use Eq. (1) for MP and Eq. (8) for MPH
7:       end for
8:       z_next ← find(z) at max(p_{z,n}(t))  ⊳ Select the phase with Max Pressure
9:       Calculate total pressure Σ_{z∈F_n} p_{z,n}(t)
10:      Determine g_{z,n}(t) for z_next using Eq. (2)
11:      if g_{z,n}(t) < g_min then
12:        g_{z,n}(t) ← g_min
13:      else if g_{z,n}(t) > g_max then
14:        g_{z,n}(t) ← g_max
15:      end if
16:      Activate z_next with duration g_{z,n}(t)
17:      z_current ← z_next
18:    end if
19:    T_current ← Update current time
20:  end while
21: end procedure

As per the discussion in Section 1.2, acyclic Max Pressure suffers from various limitations, such as the skipping of phases, which is critical for pedestrian movements in urban areas. Thus, the present study also employs cyclic Max Pressure (C-MP) and compares the results with acyclic Max Pressure (MP). The dynamic phasing-based cyclic Max Pressure ensures the sequential execution of all phases, prioritizing them in descending order of their calculated pressure values. The allocation of green time within this framework is in accordance with Eq. (2). The pseudo-codes for acyclic and cyclic Max Pressure are exhibited in Algorithms 1 and 2, respectively.

Algorithm 2 Cyclic Max Pressure Heterogeneous (MPH) Algorithm
1: procedure CyclicMaxPressureHeterogeneous(T_end, v_in, v_out, PCU_i, G_n, g_min, g_max)
2:   T_current ← 0, z_current ← Initial Phase  ⊳ Initial phase is phase 1 in Fig. 1
3:   while T_current < T_end do  ⊳ Run the simulation until the end time
4:     if phase duration expired then
5:       PressureList ← ∅  ⊳ Initialize an empty list to store the pressure of each phase
6:       for all phases z do
7:         Calculate pressure p_{z,n}(t) using Eq. (8)
8:         Append (z, p_{z,n}(t)) to PressureList
9:       end for
10:      Sort PressureList in decreasing order of p_{z,n}(t)
11:      for all phases (z, p_{z,n}) in PressureList do
12:        Determine g_{z,n}(t) for phase z using Eq. (2)
13:        if g_{z,n}(t) < g_min then
14:          g_{z,n}(t) ← g_min
15:        else if g_{z,n}(t) > g_max then
16:          g_{z,n}(t) ← g_max
17:        end if
18:        Activate phase z with duration g_{z,n}(t)
19:        z_current ← z
20:        T_current ← Update current time considering g_{z,n}(t)
21:      end for
22:    end if
23:    T_current ← Update current time
24:  end while
25: end procedure

2.3. Reinforcement learning (RL)

The Cyclic Max Pressure (C-MP) strategy emerges as a potent solution to mitigate the issue of extended red durations; however, its allocation of green times remains simplistic. Hence, the present study
introduces a strategy employing Reinforcement Learning (RL) for signal control.

Under the machine learning paradigm, an agent in the Reinforcement Learning (RL) model performs various actions and gets a reward or a punishment, which leads to a learning mechanism for decision-making in an environment. The agent's primary objective is to learn a policy, a strategy that dictates its actions in various environmental states to maximize cumulative rewards. The environment, a crucial component, defines the context for the agent's actions and their outcomes. A state represents the environment's current configuration. The agent's actions, selected from a set of possibilities known as the action space, aim to achieve an optimal policy that consistently maximizes rewards over time.

With an objective to minimize traffic congestion by dynamically adjusting phase timing based on real-time traffic conditions, the present study uses the Deep Q-Network and Proximal Policy Optimization (PPO) RL models. The generic formulation of the RL models is explained first. Eq. (3) shows a state, $S$, of the environment, which is composed of traffic metrics such as approach-wise stationary traffic flows, total delay, current phase, and updated phase duration at the intersection.

$S = [s_1, s_2, \ldots, s_j]$ (3)

The total delay, $D_n(t)$, and the stationary traffic flow, $Y_n(t)$, at intersection $n$ are aggregated over all links and vehicle types, where $L$ is the set of all links at the intersection $n$, and $y_i$ and $d_i$ are the stationary traffic flow and delay for vehicle type $i$. The set of all vehicle types is denoted by $I$. To normalize different vehicle types, these metrics are computed in terms of Passenger Car Units (PCUs). The action space, $A$ (see Eq. (6)), is defined as the set of possible phase durations (i.e., green times) for the different phases of an intersection.

$A = \{a_1, a_2, \ldots, a_k\}$ (6)

where $k$ is the number of configurations, which is the same as the number of phases. One of the actions is picked randomly for execution in the model. The reward function $R_t$ is designed to incentivize the reduction of the total vehicle delay and stationary traffic flow at the intersection, as shown in Eq. (7).

$R_t = -\left( w_d \cdot \Delta D_n(t) + w_q \cdot \Delta Y_n(t) \right)$ (7)

where $w_d$ and $w_q$ are weight parameters for delay and stationary traffic flow, respectively. $\Delta D_n(t)$ and $\Delta Y_n(t)$ represent the change in delay and stationary traffic flow from the last state, respectively. The $w_d$ term minimizes accumulated waiting times, and the $w_q$ term focuses on reducing vehicle queues. The negative sign aligns with standard reinforcement learning principles, emphasizing the agent's objective of minimizing cumulative costs. This succinct formulation captures the essence of the agent's strategy in achieving a balance between delay reduction and optimized queue lengths.
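To make the state–action–reward formulation concrete, a minimal Python sketch of Eqs. (3), (6), and (7) is given below; the weights, the number of phases, and the example inputs are illustrative assumptions, not values reported in this study.

```python
# A minimal sketch of Eqs. (3), (6), and (7); weights and inputs are assumptions.

W_DELAY = 0.5   # w_d: assumed weight on the change in total delay
W_QUEUE = 0.5   # w_q: assumed weight on the change in stationary traffic flow

def build_state(queues_pcu, total_delay, current_phase, phase_duration):
    """State S of Eq. (3): approach-wise stationary flows plus delay and phase info."""
    return [*queues_pcu, total_delay, current_phase, phase_duration]

def reward(prev_delay, delay, prev_queue_pcu, queue_pcu):
    """Reward R_t of Eq. (7): negative weighted sum of changes since the last state."""
    delta_d = delay - prev_delay          # Delta D_n(t)
    delta_y = queue_pcu - prev_queue_pcu  # Delta Y_n(t)
    return -(W_DELAY * delta_d + W_QUEUE * delta_y)

# Action space A of Eq. (6): one candidate green time per phase (k = 4 here),
# bounded by the same [g_min, g_max] = [10 s, 60 s] used elsewhere in the study.
G_MIN, G_MAX = 10, 60
actions = [round(G_MIN + i * (G_MAX - G_MIN) / 3) for i in range(4)]  # [10, 27, 43, 60]

state = build_state(queues_pcu=[4.5, 7.0, 2.5, 3.0], total_delay=120.0,
                    current_phase=1, phase_duration=30)
print(state, actions, reward(120.0, 95.0, 17.0, 14.5))
```

Note that the reward is positive whenever delay and queues shrink between consecutive states, which is exactly the behavior Eq. (7) rewards.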
In the present study, two RL models are used: Deep Q-Network (DQN) and Proximal Policy Optimization (PPO). An algorithm for the former is shown in Algorithm 3, which combines Q-learning with deep neural networks. The states are the input to the neural network, which are mapped to the actions (output nodes of the neural network) and the Q-value; the latter is further used to get the reward value for the state–action pair. The hyperparameters in RL are $\alpha$ and $\gamma$, which are the learning rate and discount factor, respectively.

PPO uses a policy gradient method and is chosen for its stability and effectiveness in handling high-dimensional action spaces. It strikes a balance between exploration and exploitation by optimizing between maximizing expected returns and containing policy changes. This ensures the stability of the training process in RL. The RL agent is trained over numerous episodes, each representing a different traffic scenario. A pseudo-code for the PPO implementation of RL is shown in Algorithm 4. In this, $\alpha$, $\gamma$, and $\epsilon$ are hyperparameters, namely the learning rate, discount factor, and clip factor, respectively.

2.4. Max Pressure and Reinforcement Learning-based traffic signal control

The Reinforcement Learning (RL) methodology presents a sophisticated technique for the allocation of green times across various phases at a traffic intersection. However, this method exhibits a notable limitation due to its inherent cyclic and predetermined sequence of phase execution. To address this shortcoming, this study introduces an innovative signal control strategy that amalgamates the Max Pressure algorithm with reinforcement learning principles. As shown in Section 2.2, the pressure is calculated for each phase based on the vehicular inflow and outflow. In the past, Max Pressure control has mostly been applied to homogeneous traffic conditions using aggregate traffic flow. Indian traffic, in contrast, is marked by a diverse array of vehicle types, necessitating a nuanced approach to analyzing its dynamics. Typically, for such traffic conditions, all vehicle types may be converted to a common unit (e.g., PCU). To accurately account for the complexities of heterogeneous traffic conditions, an alternative variant of the Max Pressure algorithm is examined, as formulated in Eq. (8).

$p_{z,n}(t) = \sum_{(i,j,k) \in M_{I,z,n}} (v_{in} - v_{out})_i \cdot PCU_i$ (8)

where $v_{in}$ and $v_{out}$ are the numbers of incoming and outgoing vehicles, $i$ is the vehicle type, and $I$ is the set of all vehicle types. This hybrid approach prioritizes phases based on the descending order of pressure values, concurrently employing adaptive RL-based green times.

The primary objective of this proposed model is to effectively harness the strengths of both the Max Pressure and Reinforcement Learning frameworks. This synergistic integration is aimed at optimizing critical traffic parameters, including the minimization of queue lengths, reduction in delay times, and expedited queue dissipation, thereby enhancing overall traffic flow efficiency.

The pseudo-code for the proposed algorithm is shown in Algorithm 5. It is initiated by setting the current time in the simulation to zero. The PPO model is initialized with defined hyperparameters, including the learning rate ($\alpha$), discount factor ($\gamma$), and clip factor ($\epsilon$), alongside the initial conditions for the action ($a_0$) and state ($s_0$). Throughout the simulation, the algorithm performs several sequential operations until the end time ($T_{end}$) is reached, i.e., until no more vehicles remain on the network. It involves an iterative process where traffic signal phase sequences are continually adjusted according to the calculated pressures for each phase. The pressure is computed using the classified traffic volume at each approach at any given time (cf. Eq. (8)). The computed pressures for each traffic phase are stored in a list, PressureList, which is then used to sort the phases in decreasing order of pressure. The PPO model subsequently determines the optimal phase durations based on the current traffic state ($s_t$), utilizing the learned policy to make informed decisions. After executing the chosen action ($a_t$), the algorithm calculates a reward ($R_t$) based on the outcomes, which serves as feedback for the learning process. The current phase ($z_{current}$) is updated to the next phase ($z_{next}$) as determined by PressureList. The state and action are updated continuously, refining the policy based on the reward information and the updated traffic state, with each transition stored in memory to facilitate faster learning. At the conclusion of the simulation, the performance of the PPO policy is evaluated by averaging the rewards obtained over multiple episodes, providing a metric for assessing the effectiveness of this integrated control strategy.

Algorithm 5 Max Pressure and Reinforcement Learning
1: procedure MaxPressureAndRL(T_end, v_in, v_out, PCU_i, α, ε, γ, a, s)
2:   T_current ← 0, z_current ← Initial Phase  ⊳ Initial phase is phase 1 in Fig. 1
3:   Initialize SUMO with the given demand on the network  ⊳ Set up the simulation environment
4:   Initialize the PPO model with hyperparameters (α = 7.77 × 10⁻³, ε = 0.3, γ = 0.95), a₀, and s₀
5:     ⊳ The initial action (a₀) is a green time, picked randomly, within the boundary conditions (i.e., [g_min, g_max])
6:     ⊳ s₀ is the initial traffic state
7:   s_t ← s₀
8:   a_t ← a₀
9:   while T_current < T_end do  ⊳ Run the simulation until the end time
10:    PressureList ← ∅  ⊳ Initialize an empty list to store the pressure of each phase
11:    for all phases z do
12:      Calculate pressure p_{z,n}(t) using Eq. (8)
13:      Append (z, p_{z,n}(t)) to PressureList  ⊳ Phase order and corresponding pressure are stored in the list
14:    end for
15:    Sort PressureList in decreasing order of p_{z,n}(t)  ⊳ Get the order of phases using MP
16:    for all phases (z, p_{z,n}) in PressureList do  ⊳ Cyclic execution of phases
17:      a_t ← PPO(s_t)  ⊳ Get the phase duration for the selected phase using RL-PPO
18:      if a_t < g_min then
19:        a_t ← g_min
20:      else if a_t > g_max then
21:        a_t ← g_max
22:      end if  ⊳ Boundary condition check for the computed green time
23:      Execute a_t as per the cyclic phase order  ⊳ Phase duration at time t
24:      Calculate R_t  ⊳ Calculate the reward using Eq. (7)
25:      z_current ← z_next based on the order given by PressureList
26:      Prepare the next traffic state (s_{t+1})
27:      a_{t+1} ← update(s_t, a_t, R_t, s_{t+1})  ⊳ Get the phase duration for the next phase
28:      g_{z,next} ← a_{t+1}  ⊳ Green duration for the next phase from the updated policy
29:      Execute z_next
30:      Store the transition (s_t, a_t, R_t, s_{t+1}) in memory for lookup of similar states
31:    end for
32:    T_current ← UpdateCurrentTime()  ⊳ Increment the simulation time
33:  end while
34:  Evaluate the PPO policy performance by averaging R_t over episodes
35: end procedure
Table 2
Formulation of scenarios based on different algorithms.

Group | Scenario | Algorithm
Benchmark | Scenario A | Actuated
Benchmark | Scenario C | Composite Signal Control Strategy
Max Pressure (MP) | Scenario MP | Max Pressure
Max Pressure (MP) | Scenario MPH | MP for Heterogeneous traffic (MP-H)
Max Pressure (MP) | Scenario CMPH | Cyclic MP for Heterogeneous traffic
RL | Scenario RLD | Reinforcement Learning Deep Q-Network (DQN)
RL | Scenario RLP | Reinforcement Learning Proximal Policy Optimization (PPO)
Hybrid | Scenario H | Hybrid using Cyclic MP for Heterogeneous traffic with RL PPO

3. Scenario development

To illustrate the efficacy of the proposed algorithms in real-world situations, this study considers a four-leg intersection, namely Vardhaman Chowk, Ludhiana, India. Each approach of the intersection has surveillance cameras; therefore, prerecorded video footage is used to extract the classified traffic volume counts. The data extraction was performed using a trained model developed by Agarwal, Thombre, Kedia and Ghosh (2024). In order to assess the impact during varying traffic conditions, data from three distinct time intervals on July 26, 2023, are extracted and used for the simulation experiments. These intervals included the morning peak (08:00–09:00), afternoon (12:00–13:00), and evening peak (16:00–17:00) periods. The peak periods were deliberately selected to assess the effectiveness of the proposed algorithms in managing and dissipating queues during the busiest hours of the day.

3.1. Synthesis of scenarios

The simulation experiment is designed on Simulation of Urban MObility (SUMO) (SUMO, 2024). To ensure a comprehensive comparative analysis, eight distinct scenarios are constructed (see Table 2). The first group has two scenarios, i.e., the actuated and CoSiCoSt algorithms, which are used as benchmarks for the rest of the algorithms. The second group has three scenarios of Max Pressure (MP): traditional acyclic Max Pressure (MP), the Max Pressure variant for heterogeneous traffic (MP-H), and a cyclic variant of Max Pressure control for heterogeneous traffic (C-MP-H). For the cyclic variant, dynamic phasing is used in each cycle, i.e., the order of the phases in each cycle is different. The third group has two variants of Reinforcement Learning (RL), namely, Deep Q-learning Network (DQN) and Proximal Policy Optimization. Finally, the last group has the proposed hybrid model.

3.2. Simulation framework

An open-source traffic simulation tool, SUMO, is employed for generating, implementing, and assessing traffic scenarios. It facilitates the incorporation of customized traffic signal control algorithms, enabling the evaluation of their effectiveness through dynamic calculations of queue lengths and delays. This study utilizes the Traffic Control Interface (TraCI) version 1.19.0 (TraCI, 2024) to connect SUMO with external applications, facilitating the integration of custom algorithms. TraCI's real-time interaction capability with ongoing simulations permits the modification of parameters such as traffic signals, vehicle routes, and signal control methods. SUMO's OSM Web Wizard tool is used to replicate a network akin to the real-world layout. In the present study, the employed signal phasing system is depicted in Fig. 1. This system is characterized by a sequential traffic control methodology where the straight and left-turn movements for two opposing approaches are concurrently given the green light. This is subsequently followed by an exclusive phase dedicated to right-turn movements. This process is then analogously implemented for the remaining two approaches.

In the SUMO simulation environment, the generation of random traffic is an intrinsic capability, facilitating users to define key parameters such as the frequency, initiation, and cessation of vehicular flow. This feature enables precise control over route generation within the network. Initially, the simulated scenario was characterized by an absence of vehicular activity.

Detailed route files are meticulously developed to infuse the network with a realistic vehicular presence. These files are tailored to reflect the traffic demand during the three predominant peak hours, i.e., 08:00–09:00, 12:00–13:00, and 16:00–17:00. The composition of these files is comprehensive, encompassing the diverse spectrum of vehicle types that are typically prevalent in Indian traffic. To address the distinct dimensional and kinematic characteristics of the vehicles, Passenger Car Unit (PCU) values are assigned to each vehicle category (CRRI, 2017). This allocation is critical in quantifying the relative congestion each vehicle type could contribute, thereby ensuring an accurate and nuanced depiction of traffic conditions within the simulation framework.

3.3. Assumptions and default values

In the present study, a 4-phase system is assumed, as depicted in Fig. 1; in this, straight traffic from two approaches is allowed in two phases, and right-turn (conflicting) traffic is allowed in the other two phases. The amber time between phases is taken as 4 s. For actuated traffic signal control, the threshold values are the minimum gap and maximum green time, which are assumed to be 10 s and 60 s, respectively. In other words, if the time headway between consecutive vehicles is less than the minimum gap, the green extension is given until the maximum green time is triggered. For CoSiCoSt traffic signal control, the minimum gap and maximum green time are the same as those of the actuated traffic signal control. Additionally, the scan time is assumed to be 15 s, and the average queue service rate is determined from the pre-recorded video data, which turns out to be 0.25 PCU/s, 0.23 PCU/s, and 0.17 PCU/s for the 08:00–09:00, 12:00–13:00, and 16:00–17:00 time periods, respectively.

For the rest of the algorithms, the minimum and maximum green times are taken as 10 s and 60 s, respectively. The former is estimated based on the time required for a pedestrian to cross the 4-lane dual carriageway at a speed of 1.4 m/s. The simulation interval ranging from 100 s to 250 s is utilized for the comparative analysis of signal (or phase) timings across various scenarios (cf. Figs. 4 and 8). Based on observations of the field conditions, the maximum queue length is assumed to be 100 m, i.e., the queue length is measured only over the first 100 m of the road segment on each approach.

4. Results

A comparative analysis is conducted to evaluate the efficacy of the different traffic control scenarios, utilizing key performance indicators (KPIs): queue lengths, delays, and queue dissipation times (see Table 3). In SUMO, the vehicles in stationary conditions are converted to PCU units to get the queue length at each approach of an intersection. Similarly, the delay is defined as the duration for which a vehicle is stationary in the simulation model. The average queue lengths are estimated for all approaches of an intersection and averaged over time. The total delay for a scenario is the aggregation (over the queue dissipation time) of the delays at all approaches at each time step. The time to clear the intersection is referred to as the queue dissipation time. The simulation continues until all vehicles reach their destinations.
4.1. Comparison of benchmark models

For the comparison of the performance of the different algorithms, two scenarios are used as benchmarks, i.e., Actuated and CoSiCoSt, referred to as Scenario A and Scenario C, respectively (see Table 2). Fig. 2 distinctly demonstrates the superior performance of Scenario C compared to Scenario A, particularly in terms of delay reduction. This superiority is most pronounced during the morning and evening peak (08:00–09:00 and 16:00–17:00) time intervals, where Scenario C exhibits a decrease in maximum delay, total delay, and maximum queue length by 30.5%, 25.1%, and 7.1% for the morning peak and by 23.6%, 17.9%, and 3.75% for the evening peak, respectively. On the other hand, due to low traffic volume in the off-peak hours (12:00–13:00), the reductions are not as substantial.

Additionally, Table 3 further corroborates that the queue length and delay trends for Scenario C mostly lie under those of Scenario A. This happens due to the dynamic queue service times (i.e., the ratio of queue length and average queue service rate) in the CoSiCoSt approach. Interestingly, the queue dissipation time decreases marginally for Scenario C. Clearly, the CoSiCoSt benchmark scenario outperforms the actuated benchmark scenario; the former is therefore used for comparison with the other algorithms.

4.2. Comparison of Max Pressure models

In the analysis presented in Table 3, it is observed that Scenario MP outperforms Scenario C in terms of total delay and queue dissipation time, which decreased by 17.9% and 33.0% for the morning peak, by 25.3% and 41.2% for the noon off-peak, and by 18.6% and 28% for the evening peak, respectively. Interestingly, the average queue length and maximum delay in Scenario MP are higher than in Scenario C, yet due to the faster queue dissipation, the total delay is much smaller in Scenario MP.
Table 3
Simulation results.

Traffic signal control | Avg. queue length (PCU) | Max. queue length (PCU) | Total delay (h) | Max. delay (s) | Queue dissipation time (s)

08:00–09:00
Scenario A | 16.62 | 39.2 | 13,873.69 | 6379 | 26,573
Scenario C | 16.04 | 36.4 | 10,391.49 | 4436 | 24,740
Scenario MP | 19.61 | 35.9 | 8532.81 | 5532 | 16,587
Scenario MPH | 19.35 | 35.6 | 7878.85 | 5035 | 16,126
Scenario CMPH | 18.88 | 37 | 8466.97 | 5475 | 17,373
Scenario RLD | 16.65 | 36 | 9793.78 | 5894 | 21,761
Scenario RLP | 18.35 | 33.6 | 5369.86 | 3734 | 14,629
Scenario H | 17.88 | 30.5 | 2382.38 | 2608 | 11,392

12:00–13:00
Scenario A | 16.40 | 41.7 | 8419.67 | 4524 | 20,298
Scenario C | 16.06 | 41.1 | 8187.58 | 4109 | 20,053
Scenario MP | 25.11 | 40.7 | 6118.34 | 4201 | 11,798
Scenario MPH | 22.05 | 40.2 | 4712.09 | 3564 | 11,306
Scenario CMPH | 26.54 | 41.1 | 6859.32 | 4403 | 12,374
Scenario RLD | 24.14 | 40.3 | 5745.67 | 4347 | 11,677
Scenario RLP | 21.70 | 40.0 | 3643.06 | 2824 | 10,273
Scenario H | 24.12 | 37.6 | 1191.80 | 1460 | 7240

16:00–17:00
Scenario A | 12.48 | 40 | 6319.13 | 4488 | 21,720
Scenario C | 13.04 | 38.5 | 5189.34 | 3427 | 19,473
Scenario MP | 15.18 | 39.4 | 4224.05 | 4132 | 14,022
Scenario MPH | 15.61 | 38.1 | 3519.98 | 3162 | 12,487
Scenario CMPH | 19.97 | 39 | 4962.31 | 4319 | 12,039
Scenario RLD | 13.61 | 39.9 | 4068.25 | 4072 | 15,494
Scenario RLP | 13.17 | 34.4 | 2797.25 | 2262 | 13,121
Scenario H | 14.83 | 28.6 | 640.15 | 1113 | 7387
Further, the proposed Max Pressure algorithm with different vehicle mixes (Scenario MPH) is also investigated (see Fig. 3). Compared to Scenario MP, Scenario MPH reduces the total delay by 7.7%, 23%, and 16.7% for the morning, noon, and evening periods, respectively. Clearly, estimating the pressure of each phase separately for the different vehicle classes and then converting it to a common unit (i.e., PCU) yields a significant performance improvement over Scenarios C and MP.

The analysis (see Fig. 3) reveals that, despite implementing a strategy in which phases are executed in descending order of pressure, the overall performance is detrimentally affected in comparison to Scenarios MP and MPH. A similar outcome is reported by Sun and Yin (2018), where cyclic MP is inferior to MP. Notably, Scenario CMPH is still better than Scenario C in terms of total delay (a decrease in the range of 4.4% to 18.52%) and queue dissipation time (a decrease in the range of 29.8% to 38.29%).

Further, Fig. 4 distinguishes the allocation of green signal timings between the Max Pressure variants and Scenario C. In Scenarios MP and MPH, the traffic phases that exhibit the maximum pressure are allocated green signals, diverging from the predetermined cyclic order characteristic of Scenario C. It is noteworthy that in Scenarios MP and MPH, certain traffic phases, especially those marked by significantly low pressure values, are susceptible to prolonged red times. Specifically, phase 2 in Scenario MP and phase 4 in Scenario MPH are illustrative of this phenomenon. Also, certain phases, particularly those characterized by exceedingly high pressure values, receive green more than once within the first four green phases (see phase 1 for the morning and noon periods and phase 3 for the evening period). To showcase this, refer to Fig. 5, which shows the phase timings for 1000 consecutive seconds of the evening period, starting at 𝑡 = 100. Looking at the first 100 s, it can be observed that Phase 1 and Phase 3 receive green twice, phase 2 receives green once, and phase 4 does not receive green until 141 s. This is due to the pressure differential and the acyclic nature of the controller. Furthermore, the analysis reveals that Scenario CMPH introduces a practical measure by allocating a minimum green time of 10 s to phase 4, a phase that was previously omitted in Scenario MPH, thereby ensuring that all four phases are executed.

4.3. Comparison of reinforcement learning models

While Max Pressure signal control is recognized as a sophisticated method for traffic signal control by determining the phase with maximum pressure, it nonetheless exhibits limitations in its simplistic approach towards the allocation of green times. Therefore, reinforcement learning models leveraging Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) are introduced (Scenarios RLD and RLP, respectively). In the comparative analysis between the two RL variants (see Fig. 6), the results demonstrate that Scenario RLP outperforms Scenario RLD across all indicators and time intervals. The superiority of Scenario RLP over Scenario RLD is best highlighted in the 08:00–09:00 interval, with 45.17% and 32.77% reductions in total delay and queue dissipation time, respectively.

Notably, while the green time allocation is dynamic and responsive to real-time traffic conditions, the sequence of the phases remains fixed and cyclic for both RL variants. Compared with the benchmark (Scenario C), the reduction in total delay and queue dissipation time for Scenario RLP is in the range of 46.1% to 55.5% and 32.6% to 48.8%, respectively. Further, comparing the results of Scenario RLP with the Max Pressure-based algorithms (Scenarios MP, MPH, and CMPH), it can be observed that reinforcement learning performs much better, which is aligned with the literature (Van der Pol & Oliehoek, 2016; Wei, Zheng, Yao, & Li, 2018). Though it was not recorded systematically and was not a primary objective of the present study, it has been observed that Scenario RLP has a much smaller computational burden than Scenario RLD.
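Although the study does not prescribe a particular implementation, the per-phase decision structure of Scenario RLP can be sketched as a small episodic environment whose action is the green time of the phase about to run. The sketch below uses the gymnasium interface and the stable-baselines3 PPO implementation as assumptions; the state, action discretization, reward, and stub dynamics are all illustrative, with SUMO/TraCI supplying the real dynamics in the study.

```python
# Sketch of a PPO agent that picks the green time for the next phase only,
# mirroring the per-phase decision structure described above. Dynamics are
# stubbed here; all shapes and constants are illustrative assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class PhaseTimingEnv(gym.Env):
    """State: queue (PCU) per approach; action: green time in {10, 15, ..., 60} s."""

    def __init__(self):
        self.observation_space = spaces.Box(0.0, 200.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(11)
        self.queues = np.zeros(4, dtype=np.float32)
        self.steps = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.queues = self.np_random.uniform(0, 40, size=4).astype(np.float32)
        self.steps = 0
        return self.queues.copy(), {}

    def step(self, action):
        green = 10.0 + 5.0 * int(action)
        served = int(np.argmax(self.queues))   # the max-pressure phase is served
        self.queues[served] = max(0.0, self.queues[served] - 0.5 * green)
        arrivals = self.np_random.uniform(0.0, 0.1, size=4).astype(np.float32)
        self.queues = np.clip(self.queues + arrivals * green, 0.0, 200.0)
        self.steps += 1
        reward = -float(self.queues.sum())     # penalize queues (delay proxy)
        return self.queues.copy(), reward, self.steps >= 60, False, {}

model = PPO("MlpPolicy", PhaseTimingEnv(), verbose=0)
model.learn(total_timesteps=50_000)
```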
Fig. 3. Comparison of queue lengths and delays for Max Pressure scenarios.
Fig. 4. Comparison of signal timing for Scenario C, MP, MPH, and CMPH at different time periods of the day. The three columns represent the time periods 08:00–09:00,
12:00–13:00 and 16:00–17:00, respectively.
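The acyclic, pressure-ranked selection visible in Figs. 4 and 5 can be summarized in a few lines. The sketch below assumes PCU-weighted queues in the spirit of Scenario MPH; the PCU factors are illustrative, not the calibrated values used in the study.

```python
# Acyclic, PCU-weighted max-pressure phase selection (sketch). The PCU
# factors below are illustrative placeholders.
PCU = {"car": 1.0, "bus": 3.0, "truck": 3.0, "motorcycle": 0.5, "autorickshaw": 1.2}

def pcu_count(vehicles):
    """Convert a list of vehicle classes into passenger car units."""
    return sum(PCU.get(v, 1.0) for v in vehicles)

def phase_pressure(movements):
    """Pressure of a phase: sum over its movements of (upstream - downstream)
    queue, both expressed in PCU."""
    return sum(pcu_count(up) - pcu_count(down) for up, down in movements)

def next_phase(phases):
    """Pick the phase with maximum pressure, ignoring any cyclic order."""
    return max(phases, key=lambda p: phase_pressure(phases[p]))

# phases maps a phase id to its movements: (upstream queue, downstream queue)
phases = {
    1: [(["car", "bus", "motorcycle"] * 4, ["car"])],
    2: [(["motorcycle", "car"], ["car", "car"])],
    3: [(["truck", "car", "autorickshaw"] * 3, [])],
    4: [([], ["car"])],
}
print(next_phase(phases))  # -> 1 for this toy input (highest PCU pressure)
```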
Table 5
Wilcoxon signed-rank test results.
                Average queue length (PCU)   Total delay (h)   Queue dissipation time (s)
08:00–09:00
Statistic (𝑊)   0.0            0.0            0.0
𝑝-value         1.863 × 10−9   1.863 × 10−9   1.863 × 10−9
Decision        reject 𝐻0      reject 𝐻0      reject 𝐻0
12:00–13:00
Statistic (𝑊)   0.0            0.0            0.0
𝑝-value         1.863 × 10−9   1.863 × 10−9   1.863 × 10−9
Decision        reject 𝐻0      reject 𝐻0      reject 𝐻0
16:00–17:00
Statistic (𝑊)   16.0           0.0            0.0
𝑝-value         3.148 × 10−7   1.863 × 10−9   1.863 × 10−9
Decision        reject 𝐻0      reject 𝐻0      reject 𝐻0

Fig. 5. Signal timings for Scenario MPH (evening period) for simulation period 100 s to 1000 s.
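Table 5's entries are consistent with an exact paired Wilcoxon signed-rank test over 30 paired runs: when all 30 differences favour one scenario (𝑊 = 0), the exact two-sided 𝑝-value is 2/2³⁰ ≈ 1.863 × 10−9. A minimal sketch with placeholder KPI arrays:

```python
# Exact paired Wilcoxon signed-rank test (sketch) over 30 runs. The arrays
# are placeholders standing in for the recorded per-run KPI values.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
kpi_rlp = rng.normal(5369.86, 150.0, size=30)  # e.g., total delay (h), RLP
kpi_h = rng.normal(2382.38, 72.0, size=30)     # e.g., total delay (h), H

# With every paired difference favouring Scenario H, W = 0 and the exact
# two-sided p-value equals 2 / 2**30 ~ 1.863e-9, as in Table 5. Recent SciPy
# uses the `method` keyword (older releases call it `mode`).
stat, p = wilcoxon(kpi_rlp, kpi_h, alternative="two-sided", method="exact")
print(stat, p)
```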
The Wilcoxon signed-rank test results (see Table 5) reveal a statistically significant difference (at a 99% confidence level) in the key performance indicators (i.e., average queue length, total delay, and queue dissipation time) of Scenarios RLP and H across all time intervals. This statistical testing confirms that the proposed algorithm significantly outperforms reinforcement learning with the proximal policy optimization algorithm alone.

4.5. Practical implementation

Some of the algorithms in the literature are computationally resource-intensive, which limits their applicability to the real world. To demonstrate the feasibility of real-world deployment, the proposed algorithm is embedded in a hardware setup (cf. Fig. 11) for an intersection, as proposed in Agarwal, Krishnan O., Ravi, and Saxena (2023). It is a wireless hardware architecture (cf. Fig. 10) in which every approach of an intersection has a processor and relay module setup (cf. Figs. 11(b) and 11(c)). A block diagram for one approach of an intersection is shown in Fig. 10. In this setup, the processor is connected to a camera, which captures images; these images are processed in real time to obtain the classified vehicle counts (Agarwal, Thombre et al., 2024). The data from all approaches of an intersection are stored on a cloud relational database server. The processor is also connected to the cloud computing node, where the proposed algorithm is embedded. The whole setup is tested under laboratory conditions using recorded videos of different approaches. The mean pre-processing, inference, and post-processing times are 12.92, 33.27, and 4.94 ms, respectively. Thus, the whole system works without any lag in the processing of the videos or delay in running the algorithm to obtain the phase sequence and optimized signal timings.

The proposed algorithm is deployed on a wireless hardware architecture, which would facilitate the retrofitting of an existing fixed/adaptive traffic signal control system. Retrofitting an existing system will enable efficient usage of the resources. In brief, this will not only reduce installation time and costs but also reduce the need for interrupting traffic, without any compromise in the efficiency of the whole system.

On the hardware side, the proposed algorithm is deployed in an embedded computer vision device that processes videos of the different approaches in real time. If the camera angle is changed by physical or climatic factors in a way that prevents the road (region of interest) from being in focus, the detection/count would be erroneous. Further, severe weather conditions may affect the accuracy of the object detection, tracking, and counting modules.

Practically, the system was tested satisfactorily. However, a few failures may occur, for instance, failure of communication between the server and the edge module, a processor crash, power failures, or future compatibility failures caused by software degradation due to dependency issues.

4.6. Sensitivity analysis

In order to test the robustness of the proposed algorithm, Scenario H is run 30 times to capture the variability in the performance indicators across different runs. The travel demand is fixed for the three time periods. The major source of stochasticity lies within the scenario itself; for instance, the initial action is chosen randomly, which may affect the direction of the learning path and eventually the phase timings.

Fig. 12 demonstrates the variation in average queue length, total delay, and queue dissipation time over the 30 simulation runs of Scenario H at different periods of the day. Further, Table 6 lists various measures of central tendency, 95% confidence intervals, statistical testing outcomes, and variations for average queue length, total delay, and queue dissipation time at different periods of the day.
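The variability measures reported in Table 6 can be obtained with a few lines. The sketch below assumes a t-based 95% confidence interval for the mean and uses placeholder data in lieu of the recorded runs.

```python
# Variability of a KPI over the 30 seeded runs (sketch): coefficient of
# variation and a t-based 95% confidence interval for the mean. The data
# below are placeholders for the recorded runs of Scenario H.
import numpy as np
from scipy import stats

total_delay = np.random.default_rng(1).normal(2438.725, 72.074, size=30)

mean = total_delay.mean()
sd = total_delay.std(ddof=1)      # sample standard deviation
cv = sd / mean                    # coefficient of variation
ci = stats.t.interval(0.95, df=len(total_delay) - 1,
                      loc=mean, scale=stats.sem(total_delay))
print(f"mean={mean:.3f}  CV={cv:.3f}  95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```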
Fig. 7. Comparison of queue lengths and delays for the proposed scenario.
Fig. 8. Comparison of signal timing for different scenarios and time periods of the day. The three columns represent the time periods 08:00–09:00, 12:00–13:00 and 16:00–17:00,
respectively.
The coefficient of variation for average queue length, total delay, and queue dissipation time is in the range of 0.008–0.019, 0.030–0.042, and 0.01–0.013, respectively. Similarly, the 95% confidence interval is very tight. Clearly, the variations are very low, which confirms the robustness of the proposed algorithm towards high performance. To further assess the sensitivity of the algorithm's output, statistical evaluations are conducted. Given that the ideal scenario anticipates consistent results across all 30 runs, the uniform distribution is adopted as the benchmark. Within this framework, two tests are particularly pertinent: the Kolmogorov–Smirnov (K-S) and the Cramér-von Mises (C-vM). The K-S test, a well-established methodology, compares the sample's empirical distribution function with the cumulative distribution function of the hypothesized uniform distribution. The test statistic, 𝐷, is defined as the maximum absolute difference between the empirical and theoretical distribution functions. On the other hand, the C-vM test assesses the integrated squared differences throughout the distribution, denoted as 𝜔2, offering a more comprehensive evaluation of the overall fit. For both tests, the null hypothesis (𝐻0) asserts that the sample originates from a uniformly distributed population, while the alternative hypothesis (𝐻1) contends that the sample does not derive from such a distribution. The results from both tests substantiate that the KPIs (i.e., average queue length, total delay, and queue dissipation time) demonstrate uniform behavior consistently across the 30 runs.
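Formally, the K-S statistic is 𝐷 = sup|𝐹ₙ(𝑥) − 𝐹(𝑥)| and the C-vM statistic integrates the squared discrepancy over the distribution. Both tests are available in SciPy; the sketch below tests placeholder KPI values against a uniform benchmark, where the choice of the reference interval (loc/scale) is an assumption, as the study does not report the fitted interval.

```python
# Uniformity checks (sketch) for the 30 per-run KPI values against a uniform
# reference spanning the observed range (an assumed loc/scale choice).
import numpy as np
from scipy.stats import kstest, cramervonmises, uniform

kpi = np.random.default_rng(2).uniform(2300.0, 2570.0, size=30)  # placeholder
ref = uniform(loc=kpi.min(), scale=kpi.max() - kpi.min())

ks = kstest(kpi, ref.cdf)            # statistic D = sup |F_n - F|
cvm = cramervonmises(kpi, ref.cdf)   # statistic omega^2
print(ks.statistic, ks.pvalue)
print(cvm.statistic, cvm.pvalue)
```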
Fig. 9. Comparison of rewards for scenarios RLP and H across different time periods.
Table 6
Descriptive statistics for Scenario H.
                           Average queue length (PCU)   Total delay (h)      Queue dissipation time (s)
08:00–09:00
Mean                       17.706             2438.725             11 516.90
95% confidence interval    [17.63, 17.78]     [2411.35, 2466.10]   [11473.55, 11560.25]
Median                     17.705             2445.201             11 508.50
Standard deviation         0.2                72.074               114.14
Coefficient of variation   0.011              0.030                0.01
K-S test statistic (𝐷)     0.217              0.170                0.211
K-S test 𝑝-value           0.103              0.314                0.119
K-S test decision          cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
C-vM test statistic (𝜔2)   0.247              0.177                0.230
C-vM test 𝑝-value          0.192              0.317                0.216
C-vM test decision         cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
12:00–13:00
Mean                       24.367             1259.035             7349.70
95% confidence interval    [24.29, 24.44]     [1241.91, 1276.16]   [7313.14, 7386.26]
Median                     24.342             1254.155             7373
Standard deviation         0.201              45.101               96.259
Coefficient of variation   0.008              0.036                0.013
K-S test statistic (𝐷)     0.137              0.119                0.207
K-S test 𝑝-value           0.575              0.746                0.131
K-S test decision          cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
C-vM test statistic (𝜔2)   0.146              0.097                0.268
C-vM test 𝑝-value          0.403              0.602                0.167
C-vM test decision         cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
16:00–17:00
Mean                       15.087             656.578              7366.367
95% confidence interval    [14.98, 15.20]     [646.17, 666.98]     [7329.93, 7402.81]
Median                     15.123             655.621              7382
Standard deviation         0.291              27.401               95.945
Coefficient of variation   0.019              0.042                0.013
K-S test statistic (𝐷)     0.128              0.230                0.227
K-S test 𝑝-value           0.667              0.072                0.078
K-S test decision          cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
C-vM test statistic (𝜔2)   0.101              0.314                0.227
C-vM test 𝑝-value          0.582              0.123                0.221
C-vM test decision         cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
5. Discussion

The present study proposes a hybrid algorithm to determine the phase sequence and timing. For the phase sequence, Max Pressure control is used, which is determined using the classified traffic volume. Further, the phase timing is given by the reinforcement learning algorithm using proximal policy optimization. The study compares the outcome of the proposed algorithm with algorithms commonly used in practice (e.g., actuated, CoSiCoSt), standard Max Pressure, and Reinforcement Learning algorithms. The proposed algorithm demonstrates considerable promise in diminishing total delay and enhancing queue dissipation with respect to different categories of approaches, such as actuated, CoSiCoSt, Max Pressure, and Reinforcement Learning. A reduction in total delay inherently suggests a decrease in the total travel time. Less time spent at traffic intersections leads to lower vehicular emissions, fuel consumption, and noise pollution, thereby positing a potentially significant positive impact on the environment. Additionally, it assists in reducing the negative effects on the health of travelers. This happens due to a direct reduction in exposure duration.
Table 7
Performance comparison of Scenarios SAC and GNN with Scenario H.
Traffic signal Avg. queue Max. queue Total Max. Queue dissipation
control length (PCU) length (PCU) delay (h) delay (s) time (s)
08:00–09:00
Scenario SAC 19.24 34.5 5056.14 3469 13 116
Scenario GNN 19.52 34.6 3820.94 4051 12 617
Scenario H 17.88 30.5 2382.38 2608 11 392
12:00–13:00
Scenario SAC 22.36 40.6 3492.34 2737 9832
Scenario GNN 23.82 38.6 1988.11 2396 8678
Scenario H 24.12 37.6 1191.80 1460 7240
16:00–17:00
Scenario SAC 13.48 33.1 2355.65 2444 11 769
Scenario GNN 16.14 35.6 2195.26 2324 10 198
Scenario H 14.83 28.6 640.15 1113 7387
In terms of queue dissipation time, Scenario H achieves reductions between 13.14% and 37.23% against Scenario SAC, and between 9.71% and 27.56% in comparison to Scenario GNN. These quantitative improvements are further substantiated in Fig. 13. Further, it is noteworthy that while Scenario GNN exemplifies a complex reinforcement learning architecture, it is characterized by significant computational demands. In contrast, Scenario SAC, although computationally efficient, does not perform as well as Scenario H in terms of the KPIs.
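These ranges can be reproduced from the queue dissipation times in Table 7, assuming the convention reduction = (other − H)/other:

```python
# Reproducing the quoted ranges from Table 7 (queue dissipation time, s),
# assuming: reduction = (other - H) / other * 100.
dissipation = {  # (SAC, GNN, H) per period
    "08:00-09:00": (13116, 12617, 11392),
    "12:00-13:00": (9832, 8678, 7240),
    "16:00-17:00": (11769, 10198, 7387),
}
for period, (sac, gnn, h) in dissipation.items():
    print(period,
          f"vs SAC: {(sac - h) / sac * 100:.2f}%",   # 13.14 ... 37.23
          f"vs GNN: {(gnn - h) / gnn * 100:.2f}%")   # 9.71 ... 27.56
```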
Fig. 13. Comparison of queue lengths and delays for the proposed scenario with respect to existing studies.
for average queue length, total delay, and queue dissipation time. Furthermore, the statistical test confirms the algorithm's robustness, demonstrating that the KPIs were consistent across the 30 simulation runs. In addition to traditional benchmarks, our study also explored comparisons with recent approaches in the literature, such as the Soft Actor-Critic and heterogeneous graph neural networks. The comparative analysis unveiled the enhanced performance of the proposed algorithm. After showcasing the effectiveness of the proposed approach through simulations, the practical implementation of these advanced technologies necessitates thoughtful consideration of potential real-world challenges. Thus, the proposed algorithm is deployed on a wireless hardware architecture to facilitate seamless integration into traffic signal control systems. An exciting avenue for further enhancing our traffic signal control strategy involves the integration of vehicle prioritization strategies. This study presents the opportunity to integrate priority levels for specific types of vehicles, such as buses, within the traffic management framework. This approach can be particularly beneficial in urban areas where public transport often faces significant delays. Moreover, the concept of vehicle prioritization can be extended by considering vehicle occupancy rates. Transitioning from a vehicle-centric to a person-centric delay system allows for a more holistic evaluation of traffic efficiency.

CRediT authorship contribution statement

Amit Agarwal: Conceptualization, Methodology, Writing – original draft, Supervision. Deorishabh Sahu: Writing – original draft, Methodology, Data analysis. Rishabh Mohata: RL model formulation, Analysis. Kuldeep Jeengar: RL model formulation, Simulation, Data analysis. Anuj Nautiyal: Simulation, Data analysis, Writing – original draft. Dhish Kumar Saxena: Conceptualization, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

References

Abdulhai, B., Pringle, R., & Karakoulas, G. J. (2003). Reinforcement learning for true adaptive traffic signal control. Journal of Transportation Engineering, 129(3), 278–285. http://dx.doi.org/10.1061/(asce)0733-947x(2003)129:3(278).
Agarwal, A., & Kaddoura, I. (2019). On-road air pollution exposure to cyclists in an agent-based simulation framework. Periodica Polytechnica Transportation Engineering, 48(2), 117–125. http://dx.doi.org/10.3311/PPtr.12661.
Agarwal, A., Krishnan O., K., Ravi, D. K., & Saxena, D. K. (2023). Wireless edge computing-based adaptive traffic control system with real-time vehicle tracking and cloud integration. 202311079549.
Agarwal, A., & Lämmel, G. (2016). Modeling seepage behavior of smaller vehicles in mixed traffic conditions using an agent based simulation. Transportation in Developing Economies, 2(8), http://dx.doi.org/10.1007/s40890-016-0014-9.
Agarwal, A., Sahu, D., Nautiyal, A., Agarwal, P., & Gupta, M. (2024). Fusing crowdsourced data to an adaptive wireless traffic signal control system architecture. Internet of Things, http://dx.doi.org/10.1016/j.iot.2024.101169.
Agarwal, A., Thombre, A., Kedia, K., & Ghosh, I. (2024). ITD: Indian traffic dataset for intelligent transportation systems. In 2024 16th international conference on COMmunication systems & NETworkS (pp. 842–850). http://dx.doi.org/10.1109/COMSNETS59351.2024.10427394.
Anderson, L., Pumir, T., Triantafyllos, D., & Bayen, A. M. (2018). Stability and implementation of a cycle-based max pressure controller for signalized traffic networks. Networks and Heterogeneous Media, 13, http://dx.doi.org/10.3934/nhm.2018011.
Anupriya, Bansal, P., & Graham, D. J. (2023). Congestion in cities: Can road capacity expansions provide a solution? Transportation Research Part A: Policy and Practice, 174, Article 103726. http://dx.doi.org/10.1016/j.tra.2023.103726.
Bickel, P. J., Chen, C., Kwon, J., Rice, J., van Zwet, E., & Varaiya, P. (2007). Measuring traffic. Statistical Science, 22(4), http://dx.doi.org/10.1214/07-sts238.
Boukerche, A., Zhong, D., & Sun, P. (2022). A novel reinforcement learning-based cooperative traffic signal system through max-pressure control. IEEE Transactions on Vehicular Technology, 71(2), 1187–1198. http://dx.doi.org/10.1109/tvt.2021.3069921.
Bouktif, S., Cheniki, A., & Ouni, A. (2021). Traffic signal control using hybrid action space deep reinforcement learning. Sensors, 21(7), 2302. http://dx.doi.org/10.3390/s21072302.
CRRI (2017). Indian highway capacity manual (Indo-HCM). Central Road Research Institute. URL https://crridom.gov.in/indian-highway-capacity-manual.
Dixit, V., Nair, D. J., Chand, S., & Levin, M. W. (2020). A simple crowdsourced delay-based traffic signal control. PLoS ONE, 15(4), http://dx.doi.org/10.1371/journal.pone.0230598.
Ducrocq, R., & Farhi, N. (2023). Deep reinforcement Q-learning for intelligent traffic signal control with partial detection. International Journal of Intelligent Transportation Systems Research, 21(1), 192–206. http://dx.doi.org/10.1007/s13177-023-00346-4.
Eom, M., & Kim, B.-I. (2020). The traffic signal control problem for intersections: A review. European Transport Research Review, 12(1), http://dx.doi.org/10.1186/s12544-020-00440-8.
Ghosh, T., Anusha, S., Babu, A., & Vanajakshi, L. D. (2023). Performance evaluation of a dynamic signal control system for mixed traffic conditions using sparse data. Transportation Research Record: Journal of the Transportation Research Board, 2677(10), 797–807. http://dx.doi.org/10.1177/03611981231163770.
Gregoire, J., Frazzoli, E., de La Fortelle, A., & Wongpiromsarn, T. (2014). Back-pressure traffic signal control with unknown routing rates. http://dx.doi.org/10.48550/ARXIV.1401.3357.
Huang, L., & Qu, X. (2020). Improving traffic signal control operations using proximal policy optimization. IET Intelligent Transport Systems, 14(12), 1572–1580.
Jia, H., Lin, Y., Luo, Q., Li, Y., & Miao, H. (2019). Multi-objective optimization of urban road intersection signal timing based on particle swarm optimization algorithm. Advances in Mechanical Engineering, 11(4), Article 168781401984249. http://dx.doi.org/10.1177/1687814019842498.
Kouvelas, A., Lioris, J., Fayazi, S. A., & Varaiya, P. (2014). Maximum pressure controller for stabilizing queues in signalized arterial networks. Transportation Research Record: Journal of the Transportation Research Board, 2421(1), 133–141. http://dx.doi.org/10.3141/2421-15.
Kuyer, L., Whiteson, S., Bakker, B., & Vlassis, N. (2008). Multiagent reinforcement learning for urban traffic control using coordination graphs. In Machine learning and knowledge discovery in databases: European conference, ECML PKDD 2008, Antwerp, Belgium, September 15-19, 2008, proceedings, part I (pp. 656–671). Springer.
Le, T., Vu, H. L., Walton, N., Hoogendoorn, S. P., Kovács, P., & Queija, R. N. (2017). Utility optimization framework for a distributed traffic control of urban road networks. Transportation Research, Part B (Methodological), 105, 539–558. http://dx.doi.org/10.1016/j.trb.2017.10.004.
Levin, M. W., Barman, S., Robbennolt, J., Hu, J., Odell, M., & Kang, D. (2022). Towards implementation of Max-Pressure signal timing on Minnesota roads: Technical report. Minnesota Department of Transportation. URL https://hdl.handle.net/20.500.14153/mndot.3902.
Levin, M. W., Hu, J., & Odell, M. (2020). Max pressure signal control with cyclical phase structure. Transportation Research Part C (Emerging Technologies), 120, Article 102828. http://dx.doi.org/10.1016/j.trc.2020.102828.
Liu, H., & Gayah, V. V. (2022). A novel Max pressure algorithm based on traffic delay. Transportation Research Part C (Emerging Technologies), 143, Article 103803. http://dx.doi.org/10.1016/j.trc.2022.103803.
Mannion, P., et al. (2016). Exploring novel approaches in reinforcement learning. International Journal of Machine Learning, 22(5), 345–359.
Mao, F., Li, Z., & Li, L. (2022). A comparison of deep reinforcement learning models for isolated traffic signal control. IEEE Intelligent Transportation Systems Magazine, 15(1), 160–180.
Maripini, H., Vanajakshi, L., & Chilukuri, B. R. (2022). Optimal signal control design for isolated intersections using sample travel-time data. Journal of Advanced Transportation, 2022, 1–16. http://dx.doi.org/10.1155/2022/7310250.
Maripini, H., Vanajakshi, L., & Chilukuri, B. R. (2024). A probe-based demand responsive signal control for isolated intersections under mixed traffic conditions. Transportation Letters, 1–14. http://dx.doi.org/10.1080/19427867.2022.2164613.
Mercader, P., Uwayid, W., & Haddad, J. (2020). Max-pressure traffic controller based on travel times: An experimental analysis. Transportation Research Part C (Emerging Technologies), 110, 275–290. http://dx.doi.org/10.1016/j.trc.2019.10.002.
Mirchandani, P., & Head, L. (2001). A real-time traffic signal control system: Architecture, algorithms, and analysis. Transportation Research Part C (Emerging Technologies), 9(6), 415–432. http://dx.doi.org/10.1016/s0968-090x(00)00047-4.
Mondal, S., Arya, V. K., & Gupta, A. (2022). An optimised approach for saturation flow estimation of signalised intersections. Proceedings of the Institution of Civil Engineers - Transport, 175(3), 137–149. http://dx.doi.org/10.1680/jtran.18.00206.
Muralidharan, V., Ravikumar, P., & Nitin, S. (2010). A method for synchronizing heterogeneous road traffic and system thereof. Google Patents, IN Patent IN239258B.
Noaeen, M., Naik, A., Goodman, L., Crebo, J., Abrar, T., Abad, Z. S. H., et al. (2022). Reinforcement learning in urban network traffic signal control: A systematic literature review. Expert Systems with Applications, 199, Article 116830. http://dx.doi.org/10.1016/j.eswa.2022.116830.
Padinjarapat, R. K., & Mathew, T. V. (2020). Estimation of saturation flow for non-lane based mixed traffic streams. Transportmetrica B: Transport Dynamics, 9(1), 42–61. http://dx.doi.org/10.1080/21680566.2020.1781708.
Park, S., Han, E., Park, S., Jeong, H., & Yun, I. (2021). Deep Q-network-based traffic signal control models. PLoS ONE, 16(9), Article e0256405.
Patel, A., Mathew, T. V., & Venkateswaran, J. (2016). Real-time adaptive signal controller for non-lane following heterogeneous road traffic. In 2016 8th international conference on communication systems and networks (pp. 1–6). IEEE. http://dx.doi.org/10.1109/comsnets.2016.7439944.
Prashanth, L., & Bhatnagar, S. (2011). Reinforcement learning with average cost for adaptive control of traffic lights at intersections. In 2011 14th international IEEE conference on intelligent transportation systems (pp. 1640–1645). IEEE.
Pumir, T., Anderson, L., Triantafyllos, D., & Bayen, A. M. (2015). Stability of modified max pressure controller with application to signalized traffic networks. In 2015 American control conference (pp. 1879–1886). http://dx.doi.org/10.1109/ACC.2015.7171007.
Radhakrishnan, P., & Mathew, T. V. (2011). Passenger car units and saturation flow models for highly heterogeneous traffic at urban signalised intersections. Transportmetrica, 7(2), 141–162. http://dx.doi.org/10.1080/18128600903351001.
Robertson, D., & Bretherton, R. (1991). Optimizing networks of traffic signals in real time - the SCOOT method. IEEE Transactions on Vehicular Technology, 40(1), 11–15. http://dx.doi.org/10.1109/25.69966.
Saiki, T., & Arai, S. (2023). Flexible traffic signal control via multi-objective reinforcement learning. IEEE Access, 11, 75875–75883. http://dx.doi.org/10.1109/access.2023.3296537.
Saikrishna, C. A., & Anusha, S. (2021). Vehicle actuated signal control system for mixed traffic conditions. In Conference of transportation research group of India (pp. 397–411). Springer.
Saxena, D. K., Mittal, S., Kapoor, S., & Deb, K. (2023). A localized high-fidelity-dominance-based many-objective evolutionary algorithm. IEEE Transactions on Evolutionary Computation, 27(4), 923–937. http://dx.doi.org/10.1109/TEVC.2022.3188064.
Sims, A. G., & Dobinson, K. W. (1980). The Sydney coordinated adaptive traffic (SCAT) system philosophy and benefits. IEEE Transactions on Vehicular Technology, 29, 130–137. URL https://api.semanticscholar.org/CorpusID:26889615.
Singh, V., Meena, K. K., & Agarwal, A. (2021). Travellers' exposure to air pollution: A systematic review and future directions. Urban Climate, 38, http://dx.doi.org/10.1016/j.uclim.2021.100901.
Studer, L., & Ketabdari, M. (2015). Analysis of adaptive traffic control systems design of a decision support system for better choices. Journal of Civil & Environmental Engineering, 05(06), http://dx.doi.org/10.4172/2165-784x.1000195.
SUMO (2024). Simulation of urban mobility. URL https://eclipse.dev/sumo/.
Sun, X., & Yin, Y. (2018). A simulation study on max pressure control of signalized intersections. Transportation Research Record: Journal of the Transportation Research Board, 2672(18), 117–127. http://dx.doi.org/10.1177/0361198118786840.
Tassiulas, L., & Ephremides, A. (1992). Jointly optimal routing and scheduling in packet radio networks. IEEE Transactions on Information Theory, 38(1), 165–168. http://dx.doi.org/10.1109/18.108264.
TraCI (2024). Traffic control interface. URL https://sumo.dlr.de/docs/TraCI.html.
Van der Pol, E., & Oliehoek, F. A. (2016). Coordinated deep reinforcement learners for traffic light control. In Proceedings of learning, inference and control of multi-agent systems: vol. 8 (pp. 21–38).
Varaiya, P. (2013a). Max pressure control of a network of signalized intersections. Transportation Research Part C (Emerging Technologies), 36, 177–195. http://dx.doi.org/10.1016/j.trc.2013.08.014.
Varaiya, P. (2013b). The max-pressure controller for arbitrary networks of signalized intersections. In Complex networks and dynamic systems (pp. 27–66). Springer New York. http://dx.doi.org/10.1007/978-1-4614-6243-9_2.
Verghese, V., Chenhui, L., Subramanian, S. C., Vanajakshi, L., & Sharma, A. (2017). Development and implementation of a model-based road traffic-control scheme. Journal of Computing in Civil Engineering, 31(3), http://dx.doi.org/10.1061/(asce)cp.1943-5487.0000635.
Wang, F., Tang, K., Li, K., Liu, Z., & Zhu, L. (2019). A group-based signal timing optimization model considering safety for signalized intersections with mixed traffic flows. Journal of Advanced Transportation, 2019, 1–13. http://dx.doi.org/10.1155/2019/2747569.
Wang, X., Yin, Y., Feng, Y., & Liu, H. X. (2022). Learning the max pressure control for urban traffic networks considering the phase switching loss. Transportation Research Part C (Emerging Technologies), 140, Article 103670. http://dx.doi.org/10.1016/j.trc.2022.103670.
Wei, H., Chen, C., Zheng, G., Wu, K., Gayah, V., Xu, K., et al. (2019). PressLight: Learning max pressure control to coordinate traffic signals in arterial network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 1290–1298). New York, NY, USA: Association for Computing Machinery. http://dx.doi.org/10.1145/3292500.3330949.
Wei, H., Zheng, G., Yao, H., & Li, Z. (2018). IntelliLight: A reinforcement learning approach for intelligent traffic light control. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 2496–2505). New York, NY, USA: Association for Computing Machinery. http://dx.doi.org/10.1145/3219819.3220096.
Wei, L., Gao, L., Yang, J., & Li, J. (2023). A reinforcement learning traffic signal control method based on traffic intensity analysis. In 2023 42nd Chinese control conference. IEEE. http://dx.doi.org/10.23919/ccc58697.2023.10240019.
Wiering, M. A., et al. (2000). Multi-agent reinforcement learning for traffic light control. In Machine learning: Proceedings of the seventeenth international conference (pp. 1151–1158).
Wongpiromsarn, T., Uthaicharoenpong, T., Wang, Y., Frazzoli, E., & Wang, D. (2012). Distributed traffic signal control for maximum network throughput. In 2012 15th international IEEE conference on intelligent transportation systems (pp. 588–595). http://dx.doi.org/10.1109/ITSC.2012.6338817.
Yazdani, M., Sarvi, M., Asadi Bagloee, S., Nassir, N., Price, J., & Parineh, H. (2023). Intelligent vehicle pedestrian light (IVPL): A deep reinforcement learning approach for traffic signal control. Transportation Research Part C (Emerging Technologies), 149, Article 103991. http://dx.doi.org/10.1016/j.trc.2022.103991.
Yue, W., Li, C., Chen, Y., Duan, P., & Mao, G. (2022). What is the root cause of congestion in urban traffic networks: Road infrastructure or signal control? IEEE Transactions on Intelligent Transportation Systems, 23(7), 8662–8679. http://dx.doi.org/10.1109/TITS.2021.3085021.
Zhang, G., Chang, F., Jin, J., Yang, F., & Huang, H. (2024). Multi-objective deep reinforcement learning approach for adaptive traffic signal control system with concurrent optimization of safety, efficiency, and decarbonization at intersections. Accident Analysis and Prevention, 199, Article 107451. http://dx.doi.org/10.1016/j.aap.2023.107451.
Zhao, W., Ye, Y., Ding, J., Wang, T., Wei, T., & Chen, M. (2022). IPDALight: Intensity- and phase duration-aware traffic signal control based on reinforcement learning. Journal of Systems Architecture, 123, Article 102374. http://dx.doi.org/10.1016/j.sysarc.2021.102374.
Zhao, Z., Wang, K., Wang, Y., & Liang, X. (2024). Enhancing traffic signal control with composite deep intelligence. Expert Systems with Applications, 244, Article 123020.