Expert Systems With Applications
Dynamic traffic signal control for heterogeneous traffic conditions using Max Pressure and Reinforcement Learning✩,✩✩

Amit Agarwal a,∗, Deorishabh Sahu a, Rishabh Mohata a, Kuldeep Jeengar b, Anuj Nautiyal a, Dhish Kumar Saxena b

a Department of Civil Engineering, Indian Institute of Technology Roorkee, Haridwar, Uttarakhand, 247667, India
b Department of Mechanical and Industrial Engineering, Indian Institute of Technology Roorkee, Haridwar, Uttarakhand, 247667, India
Keywords: Adaptive Traffic Signal Control; Max Pressure; Mixed traffic; Reinforcement Learning

Abstract: Optimization of green signal timing for each phase at a signalized intersection in an urban area is critical for efficacious traffic management and congestion mitigation. Many algorithms have been developed, yet very few target cities in developing nations, where traffic is characterized by its heterogeneous nature. While some recent studies have explored different variants of Max Pressure (MP) and Reinforcement Learning (RL) for optimizing phase timing, their focus is limited to homogeneous traffic conditions. In developing nations, such as India, control systems like fixed and actuated are still predominantly used in practice. The Composite Signal Control Strategy (CoSiCoSt) is also employed at a few intersections. However, there is a notable absence of advanced models addressing heterogeneous traffic behavior, which have great potential to reduce delays and queue lengths. The present study proposes a hybrid algorithm for an adaptive traffic control system for real-world heterogeneous traffic conditions. The proposed algorithm integrates Max Pressure with Reinforcement Learning. The former dynamically determines the phase order by performing pressure calculations for each phase. The latter optimizes the timing of each phase to minimize delays and queue lengths using the proximal policy optimization algorithm. In contrast to past RL models, in which the phase timing is determined for all phases at once, in the proposed algorithm the phase timings are determined after the execution of every phase. To assess the impact, classified traffic volume is extracted from surveillance videos of an intersection in Ludhiana, Punjab, and simulated using Simulation of Urban Mobility (SUMO). Traffic volume data is collected over three distinct time periods of the day. The results of the proposed algorithm are compared with benchmark algorithms, such as Actuated, CoSiCoSt, acyclic & cyclic Max Pressure, and Reinforcement Learning-based algorithms. To assess the performance, queue length, delay, and queue dissipation time are considered as key performance indicators. Of actuated and CoSiCoSt, the latter performs better, and thus the performance of the proposed hybrid algorithm is compared with CoSiCoSt. The proposed algorithm effectively reduces total delay and queue dissipation time in the range of 77.07%–87.66% and 53.95%–62.07%, respectively. Similarly, with respect to the best-performing RL model, the drop in delay and queue dissipation time ranges from 55.63% to 77.12% and from 22.13% to 43.7%, respectively, which is significant at the 99% confidence level. The proposed algorithm is deployed on a wireless hardware architecture to confirm the feasibility of real-world implementation. The findings highlight the algorithm's potential as an efficient solution for queues and delays at signalized intersections where mixed traffic conditions prevail.
✩ This work was supported by the IDEAS-TIH ISI Kolkata (Grant No. - ISI-1970-MID) under the National Mission on Interdisciplinary Cyber-Physical System
(NM-ICPS) of the Department of Science and Technology, Government of India.
✩✩ The authors also wish to thank the Punjab Police (Traffic Wing), Traffic Research Center, Mohali, and Safety Alliance for Everyone Society, Mohali, for their
support in facilitating the data.
∗ Corresponding author.
E-mail addresses: [email protected] (A. Agarwal), [email protected] (D. Sahu), [email protected] (R. Mohata), [email protected]
(K. Jeengar), [email protected] (A. Nautiyal), [email protected] (D.K. Saxena).
URL: https://faculty.iitr.ac.in/~amitfce/ (A. Agarwal).
https://doi.org/10.1016/j.eswa.2024.124416
Received 7 March 2024; Received in revised form 14 May 2024; Accepted 3 June 2024
Available online 6 June 2024
0957-4174/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Table 1
A list of past methods for adaptive traffic signal control systems.

Study | Methodology/model | Study area | Mixed traffic | Benchmark | Performance
Wongpiromsarn, Uthaicharoenpong, Wang, Frazzoli, and Wang (2012) | MP | Toy scenario | – | SCATS | queue lengths ↓
Varaiya (2013a) | MP | Toy scenario | – | ACS | better stabilization
Gregoire, Frazzoli, de La Fortelle, and Wongpiromsarn (2014) | MP | Toy scenario | – | O-BP | performs well under heavy load; performance gap of 10%–20%
Kouvelas, Lioris, Fayazi, and Varaiya (2014) | MP | Los Angeles, California | – | ACS | queue stabilization
Pumir, Anderson, Triantafyllos, and Bayen (2015) | MP | San Diego, California | – | FCS, ACS | queue stabilization
Le et al. (2017) | MP | Toy scenario | – | BP | network performance ↑, congestion ↓
Anderson, Pumir, Triantafyllos, and Bayen (2018) | MP-C | San Diego, California | – | ACS | demonstrated better control stability and system performance under defined demand limits
Sun and Yin (2018) | MP | Toy scenario | – | A-MP, C-MP, ACS | original non-cyclic MP outperformed others, especially under high-demand scenarios
Dixit, Nair, Chand, and Levin (2020) | MP | Thane, Noida, India; Bandung, Indonesia | – | FCS | delay ↓ by 12%–30%
Levin, Hu, and Odell (2020) | MP | Austin, Texas | – | MP (w/o cycle constraints) | slightly worse throughput due to cyclical constraints but offers improved predictability and acceptance in practice
Mercader, Uwayid, and Haddad (2020) | MP | Jerusalem, Israel | – | FCS | queue stabilization
Levin, Barman, Robbennolt, Hu, Odell, and Kang (2022) | MP | Hennepin County, Minnesota | – | ACS | delay ↓
Liu and Gayah (2022) | D-MP | Toy scenario | – | V-MP | delay ↓
Agarwal, Sahu, Nautiyal, Agarwal and Gupta (2024) | MP-C | Ludhiana, India | Yes | ACS, MP | delay ↓ by 22.96%–54.8%; queue dissipation time ↓ by 5.45%–8.32%
Abdulhai, Pringle, and Karakoulas (2003) | RL (Q-Learning) | Toy scenario | – | FCS | delay ↓ by 56%–62%
Prashanth and Bhatnagar (2011) | RL-Q, PGA | Toy scenario | – | FCS | PGA > FCS
Patel, Mathew, and Venkateswaran (2016) | Optimization model | Toy scenario | Yes | ACS | delay ↓ by 35.5%–38.8%
Van der Pol and Oliehoek (2016) | RL (DQN with transfer planning) | Toy scenario | – | Wiering et al. (2000); Kuyer, Whiteson, Bakker, and Vlassis (2008) | DQN > TTLC in terms of TT ↓
Wei et al. (2019) | MP + RL (DQN) | State College, USA; Jinan, China; New York City, USA | – | FCS, GreenWave, MP, LIT, GRL | travel time ↓
Dixit et al. (2020) | D-MP | Noida, India; Thane, India; Bandung, Indonesia | Yes | FCS | delay ↓ by 12%–30%
Huang and Qu (2020) | RL (PPO + LSTM networks) | Toy scenario | – | ACS, SCOOTS | queue ↓, congestion ↓
Bouktif, Cheniki, and Ouni (2021) | RL + P-DQN | Toy scenario | – | FCS, DDPG | TT ↓
Wang, Yin, Feng, and Liu (2022) | MP + RL (PPO) | Ann Arbor, Michigan | – | Traditional MP | delay ↓
Maripini, Vanajakshi, and Chilukuri (2022) | ATCS (TT) | Toy scenario | Yes | Webster method | delay ↓ by 10.41%–11.78%
Zhao et al. (2022) | IPDALight | Jinan and Hangzhou, China | – | FCS, PressLight, CoLight | travel time ↓ by 20%
Boukerche, Zhong, and Sun (2022) | RL | Toy scenario | – | T-RL (w/o delays) | queue ↓ by 30%, wait time ↓ by 25%
Table 1 (continued).

Study | Methodology/model | Study area | Mixed traffic | Benchmark | Performance
Mao, Li, and Li (2022) | RL-PPO, SAC, DQN, DDQN, QR (DQN) | Hangzhou, China | – | Traditional MP | SAC > (DRL, MP)
Ghosh, Anusha, Babu, and Vanajakshi (2023) | ATCS (RFID) | Trivandrum, Kerala, India | Yes | FCS | delay ↓ by 12.6%
Maripini et al. (2024) | ATCS (TT) | Chennai, India | Yes | FCS | delay ↓ by 15.42%
Wei, Gao, Yang, and Li (2023) | DDPG | Toy scenario | – | FCS, DQN-TSC | queue ↓ by 40%, delay ↓ by 30%
Zhao, Wang, Wang, and Liang (2024) | RL (GNN + DDQN + Dueling DQN) | Hangzhou, China | – | FCS; DQN; Double Deep Q-Network (DDQN); DDQN enhanced with Prioritized Replay Sampling (DQN-PER) | improvement up to 13% for average reward, delay, queue length, and waiting time
Saiki and Arai (2023) | MORL | Toy scenario | – | MP and single-objective RL | travel time ↓
Yazdani et al. (2023) | DDQN | Melbourne, Victoria, Australia | – | Actuated | reduced total delays
Ducrocq and Farhi (2023) | D3QN | Toy scenario | – | MP and SOTL | reduced total delays
Zhang, Chang, Jin, Yang, and Huang (2024) | D3QN | Changsha, China | – | Fixed, Actuated, SCOOT | 16% ↓ traffic conflicts, 4% ↓ carbon emissions, 18% ↓ waiting time compared to traditional ATSC, although with a slight increase (0.64%) in waiting time compared to the efficiency-only DRL model

Models: Max Pressure: MP; Max Pressure based on crowd-sourced data: MP-C; Reinforcement learning: RL; Reinforcement learning using Q-learning: RL-Q; Traditional reinforcement learning: T-RL; Policy gradient actor–critic algorithm: PGA; Soft Actor–Critic: SAC; Multi-Objective Reinforcement Learning: MORL; Delay-based Max Pressure: D-MP; Parameterized Deep Q-Networks: P-DQN; Deep Deterministic Policy Gradient: DDPG; Optimal Back Pressure: O-BP; Quantile Regression DQN: QR (DQN); Dueling Double Deep Q-Network: D3QN; Double Deep Q-Network: DDQN.
Benchmarks: Non-cyclic or acyclic MP: A-MP; Variants of Max Pressure: V-MP; Cyclic MP: C-MP; Adaptive traffic control system: ATCS; Travel time: TT; Traditional traffic light control algorithms: TTLC; Sydney Coordinated Adaptive Traffic System: SCATS; Split Cycle Offset Optimisation Technique: SCOOT; Actuated control system: ACS; Fixed control system: FCS.
Others: Bluetooth data: BD; Crowdsourced data: CD; Radio Frequency Identification: RFID.
whose input is based on travel times instead of queue lengths. The authors consider this methodology advantageous as travel times are easier to estimate, and a controller based on travel times has the property of accounting for each link's finite capacity. Similarly, Agarwal, Sahu et al. (2024) introduced a wireless architecture for traffic signal control utilizing crowdsourced data, where the green signal timing is determined using a travel time-based Max Pressure control algorithm. Moreover, Liu and Gayah (2022) proposed a Max Pressure model based on travel delays, which inherits the desirable maximum-stability feature of queue-based Max Pressure frameworks while addressing their limitations, such as challenges in estimation, simplified vehicle depiction, and arbitrary delays at intersections with low demand. Dixit et al. (2020) proposed a novel traffic control system that uses only real-time delay data, eliminating the need for traffic volume or queue length information. The approaches proposed by Agarwal, Sahu et al. (2024), Dixit et al. (2020) simplify signal control, avoiding the installation and upkeep of multiple detectors/sensors (intrusive/non-intrusive) at each approach of an intersection, a common requirement in traditional methods like Max Pressure control.

The initial Max Pressure signal control suffered from an acyclic nature, in which a phase may be skipped multiple times due to the Max Pressure calculation. This creates issues in providing green signals for pedestrians in urban areas. The problem is aggravated if one of the approaches to the intersection is a minor road with low volume. The acyclic Max Pressure control algorithm was revised to include a cyclical phase structure with a defined maximum cycle length (Levin et al., 2020). The modified Max Pressure controller ensures that a predetermined sequence of phases is activated, with each phase being activated at least once within a cycle. Sun and Yin (2018) conducted a comparison between Max Pressure control and coordinated actuated traffic signals in VISSIM. As expected, the additional constraints of the cyclic structure degrade performance with respect to acyclic Max Pressure.

In the previous studies that have built upon Varaiya (2013a), the actuation of phases has often been done purely based on the pressure calculation, lacking the concept of a signal cycle. However, Le et al. (2017) introduced a different approach where the time step was defined as one signal cycle, in contrast to selecting one phase per time step as commonly done in earlier studies. Their model did not impose a minimum green time allocation for each phase, which could limit its effectiveness due to factors such as startup delays. Similarly, Anderson et al. (2018), Pumir et al. (2015) also employed a methodology where one phase was activated per time step. They also mandated that each phase be activated at least once throughout a signal cycle and for a minimum duration. Most of the studies have used an overly simplistic approach for determining the green signal timing for each phase using a constant cycle time, which potentially limits their effectiveness in complex traffic scenarios. Thus, the present study explores and introduces Reinforcement Learning (RL) to compute the green signal timing. The idea behind the integration of RL is to make the green timing and, consequently, the cycle time dynamic.

1.3. Reinforcement learning

The application of Reinforcement Learning (RL) in Adaptive Traffic Signal Control (ATSC) has gained traction and garnered significant interest in recent years, primarily due to its potential to optimize traffic flow, mitigate congestion, and deliver superior performance. These techniques enable controllers to learn the optimal control policy through direct environmental interaction. This section briefs the recent RL-based ATSC studies (also refer to Table 1) and highlights the limitations of different approaches.

Initial applications of RL in ATSC focused on simple models using algorithms like Q-learning and State–Action–Reward–State–Action (SARSA) (Abdulhai et al., 2003; Prashanth & Bhatnagar, 2011). These
early methods laid the preliminary work by displaying the prominent use of RL to optimize traffic signal control in a dynamic environment. Their ability to adjust to fluctuating traffic scenarios without requiring predefined traffic models gave them an edge over existing strategies. However, their dependence on simplistic state representations and discrete state–action spaces proved to be major limitations, hindering their effectiveness in complex traffic conditions. Also, the computational complexity of large-scale traffic networks and their reliance on simulated traffic data left a significant gap to be filled.

Subsequent research recognized and addressed these challenges by introducing continuous state–action spaces, including real-time traffic data, and enhancing the responsiveness of ATSC systems (Mannion et al., 2016). However, even after these enhancements, models still struggled with high-dimensional state spaces and often required extensive training to achieve considerable results. More recent studies have shifted towards integrating existing RL techniques with Deep Learning (Deep Reinforcement Learning). DRL, exemplified by works like Van der Pol and Oliehoek (2016), enables the handling of high-dimensional state spaces and reduces average travel times in comparison to existing approaches. Bouktif et al. (2021) showcase the determination of the next phase and optimal signal timings simultaneously by using Multi-Pass Deep Q-Networks (Multi-Pass DQN). Significant improvements in traffic flow optimization and signal timing efficiency were recorded by the use of this hybrid DRL variant. However, immense amounts of training data and susceptibility to non-stationary environments were major constraints to work on. Huang and Qu (2020) highlight the use of Proximal Policy Optimization (PPO) to train neural networks for an optimal signal control strategy, which has proven highly effective. The research highlights the algorithm's ability to balance the exploration–exploitation trade-off more effectively, leading to more efficient and realistic traffic flow management solutions. Yazdani et al. (2023) implemented a Double Deep Q-Network (DDQN) algorithm for traffic signal control at intersections, focusing on optimizing the balance between pedestrian and vehicular traffic flows. The reward function was formulated by aggregating the individual rewards of various road users through a weighted sum, with adjustments made to penalize the extra delays resulting from interactions among these users. The results of this study indicate that the proposed DDQN algorithm yielded a significant improvement in pedestrian travel times, reducing them by over 13% in comparison to actuated signal control. However, this benefit was accompanied by a marginal increase of 1% in the average travel time when compared to actuated control. Saiki and Arai (2023) proposed a multi-objective RL (MORL) to avoid the necessity of multiple policies owing to different traffic flows at different times of the day at an intersection; the inclusion of multiple policies would lead to a decay in performance. The authors compared MORL with SORL and MP methods, finding that MORL achieved better average travel times than SORL and similar performance to MP. Along similar lines, Zhang et al. (2024) also implemented a multi-objective deep reinforcement learning framework using a Dueling Double Deep Q-Network (D3QN) to optimize traffic signals. The proposed RL model integrates multiple objectives related to safety, efficiency, and carbon emissions at an intersection. A normalized model and the entropy weight method are applied to harmonize the different dimensions of these objectives. Despite these improvements, the findings showed no substantial performance increase over the D3QN that was purely focused on efficiency optimization; in reality, a 0.6% increase in waiting time was observed.

Ducrocq and Farhi (2023) employed a Dueling Double Deep Q-Network (D3QN) to optimize traffic signal control. However, the authors presumed that the data from connected vehicles (CVs) were precise despite the potential inaccuracies in the infrastructure. Furthermore, they trained a distinct agent for each simulated scenario. The results demonstrated that the proposed algorithm significantly outperformed conventional traffic management strategies, such as Max Pressure and Self-Organizing Traffic Lights (SOTL), in minimizing total traffic delays.

From the above discussion, it is evident that reinforcement learning models have the potential to nudge the traffic signal timing towards higher performance. However, a few issues are apparent, for instance, discrete state–action spaces, large data requirements for training, computational resources, applicability to heterogeneous traffic conditions, acyclic nature, etc. Additionally, the focus of many studies is limited to calculating phase timings; very few studies have tried to consider the learning mechanism for ordering the phases, which may also play a critical role.

1.4. MP, RL integrated studies

The literature has very few studies that propose hybrid algorithms (cf. Table 1). Some of them have used reinforcement learning with a time-series model (e.g., Huang & Qu, 2020), and others integrated different RL models (e.g., Bouktif et al., 2021; Zhao et al., 2024). From the discussions in Sections 1.2 and 1.3, it is evident that the MP-based algorithms have great potential to prioritize the phases at an intersection, and the RL-based algorithms improve the performance by optimizing signal timing. This points to a hybrid approach in which MP and RL are integrated. So far, in the literature, only two studies have proposed hybrid algorithms that integrate MP and RL (Wang et al., 2022; Wei et al., 2019). Wei et al. (2019) integrate pressure as a reward in a DQN RL model, whereas Wang et al. (2022) incorporate the Max Pressure controller as an agent in a policy gradient RL model. The authors showed that the proposed hybrid approaches perform better than conventional and existing approaches. The use of DQN in the former increases the cost of learning (Wei et al., 2019) and thus limits real-world applications. The complexity of DQN is overcome by using RL-PPO (Wang et al., 2022). Both of these studies are acyclic, i.e., these approaches will not suit urban intersections where pedestrians are crossing the streets. Moreover, the RL-PPO in the hybrid algorithm (Wang et al., 2022) determines the signal timing for all phases in one go; however, determining the phase timing after the execution of every phase is likely to further enhance the performance. Lastly, the applicability of both studies is limited to homogeneous traffic conditions, and the feasibility of the proposed algorithms is not tested in a real-world system architecture.

The present study addresses these constraints by introducing a hybrid algorithm tailored to heterogeneous traffic conditions. This algorithm integrates a cyclic framework to facilitate pedestrian crossings within each cycle. The proposed hybrid algorithm integrates RL and MP. For the former, a novel DRL framework, Proximal Policy Optimization (PPO), is used due to its robustness and ability to adjust traffic signals dynamically, which have been proven to mitigate congestion and improve traffic flows. Being computationally light for large-scale traffic, trained on real-time traffic data, and using continuous state–action spaces, PPO enhances the responsiveness of ATSC systems, making it a suitable candidate for real-time signal control operations. Its capability to adapt and optimize the policy in non-stationary environments reduces the chance of intersection cross-blocking. Moreover, in RL-PPO, the phase timing is adjusted following the completion of each phase, a feature anticipated to further optimize traffic signal performance.

1.5. Traffic signal control for mixed traffic conditions

Signalized intersections play a crucial role in shaping overall roadway network performance, yet heterogeneous traffic conditions impose additional challenges for traffic flow optimization. The existing literature provides ample evidence supporting the necessity of dynamic signal control across diverse traffic scenarios, leading to improved performance. Traditional guidelines/models from developed countries may not be directly applicable to developing nations like India, which necessitates tailored measures. However, significant development for
traffic signal control under heterogeneous traffic conditions is limited in defining the saturation flow and determining delays for a traffic intersection under traffic conditions without lane discipline (Padinjarapat & Mathew, 2020). For instance, CRRI (2017), Mondal, Arya, and Gupta (2022), Padinjarapat and Mathew (2020), Radhakrishnan and Mathew (2011) proposed different methods to formulate a saturation flow model based on Passenger Car Units (PCUs). Very few studies attempt to determine the traffic signal phase timings to optimize the traffic flow under mixed traffic conditions. For instance, Verghese, Chenhui, Subramanian, Vanajakshi, and Sharma (2017) created a state feedback controller using macroscopic models for over-saturated circumstances. The conservation equation and a site-specific empirical relation served as the foundation for the model. Likewise, George et al. (2020) introduced area occupancy as a metric for formulating a model-based traffic signal control strategy. This approach accounts for fluctuating vehicle dimensions and effectively addresses lane indiscipline and traffic heterogeneity.

Further, Patel et al. (2016) proposed an optimization model that minimizes total average control delay without explicit demand prediction for non-lane-following heterogeneous road traffic. Simulation results showed a significant reduction in average control delay and queue length compared to a vehicle-actuated system and a real-time reinforcement learning model for non-lane-following heterogeneous traffic. Similarly, Maripini et al. (2022) proposed an adaptive signal design for mixed traffic using sample travel times originating from GPS/Bluetooth/Wi-Fi sensors. The results indicated that real-time optimal signal design may be achieved using data from only four probe cars per phase. When applied in VISSIM, the suggested model outperformed Webster's signal design process, showing a theoretical delay reduction of 11.78% and a real decrease of 10.41%. Moreover, Maripini et al. (2024) devised a traffic control system for mixed traffic, incorporating travel time data from probe vehicles to enhance intersection performance. The optimized model successfully attained an average reduction of 15.42% in total intersection delay over 14 cycles, specifically addressing near-saturated traffic conditions. Ghosh et al. (2023) concentrated on creating a dynamic signal control system for mixed traffic conditions by leveraging sparse data gathered from radio frequency identification (RFID) sensors. They reformulated the existing delay equation into an optimization problem for dynamic signal control. Compared to fixed-time signal control, a calibrated VISSIM microsimulation model demonstrated a 12.6% enhancement in the overall average intersection delay. Agarwal, Sahu et al. (2024), Dixit et al. (2020) proposed Max Pressure-based traffic signal control algorithms, which use travel time and travel delay, respectively. The proposed approaches were applied to mixed traffic conditions. The application of such an approach is limited to specific phases (turning movements are not available) and locations where crowdsourced data is available. The majority of studies have predominantly concentrated on the estimation of saturation flow, and a few others have tried to optimize signal timings using Passenger Car Units (PCU). A few others have demonstrated the use of various measures (e.g., travel time using probe vehicles, density, etc.) to detect the traffic conditions on the approaches. These models often fail to capture the fluctuations and the heterogeneous nature of traffic, resulting in sub-optimal signal timings that do not adequately respond to actual traffic conditions. The applicability of these models/algorithms remains limited to simulation models, and feasibility in the field is yet to be tested. The literature lacks a dynamic model to identify the order of phases at an intersection and also compute the green signal timing under heterogeneous traffic conditions. To address this gap, the reflection of heterogeneous traffic in a signal optimization problem, deployable in the real world, becomes imperative. By doing so, the accuracy and applicability of signal timing optimizations can be enhanced, ensuring a more comprehensive and realistic approach to traffic management.

1.6. Research gaps and objectives

Building upon the preceding discussion, a notable research gap exists in the exploration of various traffic signal control algorithms tailored for heterogeneous traffic scenarios. The present study has the following research contributions: (a) a comprehensive evaluation of various algorithms (actuated, CoSiCoSt, RL, MP) for optimization of traffic signal phase timings, (b) development of a hybrid traffic signal control algorithm suitable for heterogeneous traffic scenarios using the Max Pressure (MP) algorithm and Reinforcement Learning (RL) method to make the phase order and cycle time dynamic, (c) a comparative analysis of various performance indicators, such as delay, queue length, and queue dissipation time, using a real-world case study, and (d) testing the feasibility in the real world by deploying the proposed algorithm on a hardware setup.

To facilitate a comprehensive exploration of our research, the rest of the paper is organized as follows: Section 2 focuses on the multiple methods used in this study along with the theoretical and mathematical aspects of the Max Pressure algorithm as well as the reinforcement learning approaches used in the design and implementation of adaptive traffic control systems. Section 3 elaborates on the scenario development of the network and showcases the algorithms of the proposed design in a step-by-step approach for a total of six scenarios with the help of micro-simulation. Section 4 provides detailed results obtained in this study, performance analysis, practical implementation, and sensitivity of the proposed model. Section 5 contrasts the efficacy of the proposed algorithm with findings from two recent studies and outlines the limitations inherent in the current research. Finally, Section 6 concludes the study.

2. Methods

2.1. Benchmark algorithms

Contemporary literature reveals that most studies primarily utilize conventional benchmarks such as fixed and actuated signal controls (cf. Table 1). A few studies have used other models for benchmarking. In developing nations, such as India, where heterogeneous traffic conditions prevail, fixed signal control is most common for field implementation. At a few intersections, actuated and CoSiCoSt algorithms are also deployed in the field. Moreover, due to advancements in the literature around ATCS, the performance is compared with two other categories of models, i.e., MP and RL; these are becoming popular but are rarely executed in practice, possibly due to their complexity and resource-intensive nature. In this study, actuated and CoSiCoSt models are used as benchmarks (Muralidharan et al., 2010; Saikrishna & Anusha, 2021). The proposed model is compared to several MP and RL variants.

2.1.1. Actuated traffic signal control

Actuated traffic signal control is an adaptive approach that dynamically modifies traffic signal phases according to the prevailing traffic conditions. This approach is distinguished by its capacity to extend the duration of a traffic phase in response to a continuous stream of vehicles. The transition to the next phase is deliberately begun when a suitable gap is detected in the vehicle stream. The prolonged green timing per phase is limited to the maximum green time. The minimum gap and maximum green time are the key parameters within this system, which are subject to adjustment (cf. Section 3.3 for the values of the key parameters in the present study).

2.1.2. Composite Signal Control Strategy (CoSiCoSt)

The Composite Signal Control Strategy (CoSiCoSt) traffic signal control system employs the philosophy of Split, Cycle, and Offset
Optimization (SCOOT) to achieve optimal throughput. This approach dictates that the duration of the green signal for each phase is set equal to the queue service time, which is determined by the ratio of the queue length (in vehicles or PCU) and the average queue service rate (in vehicles/s or PCU/s). At every scan interval, the system computes the gap between vehicles. If this gap exceeds a predefined threshold (minimum gap), the green phase is terminated; otherwise, the green phase duration is extended. However, there is a stipulated maximum limit to the duration of the green phase. Consequently, if the duration of the green phase surpasses this maximum threshold, the green phase is terminated to facilitate the transition to the next phase of the traffic signal cycle. The key parameters are the scan interval, minimum gap threshold, and maximum green time, which are configurable (cf. Section 3.3 for the values of the key parameters in the present study).

2.2. Max Pressure (MP)

The present study employs the Max Pressure algorithm proposed by Varaiya (2013a). In this, the stationary traffic flow on each approach is determined to estimate the pressure on each approach of an intersection. The concept of stationary traffic flow, measured in Passenger Car Units (PCU), is utilized to represent the vehicles queued on a link, serving as a standardized metric for evaluating congestion levels on that link. Consequently, the terms stationary traffic flow and queue length are used interchangeably throughout the study. In contrast to Max Pressure based on travel time and travel delays, the present study uses stationary traffic flow, which allows greater flexibility for phasing; in other words, vehicle counts allow the use of turning movements, and thus, different phasing may be opted. A pseudo-code for the Max Pressure algorithm is shown in Algorithm 1.

In numerous South Asian urban areas, traffic conditions are markedly heterogeneous, characterized by a diverse array of vehicles, each possessing distinct kinematic characteristics. This diversity contributes to varying degrees of congestion. In the present study, a real-world use case is Ludhiana, India, where the traffic conditions are heterogeneous. A Passenger Car Unit (PCU) (CRRI, 2017) is used to convert mixed traffic to a common unit. To effectively manage all the phasing systems of individual intersections, it is feasible to distill the concept of pressure into a more streamlined form, as presented in Eq. (1).

$p_{z,n}(t) = \sum_{(j,k) \in M_{z,n}} y_{in} - \sum_{(j,k) \in M_{z,n}} y_{out}$ (1)

The incoming and outgoing traffic flows (in PCUs) for phase $z$ are denoted by $y_{in}$ and $y_{out}$, respectively. The set of links that define the movement in a particular phase $z$ is symbolized by $M_{z,n}$.

Finally, the green time of a phase $z$ is determined by apportioning the effective green time in the proportion of the pressure (cf. Eq. (2)). The resulting green time is bounded within a configurable threshold $[g_{min}, g_{max}]$. The former ($g_{min}$) is estimated based on the time required by pedestrians to cross, and the latter ($g_{max}$) is set to allow amber time and minimum green on the three other approaches.

$g_{min} \le g_{z,n}(t) = \frac{p_{z,n}(t)}{\sum_{z \in F_n} p_{z,n}(t)} \cdot G_n \le g_{max}, \quad \forall z \in F_n$ (2)

The share of the effective green time allocated to phase $z$ during time period $t$ is represented by $g_{z,n}$ [s], the total effective green time is depicted by $G_n$ for intersection $n$, and $F_n$ denotes the set of phases at intersection $n$.

Algorithm 1 Max Pressure and Max Pressure with heterogeneous traffic mix (MPH)
1: procedure Max_Pressure(T_end, y_in, y_out, G_n, g_min, g_max)
2:   T_current ← 0, z_current ← Initial Phase  ⊳ Initial phase is phase 1 in Fig. 1
3:   while T_current < T_end do  ⊳ Run the simulation until the end time
4:     if phase duration expired then
5:       for all phases z do
6:         Calculate pressure p_{z,n}(t)  ⊳ use Eq. (1) for MP and Eq. (8) for MPH
7:       end for
8:       z_next ← find(z) at max(p_{z,n}(t))  ⊳ Select the phase with Max Pressure
9:       Calculate total pressure Σ_{z∈F_n} p_{z,n}(t)
10:      Determine g_{z,n}(t) for z_next using Eq. (2)
11:      if g_{z,n}(t) < g_min then
12:        g_{z,n}(t) ← g_min
13:      else if g_{z,n}(t) > g_max then
14:        g_{z,n}(t) ← g_max
15:      end if
16:      Activate z_next with duration g_{z,n}(t)
17:      z_current ← z_next
18:    end if
19:    T_current ← Update current time
20:  end while
21: end procedure

As per the discussion in Section 1.2, acyclic Max Pressure suffers from various limitations, such as the skipping of phases, which is critical for pedestrian movements in urban areas. Thus, the present study also employs cyclic Max Pressure (C-MP) and compares the results with acyclic Max Pressure (MP). The dynamic phasing-based cyclic Max Pressure ensures the sequential execution of all phases, prioritizing them in descending order of their calculated pressure values. The allocation of green time within this framework is in accordance with Eq. (2). The pseudo-codes for acyclic and cyclic Max Pressure are exhibited in Algorithms 1 and 2, respectively.

Algorithm 2 Cyclic Max Pressure Heterogeneous (MPH) Algorithm
1: procedure CyclicMaxPressureHeterogeneous(T_end, v_in, v_out, PCU_i, G_n, g_min, g_max)
2:   T_current ← 0, z_current ← Initial Phase  ⊳ Initial phase is phase 1 in Fig. 1
3:   while T_current < T_end do  ⊳ Run the simulation until the end time
4:     if phase duration expired then
5:       PressureList ← ∅  ⊳ Initialize an empty list to store the pressure of each phase
6:       for all phases z do
7:         Calculate pressure p_{z,n}(t) using Eq. (8)
8:         Append (z, p_{z,n}(t)) to PressureList
9:       end for
10:      Sort PressureList in decreasing order of p_{z,n}(t)
11:      for all phases (z, p_{z,n}) in PressureList do
12:        Determine g_{z,n}(t) for phase z using Eq. (2)
13:        if g_{z,n}(t) < g_min then
14:          g_{z,n}(t) ← g_min
15:        else if g_{z,n}(t) > g_max then
16:          g_{z,n}(t) ← g_max
17:        end if
18:        Activate phase z with duration g_{z,n}(t)
19:        z_current ← z
20:        T_current ← Update current time considering g_{z,n}(t)
21:      end for
22:    end if
23:    T_current ← Update current time
24:  end while
25: end procedure

2.3. Reinforcement learning (RL)

The Cyclic Max Pressure (C-MP) strategy emerges as a potent solution to mitigate the issue of extended red durations; however, its allocation of green times remains simplistic. Hence, the present study
introduces a strategy employing Reinforcement Learning (RL) for signal control.

Under the machine learning paradigm, an agent in the Reinforcement Learning (RL) model performs various actions and gets a reward or a punishment, which leads to a learning mechanism for decision-making in an environment. The agent's primary objective is to learn a policy, a strategy that dictates its actions in various environmental states to maximize cumulative rewards. The environment, a crucial component, defines the context for the agent's actions and their outcomes. A state represents the environment's current configuration. The agent's actions, selected from a set of possibilities known as the action space, aim to achieve an optimal policy that consistently maximizes rewards over time.

With an objective to minimize traffic congestion by dynamically adjusting phase timing based on real-time traffic conditions, the present study uses the Deep Q-Network and Proximal Policy Optimization (PPO) RL models. The generic formulation of the RL models is explained first. Eq. (3) shows a state, $S$, of the environment, which is composed of traffic metrics such as approach-wise stationary traffic flows, total delay, current phase, and updated phase duration at the intersection.

$S = [s_1, s_2, \ldots, s_j]$ (3)

The total delay, $D_n(t)$, and the stationary traffic flow, $Y_n(t)$, at intersection $n$ are aggregated over all links and vehicle types, where $L$ is the set of all links at the intersection $n$, and $y_i$ and $d_i$ are the stationary traffic flow and delay for vehicle type $i$. The set of all vehicle types is denoted by $I$. To normalize different vehicle types, these metrics are computed in terms of Passenger Car Units (PCUs). The action space, $A$ (see Eq. (6)), is defined as the set of possible phase durations (i.e., green times) for the different phases of an intersection.

$A = \{a_1, a_2, \ldots, a_k\}$ (6)

where $k$ is the number of configurations, which is the same as the number of phases. One of the actions is picked randomly for execution in the model. The reward function $R_t$ is designed to incentivize the reduction of the total vehicle delay and stationary traffic flow at the intersection, as shown in Eq. (7).

$R_t = -\left( w_d \cdot \Delta D_n(t) + w_q \cdot \Delta Y_n(t) \right)$ (7)

where $w_d$ and $w_q$ are weight parameters for delay and stationary traffic flow, respectively. $\Delta D_n(t)$ and $\Delta Y_n(t)$ represent the change in delay and stationary traffic flow from the last state, respectively. The $w_d$ term minimizes accumulated waiting times, and the $w_q$ term focuses on reducing vehicle queues. The negative sign aligns with standard reinforcement learning principles, emphasizing the agent's objective of minimizing cumulative costs. This succinct formulation captures the essence of the agent's strategy in achieving a balance between delay reduction and optimized queue lengths.
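To make the state–action–reward formulation concrete, a minimal Python sketch of Eqs. (3), (6), and (7) is given below; the weights, the number of phases, and the example inputs are illustrative assumptions, not values reported in this study.

```python
# A minimal sketch of Eqs. (3), (6), and (7); weights and inputs are assumptions.

W_DELAY = 0.5   # w_d: assumed weight on the change in total delay
W_QUEUE = 0.5   # w_q: assumed weight on the change in stationary traffic flow

def build_state(queues_pcu, total_delay, current_phase, phase_duration):
    """State S of Eq. (3): approach-wise stationary flows plus delay and phase info."""
    return [*queues_pcu, total_delay, current_phase, phase_duration]

def reward(prev_delay, delay, prev_queue_pcu, queue_pcu):
    """Reward R_t of Eq. (7): negative weighted sum of changes since the last state."""
    delta_d = delay - prev_delay          # Delta D_n(t)
    delta_y = queue_pcu - prev_queue_pcu  # Delta Y_n(t)
    return -(W_DELAY * delta_d + W_QUEUE * delta_y)

# Action space A of Eq. (6): one candidate green time per phase (k = 4 here),
# bounded by the same [g_min, g_max] = [10 s, 60 s] used elsewhere in the study.
G_MIN, G_MAX = 10, 60
actions = [round(G_MIN + i * (G_MAX - G_MIN) / 3) for i in range(4)]  # [10, 27, 43, 60]

state = build_state(queues_pcu=[4.5, 7.0, 2.5, 3.0], total_delay=120.0,
                    current_phase=1, phase_duration=30)
print(state, actions, reward(120.0, 95.0, 17.0, 14.5))
```

Note that the reward is positive whenever delay and queues shrink between consecutive states, which is exactly the behavior Eq. (7) rewards.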
In the present study, two RL models are used: Deep Q-Network (DQN) and Proximal Policy Optimization (PPO). An algorithm for the former is shown in Algorithm 3, which combines Q-learning with deep neural networks. The states are the input to the neural network, which are mapped to the actions (output nodes of the neural network) and the Q-value; the latter is further used to get the reward value for the state–action pair. The hyperparameters in RL are $\alpha$ and $\gamma$, which are the learning rate and discount factor, respectively.

PPO uses a policy gradient method and is chosen for its stability and effectiveness in handling high-dimensional action spaces. It strikes a balance between exploration and exploitation by optimizing between maximizing expected returns and containing policy changes. This ensures the stability of the training process in RL. The RL agent is trained over numerous episodes, each representing a different traffic scenario. A pseudo-code for the PPO implementation of RL is shown in Algorithm 4. In this, $\alpha$, $\gamma$, and $\epsilon$ are hyperparameters, namely the learning rate, discount factor, and clip factor, respectively.

2.4. Max Pressure and Reinforcement Learning-based traffic signal control

The Reinforcement Learning (RL) methodology presents a sophisticated technique for the allocation of green times across various phases at a traffic intersection. However, this method exhibits a notable limitation due to its inherent cyclic and predetermined sequence of phase execution. To address this shortcoming, this study introduces an innovative signal control strategy that amalgamates the Max Pressure algorithm with reinforcement learning principles. As shown in Section 2.2, the pressure is calculated for each phase based on the vehicular inflow and outflow. In the past, Max Pressure control has mostly been applied to homogeneous traffic conditions using aggregate traffic flow. Indian traffic, in contrast, is marked by a diverse array of vehicle types, necessitating a nuanced approach to analyzing its dynamics. Typically, for such traffic conditions, all vehicle types may be converted to a common unit (e.g., PCU). To accurately account for the complexities of heterogeneous traffic conditions, an alternative variant of the Max Pressure algorithm is examined, as formulated in Eq. (8).

$p_{z,n}(t) = \sum_{(i,j,k) \in M_{I,z,n}} (v_{in} - v_{out})_i \cdot PCU_i$ (8)

where $v_{in}$ and $v_{out}$ are the numbers of incoming and outgoing vehicles, $i$ is the vehicle type, and $I$ is the set of all vehicle types. This hybrid approach prioritizes phases based on the descending order of pressure values, concurrently employing adaptive RL-based green times.

The primary objective of this proposed model is to effectively harness the strengths of both the Max Pressure and Reinforcement Learning frameworks. This synergistic integration is aimed at optimizing critical traffic parameters, including the minimization of queue lengths, reduction in delay times, and expedited queue dissipation, thereby enhancing overall traffic flow efficiency.

The pseudo-code for the proposed algorithm is shown in Algorithm 5. It is initiated by setting the current time in the simulation to zero. The PPO model is initialized with defined hyperparameters, including the learning rate ($\alpha$), discount factor ($\gamma$), and clip factor ($\epsilon$), alongside the initial conditions for the action ($a_0$) and state ($s_0$). Throughout the simulation, the algorithm performs several sequential operations until the end time ($T_{end}$) is reached, i.e., until no more vehicles remain on the network. It involves an iterative process where traffic signal phase sequences are continually adjusted according to the calculated pressures for each phase. The pressure is computed using the classified traffic volume at each approach at any given time (cf. Eq. (8)). The computed pressures for each traffic phase are stored in a list, PressureList, which is then used to sort the phases in decreasing order of pressure. The PPO model subsequently determines the optimal phase durations based on the current traffic state ($s_t$), utilizing the learned policy to make informed decisions. After executing the chosen action ($a_t$), the algorithm calculates a reward ($R_t$) based on the outcomes, which serves as feedback for the learning process. The current phase ($z_{current}$) is updated to the next phase ($z_{next}$) as determined by PressureList. The state and action are updated continuously, refining the policy based on the reward information and the updated traffic state, with each transition stored in memory to facilitate faster learning. At the conclusion of the simulation, the performance of the PPO policy is evaluated by averaging the rewards obtained over multiple episodes, providing a metric for assessing the effectiveness of this integrated control strategy.

Algorithm 5 Max Pressure and Reinforcement Learning
1: procedure MaxPressureAndRL(T_end, v_in, v_out, PCU_i, α, ε, γ, a, s)
2:   T_current ← 0, z_current ← Initial Phase  ⊳ Initial phase is phase 1 in Fig. 1
3:   Initialize SUMO with the given demand on the network  ⊳ Set up the simulation environment
4:   Initialize the PPO model with hyperparameters (α = 7.77 × 10⁻³, ε = 0.3, γ = 0.95), a₀, and s₀
5:     ⊳ The initial action (a₀) is a green time, picked randomly, within the boundary conditions (i.e., [g_min, g_max])
6:     ⊳ s₀ is the initial traffic state
7:   s_t ← s₀
8:   a_t ← a₀
9:   while T_current < T_end do  ⊳ Run the simulation until the end time
10:    PressureList ← ∅  ⊳ Initialize an empty list to store the pressure of each phase
11:    for all phases z do
12:      Calculate pressure p_{z,n}(t) using Eq. (8)
13:      Append (z, p_{z,n}(t)) to PressureList  ⊳ Phase order and corresponding pressure are stored in the list
14:    end for
15:    Sort PressureList in decreasing order of p_{z,n}(t)  ⊳ Get the order of phases using MP
16:    for all phases (z, p_{z,n}) in PressureList do  ⊳ Cyclic execution of phases
17:      a_t ← PPO(s_t)  ⊳ Get the phase duration for the selected phase using RL-PPO
18:      if a_t < g_min then
19:        a_t ← g_min
20:      else if a_t > g_max then
21:        a_t ← g_max
22:      end if  ⊳ Boundary condition check for the computed green time
23:      Execute a_t as per the cyclic phase order  ⊳ Phase duration at time t
24:      Calculate R_t  ⊳ Calculate the reward using Eq. (7)
25:      z_current ← z_next based on the order given by PressureList
26:      Prepare the next traffic state (s_{t+1})
27:      a_{t+1} ← update(s_t, a_t, R_t, s_{t+1})  ⊳ Get the phase duration for the next phase
28:      g_{z,next} ← a_{t+1}  ⊳ Green duration for the next phase from the updated policy
29:      Execute z_next
30:      Store the transition (s_t, a_t, R_t, s_{t+1}) in memory for lookup of similar states
31:    end for
32:    T_current ← UpdateCurrentTime()  ⊳ Increment the simulation time
33:  end while
34:  Evaluate the PPO policy performance by averaging R_t over episodes
35: end procedure
Table 2
Formulation of scenarios based on different algorithms.

Group | Scenario | Algorithm
Benchmark | Scenario A | Actuated
Benchmark | Scenario C | Composite Signal Control Strategy
Max Pressure (MP) | Scenario MP | Max Pressure
Max Pressure (MP) | Scenario MPH | MP for Heterogeneous traffic (MP-H)
Max Pressure (MP) | Scenario CMPH | Cyclic MP for Heterogeneous traffic
RL | Scenario RLD | Reinforcement Learning Deep Q-Network (DQN)
RL | Scenario RLP | Reinforcement Learning Proximal Policy Optimization (PPO)
Hybrid | Scenario H | Hybrid using Cyclic MP for Heterogeneous traffic with RL PPO

3. Scenario development

To illustrate the efficacy of the proposed algorithms in real-world situations, this study considers a four-leg intersection, namely Vardhaman Chowk, Ludhiana, India. Each approach of the intersection has surveillance cameras; therefore, prerecorded video footage is used to extract the classified traffic volume counts. The data extraction was performed using a trained model developed by Agarwal, Thombre, Kedia and Ghosh (2024). In order to assess the impact during varying traffic conditions, data from three distinct time intervals on July 26, 2023, are extracted and used for the simulation experiments. These intervals included the morning peak (08:00–09:00), afternoon (12:00–13:00), and evening peak (16:00–17:00) periods. The peak periods were deliberately selected to assess the effectiveness of the proposed algorithms in managing and dissipating queues during the busiest hours of the day.

3.1. Synthesis of scenarios

The simulation experiment is designed on Simulation of Urban MObility (SUMO) (SUMO, 2024). To ensure a comprehensive comparative analysis, eight distinct scenarios are constructed (see Table 2). The first group has two scenarios, i.e., the actuated and CoSiCoSt algorithms, which are used as benchmarks for the rest of the algorithms. The second group has three scenarios of Max Pressure (MP): traditional acyclic Max Pressure (MP), the Max Pressure variant for heterogeneous traffic (MP-H), and a cyclic variant of Max Pressure control for heterogeneous traffic (C-MP-H). For the cyclic variant, dynamic phasing is used in each cycle, i.e., the order of the phases in each cycle is different. The third group has two variants of Reinforcement Learning (RL), namely, Deep Q-learning Network (DQN) and Proximal Policy Optimization. Finally, the last group has the proposed hybrid model.

3.2. Simulation framework

An open-source traffic simulation tool, SUMO, is employed for generating, implementing, and assessing traffic scenarios. It facilitates the incorporation of customized traffic signal control algorithms, enabling the evaluation of their effectiveness through dynamic calculations of queue lengths and delays. This study utilizes the Traffic Control Interface (TraCI) version 1.19.0 (TraCI, 2024) to connect SUMO with external applications, facilitating the integration of custom algorithms. TraCI's real-time interaction capability with ongoing simulations permits the modification of parameters such as traffic signals, vehicle routes, and signal control methods. SUMO's OSM Web Wizard tool is used to replicate a network akin to the real-world layout. In the present study, the employed signal phasing system is depicted in Fig. 1. This system is characterized by a sequential traffic control methodology where the straight and left-turn movements for two opposing approaches are concurrently given the green light. This is subsequently followed by an exclusive phase dedicated to right-turn movements. This process is then analogously implemented for the remaining two approaches.

In the SUMO simulation environment, the generation of random traffic is an intrinsic capability, facilitating users to define key parameters such as the frequency, initiation, and cessation of vehicular flow. This feature enables precise control over route generation within the network. Initially, the simulated scenario was characterized by an absence of vehicular activity.

Detailed route files are meticulously developed to infuse the network with a realistic vehicular presence. These files are tailored to reflect the traffic demand during the three predominant peak hours, i.e., 08:00–09:00, 12:00–13:00, and 16:00–17:00. The composition of these files is comprehensive, encompassing the diverse spectrum of vehicle types that are typically prevalent in Indian traffic. To address the distinct dimensional and kinematic characteristics of the vehicles, Passenger Car Unit (PCU) values are assigned to each vehicle category (CRRI, 2017). This allocation is critical in quantifying the relative congestion each vehicle type could contribute, thereby ensuring an accurate and nuanced depiction of traffic conditions within the simulation framework.

3.3. Assumptions and default values

In the present study, a 4-phase system is assumed, as depicted in Fig. 1; in this, straight traffic from two approaches is allowed in two phases, and right-turn (conflicting) traffic is allowed in the other two phases. The amber time between phases is taken as 4 s. For actuated traffic signal control, the threshold values are the minimum gap and maximum green time, which are assumed to be 10 s and 60 s, respectively. In other words, if the time headway between consecutive vehicles is less than the minimum gap, the green extension is given until the maximum green time is triggered. For CoSiCoSt traffic signal control, the minimum gap and maximum green time are the same as those of the actuated traffic signal control. Additionally, the scan time is assumed to be 15 s, and the average queue service rate is determined from the pre-recorded video data, which turns out to be 0.25 PCU/s, 0.23 PCU/s, and 0.17 PCU/s for the 08:00–09:00, 12:00–13:00, and 16:00–17:00 time periods, respectively.

For the rest of the algorithms, the minimum and maximum green times are taken as 10 s and 60 s, respectively. The former is estimated based on the time required for a pedestrian to cross the 4-lane dual carriageway at a speed of 1.4 m/s. The simulation interval ranging from 100 s to 250 s is utilized for the comparative analysis of signal (or phase) timings across various scenarios (cf. Figs. 4 and 8). Based on observations of the field conditions, the maximum queue length is assumed to be 100 m, i.e., the queue length is measured only over the first 100 m of the road segment on each approach.

4. Results

A comparative analysis is conducted to evaluate the efficacy of the different traffic control scenarios, utilizing key performance indicators (KPIs): queue lengths, delays, and queue dissipation times (see Table 3). In SUMO, the vehicles in stationary conditions are converted to PCU units to get the queue length at each approach of an intersection. Similarly, the delay is defined as the duration for which a vehicle is stationary in the simulation model. The average queue lengths are estimated for all approaches of an intersection and averaged over time. The total delay for a scenario is the aggregation (over the queue dissipation time) of the delays at all approaches at each time step. The time to clear the intersection is referred to as the queue dissipation time. The simulation continues until all vehicles reach their destinations.
4.1. Comparison of benchmark models

For the comparison of the performance of the different algorithms, two scenarios are used as benchmarks, i.e., Actuated and CoSiCoSt, referred to as Scenario A and Scenario C, respectively (see Table 2). Fig. 2 distinctly demonstrates the superior performance of Scenario C compared to Scenario A, particularly in terms of delay reduction. This superiority is most pronounced during the morning and evening peak (08:00–09:00 and 16:00–17:00) time intervals, where Scenario C exhibits a decrease in maximum delay, total delay, and maximum queue length by 30.5%, 25.1%, and 7.1% for the morning peak and by 23.6%, 17.9%, and 3.75% for the evening peak, respectively. On the other hand, due to low traffic volume in the off-peak hours (12:00–13:00), the reductions are not as substantial.

Additionally, Table 3 further corroborates that the queue length and delay trends for Scenario C mostly lie under those of Scenario A. This happens due to the dynamic queue service times (i.e., the ratio of queue length and average queue service rate) in the CoSiCoSt approach. Interestingly, the queue dissipation time decreases marginally for Scenario C. Clearly, the CoSiCoSt benchmark scenario outperforms the actuated benchmark scenario; the former is therefore used for comparison with the other algorithms.

4.2. Comparison of Max Pressure models

In the analysis presented in Table 3, it is observed that Scenario MP outperforms Scenario C in terms of total delay and queue dissipation time, which decreased by 17.9% and 33.0% for the morning peak, by 25.3% and 41.2% for the noon off-peak, and by 18.6% and 28% for the evening peak, respectively. Interestingly, the average queue length and maximum delay in Scenario MP are higher than in Scenario C, yet due to the faster queue dissipation, the total delay is much smaller in Scenario MP.
Table 3
Simulation results.

Traffic signal control | Avg. queue length (PCU) | Max. queue length (PCU) | Total delay (h) | Max. delay (s) | Queue dissipation time (s)

08:00–09:00
Scenario A | 16.62 | 39.2 | 13,873.69 | 6379 | 26,573
Scenario C | 16.04 | 36.4 | 10,391.49 | 4436 | 24,740
Scenario MP | 19.61 | 35.9 | 8532.81 | 5532 | 16,587
Scenario MPH | 19.35 | 35.6 | 7878.85 | 5035 | 16,126
Scenario CMPH | 18.88 | 37 | 8466.97 | 5475 | 17,373
Scenario RLD | 16.65 | 36 | 9793.78 | 5894 | 21,761
Scenario RLP | 18.35 | 33.6 | 5369.86 | 3734 | 14,629
Scenario H | 17.88 | 30.5 | 2382.38 | 2608 | 11,392

12:00–13:00
Scenario A | 16.40 | 41.7 | 8419.67 | 4524 | 20,298
Scenario C | 16.06 | 41.1 | 8187.58 | 4109 | 20,053
Scenario MP | 25.11 | 40.7 | 6118.34 | 4201 | 11,798
Scenario MPH | 22.05 | 40.2 | 4712.09 | 3564 | 11,306
Scenario CMPH | 26.54 | 41.1 | 6859.32 | 4403 | 12,374
Scenario RLD | 24.14 | 40.3 | 5745.67 | 4347 | 11,677
Scenario RLP | 21.70 | 40.0 | 3643.06 | 2824 | 10,273
Scenario H | 24.12 | 37.6 | 1191.80 | 1460 | 7240

16:00–17:00
Scenario A | 12.48 | 40 | 6319.13 | 4488 | 21,720
Scenario C | 13.04 | 38.5 | 5189.34 | 3427 | 19,473
Scenario MP | 15.18 | 39.4 | 4224.05 | 4132 | 14,022
Scenario MPH | 15.61 | 38.1 | 3519.98 | 3162 | 12,487
Scenario CMPH | 19.97 | 39 | 4962.31 | 4319 | 12,039
Scenario RLD | 13.61 | 39.9 | 4068.25 | 4072 | 15,494
Scenario RLP | 13.17 | 34.4 | 2797.25 | 2262 | 13,121
Scenario H | 14.83 | 28.6 | 640.15 | 1113 | 7387
Further, the proposed Max Pressure algorithm with different vehicle mixes (Scenario MPH) is also investigated (see Fig. 3). Compared to Scenario MP, Scenario MPH reduces the total delay by 7.7%, 23%, and 16.7% for the morning, noon, and evening periods, respectively. Clearly, estimating the pressure of each phase separately for the different vehicle classes and then converting it to a common unit (i.e., PCU) yields a significant performance improvement over Scenarios C and MP.

The analysis (see Fig. 3) reveals that, despite implementing a strategy in which phases are executed in descending order of pressure, the overall performance is detrimentally affected in comparison to Scenarios MP and MPH. A similar outcome is reported by Sun and Yin (2018), where cyclic MP is inferior to MP. Notably, Scenario CMPH is still better than Scenario C in terms of total delay (a decrease in the range of 4.4% to 18.52%) and queue dissipation time (a decrease in the range of 29.8% to 38.29%).

Further, Fig. 4 distinguishes the allocation of green signal timings between the Max Pressure variants and Scenario C. In Scenarios MP and MPH, the traffic phases that exhibit the maximum pressure are allocated green signals, diverging from the predetermined cyclic order characteristic of Scenario C. It is noteworthy that in Scenarios MP and MPH, certain traffic phases, especially those marked by significantly low pressure values, are susceptible to prolonged red times. Specifically, phase 2 in Scenario MP and phase 4 in Scenario MPH are illustrative of this phenomenon. Also, certain phases, particularly those characterized by exceedingly high pressure values, receive green more than once within the first four green phases (see phase 1 for the morning and noon periods and phase 3 for the evening period). To showcase this, refer to Fig. 5, which shows the phase timings for 1000 consecutive seconds of the evening period, starting at 𝑡 = 100. Looking at the first 100 s, it can be observed that Phase 1 and Phase 3 receive green twice, phase 2 receives green once, and phase 4 does not receive green until 141 s. This is due to the pressure differential and the acyclic nature of the controller. Furthermore, the analysis reveals that Scenario CMPH introduces a practical measure by allocating a minimum green time of 10 s to phase 4, a phase that was previously omitted in Scenario MPH, thereby ensuring that all four phases are executed.

4.3. Comparison of reinforcement learning models

While Max Pressure signal control is recognized as a sophisticated method for traffic signal control by determining the phase with maximum pressure, it nonetheless exhibits limitations in its simplistic approach towards the allocation of green times. Therefore, reinforcement learning models leveraging Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) are introduced (Scenarios RLD and RLP, respectively). In the comparative analysis between the two RL variants (see Fig. 6), the results demonstrate that Scenario RLP outperforms Scenario RLD across all indicators and time intervals. The superiority of Scenario RLP over Scenario RLD is best highlighted in the 08:00–09:00 interval, with 45.17% and 32.77% reductions in total delay and queue dissipation time, respectively.

Notably, while the green time allocation is dynamic and responsive to real-time traffic conditions, the sequence of the phases remains fixed and cyclic for both RL variants. Compared with the benchmark (Scenario C), the reduction in total delay and queue dissipation time for Scenario RLP is in the range of 46.1% to 55.5% and 32.6% to 48.8%, respectively. Further, comparing the results of Scenario RLP with the Max Pressure-based algorithms (Scenarios MP, MPH, and CMPH), it can be observed that reinforcement learning performs much better, which is aligned with the literature (Van der Pol & Oliehoek, 2016; Wei, Zheng, Yao, & Li, 2018). Though it was not recorded systematically and was not a primary objective of the present study, it has been observed that Scenario RLP has a much smaller computational burden than Scenario RLD.
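Although the study does not prescribe a particular implementation, the per-phase decision structure of Scenario RLP can be sketched as a small episodic environment whose action is the green time of the phase about to run. The sketch below uses the gymnasium interface and the stable-baselines3 PPO implementation as assumptions; the state, action discretization, reward, and stub dynamics are all illustrative, with SUMO/TraCI supplying the real dynamics in the study.

```python
# Sketch of a PPO agent that picks the green time for the next phase only,
# mirroring the per-phase decision structure described above. Dynamics are
# stubbed here; all shapes and constants are illustrative assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class PhaseTimingEnv(gym.Env):
    """State: queue (PCU) per approach; action: green time in {10, 15, ..., 60} s."""

    def __init__(self):
        self.observation_space = spaces.Box(0.0, 200.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(11)
        self.queues = np.zeros(4, dtype=np.float32)
        self.steps = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.queues = self.np_random.uniform(0, 40, size=4).astype(np.float32)
        self.steps = 0
        return self.queues.copy(), {}

    def step(self, action):
        green = 10.0 + 5.0 * int(action)
        served = int(np.argmax(self.queues))   # the max-pressure phase is served
        self.queues[served] = max(0.0, self.queues[served] - 0.5 * green)
        arrivals = self.np_random.uniform(0.0, 0.1, size=4).astype(np.float32)
        self.queues = np.clip(self.queues + arrivals * green, 0.0, 200.0)
        self.steps += 1
        reward = -float(self.queues.sum())     # penalize queues (delay proxy)
        return self.queues.copy(), reward, self.steps >= 60, False, {}

model = PPO("MlpPolicy", PhaseTimingEnv(), verbose=0)
model.learn(total_timesteps=50_000)
```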
Fig. 3. Comparison of queue lengths and delays for Max Pressure scenarios.
Fig. 4. Comparison of signal timing for Scenario C, MP, MPH, and CMPH at different time periods of the day. The three columns represent the time periods 08:00–09:00,
12:00–13:00 and 16:00–17:00, respectively.
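The acyclic, pressure-ranked selection visible in Figs. 4 and 5 can be summarized in a few lines. The sketch below assumes PCU-weighted queues in the spirit of Scenario MPH; the PCU factors are illustrative, not the calibrated values used in the study.

```python
# Acyclic, PCU-weighted max-pressure phase selection (sketch). The PCU
# factors below are illustrative placeholders.
PCU = {"car": 1.0, "bus": 3.0, "truck": 3.0, "motorcycle": 0.5, "autorickshaw": 1.2}

def pcu_count(vehicles):
    """Convert a list of vehicle classes into passenger car units."""
    return sum(PCU.get(v, 1.0) for v in vehicles)

def phase_pressure(movements):
    """Pressure of a phase: sum over its movements of (upstream - downstream)
    queue, both expressed in PCU."""
    return sum(pcu_count(up) - pcu_count(down) for up, down in movements)

def next_phase(phases):
    """Pick the phase with maximum pressure, ignoring any cyclic order."""
    return max(phases, key=lambda p: phase_pressure(phases[p]))

# phases maps a phase id to its movements: (upstream queue, downstream queue)
phases = {
    1: [(["car", "bus", "motorcycle"] * 4, ["car"])],
    2: [(["motorcycle", "car"], ["car", "car"])],
    3: [(["truck", "car", "autorickshaw"] * 3, [])],
    4: [([], ["car"])],
}
print(next_phase(phases))  # -> 1 for this toy input (highest PCU pressure)
```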
Table 5
Wilcoxon signed-rank test results.
                Average queue length (PCU)   Total delay (h)   Queue dissipation time (s)
08:00–09:00
Statistic (𝑊)   0.0            0.0            0.0
𝑝-value         1.863 × 10−9   1.863 × 10−9   1.863 × 10−9
Decision        reject 𝐻0      reject 𝐻0      reject 𝐻0
12:00–13:00
Statistic (𝑊)   0.0            0.0            0.0
𝑝-value         1.863 × 10−9   1.863 × 10−9   1.863 × 10−9
Decision        reject 𝐻0      reject 𝐻0      reject 𝐻0
16:00–17:00
Statistic (𝑊)   16.0           0.0            0.0
𝑝-value         3.148 × 10−7   1.863 × 10−9   1.863 × 10−9
Decision        reject 𝐻0      reject 𝐻0      reject 𝐻0

Fig. 5. Signal timings for Scenario MPH (evening period) for simulation period 100 s to 1000 s.
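Table 5's entries are consistent with an exact paired Wilcoxon signed-rank test over 30 paired runs: when all 30 differences favour one scenario (𝑊 = 0), the exact two-sided 𝑝-value is 2/2³⁰ ≈ 1.863 × 10−9. A minimal sketch with placeholder KPI arrays:

```python
# Exact paired Wilcoxon signed-rank test (sketch) over 30 runs. The arrays
# are placeholders standing in for the recorded per-run KPI values.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
kpi_rlp = rng.normal(5369.86, 150.0, size=30)  # e.g., total delay (h), RLP
kpi_h = rng.normal(2382.38, 72.0, size=30)     # e.g., total delay (h), H

# With every paired difference favouring Scenario H, W = 0 and the exact
# two-sided p-value equals 2 / 2**30 ~ 1.863e-9, as in Table 5. Recent SciPy
# uses the `method` keyword (older releases call it `mode`).
stat, p = wilcoxon(kpi_rlp, kpi_h, alternative="two-sided", method="exact")
print(stat, p)
```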
The Wilcoxon signed-rank test results (see Table 5) reveal a statistically significant difference (at a 99% confidence level) in the key performance indicators (i.e., average queue length, total delay, and queue dissipation time) of Scenarios RLP and H across all time intervals. This statistical testing confirms that the proposed algorithm significantly outperforms reinforcement learning with the proximal policy optimization algorithm alone.

4.5. Practical implementation

Some of the algorithms in the literature are computationally resource-intensive, which limits their applicability to the real world. To demonstrate the feasibility of real-world deployment, the proposed algorithm is embedded in a hardware setup (cf. Fig. 11) for an intersection, as proposed in Agarwal, Krishnan O., Ravi, and Saxena (2023). It is a wireless hardware architecture (cf. Fig. 10) in which every approach of an intersection has a processor and relay module setup (cf. Figs. 11(b) and 11(c)). A block diagram for one approach of an intersection is shown in Fig. 10. In this setup, the processor is connected to a camera, which captures images; these images are processed in real time to obtain the classified vehicle counts (Agarwal, Thombre et al., 2024). The data from all approaches of an intersection are stored on a cloud relational database server. The processor is also connected to the cloud computing node, where the proposed algorithm is embedded. The whole setup is tested under laboratory conditions using recorded videos of different approaches. The mean pre-processing, inference, and post-processing times are 12.92, 33.27, and 4.94 ms, respectively. Thus, the whole system works without any lag in the processing of the videos or delay in running the algorithm to obtain the phase sequence and optimized signal timings.

The proposed algorithm is deployed on a wireless hardware architecture, which would facilitate the retrofitting of an existing fixed/adaptive traffic signal control system. Retrofitting an existing system will enable efficient usage of the resources. In brief, this will not only reduce installation time and costs but also reduce the need for interrupting traffic, without any compromise in the efficiency of the whole system.

On the hardware side, the proposed algorithm is deployed in an embedded computer vision device that processes videos of the different approaches in real time. If the camera angle is changed by physical or climatic factors in a way that prevents the road (region of interest) from being in focus, the detection/count would be erroneous. Further, severe weather conditions may affect the accuracy of the object detection, tracking, and counting modules.

Practically, the system was tested satisfactorily. However, a few failures may occur, for instance, failure of communication between the server and the edge module, a processor crash, power failures, or future compatibility failures caused by software degradation due to dependency issues.

4.6. Sensitivity analysis

In order to test the robustness of the proposed algorithm, Scenario H is run 30 times to capture the variability in the performance indicators across different runs. The travel demand is fixed for the three time periods. The major source of stochasticity lies within the scenario itself; for instance, the initial action is chosen randomly, which may affect the direction of the learning path and eventually the phase timings.

Fig. 12 demonstrates the variation in average queue length, total delay, and queue dissipation time over the 30 simulation runs of Scenario H at different periods of the day. Further, Table 6 lists various measures of central tendency, 95% confidence intervals, statistical testing outcomes, and variations for average queue length, total delay, and queue dissipation time at different periods of the day.
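The variability measures reported in Table 6 can be obtained with a few lines. The sketch below assumes a t-based 95% confidence interval for the mean and uses placeholder data in lieu of the recorded runs.

```python
# Variability of a KPI over the 30 seeded runs (sketch): coefficient of
# variation and a t-based 95% confidence interval for the mean. The data
# below are placeholders for the recorded runs of Scenario H.
import numpy as np
from scipy import stats

total_delay = np.random.default_rng(1).normal(2438.725, 72.074, size=30)

mean = total_delay.mean()
sd = total_delay.std(ddof=1)      # sample standard deviation
cv = sd / mean                    # coefficient of variation
ci = stats.t.interval(0.95, df=len(total_delay) - 1,
                      loc=mean, scale=stats.sem(total_delay))
print(f"mean={mean:.3f}  CV={cv:.3f}  95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```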
Fig. 7. Comparison of queue lengths and delays for the proposed scenario.
Fig. 8. Comparison of signal timing for different scenarios and time periods of the day. The three columns represent the time periods 08:00–09:00, 12:00–13:00 and 16:00–17:00,
respectively.
The coefficient of variation for average queue length, total delay, and queue dissipation time is in the range of 0.008–0.019, 0.030–0.042, and 0.01–0.013, respectively. Similarly, the 95% confidence interval is very tight. Clearly, the variations are very low, which confirms the robustness of the proposed algorithm towards high performance. To further assess the sensitivity of the algorithm's output, statistical evaluations are conducted. Given that the ideal scenario anticipates consistent results across all 30 runs, the uniform distribution is adopted as the benchmark. Within this framework, two tests are particularly pertinent: the Kolmogorov–Smirnov (K-S) and the Cramér-von Mises (C-vM). The K-S test, a well-established methodology, compares the sample's empirical distribution function with the cumulative distribution function of the hypothesized uniform distribution. The test statistic, 𝐷, is defined as the maximum absolute difference between the empirical and theoretical distribution functions. On the other hand, the C-vM test assesses the integrated squared differences throughout the distribution, denoted as 𝜔2, offering a more comprehensive evaluation of the overall fit. For both tests, the null hypothesis (𝐻0) asserts that the sample originates from a uniformly distributed population, while the alternative hypothesis (𝐻1) contends that the sample does not derive from such a distribution. The results from both tests substantiate that the KPIs (i.e., average queue length, total delay, and queue dissipation time) demonstrate uniform behavior consistently across the 30 runs.
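Formally, the K-S statistic is 𝐷 = sup|𝐹ₙ(𝑥) − 𝐹(𝑥)| and the C-vM statistic integrates the squared discrepancy over the distribution. Both tests are available in SciPy; the sketch below tests placeholder KPI values against a uniform benchmark, where the choice of the reference interval (loc/scale) is an assumption, as the study does not report the fitted interval.

```python
# Uniformity checks (sketch) for the 30 per-run KPI values against a uniform
# reference spanning the observed range (an assumed loc/scale choice).
import numpy as np
from scipy.stats import kstest, cramervonmises, uniform

kpi = np.random.default_rng(2).uniform(2300.0, 2570.0, size=30)  # placeholder
ref = uniform(loc=kpi.min(), scale=kpi.max() - kpi.min())

ks = kstest(kpi, ref.cdf)            # statistic D = sup |F_n - F|
cvm = cramervonmises(kpi, ref.cdf)   # statistic omega^2
print(ks.statistic, ks.pvalue)
print(cvm.statistic, cvm.pvalue)
```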
Fig. 9. Comparison of rewards for scenarios RLP and H across different time periods.
Table 6
Descriptive statistics for Scenario H.
                           Average queue length (PCU)   Total delay (h)      Queue dissipation time (s)
08:00–09:00
Mean                       17.706             2438.725             11 516.90
95% confidence interval    [17.63, 17.78]     [2411.35, 2466.10]   [11473.55, 11560.25]
Median                     17.705             2445.201             11 508.50
Standard deviation         0.2                72.074               114.14
Coefficient of variation   0.011              0.030                0.01
K-S test statistic (𝐷)     0.217              0.170                0.211
K-S test 𝑝-value           0.103              0.314                0.119
K-S test decision          cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
C-vM test statistic (𝜔2)   0.247              0.177                0.230
C-vM test 𝑝-value          0.192              0.317                0.216
C-vM test decision         cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
12:00–13:00
Mean                       24.367             1259.035             7349.70
95% confidence interval    [24.29, 24.44]     [1241.91, 1276.16]   [7313.14, 7386.26]
Median                     24.342             1254.155             7373
Standard deviation         0.201              45.101               96.259
Coefficient of variation   0.008              0.036                0.013
K-S test statistic (𝐷)     0.137              0.119                0.207
K-S test 𝑝-value           0.575              0.746                0.131
K-S test decision          cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
C-vM test statistic (𝜔2)   0.146              0.097                0.268
C-vM test 𝑝-value          0.403              0.602                0.167
C-vM test decision         cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
16:00–17:00
Mean                       15.087             656.578              7366.367
95% confidence interval    [14.98, 15.20]     [646.17, 666.98]     [7329.93, 7402.81]
Median                     15.123             655.621              7382
Standard deviation         0.291              27.401               95.945
Coefficient of variation   0.019              0.042                0.013
K-S test statistic (𝐷)     0.128              0.230                0.227
K-S test 𝑝-value           0.667              0.072                0.078
K-S test decision          cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
C-vM test statistic (𝜔2)   0.101              0.314                0.227
C-vM test 𝑝-value          0.582              0.123                0.221
C-vM test decision         cannot reject 𝐻0   cannot reject 𝐻0     cannot reject 𝐻0
5. Discussion

The present study proposes a hybrid algorithm to determine the phase sequence and timing. For the phase sequence, Max Pressure control is used, which is determined using the classified traffic volume. Further, the phase timing is given by the reinforcement learning algorithm using proximal policy optimization. The study compares the outcome of the proposed algorithm with algorithms commonly used in practice (e.g., actuated, CoSiCoSt), standard Max Pressure, and Reinforcement Learning algorithms. The proposed algorithm demonstrates considerable promise in diminishing total delay and enhancing queue dissipation with respect to different categories of approaches, such as actuated, CoSiCoSt, Max Pressure, and Reinforcement Learning. A reduction in total delay inherently suggests a decrease in the total travel time. Less time spent at traffic intersections leads to lower vehicular emissions, fuel consumption, and noise pollution, thereby positing a potentially significant positive impact on the environment. Additionally, it assists in reducing the negative effects on the health of travelers. This happens due to a direct reduction in exposure duration.
Table 7
Performance comparison of Scenarios SAC and GNN with Scenario H.
Traffic signal Avg. queue Max. queue Total Max. Queue dissipation
control length (PCU) length (PCU) delay (h) delay (s) time (s)
08:00–09:00
Scenario SAC 19.24 34.5 5056.14 3469 13 116
Scenario GNN 19.52 34.6 3820.94 4051 12 617
Scenario H 17.88 30.5 2382.38 2608 11 392
12:00–13:00
Scenario SAC 22.36 40.6 3492.34 2737 9832
Scenario GNN 23.82 38.6 1988.11 2396 8678
Scenario H 24.12 37.6 1191.80 1460 7240
16:00–17:00
Scenario SAC 13.48 33.1 2355.65 2444 11 769
Scenario GNN 16.14 35.6 2195.26 2324 10 198
Scenario H 14.83 28.6 640.15 1113 7387
In terms of queue dissipation time, Scenario H achieves reductions between 13.14% and 37.23% against Scenario SAC, and between 9.71% and 27.56% in comparison to Scenario GNN. These quantitative improvements are further substantiated in Fig. 13. Further, it is noteworthy that while Scenario GNN exemplifies a complex reinforcement learning architecture, it is characterized by significant computational demands. In contrast, Scenario SAC, although computationally efficient, does not perform as well as Scenario H in terms of the KPIs.
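These ranges can be reproduced from the queue dissipation times in Table 7, assuming the convention reduction = (other − H)/other:

```python
# Reproducing the quoted ranges from Table 7 (queue dissipation time, s),
# assuming: reduction = (other - H) / other * 100.
dissipation = {  # (SAC, GNN, H) per period
    "08:00-09:00": (13116, 12617, 11392),
    "12:00-13:00": (9832, 8678, 7240),
    "16:00-17:00": (11769, 10198, 7387),
}
for period, (sac, gnn, h) in dissipation.items():
    print(period,
          f"vs SAC: {(sac - h) / sac * 100:.2f}%",   # 13.14 ... 37.23
          f"vs GNN: {(gnn - h) / gnn * 100:.2f}%")   # 9.71 ... 27.56
```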
Fig. 13. Comparison of queue lengths and delays for the proposed scenario with respect to existing studies.
for average queue length, total delay, and queue dissipation time. Furthermore, the statistical test confirms the algorithm's robustness, demonstrating that the KPIs were consistent across the 30 simulation runs. In addition to traditional benchmarks, our study also explored comparisons with recent approaches in the literature, such as the Soft Actor-Critic and heterogeneous graph neural networks. The comparative analysis unveiled the enhanced performance of the proposed algorithm. After showcasing the effectiveness of the proposed approach through simulations, the practical implementation of these advanced technologies necessitates thoughtful consideration of potential real-world challenges. Thus, the proposed algorithm is deployed on a wireless hardware architecture to facilitate seamless integration into traffic signal control systems. An exciting avenue for further enhancing our traffic signal control strategy involves the integration of vehicle prioritization strategies. This study presents the opportunity to integrate priority levels for specific types of vehicles, such as buses, within the traffic management framework. This approach can be particularly beneficial in urban areas where public transport often faces significant delays. Moreover, the concept of vehicle prioritization can be extended by considering vehicle occupancy rates. Transitioning from a vehicle-centric to a person-centric delay system allows for a more holistic evaluation of traffic efficiency.

CRediT authorship contribution statement

Amit Agarwal: Conceptualization, Methodology, Writing – original draft, Supervision. Deorishabh Sahu: Writing – original draft, Methodology, Data analysis. Rishabh Mohata: RL model formulation, Analysis. Kuldeep Jeengar: RL model formulation, Simulation, Data analysis. Anuj Nautiyal: Simulation, Data analysis, Writing – original draft. Dhish Kumar Saxena: Conceptualization, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

References

Abdulhai, B., Pringle, R., & Karakoulas, G. J. (2003). Reinforcement learning for true adaptive traffic signal control. Journal of Transportation Engineering, 129(3), 278–285. http://dx.doi.org/10.1061/(asce)0733-947x(2003)129:3(278).
Agarwal, A., & Kaddoura, I. (2019). On-road air pollution exposure to cyclists in an agent-based simulation framework. Periodica Polytechnica Transportation Engineering, 48(2), 117–125. http://dx.doi.org/10.3311/PPtr.12661.
Agarwal, A., Krishnan O., K., Ravi, D. K., & Saxena, D. K. (2023). Wireless edge computing-based adaptive traffic control system with real-time vehicle tracking and cloud integration. 202311079549.
Agarwal, A., & Lämmel, G. (2016). Modeling seepage behavior of smaller vehicles in mixed traffic conditions using an agent based simulation. Transportation in Developing Economies, 2(8), http://dx.doi.org/10.1007/s40890-016-0014-9.
Agarwal, A., Sahu, D., Nautiyal, A., Agarwal, P., & Gupta, M. (2024). Fusing crowdsourced data to an adaptive wireless traffic signal control system architecture. Internet of Things, http://dx.doi.org/10.1016/j.iot.2024.101169.
Agarwal, A., Thombre, A., Kedia, K., & Ghosh, I. (2024). ITD: Indian traffic dataset for intelligent transportation systems. In 2024 16th international conference on COMmunication systems & NETworkS (pp. 842–850). http://dx.doi.org/10.1109/COMSNETS59351.2024.10427394.
Anderson, L., Pumir, T., Triantafyllos, D., & Bayen, A. M. (2018). Stability and implementation of a cycle-based max pressure controller for signalized traffic networks. Networks and Heterogeneous Media, 13, http://dx.doi.org/10.3934/nhm.2018011.
Anupriya, Bansal, P., & Graham, D. J. (2023). Congestion in cities: Can road capacity expansions provide a solution? Transportation Research Part A: Policy and Practice, 174, Article 103726. http://dx.doi.org/10.1016/j.tra.2023.103726.
Bickel, P. J., Chen, C., Kwon, J., Rice, J., van Zwet, E., & Varaiya, P. (2007). Measuring traffic. Statistical Science, 22(4), http://dx.doi.org/10.1214/07-sts238.
Boukerche, A., Zhong, D., & Sun, P. (2022). A novel reinforcement learning-based cooperative traffic signal system through max-pressure control. IEEE Transactions on Vehicular Technology, 71(2), 1187–1198. http://dx.doi.org/10.1109/tvt.2021.3069921.
Bouktif, S., Cheniki, A., & Ouni, A. (2021). Traffic signal control using hybrid action space deep reinforcement learning. Sensors, 21(7), 2302. http://dx.doi.org/10.3390/s21072302.
CRRI (2017). Indian highway capacity manual (Indo-HCM). Central Road Research Institute. URL https://crridom.gov.in/indian-highway-capacity-manual.
Dixit, V., Nair, D. J., Chand, S., & Levin, M. W. (2020). A simple crowdsourced delay-based traffic signal control. PLoS ONE, 15(4), http://dx.doi.org/10.1371/journal.pone.0230598.
Ducrocq, R., & Farhi, N. (2023). Deep reinforcement Q-learning for intelligent traffic signal control with partial detection. International Journal of Intelligent Transportation Systems Research, 21(1), 192–206. http://dx.doi.org/10.1007/s13177-023-00346-4.
Eom, M., & Kim, B.-I. (2020). The traffic signal control problem for intersections: A review. European Transport Research Review, 12(1), http://dx.doi.org/10.1186/s12544-020-00440-8.
Ghosh, T., Anusha, S., Babu, A., & Vanajakshi, L. D. (2023). Performance evaluation of a dynamic signal control system for mixed traffic conditions using sparse data. Transportation Research Record: Journal of the Transportation Research Board, 2677(10), 797–807. http://dx.doi.org/10.1177/03611981231163770.
Gregoire, J., Frazzoli, E., de La Fortelle, A., & Wongpiromsarn, T. (2014). Back-pressure traffic signal control with unknown routing rates. http://dx.doi.org/10.48550/ARXIV.1401.3357.
Huang, L., & Qu, X. (2020). Improving traffic signal control operations using proximal policy optimization. IET Intelligent Transport Systems, 14(12), 1572–1580.
Jia, H., Lin, Y., Luo, Q., Li, Y., & Miao, H. (2019). Multi-objective optimization of urban road intersection signal timing based on particle swarm optimization algorithm. Advances in Mechanical Engineering, 11(4), Article 168781401984249. http://dx.doi.org/10.1177/1687814019842498.
Kouvelas, A., Lioris, J., Fayazi, S. A., & Varaiya, P. (2014). Maximum pressure controller for stabilizing queues in signalized arterial networks. Transportation Research Record: Journal of the Transportation Research Board, 2421(1), 133–141. http://dx.doi.org/10.3141/2421-15.
Kuyer, L., Whiteson, S., Bakker, B., & Vlassis, N. (2008). Multiagent reinforcement learning for urban traffic control using coordination graphs. In Machine learning and knowledge discovery in databases: European conference, ECML PKDD 2008, Antwerp, Belgium, September 15-19, 2008, proceedings, part I (pp. 656–671). Springer.
Le, T., Vu, H. L., Walton, N., Hoogendoorn, S. P., Kovács, P., & Queija, R. N. (2017). Utility optimization framework for a distributed traffic control of urban road networks. Transportation Research, Part B (Methodological), 105, 539–558. http://dx.doi.org/10.1016/j.trb.2017.10.004.
Levin, M. W., Barman, S., Robbennolt, J., Hu, J., Odell, M., & Kang, D. (2022). Towards implementation of Max-Pressure signal timing on Minnesota roads: Technical report. Minnesota Department of Transportation. URL https://hdl.handle.net/20.500.14153/mndot.3902.
Levin, M. W., Hu, J., & Odell, M. (2020). Max pressure signal control with cyclical phase structure. Transportation Research Part C (Emerging Technologies), 120, Article 102828. http://dx.doi.org/10.1016/j.trc.2020.102828.
Liu, H., & Gayah, V. V. (2022). A novel Max pressure algorithm based on traffic delay. Transportation Research Part C (Emerging Technologies), 143, Article 103803. http://dx.doi.org/10.1016/j.trc.2022.103803.
Mannion, P., et al. (2016). Exploring novel approaches in reinforcement learning. International Journal of Machine Learning, 22(5), 345–359.
Mao, F., Li, Z., & Li, L. (2022). A comparison of deep reinforcement learning models for isolated traffic signal control. IEEE Intelligent Transportation Systems Magazine, 15(1), 160–180.
Maripini, H., Vanajakshi, L., & Chilukuri, B. R. (2022). Optimal signal control design for isolated intersections using sample travel-time data. Journal of Advanced Transportation, 2022, 1–16. http://dx.doi.org/10.1155/2022/7310250.
Maripini, H., Vanajakshi, L., & Chilukuri, B. R. (2024). A probe-based demand responsive signal control for isolated intersections under mixed traffic conditions. Transportation Letters, 1–14. http://dx.doi.org/10.1080/19427867.2022.2164613.
Mercader, P., Uwayid, W., & Haddad, J. (2020). Max-pressure traffic controller based on travel times: An experimental analysis. Transportation Research Part C (Emerging Technologies), 110, 275–290. http://dx.doi.org/10.1016/j.trc.2019.10.002.
Mirchandani, P., & Head, L. (2001). A real-time traffic signal control system: Architecture, algorithms, and analysis. Transportation Research Part C (Emerging Technologies), 9(6), 415–432. http://dx.doi.org/10.1016/s0968-090x(00)00047-4.
Mondal, S., Arya, V. K., & Gupta, A. (2022). An optimised approach for saturation flow estimation of signalised intersections. Proceedings of the Institution of Civil Engineers - Transport, 175(3), 137–149. http://dx.doi.org/10.1680/jtran.18.00206.
Muralidharan, V., Ravikumar, P., & Nitin, S. (2010). A method for synchronizing heterogeneous road traffic and system thereof. Google Patents, IN Patent IN239258B.
Noaeen, M., Naik, A., Goodman, L., Crebo, J., Abrar, T., Abad, Z. S. H., et al. (2022). Reinforcement learning in urban network traffic signal control: A systematic literature review. Expert Systems with Applications, 199, Article 116830. http://dx.doi.org/10.1016/j.eswa.2022.116830.
Padinjarapat, R. K., & Mathew, T. V. (2020). Estimation of saturation flow for non-lane based mixed traffic streams. Transportmetrica B: Transport Dynamics, 9(1), 42–61. http://dx.doi.org/10.1080/21680566.2020.1781708.
Park, S., Han, E., Park, S., Jeong, H., & Yun, I. (2021). Deep Q-network-based traffic signal control models. PLoS ONE, 16(9), Article e0256405.
Patel, A., Mathew, T. V., & Venkateswaran, J. (2016). Real-time adaptive signal controller for non-lane following heterogeneous road traffic. In 2016 8th international conference on communication systems and networks (pp. 1–6). IEEE. http://dx.doi.org/10.1109/comsnets.2016.7439944.
Prashanth, L., & Bhatnagar, S. (2011). Reinforcement learning with average cost for adaptive control of traffic lights at intersections. In 2011 14th international IEEE conference on intelligent transportation systems (pp. 1640–1645). IEEE.
Pumir, T., Anderson, L., Triantafyllos, D., & Bayen, A. M. (2015). Stability of modified max pressure controller with application to signalized traffic networks. In 2015 American control conference (pp. 1879–1886). http://dx.doi.org/10.1109/ACC.2015.7171007.
Radhakrishnan, P., & Mathew, T. V. (2011). Passenger car units and saturation flow models for highly heterogeneous traffic at urban signalised intersections. Transportmetrica, 7(2), 141–162. http://dx.doi.org/10.1080/18128600903351001.
Robertson, D., & Bretherton, R. (1991). Optimizing networks of traffic signals in real time - the SCOOT method. IEEE Transactions on Vehicular Technology, 40(1), 11–15. http://dx.doi.org/10.1109/25.69966.
Saiki, T., & Arai, S. (2023). Flexible traffic signal control via multi-objective reinforcement learning. IEEE Access, 11, 75875–75883. http://dx.doi.org/10.1109/access.2023.3296537.
Saikrishna, C. A., & Anusha, S. (2021). Vehicle actuated signal control system for mixed traffic conditions. In Conference of transportation research group of India (pp. 397–411). Springer.
Saxena, D. K., Mittal, S., Kapoor, S., & Deb, K. (2023). A localized high-fidelity-dominance-based many-objective evolutionary algorithm. IEEE Transactions on Evolutionary Computation, 27(4), 923–937. http://dx.doi.org/10.1109/TEVC.2022.3188064.
Sims, A. G., & Dobinson, K. W. (1980). The Sydney coordinated adaptive traffic (SCAT) system philosophy and benefits. IEEE Transactions on Vehicular Technology, 29, 130–137. URL https://api.semanticscholar.org/CorpusID:26889615.
Singh, V., Meena, K. K., & Agarwal, A. (2021). Travellers' exposure to air pollution: A systematic review and future directions. Urban Climate, 38, http://dx.doi.org/10.1016/j.uclim.2021.100901.
Studer, L., & Ketabdari, M. (2015). Analysis of adaptive traffic control systems design of a decision support system for better choices. Journal of Civil & Environmental Engineering, 05(06), http://dx.doi.org/10.4172/2165-784x.1000195.
SUMO (2024). Simulation of urban mobility. URL https://eclipse.dev/sumo/.
Sun, X., & Yin, Y. (2018). A simulation study on max pressure control of signalized intersections. Transportation Research Record: Journal of the Transportation Research Board, 2672(18), 117–127. http://dx.doi.org/10.1177/0361198118786840.
Tassiulas, L., & Ephremides, A. (1992). Jointly optimal routing and scheduling in packet radio networks. IEEE Transactions on Information Theory, 38(1), 165–168. http://dx.doi.org/10.1109/18.108264.
TraCI (2024). Traffic control interface. URL https://sumo.dlr.de/docs/TraCI.html.
Van der Pol, E., & Oliehoek, F. A. (2016). Coordinated deep reinforcement learners for traffic light control. In Proceedings of learning, inference and control of multi-agent systems: vol. 8 (pp. 21–38).
Varaiya, P. (2013a). Max pressure control of a network of signalized intersections. Transportation Research Part C (Emerging Technologies), 36, 177–195. http://dx.doi.org/10.1016/j.trc.2013.08.014.
Varaiya, P. (2013b). The max-pressure controller for arbitrary networks of signalized intersections. In Complex networks and dynamic systems (pp. 27–66). Springer New York. http://dx.doi.org/10.1007/978-1-4614-6243-9_2.
Verghese, V., Chenhui, L., Subramanian, S. C., Vanajakshi, L., & Sharma, A. (2017). Development and implementation of a model-based road traffic-control scheme. Journal of Computing in Civil Engineering, 31(3), http://dx.doi.org/10.1061/(asce)cp.1943-5487.0000635.
Wang, F., Tang, K., Li, K., Liu, Z., & Zhu, L. (2019). A group-based signal timing optimization model considering safety for signalized intersections with mixed traffic flows. Journal of Advanced Transportation, 2019, 1–13. http://dx.doi.org/10.1155/2019/2747569.
Wang, X., Yin, Y., Feng, Y., & Liu, H. X. (2022). Learning the max pressure control for urban traffic networks considering the phase switching loss. Transportation Research Part C (Emerging Technologies), 140, Article 103670. http://dx.doi.org/10.1016/j.trc.2022.103670.
Wei, H., Chen, C., Zheng, G., Wu, K., Gayah, V., Xu, K., et al. (2019). PressLight: Learning max pressure control to coordinate traffic signals in arterial network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 1290–1298). New York, NY, USA: Association for Computing Machinery. http://dx.doi.org/10.1145/3292500.3330949.
Wei, H., Zheng, G., Yao, H., & Li, Z. (2018). IntelliLight: A reinforcement learning approach for intelligent traffic light control. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 2496–2505). New York, NY, USA: Association for Computing Machinery. http://dx.doi.org/10.1145/3219819.3220096.
Wei, L., Gao, L., Yang, J., & Li, J. (2023). A reinforcement learning traffic signal control method based on traffic intensity analysis. In 2023 42nd Chinese control conference. IEEE. http://dx.doi.org/10.23919/ccc58697.2023.10240019.
Wiering, M. A., et al. (2000). Multi-agent reinforcement learning for traffic light control. In Machine learning: Proceedings of the seventeenth international conference (pp. 1151–1158).
Wongpiromsarn, T., Uthaicharoenpong, T., Wang, Y., Frazzoli, E., & Wang, D. (2012). Distributed traffic signal control for maximum network throughput. In 2012 15th international IEEE conference on intelligent transportation systems (pp. 588–595). http://dx.doi.org/10.1109/ITSC.2012.6338817.
Yazdani, M., Sarvi, M., Asadi Bagloee, S., Nassir, N., Price, J., & Parineh, H. (2023). Intelligent vehicle pedestrian light (IVPL): A deep reinforcement learning approach for traffic signal control. Transportation Research Part C (Emerging Technologies), 149, Article 103991. http://dx.doi.org/10.1016/j.trc.2022.103991.
Yue, W., Li, C., Chen, Y., Duan, P., & Mao, G. (2022). What is the root cause of congestion in urban traffic networks: Road infrastructure or signal control? IEEE Transactions on Intelligent Transportation Systems, 23(7), 8662–8679. http://dx.doi.org/10.1109/TITS.2021.3085021.
Zhang, G., Chang, F., Jin, J., Yang, F., & Huang, H. (2024). Multi-objective deep reinforcement learning approach for adaptive traffic signal control system with concurrent optimization of safety, efficiency, and decarbonization at intersections. Accident Analysis and Prevention, 199, Article 107451. http://dx.doi.org/10.1016/j.aap.2023.107451.
Zhao, W., Ye, Y., Ding, J., Wang, T., Wei, T., & Chen, M. (2022). IPDALight: Intensity- and phase duration-aware traffic signal control based on reinforcement learning. Journal of Systems Architecture, 123, Article 102374. http://dx.doi.org/10.1016/j.sysarc.2021.102374.
Zhao, Z., Wang, K., Wang, Y., & Liang, X. (2024). Enhancing traffic signal control with composite deep intelligence. Expert Systems with Applications, 244, Article 123020.