RL Report TEAM - 6
Submitted by:
Avinash N 20221LCG0004
Chaithra C 20221CSG0092
Section: 6CSG02
Year: 2025
Table of Contents
1. Cover Page
2. Table of Contents
3. Abstract
4. Introduction
5. Problem Analysis
6. Methodology
7. Results
8. Conclusion
9. Bibliography
3. Abstract
Urban traffic congestion is one of the most persistent problems in modern cities, leading to
increased travel delays, fuel consumption, and emissions. Traditional traffic signal control (TSC)
systems, such as fixed-time or actuated controllers, lack the flexibility to adapt in real time to
fluctuating traffic demands. Recent advancements in Artificial Intelligence, particularly Deep
Reinforcement Learning (DRL), offer a promising alternative. DRL combines the learning
capability of Reinforcement Learning (RL) with the powerful feature-extraction ability of Deep
Learning (DL) to enable adaptive and autonomous decision-making. This case study explores the
application of DRL for traffic signal control by analyzing agent-based learning models,
environment interactions, and reward-driven optimization strategies. The report synthesizes
methodologies from state-of-the-art literature, illustrates architectural models, and presents
performance comparisons using key traffic metrics such as average delay, queue length, and
throughput. The findings demonstrate DRL’s potential to outperform traditional systems and
pave the way for more intelligent and responsive urban traffic management solutions.
A detailed implementation using the SUMO simulator and Deep Q-Network (DQN) is presented
in this report. The RL model receives state information such as queue lengths and signal phases,
and selects actions like switching or extending signal phases to improve traffic flow. A carefully
designed reward function ensures the learning process aligns with traffic optimization goals.
Results from the simulation show that the RL-based system outperforms traditional fixed-time
and actuated signal control methods. The model adapts to changing traffic patterns, reduces
average waiting times, and increases throughput, making it a promising solution for modern
urban traffic systems. This case study highlights the potential of AI-driven traffic control in
building smarter and more efficient cities.
4. Introduction
The rapid pace of urbanization has led to a dramatic increase in the number of vehicles on city
roads, resulting in severe traffic congestion, especially at major intersections. This congestion
contributes not only to wasted commuter time and economic loss but also to increased fuel
consumption and air pollution. One of the critical factors contributing to traffic congestion is the
inefficiency of traditional traffic signal control (TSC) systems.
Conventional TSC systems are generally categorized into three types: fixed-time systems,
actuated systems, and adaptive systems. Fixed-time controllers rely on historical traffic data and
follow predefined schedules, which are often outdated and not reflective of real-time conditions.
Actuated systems use sensors to detect vehicles and respond accordingly but still suffer from
limited adaptability. Adaptive systems offer some level of flexibility but are typically based on
manually tuned rules and are not robust enough to handle highly dynamic environments.
Reinforcement Learning (RL), in which an agent learns control policies through trial-and-error
interaction with its environment, is a natural candidate for this problem. However, the
application of traditional RL to traffic control faces significant challenges. One major issue is
the "curse of dimensionality": as the state and action spaces grow (due to multiple lanes,
directions, and vehicle types), the number of possible state-action pairs increases
exponentially, making learning inefficient and computationally expensive.
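To make the scale concrete with an illustrative (assumed) count: a single intersection with 8
incoming lanes, each described only by a queue length discretized into 10 levels, combined with
4 signal phases, already yields

    |S| = 10^8 × 4 = 4 × 10^8

distinct states, before vehicle positions or speeds are even considered. A tabular method such
as classical Q-learning must estimate a value for every state-action pair it visits, which is
infeasible at this scale.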
Deep Reinforcement Learning (DRL) overcomes this by using deep neural networks to approximate
value functions or policies, letting the agent generalize across similar traffic states instead
of enumerating them. The use of DRL in traffic signal control not only enables real-time
adaptability but also allows for more intelligent coordination across multiple intersections.
Agents can learn to minimize
queue lengths, waiting times, and travel delays while maximizing throughput and fairness.
Through training in simulated environments and real-world-inspired traffic models, DRL-based
systems have demonstrated significant performance improvements over traditional methods.
This case study aims to explore the theoretical foundation, implementation architectures,
practical results, and future potential of DRL in traffic signal control. Drawing insights from
leading research, including the comprehensive review by Rasheed et al. (2020), this report
presents a structured understanding of how DRL is revolutionizing urban traffic management.
5. Problem Analysis
Traditional TSC systems operate under predefined rules and assumptions. While historically
effective, these systems now face growing challenges:
Fixed-Time Control: Operates based on historical traffic data using fixed cycles and
phase durations. It assumes predictable traffic patterns, which rarely hold true during
peak hours, emergencies, or special events.
Semi-Dynamic Control: Employs sensors (e.g., loop detectors) to detect vehicle
presence and adjust green phases accordingly. However, it still lacks the capability to
predict traffic trends or coordinate across intersections.
Fully Dynamic Control: Uses more complex sensor networks and real-time data but
often depends on manually crafted rules or threshold-based logic, which can become
brittle and hard to tune for changing traffic behaviors.
These legacy systems are plagued by three primary problems that significantly affect
performance:
A. Improper Phase Splits
The allocation of time to green signals (the phase split) is often fixed or based on simple
reactive logic. When the green time is too short, queue lengths and waiting times increase.
B. Green Idling
When the green time is too long, lights stay green for empty lanes while vehicles wait in other
directions, wasting throughput.
C. Lack of Coordination Across Intersections
Urban traffic networks are interconnected. Congestion at one intersection often propagates to
others. Traditional systems usually treat intersections independently, failing to share information
about downstream conditions (e.g., full lanes, blocked exits), leading to cross-blocking—a
scenario where vehicles cannot move forward due to occupied downstream segments, even when
the signal is green.
Even conventional Machine Learning (ML) approaches fall short in this domain:
Supervised Learning requires labeled datasets and cannot adapt on the fly.
Unsupervised Learning finds patterns but doesn’t optimize behavior.
Rule-Based Systems are inflexible and hard to scale or tune for complex environments.
Thus, the real need is for a learning agent that can operate in real-time, adapt continuously,
and optimize control policies based on feedback from the environment—which is where
Deep Reinforcement Learning (DRL) enters the picture.
6. Methodology
This section explains the technical framework and operational flow of applying Deep
Reinforcement Learning (DRL) to Traffic Signal Control (TSC). The DRL-based methodology
models each traffic signal controller as an agent that interacts with its environment (the
intersection and nearby traffic conditions), learns optimal control strategies through trial and
error, and aims to minimize overall congestion and delay.
A typical DRL system is modeled as a Markov Decision Process (MDP) with the following
components:
State (S): Current traffic condition (e.g., queue length, vehicle positions, current light
status).
Action (A): Decisions made by the agent (e.g., switch to a different traffic phase, extend
green time).
Reward (R): Feedback received based on the result of the action (e.g., reduced delay or
congestion).
Policy (π): The strategy that maps states to actions, which the agent aims to optimize.
Value Function (V): The expected cumulative reward for a state under a certain policy.
The agent-environment interaction proceeds as a repeating loop:
1. The agent observes the current state S_t (e.g., the number of vehicles waiting).
2. It selects an action A_t (e.g., change or hold the traffic phase) based on a policy π(S_t).
3. The environment transitions to a new state S_{t+1} and provides a reward R_{t+1}.
4. The agent updates its policy to maximize the long-term reward using a learning algorithm
such as Q-learning or policy gradients, as sketched below.
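To make the loop concrete, the sketch below implements steps 1-4 with tabular Q-learning. It is
an illustration rather than the report's DQN implementation: TrafficEnv is a hypothetical
environment with gym-style reset() and step() methods, and the two-action space (hold or switch
the phase) and the hyperparameters are assumptions.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # learning rate, discount, exploration
N_ACTIONS = 2                           # 0 = hold current phase, 1 = switch

Q = defaultdict(lambda: [0.0] * N_ACTIONS)  # tabular Q(s, a); states must be hashable

def select_action(state):
    """Epsilon-greedy policy pi(S_t): mostly greedy, occasionally exploratory."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def train(env, episodes=500):
    for _ in range(episodes):
        state = env.reset()                              # step 1: observe S_t
        done = False
        while not done:
            action = select_action(state)                # step 2: choose A_t
            next_state, reward, done = env.step(action)  # step 3: S_{t+1}, R_{t+1}
            # Step 4 (Q-learning): move Q(S_t, A_t) toward R + gamma * max_a Q(S_{t+1}, a)
            target = reward + (0.0 if done else GAMMA * max(Q[next_state]))
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = next_state
```

A DQN replaces the table Q with a neural network trained toward the same target, which is what
lets the approach scale to the large state spaces discussed in the Introduction.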
6.3 DRL Agent Architecture (Block Diagram)
The diagram's components: the agent (the traffic signal controller) observes the state of its
environment (the intersection and nearby traffic conditions); a deep neural network maps this
state to action values or a policy; the selected action is applied to the signals; and the
resulting reward feeds back to update the network.
6.4 DRL Algorithms Used in TSC
Different DRL architectures and methods are used depending on the complexity of the
environment, ranging from value-based methods such as the Deep Q-Network (DQN) used in this
study to policy-gradient approaches suited to larger action spaces.
State Representations:
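Typical encodings in the surveyed work build a vector from quantities such as per-lane queue
lengths, vehicle positions, and the current signal phase. As a small illustration (the exact
layout is an assumption, not the paper's scheme), one common choice concatenates queue lengths
with a one-hot encoding of the active phase:

```python
import numpy as np

def encode_state(queue_lengths, phase, n_phases=4):
    """Concatenate per-lane queue lengths with a one-hot of the current phase."""
    one_hot = np.zeros(n_phases)
    one_hot[phase] = 1.0
    return np.concatenate([np.asarray(queue_lengths, dtype=float), one_hot])

# encode_state([3, 0, 5, 1], phase=2) -> array([3., 0., 5., 1., 0., 0., 1., 0.])
```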
Actions:
A.1: Select the next traffic phase (e.g., go straight, turn left).
A.2: Keep or switch current phase, adjust phase duration.
Reward Functions:
A good reward function is critical—it should reflect the real-world performance goals (e.g.,
minimizing delay or maximizing throughput) and guide the agent toward learning effective
strategies.
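As a concrete illustration (an assumption for this sketch, not the report's exact definition),
one common choice rewards the agent for the reduction in cumulative waiting time between
decisions, with a penalty on the remaining queue:

```python
def compute_reward(prev_total_wait, curr_total_wait, queue_length, w_queue=0.5):
    """Positive when the last action reduced accumulated waiting time;
    the queue term discourages letting any approach starve."""
    return (prev_total_wait - curr_total_wait) - w_queue * queue_length
```

Using the change in delay rather than its absolute value keeps rewards roughly zero-centered,
which tends to stabilize training.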
Simulation tools such as SUMO, VISSIM, and AIMSUN are used to train and evaluate DRL agents in
realistic traffic environments. These simulators allow integration with DRL frameworks via
Python APIs (e.g., TraCI for SUMO), enabling closed-loop control experiments and data collection.
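A minimal closed-loop sketch using TraCI is shown below. The config file "cross.sumocfg", the
traffic-light ID "tls0", the lane IDs, and the phase indices are placeholders for whatever
network is simulated; the queue-comparison policy is a stand-in for a trained DRL agent.

```python
import traci

LANES = ["north_in_0", "east_in_0", "south_in_0", "west_in_0"]

traci.start(["sumo", "-c", "cross.sumocfg"])   # launch SUMO headless
for step in range(3600):                       # one simulated hour (1 s steps)
    traci.simulationStep()                     # advance the simulation
    # Observe: halted vehicles per incoming lane (a simple queue-length state)
    queues = [traci.lane.getLastStepHaltingNumber(l) for l in LANES]
    # Act: give green to the busier axis (phases 0 and 2 assumed to be the
    # north-south and east-west green phases in this network)
    ns, ew = queues[0] + queues[2], queues[1] + queues[3]
    traci.trafficlight.setPhase("tls0", 0 if ns >= ew else 2)
traci.close()
```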
Distributed multi-agent DRL is increasingly preferred due to its robustness and scalability in
large urban networks.
7. Results
To evaluate the effectiveness of Deep Reinforcement Learning (DRL) in Traffic Signal Control
(TSC), researchers have conducted extensive simulations using platforms such as SUMO,
VISSIM, and AIMSUN. These studies compare DRL-based systems against traditional fixed-
time, semi-dynamic, and actuated control systems using several key performance indicators
(KPIs). The consistent outcome: DRL-based TSC models outperform conventional methods
across multiple metrics.
Average Waiting Time: mean time vehicles wait at an intersection before passing through.
Queue Length: number of vehicles lined up during red signal periods.
Throughput: number of vehicles that pass through an intersection in a given time period.
Delay Time: time lost compared to ideal travel conditions (a "green wave").
Green Time Utilization: percentage of the green signal duration that is effectively used.
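For reference, here is a hedged sketch of how two of these KPIs could be measured during a
SUMO/TraCI run; the induction-loop detector ID is a placeholder that would be defined in the
simulation's additional files.

```python
import traci

def measure_kpis(exit_detector="exit_loop_0"):
    """Average waiting time over current vehicles, plus per-step throughput."""
    vehicles = traci.vehicle.getIDList()
    waits = [traci.vehicle.getAccumulatedWaitingTime(v) for v in vehicles]
    avg_wait = sum(waits) / len(waits) if waits else 0.0
    # Throughput proxy: vehicles that crossed an exit induction loop this step
    throughput = traci.inductionloop.getLastStepVehicleNumber(exit_detector)
    return avg_wait, throughput
```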
Notable Model Outcomes
(Table: notable outcomes of individual DRL models from the surveyed studies.)
8. Conclusion
Deep Reinforcement Learning has demonstrated significant potential in optimizing traffic signal
control, outperforming traditional methods in adaptability and performance. The ability of DRL
agents to learn from the environment and optimize based on real-time data leads to reduced
congestion, better resource use, and improved commuter experience.
Future research may focus on multi-agent coordination, real-time sensor integration, and hybrid
model approaches combining DRL with rule-based systems.
9. Bibliography
Rasheed, F., Yau, K.-L. A., Noor, R. M., Wu, C., & Low, Y.-C. (2020). Deep Reinforcement
Learning for Traffic Signal Control: A Review. IEEE Access, 8, 208016–208045.
Mnih, V., Kavukcuoglu, K., Silver, D., et al. (2015). Human-level control through deep
reinforcement learning. Nature, 518, 529–533.
Genders, W., & Razavi, S. (2016). Using a deep reinforcement learning agent for traffic
signal control. arXiv preprint arXiv:1611.01142.