0% found this document useful (0 votes)
20 views13 pages

RL Report TEAM - 6

This case study explores the application of Deep Reinforcement Learning (DRL) for traffic signal control, highlighting its advantages over traditional systems in managing urban traffic congestion. The report details the methodology, implementation using the SUMO simulator, and results demonstrating DRL's effectiveness in reducing average waiting times and increasing throughput. Despite its potential, challenges such as scalability and real-world deployment remain, suggesting areas for future research.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views13 pages

RL Report TEAM - 6

This case study explores the application of Deep Reinforcement Learning (DRL) for traffic signal control, highlighting its advantages over traditional systems in managing urban traffic congestion. The report details the methodology, implementation using the SUMO simulator, and results demonstrating DRL's effectiveness in reducing average waiting times and increasing throughput. Despite its potential, challenges such as scalability and real-world deployment remain, suggesting areas for future research.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

Itgalpura, Rajanukunte, Bengaluru - 560064

School of Computer Science And Engineering


Reinforcement Learning CSE3011

A Case Study on Reinforcement Learning for Traffic


Signal Control

Submitted by:

STUDENT NAME ROLL NO

Kuruba Arun Kumar 20221LCG0005

Avinash N 20221LCG0004

Chaithra C 20221CSG0092

Section: 6CSG02

Year: 2025

Submitted to: Ms. Smitha S P


Asst. Professor

Date of Submission: 21/04/2025

1
Table of Contents

1 Cover Page 01

2 Table of Contents 02

3 Abstract 03

4 Introduction 04

5 Problem Analysis 05

6 Methodology 07

7 Results 11

8 Conclusions 13

9 Bibliography 13

2
Abstract

Urban traffic congestion is one of the most persistent problems in modern cities, leading to
increased travel delays, fuel consumption, and emissions. Traditional traffic signal control (TSC)
systems, such as fixed-time or actuated controllers, lack the flexibility to adapt in real time to
fluctuating traffic demands. Recent advancements in Artificial Intelligence, particularly Deep
Reinforcement Learning (DRL), offer a promising alternative. DRL combines the learning
capability of Reinforcement Learning (RL) with the powerful feature-extraction ability of Deep
Learning (DL) to enable adaptive and autonomous decision-making. This case study explores the
application of DRL for traffic signal control by analyzing agent-based learning models,
environment interactions, and reward-driven optimization strategies. The report synthesizes
methodologies from state-of-the-art literature, illustrates architectural models, and presents
performance comparisons using key traffic metrics such as average delay, queue length, and
throughput. The findings demonstrate DRL’s potential to outperform traditional systems and
pave the way for more intelligent and responsive urban traffic management solutions.is the
inefficiency of traditional traffic signal control (TSC) systems.

A detailed implementation using the SUMO simulator and Deep Q-Network (DQN) is presented
in this report. The RL model receives state information such as queue lengths and signal phases,
and selects actions like switching or extending signal phases to improve traffic flow. A carefully
designed reward function ensures the learning process aligns with traffic optimization goals.

Results from the simulation show that the RL-based system outperforms traditional fixed-time
and actuated signal control methods. The model adapts to changing traffic patterns, reduces
average waiting times, and increases throughput, making it a promising solution for modern
urban traffic systems. This case study highlights the potential of AI-driven traffic control in
building smarter and more efficient cities.

3
4. Introduction

The rapid pace of urbanization has led to a dramatic increase in the number of vehicles on city
roads, resulting in severe traffic congestion, especially at major intersections. This congestion
contributes not only to wasted commuter time and economic loss but also to increased fuel
consumption and air pollution. One of the critical factors contributing to traffic congestion is the
inefficiency of traditional traffic signal control (TSC) systems.

Conventional TSC systems are generally categorized into three types: fixed-time systems,
actuated systems, and adaptive systems. Fixed-time controllers rely on historical traffic data and
follow predefined schedules, which are often outdated and not reflective of real-time conditions.
Actuated systems use sensors to detect vehicles and respond accordingly but still suffer from
limited adaptability. Adaptive systems offer some level of flexibility but are typically based on
manually tuned rules and are not robust enough to handle highly dynamic environments.

To overcome these limitations, researchers have explored Artificial Intelligence (AI)-based


approaches that can dynamically adapt to real-time traffic conditions. Reinforcement Learning
(RL), a subset of AI, has emerged as a promising technique because of its ability to model
decision-making in complex, stochastic environments. In an RL framework, an agent learns
optimal actions by interacting with the environment and receiving feedback in the form of
rewards.

However, the application of traditional RL to traffic control faces significant challenges. One
major issue is the "curse of dimensionality"—as the state and action spaces grow (due to
multiple lanes, directions, and vehicle types), the number of possible state-action pairs increases
exponentially, making learning inefficient and computationally expensive.

This is where Deep Reinforcement Learning (DRL) becomes transformative. By integrating


deep neural networks into RL algorithms, DRL can approximate complex value functions and
policies more efficiently. It allows agents to learn from high-dimensional inputs like vehicle
positions, speeds, and historical patterns, making it well-suited for dynamic traffic environments.

The use of DRL in traffic signal control not only enables real-time adaptability but also allows
for more intelligent coordination across multiple intersections. Agents can learn to minimize
queue lengths, waiting times, and travel delays while maximizing throughput and fairness.
Through training in simulated environments and real-world-inspired traffic models, DRL-based
systems have demonstrated significant performance improvements over traditional methods.

This case study aims to explore the theoretical foundation, implementation architectures,
practical results, and future potential of DRL in traffic signal control. Drawing insights from
leading research, including the comprehensive review by Rasheed et al. (2020), this report
presents a structured understanding of how DRL is revolutionizing urban traffic management.

4
5. Problem Analysis

Urban traffic congestion is a multifaceted problem influenced by infrastructure limitations,


human behavior, and environmental dynamics. The exponential growth in vehicle ownership and
limited road space have intensified pressure on traffic management systems. At the core of this
issue lies the inefficiency of traditional traffic signal control systems, which often fail to cope
with dynamic and unpredictable traffic patterns.

5.1 Traditional Traffic Signal Control (TSC) Limitations

Traditional TSC systems operate under predefined rules and assumptions. While historically
effective, these systems now face growing challenges:

 Fixed-Time Control: Operates based on historical traffic data using fixed cycles and
phase durations. It assumes predictable traffic patterns, which rarely hold true during
peak hours, emergencies, or special events.
 Semi-Dynamic Control: Employs sensors (e.g., loop detectors) to detect vehicle
presence and adjust green phases accordingly. However, it still lacks the capability to
predict traffic trends or coordinate across intersections.
 Fully Dynamic Control: Uses more complex sensor networks and real-time data but
often depends on manually crafted rules or threshold-based logic, which can become
brittle and hard to tune for changing traffic behaviors.

These legacy systems are plagued by three primary problems that significantly affect
performance:

5.2 Key Traffic Control Issues

A. Inappropriate Traffic Phase Sequences

A traffic phase is a specific combination of green lights allowing non-conflicting vehicle


movements. When these phases follow a rigid sequence (like round-robin), they fail to adapt to
real-time needs. For instance, giving green to a lane with no cars while another is heavily
congested wastes valuable time and contributes to unnecessary delays.

B. Ineffective Phase Splits (Green Time Allocation)

The allocation of time to green signals (phase split) is often fixed or based on simple reactive
logic. When green time is too short, it increases queue length and waiting time. When too long, it
causes green idling—where lights stay green for empty lanes—resulting in wasted throughput at
other directions.

5
C. Lack of Coordination Across Intersections

Urban traffic networks are interconnected. Congestion at one intersection often propagates to
others. Traditional systems usually treat intersections independently, failing to share information
about downstream conditions (e.g., full lanes, blocked exits), leading to cross-blocking—a
scenario where vehicles cannot move forward due to occupied downstream segments, even when
the signal is green.

5.3 Technical Challenges for Intelligent Control

Despite the promise of intelligent systems, several challenges persist:

 High-Dimensional State Space: Each intersection may involve dozens of input


parameters (vehicle counts, speeds, waiting times, etc.), making it computationally
intensive to determine optimal actions.
 Uncertainty and Variability: Traffic flow is inherently stochastic. It can be influenced
by weather, road incidents, time of day, and unpredictable driver behavior.
 Scalability: Controlling multiple intersections in a city-scale network requires distributed
learning, coordination, and communication without overwhelming system resources.
 Real-Time Constraints: Decisions must be made within milliseconds. Any delay in
computation or data processing can reduce system effectiveness.

5.4 Why Traditional AI Falls Short

Even conventional Machine Learning (ML) approaches fall short in this domain:

 Supervised Learning requires labeled datasets and cannot adapt on the fly.
 Unsupervised Learning finds patterns but doesn’t optimize behavior.
 Rule-Based Systems are inflexible and hard to scale or tune for complex environments.

Thus, the real need is for a learning agent that can operate in real-time, adapt continuously,
and optimize control policies based on feedback from the environment—which is where
Deep Reinforcement Learning (DRL) enters the picture.

6
Methodology

This section explains the technical framework and operational flow of applying Deep
Reinforcement Learning (DRL) to Traffic Signal Control (TSC). The DRL-based methodology
models each traffic signal controller as an agent that interacts with its environment (the
intersection and nearby traffic conditions), learns optimal control strategies through trial and
error, and aims to minimize overall congestion and delay.

6.1 DRL Framework Overview

A typical DRL system is modeled as a Markov Decision Process (MDP) with the following
components:

 State (S): Current traffic condition (e.g., queue length, vehicle positions, current light
status).
 Action (A): Decisions made by the agent (e.g., switch to a different traffic phase, extend
green time).
 Reward (R): Feedback received based on the result of the action (e.g., reduced delay or
congestion).
 Policy (π): The strategy that maps states to actions, which the agent aims to optimize.
 Value Function (V): The expected cumulative reward for a state under a certain policy.

6.2 DRL Operational Cycle

1. The agent observes the current state StS_tSt (e.g., number of vehicles waiting).
2. It selects an action AtA_tAt (e.g., change or hold traffic phase) based on a policy
π(St)\pi(S_t)π(St).
3. The environment transitions to a new state St+1S_{t+1}St+1 and provides a reward
Rt+1R_{t+1}Rt+1.
4. The agent updates its policy to maximize the long-term reward using a learning algorithm
(like Q-learning or policy gradients).

7
6.3 DRL Agent Architecture (Block Diagram)

Components:

 Replay Memory: Stores past experiences to break correlation in training data.


 Q-Network: Predicts Q-values for each action in a given state.
 Target Network: Stabilizes training by using separate parameters from the Q-network,
updated less frequently.
 Experience Replay: Random batches from memory are used to train the agent,
improving learning stability.

8
6.4 DRL Algorithms Used in TSC

Different DRL architectures and methods are used depending on the complexity of the
environment:

DRL Algorithm Architecture Key Features


Value-based, stable with
DQN (Deep Q-Network) Fully Connected NN
replay memory
Handles visual state inputs
CNN-DQN Convolutional NN
(e.g., vehicle grids)
3DQN (Double Dueling Better stability and action
Split Streams
DQN) advantage estimation
Combines value-based and
A2C (Advantage Actor-Critic) Actor-Critic
policy-based learning

6.5 State, Action, and Reward Design

State Representations:

 Queue Length (S.1): Number of waiting vehicles per lane.


 Vehicle Position & Speed (S.5, S.6): Spatial distribution of vehicles.
 Current Phase & Signal Timers (S.2, S.3, S.4): Active light and time duration.

9
Actions:

 A.1: Select the next traffic phase (e.g., go straight, turn left).
 A.2: Keep or switch current phase, adjust phase duration.

Reward Functions:

 R.1: Reduction in average waiting time.


 R.2: Difference between outflow and queue length.
 R.3: Penalty for frequent or inefficient phase changes.

A good reward function is critical—it should reflect the real-world performance goals (e.g.,
minimizing delay or maximizing throughput) and guide the agent toward learning effective
strategies.

6.6 Simulation Platforms

Simulation tools are used to train and evaluate DRL agents in realistic traffic environments:

Platform Type Features


SUMO Microscopic Open-source, widely used, flexible APIs
AIMSUN Next Microscopic High-fidelity, real-world traffic modeling
VISSIM Microscopic Detailed behavioral simulation
Paramics Microscopic Used for scenario-specific evaluations

These simulators allow integration with DRL frameworks via Python APIs (e.g., TraCI for
SUMO), enabling closed-loop control experiments and data collection.

6.7 Single vs Multi-Intersection Control

 Single-Agent Control: One DRL agent controls a single intersection independently.


 Multi-Agent Systems: Each intersection has a local agent, with optional coordination
mechanisms (e.g., message passing, joint reward sharing).
 Centralized Control: A global agent controls multiple intersections, but suffers from
scalability issues.

Distributed multi-agent DRL is increasingly preferred due to its robustness and scalability in
large urban networks.

10
Results

To evaluate the effectiveness of Deep Reinforcement Learning (DRL) in Traffic Signal Control
(TSC), researchers have conducted extensive simulations using platforms such as SUMO,
VISSIM, and AIMSUN. These studies compare DRL-based systems against traditional fixed-
time, semi-dynamic, and actuated control systems using several key performance indicators
(KPIs). The consistent outcome: DRL-based TSC models outperform conventional methods
across multiple metrics.

Metric Description
Mean time vehicles wait at an intersection
Average Waiting Time
before passing through
Number of vehicles lined up during red signal
Queue Length
periods
Number of vehicles that pass through an
Throughput
intersection in a time period
Time lost compared to ideal travel conditions
Delay Time
(green wave)
Percentage of green signal duration that is
effectively used
Green Time Utilization

Metric Traditional TSC DRL-based TSC Improvement


Avg. Waiting Time 55s 25s 55%
Queue Length 20 vehicles 8 vehicles 60%
Intersection
750 cars/hr 1200 cars/hr 60%
Throughput

11
Notable Model Outcomes

 Wan et al.: Dynamic discount factor improved responsiveness.


 Tan et al.: Multi-goal reward function enhanced overall flow.
 Genders et al.: Visual state inputs using CNNs led to robust learning.
 Van der Pol et al.: Coordinated multi-agent systems scaled to grids.

7.6 Limitations Noted

 Training Time: Requires extensive simulation training time (tens of thousands of


episodes).
 Generalization: Trained models may not perform well when traffic distributions shift
significantly (e.g., due to an event or road closure).
 Transfer to Real-World Deployment: Most results are simulation-based; sensor noise
and real-world unpredictability can hinder performance.

12
Conclusion

Deep Reinforcement Learning has demonstrated significant potential in optimizing traffic signal
control, outperforming traditional methods in adaptability and performance. The ability of DRL
agents to learn from the environment and optimize based on real-time data leads to reduced
congestion, better resource use, and improved commuter experience.

However, challenges remain, including:

 Scalability to city-wide networks


 Training data realism
 Deployment and safety in the real world

Future research may focus on multi-agent coordination, real-time sensor integration, and hybrid
model approaches combining DRL with rule-based systems.

Bibliography

 Rasheed, F., Yau, K.-L. A., Noor, R. M., Wu, C., & Low, Y.-C. (2020). Deep Reinforcement
Learning for Traffic Signal Control: A Review. IEEE Access, 8, 208016–208045.

 Mnih, V., Kavukcuoglu, K., Silver, D., et al. (2015). Human-level control through deep
reinforcement learning. Nature, 518, 529–533.

 Genders, W., & Razavi, S. (2016). Using a deep reinforcement learning agent for traffic
signal control. arXiv preprint arXiv:1611.01142.

 SUMO: Simulation of Urban MObility – https://www.eclipse.org/sumo/

13

You might also like