Ansys RTR RL Presentation
2. Discover how reinforcement learning coupled with neural networks benefits from high fidelity sensing
• Train a reinforcement learning model to control vehicle speed and braking when approaching an obstacle
• Leverage raw I/Q data and range-Doppler maps to control the training process
Deep Reinforcement Learning
Some action…
Reinforcement Learning
• The agent, meant to perform a certain task, interacts with the environment through a
sequence of observations, actions and rewards.
Agent
Environment
• The action of the agent is interpreted in this environment as the actuation of the vehicle
pedals.
• The agent’s goal is to select actions in a fashion that maximizes cumulative future reward.
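The observation, action and reward loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `ToyBrakingEnv`, its one-dimensional dynamics, its reward values and the discount factor are hypothetical stand-ins, not the CARLA/RTR environment used in the presentation.

```python
class ToyBrakingEnv:
    """Hypothetical 1-D stand-in for the driving environment (not CARLA)."""

    def __init__(self, gap_m=50.0, speed_mps=10.0, dt=0.1):
        self.gap_m, self.speed_mps, self.dt = gap_m, speed_mps, dt

    def observe(self):
        # Observation: distance to the obstacle and ego speed.
        return (self.gap_m, self.speed_mps)

    def step(self, action):
        # Action = pedal actuation: 0 = brake, 1 = coast, 2 = throttle.
        accel = {0: -6.0, 1: 0.0, 2: 2.0}[action]
        self.speed_mps = max(0.0, self.speed_mps + accel * self.dt)
        self.gap_m -= self.speed_mps * self.dt
        collided = self.gap_m <= 0.0
        # Illustrative reward only: penalize collisions and standing still.
        reward = -10.0 if collided else (1.0 if self.speed_mps > 0 else -1.0)
        return self.observe(), reward, collided


def run_episode(env, policy, gamma=0.99, max_steps=200):
    """Roll out one episode and return the discounted cumulative reward."""
    total, discount = 0.0, 1.0
    obs = env.observe()
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total


def always_brake(obs):
    return 0


def always_throttle(obs):
    return 2
```

A learning agent would replace the fixed policies with one that is updated from the collected rewards; the loop itself stays the same.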
Vehicle Longitudinal Control
Open Simulation Interface
• The Open Simulation Interface (OSI) is a specification for interfaces between models
and components of a distributed simulation.
• OSI was also developed to address the emerging ISO 23150 standard, which defines a
standardized communication interface for real sensors.
Open Simulation Interface
RL Model ActionRequest*
Tool Chain Components
CARLA Terrain and Road Network (Town-02) & Driving Loops
Simulation Tool Chain – 3D Models
Number Plates
Tool Chain Implementation
Ansys Real Time Radar
• Ansys Real Time Radar (RTR):
‐ Physics-based and high-fidelity simulation.
‐ Multi-Bounce ray tracing and simulation fidelity.
‐ Captures physics beyond line-of-sight sensing.
‐ Support for a variety of electromagnetic material models.
• RTR Data output:
‐ Processed range-Doppler imagery.
‐ Receiver raw data (I/Q or I_real) from the ADC.
[Figure panels: processed range-Doppler data per Rx channel; raw I/Q Rx channel data (post-ADC)]
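The two RTR outputs are related by standard radar signal processing: a range-Doppler map is commonly formed from one channel's raw I/Q samples with a two-dimensional FFT. The sketch below shows that generic technique; it is not RTR's internal processing, and the array layout (pulses by samples), the Hann windows and the synthetic target are assumptions made for illustration.

```python
import numpy as np


def range_doppler_map(iq, n_range=None, n_doppler=None):
    """Magnitude range-Doppler map (dB) from one channel's I/Q samples.

    iq: complex array of shape (num_pulses, num_samples_per_pulse).
    """
    num_pulses, num_samples = iq.shape
    # Hann windows along both axes reduce sidelobes.
    window = np.outer(np.hanning(num_pulses), np.hanning(num_samples))
    x = iq * window
    # FFT along the samples of each pulse -> range bins.
    x = np.fft.fft(x, n=n_range, axis=1)
    # FFT across pulses -> Doppler bins, with zero Doppler shifted to the center.
    x = np.fft.fftshift(np.fft.fft(x, n=n_doppler, axis=0), axes=0)
    return 20.0 * np.log10(np.abs(x) + 1e-12)


# Synthetic single target placed exactly on a bin (illustrative values only).
num_pulses, num_samples = 64, 128
range_bin, doppler_bin = 20, 5
n = np.arange(num_samples)
m = np.arange(num_pulses)[:, None]
iq = np.exp(2j * np.pi * (range_bin * n / num_samples + doppler_bin * m / num_pulses))

rd = range_doppler_map(iq)
peak_doppler, peak_range = np.unravel_index(np.argmax(rd), rd.shape)
```

Passing `n_range` and `n_doppler` zero-pads the FFTs, which is one way a fixed output image size could be produced from a smaller sample grid.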
Real-Time Radar Modeling: Waveforms and Outputs
Frequency Modulated Continuous Wave (FMCW)
Common automotive radar waveform
Lower power than pulse-Doppler
Range offset caused by range-Doppler coupling (a Doppler shift appears as an apparent range shift)
[Figure: FMCW waveform, frequency vs. time. Chirps 1 to N sweep between fmin and fmax around fcenter, alternating between Tx 1 and Tx 2, over one Coherent Processing Interval (CPI)]
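The range offset mentioned for FMCW can be estimated with the textbook linear-chirp model, in which the beat frequency is the sum of a range term and the Doppler shift. The sketch below assumes that model; the function name and the example values are hypothetical, and the sign of the offset depends on the chirp direction and velocity convention, so only the magnitude is meaningful here.

```python
C = 299_792_458.0  # speed of light in m/s


def fmcw_range_offset(radial_speed_mps, center_freq_hz, bandwidth_hz, chirp_duration_s):
    """Apparent range shift (m) caused by Doppler within one linear chirp."""
    wavelength = C / center_freq_hz
    doppler_hz = 2.0 * radial_speed_mps / wavelength
    slope_hz_per_s = bandwidth_hz / chirp_duration_s
    # Beat frequency f_b = slope * 2R/c + f_d, so f_d is misread as extra range.
    return doppler_hz * C / (2.0 * slope_hz_per_s)


# Illustrative values (assumptions, not taken from the presentation).
offset_m = fmcw_range_offset(
    radial_speed_mps=20.0,
    center_freq_hz=77e9,
    bandwidth_hz=300e6,
    chirp_duration_s=40e-6,
)
```

The expression simplifies to speed × center frequency × chirp duration / bandwidth, so shorter chirps and wider bandwidths reduce the coupling.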
Pulse-Doppler Waveform
[Figure: pulse-Doppler waveform, frequency vs. time. Pulses 1 to N span fmin to fmax around fcenter, alternating between Tx 1 and Tx 2, over one Coherent Processing Interval (CPI)]
Radar Configuration
Parameter Value
numChannels 1
hpbwHorizDeg 140
hpbwVertDeg 30
centerFreq 76.5e9
bandWidth 300e6
numFreqSamples 200
cpiDuration 0.00979
numPulseCPI 250
rPixels 512
dPixels 384
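The configuration values above imply the radar's resolution and unambiguous limits through standard relationships. The sketch below is a back-of-the-envelope check, assuming centerFreq and bandWidth are in Hz, cpiDuration is in seconds, samples are complex, and the pulses are evenly spaced over the CPI; the function and key names of the result are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s

# Values copied from the configuration table.
config = {
    "centerFreq": 76.5e9,
    "bandWidth": 300e6,
    "numFreqSamples": 200,
    "cpiDuration": 0.00979,
    "numPulseCPI": 250,
}


def derived_radar_metrics(cfg):
    """Standard resolution and ambiguity estimates from the waveform parameters."""
    wavelength = C / cfg["centerFreq"]
    # Assumes evenly spaced pulses, so the pulse repetition interval is CPI / N.
    pri = cfg["cpiDuration"] / cfg["numPulseCPI"]
    range_res = C / (2.0 * cfg["bandWidth"])
    return {
        "wavelength_m": wavelength,
        "range_resolution_m": range_res,
        "max_unambiguous_range_m": range_res * cfg["numFreqSamples"],
        "velocity_resolution_mps": wavelength / (2.0 * cfg["cpiDuration"]),
        "max_unambiguous_speed_mps": wavelength / (4.0 * pri),
    }


metrics = derived_radar_metrics(config)
```

Under these assumptions the configuration works out to roughly 0.5 m range resolution over about 100 m, and roughly 0.2 m/s velocity resolution over about ±25 m/s.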
Real-Time Radar Sensor Integration with CARLA
CARLA Server (Unreal Engine, C++)
• World-state and actors’ update.
• Sensor rendering.
• Computation of physics.
• ...
Running in synchronous mode with a fixed fps rate.
Scalable client-server architecture.
[Diagram labels: Range-Doppler, Action]
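Running CARLA in synchronous mode with a fixed rate is done through the world settings of its Python API. The sketch below uses the `get_settings`, `apply_settings` and `tick` calls of `carla.World`; the helper names, the 20 fps value and the callback are assumptions, since the presentation does not state the rate or show the client code.

```python
def enable_synchronous_mode(world, fps=20):
    """Switch a CARLA world to synchronous mode with a fixed simulation step."""
    settings = world.get_settings()
    settings.synchronous_mode = True
    settings.fixed_delta_seconds = 1.0 / fps  # fps value is an assumption
    world.apply_settings(settings)
    return settings


def run_steps(world, num_steps, on_tick):
    """Advance the server one fixed step at a time; the client drives the clock."""
    for _ in range(num_steps):
        frame = world.tick()  # blocks until the server has simulated one step
        on_tick(frame)        # e.g. read sensor output, choose the next action


# Typical usage against a running CARLA server (not executed here):
#   import carla
#   client = carla.Client("localhost", 2000)
#   world = client.get_world()
#   enable_synchronous_mode(world, fps=20)
#   run_steps(world, 100, print)
```

In synchronous mode the server waits for each `tick`, which keeps sensor data and agent actions aligned to the same simulation step.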
DRL Architecture
• The DRL model is based on a deep Q-learning (DQN) model originally created to play
the Atari game “Space Invaders”:
‐ https://github.com/philtabor/Youtube-Code-Repository
• Updated Actions
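For readers unfamiliar with deep Q-learning, its two core pieces are epsilon-greedy action selection and the temporal-difference target used to train the Q-network. The sketch below shows the generic technique with NumPy arrays standing in for network outputs; it is not code from the referenced repository, and the function names and discount factor are assumptions.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def td_targets(rewards, next_q, dones, gamma=0.99):
    """Q-learning targets: r + gamma * max_a' Q(s', a').

    Terminal transitions (done=True) do not bootstrap from the next state.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    return rewards + gamma * np.max(next_q, axis=1) * (~dones)


rng = np.random.default_rng(0)
greedy_action = epsilon_greedy(np.array([0.1, 0.9, 0.3]), epsilon=0.0, rng=rng)
targets = td_targets(
    rewards=[1.0, -10.0],
    next_q=np.array([[0.5, 2.0], [3.0, 1.0]]),
    dones=[False, True],
    gamma=0.9,
)
```

The network is then trained to move its prediction for the taken action toward these targets.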
DQN Reward Function
• Reward Policy
if_obstacle     if_collision    target_distance   target_speed   ego_speed   reward
[True, False]   [True, False]   [m]               [m/s]          [m/s]       [-]
True            True            -                 -              -           -10
True            False           >20               >0             0           -5
True            False           >10               =0             =0          -1
False           False           -                 -              0           -1
False           False           -                 -              >0          +1
True            False           >=20              >0             >0          +1
True            False           <=20              >0             >0          +5
True            False           <=10              0              0           +3
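The reward policy can be read as an ordered rule lookup. The sketch below is one possible interpretation, not the presenter's implementation: it assumes speeds are non-negative magnitudes, that the first matching row wins where rows overlap (for example at exactly 20 m), and that any combination not listed in the table yields a neutral reward of 0.

```python
def reward(if_obstacle, if_collision, target_distance, target_speed, ego_speed):
    """Reward lookup following the table rows in order; the first match wins."""
    if if_collision:
        # Assumption: a collision without a detected obstacle is not in the table.
        return -10 if if_obstacle else 0
    if not if_obstacle:
        # Free road: keep moving.
        return 1 if ego_speed > 0 else -1

    target_moving = target_speed > 0
    ego_moving = ego_speed > 0
    if target_moving and not ego_moving and target_distance > 20:
        return -5  # stopped although the moving obstacle is still far away
    if not target_moving and not ego_moving and target_distance > 10:
        return -1  # stopped too far behind a stationary obstacle
    if target_moving and ego_moving:
        return 1 if target_distance >= 20 else 5
    if not target_moving and not ego_moving and target_distance <= 10:
        return 3   # stopped close behind a stationary obstacle
    return 0       # assumption: combinations not listed are neutral


# Example: following a moving car at 12 m while both vehicles move.
example = reward(True, False, target_distance=12.0, target_speed=5.0, ego_speed=5.0)
```

Writing the table as code makes the uncovered combinations explicit, which is where behavior not addressed by the reward function can arise.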
Supervised Learning
Simulation Tool Chain for Machine Learning
[Diagram labels: Scenario Definition, Scenario Variation, Scenario Generation, Labeled Data (“Ground Truth”), Neural Network Training, Inference Results]
Tool Chain Implementation
Ansys optiSLang
Labeled Range-Doppler Maps
Results and Summary
Results
Results After Training:
Avoid Car at Intersection and Following Stop
Results After Training:
Avoid False Alarms With Approaching Cars
Results After Training:
Handling Issues Not Addressed With Reward Function
Results After Training:
Avoiding Collision with Car at Intersection
Free Learning Resources at your Fingertips
Questions?
Jeff Blackburn
Senior Product Sales Manager – Ansys Autonomy
[email protected]
650-313-3649