
2024 2nd International Conference on Electrical Engineering and Automatic Control (ICEEAC)

Safe Navigation Based on Deep Q-Network Algorithm Using an Improved Control Architecture
Chetioui Marouane and Babesse Saad
Department of Electrical Engineering, Laboratory of Automatic (LAS)
University of Setif 1, Setif, Algeria
979-8-3503-4974-0/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICEEAC61226.2024.10576248

[email protected] [email protected]

Abstract— In the last decades, the application of deep reinforcement learning algorithms in autonomous vehicles (AVs) has become a popular research topic due to their effectiveness in controlling cars based on the current state of the environment. In this paper, we train an autonomous vehicle to make safe navigation decisions in traffic using a deep reinforcement learning technique and the Carla simulator. The study presents the integration of a depth camera, a segmentation camera, and a collision sensor to allow the vehicle to discover the environment and reach its destination without colliding with other road users. The model is tested on three different paths in the presence of other road users in the Carla simulator. Our results show that the proposed technique achieves good performance in terms of the estimated distance and heading angle over time, as well as in success rate.

Keywords—autonomous vehicle, deep Q-Network, safe navigation, CARLA

I. INTRODUCTION
In recent years, autonomous driving has emerged as a critical solution to transportation issues in urban areas, such as traffic congestion and accidents, and it is poised to revolutionize the way we travel in the future [1]. According to the World Health Organization (WHO), 1.3 million people die in traffic accidents each year [2], and most road traffic accidents are caused by human error. One of the proposed solutions is autonomous vehicles, which can navigate safely and comfortably in traffic. One of the most popular techniques used in autonomous vehicles is deep reinforcement learning (DRL). DRL has emerged as a powerful paradigm in artificial intelligence, particularly in the domain of self-driving cars: it combines deep learning techniques with reinforcement learning rules to enable agents to learn and make decisions in several types of environments. DRL plays a pivotal role in addressing the challenges associated with navigation, decision-making, and interaction with the environment. By leveraging neural networks to approximate value functions and policies, DRL enables autonomous vehicles to learn from raw sensor data (RGB cameras, collision sensors, lidar, radar, etc.) and make adaptive decisions over time.

In this paper, we develop a deep Q-network model that teaches the autonomous vehicle to make decisions to navigate in traffic and reach its final destination without colliding with other vehicles and pedestrians. The study includes a preprocessing step to reduce the state space: we use the segmentation camera, depth camera, and collision sensor provided by Carla to detect the obstacles in front of our agent and estimate their distance, so that the agent can take adaptive actions.

The paper is structured into five sections. The related works are presented in Section II. Section III provides a synopsis of DQN. Our suggested architecture, complete with actions, reward functions, and state space, is presented in Section IV. Our simulation and its outcomes are shown in Section V.

II. RELATED WORK

The combination of deep learning algorithms and reinforcement learning techniques enables robust control of self-driving cars in complex scenarios. In their study [3], the authors merged deep learning methods, specifically long short-term memory (LSTM) and convolutional neural networks (CNN), to forecast the steering angle of autonomous vehicles (AVs): a CNN extracts features from collected images, and an LSTM uses the previous features to predict the steering angle. Joseph et al. [4] presented a DQN-based reinforcement learning algorithm to guide an ego vehicle to a destination along a collision-free trajectory; experimental results indicated that their approach yielded favorable outcomes on the road.

Elallid et al. [5] proposed a DQN model to control an autonomous vehicle in the driving phase, using images captured by an RGB camera as the state space. The authors in [6] proposed a DRL algorithm to control an autonomous vehicle at an intersection with dense traffic, including pedestrians and other vehicles. Their reward function prioritized only the proximity between the ego vehicle and other vehicles and pedestrians, neglecting crucial factors such as the speed of the ego vehicle and its alignment with the road (φ).

Óscar et al. [7] presented a comparative study between DQN and DDPG for controlling an autonomous vehicle to navigate and follow a determined route, using visual features and driving features as the state space. The results show that their model is capable of reaching its destination, but they did not include other vehicles or traffic movement in the town. In [8], the authors presented an RL method to generate the steering angle of an AV to avoid sharp turns in the road. They achieved good results and their model is stable, but it excluded other road users, such as other vehicles, from the environment.

Among these previous works, some tested the model without adding traffic, while others designed the reward function without considering the ego vehicle's velocity and its direction with respect to the road, φ. Moreover, training the agent to reach the desired destination and avoid colliding with other vehicles or pedestrians without estimating the distance to the nearest obstacle is ineffective, because the reward function must contain all road and driving features.

For these reasons, we propose an RL model based on DQN that teaches the agent to make decisions to navigate safely in traffic and reach its desired destination. We include road features such as the distance d and the heading angle φ, and we also add driving features such as the state of the collision sensor and the velocity of the vehicle.
III. A BRIEF OVERVIEW OF DEEP Q-LEARNING

DQN, or Deep Q-Network, is an algorithm based on reinforcement learning that has been used in various domains such as robotics and gaming, and especially in autonomous vehicles. In this context, DQN can be used to train agents that make decisions based on input observations (states) to navigate and control the vehicle (actions): it learns a policy for driving decisions such as steering, acceleration, and braking from input sensor data such as cameras, lidar, and radar. The Q-learning algorithm uses the Bellman equation to iteratively update the value Q(s, a) of each state-action pair during the learning process. The Bellman equation is as follows:

Q(s_t, a_t) = Q(s_t, a_t) + λ [ r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]        (1)
where the hyperparameter λ is called the learning rate, γ is the discount factor, and r_t is the reward given to the agent; s_{t+1} is the next state after taking action a_t. The learning rate λ determines how much an update changes the current Q-value, while the discount factor γ determines the importance of future rewards in the update.
DQN uses a neural network to estimate the Q-value function: it receives the state as input and outputs the Q-values for all possible actions. To ensure the stability and effectiveness of the trained model and to prevent overfitting, DQN uses experience replay: it stores the experiences (s_t, a_t, r_t, s_{t+1}) in a replay buffer and samples mini-batches from it during training. After training, we obtain the optimal policy as follows:
π(s) = argmax_a Q(s, a)        (2)
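The experience replay buffer and the greedy policy of Eq. (2) can be sketched as follows; the buffer capacity (500 000) and mini-batch size (32) match the hyperparameters adopted later for the models (Table II), while q_network stands for any function that maps a state to a vector of Q-values and is a placeholder, not the paper's exact network:

import random
from collections import deque
import numpy as np

class ReplayBuffer:
    def __init__(self, capacity=500_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        # keep one experience tuple (s_t, a_t, r_t, s_{t+1}, done)
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        # draw a random mini-batch for one training step
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s_next, done = map(np.array, zip(*batch))
        return s, a, r, s_next, done

def greedy_action(q_network, state):
    # π(s) = argmax_a Q(s, a), the policy used after training (Eq. 2)
    return int(np.argmax(q_network(state)))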
IV. PROPOSED ARCHITECTURE

In this section we show how we guide the vehicle through traffic, from its initial position to its final destination within the CARLA environment, while ensuring collision avoidance.
A. State space

This term refers to the information received from the environment that forms the agent's inputs. In Fig. 1, we establish the trajectory that the vehicle must adhere to, from the start position to the desired destination, using waypoints in Carla. We use depth and segmentation images to detect relevant information about the environment, such as the distance of the obstacles from the agent.

Fig. 1. Three different trajectories followed by the car in the Carla simulator
In detail, our process entails preprocessing the segmentation image by removing the data corresponding to the opposing lane, since vehicles from that direction have no impact on our agent's performance. Subsequently, we utilize the depth image to create a distance map, helping us gauge the closest obstacle to our agent. Lastly, we isolate binary masks representing obstacle areas from the preprocessed segmentation images. Within these regions, we calculate the average distance values separately for pedestrians and vehicles. The minimum of these distances is the distance from the closest obstacle (d_obs). Fig. 2 shows the preprocessing steps.

Fig. 2. Preprocessing of segmentation and depth images extracting road features.
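A rough sketch of this preprocessing is given below. It assumes the segmentation image has already been converted to CARLA-style semantic tags (here 4 = pedestrian, 10 = vehicle) and the depth image to metres; the opposing-lane mask, the tag values, and the fallback range are illustrative assumptions rather than the paper's exact code:

import numpy as np

def closest_obstacle_distance(seg_tags, depth_m, own_lane_mask,
                              pedestrian_tag=4, vehicle_tag=10, max_range=1000.0):
    seg = np.where(own_lane_mask, seg_tags, 0)             # discard the opposing lane
    class_distances = []
    for tag in (pedestrian_tag, vehicle_tag):              # binary mask per obstacle class
        mask = (seg == tag)
        if mask.any():
            class_distances.append(depth_m[mask].mean())   # average distance in the region
    return min(class_distances) if class_distances else max_range   # d_obs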

Fig. 3. The extracted geometric information of the road

Fig. 3 shows the geometric information extracted about the road and the ego vehicle. For that, we use the planned trajectory P provided by Carla to compute the angle φ and the distance d between the ego vehicle and the center of its current lane, using the waypoints on the trajectory. The distance is computed as follows:

d = ||v|| sin( sgn(v × w) cos⁻¹( (v · w) / (||v|| ||w||) ) )        (3)

with w = p_{i+1} − p_i, v = p − p_i, and p the position of the vehicle. φ is the difference in yaw (rotation around the vertical axis) between the waypoint and the agent.
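A minimal sketch of this computation is shown below, assuming 2-D numpy positions and yaw angles in degrees; the function name and the yaw-wrapping convention are illustrative:

import numpy as np

def road_features(p, p_i, p_i1, vehicle_yaw_deg, waypoint_yaw_deg):
    w = p_i1 - p_i                                   # segment between consecutive waypoints
    v = p - p_i                                      # vehicle position relative to the waypoint
    cross = v[0] * w[1] - v[1] * w[0]                # sign tells which side of the lane centre we are on
    cos_theta = np.clip(v @ w / (np.linalg.norm(v) * np.linalg.norm(w) + 1e-8), -1.0, 1.0)
    d = np.linalg.norm(v) * np.sin(np.sign(cross) * np.arccos(cos_theta))   # Eq. (3)
    phi = (waypoint_yaw_deg - vehicle_yaw_deg + 180.0) % 360.0 - 180.0      # yaw difference wrapped to [-180, 180]
    return d, phi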
In our model, we define five states: the minimum distance to an obstacle (d_obs), the collision sensor state (coll), the orientation of the ego vehicle relative to the road (φ), the lateral distance to the centerline of the road (d), and the velocity of the ego vehicle.
B. Action space

The autonomous vehicle takes three control commands in the environment: acceleration, steering angle, and brake. The acceleration takes values in [0, 1], the steering in [-1, 1], and the brake in [0, 1]. To discretize the control of the autonomous vehicle, we use a brake action plus five driving actions; the table below summarizes the different actions.

TABLE I. ACTIONS AND THEIR DESCRIPTIONS AND VALUES

Action index | Action description  | Steering | Throttle | Brake
0            | Brake               |  0.0     | 0.0      | 1.0
1            | Fully forward       |  0.0     | 0.5      | 0.0
2            | Turn left           | -0.6     | 0.1      | 0.0
3            | Turn right          |  0.6     | 0.1      | 0.0
4            | Turn slightly left  | -0.1     | 0.4      | 0.0
5            | Turn slightly right |  0.1     | 0.4      | 0.0
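The discrete action set of Table I maps directly onto CARLA vehicle controls; the dictionary below is an illustrative sketch of that mapping (the (steering, throttle, brake) triplets come from the table, while the helper name is ours):

import carla  # CARLA Python API

ACTIONS = {
    0: (0.0, 0.0, 1.0),    # brake
    1: (0.0, 0.5, 0.0),    # fully forward
    2: (-0.6, 0.1, 0.0),   # turn left
    3: (0.6, 0.1, 0.0),    # turn right
    4: (-0.1, 0.4, 0.0),   # turn slightly left
    5: (0.1, 0.4, 0.0),    # turn slightly right
}

def apply_action(vehicle, action_index):
    steer, throttle, brake = ACTIONS[action_index]
    vehicle.apply_control(carla.VehicleControl(throttle=throttle, steer=steer, brake=brake))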
C. Reward

After the agent takes a specific action in a given situation, scalar feedback is provided to encourage positive actions and penalize negative ones. The car's objective is to learn a strategy that maximizes rewards. In our case, we have designed rewards for two models, braking and driving. The goal is to make the ego vehicle reach its final destination and go as fast as possible through the center of the lane, without leaving the lane and while avoiding collisions.
a) Braking reward

The reward function of the braking model should consider collision avoidance, the velocity of the vehicle, and the current distance from the closest obstacle. If the collision sensor detects a crash, a penalty of -400 is given; if the vehicle is stopped and the current distance is less than 150, a reward of 200 is given; and if the velocity of the vehicle (v) is less than 1 while the current distance from the obstacle (d_obs) is greater than 100, a penalty of -20 is given. The reward function of the braking model is defined as follows:

R(v, d_obs, coll) = −20·[v < 1]·[d_obs > 100] + 200·[v = 0]·[d_obs < 150] − 400·[coll > 0]        (4)

where [·] denotes the indicator function.
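Read literally, Eq. (4) can be sketched as the following reward routine; the thresholds follow the text above, and the distance and velocity units are taken as the authors use them:

def braking_reward(v, d_obs, coll):
    r = 0.0
    if v < 1 and d_obs > 100:    # creeping while the obstacle is still far: -20
        r -= 20
    if v == 0 and d_obs < 150:   # stopped close to the obstacle: +200
        r += 200
    if coll > 0:                 # collision detected: -400
        r -= 400
    return r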
b) Driving reward

The reward function of the driving model is based on the distance d, the angle φ, and the collision sensor coll. The reward given to the driving model is defined as follows:

R(d, φ, coll) = R(s, π′(s)) − 20·[|d| > 2] − 200·[|d| > 3]·[|φ| > 80] − 400·[coll > 0]        (5)

where R(s, π′(s)) is a reward given to the agent if it follows a predetermined sub-optimal policy π′(s).
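Reading the product of indicators in Eq. (5) as a logical AND, a sketch of the driving reward is given below; sub_policy_reward stands for the term R(s, π′(s)) earned by following the predetermined sub-optimal policy, which is passed in as a value since its computation is not detailed here:

def driving_reward(d, phi, coll, sub_policy_reward):
    r = sub_policy_reward            # R(s, π'(s))
    if abs(d) > 2:                   # drifting away from the lane centre
        r -= 20
    if abs(d) > 3 and abs(phi) > 80: # far off the lane and badly misaligned
        r -= 200
    if coll > 0:                     # collision detected
        r -= 400
    return r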

D. DQN Models

As mentioned above, we use two models, a driving model and a braking model. The hyperparameters adopted for the models are listed in Table II.

TABLE II. ADOPTED HYPERPARAMETERS IN THE MODELS

Parameter          | Value
Learning rate      | 0.0001
Episodes           | 100
Batch size         | 32
Discount factor γ  | 0.99
Epsilon ε          | 0.7
Replay memory size | 500000

a) Braking model

The braking DQN model takes the actual values of the distance from the closest obstacle d_obs, the velocity of the ego vehicle, and the state of the collision sensor coll as inputs, and outputs Q-values for two actions: brake (0) or drive (1). The model is trained in a straight-road scenario in which a car is randomly placed at some distance from our agent, so that the agent learns to stop near the car without any collision. We trained the ego vehicle for 100 episodes and observed that the overall reward increases and reaches a value of around 500 (Fig. 4a). This shows that the ego vehicle was able to brake and throttle appropriately in the different situations. The architecture of the braking model and the curve of the total reward are shown in Fig. 4.

Fig. 4. (a) Total reward as a function of episodes. (b) The architecture of the braking model.
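The paper does not list the layer sizes of the two networks, so the following is only an assumed template: a small fully connected Q-network that maps the 3-dimensional braking state (d_obs, v, coll) to 2 Q-values, and that can be reused with five outputs for the driving model described next:

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        # state: tensor of shape (batch, state_dim) -> Q-values of shape (batch, n_actions)
        return self.net(state)

braking_q = QNetwork(state_dim=3, n_actions=2)   # inputs: d_obs, v, coll
driving_q = QNetwork(state_dim=3, n_actions=5)   # inputs: coll, d, φ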

b) Driving model

The driving model takes the state of the collision sensor coll, the signed distance from the centerline of the road d, and the orientation of the car relative to the road φ as inputs, and outputs Q-values for five actions (Fig. 5b). We spawned the car on the road and trained it to follow a predefined path and avoid collisions. Post-training analysis reveals a continuous increase in the total reward, as illustrated in Fig. 5a; this shows that the car learns to follow the trajectory and avoid collisions. The architecture of the driving model and the curve of the total reward are given in Fig. 5.

Fig. 5. (a) Total reward as a function of episodes. (b) The architecture of the driving model.

E. Final architecture

Now we combine the architectures of the previous two models, braking and driving, as shown in Fig. 6.

Fig. 6. The final combined architecture
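The text does not spell out how the two outputs are merged, so the gating rule below is an assumption: the braking model acts as a safety gate, and the driving model picks the manoeuvre otherwise. braking_q and driving_q stand for the two trained models, taken here as callables that return arrays of Q-values:

import numpy as np

def select_action(braking_q, driving_q, d_obs, v, coll, d, phi):
    q_brake = np.asarray(braking_q([d_obs, v, coll]))
    if int(np.argmax(q_brake)) == 0:             # braking model output 0 = brake
        return 0                                 # Table I index 0: full brake
    q_drive = np.asarray(driving_q([coll, d, phi]))
    return 1 + int(np.argmax(q_drive))           # map to driving actions 1..5 of Table I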
V. FINDINGS

In this section, we show the performance of our model using the Carla simulator, where we train the ego vehicle on three different trajectories in order to navigate safely and follow the predefined path. During training, we record the distance d and the direction angle φ, and we also count the number of collisions with road users. Given that our model's primary objective is to minimize collisions, the distance, and the directional angle, the results of the recorded information and of the collisions are presented below.

Fig. 7. Actual vs desired trajectory (first trajectory)

Fig. 8. Actual vs desired trajectory (second trajectory)

Fig. 9. Actual vs desired trajectory (third trajectory)

Fig. 10. Distance d of the first trajectory (trajectory 1)

Fig. 11. Direction angle φ of the first trajectory (trajectory 1)

Fig. 12. Distance d of the second trajectory (trajectory 2)

Fig. 13. Direction angle φ of the second trajectory (trajectory 2)

Fig. 14. Distance d of the third trajectory (trajectory 3)

Fig. 15. Direction angle φ of the third trajectory (trajectory 3)

TABLE III. STATISTICS OF THE THREE TRAJECTORIES

Trajectories | Trajectory Distance (m) | Total Simulation Time (s) | Collisions with road users | Success Rate (%)
Trajectory 1 | 229    | 333.6  | 2 | 90
Trajectory 2 | 248.5  | 350.2  | 3 | 85
Trajectory 3 | 200    | 290.88 | 1 | 95
Average      | 225.83 | 324.89 | 3 | 90

As we can see in Table III, when we test our agent on the three trajectories for a total of 60 runs, we achieve a success rate of 90%; successful and failed runs for the three trajectories (videos 1, 2, 3, 4, 5) are shown in [9]. On the other hand, we observe that the vehicle tries to follow the desired path and to reduce the values of |d| and |φ|, which reflects the effectiveness of our agent in following the predetermined path and reaching its final destination.
VI. CONCLUSION

In this paper, we presented a robust DQN model capable of dealing with complex traffic and reaching its final destination safely. We used two DQN models, driving and braking, to help the vehicle make the right decision in each state. We applied a preprocessing step in order to remove useless information and reduce the state space. Our model empowers the agent to evade collisions with various road users, including vehicles, motorcycles, and pedestrians. The results underscore the learning process of our AV, showcasing its adeptness at navigating through traffic scenarios.
REFERENCES

[1] C.-Y. Chan, "Advancements, prospects, and impacts of automated driving systems," Int. J. Transp. Sci. Technol., vol. 6, no. 3, pp. 208–216, 2017.
[2] World Health Organization, "Road traffic injuries," https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries, 2021.
[3] M.-j. Lee and Y.-g. Ha, "Autonomous driving control using end-to-end deep learning," in 2020 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE, 2020, pp. 470–473.
[4] J. Clemmons and Y.-F. Jin, "Reinforcement learning-based guidance of autonomous vehicles," in 2023 24th International Symposium on Quality Electronic Design (ISQED), San Francisco, CA, USA. IEEE, 2023, pp. 1–6.
[5] B. Ben Elallid, N. Benamar, N. Mrani and T. Rachidi, "DQN-based reinforcement learning for vehicle control of autonomous vehicles interacting with pedestrians," in 2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). IEEE, 2022, pp. 1–5.
[6] B. Ben Elallid, M. Bagaa, N. Benamar and N. Mrani, "A reinforcement learning based approach for controlling autonomous vehicles in complex scenarios," in 2023 International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco. IEEE, 2023, pp. 1–7.
[7] Ó. Pérez-Gil, R. Barea, E. López-Guillén, L. M. Bergasa, C. Gómez-Huélamo, R. Gutiérrez and A. Díaz-Díaz, "Deep reinforcement learning based control for autonomous vehicles in CARLA," Multimedia Tools and Applications, vol. 81, no. 3, pp. 3553–3576, 2022.
[8] J. Chen, C. Zhang, J. Luo, J. Xie, and Y. Wan, "Driving maneuvers prediction based autonomous driving control by deep Monte Carlo tree search," IEEE Transactions on Vehicular Technology, 2020.
[9] https://drive.google.com/drive/folders/1rqY4uQllVueomV_fEozMPGdzR5tgZIk6?usp=drive_link
[10] G. Nehme and T. Y. Deo, "Safe Navigation: Training Autonomous Vehicles using Deep Reinforcement Learning in CARLA," arXiv, 2023.
