
Autonomous Driving Agent in Carla using DDPG


Ghanshyam S. Nair, MMV Sai Nikhil, Aniket Sarrin
Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa
Vidyapeetham, India
[email protected] , [email protected] , [email protected]

Abstract—When driving a car, it is important to make sure it consistently moves toward the parking spot, achieves a good heading angle, and prevents major losses brought on by line pressure. It is suggested to use deep reinforcement learning to create an autonomous parking model. To determine the various states of its movement, a parking kinematics model is constructed. A thorough reward function is created to take into account the emphasis on movement and security in the various parking stages. Steering angle and displacement are employed as actions to achieve engagement. The automatic parking of the vehicle is made possible by training, and the many steps and circumstances involved in parking are thoroughly examined. Additionally, a subsequent generalisation experiment demonstrated the model's strong generalizability.

Index Terms—CARLA, secure, autonomous

I. INTRODUCTION

Using automated technologies can help reduce traffic accidents and disasters. As 94 percent of risky driving is caused by the actions or mistakes of the driver, self-driving vehicles can significantly minimize it. A driver's tendency for dangerous and harmful driving behaviours may decline as autonomy levels rise. Autonomous vehicles must pass a number of tests to ensure their safe operation in everyday situations and to prevent fatal collisions. However, conducting such research on the road is difficult because it is expensive and risky. Simulated scenario testing is a well-known practice in the automotive industry. For example, the situations utilized in the past to instruct Automatic Braking Systems are inadequate for training driverless cars. In essence, incredibly complex exercises are required to teach self-driving cars to behave like humans. In this study, we used a self-driving simulation to examine the multi-contextual perception problem of the simulation model. AI is evolving through AV design, addressing processes such as detection. The ego-vehicle may use Multi-Object Tracking (MOT), scenario forecasting, or evaluation of the current driving conditions to decide the safest course of action. Recently, approaches based on Deep Reinforcement Learning were used to solve Markov Decision Processes (MDPs), which aim to determine the best course of action for an agent to adopt in order to maximise a reward function. These techniques have produced positive outcomes in fields like gaming and simple decision-making tools. To learn how to utilise the AV's view inside and around the car, Deep Reinforcement Learning techniques are developed for autonomous vehicles. The "Deep" prefix in DRL refers to a deep function approximator that enables the agent—in this case, the self-driving car—to generalise the value of states that it has only rarely or never witnessed by drawing on the values of related states. This method replaces tabular methods of forecasting state values with DRL-based algorithms. The potential to handle some of the most challenging issues in autonomous automobiles, including scheduling and judgement, has been demonstrated in this area. Deep Deterministic Policy Gradient (DDPG) is one of the Deep Reinforcement Learning (DRL) algorithms, alongside variants such as TD3. The primary methodology for the subjects of our forthcoming research is DDPG.

LIMITATIONS:

Autonomous driving (AD) is a challenging area for the application of deep learning (ML). It is challenging to provide the appropriate behavioural outputs for a driverless car system because "excellent" driving is already subject to arbitrary interpretation. It can be challenging to choose the right input properties for learning. There are many challenges involved in achieving such a sophisticated ideal in the context of autonomous driving. Due to the complexity of driving, such as the wide variations in traffic jams and human involvement, as well as the need to balance efficiency (speed), elegance (smoothness), and danger, self-driving in urban contexts is still a major concern today. For autonomous driving to work, decisions must be made in fluid, erratic environments. Uncertainty in the prediction is caused by imperfect data and a failure to accurately assess human drivers' intentions. The other cars are intended to function as hidden variables in this problem, which is modelled as a partially observable Markov decision process (POMDP).

II. LITERATURE SURVEY

A. EXPERT DRIVERS' DATA-BASED METHOD

Typically, the data-based approaches for imitating experienced drivers' parking behaviours make use of supervised learning techniques like neural networks. The network receives information about the surrounding environment as inputs, such as visual information, location, and vehicle condition, and produces the necessary driver action [3], [4]. Due to the neural network's limited ability to generalise, a significant amount of training data is necessary to adequately represent the target working scenarios. Additionally, the quality of the parking data has a significant impact on the functioning of the network. As a result, the information must be of a high calibre. Expert data sets, however, are frequently expensive and time- and labour-intensive. Additionally, the effectiveness of algorithms trained in this way will be constrained by the expert data [5]. It is therefore challenging to attain ideal multi-objective driving effectiveness because the computer could only approximate but not match expert efficiency.

B. PRIOR KNOWLEDGE-BASED METHOD

A strategy based on prior knowledge involves abstracting human driving history into foreknowledge and using it to direct the planning process. It is separated into fuzzy logic control, heuristic search, and geometric approaches. The geometric method's various curve types—including the Reeds-Shepp (RS) curve, clothoid curve, Bezier curve, spline curve, and polynomial curve [12]—as well as the heuristic functions used in search algorithms like A* and the fuzzy rules used in fuzzy inference methods, all draw inspiration from real-world parking circumstances. The driving environment is biased in and of itself. As a result of the knowledge lost through abstraction, achieving the best multi-objective driving efficiency is more difficult.

C. REINFORCEMENT LEARNING-BASED METHOD

As was already said, approaches based on expert data and foreknowledge primarily draw from or abstract from human experience, which calls for a significant amount of high-quality parking data. The performance of the system will be constrained even when professional data sets are accessible. Reinforcement learning systems, on the other hand, are educated by their own history, which theoretically enables them to outperform humans [5]. Through engagement between the agent and the environment, reinforcement learning learns tactics. Model-based methods and model-free methods are the two primary divisions of reinforcement learning, depending on whether it is essential to represent the environment. In the model-based approach, a strategy to maximise the cumulative reward is discovered after obtaining a state transition probability matrix and reward function. In contrast, the model-free method directly estimates the value Q(s, a) of the action a taken in the state s, and then selects the action with the highest estimated return value to be executed in each state.
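As a toy illustration of this model-free idea (a sketch for intuition only, not code from this work), a tabular agent can estimate Q(s, a) from interaction and act greedily on it:

from collections import defaultdict

# Tabular sketch: Q[(state, action)] holds the estimated return of taking
# `action` in `state`; the agent acts greedily on these estimates.
Q = defaultdict(float)
ACTIONS = ["steer_left", "straight", "steer_right"]

def greedy_action(state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    # One-step Q-learning update from a single environment transition.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])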
There has not been much study on using reinforcement learning to solve vehicle planning problems. Zhang et al. [16] used deep deterministic policy gradient [17], a reinforcement learning technique, to solve the perpendicular parking problem. Before moving the trained agent to the actual vehicle to continue training, they initially train the agent in the simulated environment. The speed policy is likewise simplified to a single fixed command, and the work only concentrates on the last parking segment of the two phases in perpendicular parking. Some researchers used model-free reinforcement learning techniques to study other autonomous driving scenarios, including lane change decisions and lane-keeping assistance (LKA). These techniques include Actor-Critic, deep Q-Network (DQN), DDPG, and others. The environment need not be modelled in the model-free approach. Nevertheless, it is only taught by actual agent-environment interaction, which has a low learning effectiveness. To learn how to play video games, Kaiser et al. [26] compared the learning effectiveness of their suggested model-based strategy with two cutting-edge model-free methods: Rainbow [27] and proximal policy optimization (PPO) [28]. According to the results, the suggested model-based method beats the model-free methods in terms of learning speed for almost all of the situations, and exceeds them by an order of magnitude for a few games. The range of steering angle change is also limited, regardless of whether the research discussed above is for the final portion of vertical parking or for high-speed situation decisions such as lane changes and LKA. In this work, parallel parking—generally thought to be more challenging than perpendicular parking—is presented. There is a pressing requirement for speedy learning because the steering wheel in this setting has a wide variety of input angles.

III. SYSTEM ARCHITECTURE

A. Reinforcement Learning

Self-driving cars will likely become a common type of vehicle and symbolize future transportation within the next ten years. For this to work, the vehicle must be secure, reliable, and comfortable for the user. Driverless cars must be extremely good mediators when making right or left movements if they are to progress in crowded places. Reinforcement learning is believed to be the primary mechanism for teaching driving rules. We provide an improved Q-learning-based reinforcement learning method that makes the most of a large state space. For our research on driverless autos, we only use the open-source simulator CARLA. This study simulates a real-world situation in which a robot provides data to attempt to resolve issues.

Fig. 1. Implementation Architecture

B. Deep Q-Network

DDPG: The DDPG (Deep Deterministic Policy Gradient) method is a policy-gradient learning technique that can be applied to problems involving continuous action vectors and high-dimensional state spaces. Actor-critic, the central component of DDPG, is made up of a value network (critic) and a policy network (actor). The deterministic policy network which DDPG employs, and which it updates with policy gradients, has several advantages: it enables the agent to manage continuous, high-dimensional action spaces. In order to increase data utilisation, an experience replay buffer is used, making it possible to perform separate updates for the target networks and the current networks. This facilitates the convergence of the strategy. The target networks employ the soft update technique, which increases the stability of training. DDPG uses a neural network to approximate the action-value function; this network is also known as the critic net. Its inputs are the action and state values [a, s], and its output is Q(s, a). There is also a neural network to approximately represent the policy function; this actor network is also called the plan network. It specifically takes a state as input and produces an action as output. The gradient descent approach is used in the critic network to reduce the loss and update the connection weights. The specific strategy is as follows: mini-batch learning requires the use of an experience replay buffer because DDPG requires that samples be independently and identically distributed. Additionally, in imitation of DQN's separate target network, DDPG divides both the actor and the critic networks into a target network and a current network. The soft update method also significantly increases the stability of training.

Fig. 2. Car approaching the target and then reaching the target
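As a minimal sketch of the soft (target-network) update described here, assuming the weights are handled as lists of numpy arrays (for example, what Keras's get_weights() returns); the helper name is illustrative, and tau matches the value reported in the hyper-parameter list later in the paper:

import numpy as np

def soft_update(target_weights, online_weights, tau=0.005):
    # Polyak averaging: each target weight moves a small step (tau) toward
    # the corresponding online weight, which stabilises DDPG training.
    return [tau * w_online + (1.0 - tau) * w_target
            for w_online, w_target in zip(online_weights, target_weights)]

# Example usage with Keras models:
# target_critic.set_weights(soft_update(target_critic.get_weights(), critic.get_weights()))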

C. Environment

CARLA: CARLA has been built from the ground up to facilitate the creation, training, and testing of driverless systems. CARLA offers open digital content (urban layouts, buildings, and vehicles) in conjunction with the tools and protocols that were developed for this purpose. The simulation platform offers customizable weather conditions, full control over all static and dynamic actors, the creation of maps, and many other features.

There are two obstacles surrounding the parking place, and the project's simulation of the parking environment is vertical. The figure below depicts the construction of the parking-environment reference system, using the base as the centre of the parking spot.

Fig. 3. Vertical Parking Environment
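A minimal sketch of how an agent typically connects to CARLA through its Python API; the host, port, vehicle blueprint, and spawn point below are placeholders, since the exact setup used in this project is not specified here:

import random
import carla

# Connect to a locally running CARLA server (default port 2000).
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Spawn an ego vehicle at one of the map's predefined spawn points.
blueprint_library = world.get_blueprint_library()
vehicle_bp = blueprint_library.filter("vehicle.tesla.model3")[0]
spawn_point = random.choice(world.get_map().get_spawn_points())
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

# Apply a control command (throttle and steering), as the agent would at each step.
vehicle.apply_control(carla.VehicleControl(throttle=0.5, steer=0.0))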

IV. RL MODEL FOR THE APPLICATION

Environment: CARLA - developed from the ground up to support development, training, and validation of autonomous driving systems.
Action Space: 1 or 2 - depending on the mode set by the user (throttle only, or steer and throttle).
State Space: 16
RL algorithm / policy: Soft Actor-Critic
Actor: state space - 16-512-256-Tanh - 1 or 2 action outputs.
Critic: action space - 2-64-128 - OUT 1; state space - 16-64-128 - OUT 2; OUT 1 and OUT 2 are concatenated into 256 neurons, which then feed 1 dense output.
Noise addition: Ornstein-Uhlenbeck process.
Replay buffer: buffer size – 100000, with 100 steps in each buffer.

A. Reward Structure

There are 2 kinds of reward structures used in this project, varied in penalty and formulation.
Angle penalty - cos(a).

Fig. 4. Angular Penalty Structure
Fig. 5. Constructed Reward Function

Throttle penalty - the reward can be calculated using a linear or Gaussian methodology. Linear methodology: [max(dis) - curr(dis)] / [max(dis) - prev(dis)]. Gaussian methodology: max(reward) * exp(-dis^2 / (2 * sigma^2)).
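The two penalty terms above can be sketched as follows (illustrative helper functions only; the paper does not give its exact code, and the sigma default is a placeholder):

import numpy as np

def angle_penalty(heading_error):
    # cos(a): largest when the car points straight at the parking spot.
    return np.cos(heading_error)

def throttle_reward_linear(curr_dist, prev_dist, max_dist, eps=1e-6):
    # [max(dis) - curr(dis)] / [max(dis) - prev(dis)]; eps avoids division by zero.
    return (max_dist - curr_dist) / (max_dist - prev_dist + eps)

def throttle_reward_gaussian(curr_dist, max_reward=1.0, sigma=1.0):
    # max(reward) * exp(-dis^2 / (2 * sigma^2)): peaks at the centre of the spot.
    return max_reward * np.exp(-curr_dist ** 2 / (2 * sigma ** 2))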

Fig. 6. Critic Model

Moving target input - the concatenated reward + gamma * (target critic). Distance is calculated using the Euclidean method.

Fig. 8. Actor Model
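A sketch of that moving-target computation, assuming Keras-style target networks (target_actor, target_critic) that map states, and [state, action] pairs, to tensors; the names are illustrative:

import numpy as np

def critic_target(reward, next_state, target_actor, target_critic, gamma=0.99):
    # y = r + gamma * Q'(s', mu'(s')): the "moving" target the critic is trained towards.
    next_action = target_actor(next_state)
    return reward + gamma * target_critic([next_state, next_action])

def euclidean_distance(car_xy, target_xy):
    # Straight-line distance from the car to the centre of the parking spot.
    return float(np.linalg.norm(np.asarray(car_xy) - np.asarray(target_xy)))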

Fig. 7. System Architecture

B. Model Structure

Final Actor Structure: the input layer has the state space as its input size. State space - 16-512-256-Tanh - 1 or 2 action outputs.
Final Critic Structure: this structure has 2 different neural networks combining to give a single output. One input layer has the action space as its input size while the other has the state space.
Action space - 2-64-128 - OUT 1
State space - 16-64-128 - OUT 2
Concatenate OUT 1 and OUT 2 to give 256 concatenated neurons, which then give out 1 dense output.
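Assuming the networks are built with Keras (the framework is not stated here) and ReLU hidden activations, the layer sizes above could be realised roughly as follows:

import tensorflow as tf
from tensorflow.keras import layers

STATE_DIM, ACTION_DIM = 16, 2   # from the state and action spaces listed above

# Actor: 16 -> 512 -> 256 -> tanh-bounded action output.
def build_actor():
    state_in = layers.Input(shape=(STATE_DIM,))
    x = layers.Dense(512, activation="relu")(state_in)
    x = layers.Dense(256, activation="relu")(x)
    action_out = layers.Dense(ACTION_DIM, activation="tanh")(x)
    return tf.keras.Model(state_in, action_out)

# Critic: action branch 2-64-128 (OUT 1) and state branch 16-64-128 (OUT 2),
# concatenated into 256 units and reduced to a single Q value.
def build_critic():
    action_in = layers.Input(shape=(ACTION_DIM,))
    a = layers.Dense(64, activation="relu")(action_in)
    a = layers.Dense(128, activation="relu")(a)          # OUT 1

    state_in = layers.Input(shape=(STATE_DIM,))
    s = layers.Dense(64, activation="relu")(state_in)
    s = layers.Dense(128, activation="relu")(s)          # OUT 2

    x = layers.Concatenate()([a, s])                     # 256 concatenated units
    q_out = layers.Dense(1)(x)                           # single dense output
    return tf.keras.Model([action_in, state_in], q_out)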

C. Ornstein-Uhlenbeck Process

In reinforcement learning with finite action spaces, exploration is carried out by choosing a random action based on probabilities (such as epsilon-greedy or Boltzmann exploration). Exploration in continuous action environments is accomplished by adding noise to the action itself. The Ornstein-Uhlenbeck method has a few intriguing characteristics: it is both a Markov and a Gaussian process. Theta, sigma, and mu are its hyperparameters. Mu is set to 0, theta is set to 0.015, and sigma is an array of numpy values between 0.3 and 0.4.
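A sketch of an Ornstein-Uhlenbeck noise generator with the hyper-parameter values quoted above (mu = 0, theta = 0.015, sigma around 0.3-0.4); the class and the clipping step are illustrative:

import numpy as np

class OUNoise:
    def __init__(self, size, mu=0.0, theta=0.015, sigma=0.3, dt=1.0):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = np.full(size, mu, dtype=np.float64)

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.standard_normal(self.x.shape))
        self.x += dx
        return self.x

# Example: perturb a deterministic (steer, throttle) action for exploration.
noise = OUNoise(size=2)
action = np.array([0.0, 0.5])
noisy_action = np.clip(action + noise.sample(), -1.0, 1.0)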
D. Hyper Parameter values

Given below are the hyper-parameters used and their respective values:

MEMORY FRACTION = 0.3333
TOTAL EPISODES = 1000
STEPS PER EPISODE = 100
AVERAGE EPISODES COUNT = 40
CORRECT POSITION NON MOVING STEPS = 5
OFF POSITION NON MOVING STEPS = 20
REPLAY BUFFER CAPACITY = 100000
THROTTLE NOISE FACTOR -
BATCH SIZE = 64
CRITIC LR = 0.002
ACTOR LR = 0.001
GAMMA = 0.99
TAU = 0.005
EPSILON = 1
EXPLORE = 100000.0
MIN EPSILON = 0.000001

Fig. 9. Concatenated NN

V. RESULTS

The experience replay pool's good parking activities and states are sampled with a step size of 100 rounds, and the overall convergence of the reward value in 500 separate episodes over the course of the full DDPG algorithm training procedure is as follows:

Fig. 10. Experiment 1 result
Fig. 6. Experiment 2 result

VI. GAPS AND CHALLENGES

Parallel parking was difficult to implement due to a high rate of crashes with other cars and a slow learning rate.

Traffic management - traffic flows and management vary across localities, so the constraints on the vehicle cannot remain the same.

Research gap - most autonomous cars make use of fuzzy logic and sensor-based deep learning models; they do not rely much on reinforcement learning.

VII. CONCLUSION

We created learning data for the neural network utilizing the deep reinforcement learning method DDPG. Then, to guarantee that the automobile is always approaching the parking centre and achieves the ideal car inclination during the parking process, a straightforward and efficient reward system is built. Additionally, in order to secure the safety of the parking process, the convergence of the strategy, and the parking-lot data, we penalise the line-pressing action and learn how to prevent data un-sampling.

VIII. FUTURE SCOPE

Consider using other, quicker, and more effective strategies; constructing a more intricate custom parking scenario to increase the challenge and precision; and simulating traffic in different simulation environments.

REFERENCES
[1] Pan, You, Wang, Lu - Virtual to Real Reinforcement Learning, arXiv (2017).
[2] Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua - Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, arXiv (2016).
[3] Sahand Sharifzadeh, Ioannis Chiotellis, Rudolph Triebel, Daniel Cremers - Learning to Drive using Inverse Reinforcement Learning and DQN, arXiv (2017).
[4] Markus Kuderer, Shilpa Gulati, Wolfram Burgard - Learning Driving Styles for Autonomous Vehicles from Demonstration, IEEE (2015).
[5] Jeff Michels, Ashutosh Saxena, Andrew Y. Ng - High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning, ACM (2005).
[6] Ahmad El Sallab, Mohammed Abdou, Etienne Perot, Senthil Yogamani - Deep Reinforcement Learning Framework for Autonomous Driving, arXiv (2017).
[7] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun - CARLA: An Open Urban Driving Simulator.
[8] Jan Koutník, Giuseppe Cuccu, Jürgen Schmidhuber, Faustino Gomez - Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning.
[9] Chenyi Chen, Ari Seff, Alain Kornhauser, Jianxiong Xiao - DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving.
[10] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra - Continuous Control with Deep Reinforcement Learning.
[11] Brody Huval, Tao Wang, Sameep Tandon, Jeff Kiske, Will Song, Joel Pazhayampallil, Mykhaylo Andriluka, Pranav Rajpurkar, Toki Migimatsu, Royce Cheng-Yue, Fernando Mujica, Adam Coates, Andrew Y. Ng - An Empirical Evaluation of Deep Learning on Highway Driving.
[12] Abdur R. Fayjie, Sabir Hossain, Doukhi Oualid, and Deok-Jin Lee - Driverless Car: Autonomous Driving Using Deep Reinforcement Learning in Urban Environment.
[14] Matt Vitelli, Aran Nayebi - CARMA: A Deep Reinforcement Learning Approach to Autonomous Driving.
