Simulation of The Navigation of A Mobile Robot by The Q-Learning Using Artificial Neuron Networks
Abstract.
This paper presents reinforcement learning, a type of machine learning that is often used in the field of robotics. The aim is to determine a control law for a mobile robot in an unknown environment. This kind of technique applies when one assumes that the only information on the quality of the actions performed by the mobile robot is a scalar signal, a reward or a punishment; the learning process consists of improving the choice of actions so as to maximize the rewards. One of the most widely used algorithms for solving this problem is the Q-Learning algorithm, which is based on the Q-function. To generate this function and to ensure the proper functioning of the learning system in changing environments where mobile robots have wide open spaces, an artificial neural network is used. The action performed by the mobile robot in its environment is chosen by a selection function, and this action is evaluated by a scalar signal taking the values -1, 0 and 1.
1 Introduction
One of the goals of the navigation of autonomous mobile robots is the avoidance of obstacles, and several techniques and methods are used for this purpose [1], [2]. A navigation algorithm drives the mobile robot from a starting position to a final position along a predefined path. If the robot encounters obstacles on its path, the obstacle-avoidance algorithm takes control of the mobile robot; once the path of the robot is free, navigation towards the destination resumes [3]. In this article we solve the problem of navigation with obstacle avoidance by a learning method. For this purpose we use Q-Learning, where the Q-function is generated by a neural network.
2 Reinforcement learning
2.1 Principle
Reinforcement learning is a learning technique based on trial and error and on the interaction between the agent and its environment [4], [5]. From a state or situation s of the environment, the agent selects and performs an action that causes a transition to a state s'. It receives in return a reinforcement signal r, which can be a reward or a punishment. The purpose of this learning is to maximize future rewards [4]. Fig. 1 shows the interaction between the agent and the environment.
Fig. 1. Interaction between the agent and its environment: the agent observes the state s, performs an action, and receives a reinforcement r.
2.2 Q-Learning
Fig. 2. Structure of Q-Learning: the agent observes the state of the environment, performs an action, and a reinforcement function evaluates this action.
$$Q^*(s,a) = E\big[r_{t+1} + \gamma V^*(s')\big] = E\big[r_{t+1} + \gamma \max_{a'} Q^*(s',a')\big] \qquad (1)$$

$$Q_{t+1}(s_t,a_t) = (1-\alpha_t)\,Q_t(s_t,a_t) + \alpha_t\big[r_{t+1} + \gamma \max_{a'} Q_t(s',a')\big] \qquad (2)$$
where:
r_{t+1} is the reinforcement received when the agent selects the action a in the state s and moves to the state s'.
The objective of this proposition is to obtain a learning system that allows a robot moving in a totally unknown environment to avoid obstacles.
The generation of the Q-function is performed by an artificial neural network of MLP type, whose inputs are the readings of the sensors placed on three sides of the robot, giving the perception of its environment, and the vector of possible actions (Turn Left, Forward and Turn Right). The choice of the action that the agent must perform is made by a function called the selection function.
Fig. 3 shows the structure of reinforcement learning based on an Artificial Neural Network.
Fig. 3. Structure of reinforcement learning based on an Artificial Neural Network: the sensor readings (state) and the possible actions are the inputs of the ANN, which produces the Q-value; the selection function chooses the action applied to the environment, which returns the reinforcement.
The generation of the Q-function can be performed with a table in which each cell contains an approximation of the Q-function for one configuration of the state/action pair. This severely limits the size of the problems that can be solved; indeed, many real-world problems, such as those in robotics, have a large state space. An effective approach to the generation of the Q-function in the case of large spaces is to use Artificial Neural Networks. The approximation of the Q-function is then obtained with an Artificial Neural Network trained with the back-propagation algorithm [9], [10].
In this implementation, the Artificial Neural Network chosen is a Multi-Layer Perceptron whose inputs are the state of the environment and the possible actions, with one hidden layer containing nh neurons and one output neuron giving the value Q(s, a, w) of the Q-function [11]. The activation function of all neurons is the sigmoid function. Fig. 4 shows the architecture of this network.
Fig. 4. Architecture of the Multi-Layer Perceptron: the seven proximity-detection inputs (left, forward and right sides) and the possible actions feed the hidden layer, and the output neuron gives Q(s, a, w).
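The paper does not give the weight initialisation or the value of nh; the NumPy sketch below is therefore only one possible realisation of this architecture (seven sensor bits and a three-bit one-hot action as inputs, nh sigmoid hidden units, one sigmoid output Q(s, a, w)); the sizes and the initialisation scale are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class QNetwork:
    """MLP with sigmoid units: inputs = 7 sensor bits + 3 one-hot action bits,
    one hidden layer of nh neurons, one output giving Q(s, a, w)."""
    def __init__(self, n_inputs=10, nh=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (nh, n_inputs))
        self.b1 = np.zeros(nh)
        self.W2 = rng.normal(0.0, 0.1, (1, nh))
        self.b2 = np.zeros(1)

    def forward(self, state, action_onehot):
        x = np.concatenate([state, action_onehot])
        self.x = x
        self.h = sigmoid(self.W1 @ x + self.b1)        # hidden activations (kept for backprop)
        return sigmoid(self.W2 @ self.h + self.b2)[0]  # Q(s, a, w)
```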
The learning of the network consists of updating, or adjusting, the weight matrices of the Neural Network using the update equation of the classic Q-Learning algorithm together with the back-propagation algorithm.
where Qtarget is obtained by simplifying the BELLMAN optimality equation and is given by:

$$Q_{target} = r_{t+1} + \gamma \max_{a'} Q_t(s',a',w)$$
with:
f : activation function of the neurons in the hidden layer.
S : state of the environment.
Q(st, at, w) : value of the state/action function corresponding to the action performed.
These updates of the weight matrices and bias vectors are applied only when the reinforcement signal is -1 or 0.
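As an illustration of this update scheme, the sketch below forms Qtarget and performs one back-propagation step on the squared error (Q(s, a, w) − Qtarget)²; it reuses the QNetwork sketch above, and the learning rate is an assumed value, not taken from the paper.

```python
import numpy as np

def train_step(net, state, action_onehot, r, next_state, actions, gamma=0.9, lr=0.05):
    """One Q-Learning / back-propagation step (a sketch, not the authors' exact code)."""
    # Q-target from the simplified Bellman equation
    q_next = max(net.forward(next_state, a) for a in actions)
    q_target = r + gamma * q_next

    # forward pass for the performed action, then squared-error gradient
    q = net.forward(state, action_onehot)
    err = q - q_target

    # backpropagation through the two sigmoid layers
    d_out = err * q * (1.0 - q)                                       # output delta
    d_hid = (net.W2.T @ np.atleast_1d(d_out)).ravel() * net.h * (1.0 - net.h)
    net.W2 -= lr * np.outer(np.atleast_1d(d_out), net.h)
    net.b2 -= lr * np.atleast_1d(d_out)
    net.W1 -= lr * np.outer(d_hid, net.x)
    net.b1 -= lr * d_hid
```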
The neural network thus allows us to generate the Q-function. The set of possible actions is A = {a1, a2, a3}, where:
• a1: Turn Left action.
• a2: Forward action.
• a3: Turn Right action.
The selection of the action that the robot must execute is based on the Exploration/Exploitation Policy (EEP) [12]. For this purpose we used the ε-greedy method, which consists of choosing the greedy action with probability ε and a random action with probability 1 − ε:
• p ∈ [0, 1] is a random number.
• If p > ε, a random action a is chosen ("Exploration"), where a ∈ A, the set of all possible actions.
• If p ≤ ε, the greedy action is selected ("Exploitation").
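A minimal sketch of this selection function is given below; it follows the convention used above (greedy action with probability ε), and the value of ε is an assumed example value.

```python
import random

ACTIONS = ["turn_left", "forward", "turn_right"]    # a1, a2, a3

def select_action(q_values, epsilon=0.9):
    """Selection function: greedy action with probability epsilon,
    random action with probability 1 - epsilon; returns the action index."""
    p = random.random()                                           # p in [0, 1)
    if p > epsilon:
        return random.choice(range(len(ACTIONS)))                 # Exploration
    return max(range(len(ACTIONS)), key=lambda i: q_values[i])    # Exploitation
```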
6 Environment
Reading the state of the environment is done through sensors placed on three sides of the robot: two on the left, two on the right and three in front. The sensors used are of the proximity-detection type. The opening angle of each sensor varies between -π/12 and π/12. The state vector S is chosen so as to obtain information on the existence of obstacles on the three sides of the robot.
This vector is composed of seven binary variables si, i = 1, …, 7. The choice of these variables is made in an all-or-nothing manner: for example, if si = 1 there is an obstacle near the robot, and if si = 0 there is no obstacle near the robot.
Fig. 5 shows a state of the environment that gives the value of the state vector S, such as S = (1, 1, 0, 0, 0, 0, 0).
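As a concrete illustration, the sketch below builds the binary state vector from the seven proximity readings; the detection threshold and the raw distance values are assumptions introduced only for the example.

```python
def state_vector(distances, threshold=0.5):
    """Map the seven proximity readings (left, front, right sides) to binary s_i.
    s_i = 1 if an obstacle is closer than the (assumed) threshold, else 0."""
    return tuple(1 if d < threshold else 0 for d in distances)

# Example: obstacles detected by the two left-side sensors only
# gives S = (1, 1, 0, 0, 0, 0, 0), as in Fig. 5.
S = state_vector([0.2, 0.3, 2.0, 2.0, 2.0, 2.0, 2.0])
```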
7 Reinforcement function
For each state in which the robot finds itself, the action performed is evaluated by a signal known as the reinforcement signal. This reinforcement function allows the robot to explore its environment; the signal is related to the sensor measurements that indicate the presence or absence of obstacles in the three directions, left, front and right, which represent the state of the environment. The value of this function, or reinforcement signal, is a scalar taking the values -1, 0 or 1.
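The explicit expression of the reinforcement function is not reproduced here; the sketch below is therefore only an illustrative guess consistent with the text: it returns -1 when the sensors report an obstacle (as in the learning loop further on), and otherwise a value in {0, 1}; the exact conditions for 0 and 1 are assumptions.

```python
def reinforcement(state, moved_forward):
    """Illustrative reinforcement in {-1, 0, 1}; the true conditions are the authors'.
    state: the seven binary sensor variables s_1..s_7."""
    if any(state):          # an obstacle is near the robot -> punishment
        return -1
    if moved_forward:       # assumed: free forward motion is rewarded
        return 1
    return 0                # assumed: otherwise neutral
```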
The type of robot that we have chosen for the application is a circular robot of radius R, driven by two independent wheels separated by a distance L. Fig. 6 shows this type of robot.
Fig. 6. The differential-drive robot: two independent wheels (left and right) separated by the distance L.
$$x_r(k+1) = x_r(k) + v(k)\,\cos\!\left(\theta(k) + \frac{\Delta\theta(k)}{2}\right) \qquad (11)$$

$$y_r(k+1) = y_r(k) + v(k)\,\sin\!\left(\theta(k) + \frac{\Delta\theta(k)}{2}\right) \qquad (12)$$

x_r(k) and y_r(k) are the x and y coordinates of the robot in the reference frame (Ox, Oy).

$$\theta(k+1) = \theta(k) + \Delta\theta(k) \qquad (13)$$

with:
θ(k) : the angular position of the robot in the reference frame (Ox, Oy).
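Eqs. (11)-(13) translate directly into code; in the sketch below, v(k) and Δθ(k) are assumed to be supplied by the wheel commands.

```python
import math

def kinematic_step(x, y, theta, v, dtheta):
    """Discrete kinematic model of Eqs. (11)-(13)."""
    x_next = x + v * math.cos(theta + dtheta / 2.0)
    y_next = y + v * math.sin(theta + dtheta / 2.0)
    theta_next = theta + dtheta
    return x_next, y_next, theta_next
```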
For k = 1 to the number of iterations
  (8) Compute the reinforcement;
  (9) Test the reinforcement:
      If r = -1 (there is an obstacle)
      End if
End For
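Only steps (8) and (9) of the learning algorithm survive in the text above; the loop below is therefore a hedged reconstruction of its overall shape, reusing the earlier sketches (state_vector, select_action, reinforcement, train_step, QNetwork); the robot interface (read_sensors, execute) is assumed for illustration only.

```python
import numpy as np

def one_hot(i, n=3):
    """One-hot encoding of action a_{i+1}."""
    v = np.zeros(n)
    v[i] = 1.0
    return v

def learn(robot, net, n_iterations=2500, gamma=0.9):
    """Hedged reconstruction of the learning loop; `robot.read_sensors()` and
    `robot.execute(i)` are assumed interfaces, `net` is the QNetwork sketch above."""
    actions = [one_hot(i) for i in range(3)]          # a1: Turn Left, a2: Forward, a3: Turn Right
    state = np.array(state_vector(robot.read_sensors()), dtype=float)
    for k in range(n_iterations):
        q_values = [net.forward(state, a) for a in actions]
        i = select_action(q_values)                   # selection function (EEP)
        robot.execute(i)                              # perform the chosen action
        next_state = np.array(state_vector(robot.read_sensors()), dtype=float)
        r = reinforcement(next_state, moved_forward=(i == 1))   # step (8)
        if r in (-1, 0):                              # step (9): weights updated only for r = -1 or 0
            train_step(net, state, actions[i], r, next_state, actions, gamma=gamma)
        state = next_state
```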
10 Simulation results
The trajectory followed by the robot during the learning stage is presented in Fig. 8 for 2500 iterations. After learning, the trajectory of the robot is presented in Fig. 9, also for 2500 iterations.
Fig. 8. The trajectory followed by the robot during the learning stage for 2500 iterations.
Fig. 9. The trajectory of the robot after learning for 2500 iterations.
To test the generalization ability of the proposed artificial neural network, Fig. 10 shows a change of environment in which the robot moves while avoiding obstacles.
Conclusion
In this paper we have presented a reinforcement learning approach in which an artificial neural network is used for the generation of the Q-function. This approach has been applied to the navigation of a mobile robot in an unknown environment while avoiding obstacles; the results are very satisfactory and meet the target. This allows us to say that this approach can be used for the navigation of a mobile robot in an unknown environment.
References
[1] J. Barraquand and J.C. Latombe, "Robot Motion Planning: A Distributed Representation Approach", The International Journal of Robotics Research, Vol. 10, No. 6, pp. 628-649, 1991.
[2] H. Boubertakh, M. Tadjine and P.-Y. Glorennec, "A Simple Goal Seeking Navigation Method for a Mobile Robot Using Human Sense, Fuzzy Logic and Reinforcement Learning", KES (1), pp. 666-673, 2008.
[3] S.T. Li and Y.C. Li, "Neuro-Fuzzy Behavior-Based Control of a Mobile Robot in Unknown Environments", Proc. of the 3rd Int. Conf. on Machine Learning and Cybernetics, Shanghai, 26-29 August 2004.
[4] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction, Bradford Books, MIT Press, 1998.
[5] B.-Q. Huang, G.-Y. Cao and M. Guo, "Reinforcement Learning Neural Network to the Problem of Autonomous Mobile Robot Obstacle Avoidance", August 2005.
[6] F. Abdessemed, K. Benmahammed and E. Monacelli, "A Fuzzy-Based Reactive Controller for a Non-holonomic Mobile Robot", Journal of Robotics and Autonomous Systems, Vol. 47, pp. 31-46, 2004.
[7] C.J.C.H. Watkins, Learning from Delayed Rewards, PhD Thesis, University of Cambridge, England, 1989.
[8] T. Fukao, T. Sumitomo, N. Ineyama and N. Adachi, "Q-Learning Based on Regularization Theory to Treat the Continuous States and Actions", Department of Applied Systems Science, Graduate School of Engineering, Kyoto University, 1998.
[9] C.F. Touzet, "Q-Learning for Robots", 1997.
[10] K. Macek, I. Petrovic and N. Peric, "A Reinforcement Learning Approach to Obstacle Avoidance of Mobile Robots", University of Zagreb.
[11] M. Carreras, P. Ridao and A. El-Fakdi, "Semi-Online Neural-Q-Learning for Real-Time Robot Learning", Institute of Informatics and Applications, University of Girona, Spain.
[12] P.-Y. Glorennec, "Algorithmes d'apprentissage pour les systèmes d'inférence floue", Département d'informatique, INSA de Rennes / IRISA, 1999.