Simulation of The Navigation of A Mobile Robot by The Q-Learning Using Artificial Neuron Networks
Abstract.
This paper presents reinforcement learning, a type of machine learning that is often used in the field of robotics. The aim is to determine a control law for a mobile robot in an unknown environment. This kind of technique applies when one assumes that the only information on the quality of the actions performed by the mobile robot is a scalar signal, a reward or a punishment; the learning process consists of improving the choice of actions so as to maximize the rewards. One of the most widely used algorithms for solving this problem is the Q-Learning algorithm, which is based on the Q-function. To generate this function and to ensure the proper functioning of the learning system in changing environments where mobile robots have wide open spaces, an artificial neural network is used. The action performed by the mobile robot in its environment is chosen by a selection function, and this action is evaluated by a scalar signal taking the values -1, 0 and 1.
1 Introduction
One of the goals of the navigation of autonomous mobile robots is the avoidance of obstacles, and several techniques and methods are used for this purpose [1], [2]. A navigation algorithm drives the mobile robot from a starting position to a final position along a predefined path. If the robot encounters obstacles on its path, the obstacle-avoidance algorithm takes control of the mobile robot; once the path of the robot is free, navigation towards the destination resumes [3]. In this article we solve the problem of navigation with obstacle avoidance by a learning method. For this purpose we use Q-Learning, where the Q-function is generated by a neural network.
2 Reinforcement learning
2.1 Principle
Reinforcement learning is a learning technique based on trial and error and on the interaction between the agent and its environment [4], [5]. From a state or situation s of the environment, the agent selects and performs an action that causes a transition to a state s'. It receives in return a reinforcement signal r, which can be a reward or a punishment. The purpose of this learning is to maximize future rewards [4]. Fig. 1 shows the interaction between the agent and the environment.
Fig. 1. Interaction between the agent and its environment: the agent observes the state s, performs an action, and receives a reinforcement r.
2.2 Q-Learning
Fig. 2. Structure of Q-Learning: the agent observes the state of the environment, performs an action, and a reinforcement function evaluates this action.
$$Q^*(s,a) = E\big[r_{t+1} + \gamma V^*(s')\big] = E\big[r_{t+1} + \gamma \max_{a'} Q^*(s',a')\big] \qquad (1)$$

$$Q_{t+1}(s_t,a_t) = (1-\alpha_t)\,Q_t(s_t,a_t) + \alpha_t\big[r_{t+1} + \gamma \max_{a'} Q_t(s',a')\big] \qquad (2)$$
where:
r_{t+1} is the reinforcement received when the agent selects the action a in the state s and moves to the state s'.
The objective of this proposition is to obtain a learning system that allows a robot moving in a totally unknown environment to avoid obstacles.
The generation of the Q-function is performed by an artificial neural network of MLP type, whose inputs are the readings of the sensors placed on three sides of the robot, giving the perception of its environment, and the vector of possible actions (Turn Left, Forward and Turn Right). The choice of the action that the agent must perform is made by a function called the selection function.
Fig. 3 shows the structure of reinforcement learning based on an Artificial Neural Network.
Fig. 3. Structure of reinforcement learning based on an Artificial Neural Network: the sensor readings (state) and the possible actions are the inputs of the ANN, which produces the Q-value; the selection function chooses the action applied to the environment, which returns the reinforcement.
The generation of the Q-function can be performed with a table in which each cell contains an approximation of the Q-function for one configuration of the state/action pair. This severely limits the size of the problems that can be solved; indeed, many real-world problems, such as those in robotics, have a large state space. An effective approach to the generation of the Q-function in the case of large spaces is to use Artificial Neural Networks. The approximation of the Q-function is then obtained with an Artificial Neural Network trained with the back-propagation algorithm [9], [10].
In this implementation, the Artificial Neural Network chosen is a Multi-Layer Perceptron whose inputs are the state of the environment and the possible actions, with one hidden layer containing nh neurons and one output neuron giving the value Q(s, a, w) of the Q-function [11]. The activation function of all neurons is the sigmoid function. Fig. 4 shows the architecture of this network.
Fig. 4. Architecture of the Multi-Layer Perceptron: the seven proximity-detection inputs (left, forward and right sides) and the possible actions feed the hidden layer, and the output neuron gives Q(s, a, w).
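The paper does not give the weight initialisation or the value of nh; the NumPy sketch below is therefore only one possible realisation of this architecture (seven sensor bits and a three-bit one-hot action as inputs, nh sigmoid hidden units, one sigmoid output Q(s, a, w)); the sizes and the initialisation scale are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class QNetwork:
    """MLP with sigmoid units: inputs = 7 sensor bits + 3 one-hot action bits,
    one hidden layer of nh neurons, one output giving Q(s, a, w)."""
    def __init__(self, n_inputs=10, nh=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (nh, n_inputs))
        self.b1 = np.zeros(nh)
        self.W2 = rng.normal(0.0, 0.1, (1, nh))
        self.b2 = np.zeros(1)

    def forward(self, state, action_onehot):
        x = np.concatenate([state, action_onehot])
        self.x = x
        self.h = sigmoid(self.W1 @ x + self.b1)        # hidden activations (kept for backprop)
        return sigmoid(self.W2 @ self.h + self.b2)[0]  # Q(s, a, w)
```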
The learning of the network consists of updating, or adjusting, the weight matrices of the Neural Network using the update equation of the classic Q-Learning algorithm together with the back-propagation algorithm.
where Qtarget is obtained by simplifying the BELLMAN optimality equation and is given by:

$$Q_{target} = r_{t+1} + \gamma \max_{a'} Q_t(s',a',w)$$
with:
f : activation function of the neurons in the hidden layer.
S : state of the environment.
Q(st, at, w) : value of the state/action function corresponding to the action performed.
These updates of the weight matrices and bias vectors are applied only when the reinforcement signal is -1 or 0.
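As an illustration of this update scheme, the sketch below forms Qtarget and performs one back-propagation step on the squared error (Q(s, a, w) − Qtarget)²; it reuses the QNetwork sketch above, and the learning rate is an assumed value, not taken from the paper.

```python
import numpy as np

def train_step(net, state, action_onehot, r, next_state, actions, gamma=0.9, lr=0.05):
    """One Q-Learning / back-propagation step (a sketch, not the authors' exact code)."""
    # Q-target from the simplified Bellman equation
    q_next = max(net.forward(next_state, a) for a in actions)
    q_target = r + gamma * q_next

    # forward pass for the performed action, then squared-error gradient
    q = net.forward(state, action_onehot)
    err = q - q_target

    # backpropagation through the two sigmoid layers
    d_out = err * q * (1.0 - q)                                       # output delta
    d_hid = (net.W2.T @ np.atleast_1d(d_out)).ravel() * net.h * (1.0 - net.h)
    net.W2 -= lr * np.outer(np.atleast_1d(d_out), net.h)
    net.b2 -= lr * np.atleast_1d(d_out)
    net.W1 -= lr * np.outer(d_hid, net.x)
    net.b1 -= lr * d_hid
```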
The neural network thus allows us to generate the Q-function. The set of possible actions is A = {a1, a2, a3}, where:
• a1: Turn Left action.
• a2: Forward action.
• a3: Turn Right action.
The selection of the action that the robot must execute is based on the Exploration/Exploitation Policy (EEP) [12]. For this purpose we used the ε-greedy method, which consists of choosing the greedy action with probability ε and a random action with probability 1 − ε:
• p ∈ [0, 1] is a random number.
• If p > ε, a random action a is chosen ("Exploration"), where a ∈ A, the set of all possible actions.
• If p ≤ ε, the greedy action is selected ("Exploitation").
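A minimal sketch of this selection function is given below; it follows the convention used above (greedy action with probability ε), and the value of ε is an assumed example value.

```python
import random

ACTIONS = ["turn_left", "forward", "turn_right"]    # a1, a2, a3

def select_action(q_values, epsilon=0.9):
    """Selection function: greedy action with probability epsilon,
    random action with probability 1 - epsilon; returns the action index."""
    p = random.random()                                           # p in [0, 1)
    if p > epsilon:
        return random.choice(range(len(ACTIONS)))                 # Exploration
    return max(range(len(ACTIONS)), key=lambda i: q_values[i])    # Exploitation
```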
6 Environment
Reading the state of the environment is done through sensors placed on three sides of the robot: two on the left, two on the right and three in front. The sensors used are of the proximity-detection type. The opening angle of each sensor varies between -π/12 and π/12. The state vector S is chosen so as to obtain information on the existence of obstacles on the three sides of the robot.
This vector is composed of seven binary variables si, i = 1, …, 7. The choice of these variables is made in an all-or-nothing manner: for example, if si = 1 there is an obstacle near the robot, and if si = 0 there is no obstacle near the robot.
Fig. 5 shows a state of the environment that gives the value of the state vector S, such as S = (1, 1, 0, 0, 0, 0, 0).
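As a concrete illustration, the sketch below builds the binary state vector from the seven proximity readings; the detection threshold and the raw distance values are assumptions introduced only for the example.

```python
def state_vector(distances, threshold=0.5):
    """Map the seven proximity readings (left, front, right sides) to binary s_i.
    s_i = 1 if an obstacle is closer than the (assumed) threshold, else 0."""
    return tuple(1 if d < threshold else 0 for d in distances)

# Example: obstacles detected by the two left-side sensors only
# gives S = (1, 1, 0, 0, 0, 0, 0), as in Fig. 5.
S = state_vector([0.2, 0.3, 2.0, 2.0, 2.0, 2.0, 2.0])
```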
7 Reinforcement function
For each state in which the robot finds itself, the action performed is evaluated by a signal known as the reinforcement signal. This reinforcement function allows the robot to explore its environment; the signal is related to the sensor measurements that indicate the presence or absence of obstacles in the three directions, left, front and right, which represent the state of the environment. The value of this function, or reinforcement signal, is a scalar taking the values -1, 0 or 1.
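The explicit expression of the reinforcement function is not reproduced here; the sketch below is therefore only an illustrative guess consistent with the text: it returns -1 when the sensors report an obstacle (as in the learning loop further on), and otherwise a value in {0, 1}; the exact conditions for 0 and 1 are assumptions.

```python
def reinforcement(state, moved_forward):
    """Illustrative reinforcement in {-1, 0, 1}; the true conditions are the authors'.
    state: the seven binary sensor variables s_1..s_7."""
    if any(state):          # an obstacle is near the robot -> punishment
        return -1
    if moved_forward:       # assumed: free forward motion is rewarded
        return 1
    return 0                # assumed: otherwise neutral
```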
The type of robot that we have chosen for the application is a circular robot of radius R, driven by two independent wheels separated by a distance L. Fig. 6 shows this type of robot.
Fig. 6. The differential-drive robot: two independent wheels (left and right) separated by the distance L.
$$x_r(k+1) = x_r(k) + v(k)\,\cos\!\left(\theta(k) + \frac{\Delta\theta(k)}{2}\right) \qquad (11)$$

$$y_r(k+1) = y_r(k) + v(k)\,\sin\!\left(\theta(k) + \frac{\Delta\theta(k)}{2}\right) \qquad (12)$$

x_r(k) and y_r(k) are the x and y coordinates of the robot in the reference frame (Ox, Oy).

$$\theta(k+1) = \theta(k) + \Delta\theta(k) \qquad (13)$$

with:
θ(k) : the angular position of the robot in the reference frame (Ox, Oy).
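Eqs. (11)-(13) translate directly into code; in the sketch below, v(k) and Δθ(k) are assumed to be supplied by the wheel commands.

```python
import math

def kinematic_step(x, y, theta, v, dtheta):
    """Discrete kinematic model of Eqs. (11)-(13)."""
    x_next = x + v * math.cos(theta + dtheta / 2.0)
    y_next = y + v * math.sin(theta + dtheta / 2.0)
    theta_next = theta + dtheta
    return x_next, y_next, theta_next
```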
For k = 1 to the number of iterations
  (8) Compute the reinforcement;
  (9) Test the reinforcement:
      If r = -1 (there is an obstacle)
      End if
End For
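Only steps (8) and (9) of the learning algorithm survive in the text above; the loop below is therefore a hedged reconstruction of its overall shape, reusing the earlier sketches (state_vector, select_action, reinforcement, train_step, QNetwork); the robot interface (read_sensors, execute) is assumed for illustration only.

```python
import numpy as np

def one_hot(i, n=3):
    """One-hot encoding of action a_{i+1}."""
    v = np.zeros(n)
    v[i] = 1.0
    return v

def learn(robot, net, n_iterations=2500, gamma=0.9):
    """Hedged reconstruction of the learning loop; `robot.read_sensors()` and
    `robot.execute(i)` are assumed interfaces, `net` is the QNetwork sketch above."""
    actions = [one_hot(i) for i in range(3)]          # a1: Turn Left, a2: Forward, a3: Turn Right
    state = np.array(state_vector(robot.read_sensors()), dtype=float)
    for k in range(n_iterations):
        q_values = [net.forward(state, a) for a in actions]
        i = select_action(q_values)                   # selection function (EEP)
        robot.execute(i)                              # perform the chosen action
        next_state = np.array(state_vector(robot.read_sensors()), dtype=float)
        r = reinforcement(next_state, moved_forward=(i == 1))   # step (8)
        if r in (-1, 0):                              # step (9): weights updated only for r = -1 or 0
            train_step(net, state, actions[i], r, next_state, actions, gamma=gamma)
        state = next_state
```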
10 Simulation results
The trajectory followed by the robot during the learning stage is presented in Fig. 8 for 2500 iterations. After learning, the trajectory of the robot is presented in Fig. 9, also for 2500 iterations.
Fig. 8. The trajectory followed by the robot during the learning stage for 2500 iterations.
Fig. 9. The trajectory of the robot after learning for 2500 iterations.
To test the generalization ability of the proposed artificial neural network, Fig. 10 shows a change of environment in which the robot moves while avoiding obstacles.
Conclusion
In this paper we have presented a reinforcement learning approach in which an artificial neural network is used for the generation of the Q-function. This approach has been applied to the navigation of a mobile robot in an unknown environment while avoiding obstacles; the results are very satisfactory and meet the target. This allows us to say that this approach can be used for the navigation of a mobile robot in an unknown environment.
References
[1] J. Barraquand and J.C. Latombe, "Robot Motion Planning: A Distributed Representation Approach", The International Journal of Robotics Research, Vol. 10, No. 6, pp. 628-649, 1991.
[2] H. Boubertakh, M. Tadjine and P.-Y. Glorennec, "A Simple Goal Seeking Navigation Method for a Mobile Robot Using Human Sense, Fuzzy Logic and Reinforcement Learning", KES (1), pp. 666-673, 2008.
[3] S.T. Li and Y.C. Li, "Neuro-Fuzzy Behavior-Based Control of a Mobile Robot in Unknown Environments", Proc. of the 3rd Int. Conf. on Machine Learning and Cybernetics, Shanghai, 26-29 August 2004.
[4] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction, Bradford Books, MIT Press, 1998.
[5] B.-Q. Huang, G.-Y. Cao and M. Guo, "Reinforcement Learning Neural Network to the Problem of Autonomous Mobile Robot Obstacle Avoidance", August 2005.
[6] F. Abdessemed, K. Benmahammed and E. Monacelli, "A Fuzzy-Based Reactive Controller for a Non-holonomic Mobile Robot", Journal of Robotics and Autonomous Systems, Vol. 47, pp. 31-46, 2004.
[7] C.J.C.H. Watkins, Learning from Delayed Rewards, PhD Thesis, University of Cambridge, England, 1989.
[8] T. Fukao, T. Sumitomo, N. Ineyama and N. Adachi, "Q-Learning Based on Regularization Theory to Treat the Continuous States and Actions", Department of Applied Systems Science, Graduate School of Engineering, Kyoto University, 1998.
[9] C.F. Touzet, "Q-Learning for Robots", 1997.
[10] K. Macek, I. Petrovic and N. Peric, "A Reinforcement Learning Approach to Obstacle Avoidance of Mobile Robots", University of Zagreb.
[11] M. Carreras, P. Ridao and A. El-Fakdi, "Semi-Online Neural-Q-Learning for Real-Time Robot Learning", Institute of Informatics and Applications, University of Girona, Spain.
[12] P.-Y. Glorennec, "Algorithmes d'apprentissage pour les systèmes d'inférence floue", Département d'informatique, INSA de Rennes / IRISA, 1999.