


Q-Learning for Adaptive PID Control of a Line Follower Mobile Robot
Final Project Report
Canberk Suat Gurel - 115595972
2017

Abstract

In the field of mobile robotics, wheeled robots are widely used due to their advantages over
their legged counterparts. The PID controller is a model-free control strategy that is often
implemented at the low level of the mobile robot control hierarchy. Although PID controllers
are the most widely employed controllers in robotics, their performance depends greatly on how
well the controller parameters are tuned. In the late 1970s and early 1980s, when analog
controllers were replaced by digital controllers, one of the advantages that came with digital
controllers was the auto-tuning function [1]. Existing auto-tuning technologies are based on
step response analysis and frequency response analysis [1]. Such classical controller-tuning
techniques are not suitable when the system is not fully known and the operating conditions
are variable [2]. The field of machine learning promises an alternative approach to auto-tuning
of PID controllers, one that allows the PID control parameters to be tuned even when the system
is not fully known and the operating conditions are variable. In this paper, a Q-learning
algorithm for adaptive PID control of a line-following mobile robot is presented. The robot is
modeled in the Simulink environment and a set of experiments was undertaken on the modeled
test track. The experimental results illustrate the advantages of a machine learning approach
to optimizing the control of robotic systems compared to a systematic manual-tuning approach.


Table of Contents

Abstract
1. Introduction
2. Related Work
3. Approach
4. Implementation
5. Results
6. Analysis
7. Conclusions
8. Future Work
9. References
10. Appendix
    Experiment Results
    Code 1: Initialization Parameters
    Code 2: PID Implementation
    Code 3: Q-learning Implementation
    Code 4: Extraction of PID parameters from the Q value

1. Introduction


Wheeled mobile robots have a number of advantages compared to their legged counterparts,
such as structural simplicity, energy efficiency, high locomotion speed and low manufacturing
cost. One of the most widely used designs is the two-wheel differential drive robot, which is
composed of two actuated wheels and one passive caster wheel that statically balances the
robot [3]. This particular design is advantageous in terms of maneuverability, as it can rotate
on the spot when equal and opposite angular speeds are applied to the two wheels [4].

The control hierarchy of wheeled mobile robots is often categorized into high-level and low-
level control. In high-level control, one of the three major control paradigms (i.e.
hierarchical, reactive, or hybrid) is applied to undertake a motion task such as point-to-point
motion, path following, trajectory tracking, obstacle avoidance, or wall following. The
hierarchical control architecture requires a complete world model to plan an action based on
the sensor data; due to its high computational requirements, it is slower to respond. The
reactive (i.e. reflexive) control architecture does not have a planning stage; it executes an
action directly based on the sensor data and is hence quicker to respond.

In practice, even two identical motors respond differently to the same applied voltage, i.e. a
differential drive robot does not go straight when the same PWM signal is applied to both
motors. Evidently, if the motors do not operate as anticipated by the robot, any task that
requires the robot to navigate becomes pointless. The low-level controller is a closed-loop
feedback controller that measures the angular velocities of the two motors and applies the
necessary control law to match the operating regions of the two motors. As a result, the robot
navigates with finer accuracy.

The line following task is an example of reactive control. It can be implemented on a wheeled
mobile robot using a PID controller, where a number of line sensors detect the position of the
robot with respect to the line and the necessary control action is applied to the wheels to
keep the robot on the line. PID controllers do not require a model of the system, they are
simple to implement, and their computation is relatively fast. However, the performance of a
PID controller depends greatly on the choice of controller parameters. A number of controller
tuning rules have been proposed in the classical control theory literature; nevertheless, these
techniques are not suitable when the system is not fully known and the operating conditions are
variable.
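For reference, a standard discrete form of the PID control law of the kind implemented in this project (see Code 2 in the Appendix) computes the control action from the line-position error as

𝑢𝑡 = 𝐾𝑃 𝑒𝑡 + 𝐾𝐼 ∑ 𝑒𝑖 + 𝐾𝐷 (𝑒𝑡 − 𝑒𝑡−1)

where the summation and the difference are discrete approximations of the integral and the derivative of the error; the exact approximations used in this project are those given in Code 2.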

Reinforcement learning is a powerful approach when applied to robotics, as it enables the robot
to discover an optimal behavior through trial-and-error interactions with its environment [5].
Adaptive PID controllers based on reinforcement learning therefore have great potential to
resolve the problem of controller tuning for mobile robots operating in environments that are
not fully known or whose conditions vary [2].

This report is organized as follows. In Section 2, previous related research is discussed. In
Section 3, the theory behind reinforcement learning is presented, followed by the adaptive PID
control approach developed to address the controller tuning problem. In Section 4, the
implementation of the Q-learning algorithm is demonstrated. In Section 5, the results obtained
using manual PID tuning and the results obtained by tuning the controller parameters with the
proposed Q-learning algorithm are presented. In Section 6, possible explanations for these
results are discussed. In Section 7, a brief summary of the project and a few final remarks are
given. The report concludes with Section 8, where several directions for future work that could
further improve the project are presented.

2. Related Work

In the work of [2], an incremental Q-learning strategy is proposed for online learning of the
optimal PID controller parameters of wheeled mobile robots. In the learning process, a temporal
memory is used that remains invariant while a specialization process searches the learning
spaces of states and actions, reducing the computational load and mitigating the curse of
dimensionality of the algorithm. Their experimental results show that the proposed strategy can
be successfully implemented on a real-world robotic system.

In the work of [6], a policy gradient reinforcement learning method is proposed to tune the
parameters of three low-level PID controllers of an omni-directional mobile robot. Their
computer-based experimental results demonstrate that the robot learned to follow a given path.
In the work of [7], a similar approach was implemented to maximize the locomotion speed of a
legged mobile robot by finding a set of optimal leg motion parameters. Their experimental
results show that the robot achieved its fastest gait after 3 hours of training.

In the work of [8], a Q-learning algorithm was used to tune the PID controller parameters of
soccer robots; these robots were then tested in a set of experiments against robots tuned using
the Ziegler-Nichols method. The results demonstrated that the robots tuned by reinforcement
learning with the Q-learning method had a 1.5 times faster response time and better stability.
The downside of this implementation is that the entire action domain and the finite state
transitions need to be provided to the robot as a priori information.


3. Approach

The reinforcement learning approach allows the robot to learn a control policy through the
interactions between the agent and its environment. Figure 1 illustrates the reinforcement
learning framework, where an agent takes a series of actions 𝐴𝑡, each of which generates a
reward 𝑅𝑡 and a new state 𝑆𝑡+1 [9].

Figure 1: The reinforcement learning framework, taken from [9].

In 1989, Watkins presented Q-learning in his thesis [10] and provided a proof that it converges
to the optimal value function. Q-learning is a form of temporal difference learning, a
model-free reinforcement learning method that combines elements of dynamic programming and
Monte Carlo estimation [11]. Q-learning uses the state-action value 𝑄(𝑠𝑡, 𝑎𝑡) to directly
approximate the optimal state-action value function 𝑄∗ through the update

𝑄(𝑠𝑡, 𝑎𝑡) ← 𝑄(𝑠𝑡, 𝑎𝑡) + 𝛼 [𝑟𝑡+1 + 𝛾 max𝑎 𝑄(𝑠𝑡+1, 𝑎) − 𝑄(𝑠𝑡, 𝑎𝑡)]

Then, the optimal policy can be obtained through

𝜋∗(𝑠) = argmax𝑎 𝑄(𝑠, 𝑎)

without the need for a model of the system dynamics and the state transition probabilities.
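As a concrete illustration, a single tabular Q-learning update of the form above can be written in a few lines of MATLAB. The snippet below is a minimal standalone sketch with illustrative values for the table size, learning rate and transition; it is not the project code, which is listed in Code 3 of the Appendix.

% One tabular Q-learning update (illustrative sketch, not the project code)
alpha = 0.5;  gamma = 0.9;         % example learning rate and discount factor
Q = zeros(7, 10);                  % small example table: 7 states, 10 actions
s = 3; a = 4; r = 5; s_next = 2;   % example transition (state, action, reward, next state)
target  = r + gamma * max(Q(s_next, :));         % bootstrapped estimate of the return
Q(s, a) = Q(s, a) + alpha * (target - Q(s, a));  % temporal-difference update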

The mobile robot (i.e. the agent) interacts with the track by taking an action 𝑘𝑡, which in
our case is a set of controller parameters that the PID controller uses to compute a control
action 𝑢𝑡. The control action is then used to adjust the PWM signal applied to the motors, and
the system evolves from state 𝑥𝑡 to state 𝑥𝑡+1. As a result, the agent receives a numerical
signal 𝑟𝑡, the so-called reward, which indicates how good or bad the action taken in state 𝑥𝑡
was. The main objective of the agent is to learn the optimal policy 𝜋∗, which defines the set
of optimal control parameters 𝑘𝑡 for each state 𝑥𝑡 so as to maximize the reward received over
time.


𝑘𝑡 = 𝜋∗(𝑥𝑡)

Figure 2 shows the adaptive PID control architecture that was used to find the optimal set of
PID controller parameters for a given state and PWM value.

Figure 2: Adaptive PID control architecture, adapted from [2]

4. Implementation

A virtual mobile robot is modeled in the Simulink environment by deriving the forward kinematic
equations describing the position and orientation of the robot in the XY plane in terms of the
angular velocity of each motor. In order to make the model more realistic, a mathematical model
of the motors is included, and the motor parameters (e.g. internal resistance, torque constant,
back EMF constant) can be adjusted in Code 1 (Appendix).
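For illustration, the standard differential-drive forward kinematics underlying such a model can be integrated with a simple Euler step. The snippet below is a sketch using the wheel_radius and wheel_distance values defined in Code 1 (Appendix); the time step and wheel speeds are assumed example values, and the actual Simulink model may integrate these equations differently.

% Differential-drive forward kinematics: one Euler integration step (sketch)
wheel_radius   = 0.03;      % wheel radius in m (as in Code 1)
wheel_distance = 0.15;      % distance between the wheels in m (as in Code 1)
dt    = 0.01;               % assumed integration time step in s
w_L   = 10;  w_R = 12;      % example wheel angular velocities in rad/s
x = 0;  y = 0;  theta = 0;  % current pose of the robot in the XY plane

v     = wheel_radius * (w_R + w_L) / 2;               % linear velocity of the robot
omega = wheel_radius * (w_R - w_L) / wheel_distance;  % angular velocity of the robot
x     = x + v * cos(theta) * dt;                      % update X position
y     = y + v * sin(theta) * dt;                      % update Y position
theta = theta + omega * dt;                           % update heading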

A virtual test track is created in Simulink to visualize the motion of the mobile robot during
both the training stage and the testing stage. Figure 3 shows the virtual environment that was
created. The test track consists of a ramp, a half-circular turn (of radius 0.5 m) and two
quarter-circular turns (of radius 0.3 m). When the robot is climbing the ramp, the resistive
torque on the motors is increased and the robot slows down, making the results more realistic.
Figure 3: The virtual environment created in Simulink


In the model, the angular velocities of the two wheels are measured by encoders, and the
position of the midpoint between the two motors and the heading of the robot are then
calculated using the forward kinematic equations. Since the relative positions of the four line
sensors located at the front of the robot are known, each sensor is assigned an analogue signal
in the range of 10 to 510 depending on its closeness to the line. Sensors whose output is
greater than 500 are considered HIGH and the remaining sensors are considered LOW. Then,
depending on which sensors are HIGH and LOW, the position of the robot with respect to the line
is detected, which defines the state 𝑥𝑡 of the agent. The state 𝑥𝑡 is in fact an error signal
that tells the controller how far the robot is from the line. This error is the input to the
PID controller, a feedback controller that calculates a control action 𝑢𝑡 based on the chosen
set of controller parameters 𝑘𝑡. The control action 𝑢𝑡 is then added to the PWM value of the
right motor and subtracted from the PWM value of the left motor. The new PWM values are applied
to the motors to minimize the measured error signal. Code 2 (Appendix) shows the PID controller
implementation in MATLAB.

In the Q-learning algorithm, the state of the agent is determined depending on the size of the
error, and a reward value is assigned. Then, the maximum Q value in the row denoted by the
given state is found, as well as the column corresponding to that maximum Q value. The Q table
is updated using the reward, a discount factor and the maximum Q value. The Q-learning
algorithm is given in Code 3 (Appendix). Initially, the Q table was constructed in such a way
that the columns of the matrix represented every different combination of PID controller
parameters. This approach turned out to be computationally inefficient due to the large memory
requirement. In order to solve this issue, a different approach was developed in which the PID
controller parameters are encoded inside the action 𝑘𝑡 as a 7-digit number, where different
digits correspond to different controller parameters. The code used to decode this
representation, i.e. to extract the PID parameters from a column index, is given in Code 4
(Appendix).
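For clarity, the inverse operation, packing a set of gains into a single column index, could look like the sketch below. This encoder is an assumed counterpart of action_decode (Code 4, Appendix) and is shown only to illustrate the digit layout; it is not part of the project code.

% Illustrative inverse of action_decode: pack KP, KI and KD into one index (sketch)
function column = action_encode(KP, KI, KD)
    % KP occupies the last two decimal digits, KI the next two, KD the leading digits
    column = KD * 10000 + KI * 100 + KP;
end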

5. Results

As discussed earlier, the model requires a predefined PWM value, which is altered by the PID
controller to obtain the change in angular velocity of the motors necessary to keep the robot
on the line. For every different predefined PWM value, the PID controller parameters need to be
retuned. A number of experiments with different predefined PWM values were undertaken and the
fastest lap time for a given predefined PWM value was recorded. Initially, in order to test
whether the model works as anticipated, the PID controller was tuned manually with the PWM set
to 300. The robot completed the track in 11.29 seconds. Figures 6 and 7 (Appendix) show the
results of experiment 1. The manual tuning was a quite tedious process that took slightly under
4 hours. In the rest of the experiments, the Q-learning algorithm trained the PID controller.
Table 1 shows the fastest lap time achieved by the mobile robot for each predefined PWM value.

Experiment      Predefined PWM    Training Time    Best Lap Time (s)
Experiment 2    500               1 h              6.7
Experiment 3    600               1 h              5.8
Experiment 4    600               1.5 h            5.77
Experiment 5    700               1.5 h            5.22

Table 1: Experiment results of Q-learning

Unlike the Tic-tac-toe assignment, in this case there is no good indication of whether or not
the optimal policy has been obtained. Therefore, the experiments were timed and the simulation
was terminated at the end of the allocated time slot. In experiment 4, the robot achieved a
better time when it was trained for 30 minutes longer than in experiment 3, indicating that the
robot had not reached an optimal policy in experiment 3. Figures 8-13 (Appendix) show the error
signals and XY coordinates of the mobile robot in experiments 2-4.

Figure 4: Error signal of experiment 5

Figure 4 shows the error signal of the fastest run of the robot in experiment 5, when the PWM
was set to 700. It can be seen that the error signal peaked four times, each time the robot
reached a turn in the test track. Figure 5 shows the XY coordinates of the robot in experiment
5. The red arrow indicates an overshoot when the robot reached the last corner. The experiments
conducted with a predefined PWM value greater than 700 did not result in a successful run. A
possible reason for this is explained in the Analysis section.


Figure 5: XY coordinates of experiment 5

6. Analysis

The experiments showed that the implementation of the Q-learning algorithm was successful:
Q-learning produced a robot that completed the track 2.16 times faster (5.22 s versus 11.29 s)
after a training period that was 2.5 hours shorter than the manual tuning process.

In the experiments, a 10-bit motor driver board was used to apply the PWM signal to the motors,
hence any PWM value greater than 2^10 = 1024 is truncated to 1024. As noted in the Results
section, when the PWM value was set to 800 or greater, the controller output (the control
action 𝑢𝑡) was forced to lie between 0 and 224. However, when the PWM was 700, the control
action could take a value between 0 and 324 and hence had greater authority over the angular
velocities of the motors.
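The control authority argument above can be checked with a one-line calculation; the snippet below simply evaluates the headroom left before the assumed 10-bit saturation limit for a few predefined PWM values.

% Control authority available before the 10-bit PWM output saturates (sketch)
PWM_MAX    = 2^10;                    % 1024, upper limit of the motor driver board
pwm_values = [300 500 600 700 800];   % predefined PWM values used in the experiments
headroom   = PWM_MAX - pwm_values     % maximum control action u_t before truncation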

Evidently, when the predefined PWM value is increased, the robot traverses the straight
sections faster, and therefore the controller must be able to supply a more rapid change in
angular velocity when the robot reaches a corner at the end of a straight section.

The experiments showed that when the PWM was 800 or greater, even when the controller selected
the maximum value of the control action, the resulting change in the angular velocities of the
motors was not enough to keep the robot on the line. Consequently, all of these experiments
ended with the robot going out of bounds.


7. Conclusions

In this project, a model-free reinforcement learning method was used to create an adaptive PID
controller for a wheeled mobile robot undertaking the line following task. The proposed
approach aimed to provide an alternative to manual and auto-tuning of PID parameters by making
use of a machine learning algorithm, Q-learning, which is among the most commonly used and
well-known reinforcement learning algorithms [11]. The experimental results showed that the
Q-learning algorithm successfully tuned the PID controller parameters of the mobile robot
modeled in the Simulink environment. Hence, it can be said that the project has fulfilled the
objectives proposed in the initial project proposal.

8. Future Work

Even though this project has successfully met its predefined objectives, it can still be
improved by continuing the work on applying machine learning algorithms to optimize the control
of robotic systems. A few possible directions for future work are suggested below.

In [2], Carlucho et al. presented a set of real-world experiments implementing an incremental
Q-learning algorithm on a widely used differential drive mobile robot, the Pioneer 3AT. In this
project, a similar approach could be applied in the future to engineer a physical mobile robot
that self-tunes its PID controller parameters for the line following task.

Tuning PID parameters by Q-learning is just one auto-tuning method among several. Other
auto-tuning methods include empirical tuning (e.g. Ziegler-Nichols), model-based tuning,
optimal tuning, and robust tuning. In the future, these methods could also be implemented to
tune the PID controller parameters and the results compared against those obtained with
Q-learning.

As briefly mentioned earlier, due to the exponential increase in the required computational
power and memory, the action encoding method was introduced to encode the PID parameters in a
7-digit number. Although this method reduced the memory and computational power requirements,
it only allowed the controller parameters to be incremented in steps of one. As a result, a
number of combinations of controller parameters that could potentially have brought the robot
to the end of the track were not tested. In the future, a more powerful PC could be used to
achieve a finer step size, or a function approximation approach (Q-FA) could replace the
Q-table method.
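As an illustration of the Q-FA idea, the Q-table can be replaced with a linear function approximator over state features. The sketch below uses assumed dimensions, features and hyperparameters purely to show the shape of such an update; it is not a drop-in replacement for Code 3.

% Q-learning with linear function approximation (illustrative sketch)
n_features = 4;   n_actions = 10;       % assumed problem dimensions
W = zeros(n_features, n_actions);       % one weight vector per action
alpha = 0.01;  gamma = 0.9;             % example learning rate and discount factor
phi      = [1; 0.2; -0.5; 0.1];         % example feature vector for the current state
phi_next = [1; 0.1; -0.4; 0.0];         % example feature vector for the next state
a = 3;  r = 5;                          % example action index and reward
q_next  = max(W' * phi_next);           % greedy value estimate of the next state
delta   = r + gamma * q_next - W(:, a)' * phi;  % temporal-difference error
W(:, a) = W(:, a) + alpha * delta * phi;        % gradient-style weight update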


9. References

[1] T. Hägglund, "Autotuning," in Encyclopedia of Systems and Control, London, UK: Springer-Verlag London, 2015, pp. 50-55.

[2] I. Carlucho, M. De Paula, S. A. Villar and G. G. Acosta, "Incremental Q-learning strategy for adaptive PID control of mobile robots," Expert Systems With Applications, vol. 80, pp. 183-199, 2017.

[3] B. Siciliano and O. Khatib, Handbook of Robotics, Springer, 2007.

[4] G. Oriolo, "Wheeled Robots," in Encyclopedia of Systems and Control, London: Springer-Verlag London, 2015, pp. 1548-1554.

[5] J. Kober, J. A. Bagnell and J. Peters, "Reinforcement Learning in Robotics: A Survey," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238-1278, 2013.

[6] A. Gloye, C. Goktekin, A. Egorova, O. Tenchio and R. Rojas, "Learning to Drive and Simulate Autonomous Mobile Robots," in RoboCup 2004, Heidelberg: Springer-Verlag Berlin, 2005, pp. 160-171.

[7] N. Kohl and P. Stone, "Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2619-2624, May 2004.

[8] A. el Hakim, H. Hindersah and E. Rijanto, "Application of reinforcement learning on self-tuning PID controller for soccer robot multi-agent system," in 2013 Joint International Conference on Rural Information & Communication Technology and Electric-Vehicle Technology, Bandung, Indonesia, 2013.

[9] R. Sutton and A. Barto, Reinforcement Learning: An Introduction, Cambridge, MA; London: The MIT Press, 2012.

[10] C. Watkins, "Learning from delayed rewards," PhD thesis, King's College, Cambridge, UK, 1989.

[11] P. Stone, "Q-Learning," in Encyclopedia of Machine Learning, New York: Springer New York, 2011, p. 819.


10. Appendix

Experiment Results

Figure 6: Error signal of experiment 1

Figure 7: XY coordinates of experiment 1


Figure 8: Error signal of experiment 2

Figure 9: XY coordinates of experiment 2


Figure 10: Error signal of experiment 3

Figure 11: XY coordinates of experiment 3


Figure 12: Error signal of experiment 4

Figure 13: XY coordinates of experiment 4


Code 1: Initialization Parameters

% Q Learning Parameters
global Q
Q = zeros(7,100*100*1000);

previous_state = 0;
previous_reward = 0;

% Mobile Robot Hardware Parameters


wheel_distance=0.15; % wheel distance (in m)
wheel_radius=0.03; % wheel radius (in m)
u=0.1; % friction coefficient
m=0.5; % buggy mass (in kg)
G=30; % Gear ratio
ramp_angle=15; % ramp angle (in deg)
Rm = 3; % motor resistance (ohms)
Kt=0.004; % motor torque constant (Nm/A)
Ke=0.004; % motor back EMF constant (Vs/rad)

% Software Parameters
% Drive Board Parameter
Bipolar=0;                    % drive board selection: '0' for Unipolar, '1' for Bipolar
% Sensor Position Parameters
Dy1=+0.07;Dy2=0.07;Dy3=0.07;Dy4=0.07;       % sensor layout, y (in m)
Dx1=-0.035;Dx2=-0.015;Dx3=0.015;Dx4=0.035;  % sensor layout, x (in m)
ads1=1;ads2=1;ads3=1;ads4=1;  % sensor output type selection: '0' for digital, '1' for analogue
sample_frequency=100;         % sensor sample frequency

% NOTE: the higher the sample frequency, the slower the simulation will be.

% Simulation Parameters
J1=5e-7; % motor moment of inertia
J2=m*wheel_radius^2/G^2; % buggy moment of inertia referred to motor
J=J1+J2/2; % total moment of inertia on one side


Code 2: PID Implementation

function [PWM_LEFT,PWM_RIGHT,stop,e_now] = PID_control(S1,S2,S3,S4,e_pre,speed_left,speed_right,KP,KI,KD)

PWM=300;                                  % predefined PWM value

if (S1<=11 && S2<=11 && S3<=11 && S4<=11) % all sensors are LOW
    PWM_LEFT=0.0;                         % stop the left motor
    PWM_RIGHT=0.0;                        % stop the right motor
    e_now=5;
    stop=1;                               % terminate the simulation
else                                      % at least one sensor is HIGH
    % assignment of error based on the sensor readings
    e_now=((S1-10)+(S2-44)/5-(S3-44)/5-(S4-10))/5;
    proportional=e_now;
    derivative=(e_now-e_pre);
    integral=e_pre+e_now;
    output=KP*proportional+KI*integral+KD*derivative; % control action
    PWM_LEFT=PWM-output;                  % PWM of the left motor
    PWM_RIGHT=PWM+output;                 % PWM of the right motor
    stop=0;
end
end


Code 3: Q-learning Implementation


function [KP, KI, KD, state, reward] = Qlearning(e_now, previous_state, previous_reward)
%#codegen
global Q

% e_now needs to be an integer for the following if statements
e_now = int32(e_now);

% depending on the size of the error, the state and reward are determined
if ((e_now >= -1) && (e_now <= 1))
    state = 1; reward = 7;
elseif ((e_now >= -5) && (e_now <= 5))
    state = 2; reward = 6;
elseif ((e_now >= -10) && (e_now <= 10))
    state = 3; reward = 5;
elseif ((e_now >= -15) && (e_now <= 15))
    state = 4; reward = 4;
elseif ((e_now >= -20) && (e_now <= 20))
    state = 5; reward = 3;
elseif ((e_now >= -25) && (e_now <= 25))
    state = 6; reward = 2;
elseif ((e_now >= -30) && (e_now <= 30))
    state = 7; reward = 1;
end

% find the maximum Q value and its column for the given state
subMatrix = Q(state, :);
[maxQ_value, max_column] = max(subMatrix);

% update Q matrix
Q(state, previous_state) = previous_reward + 0.9 * maxQ_value;

% find the set of actions corresponding to the max column
[KP, KI, KD] = action_decode(max_column);
end

Code 4: Extraction of PID parameters from the Q value

function [KP, KI, KD] = action_decode(max_column)
column = int32(max_column);              % max_column is a 7-digit number
KP = rem(column, 100);                   % last 2 digits are the KP parameter
column = idivide(column, 100, 'floor');
KI = rem(column, 100);                   % 4th and 5th digits are the KI parameter
column = idivide(column, 100, 'floor');
KD = column;                             % first 3 digits are the KD parameter
end
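For example, under this encoding a column index of 1234567 decodes to KD = 123, KI = 45 and KP = 67.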
