Adaptive Dynamic Programming Based Linear Quadratic Regulator Design For Rotary Inverted Pendulum System
Abstract
The rotary inverted pendulum is an inherently unstable system with highly nonlinear dynamics. It is widely used for
designing, testing, evaluating and comparing classical and contemporary control techniques. The goal of this project is to
design an adaptive dynamic programming (ADP) based linear quadratic regulator (LQR) controller for the rotary inverted
pendulum system. A model-based policy iteration algorithm is used to design the ADP based LQR controller. Swing-up and
balance control is also implemented for the rotary inverted pendulum system using the ADP based LQR controller gain. The
responses of the system with the conventional LQR controller, the ADP based LQR controller, and swing-up and balance
control are illustrated using the MATLAB-SIMULINK platform. Comparing the two controllers shows that the rotary inverted
pendulum system is stabilized faster with the ADP based LQR controller, and that the swing-up and balance control
response also improves with the ADP based LQR controller gain.
Keywords—ADP, LQR controller, Swing up and balance control.
I. INTRODUCTION
The inverted pendulum is an inherently unstable system with highly nonlinear dynamics. It belongs
to the class of under-actuated mechanical systems, which have fewer control inputs than degrees of
freedom. This makes the control task more challenging and makes the inverted pendulum a classical
benchmark for designing, testing, evaluating and comparing classical and contemporary control
techniques. Being inherently unstable, the inverted pendulum is among the most difficult systems to
control and is one of the most important classical control problems.
The numerous practical applications of the rotary inverted pendulum system make its study
pertinent. In robotics, balancing systems are developed using inverted pendulums. These find
application in transport machines that need to balance objects, in systems that support walking for
patients, in robots for domestic and industrial use, and in object transport using drones.
Controlling this system is therefore essential, and many classical control solutions have been
proposed over the years. For more efficient control, this project proposes an ADP based LQR
controller for the rotary inverted pendulum system.
H. Wang, H. Dong, L. He, Y. Shi and Y. Zhang, "Design and Simulation of LQR Controller with the
Linear Inverted Pendulum," International Conference on Electrical and Control Engineering, vol. 2, pp.
699-702, 2010 - This paper focuses on the modelling and performance analysis of the linear inverted
pendulum and on the design and simulation of an LQR controller. It mainly introduces how to build the
mathematical model and analyse the system performance, and then designs an LQR controller in order
Nat. Volatiles & Essent. Oils, 2021; 8(5): 3221-3241
to obtain much better control. Simulation is carried out to show the efficiency and feasibility of the
proposed approach [1].
F. A. Yaghmaie and S. Gunnarsson, "A New Result on Robust Adaptive Dynamic Programming for
Uncertain Partially Linear Systems," IEEE 58th Conference on Decision and Control (CDC), vol. 71, pp.
7480-7485, 2019 - This paper presents a new result on robust adaptive dynamic programming for the
Linear Quadratic Regulation (LQR) problem, where the linear system is subject to unmatched
uncertainty. The authors assume that the states of the linear system are fully measurable and that the
unmatched uncertainty models unmeasurable states with an unspecified dimension. They use the
small-gain theorem to give a sufficient condition under which the policies generated in each iteration
of the on-policy and off-policy routines guarantee robust stability of the overall uncertain system. The
sufficient condition can be used to design the weighting matrices in the LQR problem, and simulation
examples are given to demonstrate the result [2].
Y. Liu, Y. Luo and H. Zhang, "Adaptive dynamic programming for discrete-time LQR optimal tracking
control problems with unknown dynamics," IEEE Symposium on Adaptive Dynamic Programming and
Reinforcement Learning (ADPRL), vol. 9, pp. 1-6, 2014 - In this paper, an optimal tracking control
approach based on the adaptive dynamic programming (ADP) algorithm is proposed to solve linear
quadratic regulation (LQR) problems for unknown discrete-time systems in an online fashion. First, the
optimal tracking problem is converted into the design of an infinite-horizon optimal regulator for the
tracking error dynamics, based on a system transformation. The error state equation is then expanded
using the history data of control and state. Iterative ADP algorithms based on policy iteration (PI) and
value iteration (VI) are introduced to solve for the value function of the controlled system. It is shown
that the proposed ADP algorithm solves the LQR problem without requiring any knowledge of the
system dynamics. The simulation results show the convergence and effectiveness of the proposed
control scheme [3].
B. Hardware Components
The main QUBE-Servo 2 components are listed in table I. The components on the QUBE-Servo 2
USB Interface are labelled in figure 2(a), the components on the QUBE-Servo 2 Direct I/O Interface are
shown in figure 2(b), and the components on the QUBE-Servo 2 myRIO Interface are in figure 2(c). The
interaction between QUANSER QUBE-Servo 2 components is also shown in figure 3.
TABLE I
QUBE-SERVO 2 COMPONENTS
ID  Component                   ID  Component
1   Chassis                     11  Rotary arm hub
2   Module connector            12  Rotary pendulum magnets
3   Module connector magnets    13  Pendulum encoder
4   Status LED strip            14  DC motor
5   Module encoder connector    15  Motor encoder
6   Power connector             16  QUBE-Servo 2 DAQ/amplifier board
7   System power LED            17  SPI data connector
8   Inertia disc                18  USB connector
9   Pendulum link               19  Interface power LED
10  Rotary arm rod              20  Internal data bus
1) DC Motor
The QUBE-Servo 2 includes a direct-drive 18V brushed DC motor. The motor specifications are
given in table II.
2) Encoder
The encoder used to measure the angular position of the DC motor and pendulum on the QUBE-Servo 2
is the US Digital E8P-512-118 single-ended optical shaft encoder. It has 512 lines per revolution and
outputs 2048 counts per revolution in quadrature mode. A digital tachometer is also available for
angular speed in counts/sec on channel 14000. The complete specification of the E8P optical shaft
encoder is given in the E8P Data Sheet.
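As a small illustration of the encoder resolution quoted above, the following sketch converts raw quadrature counts into angles. The function names are hypothetical and are not part of any Quanser software; only the figure of 2048 counts per revolution is taken from the text.

```python
import math

# Quadrature counts per revolution, as stated in the text (512 lines x 4).
COUNTS_PER_REV = 2048


def counts_to_radians(counts: float) -> float:
    """Convert a raw encoder count into an angle in radians."""
    return counts * 2.0 * math.pi / COUNTS_PER_REV


def counts_per_sec_to_rad_per_sec(counts_per_sec: float) -> float:
    """Convert a tachometer reading in counts/sec into rad/s."""
    return counts_per_sec * 2.0 * math.pi / COUNTS_PER_REV


# 512 counts correspond to a quarter of a revolution.
quarter_turn = counts_to_radians(512)

# Angular resolution of a single count, in degrees.
resolution_deg = math.degrees(counts_to_radians(1))
```

One count therefore corresponds to 360/2048 of a degree, which bounds the quantization of the measured arm and pendulum angles.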
3) Data Acquisition (DAQ) Device
The QUBE-Servo 2 includes an integrated data acquisition device with two 24-bit encoder
channels with quadrature decoding and one PWM analog output channel. The DAQ also incorporates
a 12-bit ADC which provides current sense feedback for the motor. The current feedback is used to
detect motor stalls, and the amplifier is disabled if a prolonged stall is detected.
4) Power Amplifier
The QUBE-Servo 2 circuit board includes a PWM voltage-controlled power amplifier capable of
providing 2 A peak current and 0.5 A continuous current (based on the thermal current rating of the
motor). The output voltage range to the load is ±10 V.
Amplifier Input Connector
The amplifier input RCA connector on the QUBE-Servo Direct I/O Interface is shown in figure 2(b).
It is single-ended and has a range of ±10 V. As shown in figure 3, it is connected to the amplifier
command, which then drives the motor.
5) Encoder Connector
The Encoder 0 and Encoder 1 5-pin DIN connectors pictured on the QUBE-Servo Direct I/O
interface in figure 2(b) output the measurements from the motor encoder and the add-on module
(e.g., pendulum) encoder, respectively.
6) MXP Connector
The myRIO Connector A/B connector pictured on the QUBE-Servo myRIO Interface in figure 2(c)
is used to connect the amplifier command line, and encoder readings from the QUBE-Servo
components to either of the two NI myRIO MXP connectors [5].
C. System Parameters
TABLE II
Symbol  Description                                      Value
DC Motor
V_nom   Nominal input voltage                            18.0 V
τ_nom   Nominal torque                                   22.0 mN-m
ω_nom   Nominal speed                                    3050 RPM
I_nom   Nominal current                                  0.540 A
R_m     Terminal resistance                              8.4 Ω
k_t     Torque constant                                  0.042 N-m/A
k_m     Motor back-emf constant                          0.042 V/(rad/s)
J_m     Rotor inertia                                    4.0 × 10^-6 kg-m^2
L_m     Rotor inductance                                 1.16 mH
m_h     Module attachment hub mass                       0.0106 kg
r_h     Module attachment hub radius                     0.0111 m
J_h     Module attachment moment of inertia              0.6 × 10^-6 kg-m^2
Inertia Disc Module
m_d     Disc mass                                        0.053 kg
r_d     Disc radius                                      0.0248 m
Rotary Pendulum Module
m_r     Rotary arm mass                                  0.095 kg
L_r     Rotary arm length (pivot to end of metal rod)    0.085 m
m_p     Pendulum link mass                               0.024 kg
L_p     Pendulum link length                             0.129 m
1) DC Motor Modelling
This section summarizes how to find the equations of motion of the DC motor. The motor electrical
equation is
$$v_m(t) - R_m i_m(t) - k_m \dot{\theta}_m(t) = 0 \tag{1}$$
where $v_m(t)$ is the motor input voltage (the control input), $R_m$ is the motor electrical resistance,
$i_m(t)$ is the current, $k_m$ is the back-emf constant, and $\theta_m(t)$ is the angular position of the
motor shaft (i.e., the inertia disc). The motor shaft equation is expressed as
The rotary pendulum model is shown in figure 2.4. The rotary arm pivot is attached to the QUBE-
Servo 2 system and is actuated. The arm has a length of $r$ and a moment of inertia of $J_r$, and its
angle $\theta$ increases positively when it rotates counter-clockwise (CCW). The servo (and thus the
arm) should turn in the CCW direction when the control voltage is positive, $v_m > 0$.
The pendulum link is connected to the end of the rotary arm. It has a total length of $L_p$ and its
center of mass is at $l = L_p/2$. The moment of inertia about its center of mass is $J_p$. The rotary
pendulum angle $\alpha$ is zero when the pendulum is hanging downward and increases positively when
rotated CCW [5].
The equations of motion (EOM) for the pendulum system were developed using the Euler-Lagrange
method. This systematic method is often used to model complicated systems such as robot
manipulators with multiple joints. The total kinetic and potential energy of the system is obtained,
and the Lagrangian is then formed. A number of derivatives are then computed to yield the EOM. The
resultant nonlinear EOM are
$$(J_r + J_p \sin^2\alpha)\ddot{\theta} + m_p l r \cos\alpha\,\ddot{\alpha} + 2 J_p \sin\alpha \cos\alpha\,\dot{\theta}\dot{\alpha} - m_p l r \sin\alpha\,\dot{\alpha}^2 = \tau - b_r \dot{\theta} \tag{4}$$
and
$$J_p \ddot{\alpha} + m_p l r \cos\alpha\,\ddot{\theta} - J_p \sin\alpha \cos\alpha\,\dot{\theta}^2 + m_p g l \sin\alpha = -b_p \dot{\alpha} \tag{5}$$
where $J_r = m_r r^2/3$ is the moment of inertia of the rotary arm with respect to the pivot (i.e. the
rotary arm axis of rotation) and $J_p = m_p L_p^2/3$ is the moment of inertia of the pendulum link
relative to the pendulum pivot (i.e. the axis of rotation of the pendulum). The viscous damping
coefficients acting on the rotary arm and the pendulum link are $b_r$ and $b_p$, respectively. The
applied torque at the base of the rotary arm generated by the servo motor is
$$\tau = \frac{k_m}{R_m}\left(v_m - k_m \dot{\theta}\right) \tag{6}$$
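To make the structure of the nonlinear model concrete, the sketch below solves equations 4 and 5 for the two accelerations at a given state. It is only an illustration: the gravitational constant 9.81 m/s^2 and the zero damping values are assumptions (the damping coefficients are not listed in this section), and the function name is hypothetical.

```python
import math

# Parameters from Table II; l is half the pendulum length, as in the text.
m_r, r = 0.095, 0.085        # rotary arm mass (kg) and length (m)
m_p, L_p = 0.024, 0.129      # pendulum mass (kg) and length (m)
l = L_p / 2
J_r = m_r * r**2 / 3
J_p = m_p * L_p**2 / 3
g = 9.81                     # assumed gravitational acceleration (m/s^2)
b_r = b_p = 0.0              # assumed: damping values are not given here


def accelerations(theta_dot, alpha, alpha_dot, tau):
    """Return (theta_ddot, alpha_ddot) from the nonlinear EOM (4) and (5)."""
    s, c = math.sin(alpha), math.cos(alpha)
    # Inertia terms multiplying the accelerations.
    m11 = J_r + J_p * s**2
    m12 = m_p * l * r * c
    m22 = J_p
    # Everything else moved to the right-hand side.
    rhs1 = (tau - b_r * theta_dot
            - 2 * J_p * s * c * theta_dot * alpha_dot
            + m_p * l * r * s * alpha_dot**2)
    rhs2 = -b_p * alpha_dot + J_p * s * c * theta_dot**2 - m_p * g * l * s
    # Solve the 2x2 linear system with Cramer's rule.
    det = m11 * m22 - m12**2
    theta_ddot = (m22 * rhs1 - m12 * rhs2) / det
    alpha_ddot = (m11 * rhs2 - m12 * rhs1) / det
    return theta_ddot, alpha_ddot


# Pendulum displaced 0.1 rad from the hanging position, at rest, no torque.
theta_dd, alpha_dd = accelerations(0.0, 0.1, 0.0, 0.0)
```

With the pendulum displaced slightly from the hanging position, gravity produces a restoring (negative) pendulum acceleration, consistent with $\alpha = 0$ being the downward equilibrium in these equations.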
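When the nonlinear EOM are linearized about the operating point, the resultant linear EOM for the
rotary pendulum are
$$J_r \ddot{\theta} + m_p l r \ddot{\alpha} = \tau - b_r \dot{\theta} \tag{7}$$
and
$$J_p \ddot{\alpha} + m_p l r \ddot{\theta} + m_p g l \alpha = -b_p \dot{\alpha} \tag{8}$$
Solving equations 7 and 8 for the acceleration terms gives
$$\ddot{\theta} = \frac{1}{J_t}\left(m_p^2 l^2 r g \alpha - J_p b_r \dot{\theta} + m_p l r b_p \dot{\alpha} + J_p \tau\right) \tag{9}$$
and
$$\ddot{\alpha} = \frac{1}{J_t}\left(-m_p g l J_r \alpha + m_p l r b_r \dot{\theta} - J_r b_p \dot{\alpha} - m_p r l \tau\right) \tag{10}$$
where
$$J_t = J_p J_r - m_p^2 l^2 r^2 \tag{11}$$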
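As a sanity check on the linearized model, the following sketch evaluates the gravity and input coefficients that result from solving equations 7 and 8 for the accelerations, using the Table II parameters. The value g = 9.81 m/s^2 is an assumption, and only magnitudes are compared, because the sign of the gravity term depends on whether the model is linearized about the hanging or the upright position.

```python
# Parameters from Table II.
m_r, r = 0.095, 0.085        # rotary arm mass (kg) and length (m)
m_p, L_p = 0.024, 0.129      # pendulum mass (kg) and length (m)
R_m, k_m = 8.4, 0.042        # terminal resistance (ohm), back-emf constant
g = 9.81                     # assumed gravitational acceleration (m/s^2)

l = L_p / 2                  # pivot to center of mass
J_r = m_r * r**2 / 3         # arm inertia about its pivot
J_p = m_p * L_p**2 / 3       # pendulum inertia about its pivot
J_t = J_p * J_r - (m_p * l * r) ** 2   # equation 11

# Magnitudes of the coefficients of alpha in the two acceleration equations.
a32 = m_p**2 * l**2 * r * g / J_t
a42 = m_p * g * l * J_r / J_t

# Magnitudes of the voltage gains, using tau = (k_m / R_m) * v_m from
# equation 6 (the back-emf part of tau contributes to the damping terms).
b3 = J_p * k_m / (R_m * J_t)
b4 = m_p * l * r * k_m / (R_m * J_t)

print(f"a32={a32:.4f} a42={a42:.4f} b3={b3:.4f} b4={b4:.4f}")
```

Under these assumptions the four values agree closely with the entries 152.0057, 264.3080, 50.6372 and 50.0484 that appear in the numerical state-space matrices of the model.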
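The linear state-space equations are
$$\dot{x} = Ax + Bu \tag{12}$$
and
$$y = Cx + Du \tag{13}$$
where $x$ is the vector of state variables ($n \times 1$), $u$ is the control input vector ($r \times 1$),
$y$ is the output vector ($m \times 1$), $A$ is the system matrix ($n \times n$), $B$ is the input matrix
($n \times r$), $C$ is the output matrix ($m \times n$) and $D$ is the feedforward matrix ($m \times r$).
For the rotary pendulum system, the state is defined as
$$x(t) = \begin{bmatrix} \theta(t) & \alpha(t) & \dot{\theta}(t) & \dot{\alpha}(t) \end{bmatrix}^T \tag{14}$$
and the state-space matrices are
$$A = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 152.0057 & -10.1381 & -0.5005 \\ 0 & 264.3080 & -10.0202 & -0.8702 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 0 \\ 50.6372 \\ 50.0484 \end{bmatrix}$$
$$C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$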
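The numerical model can be examined directly. The sketch below, which assumes NumPy is available, checks two properties that matter for the controller design: whether the open-loop system has an unstable eigenvalue, and whether the pair (A, B) is controllable. The matrices are copied from the model above; everything else is illustrative.

```python
import numpy as np

# State-space matrices of the linearized rotary pendulum, from the text.
A = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 152.0057, -10.1381, -0.5005],
    [0.0, 264.3080, -10.0202, -0.8702],
])
B = np.array([[0.0], [0.0], [50.6372], [50.0484]])

# Open-loop eigenvalues: any eigenvalue with a positive real part
# means the equilibrium is unstable without feedback.
eigvals = np.linalg.eigvals(A)
unstable = [ev for ev in eigvals if ev.real > 1e-6]

# Controllability matrix [B, AB, A^2 B, A^3 B]; full rank (4) means
# all four states can be steered by the single motor-voltage input.
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(4)])
rank = np.linalg.matrix_rank(ctrb)

print("eigenvalues:", np.round(eigvals, 4))
print("unstable eigenvalues:", len(unstable), "controllability rank:", rank)
```

A single eigenvalue in the right half-plane together with a full-rank controllability matrix is the situation in which a stabilizing state-feedback gain exists and has to be designed.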
E. Swing-Up Control
In theory, if the arm angle is kept constant and the pendulum is given an initial perturbation, the
pendulum will keep swinging with constant amplitude. The idea of energy control is based on the
preservation of energy in ideal systems: the sum of kinetic and potential energy is constant. In
practice, however, friction damps the oscillation and the overall system energy is not constant.
It is possible to capture the loss of energy with respect to the pivot acceleration, which in turn can be
used to find a controller to swing up the pendulum. The nonlinear equation of motion of a single
pendulum based on the diagram in figure 5 is
where $\alpha(t)$ is the angle of the pendulum, defined as positive when rotated counter-clockwise,
$J_p$ is the moment of inertia with respect to the pivot point, $m_p$ is the mass of the pendulum link,
$l$ is the distance between the pivot and the center of mass, and $u(t)$ is the linear acceleration of the
pendulum pivot (positive along the $x_0$ axis).
Since the acceleration of the pivot is proportional to the current driving the arm motor, and thus also
proportional to the motor voltage, it is possible to control the energy of the pendulum with the
proportional control law
$$u = (E - E_r)\,\dot{\alpha} \cos\alpha \tag{19}$$
This control law will drive the energy of the pendulum towards the reference energy, i.e.
$E(t) \to E_r$. By setting the reference energy to the pendulum potential energy, $E_r = E_p$, the
control law will swing the link to its upright position. Notice that the control law is nonlinear because
it includes nonlinear terms (e.g. $\cos\alpha$). Further, the control changes sign when $\dot{\alpha}$
changes sign and when the angle is ±90 degrees. For the system energy to change quickly, the
magnitude of the control signal must be large. As a result, the swing-up controller is implemented as
$$u = \mathrm{sat}_{u_{max}}\left(k_e (E - E_r)\,\mathrm{sign}(\dot{\alpha} \cos\alpha)\right) \tag{20}$$
where $k_e$ is a tunable control gain and the $\mathrm{sat}_{u_{max}}$ function saturates the control
signal at the maximum acceleration of the pendulum pivot, $u_{max}$. The expression
$\mathrm{sign}(\dot{\alpha} \cos\alpha)$ is used to enable faster control switching. The control law in
equation 20 finds the linear acceleration needed to swing up the pendulum. Because the control
variable in the QUBE-Servo 2 is the motor voltage, $v_m(t)$, the acceleration needs to be converted
into voltage. This can be done using the expression
$$v_m(t) = \frac{R_m r m_r}{k_t}\,u(t)$$
where $R_m$ is the motor resistance, $k_t$ is the current-torque constant of the motor, $r$ is the length
of the rotary arm and $m_r$ is the mass of the rotary arm [5].
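The saturated energy-based law of equation 20 and the acceleration-to-voltage conversion can be sketched as follows. This is a minimal illustration, not the Simulink implementation: the function names are hypothetical, the pendulum energy is passed in as an argument rather than computed, and the gain and saturation values in the example call are arbitrary placeholders.

```python
import math

# Motor and arm parameters from Table II.
R_m, k_t = 8.4, 0.042        # terminal resistance (ohm), torque constant
r, m_r = 0.085, 0.095        # rotary arm length (m) and mass (kg)


def sign(x: float) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def swing_up_accel(E, E_r, alpha, alpha_dot, k_e, u_max):
    """Equation 20: saturated pivot acceleration command."""
    u = k_e * (E - E_r) * sign(alpha_dot * math.cos(alpha))
    return max(-u_max, min(u_max, u))


def accel_to_voltage(u: float) -> float:
    """Convert pivot acceleration into motor voltage, v_m = R_m r m_r u / k_t."""
    return R_m * r * m_r / k_t * u


# Placeholder values: energy 1 unit below the reference, pendulum moving
# through the hanging position, k_e = 10, u_max = 5 (all hypothetical).
u_cmd = swing_up_accel(E=0.0, E_r=1.0, alpha=0.0, alpha_dot=1.0,
                       k_e=10.0, u_max=5.0)
v_cmd = accel_to_voltage(u_cmd)
```

With the Table II values the conversion factor is 1.615 V per m/s^2, so the saturation level on acceleration directly limits the commanded voltage.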
A. LQR Controller
The LQR controller is a well-known method that provides optimal feedback gains for stable,
high-performance closed-loop design of systems. The block diagram of the LQR controller is shown in
figure 6.
The LQR controller determines the feedback law that minimizes the size of the state vector in the
least time with the least control effort. The assumptions made while designing an LQR controller are
that all the states of the system are known and that the system is completely controllable. The settings
of an LQR controller governing either a machine or a process (like an airplane or chemical reactor) are
found by using a mathematical algorithm that minimizes a cost function with weighting factors
supplied by a human (engineer). The cost function is often defined as a sum of the deviations of key
measurements, like altitude or process temperature, from their desired values, and it is given by
$$r(x, u) = \int_0^{\infty} \left(y^T(t)\,Q_y\,y(t) + u^T(t)\,R\,u(t)\right) dt \tag{21}$$
where $Q_y \in \mathbb{R}^{p \times p}$, $Q_y \ge 0$, and $R \in \mathbb{R}^{m \times m}$, $R > 0$, are
user-prescribed weighting matrices. The weighting matrices $Q$ and $R$ establish a trade-off between
performance and actuator effort: $Q$ weights the performance and $R$ weights the actuator effort [6].
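The trade-off between the two weights can be seen in the simplest possible case. The sketch below uses a hypothetical scalar plant, dx/dt = a x + b u with cost integrand q x^2 + r u^2, for which the algebraic Riccati equation has a closed-form solution. It is not the pendulum model; the numbers are chosen only for illustration, and the feedback is written as u = -k x.

```python
import math


def scalar_lqr_gain(a: float, b: float, q: float, r: float) -> float:
    """Optimal gain k (u = -k x) for dx/dt = a x + b u, cost q x^2 + r u^2.

    The scalar Riccati equation 2 a p - (b^2 / r) p^2 + q = 0 has the
    positive root p = r (a + sqrt(a^2 + b^2 q / r)) / b^2, and k = b p / r.
    """
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


# Unstable scalar plant (a > 0) with two different control weights.
k_cheap = scalar_lqr_gain(a=1.0, b=1.0, q=1.0, r=1.0)        # control is cheap
k_expensive = scalar_lqr_gain(a=1.0, b=1.0, q=1.0, r=100.0)  # control is costly

print(f"r=1:   k={k_cheap:.4f}, closed-loop pole {1.0 - k_cheap:.4f}")
print(f"r=100: k={k_expensive:.4f}, closed-loop pole {1.0 - k_expensive:.4f}")
```

Raising the control weight lowers the gain and moves the closed-loop pole closer to the imaginary axis, so the response is slower but uses less actuator effort; both gains still stabilize the plant.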
B. ADP Algorithm
Reinforcement learning (RL) is a very useful tool for solving optimization problems by employing the
principle of optimality from dynamic programming (DP). In the control systems community in
particular, RL is an important approach for handling optimal control problems for unknown nonlinear
systems. DP provides an essential foundation for understanding RL. One class of RL methods is built
upon the actor-critic structure, namely adaptive critic designs, where an actor component applies an
action or control policy to the environment, and a critic component assesses the value of that action
and the state resulting from it. The combination of DP, neural networks (NN) and the actor-critic
structure results in the ADP algorithms [7].
Many ADP schemes are available to enhance the LQR controller performance, including the
model-based iterative scheme, the model-free iterative scheme and the dynamic output feedback
scheme. The model-based iterative scheme requires the system dynamics to produce its output,
whereas the model-free iterative scheme continuously monitors the state trajectories of the system to
produce its output and does not require the system dynamics [4].
The model-based iterative scheme has two types: Policy Iteration (PI) and Value Iteration (VI). The
PI algorithm requires an initially stabilizing policy $K_0$ and uses the Lyapunov equation, which makes
the computation easier. The VI algorithm performs recursive updates on the cost matrix $P_i$ instead
of solving the Lyapunov equation in every iteration. It does not require a stabilizing initial policy but
generally takes more iterations to converge. Both algorithms are model-based, as they require full
model information (A, B, C) [4].
Policy iteration is a computational iterative method. Its key equation is the Lyapunov equation,
which is linear in $P_i$ and therefore easier to solve. The method essentially consists of a policy
evaluation step followed by a policy update step. The first step is to compute the cost $P_i$ of the
control policy $K_i$ by solving the Lyapunov equation 22. The second step is to compute an updated
policy $K_{i+1}$. The PI algorithm requires an initially stabilizing policy $K_0$. For an open-loop stable
system, $K_0$ can be set to zero. For unstable systems, however, $K_0$ has to be found by trial and
error. The steps followed for obtaining the optimized K matrix are given below.
$$(A + BK_i)^T P_i + P_i (A + BK_i) + Q + K_i^T R K_i = 0 \tag{22}$$
Equation 22 has the form $A_1 P + P B_1 = C_1$, which is similar to the Sylvester equation and is solved
for $P$ given $A_1$, $B_1$ and $C_1$, where
$A_1 = (A + BK_i)^T$
$B_1 = A + BK_i$
$C_1 = -(Q + K_i^T R K_i)$
The policy update is
$$K_{i+1} = -R^{-1} B^T P_i \tag{23}$$
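The two steps above (policy evaluation through the Lyapunov equation, then the policy update of equation 23) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes NumPy, uses the convention u = K x, solves the Lyapunov equation by Kronecker vectorization, and is demonstrated on a double integrator whose optimal gain is known in closed form. The names `solve_lyapunov`, `policy_iteration` and `tol` are hypothetical; `tol` plays the role of a stopping threshold.

```python
import numpy as np


def solve_lyapunov(Ac, W):
    """Solve Ac^T P + P Ac + W = 0 for P by Kronecker vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    # With row-major vec: vec(Ac^T P) = (Ac^T kron I) vec(P) and
    # vec(P Ac) = (I kron Ac^T) vec(P).
    M = np.kron(Ac.T, I) + np.kron(I, Ac.T)
    P = np.linalg.solve(M, -W.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # remove round-off asymmetry


def policy_iteration(A, B, Q, R, K0, tol=1e-9, max_iter=50):
    """Model-based policy iteration; K0 must make A + B K0 stable."""
    K = K0
    for i in range(max_iter):
        Ac = A + B @ K
        # Policy evaluation: cost matrix of the current gain.
        P = solve_lyapunov(Ac, Q + K.T @ R @ K)
        # Policy update, equation 23.
        K_next = -np.linalg.solve(R, B.T @ P)
        if np.linalg.norm(K_next - K) < tol:
            return K_next, P, i + 1
        K = K_next
    return K, P, max_iter


# Demonstration on a double integrator with Q = I, R = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[-1.0, -1.0]])          # stabilizing initial gain

K, P, iters = policy_iteration(A, B, Q, R, K0)
# Residual of the algebraic Riccati equation at the final P.
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print("K =", K, "after", iters, "iterations")
```

For this example the iteration converges to K = [-1, -sqrt(3)], the known LQR solution. The same function could be applied to the pendulum matrices A and B, provided Q, R and a stabilizing initial gain in the u = K x convention are supplied.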
In figure 7, the state feedback gain K determines the stability of the plant. The gain K is obtained
using two different methods (LQR and ADP), and their performances are plotted and compared in
section V.
The swing-up control is used for bringing the pendulum from the downward position to the upright
position, and the balance control is used for maintaining the pendulum at the upright position within a
tolerance limit. The swing-up and balance control implemented for the rotary inverted pendulum
system in the MATLAB-SIMULINK platform is shown in figure 8.
The swing-up control sub-block is shown in figure 9. The energy-based swing-up control is a sub-
block in the swing-up control block.
The energy-based swing-up control block shown in figure 10 performs mathematical operations on
the given inputs and provides two outputs: the pendulum energy and the linear acceleration of the
pendulum pivot. This control block is implemented based on equation 20 discussed in section II.
The last sub-block is the pendulum energy block, which gives the pendulum energy as output when
the pendulum angle and its derivative are given as inputs. Together, these sub-blocks perform the
mathematical operations needed for the swing-up control that brings the pendulum to the upright
position.
V. RESULTS
The QUBE-Servo 2 rotary inverted pendulum system is inherently open-loop unstable and
nonlinear. As seen on the left side of figure 11, the pendulum angle is unstable and produces an
unbounded output. Optimal control is used to find the controller gain that balances the pendulum at
the upright position. The LQR controller is the optimal control method used to determine the
controller gain matrix K that makes the inverted pendulum system closed-loop stable. On the right
side of figure 11, the closed-loop response of the pendulum angle is plotted. The K matrix
(K = [-1.0000 35.0244 -1.4474 3.0909]) obtained by LQR control is used in the closed-loop Simulink
model shown in figure 7 to obtain the closed-loop stable response. The LQR controller based output
has a smoother response and a shorter settling time, and the overshoot depends on the Q and R cost
matrices.
Figure 13 shows the K value at different iterations, and figure 14 shows how the final K value of the
algorithm changes with the 'e' value.
Figure 15 compares the K values of LQR and ADP. Both outputs are similar since the Q and R values
are the same for both methods. In LQR, the A, B, Q and R matrices are required to find the K matrix.
In ADP, in addition to A, B, Q and R, an initial stabilizing gain matrix K0 is also needed; the algorithm
then improves the K value in each iteration and provides the best possible K value for the given inputs
and termination condition.
The initial stabilizing gain matrix K0 (K0 = [-2 80 -4 40]) has the worst performance, as it has more
oscillations and settles very slowly. The gain matrix K1 (K1 = [-1.25 55.980906 -1.72095 20.3323283])
obtained after the first iteration has fewer oscillations but settles slowly. The gain matrix K2
(K2 = [-1.000304 40.422411 -1.221734 6.028915]) obtained after the second iteration has no
oscillations and settles quickly. The gain matrix K (K = [-1.0000 20.6532 -1.0202 2.3935]) obtained
after the last iteration has better performance and settles quickly. Thus, if an initial stabilizing gain
matrix K0 is known, the ADP algorithm evaluates the policy and produces a better gain matrix, which
has a shorter settling time and gives better performance than the initial stabilizing matrix K0.
Figure 14 explains the importance of the $e$ value in the termination step of the ADP algorithm. If
the $e$ value is 1000, the settling time is not very low. From the figure, it is inferred that if the $e$
value is reduced, the settling time also reduces. In the ADP algorithm, the $e$ value should be a
positive constant less than 1, so the range of $e$ is 0 to 1. The smaller the $e$ value, the smaller the
settling time. For the QUBE-Servo 2 rotary inverted pendulum system, the smallest effective $e$ value
is 0.000000000629; any value smaller than this has no impact on the K value. In the figure, the
pendulum angle for the smallest $e$ value has a very low settling time and a quicker response.
Figure 15 compares the pendulum angle output of the QUBE-Servo 2 rotary inverted pendulum
system for the conventional LQR and ADP based LQR controllers. The LQR controller performs well
with gain matrix K = [-1.0000 35.0244 -1.4474 3.0909]. In the ADP algorithm, this gain is used as the
initial stabilizing gain matrix. The updated K matrix obtained from the ADP algorithm,
K = [-1.0000 20.6532 -1.0202 2.3935], gives performance similar to the LQR based control response.
The only difference is a slight change in overshoot and settling time, which is plotted in figure 15.
The state feedback gain matrices K obtained from the LQR and ADP methods are used in the
swing-up and balance control Simulink block shown in figure 8, and their performances are compared
here.
The output for the initial stabilizing K matrix used in the ADP algorithm (K = [-2 80 -4 40]) is shown
in figures 16, 17 and 18.
In figure 16, the actual rotary arm angle takes longer to track the set-point rotary arm angle. In
figure 17, the pendulum angle output has the worst performance, with more oscillations and a very
slow response. As this control gain has the worst performance, it takes more pendulum energy to
make the system stable. This control gain is given as the initial stabilizing gain in the ADP algorithm to
improve the performance. The updated K matrix from the ADP algorithm and its output performance
are shown in figures 19, 20 and 21.
Fig. 16 Desired vs System Rotary Arm Angle for Initial Stabilizing K Matrix Used in
ADP algorithm
Fig. 17 Pendulum Angle for Initial Stabilizing K Matrix Used in ADP Algorithm
Fig. 18 Pendulum Energy for Initial Stabilizing K Matrix Used in ADP Algorithm
From figures 16, 17 and 18, it is inferred that the initial stabilizing K matrix produces the worst
performance, with a longer settling time and more oscillations. Here the pendulum takes around
7 seconds to settle and also consumes more energy in settling. Thus, using this K matrix for controlling
the rotary inverted pendulum system is not advisable.
2) ADP Based K Matrix Response
The output for updated K matrix (K = [-1.0000 20.6532 -1.0202 2.3935]) obtained by ADP
algorithm is shown in figure 19, 20 and 21.
From figures 19, 20 and 21, it is inferred that the ADP based K matrix produces the best
performance, with a shorter settling time but more overshoot. It also consumes less pendulum energy
to settle the pendulum at the upright position. Compared with the response of the initial stabilizing
K matrix, the updated K matrix obtained from the ADP algorithm gives a smoother and quicker
response, but with slightly more overshoot. Thus, using the K matrix obtained from the ADP algorithm,
it is possible to obtain the best performance for the QUBE-Servo 2 rotary inverted pendulum system.
The outputs obtained using the LQR gain have similar performance to those of the updated K
matrix obtained from the ADP algorithm. The ADP based output has a shorter settling time and
slightly more overshoot than the LQR based output. From figures 16 to 24, it is inferred that the
initial stabilizing K matrix used in the ADP algorithm has the worst performance of the three K
matrices.
The rotary arm angle of the plant tracks the desired angle very slowly in figure 16 (initial
stabilizing K matrix), whereas it tracks very quickly in figure 19 (updated ADP K matrix). The
response with the LQR based K matrix is a little slower than with the updated ADP K matrix.
Comparing the pendulum angle responses for the three K matrices confirms that the updated K
matrix obtained by the ADP algorithm settles very quickly but has a higher peak overshoot than the
response obtained with the LQR based K matrix.
VI. CONCLUSION
Thus, the ADP algorithm is trained effectively using MATLAB-SIMULINK, and the updated gain
matrix K is used to obtain balance control. The simulation results show that the ADP based LQR
control gives better performance and that the output settles slightly faster than with the conventional
LQR controller. The future scope is to implement the ADP based LQR controller on a real-time system
and to apply real-time disturbances in order to analyse the efficiency of the ADP based LQR controller
in real time.
REFERENCES
[1] H. Wang, H. Dong, L. He, Y. Shi and Y. Zhang, "Design and Simulation of LQR Controller with the
Linear Inverted Pendulum," 2010 International Conference on Electrical and Control Engineering,
2010, pp. 699-702.
[2] F. A. Yaghmaie and S. Gunnarsson, "A New Result on Robust Adaptive Dynamic Programming for
Uncertain Partially Linear Systems," 2019 IEEE 58th Conference on Decision and Control (CDC),
2019, pp. 7480-7485.
[3] Y. Liu, Y. Luo and H. Zhang, "Adaptive dynamic programming for discrete-time LQR optimal
tracking control problems with unknown dynamics," 2014 IEEE Symposium on Adaptive Dynamic
Programming and Reinforcement Learning (ADPRL), 2014, pp. 1-6.
[4] S. A. A. Rizvi and Z. Lin, "Reinforcement Learning-Based Linear Quadratic Regulation of
Continuous-Time Systems Using Dynamic Output Feedback," in IEEE Transactions on
Cybernetics, vol. 50, no. 11, pp. 4670-4679, Nov. 2020.
[5] https://www.quanser.com/products/qube-servo-2/
[6] https://en.wikipedia.org/wiki/Linear%E2%80%93quadratic_regulator
[7] Huaguang Zhang, Derong Liu, Yanhong Luo, Ding Wang, “Adaptive Dynamic Programming for
Control”, Springer, Switzerland, 2013.